author	date	linktitle	menu	name	url	weight
Nathalie Vialaneix	2018-07-22	Lectures	navbar	Lectures	/lectures/	2

Below you will find a selection of high-quality lectures, tutorials and labs on different aspects of missing values.

General lectures

Statistical Methods for Analysis With Missing Data

(Marie Davidian, course at NC State University, spring 2017)

This course provides an overview of modern statistical frameworks and methods for analysis in the presence of missing data. Both methodological developments and applications are emphasized. The course provides a foundation in the fundamentals of this area that will prepare students to read the current literature and to have a broad appreciation of the implications of missing data for valid inference. Course page.

Introduction and Motivation
Naive Methods
Likelihood-based Methods Under Missing At Random (MAR)
Multiple Imputation Methods Under MAR
Inverse Probability Weighted Methods Under MAR
Pattern Mixture Models
Sensitivity Analysis to Deviations from MAR
Homework 1
Homework 2
Homework 3
Homework 4
Data for homeworks

Dealing With Missing Values in R

(Julie Josse, course at ETH Zürich, winter 2020)

The ability to easily collect and gather a large amount of data from different sources can be seen as an opportunity to better understand many processes. It has already led to breakthroughs in several application areas. However, due to the wide heterogeneity of measurements and objectives, these large databases often exhibit an extraordinarily high number of missing values. Hence, in addition to scientific questions, such data also present some important methodological and technical challenges for data analysts. This tutorial gives an overview of the missing values literature as well as the recent improvements that caught the attention of the community due to their ability to handle large matrices with a large number of missing entries. The methods presented in this tutorial are illustrated on medical, environmental and survey data.

Slides
Lecture notes
Tutorial
Data used in tutorial.

Analysis of missing values

(Jae-Kwang Kim, course at Iowa State University, fall 2015)

This course focuses on the theory and methods for missing data analysis. Topics include maximum likelihood estimation under missing data, the EM algorithm, Monte Carlo computation techniques, imputation, Bayesian approaches, propensity scores, semi-parametric approaches, and non-ignorable missing data.

Introduction
Likelihood-based approach
Computation
Imputation
Multiple imputation
Propensity Score approach
Nonignorable missing data

Statistical Methods for Analysis with Missing Data

(Mauricio Sadinle, course at University of Washington, winter 2019)

This course formally introduces methodologies for handling missing data in statistical analyses. It covers naive methods, missing-data assumptions, likelihood-based approaches, Bayesian and multiple imputation approaches, inverse-probability weighting, pattern-mixture models, sensitivity analysis and approaches under nonignorable missingness. Computational tools such as the Expectation-Maximization algorithm and the Gibbs sampler will be introduced. This course is intended for students who are interested in methodological research.
Course syllabus

Lecture 1: syllabus, motivating examples
Lecture 2: general setup, notation, missing-data mechanisms
Lecture 3: naive methods: complete-case analysis and imputation
Lecture 4: R session 1
Lecture 5: likelihood-based methods
Lecture 6: the EM algorithm
Lecture 7: R session 2 (setup), R script
Lecture 8: introduction to Bayesian inference
Lecture 9: Gibbs sampling, ignorability under Bayesian inference, data augmentation
Lecture 10: multiple imputation
Lecture 11: R session 3 (setup), R script
Lecture 12: inverse probability weighting
Lecture 13: introduction to (weighted generalized) estimating equations
Lecture 14: R session 4 (setup), R script
Lecture 15: identifiability, nonignorability, pattern-mixture models
Lecture 16: pattern-mixture models (continued), sensitivity analysis

Exercises/Homework

Homework 0 (after Lecture 1)
Homework 1 (after Lecture 4)
Homework 2 (after Lecture 7)
Homework 3 (after Lecture 11)
Homework 4 (after Lecture 14)

Multiple imputation

Missing Values in Clinical Research - Multiple Imputation

(Nicole Erler, NIHES course Missing Values in Clinical Research (EP16), May 2018)

This course is the second part of a NIHES course on Missing Values in Clinical Research and it focuses on multiple imputation (MI), specifically the fully conditional specification (FCS, MICE), which is often considered the gold standard to handle missing data. The course gives a detailed discussion of what MI(CE) does, which assumptions need to be met for it to perform well, and which alternative imputation approaches are available for settings where MICE is not optimal. The theoretical considerations will be accompanied by demonstrations and short practical sessions in R, and a workflow for doing MI using the R package mice will be proposed, illustrating how to perform (multiple) imputation for cross-sectional and longitudinal data in R.

Multiple Imputation (course slides)
Multiple imputation using mice (practical)
Multiple imputation in complex settings using mice, JointAI, smcfcs, jomo (practical)

Multiple Imputation: Methods and Applications

(Jerry Reiter, short course at the Odum Institute at UNC Chapel Hill, March 2018)

This short course on multiple imputation gives an overview of missing data problems, various solutions to tackle them, and their limitations. It introduces MI inference and provides details on the implementation and application of MI.

Multiple Imputation
Example using mice
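
As a rough illustration of the MI inference mentioned above, the sketch below pools the results of m analyses, one per imputed data set, with Rubin's rules. It is a minimal Python sketch and not material from the course: the function name `pool_rubin` and the example numbers are hypothetical. In R, the mice package provides this step through its `pool()` function.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    `estimates` and `variances` are hypothetical inputs: one estimate and one
    squared standard error per imputed data set.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, with one variance each")
    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # within-imputation variance
    b = variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


# Made-up results from three imputed data sets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputed data sets, which is how MI reflects the uncertainty due to the missing values.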

Missing values and principal component methods

Imputation using principal components

(Julie Josse, course at École Polytechnique, fall 2018)

This tutorial is part of a master course on statistics with R. It discusses different missing values problems and illustrates them on medical, industrial and ecological data. It provides a detailed introduction to single and multiple imputation via principal component methods, both in theory and in practice. The practical part illustrates how to perform (multiple) imputation using the R package missMDA.

Special focus on principal component methods

Handling missing values in PCA and MCA

(François Husson, video tutorial accompanying the R-package missMDA, 2016)

These two videos can be viewed independently or as a complement to the above tutorial on Imputation using principal components, as they provide a detailed explanation of how to use the functions of the missMDA package to visualize and analyze missing values and how to perform (multiple) imputation.

Handling missing values in PCA
Handling missing values in MCA

Specific data or application types

Supervised learning with missing values

(Julie Josse, video of Keynote at useR! conference in Toulouse, 2019)

This keynote talk gives an overview of different approaches for inference and prediction tasks. A striking result for the latter is that the widely-used method of imputing with the mean prior to learning can be consistent.

A missing value tour in R (video)
Slides of the talk
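
To make the mean-imputation step mentioned in the abstract concrete, here is a minimal Python sketch that replaces each missing entry by the mean of the observed values in its column. It is only an illustration and is not taken from the talk: the function name `impute_column_means` and the toy data are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import math


def impute_column_means(rows):
    """Replace NaN entries by the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if not observed:
            raise ValueError(f"column {j} has no observed value")
        means.append(sum(observed) / len(observed))
    # Build a new table; the input rows are left unchanged.
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


# Toy data with two missing entries (hypothetical values).
nan = float("nan")
completed = impute_column_means([[1.0, nan], [3.0, 4.0], [nan, 6.0]])
```

The completed table can then be passed to any learner that expects complete data. In a prediction setting, the column means would typically be computed on the training data and reused for new observations.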

Handling missing values in surveys

(Guillaume Chauvet, course at École Nationale de la Statistique et de l'Analyse de l'Information, spring 2015, slides in French)

This course recalls basic concepts of surveys and data collection before discussing how to handle unit non-response and item non-response in surveys.

Traitement des données manquantes dans les Enquêtes

Longitudinal data with missing values

(Dimitris Rizopoulos, talk at Joint Conference on Biometrics & Biopharmaceutical Statistics, August 2017)

In follow-up studies, different types of outcomes are typically collected for each subject. These include longitudinally measured responses (e.g., biomarkers), and the time until an event of interest occurs (e.g., death, dropout). Often these outcomes are analyzed separately, but on many occasions it is of scientific interest to study their association. This type of research question has given rise to the class of joint models for longitudinal and time-to-event data. These models constitute an attractive paradigm for the analysis of follow-up data that is mainly applicable in two settings: first, when the focus is on a survival outcome and we wish to account for the effect of endogenous time-dependent covariates measured with error, and second, when the focus is on the longitudinal outcome and we wish to correct for non-random dropout. This course is aimed at applied researchers and graduate students, and will provide a comprehensive introduction to this modeling framework. It explains when these models should be used in practice, what the key assumptions behind them are, and how they can be utilized to extract relevant information from the data. Emphasis is placed on applications, and by the end of the course participants will be able to define appropriate joint models to answer their questions of interest.

Joint Modelling of Longitudinal and Time to Event Data

Time Series Imputation

(Steffen Moritz, talk at useR! 2017, July 2017)

This tutorial gives a short overview of methods for missing data in time series in R in general and subsequently introduces the imputeTS package. The imputeTS package is specifically made for handling missing data in time series and offers several functions for visualization and replacement (imputation) of missing data. Usage examples show how imputeTS can be used for time series imputation.

How to deal with Missing Data in Time Series and the imputeTS package

Treatment Effect Estimation with Missing Attributes

(Julie Josse, talk at CIRM virtual conference on Mathematical Methods of Modern Statistics, 2020)

While the problem of missing values in the covariates has been considered very early in the causal inference literature, it remains difficult for practitioners to know which method to use, under which assumptions the different approaches are valid and whether the tools developed are also adapted to more complex data, e.g., high-dimensional or mixed data. This talk provides a rigorous classification of existing methods according to the main underlying assumptions, which are based either on variants of the classical unconfoundedness assumption or on assumptions about the mechanism that generates the missing values. It also highlights two recent contributions on this topic: first, an extension of classical doubly robust estimators that allows handling of missing attributes, and second, an approach to causal inference based on variational autoencoders in the case of latent confounding.

Video
Slides of the talk

Analysis and imputation of missing count data

(Third year students from École Polytechnique, final project of Statistics with R course, December 2018)

The estimation of count data, such as bird abundance, is an important task in many disciplines and can be used for instance by ecologists for species conservation. Collecting count data is often subject to inaccuracies and missing data due to the nature of the counted object and due to the multiplicity of actors/sensors collecting the data over more or less long periods of time. Methods such as Correspondence Analysis or Generalized Linear Models can be used to estimate these missing values and allow a more accurate analysis of the count data. The objective of this project is to investigate the abundance of the Eurasian Coot, which is mainly observed in the Mediterranean part of North Africa, and its relation to external geographical and meteorological factors. First, different methods are compared in terms of accuracy, using R packages glm, Rtrim, Lori and missMDA. Afterwards, external factors and their impact on bird abundance are examined and finally the temporal trend is investigated to determine whether the Eurasian Coot is declining or not.
This project was carried out in collaboration with the Research Institute for the conservation of Mediterranean wetlands, the association Les Amis des Oiseaux (Friends of the birds) and the Office National de la Chasse et de la Faune Sauvage (National Agency for Hunting and Wildlife).

Estimation of species abundance using log-linear models (slides)
Estimation of species abundance using log-linear models (project report)

Implementation in R

Missing data visualizations with naniar

naniar vignette: Missing data visualizations
(Nicholas Tierney, 2018)

Handling different types of data with different R-packages

useR! tutorial on handling missing values
(Julie Josse & Nicholas Tierney, 2018)

Handling missing values in PCA and MCA with missMDA

Handling missing values in PCA
Handling missing values in MCA

(François Husson, video tutorial accompanying the R-package missMDA, 2016)

Multiple imputation with mice, JointAI, smcfcs and jomo

mice vignette: Ad hoc methods and mice
(Stef van Buuren and Gerko Vink, 2018)
Other mice vignettes
(Stef van Buuren and Gerko Vink, 2018)
Multiple imputation with the mice package
(Nicole Erler, NIHES course on multiple imputation, 2018)
Multiple imputation in complex settings (using mice, JointAI, smcfcs, jomo)
(Nicole Erler, NIHES course on multiple imputation, 2018)
Example using mice
(Jerry Reiter, 2018)

Random trees and forests with missForest

missForest vignette: Using the missForest Package
(Daniel J. Stekhoven, 2012)

If you wish to contribute some of your own material to this platform, please feel free to contact us.
