
UniprJRC/RobMultMissing


Robust Multivariate Analysis with Missing Observations

Marco Riani¹, Anthony C. Atkinson², Luca Greco³, and Aldo Corbellini¹

¹ Department of Economics and Management and Interdepartmental Research Centre for Robust Statistics

² London School of Economics

³ Università Telematica Giustino Fortunato

Abstract

We develop a general framework for multivariate analysis with missing observations, with particular emphasis on the computation and use of Mahalanobis distances. When some entries are missing, the usual Mahalanobis distance can only be computed on the observed coordinates, yielding partial distances that are not directly comparable across units with different missingness patterns. To overcome this difficulty, we study a class of adjustments that rescale partial Mahalanobis distances to a common reference scale.
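The comparability problem is easier to see with numbers. The sketch below is a minimal Python/NumPy illustration, not the repository's MATLAB code: FSDA routines such as mdPartialMD and mdPartialMD2full have their own interfaces, which are not reproduced here. It computes partial squared distances from the observed coordinates of each row and then applies two simple rescalings to the full dimension p. The function names are hypothetical. The two mappings are assumed textbook forms (a chi-square quantile mapping, and a mean/variance standardization against χ² reference distributions) and are not necessarily the exact formulas used in the paper.

```python
import numpy as np
from scipy.stats import chi2


def partial_md(X, mu, Sigma):
    """Squared Mahalanobis distance of each row using only its non-NaN entries.

    Hypothetical helper for illustration; returns the distances and the
    number of observed coordinates per row.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    d2 = np.full(n, np.nan)
    p_obs = (~np.isnan(X)).sum(axis=1)
    for i in range(n):
        o = ~np.isnan(X[i])
        if not o.any():
            continue  # nothing observed: the distance is undefined
        r = X[i, o] - mu[o]
        d2[i] = r @ np.linalg.solve(Sigma[np.ix_(o, o)], r)
    return d2, p_obs


def chi2_mapping(d2, p_obs, p):
    # Assumed form: keep the chi^2 upper-tail probability, moving from
    # p_obs to p degrees of freedom (sf/isf are more accurate in the tail).
    return chi2.isf(chi2.sf(d2, p_obs), p)


def standardize_mapping(d2, p_obs, p):
    # Assumed form: match chi^2 mean and variance,
    # p + (d2 - p_obs) / sqrt(2 p_obs) * sqrt(2 p).
    return p + (d2 - p_obs) * np.sqrt(p / p_obs)


# Usage with known parameters (identity scatter, zero mean).
mu = np.zeros(3)
Sigma = np.eye(3)
X = np.array([[1.0, 2.0, 2.0],
              [1.0, np.nan, np.nan]])
d2, p_obs = partial_md(X, mu, Sigma)   # d2 -> [9., 1.], p_obs -> [3, 1]
d2_chi2 = chi2_mapping(d2, p_obs, 3)
d2_std = standardize_mapping(d2, p_obs, 3)
```

The second row's raw distance of 1 is on a 1-degree-of-freedom scale. Left unadjusted, it would look far less outlying than it is once it is placed on the 3-dimensional scale.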

The proposed methodology is based on the EM algorithm for estimating multivariate normal location and scatter in the presence of missing values. We show that this framework allows the computation of adjusted distances without explicit imputation. Seven adjustment methods are considered, including moment-based, determinant-based, and distributional transformations, as well as a model-based correction derived from the conditional expectation of the complete-data Mahalanobis distance. This principled adjustment is shown to be optimal under a mean squared error criterion.
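The link between the EM algorithm and a conditional-expectation correction can be checked on a single row. Under multivariate normality, split x into observed coordinates x_o (p_o of them) and missing coordinates x_m (p_m = p − p_o). The complete-data distance then decomposes as d² = d²_o + (x_m − μ_{m|o})ᵀ Σ_{m|o}⁻¹ (x_m − μ_{m|o}), and given x_o the second term is χ² with p_m degrees of freedom. Hence E[d² | x_o] = d²_o + p_m. The EM E-step yields the same quantity as (x̂ − μ)ᵀ Σ⁻¹ (x̂ − μ) + tr(Σ⁻¹ C), where x̂ is the row completed with its conditional mean and C is the conditional covariance of x_m padded with zeros. No explicit imputation of the original data is needed to get this value. The Python sketch below checks the identity numerically for known μ and Σ. The paper's principled correction may include further refinements (for example, for estimated parameters), so treat this as an illustration of the idea rather than the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)   # a random positive-definite scatter matrix
mu = rng.standard_normal(p)
x = rng.multivariate_normal(mu, Sigma)

m = np.array([False, True, False, True, False])   # coordinates treated as missing
o = ~m

# Partial squared distance from the observed coordinates only.
r_o = x[o] - mu[o]
d2_obs = r_o @ np.linalg.solve(Sigma[np.ix_(o, o)], r_o)

# E-step ingredients: conditional mean of x_m and conditional covariance given x_o.
B = np.linalg.solve(Sigma[np.ix_(o, o)], Sigma[np.ix_(o, m)]).T   # Sigma_mo Sigma_oo^{-1}
xhat = x.copy()
xhat[m] = mu[m] + B @ r_o
C = np.zeros((p, p))
C[np.ix_(m, m)] = Sigma[np.ix_(m, m)] - B @ Sigma[np.ix_(o, m)]

# E[d^2 | x_o] computed the "EM way".
Sinv = np.linalg.inv(Sigma)
r_hat = xhat - mu
expected_d2 = r_hat @ Sinv @ r_hat + np.trace(Sinv @ C)

# Closed form: partial distance plus the number of missing coordinates.
closed_form = d2_obs + m.sum()
print(expected_d2, closed_form)
```

Both printed values agree to rounding error. The quadratic form in x̂ reproduces d²_o exactly, and tr(Σ⁻¹ C) equals p_m because the (m, m) block of Σ⁻¹ is the inverse of the conditional covariance.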

We further extend the methodology to a robust context through a trimmed EM algorithm, thereby combining missing-data estimation with outlier detection. A simulation study compares the proposed adjustments in terms of their ability to reconstruct the complete-data Mahalanobis distances. Across a wide range of settings, the principled EM correction consistently provides the best performance, while chi-square, Beta, and standardization mappings provide useful alternatives.
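A trimmed EM can be sketched by alternating an E-step on all rows with an M-step on a retained subset. The Python sketch below is written in the spirit of mdTEM, but it is not that routine. At each iteration it computes every row's adjusted distance d²_o + p_m under the current estimates, keeps the h rows with the smallest values, and re-estimates μ and Σ from the completed rows and their conditional covariances. The function names and the parameter h are hypothetical. The sketch assumes that every row has at least one observed entry and that no column is entirely missing. It also applies no consistency factor, so the trimmed Σ is shrunk relative to the untrimmed estimate, and it does not guarantee monotone convergence. Setting h = n gives ordinary EM.

```python
import numpy as np


def _estep(X, mu, Sigma):
    """Conditional-mean completion, conditional covariances, adjusted distances."""
    n, p = X.shape
    Xhat = X.copy()
    Cs = np.zeros((n, p, p))
    d2adj = np.zeros(n)
    for i in range(n):
        o = ~np.isnan(X[i])
        m = ~o
        r_o = X[i, o] - mu[o]
        Soo = Sigma[np.ix_(o, o)]
        # Adjusted distance: partial distance plus number of missing coordinates.
        d2adj[i] = r_o @ np.linalg.solve(Soo, r_o) + m.sum()
        if m.any():
            B = np.linalg.solve(Soo, Sigma[np.ix_(o, m)]).T   # Sigma_mo Sigma_oo^{-1}
            Xhat[i, m] = mu[m] + B @ r_o
            Cs[i][np.ix_(m, m)] = Sigma[np.ix_(m, m)] - B @ Sigma[np.ix_(o, m)]
    return Xhat, Cs, d2adj


def trimmed_em(X, h=None, n_iter=100, tol=1e-8):
    """Hypothetical trimmed EM for N(mu, Sigma) with NaN-coded missing values."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = n if h is None else h                  # h = n: plain EM, no trimming
    mu = np.nanmean(X, axis=0)                 # crude starting values
    Sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        Xhat, Cs, d2adj = _estep(X, mu, Sigma)
        keep = np.argsort(d2adj)[:h]           # the h least outlying rows
        mu_new = Xhat[keep].mean(axis=0)
        R = Xhat[keep] - mu_new
        # M-step: E[(x - mu)(x - mu)^T] = completed outer product + conditional cov.
        Sigma_new = (R.T @ R + Cs[keep].sum(axis=0)) / h
        converged = (np.allclose(mu, mu_new, atol=tol)
                     and np.allclose(Sigma, Sigma_new, atol=tol))
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    return mu, Sigma, keep
```

A typical call is `mu, Sigma, keep = trimmed_em(X, h=int(0.9 * len(X)))`. The rows missing from `keep` are the candidate outliers.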

Finally, we introduce a new graphical diagnostic for assessing whether data are Missing Completely at Random (MCAR), based on the comparison of Mahalanobis distances computed from complete rows with those obtained when all rows are used. This graphical procedure is formalized through a Monte Carlo test. The methods are illustrated on a dataset of cows with missing measurements, where the analysis reveals both multivariate outliers and clear evidence against MCAR.

The proposed framework provides a flexible and robust approach to multivariate analysis with missing data, combining statistical interpretability, computational efficiency, and practical diagnostic tools.


The table below lists the original sources (MATLAB live scripts, .mlx files) together with the corresponding Jupyter notebook (.ipynb) files.

MATLAB live script files

The .mlx files contain both the code and the output that the code produces.

👀 To view the .mlx files, click on the "File Exchange" button.

▶️ To run the .mlx files in the free MATLAB Online, click on "Run in MATLAB Online". The repository will be cloned automatically.

The Jupyter notebook version of each file is given in the "Jupyter notebook" column of the table below. Like the .mlx files, the Jupyter notebooks contain both the code and the output it produces.

Jupyter notebook files

To view the .ipynb files click on the corresponding link.

To run the .ipynb files inside the language-agnostic Jupyter Notebook environment, follow the instructions in the file ipynbRunInstructions.md.

Note: to run the files below you need to have the FSDA toolbox installed.

| Description | Routine name (link to HTML doc file) |
|---|---|
| EM algorithm for data with missing values (no trimming). | mdEM |
| EM algorithm with trimming (TEM) for data with missing values. | mdTEM |
| Compute squared Mahalanobis distances using only observed entries. | mdPartialMD |
| Rescale partial squared Mahalanobis distances to the full-dimensional scale. | mdPartialMD2full |
| Bootstrap test for change in Mahalanobis distances under MCAR. | mdMCARtest |
| Replace NaNs with conditional mean or random draw from conditional distribution. | mdImputeCondMean |
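The two imputation options described for mdImputeCondMean correspond to standard multivariate normal conditioning. The Python sketch below illustrates what is computed; it is not the FSDA function or its signature. For each row, missing entries are replaced either by μ_m + Σ_mo Σ_oo⁻¹ (x_o − μ_o) or by a draw from N(μ_{m|o}, Σ_{m|o}). Here μ and Σ are assumed known, for example from an EM fit. The name `impute_cond` and its arguments are hypothetical.

```python
import numpy as np


def impute_cond(X, mu, Sigma, random=False, rng=None):
    """Fill NaNs with E[x_m | x_o] or, if random=True, a draw from the conditional normal.

    Hypothetical helper for illustration; mu and Sigma are taken as known.
    """
    X = np.array(X, dtype=float)          # work on a copy, leave the input untouched
    rng = np.random.default_rng() if rng is None else rng
    for i in range(X.shape[0]):
        m = np.isnan(X[i])
        o = ~m
        if not m.any():
            continue                      # complete row: nothing to impute
        if not o.any():                   # nothing observed: fall back to the marginal
            cmean, ccov = mu.copy(), Sigma.copy()
        else:
            B = np.linalg.solve(Sigma[np.ix_(o, o)], Sigma[np.ix_(o, m)]).T  # Sigma_mo Sigma_oo^{-1}
            cmean = mu[m] + B @ (X[i, o] - mu[o])
            ccov = Sigma[np.ix_(m, m)] - B @ Sigma[np.ix_(o, m)]
            ccov = (ccov + ccov.T) / 2    # remove tiny numerical asymmetry
        X[i, m] = rng.multivariate_normal(cmean, ccov) if random else cmean
    return X


# Usage: bivariate normal with correlation 0.8.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
X = np.array([[1.0, np.nan], [np.nan, -2.0], [0.5, 0.5]])
X_mean = impute_cond(X, mu, Sigma)            # NaNs become 0.8 and -1.6
X_draw = impute_cond(X, mu, Sigma, random=True, rng=np.random.default_rng(0))
```

Conditional-mean imputation understates variability, because the imputed points lie on the regression surface. Random draws from the conditional distribution preserve the spread, at the cost of making the result depend on the random seed.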

The table below lists the source files that reproduce the figures of the paper and the simulation study.

| File name | View 👀 | Run ▶️ | Jupyter notebook | .m format |
|---|---|---|---|---|
| RobMultMissingFigures.mlx: This code generates Figures 4 to 9 and Table 1 of the paper. | File Exchange | Open in MATLAB Online | RobMultMissingFigures.ipynb | RobMultMissingFigures.m |
| RobMultMissingSimStudies.mlx: This code generates Figures 1 to 3 and performs the simulation study described in Section 3 of the paper. | File Exchange | Open in MATLAB Online | RobMultMissingSimStudies.ipynb | RobMultMissingSimStudies.m |
