name | topic | maintainer | version | source |
---|---|---|---|---|
Robust | Robust Statistical Methods | Martin Maechler (Martin.Maechler@R-project.org) | 2023-07-01 | |
Robust (or "resistant") methods for statistical modelling have been
available in S from the very beginning in the 1980s, and then in R in
package stats. Examples are median(), mean(*, trim = .), mad(), IQR(),
fivenum() (the statistic behind boxplot() in package graphics), and
lowess() (and loess()) for robust nonparametric regression, which were
complemented by runmed() in 2003. Much further important functionality
has been made available in the recommended (and hence present in all R
versions) package r pkg("MASS", priority = "core") (by Bill Venables and
Brian Ripley, see the book Modern Applied Statistics with S). Most
importantly, they provide rlm() for robust regression and cov.rob() for
robust multivariate scatter and covariance.
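As a minimal sketch of why these base R functions are called "resistant", the following compares classical and robust summaries on a small made-up sample with one gross outlier. The data vector `x` is purely illustrative and not taken from any package.

```r
# Ten well-behaved observations plus one gross outlier (illustrative data).
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 100)

# Classical estimates: both are dragged towards the outlier.
classical <- c(location = mean(x), scale = sd(x))

# Resistant counterparts from package stats.
robust <- c(location = median(x), scale = mad(x))

trimmed <- mean(x, trim = 0.1)  # drops 10% of the observations at each end
iqr     <- IQR(x)               # interquartile range
five    <- fivenum(x)           # min, lower hinge, median, upper hinge, max

print(rbind(classical, robust))
```

The single outlier moves the mean and the standard deviation far away from the bulk of the data, while the median and the MAD stay close to what the ten regular observations suggest.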
This task view is about R add-on packages providing newer or faster, more efficient algorithms and notably for (robustification of) new models.
Please send suggestions for additions and extensions via e-mail to the maintainer or submit an issue or pull request in the GitHub repository linked above.
An international group of scientists working in the field of robust
statistics has made efforts (since October 2005) to coordinate several
of the scattered developments and make the important ones available
through a set of R packages complementing each other. These should build
on a basic package with "Essentials", coined
r pkg("robustbase", priority = "core")
with (potentially
many) other packages building on top and extending the essential
functionality to particular models or applications.
Since 2020 and the 2nd edition of
Robust Statistics: Theory and Methods,
r pkg("RobStatTM") covers its estimators and examples, notably by
importing from r pkg("robustbase") and
r pkg("rrcov", priority = "core"). Further, there is the quite
comprehensive package r pkg("robust", priority = "core"), a version of
the robust library of S-PLUS, as an R package now GPL-licensed thanks to
Insightful and Kjell Konis. Originally, there was much overlap between
r pkg("robustbase") and r pkg("robust"); now r pkg("robust") depends on
r pkg("robustbase") and r pkg("rrcov"), where r pkg("robust") provides
convenient routines for the casual user while r pkg("robustbase") and
r pkg("rrcov") contain the underlying functionality and provide the more
advanced statistician with a large range of options for robust modeling.
We structure the packages roughly into the following topics, and
typically will first mention functionality in packages
r pkg("robustbase"), r pkg("rrcov") and r pkg("robust").
- Linear Regression: lmrob() (r pkg("robustbase")) and lmRob() (r pkg("robust")), where the former uses the latest of the fast-S algorithms and heteroscedasticity and autocorrelation corrected (HAC) standard errors, while the latter makes use of the M-S algorithm of Maronna and Yohai (2000), automatically when there are factors among the predictors (where S-estimators (and hence MM-estimators) based on resampling typically fail badly). The ltsReg() and lmrob.S() functions are available in r pkg("robustbase"), but rather for comparison purposes. rlm() from r pkg("MASS") was the first widely available implementation for robust linear models, and also one of the very first MM-estimation implementations. r pkg("robustreg") provides very simple M-estimates for linear regression (in pure R). Note that Koenker's quantile regression package r pkg("quantreg") contains L1 (aka LAD, least absolute deviations) regression as a special case, doing so also for nonparametric regression via splines. Package r pkg("mblm")'s function mblm() fits median-based (Theil-Sen or Siegel's repeated) simple linear models. Note that a location (and scale) model is a regression with only an intercept and may be approached by e.g., lmrob(y ~ 1). For very small samples, location robLoc() and scale robScale() are also provided by r pkg("revss").
- Generalized Linear Models (GLMs) for Regression: GLMs are provided both via glmrob() (r pkg("robustbase")) and glmRob() (r pkg("robust")). r pkg("drgee") fits "Doubly Robust" Generalized Estimating Equations (GEEs), and r pkg("complmrob") does robust linear regression with compositional data as covariates.
- Generalized Smooth/Additive (GAM-like) Regression: Package r pkg("GJRM")'s gamlss() function with option gamlss(*, robust = TRUE) allows fitting many model families robustly (wrapped inside the LSS "location-scale-shape" transformation scope).
- Nonlinear / Smooth (Nonparametric Function) Regression: Robust nonlinear model fitting is available through r pkg("robustbase")'s nlrob().
- Mixed-Effects (Linear and Nonlinear) Regression: Quantile regression (and hence L1 or LAD) for mixed effect models is available in package r pkg("lqmm"). Rank-based mixed effect fitting is available from package r pkg("rlme"), whereas an MM-like approach for robust linear mixed effects modeling is available from package r pkg("robustlmm"). More recently, r pkg("skewlmm") provides robust linear mixed-effects models (LMMs) via scale mixtures of skew-normal distributions.
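To illustrate the linear regression entry above, here is a hedged sketch that contrasts ordinary least squares with an MM-fit from rlm() in the recommended package MASS, using the built-in stackloss data. The 0.5 weight threshold is an arbitrary choice for display, and the lmrob() call is only attempted if robustbase happens to be installed.

```r
library(MASS)  # recommended package, shipped with R

set.seed(1)    # the MM-fit starts from a resampling-based S-estimate

# Classical least squares vs. a robust MM-fit on the stackloss data.
fit.ls <- lm(stack.loss ~ ., data = stackloss)
fit.mm <- rlm(stack.loss ~ ., data = stackloss, method = "MM")

# Final IWLS weights: observations with small weights were downweighted.
w <- fit.mm$w
print(round(cbind(LS = coef(fit.ls), MM = coef(fit.mm)), 3))
print(which(w < 0.5))  # 0.5 is an arbitrary display threshold

# A robust location/scale fit as an intercept-only regression,
# run only when the add-on package robustbase is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  fit.loc <- robustbase::lmrob(stack.loss ~ 1, data = stackloss)
  print(coef(fit.loc))
}
```

Inspecting the weights is often more informative than comparing coefficients alone, since it shows which observations the robust fit effectively set aside.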
- Here, the r pkg("rrcov") package, which builds (Depends) on r pkg("robustbase"), provides nice S4 class based methods, more methods for robust multivariate variance-covariance estimation, and adds robust PCA methodology.
- r pkg("rrcov") is extended by r pkg("rrcovNA"), providing robust multivariate methods for incomplete or missing (NA) data, and by r pkg("rrcovHD"), providing robust multivariate methods for High Dimensional data.
- Specialized robust PCA packages are r pkg("pcaPP") (via Projection Pursuit), r pkg("rpca") (incl "sparse") and r pkg("rospca"). Historically, note that robust PCA can be performed by using standard R's princomp(), e.g., X <- stackloss; pc.rob <- princomp(X, covmat = MASS::cov.rob(X))
- Here, r pkg("robustbase") contains a slightly more flexible version, covMcd(), than r pkg("robust")'s fastmcd(), and similarly for covOGK(). On the other hand, r pkg("robust")'s covRob() has automatically chosen methods, notably pairwiseQC() for large dimensionality p. Package r pkg("robustX") for experimental, or other not yet established procedures, contains BACON() and covNCC(), the latter providing the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002), also available (slightly less optimized) in r pkg("covRobust").
- r pkg("mvoutlier") (building on r pkg("robustbase")) provides several methods for outlier identification in high dimensions.
- r pkg("GSE") estimates multivariate location and scatter in the presence of missing data.
- r pkg("RSKC") provides Robust Sparse K-means Clustering.
- r pkg("robustDA") for robust mixture Discriminant Analysis (RMDA) builds a mixture model classifier with noisy class labels.
- r pkg("robcor") computes robust pairwise correlations based on scale estimates, particularly on FastQn().
- r pkg("covRobust") provides the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002).
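The princomp() one-liner mentioned in the list above can be expanded into a small sketch that also uses the robust scatter estimate for outlier flagging. Only base R and the recommended package MASS are used; the 97.5% chi-squared cutoff is a common convention assumed here, not something prescribed by this task view.

```r
library(MASS)  # recommended package, shipped with R

set.seed(1)    # cov.rob() relies on random subsampling
X <- stackloss

rc <- cov.rob(X)                    # robust center and scatter (default: MVE)
pc.cl  <- princomp(X)               # classical PCA
pc.rob <- princomp(X, covmat = rc)  # PCA based on the robust scatter

# Squared Mahalanobis distances with respect to the robust estimates.
d2 <- mahalanobis(X, center = rc$center, cov = rc$cov)

# Assumed convention: flag points beyond the 97.5% chi-squared quantile.
cutoff <- qchisq(0.975, df = ncol(X))
print(which(d2 > cutoff))

# Compare the component standard deviations of both decompositions.
print(round(rbind(classical = pc.cl$sdev, robust = pc.rob$sdev), 3))
```

The same pattern should carry over to the estimators in robustbase or rrcov by swapping the function that produces the center and scatter, though their return values are structured differently.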
- We are not considering cluster-resistant variance (/standard error) estimation (aka "sandwich"). Rather e.g. model based and hierarchical clustering methodology with a particular emphasis on robustness: Note that r pkg("cluster")'s pam() implementing "partitioning around medoids" is partly robust (medoids instead of very unrobust k-means) but is not good enough, as e.g., the k clusters could consist of k-1 outliers and one cluster for the bulk of the remaining data.
- "Truly" robust clustering is provided by packages r pkg("genie"), r pkg("Gmedian"), r pkg("otrimle") (trimmed MLE model-based) and notably r pkg("tclust") (robust trimmed clustering).
- See also the r view("Cluster") CRAN task view.
BACON() (in r pkg("robustX")) should be applicable for larger (n,p) than traditional robust covariance based outlier detectors.
boxplot.stats(), etc., mentioned above.
- R's runmed() provides most robust running median filtering.
- Package r pkg("robfilter") contains robust regression and filtering methods for univariate time series, typically based on repeated (weighted) median regressions.
- The r pkg("RobPer") package provides several methods for robust periodogram estimation, notably for irregularly spaced time series.
- Peter Ruckdeschel has started to lead an effort for a robust time-series package, see r rforge("robust-ts") on R-Forge.
- Further, robKalman, "Routines for Robust Kalman Filtering -- the ACM- and rLS-filter", is being developed, see r rforge("robkalman") on R-Forge.
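A short sketch of running median filtering with runmed() from package stats, set against a running mean of the same width. The signal, the spike positions and the window width of 5 are made up for illustration.

```r
# A smooth signal with three isolated spikes (illustrative data).
tt <- 1:50
y <- sin(tt / 8)
y[c(10, 25, 40)] <- c(8, -9, 10)

# Running median with window width 5 (k must be odd).
sm.med <- runmed(y, k = 5)

# Running mean of the same width, for comparison (NA at both ends).
sm.mean <- as.numeric(stats::filter(y, rep(1 / 5, 5)))

# Around the first spike the median stays on the signal,
# while the mean smears the spike over its neighbours.
print(round(cbind(y, sm.med, sm.mean)[8:12, ], 2))
```

An isolated spike occupies only one of the five positions in any window, so it can never become the window median, whereas it contributes a fifth of its size to every mean that contains it.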
- Econometricians tend to like HAC (heteroscedasticity and autocorrelation corrected) standard errors. For a broad class of models, these are provided by package r pkg("sandwich"); similarly r pkg("clubSandwich") and r pkg("clusterSEs"). Note that vcov(lmrob()) also uses a version of HAC standard errors for its robustly estimated linear models. See also the CRAN task view r view("Econometrics").
- There are several packages in the Bioconductor
project providing specialized robust
methods. In addition,
r pkg("RobLoxBioC")
provides infinitesimally robust estimators for preprocessing omics data.
- Package
r pkg("coxrobust")
provides robust estimation in the Cox model.
- Package
r pkg("robsurvey")
provides robust survey regression estimation and the robust Horvitz-Thompson estimator.
- r pkg("WRS2") contains robust tests for ANOVA and ANCOVA and other functionality from Rand Wilcox's collection.
- r pkg("walrus") builds on r pkg("WRS2")'s computations, providing a different user interface.
- r pkg("robeth") contains R functions interfacing to the extensive RobETH Fortran library with many functions for regression, multivariate estimation and more.
- The package r pkg("distr") and its several child packages also allow exploring robust estimation concepts, see e.g., r rforge("distr") on R-Forge.
- Notably, based on these, the project r rforge("robast") aims for the implementation of R packages for the computation of optimally robust estimators and tests as well as the necessary infrastructure (mainly S4 classes and methods) and diagnostics; cf. M. Kohl (2005). It includes the R packages r pkg("RandVar"), r pkg("RobAStBase"), r pkg("RobLox"), r pkg("RobLoxBioC"), r pkg("RobRex") and, further, r pkg("ROptEst") and r pkg("ROptRegTS").
- r pkg("RobustAFT") computes Robust Accelerated Failure Time Regression for Gaussian and logWeibull errors.
- r pkg("robumeta") for robust variance meta-regression; r pkg("metaplus") adds robustness via t- or mixtures of normal distributions.
- r pkg("ssmrob") provides robust estimation and inference in sample selection models.