Auto Adaptive Robust Regression Python Package
This Python package implements the Alternating Gradient Descent, Alternating Gradient Descent with Barzilai-Borwein Method, and Alternating Gradient Descent with Backtracking Method. It also includes the Huber Mean Estimation, Huber Covariance Matrix Estimation, Huber Regression, and Adaptive Huber Regression from the R library FarmTest, written by Xiaoou Pan.
This python package can be installed on Windows, Mac and Linux.
Install pyAutoAdaptiveRobustRegression with pip:
pip install pyAutoAdaptiveRobustRegression
For Windows:
There are no extra requirements for Windows: the armadillo and openblas libraries are already bundled.
For Mac:
brew install armadillo
For Linux:
apt install libarmadillo-dev libopenblas-dev
Some common error messages along with their solutions are collected below, and we'll keep updating them based on users' feedback:
Error: 6): Symbol not found: ___addtf3 Referenced from: /usr/local/opt/gcc/lib/gcc/11/libquadmath.0.dylib Expected in: /usr/lib/libSystem.B.dylib in /usr/local/opt/gcc/lib/gcc/11/libquadmath.0.dylib

Solution: After running brew config and brew doctor, we found that the problem was that gcc was not linked. Running sudo chown -R $(whoami) /usr/local/lib/gcc and then brew link gcc solved the problem.
There is one function from the paper "Do we need to estimate the variance in robust mean estimation?":

autoarr_mean: Auto Adaptive Robust Regression Mean Estimation
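The package's autoarr_mean calibrates the robustification parameter automatically; the underlying Huber-type mean idea can be sketched in plain NumPy with a fixed truncation level tau. This is only an illustrative sketch (the name huber_mean_np and the fixed tau are our assumptions), not the package's actual implementation:

```python
import numpy as np

def huber_mean_np(x, tau=1.345, n_iter=100, tol=1e-8):
    """Huber-type mean via iteratively reweighted averaging.

    Observations within tau of the current estimate keep weight 1;
    farther observations are down-weighted by tau / |residual|.
    """
    mu = np.median(x)  # robust starting point
    for _ in range(n_iter):
        r = x - mu
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# 5% gross outliers barely move the Huber-type estimate,
# while the sample mean is dragged toward them.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 100.0)])
print(huber_mean_np(x), np.mean(x))
```

The down-weighting step is what bounds the influence of any single observation; autoarr_mean additionally adapts tau to the data rather than fixing it.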
There are four functions from the paper "A new principle for tuning-free Huber regression":

tfhuber_mean: Tuning-Free Huber Mean Estimation
tfhuber_cov: Tuning-Free Huber Covariance Matrix Estimation
tfhuber_reg: Tuning-Free Huber Regression
cv_tfhuber_lasso: K-fold Cross-Validated Tuning-Free Huber-Lasso Regression
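tfhuber_reg chooses the Huber parameter by the tuning-free principle; the regression step itself, for a fixed tau, can be sketched with iteratively reweighted least squares in NumPy. The helper name huber_reg_np and the fixed tau are illustrative assumptions here, not the package's tuning-free procedure:

```python
import numpy as np

def huber_reg_np(X, y, tau=1.345, n_iter=100, tol=1e-8):
    """Huber regression for a fixed tau via iteratively
    reweighted least squares (IRLS)."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])  # add intercept column
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - Z @ beta
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)  # weighted least squares via sqrt-weights
        beta_new = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta  # beta[0] is the intercept

# Recover coefficients despite 5% gross outliers in the response.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -3.0]) + rng.normal(size=n)
y[:10] += 50.0  # gross outliers
beta = huber_reg_np(X, y)
print(beta)
```

Ordinary least squares would be pulled strongly by the contaminated responses; the Huber weights cap each outlier's influence at tau.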
First, we present an example of mean estimation with the Huber and Alternating Gradient Descent methods. We generate data from a log-normal distribution, which is asymmetric and heavy-tailed.
# Import libraries
import numpy as np
import pyAutoAdaptiveRobustRegression as arr
# Mean estimation
n = 1000
X = np.random.lognormal(0, 1.5, n) - np.exp(1.5**2 / 2)
huber_mean_result = arr.huber_mean(X)
agd_result = arr.agd(X)
agd_bb_result = arr.agd_bb(X)
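As a rough illustration of the Barzilai-Borwein idea behind agd_bb, here is gradient descent on the Huber loss in the location parameter only, with the BB (secant-type) step size. The package's method alternates over both the location and the robustification parameter; this sketch fixes tau, and the helper names are our own:

```python
import numpy as np

def huber_grad(mu, x, tau):
    """Gradient of the average Huber loss at mu (fixed tau)."""
    r = x - mu
    return -np.mean(np.clip(r, -tau, tau))

def bb_descent(x, tau=1.345, n_iter=50):
    """Minimize the Huber loss in mu with the Barzilai-Borwein step."""
    mu_prev = np.median(x)
    mu = mu_prev + 0.1  # second point needed to form the first BB step
    g_prev = huber_grad(mu_prev, x, tau)
    for _ in range(n_iter):
        g = huber_grad(mu, x, tau)
        s, ydiff = mu - mu_prev, g - g_prev
        # BB step: s'y / y'y (scalar case reduces to a secant step)
        step = s * ydiff / (ydiff * ydiff) if ydiff != 0 else 1.0
        mu_prev, g_prev = mu, g
        mu = mu - step * g
    return mu

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 100.0)])
print(bb_descent(x))
```

The BB step approximates the local curvature from successive gradients, which is why it typically converges much faster than a fixed step size on this one-dimensional problem.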
Second, for each setting, we generate an independent sample of size n = 100 and compute four mean estimators: the Sample Mean, the Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein Method. Figure 1 displays the α-quantile of the estimation error, with α ranging from 0.5 to 1 based on 2000 simulations.
The four mean estimators perform almost identically for the normal data. For the heavy-tailed skewed distributions, the deviation of the sample mean from the population mean grows rapidly with the confidence level, in striking contrast to the DA-Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein Method.
Figure 1: Estimation error versus confidence level for the sample mean, the DA-Huber, and the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator based on 2000 simulations
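The comparison behind Figure 1 can be reproduced in spirit with NumPy alone, contrasting the sample mean with a simple fixed-tau Huber-type mean on centered Pareto data (one of the heavy-tailed settings mentioned below). The helper huber_mean_fixed, the choice tau = 5, and the simulation sizes are stand-ins for the package's estimators and the full study:

```python
import numpy as np

def huber_mean_fixed(x, tau=5.0, n_iter=30):
    """Huber-type mean with a fixed tau, via reweighted averaging."""
    mu = np.median(x)
    for _ in range(n_iter):
        w = np.minimum(1.0, tau / np.maximum(np.abs(x - mu), 1e-12))
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(3)
n, B = 100, 400
alpha = 1.5  # Pareto tail index: finite mean, infinite variance
err_mean, err_huber = [], []
for _ in range(B):
    # numpy's pareto draws from a Lomax law with mean 1/(alpha - 1)
    x = rng.pareto(alpha, n) - 1.0 / (alpha - 1.0)  # centered at 0
    err_mean.append(abs(np.mean(x)))
    err_huber.append(abs(huber_mean_fixed(x)))

# High-confidence quantiles of the estimation error, as in Figure 1
print(np.quantile(err_mean, 0.99), np.quantile(err_huber, 0.99))
```

With infinite variance, the sample mean's high quantiles are dominated by single extreme draws, while the Huber-type estimate's error stays bounded, which is the qualitative pattern the figure shows.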
Finally, in Figure 2, we examine the 99%-quantile of the estimation error versus a distribution parameter measuring the tail behavior and the skewness. That is, for normal data we let σ vary between 1 and 4; for skewed generalized t distributions, we increase the shape parameter q from 2.5 to 4; for the lognormal and Pareto distributions, the shape parameters σ and α vary from 0.25 to 2 and 1.5 to 3, respectively.
The DA-Huber, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator show substantially smaller deviations from the population mean as the distribution's tails become heavier and its skewness increases.
Figure 2: Empirical 99%-quantile of the estimation error versus a parameter measuring the tails and skewness for the sample mean, the DA-Huber, and the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator
License: MIT

Authors: Yichi Zhang (yichi.zhang@worc.ox.ac.uk) and Qiang Sun (qiang.sun@utoronto.ca)
Sun, Q. (2021). Do we need to estimate the variance in robust mean estimation?
Bose, K., Fan, J., Ke, Y., Pan, X. and Zhou, W.-X. (2020). FarmTest: An R package for factor-adjusted robust multiple testing. The R Journal 12 372-387.
Fan, J., Ke, Y., Sun, Q. and Zhou, W.-X. (2019). FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control. J. Amer. Statist. Assoc. 114 1880-1893.
Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Statist. Assoc. 115 254-265.
Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2020). A new principle for tuning-free Huber regression. Statistica Sinica, to appear.