pyAutoARR

Auto Adaptive Robust Regression Python Package

Description

This Python package implements Alternating Gradient Descent, Alternating Gradient Descent with the Barzilai-Borwein method, and Alternating Gradient Descent with backtracking. It also includes Huber mean estimation, Huber covariance matrix estimation, Huber regression, and adaptive Huber regression from the R library FarmTest, written by Xiaoou Pan.

Installation

This Python package can be installed on Windows, Mac, and Linux.

Install pyAutoAdaptiveRobustRegression with pip:

pip install pyAutoAdaptiveRobustRegression

Operating System Requirements

For Windows:

There are no additional requirements for Windows: the armadillo and openblas libraries are already included.

For Mac:

brew install armadillo

For Linux:

apt install libarmadillo-dev libopenblas-dev

Common Error Messages

Some common error messages and their solutions are collected below; we will keep updating them based on user feedback:

  1. Error: 6): Symbol not found: ___addtf3 Referenced from: /usr/local/opt/gcc/lib/gcc/11/libquadmath.0.dylib Expected in: /usr/lib/libSystem.B.dylib in /usr/local/opt/gcc/lib/gcc/11/libquadmath.0.dylib

    Solution: Running brew config and brew doctor revealed that the problem was that gcc was not linked. Running sudo chown -R $(whoami) /usr/local/lib/gcc and then brew link gcc solved the problem.

Functions

One function comes from "Do we need to estimate the variance in robust mean estimation?":

  • autoarr_mean: Auto Adaptive Robust Regression Mean Estimation

Four functions come from "A new principle for tuning-free Huber regression":

  • tfhuber_mean: Tuning-Free Huber Mean Estimation
  • tfhuber_cov: Tuning-Free Huber Covariance Matrix Estimation
  • tfhuber_reg: Tuning-Free Huber Regression
  • cv_tfhuber_lasso: K-fold Cross-Validated Tuning-Free Huber-Lasso Regression
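The Huber-type estimators above share one idea: replace the squared loss with the Huber loss so that extreme observations are down-weighted. The sketch below is a minimal, illustrative NumPy version of a Huber mean computed by iterative reweighting; it is not the package's implementation, and its default choice of the robustification parameter tau (std × sqrt(n / log n)) is only a simple heuristic in the spirit of the adaptive approach, not the package's tuning rule.

```python
import numpy as np

def huber_mean_sketch(x, tau=None, max_iter=100, tol=1e-8):
    """Huber mean via iteratively reweighted averaging (illustrative only)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if tau is None:
        # Simple heuristic for the robustification parameter (an assumption,
        # not the package's adaptive tuning procedure).
        tau = np.std(x) * np.sqrt(n / np.log(n))
    mu = np.median(x)  # robust starting point
    for _ in range(max_iter):
        r = np.abs(x - mu)
        # Huber weights: 1 inside [-tau, tau], tau/|r| outside
        w = np.where(r <= tau, 1.0, tau / np.maximum(r, tau))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

For well-behaved symmetric data every weight is 1 and the result coincides with the sample mean; for heavy-tailed data the reweighting pulls the estimate away from outliers.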

Examples

First, we present an example of mean estimation using the Huber and Alternating Gradient Descent methods. We generate data from a log-normal distribution, which is asymmetric and heavy-tailed.

# Import libraries
import numpy as np
import pyAutoAdaptiveRobustRegression as arr

# Generate log-normal data, centered by subtracting the population mean
# exp(1.5**2 / 2) of lognormal(0, 1.5)
n = 1000
X = np.random.lognormal(0, 1.5, n) - np.exp(1.5**2 / 2)

# Mean estimation with three robust estimators
huber_mean_result = arr.huber_mean(X)
agd_result = arr.agd(X)
agd_bb_result = arr.agd_bb(X)

Second, for each setting, we generate an independent sample of size n = 100 and compute four mean estimators: the sample mean, the Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator. Figure 1 displays the α-quantile of the estimation error, with α ranging from 0.5 to 1, based on 2000 simulations.

The four mean estimators perform almost identically for the normal data. For the heavy-tailed skewed distributions, the deviation of the sample mean from the population mean grows rapidly with the confidence level, in striking contrast to the DA-Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein Method.
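This comparison can be reproduced in miniature. The sketch below is illustrative only: it uses a plain NumPy Huber mean as a stand-in for the package's estimators (not their actual implementations), fewer repetitions than the paper's 2000, and a simple heuristic choice of tau.

```python
import numpy as np

def huber_mean(x, tau):
    # One-dimensional Huber M-estimator via iterative reweighting (illustrative)
    mu = np.median(x)
    for _ in range(50):
        r = np.abs(x - mu)
        w = np.where(r <= tau, 1.0, tau / np.maximum(r, 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < 1e-9:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(0)
n, B = 100, 500                      # sample size and repetitions (paper uses 2000)
true_mean = np.exp(1.5**2 / 2)       # population mean of lognormal(0, 1.5)

err_mean, err_huber = [], []
for _ in range(B):
    x = rng.lognormal(0.0, 1.5, n)
    tau = np.std(x) * np.sqrt(n / np.log(n))  # heuristic tau (an assumption)
    err_mean.append(abs(x.mean() - true_mean))
    err_huber.append(abs(huber_mean(x, tau) - true_mean))

# Empirical quantiles of the estimation error, as in Figure 1
print(np.quantile(err_mean, 0.99), np.quantile(err_huber, 0.99))
```

Sweeping the quantile level from 0.5 to 1 and plotting both curves reproduces the shape of Figure 1 for the lognormal setting.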


Figure 1: Estimation error versus confidence level for the sample mean, the DA-Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator, based on 2000 simulations.

Finally, in Figure 2, we examine the 99%-quantile of the estimation error versus a distribution parameter measuring the tail behavior and the skewness. That is, for normal data we let σ vary between 1 and 4; for skewed generalized t distributions, we increase the shape parameter q from 2.5 to 4; for the lognormal and Pareto distributions, the shape parameters σ and α vary from 0.25 to 2 and 1.5 to 3, respectively.

The DA-Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator show substantially smaller deviations from the population mean as the distribution develops heavier tails and becomes more skewed.
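The parameter sweep behind Figure 2 follows the same pattern. The scaffold below (illustrative, not the original experiment code) varies the lognormal shape parameter σ and records the empirical 99%-quantile of |estimate − population mean| for the sample mean; any robust estimator with the same call signature can be plugged in alongside it for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)

def error_quantile(estimator, sigma, n=100, B=300, q=0.99):
    # q-quantile of |estimate - population mean| for lognormal(0, sigma) data
    true_mean = np.exp(sigma**2 / 2)
    errs = [abs(estimator(rng.lognormal(0.0, sigma, n)) - true_mean)
            for _ in range(B)]
    return np.quantile(errs, q)

# Sweep the lognormal shape parameter sigma from 0.25 to 2, as in Figure 2.
# np.mean is the sample mean; a robust estimator (e.g. a Huber mean) can be
# passed in its place to trace the second curve.
for sigma in np.linspace(0.25, 2.0, 8):
    print(f"sigma={sigma:.2f}  sample-mean 99% error quantile: "
          f"{error_quantile(np.mean, sigma):.3f}")
```

The error quantile of the sample mean grows rapidly with σ, which is the effect Figure 2 visualizes.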


Figure 2: Empirical 99%-quantile of the estimation error versus a parameter measuring the tails and skewness for the sample mean, the DA-Huber estimator, the Alternating Gradient Descent estimator, and the Alternating Gradient Descent with Barzilai-Borwein estimator.

License

MIT

Author(s)

Yichi Zhang (yichi.zhang@worc.ox.ac.uk), Qiang Sun (qiang.sun@utoronto.ca)

References

Sun, Q. (2021). Do we need to estimate the variance in robust mean estimation?

Bose, K., Fan, J., Ke, Y., Pan, X. and Zhou, W.-X. (2020). FarmTest: An R package for factor-adjusted robust multiple testing. R Journal 12, 372-387.

Fan, J., Ke, Y., Sun, Q. and Zhou, W.-X. (2019). FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control. J. Amer. Statist. Assoc. 114, 1880-1893.

Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Statist. Assoc. 115, 254-265.

Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2020). A new principle for tuning-free Huber regression. Statistica Sinica, to appear.
