The code was originally on an R-forge repository.
This package was motivated by correlation issues in real datasets, in particular industrial datasets.
The main idea is to model the correlations between covariates explicitly through a structure of sub-regressions, that is, a system of linear regressions between the covariates. Because each sub-regression can involve several covariates, the structure captures complex links and not only correlations between two variables. It identifies redundant covariates that can be removed in a pre-selection step, which improves the conditioning of the design matrix without significant loss of information. This pre-selection is easy to justify because it follows from the structure of sub-regressions, which is itself easy to interpret. The package provides an algorithm that finds the sub-regression structure inherent to the dataset; it is based on a full generative model and uses a Markov chain Monte Carlo (MCMC) method. This pre-treatment does not depend on a response variable, so it can be applied more generally to any correlated dataset.
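The idea of a sub-regression between covariates can be illustrated with a small simulation. The sketch below uses base R only and does not call any CorReg function; the variable names, coefficients, and noise level are assumptions chosen for illustration, and the sub-regression is given by hand instead of being found by the MCMC search.

```r
# Illustrative sketch in base R; no CorReg functions are used here.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Assumed sub-regression: x3 is a linear function of x1 and x2 plus small noise.
x3 <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.05)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# A sub-regression is an ordinary linear regression between covariates.
sub_fit <- lm(x3 ~ x1 + x2)
sub_r2  <- summary(sub_fit)$r.squared

# Condition number of the centered and scaled covariates,
# with and without the redundant covariate x3.
kappa_full    <- kappa(scale(X), exact = TRUE)
kappa_reduced <- kappa(scale(X[, c("x1", "x2")]), exact = TRUE)

print(coef(sub_fit))
print(c(r2 = sub_r2, kappa_full = kappa_full, kappa_reduced = kappa_reduced))
```

Dropping x3 loses little information here, since x1 and x2 explain it almost entirely, while the condition number of the remaining covariates is much smaller.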
In a second step, a plug-in estimator reintroduces the redundant covariates sequentially. All the covariates are then used, but the sequential approach protects the estimation against correlations.
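The order of the steps in such a sequential estimation can be sketched in base R. This is not the package's estimator: it is a plain two-stage residual regression on simulated data, with a hand-written sub-regression and hypothetical variable names, meant only to show how a redundant covariate can be brought back after a first fit without it.

```r
# Illustrative two-stage sketch in base R; not the CorReg implementation.
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2 + rnorm(n, sd = 0.1)        # assumed sub-regression
y  <- 1 + x1 + 0.5 * x2 + 3 * x3 + rnorm(n)   # assumed response model

# Step 1: marginal model on the non-redundant covariates only.
marginal <- lm(y ~ x1 + x2)

# Step 2: the part of x3 that the sub-regression does not explain.
sub_fit <- lm(x3 ~ x1 + x2)
eps3    <- residuals(sub_fit)

# Step 3: estimate the coefficient of x3 from the marginal residuals.
b3 <- unname(coef(lm(residuals(marginal) ~ 0 + eps3)))

# Step 4: plug the result back to get coefficients for all covariates.
plug_in <- c(coef(marginal) - b3 * coef(sub_fit), x3 = b3)

print(plug_in)
```

In this toy case the sub-regression involves every remaining covariate and is fitted by least squares, so by the Frisch-Waugh-Lovell theorem the result coincides with ordinary least squares on all three covariates. With a sparse sub-regression structure, as described above, the two would generally differ.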
This package also contains some functions to make statistics easier.
The package can be installed from GitHub with devtools:
library(devtools)
install_github("CorReg/CorReg", build_vignettes = TRUE)
Once the package is installed, a vignette showing an example is available using the R command:
RShowDoc("CorReg", package = "CorReg")
CorReg is developed by Clément Théry with contributions from Christophe Biernacki, Gaétan Loridant, Florian Watrin and the A106 team: Quentin Grimonprez, Vincent Kubicki, Samuel Blanck, Jérémie Kellner.
Copyright ArcelorMittal
Théry, C. (2015). Model-based covariable decorrelation in linear regression (CorReg): application to missing data and to steel industry. http://www.theses.fr/2015LIL10060