xtune: Tuning feature-specific shrinkage parameters of penalized regression models based on external information
=======
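In standard regularized regression (Lasso, Ridge, and Elastic-net), a single penalty parameter λ is applied equally to all regression coefficients to control the amount of regularization in the model.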
Better prediction accuracy may be achieved by allowing a different amount of shrinkage for each feature. Ideally, we want to give a small penalty to important features and a large penalty to unimportant features. We guide the penalized regression model with external data Z that are potentially informative about the importance/effect size of the coefficients.
The objective function of feature-specific shrinkage integrating external information is:
where
when
The idea of external data is that it provides information on the importance/effect size of the regression coefficients. It can be any nominal or quantitative feature-specific information, such as the grouping of predictors, prior knowledge of biological importance, external p-values, functional annotations, etc. Each column of Z is one external variable, with one entry per feature in X.
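As an illustration of what such an external matrix can look like, the sketch below builds a small Z in base R from a grouping of predictors plus one quantitative variable. The group labels, the `neg_log_p` values, and all object names are hypothetical and chosen only for this example.

```r
# Hypothetical grouping of six predictors into three groups (illustrative only)
groups <- factor(c("A", "A", "B", "B", "C", "C"))
names(groups) <- paste0("Predictor_", seq_along(groups))

# One indicator column per group: rows are predictors, columns are external variables
Z_group <- model.matrix(~ groups - 1)
rownames(Z_group) <- names(groups)
colnames(Z_group) <- paste0("Group_", levels(groups))

# A quantitative external variable (here, made-up -log10 external p-values)
# can be appended as an additional column
neg_log_p <- c(3.2, 2.9, 0.4, 0.7, 1.1, 0.2)
Z <- cbind(Z_group, neg_log_p = neg_log_p)

dim(Z)  # number of predictors by number of external variables
```

The rows of Z must line up with the columns of X, so it is worth checking the row names against the predictor names before fitting.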
Penalized regression fitting consists of two phases: (1) learning the
tuning parameter(s) and (2) estimating the regression coefficients given
the tuning parameter(s). Phase (1) is the key to achieving good
performance. Cross-validation is widely used to tune a single penalty
parameter, but it is computationally infeasible for tuning more than
three penalty parameters. We propose an Empirical Bayes approach to
estimate the multiple tuning parameters. The individual penalties are
interpreted as variance terms of the priors (exponential prior for
Elastic-net) in a random effect formulation of penalized regressions. A
majorization-minimization algorithm is employed for implementation. Once
the tuning parameters λ are estimated, and therefore the penalties are
known, phase (2), estimating the regression coefficients, is done using
glmnet.
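To make the effect of feature-specific penalties on the coefficient estimates concrete, here is a minimal base R sketch using a ridge-type analogue that has a closed-form solution. It is not the algorithm used by the package: the hyperparameters `alpha` are fixed by hand rather than estimated by Empirical Bayes, the log-linear link between Z and the penalties is an assumption of this sketch, and the simulated data and object names are illustrative only.

```r
set.seed(1)
n <- 50
p <- 6
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -2, 0, 0, 0, 0)
y <- drop(X %*% beta_true + rnorm(n))

# External information: the first two predictors form group 1, the rest group 2
Z <- cbind(group1 = c(1, 1, 0, 0, 0, 0),
           group2 = c(0, 0, 1, 1, 1, 1))

# Assumed (not estimated) hyperparameters; penalties taken as log-linear in Z
alpha <- c(group1 = log(0.1), group2 = log(100))
lambda <- drop(exp(Z %*% alpha))  # one penalty per predictor

# Ridge-type estimate with feature-specific penalties:
# beta_hat = (X'X + diag(lambda))^{-1} X'y
beta_hat <- drop(solve(crossprod(X) + diag(lambda), crossprod(X, y)))

round(cbind(lambda, beta_hat), 3)
```

Predictors in the lightly penalized group keep estimates close to their true effects, while those in the heavily penalized group are shrunk toward zero.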
Suppose we want to predict a person’s weight loss using his/her weekly dietary intake. Our external information Z could incorporate information about the levels of relevant food constituents in the dietary items.
Primary data X and Y: predicting an individual’s weight loss by his/her weekly dietary items intake
External information Z: the nutrition facts about each dietary item
xtune can be installed from GitHub using the following command:
# install.packages("devtools")
library(devtools)
devtools::install_github("JingxuanH/xtune",
build_vignettes = TRUE)
library(xtune)
- xtune LASSO: Zeng, Chubing, Duncan Campbell Thomas, and Juan Pablo Lewinger. “Incorporating prior knowledge into regularized regression.” Bioinformatics 37.4 (2021): 514-521.
- xtune classification with Elastic-net type of penalty: paper coming soon
- xtune package:
citation("xtune")
#>
#> To cite package 'xtune' in publications use:
#>
#> Jingxuan He and Chubing Zeng (2023). xtune: Regularized Regression
#> with Feature-Specific Penalties Integrating External Information. R
#> package version 2.0.0. https://github.com/JingxuanH/xtune
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {xtune: Regularized Regression with Feature-Specific Penalties Integrating External Information},
#> author = {Jingxuan He and Chubing Zeng},
#> year = {2023},
#> note = {R package version 2.0.0},
#> url = {https://github.com/JingxuanH/xtune},
#> }
Feel free to contact hejingxu@usc.edu if you have any questions.
To show some examples of how to use this package, we simulated an
example dataset that contains 100 observations, 200 predictors, and a
continuous outcome. The external information Z has one row per predictor
and four external variables as columns.
library(xtune)
## load the example data
data(example)
The data looks like:
example$X[1:3,1:5]
#> Predictor_1 Predictor_2 Predictor_3 Predictor_4 Predictor_5
#> Observation_1 -0.7667960 0.9212806 2.0149030 0.79004563 -1.4244699
#> Observation_2 -0.8164583 -0.3144157 -0.2253684 0.08712746 -1.0296026
#> Observation_3 -0.1415352 0.6623149 -1.0398456 1.87611212 0.7340254
example$Z[1:5,]
#> External_variable_1 External_variable_2 External_variable_3
#> Predictor_1 1 0 0
#> Predictor_2 1 0 0
#> Predictor_3 0 1 0
#> Predictor_4 0 1 0
#> Predictor_5 0 0 1
#> External_variable_4
#> Predictor_1 0
#> Predictor_2 0
#> Predictor_3 0
#> Predictor_4 0
#> Predictor_5 0
xtune() is the core function to fit the integrated penalized
regression model. At a minimum, you need to specify the predictor matrix
X and the outcome variable Y. If an external information matrix Z is
provided, the function will incorporate Z to allow differential
shrinkage based on Z. The estimated tuning parameters are returned in
$penalty.vector.
If you do not provide external information Z, the function will
perform empirical Bayes tuning to choose the single penalty parameter in
penalized regression, as an alternative to cross-validation. You could
compare the tuning parameter chosen by empirical Bayes tuning to that
chosen by cross-validation (see also cv.glmnet). The default penalty
applied to the predictors is the Elastic-net penalty.
If you provide an identity matrix as external information Z to
xtune(), the function will estimate a separate tuning parameter
for each regression coefficient.
xtune.fit <- xtune(example$X,example$Y,example$Z, family = "linear")
#> Z provided, start estimating individual tuning parameters
#> Start estimating alpha:
#> Done!
To view the penalty parameters estimated by xtune():
xtune.fit$penalty.vector[1:5]
#> [1] 0.005084153 0.005084153 0.014341728 0.014341728 0.049277453
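In the output above, predictors with identical rows of Z appear to share the same penalty. The base R sketch below makes that pattern explicit for the first five predictors by tabulating the printed penalties against the printed rows of Z. The numbers are copied from the outputs shown earlier; the object names `penalty`, `Z_head`, `group`, and `penalty_by_group` are introduced here for illustration.

```r
# Values copied from the printed output of xtune.fit$penalty.vector[1:5]
penalty <- c(0.005084153, 0.005084153, 0.014341728, 0.014341728, 0.049277453)

# First five rows of example$Z, copied from the printed output
Z_head <- rbind(
  Predictor_1 = c(1, 0, 0, 0),
  Predictor_2 = c(1, 0, 0, 0),
  Predictor_3 = c(0, 1, 0, 0),
  Predictor_4 = c(0, 1, 0, 0),
  Predictor_5 = c(0, 0, 1, 0)
)
colnames(Z_head) <- paste0("External_variable_", 1:4)

# External variable (group) that each predictor belongs to
group <- colnames(Z_head)[max.col(Z_head, ties.method = "first")]

# One distinct penalty per group among these five predictors
penalty_by_group <- tapply(penalty, group, unique)
penalty_by_group
```

Among these five predictors, the group flagged by External_variable_1 receives the smallest penalty and the one flagged by External_variable_3 the largest.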
The coef_xtune() and predict_xtune() functions can be used to extract
the beta coefficient estimates and to predict the response on new data.
coef_xtune(xtune.fit)[1:5]
#> [1] 0.07964816 2.08402043 -1.95702251 0.86824853 -1.31184429
predict_xtune(xtune.fit, example$X)[1:5]
#> Observation_1 Observation_2 Observation_3 Observation_4 Observation_5
#> -2.573373 -2.921847 -5.592820 2.197047 1.685603
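Predictions on the training data give an optimistic picture of accuracy. One way to get a fairer estimate is to hold out part of the example data, as sketched below. The `mse()` helper, the 80/20 split, and the seed are assumptions made for this illustration; only `xtune()` and `predict_xtune()` with the arguments already shown above are used, and the package code runs only if xtune is installed.

```r
# Hypothetical helper: mean squared error between observed and predicted values
mse <- function(observed, predicted) mean((observed - predicted)^2)

if (requireNamespace("xtune", quietly = TRUE)) {
  library(xtune)
  data(example)

  # Hold out 20% of the observations for evaluation (arbitrary split)
  set.seed(123)
  n <- nrow(example$X)
  train <- sample(n, size = round(0.8 * n))

  # Fit on the training rows; Z describes predictors, so it is not subset
  fit <- xtune(example$X[train, ], example$Y[train], example$Z,
               family = "linear")

  # Predict the held-out observations and report their error
  pred <- predict_xtune(fit, example$X[-train, ])
  print(mse(example$Y[-train], pred))
}
```

Because the split is random, the reported error changes with the seed; repeating the split several times gives a more stable estimate.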
More details and examples are also described in the vignettes to further illustrate the usage and syntax of this package.