# asgl <img src="figures/logo.png" align="right" height="150" alt="asgl logo" />

[![Downloads](https://pepy.tech/badge/asgl)](https://pepy.tech/project/asgl)
[![Downloads](https://pepy.tech/badge/asgl/month)](https://pepy.tech/project/asgl)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Package Version](https://img.shields.io/badge/version-2.1.4-blue.svg)](https://pypi.org/project/asgl)

## Introduction

The `asgl` package is a versatile tool for fitting a variety of regression models, including linear regression, quantile regression, and penalized regression models such as Lasso, Group Lasso, Sparse Group Lasso, and their adaptive variants. The package is especially useful for simultaneous variable selection and prediction in both low- and high-dimensional settings.

The primary class available to users is the `Regressor` class, which is detailed later in this document.

`asgl` is based on the methodology described in the following papers:

* [Adaptive Sparse Group Lasso in Quantile Regression](https://link.springer.com/article/10.1007/s11634-020-00413-8)
* [`asgl`: A Python Package for Penalized Linear and Quantile Regression](https://arxiv.org/abs/2111.00472)

For a practical introduction to the package, users can refer to the user guide notebook available in the GitHub repository. Additional accessible explanations can be found on [Towards Data Science: Sparse Group Lasso](https://towardsdatascience.com/sparse-group-lasso-in-python-255e379ab892) and [Towards Data Science: Adaptive Lasso](https://towardsdatascience.com/an-adaptive-lasso-63afca54b80d).

## Dependencies

asgl requires: 

* Python >= 3.9
* cvxpy >= 1.2.0
* numpy >= 1.20.0
* scikit-learn >= 1.0
* pytest >= 7.1.2

## User installation

The easiest way to install asgl is with `pip`, by running `pip install asgl`.

## Testing

After installation, you can launch the test suite from the source directory (you will need to have `pytest >= 7.1.2` installed) by running `pytest`.

## Key features

The `Regressor` class accepts the following parameters:

* model: str, default='lm'
  * Type of model to fit. Options are 'lm' (linear regression) and 'qr' (quantile regression).
* penalization: str or None, default='lasso'
  * Type of penalization to use. Options are 'lasso', 'gl' (group lasso), 'sgl' (sparse group lasso), 'alasso' (adaptive lasso), 'agl' (adaptive group lasso), 'asgl' (adaptive sparse group lasso), or None.
* quantile: float, default=0.5
  * Quantile level for quantile regression models. Valid values are between 0 and 1.
* fit_intercept: bool, default=True
  * Whether to fit an intercept in the model.
* lambda1: float, default=0.1
  * Constant that multiplies the penalization, controlling its strength. Must be a non-negative float, i.e. in `[0, inf)`. Larger values result in stronger penalization.
* alpha: float, default=0.5
  * Constant that controls the tradeoff between individual and group penalizations in the sgl and asgl penalizations. ``alpha=1`` enforces a lasso penalization while ``alpha=0`` enforces a group lasso penalization.
* solver: str, default='default'
  * Solver to be used by `cvxpy`. The default option selects a suitable solver depending on the problem. Users can check the available solvers via the command `cvxpy.installed_solvers()`.
* weight_technique: str, default='pca_pct'
  * Technique used to fit adaptive weights. Options include 'pca_1', 'pca_pct', 'pls_1', 'pls_pct', 'lasso', 'unpenalized', and 'sparse_pca'. For low dimensional problems (where the number of variables is smaller than the number of observations) the usage of the 'unpenalized' weight_technique alternative is encouraged. For high dimensional problems (where the number of variables is larger than the number of observations) the default alternative is encouraged.
* individual_power_weight: float, default=1
  * Power to which individual weights are raised. This parameter only has effect in adaptive penalizations. ('alasso' and 'asgl').
* group_power_weight: float, default=1
  * Power to which group weights are raised. This parameter only has effect in adaptive penalizations with a grouped structure ('agl' and 'asgl').
* variability_pct: float, default=0.9
  * Percentage of variability explained by PCA, PLS, and sparse PCA components. This parameter only has effect in adaptiv penalizations where `weight_technique` is equal to 'pca_pct', 'pls_pct' or 'sparse_pca'.
* lambda1_weights: float, default=0.1
  * The value of the parameter ``lambda1`` used to solve the lasso model if ``weight_technique='lasso'``
* spca_alpha: float, default=1e-5
  * Sparse PCA parameter. This parameter only has effect if `weight_technique='sparse_pca'`See scikit-learn implementation for more details.
* spca_ridge_alpha: float, default=1e-2
  * Sparse PCA parameter. This parameter only has effect if `weight_technique='sparse_pca'`See scikit-learn implementation for more details.
* individual_weights: array or None, default=None
  * Custom individual weights for adaptive penalizations. If this parameter is informed,
        it overrides the weight estimation process defined by parameter ``weight_technique`` and allows the user to
        provide custom weights. It must be either `None` or be an array with  non-negative float values and length equal to the number of variables.
* group_weights: array or None, default=None
  * Custom group weights for adaptive penalizations.  If this parameter is informed,
        it overrides the weight estimation process defined by parameter ``weight_technique`` and allows the user to
        provide custom weights. It must be either `None` or be an array with  non-negative float values and length equal to the number of groups (as defined by `group_index`)
* tol: float, default=1e-3
  * Tolerance for coefficients to be considered zero.
* weight_tol: float, default=1e-4
  * Tolerance value used to avoid ZeroDivision errors when computing the weights.

## Examples

### Example 1: Linear Regression with Lasso Penalization

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor

# Generate synthetic regression data
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)

# Create a Regressor object configured for linear regression with Lasso penalization
model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X_train, y_train)

# Make predictions on the test data
predictions = model.predict(X_test)

# Evaluate the model's performance using mean squared error (y_true first, then y_pred)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")
```

```
Mean Squared Error: 26.946356773986654
```


### Example 2: Quantile Regression with Adaptive Sparse Group Lasso Penalization

Group-based penalizations like Group Lasso, Sparse Group Lasso, and their adaptive variants, assume that there is a group structure within the regressors. This structure can be useful in various applications, such as when using dummy variables where all the dummies of the same variable belong to the same group, or in genetic data analysis where genes are grouped into genetic pathways.

For scenarios where the regressors have a known grouped structure, this information can be passed to the `fit` method of the `Regressor` class through the `group_index` parameter. This parameter is an array where each element indicates the group to which the associated variable belongs. The following example demonstrates this with a synthetic `group_index` and tunes the model with scikit-learn's `RandomizedSearchCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from asgl import Regressor

# Generate synthetic regression data
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)

# Define the group structure: each of the 50 variables is assigned to one of the groups 1 to 4
group_index = np.random.randint(1, 5, size=50)

# Create a Regressor object configured for quantile regression with Adaptive Sparse Group Lasso penalization
model = Regressor(model='qr', penalization='asgl', quantile=0.5)

# Define the parameter grid for RandomizedSearchCV
param_grid = {'lambda1': [1e-4, 1e-3, 1e-2, 1e-1, 1], 'alpha': [0, 0.2, 0.4, 0.6, 0.8, 1]}
rscv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error')
# Extra keyword arguments of fit() are forwarded to Regressor.fit()
rscv.fit(X_train, y_train, group_index=group_index)

print(rscv.best_params_)
print(rscv.best_score_)
```

```
{'lambda1': 1, 'alpha': 0.4}
-238.3365880217471
```

### Example 3: Customizing weights

The `asgl` package offers several built-in methods for estimating adaptive weights, controlled via the `weight_technique` parameter. For details on how each of these alternatives works, refer to the [associated research paper](https://link.springer.com/article/10.1007/s11634-020-00413-8) or to the next section for an overview. However, for users requiring extensive customization, the package allows custom weights to be specified directly through the `individual_weights` and `group_weights` parameters. This allows users to implement their own weight computation techniques and use them within the `asgl` framework.

When using custom weights, ensure that the length of `individual_weights` matches the number of variables, and the length of `group_weights` matches the number of groups. Below is an example demonstrating how to fit a model with custom individual and group weights:

```python
# Generate custom weights (random values, for illustration only)
custom_individual_weights = np.random.rand(X_train.shape[1])
custom_group_weights = np.random.rand(len(np.unique(group_index)))

# Create a Regressor object with custom weights
model = Regressor(model='lm', penalization='asgl', individual_weights=custom_individual_weights, group_weights=custom_group_weights)

# Fit the model
model.fit(X_train, y_train, group_index=group_index)
```

## Mathematical formulations (with code!)
___

For an in-depth analysis of the mathematical formulations, we encourage reading our original [paper](https://link.springer.com/article/10.1007/s11634-020-00413-8). Here, the basics of the formulations are covered, including code examples.

### Model formulations
___
Given a response vector $y$ and a predictor matrix $X$, the package implements two possible risk functions:

#### Linear models
In the usual linear model, the risk function is defined as,

$$ R(\beta)=\|y-X\beta\|_2^2$$

This model can be fit by simply defining:
 * `model='lm'`: lm refers to linear model.
 * `penalization=None`: this fits an unpenalized model.
 * `fit_intercept=True`: the default value is `True`. If the intercept of the model is not required, it can be set to `False`.

```python
lm_model = Regressor(model='lm', penalization=None)
lm_model.fit(X=X, y=y)

coef = lm_model.coef_
intercept = lm_model.intercept_
print(np.round(coef, 1))
```

```
[ 0.3 -0.1  0.2 61.7 -0.  56.5 74.3 91.3  0.2  0.   0.1  0.2  0.1 -0.1
  0.1 10.9 69.   0.1 22.7  0.   7.2 94.7 91.5 -0.4 75.4 -0.3 38.1  5.3
 -0.   0.2 -0.2  0.  70.4  0.5 59.2  7.   0.  -0.1 12.2 69.4  1.3 86.8
 21.1 -0.  96.2 -0.2 78.3 -0.   0.  -0.1]
```


`coef_` contains the $\beta$ coefficients of the model, one for each column of $X$. When `fit_intercept=True` (the default), the intercept is not part of this array; it is stored separately in `intercept_`.
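
To make the risk concrete without depending on the package, the sketch below evaluates $R(\beta)=\|y-X\beta\|_2^2$ with NumPy and obtains its unpenalized minimizer by ordinary least squares. It is an illustration only: the synthetic data and the helper `linear_risk` are hypothetical, the intercept is handled by appending a column of ones, and `asgl` itself solves the problem through `cvxpy` rather than this way.

```python
import numpy as np


def linear_risk(beta, X, y):
    """Linear-model risk R(beta) = ||y - X beta||_2^2."""
    residuals = y - X @ beta
    return float(residuals @ residuals)


# Small synthetic problem (names suffixed with _demo to avoid clashing with the examples)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
beta_true = np.array([1.5, 0.0, -2.0])
y_demo = 4.0 + X_demo @ beta_true + rng.normal(scale=0.1, size=100)

# Append a column of ones so that the last coefficient acts as the intercept
X_aug = np.column_stack([X_demo, np.ones(len(X_demo))])
beta_ols, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)
coef_demo, intercept_demo = beta_ols[:-1], beta_ols[-1]
print("coefficients:", np.round(coef_demo, 2), "intercept:", round(float(intercept_demo), 2))

# Moving away from the least squares solution can only increase the risk
print(linear_risk(beta_ols, X_aug, y_demo) < linear_risk(beta_ols + 0.01, X_aug, y_demo))
```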

#### Quantile regression models
Quantile regression models are not as well known as linear regression models, but they are a very powerful alternative. These models estimate the conditional quantiles of the response variable, and are especially well suited for heteroscedastic datasets. The risk function in these models is defined as,

$$R(\beta)=\frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(y_{i}-x_{i}^{t} \beta)$$

where $\rho_{\tau}(u)=u(\tau-I(u<0))$ is the check (or pinball) loss and $I(\cdot)$ is the indicator function.
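
A minimal NumPy sketch of the check loss $\rho_\tau$ and the quantile risk follows. The helpers `check_loss` and `quantile_risk` are hypothetical names, not part of `asgl`. The second part fits an intercept-only model by grid search to show the defining property of this risk: its minimizer is the $\tau$-quantile of $y$.

```python
import numpy as np


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0)), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def quantile_risk(beta, X, y, tau):
    """Quantile regression risk R(beta) = (1/n) * sum_i rho_tau(y_i - x_i^t beta)."""
    return float(np.mean(check_loss(y - X @ beta, tau)))


# Positive residuals are weighted by tau and negative residuals by (1 - tau)
print(check_loss([-2.0, 1.0], tau=0.9))

# Intercept-only model: the constant that minimizes the risk is the tau-quantile of y
rng = np.random.default_rng(1)
y_demo = rng.exponential(size=2001)
X_const = np.ones((len(y_demo), 1))
grid = np.linspace(0, 5, 1001)
risks = [quantile_risk(np.array([c]), X_const, y_demo, tau=0.9) for c in grid]
best_c = grid[int(np.argmin(risks))]
print(best_c, np.quantile(y_demo, 0.9))
```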

This model can be fit by defining:
* `model='qr'`: qr refers to quantile regression.
* `penalization=None`: this fits an unpenalized model.
* `quantile=0.5`: this is $\tau$, a value between 0 and 1 that controls the quantile to be fit. A value of 0.5 fits a model for the median of $y$.
* `fit_intercept=True`: it has the same meaning as in the linear model.

```python
qr_model = Regressor(model='qr', penalization=None, quantile=0.5)
qr_model.fit(X=X, y=y)

coef = qr_model.coef_
print(np.round(coef, 1))
```

```
[ 0.3  0.1  0.2 61.7  0.  56.5 74.3 91.2  0.1  0.1 -0.1 -0.1  0.1  0.
 -0.1 10.8 69.   0.2 22.9  0.2  7.1 94.6 91.6 -0.4 75.2 -0.6 38.3  5.2
  0.1  0.3 -0.2 -0.  70.5  0.7 59.2  7.1 -0.1 -0.  12.3 69.6  1.2 86.9
 21.   0.1 96.2  0.1 78.3  0.1 -0.1 -0.1]
```


For the remainder of the document, we will stick to linear models, but you can switch to quantile regression models at any time by simply setting `model='qr'`.

### Penalization formulations
___

#### Lasso penalization

Initially defined in 1996 by Tibshirani [(original paper)](http://statweb.stanford.edu/~tibs/lasso/lasso.pdf), the lasso penalization promotes sparsity in the individual coefficients. It is defined as an L1 norm on the coefficients, where the parameter $\lambda$ controls the level of sparsity to be applied.

$$ \min R(\beta) + \lambda\sum_{i=1}^p|\beta_i|$$

It can be fit as,

* `penalization='lasso'`
* `lambda1=0.1`. This parameter is the $\lambda$ defined in the problem formulation. It controls the sparsity of the solution: large $\lambda$ values are associated with sparser solutions, since the coefficients are more heavily penalized.

```python
lasso_model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
lasso_model.fit(X=X, y=y)
coef = lasso_model.coef_
print(np.round(coef, 1))
```

```
[ 0.2 -0.   0.1 61.7  0.  56.4 74.3 91.3  0.1  0.   0.   0.2  0.  -0.
  0.  10.8 69.   0.  22.7  0.   7.1 94.6 91.5 -0.3 75.3 -0.2 38.   5.2
 -0.   0.2 -0.2  0.  70.4  0.5 59.1  6.9  0.  -0.  12.1 69.4  1.2 86.7
 21.1 -0.  96.2 -0.2 78.3  0.   0.  -0. ]
```
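
The effect of $\lambda$ can be reproduced outside the package. The following sketch minimizes the lasso objective exactly as written above (no intercept, no $1/n$ scaling) with cyclic coordinate descent and soft-thresholding. This is a swapped-in algorithm chosen for readability, not the `cvxpy`-based solver that `asgl` uses, and the package's internal scaling of the risk may differ, so the `lam` values here are not directly comparable with `lambda1`. The helper names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000):
    """Minimize ||y - X b||_2^2 + lam * sum_i |b_i| by cyclic coordinate descent (no intercept)."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq_norms = (X ** 2).sum(axis=0)
    residual = np.array(y, dtype=float)  # always equals y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]  # partial residual without variable j
            # Zero-subgradient condition: soft-threshold X_j^T r at lam / 2
            beta[j] = soft_threshold(X[:, j] @ residual, lam / 2) / col_sq_norms[j]
            residual -= X[:, j] * beta[j]
    return beta


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Smallest lam for which the lasso solution is identically zero
lam_max = 2 * np.max(np.abs(X_demo.T @ y_demo))
for lam in [0.0, 50.0, lam_max]:
    print(f"lam={lam:.1f}:", np.round(lasso_cd(X_demo, y_demo, lam), 2))
```

At `lam=0` the solution coincides with least squares, and at `lam_max` every coefficient is zero, which is the behavior described above for large $\lambda$.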


#### Group lasso penalization

Proposed in 2006 by Yuan and Li [(original paper)](http://www.columbia.edu/~my2550/papers/glasso.final.pdf), the group lasso penalization assumes that the predictors in matrix $X$ have a natural grouped structure. The penalization is defined as,

$$ \min R(\beta) +\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\left\|\beta^{(l)}\right\|_{2}$$

where $K$ is the number of groups, $\beta^{(l)}$ is the vector of coefficients of the $l$-th group and $p_{l}$ is the size of that group. This penalization can be fit by simply defining:

* `penalization='gl'` where gl refers to group lasso
* `lambda1=0.1`, where $\lambda$ is the parameter defined in the lasso penalization
* `group_index=np.random.randint(1, 5, size=50)`. This should be an array of the same length as the number of variables in matrix $X$. Each element of this array indicates the group to which the associated variable belongs.

```python
group_index = np.random.randint(1, 5, size=50)
group_lasso_model = Regressor(model='lm', penalization='gl', lambda1=0.1)
group_lasso_model.fit(X=X, y=y, group_index=group_index)
coef = group_lasso_model.coef_
print(np.round(coef, 1))
```

```
[ 0.1 10.9 22.7 94.5 -0.4 75.3  0.  12.1  0.  -0.1  0.2 56.4  0.2  0.1
 -0.1 68.9 -0.1 -0.2  0.3 -0.1 91.2  0.1 -0.3  5.3 -0.  59.1 69.3  1.3
 86.7 -0.  -0.  61.7 -0.  74.2 -0.   0.1  0.1  0.   7.2 91.4 38.   0.2
 -0.2  0.  70.4  0.5  7.  21.1 96.1 78.3]
```
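
The group penalty can be written in a few lines of NumPy. The sketch below, with hypothetical helper names, computes $\lambda\sum_l\sqrt{p_l}\|\beta^{(l)}\|_2$ from a `group_index`-style array and shows the group soft-thresholding operator, the proximal step that proximal-gradient solvers use to set whole groups to zero at once. `asgl` solves the problem with `cvxpy`, so this only illustrates why the penalty produces group-level sparsity.

```python
import numpy as np


def group_lasso_penalty(beta, group_index, lam):
    """lam * sum_l sqrt(p_l) * ||beta^(l)||_2, with groups given by the labels in group_index."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    total = 0.0
    for g in np.unique(group_index):
        members = group_index == g
        total += np.sqrt(members.sum()) * np.linalg.norm(beta[members])
    return lam * total


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks a whole group towards zero and
    sets it exactly to zero when its norm does not exceed t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1 - t / norm) * v


beta_toy = np.array([3.0, 4.0, 0.0, 1.0])
groups_toy = np.array([1, 1, 2, 2])
print(group_lasso_penalty(beta_toy, groups_toy, lam=1.0))  # sqrt(2) * (5 + 1), about 8.485
print(group_soft_threshold(np.array([3.0, 4.0]), t=2.5))   # the group is shrunk to [1.5, 2.0]
print(group_soft_threshold(np.array([3.0, 4.0]), t=6.0))   # the whole group is set to zero
```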


#### Sparse group lasso
Defined in 2013 [(original paper)](https://arxiv.org/abs/1001.0736), the sparse group lasso is a linear combination of the lasso and group lasso penalizations that provides solutions that are sparse both between and within groups. The penalization is defined as,

$$ \min R(\beta) + \alpha\lambda\sum_{i=1}^p|\beta_i| +(1-\alpha)\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\left\|\beta^{(l)}\right\|_{2}$$

where $\alpha$ is a parameter in $[0,1]$ that balances the penalization applied between lasso and group lasso. Values of $\alpha$ close to 1 produce lasso-like solutions, while values close to 0 produce group lasso-like solutions. This penalization can be fit by defining:

* `penalization='sgl'` where sgl refers to sparse group lasso
* `lambda1=0.1`, where $\lambda$ is the parameter defined in the lasso penalization
* `alpha=0.5`, where $\alpha$ is the parameter described above
* `group_index=np.random.randint(1, 5, size=50)`, as described in the group lasso.

```python
sgl_model = Regressor(model='lm', penalization='sgl', lambda1=0.1, alpha=0.5)
sgl_model.fit(X=X, y=y, group_index=group_index)
coef = sgl_model.coef_
print(np.round(coef, 1))
```

```
[ 0.  10.8 22.7 94.6 -0.3 75.3  0.  12.1  0.  -0.   0.2 56.4  0.2  0.
 -0.  68.9 -0.1 -0.2  0.2 -0.  91.3  0.1 -0.2  5.3 -0.  59.1 69.4  1.2
 86.7  0.   0.  61.7  0.  74.3  0.   0.   0.1  0.   7.1 91.5 38.   0.2
 -0.2  0.  70.4  0.5  7.  21.1 96.1 78.3]
```
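
A sketch of the sparse group lasso penalty follows, again with hypothetical helper names. It checks that $\alpha=1$ and $\alpha=0$ reduce it to the lasso and group lasso penalties, and shows its proximal step for a single group: coordinate-wise soft-thresholding followed by group shrinkage, a known property of this penalty. That two-stage step is what yields sparsity both between and within groups. As before, this is an illustration, not the package's `cvxpy` solver.

```python
import numpy as np


def sparse_group_lasso_penalty(beta, group_index, lam, alpha):
    """alpha*lam*sum_i |beta_i| + (1 - alpha)*lam*sum_l sqrt(p_l)*||beta^(l)||_2."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    lasso_part = np.abs(beta).sum()
    group_part = sum(
        np.sqrt(np.sum(group_index == g)) * np.linalg.norm(beta[group_index == g])
        for g in np.unique(group_index)
    )
    return alpha * lam * lasso_part + (1 - alpha) * lam * group_part


def sgl_group_prox(v, t_l1, t_group):
    """Proximal step of the sparse group lasso penalty restricted to one group:
    soft-threshold each coordinate (lasso part), then shrink the group (group lasso part)."""
    z = np.sign(v) * np.maximum(np.abs(v) - t_l1, 0.0)
    norm = np.linalg.norm(z)
    if norm <= t_group:
        return np.zeros_like(z)
    return (1 - t_group / norm) * z


beta_toy = np.array([3.0, 4.0, 0.0, 1.0])
groups_toy = np.array([1, 1, 2, 2])
for alpha in (0.0, 0.5, 1.0):
    # alpha=1 gives the lasso penalty (8.0), alpha=0 the group lasso penalty (6*sqrt(2))
    print(alpha, sparse_group_lasso_penalty(beta_toy, groups_toy, lam=1.0, alpha=alpha))

# Within-group sparsity: the small coordinate is zeroed while the group survives
print(sgl_group_prox(np.array([3.0, 0.5]), t_l1=1.0, t_group=1.0))
```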


#### Adaptive lasso

The adaptive idea was initially proposed by Zou in 2006 [(original paper)](http://users.stat.umn.edu/~zouxx019/Papers/adalasso.pdf) in the form of the adaptive lasso. It relies on additional weights in the penalization as a way to reduce bias and improve both variable selection and prediction accuracy. Adaptive lasso is defined as,

$$ \min R(\beta) + \lambda\sum_{i=1}^p\tilde{w_i}|\beta_i|$$

where $\tilde{w_i}$ are weights provided beforehand by the researcher. This penalization can be fit by defining,

* `penalization='alasso'` where the 'a' before 'lasso' stands for adaptive.
* `lambda1=0.1`, where $\lambda$ is the parameter defined in the lasso penalization
* `individual_weights=np.repeat(0.5, 50)`. Here, `individual_weights` refers to $\tilde{w}$, and it should be of the same length as the number of predictors in X (the number of columns in X, in this case 50)

```python
individual_weights = np.repeat(0.5, 50)
alasso_model = Regressor(model='lm', penalization='alasso', lambda1=0.1, individual_weights=individual_weights)
alasso_model.fit(X=X, y=y)
coef = alasso_model.coef_
```
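
The only change with respect to the lasso is that each coefficient gets its own threshold, proportional to $\lambda\tilde{w_i}$. The coordinate-descent sketch below makes this explicit. It is a swapped-in solver, not the package's `cvxpy` formulation; the objective follows the formula above without an intercept, the weights are computed as inverse absolute values of an initial least squares estimate in the spirit of Zou's proposal, and all names are hypothetical.

```python
import numpy as np


def adaptive_lasso_cd(X, y, lam, weights, n_iter=1000):
    """Minimize ||y - X b||_2^2 + lam * sum_i w_i |b_i| by cyclic coordinate descent (no intercept)."""
    weights = np.asarray(weights, dtype=float)
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq_norms = (X ** 2).sum(axis=0)
    residual = np.array(y, dtype=float)  # always equals y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]
            z = X[:, j] @ residual
            # Each coefficient gets its own threshold, proportional to its weight
            t = lam * weights[j] / 2
            beta[j] = np.sign(z) * max(abs(z) - t, 0.0) / col_sq_norms[j]
            residual -= X[:, j] * beta[j]
    return beta


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Inverse absolute values of an initial least squares estimate
beta_init = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
adaptive_w = 1.0 / np.abs(beta_init)

print("lasso (w=1):   ", np.round(adaptive_lasso_cd(X_demo, y_demo, 50.0, np.ones(6)), 2))
print("adaptive lasso:", np.round(adaptive_lasso_cd(X_demo, y_demo, 50.0, adaptive_w), 2))
```

Weights of zero leave a coefficient unpenalized, while very large weights force it to zero; the data-driven weights sit between these extremes.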

#### Adaptive group lasso

In a similar way, adaptive group lasso is defined as,

$$ \min R(\beta) +\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\tilde{v_l}\left\|\beta^{(l)}\right\|_{2}$$

where $\tilde{v_l}$ are also additional weights. This penalization can be fit by defining,

* `penalization='agl'` where the 'a' before 'gl' stands for adaptive.
* `lambda1=0.1`, where $\lambda$ is the parameter defined in the lasso penalization
* `group_index=np.random.randint(1, 5, size=50)`, as described in the group lasso.
* `group_weights=np.repeat(1.5, len(np.unique(group_index)))`. Here `group_weights` refers to $\tilde{v}$ and it should be of the same length as the number of groups considered in `group_index`.


```python
group_weights = np.repeat(1.5, len(np.unique(group_index)))
agl_model = Regressor(model='lm', penalization='agl', lambda1=0.1, group_weights=group_weights)
agl_model.fit(X=X, y=y, group_index=group_index)
coef = agl_model.coef_
```
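
Because `group_weights` has one entry per group rather than one per variable, it helps to see how the entries line up with `group_index`. The sketch below assumes that the $k$-th entry of `group_weights` corresponds to the $k$-th label returned by `np.unique(group_index)` (sorted order). This is consistent with sizing the array as `len(np.unique(group_index))` above, but it is an assumption worth checking against the package documentation. The helper `adaptive_group_lasso_penalty` is hypothetical.

```python
import numpy as np


def adaptive_group_lasso_penalty(beta, group_index, lam, group_weights):
    """lam * sum_l sqrt(p_l) * v_l * ||beta^(l)||_2.

    Assumes group_weights[k] belongs to the k-th label of np.unique(group_index)."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    labels = np.unique(group_index)
    if len(group_weights) != len(labels):
        raise ValueError("group_weights must have one entry per group")
    total = 0.0
    for v_l, g in zip(group_weights, labels):
        members = group_index == g
        total += np.sqrt(members.sum()) * v_l * np.linalg.norm(beta[members])
    return lam * total


beta_toy = np.array([3.0, 4.0, 0.0, 1.0])
groups_toy = np.array([2, 2, 1, 1])  # labels need not appear in sorted order
weights_toy = np.array([1.5, 0.5])   # 1.5 for label 1, 0.5 for label 2 (sorted label order)

# Map each variable to the weight of its group
labels, position = np.unique(groups_toy, return_inverse=True)
print(labels, weights_toy[position])
print(adaptive_group_lasso_penalty(beta_toy, groups_toy, lam=1.0, group_weights=weights_toy))
```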

#### Adaptive sparse group lasso

Finally, just like the sparse group lasso combines lasso and group lasso, the adaptive sparse group lasso is defined as a linear combination of adaptive lasso and adaptive group lasso,

$$ \min R(\beta) + \alpha\lambda\sum_{i=1}^p\tilde{w_i}|\beta_i| + (1-\alpha)\lambda\sum_{l=1}^{K}\sqrt{p_{l}}\tilde{v_l}\left\|\beta^{(l)}\right\|_{2}$$


##### Remarks

Observe that by setting $\tilde{w_i}=1$ and $\tilde{v_l}=1$ we recover the sparse group lasso formulation. Additionally, with these unit weights, setting $\alpha=1$ recovers the lasso formulation and setting $\alpha=0$ recovers the group lasso formulation; with general weights, the same choices of $\alpha$ give the adaptive lasso and the adaptive group lasso. This means that all previous penalizations are particular cases of the adaptive sparse group lasso penalization.
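
These remarks can be checked numerically. The sketch below evaluates the adaptive sparse group lasso penalty with NumPy and confirms that unit weights with $\alpha=0.5$, $\alpha=1$ and $\alpha=0$ give the sparse group lasso, lasso and group lasso penalties. The helper name `asgl_penalty` is hypothetical, and group weights are assumed to follow the sorted order of `np.unique(group_index)`.

```python
import numpy as np


def asgl_penalty(beta, group_index, lam, alpha, individual_weights, group_weights):
    """alpha*lam*sum_i w_i*|beta_i| + (1 - alpha)*lam*sum_l sqrt(p_l)*v_l*||beta^(l)||_2.

    Assumes group_weights follows the order of np.unique(group_index)."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    individual_part = np.sum(np.asarray(individual_weights, dtype=float) * np.abs(beta))
    group_part = sum(
        np.sqrt(np.sum(group_index == g)) * v_l * np.linalg.norm(beta[group_index == g])
        for v_l, g in zip(group_weights, np.unique(group_index))
    )
    return alpha * lam * individual_part + (1 - alpha) * lam * group_part


beta_toy = np.array([3.0, 4.0, 0.0, 1.0])
groups_toy = np.array([1, 1, 2, 2])
ones_w, ones_v = np.ones(4), np.ones(2)

# Unit weights: sparse group lasso; alpha=1: lasso; alpha=0: group lasso
print(asgl_penalty(beta_toy, groups_toy, 1.0, 0.5, ones_w, ones_v))  # 0.5*8 + 0.5*6*sqrt(2)
print(asgl_penalty(beta_toy, groups_toy, 1.0, 1.0, ones_w, ones_v))  # 8.0, the lasso penalty
print(asgl_penalty(beta_toy, groups_toy, 1.0, 0.0, ones_w, ones_v))  # 6*sqrt(2), the group lasso penalty
```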

Now we will see how this penalization can be fit assuming that we know some potential values for $\tilde{w_i}$ and $\tilde{v_l}$. After that, we will discuss alternatives for the estimation of such weights.

* `penalization='asgl'` where asgl refers to adaptive sparse group lasso
* `lambda1=0.1`, where $\lambda$ is the parameter defined in the lasso penalization
* `alpha=0.5`, where $\alpha$ is the parameter described for the sparse group lasso penalization
* `group_index=np.random.randint(1, 5, size=50)`, as described in the group lasso.
* `individual_weights=np.repeat(0.5, 50)` where `individual_weights` is the parameter defined for adaptive lasso.
* `group_weights=np.repeat(1.5, len(np.unique(group_index)))`, where `group_weights` is the parameter defined for adaptive group lasso.

```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, individual_weights=individual_weights, group_weights=group_weights)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_
```

#### Weights calculation alternatives

We have seen how to fit an adaptive sparse group lasso model, but for that we used arbitrary fixed weights, $\tilde{w_i}=0.5$ and $\tilde{v_l}=1.5$. Now the question is: **is there a way to estimate these weights so that they actually produce better results than a simple sparse group lasso?** And the answer is, of course! (spoiler: otherwise we would not have created this package). We propose using principal component analysis (PCA) and partial least squares (PLS) to obtain these weights. As we will see, using these techniques in the package is straightforward and does not require any knowledge of how they work internally. You just need to choose the option that best suits you (or compare the results from different options and select the best one).

But before we start with that, let us introduce here how these weights are usually defined:

$$\tilde{w_i}=\frac{1}{|\beta_i|^{\gamma_1}}, \quad \tilde{v_l}=\frac{1}{\left\|\beta^{(l)}\right\|_2^{\gamma_2}}$$

The goal, then, is to obtain an initial estimate of $\beta$. Keep in mind that this introduces two extra parameters that we can modify, $\gamma_1$ and $\gamma_2$: the powers to which the coefficients are raised.

A small $\beta$ coefficient results in a large weight, so the associated variable is heavily penalized and more likely to be left out of the final model. On the other hand, large $\beta$ coefficients result in small weights, so the associated variables are likely to remain in the final model.
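
The weight formulas translate directly into NumPy. In this sketch, the helper `adaptive_weights` is hypothetical, and adding a small tolerance to the denominators is an assumption meant to mimic the role of the `weight_tol` parameter, not a description of how the package applies it.

```python
import numpy as np


def adaptive_weights(beta_init, group_index, gamma1=1.0, gamma2=1.0, tol=1e-4):
    """Individual weights w_i = 1 / |beta_i|^gamma1 and group weights v_l = 1 / ||beta^(l)||_2^gamma2.

    `tol` is added to every denominator to avoid dividing by zero (a hypothetical choice
    mimicking `weight_tol`; the package may apply its tolerance differently)."""
    beta_init = np.asarray(beta_init, dtype=float)
    group_index = np.asarray(group_index)
    w = 1.0 / (np.abs(beta_init) ** gamma1 + tol)
    v = np.array([
        1.0 / (np.linalg.norm(beta_init[group_index == g]) ** gamma2 + tol)
        for g in np.unique(group_index)
    ])
    return w, v


# A large initial coefficient gets a small weight; a zero coefficient gets a huge one
w_toy, v_toy = adaptive_weights([2.0, -0.5, 0.0, 0.0], [1, 1, 2, 2])
print(np.round(w_toy, 3))
print(np.round(v_toy, 3))
```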

##### PCA based on a subset of components

This is the default weight estimation technique (`weight_technique='pca_pct'`). Simply put, PCA is used to reduce the number of dimensions of the problem: a non-penalized model is fit using the PCA scores (taking advantage of the lower-dimensional setting), and the resulting coefficients are then projected back into the original space. An in-depth explanation of this process can be found in our original [paper](https://link.springer.com/article/10.1007/s11634-020-00413-8). This can be easily done as part of the `Regressor` object:

* `penalization='asgl'`
* `weight_technique='pca_pct'`. It refers to PCA percentage (because this technique is based on selecting the PCA components that explain a given percentage of the variability).
* `individual_power_weight=1`. This is the $\gamma_1$ coefficient value. Default value for this parameter is `1`.
* `group_power_weight=1`. This is the $\gamma_2$ coefficient value. Default value for this parameter is `1`.
* `variability_pct=0.9`. The number of PCA components used to fit the weights is chosen so that they explain this percentage of the total variability. Default value for this parameter is `variability_pct=0.9`.

```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, weight_technique='pca_pct', individual_power_weight=1, group_power_weight=1, variability_pct=0.9)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_

print(f"Let's see what the individual weights look like:\n{np.round(asgl_model.individual_weights, 2)}")
```

```
Let's see what the individual weights look like:
[0.48 0.08 0.57 0.02 0.29 0.03 0.02 0.01 0.16 0.07 0.2  0.15 0.1  0.08
 0.07 0.04 0.02 0.09 0.04 0.07 0.11 0.01 0.01 0.13 0.02 0.25 0.04 0.19
 0.04 0.14 0.08 0.39 0.02 0.08 0.02 0.04 0.11 4.3  0.05 0.02 0.1  0.01
 0.05 0.06 0.01 0.15 0.01 0.09 1.15 0.56]
```
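
For intuition about what `weight_technique='pca_pct'` does, the steps described above can be sketched with NumPy: center the data, compute principal components with an SVD, keep the smallest number of components that explain `variability_pct` of the variability, fit an unpenalized least squares model on the scores, and project the coefficients back. This is a simplified illustration, not the package's implementation: it always uses least squares, whereas the package follows the procedure in the cited paper, and the helper name and the tolerance in the weights are assumptions.

```python
import numpy as np


def pca_pct_initial_estimate(X, y, variability_pct=0.9):
    """Hypothetical helper: PCA-based initial estimate of beta (least squares version)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Principal components of the centered predictors
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of components reaching the requested share of variability
    k = min(int(np.searchsorted(explained, variability_pct)) + 1, len(s))
    V_k = Vt[:k].T                 # loadings of the first k components (p x k)
    scores = Xc @ V_k              # component scores (n x k)
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    return V_k @ gamma             # coefficients projected back to the original variables


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 3.0]) + rng.normal(size=300)

beta_pca = pca_pct_initial_estimate(X_demo, y_demo, variability_pct=0.9)
weights_pca = 1.0 / (np.abs(beta_pca) + 1e-4)  # gamma_1 = 1, small tolerance assumed
print(np.round(beta_pca, 2))
print(np.round(weights_pca, 2))
```

With `variability_pct=1.0` every component is kept and the estimate coincides with ordinary least squares on the centered data; smaller values restrict the estimate to the leading components.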


##### PLS based on a subset of components

We propose a similar alternative based on partial least squares. Partial least squares is a dimensionality reduction technique that works by maximizing the covariance between the predictors $X$ and the response vector $y$ (as opposed to PCA, which works by maximizing the variance of $X$). PLS can therefore be better suited for prediction purposes, but since it is based on least squares regression, it can be more affected by heteroscedasticity or outliers than the PCA proposal.

* `penalization='asgl'`
* `weight_technique='pls_pct'`. It refers to PLS percentage (because this technique is based on selecting the PLS components that explain a given percentage of the variability).
* `individual_power_weight=1`. This is the $\gamma_1$ coefficient value. Default value for this parameter is `1`.
* `group_power_weight=1`. This is the $\gamma_2$ coefficient value. Default value for this parameter is `1`.
* `variability_pct=0.9`. The number of PLS components used to fit the weights is chosen so that they explain this percentage of the total variability. Default value for this parameter is `variability_pct=0.9`.

```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, weight_technique='pls_pct', individual_power_weight=1, group_power_weight=1, variability_pct=0.9)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_
```
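
PLS components can be obtained with the NIPALS algorithm. The sketch below computes PLS1 regression coefficients for a fixed number of components and turns them into weights. It is an illustration under simplifying assumptions: the number of components is passed directly instead of being chosen through `variability_pct`, only least squares is involved, and the helper name `pls1_coefficients` is hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model with n_components components (NIPALS algorithm)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # direction of maximal covariance between X and y
        t = Xk @ w                       # component scores
        tt = t @ t
        p_load = Xk.T @ t / tt           # X loadings
        q_load = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p_load)    # deflate X
        yk = yk - q_load * t             # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_load)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Express the fitted model in terms of the original (centered) variables
    return W @ np.linalg.solve(P.T @ W, q)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

beta_pls = pls1_coefficients(X_demo, y_demo, n_components=2)
weights_pls = 1.0 / (np.abs(beta_pls) + 1e-4)  # gamma_1 = 1, small tolerance assumed
print(np.round(beta_pls, 2))
print(np.round(weights_pls, 2))
```

When as many components as variables are used, PLS1 reproduces ordinary least squares on the centered data; fewer components give a lower-dimensional estimate driven by the covariance with $y$.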

##### PCA / PLS based on the first component

Each principal component is a linear combination of the original variables. This means that another alternative for estimating the weights is to simply use the first principal component to obtain the weights for the adaptive sparse group lasso model. In the same way, it is possible to use the first PLS component to obtain the weights.

* `penalization='asgl'`
* `weight_technique='pca_1'`. It refers to using the first principal component.
* `weight_technique='pls_1'`. It refers to using the first PLS component.
* `individual_power_weight=1`. As defined for PCA based on a subset of components.
* `group_power_weight=1`. As defined for PCA based on a subset of components.

```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, weight_technique='pca_1', individual_power_weight=1, group_power_weight=1)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_
```

##### LASSO

Another alternative consists of fitting an initial lasso model and using its estimates to compute the weights for a second, adaptive model.

* `penalization='asgl'`
* `weight_technique='lasso'`. It refers to using lasso to obtain the weights
* `lambda1_weights=1e-2`. It is the $\lambda$ value used in the lasso estimation of the weights.
* `individual_power_weight=1`. As defined for PCA based on a subset of components.
* `group_power_weight=1`. As defined for PCA based on a subset of components.

```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, weight_technique='lasso', lambda1_weights=1e-2, individual_power_weight=1, group_power_weight=1)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_
```

##### Unpenalized model

This alternative can only be used with low-dimensional datasets (in which the number of observations is larger than the number of variables). In this case, it is possible to fit an initial model with no penalization and use its coefficients to compute the weights for the adaptive model. Two alternatives are available: an unpenalized linear model and an unpenalized quantile regression model. Which one is used is determined by the parameter `model`.

* `model='lm'`
* `penalization='asgl'`
* `weight_technique='unpenalized'`. It refers to using an unpenalized model.
* `individual_power_weight=1`. As defined for PCA based on a subset of components.
* `group_power_weight=1`. As defined for PCA based on a subset of components.


```python
asgl_model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5, weight_technique='unpenalized', individual_power_weight=1, group_power_weight=1)
asgl_model.fit(X=X, y=y, group_index=group_index)
coef = asgl_model.coef_
```
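
In a low-dimensional setting, the 'unpenalized' idea can be sketched with NumPy: fit least squares with an intercept, then apply the weight formulas with $\gamma_1=\gamma_2=1$. The data and variable names are hypothetical, and the sketch omits a division-by-zero safeguard because unpenalized estimates are almost never exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))  # low-dimensional: n = 200 > p = 6
y_demo = 10 + X_demo @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=200)
groups_demo = np.array([1, 1, 2, 2, 3, 3])

# Unpenalized least squares fit with an intercept column (well defined because n > p)
X_aug = np.column_stack([np.ones(len(X_demo)), X_demo])
if np.linalg.matrix_rank(X_aug) < X_aug.shape[1]:
    raise ValueError("an unpenalized fit needs a full-rank design")
beta_unpen = np.linalg.lstsq(X_aug, y_demo, rcond=None)[0][1:]  # drop the intercept

gamma1 = gamma2 = 1.0
individual_weights_demo = 1.0 / np.abs(beta_unpen) ** gamma1
group_weights_demo = np.array([
    1.0 / np.linalg.norm(beta_unpen[groups_demo == g]) ** gamma2
    for g in np.unique(groups_demo)
])
print(np.round(individual_weights_demo, 2))
print(np.round(group_weights_demo, 2))
```

Arrays built this way have the lengths required by the `individual_weights` and `group_weights` parameters, so a custom scheme like this one could be passed to `Regressor` as in Example 3.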

## What parameter values should you use?

* $\lambda$: This parameter is used in all the penalizations defined in the package, and it controls the level of sparsity of the solution. A typical range of values for this parameter goes from $10^{-3}$ up to $10$, e.g.:
`lambda1=10.0**np.arange(-3, 1.01, 0.2)`

* $\alpha$: This parameter is used in the sparse group lasso and adaptive sparse group lasso penalizations. It controls the tradeoff between lasso and group lasso: $\alpha$ values close to $1$ produce lasso-like solutions, and values close to $0$ produce group lasso-like solutions. The best solutions are usually found close to one of the extremes, so a typical grid for this parameter places more values near the extremes and fewer in the middle, e.g.:
`alpha=np.r_[np.arange(0.0, 0.3, 0.02), np.arange(0.3, 0.7, 0.1), np.arange(0.7, 0.99, 0.02)]`

* $\gamma_1$ and $\gamma_2$: These parameters are the powers applied to the weights in adaptive penalizations (`individual_power_weight` and `group_power_weight`). Usually, they are chosen in the interval $[0, 2]$.
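
The suggested grids can be generated and inspected with NumPy before being passed to a hyperparameter search such as the `RandomizedSearchCV` call in Example 2. The dictionary name `param_grid` mirrors that example; the counts only illustrate how the $\alpha$ grid concentrates values near the extremes.

```python
import numpy as np

# Candidate values for lambda1: 21 log-spaced values from 1e-3 to 10
lambda1_grid = 10.0 ** np.arange(-3, 1.01, 0.2)

# Candidate values for alpha: dense near 0 and 1, sparse in the middle
alpha_grid = np.r_[np.arange(0.0, 0.3, 0.02), np.arange(0.3, 0.7, 0.1), np.arange(0.7, 0.99, 0.02)]

# Same dictionary format as the param_grid used with RandomizedSearchCV in Example 2
param_grid = {'lambda1': lambda1_grid, 'alpha': alpha_grid}

n_low = int(np.sum(alpha_grid < 0.3))
n_mid = int(np.sum((alpha_grid >= 0.3) & (alpha_grid < 0.7)))
n_high = int(np.sum(alpha_grid >= 0.7))
print(len(lambda1_grid), lambda1_grid.min(), lambda1_grid.max())
print(n_low, n_mid, n_high)
```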