# `asgl` package

## Introduction
___

`asgl` is a Python package that solves several regression-related models for simultaneous variable selection and prediction, in low and high dimensional frameworks. This package is directly related to the research work presented in [this paper](https://link.springer.com/article/10.1007/s11634-020-00413-8).

The current version of the package supports:
* Linear regression models
* Quantile regression models

And considers the following penalizations for variable selection:

* Unpenalized models
* lasso
* group lasso
* sparse group lasso
* adaptive sparse group lasso

## Requirements 
___
The package makes use of some basic functions from `scikit-learn` and `numpy`, and is built on top of the wonderful `cvxpy` convex optimization module. It is highly recommended to install `cvxpy` before installing `asgl`, following the instructions from the original authors, which can be found [here](https://www.cvxpy.org/). Additionally, `asgl` makes use of the Python `multiprocessing` module, which allows, if requested, parallel execution of the code and can greatly reduce computation time.

## Usage example:
___
In the following example we will analyze the `BostonHousing` dataset (available [here](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston)). Note that `load_boston` was removed in scikit-learn 1.2, so the import below requires an earlier version. Even though the `asgl` package can easily deal with much more complex datasets, we use this one to keep computation time short. We will show how to implement cross validation on a grid of possible parameter values for a sparse group lasso linear model, how to find the optimal parameter values and, finally, how to compute the test error.

In [1]:
# Import required packages
import numpy as np
from sklearn.datasets import load_boston
import asgl

In [2]:
# Import test data #
boston = load_boston()
x = boston.data
y = boston.target
group_index = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5])

As will be seen later, group lasso and sparse group lasso based formulations take into account a group structure in the data. The `BostonHousing` dataset does not have any group structure, but as part of this example we define a fake one called `group_index`.

We define an initial grid of values for parameters $\lambda$ and $\alpha$ from the sparse group lasso penalization. More details on the meaning of these parameters as well as the mathematical formulation of the penalization can be found in the **Sparse group lasso** section below.

In [3]:
# Define parameters grid
lambda1 = (10.0 ** np.arange(-3, 1.51, 0.2)) # 23 possible values for lambda
alpha = np.arange(0, 1, 0.05) # 20 possible values for alpha

# Define model parameters
model = 'lm'  # linear model
penalization = 'sgl'  # sparse group lasso penalization
parallel = True  # Code executed in parallel
error_type = 'MSE'  # Error measurement considered. MSE stands for Mean Squared Error.

In [4]:
num_models = len(lambda1)*len(alpha)
print(f'We define a grid of {num_models} models.')

We define a grid of 460 models.


We have defined a grid of 460 possible models based on the combinations of different $\lambda$ and $\alpha$ values. In order to find the optimal values of the parameters, we consider 5-fold cross validation, and we will run this in parallel. Additionally, we provide a `random_state` value so that splits are reproducible.
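To make the 5-fold procedure concrete, the sketch below shows one way reproducible folds can be built with NumPy alone. It is only an illustration: the helper `k_fold_indices` is hypothetical and not part of `asgl`, and the splitting done internally by the package may differ.

```python
import numpy as np

def k_fold_indices(nrows, nfolds=5, random_state=None):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(random_state)
    shuffled = rng.permutation(nrows)          # reproducible shuffle of the row indices
    folds = np.array_split(shuffled, nfolds)   # nfolds chunks of nearly equal size
    for k in range(nfolds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(nfolds) if j != k])
        yield train_idx, test_idx

# The Boston data has 506 rows
splits = list(k_fold_indices(nrows=506, nfolds=5, random_state=99))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each row appears in exactly one test fold, and fixing `random_state` gives the same folds on every run.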

In [5]:
# Define a cross validation object
cv_class = asgl.CV(model=model, penalization=penalization, lambda1=lambda1, alpha=alpha,
                   nfolds=5, error_type=error_type, parallel=parallel, random_state=99)

In [6]:
# Compute error using k-fold cross validation
error = cv_class.cross_validation(x=x, y=y, group_index=group_index)

num_models, k_folds = error.shape
# error is a matrix of shape (number_of_models, k_folds)
print(f'We are considering a grid of {num_models} models, optimized based on {k_folds}-folds cross validation')

# Obtain the mean error across different folds
error = np.mean(error, axis=1)

We are considering a grid of 460 models, optimized based on 5-folds cross validation


Let's find the parameter values that minimize the cross validation error.

In [7]:
# Select the minimum error
minimum_error_idx = np.argmin(error)

# Select the parameters associated with the minimum error
optimal_parameters = cv_class.retrieve_parameters_value(minimum_error_idx)
optimal_lambda = optimal_parameters.get('lambda1')
optimal_alpha = optimal_parameters.get('alpha')

In [8]:
print(f' Minimum cross validation error was {error[minimum_error_idx]}.\n Optimal parameter values:\n  Lambda: {optimal_lambda}\n  Alpha: {optimal_alpha}')

 Minimum cross validation error was 23.81466504759943.
 Optimal parameter values:
  Lambda: 0.001
  Alpha: 0.9500000000000001


We have found that the cross validation error is minimized for the parameter values shown above. Now we will consider a final train / test split in which to train the model for the pair of optimal parameters obtained before and compute the final test error. For this, we define an ASGL object, that is used to fit the model with no cross validation involved.

In [9]:
# Define asgl class using optimal values
asgl_model = asgl.ASGL(model=model, penalization=penalization, lambda1=optimal_lambda, alpha=optimal_alpha)

In [10]:
# Split data into train / test
train_idx, test_idx = asgl.train_test_split(nrows=x.shape[0], train_pct=0.7, random_state=1)

# Solve the model
asgl_model.fit(x=x[train_idx, :], y=y[train_idx], group_index=group_index)

# Obtain betas
final_beta_solution = asgl_model.coef_[0]

`asgl_model.coef_` stores the $\beta$ coefficients, one array per parameter combination. Since a single pair of parameter values was provided here, we take element `[0]`, which holds the coefficients for the parameters found to be optimal by cross validation. Observe that matrix $X$ has 13 variables while `final_beta_solution` has length 14: the first element of this array is the intercept of the model. The intercept can be turned off by setting `intercept=False` in the `asgl.ASGL` definition.
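Assuming the layout described above (intercept first, then one coefficient per column of $X$), a prediction is the intercept plus a linear combination of the predictors. The sketch below uses made-up numbers and a hypothetical `predict_from_coef` helper. It is only meant to show how the array is read, and the package's own `predict` method is what should be used in practice.

```python
import numpy as np

def predict_from_coef(x_new, coef, intercept=True):
    """Hypothetical helper: apply coefficients laid out as [intercept, beta_1, ..., beta_p]."""
    coef = np.asarray(coef, dtype=float)
    if intercept:
        return coef[0] + x_new @ coef[1:]
    return x_new @ coef

x_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
coef_toy = np.array([0.5, 2.0, -1.0])   # intercept 0.5, slopes 2.0 and -1.0
pred_toy = predict_from_coef(x_toy, coef_toy)
print(pred_toy)  # [0.5 2.5]
```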

In [11]:
np.round(final_beta_solution, 1)

array([ 34.2,  -0.1,   0. ,  -0. ,   3.1, -18.2,   4.3,  -0. ,  -1.6,
         0.3,  -0. ,  -0.9,   0. ,  -0.4])

In [12]:
# Obtain predictions
final_prediction = asgl_model.predict(x_new=x[test_idx, :])

# Obtain final errors
final_error = asgl.error_calculator(y_true=y[test_idx], 
                                    prediction_list=final_prediction,
                                    error_type=error_type)

print(f'Final error is {np.round(final_error[0], 2)}')

Final error is 29.44


# Mathematical formulations (with code!)
___

For an in-depth analysis of the mathematical formulations, we highly encourage reading our original [paper](https://link.springer.com/article/10.1007/s11634-020-00413-8). Here we cover the basics of the formulations, including code examples.

### Model formulations
___
Given a response vector $y$ and a predictors matrix $X$, the package implements two possible risk functions:

#### Linear models
In the usual linear models the risk function is defined as,

$$ R(\beta)=\|y-X\beta\|_2^2$$
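As a quick illustration of this risk function, the following sketch evaluates $R(\beta)$ on simulated data and checks that the ordinary least squares solution attains a lower risk than the coefficients used to simulate the data. The data and the `linear_risk` helper are made up for this example, and no intercept is included to keep it short.

```python
import numpy as np

def linear_risk(beta, x, y):
    """Squared-error risk R(beta) = ||y - x @ beta||_2^2."""
    residual = y - x @ beta
    return float(residual @ residual)

rng = np.random.default_rng(0)
x_sim = rng.normal(size=(50, 3))
beta_true = np.array([1.0, 0.0, -2.0])
y_sim = x_sim @ beta_true + 0.1 * rng.normal(size=50)

# Ordinary least squares minimizes this risk when no penalization is used
beta_ols, *_ = np.linalg.lstsq(x_sim, y_sim, rcond=None)
print(linear_risk(beta_ols, x_sim, y_sim) <= linear_risk(beta_true, x_sim, y_sim))  # True
```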

This model can be fit by simply defining
 * `model='lm'`: lm refers to linear model
 * `penalization=None`: This fits an unpenalized model
 * `intercept=True`: Default value is `True`. If the intercept of the model is not required, it can be set to `False`.

In [13]:
lm_model = asgl.ASGL(model='lm', penalization=None)
lm_model.fit(x=x, y=y)

coef = lm_model.coef_[0]
print(np.round(coef, 1))

[ 36.5  -0.1   0.    0.    2.7 -17.8   3.8   0.   -1.5   0.3  -0.   -1.
   0.   -0.5]


The values stored in `coef` are the $\beta$ coefficients of the model. If `intercept` is set to `True` (or if it is left with the default value) then the first element of this array is the intercept.

#### Quantile regression models
Quantile regression models are not as well known as linear regression models, but are a very powerful alternative. These models provide an estimation of the conditional quantiles of a response variable, and are especially suited for heteroscedastic datasets. The risk function in these models is defined as,

$$R(\beta)=\frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(y_{i}-x_{i}^{t} \beta)$$

where $\rho_{\tau}(u)=u(\tau-I(u<0))$
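The check function $\rho_{\tau}$ can be written in a couple of lines. The sketch below is a plain Python illustration of the formula above, with hypothetical helper names, and not the code used inside the package.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_risk(residuals, tau):
    """Average check loss over the residuals y_i - x_i^t beta."""
    return sum(check_loss(u, tau) for u in residuals) / len(residuals)

residuals = [2.0, -2.0, 0.0]
print(quantile_risk(residuals, tau=0.5))  # 0.6666666666666666
```

With $\tau=0.5$ the loss equals half the absolute residual, which is why that value fits the median. With $\tau=0.9$, a positive residual of 2 costs about 1.8 while a negative residual of the same size costs only about 0.2, so the fitted quantile is pushed upwards.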

This model can be fit as,
* `model='qr'`: qr refers to quantile regression
* `penalization=None`: This fits an unpenalized model
* `tau=0.5`: $\tau$ must be a value between 0 and 1. It controls the quantile to be fit. A value of 0.5 fits a model for the median of y.
* `intercept=True`: It has the same definition as in the linear model.

In [14]:
qr_model = asgl.ASGL(model='qr', penalization=None, tau=0.5)
qr_model.fit(x=x, y=y)

coef = qr_model.coef_[0]
print(np.round(coef, 1))

[14.9 -0.1  0.   0.   1.3 -9.2  5.3 -0.  -1.   0.2 -0.  -0.7  0.  -0.3]


### Penalization formulations
___

#### Lasso penalization

Initially defined in 1996 by Tibshirani [(original paper)](http://statweb.stanford.edu/~tibs/lasso/lasso.pdf), lasso penalization promotes sparsity at the level of individual coefficients. It is defined as an L1 norm penalty on the coefficients, where the parameter $\lambda$ controls the level of sparsity applied.

$$ \min R(\beta) + \lambda\sum_{i=1}^p|\beta_i|$$

It can be fit as,

* `penalization='lasso'`
* `lambda1 = [0.001, 0.01, 0.1, 1, 10]`. This parameter is the $\lambda$ defined in the problem formulation. It controls the sparsity of the solution. Large $\lambda$ values are associated with more sparse solutions, since the coefficients are more heavily penalized.
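To see why larger $\lambda$ values give sparser solutions, consider the textbook special case of a linear model with an orthonormal design ($X^tX=I$). With the risk written as above, the lasso solution is then the soft-thresholded unpenalized solution, with threshold $\lambda/2$. The sketch below illustrates this special case with made-up coefficients. It is not how `asgl` solves the problem, since the package relies on `cvxpy` for general designs.

```python
import numpy as np

def soft_threshold(z, threshold):
    """Shrink each entry of z toward zero by `threshold`; small entries become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

z = np.array([3.0, -0.4, 0.02, -1.2])   # unpenalized coefficients (made-up values)
nonzero_by_lambda = {}
for lam in [0.001, 0.1, 1, 10]:
    beta = soft_threshold(z, lam / 2)   # lasso solution for an orthonormal design
    nonzero_by_lambda[lam] = int(np.count_nonzero(beta))

print(nonzero_by_lambda)  # {0.001: 4, 0.1: 3, 1: 2, 10: 0}
```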

In [15]:
lambda1 = [0.001, 0.01, 0.1, 1, 10]
lasso_model = asgl.ASGL(model='lm', penalization='lasso',lambda1=lambda1)
lasso_model.fit(x=x, y=y)
coef = lasso_model.coef_

Observe that `lambda1` is defined as a list of values. We fit a linear model with a lasso penalization for each possible $\lambda$ value, and the coefficients of these models are all stored in `lasso_model.coef_`, in the same order as the $\lambda$ values. For example, the coefficients associated with the third $\lambda$ value are the third element of `coef`.

In [16]:
print(f'The model coefficients associated to lambda value 1 (which is the fourth value in the lambda1 array) are:\n{np.round(coef[3],1)}')

The model coefficients associated to lambda value 1 (which is the fourth value in the lambda1 array) are:
[32.5 -0.1  0.  -0.   0.   0.   2.5  0.  -0.9  0.3 -0.  -0.8  0.  -0.7]


#### Group lasso penalization

Proposed in 2006 by Yuan and Lin [(original paper)](http://www.columbia.edu/~my2550/papers/glasso.final.pdf), group lasso penalization works assuming that predictors from matrix $X$ have a natural grouped structure. The penalization is defined as,

$$ \min R(\beta) +\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\left\|\beta^{(l)}\right\|_{2}$$

where $p_{l}$ is the size of the l-th group. This penalization can be fit by simply defining:

* `penalization='gl'` where gl refers to group lasso
* `lambda1 = [0.001, 0.01, 0.1, 1, 10]`, where $\lambda$ is the parameter defined in the lasso penalization
* `group_index=np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])`. This should be an array of the same length as the number of variables in matrix $X$. Each element of this array indicates the group to which the associated variable belongs. For example, the first three variables from $X$ belong to group 1, while the next three belong to group 2.
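The role of `group_index` in the penalization can be sketched as follows. The function below evaluates the group lasso penalty term for a given coefficient vector (without intercept). It is a hypothetical helper written for illustration, using toy values, and not the package's internal code.

```python
import numpy as np

def group_lasso_penalty(beta, group_index, lam):
    """lam * sum_l sqrt(p_l) * ||beta^(l)||_2, with groups read from group_index."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    total = 0.0
    for g in np.unique(group_index):
        beta_g = beta[group_index == g]                      # coefficients of group g
        total += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return lam * total

beta_toy = np.array([3.0, 4.0, 0.0, 0.0, 0.0])
groups_toy = np.array([1, 1, 2, 2, 2])
print(group_lasso_penalty(beta_toy, groups_toy, lam=1.0))  # sqrt(2) * 5, about 7.07
```

A group whose coefficients are all zero contributes nothing to the penalty, which is what drives whole groups out of the model.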

In [17]:
lambda1 = [0.001, 0.01, 0.1, 1, 10]
group_index = np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])
group_lasso_model = asgl.ASGL(model='lm', penalization='gl',lambda1=lambda1)
group_lasso_model.fit(x=x, y=y, group_index=group_index)
coef = group_lasso_model.coef_

print(f'The model coefficients associated to lambda value 10 (which is the fifth value in the lambda1 array) are:\n{np.round(coef[4],1)}')

The model coefficients associated to lambda value 10 (which is the fifth value in the lambda1 array) are:
[30.9 -0.   0.  -0.   0.  -0.   0.   0.  -0.   0.1 -0.  -0.2  0.  -0.5]


#### Sparse group lasso
Defined in 2013 [(original paper)](https://arxiv.org/abs/1001.0736), sparse group lasso is a linear combination of lasso and group lasso penalizations that provides solutions that are both between and within group sparse. The penalization is defined as,

$$ \min R(\beta) + \alpha\lambda\sum_{i=1}^p|\beta_i| +(1-\alpha)\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\left\|\beta^{(l)}\right\|_{2}$$

where $\alpha$ is a parameter defined in $[0,1]$ that balances the penalization applied between lasso and group lasso. Values of $\alpha$ close to 1 produce lasso-like solutions, while values close to 0 produce group lasso-like solutions. This penalization can be fit by defining:

* `penalization='sgl'` where sgl refers to sparse group lasso
* `lambda1 = [0.001, 0.01, 0.1, 1, 10]`, where $\lambda$ is the parameter defined in the lasso penalization
* `alpha=[0, 0.25, 0.5, 0.75, 1]`, where $\alpha$ is the parameter described above
* `group_index=np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])`, as described in the group lasso.

In [18]:
lambda1 = [0.001, 0.01, 0.1, 1, 10]
alpha = [0, 0.25, 0.5, 0.75, 1]
group_index = np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])
sgl_model = asgl.ASGL(model='lm', penalization='sgl',lambda1=lambda1, alpha=alpha, parallel=True)
sgl_model.fit(x=x, y=y, group_index=group_index)
coef = sgl_model.coef_

We consider a grid with 5 possible $\lambda$ values and another 5 possible $\alpha$ values, so we fit 25 models in total. The coefficients for all 25 models are stored in `coef`. Using the function `retrieve_parameters_value(idx)` we can recover the parameter values associated with a specific solution stored in `coef`. For example, to recover the parameter values that yielded the coefficients stored at index 20 in `coef`, we could run,

In [19]:
coef_20 = coef[20]
param_20 = sgl_model.retrieve_parameters_value(20)

In [20]:
print(f'Coefficients value:\n{np.round(coef_20, 1)}\nParameters value:\n{param_20}')

Coefficients value:
[30.9 -0.   0.  -0.   0.  -0.   0.   0.  -0.   0.1 -0.  -0.2  0.  -0.5]
Parameters value:
{'lambda1': 10, 'alpha': 0, 'lasso_weights': None, 'gl_weights': None}


Here `lasso_weights` and `gl_weights` are parameters used in the adaptive penalization described below, and for that reason are shown as `None`.
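The index lookup shown above can be mimicked with a plain Cartesian product. The sketch below assumes that $\lambda$ varies slowest and $\alpha$ fastest, an ordering chosen because it reproduces the output shown for index 20. It is an illustration only, and `retrieve_parameters_value` remains the reliable way to map an index to its parameters.

```python
from itertools import product

lambda_grid = [0.001, 0.01, 0.1, 1, 10]
alpha_grid = [0, 0.25, 0.5, 0.75, 1]

# Assumed ordering: lambda varies slowest, alpha fastest
param_grid = [{'lambda1': lam, 'alpha': a} for lam, a in product(lambda_grid, alpha_grid)]

print(len(param_grid))   # 25
print(param_grid[20])    # {'lambda1': 10, 'alpha': 0}
```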

#### Adaptive sparse group lasso

The adaptive idea was initially proposed by Zou in 2006 [(original paper)](http://users.stat.umn.edu/~zouxx019/Papers/adalasso.pdf) for an adaptive lasso. This idea is based on the usage of additional weights on the penalization as a way to reduce bias and increase the quality of variable selection and prediction accuracy. This way, adaptive lasso is defined as,

$$ \min R(\beta) + \lambda\sum_{i=1}^p\tilde{w_i}|\beta_i|$$

where $\tilde{w_i}$ are weights previously provided by the researcher. Adaptive group lasso is defined as,

$$ \min R(\beta) +\lambda \sum_{l=1}^{K} \sqrt{p_{l}}\tilde{v_l}\left\|\beta^{(l)}\right\|_{2}$$

where $\tilde{v_l}$ are also additional weights, and the adaptive sparse group lasso is defined as,

$$ \min R(\beta)+\alpha\lambda\sum_{i=1}^p\tilde{w_i}\lvert\beta_i\rvert+(1-\alpha)\lambda\sum_{l=1}^K\sqrt{p_l}\tilde{v_l}\left\|\beta^{(l)}\right\|_{2}$$


##### Remarks

Observe that by setting $\tilde{w_i}=1$ and $\tilde{v_l}=1$ we recover the sparse group lasso formulation. Additionally, by setting $\alpha=1$ we recover the lasso formulation and by setting $\alpha=0$ we recover the group lasso formulation. This means that all previous penalizations are particular cases of this adaptive sparse group lasso penalization.
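These special cases can be checked numerically. The sketch below evaluates the adaptive sparse group lasso penalty term for a toy coefficient vector. The helper `asgl_penalty` is hypothetical, and it assumes that the group weights are ordered by sorted group label.

```python
import numpy as np

def asgl_penalty(beta, group_index, lam, alpha, lasso_weights=None, gl_weights=None):
    """Adaptive sparse group lasso penalty; weights default to 1 (plain sparse group lasso)."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    groups = np.unique(group_index)   # assumed order of the group weights
    w = np.ones(beta.size) if lasso_weights is None else np.asarray(lasso_weights, dtype=float)
    v = np.ones(groups.size) if gl_weights is None else np.asarray(gl_weights, dtype=float)
    lasso_part = np.sum(w * np.abs(beta))
    group_part = sum(
        np.sqrt(np.sum(group_index == g)) * v_l * np.linalg.norm(beta[group_index == g])
        for g, v_l in zip(groups, v)
    )
    return float(alpha * lam * lasso_part + (1 - alpha) * lam * group_part)

beta_toy = np.array([3.0, 4.0, 0.0, -1.0])
groups_toy = np.array([1, 1, 2, 2])

print(asgl_penalty(beta_toy, groups_toy, lam=1.0, alpha=1.0))  # 8.0, the lasso penalty
print(asgl_penalty(beta_toy, groups_toy, lam=1.0, alpha=0.0))  # 6 * sqrt(2), the group lasso penalty
```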

Now we will see how this penalization can be fit assuming that we know some potential values for $\tilde{w_i}$ and $\tilde{v_l}$. After that, we will discuss alternatives for the estimation of such weights.

* `penalization='asgl'` where asgl refers to adaptive sparse group lasso
* `lambda1 = [0.001, 0.01, 0.1, 1, 10]`, where $\lambda$ is the parameter defined in the lasso penalization
* `alpha=[0, 0.25, 0.5, 0.75, 1]`, where $\alpha$ is the parameter described for the sparse group lasso penalization
* `group_index=np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])`, as described in the group lasso.
* `lasso_weights=np.repeat(0.5, 13)`. Here, `lasso_weights` refers to $\tilde{w}$, and it should be of the same length as the number of predictors in X (the number of columns in X, in this case 13)
* `gl_weights=np.repeat(0.5, 2)`. Here `gl_weights` refers to $\tilde{v}$, and it should be of the same length as the number of groups considered in `group_index` (in this case, 2)

As with parameters $\lambda$ and $\alpha$, it can be interesting to fit models for different weight values. This can be done by storing all the weight candidates in a list. For example,

`lasso_weights=[np.repeat(0.5, 13), np.repeat(0.75, 13), np.repeat(1, 13)]`

would fit models for the three candidate values of the lasso weights.

In [21]:
lambda1 = [0.001, 0.01, 0.1, 1, 10]
alpha = [0, 0.25, 0.5, 0.75, 1]
group_index = np.array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2])

lasso_weights = [np.repeat(0.5, 13), np.repeat(0.75, 13), np.repeat(1, 13)]
gl_weights = [np.repeat(0.5, 2)]
asgl_model = asgl.ASGL(model='lm', penalization='asgl',lambda1=lambda1, alpha=alpha, 
                       lasso_weights=lasso_weights, gl_weights=gl_weights, parallel=True)
asgl_model.fit(x=x, y=y, group_index=group_index)
coef = asgl_model.coef_
len(coef)

75

Here we are considering a grid of 5 $\lambda$ values, 5 $\alpha$ values, 3 $\tilde{w}$ values and 1 $\tilde{v}$ value, so a total of 75 models are fitted.

#### Weights calculation alternatives

We have seen how to fit an adaptive sparse group lasso model, but for that we used arbitrary weights such as $w=\vec{0.5}$ and $v=\vec{0.5}$. Now the question is, **is there some way to obtain an estimation of these weights that actually produces better results than a simple sparse group lasso?** And the answer is, of course! (spoiler: otherwise we would not have created this package). We propose using principal component analysis (PCA) and partial least squares (PLS) to obtain these weights. Using these techniques in the package, as we will see, is straightforward and does not require any knowledge of how they work internally. You just need to choose the option that best suits you (or compare the results from different options and select the best one).

But before we start with that, let us introduce here how these weights are usually defined:

$$\tilde{w_i}=\frac{1}{\lvert\beta_i\rvert^{\gamma_1}}, \quad \tilde{v_l}=\frac{1}{\left\|\beta^{(l)}\right\|_2^{\gamma_2}}$$

The objective, then, is to obtain an initial estimate of $\beta$. Keep in mind that this introduces two extra tunable parameters, $\gamma_1$ and $\gamma_2$, the powers to which the coefficients are raised.

A small $\beta$ coefficient results in a large weight, so the corresponding variable is heavily penalized and more likely to be left out of the final model. Conversely, a large $\beta$ coefficient results in a small weight, so the variable is likely to remain in the final model.
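The relationship between coefficients and weights can be sketched in a few lines. The helper below is hypothetical. It takes absolute values so that the weights stay positive, and it adds a small tolerance to avoid dividing by zero. Both the tolerance value and the way it is applied are assumptions made for this example.

```python
import numpy as np

def adaptive_weights(beta, group_index, gamma1=1.0, gamma2=1.0, weight_tol=1e-4):
    """Hypothetical helper: w_i = 1/|beta_i|^gamma1 and v_l = 1/||beta^(l)||_2^gamma2."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    # weight_tol keeps the denominators away from zero (assumed handling)
    lasso_w = 1.0 / (np.abs(beta) + weight_tol) ** gamma1
    group_norms = np.array([np.linalg.norm(beta[group_index == g])
                            for g in np.unique(group_index)])
    gl_w = 1.0 / (group_norms + weight_tol) ** gamma2
    return lasso_w, gl_w

beta_init = np.array([2.0, 0.0, -0.5, 0.01])   # made-up initial estimate
lasso_w, gl_w = adaptive_weights(beta_init, group_index=[1, 1, 2, 2])
print(np.round(lasso_w, 2))
print(np.round(gl_w, 2))
```

The zero coefficient receives by far the largest weight, while the largest coefficient receives the smallest one.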

##### PCA based on a subset of components

This is our proposal for the default weight calculation alternative. Simply put, we use PCA to reduce the number of dimensions of the problem, fit an unpenalized model using the PCA scores (taking advantage of working in a lower dimensional framework) and then project the solution back into the original space. An in-depth explanation of this process can be read in our original [paper](https://link.springer.com/article/10.1007/s11634-020-00413-8). This can be easily done in the package by using the `WEIGHTS` class:

* `penalization='asgl'`
* `weight_technique='pca_pct'`. It refers to PCA percentage (because this technique is based on selecting a percentage of PCA components to fit the weights)
* `lasso_power_weight=[0.6, 0.8, 1]`. This is the $\gamma_1$ coefficient value. Default value for this parameter is `lasso_power_weight=1`
* `gl_power_weight=[0.6, 0.8, 1]`. This is the $\gamma_2$ coefficient value. Default value for this parameter is `gl_power_weight=1`
* `variability_pct=0.9`. The number of PCA components to use for fitting the weights, expressed in terms of the total variability they explain. Default value for this parameter is `variability_pct=0.9`
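The procedure described above (reduce with PCA, fit an unpenalized model on the scores, project back) can be sketched with NumPy alone. This is a simplified illustration on simulated data under stated assumptions: the data is centered but not scaled, the fit on the scores uses least squares, and the helper name `pca_pct_beta` is made up. The actual implementation in `asgl.WEIGHTS` may differ in these details.

```python
import numpy as np

def pca_pct_beta(x, y, variability_pct=0.9):
    """Sketch: regress y on the leading PCA scores, then map the fit back to the original variables."""
    x_c = x - x.mean(axis=0)
    y_c = y - y.mean()
    # Rows of vt are the principal directions; s**2 is proportional to the explained variance
    _, s, vt = np.linalg.svd(x_c, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_comp = int(np.searchsorted(explained, variability_pct) + 1)
    n_comp = min(n_comp, len(s))
    loadings = vt[:n_comp].T                               # shape (p, n_comp)
    scores = x_c @ loadings                                # shape (n, n_comp)
    gamma, *_ = np.linalg.lstsq(scores, y_c, rcond=None)   # unpenalized fit on the scores
    return loadings @ gamma, n_comp                        # coefficients in the original space

rng = np.random.default_rng(1)
x_sim = rng.normal(size=(100, 6))
y_sim = x_sim @ np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

beta_pca, n_comp = pca_pct_beta(x_sim, y_sim, variability_pct=0.9)
lasso_weights_sketch = 1.0 / (np.abs(beta_pca) + 1e-4)    # weights with gamma_1 = 1
print(n_comp, beta_pca.shape)
```

When all components are kept, the projected coefficients coincide with the ordinary least squares solution on the centered data. Keeping fewer components trades some accuracy for a fit that remains feasible in high dimensional settings.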

In [22]:
# Obtain weight values
penalization = 'asgl'
weight_technique = 'pca_pct'
lasso_power_weight = [0.6, 0.8, 1]
gl_power_weight = [0.6, 0.8, 1]
variability_pct = 0.9

weights = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, lasso_power_weight=lasso_power_weight, 
                       gl_power_weight=gl_power_weight, variability_pct=variability_pct)
lasso_weights, gl_weights = weights.fit(x, y, group_index=group_index)

In [23]:
print(f"Let's see what the weights look like:\n{np.round(lasso_weights[0], 2)}")

Let's see what the weights look like:
[  71.35   59.22   78.99 1938.93  843.08  527.11   40.48  183.18   60.17
    9.96  219.61   12.03   82.16]


In [24]:
# Use the weights obtained before to fit an asgl model
asgl_model = asgl.ASGL(model='lm', penalization='asgl',lambda1=lambda1, alpha=alpha, 
                       lasso_weights=lasso_weights, gl_weights=gl_weights, parallel=True)
asgl_model.fit(x=x, y=y, group_index=group_index)
coef = asgl_model.coef_

##### PLS based on a subset of components

We propose a similar alternative built on partial least squares. Partial least squares is a dimensionality reduction technique that works by maximizing the covariance between the predictors $X$ and the response vector $y$ (as opposed to PCA, which works by maximizing the variance of $X$). PLS is therefore better suited for prediction purposes, but it is based on least squares regression, so it can be more affected by heteroscedasticity or outliers than the PCA proposal.

* `penalization='asgl'`
* `weight_technique='pls_pct'`. It refers to PLS percentage (because this technique is based on selecting a percentage of PLS components to fit the weights)
* `lasso_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.
* `gl_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.
* `variability_pct=0.9`. As defined for PCA based on a subset of components. 

In [25]:
# Obtain weight values
penalization = 'asgl'
weight_technique = 'pls_pct'
lasso_power_weight = [0.8, 1, 1.2]
gl_power_weight = [0.8, 1, 1.2]
variability_pct = 0.9

weights = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, lasso_power_weight=lasso_power_weight,
                     gl_power_weight=gl_power_weight, variability_pct=variability_pct)
lasso_weights, gl_weights = weights.fit(x, y, group_index=group_index)

##### PCA / PLS based on the first component

Each principal component is built as a linear combination of the original variables. This means that another alternative for estimating the weights is simply to use the coefficients of the first principal component to build the weights for the adaptive sparse group lasso model. This alternative produces worse results than the previous ones, but runs faster. In the same way, it is possible to use the first PLS component to obtain the weights.

* `penalization='asgl'`
* `weight_technique='pca_1'`. It refers to using the first principal component.
* `weight_technique='pls_1'`. It refers to using the first PLS component.
* `lasso_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.
* `gl_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.
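The difference between the two first components can be seen in a small NumPy sketch on simulated data. The first principal direction is the leading right singular vector of the centered $X$, while the first PLS weight vector is proportional to $X^ty$ for centered data. This is a simplified illustration, and how the package turns these vectors into weights is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
x_sim = rng.normal(size=(80, 4)) * np.array([3.0, 1.0, 1.0, 0.5])   # made-up data
y_sim = x_sim[:, 1] - 2.0 * x_sim[:, 3] + rng.normal(scale=0.1, size=80)

x_c = x_sim - x_sim.mean(axis=0)
y_c = y_sim - y_sim.mean()

# First principal component: direction of maximum variance of X (ignores y)
_, _, vt = np.linalg.svd(x_c, full_matrices=False)
pca_1 = vt[0]

# First PLS weight vector: direction of maximum covariance between X and y
pls_1 = x_c.T @ y_c
pls_1 = pls_1 / np.linalg.norm(pls_1)

print(np.round(np.abs(pca_1), 2))
print(np.round(np.abs(pls_1), 2))
```

In this simulation the first principal component is dominated by the high-variance first column even though that column does not affect `y_sim`, whereas the PLS vector uses the response and gives more importance to the columns that actually drive it.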

In [26]:
# Obtain weight values
penalization = 'asgl'
weight_technique = 'pca_1'
lasso_power_weight = [0.8, 1, 1.2]
gl_power_weight = [0.8, 1, 1.2]

weights = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, 
                       lasso_power_weight=lasso_power_weight, gl_power_weight=gl_power_weight)
lasso_weights, gl_weights = weights.fit(x, y, group_index=group_index)

weight_technique = 'pls_1'
weights_pls = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, 
                           lasso_power_weight=lasso_power_weight, gl_power_weight=gl_power_weight)
lasso_weights, gl_weights = weights_pls.fit(x, y, group_index=group_index)

##### Unpenalized model

This alternative can only be used when dealing with a low dimensional dataset (in which the number of observations is larger than the number of variables). In this case, it is possible to fit an initial model with no penalization, and then use its coefficients to build the weights for an adaptive model. We consider two alternatives: an unpenalized linear model and an unpenalized quantile regression model.

* `penalization='asgl'`
* `weight_technique='unpenalized_lm'`. It refers to using an unpenalized linear model.
* `weight_technique='unpenalized_qr'`. It refers to using an unpenalized quantile regression model.
* `lasso_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.
* `gl_power_weight=[0.8, 1, 1.2]`. As defined for PCA based on a subset of components.


In [27]:
# Obtain weight values
penalization = 'asgl'
weight_technique = 'unpenalized_lm'
lasso_power_weight = [0.8, 1, 1.2]
gl_power_weight = [0.8, 1, 1.2]

weights = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, lasso_power_weight=lasso_power_weight,
                     gl_power_weight=gl_power_weight)
lasso_weights, gl_weights = weights.fit(x, y, group_index=group_index)

weight_technique = 'unpenalized_qr'
weights_qr = asgl.WEIGHTS(penalization=penalization, weight_technique=weight_technique, lasso_power_weight=lasso_power_weight,
                     gl_power_weight=gl_power_weight)
lasso_weights, gl_weights = weights_qr.fit(x, y, group_index=group_index)

## What parameter values should you use?

* $\lambda$: This parameter is used in all the penalizations defined in the package, and it controls the level of sparsity applied to a solution. A typical range of values for this parameter goes from $10^{-3}$ up to $10$, e.g.:
`lambda1=10.0**np.arange(-3, 1.01, 0.2)`

* $\alpha$: This parameter is used in sparse group lasso and adaptive sparse group lasso techniques. It controls the tradeoff between lasso and group lasso techniques. $\alpha$ values close to $1$ produce lasso-like solutions, and values close to $0$ produce group lasso-like solutions. The best solutions are usually achieved close to one of the limit values, so a typical grid for this parameter concentrates more values near the extremes and fewer in the center, e.g.:
`alpha=np.r_[np.arange(0.0, 0.3, 0.02), np.arange(0.3, 0.7, 0.1), np.arange(0.7, 0.99, 0.02)]`

* $\gamma_1$ and $\gamma_2$: These parameters are the powers applied to weights in adaptive penalizations. Usually, these are defined in the interval $[0, 2]$
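Because `np.arange` with a non-integer step can occasionally produce one value more or less than expected due to floating-point rounding, grids like the ones above can also be built with `np.logspace` and `np.linspace`, which take the number of points explicitly. The sketch below is one possible way to write roughly equivalent grids. The exact grid values are a choice made for this example.

```python
import numpy as np

# 21 values for lambda, evenly spaced on a log scale between 1e-3 and 10
lambda1_grid = np.logspace(-3, 1, num=21)

# alpha: dense near 0 and 1, coarse in the middle
alpha_grid = np.r_[np.linspace(0.0, 0.28, 15),
                   np.linspace(0.3, 0.6, 4),
                   np.linspace(0.7, 0.98, 15)]

print(len(lambda1_grid), len(alpha_grid))  # 21 34
```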

## Main functions in the package

There are three main functions in this package:


### `asgl.ASGL.fit`
This function is used for fitting any model (unpenalized or penalized) on a dataset. Input parameter values of this function are:

* `model`: model to be fit (accepts `'lm'` or `'qr'`)
* `penalization`: penalization to use (accepts `None`, `'lasso'`, `'gl'`, `'sgl'`, `'asgl'`, `'asgl_lasso'`, `'asgl_gl'`)
* `intercept`: boolean, whether to fit the model including intercept or not
* `tol`:  tolerance for a coefficient in the model to be considered as 0
* `lambda1`: parameter value that controls the level of shrinkage applied on penalizations
* `alpha`: parameter value, tradeoff between lasso and group lasso in sgl penalization
* `tau`: quantile level in quantile regression models
* `lasso_weights`: lasso weights in adaptive penalizations
* `gl_weights`: group lasso weights in adaptive penalizations
* `parallel`: boolean, whether to execute the code in parallel or sequentially
* `num_cores`: if parallel is set to true, the number of cores to use in the execution. Default is (max - 1)
* `solver`: solver to be used by CVXPY. Default uses optimal alternative depending on the problem
* `max_iters`: CVXPY parameter. Default is 500


### `asgl.WEIGHTS.fit`

This function is used for fitting weights used later by adaptive penalizations. Input parameter values of this function are:

* `model`: model to be fit using these weights (accepts `'lm'` or `'qr'`)
* `penalization`: penalization to use (`'asgl'`, `'asgl_lasso'`, `'asgl_gl'`)
* `tau`: quantile level in quantile regression models
* `weight_technique`: weight technique to use for fitting the adaptive weights. Accepts `'pca_1'`, `'pca_pct'`, `'pls_1'`, `'pls_pct'`, `'unpenalized_lm'`, `'unpenalized_qr'`, `'spca'`
* `weight_tol`: Tolerance value used for avoiding ZeroDivision errors
* `lasso_power_weight`: parameter value, power to which the lasso weights are raised. Default is 1
* `gl_power_weight`: parameter value, power to which the group lasso weights are raised. Default is 1
* `variability_pct`: parameter value, percentage of variability explained by pca or pls components used in `'pca_pct'`, `'pls_pct'` and `'spca'`. Default is 0.9 (90%)
* `spca_alpha`: sparse PCA parameter. Default is $10^{-5}$
* `spca_ridge_alpha`: sparse PCA parameter. Default is $10^{-2}$

### `asgl.CV.cross_validation`

This function performs cross validation over a dataset in order to obtain optimal parameter values for a model. Input parameter values of this function are:

* All the parameters described for `asgl.ASGL.fit` and `asgl.WEIGHTS.fit` methods
* `error_type`: error measurement to use. Accepts:
   * `'MSE'`: mean squared error
   * `'MAE'`: mean absolute error
   * `'MDAE'`: mean absolute deviation error
   * `'QRE'`: quantile regression error
* `random_state`: random state value in case reproducible data splits are required
* `nfolds`: number of folds in which the dataset should be split. Default value is 5
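For reference, some of the error measurements listed above can be written as short NumPy functions. These are sketches of the standard definitions with hypothetical helper names. In particular, the quantile regression error is assumed here to be the mean check loss of the residuals, and the package's `error_calculator` should be used in practice.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def qre(y_true, y_pred, tau):
    """Mean check loss of the residuals (assumed definition of the quantile regression error)."""
    u = y_true - y_pred
    return float(np.mean(u * (tau - (u < 0))))

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])
print(mse(y_true, y_pred))           # (1 + 0 + 4) / 3
print(mae(y_true, y_pred))           # 1.0
print(qre(y_true, y_pred, tau=0.5))  # 0.5
```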

