# Extensive Elastic Net Hyperparameter Selection in RENT

This Jupyter notebook illustrates how RENT selects the elastic net hyperparameters **C** and **l1_ratio** when the selection is embedded in the training step. For each hyperparameter combination supplied by the user, multiple elastic net models are trained and evaluated, and the best combination is then selected according to the two criteria shown in this notebook. In the Jupyter notebooks **RENT applied to a binary classification problem** and **RENT applied to a regression problem**, the hyperparameters are instead selected with cross-validation, which works the same way for classification and regression tasks. The extensive hyperparameter search is therefore shown on the classification problem only, but it can be used for regression problems in the same way. Compared to the cross-validation approach, the extensive search has a high computational cost and is not recommended for large datasets. The input parameter `autoEnetParSel`, which is `True` by default, indicates whether the cross-validated hyperparameter search is used; hence, `autoEnetParSel` is set to `False` in this notebook. Once the hyperparameters are selected, the flow of feature selection and post-hoc analysis remains the same as in the other Jupyter notebooks.

In [1]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 2000)

from RENT import RENT

import warnings
warnings.filterwarnings("ignore")

### Load dataset

In [2]:
train_data = pd.read_csv("data/wisconsin_train.csv").iloc[:,1:]
train_labels = pd.read_csv("data/wisconsin_train_labels.csv").iloc[:,1].values
test_data = pd.read_csv("data/wisconsin_test.csv").iloc[:,1:]
test_labels = pd.read_csv("data/wisconsin_test_labels.csv").iloc[:,1].values

### Define and train the RENT model

When `autoEnetParSel=False`, RENT computes `K` models for each hyperparameter combination of `C` and `l1_ratios`. In this example, $\text{len(C)}\cdot \text{len(l1_ratios)}\cdot \text{K} = 3 \cdot 7 \cdot 100 = 2100$ models are therefore computed in total, whereas with cross-validation the hyperparameters are chosen beforehand and only $\text{K}=100$ models would be computed.

In [3]:
# Define a range of regularisation parameters C for elastic net. At least one value is required.
my_C_params = [0.1, 1, 10]

# Define a range of l1-ratios for elastic net. At least one value is required.
my_l1_ratios = [0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]

# Define setting for RENT
model = RENT.RENT_Classification(data=train_data, 
                                 target=train_labels, 
                                 feat_names=train_data.columns, 
                                 C=my_C_params, 
                                 l1_ratios=my_l1_ratios,
                                 autoEnetParSel=False,
                                 poly='OFF',
                                 testsize_range=(0.25,0.25),
                                 scoring='mcc',
                                 classifier='logreg',
                                 K=100,
                                 random_state=0,
                                 verbose=1)

data dimension: (399, 30)  data type: <class 'pandas.core.frame.DataFrame'>
target dimension: (399,)
regularization parameters C: [0.1, 1, 10]
elastic net l1_ratios: [0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]
poly: OFF
number of models in ensemble: 100
random state: 0
verbose: 1
classifier: logreg
scoring: mcc
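
As a rough sanity check of the model count stated above, the grid can be enumerated in plain Python. This is only an illustrative sketch: the names `combinations` and `n_models_extensive` are introduced here and are not part of RENT.

```python
from itertools import product

# Grid values copied from the model definition above.
C_params = [0.1, 1, 10]
l1_ratios = [0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]
K = 100  # number of models trained per hyperparameter combination

# Every (C, l1_ratio) pair is one combination for which K models are trained.
combinations = list(product(C_params, l1_ratios))
n_models_extensive = len(combinations) * K

print(len(combinations))         # 21
print(n_models_extensive)        # 2100
print(n_models_extensive // K)   # 21 times as many fits as a single ensemble of K models
```

Adding one more value to either list increases the total by `K` times the length of the other list, which is why the grid should stay small for larger datasets.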


In [4]:
model.train()

In the Jupyter notebook **RENT applied to a binary classification problem**, training the model takes less than 4 seconds; with `autoEnetParSel=False` it takes more than 2 minutes. Runtime is therefore the main argument for the faster cross-validation approach. For datasets where a single model can be fitted in reasonable time, the full hyperparameter search is nevertheless worth trying.

In [5]:
model.get_runtime()

161.0327591896057

### Hyperparameter selection

The method `get_enetParam_matrices()` returns three matrices that are used to select a hyperparameter combination. In each matrix, the columns correspond to the values of `C` and the rows to the values of `l1_ratios`.

In [6]:
scores, zeroes, harmonic_mean = model.get_enetParam_matrices()

For each combination of `C` and `l1_ratio`, `K` models have been computed. The first matrix, `scores`, shows the average prediction score over the `K` models for each hyperparameter combination. In this example, the combination $(\text{C}=0.1, \text{l1_ratio}=0)$ yields the highest average score.

In [7]:
scores

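| Scores (rows: l1_ratio, columns: C) | 0.1 | 1.0 | 10.0 |
|---|---|---|---|
| 0.0 | 0.942949 | 0.939164 | 0.916429 |
| 0.1 | 0.939039 | 0.939138 | 0.916034 |
| 0.25 | 0.938605 | 0.93892 | 0.916021 |
| 0.5 | 0.926711 | 0.939421 | 0.914288 |
| 0.75 | 0.918131 | 0.938161 | 0.911101 |
| 0.9 | 0.913619 | 0.93429 | 0.908983 |
| 1.0 | 0.913086 | 0.932802 | 0.907211 |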


However, we are looking for a combination that delivers a high score and at the same time notably reduces the size of the feature set. Therefore, we also consider the matrix `zeroes`, which contains the average fraction of features set to $0$ for each hyperparameter combination. For $(\text{C}=0.1, \text{l1_ratio}=1)$, on average more than $80\%$ of all features are assigned the weight $0$, which is the highest value in the matrix. On the other hand, for the combination with the highest score, $(\text{C}=0.1, \text{l1_ratio}=0)$, no features are set to $0$ (a penalty with $\text{l1_ratio}=0$ is a pure L2 penalty, as in ridge regression).

In [8]:
zeroes

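| Zeroes (rows: l1_ratio, columns: C) | 0.1 | 1.0 | 10.0 |
|---|---|---|---|
| 0.0 | 0.0 | 0.0 | 0.0 |
| 0.1 | 0.083667 | 0.039667 | 0.005 |
| 0.25 | 0.283333 | 0.107667 | 0.026 |
| 0.5 | 0.459333 | 0.226667 | 0.065333 |
| 0.75 | 0.579667 | 0.385 | 0.111667 |
| 0.9 | 0.731667 | 0.493333 | 0.144333 |
| 1.0 | 0.801 | 0.572333 | 0.167 |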
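
To make the fractions in this matrix more tangible, they can be converted into an average number of zero-weight features. The sketch below uses the 30 features reported in the printed data dimension and the last row of the matrix ($\text{l1_ratio}=1$); the names `n_features` and `zero_fraction` are illustrative only.

```python
n_features = 30  # from the printed data dimension (399, 30)

# Average fraction of zero-weight features for l1_ratio = 1.0,
# copied from the last row of the matrix above and keyed by C.
zero_fraction = {0.1: 0.801, 1.0: 0.572333, 10.0: 0.167}

for C, fraction in zero_fraction.items():
    avg_zeroed = fraction * n_features
    print(f"C={C}: {fraction:.1%} of features, about {avg_zeroed:.1f} of {n_features}")
```

For `C=0.1` this corresponds to roughly 24 of the 30 features being removed on average across the `K` models.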


To get the best of both, a high score and a high number of features set to zero, we normalize both matrices `scores` and `zeroes` and calculate their harmonic mean. The harmonic mean is high only when both normalized values are high.

In [9]:
harmonic_mean

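| Harmonic Mean (rows: l1_ratio, columns: C) | 0.1 | 1.0 | 10.0 |
|---|---|---|---|
| 0.0 | 0.0 | 0.0 | 0.0 |
| 0.1 | 0.186976 | 0.093841 | 0.012177 |
| 0.25 | 0.50436 | 0.233462 | 0.057365 |
| 0.5 | 0.559199 | 0.430724 | 0.115541 |
| 0.75 | 0.429695 | 0.618197 | 0.122244 |
| 0.9 | 0.299748 | 0.679482 | 0.077747 |
| 1.0 | 0.282358 | 0.715295 | 0.0 |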


The hyperparameter combination with the highest harmonic mean is selected.

In [10]:
model.get_enet_params()

(1.0, 1.0)
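
The printed harmonic-mean values are consistent with min-max normalization of each matrix followed by an element-wise harmonic mean. The sketch below reproduces the selection under that assumption; the normalization is inferred from the printed numbers rather than taken from the RENT source, and the helpers `min_max` and `harmonic` are hypothetical names.

```python
# Values copied from the `scores` and `zeroes` outputs above
# (rows: l1_ratio, columns: C).
C_values = [0.1, 1.0, 10.0]
l1_values = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]
scores = [
    [0.942949, 0.939164, 0.916429],
    [0.939039, 0.939138, 0.916034],
    [0.938605, 0.938920, 0.916021],
    [0.926711, 0.939421, 0.914288],
    [0.918131, 0.938161, 0.911101],
    [0.913619, 0.934290, 0.908983],
    [0.913086, 0.932802, 0.907211],
]
zeroes = [
    [0.0, 0.0, 0.0],
    [0.083667, 0.039667, 0.005],
    [0.283333, 0.107667, 0.026],
    [0.459333, 0.226667, 0.065333],
    [0.579667, 0.385, 0.111667],
    [0.731667, 0.493333, 0.144333],
    [0.801, 0.572333, 0.167],
]


def min_max(matrix):
    """Rescale all entries of a matrix to [0, 1] (assumed normalization)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def harmonic(a, b):
    """Harmonic mean of two non-negative numbers; 0 if both are 0."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0


s_norm, z_norm = min_max(scores), min_max(zeroes)
hm = [[harmonic(s, z) for s, z in zip(s_row, z_row)]
      for s_row, z_row in zip(s_norm, z_norm)]

# Pick the (C, l1_ratio) pair with the largest harmonic mean.
best_hm, best_C, best_l1 = max(
    (hm[i][j], C_values[j], l1_values[i])
    for i in range(len(l1_values))
    for j in range(len(C_values))
)
print((best_C, best_l1))  # (1.0, 1.0)
```

Because the harmonic mean is dominated by the smaller of its two inputs, the combination with the best score alone ($\text{C}=0.1$, $\text{l1_ratio}=0$) and the combination with the lowest score both end up with a harmonic mean of 0.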

### Feature Selection

From this point on, the flow is the same as in the Jupyter notebook **RENT applied to a binary classification problem**. We can perform feature selection and post-hoc analysis.

In [11]:
selected_features = model.select_features(tau_1_cutoff=0.9, tau_2_cutoff=0.9, tau_3_cutoff=0.975)

In [12]:
selected_features

array([20, 21, 23, 24, 27, 28])

#### Hyperparameter selection can be changed manually

Since `K` models are computed for each input hyperparameter combination, the user can manually switch hyperparameters with `set_enet_params()` to another computed combination. In this case, RENT selects features based on the user-set hyperparameters.

In [13]:
model.set_enet_params(C=1, l1_ratio=0.9)

In [14]:
model.get_enet_params()

(1, 0.9)

In [15]:
selected_features = model.select_features(tau_1_cutoff=0.9, tau_2_cutoff=0.9, tau_3_cutoff=0.975)

In [16]:
selected_features

array([ 7, 15, 20, 21, 22, 23, 24, 27, 28])
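
To see how the manual choice changes the result, the two selections can be compared directly. This is a small illustrative sketch using the feature indices printed above; the variable names are not part of RENT.

```python
# Feature indices copied from the two `select_features` outputs above.
auto_selected = {20, 21, 23, 24, 27, 28}               # (C=1.0, l1_ratio=1.0)
manual_selected = {7, 15, 20, 21, 22, 23, 24, 27, 28}  # (C=1, l1_ratio=0.9)

print(sorted(manual_selected - auto_selected))  # [7, 15, 22]
print(sorted(auto_selected - manual_selected))  # []
print(auto_selected <= manual_selected)         # True
```

In this example, lowering `l1_ratio` from 1 to 0.9 keeps all six previously selected features and adds three more.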