Welcome to Lab 3! The goal of this lab is to see if you can build on your knowledge of the Lasso and ridge to understand more sophisticated penalization methods. In addition, we'll see how to run cross-validation in `sklearn` and how to implement the bootstrap on our own.

**Elastic Net**

The elastic net is a popular penalization method for linear regression that combines the L1 and L2 penalties.

Useful references:
* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html
* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
* https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
* https://en.wikipedia.org/wiki/Elastic_net_regularization

Q1: What is the optimization problem for logistic regression with an elastic net penalty?

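$\min_w \; -\frac{1}{n} \sum_{i=1}^n \left(y_i \log \hat{p}_{w}(x_i) + (1 - y_i) \log (1 - \hat{p}_{w}(x_i)) \right) + \lambda \alpha \|w\|_1 + \frac{1}{2} \lambda (1 - \alpha) \|w\|^2_2$

Here $\hat{p}_{w}(x_i)$ is the predicted probability that $y_i = 1$. The first term is the negative average log-likelihood, which is why it carries a minus sign inside the minimization.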
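To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a given weight vector. It assumes a model without an intercept and $\hat{p}_{w}(x) = \sigma(x^\top w)$; the function name `elastic_net_logistic_objective` and the toy arrays are illustrative, not part of the lab.

```python
import numpy as np

def elastic_net_logistic_objective(w, X, y, lam, alpha):
    """Average negative log-likelihood plus the elastic net penalty.

    Assumes no intercept and p_hat = sigmoid(X @ w) (an assumption of this sketch).
    """
    z = X @ w
    # For p = sigmoid(z): -[y*log(p) + (1-y)*log(1-p)] = log(1 + exp(z)) - y*z.
    # np.logaddexp(0, z) computes log(1 + exp(z)) in a numerically stable way.
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    l1 = np.sum(np.abs(w))
    l2_squared = np.sum(w ** 2)
    return nll + lam * alpha * l1 + 0.5 * lam * (1 - alpha) * l2_squared

# Toy data, made up for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.integers(0, 2, size=20)

# With w = 0 every predicted probability is 0.5 and the penalty vanishes,
# so the objective equals log(2).
value_at_zero = elastic_net_logistic_objective(np.zeros(3), X_demo, y_demo, lam=1.0, alpha=0.5)
print(value_at_zero, np.log(2))
```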

Q2: What are the advantages to fitting an elastic net model as opposed to a Lasso or ridge-penalized model?

Lasso regression tends to fit very sparse solutions, sometimes overly sparse. In particular, if there is a group of features that are highly correlated, the lasso tends to just pick one of them at random. On the other hand, elastic net tends to select the entire group of features in this situation, acknowledging the fact that it does not know which feature is the right one.

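More generally, the set of models considered by the elastic net is a superset of those considered by the Lasso, as the Lasso is the special case of the elastic net with $\alpha = 1$ (and ridge is the special case with $\alpha = 0$). A well-tuned elastic net should therefore do at least as well as either one, and there is likely some set of hyperparameters for the elastic net that outperforms the Lasso.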

Q3: How is the elastic net penalty related to the Lasso and ridge penalties? That is, when is using an elastic net penalty the same as using the Lasso? When is it the same as using the ridge?

The elastic net penalty is an additive mixture of the Lasso and ridge penalties. The hyperparameter $\alpha$ defines the proportion of the Lasso penalty and $1 - \alpha$ defines the proportion of the ridge penalty. When $\alpha = 1$, the objective function reduces to that of the Lasso. When $\alpha = 0$, the objective function reduces to that of ridge regression.
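The two special cases can be checked numerically. The sketch below uses the penalty as written in Q1, $\lambda \alpha \|w\|_1 + \frac{1}{2} \lambda (1 - \alpha) \|w\|_2^2$; the helper name `elastic_net_penalty` and the example weights are made up for illustration.

```python
import numpy as np

def elastic_net_penalty(w, lam, alpha):
    """lam * alpha * ||w||_1 + 0.5 * lam * (1 - alpha) * ||w||_2^2"""
    return lam * alpha * np.sum(np.abs(w)) + 0.5 * lam * (1 - alpha) * np.sum(w ** 2)

w = np.array([1.0, -2.0, 0.5])   # ||w||_1 = 3.5, ||w||_2^2 = 5.25
lam = 2.0

lasso_case = elastic_net_penalty(w, lam, alpha=1.0)   # pure L1: 2 * 3.5 = 7.0
ridge_case = elastic_net_penalty(w, lam, alpha=0.0)   # pure L2: 0.5 * 2 * 5.25 = 5.25
mixed_case = elastic_net_penalty(w, lam, alpha=0.5)   # 3.5 + 2.625 = 6.125

print(lasso_case, ridge_case, mixed_case)
```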

Q4: What are the hyperparameters for an elastic net model? How would you tune them?

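The hyperparameters for the elastic net are $\alpha$ (the weight on the L1 penalty) and $\lambda$ (the overall penalty strength). We tune them with a grid search: fit the model for each combination of $\alpha$ and $\lambda$ and compare the cross-validated performance. We should search over values of $\alpha$ ranging from 0 to 1 and over positive values of $\lambda$, typically spaced on a log scale. In `sklearn`'s `LogisticRegression`, $\alpha$ is called `l1_ratio`, and the penalty strength is set through `C`, which is the inverse of the regularization strength (smaller `C` means a stronger penalty).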
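As a rough guide to how a $(\lambda, \alpha)$ grid relates to `sklearn`'s parameters: the `sklearn` user guide writes the elastic-net logistic objective with `C` multiplying the summed log-loss, so dividing through by `C * n` suggests $\lambda = 1 / (C \cdot n)$ and $\alpha$ = `l1_ratio`. The sketch below applies that mapping; the helper `to_sklearn_params` and the value `n_samples=100` are hypothetical, and the mapping itself is an assumption derived from that rescaling rather than something the lab states.

```python
import numpy as np

def to_sklearn_params(lam, alpha, n_samples):
    """Map (lambda, alpha) from the Q1 objective to LogisticRegression arguments.

    Assumes lambda = 1 / (C * n_samples) and alpha = l1_ratio (see lead-in).
    """
    return {"C": 1.0 / (lam * n_samples), "l1_ratio": float(alpha)}

lambdas = np.logspace(-3, 1, 5)    # positive values on a log scale
alphas = np.linspace(0.0, 1.0, 6)  # 0 = ridge, 1 = Lasso

# One dictionary per (lambda, alpha) combination in the grid
grid = [to_sklearn_params(lam, a, n_samples=100) for lam in lambdas for a in alphas]
print(len(grid), grid[0])
```

Because `C` is an inverse strength, the largest $\lambda$ in the grid corresponds to the smallest `C`.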

Q5: Load the breast cancer dataset and split 50/50 for training and test: https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-wisconsin-diagnostic-dataset .

Use the following code:
```
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
```

In [5]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

Q6: Tune the hyperparameters of an elastic net model using 3-fold cross-validation. Use `GridSearchCV` from `sklearn`. What set of hyperparameters did you pick?

In [10]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

import numpy as np

elastic_net = LogisticRegression(penalty="elasticnet", solver="saga", max_iter=10000)

# Define hyperparameters grid
param_grid = {
    'C': [1e-4, 0.001, 0.01, 0.1, 1, 10],  # Inverse regularization strength (smaller C = stronger penalty)
    'l1_ratio': np.arange(0.1,1.1,0.1),  # Weight on the L1 penalty (alpha in the notation above)
}

print(param_grid)

# Perform grid search with cross-validation
elastic_net_grid_search = GridSearchCV(estimator=elastic_net, param_grid=param_grid, cv=3, n_jobs=-1)
elastic_net_grid_search.fit(X_train, y_train)

# Get the best hyperparameters
print("Best Hyperparameters:", elastic_net_grid_search.best_params_)

{'C': [0.0001, 0.001, 0.01, 0.1, 1, 10], 'l1_ratio': array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])}
Best Hyperparameters: {'C': 0.001, 'l1_ratio': 0.6}
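What a grid search with K-fold cross-validation does can be sketched in a few lines of NumPy. This is a simplified illustration, not `GridSearchCV`'s implementation: the folds here are shuffled and unstratified (for classifiers with an integer `cv`, `sklearn` uses stratified folds), and the scorer `ridge_neg_mse`, the function names, and the toy data are all hypothetical stand-ins chosen so the example runs without `sklearn`.

```python
import itertools
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal validation folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def grid_search_cv(fit_and_score, param_grid, X, y, k=3):
    """Return (best_params, best_mean_score) over all combinations in param_grid."""
    folds = kfold_indices(len(y), k)
    names = list(param_grid)
    best_params, best_score = None, -np.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            scores.append(fit_and_score(X[train_idx], y[train_idx],
                                        X[val_idx], y[val_idx], **params))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def ridge_neg_mse(X_tr, y_tr, X_val, y_val, lam):
    """Toy scorer: closed-form ridge fit, scored by negative validation MSE."""
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1]), X_tr.T @ y_tr)
    return -np.mean((X_val @ w - y_val) ** 2)

# Synthetic regression data, made up for illustration
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 5))
y_toy = X_toy @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=60)

best_params, best_score = grid_search_cv(
    ridge_neg_mse, {"lam": [0.01, 1.0, 100.0]}, X_toy, y_toy, k=3
)
print(best_params, best_score)
```

The selected configuration is the one with the highest mean validation score across folds, which mirrors what `best_params_` reports.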


Q7: What is the AUC of the elastic net model with the selected hyperparameters?

In [13]:
from sklearn.metrics import roc_auc_score
test_auc = roc_auc_score(
    y_test,
    elastic_net_grid_search.predict_proba(X_test)[:,1]
)
test_auc

0.978391356542617

Q8: Create 95\% confidence intervals for the AUC using the bootstrap.

In [14]:
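import numpy as np
n_test = y_test.size
boot_aucs = []
# Resample the test set with replacement and recompute the AUC each time
for b in range(100):
    rand_idxs = np.random.choice(n_test, n_test, replace=True)
    boot_auc = roc_auc_score(
        y_test[rand_idxs],
        elastic_net_grid_search.predict_proba(X_test[rand_idxs])[:,1]
    )
    boot_aucs.append(boot_auc)

# Basic (pivotal) bootstrap interval: take quantiles of the differences
# (test_auc - boot_auc) and shift them by the point estimate
diff_quantiles = np.quantile(test_auc - np.array(boot_aucs), q=[0.025,0.975])
(test_auc + diff_quantiles[0], test_auc + diff_quantiles[1])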

(0.9641894970601177, 0.9947312284491894)
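For comparison, here is a self-contained sketch of the same idea that computes the scores once, seeds the random number generator, and reports both the percentile interval and the basic interval used above. It is an illustration under assumptions: `auc_score` is a simple pairwise implementation written for this example (not `roc_auc_score`), the data are synthetic, and resamples containing only one class are skipped.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def bootstrap_auc_ci(y_true, scores, n_boot=1000, level=0.95, seed=0):
    """Return (point estimate, percentile interval, basic interval) for the AUC."""
    rng = np.random.default_rng(seed)
    n = y_true.size
    point = auc_score(y_true, scores)
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, size=n)      # sample indices with replacement
        if np.unique(y_true[idx]).size < 2:   # AUC is undefined with one class
            continue
        boot.append(auc_score(y_true[idx], scores[idx]))
    tail = (1 - level) / 2
    lo_q, hi_q = np.quantile(boot, [tail, 1 - tail])
    percentile = (lo_q, hi_q)
    basic = (2 * point - hi_q, 2 * point - lo_q)  # reflects quantiles around the estimate
    return point, percentile, basic

# Synthetic labels and scores, made up for illustration
rng = np.random.default_rng(1)
y_demo = rng.integers(0, 2, size=200)
scores_demo = y_demo + rng.normal(size=200)

point, percentile, basic = bootstrap_auc_ci(y_demo, scores_demo)
print(point, percentile, basic)
```

The two intervals have the same width but are mirrored around the point estimate. When the AUC is close to 1, the basic interval can extend above 1, in which case the percentile interval may be easier to interpret.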