# LAB3: Sparsity
Author: Mathurin Massias (mathurin.massias@gmail.com)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

from scipy.io import loadmat

from sklearn.linear_model import ElasticNet, ElasticNetCV
from sklearn.model_selection import train_test_split

from lab3_utils import create_random_data

## Dataset generation and model fitting

In [None]:
n_samples = 100
n_features = 200
n_informative_features = 50

X, y = create_random_data(n_samples, n_features, n_informative_features, 
                          noise_level=0.3)
print("X shape:", X.shape)
print("y shape:", y.shape)


train_size = 0.8  # proportion of dataset used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=False, train_size=train_size)
print("Training dataset shape:", X_train.shape)

In sklearn, the objective function of the ElasticNet optimization is:
$$\frac{1}{2 \times \text{n_samples}} \Vert y - X w \Vert_2^2 + \alpha \times \left( \text{l1_ratio} \times \Vert w \Vert_1 + \frac{1 - \text{l1_ratio}}{2} \Vert w \Vert_2^2\right)$$

See the docstring, displayed by the next cell, for more information:

In [None]:
ElasticNet?
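
To make the formula above concrete, here is a minimal NumPy sketch that evaluates the objective for a given $w$. The helper name `elastic_net_objective` and the toy arrays are illustrative assumptions, not part of sklearn or of `lab3_utils`, and the intercept is left out for simplicity.

```python
import numpy as np


def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Value of the ElasticNet objective written above (no intercept)."""
    n_samples = X.shape[0]
    residual = y - X @ w
    # data-fitting term: ||y - Xw||_2^2 / (2 * n_samples)
    data_fit = residual @ residual / (2 * n_samples)
    # penalty: l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2
    l1_term = l1_ratio * np.abs(w).sum()
    l2_term = 0.5 * (1 - l1_ratio) * (w @ w)
    return data_fit + alpha * (l1_term + l2_term)


# Toy values chosen only for illustration (hypothetical data)
X_toy = np.array([[1., 0.], [0., 1.]])
y_toy = np.array([1., 2.])
w_toy = np.array([1., 0.])
obj = elastic_net_objective(X_toy, y_toy, w_toy, alpha=0.1, l1_ratio=0.5)
print("Objective value:", obj)
```

With `l1_ratio=1` the penalty reduces to the Lasso penalty, and with `l1_ratio=0` to a pure squared L2 (ridge-type) penalty.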

In [None]:
# instantiate a model with arbitrary values for the L1 and L2 penalization
clf = ElasticNet(alpha=0.1, l1_ratio=0.1)

In [None]:
# fit the model and print some of its first coefficients
# beware that sklearn fits an intercept by default
clf.fit(X_train, y_train)
print("First 50 coefficients of estimated w:\n", clf.coef_[:50])
print("Intercept: %f" % clf.intercept_)
print("Nonzero coefficients: %d" % (clf.coef_ != 0.).sum())
print("Testing error: %.4f" % np.mean((y_test - clf.predict(X_test)) ** 2))

In [None]:
# test the influence of l1_ratio on the sparsity of the solution
l1_ratios = [0., 0., 0.]  # TODO choose your values

train_errs = np.zeros(len(l1_ratios))
test_errs = np.zeros_like(train_errs)

for i, l1_ratio in enumerate(l1_ratios):
    clf = # TODO
    # TODO fit and check sparsity
    # TODO compute train and test errors
    train_errs[i] = 
    test_errs[i] = 
    
plt.figure()
plt.plot(l1_ratios, test_errs, label='Test error')
plt.plot(l1_ratios, train_errs, label='Train error')
plt.xlabel("l1_ratio")
plt.legend();

In [None]:
# TODO also check the influence of alpha. 
# What happens when alpha becomes too big?
alphas = np.geomspace(1e-4, 1e4, num=9)

## Parameter selection with cross-validation
In this section, we use scikit-learn's built-in functions to perform cross-validated selection of alpha and l1_ratio.
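
A minimal sketch of such a selection, assuming `ElasticNetCV` is the intended tool (it is imported at the top of the notebook). The data come from `make_regression` as a stand-in for `create_random_data`, and both grids are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in for the lab data (assumption, not the lab's generator)
X_demo, y_demo = make_regression(n_samples=100, n_features=200,
                                 n_informative=50, noise=0.3,
                                 random_state=0)

l1_ratio_grid = [0.1, 0.5, 0.9, 1.0]          # arbitrary candidates
alpha_grid = np.geomspace(1e-3, 1e1, num=9)   # arbitrary candidates

# 5-fold CV over every (alpha, l1_ratio) pair, then a refit on all the data
cv_model = ElasticNetCV(l1_ratio=l1_ratio_grid, alphas=alpha_grid,
                        cv=5, max_iter=10000)
cv_model.fit(X_demo, y_demo)

print("Selected alpha:", cv_model.alpha_)
print("Selected l1_ratio:", cv_model.l1_ratio_)
print("Nonzero coefficients:", int((cv_model.coef_ != 0).sum()))
```

The selected pair minimizes the mean squared error averaged over the folds; it is worth checking whether it sits on the edge of a grid, in which case the grid should probably be widened.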

## Classification data

Load some data satisfying $y = \text{sign}(X w)$, where $w$ is $s$-sparse but $s$ is unknown to you:

In [None]:
data = loadmat("../../data/part3-data.mat")

In [None]:
X = data["X"]
y = data["Y"][:, 0]
print(X.shape, y.shape)
# TODO check numerically that y only contains 1s and -1s

Now you must infer $s$.
A first approach is to reuse the cross-validation procedure from the previous part.

In [None]:
# TODO find the optimal parameters from a CV point of view

Another way to estimate $s$ is to measure the correlation between
the columns of $X$ and $y$. Indeed, the columns of $X$ whose coefficient in $w$ is zero
play no role in generating $y$, so they should be only weakly correlated with it.
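
The idea can be illustrated on simulated data where the support of $w$ is known. This is only a sketch: the Gaussian design, the sizes, and the "largest drop" cutoff rule are assumptions made for the example, and every name ending in `_sim` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s_true = 2000, 50, 5

# Synthetic data: Gaussian X, s_true-sparse w, labels y = sign(X w)
X_sim = rng.standard_normal((n, d))
support = rng.choice(d, size=s_true, replace=False)
w_sim = np.zeros(d)
w_sim[support] = 1.0
y_sim = np.sign(X_sim @ w_sim)

# Absolute Pearson correlation between each column of X_sim and y_sim
Xc = X_sim - X_sim.mean(axis=0)
yc = y_sim - y_sim.mean()
corr_sim = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0)
                                * np.linalg.norm(yc))

# Sort in decreasing order and cut at the largest drop between neighbours
order = np.argsort(corr_sim)[::-1]
drops = corr_sim[order][:-1] - corr_sim[order][1:]
s_hat = int(np.argmax(drops)) + 1
selected = np.sort(order[:s_hat])

print("Estimated s:", s_hat)
print("Selected features:", selected)
print("True support:    ", np.sort(support))
```

Here the columns are independent, so the separation is clean. On real data, correlated columns can make an irrelevant feature look informative.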


In [None]:
# TODO compute correlation
corr = 

In [None]:
# sort:
idx = np.argsort(corr)
plt.plot(corr[idx[::-1]])

In [None]:
# TODO identify the cutoff numerically, get indices of highest correlated features
highly_corr_feats = # TODO

Finally, reuse the code from the first part to tune the sparsity parameter l1_ratio so that
it selects only $s$ features ($s$ being your sparsity estimate from the previous
question). Look at which features are selected in your solution. Do they
correspond to the ones you identified with the correlation approach?
If they do not, can you figure out why this happens?
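
One possible way to organize this search is sketched below on simulated data. The fixed `alpha`, the `l1_ratio` grid, the target `s_target`, and the simulated arrays are all assumptions made for illustration; on the lab data, use your own estimate of $s$.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Simulated stand-in for the classification data (hypothetical)
rng = np.random.default_rng(0)
X_sim = rng.standard_normal((500, 50))
w_sim = np.zeros(50)
w_sim[:5] = 1.0
y_sim = np.sign(X_sim @ w_sim)

s_target = 5   # hypothetical sparsity estimate
alpha = 0.1    # arbitrary fixed regularization strength

best = None
for l1_ratio in np.linspace(0.05, 1.0, 20):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X_sim, y_sim)
    n_nonzero = int((model.coef_ != 0).sum())
    # keep the l1_ratio whose support size is closest to the target
    if best is None or abs(n_nonzero - s_target) < abs(best[1] - s_target):
        best = (l1_ratio, n_nonzero, np.flatnonzero(model.coef_))

print("l1_ratio: %.2f, nonzero coefficients: %d" % best[:2])
print("Selected features:", best[2])
```

No value on the grid is guaranteed to give exactly `s_target` nonzero coefficients, so the sketch keeps the closest one; refining the grid or also varying `alpha` may be needed.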