<h1>Machine Learning - Lasso Feature Selection</h1>
    
<h2>Lasso</h2>

Lasso is a supervised linear regression method that is widely used for feature selection. LASSO, short for Least Absolute Shrinkage and Selection Operator, is a regression technique that performs both regularization and feature selection. It adds a penalty on the sum of the absolute values of the coefficients (the L1 norm) to the usual least-squares loss. Equivalently, it constrains that sum to stay below an upper bound, which keeps the coefficients within an allowable range.
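For reference, the penalized objective that scikit-learn's `Lasso` minimizes can be written as below. The symbols are notation chosen here for illustration: $n$ is the number of samples, $X$ the feature matrix, $y$ the target, $w$ the coefficient vector, and $\alpha$ the penalty strength.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\qquad\Longleftrightarrow\qquad
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
\quad \text{subject to} \quad \lVert w \rVert_1 \le t
```

The second form is the constrained view: each $\alpha$ corresponds to some bound $t$ on the sum of absolute coefficients, and a larger $\alpha$ corresponds to a smaller $t$.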



<h2>How does it work?</h2>


The LASSO method regularizes model parameters by shrinking the regression coefficients, reducing some of them to exactly zero. Feature selection follows from the shrinkage: every feature with a non-zero coefficient is kept in the model. While minimizing the cost function, Lasso regression automatically keeps the features that are useful and discards the useless or redundant ones. In Lasso regression, discarding a feature means setting its coefficient to 0.

Below, I fit a Lasso regression on a scaled version of the dataset and keep only the features whose coefficient is different from 0. First, the α hyperparameter, which controls the strength of the penalty, has to be tuned so that the model shrinks the coefficients by the right amount.
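To make the effect of α concrete, here is a minimal sketch on synthetic data. The dataset from `make_regression`, the variable names, and the list of α values are assumptions for illustration, not part of the analysis that follows. It counts how many coefficients stay non-zero as the penalty grows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 10 features, only 3 informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Fit Lasso with increasing penalty strength and count the surviving coefficients.
nonzero_counts = {}
for alpha in [0.01, 1.0, 10.0, 1000.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:>7}: {nonzero_counts[alpha]} non-zero coefficients")
```

With a very large α the penalty dominates and every coefficient is driven to zero, which is why α needs tuning rather than being set arbitrarily.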




##### First, let’s import some libraries:

In [84]:
import numpy as np
%matplotlib inline
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Lasso
import warnings
warnings.filterwarnings("ignore")



[numpy](https://numpy.org/) is a library for working with arrays and matrices in Python, and [scikit-learn](https://scikit-learn.org/stable/) is a popular library for machine learning.

##### I import the dataset and its feature names (column names).

In [85]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
bh = datasets.load_boston()
X = bh.data    # feature matrix
y = bh.target  # target values
print(bh)
features = bh['feature_names']


{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
        4.9800e+00],
       [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
        9.1400e+00],
       [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
        4.0300e+00],
       ...,
       [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        5.6400e+00],
       [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
        6.4800e+00],
       [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
        7.8800e+00]]), 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
       18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
       15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
       13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
       21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
       35.4, 24.7, 31.6, 23.3, 19.6, 1

##### Split the dataset into training and test sets using sklearn

In [86]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

##### I create a pipeline that contains a StandardScaler for scaling the data and the Lasso model.




In [87]:
pipeline = Pipeline([('scaler', StandardScaler()), ('model', Lasso())])
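One reason to put the scaler inside the pipeline is that it is then fitted only on the data passed to `fit`, so test rows (or held-out cross-validation folds) do not leak into the scaling statistics. The sketch below illustrates this on synthetic stand-in data. The data, the `alpha=0.1` value, and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the dataset used in this article).
X_syn, y_syn = make_regression(n_samples=150, n_features=6, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.33, random_state=42)

demo_pipeline = Pipeline([("scaler", StandardScaler()), ("model", Lasso(alpha=0.1))])
demo_pipeline.fit(X_tr, y_tr)

# The scaler's statistics come from the training rows only.
fitted_scaler = demo_pipeline.named_steps["scaler"]
print("scaler fitted on training rows only:",
      np.allclose(fitted_scaler.mean_, X_tr.mean(axis=0)))

# Test rows are transformed with those training statistics before prediction.
test_predictions = demo_pipeline.predict(X_te)
```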

##### I apply a grid search over values of alpha from 0.1 to 4.9 in steps of 0.1.



In [102]:
search = GridSearchCV(pipeline,
                      {'model__alpha':np.arange(0.1,5,0.1)},
                      cv = 5, scoring="neg_mean_squared_error",verbose=4
                      )

##### I fit the grid search on the training data.

In [103]:
search.fit(X_train,y_train)

Fitting 5 folds for each of 49 candidates, totalling 245 fits
[CV 1/5] END ................model__alpha=0.1;, score=-34.099 total time=   0.0s
[CV 2/5] END ................model__alpha=0.1;, score=-21.732 total time=   0.0s
[CV 3/5] END ................model__alpha=0.1;, score=-28.455 total time=   0.0s
[CV 4/5] END ................model__alpha=0.1;, score=-16.658 total time=   0.0s
[CV 5/5] END ................model__alpha=0.1;, score=-26.024 total time=   0.0s
[CV 1/5] END ................model__alpha=0.2;, score=-34.304 total time=   0.0s
[CV 2/5] END ................model__alpha=0.2;, score=-22.571 total time=   0.0s
[CV 3/5] END ................model__alpha=0.2;, score=-29.455 total time=   0.0s
[CV 4/5] END ................model__alpha=0.2;, score=-16.578 total time=   0.0s
[CV 5/5] END ................model__alpha=0.2;, score=-26.788 total time=   0.0s
[CV 1/5] END model__alpha=0.30000000000000004;, score=-34.669 total time=   0.0s
[CV 2/5] END model__alpha=0.30000000000000004;,

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('scaler', StandardScaler()),
                                       ('model', Lasso())]),
             param_grid={'model__alpha': array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3,
       1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6,
       2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9,
       4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9])},
             scoring='neg_mean_squared_error', verbose=4)

##### The best value for α is:

In [104]:
search.best_params_

{'model__alpha': 0.1}
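Note that 0.1 is the smallest value in the grid, so it may be worth checking whether even smaller values of α score better. A minimal sketch of how the per-α scores could be inspected through `cv_results_` follows. It uses synthetic data and a hypothetical logarithmic grid, both of which are assumptions for illustration rather than the setup used above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the dataset used in this article).
X_syn, y_syn = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=15.0, random_state=0
)

alpha_grid = np.logspace(-3, 1, 9)  # hypothetical grid: 0.001 up to 10
demo_search = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("model", Lasso(max_iter=10000))]),
    {"model__alpha": alpha_grid},
    cv=5,
    scoring="neg_mean_squared_error",
)
demo_search.fit(X_syn, y_syn)

# Mean cross-validated score per candidate, in the same order as the grid.
mean_scores = demo_search.cv_results_["mean_test_score"]
for alpha, score in zip(alpha_grid, mean_scores):
    print(f"alpha={alpha:8.3f}  mean CV MSE={-score:10.2f}")

# If the winner sits on the edge of the grid, the grid may need extending.
best_alpha = demo_search.best_params_["model__alpha"]
on_edge = bool(np.isclose(best_alpha, alpha_grid[0]) or np.isclose(best_alpha, alpha_grid[-1]))
print("best alpha:", best_alpha, "| on grid boundary:", on_edge)
```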

##### Get the values of the coefficients of the Lasso regression.

In [105]:
coefficients = search.best_estimator_.named_steps['model'].coef_
coefficients

array([-0.66991383,  0.43061628, -0.        ,  0.87630476, -1.36149475,
        2.89584589, -0.18487001, -2.25812838,  0.46253968, -0.        ,
       -1.90039821,  0.9373862 , -3.95913625])

##### Check the importance of each feature by the absolute value of its coefficient.

In [106]:
importance = np.abs(coefficients)
importance

array([0.66991383, 0.43061628, 0.        , 0.87630476, 1.36149475,
       2.89584589, 0.18487001, 2.25812838, 0.46253968, 0.        ,
       1.90039821, 0.9373862 , 3.95913625])
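The importances are easier to read when paired with the feature names and sorted. The sketch below copies the coefficients from the output above and assumes the names follow the standard Boston housing column order (the same order as `bh.feature_names`). The variable names are chosen here for illustration.

```python
import numpy as np

# Feature names, assumed to be in the same order as bh.feature_names,
# and the coefficients copied from the output printed above.
feature_names = np.array(["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                          "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"])
coefs = np.array([-0.66991383, 0.43061628, -0.0, 0.87630476, -1.36149475,
                  2.89584589, -0.18487001, -2.25812838, 0.46253968, -0.0,
                  -1.90039821, 0.9373862, -3.95913625])

# Sort by absolute coefficient, largest first.
order = np.argsort(-np.abs(coefs))
ranked = [(str(feature_names[i]), float(coefs[i])) for i in order]
for name, coef in ranked:
    print(f"{name:8s} {coef:+.4f}")
```

Because the features were standardized before fitting, the coefficient magnitudes are on a comparable scale, which is what makes this ranking meaningful.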

##### It's clear that two features have zero importance. The features that Lasso keeps are:

In [107]:
np.array(features)[importance > 0]

array(['CRIM', 'ZN', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'PTRATIO',
       'B', 'LSTAT'], dtype='<U7')

##### The discarded features are:

In [108]:
np.array(features)[importance == 0]

array(['INDUS', 'TAX'], dtype='<U7')
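As an alternative to masking the coefficient array by hand, scikit-learn provides `SelectFromModel`, which wraps an estimator and keeps the features whose importance passes a threshold (for Lasso the default threshold is very small, so in practice it keeps the non-zero coefficients). This is a different route from the one taken above, sketched here on synthetic data. The data, the `alpha=1.0` value, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the dataset used in this article).
X_syn, y_syn = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X_syn)

# Fit the Lasso inside the selector and keep features above the default threshold.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, y_syn)
mask = selector.get_support()              # boolean mask over the columns
X_selected = selector.transform(X_scaled)  # only the kept columns
print("kept columns:", np.flatnonzero(mask))
```

The selector can also be placed as a step in a `Pipeline`, so that the selection is refitted on each training split.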