## Week 2 – End-to-end Machine Learning

# Setup

# Get the data

In [1]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
data = load_boston()
X_train, X_test, y_train, y_test = train_test_split(data['data'], data['target'])

The Boston dataset has 506 samples and 13 features.
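As a quick check of what the split produces, here is a minimal sketch. It assumes a synthetic stand-in from `make_regression` with the same shape as the Boston data (506 samples, 13 features); the stand-in, the noise level, and the `random_state` values are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# With no test_size given, train_test_split holds out 25% of the rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)
```

With 506 rows, the default split leaves 379 rows for training and 127 for testing.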

In [2]:
from sklearn.preprocessing import StandardScaler, RobustScaler, QuantileTransformer
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

Let's set up the following three tasks:

  - **Data Normalization:** `StandardScaler`
  - **Dimensionality Reduction:** `PCA`
  - **Regression:** `Ridge` regression

In [4]:
scaler = StandardScaler()
pca = PCA()
ridge = Ridge()

Now fit each component in turn:

In [6]:
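# Note: these assignments overwrite X_train with its scaled, PCA-transformed version,
# so later cells that fit on X_train see transformed features while X_test stays raw.
X_train = scaler.fit_transform(X_train)
X_train = pca.fit_transform(X_train)
ridge.fit(X_train, y_train)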

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

How can we put it all inside a `Pipeline`?

In [7]:
from sklearn.pipeline import Pipeline
pipe = Pipeline(
    YOUR_CODE_HERE
    )
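One possible way to fill in the cell above is sketched here. A `Pipeline` takes a list of `(name, estimator)` tuples; the step names `scaler`, `reduce_dim`, and `regressor` are taken from the outputs shown later in this notebook, and the estimators use their default settings as an assumption.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, estimator) pair; every step but the last must be a transformer.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
])

step_names = [name for name, _ in pipe.steps]
print(step_names)
```

The names matter later on: they are the prefixes used to address each step's parameters during a grid search.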

Now fit the pipeline on the training data and score it on the test data:

In [8]:
pipe = pipe.fit(YOUR_CODE_HERE)
print('Testing score: ', YOUR_CODE_HERE)

Testing score:  -3454.849147957968
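A minimal sketch of the fit-and-score step follows. It assumes a synthetic stand-in dataset from `make_regression` (not the Boston data) and passes raw, untransformed features to the pipeline, since the pipeline applies the scaler and `PCA` itself.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), split into raw train and test sets.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
])

# fit() learns all three steps from the raw training features.
pipe.fit(X_train, y_train)

# For a regressor, score() returns the coefficient of determination R^2.
test_score = pipe.score(X_test, y_test)
print('Testing score: ', test_score)
```

An R^2 close to 1 means a good fit; a strongly negative value usually signals that the training and test features were not processed the same way.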


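Access the `explained_variance_` attribute of the `PCA` step. (All values come out equal here because `X_train` was already standardized and PCA-transformed in cell 6, so the pipeline's scaler hands its `PCA` step uncorrelated, unit-variance columns.)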

In [9]:
print(pipe#YOUR_CODE_HERE)

[1.0026455 1.0026455 1.0026455 1.0026455 1.0026455 1.0026455 1.0026455
 1.0026455 1.0026455 1.0026455 1.0026455 1.0026455 1.0026455]
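Fitted steps can be reached by name through `named_steps`. The sketch below assumes the step name `reduce_dim` (as in the outputs further down) and a synthetic stand-in dataset, so the printed numbers will differ from the ones above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
]).fit(X, y)

# named_steps maps each step name to its fitted estimator.
explained = pipe.named_steps['reduce_dim'].explained_variance_
print(explained)
```

`PCA` orders its components by decreasing variance, so the array is sorted from largest to smallest.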


During training, `fit_transform` is called on every intermediate step of the pipeline and `fit` on the final estimator. At test time, `transform` is called on the intermediate steps and `predict` on the final one.
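The call sequence described above can be checked by hand. This sketch, on a synthetic stand-in dataset (an assumption), chains the fitted steps manually at test time and compares the result with `pipe.predict`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
]).fit(X_train, y_train)

# At test time: transform with each intermediate step, then predict with the last.
Z = pipe.named_steps['scaler'].transform(X_test)
Z = pipe.named_steps['reduce_dim'].transform(Z)
manual_pred = pipe.named_steps['regressor'].predict(Z)

same = np.allclose(manual_pred, pipe.predict(X_test))
print(same)
```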


## Hyperparameter tuning

Let's aim to optimize the number of components selected by `PCA` and the regularization factor of the Ridge regressor.

To choose the number of `PCA` components, we can check how the score changes as the number of components goes from 1 to 10. For the regularization parameter, we can choose from an exponentially spaced range of values:

In [12]:
import numpy as np
n_features_to_test = np.arange(1, 11)

alpha_to_test = 2.0**np.arange(-6, +6)

# Define a dictionary with all the parameters:

params = #YOUR_CODE_HERE
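One possible shape for this dictionary is sketched below. Pipeline parameters are addressed as `<step name>__<parameter name>`; the keys used here are the ones that appear in the `best_params_` output further down. `ParameterGrid` is used only to count the combinations.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

n_features_to_test = np.arange(1, 11)
alpha_to_test = 2.0 ** np.arange(-6, +6)

# Keys follow the '<step>__<parameter>' convention of Pipeline.
params = {
    'reduce_dim__n_components': n_features_to_test,
    'regressor__alpha': alpha_to_test,
}

# The grid is the Cartesian product of all value lists.
n_candidates = len(ParameterGrid(params))
print(n_candidates)
```

Ten component counts times twelve alpha values give the 120 candidates reported by the grid search.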



Now, fit and score the full grid

In [19]:

from sklearn.model_selection import GridSearchCV


# How can you parallelize the search?

gridsearch = GridSearchCV(YOUR_CODE_HERE, verbose=1).fit(YOUR_CODE_HERE)

print('Final score is: ', gridsearch.score(YOUR_CODE_HERE))

Fitting 3 folds for each of 120 candidates, totalling 360 fits
Final score is:  -3502.8211425772465


[Parallel(n_jobs=-1)]: Done 360 out of 360 | elapsed:    1.0s finished
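A hedged sketch of the full search follows, again on a synthetic stand-in dataset (an assumption). Passing `n_jobs=-1` asks `GridSearchCV` to use all available cores, which is one way to parallelize the search. `cv=3` is set explicitly to mirror the 3 folds in the output above, since newer scikit-learn releases default to 5 folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
])

params = {
    'reduce_dim__n_components': np.arange(1, 11),
    'regressor__alpha': 2.0 ** np.arange(-6, +6),
}

# n_jobs=-1 runs the cross-validation fits in parallel on all cores.
gridsearch = GridSearchCV(pipe, params, cv=3, n_jobs=-1, verbose=1)
gridsearch.fit(X_train, y_train)

print('Final score is: ', gridsearch.score(X_test, y_test))
```

After fitting, `gridsearch` refits the best candidate on the whole training set, so `score` and `predict` use that refitted pipeline.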


In [20]:
# Print the best parameters:
gridsearch.best_params_

{'reduce_dim__n_components': 10, 'regressor__alpha': 2.0}

## Pipeline tuning

Let's also select which algorithm to use at each step. For example, for the data normalization step we can try:

`StandardScaler`, `RobustScaler`, and `QuantileTransformer`

In [21]:
scalers_to_test = YOUR_CODE_HERE

In [23]:
params = {'scaler': scalers_to_test,
          'reduce_dim__n_components': n_features_to_test,
          'regressor__alpha': alpha_to_test}

# We can now train the model
gridsearch = GridSearchCV(YOUR_CODE_HERE, verbose=1).fit(YOUR_CODE_HERE)
print('Final score is: ', gridsearch.score(YOUR_CODE_HERE))


Fitting 3 folds for each of 360 candidates, totalling 1080 fits
Final score is:  0.04118577226240905


[Parallel(n_jobs=1)]: Done 1080 out of 1080 | elapsed:   12.9s finished
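A whole step can be swapped during the search by using the bare step name as a key and a list of estimator instances as its values. The sketch below assumes a synthetic stand-in dataset and a reduced grid (three values per numeric parameter) to keep it quick; `n_quantiles=100` is also an illustrative choice so that the quantile count stays below the fold size.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer, RobustScaler, StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
])

# Candidate estimators for the 'scaler' step.
scalers_to_test = [StandardScaler(), RobustScaler(), QuantileTransformer(n_quantiles=100)]

# Reduced grid for illustration (assumption): 3 scalers x 3 x 3 = 27 candidates.
params = {
    'scaler': scalers_to_test,
    'reduce_dim__n_components': [2, 5, 10],
    'regressor__alpha': [0.1, 1.0, 10.0],
}

gridsearch = GridSearchCV(pipe, params, cv=3).fit(X, y)
best_scaler = gridsearch.best_params_['scaler']
print(type(best_scaler).__name__)
```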


In [31]:
gridsearch.best_params_

{'reduce_dim__n_components': 10,
 'regressor__alpha': 0.015625,
 'scaler': QuantileTransformer(copy=True, ignore_implicit_zeros=False, n_quantiles=1000,
           output_distribution='uniform', random_state=None,
           subsample=100000)}

In [30]:
gridsearch.best_estimator_

Pipeline(memory=None,
     steps=[('scaler', QuantileTransformer(copy=True, ignore_implicit_zeros=False, n_quantiles=1000,
          output_distribution='uniform', random_state=None,
          subsample=100000)), ('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('regressor', Ridge(alpha=0.015625, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001))])

What if the estimators we are choosing between take different parameters? Say that for the dimensionality reduction step we consider both `PCA` and `SelectKBest`.

We can then pass a list of parameter dictionaries:

In [32]:
params = [
        {'scaler': scalers_to_test,
         'reduce_dim': YOUR_CODE_HERE,
         'reduce_dim__n_components': n_features_to_test,
         'regressor__alpha': alpha_to_test},

        {'scaler': scalers_to_test,
         'reduce_dim': YOUR_CODE_HERE,
         'reduce_dim__k': n_features_to_test,
         'regressor__alpha': alpha_to_test}
        ]
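One possible way to fill in the two placeholders is sketched here. Each dictionary is its own sub-grid, so `PCA` is only ever paired with `n_components` and `SelectKBest` only with `k`. Using `f_regression` as the score function is taken from the `best_params_` output below; the rest uses default settings as an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import QuantileTransformer, RobustScaler, StandardScaler

scalers_to_test = [StandardScaler(), RobustScaler(), QuantileTransformer()]
n_features_to_test = np.arange(1, 11)
alpha_to_test = 2.0 ** np.arange(-6, +6)

# A list of dictionaries: the search covers the union of the sub-grids.
params = [
    {'scaler': scalers_to_test,
     'reduce_dim': [PCA()],
     'reduce_dim__n_components': n_features_to_test,
     'regressor__alpha': alpha_to_test},

    {'scaler': scalers_to_test,
     'reduce_dim': [SelectKBest(f_regression)],
     'reduce_dim__k': n_features_to_test,
     'regressor__alpha': alpha_to_test},
]

n_candidates = len(ParameterGrid(params))
print(n_candidates)
```

Each sub-grid holds 3 x 10 x 12 = 360 combinations, which adds up to the 720 candidates reported by the search.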

In [33]:
gridsearch = GridSearchCV(YOUR_CODE_HERE, n_jobs=-1).fit(X_train, y_train)
print('Final score is: ', gridsearch.score(X_test, y_test))

Fitting 3 folds for each of 720 candidates, totalling 2160 fits


[Parallel(n_jobs=-1)]: Done 348 tasks      | elapsed:    2.8s
[Parallel(n_jobs=-1)]: Done 1848 tasks      | elapsed:   12.4s


Final score is:  -1242.2165440907665


[Parallel(n_jobs=-1)]: Done 2160 out of 2160 | elapsed:   14.6s finished


In [34]:
gridsearch.best_params_

{'reduce_dim': SelectKBest(k=10, score_func=<function f_regression at 0x1a1587e730>),
 'reduce_dim__k': 10,
 'regressor__alpha': 4.0,
 'scaler': StandardScaler(copy=True, with_mean=True, with_std=True)}

In [35]:
gridsearch.best_estimator_

Pipeline(memory=None,
     steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('reduce_dim', SelectKBest(k=10, score_func=<function f_regression at 0x1a1587e730>)), ('regressor', Ridge(alpha=4.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001))])

In [36]:
gridsearch.best_score_

0.6917208528283689
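Note that `best_score_` is the mean cross-validated score of the best candidate, computed on the training folds only, so it need not match the score on the held-out test set. The sketch below shows where the number comes from; it assumes a synthetic stand-in dataset and a small illustrative grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('reduce_dim', PCA()),
    ('regressor', Ridge()),
])

# Small illustrative grid (assumption).
params = {
    'reduce_dim__n_components': [2, 5, 10],
    'regressor__alpha': [0.1, 1.0, 10.0],
}

gridsearch = GridSearchCV(pipe, params, cv=3).fit(X_train, y_train)

# best_score_ is the mean validation score of the winning candidate.
cv_mean = gridsearch.cv_results_['mean_test_score'][gridsearch.best_index_]
test_score = gridsearch.score(X_test, y_test)

print('CV score:  ', gridsearch.best_score_)
print('Test score:', test_score)
```

Comparing the two numbers is a useful sanity check: a large gap between them suggests the test data is not being treated the same way as the training data.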