### Codio Activity 9.7: Ridge vs. Sequential Feature Selection

**Expected Time: 60 Minutes**

**Total Points: 40**

This activity compares a `Ridge` regression model with a `LinearRegression` model built using `SequentialFeatureSelector`. Both approaches limit the complexity of the model: the `Ridge` estimator applies a penalty that shrinks the model's coefficients, while the `SequentialFeatureSelector` selects a subset of features to build the model with.
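The shrinkage half of that contrast can be shown with a minimal sketch. It uses synthetic data from `make_regression`, not the insurance data, and the alpha values are arbitrary choices. As `alpha` grows, the overall size of the ridge coefficients falls, but every coefficient stays nonzero. A feature selector, by contrast, drops features outright.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): 8 features, only 3 informative.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
norms, nonzero_counts = [], []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))            # overall coefficient size
    nonzero_counts.append(np.count_nonzero(coef))  # ridge shrinks, it does not zero out
    print(f"alpha={alpha:>7}: ||coef||={norms[-1]:.3f}, nonzero={nonzero_counts[-1]}")
```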

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)


```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn import set_config
set_config(display="diagram")

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
```

### The Insurance Data

For this example, we return to the insurance data with cubic features. Below, the train and test data are loaded and split into features (`X`) and target (`y`). Recall that the target, `target_log`, is the logarithm of the billed cost.

```python
train = pd.read_csv('data/train_cubic.csv')
test = pd.read_csv('data/test_cubic.csv')

X_train, y_train = train.drop('target_log', axis=1), train['target_log']
X_test, y_test = test.drop('target_log', axis=1), test['target_log']
```

### Problem 1

#### Feature Selection Pipeline

**10 Points**

To begin, use the pipeline below to construct a grid search over the `n_features_to_select` parameter of the `SequentialFeatureSelector` transformer. Consider 2, 3, 4, and 5 features in your search, and name the parameter dictionary for the search `param_dict`.

Assign your grid to `selector_grid`, fit it on the training data, and determine the mean squared error on the train and test sets. Assign the errors as floats to `selector_train_mse` and `selector_test_mse`, respectively.
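For pipelines, scikit-learn names grid keys as `<step name>__<parameter name>`. The minimal sketch below confirms that the key exists and shows how to read the chosen value from `best_params_`. It uses synthetic `make_regression` data as a stand-in for the insurance data, and the names `pipe` and `grid` are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression())),
                 ('model', LinearRegression())])

# Pipeline parameters are exposed as '<step>__<parameter>'.
print('selector__n_features_to_select' in pipe.get_params())  # True

# Synthetic stand-in data (assumption): 6 features, 3 informative.
X, y = make_regression(n_samples=150, n_features=6, n_informative=3,
                       noise=5.0, random_state=42)
grid = GridSearchCV(pipe, param_grid={'selector__n_features_to_select': [2, 3, 4, 5]})
grid.fit(X, y)
print(grid.best_params_)
```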

```python
selector_pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression())),
                          ('model', LinearRegression())])
selector_pipe
```

```python
### GRADED

param_dict = {}
selector_grid = ''
selector_train_mse = ''
selector_test_mse = ''

# YOUR CODE HERE
# Grid keys use the '<step>__<parameter>' naming convention.
param_dict = {'selector__n_features_to_select': [2, 3, 4, 5]}
selector_grid = GridSearchCV(selector_pipe, param_grid=param_dict)
selector_grid.fit(X_train, y_train)

# The fitted grid predicts with the best pipeline, refit on all training data.
train_preds = selector_grid.predict(X_train)
test_preds = selector_grid.predict(X_test)
selector_train_mse = float(mean_squared_error(y_train, train_preds))
selector_test_mse = float(mean_squared_error(y_test, test_preds))

# ANSWER CHECK
print(f'Train MSE: {selector_train_mse}')
print(f'Test MSE: {selector_test_mse}')
```

```text
Train MSE: 0.6031734290034885
Test MSE: 0.5655875591380699
```


### Problem 2

#### Ridge Grid

**10 Points**

Now, construct a `Pipeline` with two steps, `scaler` and `ridge`, that first standard scales the data and then builds a ridge regression model. Assign your pipeline to `ridge_pipe`. Use it to run a grid search over the `alpha` hyperparameter of the `Ridge` estimator on the training data, storing the parameter grid as `ridge_param_dict` and the search as `ridge_grid`. Then determine the mean squared error on the train and test data.

Assign the errors as floats to `ridge_train_mse` and `ridge_test_mse` respectively.
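Two details of this setup are worth checking.

- Putting `StandardScaler` inside the pipeline means it is refit on each cross-validation training fold.
- If the selected `alpha` lands at either end of the searched range, the range may be too narrow.

A minimal sketch of the boundary check follows. It uses synthetic `make_regression` data in place of the insurance data, and the names `grid`, `best_alpha`, and `at_edge` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=1)

pipe = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge())])
alphas = np.logspace(0, 10, 50)  # same range as the graded cell: 1 to 1e10
grid = GridSearchCV(pipe, param_grid={'ridge__alpha': alphas})
grid.fit(X, y)

best_alpha = grid.best_params_['ridge__alpha']
# A best alpha on the edge of the grid suggests widening the search range.
at_edge = np.isclose(best_alpha, alphas[0]) or np.isclose(best_alpha, alphas[-1])
print(f"best alpha = {best_alpha:.3g}, on grid boundary: {at_edge}")
```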

```python
### GRADED

ridge_param_dict = ''
ridge_pipe = ''
ridge_grid = ''
ridge_train_mse = ''
ridge_test_mse = ''

# YOUR CODE HERE
# Scale inside the pipeline so the scaler is refit on each cross-validation training fold.
ridge_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())])

# Search 50 log-spaced penalties between 1 and 1e10.
ridge_param_dict = {'ridge__alpha': np.logspace(0, 10, 50)}
ridge_grid = GridSearchCV(estimator=ridge_pipe, param_grid=ridge_param_dict)
ridge_grid.fit(X_train, y_train)
ridge_train_mse = float(mean_squared_error(y_train, ridge_grid.predict(X_train)))
ridge_test_mse = float(mean_squared_error(y_test, ridge_grid.predict(X_test)))

# ANSWER CHECK
print(f'Train MSE: {ridge_train_mse}')
print(f'Test MSE: {ridge_test_mse}')
ridge_pipe
```

```text
Train MSE: 0.5870277750390882
Test MSE: 0.5532169282339894
```


### Problem 3

#### Examining the "best" model

**10 Points**

Your results should show that the model using the sequential feature selector and `LinearRegression` estimator, fit as `selector_grid`, performs close to the ridge model: the outputs above give a test MSE of about 0.566 versus 0.553 for ridge. It does this while using only a small subset of the features. A natural question is how many features were selected, and which ones.

Use the `selector_grid` to extract both the feature names and their associated coefficients.  This will involve:

- `.best_estimator_`: extract the best pipeline (selector and model) from your grid search
- `.named_steps['selector']`: extract the selector from the pipeline
- `.named_steps['model']`: extract the model from the pipeline
- `.get_support()`: extract the selected features from the selector. This returns a boolean array indicating whether each feature was selected, which can be used to slice the columns of the training data:

```python
X_train.columns[best_selector.get_support()]
```

- `.coef_`: coefficients from best model
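The extraction steps above can be tried end to end on a small toy problem, sketched below. The columns `a` to `d` and the generating weights (3 and 2) are hypothetical: only `a` and `b` drive the target, so a two-feature forward selection should keep exactly those. The boolean mask from `get_support()` slices the column `Index` directly.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data: only 'a' and 'b' carry signal.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=['a', 'b', 'c', 'd'])
y = 3 * X['a'] + 2 * X['b'] + rng.normal(scale=0.1, size=200)

pipe = Pipeline([('selector', SequentialFeatureSelector(LinearRegression(),
                                                        n_features_to_select=2)),
                 ('model', LinearRegression())])
pipe.fit(X, y)

selector = pipe.named_steps['selector']
model = pipe.named_steps['model']
mask = selector.get_support()   # boolean array, one entry per column of X
selected = X.columns[mask]      # boolean mask slices the column Index
print(list(selected))           # ['a', 'b']
print(pd.Series(model.coef_, index=selected))
```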

```python
### GRADED

best_estimator = ''
best_selector = ''
best_model = ''
feature_names = ''
coefs = ''


# YOUR CODE HERE
best_estimator = selector_grid.best_estimator_          # best selector + model pipeline
best_selector = best_estimator.named_steps['selector']  # fitted SequentialFeatureSelector
best_model = best_estimator.named_steps['model']        # LinearRegression on the selected features
feature_names = X_train.columns[best_selector.get_support()]
coefs = best_model.coef_


# Answer check
print(best_estimator)
print(f'Features from best selector: {feature_names}.')
print('Coefficient values: ')
print('===================')
pd.DataFrame([coefs], columns=feature_names, index=['model'])
```

```text
Pipeline(steps=[('selector',
                 SequentialFeatureSelector(estimator=LinearRegression(),
                                           n_features_to_select=2)),
                ('model', LinearRegression())])
Features from best selector: Index(['age', 'bmi children'], dtype='object').
Coefficient values: 
===================
```

|       | age      | bmi children |
|-------|----------|--------------|
| model | 0.032852 | 0.00368      |


### Problem 4

#### Comparing observations 

**10 Points**

According to your model, predict the billed costs for person 1 and person 2 below:

- **Person 1**: Age = 30, bmi = 40, children = 0
- **Person 2**: Age = 45, bmi = 50, children = 2

Use the information from **Problem 3** and the model coefficients to make these predictions.

Note that you will need to transform your predictions, because the model predicts the logarithm of cost. To convert a prediction back to the actual cost, use `np.exp`, the inverse of the natural logarithm. Assign your predictions as floats to `person1` and `person2` below. Your solution will be checked to two decimal places.
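The hand-computed formula can be cross-checked against `predict`, as sketched below. The sketch rebuilds the engineered `bmi children` feature, which is the product of BMI and number of children, as the feature name and the graded formula indicate. It assumes the target is a natural logarithm, as the use of `np.exp` implies. The data is synthetic, and the generating coefficients (7.7, 0.03, 0.004) are hypothetical, chosen only so the example runs. `predict_cost` is a hypothetical helper, not part of the activity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data mimicking the two selected features.
rng = np.random.default_rng(7)
n = 300
age = rng.integers(18, 65, size=n)
bmi = rng.uniform(18, 45, size=n)
children = rng.integers(0, 5, size=n)
X = pd.DataFrame({'age': age, 'bmi children': bmi * children})
y_log = 7.7 + 0.03 * age + 0.004 * bmi * children + rng.normal(scale=0.3, size=n)

model = LinearRegression().fit(X, y_log)

def predict_cost(model, age, bmi, children):
    """Predict cost on the original scale from a model fit on log(cost)."""
    row = pd.DataFrame({'age': [age], 'bmi children': [bmi * children]})
    log_pred = model.predict(row)[0]
    return float(np.exp(log_pred))  # np.exp inverts a natural-log target

# The manual formula (intercept + coef . features, then exp) gives the same answer.
manual1 = np.exp(model.intercept_ + model.coef_[0] * 30 + model.coef_[1] * (40 * 0))
manual2 = np.exp(model.intercept_ + model.coef_[0] * 45 + model.coef_[1] * (50 * 2))
print(predict_cost(model, 30, 40, 0), manual1)
print(predict_cost(model, 45, 50, 2), manual2)
```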

```python
### GRADED
ages = [30, 45]
bmis = [40, 50]
childrens = [0, 2]
person1 = ''
person2 = ''

# YOUR CODE HERE
# coefs follows feature_names order: ['age', 'bmi children'].
# log(cost) = intercept + coef_age * age + coef_bmi_children * (bmi * children);
# np.exp converts the log-scale prediction back to cost.
person1 = float(np.exp(best_model.intercept_ + coefs[0]*ages[0] + coefs[1]*(bmis[0]*childrens[0])))
person2 = float(np.exp(best_model.intercept_ + coefs[0]*ages[1] + coefs[1]*(bmis[1]*childrens[1])))


# Answer check
print(f'Person 1 value: {person1}')
print(f'Person 2 value: {person2}')
print(f'The difference between Person 1 and Person 2 is {person2 - person1:.2f}')
```

```text
Person 1 value: 5898.780939517866
Person 2 value: 13950.816694851044
The difference between Person 1 and Person 2 is 8052.04
```


The models here could be revisited with additional feature encodings and different polynomial terms. More important is understanding how to construct the pipelines and interrogate the resulting models to learn what they say about your data. For example, does having a higher body mass matter if one does not have children? Does this seem reasonable?
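Under the selected model, BMI enters only through the `bmi children` product, and writing the fitted model out makes the role of BMI explicit. The sketch below uses the coefficients from the Problem 3 output and assumes a natural-log target. Here $\beta_0$ is the fitted intercept, which is not printed above.

$$
\log\widehat{\text{cost}} = \beta_0 + \beta_1\,\text{age} + \beta_2\,(\text{bmi}\times\text{children}),
\qquad
\frac{\partial \log\widehat{\text{cost}}}{\partial\,\text{bmi}} = \beta_2\,\text{children}
$$

With $\beta_2 \approx 0.00368$, BMI has no effect on the prediction when children $= 0$. With two children, each additional BMI unit multiplies the predicted cost by $e^{2\beta_2} \approx e^{0.00736} \approx 1.0074$, an increase of roughly 0.74%.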