#### Regularized linear regression

Now we'll focus on how to reduce dimensionality using regularized regression.

You'll be working with the numeric ANSUR body measurements dataset to predict a person's Body Mass Index (BMI) using the Lasso() regressor.

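BMI is derived from body weight and height (weight in kg divided by the square of height in m). The idea is to remove those two features to give the model a challenge. Note, however, that this CSV still contains weight_kg and stature_m (plus the Unnamed: 0 index column), and X = df.drop("BMI", axis=1) below keeps all of them as features, so the task is easier than intended.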
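As a quick sanity check of the formula, the weight_kg and stature_m values from the first five rows of the df.head() output below reproduce the listed BMI values. This small sketch uses only those copied numbers:

```python
# BMI = weight (kg) / height (m) squared.
# Values copied from the first five rows of df.head() in this notebook.
weights_kg = [81.5, 72.6, 92.9, 79.4, 94.6]
statures_m = [1.776, 1.702, 1.735, 1.655, 1.914]
reported_bmi = [25.838761, 25.062103, 30.86148, 28.988417, 25.823034]

computed_bmi = [w / h ** 2 for w, h in zip(weights_kg, statures_m)]

for computed, reported in zip(computed_bmi, reported_bmi):
    print(f"computed {computed:.6f}  reported {reported:.6f}")
```

Because BMI is an exact function of these two columns, keeping them in X gives the model a very direct route to the target.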

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [19]:
df = pd.read_csv("an_male.csv")
df.head()

Unnamed: 0,abdominalextensiondepthsitting,acromialheight,acromionradialelength,anklecircumference,axillaheight,balloffootcircumference,balloffootlength,biacromialbreadth,bicepscircumferenceflexed,bicristalbreadth,...,waistbreadth,waistcircumference,waistdepth,waistfrontlengthsitting,waistheightomphalion,wristcircumference,wristheight,weight_kg,stature_m,BMI
0,266,1467,337,222,1347,253,202,401,369,274,...,329,933,240,440,1054,175,853,81.5,1.776,25.838761
1,233,1395,326,220,1293,245,193,394,338,257,...,316,870,225,371,1054,167,815,72.6,1.702,25.062103
2,287,1430,341,230,1327,256,196,427,408,261,...,329,964,255,411,1041,180,831,92.9,1.735,30.86148
3,234,1347,310,230,1239,262,199,401,359,262,...,315,857,205,399,968,176,793,79.4,1.655,28.988417
4,250,1585,372,247,1478,267,224,435,356,263,...,303,868,214,379,1245,188,954,94.6,1.914,25.823034


In [20]:
X = df.drop("BMI", axis=1)

In [82]:
y = df["BMI"]
y

0       25.838761
1       25.062103
2       30.861480
3       28.988417
4       25.823034
          ...    
4077    23.689663
4078    28.761967
4079    29.130633
4080    24.766866
4081    29.477038
Name: BMI, Length: 4082, dtype: float64

In [22]:
from sklearn.model_selection import train_test_split

In [23]:
# Set the test size to 30% to get a 70-30% train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

In [24]:
from sklearn.preprocessing import StandardScaler

In [25]:
# Create the scaler, then fit it on the training features and transform them in one go
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)

In [26]:
from sklearn.linear_model import Lasso

In [27]:
# Create the Lasso model
la = Lasso()

In [28]:
# Fit it to the standardized training data
la.fit(X_train_std, y_train)

Lasso()

In [29]:
# Transform the test set with the pre-fitted scaler
X_test_std = scaler.transform(X_test)
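The scaler is fitted on the training features only and then reused on the test set, so no test-set statistics leak into training. scikit-learn's Pipeline can bundle the scaling and the model into one object. The following is a minimal sketch on synthetic data: make_regression stands in for the ANSUR features (an assumption for illustration), and alpha=1.0 matches the Lasso() default.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the body measurements (not the ANSUR data)
X_demo, y_demo = make_regression(n_samples=300, n_features=20, n_informative=5,
                                 noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=0)

# fit() scales X_tr with statistics from X_tr only, then fits Lasso on the result;
# score() reuses the already-fitted scaler on X_te before predicting.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
pipe.fit(X_tr, y_tr)

scaler_step = pipe.named_steps["standardscaler"]
print("Scaler means come from the training split:",
      np.allclose(scaler_step.mean_, X_tr.mean(axis=0)))
print("Test R^2:", round(pipe.score(X_te, y_te), 3))
```

The pipeline removes the risk of accidentally calling fit_transform on the test set.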

Now that you've trained the Lasso model, you'll score its predictive capacity (R²) on the test set.

In [30]:
# Calculate the coefficient of determination (R squared) on X_test_std
r_squared = la.score(X_test_std, y_test)
r_squared

0.8470200125853813

In [31]:
print("The model can predict {0:.1%} of the variance in the test set.".format(r_squared))

The model can predict 84.7% of the variance in the test set.
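For a regressor, score() returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it by hand on a few made-up values (illustrative numbers only, not model output) and compares it with sklearn.metrics.r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions, for illustration only
y_true = np.array([25.8, 25.1, 30.9, 29.0, 25.8])
y_pred = np.array([26.1, 24.7, 30.2, 28.5, 26.3])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```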


In [33]:
la.coef_

array([ 0.09198472, -0.        , -0.        ,  0.        , -0.        ,
        0.        , -0.        ,  0.        ,  0.26349677,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.30062148, -0.        , -0.        , -0.        ,
        0.06019966, -0.        ,  0.        ,  0.84028524,  0.        ,
       -0.        , -0.        ,  0.        , -0.        ,  0.        ,
        0.        , -0.        ,  0.        , -0.        ,  0.        ,
       -0.        , -0.        ,  0.        ,  0.        , -0.        ,
       -0.        ,  0.        ,  0.        , -0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        , -0.        ,  0.        ,  0.        ,  0.        ,
       -0.        , -0.        , -0.        , -0.        ,  0.        ,
        0.        ,  0.01276255,  0.        , -0.        , -0.        ,
       -0.        , -0.        ,  0.        , -0.        , -0.  

We'll count how many features are ignored because their coefficient is reduced to zero.

In [34]:
# Create a boolean array that is True wherever a coefficient equals 0
zero_coef = la.coef_ == 0

In [35]:
# Calculate how many features have a zero coefficient
n_ignored = sum(zero_coef)
n_ignored

84

In [36]:
print("The model has ignored {} out of {} features.".format(n_ignored, len(la.coef_)))

The model has ignored 84 out of 93 features.
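The boolean mask can also recover *which* features survive, as long as X is a DataFrame. Here is a minimal sketch on synthetic data; the column names feat_0 … feat_19 are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical column names
X_arr, y_arr = make_regression(n_samples=200, n_features=20, n_informative=4,
                               noise=1.0, random_state=1)
X_df = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(20)])

model = Lasso(alpha=1.0).fit(StandardScaler().fit_transform(X_df), y_arr)

zero_mask = model.coef_ == 0          # one boolean per feature
n_zero = int(np.sum(zero_mask))       # True counts as 1
kept = X_df.columns[~zero_mask]       # names of the features Lasso kept

print(f"{n_zero} of {len(model.coef_)} features ignored; kept: {list(kept)}")
```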


#### Adjusting the regularization strength

Your current Lasso model has an R² score of 84.7% on the test set. When a model applies overly strong regularization, it can suffer from high bias, which hurts its predictive power.

Let's improve the balance between predictive power and model simplicity by tweaking the alpha parameter.

In [77]:
# Find the highest alpha value with R-squared above 98% from the options: 1, 0.5, 0.1, and 0.01
la = Lasso(alpha=0.1, random_state=0)

In [78]:
# Fit the model to the standardized training data
la.fit(X_train_std, y_train)

Lasso(alpha=0.1, random_state=0)

In [79]:
r_squared = la.score(X_test_std, y_test)
r_squared

0.9915041489437504

In [80]:
n_ignored_features = sum(la.coef_ == 0)
n_ignored_features

75

In [81]:
# Print performance stats
print("The model can predict {0:.1%} of the variance in the test set.".format(r_squared))
print("{} out of {} features were ignored.".format(n_ignored_features, len(la.coef_)))

The model can predict 99.2% of the variance in the test set.
75 out of 93 features were ignored.


With this more appropriate regularization strength, the model explains 99.2% of the variance in BMI on the test set while ignoring 75 of the 93 features (about 80%).
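The manual search described in the In [77] comment (the highest alpha among 1, 0.5, 0.1 and 0.01 whose test R² exceeds 98%) can be written as a loop. This is a sketch on synthetic make_regression data (an assumption, not the ANSUR set), so the chosen alpha will differ from the one found above:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(n_samples=300, n_features=30, n_informative=6,
                                 noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=0)
scaler_demo = StandardScaler().fit(X_tr)
X_tr_std, X_te_std = scaler_demo.transform(X_tr), scaler_demo.transform(X_te)

results = {}
for alpha in [1, 0.5, 0.1, 0.01]:          # strongest regularization first
    model = Lasso(alpha=alpha).fit(X_tr_std, y_tr)
    r2 = model.score(X_te_std, y_te)
    n_zero = int((model.coef_ == 0).sum())
    results[alpha] = (r2, n_zero)
    print(f"alpha={alpha:<5} R^2={r2:.3f} ignored={n_zero}/{X_tr.shape[1]}")

# dicts keep insertion order, so the first passing alpha is the highest one
best_alpha = next((a for a, (r2, _) in results.items() if r2 > 0.98), None)
print("Chosen alpha:", best_alpha)
```

Trying the strongest alpha first favors the simplest model that still meets the accuracy target.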

#### Combining feature selectors
In the previous lesson we saw how Lasso models allow you to tweak the strength of regularization with the alpha parameter.

We set this alpha parameter manually to find a balance between removing as many features as possible and keeping the model accurate. However, finding a good alpha value by hand can be tedious. The good news is that there is a way to automate this.

In [224]:
X = df.drop('bicepscircumferenceflexed', axis=1)

In [225]:
y = df["bicepscircumferenceflexed"]

In [303]:
# Set the test size to 33% to get a 67-33% train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

In [304]:
# Fit the scaler on the training features and transform these in one go
X_train_std = scaler.fit_transform(X_train)

#### LassoCV regressor

The LassoCV() class uses cross-validation to try out different alpha settings and select the best one. When we fit this model to our training data, it stores the optimal value in its alpha_ attribute.
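To see how alpha_ is chosen, the sketch below fits LassoCV on synthetic data (make_regression is an assumption standing in for ANSUR) and recomputes the choice from mse_path_, which holds the validation MSE for every candidate alpha (rows) and CV fold (columns):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(n_samples=200, n_features=15, n_informative=5,
                                 noise=10.0, random_state=0)
X_demo_std = StandardScaler().fit_transform(X_demo)

lcv_demo = LassoCV(cv=5).fit(X_demo_std, y_demo)

# Average the error over folds, then take the alpha with the lowest mean error
mean_mse = lcv_demo.mse_path_.mean(axis=1)
alpha_from_path = lcv_demo.alphas_[np.argmin(mean_mse)]

print("alpha_ chosen by LassoCV:", lcv_demo.alpha_)
print("alpha with lowest mean CV error:", alpha_from_path)
```

After picking alpha_, LassoCV refits on the full training set, so coef_ and score() behave like a plain Lasso with that alpha.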

#### Creating a LassoCV regressor

You'll be predicting biceps circumference on a subsample of the male ANSUR dataset using the LassoCV() regressor, which automatically tunes the regularization strength (alpha value) using cross-validation.

In [305]:
from sklearn.linear_model import LassoCV

In [306]:
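# Create the LassoCV model; max_iter is raised above its default to give the solver more iterations to converge
lcv = LassoCV(max_iter=15000)

In [307]:
# Fit it on the standardized training set
lcv.fit(X_train_std, y_train)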

LassoCV(max_iter=15000)

In [308]:
print('Optimal alpha = {0:.3f}'.format(lcv.alpha_))

Optimal alpha = 0.030


In [309]:
# Transform the test set with the pre-fitted scaler
X_test_std = scaler.transform(X_test)

In [310]:
# Calculate R squared on the test set
r_squared = lcv.score(X_test_std, y_test)


In [311]:
print('The model explains {0:.1%} of the test set variance'.format(r_squared))

The model explains 87.7% of the test set variance


To actually remove the features to which the Lasso regressor assigned a zero coefficient, we once again create a mask with True values for all non-zero coefficients.

In [312]:
# Create a mask for coefficients not equal to zero
lcv_mask = lcv.coef_ != 0

In [313]:
print('{} features out of {} selected'.format(sum(lcv_mask), len(lcv_mask)))

81 features out of 93 selected


#### Ensemble models for extra votes

The LassoCV() model selected 81 out of 93 features. Not bad, but not a spectacular dimensionality reduction either. Let's use two more models, each wrapped in Recursive Feature Elimination (RFE), to select the 10 features they consider most important.

#### Feature selection with gradient boosting

The second model we train is a gradient boosting model. Like random forests, gradient boosting is an ensemble method that calculates feature importance values. We've wrapped a Recursive Feature Elimination (RFE) selector around the model so that it keeps 10 features, dropping the 3 least important ones at each step. We can then use the support_ attribute of the fitted RFE object to create gb_mask.
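The elimination loop RFE runs can be written out by hand. The sketch below uses LinearRegression instead of GradientBoostingRegressor (a swap made only so the result is fast and deterministic) and ranks features by |coef_|; for tree ensembles RFE ranks by feature_importances_ instead. The data is synthetic, and the manual support is compared with sklearn's RFE:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X_demo, y_demo = make_regression(n_samples=200, n_features=20, n_informative=8,
                                 noise=5.0, random_state=0)
n_to_select, step = 10, 3

# Manual elimination: refit, then drop the `step` least important remaining features
support = np.ones(X_demo.shape[1], dtype=bool)
fitted_sizes = []
while support.sum() > n_to_select:
    features = np.flatnonzero(support)
    fitted_sizes.append(len(features))
    est = LinearRegression().fit(X_demo[:, features], y_demo)
    importances = np.abs(est.coef_)
    n_drop = min(step, support.sum() - n_to_select)   # never overshoot the target
    support[features[np.argsort(importances)[:n_drop]]] = False

print("Fitted with", fitted_sizes, "features")   # mirrors RFE's verbose log

rfe_demo = RFE(LinearRegression(), n_features_to_select=n_to_select, step=step)
rfe_demo.fit(X_demo, y_demo)
print("Same selection as sklearn RFE:", np.array_equal(support, rfe_demo.support_))
```

The last round drops only as many features as needed to reach the target. This is why the verbose log in In [321] below stops at 12 features rather than 9.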

In [390]:
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingRegressor

In [391]:
# Select 10 features with RFE on a GradientBoostingRegressor, drop 3 features on each step
rfe_gb = RFE(estimator=GradientBoostingRegressor(), 
             n_features_to_select=10, step=3, verbose=1)

In [392]:
rfe_gb.fit(X_train, y_train)

RFE(estimator=GradientBoostingRegressor(), n_features_to_select=10, step=3,
    verbose=1)

In [393]:
# Calculate the R squared on the test set
r_squared = rfe_gb.score(X_test, y_test)
print('The model can explain {0:.1%} of the variance in the test set'.format(r_squared))

The model can explain 83.6% of the variance in the test set


In [318]:
# Assign the support array to gb_mask
gb_mask = rfe_gb.support_

#### Feature selection with random forest

The final model we train is a random forest regressor. Again, we wrap RFE around the model so that it keeps 10 features, dropping 3 at each step, and then use the support_ attribute of the fitted RFE object to create rf_mask.
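Besides support_, a fitted RFE exposes ranking_: kept features get rank 1, and a feature's rank grows the earlier it was eliminated. Here is a small sketch on synthetic data (sizes and n_estimators are assumptions chosen to keep it quick). With 12 features, step=3 and 4 to keep, there are three elimination rounds (12 → 9 → 6 → 4):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

X_demo, y_demo = make_regression(n_samples=150, n_features=12, n_informative=4,
                                 noise=5.0, random_state=0)

# random_state fixes the forest so repeated runs give the same selection
rfe_rf_demo = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
                  n_features_to_select=4, step=3).fit(X_demo, y_demo)

# support_ marks the kept features; ranking_ is 1 for them and larger for
# features eliminated in earlier rounds
print("support_:", rfe_rf_demo.support_)
print("ranking_:", rfe_rf_demo.ranking_)
```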

In [319]:
from sklearn.ensemble import RandomForestRegressor

In [320]:
# Select 10 features with RFE on a RandomForestRegressor, drop 3 features on each step
rfe_rf = RFE(estimator=RandomForestRegressor(), 
             n_features_to_select=10, step=3, verbose=1)

In [321]:
rfe_rf.fit(X_train, y_train)

Fitting estimator with 93 features.
Fitting estimator with 90 features.
Fitting estimator with 87 features.
Fitting estimator with 84 features.
Fitting estimator with 81 features.
Fitting estimator with 78 features.
Fitting estimator with 75 features.
Fitting estimator with 72 features.
Fitting estimator with 69 features.
Fitting estimator with 66 features.
Fitting estimator with 63 features.
Fitting estimator with 60 features.
Fitting estimator with 57 features.
Fitting estimator with 54 features.
Fitting estimator with 51 features.
Fitting estimator with 48 features.
Fitting estimator with 45 features.
Fitting estimator with 42 features.
Fitting estimator with 39 features.
Fitting estimator with 36 features.
Fitting estimator with 33 features.
Fitting estimator with 30 features.
Fitting estimator with 27 features.
Fitting estimator with 24 features.
Fitting estimator with 21 features.
Fitting estimator with 18 features.
Fitting estimator with 15 features.
Fitting estimator with 12 features.

RFE(estimator=RandomForestRegressor(), n_features_to_select=10, step=3,
    verbose=1)

In [322]:
# Calculate the R squared on the test set
r_squared = rfe_rf.score(X_test, y_test)
print('The model can explain {0:.1%} of the variance in the test set'.format(r_squared))


The model can explain 82.9% of the variance in the test set


In [323]:
# Assign the support array to rf_mask
rf_mask = rfe_rf.support_

#### Combining the feature selectors

Finally, we can start counting the votes on whether to select a feature. We use NumPy's sum() function, pass it the three masks in a list, and set the axis argument to 0 so that the votes are summed per feature rather than per model.

This gives an array with the number of votes each feature received. What we do with these votes depends on how conservative we want to be. If we want to make sure we don't lose any information, we could select all features with at least one vote.
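Here is the vote counting on three toy masks for six hypothetical features, along with how the kept set shrinks as the required number of votes rises:

```python
import numpy as np

# Toy masks for 6 hypothetical features (True = selected by that model)
lasso_mask = np.array([True, True, False, True, False, True])
gb_mask_demo = np.array([True, False, False, True, True, False])
rf_mask_demo = np.array([True, False, False, True, True, False])

# axis=0 sums down the stacked masks, giving one vote count per feature
votes_demo = np.sum([lasso_mask, gb_mask_demo, rf_mask_demo], axis=0)
print(votes_demo)                 # [3 1 0 3 2 1]

for min_votes in (1, 2, 3):
    print(min_votes, votes_demo >= min_votes)
```

A threshold of 1 is the most conservative choice (it keeps anything any model liked), while 3 keeps only unanimous picks.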

In [324]:
# Sum the votes of the three models
votes = np.sum([lcv_mask, rf_mask, gb_mask], axis=0)
print(votes)

[1 1 1 1 0 1 1 1 1 3 1 1 1 1 2 1 1 1 0 1 1 3 1 3 1 1 2 1 1 1 1 1 1 0 1 1 3
 3 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 3 1 1 0 1 1 1 0
 1 3 1 1 1 1 1 0 1 1 1 3 1 1 1 0 1 1 3]


In this example, we keep only the features that all 3 models voted for.

In [368]:
# Create a mask for features selected by all 3 models
meta_mask = votes == 3
print(meta_mask)

[False False False False False False False False False  True False False
 False False False False False False False False False  True False  True
 False False False False False False False False False False False False
  True  True False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True False False False False False
 False False False  True False False False False False False False False
 False  True False False False False False False  True]


All that is left now is to actually implement the dimensionality reduction. We can do this by applying the mask to our feature DataFrame X with the .loc indexer.
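Boolean column selection with .loc only works if the mask has one entry per column, in the same order as X.columns. Here meta_mask lines up with X because every mask was built from models trained on that X. A toy sketch with a hypothetical three-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-column frame and a boolean mask with one entry per column
X_toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
mask_toy = np.array([True, False, True])

X_toy_reduced = X_toy.loc[:, mask_toy]   # keep only columns where the mask is True
print(list(X_toy_reduced.columns))       # ['a', 'c']
print(list(X_toy.columns[mask_toy]))     # same names, selected from the column Index
```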

In [369]:
# Apply the dimensionality reduction on X
X_reduced = X.loc[:, meta_mask]
print(X_reduced.columns)

Index(['bideltoidbreadth', 'chestbreadth', 'chestdepth',
       'forearmcircumferenceflexed', 'forearmforearmbreadth',
       'shouldercircumference', 'thighcircumference', 'waistdepth', 'BMI'],
      dtype='object')


In [370]:
# Split the reduced dataset to train and evaluate a linear regression model
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.3, random_state=0)

In [371]:
from sklearn.linear_model import LinearRegression

In [372]:
lm = LinearRegression()

In [373]:
lm.fit(scaler.fit_transform(X_train), y_train)

LinearRegression()

In [374]:
r_squared = lm.score(scaler.transform(X_test), y_test)
print('The model can explain {0:.1%} of the variance in the test set using {1} features.'.format(r_squared, len(lm.coef_)))

The model can explain 84.7% of the variance in the test set using 9 features.
