# Embedded Feature Selection

## Types

1. Selecting features from coefficients/weights (linear models).
2. Selecting features from tree models (e.g. random forests, decision trees).

### Selecting features from coefficients/weights (Linear models).

* Regularization adds a penalty on a model's parameters to limit the model's freedom and reduce overfitting.
* In a linear model, the penalty is applied to the coefficients that multiply each predictor. Lasso (L1) regularization can shrink some of those coefficients to exactly zero.
* A feature whose coefficient is zero contributes nothing to the prediction, so the model can do without it.
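The shrinkage described above can be sketched on synthetic data. This is a minimal illustration, not part of the notebook's dataset: the data from `make_regression`, the helper `count_zero_coefs`, and the chosen alpha values are all assumptions made for the example. It relies on the fact that scikit-learn's `Lasso` minimizes `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`, so there is a smallest alpha (`alpha_max` below) at which every coefficient is zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): 10 features, 3 informative
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)


def count_zero_coefs(alpha):
    """Fit a Lasso model and count coefficients that are exactly zero."""
    model = Lasso(alpha=alpha).fit(X_scaled, y)
    return int(np.sum(model.coef_ == 0))


# Smallest alpha for which all coefficients are zero (X_scaled is centered)
alpha_max = np.max(np.abs(X_scaled.T @ (y - y.mean()))) / len(y)

for alpha in (0.01 * alpha_max, 0.5 * alpha_max, 1.01 * alpha_max):
    n_zero = count_zero_coefs(alpha)
    print(f"alpha={alpha:.3f} -> zero coefficients: {n_zero} of {X.shape[1]}")
```

A weak penalty keeps at least some coefficients non-zero, while a penalty just above `alpha_max` removes every feature.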

In [1]:
import numpy as np
import pandas as pd

# pandas settings
pd.options.display.max_rows = 1_000
pd.options.display.max_columns = 1_000
pd.options.display.max_colwidth = 1_000

from src.data_manager import load_data, split_data

%load_ext lab_black

In [2]:
# Load data
fp = "../../data/student-por.csv"
data = load_data(filename=fp, sep=";")

data.head(3)

Shape of data: (649, 33)

Duration: 0.01 seconds


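| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | reason | guardian | traveltime | studytime | failures | schoolsup | famsup | paid | activities | nursery | higher | internet | romantic | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | course | mother | 2 | 2 | 0 | yes | no | no | no | yes | yes | no | no | 4 | 3 | 4 | 1 | 1 | 3 | 4 | 0 | 11 | 11 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | course | father | 1 | 2 | 0 | no | yes | no | no | no | yes | yes | no | 5 | 3 | 3 | 1 | 1 | 3 | 2 | 9 | 11 | 11 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | other | mother | 1 | 2 | 0 | yes | no | no | no | yes | yes | yes | no | 4 | 3 | 2 | 2 | 3 | 3 | 6 | 12 | 13 | 12 |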


In [3]:
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso, LinearRegression
from sklearn import metrics

In [4]:
# Select the numeric variables (they require little preprocessing)
num_df = data.select_dtypes(include=[int, float])

# G1 and G2 are highly correlated with the target (G3): average them into one feature
grades = ["G1", "G2"]
num_df = num_df.assign(prev_grades=num_df[grades].mean(axis=1))
num_df = num_df.drop(columns=["G1", "G2"])
num_df.head()

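| | age | Medu | Fedu | traveltime | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences | G3 | prev_grades |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 4 | 4 | 2 | 2 | 0 | 4 | 3 | 4 | 1 | 1 | 3 | 4 | 11 | 5.5 |
| 1 | 17 | 1 | 1 | 1 | 2 | 0 | 5 | 3 | 3 | 1 | 1 | 3 | 2 | 11 | 10.0 |
| 2 | 15 | 1 | 1 | 1 | 2 | 0 | 4 | 3 | 2 | 2 | 3 | 3 | 6 | 12 | 12.5 |
| 3 | 15 | 4 | 2 | 1 | 3 | 0 | 3 | 2 | 2 | 1 | 1 | 5 | 0 | 14 | 14.0 |
| 4 | 16 | 3 | 3 | 1 | 2 | 0 | 4 | 3 | 2 | 1 | 2 | 5 | 0 | 13 | 12.0 |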


In [5]:
RANDOM_STATE = 123
TARGET = "G3"
TEST_SIZE = 0.2

X_train, X_validation, y_train, y_validation = split_data(
    data=num_df, target=TARGET, random_state=RANDOM_STATE, test_size=TEST_SIZE
)

X_train.shape, X_validation.shape

Shape of X_train: (519, 14), 
Shape of X_validation: (130, 14)
Duration: 0.004 seconds


((519, 14), (130, 14))

In [6]:
scaler = StandardScaler()
scaler.fit(X_train)

# ALPHA multiplies the L1 term and controls the regularization strength.
# It must be a non-negative float: the higher the value, the stronger the regularization.
# It is a hyperparameter and should be tuned.
ALPHA = 0.08
# Transformer for selecting features based on importance weights
selector = SelectFromModel(Lasso(alpha=ALPHA, random_state=RANDOM_STATE))
selector.fit(scaler.transform(X_train), y_train)

In [7]:
# List of the selected features
selected_feat = X_train.columns[(selector.get_support())]

print(f"Total features: {X_train.shape[1]}")
print(f"Selected features: {len(selected_feat)}")

selected_feat

Total features: 14
Selected features: 5


Index(['age', 'failures', 'health', 'absences', 'prev_grades'], dtype='object')

In [8]:
shrank_features = np.sum(selector.estimator_.coef_ == 0)

print(f"Features with coefficients that shrank to zero: {shrank_features}")

Features with coefficients that shrank to zero: 9


#### Compare The Two Models

* A linear regression model built with all the numeric features.
* A linear regression model built with only the selected features.

In [9]:
# All Features
scaler = StandardScaler()
all_feats_model = LinearRegression()
all_feats_model.fit(scaler.fit_transform(X_train), y_train)
y_t_pred = all_feats_model.predict(scaler.transform(X_train))
y_pred = all_feats_model.predict(scaler.transform(X_validation))
all_feats_metric_train = metrics.r2_score(y_true=y_train, y_pred=y_t_pred)
all_feats_metric_test = metrics.r2_score(y_true=y_validation, y_pred=y_pred)

# Selected features
scaler = StandardScaler()
X_train_1, X_validation_1 = X_train[selected_feat], X_validation[selected_feat]
selected_feats_model = LinearRegression()
selected_feats_model.fit(scaler.fit_transform(X_train_1), y_train)
y_t_pred_1 = selected_feats_model.predict(scaler.transform(X_train_1))
y_pred_1 = selected_feats_model.predict(scaler.transform(X_validation_1))
selected_feats_metric_train = metrics.r2_score(y_true=y_train, y_pred=y_t_pred_1)
selected_feats_metric_test = metrics.r2_score(y_true=y_validation, y_pred=y_pred_1)

# Compare models (the metric computed above is R2, not MSE)
print(
    f"Number of features: {X_train.shape[1]}\n"
    f"All features Train R2: {all_feats_metric_train}\n"
    f"All features Test R2: {all_feats_metric_test}\n\n"
    f"Number of features: {X_train_1.shape[1]}\n"
    f"Selected features Train R2: {selected_feats_metric_train}\n"
    f"Selected features Test R2: {selected_feats_metric_test}"
)

Number of features: 14
All features Train R2: 0.8184574701253668
All features Test R2: 0.8494629140432887

Number of features: 5
Selected features Train R2: 0.8168529390173203
Selected features Test R2: 0.8470094336233717
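The scale, select, and fit steps of the comparison above could also be bundled into a single scikit-learn `Pipeline`, so the scaler and selector are always fitted on the training split only. The following is a sketch under stated assumptions: it uses synthetic data from `make_regression` and `train_test_split` rather than the notebook's `load_data` and `split_data` helpers, and the `alpha` value is arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student data (assumption for illustration)
X, y = make_regression(
    n_samples=300, n_features=14, n_informative=5, noise=10.0, random_state=123
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=123
)

pipe = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        # Lasso is used only to decide which features to keep
        ("selector", SelectFromModel(Lasso(alpha=1.0, random_state=123))),
        # The final model is refitted on the retained features
        ("model", LinearRegression()),
    ]
)
pipe.fit(X_tr, y_tr)

n_selected = int(pipe.named_steps["selector"].get_support().sum())
train_r2 = pipe.score(X_tr, y_tr)  # R2 for regressors
val_r2 = pipe.score(X_val, y_val)
print(f"Selected features: {n_selected} of {X.shape[1]}")
print(f"Train R2: {train_r2:.3f}, validation R2: {val_r2:.3f}")
```

Keeping every step inside one pipeline object also makes it straightforward to tune `alpha` with cross-validation later, since the whole chain is refitted per fold.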


<br><hr>

### Classification

In [10]:
fp = "../../data/titanic_train.csv"
data = load_data(filename=fp)

data.head()

Shape of data: (891, 12)

Duration: 0.009 seconds


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


In [11]:
# Select the numeric variables (they require little preprocessing)
num_df = data.select_dtypes(include=[int, float])
num_df = num_df.dropna()
num_df.head()

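| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 22.0 | 1 | 0 | 7.25 |
| 1 | 2 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 |
| 2 | 3 | 1 | 3 | 26.0 | 0 | 0 | 7.925 |
| 3 | 4 | 1 | 1 | 35.0 | 1 | 0 | 53.1 |
| 4 | 5 | 0 | 3 | 35.0 | 0 | 0 | 8.05 |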


In [12]:
from sklearn.linear_model import LogisticRegression

TARGET = "Survived"

X_train, X_validation, y_train, y_validation = split_data(
    data=num_df, target=TARGET, random_state=RANDOM_STATE, test_size=TEST_SIZE
)

X_train.shape, X_validation.shape

Shape of X_train: (571, 6), 
Shape of X_validation: (143, 6)
Duration: 0.005 seconds


((571, 6), (143, 6))

In [13]:
# Inverse of regularization strength; must be a positive float.
# Smaller values specify stronger regularization.
C = 0.025
scaler = StandardScaler()

log_model = LogisticRegression(
    penalty="l1", C=C, solver="liblinear", random_state=RANDOM_STATE
)
selector = SelectFromModel(log_model)
selector.fit(scaler.fit_transform(X_train), y_train)

# get_support() returns a boolean mask over the input features.
# True means that the feature was selected.
selector.get_support()

array([False,  True, False, False, False,  True])

In [14]:
# List of the selected features
selected_feat = X_train.columns[(selector.get_support())]
print(f"Total features: {X_train.shape[1]}")
print(f"Selected features: {len(selected_feat)}")

selected_feat

Total features: 6
Selected features: 2


Index(['Pclass', 'Fare'], dtype='object')
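The boolean mask returned by `get_support()` can be applied by hand, as in the cell above, or the fitted selector can reduce the data directly through `transform()`. The sketch below checks that both routes give the same array. It uses synthetic data from `make_classification` and an arbitrary `C`; both are assumptions for illustration, not the Titanic setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption for illustration)
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0, random_state=123
)
X_scaled = StandardScaler().fit_transform(X)

selector = SelectFromModel(
    LogisticRegression(penalty="l1", C=0.05, solver="liblinear", random_state=123)
).fit(X_scaled, y)

mask = selector.get_support()  # boolean array, one entry per feature
X_by_mask = X_scaled[:, mask]  # manual column selection
X_by_transform = selector.transform(X_scaled)  # same selection via the API

print(f"Mask: {mask}")
print(f"Same result: {np.array_equal(X_by_mask, X_by_transform)}")
```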

### Comment

* The stronger the penalty, the more features are dropped. For `Lasso` that means a larger `alpha`; for `LogisticRegression` it means a smaller `C`.
* Monitor the performance of the final model while tuning the penalty: a penalty that is too strong eliminates useful features, while one that is too weak retains unnecessary ones.
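One way to follow the advice above is to sweep the penalty and record, for each value, how many features survive and how the final model scores on held-out data. The sketch below does this for L1-penalized logistic regression. The synthetic data, the grid of `C` values, and the explicit `threshold=1e-5` are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption for illustration)
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=4, n_redundant=2, random_state=123
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)

results = []  # tuples of (C, number of selected features, validation accuracy)
for C in (0.001, 0.01, 0.1, 1.0):  # smaller C means a stronger penalty
    l1_model = LogisticRegression(
        penalty="l1", C=C, solver="liblinear", random_state=123
    )
    # Keep features whose absolute coefficient is at least 1e-5
    selector = SelectFromModel(l1_model, threshold=1e-5).fit(X_tr_s, y_tr)
    mask = selector.get_support()
    if not mask.any():
        # Penalty so strong that no feature survives: nothing to fit
        results.append((C, 0, None))
        continue
    final_model = LogisticRegression().fit(X_tr_s[:, mask], y_tr)
    accuracy = final_model.score(X_val_s[:, mask], y_val)
    results.append((C, int(mask.sum()), accuracy))

for C, n_selected, accuracy in results:
    score = "n/a" if accuracy is None else f"{accuracy:.3f}"
    print(f"C={C:<6} selected={n_selected:>2} validation accuracy={score}")
```

In practice the same sweep would be run with cross-validation rather than a single validation split, so the chosen penalty is less sensitive to how the data happened to be divided.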