# Feature selection

## Why is feature selection important?
Removing noisy features helps with the **memory use, computational cost and accuracy of the model**. Also:
1. Garbage in, garbage out
2. Curse of Dimensionality: Adding dimensions can, in theory, add information and so improve the quality of the data, but in practice it often adds noise and redundancy that make the analysis harder.
3. Occam's Razor: With all else being equal, simpler solutions to problems are preferred over more complex ones.

Seven feature selection techniques are:
1. Domain knowledge
2. Missing values
3. Correlation with the target class label
4. Correlation between the features: to avoid multicollinearity
5. Dimension reduction techniques (such as PCA)
6. Forward (or backward) feature selection
7. Feature importance
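
Techniques 2 to 4 in the list above are simple filters that can be sketched with pandas alone. The example below is a minimal illustration on synthetic data, not part of the original notebook: the column names, the 50% missing-value threshold and the 0.95 correlation cutoff are all assumptions chosen for the demo.

```python
import numpy as np
import pandas as pd

# Synthetic data, purely for illustration (hypothetical column names)
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.01, size=n),  # near-duplicate of "a"
    "b": b,
    "noise": rng.normal(size=n),                   # unrelated to the target
    "mostly_missing": np.where(rng.random(n) < 0.8, np.nan, 1.0),
})
target = pd.Series(3 * a + 2 * b + rng.normal(scale=0.1, size=n), name="target")

# Technique 2: drop columns whose share of missing values is too high
# (the 0.5 threshold is an assumption)
missing_share = demo.isna().mean()
kept = demo.loc[:, missing_share <= 0.5]

# Technique 3: rank the remaining features by absolute correlation with the target
target_corr = kept.corrwith(target).abs().sort_values(ascending=False)
print(target_corr)

# Technique 4: flag features that are highly correlated with an earlier feature
# (the 0.95 cutoff is an assumption); only the upper triangle is inspected
# so that each pair is checked once
corr = kept.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [col for col in upper.columns if (upper[col] > 0.95).any()]
print("Redundant features:", redundant)
```

Which member of a correlated pair gets dropped here depends only on column order; in practice domain knowledge (technique 1) is usually a better tie-breaker.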

## Implementation

scikit-learn provides an RFE (Recursive Feature Elimination) implementation.
Some important parameters of the RFE class:
1. **estimator**: a supervised learning estimator that exposes feature importances after fitting
2. **n_features_to_select**: the final number of features to select; if None, half of the features are selected
3. **step**: the number of features to remove at each iteration (a float between 0 and 1 is read as a fraction of the features)
4. **importance_getter**: if 'auto', the feature importance is taken from the estimator's coef_ or feature_importances_ attribute
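
To make these parameters concrete, here is a small sketch on synthetic data generated with make_regression (the data set and the variable names are assumptions for illustration only). It shows the default of keeping half of the features, and a tree-based estimator whose importances come from feature_importances_ rather than coef_.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression problem with 10 features (illustrative only)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# n_features_to_select left at None: half of the 10 features are kept
half = RFE(estimator=LinearRegression()).fit(X_demo, y_demo)
print(half.n_features_)

# step=2 removes up to two features per iteration; the random forest has no
# coef_, so the importances are read from feature_importances_
forest = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=3,
    step=2,
    importance_getter="feature_importances_",
).fit(X_demo, y_demo)
print(forest.support_.sum())
```

A larger step makes the search faster but coarser, because several features are discarded on the basis of a single fit.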

In [2]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
import pandas as pd

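# Read in the data and drop the index column left over from the CSV export
df = pd.read_csv('data/Advertising.csv')
df = df.drop(['Unnamed: 0'], axis=1)
df.info()

# Split into the feature matrix and the target
X = df.drop(['sales'], axis=1)
y = df['sales']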

rfe_selector = RFE(estimator=LinearRegression(), n_features_to_select=2, step=1, importance_getter='auto')
rfe_selector.fit(X, y)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   radio      200 non-null    float64
 2   newspaper  200 non-null    float64
 3   sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


RFE(estimator=LinearRegression(), n_features_to_select=2)

In [4]:
# Boolean mask of the selected features (True = selected)
print(rfe_selector.support_)
# Rank of each feature: selected features have rank 1; eliminated features
# are ranked 2, 3, ... with higher ranks eliminated earlier
print(rfe_selector.ranking_)

# Get a boolean mask of the selected features (pass indices=True for integer indices)
rfe_support = rfe_selector.get_support()
rfe_feature = X.loc[:,rfe_support].columns.tolist()

# print selected features
print('The selected features are', rfe_feature)

[ True  True False]
[1 1 2]
The selected features are ['TV', 'radio']
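
Once fitted, the selector can also be used to reduce the feature matrix itself. The sketch below is an assumption-laden illustration rather than a continuation of the notebook: it builds a small synthetic data set (hypothetical columns f0, f1, f2, with f2 unrelated to the target) so that it runs without the Advertising CSV.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the advertising data (hypothetical columns)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f0", "f1", "f2"])
y_demo = 5 * X_demo["f0"] + 3 * X_demo["f1"] + rng.normal(scale=0.1, size=200)

selector = RFE(estimator=LinearRegression(), n_features_to_select=2)
selector.fit(X_demo, y_demo)

# Names of the selected columns, taken from the boolean mask
selected = X_demo.columns[selector.support_].tolist()

# transform() keeps only the selected columns; wrapping the result in a
# DataFrame restores the column names
X_reduced = pd.DataFrame(
    selector.transform(X_demo), columns=selected, index=X_demo.index
)
print(selected)
print(X_reduced.shape)
```

With a linear estimator the ranking is based on coefficient magnitudes, so features measured on very different scales may need standardizing before the ranking is meaningful.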
