# Feature Selection

**What is Feature Selection and its uses?**
- Feature selection is one of the core concepts in machine learning, and it strongly affects the performance of a model.
- The features that we use to train a machine learning model have a large influence on the performance that we can achieve.
- Irrelevant or partially relevant features can negatively impact the model.
- Feature selection and data cleaning are among the most important steps in model design.


**Benefits of Feature Selection:**
- Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise.
- Improves Accuracy: Less misleading data means model accuracy improves.
- Reduces Training time: Less data means algorithms train faster.

**Types of Feature Selection Algorithms:**
- There are two main types of Feature Selection Algorithms.
  - Wrapper Feature Selection Methods.
  - Filter Feature Selection Methods.
  

# Wrapper Feature Selection Methods

**What is meant by Wrapper Feature Selection?**
- It creates many models, each with a different subset of input features, and selects the features that result in the best-performing model according to a performance metric.
- These methods are unconcerned with the variable types, although they can be computationally expensive.
- **Recursive Feature Elimination (RFE)** is a good example of a wrapper feature selection method.

***Definition:***
- *Wrapper methods evaluate multiple models using procedures that add and/or remove predictors to find the optimal combination that maximizes model performance.*
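
The add/remove procedure described above can be sketched with scikit-learn's `SequentialFeatureSelector`, used here as one possible wrapper method (RFE, covered in the examples, is another). The synthetic dataset and the choices of 3 features and 5-fold cross-validation are assumptions made only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data: 8 features, only 3 of which drive the target (illustrative values).
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)

# Forward selection: start with no features and repeatedly add the one that
# most improves the cross-validated score of the wrapped model.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, y)

print(sfs.get_support())       # boolean mask over the 8 input columns
X_selected = sfs.transform(X)
print(X_selected.shape)
```

Every candidate subset requires refitting the model several times, which is why wrapper methods tend to be expensive compared with filter methods.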

# Filter Feature Selection Methods

**What is meant by Filter Feature Selection?**
- Filter feature selection uses statistical techniques to evaluate the relationship between each predictor and the response variable. These scores are then used as the basis for choosing (filtering) the predictors that will be used in the model.

**Definition:**
- *Filter methods evaluate the relevance of the predictors outside of the predictive models and subsequently model only the predictors that pass some criterion.*
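
As a minimal sketch of this idea, the snippet below scores each predictor against the response without fitting any model and keeps only those that pass a criterion. The synthetic data, the use of absolute Pearson correlation as the score, and the `threshold` value are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
# In this synthetic setup only columns 0 and 2 influence the response.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Score each predictor on its own: absolute Pearson correlation with y.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                   for j in range(X.shape[1])])

threshold = 0.3            # hypothetical cut-off
keep = scores > threshold  # the "criterion" each predictor must pass
X_filtered = X[:, keep]

print(scores.round(2))
print(keep)
```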

**Application of Filter Feature Selection:**
- It is common to use correlation-type statistical measures between input and output variables as the basis for filter feature selection.
- As such, the choice of statistical measure is highly dependent on the data types of the predictors.
- Data types are broadly either quantitative or qualitative. Quantitative predictors can be subdivided into integer and floating point, and qualitative predictors into boolean, ordinal, or nominal.
- Common input predictor data types:
  - Quantitative Predictors.
    - Integer Variables
    - Floating Point Variables
  - Qualitative Predictors.
    - Boolean Variables (dichotomous)
    - Ordinal Variables
    - Nominal Variables

## Statistics for Filter based Feature Selection

- There are two main groups of variables to consider: predictors and response variables.
- Predictors are the input variables. This is the set of variables that we wish to reduce in size.
- Response variables are the output variables, which the model is intended to predict.
- The type of response variable typically indicates the type of predictive modelling problem being performed.
  - Numerical Output: Regression predictive modelling problem.
  - Categorical Output: Classification predictive modelling problem.
- Statistical measures used in filter feature selection are generally calculated one input variable at a time against the response variable. These are referred to as univariate statistical measures.

$\text{Input Features}\rightarrow\begin{cases}
\text{Numerical Features}\rightarrow\text{Output Variable}
\begin{cases}
\text{Numerical}\begin{cases}
\text{Pearson's}\\
\text{Spearman's}
\end{cases}\\
\text{Categorical}\begin{cases}
\text{ANOVA}\\
\text{Kendall's}
\end{cases}
\end{cases}\\
\text{Categorical Features}\rightarrow\text{Output Variable}\begin{cases}
\text{Numerical}\begin{cases}
\text{ANOVA}\\
\text{Kendall's}
\end{cases}\\
\text{Categorical}\begin{cases}
\text{Chi-Squared}\\
\text{Mutual Information}
\end{cases}
\end{cases}
\end{cases}$

## Numerical Input, Numerical Output

- This is a regression predictive modeling problem with numerical input variables.
- The most common techniques use a correlation coefficient, such as Pearson's for a linear correlation, or a rank-based method for a non-linear correlation.
  - Pearson's Correlation Coefficient (Linear)
  - Spearman's Rank Coefficient (Non-Linear)
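
The difference between the two coefficients can be seen on a relationship that is monotonic but not linear. This is a small sketch on made-up data (`y = exp(x)`), using `scipy.stats.pearsonr` and `scipy.stats.spearmanr`.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)  # monotonic in x, but strongly non-linear

r, _ = pearsonr(x, y)     # measures linear association
rho, _ = spearmanr(x, y)  # measures monotonic association via ranks

print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because the ranks of `x` and `y` agree perfectly, Spearman's coefficient reaches 1, while Pearson's stays below 1 since the points do not lie on a straight line.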

## Numerical Input, Categorical Output

- This is a classification predictive modeling problem with numerical input variables.
- This might be the most common example of a classification problem.
- Again, the most common techniques are correlation based, although in this case they must take the categorical target into account.
  - ANOVA F-statistic (Linear) (ANalysis Of VAriance)
  - Kendall's rank coefficient (Non-Linear)
- Kendall does assume that the categorical variable is ordinal.
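
A minimal sketch of both measures on synthetic data follows. The three ordered classes, the noise levels, and the column layout are assumptions for illustration; `f_classif` comes from scikit-learn and `kendalltau` from SciPy.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)  # ordinal class labels (assumed ordering)

informative = y + rng.normal(scale=0.5, size=y.size)  # mean shifts with the class
noise = rng.normal(size=y.size)                       # unrelated to the class
X = np.column_stack([informative, noise])

# ANOVA F-statistic: compares between-class and within-class variance per feature.
F, p = f_classif(X, y)

# Kendall's tau: rank agreement between each feature and the ordinal target.
tau_inf, _ = kendalltau(X[:, 0], y)
tau_noise, _ = kendalltau(X[:, 1], y)

print(F.round(1), p.round(4))
print(round(tau_inf, 3), round(tau_noise, 3))
```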

## Categorical Input, Numerical Output

- This is a regression predictive modeling problem with categorical input variables.
- This is an unusual example of a regression problem.
- Nevertheless, you can use the same "Numerical Input, Categorical Output" methods, but in reverse.
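
One way to read "in reverse" is to group the numerical output by the levels of the categorical input and run a one-way ANOVA across the groups. The sketch below does this with `scipy.stats.f_oneway`; the `region` and `sales` columns and their offsets are hypothetical values invented for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical nominal predictor with three levels, 50 rows each.
df = pd.DataFrame({"region": np.repeat(["north", "south", "west"], 50)})

# Hypothetical numerical response whose mean depends on the level.
offsets = {"north": 0.0, "south": 1.0, "west": 3.0}
df["sales"] = df["region"].map(offsets) + rng.normal(scale=1.0, size=len(df))

# One-way ANOVA: does the mean of the response differ between levels?
groups = [g["sales"].to_numpy() for _, g in df.groupby("region")]
F, p = f_oneway(*groups)

print(f"F = {F:.1f}, p = {p:.3g}")
```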

## Categorical Input, Categorical Output

- This is a classification predictive modeling problem with categorical input variables.
- The most common correlation measure for categorical data is the chi-squared test. We can also use mutual information (information gain) from the field of information theory.
  - Chi-Squared test (contingency tables)
  - Mutual information
- Mutual information is a powerful method that may prove useful for both numerical and categorical data, i.e., it is agnostic to data types.
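
Both measures can be sketched on a small synthetic example. The `colour` and `unrelated` predictors and the class probabilities below are made up for illustration; the chi-squared test uses `scipy.stats.chi2_contingency` on a contingency table, and mutual information uses `sklearn.metrics.mutual_info_score` on integer-encoded labels.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n = 600

colour = rng.choice(["red", "green", "blue"], size=n)  # hypothetical nominal predictor
p_yes = np.where(colour == "red", 0.8, 0.2)            # target depends on colour
target = (rng.random(n) < p_yes).astype(int)
unrelated = rng.choice(["a", "b"], size=n)             # independent of the target

# Chi-squared test on the 3x2 contingency table of colour vs. target.
table = pd.crosstab(colour, target)
chi2_stat, p_value, dof, expected = chi2_contingency(table)

# Mutual information (in nats) between each encoded predictor and the target.
colour_codes, _ = pd.factorize(colour)
unrelated_codes, _ = pd.factorize(unrelated)
mi_colour = mutual_info_score(colour_codes, target)
mi_unrelated = mutual_info_score(unrelated_codes, target)

print(f"chi2 = {chi2_stat:.1f}, dof = {dof}, p = {p_value:.3g}")
print(f"MI(colour) = {mi_colour:.3f}, MI(unrelated) = {mi_unrelated:.3f}")
```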

# Tips and Tricks for Feature Selection

## Correlation Statistics

- The scikit-learn library provides an implementation of some of the most useful statistical measures.
- For example:
  - Pearson's correlation coefficient: [f_regression()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html)
  - ANOVA: [f_classif()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html)
  - Chi-Squared: [chi2()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html)
  - Mutual Information: [mutual_info_classif()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html) and [mutual_info_regression()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html)
- The SciPy library also provides implementations of Kendall's tau ([kendalltau](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html)) and Spearman's rank correlation ([spearmanr](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html)).
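
Note that `f_regression()` returns F-statistics and p-values rather than the correlation coefficient itself. With its default settings, the F-statistic is derived from Pearson's r as F = r² / (1 − r²) · (n − 2). The sketch below checks this relationship on a synthetic dataset (the dataset parameters are assumptions for illustration).

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression

X, y = make_regression(n_samples=100, n_features=3, n_informative=2,
                       noise=10.0, random_state=0)

# Univariate F-statistic and p-value for each feature.
F, p_values = f_regression(X, y)

# Recompute the same statistic from Pearson's r, one feature at a time.
n = X.shape[0]
r = np.array([pearsonr(X[:, j], y)[0] for j in range(X.shape[1])])
F_from_r = r**2 / (1.0 - r**2) * (n - 2)

print(np.allclose(F, F_from_r))
```

Ranking features by this F-statistic is therefore equivalent to ranking them by the absolute value of Pearson's r.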

## Selection Method

- The scikit-learn library also provides several filtering methods that can be applied once a statistic has been calculated for each input predictor against the response.
- The two most popular methods are:
  - Select the top K predictors: [SelectKBest](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)
  - Select the top percentile of predictors: [SelectPercentile](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html)
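
`SelectKBest` is used in the examples further down; `SelectPercentile` works the same way but takes a percentage instead of a count. A minimal sketch, assuming a synthetic classification dataset and a 25% cut-off chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           random_state=0)

# Keep the 25% of features with the highest ANOVA F-scores.
selector = SelectPercentile(score_func=f_classif, percentile=25)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)
print(selector.get_support(indices=True))  # column indices that were kept
```

With 20 input features, a 25% cut-off keeps 5 columns.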

## Transforming Predictors

- Consider transforming the predictors in order to access different statistical methods.
- For example, we can transform a categorical variable into an ordinal one, even if it is not, and see whether any interesting results are observed.
- We can also convert numerical variables into discrete ones (i.e., binning) and try categorical measures on them.
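
As a sketch of the binning idea, the snippet below cuts a numerical predictor into quartile bins with `pandas.qcut` and then applies a chi-squared test to the binned values. The `age` variable, the number of bins, and the way the target is generated are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 400

age = rng.uniform(18, 80, size=n)  # hypothetical numerical predictor
# Synthetic binary target that becomes more likely as age increases.
target = (rng.random(n) < (age - 18) / 62).astype(int)

# Discretize into 4 quantile-based bins, encoded as integer codes 0..3.
age_bin = pd.qcut(age, q=4, labels=False)

# The binned predictor can now be used with a categorical measure.
table = pd.crosstab(age_bin, target)
chi2_stat, p_value, dof, _ = chi2_contingency(table)

print(table)
print(f"chi2 = {chi2_stat:.1f}, dof = {dof}, p = {p_value:.3g}")
```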


# Examples

## Regression Feature Selection (Numerical Input, Numerical Output)

**This is an example of feature selection for a regression problem that has numerical inputs and a numerical output.**

**We will be using Pearson's Correlation Coefficient via f_regression()**

In [0]:
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression

# Generate a synthetic regression dataset with 10 features
X, y = make_regression(n_samples=100, n_features=10, n_informative=10)

# Define feature selection: keep the 8 features with the highest F-scores
fs = SelectKBest(score_func=f_regression, k=8)

In [0]:
X_selected = fs.fit_transform(X,y)

In [3]:
print(X_selected.shape)

(100, 8)


## Classification Feature Selection (Numerical Input, Categorical Output)

**This is an example of feature selection for a classification problem that has numerical inputs and a categorical output.**

**Feature selection is performed with ANOVA via f_classif()**

In [0]:
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif,SelectKBest

In [0]:
X, y = make_classification(n_samples=100,n_features=20,n_informative=5)

In [0]:
fs = SelectKBest(score_func=f_classif,k=5)

In [0]:
X_selected = fs.fit_transform(X,y)

In [8]:
print(X_selected.shape)

(100, 5)
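
The transformed array no longer says which of the original columns were kept. A small sketch of how a fitted selector can be inspected, using `get_support()` and `scores_` on a comparable synthetic dataset (`random_state=0` is added here only to make the run repeatable):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           random_state=0)
fs = SelectKBest(score_func=f_classif, k=5).fit(X, y)

scores = fs.scores_                           # one ANOVA F-score per input feature
mask = fs.get_support()                       # boolean mask of the kept features
selected_idx = fs.get_support(indices=True)   # the same information as column indices

print(selected_idx)
print(scores[selected_idx].round(2))
```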


## Recursive Feature Elimination

In [0]:
import pandas as pd
import numpy as np

In [0]:
adv = pd.read_csv('/content/drive/My Drive/Repos/Git/Statistics-Basics/An Introduction to Statistical Learning/Dataset/Advertising.csv')

In [0]:
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

In [12]:
adv.head(3)

|   | Unnamed: 0 | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |


In [0]:
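# Drop the leftover index column; the positional axis argument is not accepted by recent pandas versions
adv.drop(columns='Unnamed: 0', inplace=True)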

In [0]:
X = adv.drop(columns='Sales')
y = adv.Sales

In [0]:
model = LinearRegression()
rfe = RFE(estimator=model,n_features_to_select=2)
fit = rfe.fit(X,y)


In [16]:
fit.ranking_

array([1, 1, 2])

In [17]:
X.head()

|   | TV | Radio | Newspaper |
|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 |
| 1 | 44.5 | 39.3 | 45.1 |
| 2 | 17.2 | 45.9 | 69.3 |
| 3 | 151.5 | 41.3 | 58.5 |
| 4 | 180.8 | 10.8 | 58.4 |


In [18]:
fit.support_

array([ True,  True, False])

**Observation:**
- The first 2 of the 3 predictors are selected.
- The 2 selected features are TV and Radio.
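
Rather than reading the boolean array by eye, the mask can be mapped back to column names. The sketch below uses a small synthetic DataFrame as a stand-in, since the Advertising CSV is not bundled here; the column names `a`, `b`, `c` and the coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: y depends on "a" and "b" but not on "c".
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y = 3.0 * X["a"] + 2.0 * X["b"] + rng.normal(scale=0.1, size=200)

rfe = RFE(estimator=LinearRegression(), n_features_to_select=2).fit(X, y)

selected = X.columns[rfe.support_].tolist()    # names of the kept predictors
ranking = dict(zip(X.columns, rfe.ranking_))   # rank 1 = selected

print(selected)
print(ranking)
```

With a linear model, RFE ranks features by the magnitude of their coefficients, so predictors measured on very different scales may need standardizing before the ranking is meaningful.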