# Feature Selection API

1. Filter Based
2. Wrapper Based

## Filter Based Feature Selection

#### VarianceThreshold, SelectKBest, SelectPercentile, GenericUnivariateSelect

In [3]:
# Some important Feature selectors:

from sklearn.feature_selection import VarianceThreshold, SelectKBest, SelectPercentile, GenericUnivariateSelect, RFE, RFECV, SelectFromModel, SequentialFeatureSelector

# RFE: Recursive feature elimination
# RFECV: Recursive feature elimination with CV

# Tree-based and kernel-based feature selectors also exist; they are not covered here.

### Variance Threshold

Removes from the input feature matrix every feature whose variance is below a user-specified threshold.

In [4]:
# Within a single feature, many values may be identical or very close to each other; such a low-variance feature carries little information for training.

# By default (threshold=0) only zero-variance features are removed, i.e. features with the same value in every sample.
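
As a rough sketch of a non-default threshold, the snippet below uses a made-up matrix; `X_demo` and the value `0.1` are illustrative assumptions, not part of the original notebook. The fitted `variances_` attribute shows the per-feature variances the selector compares against the threshold.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: column 0 is constant, column 1 barely varies, column 2 varies a lot
X_demo = np.array([[0, 2.0, 1.0],
                   [0, 2.1, 5.0],
                   [0, 1.9, 9.0]])

# Keep only the features whose variance exceeds 0.1
vt_demo = VarianceThreshold(threshold=0.1)
X_reduced = vt_demo.fit_transform(X_demo)

print(vt_demo.variances_)     # variance of each input feature
print(vt_demo.get_support())  # boolean mask of the features that were kept
print(X_reduced.shape)
```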


### Univariate Feature selection

#### SelectKBest    |    SelectPercentile   | GenericUnivariateSelect

 SelectKBest    |    SelectPercentile   | GenericUnivariateSelect
 --|---|---
 Removes all but the `k highest scoring features` | Removes all but user specified `highest scoring percentage` of features | Performs Univariate Feature Selection with `configurable strategy` which can be found via `hyper-parameter search`

### Common univariate statistics tests

SelectFpr selects features based on false positive rate test

SelectFdr selects features based on estimated false discovery rate
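
A minimal sketch of both selectors, assuming the iris dataset padded with 20 random noise columns; the dataset, the noise columns, the `f_classif` scoring function, and `alpha=0.01` are illustrative choices, not from the notebook. Both selectors keep the features whose p-value passes their respective test.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFpr, SelectFdr, f_classif

X_iris, y_iris = load_iris(return_X_y=True)

# Append 20 pure-noise columns so that there is something to filter out
rng = np.random.RandomState(0)
X_noisy = np.hstack([X_iris, rng.normal(size=(X_iris.shape[0], 20))])

# alpha is the highest p-value (FPR) or the target false discovery rate (FDR)
fpr = SelectFpr(f_classif, alpha=0.01).fit(X_noisy, y_iris)
fdr = SelectFdr(f_classif, alpha=0.01).fit(X_noisy, y_iris)

print("SelectFpr kept:", fpr.get_support().sum(), "features")
print("SelectFdr kept:", fdr.get_support().sum(), "features")
```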

### Univariate Scoring Function

Each selector needs a scoring function to score the features.

There are three families of scoring functions:

 1. Mutual information (MI)
 2. Chi-square
 3. F-statistic

MI and the F-statistic can be used in both classification and regression problems:

  Problem | MI | F-statistic
  --------|----|----------
  Regression | `mutual_info_regression` | `f_regression`
  Classification | `mutual_info_classif` | `f_classif`


Chi-square can be used only in classification problems:

`chi2`

#### Mutual Information

  1. Measures `dependency` between two variables
  2. Returns a `non-negative` value
  3. `MI = 0` for independent variables
  4. The higher the MI, the stronger the dependency

#### Chi-Square

1. Measures `dependency` between two variables
2. Computes chi-square between `non-negative` features (boolean/frequency) and `class label`
3. A higher value indicates a stronger dependence between the feature and the class label
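
The scoring functions can also be called directly, which makes it easier to see what the selectors rank. The sketch below assumes the iris dataset (chosen only because its features are non-negative, as `chi2` requires); it is not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X_iris, y_iris = load_iris(return_X_y=True)

# chi2 and f_classif return a (scores, p-values) pair, one entry per feature
chi2_scores, chi2_pvalues = chi2(X_iris, y_iris)
f_scores, f_pvalues = f_classif(X_iris, y_iris)

# mutual_info_classif returns only the estimated MI per feature (always >= 0)
mi_scores = mutual_info_classif(X_iris, y_iris, random_state=0)

print("chi2:", chi2_scores)
print("F   :", f_scores)
print("MI  :", mi_scores)
```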

### Illustration

In [6]:
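# Select the 20 best features using the chi2 score
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

# Assumption: X and y are undefined in the notebook; digits (64 non-negative features) stands in
X, y = load_digits(return_X_y=True)
skb = SelectKBest(chi2, k=20)
X_new = skb.fit_transform(X, y)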

In [7]:
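# Select the top 20% of features using the chi2 score
from sklearn.feature_selection import SelectPercentile, chi2

# X: non-negative feature matrix, y: class labels (assumed to be defined already)
sp = SelectPercentile(chi2, percentile=20)
X_new = sp.fit_transform(X, y)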

In [10]:
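# Generic Univariate Select
from sklearn.feature_selection import GenericUnivariateSelect, chi2

# Select the 20 best features using the chi2 score (mode='k_best')
# X: non-negative feature matrix, y: class labels (assumed to be defined already)
transformer = GenericUnivariateSelect(chi2, mode='k_best', param=20)
X_new = transformer.fit_transform(X, y)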

#### Available modes

1. percentile (default)
2. k_best
3. fpr
4. fdr
5. fwe

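`param` takes the value that matches the chosen mode: the number of features for `k_best`, the percentage for `percentile`, and the significance level `alpha` for `fpr`, `fdr` and `fwe`.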
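
To illustrate how `param` changes meaning with `mode`, here is a small sketch on the iris dataset; the dataset and the specific `(mode, param)` pairs are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import GenericUnivariateSelect, chi2

X_iris, y_iris = load_iris(return_X_y=True)

# (mode, param) pairs: a feature count, a percentage, and a p-value cut-off
settings = [("k_best", 2), ("percentile", 50), ("fpr", 0.05)]

n_selected = {}
for mode, param in settings:
    selector = GenericUnivariateSelect(chi2, mode=mode, param=param)
    n_selected[mode] = selector.fit_transform(X_iris, y_iris).shape[1]

print(n_selected)  # number of features kept under each strategy
```

Because the strategy is just a pair of constructor arguments, `mode` and `param` can be tuned like any other hyper-parameters.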

In [13]:
from sklearn.feature_selection import VarianceThreshold
import numpy as np
 
X = np.array([[1, 1, 1], [1,3,4], [1,2,4]])
print(X)
vt = VarianceThreshold()
vt.fit_transform(X)

[[1 1 1]
 [1 3 4]
 [1 2 4]]


array([[1, 1],
       [3, 4],
       [2, 4]])

**Do not use a regression scoring function with a classification problem. It will lead to useless results.**

## Wrapper Based Feature Selection:

#### RFE, RFECV, SelectFromModel, SequentialFeatureSelector


Filter | Wrapper
---|---
Uses a `scoring function` | Uses an `estimator`

### RFE: Recursive Feature Elimination

RFE uses an estimator to remove features recursively.

It first fits the estimator on all features.

It then obtains the feature importances from the estimator and removes the least important feature.

It repeats the process, removing features one by one, until the desired number of features remains.
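
The procedure can be sketched as follows. The synthetic dataset from `make_classification`, the `LogisticRegression` estimator, and the target of 3 features are illustrative assumptions, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 8 features, of which 3 are informative
X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

# step=1 removes one feature per iteration until 3 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=3, step=1)
rfe.fit(X_demo, y_demo)

print(rfe.support_)   # True for the selected features
print(rfe.ranking_)   # 1 for selected features, larger values were dropped earlier
X_rfe = rfe.transform(X_demo)
```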

### RFECV: Recursive Feature Elimination with CV

Use `RFECV` when we do not want to specify the desired number of features.

It performs `RFE` in a cross-validation loop to find the optimal number of features.
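
A minimal sketch, again on synthetic data; the dataset, the estimator, `cv=5`, and the accuracy scoring are assumptions made for illustration. After fitting, `n_features_` holds the number of features that gave the best cross-validated score.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 8 features, of which 3 are informative
X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

# No target number of features is given; 5-fold CV decides how many to keep
rfecv = RFECV(estimator=LogisticRegression(max_iter=1000),
              step=1, cv=5, scoring="accuracy")
rfecv.fit(X_demo, y_demo)

print("Optimal number of features:", rfecv.n_features_)
print(rfecv.support_)
```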

### SelectFromModel 

Selects the features whose importance, as obtained from the trained estimator, is above a given threshold; `max_features` caps the number of features that can be selected.

The feature importance is read from the trained estimator via `coef_`, `feature_importances_`, or an `importance_getter` callable.

The feature importance threshold can be specified:
1. numerically
2. as a string: the built-in `mean` or `median`, or a scaled version such as `0.1*mean`

In [15]:
# Use a linear support vector classifier to obtain feature coefficients for SelectFromModel
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

# X: feature matrix, y: class labels (assumed to be defined already)
# penalty='l1' applies L1 regularization; the regularization strength is the upper-case C
clf = LinearSVC(C=0.01, penalty='l1', dual=False)
clf = clf.fit(X, y)
clf.coef_

model = SelectFromModel(clf, prefit=True)
X_new = model.transform(X)

# With the L1 regularizer, this ends up selecting the features with non-zero coefficients
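
The `threshold` and `max_features` options can be combined. The sketch below assumes a random forest on synthetic data (both are illustrative choices, not from the notebook): features must reach the median importance, and at most 3 of them are kept.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical data: 10 features, of which 3 are informative
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Importance comes from feature_importances_; threshold is given as a string
selector = SelectFromModel(forest, threshold="median", max_features=3)
X_selected = selector.fit_transform(X_demo, y_demo)

print(selector.get_support())
print(X_selected.shape)
```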

### Sequential Feature Selection

Performs feature selection by adding or removing features one at a time in a greedy manner.

It uses one of the two approaches below:

Forward Selection | Backward Selection
------|-------
Starts with zero features and `adds` features one by one until the desired number is reached | Starts with all features and `removes` features one by one until the desired number is reached


Forward and backward selection do not, in general, yield the same results.

Choose the direction that needs fewer iterations: forward selection when only a few features are wanted, backward selection when most features are kept.


Unlike RFE, SFS does not need the estimator to expose `coef_` or `feature_importances_` attributes.

SFS may be slower than RFE and SelectFromModel, because it evaluates more models.
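
Both directions can be sketched on the iris dataset. The dataset, the k-nearest-neighbours estimator (which exposes neither `coef_` nor `feature_importances_`), and the target of 2 features are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward: start from no features and add the best one at each step
forward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction="forward", cv=5)
forward.fit(X_iris, y_iris)

# Backward: start from all features and drop the least useful one at each step
backward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                     direction="backward", cv=5)
backward.fit(X_iris, y_iris)

print("forward :", forward.get_support())
print("backward:", backward.get_support())
```

Each candidate feature is scored by cross-validating the estimator, so the two masks may differ.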

## Heterogeneous Feature Transformation

### Composite Transformer

`sklearn.compose` can be used to apply transformations to subsets of features



`ColumnTransformer` | `TransformedTargetRegressor`
----|----
Applies a `set of transformers` to the `columns` of an array or pandas DataFrame, then `concatenates the transformed outputs` from the different transformers into a `single matrix` | Transforms the `target variable y` before fitting a regression model
Useful for transforming `heterogeneous data` | The predicted values are mapped back to the original space via the inverse transform
Combines different feature selection mechanisms and transformations into a single transformer object | Takes a `transformer` and a `regressor` as input

In [16]:
# Each tuple has the format (name, transformer, column_indices)

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder
colum_tr = ColumnTransformer([
    ('ageScaler', MaxAbsScaler(), [0]),
    ('genderencoder', OneHotEncoder(dtype='int'), [1])],
    remainder='drop', verbose_feature_names_out=False)

In [20]:
# illustration:

# create a feature matrix
import numpy as np
X = np.array([[20,'male'],
              [11.2,'female'],
              [15.6,'female'],
              [13,'male'],
              [18.6,'male'],
              [16.4,'female']])
print(X)

# The first feature is weight and the second is gender, for a class of 6.
# NumPy stores the mixed values as strings, as the output shows.


[['20' 'male']
 ['11.2' 'female']
 ['15.6' 'female']
 ['13' 'male']
 ['18.6' 'male']
 ['16.4' 'female']]


In [23]:
# Apply MaxAbsScaler to the weight column and OneHotEncoder to the second (gender) column
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

colum_tr = ColumnTransformer([
    ('ageScaler',MaxAbsScaler(),[0]),
    ('genderScaler', OneHotEncoder(dtype='int'),[1])],
    remainder ='drop', verbose_feature_names_out = False)

colum_tr.fit_transform(X)

array([[1.  , 0.  , 1.  ],
       [0.56, 1.  , 0.  ],
       [0.78, 1.  , 0.  ],
       [0.65, 0.  , 1.  ],
       [0.93, 0.  , 1.  ],
       [0.82, 1.  , 0.  ]])

In [29]:
# illustration

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
tt = TransformedTargetRegressor(regressor = LinearRegression(),
                                func = np.log,
                                inverse_func = np.exp)

X = np.arange(4).reshape(-1,1)
print('X: ', X)
# ravel() flattens y into a 1-D array
y = np.exp(2*X).ravel()
print('y ',y)

print(tt.fit(X,y))

X:  [[0]
 [1]
 [2]
 [3]]
y  [  1.           7.3890561   54.59815003 403.42879349]
TransformedTargetRegressor(func=<ufunc 'log'>, inverse_func=<ufunc 'exp'>,
                           regressor=LinearRegression())
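
To see the mapping back to the original space, the fitted model can be asked for a prediction. This sketch rebuilds the same data under new names (`X_train`, `y_train`, `tt_demo` are hypothetical); since `log(y) = 2x` is exactly linear here, the inner regressor should learn a slope close to 2 and the prediction at `x = 4` should be close to `exp(8)`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X_train = np.arange(4).reshape(-1, 1)
y_train = np.exp(2 * X_train).ravel()

# The regressor is fitted on log(y); predictions are passed through exp()
tt_demo = TransformedTargetRegressor(regressor=LinearRegression(),
                                     func=np.log,
                                     inverse_func=np.exp)
tt_demo.fit(X_train, y_train)

slope = tt_demo.regressor_.coef_[0]      # slope learned in log space
pred = tt_demo.predict(np.array([[4]]))  # prediction in the original space

print("slope in log space:", slope)
print("prediction at x=4 :", pred, "vs exp(8) =", np.exp(8))
```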


In [26]:
# example
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

X = np.array([[1, 'yes'], [2, 'no'], [3, 'no']])
ct = ColumnTransformer([('scaler', MinMaxScaler(), [0]),
                        ('pass', 'passthrough', [0]),
                        ('encoder', OrdinalEncoder(), [1])])

print(ct.fit_transform(X))

[['0.0' '1' '1.0']
 ['0.5' '2' '0.0']
 ['1.0' '3' '0.0']]
