# Feature Selection

- In real-world datasets, not every feature contributes meaningfully to fitting a model.
- Features that contribute little can be removed. This shrinks the dataset and, in turn, lowers the computational cost of fitting a model.
- `sklearn.feature_selection` provides several APIs for this task:
    - VarianceThreshold
    - SelectKBest
    - SelectPercentile
    - GenericUnivariateSelect
    - RFE (Recursive Feature Elimination)
    - RFECV (Recursive Feature Elimination with Cross-Validation)
    - SelectFromModel
    - SequentialFeatureSelector

### Variance Threshold
- Removes from the input feature matrix all features whose variance falls below a user-specified threshold. With the default threshold of 0, only constant features are removed.
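
As a minimal sketch of the idea, the toy matrix below (made up for illustration) has one constant column, one nearly constant column, and one column that varies a lot. The threshold of 0.5 is an arbitrary choice for this example.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data: column 0 is constant, column 1 barely changes, column 2 varies widely
X = np.array([
    [0, 1, 10],
    [0, 1, 20],
    [0, 1, 30],
    [0, 2, 40],
])

# Keep only features whose variance exceeds 0.5 (illustrative value)
selector = VarianceThreshold(threshold=0.5)
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-feature variances computed during fit
print(selector.get_support())  # boolean mask of the features that were kept
print(X_reduced.shape)
```

Because variance depends on the scale of each feature, a single threshold is easiest to interpret when the features share comparable units.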

### Univariate Feature Selection
- Univariate feature selection selects features based on univariate statistical tests.
    - SelectKBest
        - Removes all but the k highest scoring features
    - SelectPercentile
        - Removes all but a user-specified highest scoring percentage of features
    - GenericUnivariateSelect
        - Performs univariate feature selection with a configurable strategy, so the best strategy can be chosen via hyperparameter search.

`sklearn` also provides univariate selectors that apply a common statistical test to each feature and select on the resulting p-values:
- SelectFpr selects features based on a false positive rate test.
- SelectFdr selects features based on an estimated false discovery rate.
- SelectFwe selects features based on family-wise error rate.
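
A small sketch of these three selectors, under assumptions chosen only for illustration: the iris data is padded with 20 synthetic noise columns, `f_classif` is the scoring function, and `alpha=0.01` is an arbitrary significance level.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFdr, SelectFpr, SelectFwe, f_classif

X, y = load_iris(return_X_y=True)

# Append 20 pure-noise columns so there is something to filter out
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 20))])

selectors = {
    "fpr": SelectFpr(f_classif, alpha=0.01),  # false positive rate
    "fdr": SelectFdr(f_classif, alpha=0.01),  # false discovery rate
    "fwe": SelectFwe(f_classif, alpha=0.01),  # family-wise error rate
}

for name, selector in selectors.items():
    selector.fit(X_noisy, y)
    kept = np.flatnonzero(selector.get_support())
    print(name, "keeps feature indices", kept)
```

The four original iris features (indices 0 to 3) are strongly related to the target, so all three selectors should keep them. How many noise columns slip through depends on the test and on `alpha`.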

### Univariate scoring function
- Each API needs a scoring function to score each feature.
- Three classes of scoring functions are available:
    - Mutual information (MI)
    - Chi-square
    - F-statistics

- MI and F-statistics can be used for both classification and regression problems.
    - MI:
        - `mutual_info_regression`, `mutual_info_classif`
    - F-statistics:
        - `f_regression`, `f_classif`
- Chi-square can be used only for classification, and it requires non-negative feature values.
    - `chi2`
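
The scoring functions can also be called directly, which helps when inspecting the scores before choosing a selector. The sketch below uses the iris dataset as an assumed example; its features are all non-negative, so `chi2` applies.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)

# chi2 and f_classif return a (scores, p-values) pair, one entry per feature
chi2_scores, chi2_pvalues = chi2(X, y)
f_scores, f_pvalues = f_classif(X, y)

# mutual_info_classif returns only the estimated MI per feature;
# random_state fixes the small noise used by its nearest-neighbor estimator
mi_scores = mutual_info_classif(X, y, random_state=0)

print("chi2:", chi2_scores)
print("F:   ", f_scores)
print("MI:  ", mi_scores)
```

The three score scales are not comparable with each other; only the ranking of features within one scoring function is meaningful.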


### SelectKBest

In [1]:
# from sklearn.feature_selection import SelectKBest, chi2
# skb = SelectKBest(chi2, k=20)  # keep the 20 highest-scoring features
# X_new = skb.fit_transform(X, y)

### SelectPercentile

In [None]:
# sp = SelectPercentile(chi2, percentile=20)
# X_new = sp.fit_transform(X, y)

### GenericUnivariateSelect

In [None]:
# transformer = GenericUnivariateSelect(chi2, mode='k_best', param=20)
# X_new = transformer.fit_transform(X, y)
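
For a runnable comparison of the three univariate selectors, here is a sketch on the digits dataset (an assumed choice; its 64 pixel features are non-negative, as `chi2` requires).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectKBest,
    SelectPercentile,
    chi2,
)

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Keep the 20 highest-scoring features
skb = SelectKBest(chi2, k=20)
X_kbest = skb.fit_transform(X, y)

# Keep the top 20 percent of features by score
sp = SelectPercentile(chi2, percentile=20)
X_pct = sp.fit_transform(X, y)

# Same strategy as SelectKBest, expressed through the generic interface
gus = GenericUnivariateSelect(chi2, mode="k_best", param=20)
X_generic = gus.fit_transform(X, y)

print(X.shape, X_kbest.shape, X_pct.shape, X_generic.shape)
```

Since `mode` and `param` are ordinary constructor arguments, `GenericUnivariateSelect` lets a grid search switch between strategies without swapping the selector class.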

## Wrapper-based methods

### RFE
- Uses an estimator to recursively remove features.
    - Initially fits the estimator on all features.
    - Obtains feature importances from the estimator and removes the least important feature.
    - Repeats the process, removing features one by one, until the desired number of features remains.
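
The steps above can be sketched as follows. The estimator (`LogisticRegression`), the dataset (iris), and the target of two features are assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# step=1 removes one feature per iteration until 2 remain
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    step=1,
)
rfe.fit(X, y)

print(rfe.support_)  # True for the selected features
print(rfe.ranking_)  # 1 for selected features; larger values were eliminated earlier
X_new = rfe.transform(X)
```

`RFECV` follows the same procedure but picks the number of features by cross-validation instead of taking it as an argument.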

### SelectFromModel
Selects the features whose importance, as obtained from the trained estimator, is at or above a given threshold. The optional `max_features` parameter caps how many features are selected.
- The feature importance is obtained via `coef_`, `feature_importances_`, or an `importance_getter` callable from the trained estimator.
- The threshold can be given either as a number or as a string naming a built-in heuristic such as `'mean'` or `'median'`, optionally scaled by a float as in `'0.1*mean'`.

In [4]:
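# from sklearn.svm import LinearSVC
# from sklearn.feature_selection import SelectFromModel
# clf = LinearSVC(C=0.01, penalty="l1", dual=False)  # L1 penalty drives some coefficients to zero
# clf = clf.fit(X, y)
# clf.coef_
# model = SelectFromModel(clf, prefit=True)  # reuse the already-fitted estimator
# X_new = model.transform(X)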
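
To illustrate the threshold heuristics and the `max_features` cap, here is a sketch that lets `SelectFromModel` fit the estimator itself. The random forest and the iris dataset are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Keep features whose importance is at least the median importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_new = selector.fit_transform(X, y)
print(selector.threshold_)     # numeric value the "median" heuristic resolved to
print(selector.get_support())

# Same threshold, but never keep more than one feature
capped = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
    max_features=1,
)
X_capped = capped.fit_transform(X, y)
print(X_new.shape, X_capped.shape)
```

With `prefit=False` (the default used here), the selector clones and fits the estimator during `fit`, so it can be placed inside a pipeline.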

## Sequential feature selection
Performs feature selection by adding or removing features one at a time in a greedy manner.
- Forward selection
    - Starting with zero features, it finds the single feature that gives the best cross-validation score when an estimator is trained on it alone. It then repeatedly adds the feature that most improves the score, until the desired number of features is selected.
- Backward selection
    - Starting with all features, it removes features one by one, following the same idea: at each step it drops the feature whose removal gives the best cross-validation score.
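
Both directions are available through `SequentialFeatureSelector`. The sketch below assumes a k-nearest-neighbors classifier on the iris dataset and a target of two features; none of these choices come from the notes above.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward: start empty and add the best feature at each step
forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
)
forward.fit(X, y)

# Backward: start with everything and drop one feature at each step
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward", cv=5
)
backward.fit(X, y)

print("forward: ", forward.get_support())
print("backward:", backward.get_support())
```

The two directions need not agree on the selected subset. Unlike RFE, this selector relies only on cross-validation scores, so the estimator does not have to expose `coef_` or `feature_importances_`.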

## Heterogeneous feature transformations

### ColumnTransformer

In [None]:
# column_trans = ColumnTransformer(
#     [
#         ('ageScaler', StandardScaler(), [0]),  # scale the numeric age column
#         ('genderEncoder', OneHotEncoder(dtype='int'), [1]),  # one-hot encode gender
#     ],
#     remainder='drop', verbose_feature_names_out=False)


### TransformedTargetRegressor

In [7]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# Fit the regressor on log(y); predictions are mapped back with exp
tt = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp)
X = np.arange(4).reshape(-1, 1)
y = np.exp(2 * X).ravel()
tt.fit(X, y)

Out[7]:
TransformedTargetRegressor(func=<ufunc 'log'>, inverse_func=<ufunc 'exp'>,
                           regressor=LinearRegression())
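
To see what the target transformation buys, the sketch below repeats the fit and then predicts for an unseen input. Since `y = exp(2x)`, the inner linear model is fit on `log(y) = 2x`, so it should learn a slope close to 2 and an intercept close to 0.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X = np.arange(4).reshape(-1, 1)
y = np.exp(2 * X).ravel()

tt = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
tt.fit(X, y)

# The fitted inner regressor lives in regressor_ and works in log space
print(tt.regressor_.coef_, tt.regressor_.intercept_)

# predict() applies inverse_func, so the result is back on the original scale
y_pred = tt.predict(np.array([[4]]))
print(y_pred, np.exp(8))
```

A plain `LinearRegression` on the raw `y` could not fit this exponential curve; moving the transformation into the estimator keeps `fit` and `predict` on the original target scale.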


## Chaining Transformers

In [10]:
# si = SimpleImputer()
# X_imputed = si.fit_transform(X)
# ss = StandardScaler()
# X_scaled = ss.fit_transform(X_imputed)

### sklearn.pipeline.Pipeline

- Sequentially applies a list of transformers followed by a final estimator.
- Intermediate steps of the pipeline must be transformers, that is, they must implement the `fit` and `transform` methods.
- The final estimator only needs to implement `fit`.
- The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters.
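
As a sketch of why this matters for cross-validation, the pipeline below is scored as a single estimator. The step names, the missing values injected into the iris data, and the classifier are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X.copy()
X[::10, 0] = np.nan  # knock out some values so the imputer has work to do

pipe = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),  # transformer
        ("scaler", StandardScaler()),                 # transformer
        ("clf", LogisticRegression(max_iter=1000)),   # final estimator
    ]
)

# Each fold refits the imputer and scaler on its own training split only
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Fitting the imputer and scaler inside each fold keeps statistics from the validation split out of training, which is harder to guarantee when the steps are applied by hand.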

### Creating Pipelines

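There are two ways to create a pipeline object:
- `Pipeline()`, which takes a list of `(name, estimator)` steps
- `make_pipeline()`, which names the steps automatically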

In [None]:
# estimators = [('simpleImputer', SimpleImputer()),
#               ('standardscaler', StandardScaler())]
# pipe = Pipeline(steps=estimators)

In [12]:
# pipe = make_pipeline(SimpleImputer(), StandardScaler())
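
The two constructors can be compared side by side. In this sketch the toy matrix is made up for illustration; `make_pipeline` derives each step name from the lowercased class name.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with missing values
X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan], [5.0, 6.0]])

# Explicit step names
named_pipe = Pipeline(
    steps=[("simpleImputer", SimpleImputer()), ("standardscaler", StandardScaler())]
)

# Automatically generated step names
auto_pipe = make_pipeline(SimpleImputer(), StandardScaler())
print(list(auto_pipe.named_steps))

# Both pipelines impute first, then scale
X_named = named_pipe.fit_transform(X)
X_auto = auto_pipe.fit_transform(X)
print(X_auto)
```

Step names matter later because nested parameters are addressed as `<step name>__<parameter>`, for example `simpleimputer__strategy` on `auto_pipe`.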