### Feature Selection

In a high-dimensional dataset, some features are irrelevant or insignificant: their contribution to the prediction is small or zero. This causes the following problems:
- Resources are wasted on useless features
- The extra features add noise, which hurts the performance of an ML model
- Training time increases

### Main Objectives of Feature Selection
- Improve Predictive Performance
- Reduce Training Time
- Better Understanding of the Process 

### Dimensionality Reduction and Feature Selection. Are they the same?
No. This is a common misconception, and the two approaches are different. Both **reduce the number of attributes** in the dataset, but a **dimensionality reduction** method does so by **creating new combinations of attributes** (sometimes known as feature transformation), whereas **feature selection** methods **include and exclude attributes present in the data without changing them**.
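
To make the difference concrete, here is a minimal sketch that runs one method of each kind on the same data. The iris dataset, `k=2`, and the `f_classif` score are assumptions chosen only for illustration. PCA returns two new columns built from all four originals, while `SelectKBest` returns two of the original columns unchanged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 original features

# Dimensionality reduction: each new column is a linear combination of all 4 originals
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keeps 2 of the original columns, values untouched
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_sel = selector.transform(X)
kept = selector.get_support(indices=True)

print("PCA output shape:", X_pca.shape)
print("Selected column indices:", kept)
print("Selected columns identical to originals:", np.array_equal(X_sel, X[:, kept]))
```

Both outputs have two columns, but only the selected ones can still be read as the original measurements.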

### Dimensionality Reduction Methods 
- Principal Component Analysis
- Linear Discriminant Analysis
- Singular Value Decomposition
- t-SNE

### Feature Selection Methods

### Filter Methods (Univariate Analysis)

Filter methods are generally used as a data preprocessing step and are based on statistical tests. Each feature is considered independently and ranked by a significance score (usually its correlation with the target variable).
Examples of filter methods include:
- Information Gain
- Chi-Squared test
- Minimum Redundancy Maximum Relevance (mRMR)
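
As a rough sketch of how a filter method might look in practice, the block below scores each feature with the chi-squared test and keeps the two best. The iris dataset and `k=2` are assumptions for illustration; `chi2` requires non-negative feature values, which iris satisfies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # all feature values are non-negative

# Score every feature independently against the target
scores, p_values = chi2(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, best first

# Keep only the k highest-scoring features
X_top2 = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)

print("chi2 scores:", np.round(scores, 2))
print("ranking (best first):", ranking)
print("reduced shape:", X_top2.shape)
```

No model is trained here, which is why this kind of selection is cheap.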

**Advantages**
- Not computationally intensive
- Faster than Embedded and Wrapper methods
- Work well when the number of samples is smaller than the number of features

**Disadvantages**
- Each feature is considered independently, so the joint influence of several features on the target variable cannot be detected
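
A small synthetic example can illustrate this blind spot. In the sketch below (toy data, constructed only for illustration) the target is the XOR of two binary features. Each feature scored on its own looks useless, while a hand-built interaction column, introduced here as an assumption, scores highly.

```python
import numpy as np
from sklearn.feature_selection import chi2

# All four combinations of two binary features, repeated 50 times
X = np.tile(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), (50, 1))
y = X[:, 0] ^ X[:, 1]  # target is XOR: determined only by BOTH features together

# Scored one at a time, each feature carries no signal
single_scores, _ = chi2(X, y)

# A hand-built interaction column captures the joint effect
interaction = (X[:, 0] != X[:, 1]).astype(int).reshape(-1, 1)
joint_score, _ = chi2(interaction, y)

print("individual chi2 scores:", single_scores)
print("interaction chi2 score:", joint_score)
```

A univariate filter would discard both features here, even though together they determine the target exactly.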

### Wrapper Methods 
A model is trained on different subsets of features. There are two main approaches:
- Forward Selection (start from an empty feature set and add features until the best set is found)
- Backward Selection (start from the full feature set and remove features until the best set is found; RFE is a good example)
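
Both approaches can be sketched with scikit-learn. The synthetic dataset, the logistic regression model, and the target of 3 features are assumptions made only for this illustration. `SequentialFeatureSelector` handles the forward direction, and `RFE` is used as the backward example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption for illustration): 8 features, 3 of them informative
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
model = LogisticRegression(max_iter=1000)

# Forward selection: start empty, greedily add the feature that most improves the CV score
forward = SequentialFeatureSelector(
    model, n_features_to_select=3, direction="forward", cv=5
).fit(X, y)

# Backward selection via RFE: start full, repeatedly drop the feature
# with the smallest coefficient magnitude
backward = RFE(model, n_features_to_select=3).fit(X, y)

print("forward selection kept:", forward.get_support(indices=True))
print("RFE kept:", backward.get_support(indices=True))
```

The model is refitted many times in both cases, which is where the computational cost of wrapper methods comes from.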

**Disadvantages**
- Computationally intensive
- If the number of samples is smaller than the number of features, the risk of **overfitting** increases

### Embedded Methods 
These are regularization methods: the main idea is not only to minimize the residuals but also to use as few features as possible (feature minimization).
 - Ridge Regression
 - LASSO Regression
 - Elastic Net Regression
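
The sketch below compares two of these models on synthetic data (the dataset, the standardization step, and `alpha=1.0` are assumptions for illustration). One point worth keeping in mind: only an L1 penalty (LASSO, and Elastic Net through its L1 part) can set coefficients to exactly zero, whereas Ridge shrinks coefficients without removing features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): 10 features, only 3 informative
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)
X = StandardScaler().fit_transform(X)  # penalties are scale-sensitive

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count coefficients that were driven to exactly zero
n_lasso_zero = int(np.sum(lasso.coef_ == 0))
n_ridge_zero = int(np.sum(ridge.coef_ == 0))

print("LASSO coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("zeroed by LASSO:", n_lasso_zero, "| zeroed by Ridge:", n_ridge_zero)
```

The features whose LASSO coefficient is zero are effectively dropped, so selection happens as part of model fitting.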
 
### Conclusion
Wrapper methods are probably the best, but they are computationally intensive and sometimes it is not feasible to use them. For large datasets it is better to use Embedded Methods (Ridge, Elastic Net). Filter methods are fast but less precise.
My ranking:
1) Wrapper Methods
2) Embedded Methods (Regularization)
3) Filter Methods

### Useful Links
https://habr.com/ru/post/264915/ (in Russian)

https://www.youtube.com/watch?v=ipb2MhSRGdw&feature=youtu.be (about regularization)

https://towardsdatascience.com/feature-selection-techniques-for-classification-and-python-tips-for-their-application-10c0ddd7918b (feature selection)

https://towardsdatascience.com/feature-selection-in-python-recursive-feature-elimination-19f1c39b8d15 (RFECV + nice charts)

In [7]:
# Libraries for feature selection

# RFECV stands for Recursive Feature Elimination with Cross-Validation; RFE is the same but without cross-validation
from sklearn.feature_selection import chi2, RFE, RFECV

from sklearn.linear_model import Ridge, Lasso, ElasticNet
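
As a hedged sketch of how `RFECV` from the imports above might be used, the block below lets cross-validation choose the number of features. The synthetic dataset, the logistic regression estimator, and the accuracy metric are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption for illustration): 8 features, 3 of them informative
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Drop one feature per step and keep the subset size with the best CV accuracy
rfecv = RFECV(
    LogisticRegression(max_iter=1000), step=1, cv=5, scoring="accuracy"
).fit(X, y)

print("number of features chosen:", rfecv.n_features_)
print("selected mask:", rfecv.support_)
print("ranking (1 = selected):", rfecv.ranking_)
```

Unlike plain `RFE`, the number of features to keep does not have to be fixed in advance.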