# Feature Selection

Feature selection is the process of automatically or manually selecting the features that contribute most to the prediction variable or output you are interested in.

Having irrelevant features in your data can decrease the accuracy of your models and cause them to learn from features that do not matter.


For more details, see the links below:

https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e    
https://towardsdatascience.com/why-how-and-when-to-apply-feature-selection-e9c69adfabf2

I will focus on one of the 2 critical parts of getting your models right – feature selection. I will discuss in detail why feature selection plays such a vital role in creating an effective predictive model.

Points we will cover in feature selection:

1.	Importance of Feature Selection in Machine Learning
2.	Filter Methods
3.	Wrapper Methods
4.	Embedded Methods
5.	Difference between Filter and Wrapper methods
6.	Walkthrough example


# Importance of Feature Selection in Machine Learning

Machine learning works on a simple rule – if you put garbage in, you will only get garbage to come out. By garbage here, I mean noise in data.

This becomes even more important when the number of features is very large. You need not use every feature at your disposal to build a model. You can assist your algorithm by feeding in only those features that are really important. I have myself witnessed feature subsets giving better results than the complete set of features for the same algorithm. Or as Rohan Rao puts it – “Sometimes, less is better!”
This is useful not only in competitions but in industrial applications as well. You not only reduce the training time and the evaluation time, you also have fewer things to worry about.

Top reasons to use feature selection are:

•	It enables the machine learning algorithm to train faster.

•	It reduces the complexity of a model and makes it easier to interpret.

•	It improves the accuracy of a model if the right subset is chosen.

•	It reduces overfitting.



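In [None]:
# scikit-learn provides F-test scoring functions for selecting the K best features.
# Importing from the submodule avoids relying on `import sklearn` to load it.
from sklearn.feature_selection import SelectKBest, f_regression, f_classif

# For regression tasks: F-test between each feature and a continuous target.
# k=5 is an arbitrary illustrative choice.
selector_regression = SelectKBest(score_func=f_regression, k=5)

# For classification tasks: ANOVA F-test between each feature and the class label.
selector_classification = SelectKBest(score_func=f_classif, k=5)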

# Filter Methods

Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm. Instead, features are selected on the basis of their scores in various statistical tests that measure their correlation with the outcome variable. Correlation is a loose term here, since the appropriate test depends on the types of the feature and the outcome. For basic guidance, you can refer to the following table for choosing a correlation coefficient.

![title](feature-selection/feat1.png)
![title](feature-selection/feat2.png)
![title](feature-selection/feat3.png)
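
As a rough illustration of a filter method, the sketch below scores each feature against the target with an ANOVA F-test and keeps the highest-scoring ones, without training any predictive model. The synthetic dataset, the choice of `f_classif` as the test, and `k=5` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (an assumption for illustration): 20 features, 5 informative.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Score every feature against the target with the ANOVA F-test, keep the top 5.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

selected_idx = selector.get_support(indices=True)
print("Selected feature indices:", selected_idx)
print("F-scores of selected features:", selector.scores_[selected_idx])
print("Shape before/after:", X.shape, X_selected.shape)
```

Because each feature is scored on its own, a filter like this is cheap to run, but it can miss features that are only useful in combination with others.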

# Wrapper Methods

![title](feature-selection/feat4.png)

In wrapper methods, we use a subset of features and train a model on them. Based on the inferences that we draw from the previous model, we decide to add or remove features from the subset. The problem is essentially reduced to a search problem. These methods are usually computationally very expensive.


Some common examples of wrapper methods are forward feature selection, backward feature elimination, recursive feature elimination, etc.
 
•	Forward Selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature which best improves our model, until adding a new variable no longer improves the performance of the model.

•	Backward Elimination: In backward elimination, we start with all the features and remove the least significant feature at each iteration, as long as its removal improves the performance of the model. We repeat this until no improvement is observed on removal of features.

•	Recursive Feature Elimination: It is a greedy optimization algorithm which aims to find the best performing feature subset. It repeatedly creates models and sets aside the best or the worst performing feature at each iteration. It constructs the next model with the remaining features until all the features are exhausted. It then ranks the features based on the order of their elimination.
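
The three wrapper strategies above can be sketched with scikit-learn roughly as follows. This assumes scikit-learn 0.24 or later (for `SequentialFeatureSelector`), a synthetic dataset, and a logistic regression as the wrapped model. One simplification: the selectors here stop at a fixed count of 4 features rather than stopping when the score no longer improves.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 10 features, 4 informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start from no features and greedily add the one that
# gives the best cross-validated score, until 4 features are chosen.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="forward", cv=5
)
forward.fit(X, y)

# Backward elimination: start from all features and greedily drop one at a time.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="backward", cv=5
)
backward.fit(X, y)

# Recursive feature elimination: refit the model, drop the feature with the
# smallest coefficient magnitude, and repeat until 4 features remain.
rfe = RFE(estimator, n_features_to_select=4, step=1)
rfe.fit(X, y)

print("Forward: ", forward.get_support(indices=True))
print("Backward:", backward.get_support(indices=True))
print("RFE:     ", rfe.get_support(indices=True))
print("RFE ranking (1 = kept):", rfe.ranking_)
```

Every candidate subset requires refitting the model, which is why wrapper methods become expensive as the number of features grows.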


One of the best ways to implement feature selection with wrapper methods is to use the Boruta package, which finds the importance of a feature by creating shadow features.
It works in the following steps:

1.	Firstly, it adds randomness to the given data set by creating shuffled copies of all features (which are called shadow features).

2.	Then, it trains a random forest classifier on the extended data set and applies a feature importance measure (the default is Mean Decrease Accuracy) to evaluate the importance of each feature where higher means more important.

3.	At every iteration, it checks whether a real feature has a higher importance than the best of its shadow features (i.e. whether the feature has a higher Z-score than the maximum Z-score of its shadow features) and progressively removes features which are deemed highly unimportant.


4.	Finally, the algorithm stops either when all features get confirmed or rejected or it reaches a specified limit of random forest runs.
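
The shadow-feature idea in the steps above can be sketched in a few lines. This is a simplified illustration, not the Boruta package itself: it uses scikit-learn's impurity-based `feature_importances_` instead of Mean Decrease Accuracy Z-scores, it does not remove features between rounds, and it replaces Boruta's statistical test with a plain majority vote. The dataset and the number of rounds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
n_features = X.shape[1]
n_rounds = 10  # hypothetical small number of rounds for illustration
hits = np.zeros(n_features, dtype=int)

for round_idx in range(n_rounds):
    # Step 1: shadow features are column-wise shuffled copies of the real ones.
    X_shadow = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(n_features)]
    )
    X_extended = np.hstack([X, X_shadow])

    # Step 2: fit a random forest on the real + shadow features.
    forest = RandomForestClassifier(n_estimators=100, random_state=round_idx)
    forest.fit(X_extended, y)
    importances = forest.feature_importances_

    # Step 3: a real feature scores a "hit" if it beats the best shadow feature.
    real_imp = importances[:n_features]
    shadow_imp = importances[n_features:]
    hits += real_imp > shadow_imp.max()

# Simplified decision rule (an assumption): keep features that won most rounds.
kept = np.flatnonzero(hits > n_rounds / 2)
print("Hits per feature:", hits)
print("Kept features:", kept)
```

Shuffling a column destroys its relationship with the target while preserving its distribution, so the best shadow importance acts as a baseline for what pure noise can achieve.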


# Embedded Methods

![title](feature-selection/feat5.png)

Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.

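Some of the most popular examples of these methods are LASSO and RIDGE regression, which have built-in penalization functions to reduce overfitting. Of the two, only LASSO can shrink coefficients exactly to zero and so drop features outright; RIDGE shrinks coefficients without eliminating them.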

•	Lasso regression performs L1 regularization, which adds a penalty proportional to the sum of the absolute values of the coefficients.

•	Ridge regression performs L2 regularization, which adds a penalty proportional to the sum of the squared coefficients.

For more details and implementation of LASSO and RIDGE regression, you can refer to this article.
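
To make the difference between the two penalties concrete, here is a minimal sketch that fits both models to the same synthetic regression data and counts the coefficients driven exactly to zero. The dataset and `alpha=1.0` (the regularization strength) are arbitrary assumptions. `SelectFromModel` is then used to keep only the features Lasso retains.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 20 features, 5 informative.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)
# Standardize so the penalty treats all coefficients on the same scale.
X = StandardScaler().fit_transform(X)

# alpha is the regularization strength; 1.0 is an arbitrary choice here.
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients equal to zero:", int(np.sum(lasso.coef_ == 0)))
print("Ridge coefficients equal to zero:", int(np.sum(ridge.coef_ == 0)))

# SelectFromModel keeps features whose Lasso coefficient magnitude is above
# a tiny default threshold, i.e. effectively the non-zero ones.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_selected = selector.transform(X)
print("Features kept by Lasso:", X_selected.shape[1])
```

A larger `alpha` makes Lasso discard more features, so in practice it is usually tuned with cross-validation rather than fixed by hand.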

Other examples of embedded methods are regularized trees, memetic algorithms, and random multinomial logit.


### How to select features, and what are the benefits of performing feature selection before modeling your data?

• Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise.
    
• Improves Accuracy: Less misleading data means modeling accuracy improves.
    
• Reduces Training Time: Fewer features mean less data to process, so algorithms train faster.
    
I want to share my personal experience with this.

I prepared a model using all the features and got an accuracy of around 65%, which is not very good for a predictive model. After some feature selection and feature engineering, without any logical changes to my model code, my accuracy jumped to 81%, which is quite impressive.
Now you know why I say feature selection should be the first and most important step of your model design.
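
One way to check whether feature selection helps on your own data is to compare cross-validated scores with and without it. The sketch below is only an illustration on synthetic data, with an arbitrary `k=10` and a logistic regression model; it is not the model or data from the experience described above, and the outcome will vary from one dataset to another.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (an assumption): 5 informative features hidden among 100.
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=5, n_redundant=0, random_state=0
)

baseline = LogisticRegression(max_iter=1000)
with_selection = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),  # k=10 is an arbitrary choice
    LogisticRegression(max_iter=1000),
)

# Selection lives inside the pipeline so it is refit on each training fold,
# which avoids leaking information from the validation fold.
score_all = cross_val_score(baseline, X, y, cv=5).mean()
score_selected = cross_val_score(with_selection, X, y, cv=5).mean()

print(f"All features:      {score_all:.3f}")
print(f"Selected features: {score_selected:.3f}")
```

Selecting features on the full dataset before cross-validating would make the selected-feature score look better than it really is, which is why the selector is placed inside the pipeline.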
