
## VarianceThreshold
VarianceThreshold is a simple baseline approach to feature selection.
It removes all features whose variance doesn’t meet some threshold. 
By default, it removes all zero-variance features, i.e. features that have the same value in all samples.

As an example, suppose that we have a dataset with boolean features, and we want to remove all features that are either one or zero (on or off) in more than 80% of the samples. 
Boolean features are Bernoulli random variables, and the variance of such variables is given by $\mathrm{Var}[X] = p(1 - p)$,
so we can select using the threshold `.8 * (1 - .8)`:

In [3]:
from sklearn.feature_selection import VarianceThreshold

In [1]:
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 1, 1]];

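Features (columns of X), compared against the threshold 0.8 * (1 - 0.8) = 0.16:
         x1: 0, 0, 1, 0, 0, 0   zero in 5/6 of the samples (> 0.8); variance 5/36 ≈ 0.139, removed
         x2: 0, 1, 0, 1, 1, 1   one in 4/6 of the samples; variance 2/9 ≈ 0.222, kept
         x3: 1, 1, 1, 1, 1, 1   all ones; variance 0, removed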

In [4]:
t = .8 * (1 - .8)
sel = VarianceThreshold(threshold = t)

In [5]:
sel.fit_transform(X)    

array([[0],
       [1],
       [0],
       [1],
       [1],
       [1]])
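To check the result above, the per-feature variances can be computed by hand and compared with what the selector stores. This is a minimal sketch that assumes NumPy is available; the names `p`, `bernoulli_var`, and `kept` are introduced here for illustration only.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 1, 1]])
threshold = 0.8 * (1 - 0.8)

# For a 0/1 feature, p is the fraction of ones and the variance is p * (1 - p).
p = X.mean(axis=0)
bernoulli_var = p * (1 - p)

sel = VarianceThreshold(threshold=threshold).fit(X)
kept = sel.get_support()  # boolean mask: True for features that survive

for name, var, keep in zip(["x1", "x2", "x3"], bernoulli_var, kept):
    print(f"{name}: variance = {var:.3f}, kept = {keep}")
```

Only features whose variance is strictly greater than the threshold are kept, which is why x1 (variance 5/36) is dropped along with the constant feature x3.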

## Univariate feature selection

Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator.
SelectKBest removes all but the `k` highest scoring features.

In [6]:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

In [7]:
iris = load_iris()
X = iris.data
y = iris.target

In [10]:
s = SelectKBest(chi2, k=3)

`k` is the number of columns (features) to keep.
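The selector created above is not fitted yet. The following sketch shows one way to apply it to the iris data and inspect which columns survive; the variable names `selector` and `X_new` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

# Score every feature against the target with the chi-squared test
# and keep the 3 highest scoring ones.
selector = SelectKBest(chi2, k=3)
X_new = selector.fit_transform(X, y)

print(X.shape, "->", X_new.shape)
for name, score, keep in zip(iris.feature_names, selector.scores_, selector.get_support()):
    print(f"{name}: chi2 = {score:.2f}, kept = {keep}")
```

Note that `chi2` requires non-negative feature values, which holds for the iris measurements.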

## L1-based feature selection

In [12]:
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

In [13]:
m = LinearSVC(C=0.01, penalty="l1", dual=False)  # the smaller C, the fewer features are selected
clf = m.fit(X, y)


In [14]:
s = SelectFromModel(clf, prefit=True)

In [15]:
Xnew = s.transform(X)
Xnew.shape

(150, 3)
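The comment about `C` can be explored by refitting the model for several values and counting the selected features. This is a rough sketch: the grid `C_values`, the raised `max_iter`, and the fixed `random_state` are assumptions made here for reproducibility, and the exact counts may depend on the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

C_values = [0.001, 0.01, 0.1, 1.0]  # hypothetical grid, chosen for illustration
counts = {}
for C in C_values:
    svc = LinearSVC(C=C, penalty="l1", dual=False, max_iter=10000, random_state=0)
    # SelectFromModel fits the estimator and keeps features with non-negligible coefficients.
    selector = SelectFromModel(svc).fit(X, y)
    counts[C] = int(selector.get_support().sum())
    print(f"C = {C}: {counts[C]} feature(s) selected")
```

A stronger L1 penalty (smaller `C`) drives more coefficients to zero, so fewer features pass the selector.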

## Tree-based feature selection

In [16]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

In [17]:
clf = ExtraTreesClassifier(n_estimators = 50)
clf = clf.fit(X, y)

In [18]:
clf.feature_importances_  

array([0.07555521, 0.06569641, 0.38918415, 0.46956423])

In [19]:
model = SelectFromModel(clf, prefit=True)
Xnew = model.transform(X)
Xnew.shape

(150, 2)
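When no `threshold` is given, `SelectFromModel` uses the mean importance as the cutoff for estimators like this one, which explains why two of the four features are kept above. The sketch below makes that cutoff visible; it lets `SelectFromModel` fit the forest itself instead of using `prefit=True`, and the fixed `random_state` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

forest = ExtraTreesClassifier(n_estimators=50, random_state=0)
selector = SelectFromModel(forest).fit(X, y)

importances = selector.estimator_.feature_importances_
threshold = selector.threshold_  # defaults to the mean importance here
support = selector.get_support()

print(f"threshold (mean importance) = {threshold:.3f}")
for name, imp, keep in zip(iris.feature_names, importances, support):
    print(f"{name}: importance = {imp:.3f}, kept = {keep}")
```

Because the importances sum to 1, the mean over four features is 0.25, so only features with importance of at least 0.25 are retained.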
