# CHAPTER - 10: Dimensionality Reduction Using Feature Selection 

Reducing the dimensionality of a feature matrix by creating new features (i.e., transforming it into a matrix with fewer dimensions) is called Feature Extraction.

Selecting high-quality, informative features and dropping less useful ones is called Feature Selection.

There are three types of feature selection methods:
   1) Filter: selects the best features by examining their statistical properties.
   2) Wrapper: uses trial and error to find the subset of features that produces models with the highest-quality predictions.
   3) Embedded: selects the best feature subset as part of, or as an extension of, a learning algorithm's training process.

## 10.1 Thresholding Numerical Feature Variance

For Numerical Features:

Removing low-variance features from a set of numerical features.

VarianceThreshold: scikit-learn class that removes features whose variance is below a chosen threshold.

Variance thresholding is one of the most basic approaches to feature selection.
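For reference, VT computes the population variance (dividing by n) of each feature x with n observations and mean μ, and keeps the feature only if that variance is greater than the chosen threshold t:

```latex
\operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2,
\qquad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad \text{keep } x \iff \operatorname{Var}(x) > t
```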

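While applying VT, keep two things in mind:
1) Variance is not centered, i.e., it is in the squared unit of the feature itself, so VT will not work when features are in different units (e.g., one feature in dollars and another in years).
2) The variance threshold is selected manually, so we have to use our own judgment to pick a good value.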
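To make limitation 1 concrete: multiplying a feature by a constant c multiplies its variance by c². The sketch below is an illustration, not part of the original notebook. It expresses the iris measurements (recorded in centimeters) in millimeters and applies the same threshold of 0.5 used later in this section; the sepal-width feature that VT drops in centimeters survives in millimeters, although no information has changed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X_cm = load_iris().data   # iris measurements are recorded in centimeters
X_mm = X_cm * 10          # the same measurements expressed in millimeters

vt_cm = VarianceThreshold(threshold=0.5).fit(X_cm)
vt_mm = VarianceThreshold(threshold=0.5).fit(X_mm)

# Scaling by 10 scales every variance by 10**2 = 100
print(np.allclose(vt_mm.variances_, 100 * vt_cm.variances_))  # True

print(vt_cm.get_support())  # [ True False  True  True]
print(vt_mm.get_support())  # [ True  True  True  True]
```

The threshold therefore only makes sense relative to the units of the features it is applied to.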

In [7]:
# loading libraries

from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

In [8]:
# importing data

iris = datasets.load_iris()

In [9]:
# creating features and target

features = iris.data
target = iris.target

In [10]:
# creating thresholder

thresholder = VarianceThreshold(threshold = 0.5)

In [12]:
# creating High Variance Feature Matrix

highVariance_ft = thresholder.fit_transform(features)

In [13]:
# High Variance Feature Matrix

highVariance_ft[0:3]

array([[5.1, 1.4, 0.2],
       [4.9, 1.4, 0.2],
       [4.7, 1.3, 0.2]])

In [14]:
# to view variances

thresholder.fit(features).variances_

array([0.68112222, 0.18871289, 3.09550267, 0.57713289])
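The values in `variances_` are ordinary population variances (ddof=0), and the features kept are those whose variance is strictly greater than the threshold. A small sketch checking this by hand with NumPy on the same iris data; the manual computation is only an illustration of what VarianceThreshold does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X = load_iris().data
vt = VarianceThreshold(threshold=0.5).fit(X)

# population variance (divide by n, i.e. ddof=0) of each column
manual_var = ((X - X.mean(axis=0)) ** 2).mean(axis=0)
print(np.allclose(manual_var, vt.variances_))  # True

# a feature is kept when its variance is strictly greater than the threshold
keep = manual_var > 0.5
print(np.allclose(X[:, keep], vt.transform(X)))  # True
```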

In [15]:
# If the features are standardized (i.e., mean = 0 and variance = 1), VT cannot tell them apart because every variance is 1.

from sklearn.preprocessing import StandardScaler

In [16]:
# Standardize feature matrix

scaler = StandardScaler()
features_std = scaler.fit_transform(features)

In [17]:
# Calculating variance of each feature

selector = VarianceThreshold()
selector.fit(features_std).variances_

array([1., 1., 1., 1.])

## 10.2 Thresholding Binary Feature Variance

For Binary Features:

Removing low-variance features from a set of binary categorical features.

In [18]:
from sklearn.feature_selection import VarianceThreshold

In [22]:
# Creating a feature matrix with:
#     1) Feature 0: 80% class 0
#     2) Feature 1: 80% class 1
#     3) Feature 2: 60% class 0 & 40% class 1

features = [[0,1,0],
           [0,1,1],
           [0,1,0],
           [0,1,1],
           [1,0,0]]

In [23]:
# Run Threshold by variance

thresholder = VarianceThreshold(threshold = (.75 * (1 - .75)))
thresholder.fit_transform(features)

array([[0],
       [1],
       [0],
       [1],
       [0]])
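Why `.75 * (1 - .75)`? A binary (Bernoulli) feature in which a fraction p of the observations equals 1 has variance p(1 − p). Requiring a variance above 0.75 × 0.25 = 0.1875 therefore drops every feature where one value makes up more than 75% of the observations. A short NumPy check of this reasoning, using a feature matrix with the 80% / 80% / 60-40 split described in the comments above:

```python
import numpy as np

# Feature 0: 80% zeros, feature 1: 80% ones, feature 2: 60% zeros / 40% ones
features = np.array([[0, 1, 0],
                     [0, 1, 1],
                     [0, 1, 0],
                     [0, 1, 1],
                     [1, 0, 0]])

p = features.mean(axis=0)          # fraction of 1s in each column: 0.2, 0.8, 0.4
bernoulli_var = p * (1 - p)        # variance of a 0/1 (Bernoulli) feature
threshold = 0.75 * (1 - 0.75)      # 0.1875

# p * (1 - p) matches the ordinary population variance of each column
print(np.allclose(bernoulli_var, features.var(axis=0)))  # True

# only the column whose majority value covers less than 75% survives
print(bernoulli_var > threshold)   # [False False  True]
```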

## 10.3 Handling Highly Correlated Features

Handling features when some of them are highly correlated.

We check whether highly correlated features exist and, if they do, drop one feature from each correlated pair.

In [24]:
# libraries

import pandas as pd
import numpy as np

In [26]:
# feature matrix with highly correlated features

features = np.array([[1,1,1],
                    [2,2,0],
                    [3,3,1],
                    [4,4,0],
                    [5,5,1],
                    [6,6,0],
                    [7,7,1],
                    [8,7,0],
                    [9,7,1]])

In [27]:
# converting feature matrix into Dataframe

dataframe = pd.DataFrame(features)

In [29]:
# creating correlation matrix

corr_mat = dataframe.corr().abs()

In [32]:
# Selecting upper triangle of correlation matrix

upper = corr_mat.where(np.triu(np.ones(corr_mat.shape), k = 1).astype(np.bool_))

In [33]:
# Finding index of feature columns with correlation greater than 0.95

indexes_to_drop = [ column for column in upper.columns if any(upper[column] > 0.95)]

In [34]:
# dropping the features

dataframe.drop(dataframe.columns[indexes_to_drop], axis = 1).head(3)

|   | 0 | 2 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 0 |
| 2 | 3 | 1 |


In [36]:
# first: creating the correlation matrix of all features

dataframe.corr()

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 0.976103 | 0.0 |
| 1 | 0.976103 | 1.0 | -0.034503 |
| 2 | 0.0 | -0.034503 | 1.0 |


In [37]:
# second: we check the upper correlation matrix

upper

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | NaN | 0.976103 | 0.0 |
| 1 | NaN | NaN | 0.034503 |
| 2 | NaN | NaN | NaN |


In [38]:
# third: we remove one feature from each pair whose correlation exceeds the threshold (the dataframe.drop step above)
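The three steps can be wrapped into one reusable function. The sketch below uses a hypothetical helper name, `drop_correlated_features`, which is not part of pandas or scikit-learn. One design difference from the cells above: it drops columns by label with `drop(columns=...)` rather than by position, so it also works when the columns have names instead of the default 0, 1, 2.

```python
import numpy as np
import pandas as pd

def drop_correlated_features(df, threshold=0.95):
    """Hypothetical helper: drop one column from each pair whose
    absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr().abs()
    # keep only the strict upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # NaN comparisons are False, so the masked lower triangle is ignored
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

features = np.array([[1, 1, 1], [2, 2, 0], [3, 3, 1], [4, 4, 0], [5, 5, 1],
                     [6, 6, 0], [7, 7, 1], [8, 7, 0], [9, 7, 1]])
reduced, dropped = drop_correlated_features(pd.DataFrame(features))
print(dropped)                # [1]
print(list(reduced.columns))  # [0, 2]
```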

## 10.4 Removing Irrelevant Features for Classification

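Removing irrelevant features when the target vector is categorical.

If the features are categorical, we calculate the chi-square (χ²) statistic between each feature and the target vector and keep the features with the highest scores. If the features are quantitative, we can instead compute the ANOVA F-value between each feature and the target (f_classif).

Chi-square is used to test the independence of two categorical variables.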

If the target vector is independent of the feature variable, then it contains no information we can use for classification.
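scikit-learn's `chi2` treats each (non-negative) feature as a count: for every class it compares the observed sum of the feature within that class with the sum expected if the feature were independent of the class, then adds up (O − E)² / E over the classes. A minimal NumPy sketch reproducing that computation on the integer iris features used below; the manual version is an illustration, and the comparison with `chi2` is the check.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

iris = load_iris()
X = iris.data.astype(int)   # chi2 requires non-negative feature values
y = iris.target

classes = np.unique(y)
# one-hot class-membership matrix, shape (n_samples, n_classes)
Y = (y[:, None] == classes[None, :]).astype(float)

observed = Y.T @ X                                    # feature totals per class
expected = np.outer(Y.mean(axis=0), X.sum(axis=0))   # totals if independent of class
chi2_manual = ((observed - expected) ** 2 / expected).sum(axis=0)

chi2_sklearn, p_values = chi2(X, y)
print(np.allclose(chi2_manual, chi2_sklearn))  # True
```

A large statistic (small p-value) means the feature's distribution differs across classes, i.e., it is not independent of the target.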

In [39]:
# loading libraries

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2, f_classif

In [40]:
# Loading iris dataset

iris = load_iris()

features = iris.data
target = iris.target

In [41]:
# converting data to categorical data by changing it to integers

features = features.astype(int)

In [43]:
# selecting 2 features with highest chi-squared statistic using "SelectKBest"

chi2_selector = SelectKBest(chi2, k = 2)
features_kbest = chi2_selector.fit_transform(features, target)

In [47]:
# show the sizes

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kbest.shape[1])

Original number of features: 4
Reduced number of features: 2
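Under the hood, SelectKBest simply ranks the features by their scores and keeps the k largest. A sketch verifying this with the fitted selector's `scores_` and `get_support(indices=True)` on the same integer iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X = iris.data.astype(int)
y = iris.target

selector = SelectKBest(chi2, k=2).fit(X, y)
print(selector.scores_)   # one chi-square score per feature

# indices of the two largest scores, in ascending column order
top_k = np.sort(np.argsort(selector.scores_)[-2:])
print(np.array_equal(top_k, selector.get_support(indices=True)))  # True
print([iris.feature_names[i] for i in top_k])
```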


We can use "SelectPercentile" to select the top n percent of features instead of a specific number of features.

In [48]:
# loading SelectPercentile library

from sklearn.feature_selection import SelectPercentile

In [49]:
# Select top 75% of features with highest F-values

fvalue_selector = SelectPercentile(f_classif, percentile = 75)
features_kbest = fvalue_selector.fit_transform(features, target)

In [50]:
# Showing the reduced results

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kbest.shape[1])

Original number of features: 4
Reduced number of features: 3


## 10.5 Recursively Eliminating Features

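Automatically selecting the best features to keep.

We use scikit-learn's RFECV to perform Recursive Feature Elimination (RFE) using Cross-Validation (CV).

The model is trained repeatedly, removing the least important feature each time (with step = 1; for a linear model, the feature with the smallest coefficient magnitude). Cross-validation scores each number of remaining features, and the number with the best score is kept; the features that remain are the most informative ones.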
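The elimination loop itself is simple. The sketch below writes it out by hand for a linear regression, removing the feature with the smallest absolute coefficient each round, and compares the result with scikit-learn's plain `RFE` (not RFECV) asked to keep 3 features. The dataset sizes and the target of 3 features are arbitrary choices for illustration; RFECV adds cross-validation on top of this loop to choose that number automatically.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

# manual recursive elimination: refit, then drop the weakest remaining feature
remaining = list(range(X.shape[1]))
while len(remaining) > 3:
    coefs = LinearRegression().fit(X[:, remaining], y).coef_
    weakest = remaining[int(np.argmin(np.abs(coefs)))]
    remaining.remove(weakest)

rfe = RFE(LinearRegression(), n_features_to_select=3, step=1).fit(X, y)
print(sorted(remaining))
print([int(i) for i in np.flatnonzero(rfe.support_)])  # same indices
```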

In [51]:
# Loading libraries

import warnings
from sklearn.datasets import make_regression
from sklearn import datasets, linear_model
from sklearn.feature_selection import RFECV

In [52]:
# suppressing warnings

warnings.filterwarnings(action = "ignore", module = 'scipy', message = '^internal gelsd')

In [53]:
# generating feature matrix and target vector (100 features, only 2 of them informative)

features, target = make_regression(n_samples = 10000,
                                  n_features = 100,
                                  n_informative = 2,
                                  random_state = 2)

In [54]:
# creating a linear regression

ols = linear_model.LinearRegression()

In [57]:
# Recursively eliminate the features

rfecv = RFECV(estimator = ols, step = 1, scoring = 'neg_mean_squared_error')
rfecv.fit(features, target)
rfecv.transform(features)

array([[ 1.51278885,  1.06522186, -0.25592483,  1.23913282,  0.82100186,
         0.22228094],
       [-1.21135854,  1.34456723,  1.03992104,  1.48084905, -0.08293439,
        -1.12891946],
       [ 0.03890833, -0.91845034, -0.14531089,  1.73433529,  2.29769945,
        -1.11008524],
       ...,
       [ 0.71536151, -0.95047313,  1.74506759,  0.28464179, -2.13830256,
         1.96873384],
       [-1.30366012, -0.50997804, -0.10368359,  0.34521149,  0.40692248,
         0.22985492],
       [-0.13767296, -0.76122978, -1.23733457, -0.62409358,  0.02734765,
        -1.02436197]])

In [58]:
# Once RFE is conducted we can see the number of features we should keep

rfecv.n_features_

6

In [59]:
# we can also see which of the features we can keep

rfecv.support_

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False,  True,  True,
       False, False, False, False, False,  True, False, False, False,
       False])

In [60]:
# we can see the rankings of the features

rfecv.ranking_

array([68, 62, 26, 50, 70, 56, 78, 69, 38, 18, 83, 33,  6, 24, 61, 52,  2,
       54,  1, 28, 22, 46,  4, 47, 34, 67, 75, 49, 89, 90, 20, 35, 82, 71,
       27, 79, 51, 53, 57, 63,  1, 58, 74, 13, 39, 21, 36, 17, 84, 86, 77,
       25, 91, 37, 40, 94, 85,  7, 44, 93, 11,  8, 31, 42, 41, 73,  1, 14,
       30,  3,  5, 81, 87, 48, 55, 65,  9, 12, 80, 15, 59, 43, 60, 64, 10,
       29, 92, 76,  1,  1, 88, 23, 95, 45, 72,  1, 66, 32, 16, 19])
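The three attributes agree with one another: the features marked True in `support_` are exactly those with rank 1 in `ranking_`, and their count is `n_features_`. A small sketch checking this on a smaller synthetic problem (500 samples, 20 features, chosen only to keep it quick) and turning the mask into column indices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFECV

X, y = make_regression(n_samples=500, n_features=20, n_informative=2, random_state=2)
rfecv = RFECV(LinearRegression(), step=1, scoring="neg_mean_squared_error").fit(X, y)

kept = np.flatnonzero(rfecv.support_)   # column indices of the selected features
print(kept)
print(np.array_equal(kept, np.flatnonzero(rfecv.ranking_ == 1)))  # True
print(rfecv.n_features_ == kept.size)                            # True
```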