# CHAPTER-9 Dimensionality Reduction Using Feature Extraction

The goal of dimensionality reduction is to transform our set of features so that the transformed set contains fewer features than the original, while still keeping most of the underlying information.

By accepting a small loss of information, we reduce the number of features and can still generate high-quality predictions.



## 9.1 Reducing Features Using Principal Components

Reducing the number of features while retaining the variance in the data

In [4]:
# Loading libraries
# PCA - Principal Component Analysis

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

In [9]:
# load the datasets

digits = datasets.load_digits()

In [11]:

# Standardizing the feature matrix

feature = StandardScaler().fit_transform(digits.data)

In [12]:
# creating PCA that will retain 99% of the variance

pca = PCA(n_components = 0.99, whiten = True)

In [17]:
# conduct PCA

feature_pca = pca.fit_transform(feature)

In [18]:
# results

print("Original number of features: ",feature.shape[1])
print("Reduced number of features: ",feature_pca.shape[1])

Original number of features:  64
Reduced number of features:  54


Principal Component Analysis (PCA) is a popular dimensionality reduction technique.

It is an unsupervised technique: it does not use information from the target vector and considers only the feature matrix.

n_components behaves in one of two ways, depending on its value:

    if the value is an integer of 1 or more, PCA returns that many features.
    if the value is between 0 and 1, PCA returns the minimum number of features that retain that fraction of the variance. 0.95 and 0.99 are commonly used values.

"whiten = True": transforms the values of each principal component so that they have zero mean and unit variance.

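The two modes of n_components and the effect of whitening can be checked directly. The following is a minimal sketch, assuming the same standardized digits data as above; the variable names (features_std, pca_fixed, pca_ratio, component_variances) are illustrative and not part of the original recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the digits features, as in the recipe above
features_std = StandardScaler().fit_transform(datasets.load_digits().data)

# Integer n_components: keep exactly that many components
pca_fixed = PCA(n_components=10)
features_fixed = pca_fixed.fit_transform(features_std)

# Float n_components: keep the fewest components that reach that variance share
pca_ratio = PCA(n_components=0.99, whiten=True)
features_ratio = pca_ratio.fit_transform(features_std)
retained_variance = pca_ratio.explained_variance_ratio_.sum()

# With whiten=True, each retained component should have unit sample variance
component_variances = features_ratio.var(axis=0, ddof=1)

print("Components with integer setting:", features_fixed.shape[1])
print("Components chosen by variance:", pca_ratio.n_components_)
print("Share of variance retained:", round(retained_variance, 4))
print("Whitened variances close to 1:", np.allclose(component_variances, 1.0))
```

The fitted attribute n_components_ reports how many components the variance threshold actually selected, which is handy when the float form is used.
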
## 9.2 Reducing Features When Data Is Linearly Inseparable

We use an extension of PCA that uses kernels to allow for non-linear dimensionality reduction.

In [19]:
# loading libraries

from sklearn.decomposition import PCA, KernelPCA
from sklearn.datasets import make_circles

In [20]:
# creating linearly inseparable data

features, _ = make_circles(n_samples = 1000, random_state = 1, noise = 0.1, factor = 0.1) 

In [21]:
# Apply kernel PCA with a radial basis function (RBF) kernel

kpca = KernelPCA(kernel = "rbf", gamma = 15, n_components = 1)
features_kpca = kpca.fit_transform(features)

In [22]:
# results

print("Original number of features: ",features.shape[1])
print("Reduced number of features: ",features_kpca.shape[1])

Original number of features:  2
Reduced number of features:  1


Kernel PCA can reduce the number of dimensions and, at the same time, project linearly inseparable data into a space where it becomes linearly separable.
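
One way to probe the separability claim is to fit a linear classifier on a single component from ordinary PCA and on a single component from kernel PCA, then compare training accuracy. This is a rough sketch, not part of the original recipe: the use of LogisticRegression as the separability check and the helper linear_accuracy are assumptions made for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Same concentric-circle data as above, keeping the class labels this time
features, target = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# One component from linear PCA and one from RBF kernel PCA
features_linear = PCA(n_components=1).fit_transform(features)
features_kernel = KernelPCA(kernel="rbf", gamma=15, n_components=1).fit_transform(features)


def linear_accuracy(X, y):
    """Training accuracy of a linear classifier (hypothetical helper for this check)."""
    return LogisticRegression().fit(X, y).score(X, y)


accuracy_linear = linear_accuracy(features_linear, target)
accuracy_kernel = linear_accuracy(features_kernel, target)

print("Accuracy on 1 linear PCA component:", accuracy_linear)
print("Accuracy on 1 kernel PCA component:", accuracy_kernel)
```

If the kernel projection has untangled the two circles, the second accuracy should be clearly higher than the first. The result depends on gamma, so it is worth trying a few values.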

## 9.3 Reducing Features by Maximizing Class Separability

To reduce the features used by a classifier, we use Linear Discriminant Analysis (LDA) to project the features onto component axes that maximize the separation of classes.

In [1]:
# loading libraries

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

In [2]:
# Loading Iris flower dataset

iris = datasets.load_iris()
features = iris.data
target = iris.target

In [3]:
# Creating and running LDA

lda = LinearDiscriminantAnalysis(n_components = 1)

# transform the features

features_lda = lda.fit(features, target).transform(features)

In [4]:
# printing the number of features

print("Original number of features: ", features.shape[1])
print("Reduced number of features: ", features_lda.shape[1])

Original number of features:  4
Reduced number of features:  1


In [6]:
lda.explained_variance_ratio_

# amount of variance explained by each component;
# here a single component explains over 99% of the variance

array([0.9912126])

LDA is a classification technique that is also widely used for dimensionality reduction.

We can run LDA with n_components set to None to return the ratio of variance explained by every component, and then calculate how many components are required to get above some threshold of explained variance (often 0.95 or 0.99).

In [9]:
# creating and running LDA

lda = LinearDiscriminantAnalysis(n_components = None)
lda.fit(features, target) # fit only; we just need the explained variance ratios

In [10]:
# array of explained variance

lda_var_ratio = lda.explained_variance_ratio_

In [11]:
# create function

def select_n_components(var_ratio, goal_var: float) -> int:
    
    total_variance = 0.0 # setting initial variance
    n_components = 0 # initial number of features
    
    # for explained variance of each feature
    for explained_variance in var_ratio:
        
        # adding explained variance to total variance
        total_variance += explained_variance
        
        # adding 1 to no of components
        n_components += 1
        
        # If we reach our goal level of explained variance
        if total_variance >= goal_var:
            break # end of loop
    
    # return the number of components
    return n_components

In [12]:
# Run function

select_n_components(lda_var_ratio, 0.95)

1
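
The same selection can be written without an explicit loop by taking a cumulative sum of the ratios. The sketch below is an alternative formulation, not the original function; the name select_n_components_cumsum and the example ratios are made up for illustration.

```python
import numpy as np


def select_n_components_cumsum(var_ratio, goal_var: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches goal_var."""
    cumulative = np.cumsum(var_ratio)
    reached = cumulative >= goal_var
    if not reached.any():
        # Goal never reached: return every component, as the loop version does
        return len(cumulative)
    # argmax gives the index of the first True value
    return int(np.argmax(reached)) + 1


example_ratios = [0.6, 0.25, 0.1, 0.05]  # made-up ratios, for illustration only

print(select_n_components_cumsum(example_ratios, 0.8))
print(select_n_components_cumsum(example_ratios, 1.5))
```

Passing lda_var_ratio and 0.95 to this version should agree with the loop-based function above.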

## 9.4 Reducing Features Using Matrix Factorization

Reducing the dimensions of a feature matrix of nonnegative values

In [13]:
# loading the libraries
# NMF: Nonnegative Matrix Factorization

from sklearn.decomposition import NMF

In [14]:
# load the data

digits = datasets.load_digits()

In [15]:
# load feature matrix

features = digits.data

In [17]:
# creating, fitting, and applying NMF

nmf = NMF(n_components = 10, random_state = 1)
features_nmf = nmf.fit_transform(features)



In [18]:
# show results

print("Original number of features: ", features.shape[1])
print("Reduced number of features: ", features_nmf.shape[1])

Original number of features:  64
Reduced number of features:  10


NMF can reduce dimensionality because it approximates the feature matrix as the product of two nonnegative matrices, and in matrix multiplication the two factors can have far fewer dimensions than the product matrix.
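
The shapes of the two factors make this concrete. The following sketch refits NMF on the digits data and inspects both factors; the names W, H, and relative_error are illustrative, and max_iter=500 is an assumption chosen to give the solver more room to converge, not a setting from the recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import NMF

# Raw digits pixels: 1797 observations x 64 features, all nonnegative
features = datasets.load_digits().data

# max_iter=500 is an assumed setting for this sketch
nmf = NMF(n_components=10, random_state=1, max_iter=500)
W = nmf.fit_transform(features)  # observations x components
H = nmf.components_              # components x original features

# The product of the two small factors approximates the original matrix
reconstruction = W @ H
relative_error = np.linalg.norm(features - reconstruction) / np.linalg.norm(features)

print("W shape:", W.shape)
print("H shape:", H.shape)
print("Reconstruction shape:", reconstruction.shape)
print("Relative reconstruction error:", relative_error)
```

W is the reduced feature matrix returned by fit_transform, while H maps the components back to the original 64 pixel features.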

## 9.5 Reducing Features on Sparse Data 

In [19]:
# loading libraries
# using Truncated Singular Value Decomposition (TSVD)

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np

In [21]:
# loading data

digits = datasets.load_digits()

In [22]:
# standardizing the feature matrix

features = StandardScaler().fit_transform(digits.data)

In [23]:
# making sparse matrix

features_sparse = csr_matrix(features)

In [24]:
# creating Truncated Singular Value Decomposition (TSVD)

tsvd = TruncatedSVD(n_components = 10)

In [25]:
# Conducting TSVD on sparse matrix

features_sparse_tsvd = tsvd.fit(features_sparse).transform(features_sparse)

In [26]:
# show results

print("Original number of features: ", features_sparse.shape[1])
print("Reduced number of features: ", features_sparse_tsvd.shape[1])

Original number of features:  64
Reduced number of features:  10


TSVD is similar to PCA, but unlike PCA TSVD works on sparse feature matrices.
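
A short sketch of TSVD consuming a SciPy sparse matrix directly is given below. It uses the raw, unstandardized digits pixels as an assumption of this example, because standardizing subtracts each column mean and so turns most zero entries into nonzero values; the names density and features_reduced are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD

# Raw pixel intensities contain many exact zeros, so they store well as sparse
digits = datasets.load_digits()
features_sparse = csr_matrix(digits.data)

# Fraction of entries that are actually stored (nonzero)
n_rows, n_cols = features_sparse.shape
density = features_sparse.nnz / (n_rows * n_cols)

# TruncatedSVD accepts the sparse matrix without converting it to dense
tsvd = TruncatedSVD(n_components=10, random_state=1)
features_reduced = tsvd.fit_transform(features_sparse)

print("Fraction of nonzero entries:", round(density, 3))
print("Reduced shape:", features_reduced.shape)
print("Output type:", type(features_reduced).__name__)
```

The reduced output is a regular dense NumPy array, which is usually acceptable because it has far fewer columns than the input.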


In [27]:
# First three components explain 30% of original data's variance

tsvd.explained_variance_ratio_[0:3].sum()

0.3003938538086374

In [28]:
# creating and fitting TSVD with one fewer component than the number of features

tsvd = TruncatedSVD(n_components = features_sparse.shape[1]-1)
tsvd.fit(features) # fit only; we just need the explained variance ratios

In [29]:
# List of explained variances

tsvd_var_ratios = tsvd.explained_variance_ratio_

In [35]:
# create a function

def select_n_components(var_ratio, goal_var):
    
    total_variance = 0.0 # setting initial variance
    n_components = 0 # initial number of features
    
    # explained variance for each feature
    for explained_variance in var_ratio:
        
        total_variance += explained_variance # adding explained variance to total variance
        
        n_components += 1 # adding 1 to the number of components
        
        # if we reach our goal level of explained variance
        if total_variance >= goal_var:
            
            break # end the loop
            
    return n_components # return number of components 

In [36]:
# run function

select_n_components(tsvd_var_ratios, 0.95)

40