# What is Dimensionality Reduction?
In machine learning problems, the final prediction often depends on a large number of input variables, called features. The more features there are, the harder it becomes to visualize the training set and to work with it. Dimensionality reduction is the process of reducing the number of features, either by selecting a subset of the original features (feature selection) or by deriving a smaller set of new features from them (feature extraction, as in PCA).

Before learning the techniques of dimensionality reduction, let's understand why it is important to apply it to a dataset.

Reasons:
1) Datasets often contain an abundance of redundant and irrelevant features.

2) With a fixed number of training samples, predictive power first improves and then degrades as the dimensionality increases (the Hughes phenomenon).

3) Other things being equal, simpler explanations are generally better than complex ones.

4) Choosing the right subset of features can improve the accuracy of a model.

5) It reduces overfitting.

6) It reduces computation time.

7) It helps with data compression and hence reduces storage space.

# Dimensionality Reduction Techniques

1) Percent missing values

2) Amount of variation

3) Multicollinearity

4) Principal Component Analysis (PCA)

5) Correlation (with the target)

6) Forward selection

7) Backward elimination

8) LASSO

# Percent missing values
Drop variables/features that have a very high percentage of missing values.
Before dropping them, review or visualize the variables with a high percentage of missing values.
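
The rule above can be sketched with NumPy, assuming missing values are encoded as `NaN`. The helper name `drop_mostly_missing` and the 50% cut-off are illustrative choices, not part of the original text.

```python
import numpy as np

def drop_mostly_missing(X, max_missing=0.5):
    """Return X without the columns whose fraction of NaN values exceeds max_missing."""
    missing_fraction = np.isnan(X).mean(axis=0)  # share of NaN per column
    keep = missing_fraction <= max_missing
    return X[:, keep], keep

# Toy data: column 1 is 75% missing, column 2 is 25% missing.
X = np.array([
    [1.0, np.nan, 10.0],
    [2.0, np.nan, 20.0],
    [3.0, 5.0, np.nan],
    [4.0, np.nan, 40.0],
])
X_reduced, keep = drop_mostly_missing(X, max_missing=0.5)
print(keep)             # [ True False  True]
print(X_reduced.shape)  # (4, 2)
```

The right cut-off depends on the dataset, so it is worth inspecting `missing_fraction` before fixing a value.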

# Amount of Variation
Drop variables that have a very low variation.
Variation is only comparable across variables on similar scales, so either rescale all variables to a common range first or judge the standard deviation 𝜎 relative to each variable's scale.
Drop variables with zero variation.
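
A minimal sketch of a variance filter follows. It assumes the columns are already on comparable scales. The function name `drop_low_variance` and the threshold of 0.01 are hypothetical.

```python
import numpy as np

def drop_low_variance(X, threshold=0.0):
    """Keep only the columns whose variance is strictly above the threshold."""
    variances = X.var(axis=0)
    keep = variances > threshold
    return X[:, keep], keep

# Column 1 is constant (zero variance); column 2 barely varies.
X = np.array([
    [1.0, 7.0, 0.50],
    [2.0, 7.0, 0.51],
    [3.0, 7.0, 0.49],
    [4.0, 7.0, 0.50],
])
X_reduced, keep = drop_low_variance(X, threshold=0.01)
print(keep)             # [ True False False]
print(X_reduced.shape)  # (4, 1)
```

With `threshold=0.0` only the constant column would be removed.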

# Multicollinearity
Many variables are often correlated with each other, and hence are redundant.
If two or more variables are highly correlated, keeping only one will help reduce dimensionality without much loss of information.
Which variable to keep? The one that has the higher absolute correlation coefficient with the target.
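
One possible way to apply this rule is a greedy pass over all pairs of features, sketched below. The function name `drop_collinear`, the 0.9 threshold and the synthetic data are assumptions made for illustration. Pearson correlation only captures linear relationships.

```python
import numpy as np

def drop_collinear(X, y, threshold=0.9):
    """For each highly correlated pair, drop the feature less correlated with y."""
    n_features = X.shape[1]
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))
    target_corr = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    keep = np.ones(n_features, dtype=bool)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if keep[i] and keep[j] and feature_corr[i, j] > threshold:
                # Drop whichever of the pair is more weakly related to the target.
                weaker = i if target_corr[i] < target_corr[j] else j
                keep[weaker] = False
    return keep

# Synthetic data: x1 is a noisy copy of x0, x2 is independent of both.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.3 * rng.normal(size=500)
x2 = rng.normal(size=500)
X = np.column_stack([x0, x1, x2])
y = 3.0 * x0 + x2

keep = drop_collinear(X, y, threshold=0.9)
print(keep)  # one of the first two columns is dropped, the third is kept
```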

# Principal Component Analysis (PCA)
PCA is an orthogonal linear transformation that maps the data to a new coordinate system such that the direction of greatest variance lies on the first coordinate (called the first principal component), the direction of second greatest variance on the second coordinate, and so on.
It is a dimensionality reduction technique that emphasizes variation.

Use PCA when:
1) There is excessive multicollinearity.

2) Explanation of the predictors is not important.

# Correlation (with the target)
Drop variables that have a very low correlation with the target.
If a variable has a very low correlation with the target, it is unlikely to be useful for the model (prediction).
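
A small sketch of this filter is shown below. The helper name `target_correlations`, the 0.2 cut-off and the synthetic data are illustrative assumptions. Because Pearson correlation measures only linear association, a variable with a strong nonlinear relationship to the target could be dropped by mistake.

```python
import numpy as np

def target_correlations(X, y):
    """Absolute Pearson correlation of each column of X with the target y."""
    return np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )

# Synthetic data: the target depends on columns 0 and 1 but not on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=1000)

corr = target_correlations(X, y)
keep = corr >= 0.2  # hypothetical cut-off
print(corr)
print(keep)
```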

# Forward selection
Forward selection is an iterative method that starts with no features in the model. In each iteration, we add the feature that most improves the model, and we stop when adding a new variable no longer improves performance.
1) Identify the best single variable (e.g., based on model accuracy).

2) Add the next best variable to the model.

3) Repeat until a predefined stopping criterion is satisfied.
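
The loop described above might look like the following sketch. It assumes a linear regression model scored by mean squared error on a held-out validation set. The helper names, the `min_improvement` stopping rule and the synthetic data are all hypothetical choices.

```python
import numpy as np

def validation_mse(X_train, y_train, X_val, y_val, features):
    """Fit least squares (with intercept) on the given features; return validation MSE."""
    def design(X):
        return np.column_stack([np.ones(len(X)), X[:, features]])
    coef, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    residuals = y_val - design(X_val) @ coef
    return float(np.mean(residuals ** 2))

def forward_selection(X_train, y_train, X_val, y_val, min_improvement=0.01):
    selected = []
    remaining = list(range(X_train.shape[1]))
    # Baseline: a model with only an intercept.
    best_score = validation_mse(X_train, y_train, X_val, y_val, selected)
    while remaining:
        scores = {
            j: validation_mse(X_train, y_train, X_val, y_val, selected + [j])
            for j in remaining
        }
        best_j = min(scores, key=scores.get)
        if best_score - scores[best_j] < min_improvement:
            break  # no candidate improves the model enough
        selected.append(best_j)
        remaining.remove(best_j)
        best_score = scores[best_j]
    return selected

# Synthetic data: only columns 1 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 4.0 * X[:, 1] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)
selected = forward_selection(X[:200], y[:200], X[200:], y[200:])
print(selected)
```

Scoring on held-out data rather than on the training set keeps the procedure from rewarding features that only fit noise.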

# Backward elimination
Backward elimination starts with all the features and, at each iteration, removes the least significant feature, that is, the one whose removal improves the model most or hurts it least. We repeat this until removing a feature no longer improves performance.
1) Start with all variables included in the model.

2) Drop the least useful variable (e.g., the one whose removal causes the smallest drop in model accuracy).

3) Repeat until a predefined stopping criterion is satisfied.
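
A sketch of the same idea in reverse is given below. It again assumes a least-squares model scored by validation mean squared error. The helper names and the `tolerance` stopping rule are assumptions for illustration.

```python
import numpy as np

def validation_mse(X_train, y_train, X_val, y_val, features):
    """Fit least squares (with intercept) on the given features; return validation MSE."""
    def design(X):
        return np.column_stack([np.ones(len(X)), X[:, features]])
    coef, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    residuals = y_val - design(X_val) @ coef
    return float(np.mean(residuals ** 2))

def backward_elimination(X_train, y_train, X_val, y_val, tolerance=0.01):
    selected = list(range(X_train.shape[1]))
    best_score = validation_mse(X_train, y_train, X_val, y_val, selected)
    while len(selected) > 1:
        # Score the model obtained by removing each feature in turn.
        scores = {
            j: validation_mse(
                X_train, y_train, X_val, y_val, [f for f in selected if f != j]
            )
            for j in selected
        }
        drop_j = min(scores, key=scores.get)  # the removal that hurts least
        if scores[drop_j] - best_score > tolerance:
            break  # every removal makes the model noticeably worse
        selected.remove(drop_j)
        best_score = scores[drop_j]
    return selected

# Synthetic data: only columns 1 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 4.0 * X[:, 1] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)
selected = backward_elimination(X[:200], y[:200], X[200:], y[200:])
print(selected)
```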

# LASSO
Linear regression with L1 regularization is called LASSO (least absolute shrinkage and selection operator) regression.
The LASSO method puts a constraint on the sum of the absolute values of the model parameters: the sum has to be less than a fixed value (an upper bound). To achieve this, the method applies a shrinking (regularization) process that penalizes the coefficients of the regression variables and shrinks some of them to exactly zero. Variables whose coefficients become zero are effectively removed from the model.
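
To make the shrinking concrete, here is a minimal sketch of LASSO fitted by coordinate descent. It uses the penalized form of the problem, which is the usual equivalent of the constraint described above. The function names, the penalty `alpha=0.5`, the fixed iteration count and the omission of an intercept are simplifying assumptions.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it falls inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iterations=200):
    """Minimise (1 / (2n)) * ||y - X @ beta||^2 + alpha * sum(|beta|)."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    column_scale = (X ** 2).sum(axis=0) / n_samples  # assumes no all-zero column
    for _ in range(n_iterations):
        for j in range(n_features):
            # Residual with feature j's current contribution added back.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n_samples
            beta[j] = soft_threshold(rho, alpha) / column_scale[j]
    return beta

# Synthetic data: the target does not depend on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

beta = lasso_coordinate_descent(X, y, alpha=0.5)
print(beta)  # the coefficient of the irrelevant column is shrunk to zero
```

The two relevant coefficients are also pulled toward zero, which is the price paid for the sparsity.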

# Principal Component Analysis

Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of data.

It can be thought of as a projection method where data with m columns (features) is projected into a subspace with m or fewer columns, whilst retaining the essence of the original data.

The PCA method can be described and implemented using the tools of linear algebra.

PCA is an operation applied to a dataset, represented by an n x m matrix A, that results in a projection of A which we will call P. Let's walk through the steps of this operation.

$$
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}
$$

P = PCA(A)

The first step is to calculate the mean value of each column.

M = mean(A)

or

$$
M = (m_{11}, m_{12}), \qquad m_{11} = \frac{a_{11} + a_{21} + a_{31}}{3}, \qquad m_{12} = \frac{a_{12} + a_{22} + a_{32}}{3}
$$

Next, we need to center the values in each column by subtracting the mean column value.

C = A - M

The next step is to calculate the covariance matrix of the centered matrix C.

Correlation is a normalized measure of the amount and direction (positive or negative) that two columns change together. Covariance is the unnormalized version of this measure. A covariance matrix holds the covariance of every column with every other column, including itself, so its diagonal contains the variance of each column.

V = cov(C)
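
For a centered matrix C with n rows, the sample covariance matrix can also be written as C^T C / (n - 1). The short check below compares that formula with `numpy.cov`, using the same 3 x 2 example matrix as the rest of this section.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
C = A - A.mean(axis=0)  # center each column

n_samples = C.shape[0]
V_manual = C.T @ C / (n_samples - 1)  # sample covariance from the definition
V_numpy = np.cov(C, rowvar=False)     # rowvar=False: columns are the variables

print(V_manual)                        # [[4. 4.]
                                       #  [4. 4.]]
print(np.allclose(V_manual, V_numpy))  # True
```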

Finally, we calculate the eigendecomposition of the covariance matrix V. This results in a list of eigenvalues and a list of eigenvectors.

values, vectors = eig(V)

The eigenvectors represent the directions or components for the reduced subspace of B, whereas the eigenvalues represent the magnitudes for the directions. For more on this topic, see the post "Gentle Introduction to Eigendecomposition, Eigenvalues, and Eigenvectors for Machine Learning".

The eigenvectors can be sorted by the eigenvalues in descending order to provide a ranking of the components or axes of the new subspace for A.

If all eigenvalues have a similar value, then we know that the existing representation may already be reasonably compressed or dense and that the projection may offer little. If there are eigenvalues close to zero, they represent components or axes of B that may be discarded.

A total of m or fewer components must be selected to comprise the chosen subspace. Ideally, we would select k eigenvectors, called principal components, that have the k largest eigenvalues.

B = select(values, vectors)
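
The `select` step is pseudocode. One way it could be written is sketched below. The function name `select_components` is hypothetical. The sketch uses `numpy.linalg.eigh` rather than `eig`, since the covariance matrix is symmetric. It also reports the share of total variance carried by the kept components.

```python
import numpy as np

def select_components(values, vectors, k):
    """Return the k eigenvectors with the largest eigenvalues, as columns."""
    order = np.argsort(values)[::-1]  # indices sorted by descending eigenvalue
    top = order[:k]
    explained_ratio = values[top] / values.sum()
    return vectors[:, top], explained_ratio

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
C = A - A.mean(axis=0)
V = np.cov(C, rowvar=False)
values, vectors = np.linalg.eigh(V)  # eigenvectors are the columns of `vectors`

B, explained_ratio = select_components(values, vectors, k=1)
print(B.shape)          # (2, 1)
print(explained_ratio)  # close to [1.]: one component carries all the variance
```

The sign of each eigenvector is arbitrary, so the entries of `B` may come out negated on a different platform or library version.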

Other matrix decomposition methods can be used, such as Singular-Value Decomposition, or SVD. In that case, the values are referred to as singular values, and the vectors of the subspace are still referred to as principal components.
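
As a sketch of the SVD route: decomposing the centered matrix C directly gives the principal components as the rows of `Vt`. Each singular value s corresponds to a covariance eigenvalue s^2 / (n - 1). The variable names below are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
C = A - A.mean(axis=0)

# Rows of Vt are the principal components, ordered by decreasing singular value.
U, singular_values, Vt = np.linalg.svd(C, full_matrices=False)

# Convert singular values to the eigenvalues of the covariance matrix.
eigenvalues_from_svd = singular_values ** 2 / (C.shape[0] - 1)
print(eigenvalues_from_svd)  # approximately [8. 0.]

# Project the centered data onto the components.
P = C @ Vt.T
print(P)
```

Up to the sign of each column, `P` agrees with the projection obtained from the covariance method.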

Once chosen, data can be projected into the subspace via matrix multiplication.

P = B^T . C^T

Where C is the centered data that we wish to project (transposed so that each sample is a column), B^T is the transpose of the chosen principal components and P is the projection of A, with one projected sample per column.

This is called the covariance method for calculating the PCA, although there are alternative ways to calculate it.

In [1]:

from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)

[[1 2]
 [3 4]
 [5 6]]
[3. 4.]
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
[[4. 4.]
 [4. 4.]]
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
[8. 0.]
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]


# Using scikit-learn

In [2]:
from numpy import array
from sklearn.decomposition import PCA
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# create the PCA instance
pca = PCA(n_components=2)
# fit on data
pca.fit(A)
# access values and vectors
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)

[[1 2]
 [3 4]
 [5 6]]
[[ 0.70710678  0.70710678]
 [ 0.70710678 -0.70710678]]
[8.00000000e+00 2.25080839e-33]
[[-2.82842712e+00  2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 2.82842712e+00 -2.22044605e-16]]
