# Dimensionality Reduction

## Outline
- Introduction to Principal Component Analysis (PCA)
- Manually calculating PCA
- PCA using sklearn
- Wrap up

  
## Dimensionality Reduction using Principal Component Analysis
   
Principal component analysis (PCA) is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.

PCA is a method used to reduce the number of variables in your data. Rather than picking a subset of the original variables, it builds new variables (the principal components) as linear combinations of the originals, ordered so that the first few retain as much of the variance in the data as possible.

To interpret each principal component, examine the magnitude and sign of its coefficients (loadings) on the original variables. The larger the absolute value of a coefficient, the more the corresponding variable contributes to that component. How large a coefficient must be to count as important is subjective; use your domain knowledge to decide on a meaningful threshold.
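As a minimal sketch of this kind of inspection, the snippet below builds a small synthetic dataset (an assumption for illustration: three independent variables, hypothetically named `var0`, `var1` and `var2`, with very different spreads), computes the principal components with NumPy, and reports which original variable has the largest absolute coefficient in each component.

```python
import numpy as np

# Hypothetical synthetic data: 500 samples of three independent variables
# with standard deviations 10, 1 and 0.1 respectively.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([10.0, 1.0, 0.1])

# Principal components = eigenvectors of the covariance matrix of the centered data.
C = X - X.mean(axis=0)
values, vectors = np.linalg.eigh(np.cov(C.T))

# eigh returns eigenvalues in ascending order; reverse so the largest comes first.
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

names = ["var0", "var1", "var2"]
for i in range(vectors.shape[1]):
    coefficients = vectors[:, i]          # column i holds the coefficients of component i
    top = int(np.argmax(np.abs(coefficients)))
    print(f"PC{i + 1}: variance={values[i]:.3f}, dominated by {names[top]}, "
          f"coefficients={np.round(coefficients, 3)}")
```

With spreads this different, each component is dominated by a single original variable; on real data the coefficients are usually spread across several variables, which is where the judgment call described above comes in.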


  
## Eigenvalues and Eigenvectors
   
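Eigenvectors are usually normalized to unit vectors, meaning their length or magnitude is 1.0, and NumPy's eig() returns them this way. Any nonzero multiple of an eigenvector is also an eigenvector, so this normalization (and the sign) is a convention. The eigenvectors used here are right eigenvectors, which simply means column vectors (as opposed to left eigenvectors, which are row vectors).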

Eigenvalues are the scale factors associated with the eigenvectors: multiplying the matrix by one of its eigenvectors gives the same vector scaled by the corresponding eigenvalue. For example, a negative eigenvalue reverses the direction of the eigenvector as part of scaling it.

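A symmetric matrix whose eigenvalues are all positive is called positive definite, whereas if its eigenvalues are all negative it is called negative definite. Covariance matrices, which PCA relies on, are symmetric and have only non-negative eigenvalues (they are positive semidefinite).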
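A minimal sketch of checking this numerically, assuming a symmetric input: NumPy's `eigvalsh()` computes the eigenvalues of a symmetric matrix, and the hypothetical helper `definiteness()` classifies the matrix from their signs, using a small tolerance so that floating point values near zero count as zero.

```python
import numpy as np

def definiteness(S, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues (hypothetical helper)."""
    eigenvalues = np.linalg.eigvalsh(S)   # real eigenvalues in ascending order
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues < -tol):
        return "negative definite"
    if np.all(eigenvalues >= -tol):
        return "positive semidefinite"
    if np.all(eigenvalues <= tol):
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))    # eigenvalues 1 and 3
print(definiteness(np.array([[-2.0, 0.0], [0.0, -5.0]])))  # eigenvalues -5 and -2
print(definiteness(np.array([[4.0, 4.0], [4.0, 4.0]])))    # eigenvalues 0 and 8
```

The last matrix is the covariance matrix that appears in the manual PCA example later on; it is positive semidefinite but not positive definite because one of its eigenvalues is zero.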

For a square matrix A, an eigenvector v and its corresponding eigenvalue λ are a nonzero vector and a scalar that make this equation true:

<img src="images/eigenvalue.svg" width="400">


## Calculation of Eigendecomposition

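The example below first defines a 3×3 square matrix, then calculates its eigendecomposition, which returns the eigenvalues and eigenvectors. The third eigenvalue, about -1.3e-15, is zero up to floating point rounding: this matrix is singular because its rows are linearly dependent.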


```python
# eigendecomposition
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Square matrix")
print(A)
# calculate eigendecomposition: eigenvalues in a 1-D array,
# eigenvectors as the columns of a 2-D array
values, vectors = eig(A)
print("Eigenvalues")
print(values)
print("Eigenvectors")
print(vectors)
```

```
Square matrix
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Eigenvalues
[ 1.61168440e+01 -1.11684397e+00 -1.30367773e-15]
Eigenvectors
[[-0.23197069 -0.78583024  0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735   0.61232756  0.40824829]]
```
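The defining equation can be checked directly. This sketch reuses the same matrix and confirms that each column of `vectors` satisfies A v = λ v, and that A can be rebuilt as Q diag(λ) Q<sup>-1</sup>, where Q holds the eigenvectors as columns. The reconstruction assumes Q is invertible, which holds here because the three eigenvalues are distinct.

```python
import numpy as np
from numpy.linalg import eig, inv

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
values, vectors = eig(A)

# Each eigenvector is a column of `vectors`; check A v == lambda v for every pair.
for i in range(len(values)):
    v = vectors[:, i]
    print(i, np.allclose(A.dot(v), values[i] * v))

# Rebuild the original matrix from its eigendecomposition: A = Q diag(values) Q^-1
Q = vectors
rebuilt = Q.dot(np.diag(values)).dot(inv(Q))
print(np.allclose(A, rebuilt))
```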


## Manually calculating PCA

There is no **pca()** function in NumPy, but we can easily calculate Principal Component Analysis step by step using NumPy functions.

The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components and the eigenvalues as the variance explained by each component; the eigenvectors are then used to project the centered data.

PCA is an operation applied to a dataset, represented by an n × m matrix A (n observations in rows, m variables in columns), that results in a projection of A, named P in the code below. Let’s walk through the steps of this operation.


```python
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print("Original matrix")
print(A)
# step 1: calculate the mean of each column
M = mean(A, axis=0)
print("Calculate the mean of each column")
print(M)
# step 2: center columns by subtracting column means
C = A - M
print("Center columns")
print(C)
# step 3: calculate covariance matrix of centered matrix
# (cov() treats each row as a variable, hence the transpose)
V = cov(C.T)
print("Calculate covariance matrix of centered matrix")
print(V)
# step 4: perform the eigendecomposition of covariance matrix
values, vectors = eig(V)
print("eigenvectors (PCA components)")
print(vectors)
print("eigenvalues (PCA variance)")
print(values)
# step 5: project the centered data onto the eigenvectors
# (both components are kept here, so this is a change of basis rather than a reduction)
P = vectors.T.dot(C.T)
print("PCA reduction")
print(P.T)
```

```
Original matrix
[[1 2]
 [3 4]
 [5 6]]
Calculate the mean of each column
[3. 4.]
Center columns
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
Calculate covariance matrix of centered matrix
[[4. 4.]
 [4. 4.]]
eigenvectors (PCA components)
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
eigenvalues (PCA variance)
[8. 0.]
PCA reduction
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]
```
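Two details matter when turning this into an actual reduction: `eig()` does not guarantee any particular order for the eigenvalues, and all components are kept above. A minimal sketch, assuming we want the top `k` components (here `k = 1`, a value chosen for illustration): sort the eigenpairs by decreasing eigenvalue, keep the first `k` eigenvectors, and project the centered data onto them. It uses `eigh()`, NumPy's eigensolver for symmetric matrices such as a covariance matrix, which returns eigenvalues in ascending order.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
C = A - A.mean(axis=0)                 # center each column
V = np.cov(C.T)                        # covariance matrix (variables in columns of A)

values, vectors = np.linalg.eigh(V)    # symmetric eigensolver, ascending eigenvalues
order = np.argsort(values)[::-1]       # indices of the eigenvalues, largest first
values, vectors = values[order], vectors[:, order]

k = 1                                  # number of components to keep (illustrative)
components = vectors[:, :k]            # m x k matrix of the top-k eigenvectors
P = C.dot(components)                  # n x k projection of the centered data

print("explained variance ratio:", values[:k] / values.sum())
print("reduced data:")
print(P)
```

Here the first component carries all of the variance (the second eigenvalue is 0), so dropping the second column loses no information; the sign of the reduced column may differ from the output above because eigenvector signs are arbitrary.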


## PCA using sklearn

We can calculate a Principal Component Analysis on a dataset using the PCA() class in the scikit-learn library. The benefit of this approach is that once the projection is calculated, it can be applied to new data again and again quite easily.

When creating an instance of the class, the number of components can be specified with the n_components parameter.

The instance is first fit on a dataset by calling the fit() method, and then the original dataset or other data can be projected into a subspace with the chosen number of dimensions by calling the transform() method.

Once fit, the eigenvalues and principal components can be accessed on the PCA instance via the explained_variance_ and components_ attributes.
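With the default `whiten=False`, `transform()` subtracts the per-column mean learned during `fit()` (stored in the `mean_` attribute) and projects onto `components_`, which is what makes reusing a fitted projection on new data cheap. A minimal sketch checking this on the 3×2 matrix from the manual example and on two new rows made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
pca = PCA(n_components=1)
pca.fit(A)

# Hypothetical new observations, projected with the statistics learned from A.
new_rows = np.array([[2.0, 3.0], [7.0, 8.0]])
projected = pca.transform(new_rows)

# The same projection computed by hand: center with the training mean,
# then take the dot product with the principal axes.
manual = (new_rows - pca.mean_).dot(pca.components_.T)
print(projected)
print(np.allclose(projected, manual))
```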

The example below demonstrates using this class by first creating an instance, fitting it on a 3×10 matrix, accessing the values and vectors of the projection, and transforming the original data.

```python
# Principal Component Analysis
from numpy import array
from sklearn.decomposition import PCA
# define a matrix
# A = array([[1, 2], [3, 4], [5, 6]])  # the 3x2 matrix from the manual example
A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print('original matrix')
print(A)
# create the PCA instance, keeping 2 components
pca = PCA(n_components=2)
# fit on data
pca.fit(A)
# access values and vectors
print('PCA components')
print(pca.components_)
print('PCA explained variance')
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print('PCA transform')
print(B)
```

```
original matrix
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]
PCA components
[[-0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.31622777
  -0.31622777 -0.31622777 -0.31622777 -0.31622777]
 [ 0.9486833  -0.10540926 -0.10540926 -0.10540926 -0.10540926 -0.10540926
  -0.10540926 -0.10540926 -0.10540926 -0.10540926]]
PCA explained variance
[1.00000000e+03 7.09974815e-30]
PCA transform
[[ 3.16227766e+01 -3.10862447e-15]
 [ 0.00000000e+00  0.00000000e+00]
 [-3.16227766e+01  3.10862447e-15]]
```


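Note that this example uses a different, 3×10 matrix, whose three rows lie on a line in 10-dimensional space, so the first component explains essentially all of the variance. If the commented-out 3×2 matrix from the previous example is used instead, then apart from some very minor floating point rounding and possible sign flips of the components, we get the same principal components, explained variance, and projection as in the manual calculation.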
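A minimal sketch of that comparison: the snippet fits `PCA` on the 3×2 matrix and checks it against the NumPy eigendecomposition route. Because the sign of each eigenvector is arbitrary, the components and projections are compared by absolute value.

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Manual route: center, covariance, eigendecomposition sorted by decreasing eigenvalue.
C = A - A.mean(axis=0)
values, vectors = eigh(np.cov(C.T))
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]
manual_projection = C.dot(vectors)

# scikit-learn route, keeping both components.
pca = PCA(n_components=2)
sk_projection = pca.fit_transform(A)

print(np.allclose(values, pca.explained_variance_))
print(np.allclose(np.abs(vectors.T), np.abs(pca.components_)))
print(np.allclose(np.abs(manual_projection), np.abs(sk_projection)))
```

Both routes divide by n - 1 when estimating variance, which is why `explained_variance_` matches the covariance eigenvalues exactly rather than up to a scale factor.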

## Summary

- Introduction to Principal Component Analysis (PCA)
- Manually calculating PCA
- PCA using sklearn

Examples, thanks to Jason Brownlee PhD retrieved from: https://machinelearningmastery.com/calculate-principal-component-analysis-scratch-python/

## Extra Credit: Optional


  
### Dimensionality Reduction using Singular Value Decomposition (SVD)
   
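SVD stands for Singular Value Decomposition. The SVD of a matrix X of dimension n×d is given by:

**X = UDV<sup>T</sup>**

where U (n×n) and V (d×d) are square orthogonal matrices:

**U<sup>T</sup>U = I<sub>n</sub>**

**V<sup>T</sup>V = I<sub>d</sub>**

and D is an n×d diagonal matrix whose diagonal entries, the singular values, are non-negative and conventionally sorted in decreasing order.

Some additional notes:
- D is not necessarily square
- Every matrix has an SVD, whereas an eigendecomposition requires a square (and diagonalizable) matrix
- SVD is different from, but closely related to, the eigendecomposition: the columns of V are eigenvectors of X<sup>T</sup>X, and the squared singular values are its eigenvalues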
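A minimal sketch of that relationship on the 3×2 matrix used throughout: the squared singular values should equal the eigenvalues of X<sup>T</sup>X, and the rows of V<sup>T</sup> should match its eigenvectors up to sign. The last line checks the orthogonality conditions stated above.

```python
import numpy as np
from scipy.linalg import svd

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
U, D, VT = svd(X)

# Eigendecomposition of the symmetric d x d matrix X^T X, largest eigenvalue first.
values, vectors = np.linalg.eigh(X.T.dot(X))
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

print(np.allclose(D ** 2, values))                 # squared singular values = eigenvalues
print(np.allclose(np.abs(VT.T), np.abs(vectors)))  # right singular vectors = eigenvectors (up to sign)
print(np.allclose(U.T.dot(U), np.eye(3)), np.allclose(VT.dot(VT.T), np.eye(2)))  # orthogonality
```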


### Calculate SVD

The SVD can be calculated by calling the **svd()** function from SciPy's linalg module.

The function takes a matrix and returns the U, D and V<sup>T</sup> elements. The **D** diagonal matrix is returned as a 1-D array of singular values, and the V matrix is returned in transposed form, i.e. V<sup>T</sup>.

The example below defines a 3×2 matrix and calculates the singular-value decomposition.

```python
# Singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print("original matrix", A.shape)
print(A)
# SVD
U, D, VT = svd(A)
print("U matrix")
print(U)
print("D matrix")
print(D)
print("V Transpose matrix")
print(VT)
```

```
original matrix (3, 2)
[[1 2]
 [3 4]
 [5 6]]
U matrix
[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
D matrix
[9.52551809 0.51430058]
V Transpose matrix
[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]
```
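Because D comes back as a vector, it has to be expanded into an n×d matrix (3×2 here) before the three factors can be multiplied back together. A minimal sketch that rebuilds the original matrix from U, D and V<sup>T</sup>:

```python
import numpy as np
from scipy.linalg import svd

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
U, D, VT = svd(A)

# Expand the singular values into an n x d matrix with D on the diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(D), :len(D)] = np.diag(D)

# X = U D V^T
rebuilt = U.dot(Sigma).dot(VT)
print(rebuilt)
print(np.allclose(A, rebuilt))
```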


### Use of SVD for pseudoinverse
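The pseudoinverse generalizes the matrix inverse from square matrices to rectangular matrices, where the number of rows and columns are not equal. It can be computed from the SVD as A<sup>+</sup> = VD<sup>+</sup>U<sup>T</sup>, where D<sup>+</sup> is obtained by replacing each nonzero singular value in D with its reciprocal and transposing the result.

The pseudoinverse provides one way of solving the linear regression equation, particularly when there are more rows than columns, which is often the case: multiplying the pseudoinverse of the data matrix by the target vector gives the least-squares coefficients.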
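A minimal sketch of that use, assuming a small made-up dataset that lies exactly on the line y = 1 + 2x, so the expected coefficients are known in advance; NumPy's `lstsq()` solver serves as a cross-check.

```python
import numpy as np

# Hypothetical data: 4 observations of one feature, plus a column of ones for the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])   # 4 x 2 design matrix (more rows than columns)

b = np.linalg.pinv(X).dot(y)                # least-squares coefficients [intercept, slope]
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)
print(np.allclose(b, b_lstsq))
```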

NumPy provides the function pinv() for calculating the pseudoinverse of a rectangular matrix.

The example below defines a 4×2 matrix and calculates the pseudoinverse.

```python
# Pseudoinverse
from numpy import array
from numpy.linalg import pinv
# define matrix
A = array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]])
print("original matrix")
print(A)
# calculate pseudoinverse
B = pinv(A)
print("pseudoinverse matrix")
print(B)
```

```
original matrix
[[0.1 0.2]
 [0.3 0.4]
 [0.5 0.6]
 [0.7 0.8]]
pseudoinverse matrix
[[-1.00000000e+01 -5.00000000e+00  1.28757642e-14  5.00000000e+00]
 [ 8.50000000e+00  4.50000000e+00  5.00000000e-01 -3.50000000e+00]]
```
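A minimal sketch of computing the same pseudoinverse directly from the SVD, following A<sup>+</sup> = VD<sup>+</sup>U<sup>T</sup>. It assumes every singular value is nonzero, which holds for this matrix; `pinv()` additionally treats singular values below a small cutoff as zero, which the sketch does not do.

```python
import numpy as np
from scipy.linalg import svd

A = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
U, D, VT = svd(A)

# D+ : a d x n matrix (2 x 4 here) holding 1/d for each nonzero singular value d.
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(D), :len(D)] = np.diag(1.0 / D)

# A+ = V D+ U^T
B = VT.T.dot(D_plus).dot(U.T)
print(B)
print(np.allclose(B, np.linalg.pinv(A)))
```

Since A has full column rank, B is a left inverse: B multiplied by A gives the 2×2 identity matrix.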


### Use of SVD for Dimensionality Reduction
A popular application of SVD is for dimensionality reduction.

Data with a large number of features, for example more features (columns) than observations (rows), may be reduced to a smaller set of new features (linear combinations of the original ones) that capture most of the variation in the data.

The result is a matrix with a lower rank that is said to approximate the original matrix.

To do this, we perform an SVD on the original data and keep the k largest singular values: select the first k columns of the Diagonal matrix and the first k rows of V<sup>T</sup>. Multiplying U by the truncated Diagonal matrix, or equivalently the original data by the transpose of the truncated V<sup>T</sup>, gives the reduced k-column representation.
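How good is the lower-rank approximation? Measured with the Frobenius norm, the error of keeping the top k singular values equals the square root of the sum of the squares of the discarded ones (the Eckart-Young theorem). A minimal sketch on a random 5×8 matrix, generated only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(5, 8))                    # hypothetical data matrix
U, D, VT = np.linalg.svd(A, full_matrices=False)

for k in range(1, len(D) + 1):
    # Rank-k approximation from the first k singular values and vectors.
    A_k = U[:, :k].dot(np.diag(D[:k])).dot(VT[:k, :])
    error = np.linalg.norm(A - A_k)            # Frobenius norm by default for 2-D arrays
    predicted = np.sqrt(np.sum(D[k:] ** 2))    # size of the discarded singular values
    print(k, round(float(error), 6), round(float(predicted), 6))
```

The two columns agree for every k, and the error drops to zero once all singular values are kept.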

```python
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd
# define a matrix
A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
print("original matrix")
print(A)
# Singular-value decomposition
U, D, VT = svd(A)
# create m x n Diagonal matrix
Diagonal = zeros((A.shape[0], A.shape[1]))
# place the m singular values on the diagonal (assumes m <= n, as here)
Diagonal[:A.shape[0], :A.shape[0]] = diag(D)
# select the number of singular values to keep
n_elements = 2
Diagonal = Diagonal[:, :n_elements]
VT = VT[:n_elements, :]
# reconstruct
B = U.dot(Diagonal.dot(VT))
print("Reconstructed matrix")
print(B)
# transform
T = U.dot(Diagonal)
print("T matrix (reduced dimensionality), U dot product with Diagonal")
print(T)
T = A.dot(VT.T)
print("T matrix (reduced dimensionality), A dot product with VT.T")
print(T)
```

```
original matrix
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]
Reconstructed matrix
[[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
 [11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]
 [21. 22. 23. 24. 25. 26. 27. 28. 29. 30.]]
T matrix (reduced dimensionality), U dot product with Diagonal
[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]
T matrix (reduced dimensionality), A dot product with VT.T
[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]
```
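scikit-learn wraps this truncated projection in its `TruncatedSVD` class. A minimal sketch, assuming scikit-learn is installed, that reproduces the two-column `T` matrix on the same data. As with PCA, the signs of the columns may differ, so the check compares absolute values. Unlike `PCA`, `TruncatedSVD` does not center the data before decomposing it.

```python
import numpy as np
from scipy.linalg import svd
from sklearn.decomposition import TruncatedSVD

A = np.array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]], dtype=float)

# Reference result from the manual approach: project A onto the top-2 right singular vectors.
U, D, VT = svd(A)
T_manual = A.dot(VT[:2, :].T)

# scikit-learn's truncated SVD, keeping 2 components (random_state fixes the randomized solver).
svd_model = TruncatedSVD(n_components=2, random_state=0)
T_sklearn = svd_model.fit_transform(A)

print(T_sklearn)
print(np.allclose(np.abs(T_manual), np.abs(T_sklearn)))
```

Like `PCA`, a fitted `TruncatedSVD` instance can then project new rows with its `transform()` method.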


Examples, thanks to Jason Brownlee PhD retrieved from: https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/