# Coding our very own PCA
In this notebook, we are going to code our own PCA (Principal Component Analysis) from scratch, using only NumPy.

## Step 1 : Importing Libraries

```python
import numpy as np
```

## Step 2 : Getting data
We will use random data drawn from a multivariate normal distribution in our example.

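```python
np.random.seed(2344234)   # fixed seed so the results below are reproducible

# 100 samples of 3 independent standard-normal features (zero mean, identity covariance)
mean_vec1 = np.array([0, 0, 0])
cov_mat1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
X = np.random.multivariate_normal(mean_vec1, cov_mat1, 100)   # shape (100, 3)
```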

## Step 3 : Computing the Covariance Matrix

```python
# np.cov takes 1-D or 2-D data and returns the covariance matrix, whose
# (i, j) entry is the covariance between variable i and variable j.

# By default (rowvar=True), np.cov treats each ROW as a variable. Since our
# features are stacked as columns, we transpose the data first
# (np.cov(X, rowvar=False) would give the same result).
X_transpose = X.T
cov = np.cov(X_transpose)
cov
```

```
array([[ 1.43340112,  0.21963303,  0.00955885],
       [ 0.21963303,  0.83918163, -0.0329771 ],
       [ 0.00955885, -0.0329771 ,  0.99409076]])
```
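You can also compute the covariance matrix by hand, which makes the formula behind `np.cov` explicit. First center each column on its mean, then compute C = X_cᵀ X_c / (n - 1). This is a minimal sketch that assumes the same seed and data as Step 2. `covariance_matrix` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

# Recreate the notebook's data: 100 samples (rows) x 3 features (columns)
np.random.seed(2344234)
X = np.random.multivariate_normal(np.zeros(3), np.eye(3), 100)

def covariance_matrix(data):
    """Sample covariance between the columns of `data` (n - 1 denominator)."""
    n = data.shape[0]
    centered = data - data.mean(axis=0)   # subtract each feature's mean
    return centered.T @ centered / (n - 1)

manual_cov = covariance_matrix(X)

# Both np.cov spellings agree with the hand-written formula
print(np.allclose(manual_cov, np.cov(X.T)))              # True
print(np.allclose(manual_cov, np.cov(X, rowvar=False)))  # True
```

The `n - 1` denominator matches `np.cov`'s default (unbiased) estimate.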

## Step 4 : Finding the Eigenvectors and Eigenvalues of the Covariance Matrix

```python
# NumPy's linear algebra module, np.linalg, has a function called eig() that takes
# the covariance matrix as input and returns its eigenvalues and eigenvectors
np.linalg.eig(cov)
```

```
(array([1.50577087, 0.76174769, 0.99915494]),
 array([[ 0.94973472,  0.30935084,  0.04802095],
        [ 0.31304638, -0.93965855, -0.13799916],
        [-0.00243314, -0.14609538,  0.98926752]]))
```

```python
# As we can see, np.linalg.eig returns a tuple: (eigenvalues, eigenvectors)

# 1 -> The eigenvectors are stored column-wise, not row-wise: the first eigenvector
#      is the first column of the matrix (the first element of each row), and so on.
# 2 -> The eigenvalues are not returned in any particular order, so they are not
#      in the descending order we want for PCA.
```
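Both observations can be checked directly. The sketch below assumes the Step 2 data. It verifies that each column v of the eigenvector matrix satisfies the eigen-equation cov @ v = λ v. It also shows `np.linalg.eigh`, NumPy's routine for symmetric matrices such as covariance matrices. `eigh` returns the eigenvalues in ascending order, which is still not the descending order we need.

```python
import numpy as np

np.random.seed(2344234)
X = np.random.multivariate_normal(np.zeros(3), np.eye(3), 100)
cov = np.cov(X, rowvar=False)

eig_val, eig_vectors = np.linalg.eig(cov)

# Each COLUMN v_i satisfies cov @ v_i == lambda_i * v_i
for i in range(len(eig_val)):
    v = eig_vectors[:, i]
    print(i, np.allclose(cov @ v, eig_val[i] * v))   # True for every i

# Same check for all columns at once: broadcasting multiplies
# column j of eig_vectors by eig_val[j]
print(np.allclose(cov @ eig_vectors, eig_vectors * eig_val))  # True

# eigh is specialised for symmetric matrices; its eigenvalues come back ascending
vals_h, vecs_h = np.linalg.eigh(cov)
print(np.all(np.diff(vals_h) >= 0))   # True
```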

```python
eig_val, eig_vectors = np.linalg.eig(cov)  # unpack the eigenvalues and the eigenvectors (as columns)
```

## Step 5 : Sorting the Eigenvectors and Eigenvalues in Descending Order of Eigenvalue

```python
# Let's build a list of (eigenvalue, eigenvector) pairs, then sort it

eig_val_vector_pairs = []
for i in range(len(eig_val)):
    eig_vec = eig_vectors[:, i]   # the ith eigenvector is the ith COLUMN
    eig_val_vector_pairs.append((eig_val[i], eig_vec))  # append an (eigenvalue, eigenvector) tuple

# Sort in decreasing order of eigenvalue. The key matters: without it, two equal
# eigenvalues would make Python compare the NumPy arrays, which raises a ValueError.
eig_val_vector_pairs.sort(key=lambda pair: pair[0], reverse=True)

eig_val_vector_pairs
```

```
[(1.505770867195348, array([ 0.94973472,  0.31304638, -0.00243314])),
 (0.9991549441401951, array([ 0.04802095, -0.13799916,  0.98926752])),
 (0.7617476873346821, array([ 0.30935084, -0.93965855, -0.14609538]))]
```
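Steps 4 and 5 can also be done without a Python list of tuples. `np.argsort` on the eigenvalues gives an ordering that you can apply to the eigenvector columns in one step. This sketch assumes the Step 2 data. It also goes one step further than this notebook and projects the centered data onto the top `k` principal axes. `k = 2` is an arbitrary, illustrative choice.

```python
import numpy as np

np.random.seed(2344234)
X = np.random.multivariate_normal(np.zeros(3), np.eye(3), 100)
cov = np.cov(X, rowvar=False)
eig_val, eig_vectors = np.linalg.eig(cov)

# argsort sorts ascending; reversing it gives descending eigenvalues
order = np.argsort(eig_val)[::-1]
sorted_vals = eig_val[order]
sorted_vecs = eig_vectors[:, order]   # reorder the COLUMNS so pairs stay matched

# Keep the top-k eigenvectors as a projection matrix W of shape (3, k)
k = 2                                  # illustrative choice of components
W = sorted_vecs[:, :k]

# Project the mean-centered data onto the principal axes: (100, 3) @ (3, k)
X_projected = (X - X.mean(axis=0)) @ W
print(sorted_vals)
print(X_projected.shape)   # (100, 2)
```

The sample variance of each projected column equals the matching eigenvalue. This is why sorting by eigenvalue ranks the axes by how much variance they capture.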

As an exercise, you can use scikit-learn's built-in PCA (`sklearn.decomposition.PCA`) and check that its eigenvectors (`pca.components_`, stored as rows rather than columns) and eigenvalues (`pca.explained_variance_`) have the same values as our version.
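To try this exercise without scikit-learn, you can use NumPy alone and cross-check against the singular value decomposition (SVD) of the centered data. scikit-learn documents its PCA as SVD-based. If X_c = U S Vᵀ, then X_cᵀ X_c / (n - 1) = V S² Vᵀ / (n - 1). So the rows of Vᵀ are the principal axes and s² / (n - 1) are the eigenvalues. This is a sketch under the same Step 2 data assumption.

```python
import numpy as np

np.random.seed(2344234)
X = np.random.multivariate_normal(np.zeros(3), np.eye(3), 100)
n = X.shape[0]

# Route 1: eigendecomposition of the covariance matrix, sorted descending
eig_val, eig_vectors = np.linalg.eig(np.cov(X, rowvar=False))
order = np.argsort(eig_val)[::-1]
eig_val, eig_vectors = eig_val[order], eig_vectors[:, order]

# Route 2: SVD of the mean-centered data. np.linalg.svd returns the singular
# values in descending order; the rows of Vt are the principal axes.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_vals = s**2 / (n - 1)

print(np.allclose(eig_val, svd_vals))   # True: same eigenvalues

# The axes agree up to sign: |v . w| == 1 for each matching pair of unit vectors
dots = np.abs(np.sum(eig_vectors.T * Vt, axis=1))
print(np.allclose(dots, 1.0))           # True
```

The comparison uses absolute dot products on purpose, because the two routes may pick opposite signs for the same axis.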

## Difference in Results
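One thing that may differ between our results and the built-in PCA's is the sign of the eigenvectors: the built-in PCA may choose vectors that point in exactly the opposite direction to ours (every component multiplied by -1).

However, I think you are smart enough to see why this doesn't matter. If v is an eigenvector of the covariance matrix, so is -v, with the same eigenvalue. Both describe the same principal axis, and projecting onto -v only flips the sign of the projected values without changing the variance they capture.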
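Here is a quick numerical check of that argument on the Step 2 data. If you want reproducible signs, the `fix_signs` helper at the end applies one possible convention: flip each eigenvector so that its largest-magnitude entry is positive. Both the helper and the convention are hypothetical illustrations, not what any particular library does.

```python
import numpy as np

np.random.seed(2344234)
X = np.random.multivariate_normal(np.zeros(3), np.eye(3), 100)
X_centered = X - X.mean(axis=0)
cov = np.cov(X, rowvar=False)

eig_val, eig_vectors = np.linalg.eig(cov)
lam = eig_val.max()
v = eig_vectors[:, np.argmax(eig_val)]   # first principal axis

# -v is an eigenvector for the same eigenvalue...
print(np.allclose(cov @ (-v), lam * (-v)))        # True

# ...projecting onto it only flips the sign of the scores...
scores_pos = X_centered @ v
scores_neg = X_centered @ (-v)
print(np.allclose(scores_neg, -scores_pos))       # True

# ...so the variance captured along the axis is unchanged (and equals lam)
print(np.isclose(scores_neg.var(ddof=1), lam))    # True

def fix_signs(vectors):
    """Hypothetical convention: make each column's largest-|entry| positive."""
    rows = np.argmax(np.abs(vectors), axis=0)           # row of the biggest entry per column
    signs = np.sign(vectors[rows, np.arange(vectors.shape[1])])
    return vectors * signs                              # flips whole columns

# After the convention is applied, v and -v map to the same vectors
print(np.allclose(fix_signs(eig_vectors), fix_signs(-eig_vectors)))  # True
```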