# Principal Component Analysis

This challenge will help you gain intuition on how a **Principal Component Analysis** works.  

## 1) Generate Data

We want a dataset with **100 observations** and **2 correlated features**

👇 Run the cell below to generate your data  
💡 Notice the correlation between your features

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate a dataset with 100 observations and 2 correlated features.
seed = np.random.RandomState(42)
feature_1 = seed.normal(5, 1, 100)
feature_2 = .7 * feature_1 + seed.normal(0, .5, 100)
X = np.array([feature_1, feature_2]).T
X = pd.DataFrame(X)

X.corr().round(3)
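
The correlation reported by `X.corr()` can be anticipated from the way the data is generated. As a rough sketch, write $f_1$ and $f_2$ for the two features and $\varepsilon$ for the added noise (symbols introduced here only for illustration), with $f_2 = 0.7 f_1 + \varepsilon$, $Var(f_1) = 1$ and $Var(\varepsilon) = 0.5^2$:

$$
\rho(f_1, f_2) = \frac{Cov(f_1, f_2)}{\sqrt{Var(f_1)\,Var(f_2)}} = \frac{0.7}{\sqrt{1 \times (0.7^2 + 0.5^2)}} = \frac{0.7}{\sqrt{0.74}} \approx 0.81
$$

This is the theoretical value; the sample correlation on 100 observations should land close to it, not exactly on it.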

❓ Make a scatter plot of your two features against each other

In [2]:
# YOUR CODE HERE

☝️ You can see the positive correlation between the features  

Our observations are packed along a single line, so it is not easy to spot differences between them.

💡 PCA will help us find the directions that cancel this correlation

## 2) Principal Components

👉 Import `PCA` from `sklearn.decomposition` and instantiate a model with `n_components=2`

❓ Fit it on your `X`, and assign it to `pca`

In [3]:
# YOUR CODE HERE

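Let's focus on two attributes of that fitted `PCA`:

`pca.components_`: a set of unit-length eigenvectors of the covariance matrix of `X`, one per row. They point in the **directions of maximum variance**.

`pca.explained_variance_`: the variance of the data projected onto each principal component, $Var(Principal\; Component)$, given by the eigenvalue associated with each eigenvector. The ratio $\frac{Var(Principal\; Component)}{Var(X)}$ is stored separately in `pca.explained_variance_ratio_`.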
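As a sanity check on these two attributes, the sketch below compares them with an eigen-decomposition of the covariance matrix. It regenerates the same data under illustrative `demo_` names (not part of the challenge) so that it runs on its own, and it is only meant to build intuition, not to describe how `sklearn` computes the result internally.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as in section 1 (demo_ names are illustrative).
demo_rng = np.random.RandomState(42)
demo_f1 = demo_rng.normal(5, 1, 100)
demo_f2 = .7 * demo_f1 + demo_rng.normal(0, .5, 100)
demo_X = np.column_stack([demo_f1, demo_f2])

demo_pca = PCA(n_components=2).fit(demo_X)

# Eigen-decomposition of the sample covariance matrix (np.cov uses ddof=1).
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(demo_X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each row of components_ is a unit-length eigenvector (defined up to sign).
component_norms = np.linalg.norm(demo_pca.components_, axis=1)
same_directions = np.allclose(np.abs(demo_pca.components_), np.abs(eigenvectors.T))

# explained_variance_ holds the eigenvalues; the ratio lives in explained_variance_ratio_.
same_variances = np.allclose(demo_pca.explained_variance_, eigenvalues)
ratio = demo_pca.explained_variance_ / demo_pca.explained_variance_.sum()
same_ratio = np.allclose(demo_pca.explained_variance_ratio_, ratio)

print(component_norms, same_directions, same_variances, same_ratio)
```

The absolute values are compared because an eigenvector and its opposite describe the same direction.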

In [4]:
pca.components_

In [5]:
pca.explained_variance_

👇 Run the cell below to visualize your two Principal Components

In [6]:
plt.figure(figsize=(5,5))

plt.scatter(X[0], X[1])

for (length, vector) in zip(pca.explained_variance_, pca.components_):
    v = vector * np.sqrt(length)  # Square root of the variance = standard deviation, in the same "units" as the data
    plt.quiver(*X.mean(axis=0), *v, units='xy', scale=1, color='r')

The length of the vector is a measure of the standard deviation of the data when projected onto that axis!

We can then use those directions to "explain" most of our observations' behaviour: most of the distinction between observations happens along those axes.
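The link between arrow length and standard deviation can be checked numerically. The sketch below rebuilds the data under illustrative `demo_` names so that it is self-contained, projects it onto each component, and compares the standard deviation along each axis with `np.sqrt(explained_variance_)`. It assumes the sample standard deviation (`ddof=1`), which is the convention that matches `explained_variance_`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as in section 1 (demo_ names are illustrative).
demo_rng = np.random.RandomState(42)
demo_f1 = demo_rng.normal(5, 1, 100)
demo_f2 = .7 * demo_f1 + demo_rng.normal(0, .5, 100)
demo_X = np.column_stack([demo_f1, demo_f2])

demo_pca = PCA(n_components=2).fit(demo_X)

# Project every observation onto each unit-length component (one column per component).
# A standard deviation does not depend on the mean, so no centering is needed here.
projections = demo_X @ demo_pca.components_.T

# Sample standard deviation (ddof=1) of the data along each principal axis.
projected_std = projections.std(axis=0, ddof=1)

# Arrow lengths used in the plot: square root of the explained variances.
arrow_lengths = np.sqrt(demo_pca.explained_variance_)
lengths_match = np.allclose(projected_std, arrow_lengths)

print(projected_std, arrow_lengths, lengths_match)
```

The first value is the larger one, which is why the first arrow is the longer of the two.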

## 3) Apply PCA

We can use these components to project every sample of our dataset onto the directions of maximum variance.

 ❓ Use the `transform` method of your `pca` on `X` and store the result in `X_transformed`  
 ❓ Plot your projected features in `X_transformed` against one another.  
 ❓ Compute the correlation between your transformed features in `X_transformed`

In [7]:
# YOUR CODE HERE

In [8]:
# YOUR CODE HERE

☝️ There is no correlation at all between your transformed features.  

This makes it easier to study the behaviour between observations since they are no longer packed along a single line.
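To double-check your own result, the sketch below measures the correlation before and after the projection. It rebuilds the data under illustrative `demo_` names and uses `fit_transform`, which is equivalent here to calling `fit` then `transform` on the same data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data-generating recipe as in section 1 (demo_ names are illustrative).
demo_rng = np.random.RandomState(42)
demo_f1 = demo_rng.normal(5, 1, 100)
demo_f2 = .7 * demo_f1 + demo_rng.normal(0, .5, 100)
demo_X = np.column_stack([demo_f1, demo_f2])

# Fit the PCA and project the data in one step.
demo_transformed = PCA(n_components=2).fit_transform(demo_X)

# Pearson correlation between the two columns, before and after the projection.
corr_before = np.corrcoef(demo_X, rowvar=False)[0, 1]
corr_after = np.corrcoef(demo_transformed, rowvar=False)[0, 1]

print(round(corr_before, 3), round(corr_after, 3))
```

The correlation after projection is zero up to floating-point error, because the principal components are orthogonal eigenvectors of the covariance matrix.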

🏁 **Don't forget to push your notebook.**  

Proceed with the challenges of the day and come back here if you have time 😉

## 4 - Optional) With a little help from Scaling

Remember that the projections obtained with the `PCA` are nothing more than the dot product of your initial `X` and your transposed components.  

👉 Compute your projected values manually by performing the dot product $X \cdot PC^T$.  

❓ Use `np.allclose` to check whether your `X_transformed` is equal to your dot product $X \cdot PC^T$.

In [9]:
# YOUR CODE HERE

It's not equal 😱  

When the `PCA` of `sklearn` applies the reduction, it does so on an `X` that is *centered* but not *scaled*.

This means that `pca.transform(X)` is actually equivalent to `np.dot(X - X.mean(axis=0), PC.T)`, where `PC` is `pca.components_`

In [10]:
PC = pca.components_  # principal components, one per row
np.allclose(X_transformed, np.dot(X - X.mean(axis=0), PC.T))

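Scaling our data beforehand can make this centering step difficult to spot: standardized data already has a mean of zero, so subtracting the mean changes nothing.  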

❓ Standardize your `X` and store it in `X_scaled`  
❓ Then check the mean of `X` and `X_scaled`

In [14]:
# YOUR CODE HERE

❓ Fit a `PCA` on `X_scaled` and assign it to `pca_scaled`

In [15]:
# YOUR CODE HERE

❓ Using `np.allclose`, check that the projection of `X_scaled` is equal to the dot product of `X_scaled` and the transposed components of `pca_scaled`

In [16]:
# YOUR CODE HERE