# Principal Component Analysis Notes

### Given data of any shape whatsoever, PCA finds a new coordinate system and tells you how important each axis is

- by translation and rotation only
- moves the center of the coordinate system to the center of the data
- aligns the x-axis with the direction of greatest variation
- places the y-axis orthogonal to the x-axis
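
The translation and rotation steps above can be sketched with plain NumPy. This is a minimal illustration, not how sklearn implements PCA internally; the synthetic data and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data: correlated, and deliberately not centered at the origin.
x = rng.normal(loc=5.0, scale=2.0, size=200)
y = 0.5 * x + rng.normal(loc=3.0, scale=0.5, size=200)
data = np.column_stack([x, y])

# Translation: move the origin to the center of the data.
centered = data - data.mean(axis=0)

# Rotation: the eigenvectors of the covariance matrix give the new axes.
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; put the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Coordinates of every point in the new (x', y') system.
rotated = centered @ eigenvectors

print("variance along each new axis:", rotated.var(axis=0, ddof=1))
print("dot product of the new axes:", eigenvectors[:, 0] @ eigenvectors[:, 1])
```

The variance along each new axis equals the corresponding eigenvalue, which is the "importance" PCA reports for that axis.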

### Result: reduces dimensionality

#### Measurable vs. Latent features

- Measurable: vars you can measure

- Latent: vars you can't measure directly but that drive the measurable features

- Variance (PCA/stats): spread of feature distribution

##### Why compress data onto the direction of largest variance (longer line, x-prime) when y-prime is shorter?
- Because it retains the maximum amount of information
- The distance between a point and its projection onto the principal component determines the information lost
- So points farther from the principal component line lose more information
- Projecting onto the direction of largest variance minimizes the total information loss
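
To make the information-loss argument concrete, here is a small sketch that measures the mean squared distance between each point and its projection, once for the direction of largest variance and once for the direction of smallest variance. The data and the helper `information_lost` are hypothetical, introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical elongated 2-D cloud.
x = rng.normal(size=300)
data = np.column_stack([x, 0.3 * x + 0.2 * rng.normal(size=300)])
centered = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
minor_axis = eigenvectors[:, 0]  # smallest variance (eigh sorts ascending)
major_axis = eigenvectors[:, 1]  # largest variance


def information_lost(points, direction):
    """Mean squared distance between each point and its projection onto `direction`."""
    projected = np.outer(points @ direction, direction)
    return np.mean(np.sum((points - projected) ** 2, axis=1))


loss_major = information_lost(centered, major_axis)
loss_minor = information_lost(centered, minor_axis)
print(f"loss when keeping the major axis: {loss_major:.4f}")
print(f"loss when keeping the minor axis: {loss_minor:.4f}")
```

Keeping the major axis discards only the small spread along the minor axis, so its loss is the smaller of the two.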

#### Problem: not scalable
- with many features, examining each feature individually does not scale

#### Solution: combine features into new composite features and rank them by how much variance they explain
- creates multiple principal components, e.g. 1. neighborhood, 2. house size

##### Issue: components can be hard to interpret, but there are takeaways


#### The max number of possible principal components is the number of features in the dataset (in sklearn, min(n_samples, n_features))
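
A quick way to check this limit, assuming a hypothetical random dataset with 50 samples and 4 features:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 4))  # hypothetical: 50 samples, 4 features

pca = PCA()  # n_components=None keeps every available component
pca.fit(data)

print(pca.components_.shape)                # one row per component
print(pca.explained_variance_ratio_.sum())  # all components together explain everything

# Asking for more components than features is rejected by sklearn.
try:
    PCA(n_components=5).fit(data)
except ValueError as err:
    print("asking for 5 components fails:", err)
```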

## In scikit-learn

In [3]:
from sklearn.decomposition import PCA
import pandas as pd

You create and fit a PCA object much like any other sklearn estimator:

In [2]:
def doPCA(data, n_components=2):
    # data: array-like of shape (n_samples, n_features)
    from sklearn.decomposition import PCA
    pca = PCA(n_components=n_components)
    pca.fit(data)
    return pca


In [5]:
pca = doPCA(data)  # data must already be defined, shape (n_samples, n_features)
print(pca.explained_variance_ratio_)
# explained_variance_ratio_ is the fraction of the total variance captured by each principal component
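
Since `data` is not defined in these notes, here is a self-contained sketch with made-up data, where the second feature is almost a scaled copy of the first. Under that assumption nearly all of the variance should land in the first principal component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Hypothetical data: the second feature is mostly 2 * the first, plus a little noise.
x = rng.normal(size=500)
data = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=500)])

pca = PCA(n_components=2).fit(data)
ratios = pca.explained_variance_ratio_
print(ratios)  # first entry close to 1, second close to 0
```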

You can grab the principal components like so:

In [None]:
first_pc = pca.components_[0]
second_pc = pca.components_[1]
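
As a sanity check on what `components_` holds, the sketch below (synthetic data, names chosen for illustration) shows that each row is a unit-length direction, that the rows are orthogonal, and that with the default settings `transform` amounts to centering the data and projecting onto those rows.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Hypothetical 3-feature data with correlated columns.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
data = rng.normal(size=(100, 3)) @ mixing

pca = PCA(n_components=2).fit(data)
first_pc, second_pc = pca.components_  # each row is one principal direction

print(np.linalg.norm(first_pc), np.linalg.norm(second_pc))  # both unit length
print(first_pc @ second_pc)                                 # orthogonal, so about 0

# With default settings (no whitening), transform = center, then project.
manual = (data - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(data)))
```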

To get useful information, here is one possible plotting method, which also shows how to get your original data transformed by the PCA:

In [None]:
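import matplotlib.pyplot as plt

transformed_data = pca.transform(data)
for i, j in zip(transformed_data, data):
    # i: one point in PC coordinates; j: the same point in the original features
    plt.scatter(first_pc[0] * i[0], first_pc[1] * i[0], color='r')    # projection onto PC1
    plt.scatter(second_pc[0] * i[1], second_pc[1] * i[1], color='c')  # projection onto PC2
    plt.scatter(j[0], j[1], color='b')                                # original point
plt.show()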

## When to use PCA

- Latent features are driving patterns in the data
    - ex.: big shots at Enron
    
- Dimensionality reduction
    - can help visualize high dimension data
        - project data down to the first 2 PCs, and plot
        - then plotting algo performance is easier
    - reduces noise
    - pre-processing before using other algorithm (regression, classification)
        - often works better because there are fewer inputs
        - ex: eigenfaces
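
The pre-processing use case can be sketched as a pipeline. This example uses sklearn's bundled digits dataset as a small stand-in for face images; the choice of 20 components and an `SVC` classifier is an assumption for illustration, not a recommendation.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 64 pixel features per image
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# PCA is fit on the training data only, then the classifier sees 20 inputs instead of 64.
model = make_pipeline(PCA(n_components=20), SVC())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy with 20 of 64 dimensions: {accuracy:.3f}")
```

Wrapping both steps in one pipeline keeps the PCA fit from seeing the test data.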

## Example: PCA for Facial Recognition

#### What makes PCA good for facial recognition?
- pictures of faces have high input dimensionality (many pixels)
- faces have general patterns that can be captured in smaller dimensions

#### How to select a good number of PCs to use?
- train on different numbers of PCs and compare F1 scores
    - will show the point of diminishing returns
- don't do feature selection before PCA: PCA can "salvage" information from features you might otherwise discard, because it combines them
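
A minimal sketch of that comparison, again using the bundled digits dataset in place of face images. The candidate component counts, the `SVC` classifier, and macro-averaged F1 are all assumptions chosen for the example.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for n in [2, 5, 10, 20, 40]:
    # Same classifier each time; only the number of PCs changes.
    model = make_pipeline(PCA(n_components=n), SVC())
    model.fit(X_train, y_train)
    scores[n] = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{n:>2} PCs -> macro F1 = {scores[n]:.3f}")
```

Reading the printed scores from top to bottom shows where adding more PCs stops paying off.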
