# Applying PCA

## Reconstruction from Compressed Representation 
- The following shows how to use linear algebra to map the compressed data back to an approximation of its original form
    - We compute $x_{approx} = U_{reduce} \cdot z$ to get the approximation matrix $X_{approx}$
- <img src="../images/PCA-8.png" alt="Drawing" style="width: 500px;"> 
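The projection and reconstruction steps can be sketched in NumPy as follows. This is a minimal sketch, not the notes' own implementation: the function names `pca_fit`, `project`, and `reconstruct` are hypothetical, the examples are assumed to be stored as rows of `X`, and the data is assumed to be mean-normalized before projection (so the mean is added back after reconstruction).

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature means and the first k principal directions of X (m x n)."""
    mu = X.mean(axis=0)
    Xc = X - mu                          # mean-normalize (assumption)
    Sigma = (Xc.T @ Xc) / X.shape[0]     # n x n covariance matrix
    U, S, _ = np.linalg.svd(Sigma)       # columns of U are the principal directions
    return mu, U[:, :k]                  # U_reduce is n x k

def project(X, mu, U_reduce):
    """Compress: z = U_reduce^T x, applied to every row of X."""
    return (X - mu) @ U_reduce           # Z is m x k

def reconstruct(Z, mu, U_reduce):
    """Decompress: x_approx = U_reduce z, applied to every row of Z."""
    return Z @ U_reduce.T + mu           # X_approx is m x n
```

With examples stored as rows, $x_{approx} = U_{reduce} \cdot z$ becomes `Z @ U_reduce.T`. The reconstruction is exact only when the discarded directions carry no variance; otherwise it is the closest point on the $k$-dimensional subspace.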

## Choosing the Number of Principal Components 

PCA minimizes the average squared projection error. To choose $k$, we compare that error with the total variation in the data:
- Average squared projection error:
    - $\frac{1}{m} \sum_{i=1}^m ||x^{(i)} - x_{approx}^{(i)}||^2$
- Total variation in the data:
    - $\frac{1}{m} \sum_{i=1}^m ||x^{(i)}||^2$
    - On average, how far the training examples are from the origin
- Typically, choose $k$ to be the smallest value so that:
    - $\frac{\frac{1}{m} \sum_{i=1}^m ||x^{(i)} - x_{approx}^{(i)}||^2}{\frac{1}{m} \sum_{i=1}^m ||x^{(i)}||^2} \le 0.01$
- A ratio of at most 0.01 means that 99% of the variance is retained. We want the average squared projection error to be really small compared to the total variation!

<img src="../images/PCA-10.png" alt="Drawing" style="width: 300px;">
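The ratio of average squared projection error to total variation can be computed directly from the data and its reconstruction. The sketch below is illustrative only: the function name `projection_error_ratio` is hypothetical, and `X` is assumed to be already mean-normalized with one example per row.

```python
import numpy as np

def projection_error_ratio(X, X_approx):
    """Average squared projection error divided by the total variation.

    Assumes X is mean-normalized and that X and X_approx are both m x n.
    """
    error = np.mean(np.sum((X - X_approx) ** 2, axis=1))   # (1/m) * sum ||x - x_approx||^2
    total = np.mean(np.sum(X ** 2, axis=1))                # (1/m) * sum ||x||^2
    return error / total
```

A value of at most 0.01 corresponds to retaining 99% of the variance. A perfect reconstruction gives 0, and reconstructing every example as the origin gives 1.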

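There are two methods to choose $k$ (both are shown in the image below)
- The first method runs PCA with $k=1$, checks the ratio above, and keeps increasing $k$ until we reach our desired goal. This method is simple but not computationally efficient, because the projection and reconstruction are recomputed for every $k$!
- The second method uses the matrix $S$ returned by the SVD, which only has to be computed once
    - $S$ is a diagonal matrix whose diagonal entries $S_{ii}$ are the eigenvalues of the covariance matrix
    - For a given $k$ (say 3), we divide the sum of the first $k$ diagonal entries by the sum of all of them, and pick the smallest $k$ with $\frac{\sum_{i=1}^k S_{ii}}{\sum_{i=1}^n S_{ii}} \ge 0.99$
    - This gives the same answer as the first method, because that fraction is exactly the variance retained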
- <img src="../images/PCA-11.png" alt="Drawing" style="width: 400px;">
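The SVD-based way of choosing $k$ can be sketched as below. This is an assumption-laden example rather than the course's code: the function name `choose_k` and the `retained` parameter are hypothetical, and `np.linalg.svd` returns the diagonal of $S$ as a 1-D array sorted in descending order.

```python
import numpy as np

def choose_k(X, retained=0.99):
    """Return the smallest k that keeps `retained` of the variance, plus the
    variance retained for every k = 1..n."""
    Xc = X - X.mean(axis=0)                    # mean-normalize (assumption)
    Sigma = (Xc.T @ Xc) / X.shape[0]           # n x n covariance matrix
    _, S, _ = np.linalg.svd(Sigma)             # diagonal of S, largest first
    cumulative = np.cumsum(S) / np.sum(S)      # entry k-1 is the variance retained with k components
    # First position where the retained variance reaches the target.
    k = int(np.searchsorted(cumulative, retained)) + 1
    return min(k, S.size), cumulative          # clamp in case rounding keeps the last entry below the target
```

The SVD runs once, and every candidate $k$ is then checked with a cumulative sum, which is why this approach is cheaper than rerunning the projection for each $k$.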

## Advice for Applying PCA
- **Supervised learning speedup**
    - If you're doing image processing with 100 by 100 images, each example really has 10,000 pixel features, and training on that many features can take a lot of time
    - <img src="../images/PCA-11.png" alt="Drawing" style="width: 400px;">
    - Thus, using PCA to map the data into a lower dimension while keeping most of the variance intact reduces the number of features, which speeds up training
    - ONLY FIT THE PCA MAPPING ON THE TRAINING SET AND NOT THE CV SET! The same mapping ($U_{reduce}$ and any mean/scaling parameters) is then applied to the CV and test examples
- **Application of PCA**
    - Compression
        - Reduce memory/disk needed to store data
        - Speed up the algorithm
    - Visualization
        - Works with k=2 or k=3
- **Bad USE of PCA**
    - Using PCA to prevent overfitting is a bad use! Having fewer features may seem to help, but PCA does not look at the labels $y$ when deciding what to throw away, so it can discard valuable predictive information. This is not the worst method, but regularization is better.
    
<img src="../images/PCA-12.png" alt="Drawing" style="width: 400px;"> 
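The advice about the training set from the supervised learning speedup item can be sketched as follows: the mapping is learned from the training examples only and then reused unchanged on the CV examples. The helper names `fit_pca` and `apply_pca`, the random data, and the choice of `k=10` are all hypothetical and only serve the illustration.

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the mapping (mean and U_reduce) from the training set only."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    Sigma = (Xc.T @ Xc) / X_train.shape[0]
    U, _, _ = np.linalg.svd(Sigma)
    return mu, U[:, :k]

def apply_pca(X, mu, U_reduce):
    """Apply an already-fitted mapping to any set of examples."""
    return (X - mu) @ U_reduce

# Made-up data standing in for real training and CV sets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))
X_cv = rng.normal(size=(40, 50))

mu, U_reduce = fit_pca(X_train, k=10)       # fitted on the training set only
Z_train = apply_pca(X_train, mu, U_reduce)  # features used to train the model
Z_cv = apply_pca(X_cv, mu, U_reduce)        # same mapping reused for the CV set
```

Fitting on the training set alone keeps information from the CV set out of the mapping, so the CV error remains an honest estimate.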

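Another suggestion is to not use PCA right away. You should first train the model on the original RAW data, and only bring in PCA if that does not do what you want (for example, training is too slow or the memory/disk requirement is too large).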
