# Principal Component Analysis

The goal of PCA is to get as much information out of as few variables as possible.
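The SVD (which, for a positive semidefinite matrix such as a correlation matrix, coincides with its eigendecomposition) gives us orthogonal **eigenvectors**, and that orthogonality is desirable.
Because the eigenvectors are orthogonal, the components built from them are uncorrelated with one another, and therefore do not represent "redundant" information.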

In PCA, we'll use the correlation matrix.
Note that PCA, factor analysis, and related methods only need the correlation matrix, not the raw data, to find the components.
A given correlation matrix always has the same set of eigenvalues and eigenvectors (each eigenvector is only determined up to its sign).

The eigenvectors define the components, and the eigenvalues are the "importances" of the components: each eigenvalue is the variance of its component.
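As a minimal sketch of this step, assuming NumPy and some made-up data, the eigenvalues and eigenvectors of a correlation matrix can be obtained with `np.linalg.eigh` (meant for symmetric matrices) and re-sorted so the most important component comes first. The helper name `pca_from_correlation` is hypothetical.

```python
import numpy as np

def pca_from_correlation(R):
    """Eigen-decompose a correlation matrix R for PCA (hypothetical helper).

    Returns eigenvalues in decreasing order and the matching unit-length
    eigenvectors as the columns of W (column j holds the weights of PC_(j+1)).
    """
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # re-sort so the largest eigenvalue comes first
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: 200 individuals, 4 variables; the first two are made correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += X[:, 0]
R = np.corrcoef(X, rowvar=False)  # p x p correlation matrix; columns are variables

eigvals, W = pca_from_correlation(R)
print(eigvals)        # the "importances", largest first
print(eigvals.sum())  # equals p = 4 up to rounding, since it is the trace of R
```

Only `R` is used to find the components here; the raw `X` is needed only to build `R` in this toy example.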

The output from a principal component analysis is a set of $r$ components $PC$, where each $PC$ is a weighted sum of the original variables.

Suppose we originally have $p$ variables as columns of a matrix $X$.
Then variables are $X_1, X_2, \ldots, X_p$.
So the principal components are given by:  
$PC_{(1)} = w_{(1)1} X_1 + w_{(1)2} X_2 + \ldots + w_{(1)p} X_p$  
$PC_{(2)} = w_{(2)1} X_1 + w_{(2)2} X_2 + \ldots + w_{(2)p} X_p$  
$\ldots$  
$PC_{(r)} = w_{(r)1} X_1 + w_{(r)2} X_2 + \ldots + w_{(r)p} X_p$  

More generally, the component $PC_{(j)} = \sum\limits_{i=1}^p w_{(j)i} X_i$, where the matrix $w$ is the collection of weights.

Note that the weights $w_{(j)1}, \ldots, w_{(j)p}$ are the individual elements of the eigenvector corresponding to component $j$.
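To make the weighted sum concrete, here is a small NumPy sketch on hypothetical data: it computes the first component once term by term, exactly as $w_{(1)1} X_1 + \ldots + w_{(1)p} X_p$, and once as a matrix-vector product. It assumes the columns of `X` are standardized first, which is what makes PCA on the data match PCA on the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 2] += 0.5 * X[:, 0]  # hypothetical data with some correlation

# Standardize each column (mean 0, SD 1) so the components match the correlation-matrix PCA
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
w1 = eigvecs[:, np.argmax(eigvals)]  # weights w_(1)1, ..., w_(1)p of the first component

# PC_(1) = w_(1)1 X_1 + w_(1)2 X_2 + ... + w_(1)p X_p, written out term by term
pc1_sum = sum(w1[i] * Z[:, i] for i in range(Z.shape[1]))
# The same weighted sum as a single matrix-vector product
pc1 = Z @ w1
```

The variance of `pc1` (computed with the same divisor as `X.std`) equals the largest eigenvalue, which is the sense in which the eigenvalue measures the component's importance.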

In PCA, we don't worry about whether variance is shared among variables or unique to one of them; we just explain the total variance.

Regardless of how many components we end up using, the components themselves do not change.
Thus, there's no harm in looking at all components before choosing them.

### Interpreting Principal Components

When we obtain our principal components, we can look at the weights to interpret what it means when an individual has a high value on a given component.

For example, suppose we have 6 columns, and each column represents a score on a test.
Then if we have a component where all weights are positive (e.g. weight vector is $.40, .41, .31, .44, .45, .41$), we could interpret the component as indicating an individual with high scores on all tests.
That is, an individual with a high value for that component tends to have high scores across all of the tests.
If a few weights were positive and a few were negative, then an individual with a high value on the component tends to have high scores on the positively-weighted tests and poor scores on the negatively-weighted tests.

Note that we generally assume the values are standardized, so each X-variable has a mean of 0 and a standard deviation of 1.
Then what we'd expect above is that an individual with a high value on a mixed-sign component would, on the variables with negative weight, have large negative values; since the variables are standardized, a negative value means a below-average test score.
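A toy illustration, assuming standardized test scores: the all-positive weight vector is the one from the example above, while the mixed-sign weights and both individuals' z-scores are hypothetical. A component value is just the dot product of the weights with the individual's z-scores.

```python
import numpy as np

# Weights for 6 tests. The first vector is the all-positive example from the text;
# the mixed-sign vector is hypothetical.
w_general = np.array([.40, .41, .31, .44, .45, .41])
w_contrast = np.array([.40, .40, .40, -.40, -.40, -.40])

# Standardized (z-score) results for two hypothetical individuals;
# negative values mean below-average scores on that test.
strong_first_half = np.array([1.5, 1.2, 1.0, -1.0, -1.3, -0.8])
strong_everywhere = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

for name, z in [("strong_first_half", strong_first_half),
                ("strong_everywhere", strong_everywhere)]:
    # Component value = weighted sum of the individual's z-scores
    print(name, round(float(z @ w_general), 3), round(float(z @ w_contrast), 3))
```

The first individual scores high on the mixed-sign component and near zero on the all-positive one; the second individual is the reverse.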

Now, consider a component where some variables have weights near zero.
Then our interpretation for those variables is that they have little effect on the component value.

When interpreting, always keep in mind that all components are mutually orthogonal, so their scores are uncorrelated.
There's no way to argue that two components are correlated or that one tracks another.
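A quick numerical check of this, as a NumPy sketch on hypothetical data: computing the covariance matrix of all component scores gives a diagonal matrix, with zero covariance between different components and the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
X[:, 1] += X[:, 0]          # hypothetical correlated columns
X[:, 4] -= 0.7 * X[:, 2]

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize columns
R = np.corrcoef(X, rowvar=False)
eigvals, W = np.linalg.eigh(R)

scores = Z @ W                                # one column per component
C = np.cov(scores, rowvar=False, bias=True)   # population covariance, matching X.std
# Off-diagonal entries are ~0: the component scores are mutually uncorrelated.
# Diagonal entries are the eigenvalues: each component's variance.
print(np.round(C, 6))
```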

Also, note that the eigenvalue is how we interpret the importance of the given component.
When deciding which components are worth including, we look at the eigenvalue, not the weights.

### Choosing Principal Components

There are a few approaches to choosing which components to use.
1. Make a scree plot, which graphs the eigenvalue on the y-axis against the component number on the x-axis. 
    We then look for a "knee" in the graph, i.e. where the slope goes from large in magnitude to small in magnitude.  
    Numerically, this roughly corresponds to a point with a large positive second difference (a discrete second derivative) in the eigenvalues.  
    An angle threshold between successive segments of the plot might be another way to locate it.
2. Only keep components whose eigenvalues are $> 1$, i.e. components that account for more variance than a single original (standardized) variable.
    This rule is used when analyzing the correlation matrix, where every variable has a variance of exactly 1.
3. Set a variance-accounted-for threshold. In other words, take only as many components as needed to account for, say, 80% of the variance.
    Because the eigenvalues are the variances of the components, we can use them to determine how much of the overall variance has been accounted for by the components.
    Given $p$ standardized variables, the eigenvalues will sum to $p$ (the trace of the correlation matrix).
    To take components up to a threshold $T$, we sort the eigenvalues as $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p$ and keep the smallest number of components $k$ such that $\sum\limits_{j=1}^k \frac{\lambda_j}{p} \ge T$.  
    Note that this works only because the components are mutually orthogonal.
    Then the variance on each is completely separated from the variance in other components, so the variances simply add up.
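The three rules above can be sketched in NumPy as follows. The function names are hypothetical, the eigenvalues are made up for $p = 6$ standardized variables, and turning the scree "knee" into the largest second difference is only one possible heuristic, not a standard definition.

```python
import numpy as np

def kaiser_count(eigvals):
    """Rule 2: number of components with eigenvalue > 1 (correlation-matrix PCA)."""
    return int(np.sum(np.asarray(eigvals) > 1))

def variance_threshold_count(eigvals, T):
    """Rule 3: smallest k such that the k largest eigenvalues explain at least
    a fraction T of the total variance (the total is p for a correlation matrix)."""
    lam = np.sort(np.asarray(eigvals))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    reached = np.nonzero(cumulative >= T)[0]
    return int(reached[0]) + 1 if reached.size else len(lam)

def scree_knee(eigvals):
    """Rule 1 heuristic (needs p >= 3): 1-based component number with the largest
    second difference lam[i-1] - 2*lam[i] + lam[i+1] of the sorted eigenvalues."""
    lam = np.sort(np.asarray(eigvals))[::-1]
    second_diff = lam[:-2] - 2 * lam[1:-1] + lam[2:]
    return int(np.argmax(second_diff)) + 2   # second_diff[0] is centred on component 2

# Hypothetical eigenvalues for p = 6 standardized variables (they sum to 6)
eigvals = [3.0, 1.5, 0.8, 0.4, 0.2, 0.1]
print(kaiser_count(eigvals), variance_threshold_count(eigvals, 0.80), scree_knee(eigvals))
# → 2 3 2
```

Note that the rules need not agree: here the eigenvalue-above-1 rule keeps 2 components, while an 80% threshold needs 3.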


#### Variable Loading

The "loading" of a variable on a component is the correlation between the variable and that component.
So, given a component $PC_{(i)}$ with eigenvalue $\lambda_i$ and a variable $X_j$, the loading of variable $X_j$ on component $i$ is $r(PC_{(i)}, X_j) = w_{(i)j} \cdot \sqrt{\lambda_i}$ (assuming standardized variables and unit-length eigenvectors).  
These are sometimes preferable when interpreting, since we could have a low-eigenvalue component which has a high weight on a variable but a low overall correlation with it.
Then looking at the raw weight might tempt us to interpret the component as being about that variable, when really it only captures a small amount of leftover variance.
So by scaling the weights by the square root of the eigenvalue to get correlations, we see the strength of the relation between the variables and the component, not just how to compute the component value from the variable values.

We can also square the loadings and interpret them as the proportion of a variable's variance that is shared with the component.
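A NumPy sketch on hypothetical data that checks the loading formula: `W * np.sqrt(eigvals)` (rows are variables, columns are components) should match the directly computed correlation between each standardized variable and each component score. It also shows that each variable's squared loadings sum to 1 across all components, since each standardized variable has unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 4))
X[:, 3] += X[:, 0] - X[:, 1]   # hypothetical correlated data

Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(X, rowvar=False)
eigvals, W = np.linalg.eigh(R)
eigvals, W = eigvals[::-1], W[:, ::-1]   # largest eigenvalue first

# Loading of X_j on PC_(i) is w_(i)j * sqrt(lambda_i); rows = variables, columns = components
loadings = W * np.sqrt(eigvals)

# Compare with the direct correlation between each variable and each component score
scores = Z @ W
direct = np.array([[np.corrcoef(Z[:, j], scores[:, i])[0, 1] for i in range(4)]
                   for j in range(4)])

# Squared loadings: share of each variable's (unit) variance shared with each component
shared = loadings ** 2
print(np.round(shared, 3))
```

Summing `shared` down a column instead gives back that component's eigenvalue.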

Note that SPSS reports the loadings (in its component matrix) rather than the raw eigenvector weights.

### Finding Component Scores

We noted previously that we can calculate our SVD (and thus the components) from the correlation matrix, rather than from the whole original matrix.
However, if we want to find the component scores for individuals in the original matrix, we now need the original matrix.
There's no way to get individual component scores based on the correlations; we need to know what the original individual scores were.
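A minimal sketch of computing component scores, assuming NumPy: the weights come from the correlation matrix, but the scores need the raw data, standardized here with the sample's own means and standard deviations (new individuals would presumably be standardized with those same values). The helper `component_scores` is hypothetical, and since each eigenvector is only determined up to sign, the sign of each score column is arbitrary.

```python
import numpy as np

def component_scores(X, k):
    """Scores of each individual (row of X) on the first k components.

    Needs the raw data X: the correlation matrix gives the weights,
    but not where each individual sits on the original variables.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each variable
    R = np.corrcoef(X, rowvar=False)
    eigvals, W = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return Z @ W[:, order]                       # n x k matrix of component scores

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 6))   # hypothetical raw data: 50 individuals, 6 variables
scores = component_scores(X, 2)
print(scores.shape)   # (50, 2)
```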

### Uses for Components/Factorizations

- If we have many variables, and many are correlated/collinear, we can use PCA to reduce the number of variables to use in another analysis.  
    For example, we may want to 