In [2]:
%matplotlib inline

In [3]:
import sympy
import math
import cmath
import numpy as np
import numpy.polynomial.polynomial as p
import matplotlib.pyplot as plt
from turtle import *
import re
from sympy.ntheory import discrete_log
from matplotlib.transforms import Affine2D
import skimage.io
import time

# Principal Component Analysis

Task:

4. Principal Component Analysis

Sometimes it is useful to project a higher-dimensional space onto a lower-dimensional one. This is especially helpful if we want a visual understanding of, say, a 15D space in 3D or even 2D. One algorithm which allows us to project multidimensional data into fewer dimensions while keeping the most important shapes and structures is called principal component analysis (PCA). You can explore this using the following checklist:

- What are eigenvalues and eigenvectors?
- What is the eigenbasis? What is the spectrum of a matrix?
- How do we compute the eigenvalues and eigenvectors of a matrix?
- What is projection?
- How does projection conserve some shapes? Think about an object casting a shadow
- How is the projection problem related to eigenvalues and eigenvectors?
- What is PCA?
- What are principal components? How many components are there (as a function of dimensions of the original space)?
- What is variance? What is explained variance?
- How do principal components relate to explained variance?
- How is PCA implemented? Implement and show
- Show some applications of PCA, e.g. reducing a 3D image to its first 2 principal components, plotting the 3D and 2D images
- Show a practical use of PCA, for example, trying to see features in a 15D space, projected in 3D.

## PCA Motivation

Here is the perspective: we are an experimenter. We are trying to understand some phenomenon by measuring various quantities (e.g. spectra, voltages, velocities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the data appears clouded, unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle in empirical science. Examples abound from complex systems such as neuroscience, photometry, meteorology and oceanography: the number of variables to measure can be unwieldy and at times even deceptive, because the underlying relationships can often be quite simple.

Take for example a simple toy problem from physics diagrammed in Figure 1. Pretend we are studying the motion of the physicist’s ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring. The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is “ideal,” it oscillates indefinitely along the x-axis about its equilibrium at a set frequency. This is a standard problem in physics in which the motion along the x direction is solved by an explicit function of time. In other words, the underlying dynamics can be expressed as a function of a single variable x.

However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure. Thus, we decide to measure the ball’s position in a three-dimensional space (since we live in a three-dimensional world). Specifically, we place three movie cameras around our system of interest. At 200 Hz each movie camera records an image indicating a two-dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what the real “x”, “y” and “z” axes are, so we choose three camera axes $\{\vec{a}, \vec{b}, \vec{c}\}$ at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90°!

Now, we record with the cameras for several minutes. The big question remains: how do we get from this data set to a simple equation of x? We know a priori that if we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world. We often do not know which measurements best reflect the dynamics of our system in question. Furthermore, we sometimes record more dimensions than we actually need! Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring. Noise contaminates our data set, only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face every day.


## What is PCA?

Principal Component Analysis, or PCA, is a dimensionality-reduction method: it transforms a large set of variables into a smaller one that still contains most of the information in the large set.

Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity. Smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster when there are no extraneous variables to process.

The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables.
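The transformation described above can be sketched with NumPy. This is a minimal illustration, not a definitive implementation: the synthetic data set and all variable names (`latent`, `X`, `scores`, and so on) are assumptions made for the example. It centres the data, takes the eigenvectors of the covariance matrix as the new axes, and shows that the resulting components are uncorrelated and ordered by variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 500 samples of 3 variables,
# two of which are strongly correlated through a shared latent factor.
latent = rng.normal(size=(500, 1))
X = np.hstack([
    latent,
    2.0 * latent + 0.1 * rng.normal(size=(500, 1)),
    0.5 * rng.normal(size=(500, 1)),
])

# 1. Centre each variable on zero.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the variables (variables are the columns).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh is suitable because cov is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. eigh returns eigenvalues in ascending order, so reverse to get
#    the direction of largest variance first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Express the data in the new basis: each column of `scores`
#    is one principal component.
scores = X_centered @ eigenvectors

# The covariance of the components is diagonal (they are uncorrelated),
# and the diagonal holds the eigenvalues in decreasing order.
pc_cov = np.cov(scores, rowvar=False)
print(np.round(pc_cov, 3))
```

Keeping only the first few columns of `scores` gives the reduced data set; the discarded columns are the ones that carry the least variance.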

For example, let's say that we have 15 dimensions but can only view a three-dimensional (3D) space. PCA addresses this by reducing the dimensions from 15 to 3. Another example is reducing a 3D image to its first 2 principal components and plotting the result as a 2D image.
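As a rough sketch of the 15-to-3 reduction, the snippet below builds a hypothetical 15-dimensional data set and projects it onto its first three principal directions. It uses the singular value decomposition (SVD) of the centred data instead of an explicit covariance matrix, which is a common alternative route to the same components. The data, the noise level and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data set: 200 samples with 15 variables that are really
# driven by only 3 hidden factors, plus a little measurement noise.
n_samples, n_features, n_hidden = 200, 15, 3
hidden = rng.normal(size=(n_samples, n_hidden))
mixing = rng.normal(size=(n_hidden, n_features))
data_15d = hidden @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

# Centre the data, then decompose it. The rows of `vt` are the principal
# directions, already sorted by decreasing singular value.
centered = data_15d - data_15d.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Project every sample onto the first three principal directions.
data_3d = centered @ vt[:3].T

# Fraction of the total variance carried by each component.
explained = singular_values**2 / np.sum(singular_values**2)

print(data_3d.shape)        # (200, 3)
print(explained[:3].sum())  # close to 1 for this synthetic data
```

Because the synthetic data really has only three underlying factors, almost all of the variance survives the projection. Real data sets rarely reduce this cleanly, so the explained fraction is worth checking before trusting a 3D plot.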

So to sum up, the idea of PCA is simple — reduce the number of variables of a data set, while preserving as much information as possible.


## What are eigenvalues and eigenvectors?

In linear algebra, an eigenvector or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. 

The corresponding eigenvalue, often denoted by λ, is the factor by which the eigenvector is scaled. In other words, for a square matrix A with eigenvector v, Av = λv.
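The definition can be checked numerically. The sketch below uses a small symmetric matrix chosen purely for illustration and `numpy.linalg.eig`, then verifies that applying the matrix to each eigenvector only rescales it by the matching eigenvalue.

```python
import numpy as np

# Example matrix (an arbitrary choice for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigenvalues[i] belongs to the eigenvector stored in column i.
eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    # Applying A to v must give the same result as scaling v by lam.
    print(lam, np.allclose(A @ v, lam * v))
```

For this matrix the eigenvalues are 1 and 3; the order in which NumPy returns them is not guaranteed, which is why the check pairs each eigenvalue with its own column.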

For example, suppose we want to build mathematical models (equations) where the input data is gathered from a large number of sources.

This introduces its own set of problems: a large sparse matrix can take up a significant amount of disk space, and training a model on the data becomes extremely time-consuming. Furthermore, it is difficult to understand and visualize data with more than 3 dimensions, let alone a dataset with over 100 dimensions. Hence, it would be ideal to somehow compress or transform this data into a smaller dataset.

There is a solution: we can use eigenvalues and eigenvectors to reduce the number of dimensions. One of the key ways to improve efficiency in computationally intensive tasks is to reduce the dimensions while ensuring that most of the key information is retained.

Eigenvalues and eigenvectors are the key tools to use in those scenarios.

1. What Is An Eigenvector?

For the sake of simplicity, let’s consider that we live in a two-dimensional world.

- Alex’s house is located at coordinates [10,10] (x=10 and y =10). Let’s refer to it as vector A.

- Furthermore, his friend Bob lives in a house with coordinates [20,20] (x=20 and y=20). Let’s refer to it as vector B.

If Alex wants to meet Bob at his place then Alex would have to travel +10 points on the x-axis and +10 points on the y-axis. This movement and direction can be represented as a two-dimensional vector [10,10]. Let’s refer to it as vector C.

We can see that vectors A and B are related, because vector B can be obtained by scaling (multiplying) vector A by 2: 2 × [10,10] = [20,20], which is Bob’s address. Vector C, in turn, represents the movement needed to get from A to B.

*The key to note is that a vector can contain the magnitude and direction of a movement.*
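The house example can be written out with NumPy arrays. The names `A`, `B` and `C` follow the text; splitting `C` into a length and a unit vector is an extra step added here to illustrate the "magnitude and direction" remark.

```python
import numpy as np

A = np.array([10, 10])  # Alex's house
B = np.array([20, 20])  # Bob's house

# The movement from A to B is the difference of the two positions.
C = B - A
print(C)  # [10 10]

# B is A scaled by a factor of 2.
print(np.array_equal(2 * A, B))  # True

# A vector carries a magnitude (its length) and a direction (a unit vector).
magnitude = np.linalg.norm(C)
direction = C / magnitude
print(magnitude)  # about 14.14
print(direction)
```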



## What is the eigenbasis? What is the spectrum of a matrix?


## How do we compute the eigenvalues and eigenvectors of a matrix?


## What is projection?

## How does projection conserve some shapes? Think about an object casting a shadow.


## How is the projection problem related to eigenvalues and eigenvectors?


## What are principal components? How many components are there (as a function of dimensions of the original space)?


## Glossary

- PCA - Principal Component Analysis
- 3D - Three Dimensional

## References:

<https://www.cs.cmu.edu/~tom/10701_sp11/slides/pca_schlens.pdf>

<https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf>

<https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c>

<https://towardsdatascience.com/the-mathematics-behind-principal-component-analysis-fff2d7f4b643>

<https://en.wikipedia.org/wiki/Principal_component_analysis>

<https://builtin.com/data-science/step-step-explanation-principal-component-analysis>

<https://heartbeat.fritz.ai/understanding-the-mathematics-behind-principal-component-analysis-efd7c9ff0bb3>

<https://medium.com/analytics-vidhya/mathematics-behind-principal-component-analysis-pca-1cdff0a808a9>

<https://www.nature.com/articles/nmeth.4346>

<https://en.wikipedia.org/wiki/Norm_(mathematics)>

<https://en.wikipedia.org/wiki/Euclidean_distance>

---

<https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors>

<https://medium.com/fintechexplained/what-are-eigenvalues-and-eigenvectors-a-must-know-concept-for-machine-learning-80d0fd330e47>

---