# Principal Component Analysis

In this exercise sheet we look into how to compute and apply a Principal Component Analysis (PCA).

In [None]:
import numpy as np
import mllab.pca

### Task 3.1 - PCA Implementation

Implement the `pca()` function below using either a singular value decomposition or an eigenvector decomposition.

In [None]:
def pca(x, q):
    """
    Compute principal components and the coordinates.
    
    Parameters
    ----------
    
    x: (n, d) NumPy array
    q: int
       The number of principal components to compute.
       Has to be less than `d`.

    Returns
    -------
    
    Vq: (d, q) NumPy array, orthonormal vectors (column-wise)
    xq: (n, q) NumPy array, coordinates for x (row-wise)
    """
    # Your code goes here
    return Vq, xq
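
As a point of comparison for the SVD route, the following is a minimal sketch and not the reference solution. The name `pca_svd_sketch` and the demo data are hypothetical. The sketch assumes that the data should be centered before the decomposition, which the task text does not state explicitly.

```python
import numpy as np


def pca_svd_sketch(x, q):
    """Hypothetical sketch: PCA via an SVD of the centered data."""
    x = np.asarray(x, dtype=float)
    # Assumption: center each feature so the SVD describes variance around the mean
    x_centered = x - x.mean(axis=0)
    # Thin SVD: x_centered = U @ diag(s) @ Vt, the rows of Vt are the principal directions
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    vq = vt[:q].T          # (d, q), orthonormal columns
    xq = x_centered @ vq   # (n, q), coordinates of each sample
    return vq, xq


# Demo on random data (illustrative only)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 4))
vq_demo, xq_demo = pca_svd_sketch(x_demo, 2)
print(vq_demo.shape, xq_demo.shape)  # (4, 2) (100, 2)
```

Because NumPy returns the singular values in descending order, the first `q` rows of `Vt` are the directions of largest variance.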

### Task 3.2 - Toy 4D Example

We start by loading our toy example. The data is stored as a NumPy array of shape $2585\times 5$. The first four components of each row are the coordinates in 4D, and the last component is the label. Each label is an integer from $\{0, 1, 2, 3, 4\}$.

The data contains a noisy 2D plane which is embedded into 4D. We would like to represent the data in its _intrinsic_ 2D form.

In [None]:
import numpy as np
import mllab.pca

pca_toy_4d = np.load("data/pca_toy_4d.npy")
x = pca_toy_4d[:, :-1]  # 4D coordinates
y = pca_toy_4d[:, -1]  # labels

**Let us plot slices from this 4D data.**

We provide a helper function for this:

In [None]:
# Show documentation
mllab.pca.plot_toy_slice?

In [None]:
# Your code goes here

We want to remove the noise and recover the 2D information.

**PCA transformation for $q=2$, and plot.**

Now we can compute the two-dimensional representation of `x` using PCA.

In [None]:
# Compute transformation

And then plot the coordinates, which are two dimensional. We provide a helper function for this task. Let us check how to use it:

In [None]:
mllab.pca.plot_toy_2d?

In [None]:
# Plot

Hopefully you appreciate the result.

**Check non-linear transformation.**

Let us see how PCA handles a non-linear transformation. To test this, we map our data into 3D by keeping the y-axis as the new z-axis and bending the x-coordinate onto an ellipse.

In [None]:
mllab.pca.map_on_ellipse?

In [None]:
# Map on ellipse in 3D using mllab.pca.map_on_ellipse

In [None]:
mllab.pca.plot_toy_3d?

In [None]:
%matplotlib notebook
# Plot 3D using mllab.pca.plot_toy_3d

**(Remember to stop the interactive plot by pressing the shutdown icon!)**

Now apply PCA to our transformed data and plot the result as before.

In [None]:
%matplotlib inline
# Transform back to 2D
# Plot with mllab.pca.plot_toy_2d

Could be worse, but undeniably discomforting. Try different axis lengths and gap sizes of the ellipse. What do you observe?
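
The effect can be reproduced without `mllab` on a small example. The sketch below is only an illustration under simplified assumptions: it uses a half circle in 2D instead of the ellipse mapping from the task, and the helper `first_component_ratio` is a hypothetical name. It compares how much variance a single principal component captures for a straight segment and for the same parameter bent onto an arc.

```python
import numpy as np


def first_component_ratio(points):
    """Fraction of the total variance captured by the first principal component."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


t = np.linspace(0.0, np.pi, 200)
line = np.column_stack([t, np.zeros_like(t)])   # intrinsically 1D, flat segment
arc = np.column_stack([np.cos(t), np.sin(t)])   # intrinsically 1D, bent onto a half circle

ratio_line = first_component_ratio(line)
ratio_arc = first_component_ratio(arc)
print(ratio_line, ratio_arc)
```

Both datasets are parametrized by a single number, but only the straight one is fully described by one linear component. PCA is a linear method and cannot unroll the curve.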

### Task 3.3 - PCA on Iris

First compute the singular values of the Iris dataset, then check what percentage of the variance the first two principal components capture.

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()
iris_x = iris['data']
iris_y = iris['target']

In [None]:
# Compute singular values
# Plot captured variance for q=1,2
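
The relation between singular values and captured variance can be sketched as follows. This is a hedged illustration on synthetic data, not the solution for Iris. The function name `captured_variance` and the demo array are hypothetical, and the sketch assumes the data is centered before the decomposition. The fraction of variance captured by the first $q$ components is the sum of the first $q$ squared singular values divided by the sum of all squared singular values.

```python
import numpy as np


def captured_variance(x):
    """Return the singular values of the centered data and the cumulative variance ratios."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    variances = s ** 2   # proportional to the variance along each component
    return s, np.cumsum(variances) / variances.sum()


# Synthetic data with very different scales per feature (illustrative only)
rng = np.random.default_rng(0)
x_scaled_demo = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 0.5, 0.1])
s_demo, ratios_demo = captured_variance(x_scaled_demo)
print(ratios_demo[:2])   # captured variance for q=1 and q=2
```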

**Plot 2D PCA.**

Now apply PCA and compute the first two principal components. Plot the projected 2D data in a scatter plot such that the three labels are recognizable. What do you observe?

In [None]:
# Scatter plot 2d

In [None]:
import matplotlib.pyplot as plt


def plot_1d_iris(a, b, c):
    """Show a 1D plot of three 1D datasets a, b and c.

    Plotted from top to bottom in the order a, b, c."""
    left = min(data.min() for data in (a, b, c))
    right = max(data.max() for data in (a, b, c))
    # Use distinct loop names so the parameter `c` is not shadowed
    for i, (data, color) in enumerate(((a, 'red'), (b, 'blue'), (c, 'green'))):
        plt.hlines(i * .3, left, right, linestyles='dotted', colors=[(.8,.8,.8,1)])
        plt.eventplot(data, colors=color, linewidths=.5, linelengths=.25, lineoffsets=(2 - i) * .3)
    plt.axis('off')

# Plot 1d

**Build linear SVM classifier for Iris.**

In [None]:
# Your classifier here
# Print accuracy on 1, 2, and 4 dimensions.

## Pedestrian Classification

### Task 3.4

In [None]:
input_file_path = "data/pca_ped_25x50.mat"

__Read the file above into a NumPy array__

In [None]:
# Your code here

__Get the training data out__

In [None]:
# Your code here

__Normalize the data to the range [0, 1]__

In [None]:
# Your code here
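
One possible reading of this step is min-max scaling, sketched below. The helper name `min_max_normalize` and the sample values are hypothetical. The sketch assumes a single global minimum and maximum over the whole array; scaling each image separately would be an equally valid interpretation of the task.

```python
import numpy as np


def min_max_normalize(data):
    """Scale all values linearly so that the minimum maps to 0 and the maximum to 1."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Constant input: avoid division by zero and return zeros
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)


# Made-up pixel values (illustrative only)
pixels = np.array([[0, 128, 255], [64, 32, 16]])
normalized = min_max_normalize(pixels)
print(normalized.min(), normalized.max())  # 0.0 1.0
```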

__Write a function to plot an image__

In [None]:
# Your code here

In [None]:
# Plot samples

### Task 3.5 - Eigenpedestrians

In [None]:
from sklearn.decomposition import PCA
# Compute PCA

In [None]:
# Plot some eigenpedestrians

### Task 3.6 - Linear SVM classifier

In [None]:
from sklearn.svm import LinearSVC

# Compute scores of linear SVM when using increasing q

In [None]:
# Plot q vs scores

### Task 3.7 - HOG features

In [None]:
# Implement HOG features
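
The core building block of HOG is a histogram of gradient orientations over a small cell, weighted by gradient magnitude. The sketch below shows only this one step and rests on several assumptions: unsigned orientations in $[0, 180)$ degrees, 9 bins, hard bin assignment without interpolation, and no block normalization. The function name `cell_orientation_histogram` is hypothetical, and the intermediate values will not necessarily match the test data provided for this task.

```python
import numpy as np


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = np.asarray(cell, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then along columns (x)
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    return hist


# A horizontal intensity ramp has gradients pointing only along the x-axis
ramp = np.tile(np.arange(8.0), (8, 1))
hist_demo = cell_orientation_histogram(ramp)
print(hist_demo)
```

For the ramp, every pixel has gradient magnitude 1 and orientation 0 degrees, so all the weight lands in the first bin.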

**To test your implementation, test data is available.**

The array `image` is the input, and `steps` contains the values of the inner variables of the HOG algorithm.

In [None]:
image, steps = mllab.pca.hog_test_data()

**Repeat Task 3.6 with the HOG features.**

In [None]:
# Your code here