# Example: PCA applied to MNIST

This notebook shows how PCA can be applied to a non-trivial data set to reduce its dimensionality.
PCA (and its variants, such as Incremental PCA) is a useful tool for reducing the dimensionality of your feature space, which can speed up model training.
In this example, we will examine how vanilla PCA affects the MNIST data set.
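Before we use scikit-learn, here is a minimal NumPy sketch of what a PCA fit and transform compute. It assumes the standard SVD formulation: center the data, take the right singular vectors as the principal directions, and project onto the first k of them. The helper names `pca_fit` and `pca_transform` and the random `X_toy` matrix are hypothetical stand-ins and are not part of this notebook. scikit-learn's `PCA` follows the same idea but adds details such as several SVD solvers and a sign convention, so its components may differ from these in sign.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return (mean, components) for a minimal SVD-based PCA (hypothetical helper).

    components has shape (n_components, n_features); its rows are orthonormal
    directions of highest variance, sorted from most to least variance.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on centered data
    # Rows of Vt are the right singular vectors, i.e. the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T


rng = np.random.default_rng(42)
X_toy = rng.normal(size=(500, 784))  # random stand-in for 500 flattened 28x28 images
mean, W = pca_fit(X_toy, n_components=196)
Z = pca_transform(X_toy, mean, W)
print(Z.shape)  # (500, 196)
```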

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams.update({'font.size': 18})

import numpy as np

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

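# Load my smaller MNIST data set. The .pkl files are presumably pickled, and NumPy
# refuses pickled data unless allow_pickle=True (only load pickles you trust).
X = np.load('data/mnist/mnist_data.pkl', allow_pickle=True)
y = np.load('data/mnist/mnist_target.pkl', allow_pickle=True)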
# Create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

A single instance of the input has **784 features**!

In [None]:
sample_idx = 0
sample_digit = X_train[sample_idx, :]
print("Dimension = {0}".format(sample_digit.shape))

Here is what that 784-dimensional instance (28x28 = 784 pixels) looks like:

In [None]:
digit_image = sample_digit.reshape(28,28)
plt.imshow(digit_image, cmap=matplotlib.cm.binary, interpolation='nearest')
plt.axis('on')
plt.show()
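The 784 features are simply the 28x28 pixel grid laid out as one vector. Here is a small NumPy sketch of that round trip. It assumes the data set stores each image row by row (C order), as the `reshape(28,28)` call above implies. `image` is a hypothetical array of distinct values, not an MNIST digit.

```python
import numpy as np

# Hypothetical 28x28 "image" whose values are all distinct, so the ordering is visible.
image = np.arange(28 * 28).reshape(28, 28)

flat = image.reshape(-1)       # row-major (C order): row 0, then row 1, ...
print(flat.shape)              # (784,)

# Pixel (row r, column c) lands at flat index r * 28 + c.
r, c = 5, 17
assert flat[r * 28 + c] == image[r, c]

restored = flat.reshape(28, 28)  # reshaping back recovers the grid exactly
assert np.array_equal(restored, image)
```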

Let us assume that this feature space is too large for our chosen classification algorithm.
We could map the data set into a 196-dimensional space...

In [None]:
pca_1 = PCA(n_components=196, random_state=42)
X_train_pca1 = pca_1.fit_transform(X_train)
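Why 196? It is a factor-of-4 reduction, but nothing forces that particular choice. A common alternative is to keep the smallest number of components that explains a target fraction of the variance. The minimal NumPy sketch below assumes synthetic data with five independent features of known scale, and `n_components_for_variance` is a hypothetical helper. In scikit-learn, passing a float such as `PCA(n_components=0.95)` requests a similar selection, and a fitted model exposes the per-component ratios as `explained_variance_ratio_`.

```python
import numpy as np


def n_components_for_variance(X, target=0.95):
    """Return (k, cumulative): the smallest k whose top-k principal components
    explain at least `target` of the total variance, plus the cumulative ratios."""
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in decreasing order
    ratio = s**2 / np.sum(s**2)              # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target, turned into a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(s)), cumulative


rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0, 1.0, 0.5, 0.1])  # standard deviation of each toy feature
X_toy = rng.normal(size=(1000, 5)) * scales
k, cumulative = n_components_for_variance(X_toy, target=0.95)
print(k, cumulative.round(3))  # k is 2: the first two directions carry about 99% of the variance
```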

In [None]:
sample_digit_pca1 = X_train_pca1[sample_idx]
print("Dimension = {0}".format(sample_digit_pca1.shape))

In [None]:
sample_digit_pca1_img = sample_digit_pca1.reshape(14,14)
plt.imshow(sample_digit_pca1_img, cmap=matplotlib.cm.binary, interpolation='nearest')
plt.axis('on')
plt.show()

Yikes! I thought we were getting a 14x14 image!
Hmmm...wait a tick. We **compressed** the image into a 196-dimensional space whose axes are principal components, not pixels, so these values do not map directly onto the original pixel grid!
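One way to see why: each of the 196 compressed values is the dot product of the centered image with one principal direction, which is a 784-long weight vector spread across the image. The minimal NumPy sketch below follows the standard SVD formulation and uses random stand-in data. `X_toy` and the index `j` are hypothetical. For a scikit-learn `PCA` without whitening, `transform` likewise computes `(X - mean_) @ components_.T`.

```python
import numpy as np

rng = np.random.default_rng(1)
X_toy = rng.random(size=(300, 784))  # random stand-in for 300 flattened 28x28 images
mean = X_toy.mean(axis=0)
_, _, Vt = np.linalg.svd(X_toy - mean, full_matrices=False)
W = Vt[:196]                         # 196 principal directions, each of length 784

x = X_toy[0]
z = (x - mean) @ W.T                 # the 196 compressed coordinates of one sample
print(z.shape)                       # (196,)

# Coordinate j is a weighted sum over pixels from the whole image, so z[j] is not
# the value of any single pixel; reshaping z to 14x14 only arranges the coordinates in a grid.
j = 10
assert np.isclose(z[j], np.dot(x - mean, W[j]))
```

In the notebook itself, `pca_1.components_` has shape `(196, 784)`. Reshaping one of its rows to 28x28 and plotting it shows the pixel pattern that the corresponding coordinate measures.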

To see the **loss** due to this compression, we can map the compressed data back into the original 784-dimensional pixel space with `inverse_transform`.

In [None]:
X_train_recovered = pca_1.inverse_transform(X_train_pca1)
X_train_recovered.shape

In [None]:
sample_digit_pca1_restored = X_train_recovered[sample_idx, :]
print("Dimension = {0}".format(sample_digit_pca1_restored.shape))
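`inverse_transform` undoes the projection as far as it can. It maps the k coordinates back through the same principal directions and adds the mean. Whatever lay outside those k directions is the **loss**. The minimal NumPy sketch below assumes the standard SVD formulation and uses random stand-in data. `reconstruction_error` is a hypothetical helper. For a non-whitened scikit-learn `PCA`, the inverse is likewise `Z @ components_ + mean_`.

```python
import numpy as np


def reconstruction_error(X, k):
    """Mean squared error after projecting X onto its top-k principal directions and back."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k]
    Z = (X - mean) @ W.T   # compress: n_features -> k
    X_hat = Z @ W + mean   # decompress: k -> n_features (the inverse transform)
    return np.mean((X - X_hat) ** 2)


rng = np.random.default_rng(2)
X_toy = rng.random(size=(200, 64))  # random stand-in data: 200 samples, 64 features
errors = {k: reconstruction_error(X_toy, k) for k in (3, 16, 32, 64)}
for k, err in errors.items():
    print(f"k={k:2d}  MSE={err:.5f}")
```

The error shrinks as k grows. It drops to (numerically) zero once k equals the number of features, because no direction is discarded.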

As humans, we can see that reducing the input features by a factor of 4 (784 to 196) still leaves the digit legible.
For the training algorithm, this reduction typically means faster training, because each instance is now described by its coordinates along the 196 directions of **highest variance** (the principal components) rather than by 784 raw pixels.

In [None]:
sample_digit_pca1_res_img = sample_digit_pca1_restored.reshape(28,28)
plt.imshow(sample_digit_pca1_res_img, cmap=matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.title('Restored from 196D Space')
plt.show()

plt.imshow(digit_image, cmap=matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.title('Original')
plt.show()
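The "highest variance" claim can be checked directly. The variance of the data along each principal direction comes out in decreasing order. The minimal NumPy sketch below uses correlated random stand-in data (`X_toy` is hypothetical). The per-direction variances equal `s**2 / (n - 1)`, which is how scikit-learn defines `explained_variance_`.

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated random stand-in data: 400 samples, 20 features.
X_toy = rng.normal(size=(400, 20)) @ rng.normal(size=(20, 20))

mean = X_toy.mean(axis=0)
_, s, Vt = np.linalg.svd(X_toy - mean, full_matrices=False)
Z = (X_toy - mean) @ Vt.T              # coordinates along every principal direction

var_along_pc = Z.var(axis=0, ddof=1)   # sample variance of each new coordinate
print(var_along_pc.round(2))

# The variances are in decreasing order and equal s**2 / (n - 1).
assert np.all(np.diff(var_along_pc) <= 1e-9)
assert np.allclose(var_along_pc, s**2 / (len(X_toy) - 1))
```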

## Pushing the limits
Now let's get crazy. What if we could smash these numbers into a 3D space?

In [None]:
pca_2 = PCA(n_components=3, random_state=42)
X_train_pca2 = pca_2.fit_transform(X_train)

As you will see, the three resulting coordinates have no intuitive meaning to us.

In [None]:
X_train_pca2[sample_idx, :]

In [None]:
X_train_recovered_2 = pca_2.inverse_transform(X_train_pca2)
sample_digit_pca2_restored = X_train_recovered_2[sample_idx,:]

In [None]:
sample_digit_pca2_res_img = sample_digit_pca2_restored.reshape(28,28)
plt.imshow(sample_digit_pca2_res_img, cmap=matplotlib.cm.binary, interpolation='nearest')
plt.axis('off')
plt.title('Restored from 3D Space')
plt.show()

In this case, we see *some* visual cues that remain from the original instance; however, the reconstruction is likely too lossy to be of much use for classification!