# Dimensionality reduction on the Digits data set: PCA vs tSNE

The goal of this notebook is to compare two dimensionality-reduction approaches on the same data set: PCA and tSNE.

This notebook was created by [Chloé-Agathe Azencott](http://cazencott.info), inspired by material from [Alexandre Gramfort](http://alexandre.gramfort.net/) and [Jake Vanderplas](https://github.com/jakevdp).

This notebook was created using
* python 3.4.3
* numpy 1.15.0
* matplotlib 2.2.2
* scikit-learn 0.19.2

You can check your version of Python by running
```python
import sys
print(sys.version)
```

and the version of any module by running
```python
import <module name>
print(<module name>.__version__)
```
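
As a concrete instance of the template above, the following sketch prints the versions of the modules this notebook relies on. The list of module names is an assumption based on the versions listed earlier, and `__version__` is a common convention rather than a guarantee, hence the fallback.

```python
import importlib

# Import names of the modules used in this notebook (assumed list).
module_names = ["numpy", "matplotlib", "sklearn"]

versions = {}
for name in module_names:
    try:
        module = importlib.import_module(name)
    except ImportError:
        versions[name] = None  # the module is not installed
        continue
    # __version__ is only a convention, so fall back to "unknown".
    versions[name] = getattr(module, "__version__", "unknown")

for name in module_names:
    version = versions[name]
    print("%s: %s" % (name, version if version is not None else "not installed"))
```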

## Loading the data science libraries

In [None]:
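# Display figures inside the notebook
%matplotlib inline

# Explicit imports (instead of the discouraged %pylab magic) for the names used below
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd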

## 1. Data

### Loading the data
The Digits data set ships with scikit-learn and can be loaded with `load_digits`.

In [None]:
# Load data
from sklearn.datasets import load_digits 

digits = load_digits()

# Get descriptors and target to predict
X, y = digits.data, digits.target

# Get the shape of the data
print("Number of samples: %d" % X.shape[0])
print("Number of pixels: %d" % X.shape[1])
print("Number of classes: %d" % len(np.unique(y))) # number of unique values in y

Each object (image) in this data set is represented by 64 features, one per pixel of an 8×8 image. This makes it difficult to plot all the objects on a single figure, so we will use dimensionality-reduction techniques.
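
To see where the 64 features come from, the sketch below reshapes the first row of the data matrix back into an 8×8 grid and checks it against the `images` attribute returned by `load_digits`. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data

# Each row of X is an 8x8 image flattened into 64 pixel intensities.
first_image = X[0].reshape(8, 8)
print(first_image.astype(int))

# The unflattened images are also available directly.
same = np.array_equal(first_image, digits.images[0])
print("Row 0 matches digits.images[0]: %s" % same)
print("Pixel range: %d to %d" % (X.min(), X.max()))
```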

### Data pre-processing

Remember: PCA is sensitive to the scale of the features, so it should be applied to standardized data. We will use scikit-learn's [preprocessing.StandardScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).

In [None]:
from sklearn import preprocessing

std_scale = preprocessing.StandardScaler().fit(X)
X_scaled = std_scale.transform(X)
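
A quick sanity check of what standardization does is sketched below. It recomputes the scaling by hand, assuming the population standard deviation (NumPy's default) and a scale of 1 for pixels that never vary, which is how `StandardScaler` is understood to treat constant features.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

X = load_digits().data
X_scaled = StandardScaler().fit_transform(X)

# Some border pixels are constant across all images (zero variance).
constant = X.std(axis=0) == 0
print("Number of constant pixels: %d" % constant.sum())

# After scaling, every feature has mean 0 and every varying feature has std 1.
print("Largest |mean|: %.2e" % np.abs(X_scaled.mean(axis=0)).max())
print("Std of varying pixels: %.3f to %.3f" % (
    X_scaled[:, ~constant].std(axis=0).min(),
    X_scaled[:, ~constant].std(axis=0).max()))

# Manual version: subtract the mean, divide by the std (1 where the std is 0).
std = X.std(axis=0)
std[std == 0] = 1.0
X_manual = (X - X.mean(axis=0)) / std
print("Matches StandardScaler: %s" % np.allclose(X_scaled, X_manual))
```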

## 2. Principal component analysis

Let us use PCA to project the images onto 2 dimensions and visualize them. We will use scikit-learn's [decomposition.PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html).

In [None]:
from sklearn import decomposition

# Create a pca object
pca = decomposition.PCA(n_components=2)

# Apply to the data, i.e. learn the PCs and project the data onto them
X_proj = pca.fit_transform(X_scaled)
print(X_proj.shape)
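
"Learning the PCs and projecting the data onto them" can be made concrete with a singular value decomposition. The sketch below is an illustration, not a description of scikit-learn's internal code path: it assumes `svd_solver="full"` so that the result is exact, and it compares absolute values because the sign of each component is arbitrary. It also reports how much of the total variance the two components retain.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

# Reference: scikit-learn's PCA with the exact solver.
pca = PCA(n_components=2, svd_solver="full")
X_proj = pca.fit_transform(X_scaled)

# By hand: center the data, take the SVD, keep the first two right singular vectors.
X_centered = X_scaled - X_scaled.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_svd = X_centered.dot(Vt[:2].T)

same_up_to_sign = np.allclose(np.abs(X_proj), np.abs(X_svd), atol=1e-6)
print("Same projection up to sign: %s" % same_up_to_sign)

# Fraction of the total variance carried by each of the two components.
ratio = pca.explained_variance_ratio_
print("Explained variance ratio: %s (total %.3f)" % (ratio, ratio.sum()))
```

A low total here means that the 2-D picture discards most of the variability in the data, which is worth keeping in mind when reading the scatter plot.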

We will use a colormap that differs from the default one and is well suited to displaying 10 different classes: the Paired colormap. To learn more about colormaps and decide which one to pick, see the [Choosing Colormaps](https://matplotlib.org/users/colormaps.html) documentation.
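
Paired defines 12 colors, while the data have 10 classes. Passing it directly to a scatter plot works, but if you prefer exactly one color per digit, one option is to build a 10-color map from it, as sketched here. The name `Paired10` is made up for this example.

```python
from matplotlib import cm
from matplotlib.colors import ListedColormap

# Paired is a qualitative colormap stored as a list of 12 RGB colors.
print("Colors in Paired: %d" % len(cm.Paired.colors))

# Keep the first 10 colors, one per digit class ("Paired10" is a hypothetical name).
paired_10 = ListedColormap(cm.Paired.colors[:10], name="Paired10")
print("Colors in the reduced map: %d" % paired_10.N)
```

Such a map can be passed as `cmap=paired_10` to `plt.scatter`, together with `vmin=-0.5` and `vmax=9.5` so that each integer label falls in the middle of its own color band.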

In [None]:
from matplotlib import cm # Will allow us to change color maps

In [None]:
fig = plt.figure(figsize=(6, 6))

# Visualize the projected data
plt.scatter(X_proj[:, 0], # first dimension
            X_proj[:, 1], # second dimension
            c=y, # color by label
            edgecolor='none', # remove dot border
            cmap=cm.Paired, # use the Paired colormap
            alpha=0.5 # use transparency to better see overlapping dots
           ) 
plt.colorbar(label='digit label', ticks=range(10))

plt.xlabel('PC 1')
plt.ylabel('PC 2')

__Question 1:__ What do you observe? Do you think it will be easy to learn a classifier that separates the data, based on those two components?

__Answer:__

## 3. tSNE

tSNE is a popular non-linear alternative to PCA. It is implemented in scikit-learn's [manifold.TSNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).

In [None]:
from sklearn import manifold

In [None]:
# Create a tSNE object
tsne = manifold.TSNE(n_components=2)

# Apply to the data (here the original, unscaled pixel values)
X_proj = tsne.fit_transform(X)
print(X_proj.shape)
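
Unlike PCA, tSNE is fitted by an iterative, non-convex optimization, so the layout depends on its starting point. The sketch below illustrates this on a subset of 300 images (an arbitrary choice to keep it fast) with `init="random"` set explicitly, since the default initialization differs between scikit-learn versions. Fixing `random_state` is the usual way to make a figure repeatable.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset, only to keep this illustration fast.
X_small = load_digits().data[:300]

embeddings = {}
for seed in (0, 1):
    tsne = TSNE(n_components=2, init="random", random_state=seed)
    embeddings[seed] = tsne.fit_transform(X_small)
    # kl_divergence_ is the value of the objective at the end of the optimization.
    print("seed=%d  final KL divergence=%.3f" % (seed, tsne.kl_divergence_))

# Different seeds lead to different coordinates.
identical = np.allclose(embeddings[0], embeddings[1])
print("Identical layouts: %s" % identical)
```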

In [None]:
fig = plt.figure(figsize=(6, 6))

# Visualize the projected data
plt.scatter(X_proj[:, 0], # first dimension
            X_proj[:, 1], # second dimension
            c=y, # color by label
            edgecolor='none', # remove dot border
            cmap=cm.Paired, # use the Paired colormap
            alpha=0.5 # use transparency to better see overlapping dots
           ) 
plt.colorbar(label='digit label', ticks=range(10))

plt.xlabel('tSNE dimension 1')
plt.ylabel('tSNE dimension 2')
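
To complement the visual comparison, one rough numerical proxy is the cross-validated accuracy of a nearest-neighbor classifier trained on the 2-D coordinates. This is a sketch under several assumptions: it uses only the first 600 images to stay fast, 5 neighbors and 5 folds are arbitrary choices, and the embeddings are computed on all points before splitting, so the scores indicate how well the classes separate in the plot rather than how a full prediction pipeline would perform.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_sub, y_sub = digits.data[:600], digits.target[:600]  # subset for speed

# Two-dimensional representations, built as in the cells above.
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X_sub))
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_sub)

# Mean cross-validated accuracy of a 5-nearest-neighbor classifier on each embedding.
knn = KNeighborsClassifier(n_neighbors=5)
scores = {}
for name, X_2d in (("PCA", X_pca), ("tSNE", X_tsne)):
    scores[name] = cross_val_score(knn, X_2d, y_sub, cv=5).mean()
    print("%s: mean accuracy %.3f" % (name, scores[name]))
```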

__Question 2:__ What do you observe? Do you think it will be easier to learn a classifier that separates the data based on these two new dimensions than on the two principal components?

__Answer:__

Here tSNE works quite nicely with default parameters; that is not always the case. This [distill.pub article](https://distill.pub/2016/misread-tsne/) gives a lot of additional information on how to use it.

__Question 3:__ What happens if you use a perplexity of 2? Of 200? 

In [None]:
# TODO
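
One possible starting point for this experiment is sketched below. It assumes a subset of 500 images to keep the three runs fast (the perplexity must stay below the number of samples), includes scikit-learn's default perplexity of 30 for reference, and fixes `random_state` so that the panels are repeatable. Rerun it on the full data set before drawing conclusions.

```python
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
X_sub, y_sub = digits.data[:500], digits.target[:500]  # subset for speed

perplexities = (2, 30, 200)
fig, axes = plt.subplots(1, len(perplexities), figsize=(15, 5))

embeddings = {}
for ax, perplexity in zip(axes, perplexities):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X_sub)
    # One panel per perplexity value, colored by digit label.
    ax.scatter(embeddings[perplexity][:, 0], embeddings[perplexity][:, 1],
               c=y_sub, edgecolor='none', cmap=cm.Paired, alpha=0.5)
    ax.set_title("perplexity = %d" % perplexity)
    ax.set_xlabel('tSNE dimension 1')
    ax.set_ylabel('tSNE dimension 2')
```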