In [1]:
from __future__ import division, print_function, absolute_import

# Session 2 Final Presentations:
Summary of machine learning and an improvement on `scikit-learn`
========

##### Version 0.1

This notebook contains the [final project](https://github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions/blob/master/Session2/Day1/comm_viz_assignment.md) from Session 2 of the LSSTC DSFP.
***
By the LSSTC DSFP Fellows

To view the slides on your local machine, run the following (current Jupyter installs provide `jupyter nbconvert`, which replaces the older `ipython nbconvert`):

    jupyter nbconvert Session2FinalProject.ipynb --to slides --post serve

# t-SNE: t-Distributed Stochastic Neighbor Embedding

### Joachim Moeyens & Szymon Prajs

### What is t-SNE?
- Unsupervised manifold learning or, more simply, nonlinear dimensionality reduction
- Or **even** more simply ... magic!

In [2]:
from sklearn import datasets

# Load the handwritten digits, keeping only the first six classes (digits 0-5)
digits = datasets.load_digits(n_class=6)

![digits](jmsp/digits.png)

In [3]:
from sklearn import random_projection

# Project the 64-dimensional digit images onto 2 dimensions with a sparse random matrix
srp = random_projection.SparseRandomProjection(n_components=2)
X_projected = srp.fit_transform(digits.data)

![random](jmsp/random.png)

In [4]:
from sklearn import decomposition

# Linear projection onto the two directions of largest variance
pca = decomposition.PCA(n_components=2)
X_pca = pca.fit_transform(digits.data)

![pca](jmsp/pca.png)
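
As a rough gauge of how much a two-component linear projection keeps, the fitted PCA object exposes `explained_variance_ratio_`. The snippet below is a small sketch rather than part of the original slides; the variable name `retained` is introduced here for illustration only.

```python
from sklearn import datasets, decomposition

digits = datasets.load_digits(n_class=6)

pca = decomposition.PCA(n_components=2)
X_pca = pca.fit_transform(digits.data)

# Fraction of the total variance captured by the two retained components
retained = pca.explained_variance_ratio_.sum()
print("Fraction of variance retained by 2 components: {:.2f}".format(retained))
```

A value well below 1 suggests that much of the structure in the data lies outside any single 2D linear projection, which is part of the motivation for a nonlinear method such as t-SNE.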

The (simplified) algorithm:
- Find the high-D probabilities:
    - pairs of points with small separation are given high probabilities; pairs of points with large separation are given small probabilities
- Find the low-D probabilities:
    - map the points into the low-dimensional space and compute the same kind of pairwise probabilities there
- Compare the high-D and low-D probabilities and minimize the difference between them over all pairs of points
    - a.k.a. minimizing the Kullback-Leibler divergence
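
For reference, van der Maaten & Hinton (2008) define the quantities in these steps as follows, where $x_i$ are the original points, $y_i$ their low-dimensional images, $n$ the number of points, and $\sigma_i$ a per-point bandwidth set by the perplexity:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

$$KL(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
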
    
More details: [GoogleTalk: Visualizing Data Using t-SNE: van der Maaten (2013)](https://www.youtube.com/watch?v=RJVL80Gg3lA)
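
The simplified steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the scikit-learn implementation: it uses one fixed Gaussian bandwidth `sigma` for all points (real t-SNE tunes a per-point bandwidth from the perplexity), it only evaluates the cost instead of minimizing it, and all function names and the toy data are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of rows in X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * np.dot(X, X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def high_d_probabilities(X, sigma=1.0):
    """Gaussian similarities, normalized over all pairs.

    Assumption: a single fixed bandwidth `sigma` for every point.
    """
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbor
    return P / P.sum()


def low_d_probabilities(Y):
    """Student-t (one degree of freedom) similarities, normalized over all pairs."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over the pairs where P is non-zero."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


# Toy data: two well-separated clusters in 10 dimensions.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 10)),
               rng.normal(5.0, 0.1, size=(20, 10))])
P = high_d_probabilities(X)

# Two candidate 2D layouts: one keeps the clusters apart, one mixes them.
Y_good = X[:, :2]
Y_bad = Y_good[np.concatenate([np.arange(0, 40, 2), np.arange(1, 40, 2)])]

kl_good = kl_divergence(P, low_d_probabilities(Y_good))
kl_bad = kl_divergence(P, low_d_probabilities(Y_bad))
print("KL (clusters preserved):", kl_good)
print("KL (clusters mixed):    ", kl_bad)
```

The layout that keeps neighbors together yields the lower divergence. t-SNE searches for such a layout by gradient descent on this cost.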

![dreduction](jmsp/dreduction.png)

In [5]:
from sklearn import manifold

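# Random initialization (the default before scikit-learn 1.2, which switched to init="pca")
tsne = manifold.TSNE(n_components=2, init="random")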
X_tsne = tsne.fit_transform(digits.data)

![tsne](jmsp/tsne.png)

In [6]:
from sklearn import manifold

# Start the optimization from the PCA projection instead of random positions
tsne = manifold.TSNE(n_components=2, init="pca")
X_tsne = tsne.fit_transform(digits.data)

![tsne-pca](jmsp/tsne-pca.png)
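
One possible way to compare the two initializations numerically is the `kl_divergence_` attribute that scikit-learn sets on a fitted `TSNE` estimator. The sketch below is not part of the original slides; `random_state=0` and the `results` dictionary are assumptions added here for reproducibility and bookkeeping, and the printed values depend on the scikit-learn version.

```python
from sklearn import datasets, manifold

digits = datasets.load_digits(n_class=6)

results = {}
for init in ("random", "pca"):
    # random_state is fixed only so that repeated runs give the same layout
    tsne = manifold.TSNE(n_components=2, init=init, random_state=0)
    embedding = tsne.fit_transform(digits.data)
    # kl_divergence_ is the final value of the cost after optimization
    results[init] = (embedding.shape, tsne.kl_divergence_)
    print(init, embedding.shape, tsne.kl_divergence_)
```

A lower final divergence means the low-D probabilities match the high-D ones more closely. It does not by itself guarantee a more readable plot.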

## Questions?


#### Resources:
- [Scikit-Learn Documentation](http://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html#sphx-glr-auto-examples-manifold-plot-lle-digits-py)
- [Visualizing Data Using t-SNE: van der Maaten & Hinton (2008)](http://www.cs.toronto.edu/~hinton/absps/tsne.pdf)
- [GoogleTalk: Visualizing Data Using t-SNE: van der Maaten (2013)](https://www.youtube.com/watch?v=RJVL80Gg3lA)