## t-Distributed Stochastic Neighbor Embedding (t-SNE)

* First, one data point is chosen as the reference.
* The Euclidean distance from it to every other point is computed, and each distance is converted into a probability of being a neighbor using a Gaussian distribution centered on the reference point.
* Next, treating these Gaussian-based probabilities as the target, the points are placed in the low-dimensional space so that similarities computed there with a Student's t-distribution match the target as closely as possible. Points with similar values end up grouped together.
* The cost function is the KL divergence between the two sets of probabilities, and it is minimized with gradient descent.
* Preserves the local neighbor structure.
> * However, for n data points the exact algorithm's computation grows with the square of n, so it can take a long time.
> * It is slow, but it usually separates clusters well in the resulting plot.
* More recently, GPU implementations have made t-SNE considerably faster.
* Like PCA, t-SNE is unsupervised learning: it does not use the class labels.
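
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the function names are hypothetical, a single fixed `sigma` is assumed for every point (real t-SNE tunes one per point from the perplexity), and plain gradient descent is used without momentum or early exaggeration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def gaussian_affinities(X, sigma=1.0):
    """High-dimensional neighbor probabilities P (symmetric, sums to 1).

    Assumption: one fixed sigma for all points.
    """
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)           # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i holds p(j | i)
    return (cond + cond.T) / (2.0 * X.shape[0])

def student_t_affinities(Y):
    """Low-dimensional similarities Q from a Student's t kernel (1 d.o.f.)."""
    w = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum(), w

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the t-SNE cost."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

def kl_gradient(P, Q, w, Y):
    """Gradient of the cost: 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)."""
    pq = (P - Q) * w
    return 4.0 * (np.diag(pq.sum(axis=1)) - pq) @ Y

# Toy data: two well-separated blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(6, 1, (20, 5))])

P = gaussian_affinities(X, sigma=2.0)
Y = rng.normal(0, 1e-2, (X.shape[0], 2))  # small random starting layout

Q, w = student_t_affinities(Y)
kl_start = kl_divergence(P, Q)

for _ in range(500):
    Q, w = student_t_affinities(Y)
    Y -= 1.0 * kl_gradient(P, Q, w, Y)    # plain gradient descent step

Q, _ = student_t_affinities(Y)
kl_end = kl_divergence(P, Q)
print(f"KL before: {kl_start:.4f}  KL after: {kl_end:.4f}")
```

The pairwise distance matrices here are n by n, which is where the quadratic cost mentioned in the list comes from.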

### Import and Configuration
Let's start with our typical imports and configuration to set up for t-SNE.

In [1]:
# Import
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

import warnings

warnings.filterwarnings('ignore')

# Configuration
plt.rcParams['image.cmap'] = 'gray'

color = [
    '#6388b4', '#ffae34', '#ef6f6a', '#8cc2ca', '#55ad89', '#c3bc3f',
    '#bb7693', '#baa094', '#a9b5ae', '#767676'
]

In [7]:
# Load our dataset
mnist = pd.read_csv('https://raw.githubusercontent.com/sbussmann/kaggle-mnist/master/Data/train.csv')
mnist.head()
# Separate the label column from the pixel features
label = mnist['label']
mnist.drop(['label'], inplace=True, axis=1)

### t-SNE

In [8]:
%%time
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=0)
# t-SNE is unsupervised: fit_transform ignores y, so the labels are not passed
mnist_tsne = tsne.fit_transform(mnist)

CPU times: user 21min 37s, sys: 17.2 s, total: 21min 54s
Wall time: 11min 57s
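
The run above took roughly 12 minutes of wall time. One common way to shorten it, sketched here as an option rather than as part of the original notebook, is to embed a random subset of rows after first reducing them with PCA. The helper name `quick_tsne` and its default sizes are assumptions chosen for illustration, and the demo uses synthetic data so the block runs on its own.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def quick_tsne(X, n_samples=5000, n_pca=50, seed=0):
    """Run t-SNE on a random subset of X after a PCA reduction.

    Returns the 2-D embedding and the row positions that were sampled,
    so the labels can be subset in the same way.
    """
    X = np.asarray(X, dtype=np.float32)
    rng = np.random.default_rng(seed)
    n = min(n_samples, X.shape[0])
    rows = rng.choice(X.shape[0], size=n, replace=False)
    # PCA cannot keep more components than there are rows or columns
    n_pca = min(n_pca, X.shape[1], n)
    X_small = PCA(n_components=n_pca, random_state=seed).fit_transform(X[rows])
    emb = TSNE(n_components=2, random_state=seed).fit_transform(X_small)
    return emb, rows

# Synthetic stand-in for the pixel matrix (hypothetical data, not MNIST)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 64))
emb, rows = quick_tsne(X_demo, n_samples=200, n_pca=20)
print(emb.shape)
```

In this notebook the call would look like `quick_tsne(mnist.values)`, with `label.iloc[rows]` selecting the matching labels. The trade-off is that only the sampled points appear in the plot.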


In [9]:
import plotly.graph_objects as go

fig = go.Figure()

for idx in range(10):
    fig.add_trace(
        go.Scatter(x=mnist_tsne[:, 0][label == idx],
                   y=mnist_tsne[:, 1][label == idx],
                   name=str(idx),
                   opacity=0.6,
                   mode='markers',
                   marker=dict(color=color[idx])))

fig.update_layout(width=800,
                  height=800,
                  title="t-SNE result",
                  yaxis=dict(scaleanchor="x", scaleratio=1),
                  legend=dict(orientation="h",
                              yanchor="bottom",
                              y=1.02,
                              xanchor="right",
                              x=1))

fig.show()
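
The plot gives a visual impression of how well local neighborhoods are preserved. To put a number on it, scikit-learn provides `sklearn.manifold.trustworthiness`, which scores between 0 and 1 how far the nearest neighbors in the embedding are also neighbors in the original space. The sketch below uses a small synthetic dataset (an assumption made to keep it quick and self-contained) and compares the t-SNE layout with a random one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE, trustworthiness

# Small synthetic dataset: three blobs in 10 dimensions
X_demo, _ = make_blobs(n_samples=200, n_features=10, centers=3, random_state=0)

emb = TSNE(n_components=2, random_state=0).fit_transform(X_demo)
score_tsne = trustworthiness(X_demo, emb, n_neighbors=5)

# Baseline: a random 2-D layout that ignores the data entirely
rng = np.random.default_rng(0)
score_random = trustworthiness(X_demo, rng.normal(size=emb.shape), n_neighbors=5)

print(f"t-SNE: {score_tsne:.3f}  random layout: {score_random:.3f}")
```

On the full MNIST embedding this measure needs all pairwise distances, so it is safer to evaluate it on a few thousand sampled rows rather than on every point.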