## Dimension Reduction [3] : TSNE

![](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

- [🎛 Dimension Reduction [1] : PCA](https://www.kaggle.com/subinium/dimension-reduction-1-pca)
- [🎛 Dimension Reduction [2] : LDA](https://www.kaggle.com/subinium/dimension-reduction-2-lda)
- [🎛 Dimension Reduction [4] : UMAP](https://www.kaggle.com/subinium/dimension-reduction-4-umap)

## Import Library & Default Setting

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib as mpl
import matplotlib.pyplot as plt

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
# matplotlib configuration
plt.rcParams['image.cmap'] = 'gray'
# Colors from the R ggplot colormap
color = ['#6388b4', '#ffae34', '#ef6f6a', '#8cc2ca', '#55ad89', '#c3bc3f', '#bb7693', '#baa094', '#a9b5ae', '#767676']

In [None]:
mnist = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
label = mnist['label']
mnist.drop(['label'], inplace=True, axis=1)

## T-SNE & Result

**t-SNE** stands for t-distributed Stochastic Neighbor Embedding.
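
For reference, the objective behind the name can be written compactly. This is the standard formulation from the t-SNE literature, not something specific to this notebook. The symbols are assumptions introduced here: $x_i$ are the input points, $y_i$ their low-dimensional images, $N$ the number of points, and $\sigma_i$ a per-point bandwidth set by the perplexity.

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \ne i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \ne l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \qquad \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \ne j} p_{ij} \log\frac{p_{ij}}{q_{ij}}$$

The embedding is found by minimizing the KL divergence with gradient descent. The heavy-tailed Student t kernel in $q_{ij}$ is where the "t" in the name comes from.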

It is slow compared with PCA or LDA, but it usually produces well-separated clusters. For larger datasets the runtime is less of a problem now, because RAPIDS (cuML) provides a fast GPU implementation of t-SNE. I will add it later.
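
On CPU, one common way to keep the runtime manageable is to embed a random subsample and to compress the features with PCA before running t-SNE. The sketch below is an illustration only: it uses scikit-learn's small bundled `load_digits` dataset as a stand-in for MNIST, and the subsample size (500) and the 30 PCA components are arbitrary assumptions, not tuned values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in data: 1797 8x8 digit images bundled with scikit-learn
# (an assumption for this sketch, not the Kaggle MNIST file)
X, y = load_digits(return_X_y=True)

# 1) Random subsample: t-SNE gets expensive as the number of points grows
rng = np.random.default_rng(0)
sample_idx = rng.choice(len(X), size=500, replace=False)
X_sub, y_sub = X[sample_idx], y[sample_idx]

# 2) PCA pre-reduction: fewer input dimensions make distance computations cheaper
X_pca = PCA(n_components=30, random_state=0).fit_transform(X_sub)

# 3) t-SNE on the reduced subsample
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X_pca)
print(X_embedded.shape)
```

The same idea would carry over to the `mnist` DataFrame in this notebook by sampling rows first (for example with `mnist.sample(...)`) and keeping the matching entries of `label` for coloring.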

Unlike LDA, it is unsupervised: the labels are not used to compute the embedding, only to color the plot afterwards.
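
Whether the labels actually enter the computation can be checked directly. scikit-learn's `TSNE.fit_transform` accepts a `y` argument only for API consistency and ignores it. The sketch below embeds the same points with and without labels and compares the two results. The bundled `load_digits` data, the 150-sample slice, and `method="exact"` are assumptions chosen to keep the run small.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small stand-in dataset (assumption: not the Kaggle MNIST file)
X, y = load_digits(return_X_y=True)
X_small, y_small = X[:150], y[:150]

# Same data and same seed, once without labels and once with labels
emb_without = TSNE(n_components=2, method="exact", random_state=0).fit_transform(X_small)
emb_with = TSNE(n_components=2, method="exact", random_state=0).fit_transform(X_small, y_small)

# If y were used, the two embeddings would differ
labels_ignored = np.allclose(emb_without, emb_with)
print("Same embedding with and without labels:", labels_ignored)
```

This is the difference from LDA, whose projection is fitted from the class labels.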

In [None]:
%%time
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=0)
# t-SNE is unsupervised: scikit-learn ignores `y`, so only the pixel data is passed
mnist_tsne = tsne.fit_transform(mnist)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

for idx in range(10):
    fig.add_trace(go.Scatter(
        x = mnist_tsne[:,0][label==idx],
        y = mnist_tsne[:,1][label==idx],
        name=str(idx),
        opacity=0.6,
        mode='markers',
        marker=dict(color=color[idx])
    ))

fig.update_layout(
    width = 800,
    height = 800,
    title = "T-SNE result",
    yaxis = dict(
      scaleanchor = "x",
      scaleratio = 1
    ),
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)


fig.show()

The digit classes separate into clearer clusters here than with the linear methods from the earlier parts of this series (PCA and LDA).
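
The visual impression can also be backed with a number. The sketch below compares a 2D PCA projection with a 2D t-SNE embedding using the silhouette score computed against the true digit labels. Treating that score as a proxy for class separation is an assumption of this example, as are the bundled `load_digits` stand-in data and the 600-sample slice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Stand-in data (assumption: scikit-learn's bundled digits, not the Kaggle MNIST file)
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]

# Two 2D embeddings of the same points
emb_pca = PCA(n_components=2, random_state=0).fit_transform(X)
emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# Silhouette score in [-1, 1]: higher means points sit closer to their own
# class than to the nearest other class in the 2D plane
score_pca = silhouette_score(emb_pca, y)
score_tsne = silhouette_score(emb_tsne, y)
print(f"PCA   silhouette: {score_pca:.3f}")
print(f"t-SNE silhouette: {score_tsne:.3f}")
```

Distances between clusters in a t-SNE plot are not meaningful in themselves, so a score like this is best read as a rough comparison rather than a precise measurement.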