## Recurrence, Depth and High-dimensional data
# High dimensionality visualization notebook

This notebook presents several techniques for visualizing high-dimensional data.

*Please execute the cell below to initialize the notebook environment*

In [None]:
%autosave 0
# %matplotlib inline
%matplotlib notebook

from __future__ import division, print_function
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import mod3

plt.rcParams.update({'figure.figsize': (5.0, 4.0), 'lines.linewidth': 2.0})

## MNIST dataset import and pre-processing

*Please execute the cell below to prepare the MNIST dataset*

In [None]:
import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

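# downsample the 28x28 images to 14x14 by keeping every second pixel
x_train = x_train[:, ::2, ::2].copy()
x_test = x_test[:, ::2, ::2].copy()

# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# image shape after downsampling, used later for plotting
x_shape = x_train.shape[1:]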

# keep only digit classes 0 to 5
kept_classes = [0, 1, 2, 3, 4, 5]

train_mask = np.isin(y_train, kept_classes)
x_train, y_train = x_train[train_mask], y_train[train_mask]

test_mask = np.isin(y_test, kept_classes)
x_test, y_test = x_test[test_mask], y_test[test_mask]

print('train set shape:', x_train.shape)
print('test set shape:', x_test.shape)

## PCA on MNIST

In [None]:
from sklearn import manifold, decomposition

n_show = 500

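# flatten each image into a single feature vector
input_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))

# fit a two-component PCA on the training images
pca = decomposition.PCA(n_components=2).fit(input_train)

# project to 2-D, then map back to pixel space to inspect the reconstruction
output_train = pca.transform(input_train)
output_train_inv = pca.inverse_transform(output_train)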

plt.figure(figsize=(9, 2))
mod3.plot_generated(input_train, output_train_inv, x_shape)

mod3.plot_embedding(output_train[:n_show], y_train[:n_show], 'PCA projection')
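To make the projection and reconstruction steps in the PCA cell more concrete, here is a minimal NumPy sketch of what a two-component transform and inverse transform compute. It is an illustration only: the helper name `pca_project_reconstruct` and the random stand-in data are assumptions, not part of the notebook or of `mod3`, and scikit-learn's implementation may differ in details such as the sign of the components.

```python
import numpy as np


def pca_project_reconstruct(data, n_components=2):
    """Project rows of `data` onto the top principal directions and map back."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = np.dot(centered, components.T)           # low-dimensional coordinates
    reconstructed = np.dot(projected, components) + mean  # back in the input space
    # Fraction of the total variance retained by the kept components.
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return projected, reconstructed, explained


# Stand-in data (assumption): 200 random "images" of 14*14 = 196 pixels.
rng = np.random.RandomState(0)
demo_data = rng.rand(200, 196)

demo_proj, demo_rec, demo_explained = pca_project_reconstruct(demo_data)
demo_error = np.mean((demo_data - demo_rec) ** 2)

print('projected shape:', demo_proj.shape)
print('explained variance fraction:', demo_explained)
print('mean squared reconstruction error:', demo_error)
```

On real digits, the explained-variance fraction gives a rough indication of how much structure the two-dimensional plot can show, which helps when reading the blurry reconstructions.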

## t-SNE on MNIST

In [None]:
n_show = 500

x_train_reduced = x_train[:5000]

input_train = x_train_reduced.reshape((len(x_train_reduced), np.prod(x_train_reduced.shape[1:])))

output_train = manifold.TSNE(n_components=2, init='pca', verbose=1).fit_transform(input_train)

mod3.plot_embedding(output_train[:n_show], y_train[:n_show], 't-SNE projection')
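The t-SNE cell leaves `perplexity` at scikit-learn's default, and the resulting layout can change noticeably with that setting. The sketch below runs `manifold.TSNE` with two perplexity values on small synthetic blobs rather than on MNIST, so that it finishes quickly. The helper `run_tsne`, the blob data, and the chosen perplexity values are assumptions made for illustration.

```python
import numpy as np
from sklearn import manifold

# Stand-in data (assumption): two Gaussian blobs in 20 dimensions.
rng = np.random.RandomState(0)
demo_points = np.vstack([
    rng.normal(0.0, 1.0, (60, 20)),
    rng.normal(6.0, 1.0, (60, 20)),
])
demo_labels = np.repeat([0, 1], 60)


def run_tsne(points, perplexity):
    """Embed `points` in 2-D with a fixed seed so runs are easier to compare."""
    tsne = manifold.TSNE(n_components=2, init='pca',
                         perplexity=perplexity, random_state=0)
    return tsne.fit_transform(points)


embedding_low = run_tsne(demo_points, perplexity=5)
embedding_high = run_tsne(demo_points, perplexity=30)

print('low perplexity embedding shape:', embedding_low.shape)
print('high perplexity embedding shape:', embedding_high.shape)
```

Each embedding can then be passed to a scatter plot coloured by `demo_labels` to see how the cluster shapes differ between the two settings.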

## Extended exercises

**EXTENDED EXERCISE 1**

The bottleneck layer of the autoencoder provides a latent representation of the dataset.

Characterize the representation of the bottleneck layer using dimensionality reduction techniques (PCA, t-SNE). How does this compare to the input representations?
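One possible starting point for this exercise is a small helper that applies the same two-component PCA to any feature array, so that inputs and bottleneck activations can be projected in the same way. This is only a sketch: the autoencoder is not defined in this notebook, so `stand_in_codes` is a random placeholder for the bottleneck activations you would extract from your own model, and `project_2d` is a hypothetical helper name.

```python
import numpy as np
from sklearn import decomposition


def project_2d(features):
    """Flatten each sample and project the set onto its first two principal components."""
    flat = np.asarray(features).reshape((len(features), -1))
    return decomposition.PCA(n_components=2).fit_transform(flat)


# Placeholders (assumptions): replace with real images and real bottleneck outputs.
rng = np.random.RandomState(0)
stand_in_inputs = rng.rand(300, 14, 14)  # same shape as the downsampled digits
stand_in_codes = rng.rand(300, 32)       # hypothetical 32-unit bottleneck layer

inputs_2d = project_2d(stand_in_inputs)
codes_2d = project_2d(stand_in_codes)

print('input projection shape:', inputs_2d.shape)
print('bottleneck projection shape:', codes_2d.shape)
```

With real data, both projections could be passed to `mod3.plot_embedding` together with the matching labels, as in the PCA cell above, and compared side by side.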