<img src="https://i.imgflip.com/2jxxia.jpg" title="made at imgflip.com">
# Unsupervised Deep Learning
As data scientists, we regularly use **DNNs, CNNs and RNNs** for most applications of deep learning, but that's only the **supervised side** of the neural network family. There is also the less talked about **unsupervised side**, which is just as intriguing as the conventional supervised architectures, if not more. These unsupervised models enable neural networks to perform tasks like **clustering, anomaly detection, feature selection, feature extraction, dimensionality reduction and recommendation**. Some of these neural networks are **self-organizing maps, Boltzmann machines and autoencoders**.

### Self Organizing Maps
<img src="https://www.researchgate.net/profile/Damian_Jankowski3/publication/291834232/figure/fig3/AS:553741877481472@1509033759154/Self-organizing-map-structure.png">

### Boltzmann Machines
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7a/Boltzmannexamplev1.png/330px-Boltzmannexamplev1.png">

### Autoencoders
<img src="https://cdn-images-1.medium.com/max/1000/1*ZEvDcg1LP7xvrTSHt0B5-Q@2x.png" height="400" width="500">

# Aim
This is a tutorial on **Self Organizing Maps** in which we perform clustering on Fashion MNIST using a neural network.
# Concepts covered
- Self Organizing Maps (for unsupervised deep learning)
- Bayesian Optimization
- Analysis of Self Organizing Maps
- Some image processing

# Self Organizing Maps
A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural networks as they apply competitive learning as opposed to error-correction learning (such as backpropagation with gradient descent), and in the sense that they use a neighborhood function to preserve the topological properties of the input space.
<img src="http://www.pitt.edu/~is2470pb/Spring05/FinalProjects/Group1a/tutorial/kohonen1.gif">
This makes SOMs useful for visualization by creating low-dimensional views of high-dimensional data, akin to multidimensional scaling. The artificial neural network introduced by the Finnish professor Teuvo Kohonen in the 1980s is sometimes called a Kohonen map or network. The Kohonen net is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s.
<img src="https://www.nnwj.de/uploads/pics/1_2-kohonon-feature-map.gif">
While it is typical to consider this type of network structure as related to feedforward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation. It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.

Source: [Wikipedia](https://en.wikipedia.org/wiki/Self-organizing_map)
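
The competitive learning described above can be sketched in a few lines of NumPy. This is only an illustration and not MiniSom's implementation: the function names `train_som` and `quantization_error`, the linear decay schedule and the Gaussian neighborhood are assumptions chosen to keep the example short.

```python
import numpy as np

def train_som(data, rows, cols, num_iteration, sigma=1.0, learning_rate=0.5, seed=0):
    """Train a tiny SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise every neuron with a randomly chosen training sample
    picks = rng.integers(0, len(data), size=rows * cols)
    weights = data[picks].reshape(rows, cols, -1).astype(float)
    # Grid coordinates of every neuron, used by the neighborhood function
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(num_iteration):
        x = data[rng.integers(0, len(data))]
        # Competition: the best matching unit (BMU) is the neuron closest to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Assumed schedule: learning rate and radius shrink linearly over time
        decay = 1.0 - t / num_iteration
        lr = learning_rate * decay
        sig = max(sigma * decay, 1e-3)
        # Cooperation: Gaussian neighborhood centred on the BMU, measured on the grid
        grid_dist_sq = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist_sq / (2 * sig ** 2))
        # Adaptation: pull the BMU and its grid neighbors towards x
        weights += lr * h[..., None] * (x - weights)
    return weights

def quantization_error(data, weights):
    """Mean distance between each sample and its closest neuron."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

There is no error signal or backpropagation here: each step only moves the winning neuron and its grid neighbors towards the presented sample, which is what lets nearby neurons end up representing similar inputs.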

# Things we can do with Self Organizing Maps
* Visualizing high-dimensional data in a low-dimensional view, which is usually 2D - In this case each sample has 784 columns because the Fashion MNIST images are 28x28 pixels
* Clustering - According to [Wikipedia](https://en.wikipedia.org/wiki/Self-organizing_map): It has been shown that while self-organizing maps with a small number of nodes behave in a way that is similar to K-means, larger self-organizing maps rearrange data in a way that is fundamentally topological in character.
* Anomaly detection - We flag as anomalies the entities whose distance to their topological neighbors is significantly higher than the distances among those neighbors themselves
* Non-linear Dimensionality Reduction - For visualization, we convert high-dimensional data into low-dimensional data

# About the dataset: Fashion MNIST
## Context
Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." Zalando seeks to replace the original MNIST dataset.

## Content
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. The training and test data sets have 785 columns. The first column consists of the class labels (listed under Labels below) and represents the article of clothing. The rest of the columns contain the pixel-values of the associated image. To locate a pixel on the image, decompose its index x as x = i * 28 + j, where i and j are integers between 0 and 27. The pixel is located on row i and column j of a 28 x 28 matrix. For example, pixel31 (31 = 1 * 28 + 3) indicates the pixel that is in the fourth column from the left and the second row from the top.
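
The index decomposition can be checked with a small helper. The name `pixel_position` is made up for this illustration, and it assumes the zero-based index x described above; if your copy of the CSV numbers its pixel columns from 1, subtract 1 first.

```python
def pixel_position(x, width=28):
    """Return (row, column) of the flattened pixel index x, both zero-based."""
    # divmod gives the quotient i and remainder j of x = i * width + j
    row, col = divmod(x, width)
    return row, col

# pixel31: row 1 (second from the top), column 3 (fourth from the left)
print(pixel_position(31))
```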

## Labels
Each training and test example is assigned to one of the following labels:
* 0 T-shirt/top
* 1 Trouser
* 2 Pullover
* 3 Dress
* 4 Coat
* 5 Sandal
* 6 Shirt
* 7 Sneaker
* 8 Bag
* 9 Ankle boot 
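
When reading plots or label counts later on, it can help to turn the numeric labels back into names. A small lookup sketch follows; `LABEL_NAMES` and `label_name` are illustrative names and not part of the dataset or of any library.

```python
# Mapping taken from the label list above
LABEL_NAMES = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot",
}

def label_name(label):
    """Return the clothing name for a numeric label (int() also accepts NumPy integers)."""
    return LABEL_NAMES[int(label)]
```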


In [None]:
# Loading libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from minisom import MiniSom
import concurrent.futures
import time
from pylab import bone, pcolor, colorbar, plot, show, rcParams, savefig
import os
print(os.listdir("../input"))

In [None]:
# Loading training and test set
train = pd.read_csv('../input/fashion-mnist_train.csv')
test = pd.read_csv('../input/fashion-mnist_test.csv')
train.head()

In [None]:
# Combining training and test set to get 70,000 samples
new_train = train.drop(columns=['label'])
new_test = test.drop(columns=['label'])
som_data = pd.concat([new_train, new_test], ignore_index=True).values
labels = pd.concat([train['label'], test['label']], ignore_index=True).values

# Some sample images

In [None]:
f, ax = plt.subplots(1,5)
f.set_size_inches(80, 40)
for i in range(5):
    ax[i].imshow(som_data[i].reshape(28, 28))
plt.show()

In [None]:
# Initializing the map
start_time = time.time()
# The map will have x*y = 50*50 = 2500 neurons
som = MiniSom(x=50,y=50,input_len=som_data.shape[1],sigma=0.5,learning_rate=0.4)
# There are two ways to train the map
# train_batch: samples are presented in the order in which they are stored
# train_random: a random sample is picked at each iteration
# The following line initializes the weights with random samples from the data, as we are going to use train_random for training
som.random_weights_init(som_data)

In [None]:
# Training the map for 1000 iterations
som.train_random(data=som_data,num_iteration=1000)

Now, we will plot the map. First, we will manually define labels with their markers.
* 0 -> Light blue circle
* 1 -> Caramel square
* 2 -> Blue pentagon
* 3 -> Orange star
* 4 -> Tomato red triangle
* 5 -> Bright cyan tri_down
* 6 -> Electric indigo hexagon
* 7 -> Light orange x
* 8 -> Raspberry plus
* 9 -> Purple diamond

In [None]:
# Finally plotting the map
rcParams['figure.figsize'] = 25, 20
bone()
# Background: distance map of the weights (each neuron's distance to its neighbours, normalized)
pcolor(som.distance_map().T)
colorbar()
# One marker shape and colour per label, in label order 0..9
markers = ['o','s','p','*','^','1','h','x','+','d']
colors = ['#57B8FF','#B66D0D','#009FB7','#FBB13C','#FE6847','#4FB5A5','#670BE8','#F29F60','#8E1C4A','#85809B']
for i,x in enumerate(som_data):
    # Place the sample's marker at the centre of its winning neuron
    w = som.winner(x)
    plot(w[0]+0.5,w[1]+0.5,markers[labels[i]],markeredgecolor=colors[labels[i]],markerfacecolor='None',markersize=10,markeredgewidth=2)
savefig("map.png")
show()
end_time = time.time() - start_time

In [None]:
print(int(end_time),"seconds taken to complete the task.")

# How to interpret and evaluate a Self Organizing Map
* **Again, a Self Organizing Map creates a view that represents high-dimensional data as low-dimensional data, preserving the topological properties of the input space using a neighborhood function**
* The heatmap in the background, on which the clusters reside, represents the topological properties of the input space. The colorbar() on the right shows the normalized distance between a neuron (map node) and its neighbors. The distance goes from **0 (black) to 1 (white): the smaller the distance, the more similar the neuron is to its immediate neighbors**.
* If a neuron is white, i.e. its **distance is close to 1, then the samples mapped to it can be treated as anomalies**.
* The markers (colored shapes) represent the different labels and are clustered on the map on the basis of their topological properties.
* Our goal is to have distinct clusters, but that doesn't mean all the points of a cluster have to be close to each other, because this is non-linear dimensionality reduction and not K-means clustering, where points are located close to the centroids
* For better evaluation, we have to take care that any **given neuron should be occupied by only one label/marker. We should optimize the map for the same.**
* Overlap of multiple labels on a neuron means its uniqueness is compromised and there is scope for improvement.
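
To make the background heatmap less of a black box, here is a rough NumPy sketch of how such a distance map can be computed from a grid of neuron weights, and how neurons near 1 can be flagged. It is written in the spirit of `distance_map()` but is not MiniSom's code: the 8-neighbor rule, the scaling by the maximum, the 0.9 threshold and the names `u_matrix` and `anomalous_neurons` are all assumptions.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each neuron to its grid neighbors, scaled so the maximum is 1.

    `weights` has shape (rows, cols, n_features); the grid needs at least two neurons.
    """
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Skip the neuron itself and positions outside the grid
                    if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                        dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    peak = um.max()
    return um / peak if peak > 0 else um

def anomalous_neurons(um, threshold=0.9):
    """Grid positions whose normalized distance exceeds the (assumed) threshold."""
    return [(int(i), int(j)) for i, j in np.argwhere(um > threshold)]
```

A neuron whose weights sit far from all of its neighbors gets a value close to 1, which is why the white cells on the map are the natural place to look for unusual samples.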

# Analyzing the results
**MiniSom objects provide enough data to analyze our results and gain more insights**

In [None]:
start_time = time.time()
# Returns a matrix where element i,j is the number of times neuron i,j has been the winner.
act_res = som.activation_response(som_data)
# Returns a dictionary wm where wm[(i,j)] is a list with all the patterns that have been mapped in the position i,j.
winner_map = som.win_map(som_data)
# Returns a dictionary wm where wm[(i,j)] is a dictionary that contains the number of samples from a given label that have been mapped in position i,j.
labelmap = som.labels_map(som_data,labels)
end_time = time.time() - start_time
print(int(end_time),"seconds taken to extract data from results.")
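
The rule that a position should ideally be occupied by a single label can be turned into a number. The sketch below assumes a mapping shaped like *labelmap*, i.e. `{(i, j): Counter({label: count})}`; the helper names `neuron_purity` and `mixed_neurons` are made up for this example.

```python
from collections import Counter

def neuron_purity(labelmap):
    """Share of each position's samples that carry its most common label (1.0 = no overlap)."""
    purity = {}
    for position, counts in labelmap.items():
        total = sum(counts.values())
        purity[position] = max(counts.values()) / total
    return purity

def mixed_neurons(labelmap):
    """Positions on which more than one label has been mapped."""
    return [position for position, counts in labelmap.items() if len(counts) > 1]

# Toy input: one pure position and one shared by labels 2 and 6
toy_labelmap = {(0, 0): Counter({1: 4}), (0, 1): Counter({2: 3, 6: 1})}
print(neuron_purity(toy_labelmap))
print(mixed_neurons(toy_labelmap))
```

Averaging the purity values, or counting the mixed positions, gives a single score that could be compared across different map sizes or training settings.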

## Heatmap for performance of neurons
We will use *act_res* to generate a heatmap showing which neurons win more often than others.
**The colour given to a neuron represents the number of times it has been the winner: the lighter the shade, the more often it won.**

In [None]:
sns.heatmap(act_res)

## Distribution of outlier neurons

In [None]:
# Extracting outliers
q75, q25 = np.percentile(act_res.flatten(), [75 ,25])
iqr = q75 - q25
lower_fence = q25 - (1.5*iqr)
upper_fence = q75 + (1.5*iqr)
condition = (act_res < lower_fence) | (act_res > upper_fence)
outlier_neurons = np.extract(condition,act_res)

In [None]:
# Plotting the distribution of neurons and outliers
f, (ax1, ax2) = plt.subplots(1, 2, sharex='col', sharey='row', figsize=(15,5))
ax1.set(xlabel='Distribution of all neurons')
ax2.set(xlabel='Distribution of outliers')
sns.distplot(act_res.flatten(),ax=ax1)
sns.distplot(outlier_neurons,ax=ax2)
plt.close(2)
plt.close(3)

## Visualizing patterns 
*winner_map* maps each neuron to the input samples it has won. Visualized together, the samples on one neuron can look like an amalgamation of different kinds of clothes.
For example, a single neuron may collect images of shirts, coats and pullovers. Looking at these groups may help in clustering.

In [None]:
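# Show up to 10 samples mapped to one neuron (the second key in winner_map)
neuron = list(winner_map)[1]
samples = winner_map[neuron][:10]  # slicing avoids an IndexError when fewer than 10 samples exist
f, ax = plt.subplots(2,5)
f.set_size_inches(80, 40)
for k, sample in enumerate(samples):
    ax[k // 5][k % 5].imshow(sample.reshape(28, 28))
plt.show()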

**Check the comments for interpretation**

Future versions will include Bayesian Optimization (if it's possible) and more analysis of the map produced.

**Kindly upvote, comment, share and fork this kernel if you like my work. I am open to suggestions.**