# **Contents**
- [1. Introduction](#1.-Introduction)
    - [1.1. What is clustering ?](#1.-1.-What-is-clustering-?)
    - [1.2. What are time series ?](#1.-2.-What-are-time-series-?)
- [2. Analysis](#2.-Analysis)
    - [2.1. Let's check the data](#2.-1.-Let's-check-the-data)
    - [2.2. Preprocessing](#2.-2.-Preprocessing)
    - [2.3. Clustering](#2.-3.-Clustering)
        - [2.3.1. SOM](#2.-3.-1.-SOM)
           - [2.3.1.1. Results](#2.-3.-1.-1.-Results)
           - [2.3.1.2. Cluster Distribution](#2.-3.-1.-2.-Cluster-Distribution)
           - [2.3.1.3. Cluster Mapping](#2.-3.-1.-3.-Cluster-Mapping)
        - [2.3.2. K-Means](#2.-3.-2.-K-Means)
            - [2.3.2.1. Results](#2.-3.-2.-1.-Results)
            - [2.3.2.2. Cluster Distribution](#2.-3.-2.-2.-Cluster-Distribution)
            - [2.3.2.3. Cluster Mapping](#2.-3.-2.-3.-Cluster-Mapping)
            - [2.3.2.4. Curse of Dimensionality](#2.-3.-2.-4.-Curse-of-Dimensionality)
- [3. Libraries](#3.-Libraries)
- [4. References](#4.-References)
- [5. See Also](#5.-See-Also)

# 1. Introduction
## 1. 1. What is clustering ?

   Clustering is a type of unsupervised learning problem. The main idea is to find similarities between data points and group them so that data points in the same group (cluster) are more similar to each other than to those in other groups. It is one of the main tasks of exploratory data mining and is used in many fields such as bioinformatics, pattern recognition, image analysis, machine learning, etc.
    
![Clustering algorithms benchmark](https://scikit-learn.org/stable/_images/sphx_glr_plot_cluster_comparison_0011.png) 
<center>Source : scikit-learn Documentation: <a href="https://scikit-learn.org/stable/_images/sphx_glr_plot_cluster_comparison_0011.png">sphx_glr_plot_cluster_comparison_0011.png</a></center>
    
## 1. 2. What are time series ?
    
   Time series are sequences of measurements of some quantity, such as sales, temperature, or stock prices, usually taken at a fixed frequency. They are indexed in time order and are commonly used in weather forecasting, econometrics, earthquake prediction, signal processing, etc.
    
![Time series example](https://upload.wikimedia.org/wikipedia/commons/7/77/Random-data-plus-trend-r2.png)
<center>Source : Wiki Commons: <a href="https://upload.wikimedia.org/wikipedia/commons/7/77/Random-data-plus-trend-r2.png">Random-data-plus-trend-r2.png</a></center>
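To make the definition concrete, here is a minimal pandas sketch of a monthly series indexed in time order. The values are made up for illustration; the `"MS"` (month start) frequency is an assumption that mirrors the `1992-01-01`-style dates used later in this notebook.

```python
import pandas as pd

# Twelve monthly observations, indexed by the first day of each month ("MS")
index = pd.date_range(start="1992-01-01", periods=12, freq="MS")
sales = pd.Series(range(100, 112), index=index, name="value")

print(sales.index.is_monotonic_increasing)  # True: observations are in time order
print(sales.index.freqstr)                  # MS: one observation per month
print(sales.head(3))
```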

# 2. Analysis

In this notebook, we will be using [Retail and Retailers Sales Time Series Collection](https://www.kaggle.com/census/retail-and-retailers-sales-time-series-collection) that is provided by [US Census Bureau](https://www.kaggle.com/census).

## 2. 1. Let's check the data

First of all, let's read each CSV file from the input directory and store the series in a list.

In [1]:
!pip install minisom
!pip install tslearn



In [2]:
# Native libraries
import os
import math
# Essential Libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Preprocessing
from sklearn.preprocessing import MinMaxScaler
# Algorithms
from minisom import MiniSom
from tslearn.barycenters import dtw_barycenter_averaging
from tslearn.clustering import TimeSeriesKMeans
from sklearn.cluster import KMeans

from sklearn.decomposition import PCA

In [3]:
directory = '/kaggle/input/retail-and-retailers-sales-time-series-collection/'

mySeries = []
namesofMySeries = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        df = pd.read_csv(os.path.join(directory, filename))
        # Keep only the columns we will be working on
        df = df.loc[:, ["date", "value"]]
        # Use the date column as the index
        df.set_index("date", inplace=True)
        # Order the rows chronologically by the date index
        df.sort_index(inplace=True)
        mySeries.append(df)
        # File name without the ".csv" extension
        namesofMySeries.append(filename[:-4])


Let's check how many series we have.

In [None]:
print(len(mySeries))

So, for 23 series, let's create a 6 by 4 grid, which gives us 24 slots, and fill it with plots of our series.

In [None]:
fig, axs = plt.subplots(6,4,figsize=(25,25))
fig.suptitle('Series')
for i in range(6):
    for j in range(4):
        if i*4+j+1>len(mySeries): # pass the others that we can't fill
            continue
        axs[i, j].plot(mySeries[i*4+j].values)
        axs[i, j].set_title(namesofMySeries[i*4+j])
plt.show()

Some of the series look quite similar, such as ```MRTSSM44000USS``` and ```RETAILMSA```, or ```MRTSSM7221USN``` and ```MRTSSM44611USN```.

## 2. 2. Preprocessing

Before starting the analysis, let's check whether all of our series have the same length.

In [None]:
series_lengths = {len(series) for series in mySeries}
print(series_lengths)

As we guessed, the lengths are not uniform. So we should find which series contain missing data and fill the gaps. Otherwise, our indices will be shifted: the i-th index of series x (let's say the 10th of May) won't correspond to the same date as the i-th index of series y (which might be the 11th of May).

In [None]:
for ind, series in enumerate(mySeries):
    # Print the index, first date and last date of each series
    print("[" + str(ind) + "] " + series.index[0] + " " + series.index[-1])

As you can see, the 6th, 11th and 12th series do not start on the same date as the others. To solve this problem, we should find the longest series and extend the others to match it. In general, we would find the earliest and latest dates across all series and extend every series to cover that range. But in our case, nearly every series starts on 1992-01-01 and ends on 2019-09-01, so finding the longest series is enough.
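For the general case mentioned above, a sketch could look like the following. `align_to_common_range` is a hypothetical helper (not used elsewhere in this notebook) that assumes ISO date strings as the index and one observation on the first day of each month; it builds one shared date range from the earliest to the latest date and reindexes every series onto it, so missing dates become `NaN`.

```python
import pandas as pd

def align_to_common_range(series_list, freq="MS"):
    """Reindex every series onto one shared date range (hypothetical helper).

    Assumes each DataFrame is indexed by ISO date strings and that
    observations fall on the first day of each month (freq="MS").
    """
    parsed = []
    for df in series_list:
        df = df.copy()
        df.index = pd.to_datetime(df.index)
        parsed.append(df)
    start = min(df.index.min() for df in parsed)  # earliest date across all series
    end = max(df.index.max() for df in parsed)    # latest date across all series
    full_range = pd.date_range(start=start, end=end, freq=freq)
    # Dates that a series does not have become NaN rows after reindexing
    return [df.reindex(full_range) for df in parsed]

# Toy data: the second series starts one month later than the first
s1 = pd.DataFrame({"value": [1.0, 2.0, 3.0]},
                  index=["1992-01-01", "1992-02-01", "1992-03-01"])
s2 = pd.DataFrame({"value": [5.0, 6.0]},
                  index=["1992-02-01", "1992-03-01"])
aligned = align_to_common_range([s1, s2])
print([len(df) for df in aligned])          # [3, 3]
print(aligned[1]["value"].isna().tolist())  # [True, False, False]
```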

In [None]:
max_len = max(series_lengths)
longest_series = None
for series in mySeries:
    if len(series) == max_len:
        longest_series = series

In this code block, I reindex the series that are shorter than the longest one; reindexing fills the missing dates with ```np.nan```.

In [None]:
problems_index = []

for i in range(len(mySeries)):
    if len(mySeries[i])!= max_len:
        problems_index.append(i)
        mySeries[i] = mySeries[i].reindex(longest_series.index)

This function counts how many series contain NaN values.

In [None]:
def nan_counter(list_of_series):
    nan_polluted_series_counter = 0
    for series in list_of_series:
        if series.isnull().sum().sum() > 0:
            nan_polluted_series_counter+=1
    print(nan_polluted_series_counter)

We have 3 series that contain NaN values, and earlier we had 3 series that were shorter than the others, so the numbers check out.

In [None]:
nan_counter(mySeries)

Because each of these series lacks only one point, I used linear interpolation to fill the gap. For series with more missing values, you can use more sophisticated interpolation methods such as quadratic, cubic, spline, barycentric, etc.

In [None]:
for i in problems_index:
    mySeries[i].interpolate(limit_direction="both",inplace=True)
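Because the shorter series here are missing points at their start, it is worth checking what this call does at the edges. A toy example, assuming pandas' default `method="linear"`: an interior gap is truly interpolated, but a leading gap has no earlier neighbor, so with `limit_direction="both"` it is filled with the nearest valid value instead.

```python
import numpy as np
import pandas as pd

# A toy series with one missing value at the start and one in the middle
s = pd.Series([np.nan, 10.0, 20.0, np.nan, 40.0])

# Same call as in the notebook: linear interpolation, allowed in both directions
filled = s.interpolate(limit_direction="both")
print(filled.tolist())  # [10.0, 10.0, 20.0, 30.0, 40.0]
```

So for a series that is missing only its first month, the filled value is effectively a copy of the first observed value rather than an interpolated one.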

As we can see, all of our series now have the same length and contain no missing values.

In [None]:
nan_counter(mySeries)

After handling missing values, the next issue is the scale of the series. Without normalization, series with similar shapes can look very different from each other, which hurts the accuracy of the clustering. We can see the effect of normalization in the following plots.



In [None]:
a = [[2],[7],[11],[14],[19],[23],[26]]
b = [[20000000],[40000000],[60000000],[80000000],[100000000],[120000000],[140000000]]
fig, axs = plt.subplots(1,3,figsize=(25,5))
axs[0].plot(a)
axs[0].set_title("Series 1")
axs[1].plot(b)
axs[1].set_title("Series 2")
axs[2].plot(a)
axs[2].plot(b)
axs[2].set_title("Series 1 & 2")
plt.figure(figsize=(25,5))
plt.plot(MinMaxScaler().fit_transform(a))
plt.plot(MinMaxScaler().fit_transform(b))
plt.title("Normalized Series 1 & Series 2")
plt.show()

Note that we normalize each time series using its own minimum and maximum, not those of the other series.
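The per-series scaling can be written out by hand as (x - min(x)) / (max(x) - min(x)). The sketch below, using the two example series from the previous cell, checks that this formula agrees with `MinMaxScaler` on a single column and shows what would go wrong if both series shared one min and max. `min_max` is a hypothetical helper for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max(series):
    """Rescale one series to [0, 1] using only its own min and max."""
    series = np.asarray(series, dtype=float)
    return (series - series.min()) / (series.max() - series.min())

a = [2, 7, 11, 14, 19, 23, 26]
b = [20000000, 40000000, 60000000, 80000000, 100000000, 120000000, 140000000]

# The manual formula agrees with MinMaxScaler applied to a single column
manual_b = min_max(b)
sk_b = MinMaxScaler().fit_transform(np.asarray(b, dtype=float).reshape(-1, 1)).ravel()
print(np.allclose(manual_b, sk_b))         # True
print(min_max(a).min(), min_max(a).max())  # 0.0 1.0

# Scaling with the min/max of *both* series instead squashes `a` towards 0
both = np.concatenate([a, b]).astype(float)
a_joint = (np.asarray(a, dtype=float) - both.min()) / (both.max() - both.min())
print(a_joint.max() < 1e-6)                # True
```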

In [None]:
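for i in range(len(mySeries)):
    # Fit a separate scaler per series so each one is scaled by its own min/max
    scaler = MinMaxScaler()
    mySeries[i] = scaler.fit_transform(mySeries[i])
    # Flatten the (n_timesteps, 1) output into a 1-D array
    mySeries[i] = mySeries[i].reshape(len(mySeries[i]))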

The normalization looks right: each series should now lie between 0 and 1.

In [None]:
print("max: "+str(max(mySeries[0]))+"\tmin: "+str(min(mySeries[0])))
print(mySeries[0][:5])

## 2. 3. Clustering

I will be using 2 different methods to cluster these series: Self-Organizing Maps (SOM) and K-Means.

### 2. 3. 1. SOM
 
Self-organizing maps are a type of neural network that is trained using unsupervised learning to produce a low-dimensional representation of the input space of the training samples, called a map.

![SOM](https://raw.githubusercontent.com/izzettunc/Kohonen-SOM/master/data/screenshots/landing.png)
<center>Source : Github Repo: <a href="https://raw.githubusercontent.com/izzettunc/Kohonen-SOM/master/data/screenshots/landing.png">landing.png</a></center>
<br>    
Self-organizing maps also differ from other artificial neural networks in that they apply competitive (or cooperative) learning instead of error-correction learning (such as backpropagation with gradient descent), and in that they use a neighborhood function to preserve the topological properties of the input space.
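To illustrate competitive learning with a neighborhood function, here is a minimal NumPy sketch of a single textbook SOM update step. It is not MiniSom's implementation: `som_train_step` is a hypothetical helper, the Gaussian neighborhood is an assumption, and real training (including MiniSom's) also decays the learning rate and sigma over many iterations.

```python
import numpy as np

def som_train_step(weights, x, learning_rate, sigma):
    """One competitive-learning update of a rectangular SOM (illustrative sketch).

    weights: array of shape (rows, cols, n_features), one prototype per node
    x:       one input sample of shape (n_features,)
    """
    # 1) Competition: the best matching unit (BMU) is the node closest to x
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # 2) Cooperation: a Gaussian neighborhood centered on the BMU's grid position
    rows, cols = np.indices(dists.shape)
    grid_dist_sq = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-grid_dist_sq / (2 * sigma ** 2))

    # 3) Adaptation: pull every node towards x, weighted by its neighborhood value
    weights += learning_rate * h[..., np.newaxis] * (x - weights)
    return bmu

rng = np.random.default_rng(0)
weights = rng.random((3, 3, 4))  # 3x3 map, 4-dimensional inputs
x = np.array([1.0, 0.0, 1.0, 0.0])
before = np.linalg.norm(weights - x, axis=-1).min()
bmu = som_train_step(weights, x, learning_rate=0.5, sigma=1.0)
after = np.linalg.norm(weights[bmu] - x)
print(after < before)  # True: the winning node moved closer to the input
```

Nodes far from the winner on the grid get a small `h` and barely move, which is how neighboring nodes end up representing similar inputs.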

![Learning process of som](https://upload.wikimedia.org/wikipedia/commons/3/35/TrainSOM.gif)
<center>Source : Wiki Commons: <a href="https://upload.wikimedia.org/wikipedia/commons/3/35/TrainSOM.gif">TrainSOM.gif</a></center>
<br>
Because it produces a map, a SOM can be used as a dimensionality reduction method. In our case, however, if we treat each node of the SOM as the prototype (center) of a cluster, we can use it for clustering. To do so, we drop the time index of our series: instead of treating the measured values as observations at different dates, we treat each value as a separate feature (dimension) of a single data point.

For more info about SOMs, you can check [this medium post](https://medium.com/@abhinavr8/self-organizing-maps-ff5853a118d4).

In [None]:
a = [1,2]
b = [3,7]
c = [1,3]
d = [3,8]
img = plt.imread("/kaggle/input/notebook-material/arrow.png")
fig, axs = plt.subplots(1,3,figsize=(25,5))
axs[0].plot(a)
axs[0].plot(b)
axs[0].plot(c)
axs[0].plot(d)
axs[0].set_title("Time Series")
axs[1].imshow(img)
axs[1].axis("off")
axs[2].set_title("Data Points")
axs[2].scatter(a[0],a[1], s=300)
axs[2].scatter(b[0],b[1], s=300)
axs[2].scatter(c[0],c[1], s=300)
axs[2].scatter(d[0],d[1], s=300)
plt.show()
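In array terms, dropping the time index simply means stacking the series into a matrix with one row per series (data point) and one column per time step (feature). A small sketch with the toy series from the cell above:

```python
import numpy as np

# The four toy series from the plot above, each with 2 time steps
a, b, c, d = [1, 2], [3, 7], [1, 3], [3, 8]

# Stack them: one row per series (data point), one column per time step (feature)
X = np.vstack([a, b, c, d])
print(X.shape)  # (4, 2)

# Distances between rows are now ordinary distances between points
dist_ac = np.linalg.norm(X[0] - X[2])  # a vs c
dist_ab = np.linalg.norm(X[0] - X[1])  # a vs b
print(dist_ac < dist_ab)  # True
```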

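For the implementation of the SOM algorithm I used [MiniSom](https://github.com/JustGlowing/minisom) and set my parameters as follows:
- sigma: 0.3
- learning_rate: 0.1
- weights initialized from randomly selected input series (```random_weights_init```)
- 50,000 iterations
- Map size: each side is the square root of the square root of the number of series (rounded up), so the map has roughly √(number of series) nodes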

As a side note, I didn't optimize these parameters due to the simplicity of the dataset.
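The map-size rule in the code below works out as follows for this dataset, assuming the 23 series counted earlier:

```python
import math

n_series = 23  # number of series in this dataset, as counted above

# Side length used in the notebook: ceil(sqrt(sqrt(n))) = ceil(n ** 0.25)
side = math.ceil(math.sqrt(math.sqrt(n_series)))
print(side, side * side)  # 3 9 -> a 3x3 map with 9 nodes (candidate clusters)

# For comparison, sqrt(n) itself is about 4.8
print(round(math.sqrt(n_series), 2))  # 4.8
```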

In [None]:
som_x = som_y = math.ceil(math.sqrt(math.sqrt(len(mySeries))))
# The map should have roughly sqrt(number of series) nodes in total.
# To keep the map square, each side is the square root of that,
# i.e. ceil(number_of_series ** 0.25), used for both rows and columns.

som = MiniSom(som_x, som_y,len(mySeries[0]), sigma=0.3, learning_rate = 0.1)

som.random_weights_init(mySeries)
som.train(mySeries, 50000)


#### 2. 3. 1. 1. Results

After training, I plotted the results. For each cluster, I plotted every series in semi-transparent gray, and, to see the movement or shape of the cluster, I averaged the series in the cluster and plotted that average in red.

In [None]:
# Little handy function to plot series
def plot_som_series_averaged_center(som_x, som_y, win_map):
    fig, axs = plt.subplots(som_x,som_y,figsize=(25,25))
    fig.suptitle('Clusters')
    for x in range(som_x):
        for y in range(som_y):
            cluster = (x,y)
            if cluster in win_map.keys():
                for series in win_map[cluster]:
                    axs[cluster].plot(series,c="gray",alpha=0.5) 
                axs[cluster].plot(np.average(np.vstack(win_map[cluster]),axis=0),c="red")
            cluster_number = x*som_y+y+1
            axs[cluster].set_title(f"Cluster {cluster_number}")

    plt.show()

In [None]:
win_map = som.win_map(mySeries)
# Returns the mapping of the winner nodes and inputs

plot_som_series_averaged_center(som_x, som_y, win_map)

As you can see from the plot above, the SOM grouped the 23 different series into 8 clusters.

Another way to extract the movement/shape of a cluster, instead of averaging the series in the cluster point by point, is Dynamic Time Warping Barycenter Averaging ([DBA](https://github.com/fpetitjean/DBA)).

DBA is an averaging method built on Dynamic Time Warping, and it can be very useful for extracting the movement/shape of a cluster, as the following images show.

![Arithmetic Averaging](https://raw.githubusercontent.com/fpetitjean/DBA/master/images/arithmetic.png)
![DBA](https://raw.githubusercontent.com/fpetitjean/DBA/master/images/DBA.png)
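A tiny example of why point-by-point averaging struggles with time shifts: two series with the same single peak, one step apart, averaged the way `np.average(..., axis=0)` does in the plotting function. The toy series are made up for illustration.

```python
import numpy as np

# Two series with the same shape (a single peak) shifted by one time step
s1 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
s2 = np.array([0.0, 0.0, 0.0, 1.0, 0.0])

# Point-by-point (arithmetic) averaging, as np.average(..., axis=0) does
mean_series = np.average(np.vstack([s1, s2]), axis=0)
print(mean_series.tolist())  # [0.0, 0.0, 0.5, 0.5, 0.0]
print(mean_series.max())     # 0.5
```

The average has two half-height bumps instead of one full peak; DBA aligns the series with DTW before averaging, which is why it can preserve the shared shape.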


To do so, I used the ```dtw_barycenter_averaging``` function from the [tslearn](https://github.com/tslearn-team/tslearn) library in place of ```np.average```.

In [None]:
def plot_som_series_dba_center(som_x, som_y, win_map):
    fig, axs = plt.subplots(som_x,som_y,figsize=(25,25))
    fig.suptitle('Clusters')
    for x in range(som_x):
        for y in range(som_y):
            cluster = (x,y)
            if cluster in win_map.keys():
                for series in win_map[cluster]:
                    axs[cluster].plot(series,c="gray",alpha=0.5) 
                axs[cluster].plot(dtw_barycenter_averaging(np.vstack(win_map[cluster])),c="red") # I changed this part
            cluster_number = x*som_y+y+1
            axs[cluster].set_title(f"Cluster {cluster_number}")

    plt.show()

In [None]:
win_map = som.win_map(mySeries)

plot_som_series_dba_center(som_x, som_y, win_map)

There isn't much visible difference in this result, but I still recommend this method for this purpose. Note, however, that DBA is computationally expensive, so if you need speed, it might not be the right choice.

#### 2. 3. 1. 2. Cluster Distribution
We can see the distribution of the time series in clusters in the following chart.

In [None]:
cluster_c = []
cluster_n = []
for x in range(som_x):
    for y in range(som_y):
        cluster = (x,y)
        if cluster in win_map.keys():
            cluster_c.append(len(win_map[cluster]))
        else:
            cluster_c.append(0)
        cluster_number = x*som_y+y+1
        cluster_n.append(f"Cluster {cluster_number}")

plt.figure(figsize=(25,5))
plt.title("Cluster Distribution for SOM")
plt.bar(cluster_n,cluster_c)
plt.show()

#### 2. 3. 1. 3. Cluster Mapping

<p style="color:gray">(Thank you for this wonderful question <a href="https://www.kaggle.com/stephentseng">Stephen Tseng</a>)</p>
Well, we did cluster our series, but how do we know which series belongs to which cluster? Isn't that the whole purpose of clustering? <br><br>

As we can see in [these illustrations](#2.-3.-1.-SOM), each node (or, in some cases, a group of nodes) represents a cluster. Therefore, we can find out which series belongs to which cluster by checking the winner node of each series.

In [None]:
# Let's check first 5
for series in mySeries[:5]:
    print(som.winner(series))

To make this information easier to read, we can map each node to a number <br>

```e.g. for an n*m grid with rows 0..n-1 and columns 0..m-1: (0,0)=1, (0,1)=2, ..., (0,m-1)=m, (1,0)=m+1, (1,1)=m+2, ..., (n-1,m-1)=n*m, i.e. (x,y) = x*m+y+1 ```

and print the name of each series with its cluster number.

In [None]:
cluster_map = []
for idx in range(len(mySeries)):
    winner_node = som.winner(mySeries[idx])
    cluster_map.append((namesofMySeries[idx],f"Cluster {winner_node[0]*som_y+winner_node[1]+1}"))

pd.DataFrame(cluster_map,columns=["Series","Cluster"]).sort_values(by="Cluster").set_index("Series")
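The numbering rule used in this cell (`x*som_y + y + 1`) and its inverse can be written as two small helpers. Both function names are hypothetical and only for illustration; the example assumes the 3x3 map built above.

```python
def node_to_cluster_number(x, y, n_cols):
    """Row-major numbering of an n_rows x n_cols grid, starting at 1."""
    return x * n_cols + y + 1

def cluster_number_to_node(number, n_cols):
    """Inverse mapping: cluster number back to (row, column)."""
    return divmod(number - 1, n_cols)

# On the 3x3 map used above, the numbers run from 1 to 9
print(node_to_cluster_number(0, 0, 3), node_to_cluster_number(2, 2, 3))  # 1 9
print(cluster_number_to_node(5, 3))  # (1, 1)
```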

### 2. 3. 2. K-Means

K-means clustering is a method that aims to partition n inputs into k clusters, in which each data point belongs to the cluster with the nearest mean (cluster centroid). The resulting partition can be visualized as Voronoi cells, and k-means is one of the most popular and most basic clustering algorithms. For more info about k-means, you can check [this medium post](https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a).
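The assign-then-update loop behind k-means can be sketched in a few lines of NumPy. This is plain Lloyd's algorithm with Euclidean distance and random initial centroids, for intuition only; it is not how tslearn or scikit-learn implement it (they add smarter initialization, multiple restarts, and, in tslearn's case, other metrics such as DTW). `kmeans_lloyd` and the toy points are hypothetical.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance; a teaching sketch."""
    rng = np.random.default_rng(seed)
    # Use k distinct random points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```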

![Training process](https://i.imgur.com/k4XcapI.gif)

To cluster our series with k-means, we again drop the time index, as we did with the SOM: instead of treating the measured values as observations at different dates, we treat each value as a separate feature (dimension) of a single data point. Another important choice is the distance metric. K-means usually uses Euclidean distance, but, as the [DBA](https://github.com/fpetitjean/DBA) images showed, point-by-point comparison is sensitive to shifts in time and is not effective in our case. So we will use Dynamic Time Warping (DTW) instead of Euclidean distance; the following image shows why.

![Difference of dtw and euclidean distance](https://upload.wikimedia.org/wikipedia/commons/6/69/Euclidean_vs_DTW.jpg)
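The DTW distance itself is a short dynamic program. The sketch below is a straightforward, unconstrained version: it accumulates squared point-wise differences along the cheapest warping path and takes the square root at the end. Other formulations (absolute differences, window constraints) exist, and tslearn's implementation is optimized differently, so treat `dtw_distance` as a hypothetical teaching helper.

```python
import numpy as np

def dtw_distance(s1, s2):
    """Classic dynamic-programming DTW between two 1-D series (no window constraint)."""
    n, m = len(s1), len(s2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            # Extend the cheapest of: match (diagonal), deletion, insertion
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return np.sqrt(cost[n, m])

a = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # same peak, one step earlier
print(dtw_distance(a, b))                      # 0.0
print(round(float(np.linalg.norm(a - b)), 3))  # 1.414 (Euclidean)
```

DTW can warp the time axis so the two peaks line up, so the shifted copies are considered identical, while Euclidean distance penalizes the shift.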

In [None]:
cluster_count = math.ceil(math.sqrt(len(mySeries)))
# Heuristic: choose k as the square root of the number of series (5 for 23 series).
# This rule of thumb comes from choosing k in kNN; here it is borrowed for k-means.

km = TimeSeriesKMeans(n_clusters=cluster_count, metric="dtw")

labels = km.fit_predict(mySeries)

#### 2. 3. 2. 1. Results

After training, I plotted the results as I did with the SOM: for each cluster, every series in semi-transparent gray, and, to see the movement or shape of the cluster, the average of the cluster in red.

In [None]:
plot_count = math.ceil(math.sqrt(cluster_count))

fig, axs = plt.subplots(plot_count,plot_count,figsize=(25,25))
fig.suptitle('Clusters')
row_i=0
column_j=0
# For each label there is,
# plots every series with that label
for label in set(labels):
    cluster = []
    for i in range(len(labels)):
        if labels[i] == label:
            axs[row_i, column_j].plot(mySeries[i], c="gray", alpha=0.4)
            cluster.append(mySeries[i])
    if len(cluster) > 0:
        axs[row_i, column_j].plot(np.average(np.vstack(cluster), axis=0), c="red")
    # Title each subplot with its k-means label
    axs[row_i, column_j].set_title("Cluster " + str(label))
    column_j += 1
    if column_j % plot_count == 0:
        row_i += 1
        column_j = 0
        
plt.show()

As you can see from the plot above, k-means grouped the 23 different series into 5 clusters. 2 of the clusters contain only 1 time series each, which may be considered outliers.

As before, I used [DBA](https://github.com/fpetitjean/DBA) to get cluster centers that better follow time-shifted patterns.

In [None]:
plot_count = math.ceil(math.sqrt(cluster_count))

fig, axs = plt.subplots(plot_count,plot_count,figsize=(25,25))
fig.suptitle('Clusters')
row_i=0
column_j=0
for label in set(labels):
    cluster = []
    for i in range(len(labels)):
        if labels[i] == label:
            axs[row_i, column_j].plot(mySeries[i], c="gray", alpha=0.4)
            cluster.append(mySeries[i])
    if len(cluster) > 0:
        # DBA center instead of the point-by-point average
        axs[row_i, column_j].plot(dtw_barycenter_averaging(np.vstack(cluster)), c="red")
    # Title each subplot with its k-means label
    axs[row_i, column_j].set_title("Cluster " + str(label))
    column_j += 1
    if column_j % plot_count == 0:
        row_i += 1
        column_j = 0
        
plt.show()

#### 2. 3. 2. 2. Cluster Distribution

We can see the distribution of the time series across clusters in the following chart. K-means assigned 15 of the time series to cluster 1, which is quite skewed. The most likely reason is ```The Curse of Dimensionality``` <p><small><small>You can check it out from the links that I provided in section 5 (See Also)</small></small></p>


In [None]:
cluster_c = [len(labels[labels==i]) for i in range(cluster_count)]
cluster_n = ["Cluster "+str(i) for i in range(cluster_count)]
plt.figure(figsize=(15,5))
plt.title("Cluster Distribution for KMeans")
plt.bar(cluster_n,cluster_c)
plt.show()

#### 2. 3. 2. 3. Cluster Mapping

As before, in this part we will find which series belongs to which cluster. Since tslearn follows the scikit-learn API, ```fit_predict``` already gave us that information: the labels are in the same order as our series.


In [None]:
labels

In [None]:
fancy_names_for_labels = [f"Cluster {label}" for label in labels]
pd.DataFrame(zip(namesofMySeries,fancy_names_for_labels),columns=["Series","Cluster"]).sort_values(by="Cluster").set_index("Series")

#### 2. 3. 2. 4. Curse of Dimensionality

The Curse of Dimensionality is a term coined by Richard E. Bellman when considering problems in dynamic programming. Roughly speaking, as the dimensionality of the data increases, the data become sparse and distances between points become less informative: the nearest and farthest neighbors of a point end up at similar distances. This hurts distance-based algorithms such as k-means. To learn more about it, please check section [5. See Also](#5.-See-Also).
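The distance-concentration effect can be seen with random data. In this sketch (the helper and the uniform random points are assumptions for illustration), `relative_contrast` measures how much farther the farthest point is than the nearest one, as seen from a random query point. As the dimension grows toward values like our 333 time steps, this contrast shrinks, so near and far become harder to tell apart.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(farthest - nearest) / nearest distance from one query point to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the number of dimensions grows
for dim in (2, 10, 333, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```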

To mitigate this problem, dimensionality reduction techniques can help, such as PCA (the most prominent of them), t-SNE, UMAP, or even the map produced by a SOM itself.

In [None]:
pca = PCA(n_components=2)

mySeries_transformed = pca.fit_transform(mySeries)
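Under the hood, PCA centers the data and projects it onto the directions of largest variance, which can be computed with a singular value decomposition. A minimal sketch, using random data of the same shape as our series (23 series of 333 points) as a stand-in, and comparing against scikit-learn's `PCA` up to the sign of each component (the sign of a principal component is arbitrary). `pca_svd` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_svd(X, n_components=2):
    """Project rows of X onto their top principal components via SVD."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T  # coordinates along the top components

rng = np.random.default_rng(0)
X = rng.random((23, 333))  # stand-in for 23 series of length 333
Z = pca_svd(X)
print(Z.shape)  # (23, 2)

# Same result as scikit-learn's PCA, up to the sign of each component
Z_sklearn = PCA(n_components=2, svd_solver="full").fit_transform(X)
print(np.allclose(np.abs(Z), np.abs(Z_sklearn)))  # True
```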

Now, with fewer dimensions than before, we can see how our series are distributed in 2 dimensions.

In [None]:
plt.figure(figsize=(25,10))
plt.scatter(mySeries_transformed[:,0],mySeries_transformed[:,1], s=300)
plt.show()

PCA represents each 333-dimensional data point (one value per month) as a 2-dimensional data point. So, instead of a whole time series, we now have just 2 values for each series.

In [None]:
print(mySeries_transformed[0:5])

Since each series is now just a 2-D point rather than a sequence, we no longer need ```dtw```, and instead of ```TimeSeriesKMeans``` from tslearn, we can use the basic ```KMeans``` from ```sklearn```.

In [None]:
kmeans = KMeans(n_clusters=cluster_count,max_iter=5000)

labels = kmeans.fit_predict(mySeries_transformed)

And this is the result of the basic KMeans: pretty logical and straightforward.

In [None]:
plt.figure(figsize=(25,10))
plt.scatter(mySeries_transformed[:, 0], mySeries_transformed[:, 1], c=labels, s=300)
plt.show()

As before, ```fit_predict``` returns the labels in the same order as our series, so we can reuse the same code to visualize our clusters as series.

In [None]:
plot_count = math.ceil(math.sqrt(cluster_count))

fig, axs = plt.subplots(plot_count,plot_count,figsize=(25,25))
fig.suptitle('Clusters')
row_i=0
column_j=0
for label in set(labels):
    cluster = []
    for i in range(len(labels)):
        if labels[i] == label:
            axs[row_i, column_j].plot(mySeries[i], c="gray", alpha=0.4)
            cluster.append(mySeries[i])
    if len(cluster) > 0:
        axs[row_i, column_j].plot(np.average(np.vstack(cluster), axis=0), c="red")
    # Title each subplot with its k-means label
    axs[row_i, column_j].set_title("Cluster " + str(label))
    column_j += 1
    if column_j % plot_count == 0:
        row_i += 1
        column_j = 0
        
plt.show()

With ```PCA```, our series are now distributed much more evenly across the clusters than before.

In [None]:
cluster_c = [len(labels[labels==i]) for i in range(cluster_count)]
cluster_n = ["Cluster "+str(i) for i in range(cluster_count)]
plt.figure(figsize=(15,5))
plt.title("Cluster Distribution for KMeans")
plt.bar(cluster_n,cluster_c)
plt.show()

In [None]:
fancy_names_for_labels = [f"Cluster {label}" for label in labels]
pd.DataFrame(zip(namesofMySeries,fancy_names_for_labels),columns=["Series","Cluster"]).sort_values(by="Cluster").set_index("Series")

# 3. Libraries

Here you can find the libraries that I used in this notebook.
- [Pandas](https://github.com/pandas-dev/pandas)
- [NumPy](https://github.com/numpy/numpy)
- [scikit-learn](https://github.com/scikit-learn/scikit-learn)
- [MiniSom](https://github.com/JustGlowing/minisom)
- [tslearn](https://github.com/tslearn-team/tslearn)
- [matplotlib](https://matplotlib.org/)

# 4. References

- Petitjean F., Ketterlin A., Gançarski P., A global averaging method for dynamic time warping, with applications to clustering, Pattern Recognition, 44(3), 678-693, 2011
- Kohonen T., Self-organized formation of topologically correct feature maps, Biological Cybernetics, 43, 59–69, 1982
- Bellman R., Kalaba R., On adaptive control processes, in IRE Transactions on Automatic Control, 4(2), 1-9, 1959

# 5. See Also

* [K-means Clustering: Algorithm, Applications, Evaluation Methods, and Drawbacks](https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a)
* [Self Organizing Maps](https://medium.com/@abhinavr8/self-organizing-maps-ff5853a118d4)
* <p style="color:red"><a href="https://towardsdatascience.com/the-curse-of-dimensionality-50dc6e49aa1e">The Curse of Dimensionality</a> <b>***</b></p>
* <p style="color:red"><a href="https://towardsdatascience.com/k-nearest-neighbors-and-the-curse-of-dimensionality-e39d10a6105d">k-Nearest Neighbors and the Curse of Dimensionality</a> <b>***</b></p>
* <p style="color:red"><a href="https://medium.com/@aptrishu/understanding-principle-component-analysis-e32be0253ef0">Understanding Principal Component Analysis</a> <b>***</b></p>
* <p style="color:red"><a href="https://www.youtube.com/watch?v=FgakZw6K1QQ&ab_channel=StatQuestwithJoshStarmer">StatQuest: Principal Component Analysis (PCA), Step-by-Step</a> <b>*** (An awesome video by StatQuest)</b></p>

Hey, this is my first notebook and tutorial on Kaggle, so feel free to comment on any error that you see, any idea for improving this notebook, or any questions that you have.

Have a nice day!