# This file shows how to use the cluster, metrics and utils modules
**cluster** contains 
1. KMeans clustering class 
2. Agglomerative clustering class

**metrics** contains
1. silhouette_score function
2. jaccard_index function

**utils** contains
1. export function

For more details, see the README.md file or the modules themselves.

**Note:**
1. I have also shown that the output of my modules matches that of *scikit-learn*'s library functions.

2. The labels generated by my *AgglomerativeClustering* may differ from *sklearn*'s, since the implementations are different, but both put the same points in the same clusters. See the output of my *jaccard_index*, which finds which label maps to which. For more evidence, compare the outputs of my clustering algorithms with *scikit-learn*'s in the output folder: the two are exactly the same, since the exported files are an unlabeled representation of the clustering.

3. My *AgglomerativeClustering* algorithm runs for a predefined number of iterations only.

4. I use *Euclidean* distance throughout, and all calculations are done accordingly.

5. My *jaccard_index* function is not the same as *sklearn*'s *jaccard_score* or *jaccard_similarity_score* functions. *jaccard_index* finds the matching clusters even when they carry different labels, whereas *jaccard_score* and *jaccard_similarity_score* cannot do that.
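
To make the idea in notes 2 and 5 concrete, here is a minimal sketch of how clusters from two labelings could be paired by Jaccard similarity. It is not the code in the *metrics* module: the function name `match_clusters_by_jaccard` and the greedy pairing strategy are assumptions made for illustration. Only the `(score, label_a, label_b)` tuple layout is chosen to mirror how the results are printed later in this file.

```python
import numpy as np

def match_clusters_by_jaccard(labels_a, labels_b):
    """Greedily pair clusters of two labelings by Jaccard similarity.

    Hypothetical helper for illustration only.
    Returns a list of (score, label_a, label_b) tuples, best match first.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    candidates = []
    for a in np.unique(labels_a):
        in_a = labels_a == a
        for b in np.unique(labels_b):
            in_b = labels_b == b
            # Jaccard similarity = |A intersect B| / |A union B|
            intersection = np.sum(in_a & in_b)
            union = np.sum(in_a | in_b)
            candidates.append((intersection / union, int(a), int(b)))
    # Take the highest-scoring pairs first, using each label at most once.
    candidates.sort(key=lambda t: t[0], reverse=True)
    used_a, used_b, matches = set(), set(), []
    for score, a, b in candidates:
        if a not in used_a and b not in used_b:
            matches.append((float(score), a, b))
            used_a.add(a)
            used_b.add(b)
    return matches

# Same grouping, different label names.
first = [0, 0, 1, 1, 2, 2]
second = [5, 5, 7, 7, 9, 9]
for score, a, b in match_clusters_by_jaccard(first, second):
    print(a, b, score)
```

Because the pairing looks at which points share a cluster rather than at the label values, it is unaffected by how each implementation happens to number its clusters.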

## Loading dataset

In [18]:
from sklearn.datasets import load_iris
iris_data=load_iris() 
data = iris_data['data']
data[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

## My functions

### KMeans Clustering

In [19]:
from cluster import KMeans
my_km = KMeans(n_clusters=3,max_iter=100,random_state=0)
my_km.fit_predict(data)

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1])

In [20]:
my_km.cluster_centers_

array([[6.85384615, 3.07692308, 5.71538462, 2.05384615],
       [5.88360656, 2.74098361, 4.38852459, 1.43442623],
       [5.006     , 3.428     , 1.462     , 0.246     ]])

In [21]:
from utils import export
export('my_kmeans_output.txt',my_km.labels_,foldername='output')

Exported successfully into output/my_kmeans_output.txt


In [22]:
from metrics import silhouette_score
silhouette_score(data,my_km.labels_)

0.5511916046195923
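
For readers unfamiliar with the metric, the following is a minimal NumPy sketch of the mean silhouette coefficient using Euclidean distance, as stated in note 4. It follows the standard textbook definition and is not necessarily how `metrics.silhouette_score` is implemented; the name `silhouette_sketch` is hypothetical. It assumes at least two clusters and that not all points coincide.

```python
import numpy as np

def silhouette_sketch(X, labels):
    """Mean silhouette coefficient with Euclidean distance (illustrative only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton cluster keeps a score of 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the distance from a point to itself is 0, so divide by n_own - 1).
        a = dists[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == k].mean() for k in unique if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Two well-separated groups on a line give a score close to 1.
print(silhouette_sketch([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]))
```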

### Agglomerative clustering

In [23]:
from cluster import AgglomerativeClustering
my_ag=AgglomerativeClustering(n_clusters=3,linkage="complete")
my_ag.fit_predict(data)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 1,
       2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1,
       2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [24]:
from utils import export
export('my_agglomerative_output.txt',my_ag.labels_,foldername='output')

Exported successfully into output/my_agglomerative_output.txt


In [25]:
from metrics import silhouette_score
silhouette_score(data,my_ag.labels_)

0.5135953221192218

In [26]:
from metrics import jaccard_index
j = jaccard_index(my_km.labels_,my_ag.labels_)
print("KMeans_Labels\tAgglomerative_Labels\tJaccard_Scores")
for c in j:
    print(c[1],'\t\t\t',c[2],'\t\t\t',c[0]) 

KMeans_Labels	Agglomerative_Labels	Jaccard_Scores
2 			 0 			 1.0
0 			 1 			 0.5416666666666666
1 			 2 			 0.45901639344262296


## Scikit-learn's functions 

### KMeans clustering

In [27]:
from sklearn.cluster import KMeans
# Note: algorithm="full" was renamed to "lloyd" in scikit-learn 1.1 and removed in 1.3.
km = KMeans(n_clusters=3,n_init=1,max_iter=100,init='random',algorithm="full",random_state=0)
km.fit_predict(data)

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1], dtype=int32)

In [28]:
from utils import export
export('sklearn_kmeans_output.txt',km.labels_,foldername='output')

Exported successfully into output/sklearn_kmeans_output.txt


In [29]:
km.cluster_centers_

array([[6.85384615, 3.07692308, 5.71538462, 2.05384615],
       [5.88360656, 2.74098361, 4.38852459, 1.43442623],
       [5.006     , 3.428     , 1.462     , 0.246     ]])

In [30]:
from sklearn.metrics import silhouette_score
silhouette_score(data,km.labels_, metric='euclidean')

0.5511916046195916

### Agglomerative clustering

In [31]:
from sklearn.cluster import AgglomerativeClustering
ag=AgglomerativeClustering(n_clusters=3,compute_full_tree=False,linkage="complete")
ag.fit_predict(data)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2, 2, 2, 2, 0, 2, 0,
       2, 2, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 2, 0, 0, 0,
       2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
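
The label values here differ from those of my *AgglomerativeClustering* output, but the grouping looks the same. One way to check that two labelings describe the same partition regardless of label names is sketched below. The helper `same_partition` is hypothetical and not part of the modules in this repository. The sample values are the first 16 entries of the third row of each agglomerative labels array printed in this file.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the points identically,
    ignoring which integer names each cluster."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing seen and returns it afterwards,
        # so any later conflicting pairing is detected in either direction.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True

# Excerpts copied from the two agglomerative outputs in this file.
mine    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2]
sklearn = [1, 1, 1, 1, 1, 1, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2]
print(same_partition(mine, sklearn))
```

The same function could be applied to the full arrays, for example `same_partition(my_ag.labels_, ag.labels_)`, assuming both models are still in memory.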

In [32]:
from utils import export
export('sklearn_agglomerative_output.txt',ag.labels_,foldername='output')

Exported successfully into output/sklearn_agglomerative_output.txt


In [33]:
from sklearn.metrics import silhouette_score
silhouette_score(data,ag.labels_, metric='euclidean')

0.5135953221192208
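
The silhouette scores from my functions and from *scikit-learn* differ only in the last few digits, which is what floating-point rounding would produce when the same quantity is summed in a different order. A small sketch of a tolerance-based comparison, using the values printed in this file (the tolerance `1e-9` is an arbitrary choice for illustration):

```python
import math

# (my score, scikit-learn score), copied from the outputs in this file.
scores = {
    "kmeans": (0.5511916046195923, 0.5511916046195916),
    "agglomerative": (0.5135953221192218, 0.5135953221192208),
}

for name, (mine, theirs) in scores.items():
    # Compare with a relative tolerance instead of exact equality.
    print(name, math.isclose(mine, theirs, rel_tol=1e-9))
```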

In [34]:
from metrics import jaccard_index
j1 = jaccard_index(km.labels_,ag.labels_)
print("KMeans_Labels\tAgglomerative_Labels\tJaccard_Scores")
for c in j1:
    print(c[1],'\t\t\t',c[2],'\t\t\t',c[0]) 

KMeans_Labels	Agglomerative_Labels	Jaccard_Scores
2 			 1 			 1.0
0 			 0 			 0.5416666666666666
1 			 2 			 0.45901639344262296
