
# Unsupervised Learning Lab: Evaluation Metrics

## Introduction
In this lab, you will evaluate the customer segmentation models that you created yesterday. Because clustering has no ground-truth labels, you will use internal evaluation metrics, which score the clusters from the data alone, to assess their quality.

### Evaluation Metrics
The following metrics will be used to evaluate the clustering models:
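1. Silhouette Score (ranges from -1 to 1; higher is better)
2. Davies-Bouldin Index (0 or greater; lower is better)
3. Calinski-Harabasz Index (no fixed upper bound; higher is better)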
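
To get a feel for how the three metrics respond to cluster quality, the sketch below scores K-Means on two synthetic datasets: one with tight, well-separated blobs and one with heavily overlapping blobs. The data comes from `make_blobs`, not from the customer dataset, and `cluster_metrics` is a hypothetical helper name introduced only for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def cluster_metrics(X, labels):
    """Return the three internal metrics for one labelling (hypothetical helper)."""
    return {
        "silhouette": silhouette_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
    }


# Synthetic data, not the lab's customer data: the same three centers,
# once with tight blobs and once with heavily overlapping blobs.
centers = [[0, 0], [6, 6], [-6, 6]]
X_tight, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)
X_loose, _ = make_blobs(n_samples=300, centers=centers, cluster_std=4.0, random_state=0)

tight_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_tight)
loose_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_loose)

tight = cluster_metrics(X_tight, tight_labels)
loose = cluster_metrics(X_loose, loose_labels)

print("tight:", tight)
print("loose:", loose)
```

The tight blobs should score higher on Silhouette and Calinski-Harabasz and lower on Davies-Bouldin than the overlapping ones.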

## Exercise 1: Calculate Silhouette Score
1. Calculate the Silhouette Score for the K-Means clustering model.
2. Calculate the Silhouette Score for the Hierarchical clustering model.
3. Calculate the Silhouette Score for the DBSCAN clustering model.

## Exercise 2: Calculate Davies-Bouldin Index
1. Calculate the Davies-Bouldin Index for the K-Means clustering model.
2. Calculate the Davies-Bouldin Index for the Hierarchical clustering model.
3. Calculate the Davies-Bouldin Index for the DBSCAN clustering model.

## Exercise 3: Calculate Calinski-Harabasz Index
1. Calculate the Calinski-Harabasz Index for the K-Means clustering model.
2. Calculate the Calinski-Harabasz Index for the Hierarchical clustering model.
3. Calculate the Calinski-Harabasz Index for the DBSCAN clustering model.

### Instructions
1. Load the dataset and the clustering results from the previous lab.
2. Use the `sklearn.metrics` module to calculate the evaluation metrics.
3. Interpret the results and compare the performance of the different clustering models.
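
The three steps above can be sketched as one loop that scores every label column against the same feature matrix. This is a minimal illustration on synthetic stand-in data: the feature and label column names mirror the ones used later in this notebook, but the values come from `make_blobs`, and both models use 3 clusters purely for the demo.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

FEATURES = ["Age", "AnnualIncome", "SpendingScore"]

# Synthetic stand-in for the scaled customer data (assumption: three
# numeric feature columns, as in the notebook below).
X, _ = make_blobs(n_samples=100, n_features=3, centers=3, random_state=0)
demo = pd.DataFrame(X, columns=FEATURES)

# Step 1 stand-in: label columns that the previous lab would have produced.
demo["KMeans_Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(demo[FEATURES])
demo["Hierarchical_Cluster"] = AgglomerativeClustering(n_clusters=3).fit_predict(demo[FEATURES])

# Step 2: score each label column. Only the feature columns are passed as X,
# so previously stored cluster labels never leak into the distance computation.
rows = {}
for col in ["KMeans_Cluster", "Hierarchical_Cluster"]:
    rows[col] = {
        "silhouette": silhouette_score(demo[FEATURES], demo[col]),
        "davies_bouldin": davies_bouldin_score(demo[FEATURES], demo[col]),
        "calinski_harabasz": calinski_harabasz_score(demo[FEATURES], demo[col]),
    }

# Step 3: one row per model makes the comparison easy to read.
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary)
```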


In [15]:
import pandas as pd
import numpy as np

# `cluster` namespace (used below as cluster.KMeans, cluster.AgglomerativeClustering)
from sklearn import cluster
# Feature scaling
from sklearn.preprocessing import StandardScaler

# Clustering algorithms compared in this lab
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from scipy.cluster.hierarchy import dendrogram, linkage

In [16]:
# Evaluation metrics
from sklearn.metrics import silhouette_score
from sklearn.metrics import davies_bouldin_score
from sklearn.metrics import calinski_harabasz_score

In [17]:
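# Load the customer data (the path is specific to the author's machine)
df = pd.read_csv("C:\\Users\\ramya\\Downloads\\customer_data.csv")
df

# Summary statistics for each column
df.describe()

# CustomerID is an identifier, not a feature, so drop it before clustering
df = df.drop(columns=['CustomerID'])

# Count missing values per column
df.isnull().sum()

# Standardize each feature to zero mean and unit variance
df_scaled = StandardScaler().fit_transform(df)

# Wrap the scaled array back into a DataFrame with the original column names
df = pd.DataFrame(df_scaled, columns=df.columns)

df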

         Age  AnnualIncome  SpendingScore
0   0.853003      0.395549       0.331292
1   1.729608     -0.016514      -0.542232
2   0.178692     -0.292815       1.528345
3  -0.765343     -1.578610       1.269522
4   1.122728      0.441453       0.104823
..       ...           ...            ...
95 -0.091032      1.616391      -0.801054
96  1.257590      1.636381       0.169528
97  0.987866     -1.209187      -0.509880
98  0.178692      0.562844      -0.801054


# K-Means Clustering

In [64]:
kmeans = cluster.KMeans(n_clusters=3, random_state=0)
df["KMeans_Cluster"] = kmeans.fit_predict(df[['Age', 'AnnualIncome', 'SpendingScore']])



In [65]:
df["KMeans_Cluster"]

0     1
1     1
2     2
3     2
4     1
     ..
95    0
96    1
97    2
98    1
99    2
Name: KMeans_Cluster, Length: 100, dtype: int32

In [66]:
sil_score = silhouette_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["KMeans_Cluster"])
print(f"Silhouette Score: {sil_score}")

Silhouette Score: 0.27408167479057666


In [67]:
ch_score = calinski_harabasz_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["KMeans_Cluster"])
print(f"Calinski-Harabasz Index: {ch_score}")

Calinski-Harabasz Index: 39.44726630374726


In [68]:
db_score = davies_bouldin_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["KMeans_Cluster"])
print(f"Davies-Bouldin Index: {db_score}")

Davies-Bouldin Index: 1.2658255184837606
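
The scores above are for a single, fixed `n_clusters=3`. One common way to sanity-check that choice is to compute the Silhouette Score over a range of `k` values and see where it peaks. The sketch below does this on synthetic data with four well-separated groups; `silhouette_by_k` is a hypothetical helper name, and the range of `k` is an arbitrary choice for the demo.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values):
    """Fit K-Means for each k and return {k: silhouette score} (hypothetical helper)."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        results[k] = silhouette_score(X, labels)
    return results


# Synthetic data with four well-separated groups, not the lab's customer data.
X_demo, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

scores_by_k = silhouette_by_k(X_demo, range(2, 8))
best_k = max(scores_by_k, key=scores_by_k.get)

print(scores_by_k)
print("best k:", best_k)
```

On the customer data, the same call would look like `silhouette_by_k(df[['Age', 'AnnualIncome', 'SpendingScore']], range(2, 8))`, assuming `df` holds the scaled features as in this notebook.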


# Agglomerative Hierarchical Clustering

In [46]:
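# Ward linkage with 5 clusters (K-Means above used 3, so the scores are not a like-for-like comparison)
agglomerative = cluster.AgglomerativeClustering(n_clusters=5, linkage='ward')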
df["Hierarchical_Cluster"] = agglomerative.fit_predict(df[['Age', 'AnnualIncome', 'SpendingScore']])

In [47]:
df["Hierarchical_Cluster"]

0     2
1     3
2     0
3     0
4     2
     ..
95    1
96    2
97    3
98    3
99    0
Name: Hierarchical_Cluster, Length: 100, dtype: int64

In [48]:
sil_score = silhouette_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["Hierarchical_Cluster"])
print(f"Silhouette Score: {sil_score}")

Silhouette Score: 0.2777275442788798


In [49]:
ch_score = calinski_harabasz_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["Hierarchical_Cluster"])
print(f"Calinski-Harabasz Index: {ch_score}")

Calinski-Harabasz Index: 39.499283699636926


In [50]:
db_score = davies_bouldin_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df["Hierarchical_Cluster"])
print(f"Davies-Bouldin Index: {db_score}")

Davies-Bouldin Index: 0.950163972662341


# DBSCAN Clustering

In [94]:
dbscan = DBSCAN(eps=0.5, min_samples=4)
df['DBSCAN_Cluster'] = dbscan.fit_predict(df[['Age', 'AnnualIncome', 'SpendingScore']])

In [95]:
df['DBSCAN_Cluster']

0     0
1    -1
2     3
3    -1
4     0
     ..
95   -1
96   -1
97   -1
98   -1
99   -1
Name: DBSCAN_Cluster, Length: 100, dtype: int64
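
DBSCAN assigns the label `-1` to points it considers noise, and many rows above carry that label. The `sklearn.metrics` functions treat `-1` like any other label, so noise points are scored as if they formed one cluster. A possible alternative, sketched here on synthetic data rather than the customer data, is to report the noise fraction separately and score only the points that were assigned to a cluster. The variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: two tight blobs plus uniformly scattered points that
# DBSCAN is expected to flag mostly as noise.
rng = np.random.default_rng(0)
blobs, _ = make_blobs(n_samples=200, centers=[[0, 0], [8, 8]], cluster_std=0.4, random_state=0)
scatter = rng.uniform(low=-6, high=14, size=(40, 2))
X = np.vstack([blobs, scatter])

labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(X)

clustered = labels != -1                      # True for points assigned to a cluster
n_clusters = len(set(labels[clustered]))      # number of clusters, noise excluded
noise_fraction = 1 - clustered.mean()

# Score with the noise label counted as a cluster (what a plain call does).
score_with_noise = silhouette_score(X, labels)

# Score only the clustered points; the metric needs at least 2 clusters.
score_without_noise = None
if n_clusters >= 2:
    score_without_noise = silhouette_score(X[clustered], labels[clustered])

print(f"clusters: {n_clusters}, noise fraction: {noise_fraction:.2f}")
print(f"silhouette with noise: {score_with_noise:.3f}")
print(f"silhouette without noise: {score_without_noise}")
```

Excluding noise usually raises the score, so the noise fraction should be reported alongside it: a high score on a small clustered subset is not comparable to a score computed on all points.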

In [96]:
sil_score = silhouette_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df['DBSCAN_Cluster'])
print(f"Silhouette Score: {sil_score}")

Silhouette Score: -0.14914156460934522


In [97]:
ch_score = calinski_harabasz_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df['DBSCAN_Cluster'])
print(f"Calinski-Harabasz Index: {ch_score}")

Calinski-Harabasz Index: 4.243720066306933


In [98]:
db_score = davies_bouldin_score(df[['Age', 'AnnualIncome', 'SpendingScore']], df['DBSCAN_Cluster'])
print(f"Davies-Bouldin Index: {db_score}")

Davies-Bouldin Index: 1.5715532773947418
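
To support the comparison asked for in the instructions, the scores printed in the cells above can be collected into one table. The sketch below copies those printed values by hand and picks the best model per metric; the row labels are illustrative.

```python
import pandas as pd

# Scores copied from the cell outputs above.
scores = pd.DataFrame(
    {
        "Silhouette": [0.27408167479057666, 0.2777275442788798, -0.14914156460934522],
        "Calinski-Harabasz": [39.44726630374726, 39.499283699636926, 4.243720066306933],
        "Davies-Bouldin": [1.2658255184837606, 0.950163972662341, 1.5715532773947418],
    },
    index=["K-Means (k=3)", "Hierarchical (k=5)", "DBSCAN"],
)

# Higher is better for Silhouette and Calinski-Harabasz;
# lower is better for Davies-Bouldin.
best = pd.Series(
    {
        "Silhouette": scores["Silhouette"].idxmax(),
        "Calinski-Harabasz": scores["Calinski-Harabasz"].idxmax(),
        "Davies-Bouldin": scores["Davies-Bouldin"].idxmin(),
    }
)

print(scores.round(3))
print(best)
```

On these numbers the hierarchical model comes out ahead on all three metrics, though only narrowly over K-Means on Silhouette and Calinski-Harabasz. Two caveats apply: the two models use different cluster counts (3 versus 5), and the DBSCAN scores count the noise label `-1` as a cluster.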
