<a href="https://www.kaggle.com/code/iahhel/customer-clustering-using-bkmeans-agglomerative?scriptVersionId=130433424" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

**In this data science project, I applied four different clustering methods to a customer dataset to discover patterns and relationships among the customers. The clustering methods used were K-means, Breathing K-means, Agglomerative Clustering, and Hierarchical Clustering.**

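K-means is a well-known and widely used clustering algorithm that partitions the data into K clusters. Each observation is assigned to the cluster with the nearest mean (centroid), each centroid is then recomputed as the mean of its assigned observations, and the two steps repeat until the assignments stop changing. Breathing K-means is a variant that starts from a k-means++ solution and then alternately adds centroids where the error is high ("breathing in") and removes the least useful ones ("breathing out"). Its author reports that it usually finds lower-error solutions than scikit-learn's k-means++ with repeated restarts, often in less time.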
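The assign-then-update loop of K-means fits in a few lines of NumPy. This is a minimal sketch of Lloyd's algorithm under simple assumptions: random initial centroids and a fixed iteration cap. scikit-learn's `KMeans` adds k-means++ seeding, several restarts via `n_init`, and a much faster implementation. The helper name `lloyd_kmeans` and the toy data are illustrative and are not part of the original notebook.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm (illustrative, not scikit-learn's implementation)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct observations picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    # Final assignment against the final centroids, plus the within-cluster SSE
    # (the quantity scikit-learn reports as inertia_).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, centroids, d2.min(axis=1).sum()


# Two well-separated toy blobs (made-up data, not the Mall Customers dataset).
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
labels, centroids, sse = lloyd_kmeans(X, k=2)
print(labels, round(sse, 4))
```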



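Agglomerative clustering is a bottom-up hierarchical method. It starts with every observation in its own cluster and repeatedly merges the two closest clusters. "Closest" is defined by a linkage criterion, such as average linkage (the mean pairwise distance) or complete linkage (the largest pairwise distance). Run to completion, the merging ends with a single cluster, and the full sequence of merges forms a tree that can be drawn as a dendrogram. scikit-learn's `AgglomerativeClustering` instead stops once `n_clusters` clusters remain. In this notebook, hierarchical clustering refers to building the full merge tree with SciPy and inspecting it as a dendrogram.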
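A naive version of the merge loop might look like the sketch below. It uses average linkage, matching the `AgglomerativeClustering(linkage='average')` call later in the notebook. The helper `agglomerative_average` is hypothetical and written for readability: it recomputes every cluster-pair distance on each step, so it is only suitable for tiny inputs. Like scikit-learn's estimator, it stops at `n_clusters`.

```python
import numpy as np


def agglomerative_average(X, n_clusters):
    """Naive bottom-up clustering with average linkage (very slow, for illustration)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all observations.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Every observation starts as its own cluster.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
print(agglomerative_average(X, n_clusters=2))  # → [0 0 0 1 1 1]
```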



 By applying these four clustering methods, I was able to identify groups of customers with similar characteristics and behaviors, which can be used for targeted marketing, customer segmentation, and other business insights. Overall, this project demonstrates the power of clustering techniques in discovering hidden patterns in customer data.


**Importing the packages and loading the data I'm going to use.**

In [None]:
!pip install bkmeans

In [None]:
import os
# Set before numpy/scikit-learn are imported so the OpenMP runtime sees it when it
# initialises (avoids scikit-learn's KMeans memory-leak warning on Windows with MKL).
os.environ["OMP_NUM_THREADS"] = '1'

import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from bkmeans import BKMeans
from sklearn.cluster import KMeans, AgglomerativeClustering
from scipy.spatial import distance_matrix
from scipy.cluster import hierarchy

warnings.filterwarnings('ignore')
%config InlineBackend.figure_format = 'svg'

In [None]:
df = pd.read_csv('/kaggle/input/123qweasd/Mall_Customers.csv')
print(df.shape)
print(df.isna().sum())
df.head()

In [None]:
df.describe(include='all')

In [None]:
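# Both features share the x-axis, so label each series and add a legend.
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c='g', label='Annual Income (k$)')
plt.scatter(df['Age'], df['Spending Score (1-100)'], c='r', label='Age')
plt.ylabel('Spending Score (1-100)')
plt.xlabel('Age (years) / Annual Income (k$)')
plt.legend()
plt.show()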

In [None]:
df['Gender'] = df['Gender'].map({'Male':1,'Female':0}) # encoding Gender column
x = df.drop('CustomerID',axis=1)

In [None]:
km = KMeans(n_clusters=4, init='k-means++', n_init=13)  # K-means with 4 clusters; keeps the best of 13 k-means++ initialisations
km.fit(x)
label = km.predict(x)
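The `init='k-means++'` option chooses starting centroids that are spread out across the data. The sketch below shows the plain k-means++ rule, assuming Euclidean data in a NumPy array. scikit-learn actually uses a greedy variant that samples several candidates per step and keeps the best one, so this is not its exact code. `kmeans_pp_init` is an illustrative name.

```python
import numpy as np


def kmeans_pp_init(X, k, seed=0):
    """Plain k-means++ seeding: after a uniformly random first centre, each new
    centre is drawn with probability proportional to D(x)^2, the squared distance
    from x to its nearest already-chosen centre."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centers)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Points already chosen have D(x)^2 = 0, so they cannot be picked again.
        # (Assumes the points are not all identical, otherwise d2.sum() is 0.)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
print(kmeans_pp_init(X, k=2))
```

Sampling in proportion to squared distance makes far-away points much more likely to become the next centre. This is why k-means++ rarely starts two centroids inside the same tight group.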

In [None]:
plt.scatter(df['Annual Income (k$)'],df['Spending Score (1-100)'],c=label,cmap='viridis')
plt.scatter(df['Age'],df['Spending Score (1-100)'],c=label,cmap='viridis')
plt.ylabel('Spending Score')
plt.xlabel('Age/Income')
plt.show()

In [None]:
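# Elbow method: plot the within-cluster sum of squares (WCSS, i.e. inertia) for k = 1..9
# and look for the k after which adding clusters stops reducing it sharply.
wcss = []
for i in range(1, 10):
    kml = KMeans(n_clusters=i, n_init=10)
    kml.fit(x)
    wcss.append(kml.inertia_)
plt.plot(range(1, 10), wcss, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS (inertia)')
plt.show()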
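Reading the elbow off the plot is a judgement call. One simple way to make it repeatable is to pick the k whose point lies farthest from the straight line joining the first and last points of the WCSS curve. This is only a heuristic. `elbow_k` and the WCSS values in the example are made up for illustration. On the notebook's data, it would be called as `elbow_k(range(1, 10), wcss)`.

```python
import numpy as np


def elbow_k(ks, wcss):
    """Pick the k whose (k, WCSS) point is farthest from the straight line joining
    the first and last points of the curve (a simple knee heuristic).
    Assumes at least two distinct k values and a non-constant WCSS curve."""
    ks = np.asarray(list(ks), dtype=float)
    w = np.asarray(wcss, dtype=float)
    # Rescale both axes to [0, 1] so WCSS units do not dominate the distance.
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (w - w.min()) / (w.max() - w.min())
    # Perpendicular distance of every point to the line through the two end points.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])


# Made-up, illustrative WCSS curve (not computed from the Mall Customers data).
example_wcss = [1000, 400, 200, 150, 120, 100, 90, 82, 76]
print(elbow_k(range(1, 10), example_wcss))  # → 3
```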

In [None]:
bkm = BKMeans(n_clusters=6, n_init=13)  # Breathing K-means with 6 clusters; its author reports lower-error solutions than k-means++
bkm.fit(x)
label2 = bkm.predict(x)
plt.scatter(df['Annual Income (k$)'],df['Spending Score (1-100)'],c=label2,cmap='viridis')
plt.scatter(df['Age'],df['Spending Score (1-100)'],c=label2,cmap='viridis')
plt.ylabel('Spending Score')
plt.xlabel('Age/Income')
plt.show()
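K-means and Breathing K-means minimise the same objective: the within-cluster sum of squared errors (SSE). Computing SSE for each labelling is therefore one way to check the "better solutions" claim on this dataset. Below is a minimal NumPy sketch. The helper name `within_cluster_sse` is illustrative.

```python
import numpy as np


def within_cluster_sse(X, labels):
    """Sum of squared Euclidean distances from each point to the mean of its
    cluster: the objective that K-means and Breathing K-means both minimise."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
print(round(within_cluster_sse(X, [0, 0, 0, 1, 1, 1]), 4))  # → 2.0133
```

For a converged fit, this should match the model's `inertia_`. In the notebook, it could be called as `within_cluster_sse(x.to_numpy(), label2)`. The best achievable SSE can only go down as k grows, so compare it with a 6-cluster `KMeans` fit rather than the 4-cluster model above.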

**Applying agglomerative clustering to the data.**

In [None]:
agglom = AgglomerativeClustering(linkage = 'average', n_clusters=6, metric='euclidean')
agglopred = agglom.fit_predict(x)
plt.scatter(df['Annual Income (k$)'],df['Spending Score (1-100)'],c=agglopred,cmap='viridis')
plt.scatter(df['Age'],df['Spending Score (1-100)'],c=agglopred,cmap='viridis')
plt.ylabel('Spending Score')
plt.xlabel('Age/Income')
plt.show()

In [None]:
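# Min-max normalise every column to [0, 1], then compute the pairwise Euclidean distances for the dendrogram.
x_min, x_max = np.min(x, axis=0), np.max(x, axis=0)
X1 = (x - x_min) / (x_max - x_min)

dist_matrix = distance_matrix(X1, X1)  # square n x n distance matrix

In [None]:
from scipy.spatial.distance import squareform

# hierarchy.linkage expects a condensed (1-D) distance vector; a square matrix would be
# treated as n observations with n features, so convert it first.
Z = hierarchy.linkage(squareform(dist_matrix, checks=False), 'complete')
dendro = hierarchy.dendrogram(Z)
plt.ylabel('Complete-linkage distance')
plt.show()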
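SciPy's `hierarchy.linkage` accepts either a condensed distance vector or an m-by-n array of observations. A condensed vector is the upper triangle of the distance matrix, flattened row by row. A square n-by-n distance matrix is treated as n observations with n features. SciPy only emits a warning in that case, which this notebook silences with `warnings.filterwarnings('ignore')`. The NumPy sketch below shows min-max scaling and the condensed layout that `scipy.spatial.distance.pdist` and `squareform` produce. The helper names are illustrative.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every column to [0, 1] (assumes no column is constant)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def condensed_euclidean(X):
    """Pairwise Euclidean distances in SciPy's condensed order:
    d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ..., d(n-2,n-1)."""
    X = np.asarray(X, dtype=float)
    square = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Entries strictly above the diagonal, read row by row.
    i, j = np.triu_indices(len(X), k=1)
    return square[i, j]


pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(min_max_scale(pts))        # rows [0, 0], [0.5, 0.5], [1, 1]
print(condensed_euclidean(pts))  # distances d(0,1)=5, d(0,2)=10, d(1,2)=5
```

`linkage` defaults to the Euclidean metric. Passing the scaled observations directly, as in `hierarchy.linkage(X1, 'complete')`, should therefore produce the same tree as passing `condensed_euclidean(X1)`.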