Team 8: Adham Ahmed (13004821), Aly Labib (13005792), Omar Ayman (13002702)

# Importing Libraries

In [1]:
import pandas as pd

import matplotlib.pyplot as plt


from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering, Birch
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Load Dataset

In [2]:
df = pd.read_csv("C:\\Users\\omarg\\Desktop\\DATA\\Mall_Customers.csv")

# Explore the Dataset (head, info, missing values, summary statistics)

In [3]:
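# Display first few rows (print is needed: only the cell's last expression is rendered)
print(df.head())

# Basic info about the dataset (info() prints its own summary)
df.info()

# Check for missing values
print(df.isnull().sum())

# Summary statistics (last expression in the cell, so the notebook renders it)
df.describe()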

# Selecting features for clustering

In [4]:
features = df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']]

# Standardizing the features

In [5]:
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
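For reference, `StandardScaler` rescales each column to zero mean and unit variance with z = (x - mean) / std. Here std is the population standard deviation (ddof=0). The minimal check below uses a small made-up array, not rows from Mall_Customers.csv. It compares a manual z-score against the scaler's output.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows standing in for Age, Annual Income (k$), Spending Score (1-100)
X_toy = np.array([[19.0, 15.0, 39.0],
                  [21.0, 15.0, 81.0],
                  [20.0, 16.0, 6.0],
                  [23.0, 16.0, 77.0]])

# Manual z-score: subtract each column's mean, divide by its population std (ddof=0)
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# The same transformation through scikit-learn
scaled_toy = StandardScaler().fit_transform(X_toy)

print(np.allclose(manual, scaled_toy))
print(scaled_toy.mean(axis=0).round(6), scaled_toy.std(axis=0).round(6))
```

Standardizing matters here because K-Means and the distance-based metrics would otherwise be dominated by whichever feature has the largest numeric range.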

# Convert to DataFrame for better readability

In [6]:
scaled_df = pd.DataFrame(scaled_features, columns=features.columns)
scaled_df.head()

# Apply PCA to reduce dimensions to 2D

In [7]:
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_df)

# Convert to DataFrame
pca_df = pd.DataFrame(pca_data, columns=['PCA1', 'PCA2'])

# Visualize the data in 2D space
plt.figure(figsize=(8, 6))
plt.scatter(pca_df['PCA1'], pca_df['PCA2'])
plt.title("PCA - 2D Visualization of Customers")
plt.xlabel("PCA1")
plt.ylabel("PCA2")
plt.show()

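Projecting three standardized features onto two principal components discards some variance. The 2D scatter is therefore only an approximation of the full feature space. PCA's `explained_variance_ratio_` attribute quantifies how much is kept.

The sketch below runs on randomly generated stand-in data, which is an assumption. On the notebook's own data, `pca.explained_variance_ratio_` from the PCA cell gives the actual figures. The name `demo_pca` is used so the notebook's `pca` object is not overwritten.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the three customer features: 200 rows, 3 columns
X_demo = rng.normal(size=(200, 3))
X_demo[:, 2] += 0.8 * X_demo[:, 0]  # add correlation so one direction dominates

X_demo_scaled = StandardScaler().fit_transform(X_demo)
demo_pca = PCA(n_components=2).fit(X_demo_scaled)

# Fraction of the total variance captured by each retained component
ratios = demo_pca.explained_variance_ratio_
print("Variance explained per component:", ratios.round(3))
print("Total variance kept in 2D:", round(float(ratios.sum()), 3))
```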
# Elbow Method: choose the number of clusters and fit K-Means

In [8]:
inertia = []
k_range = range(1, 11)

for k in k_range:
    km = KMeans(n_clusters=k, random_state=0)
    km.fit(scaled_df)
    inertia.append(km.inertia_)
plt.figure(figsize=(8, 5))
plt.plot(k_range, inertia, marker='o')
plt.title("Elbow Method")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.xticks(k_range)
plt.grid(True)
plt.show()
# k = 5 is chosen by reading the elbow curve above
kmeans = KMeans(n_clusters=5, random_state=0)
kmeans_labels = kmeans.fit_predict(scaled_df)
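The elbow in an inertia curve is often ambiguous, and the cell above picks k = 5 by eye. One common cross-check, which is a swapped-in technique and not the method used in this notebook, is to compute the silhouette score for each candidate k and prefer the highest. The sketch uses synthetic `make_blobs` data with four centers as an assumption. It starts at k = 2 because the silhouette score is undefined for a single cluster.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 generated groups (assumption for illustration)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

sil_by_k = {}
for k in range(2, 9):  # silhouette needs at least 2 clusters
    demo_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)
    sil_by_k[k] = silhouette_score(X_demo, demo_labels)

# Pick the k with the largest mean silhouette
best_k = max(sil_by_k, key=sil_by_k.get)
for k, s in sil_by_k.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Best k by silhouette:", best_k)
```

On the notebook's data, the same loop over `scaled_df` would show whether k = 5 also scores well by this criterion.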

# Fit each clustering model and visualize its clusters in PCA space

In [9]:
plt.scatter(pca_df['PCA1'], pca_df['PCA2'], c=kmeans_labels, cmap='rainbow')
plt.title("K-Means Clustering")
plt.show()

agg = AgglomerativeClustering(n_clusters=5)
agg_labels = agg.fit_predict(scaled_df)

plt.scatter(pca_df['PCA1'], pca_df['PCA2'], c=agg_labels, cmap='rainbow')
plt.title("Agglomerative Clustering")
plt.show()

gmm = GaussianMixture(n_components=5, random_state=0)
gmm_labels = gmm.fit_predict(scaled_df)

plt.scatter(pca_df['PCA1'], pca_df['PCA2'], c=gmm_labels, cmap='rainbow')
plt.title("Gaussian Mixture Model Clustering")
plt.show()

birch = Birch(n_clusters=5)
birch_labels = birch.fit_predict(scaled_df)

plt.scatter(pca_df['PCA1'], pca_df['PCA2'], c=birch_labels, cmap='rainbow')
plt.title("BIRCH Clustering")
plt.show()

*How each algorithm clusters its data:*


*K-Means:* K-Means first fixes the number of clusters (k) and places k initial centroids. By default, scikit-learn seeds them with k-means++ rather than purely at random. Each data point is assigned to its nearest centroid by Euclidean distance, and each centroid is then moved to the mean of the points assigned to it. These two steps repeat until the centroids no longer change significantly, so every point ends up in exactly one cluster. Because it minimizes within-cluster squared distances, K-Means works best for compact, roughly spherical groups of similar size.
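The assign-and-update loop described above can be sketched in NumPy. The `lloyd_kmeans` helper below is hypothetical and written for illustration. It uses plain random-point initialization and a single run. scikit-learn's `KMeans`, by contrast, uses k-means++ seeding and can keep the best of several runs (`n_init`).

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical helper: K-Means via Lloyd's algorithm (single run)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign: distance from every point to every centroid, keep the nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        # Stop once the centroids no longer move significantly
        if shift < tol:
            break
    # Final assignment against the converged centroids
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Toy data (assumption): two well-separated groups of three points
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
toy_labels, toy_centroids = lloyd_kmeans(X_toy, k=2)
print(toy_labels)
print(toy_centroids.round(2))
```

The label numbers themselves are arbitrary. Only the grouping matters, which is also why the cluster colours differ between the plots above.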
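*Agglomerative Clustering:* Agglomerative clustering is the bottom-up form of hierarchical clustering. It starts by treating each data point as its own cluster, then repeatedly merges the two closest clusters until only the desired number of clusters remains.

What "closest" means depends on the linkage criterion. scikit-learn's default, Ward linkage, merges the pair whose union least increases the within-cluster variance. Single, complete, and average linkage instead use point-to-point distances. The merge history forms a tree-like structure (a dendrogram) that shows how the clusters were combined.

The method makes no single shape assumption, but its behaviour follows the linkage. Ward favours compact clusters much like K-Means, whereas single linkage can follow elongated or irregular shapes.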
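The merge process can be inspected with SciPy's hierarchical clustering tools, which are separate from the scikit-learn class used in this notebook. In the sketch below, the toy points are an assumption. `linkage` with `method="ward"` (matching scikit-learn's default linkage) returns one row per merge. `fcluster` then cuts the resulting tree into a fixed number of flat clusters, and `scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (assumption): two tight pairs and one distant point
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])

# Each row of Z is one merge: [cluster_a, cluster_b, merge_distance, new_cluster_size]
Z = linkage(X_toy, method="ward")
print(Z.round(3))

# Cut the tree so that at most 3 flat clusters remain
flat = fcluster(Z, t=3, criterion="maxclust")
print(flat)
```

The first two rows of `Z` merge the two tight pairs at distance 1.0. Cutting at three clusters leaves those pairs plus the isolated point.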
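*Gaussian Mixture Model (GMM):* GMM assumes that the data comes from a mixture of several Gaussian (bell-curve) distributions. Each component has its own mean, covariance (shape and orientation), and mixing weight. Instead of a hard assignment, GMM gives each point a probability of belonging to each component, and `fit_predict` then reports the most probable one.

The parameters are estimated with the Expectation-Maximization (EM) algorithm. EM alternates between computing these membership probabilities (E-step) and re-estimating the means, covariances, and weights (M-step) until the likelihood stops improving. This makes GMM useful when clusters overlap or are elliptical rather than spherical.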
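The difference between soft and hard assignments can be seen with `predict_proba` and `predict` on a fitted `GaussianMixture`. The sketch uses two overlapping one-dimensional groups as hypothetical data. Points lying between the two means receive intermediate probabilities, which is the overlap case described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: two overlapping 1-D Gaussian groups, shaped (n_samples, 1)
X_demo = np.concatenate([rng.normal(0.0, 1.0, 200),
                         rng.normal(3.0, 1.0, 200)]).reshape(-1, 1)

demo_gmm = GaussianMixture(n_components=2, random_state=0).fit(X_demo)

# Soft assignment: one membership probability per component; each row sums to 1
probs = demo_gmm.predict_proba(X_demo)
# Hard assignment: the most probable component for each point
hard = demo_gmm.predict(X_demo)

print(probs[:3].round(3))
print("Points with no component above 0.9:", int((probs.max(axis=1) < 0.9).sum()))
print("Component means:", demo_gmm.means_.ravel().round(2))
```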
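*BIRCH Clustering:* BIRCH is designed for very large datasets. In a single scan of the data, it builds a tree called a CF (Clustering Feature) tree. The tree summarizes the points into small, compact subclusters whose radius is bounded by a `threshold` parameter.

A global clustering step then groups these subclusters into the final clusters. In scikit-learn, passing an integer `n_clusters` applies agglomerative clustering to the subcluster centroids.

BIRCH is fast and memory-efficient, but it may perform worse when clusters have complex shapes or lie very close together. Its results also depend on the `threshold` setting.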
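The two-stage structure can be observed through scikit-learn's `Birch` attributes. `subcluster_centers_` holds the CF-tree leaf centroids, and `labels_` holds the final cluster of each point after the global step. The blob data below is an assumption. `threshold=0.5` is written out explicitly even though it is also scikit-learn's default.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Hypothetical data: 3 generated groups
X_demo, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.5, random_state=1)

# threshold bounds each CF subcluster's radius; n_clusters triggers the global step
demo_birch = Birch(threshold=0.5, n_clusters=3).fit(X_demo)

n_sub = len(demo_birch.subcluster_centers_)
n_final = len(set(demo_birch.labels_.tolist()))
print("CF subclusters (leaf entries):", n_sub)
print("Final clusters after the global step:", n_final)
```

Lowering `threshold` produces more, smaller subclusters. This tends to cost memory and time but can preserve finer structure for the global step.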

# Evaluation Function

In [10]:
def evaluate_model(name, labels):
    silhouette = silhouette_score(scaled_df, labels)
    db_index = davies_bouldin_score(scaled_df, labels)
    print(f"{name}: Silhouette Score = {silhouette:.4f}, Davies-Bouldin Index = {db_index:.4f}")

evaluate_model("K-Means", kmeans_labels)
evaluate_model("Agglomerative", agg_labels)
evaluate_model("GMM", gmm_labels)
evaluate_model("BIRCH", birch_labels)
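Both metrics can be written out directly. For a point i, the silhouette is s(i) = (b - a) / max(a, b). Here a is the point's mean distance to the other points in its own cluster, and b is its mean distance to the nearest other cluster. The score is the mean of s(i), so higher is better.

The Davies-Bouldin index averages, over clusters, the worst ratio (S_i + S_j) / d(c_i, c_j). S_i is a cluster's mean distance to its centroid c_i, so the ratio compares within-cluster spread to centroid separation, and lower is better. The hypothetical helpers below implement these definitions with Euclidean distance on a toy dataset and compare them against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

def manual_silhouette(X, labels):
    """Hypothetical helper: mean of s(i) = (b - a) / max(a, b)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = dists[i, same].mean()  # cohesion: mean distance within own cluster
        b = min(dists[i, labels == c].mean()  # separation: nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def manual_davies_bouldin(X, labels):
    """Hypothetical helper: mean over clusters of max_j (S_i + S_j) / d(c_i, c_j)."""
    ids = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in ids])
    # S_i: average distance of a cluster's points to its centroid
    spread = np.array([np.linalg.norm(X[labels == c] - cent, axis=1).mean()
                       for c, cent in zip(ids, cents)])
    worst = [max((spread[i] + spread[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(ids)) if j != i)
             for i in range(len(ids))]
    return float(np.mean(worst))

# Toy data (assumption): two small, well-separated groups
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [6.0, 6.0], [6.0, 7.0], [7.0, 6.0]])
toy_labels = np.array([0, 0, 0, 1, 1, 1])

print(manual_silhouette(X_toy, toy_labels), silhouette_score(X_toy, toy_labels))
print(manual_davies_bouldin(X_toy, toy_labels), davies_bouldin_score(X_toy, toy_labels))
```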

The clustering results show that K-Means performed best among the four methods (all run with five clusters). It achieved the highest Silhouette Score (0.4166) and the lowest Davies-Bouldin Index (0.8746), where higher silhouette and lower Davies-Bouldin values are better. This indicates the most compact and best-separated clusters of the four.

The two metrics disagree on second place. GMM has the higher Silhouette Score (0.4064 vs. 0.3900), while Agglomerative Clustering has the lower Davies-Bouldin Index (0.9163 vs. 0.9356), so the two perform comparably. BIRCH performed worst on both metrics, with a Silhouette Score of 0.3231 and the highest Davies-Bouldin Index (1.1507), reflecting less distinct clusters.

Overall, by these internal metrics, K-Means provided the most effective customer segmentation for this dataset. A silhouette of about 0.42 still indicates only moderately separated clusters.




The clustering identified five distinct customer groups with unique characteristics: The first group consists of young customers with high spending scores and moderate to high income, making them ideal targets for trend-focused marketing, loyalty programs, and exclusive offers. The second group includes older or budget-conscious shoppers who have lower income and spending levels, responding best to discounts, value bundles, and senior-focused promotions. The third cluster represents middle-aged customers with average income and moderate spending habits, forming the stable core of the customer base suitable for general marketing and upselling strategies. The fourth group comprises high-income but low-spending individuals who tend to be selective buyers, making personalized premium services and luxury branding effective engagement tools. Finally, the fifth cluster contains balanced customers with average income and spending patterns; they are loyal but less active, benefiting from broad marketing campaigns and retention incentives. Together, these insights enable businesses to tailor their marketing efforts and better meet the needs of each customer segment.
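Segment descriptions like these are typically derived by comparing per-cluster averages of the original, unscaled features. The sketch below uses a made-up DataFrame with the notebook's column names and placeholder labels; these are assumptions, not the project's data or results.

On the real data, a similar profile could be built from `df[features.columns].assign(Cluster=kmeans_labels)`. Restricting to the numeric feature columns avoids averaging the text `Gender` column.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names; values are invented
toy = pd.DataFrame({
    "Age": [22, 25, 60, 58, 40, 42],
    "Annual Income (k$)": [70, 80, 20, 25, 55, 60],
    "Spending Score (1-100)": [85, 90, 15, 20, 50, 55],
})
toy_labels = [0, 0, 1, 1, 2, 2]  # placeholder for cluster labels such as kmeans_labels

# Attach labels, then average each feature within each cluster
profiled = toy.assign(Cluster=toy_labels)
profile = profiled.groupby("Cluster").mean().round(1)
profile["Count"] = profiled.groupby("Cluster").size()  # cluster sizes
print(profile)
```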
