# Project 6: Customer Segmentation

This notebook segments a mall's customers with the K-Means clustering algorithm. Customers are grouped by annual income and spending score so that each segment can be targeted with its own marketing strategy.
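K-Means alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. The notebook uses scikit-learn's `KMeans` for this. The following is only a minimal NumPy sketch of that assign/update loop for intuition. It assumes a plain random initialization and a single run, whereas the notebook uses `init='k-means++'` and `n_init=10`; the function `kmeans_lloyd`, its parameters, and the toy points are hypothetical.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal K-Means sketch (random init, single run; not k-means++)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the final centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids

# Usage on six hypothetical points forming two well-separated groups
X_toy = np.array([[15, 80], [16, 78], [17, 82], [85, 15], [88, 18], [90, 12]], dtype=float)
labels_toy, centers_toy = kmeans_lloyd(X_toy, k=2)
print(labels_toy, centers_toy)
```

The first three toy points should end up in one cluster and the last three in the other; which group gets index 0 is arbitrary.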

## 1. Setup and Library Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

## 2. Data Loading and Exploration

In [None]:
# Load the dataset
try:
    df = pd.read_csv('data/Mall_Customers.csv')
    print("Data loaded successfully.")
except FileNotFoundError:
    print("Data file not found. Make sure 'Mall_Customers.csv' is in the 'data/' directory.")
    raise  # stop here instead of failing later with a NameError on `df`

df.head()

In [None]:
df.info()

In [None]:
# Select features for clustering: column 3 is 'Annual Income (k$)', column 4 is 'Spending Score (1-100)'
X = df.iloc[:, [3, 4]].values

## 3. Finding the Optimal Number of Clusters (Elbow Method)
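The loop below records `kmeans.inertia_` for each k. In scikit-learn, `inertia_` is the within-cluster sum of squares, $\mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$, where $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid. As a sanity check, the sketch below recomputes that sum by hand and compares it with `inertia_`. It uses synthetic `make_blobs` data instead of the mall dataset, and the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data standing in for the (income, spending score) matrix X; an assumption for illustration
X_demo, _ = make_blobs(n_samples=200, centers=5, random_state=0)

km = KMeans(n_clusters=5, init='k-means++', random_state=42, n_init=10).fit(X_demo)

# For every point, look up the centroid of the cluster it was assigned to
assigned_centroids = km.cluster_centers_[km.labels_]

# WCSS: sum over all points of the squared Euclidean distance to that centroid
wcss_manual = ((X_demo - assigned_centroids) ** 2).sum()

print(wcss_manual, km.inertia_)  # the two values should agree up to floating-point rounding
```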

In [None]:
# WCSS: Within-Cluster Sum of Squares
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42, n_init=10)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# Plot the Elbow Method graph
plt.figure(figsize=(10, 5))
sns.lineplot(x=range(1, 11), y=wcss, marker='o', color='red')
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

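In the plot above, WCSS falls steeply up to **k=5** and only slowly after that, so we use five clusters. The elbow method is a visual heuristic: k=5 is a reasonable trade-off between compact clusters and a manageable number of segments, not a provably optimal value.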
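Because the elbow is read by eye, a second criterion can help confirm the choice. One common option, a different technique from the elbow method and not part of the original analysis, is the mean silhouette score. It ranges from -1 to 1 and is higher when points sit close to their own cluster and far from the nearest other cluster. The sketch below computes it for k = 2 to 10 on synthetic `make_blobs` data standing in for `X`; on the real data you would pass `X` instead. The silhouette maximum need not coincide with the elbow.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X (an assumption); replace with the notebook's X on real data
X_demo, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 11):  # silhouette is undefined for a single cluster, so start at k=2
    labels = KMeans(n_clusters=k, init='k-means++', random_state=42, n_init=10).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
print(scores, best_k)
```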

## 4. Training the K-Means Model

In [None]:
# Training the K-Means model on the dataset
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X)
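Once fitted, the model can assign new customers to the existing segments with `KMeans.predict`, which returns the index of the nearest centroid for each row. The sketch below demonstrates this on synthetic `make_blobs` data rather than the mall dataset. The names `kmeans_demo` and `new_customers` and the sample points are hypothetical. In this notebook, each row of `new_customers` would be an `[annual income (k$), spending score]` pair passed to the already-fitted `kmeans`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic training data (an assumption standing in for X)
X_demo, _ = make_blobs(n_samples=200, centers=5, random_state=0)
kmeans_demo = KMeans(n_clusters=5, init='k-means++', random_state=42, n_init=10)
y_demo = kmeans_demo.fit_predict(X_demo)

# Hypothetical new customers in the same two-feature layout as the training data
new_customers = np.array([[0.0, 4.0], [-2.0, 8.0]])
new_labels = kmeans_demo.predict(new_customers)

# predict() picks the nearest centroid; recompute that directly for comparison
dists = np.linalg.norm(new_customers[:, None, :] - kmeans_demo.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
print(new_labels, nearest)
```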

## 5. Visualizing the Clusters

In [None]:
plt.figure(figsize=(12, 8))

colors = ['red', 'blue', 'green', 'violet', 'yellow']
for cluster_id, color in enumerate(colors):
    # Plot the customers in this cluster (labels are 0-based, legend names are 1-based)
    plt.scatter(X[y_kmeans == cluster_id, 0], X[y_kmeans == cluster_id, 1],
                s=60, c=color, label=f'Cluster {cluster_id + 1}')

# Plotting the centroids
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 100, c = 'cyan', label = 'Centroids')

plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

## 6. Conclusion

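We segmented the customers into five groups based on their annual income and spending score.

The descriptions below were read off the scatter plot. K-Means numbers its clusters arbitrarily, so a different `random_state` or scikit-learn version can permute the indices (and therefore the colors); check each description against the centroid positions when re-running the notebook.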

### Interpretation of Clusters:
- **Cluster 1 (Red):** High income, low spending score. (Careful spenders)
- **Cluster 2 (Blue):** Average income, average spending score. (Standard customers)
- **Cluster 3 (Green):** High income, high spending score. (Target customers)
- **Cluster 4 (Violet):** Low income, low spending score. (Sensible spenders)
- **Cluster 5 (Yellow):** Low income, high spending score. (Careless spenders)
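Because K-Means numbers its clusters arbitrarily, the segment names are safer to derive from centroid coordinates than from cluster indices or colors. The sketch below is one hypothetical way to do that: it labels each centroid's income and spending as low, average, or high relative to a reference point (for example the feature means, `X.mean(axis=0)`) plus or minus a tolerance band. The `band=15.0` default, the helper names, and the demo centroid values are assumptions for illustration, not results from this notebook.

```python
import numpy as np

# Segment names taken from the interpretation above, keyed by (income level, spending level)
SEGMENT_NAMES = {
    ('high', 'low'): 'Careful spenders',
    ('average', 'average'): 'Standard customers',
    ('high', 'high'): 'Target customers',
    ('low', 'low'): 'Sensible spenders',
    ('low', 'high'): 'Careless spenders',
}

def level(value, reference, band):
    """Classify a coordinate as low/average/high relative to reference +/- band (hypothetical rule)."""
    if value < reference - band:
        return 'low'
    if value > reference + band:
        return 'high'
    return 'average'

def name_clusters(centers, reference, band=15.0):
    """Map each cluster index to a segment name using its centroid, not its index or color."""
    income_ref, spending_ref = reference
    names = {}
    for idx, (income, spending) in enumerate(centers):
        key = (level(income, income_ref, band), level(spending, spending_ref, band))
        names[idx] = SEGMENT_NAMES.get(key, f'Unnamed ({key[0]} income, {key[1]} spending)')
    return names

# Hypothetical centroids and reference point (income in k$, spending score), for illustration only
demo_centers = np.array([[88, 17], [55, 50], [86, 82], [26, 21], [25, 79]], dtype=float)
demo_reference = np.array([60.0, 50.0])
names = name_clusters(demo_centers, demo_reference)
print(names)
```

In the notebook, `name_clusters(kmeans.cluster_centers_, X.mean(axis=0))` would map each fitted cluster index to a name; a combination not listed in `SEGMENT_NAMES` is reported as unnamed rather than forced into a segment.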

This segmentation allows the business to tailor marketing strategies for each group. For example, the 'Target' group could be offered premium products, while the 'Careful' group might respond better to loyalty programs or exclusive deals.