# Customer Segmentation using Machine Learning

## Objective
This project segments customers by age, annual income, and spending score using K-Means clustering.
The resulting groups can help a business target its marketing at distinct customer types.

## Dataset
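- The dataset has 200 rows and three numeric features: **Age, Annual Income, and Spending Score**.
- The data is generated synthetically for demonstration purposes. Each feature is drawn independently and uniformly at random, so the clusters found here illustrate the workflow rather than real customer groups.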

## Steps
1. Load and preprocess the dataset.
2. Perform Exploratory Data Analysis (EDA).
3. Apply K-Means clustering and determine the optimal number of clusters using the Elbow Method.
4. Visualize the clustered data.
5. Evaluate clustering performance using the **Silhouette Score**.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Set random seed for reproducibility
np.random.seed(42)


In [None]:
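# Generate a synthetic dataset; each feature is drawn independently and uniformly.
# Note: np.random.randint excludes the upper bound.
n_samples = 200
age = np.random.randint(18, 70, n_samples)             # ages 18 to 69
income = np.random.randint(15000, 120000, n_samples)   # 15,000 to 119,999
spending_score = np.random.randint(1, 100, n_samples)  # scores 1 to 99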

# Create DataFrame
df = pd.DataFrame({'Age': age, 'Annual Income': income, 'Spending Score': spending_score})

# Display first 5 rows
df.head()
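
Step 2 of the outline calls for exploratory data analysis, which the cells here do not show. The following is a minimal sketch of what that step might look like, not the author's original analysis. It rebuilds an illustrative frame with the same columns (the name `eda_df` and the use of `np.random.default_rng` are assumptions made so the example runs on its own) and checks summary statistics, missing values, and pairwise correlations.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the notebook's DataFrame (same columns and ranges).
rng = np.random.default_rng(42)
n_samples = 200
eda_df = pd.DataFrame({
    'Age': rng.integers(18, 70, n_samples),
    'Annual Income': rng.integers(15000, 120000, n_samples),
    'Spending Score': rng.integers(1, 100, n_samples),
})

summary = eda_df.describe()     # count, mean, std, min, quartiles, max per column
missing = eda_df.isna().sum()   # number of missing values per column
correlations = eda_df.corr()    # pairwise Pearson correlations

print(summary.round(1))
print(missing)
print(correlations.round(2))
```

Because the columns are drawn independently, the off-diagonal correlations should be close to zero. On real customer data, this is the point at which to look for skewed distributions, outliers, and correlated features before clustering.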


In [None]:
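# Standardize the features so each contributes comparably to Euclidean distances.
# Columns are selected explicitly so that re-running this cell after 'Cluster'
# has been added does not scale the cluster labels as if they were a feature.
features = ['Age', 'Annual Income', 'Spending Score']
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[features])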
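
To make the standardization step concrete, here is a small sketch showing that `StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`). The three-row array `X` is made-up illustration data, not part of the notebook's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: age, annual income, spending score.
X = np.array([[25, 30000, 40],
              [40, 60000, 60],
              [55, 90000, 80]], dtype=float)

scaled = StandardScaler().fit_transform(X)

# Manual z-score; np.std defaults to ddof=0, which matches StandardScaler.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

Without this step, Annual Income (tens of thousands) would dominate the distance calculation over Age and Spending Score (tens).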


In [None]:
# Elbow Method: fit K-Means for K = 1..10 and record the within-cluster
# sum of squares (WCSS), which scikit-learn exposes as inertia_.
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(df_scaled)
    wcss.append(kmeans.inertia_)

# Plot the Elbow Method
plt.figure(figsize=(8, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.title('Elbow Method for Optimal K')
plt.show()
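
The WCSS plotted above is the sum of squared Euclidean distances from each point to its assigned centroid. The sketch below recomputes it by hand and compares it with `inertia_`. The random array `X` and the names `km` and `manual_wcss` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: 100 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

km = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)

# Look up each point's centroid, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_wcss = ((X - assigned_centers) ** 2).sum()

print(manual_wcss, km.inertia_)
```

WCSS always decreases as K grows, so the elbow is the point where the decrease flattens, not the minimum of the curve.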


In [None]:
# Apply K-Means with the chosen K. K=4 is an assumption here; confirm it
# against the elbow plot, which may not show a sharp bend for this data.
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
clusters = kmeans.fit_predict(df_scaled)
df['Cluster'] = clusters

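# Silhouette Score ranges from -1 to 1; values near 1 indicate compact,
# well-separated clusters, and values near 0 indicate overlapping clusters.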
silhouette_avg = silhouette_score(df_scaled, clusters)
print(f'Silhouette Score: {silhouette_avg:.2f}')
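
Since K=4 is assumed rather than derived, one way to cross-check it is to compute the silhouette score for a range of K values and see which scores highest. This is a sketch of that check, offered as a complement to the Elbow Method rather than as the notebook's own procedure. The stand-in data and the names `scores` and `best_k` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the notebook's three features.
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(18, 70, 200),         # age
    rng.integers(15000, 120000, 200),  # annual income
    rng.integers(1, 100, 200),         # spending score
])
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 11):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f'K={k}: silhouette={score:.3f}')
print(f'Highest silhouette at K={best_k}')
```

If the highest-scoring K disagrees with the elbow plot, that is a sign the data has no strongly preferred number of clusters, which is expected for uniformly generated features.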


In [None]:
# Plot clusters on two of the three features (Age is not shown).
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Annual Income', y='Spending Score', hue='Cluster', palette='viridis', s=100, alpha=0.7)
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.title(f'Customer Segmentation (K={optimal_k}) - Silhouette Score: {silhouette_avg:.2f}')
plt.legend(title='Cluster')
plt.show()
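
A scatter plot shows only two features at a time. To describe each segment for marketing purposes, it can help to summarize the clusters in original units. The sketch below maps the centroids back from standardized space with `inverse_transform` and computes per-cluster sizes and means. The stand-in data and the names `profile_df`, `profile_scaler`, and `profile_kmeans` are assumptions so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the notebook's data (same columns and ranges).
rng = np.random.default_rng(42)
features = ['Age', 'Annual Income', 'Spending Score']
profile_df = pd.DataFrame({
    'Age': rng.integers(18, 70, 200),
    'Annual Income': rng.integers(15000, 120000, 200),
    'Spending Score': rng.integers(1, 100, 200),
})

profile_scaler = StandardScaler()
X_scaled = profile_scaler.fit_transform(profile_df[features])
profile_kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
profile_df['Cluster'] = profile_kmeans.fit_predict(X_scaled)

# Centroids converted from standardized units back to the original units.
centers_original = pd.DataFrame(
    profile_scaler.inverse_transform(profile_kmeans.cluster_centers_),
    columns=features,
).round(1)

# Cluster sizes and feature means computed directly from the labeled rows.
cluster_sizes = profile_df['Cluster'].value_counts().sort_index()
cluster_means = profile_df.groupby('Cluster')[features].mean().round(1)

print(centers_original)
print(cluster_sizes)
print(cluster_means)
```

The back-transformed centroids and the per-cluster means should be close to each other, since each K-Means centroid is the mean of the points assigned to it.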
