# Task 2: Customer Segmentation

This notebook performs customer segmentation using clustering algorithms (K-Means, DBSCAN) on the Mall Customer dataset. The workflow includes:
- Data loading and inspection
- Exploratory Data Analysis (EDA)
- Preprocessing (scaling, encoding)
- Clustering (K-Means, DBSCAN)
- Cluster visualization and evaluation
- Conclusions

---

In [None]:
# 1. Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score
from IPython.display import display  # explicit import so display() also works outside a notebook
import warnings
warnings.filterwarnings('ignore')

In [None]:
# 2. Load and Inspect the Dataset
try:
    df = pd.read_csv('Mall_Customers.csv')
except FileNotFoundError:
    df = None
    print('Dataset not found. Please place Mall_Customers.csv in this folder.')

if df is not None:
    display(df.head())
    print('\nShape:', df.shape)
    df.info()  # info() prints its summary and returns None, so it is not wrapped in display()
    display(df.describe())

In [None]:
# 3. Exploratory Data Analysis (EDA)
if df is not None:
    # Gender distribution
    plt.figure(figsize=(5,3))
    sns.countplot(x='Gender', data=df)
    plt.title('Gender Distribution')
    plt.show()
    # Age distribution
    plt.figure(figsize=(6,3))
    sns.histplot(df['Age'], bins=20, kde=True)
    plt.title('Age Distribution')
    plt.show()
    # Annual Income distribution
    plt.figure(figsize=(6,3))
    sns.histplot(df['Annual Income (k$)'], bins=20, kde=True)
    plt.title('Annual Income Distribution')
    plt.show()
    # Spending Score distribution
    plt.figure(figsize=(6,3))
    sns.histplot(df['Spending Score (1-100)'], bins=20, kde=True)
    plt.title('Spending Score Distribution')
    plt.show()
    # Pairplot
    sns.pairplot(df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
    plt.show()

In [None]:
# 4. Preprocessing (Encoding, Scaling)
if df is not None:
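    # Encode Gender as 0/1; the dtype check keeps this cell safe to re-run
    if not pd.api.types.is_numeric_dtype(df['Gender']):
        df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})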
    # Select features for clustering
    X = df[['Gender', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)']]
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
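
The scaling step above can be checked by hand. The sketch below uses a small made-up array (not values from `Mall_Customers.csv`) to show that `StandardScaler` subtracts each column's mean and divides by its population standard deviation, so every feature ends up with mean 0 and standard deviation 1 before distances are computed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values standing in for two numeric columns (illustration only)
toy = np.array([[19.0, 15.0], [35.0, 60.0], [52.0, 120.0], [28.0, 78.0]])

scaled = StandardScaler().fit_transform(toy)

# Manual z-score: subtract the column mean, divide by the population std (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print('Matches manual z-score:', np.allclose(scaled, manual))
print('Column means:', scaled.mean(axis=0).round(6))
print('Column stds: ', scaled.std(axis=0).round(6))
```

Without this step, a feature with a large numeric range such as income would dominate the Euclidean distances that K-Means and DBSCAN rely on.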

In [None]:
# 5. K-Means Clustering
if df is not None:
    # Find optimal number of clusters using the elbow method
    inertia = []
    for k in range(1, 11):
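        # n_init is set explicitly so results do not depend on the scikit-learn version's default
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)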
        kmeans.fit(X_scaled)
        inertia.append(kmeans.inertia_)
    plt.figure(figsize=(6,4))
    plt.plot(range(1, 11), inertia, marker='o')
    plt.title('Elbow Method For Optimal k')
    plt.xlabel('Number of clusters')
    plt.ylabel('Inertia')
    plt.show()
    # Fit KMeans with the k suggested by the elbow plot (5 here; adjust if your plot differs)
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    clusters = kmeans.fit_predict(X_scaled)
    df['Cluster'] = clusters
    print('Silhouette Score:', silhouette_score(X_scaled, clusters))
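
The elbow plot can be ambiguous, so it may help to compare silhouette scores across several values of k as a second opinion. The sketch below is an assumption-laden illustration: it uses synthetic data from `make_blobs` as a stand-in for `X_scaled`, and the name `X_demo` is hypothetical. In the notebook, the same loop could be run on `X_scaled` instead.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled (assumption: 200 points, 4 features, 5 blobs)
X_demo, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)

scores = {}
for k in range(2, 11):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

for k, s in scores.items():
    print(f'k={k}: silhouette={s:.3f}')

best_k = max(scores, key=scores.get)
print('Highest silhouette at k =', best_k)
```

A higher silhouette score indicates tighter, better-separated clusters, but the k it favors should still be weighed against how interpretable the resulting segments are.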

In [None]:
# 6. Cluster Visualization
if df is not None:
    plt.figure(figsize=(8,6))
    sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', hue='Cluster', data=df, palette='Set1')
    plt.title('Customer Segments (K-Means)')
    plt.show()
    # Average spending per cluster
    print(df.groupby('Cluster')['Spending Score (1-100)'].mean())
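
Beyond the mean spending score, a fuller per-cluster profile can make segments easier to describe. The sketch below uses a tiny made-up DataFrame (the rows are hypothetical, not taken from the dataset) and pandas named aggregation to report each cluster's size and feature averages. The same `groupby(...).agg(...)` call could be applied to `df` once the `Cluster` column exists.

```python
import pandas as pd

# Hypothetical rows for illustration only (not from Mall_Customers.csv)
toy = pd.DataFrame({
    'Age': [23, 31, 45, 52, 27, 60],
    'Annual Income (k$)': [20, 25, 80, 85, 78, 30],
    'Spending Score (1-100)': [75, 80, 20, 15, 90, 40],
    'Cluster': [0, 0, 1, 1, 2, 0],
})

# One row per cluster: size plus the mean of each numeric feature
profile = toy.groupby('Cluster').agg(
    size=('Age', 'size'),
    mean_age=('Age', 'mean'),
    mean_income=('Annual Income (k$)', 'mean'),
    mean_spending=('Spending Score (1-100)', 'mean'),
)
print(profile.round(1))
```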

In [None]:
# 7. DBSCAN Clustering (Bonus)
if df is not None:
    dbscan = DBSCAN(eps=0.7, min_samples=5)
    db_clusters = dbscan.fit_predict(X_scaled)
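    df['DBSCAN_Cluster'] = db_clusters
    # DBSCAN labels noise points as -1; report how many clusters and noise points it found
    n_db_clusters = len(set(db_clusters) - {-1})
    print('DBSCAN clusters:', n_db_clusters, '| noise points:', (db_clusters == -1).sum())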
    plt.figure(figsize=(8,6))
    sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', hue='DBSCAN_Cluster', data=df, palette='Set2')
    plt.title('Customer Segments (DBSCAN)')
    plt.show()
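
The `eps=0.7` value used above is a starting point rather than a tuned choice. One common heuristic for picking `eps` is the k-distance curve: sort every point's distance to its `min_samples`-th neighbour and look for the knee. The sketch below illustrates this on synthetic `make_blobs` data (an assumption standing in for `X_scaled`; `X_demo` and `k_dist` are hypothetical names), then counts the clusters and noise points DBSCAN reports.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled (assumption: 200 points, 4 features, 3 blobs)
X_demo, _ = make_blobs(n_samples=200, n_features=4, centers=3, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

min_samples = 5
# Querying the training points returns each point as its own nearest neighbour,
# which matches DBSCAN counting the point itself toward min_samples.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)
k_dist = np.sort(distances[:, -1])  # sorted distance to the min_samples-th neighbour

# Percentiles give a rough feel for the curve without plotting it
for q in (50, 90, 95, 99):
    print(f'{q}th percentile k-distance: {np.percentile(k_dist, q):.3f}')

labels = DBSCAN(eps=0.7, min_samples=min_samples).fit_predict(X_demo)
n_clusters = len(set(labels) - {-1})   # -1 marks noise, not a cluster
n_noise = int((labels == -1).sum())
print('Clusters:', n_clusters, '| noise points:', n_noise)
```

Plotting `k_dist` with `plt.plot(k_dist)` and reading off the distance where the curve bends sharply gives a candidate `eps` to try.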

## Conclusions

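- K-Means (k=5) on the scaled gender, age, income, and spending features partitions customers into five segments; the income vs. spending scatter plot shows how those segments separate along those two features.
- DBSCAN groups points by density, so it can recover non-spherical clusters and labels low-density points as noise (cluster `-1`); its output depends on the choice of `eps` and `min_samples`.
- Together, the EDA and the per-cluster averages give businesses a basis for targeting marketing by segment.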

---

**To run this notebook:**
1. Place `Mall_Customers.csv` in this folder.
2. Run all cells in order.
3. Review results and plots for insights.