# Customer Grouping using DBSCAN Clustering
### AI Analyst Project Submission

**Team Details**  
- **Member 1:** Shweta Singh (202310101150047), B.Tech CSE (DS+AI)-52
- **Member 2:** Yashvi Jaiswal (202310101150020), B.Tech CSE (DS+AI)-51


## 1. Problem Statement

The goal is to group customers by annual income and spending score using the DBSCAN clustering algorithm, and to use the resulting groups to understand customer behavior and preferences for targeted marketing.

## 2. Dataset Details

We are using a synthetic dataset containing 200 samples with the following features:
- Annual Income (k$)
- Spending Score (1-100)

The data is generated to simulate different spending patterns of customers.
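
The exact generation procedure is not described here, so the following is only a minimal sketch of how a comparable 200-sample dataset could be produced. The function name `make_synthetic_customers`, the four group centres, and the spread are all hypothetical choices made for illustration, not the values behind `synthetic_customer_data.csv`.

```python
import numpy as np
import pandas as pd

def make_synthetic_customers(n_per_group=50, seed=0):
    """Return a DataFrame of synthetic customers (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical (income, spending score) centres, one per spending pattern
    centres = [(25, 20), (25, 80), (85, 20), (85, 80)]
    groups = [
        rng.normal(loc=centre, scale=(8, 8), size=(n_per_group, 2))
        for centre in centres
    ]
    df = pd.DataFrame(
        np.vstack(groups),
        columns=["Annual Income (k$)", "Spending Score (1-100)"],
    )
    # Keep values inside plausible ranges
    df["Annual Income (k$)"] = df["Annual Income (k$)"].clip(lower=1)
    df["Spending Score (1-100)"] = df["Spending Score (1-100)"].clip(1, 100)
    return df.round(1)

customers = make_synthetic_customers()
print(customers.shape)
```

With the default of 50 samples in each of 4 groups, the frame has 200 rows and the two feature columns listed above.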

## 3. Explanation of ML Model

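DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It groups points that lie in dense regions, where density is defined by a distance threshold (`eps`) and a minimum number of neighboring points (`min_samples`). Points in sparse regions that belong to no cluster are labeled as noise.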

**Key Parameters:**
- `eps`: The maximum distance between two samples for one to be considered in the neighborhood of the other.
- `min_samples`: The minimum number of samples (including the point itself) within a point's `eps`-neighborhood for it to be considered a core point.
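
To make the two parameters concrete, here is a small sketch on six hand-picked 2-D points (hypothetical toy data, not the customer dataset). It uses scikit-learn's `DBSCAN`, which counts the point itself towards `min_samples`, and classifies each point as core, border, or noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy points chosen by hand for illustration
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],  # tight square: each has >= 4 points within eps
    [2, 1],                          # only 3 points within eps, but next to core points
    [10, 10],                        # isolated point
], dtype=float)

model = DBSCAN(eps=1.5, min_samples=4).fit(X)
labels = model.labels_
core_idx = set(model.core_sample_indices_.tolist())

kinds = []
for i, label in enumerate(labels):
    if label == -1:
        kinds.append("noise")      # not reachable from any core point
    elif i in core_idx:
        kinds.append("core")       # at least min_samples points within eps
    else:
        kinds.append("border")     # within eps of a core point, but not core itself

for point, label, kind in zip(X, labels, kinds):
    print(point, label, kind)
```

The four square corners are core points, `[2, 1]` joins their cluster as a border point, and `[10, 10]` receives the noise label `-1`.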

## 4. Implementation

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Load dataset
df = pd.read_csv("synthetic_customer_data.csv")

# Select only the two features used for clustering, so that any other
# column in the CSV does not enter the distance computation
features = ['Annual Income (k$)', 'Spending Score (1-100)']

# Standardize the features so both contribute equally to distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])

# Apply DBSCAN on the scaled features
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X_scaled)

# Add cluster labels (-1 marks noise points)
df['Cluster'] = labels

# Plot the clusters in the original (unscaled) units
plt.figure(figsize=(8, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=labels, cmap='plasma')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('DBSCAN Clustering of Customers')
plt.colorbar(label='Cluster Label')
plt.grid(True)
plt.show()
```

## 5. Results and Insights

- DBSCAN grouped customers into clusters wherever their income and spending score values were densely concentrated.
- Customers that did not fall in any dense region were labeled as noise (cluster label `-1`), i.e. outliers that belong to no cluster.
- These noise points can help in identifying unique or extreme spending behaviors.
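
As a rough sketch of how these observations could be quantified, the snippet below counts clusters and noise points and computes per-cluster averages. The `labels` array and feature values are hypothetical stand-ins, not results from this project.

```python
import numpy as np
import pandas as pd

# Hypothetical labels and features, for illustration only
labels = np.array([0, 0, 1, -1, 1, 0, -1, 1])
df = pd.DataFrame({
    "Annual Income (k$)":     [20, 22, 80, 50, 82, 24, 120, 84],
    "Spending Score (1-100)": [15, 17, 90, 50, 88, 19, 5, 92],
})
df["Cluster"] = labels

# The noise label -1 is not a cluster, so exclude it from the count
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

# Average income and spending score per cluster, ignoring noise
summary = df[df["Cluster"] != -1].groupby("Cluster").mean()

print(f"clusters: {n_clusters}, noise points: {n_noise}")
print(summary)
```

The same three computations can be applied to the `df` and `labels` produced by the clustering code to describe each customer segment.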

## 6. Challenges Faced

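- Choosing suitable `eps` and `min_samples` values is crucial and non-trivial, because the resulting clusters are sensitive to both.
- DBSCAN may not perform well on datasets with varying densities, because a single `eps` value is applied to the whole dataset.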
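
One common heuristic for picking `eps` is the k-distance curve: sort every point's distance to its k-th nearest neighbor and look for the "elbow". The sketch below is not part of the original workflow; it assumes `k = min_samples = 5` and uses random data as a stand-in for the scaled customer features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with the two customer features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X_scaled = StandardScaler().fit_transform(X)

min_samples = 5
# Querying the training data returns each point as its own nearest neighbor,
# which matches DBSCAN counting the point itself towards min_samples
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)

# Distance from each point to its farthest of the min_samples neighbors, sorted
k_distances = np.sort(distances[:, -1])

print(k_distances[:5], k_distances[-5:])
```

Plotting `k_distances` and reading off the value where the curve bends sharply gives a candidate `eps`; it is a starting point for tuning rather than a guaranteed optimum.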

## 7. Learnings

- Understood how density-based clustering works.
- Learned to apply DBSCAN for customer segmentation.
- Realized the importance of data scaling and parameter tuning.