# Customer Segmentation Project for BPCL (Petrol Pump Purchase Patterns)
This project simulates a clustering-based segmentation of petrol pumps supplied by Bharat Petroleum Corporation Limited (BPCL) in the Tirunelveli region of Tamil Nadu. Using synthetic data on monthly average purchases of petrol, diesel, and kerosene, together with order frequency, we group the pumps into clusters with similar purchase patterns.

In [None]:
# Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')

## Step 2: Generate Synthetic Dataset
We simulate 100 petrol pumps, each with average monthly volumes (in liters) of petrol, diesel, and kerosene, the number of orders per month, a district, and an outlet type.

In [None]:
np.random.seed(42)  # fix the seed so the synthetic data is the same on every run
num_customers = 100
customer_ids = [f"BPCL_PP_{i+1:03d}" for i in range(num_customers)]

data = {
    'Customer_ID': customer_ids,
    'Avg_Petrol_Liters': np.random.normal(loc=30000, scale=8000, size=num_customers).astype(int),
    'Avg_Diesel_Liters': np.random.normal(loc=50000, scale=10000, size=num_customers).astype(int),
    'Avg_Kerosene_Liters': np.random.normal(loc=10000, scale=3000, size=num_customers).astype(int),
    'Frequency_of_Orders_per_Month': np.random.randint(2, 10, size=num_customers),
    'District': np.random.choice(['Tirunelveli', 'Thoothukudi', 'Kanyakumari'], size=num_customers),
    'Outlet_Type': np.random.choice(['Company Owned', 'Dealer Owned'], size=num_customers)
}

df = pd.DataFrame(data)
# Normal draws can be negative, so clip the fuel volumes at zero
fuel_cols = ['Avg_Petrol_Liters', 'Avg_Diesel_Liters', 'Avg_Kerosene_Liters']
df[fuel_cols] = df[fuel_cols].clip(lower=0)
df.head()

## Step 3: Data Preprocessing (Standardization)
We scale the numerical features so that clustering isn't biased by the scale of any one variable.

In [None]:
features = df[['Avg_Petrol_Liters',
               'Avg_Diesel_Liters',
               'Avg_Kerosene_Liters',
               'Frequency_of_Orders_per_Month']]

scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
scaled_df = pd.DataFrame(scaled_features, columns=features.columns)
scaled_df.head()
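
To make the scaling step concrete, here is a minimal sketch of the z-score that `StandardScaler` applies to each column: subtract the column mean and divide by the population standard deviation (`ddof=0`). The four petrol values are hypothetical toy numbers, not part of the project dataset.

```python
import numpy as np

# Hypothetical toy values: monthly petrol liters for four pumps
petrol = np.array([20000.0, 30000.0, 40000.0, 30000.0])

# z-score: subtract the column mean, divide by the population std (ddof=0)
mean = petrol.mean()
std = petrol.std(ddof=0)
z = (petrol - mean) / std

print("mean:", mean)
print("std:", round(std, 2))
print("z-scores:", np.round(z, 3))
```

After scaling, every column has mean 0 and standard deviation 1, so a feature measured in tens of thousands of liters no longer outweighs order frequency in the distance calculation.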

## Step 4: Finding Optimal Number of Clusters Using Elbow Method
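We calculate inertia (the sum of squared distances from each point to its assigned cluster center) for k = 1 to 10 and plot it to visually identify the 'elbow point', where adding more clusters stops reducing inertia by much.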

In [None]:
inertia = []
K_range = range(1, 11)

for k in K_range:
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(scaled_df)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(K_range, inertia, marker='o')
plt.title('Elbow Method to Determine Optimal K')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia (within-cluster sum of squares)')
plt.grid(True)
plt.show()
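
The elbow can be hard to read by eye, so a second check is often useful. The sketch below computes the silhouette score for several values of k; this is a complementary technique, not part of the original analysis. It regenerates comparable synthetic data with its own seed so that it runs on its own; the names `X`, `X_scaled`, and `silhouette_by_k` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in synthetic data with the same four numeric features (assumed seed)
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.normal(30000, 8000, 100),   # petrol liters
    rng.normal(50000, 10000, 100),  # diesel liters
    rng.normal(10000, 3000, 100),   # kerosene liters
    rng.integers(2, 10, 100),       # orders per month (2 to 9)
])
X_scaled = StandardScaler().fit_transform(X)

# The silhouette score needs at least two clusters, so start at k=2
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    silhouette_by_k[k] = silhouette_score(X_scaled, labels)

for k, score in silhouette_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
```

Scores range from -1 to 1, and higher values indicate better-separated clusters. In the notebook, the same loop could be run on `scaled_df` instead of `X_scaled`.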

## Step 5: Apply KMeans with Optimal K (k=3)
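Based on the elbow plot, we choose k=3 clusters. Because each synthetic feature is drawn independently from a single distribution, the elbow may be faint; k=3 is a working choice for illustration rather than a strong finding.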

In [None]:
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(scaled_df)
df.head()

## Step 6: Analyze Cluster-wise Aggregates
We interpret the characteristics of each cluster by calculating average feature values.

In [None]:
df.groupby('Cluster')[['Avg_Petrol_Liters', 'Avg_Diesel_Liters', 'Avg_Kerosene_Liters', 'Frequency_of_Orders_per_Month']].mean().round(2)
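
Averages are easier to interpret alongside cluster sizes, since a cluster with only a few pumps gives a less stable mean. The sketch below reports the number of pumps per cluster and converts the fitted centroids back to original units with `scaler.inverse_transform`. It builds its own stand-in data; `toy`, `km`, and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

cols = ['Avg_Petrol_Liters', 'Avg_Diesel_Liters',
        'Avg_Kerosene_Liters', 'Frequency_of_Orders_per_Month']

# Stand-in synthetic data (assumed seed), mirroring the notebook's features
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    cols[0]: rng.normal(30000, 8000, 100),
    cols[1]: rng.normal(50000, 10000, 100),
    cols[2]: rng.normal(10000, 3000, 100),
    cols[3]: rng.integers(2, 10, 100),
})

scaler = StandardScaler()
scaled = scaler.fit_transform(toy[cols])
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)

# Number of pumps assigned to each cluster
cluster_sizes = pd.Series(km.labels_).value_counts().sort_index()

# Centroids live in scaled space; map them back to liters and orders per month
centroids = pd.DataFrame(
    scaler.inverse_transform(km.cluster_centers_), columns=cols).round(1)

print(cluster_sizes)
print(centroids)
```

In the notebook, the equivalent size check would be `df['Cluster'].value_counts()`.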

## Step 7: Visualize Clusters with PCA (Optional Bonus)
We reduce the dimensionality of features to 2D using PCA and visualize the clusters.

In [None]:
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_df)
df['PCA1'] = pca_result[:, 0]
df['PCA2'] = pca_result[:, 1]

plt.figure(figsize=(8,6))
sns.scatterplot(x='PCA1', y='PCA2', hue='Cluster', data=df, palette='Set1', s=100)
plt.title('Customer Segments (Petrol Pumps) - PCA Visualization')
plt.grid(True)
plt.show()
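
A 2D PCA plot shows only the variance captured by the first two components, so it helps to check how much that is before reading cluster separation off the chart. The sketch below prints `explained_variance_ratio_` for stand-in data; the data, seed, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in synthetic data with four independent features (assumed seed)
rng = np.random.default_rng(7)
X = np.column_stack([
    rng.normal(30000, 8000, 100),
    rng.normal(50000, 10000, 100),
    rng.normal(10000, 3000, 100),
    rng.integers(2, 10, 100),
])
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

# Share of total variance captured by each of the two components
ratio = pca.explained_variance_ratio_
retained = ratio.sum()

print("per component:", np.round(ratio, 3))
print("retained in 2D:", round(float(retained), 3))
```

When the features are largely independent, as in this synthetic dataset, two components may retain only part of the total variance, so clusters that overlap in the plot can still be separated in the full feature space.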

## Step 7b: Business Insights from Customer Segmentation
Now that we have clustered the petrol pumps into three segments, let's analyze and interpret these clusters to derive **actionable business insights** for BPCL.

In [None]:
# Calculate cluster-wise averages for better understanding
cluster_summary = df.groupby('Cluster')[[
    'Avg_Petrol_Liters',
    'Avg_Diesel_Liters',
    'Avg_Kerosene_Liters',
    'Frequency_of_Orders_per_Month']].mean().round(2)
cluster_summary
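
The dataset also includes `District` and `Outlet_Type`, which the clustering does not use. One way to profile the segments further is to cross-tabulate cluster labels against these columns, as sketched below. The `toy` frame uses random stand-in labels in place of real KMeans output, so its numbers carry no meaning; only the pattern is of interest.

```python
import numpy as np
import pandas as pd

# Stand-in data: random cluster labels replace real KMeans output (assumption)
rng = np.random.default_rng(1)
n = 100
toy = pd.DataFrame({
    'Cluster': rng.integers(0, 3, n),
    'Outlet_Type': rng.choice(['Company Owned', 'Dealer Owned'], n),
    'District': rng.choice(['Tirunelveli', 'Thoothukudi', 'Kanyakumari'], n),
})

# Share of each outlet type within each cluster (each row sums to 1)
outlet_share = pd.crosstab(toy['Cluster'], toy['Outlet_Type'], normalize='index')

# Raw count of pumps per cluster and district
district_counts = pd.crosstab(toy['Cluster'], toy['District'])

print(outlet_share.round(2))
print(district_counts)
```

In the notebook, the same call would be `pd.crosstab(df['Cluster'], df['Outlet_Type'], normalize='index')`.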

## Step 8: Visualizing Clusters
We'll visualize the customer segments using different approaches:
- **Bar plots** to compare average purchase volumes per cluster
- **PCA scatter plot** to visualize the clusters in 2D space

In [None]:
# Bar plot of average fuel volumes per cluster
# DataFrame.plot creates its own figure, so a separate plt.figure() call
# would only leave an empty extra figure behind
cluster_summary[['Avg_Petrol_Liters', 'Avg_Diesel_Liters', 'Avg_Kerosene_Liters']].plot(
    kind='bar', figsize=(10, 6), width=0.7)
plt.title('Cluster-wise Average Fuel Consumption')
plt.xlabel('Cluster')
plt.ylabel('Liters (Monthly Average)')
plt.grid(True)
plt.show()

In [None]:
# PCA visualization for better cluster understanding

pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_df)
df['PCA1'] = pca_result[:, 0]
df['PCA2'] = pca_result[:, 1]

plt.figure(figsize=(10,7))
sns.scatterplot(x='PCA1', y='PCA2', hue='Cluster', data=df, palette='Set1', s=100)
plt.title('Customer Segments Visualization (PCA)')
plt.grid(True)
plt.show()

## Step 9: Actionable Business Insights
The profiles below are illustrative. Cluster numbering and characteristics depend on the generated data, so check each description against `cluster_summary` before acting on it.
- **Cluster 0 (Frequent Small Buyers):**
    - Moderate purchase volumes
    - High order frequency
    - **Recommendation:** Offer loyalty rewards or subscription programs.
- **Cluster 1 (Bulk Buyers):**
    - Highest fuel volumes, low order frequency
    - **Recommendation:** Offer bulk purchase discounts and automate deliveries.
- **Cluster 2 (Diesel-Focused Clients):**
    - Diesel-heavy demand, moderate petrol and kerosene usage
    - **Recommendation:** Target them with diesel-specific offers and prioritize their diesel logistics.

### Potential Business Impact:
- Improved **logistics planning** by identifying high-frequency pumps
- Better **inventory management** by predicting monthly fuel needs per cluster
- Personalized **marketing strategies** based on purchase behavior

# Final Conclusion
This analysis demonstrates how BPCL could use **customer segmentation** to:
- Optimize **supply chain and delivery schedules**
- Improve **customer retention and satisfaction**
- Enhance **overall revenue** through targeted marketing strategies

