# Customer Segmentation Using Clustering (K-Means)
This project performs customer segmentation with K-Means clustering on the dataset `customer_data.csv`. It includes:
- Data loading and preprocessing
- Elbow Method and Silhouette Score analysis
- K-Means clustering
- PCA-based visualization
- Final recommendations for business actions

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA

In [None]:
df = pd.read_csv("customer_data.csv")
df.head()

In [None]:
print("Dataset shape:", df.shape)
print("Missing values:\n", df.isnull().sum())
print("Duplicates:", df.duplicated().sum())
print("Data types:\n", df.dtypes)
df.describe()
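The checks above only report missing values and duplicates; they do not remove them, and `KMeans` raises a `ValueError` if its input contains NaN. A minimal sketch of dropping such rows follows. It assumes that discarding incomplete rows is acceptable (imputation is an alternative) and uses a small hypothetical DataFrame in place of `customer_data.csv`:

```python
import numpy as np
import pandas as pd

features = ['Age', 'Annual_Income_(k$)', 'Spending_Score']

# Hypothetical stand-in for customer_data.csv: rows 1 and 2 are exact
# duplicates, and row 3 is missing its income value.
df = pd.DataFrame({
    'Age': [19, 35, 35, 52, 23],
    'Annual_Income_(k$)': [15.0, 60.0, 60.0, np.nan, 78.0],
    'Spending_Score': [39, 50, 50, 20, 88],
})

# Drop exact duplicate rows, then any row missing a clustering feature.
clean = df.drop_duplicates().dropna(subset=features).reset_index(drop=True)
print(clean.shape)
```

In the notebook, this would run before the scaling cell as `df = df.drop_duplicates().dropna(subset=features).reset_index(drop=True)`, with `features` defined first. Reassigning `df` keeps the `Cluster` labels assigned later aligned row for row with the customers.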

In [None]:
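# Features used for clustering; standardize them so that no feature dominates
# K-Means' Euclidean distances just because it is measured on a larger scale
features = ['Age', 'Annual_Income_(k$)', 'Spending_Score']
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[features])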
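`StandardScaler` maps each column to z-scores, z = (x - mean) / std, where std is the population standard deviation (ddof=0). The sketch below checks this equivalence on a small toy matrix whose three columns merely stand in for age, income, and spending score:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values, not from customer_data.csv).
X = np.array([
    [19.0, 15.0, 39.0],
    [21.0, 15.0, 81.0],
    [35.0, 60.0, 50.0],
    [52.0, 88.0, 20.0],
])

scaled = StandardScaler().fit_transform(X)

# Manual z-score: np.std defaults to ddof=0, matching StandardScaler.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so each feature carries comparable weight in the distance computations.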

In [None]:
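# Elbow method: fit K-Means for k = 1..10 and record each model's
# within-cluster sum of squares (WCSS)
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42, n_init=10)
    kmeans.fit(scaled_data)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model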

plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.grid(True)
plt.show()
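The elbow plot relies on `inertia_`, which is the sum over all points of the squared Euclidean distance to the centroid of the point's assigned cluster. The sketch below recomputes it by hand. It uses synthetic `make_blobs` data as a stand-in for `scaled_data`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for scaled_data: 300 points around 4 centres, 3 features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)

kmeans = KMeans(n_clusters=4, init='k-means++', random_state=42, n_init=10).fit(X)

# WCSS: squared distance from each point to the centroid of its own cluster.
assigned_centres = kmeans.cluster_centers_[kmeans.labels_]
wcss_manual = ((X - assigned_centres) ** 2).sum()

print(np.isclose(wcss_manual, kmeans.inertia_))
```

Adding clusters generally keeps lowering WCSS, so the minimum is not useful. Instead, the elbow method looks for the k after which further decreases become small.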

In [None]:
# Silhouette analysis: the score needs at least 2 clusters, so k starts at 2
silhouette_scores = []
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42, n_init=10)
    labels = kmeans.fit_predict(scaled_data)
    score = silhouette_score(scaled_data, labels)  # mean silhouette, in [-1, 1]
    silhouette_scores.append(score)

plt.plot(range(2, 11), silhouette_scores, marker='x', color='green')
plt.title('Silhouette Scores')
plt.xlabel('Number of clusters')
plt.ylabel('Score')
plt.grid(True)
plt.show()
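For each point, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here, a(i) is the mean distance to the other members of the point's own cluster, and b(i) is the smallest mean distance to the members of any other cluster. `silhouette_score` averages s(i) over all points. The sketch below reproduces that average by hand on synthetic data, which stands in for `scaled_data`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score

# Synthetic stand-in for scaled_data.
X, _ = make_blobs(n_samples=200, centers=3, n_features=3, random_state=1)
labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(X)

D = pairwise_distances(X)  # Euclidean distance matrix
s = np.zeros(len(X))
for i in range(len(X)):
    same = labels == labels[i]
    if same.sum() == 1:
        continue  # scikit-learn defines s(i) = 0 for singleton clusters
    # a(i): mean distance to the other members of i's own cluster (D[i, i] = 0).
    a = D[i, same].sum() / (same.sum() - 1)
    # b(i): smallest mean distance to the members of any other cluster.
    b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
    s[i] = (b - a) / max(a, b)

print(np.isclose(s.mean(), silhouette_score(X, labels)))
```

Higher values mean tighter, better-separated clusters. In the notebook, the k with the highest score is `list(range(2, 11))[int(np.argmax(silhouette_scores))]`.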

In [None]:
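# Final model: k is chosen from the elbow and silhouette plots above
optimal_clusters = 5  # change if the plots point to a different k
kmeans = KMeans(n_clusters=optimal_clusters, init='k-means++', random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(scaled_data)  # one cluster label per customer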
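To turn cluster numbers into the segments discussed in the recommendations, you need each cluster's profile in the original units. The sketch below shows two common views: per-cluster means from `groupby`, and the centroids mapped back from z-scores with `scaler.inverse_transform`. It builds a synthetic table so that it can run on its own. The column names mirror the notebook, but the values are not real customers:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

features = ['Age', 'Annual_Income_(k$)', 'Spending_Score']

# Synthetic stand-in for the customer table (three made-up groups).
X, _ = make_blobs(n_samples=200, centers=[[25, 30, 80], [45, 80, 20], [60, 50, 50]],
                  cluster_std=5.0, random_state=0)
df = pd.DataFrame(X, columns=features)

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[features])
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(scaled_data)

# Per-cluster size and average feature values in the original units.
profile = df.groupby('Cluster')[features].mean().round(1)
profile['Size'] = df['Cluster'].value_counts().sort_index()
print(profile)

# Centroids mapped back from z-scores to the original units.
centres = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_), columns=features)
print(centres.round(1))
```

In the notebook itself, only the lines that build `profile` and `centres` are needed, because `df`, `features`, `scaler`, and `kmeans` already exist.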

In [None]:
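# Project the standardized features onto 2 principal components for plotting only;
# the clusters themselves were fitted on all three features
pca = PCA(n_components=2)
components = pca.fit_transform(scaled_data)
df['PCA1'] = components[:, 0]
df['PCA2'] = components[:, 1]

plt.figure(figsize=(8, 6))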
sns.scatterplot(data=df, x='PCA1', y='PCA2', hue='Cluster', palette='tab10')
plt.title("Customer Segments (PCA)")
plt.grid(True)
plt.show()
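The PCA scatter plot is a 2-D projection, so how faithfully it shows the clusters depends on how much variance the first two components retain. `explained_variance_ratio_` reports that share for each component. The sketch below uses synthetic data in place of `scaled_data`; in the notebook, the equivalent is `pca.explained_variance_ratio_` on the already-fitted `pca`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled_data (three standardized features).
X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=2)
scaled_data = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(scaled_data)
ratio = pca.explained_variance_ratio_
print("Variance explained by PC1, PC2:", ratio.round(3))
print("Total shown in the 2-D plot:", round(ratio.sum(), 3))

# Keeping all 3 components accounts for all of the variance.
print(np.isclose(PCA().fit(scaled_data).explained_variance_ratio_.sum(), 1.0))
```

If the two-component total is well below 1, clusters that overlap in the plot may still be separated along the discarded component.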

In [None]:
df.to_csv("Customer_Segmentation_Results.csv", index=False)

## Recommendations and Business Insights
Based on the clustering results, match each cluster to one of the segments below by comparing its average Age, Annual_Income_(k$), and Spending_Score:
- **High-Spending Younger Customers**: Target these with premium products and exclusive offers.
- **Low-Spending, High-Income Customers**: Investigate their needs; they may respond to a different engagement strategy.
- **Mid-Spending Clusters**: Offer loyalty programs or discounts to move them toward higher spending.
- **Older Customers**: If they form a separate cluster, consider personalized or senior-specific offerings.
- **Cluster-Based Marketing**: Tailor campaigns to each cluster's age, income, and spending behavior.

Used this way, the segmentation can help improve marketing effectiveness, resource allocation, and customer satisfaction.
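The recommendations assume that each cluster has already been given a segment name. One way to make that mapping explicit is a simple rule that compares each cluster's mean profile with dataset-wide medians. This is only a sketch: the function `label_segment`, its rule order, the median thresholds, and the profile numbers are all hypothetical and not derived from `customer_data.csv`. In the notebook, `profile` would be `df.groupby('Cluster')[features].mean()` and `medians` would be `df[features].median()`.

```python
import pandas as pd

def label_segment(row: pd.Series, ref: pd.Series) -> str:
    """Hypothetical rule: name a cluster by comparing its means with reference medians."""
    high_income = row['Annual_Income_(k$)'] > ref['Annual_Income_(k$)']
    high_spend = row['Spending_Score'] > ref['Spending_Score']
    younger = row['Age'] < ref['Age']
    if high_spend and younger:
        return 'High-Spending Younger'
    if high_income and not high_spend:
        return 'Low-Spending, High-Income'
    if not younger and not high_spend:
        return 'Older'
    return 'Middle-Spending'

# Hypothetical per-cluster means (stand-in for df.groupby('Cluster')[features].mean()).
profile = pd.DataFrame(
    {'Age': [24.0, 42.0, 58.0, 33.0],
     'Annual_Income_(k$)': [45.0, 85.0, 50.0, 55.0],
     'Spending_Score': [78.0, 18.0, 35.0, 45.0]},
    index=pd.Index([0, 1, 2, 3], name='Cluster'),
)
# Hypothetical dataset-wide medians (stand-in for df[features].median()).
medians = pd.Series({'Age': 36.0, 'Annual_Income_(k$)': 60.0, 'Spending_Score': 50.0})

# Extra keyword arguments to DataFrame.apply are forwarded to the function.
profile['Segment'] = profile.apply(label_segment, axis=1, ref=medians)
print(profile['Segment'].tolist())
```

A fixed rule like this keeps segment names reproducible when the model is refitted. The thresholds and rule order should still be checked against the actual cluster profiles.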