# Clustering for Customer Insights (Silhouette Score 0.61)

# Introduction
Customer segmentation is a core technique in data analytics that helps businesses understand customer behaviors and preferences. By dividing customers into distinct groups based on purchasing patterns, loyalty, and demographic factors, businesses can tailor their marketing strategies and improve customer engagement.

In this analysis, we apply K-Means clustering to segment customers on four standardized features: purchase amount, annual income, loyalty score, and purchase frequency. Age is explored in the EDA but is not used as a clustering feature. To evaluate the segmentation, we use the Silhouette Score, which ranges from -1 to 1 and compares how close each customer is to the other members of its own cluster versus the members of the nearest neighboring cluster. With four clusters the Silhouette Score is 0.61, which indicates reasonably well-separated clusters and supports using the segments for targeted marketing and customer retention strategies.

This report presents the results of the segmentation, interpreting the characteristics of each customer group and suggesting actionable marketing strategies based on the insights gained from the clustering process.

# Import Libraries

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# EDA

In [None]:
data = pd.read_csv("/kaggle/input/customer-purchasing-behaviors/Customer Purchasing Behaviors.csv")
data.head()

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.shape

In [None]:
data.isnull().sum()

In [None]:
sns.histplot(data['purchase_amount'], kde=True)
plt.title('Distribution of Purchase Amount')
plt.show()

sns.histplot(data['loyalty_score'], kde=True)
plt.title('Distribution of Loyalty Score')
plt.show()

sns.boxplot(x='region', y='purchase_amount', data=data)
plt.title('Purchase Amount by Region')
plt.show()

In [None]:
corr_matrix = data[['age', 'annual_income', 'purchase_amount', 'purchase_frequency', 'loyalty_score']].corr()

plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()

sns.pairplot(data[['age', 'annual_income', 'purchase_amount', 'purchase_frequency', 'loyalty_score']])
plt.show()

# Data Preprocessing

In [None]:
features = data[['purchase_amount', 'annual_income', 'loyalty_score', 'purchase_frequency']]
scaler = StandardScaler()
scaled_data = scaler.fit_transform(features)
print(scaled_data[:5])
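K-Means relies on Euclidean distance, so a feature on a large scale (such as annual income) would otherwise dominate the clustering. As a quick sanity check on the scaling step, the sketch below confirms that each standardized column has mean 0 and standard deviation 1. It runs on a small synthetic frame (`demo_features`, a hypothetical stand-in with the same column names), not on the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the four clustering features (not the real dataset).
rng = np.random.default_rng(0)
demo_features = pd.DataFrame({
    "purchase_amount": rng.normal(400, 120, 200),
    "annual_income": rng.normal(55_000, 12_000, 200),
    "loyalty_score": rng.uniform(1, 10, 200),
    "purchase_frequency": rng.integers(5, 30, 200),
})

demo_scaled = StandardScaler().fit_transform(demo_features)

# Wrap the array back into a DataFrame so the column names survive for inspection.
demo_scaled_df = pd.DataFrame(demo_scaled, columns=demo_features.columns)

# StandardScaler uses the population standard deviation, hence ddof=0.
print(demo_scaled_df.mean().round(6))
print(demo_scaled_df.std(ddof=0).round(6))
```

The same two `print` calls can be applied to `scaled_data` after wrapping it in a DataFrame with `features.columns`.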

# K-Means Clustering

In [None]:
kmeans = KMeans(n_clusters=4, random_state=42)
data['Cluster'] = kmeans.fit_predict(scaled_data)
print(data.head())

In [None]:
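# Elbow method: fit K-Means for k = 1..10 and record the inertia
# (within-cluster sum of squared distances) for each k.
inertia = []
k_values = range(1, 11)
for k in k_values:
    # Use a separate name so the fitted 4-cluster model in `kmeans` is not overwritten.
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(scaled_data)
    inertia.append(km.inertia_)
plt.plot(k_values, inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()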

In [None]:
silhouette_avg = silhouette_score(scaled_data, data['Cluster'])
print(f"Silhouette Score: {round(silhouette_avg, 2)}")
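The score above is computed only for the chosen k = 4. One way to cross-check that choice against the elbow plot is to compute the Silhouette Score for a range of k values and compare them. The sketch below does this on synthetic data from `make_blobs` (an assumption, used so the example is self-contained); `X_demo`, `scores`, and `best_k` are illustrative names, and `n_init=10` is set explicitly here rather than taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: four blobs in four dimensions, then standardized.
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=4, random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)

# The silhouette score is undefined for a single cluster, so start at k = 2.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Best k by silhouette:", best_k)
```

Replacing `X_demo` with `scaled_data` would apply the same comparison to the customer data.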

In [None]:
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)
data['PCA1'] = pca_data[:, 0]
data['PCA2'] = pca_data[:, 1]

plt.figure(figsize=(10, 6))
sns.scatterplot(x='PCA1', y='PCA2', hue='Cluster', palette='viridis', data=data)
plt.title('Customer Segmentation with PCA')
plt.show()
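The scatter plot shows only the first two principal components, so how faithfully it represents the four-dimensional clusters depends on how much variance those components retain. A minimal sketch of that check, again on synthetic `make_blobs` data rather than the real dataset (`X_demo` and `pca_demo` are illustrative names):

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized feature matrix.
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=4, random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=2)
pca_demo.fit(X_demo)

# Fraction of total variance captured by each retained component.
ratios = pca_demo.explained_variance_ratio_
print("Variance explained per component:", ratios.round(3))
print("Total variance retained in 2D:", round(float(ratios.sum()), 3))
```

On the notebook's own objects, the equivalent check is `pca.explained_variance_ratio_` after the `fit_transform` call.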


# Conclusion

Cluster 0 (Purple): Likely represents low spenders or infrequent buyers. Consider strategies that encourage more purchases and build loyalty.

Cluster 1 (Blue): A balanced segment where engagement can be nurtured. Personalized offers or incentives might increase retention and spending.

Cluster 2 (Green): High engagement with moderate spending. Marketing can aim to increase purchase frequency or encourage upgrades to higher-end products.

Cluster 3 (Yellow): High-value customers who are loyal and spend more. Focus on retaining them with premium services or rewards.

By understanding the behavior of each cluster, businesses can tailor their strategies to enhance customer satisfaction, loyalty, and overall sales.
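The cluster descriptions above can be checked against the data by summarizing each cluster's average feature values and size. The sketch below shows one way to build such a profile table. It uses randomly generated data in a hypothetical `demo` frame with the same column names, so its numbers say nothing about the real customers.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feature_cols = ["purchase_amount", "annual_income", "loyalty_score", "purchase_frequency"]

# Hypothetical data with the notebook's column names (not the real dataset).
rng = np.random.default_rng(42)
demo = pd.DataFrame(
    rng.normal(size=(200, 4)) * [120, 12_000, 2, 5] + [400, 55_000, 6, 18],
    columns=feature_cols,
)

# Same pipeline as the notebook: standardize, then assign 4 K-Means clusters.
demo_scaled = StandardScaler().fit_transform(demo[feature_cols])
demo["Cluster"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(demo_scaled)

# Mean of each feature per cluster, in original units, plus cluster size.
profile = demo.groupby("Cluster")[feature_cols].mean().round(2)
profile["n_customers"] = demo.groupby("Cluster").size()
print(profile)
```

Running the last three lines on `data` instead of `demo` gives per-cluster means that can confirm or correct labels such as "low spenders" or "high-value customers".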