**Programmer: python_scripts (Abhijith Warrier)**

**PYTHON SCRIPT TO _SEGMENT CUSTOMERS INTO MEANINGFUL GROUPS USING UNSUPERVISED LEARNING (KMEANS CLUSTERING)_. 🧠🧩📊**

This script demonstrates how machine learning is used in **marketing, retail, and product analytics** to group customers based on behavioral or demographic similarity. We use **KMeans clustering** to uncover natural customer segments without labeled data.

---

## **📦 Install Required Packages**

**Install core data science and ML libraries.**

In [None]:
pip install pandas numpy scikit-learn matplotlib

---

## **🧩 Load or Create Customer Data**

**We assume a customer dataset with spending and income features.**

In [1]:
import pandas as pd

df = pd.read_csv("datasets/customers.csv")
df.head()

| | customer_id | annual_income | spending_score | avg_order_value | purchase_frequency |
|---|---|---|---|---|---|
| 0 | C001 | 15000 | 22 | 450 | 3 |
| 1 | C002 | 18000 | 28 | 480 | 4 |
| 2 | C003 | 20000 | 35 | 520 | 5 |
| 3 | C004 | 22000 | 40 | 560 | 6 |
| 4 | C005 | 25000 | 45 | 600 | 7 |


Typical features include:

- annual income
- spending score
- purchase frequency
- average order value

---

## **🔍 Basic Data Inspection**

**Understand feature ranges and data types.**

In [2]:
print(df.info())
print(df.describe())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   customer_id         20 non-null     object
 1   annual_income       20 non-null     int64 
 2   spending_score      20 non-null     int64 
 3   avg_order_value     20 non-null     int64 
 4   purchase_frequency  20 non-null     int64 
dtypes: int64(4), object(1)
memory usage: 932.0+ bytes
None
       annual_income  spending_score  avg_order_value  purchase_frequency
count      20.000000       20.000000        20.000000           20.000000
mean    58750.000000       56.500000      1180.500000            9.500000
std     29563.268461       22.746891       517.091817            5.052357
min     15000.000000       22.000000       450.000000            3.000000
25%     36250.000000       38.750000       825.000000            5.750000
50%     60000.000000       52.500000      1150.000000     

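The summary exposes large scale differences: `annual_income` averages 58,750 while `purchase_frequency` averages 9.5. It is also a quick way to spot outliers.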

---

## **📏 Feature Scaling**

**KMeans is distance-based, so features on different scales should be standardized before clustering.**

In [3]:
from sklearn.preprocessing import StandardScaler

# Drop non-numeric identifier
X = df.drop("customer_id", axis=1)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Without scaling, high-magnitude features such as `annual_income` would dominate the Euclidean distances that KMeans minimizes.
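
To make the effect concrete, the sketch below measures how much of the squared Euclidean distance between two customers comes from income, before and after standardization. The three rows are made-up illustrative values, not taken from `customers.csv`. The z-scoring is done by hand with NumPy using the population standard deviation, which is what `StandardScaler` computes by default.

```python
import numpy as np

# Hypothetical customers (illustrative values only): [annual_income, spending_score]
customers = np.array([
    [20000.0, 20.0],   # A
    [21000.0, 90.0],   # B: similar income to A, very different spending score
    [23000.0, 22.0],   # C
])


def income_share(a, b):
    """Fraction of the squared Euclidean distance between a and b due to income."""
    sq_diff = (a - b) ** 2
    return sq_diff[0] / sq_diff.sum()


# Standardize each column: subtract the mean, divide by the population std (ddof=0)
scaled = (customers - customers.mean(axis=0)) / customers.std(axis=0)

raw_share = income_share(customers[0], customers[1])
scaled_share = income_share(scaled[0], scaled[1])

print(f"Income share of A-B distance, raw:    {raw_share:.1%}")
print(f"Income share of A-B distance, scaled: {scaled_share:.1%}")
```

On the raw values, income accounts for more than 99% of the distance between A and B, even though the two customers differ far more in spending behavior. After standardization the spending score carries most of the distance.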

---

## **🔢 Choosing Number of Clusters**

**Select the number of customer segments.**

In [4]:
from sklearn.cluster import KMeans

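kmeans = KMeans(
    n_clusters=4,      # number of customer segments to find
    n_init=10,         # set explicitly, since the default differs across scikit-learn versions
    random_state=42    # fixed seed for reproducible centroid initialization
)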

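Here `n_clusters=4` is fixed up front. In practice, the elbow method (inertia versus k) or the silhouette score is commonly used to choose the number of clusters.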
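
A minimal sketch of how both heuristics could be computed is shown below. It runs on synthetic stand-in data (four made-up groups of income and spending-score points) rather than `customers.csv`, and the range of k values tried is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): four loose groups of
# [annual_income, spending_score] points, 15 customers each.
rng = np.random.default_rng(42)
group_centers = np.array([[20000, 25], [45000, 75], [80000, 30], [100000, 85]])
X_demo = np.vstack([
    center + rng.normal(scale=[3000, 5], size=(15, 2))
    for center in group_centers
])
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Fit KMeans for a range of k and record inertia (elbow) and silhouette score
scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_demo_scaled)
    scores[k] = (model.inertia_, silhouette_score(X_demo_scaled, labels))

for k, (inertia, silhouette) in scores.items():
    print(f"k={k}  inertia={inertia:8.2f}  silhouette={silhouette:.3f}")

# One common heuristic: pick the k with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k][1])
print("Highest silhouette at k =", best_k)
```

Inertia tends to keep falling as k grows, so the elbow method looks for the point where the drop flattens out. The silhouette score instead rewards clusters that are both compact and well separated. Neither is definitive, and the chosen k should still make business sense.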

---

## **🧠 Apply KMeans Clustering**

**Assign each customer to a segment.**

In [5]:
df["segment"] = kmeans.fit_predict(X_scaled)

Each segment represents a group of similar customers.

---

## **📊 Visualizing Customer Segments**

**Plot customer clusters for interpretation.**

In [None]:
import matplotlib.pyplot as plt

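# Plot two of the clustering features, colored by assigned segment.
# (Positional columns 0 and 1 would be customer_id and annual_income.)
plt.scatter(
    df["annual_income"],
    df["spending_score"],
    c=df["segment"]
)
plt.xlabel("Annual income")
plt.ylabel("Spending score")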
plt.title("Customer Segmentation using KMeans")
plt.show()

Visualization helps translate clusters into business insight.
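
A scatter plot shows only two dimensions, while the clustering here uses four features. One option, not part of the original script, is to project the scaled data onto two principal components with PCA and plot those instead. The sketch below uses randomly generated stand-in data, so the numbers it prints say nothing about the real customers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four customer features (assumption):
# 20 rows of [annual_income, spending_score, avg_order_value, purchase_frequency]
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4)) * [30000, 20, 500, 5] + [60000, 55, 1200, 10]
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Project the 4-dimensional scaled data onto its first two principal components
pca = PCA(n_components=2)
coords = pca.fit_transform(X_demo_scaled)

print("Projected shape:", coords.shape)
print("Variance explained per component:", pca.explained_variance_ratio_.round(3))

# coords[:, 0] and coords[:, 1] could then be passed to plt.scatter(...)
# with c=df["segment"], in place of two raw feature columns.
```

The explained variance ratio indicates how much of the spread in the data the 2D view retains. If it is low, clusters that look overlapping in the plot may still be well separated in the full feature space.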

---

## **🧪 Interpreting the Segments**

Typical customer segments may represent:

- high-value customers
- frequent but low-spending customers
- occasional high spenders
- low engagement customers

These insights guide **targeted marketing and personalization**.
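
One way to decide which description fits which cluster is to compare per-segment feature averages in the original units. The sketch below does this on nine made-up customers (illustrative values only, not from `customers.csv`) with three clusters. The column names match the dataset above, but the `demo` and `profile` names are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers (illustrative values only, not from customers.csv)
demo = pd.DataFrame({
    "annual_income":      [15000, 18000, 20000, 60000, 62000, 65000, 95000, 98000, 100000],
    "spending_score":     [22, 28, 25, 55, 52, 58, 90, 88, 85],
    "avg_order_value":    [450, 480, 520, 1150, 1200, 1250, 1900, 1950, 2000],
    "purchase_frequency": [3, 4, 5, 9, 10, 11, 16, 17, 18],
})
features = list(demo.columns)  # captured before the segment column is added

# Scale, cluster, and attach the segment label to each customer
X_demo_scaled = StandardScaler().fit_transform(demo[features])
kmeans_demo = KMeans(n_clusters=3, n_init=10, random_state=42)
demo["segment"] = kmeans_demo.fit_predict(X_demo_scaled)

# Average feature values per segment (original units) plus segment size
profile = demo.groupby("segment")[features].mean().round(1)
profile["n_customers"] = demo.groupby("segment").size()

print(profile)
```

The segment numbers themselves are arbitrary labels. The business names (for example "high-value") come from reading the averages in each row of the profile table.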

---

## **Key Takeaways**

1. Customer segmentation is a core unsupervised ML use case.
2. KMeans groups customers based on similarity, not labels.
3. Feature scaling is essential for distance-based algorithms.
4. Clusters reveal actionable business insights.
5. Segmentation enables personalization and smarter decision-making.

---