
# Customer Segmentation Analysis

## Introduction
This notebook walks through customer segmentation with K-means clustering. The goal is to identify distinct groups of customers based on their purchasing behavior.

## Data Loading
Load the necessary libraries and the dataset to begin the analysis.

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset
data = pd.read_csv('data.csv', encoding='ISO-8859-1')
data.head()

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |



## Data Preprocessing
The next step is to clean and preprocess the data to make it suitable for clustering.

In [2]:

# Data cleaning: drop rows that cannot be tied to a customer and fix data types
data.dropna(subset=['CustomerID'], inplace=True)  # Remove rows with a missing CustomerID
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])  # Convert InvoiceDate strings to datetime

# Display cleaned data
data.head()

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
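
The stated goal is to segment customers, but each row above is a single invoice line. One option is to aggregate the rows to one record per customer before clustering. The sketch below runs on a small made-up DataFrame that reuses the notebook's column names. The helper name `build_customer_features`, the `TotalPrice` column, and the decision to drop rows with a non-positive `Quantity` or `UnitPrice` are assumptions for illustration, not steps taken in this notebook.

```python
import pandas as pd


def build_customer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate invoice lines to one row per customer (hypothetical helper)."""
    # Assumption: rows with non-positive Quantity or UnitPrice are not regular purchases.
    valid = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()
    # Line total for each invoice row.
    valid["TotalPrice"] = valid["Quantity"] * valid["UnitPrice"]
    # One row per customer: distinct invoices, units bought, money spent.
    return valid.groupby("CustomerID").agg(
        n_invoices=("InvoiceNo", "nunique"),
        total_quantity=("Quantity", "sum"),
        total_spend=("TotalPrice", "sum"),
    )


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2"],
    "Quantity": [6, 8, 2, 10, -10],
    "UnitPrice": [2.55, 2.75, 3.39, 1.00, 1.00],
    "CustomerID": [17850.0, 17850.0, 17850.0, 13047.0, 13047.0],
})

customer_features = build_customer_features(toy)
print(customer_features)
```

A table like `customer_features` could then be scaled and clustered in place of the raw `Quantity` and `UnitPrice` columns, so that each cluster label describes a customer rather than a transaction.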



## Exploratory Data Analysis (EDA)
Plot the pairwise distributions of Quantity and UnitPrice to check their spread, skew, and outliers before clustering.

In [None]:

# EDA - Plotting distributions and correlations
sns.pairplot(data[['Quantity', 'UnitPrice']])
plt.show()
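
A numeric summary can complement the pair plot, since K-means relies on Euclidean distance and is sensitive to skewed features and outliers. The sketch below uses made-up values in place of the real dataset and computes summary statistics, skewness, and a rank-based (Spearman) correlation with standard pandas calls. Using Spearman rather than Pearson is an assumption made here because it is less affected by extreme values.

```python
import pandas as pd

# Made-up values for illustration only; one large Quantity acts as an outlier.
toy = pd.DataFrame({
    "Quantity": [1, 2, 2, 3, 100],
    "UnitPrice": [2.5, 3.0, 1.5, 4.0, 0.5],
})

# Count, mean, standard deviation, min, quartiles, and max per column.
summary = toy.describe()

# Sample skewness per column; large positive values indicate a long right tail.
skewness = toy.skew()

# Rank-based correlation between the two features.
correlation = toy.corr(method="spearman")

print(summary)
print(skewness)
print(correlation)
```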


## Model Building
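Standardize Quantity and UnitPrice, then apply K-means with three clusters. Each row is an invoice line, so the resulting clusters group individual transactions rather than whole customers.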

In [None]:

# Scaling the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data[['Quantity', 'UnitPrice']])

# K-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # n_init set explicitly; its default differs across scikit-learn versions
kmeans.fit(scaled_data)

# Assign clusters back to our DataFrame
data['Cluster'] = kmeans.labels_
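
The cell above fixes the number of clusters at three without showing how that value was chosen. One common way to check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data with three well-separated groups. The synthetic points, the range of k values, and the variable names are assumptions for illustration and say nothing about the best k for the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated 2-D blobs (illustration only).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Scale features as in the notebook so both contribute equally to distances.
scaled = StandardScaler().fit_transform(points)

# Fit K-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

Computing the silhouette score compares every pair of points, so on a large table it may be more practical to pass the `sample_size` argument of `silhouette_score` and score a random subset.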


## Evaluation
Evaluate the segmentation by examining the characteristics of each cluster.

In [None]:

# Visualizing the clusters
plt.figure(figsize=(10, 6))
# Plot each cluster in its own color; legend labels are 1-based
for cluster_id, color in enumerate(['red', 'blue', 'green']):
    subset = data[data['Cluster'] == cluster_id]
    plt.scatter(subset['Quantity'], subset['UnitPrice'], s=50, c=color, label=f'Cluster {cluster_id + 1}')
plt.title('Clusters by Quantity and Unit Price')
plt.xlabel('Quantity')
plt.ylabel('Unit Price')
plt.legend()
plt.show()
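
Examining the characteristics of each cluster can also be done numerically, by summarizing the features per cluster label. The sketch below uses a made-up DataFrame with a `Cluster` column like the one assigned earlier. The values and the chosen statistics (row count, mean quantity, median unit price) are assumptions for illustration, not results from this notebook.

```python
import pandas as pd

# Made-up rows with cluster labels, for illustration only.
toy = pd.DataFrame({
    "Quantity": [6, 8, 6, 120, 150, 1],
    "UnitPrice": [2.55, 2.75, 3.39, 0.40, 0.35, 95.00],
    "Cluster": [0, 0, 0, 1, 1, 2],
})

# One row per cluster: size plus a typical Quantity and UnitPrice.
profile = toy.groupby("Cluster").agg(
    n_rows=("Quantity", "size"),
    mean_quantity=("Quantity", "mean"),
    median_unit_price=("UnitPrice", "median"),
)
print(profile)
```

A table like this makes it easier to describe each segment in words, for example a cluster of large orders at low unit prices versus a cluster of single high-priced items.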