# Clustering for Customer Segmentation with KMeans


## Key Takeaways
In this module, students will gain a comprehensive understanding of KMeans clustering and its applications in customer segmentation. 

__They will learn how to:__
- choose the optimal number of clusters, 
- visualize cluster results, and 
- apply clustering techniques to real-world datasets. 

Through practical exercises and projects, students will develop the skills necessary to leverage clustering for data-driven decision-making in various domains.

## Introduction to Clustering
- Definition of Clustering
- Unsupervised Learning vs. Supervised Learning
- Applications of Clustering in Data Science
- Types of Clustering Algorithms: Partitioning, Hierarchical, Density-Based, Model-Based


## KMeans Algorithm

__Overview of KMeans Clustering__

___How KMeans Works:___
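1. Initialization: Choose the number of clusters k and select k initial cluster centroids, for example by picking k data points at random
2. Assignment: Assign each data point to the nearest cluster centroid
3. Update: Recalculate each cluster centroid as the mean of the data points assigned to it
4. Repeat the Assignment and Update steps until convergence, that is, until the centroids (and therefore the assignments) stop changing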
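
The steps above can be sketched in plain NumPy. This is only an illustration under assumptions: the function name `kmeans_sketch`, its parameters, and the toy data are made up for this example, and it is not how scikit-learn's `KMeans` is implemented internally.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Illustrative KMeans loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assignment: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. Update: mean of each cluster's points (keep the old centroid if a cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs (made-up data, 50 points each)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans_sketch(X, k=2)
print(centroids)
```

In practice `sklearn.cluster.KMeans` is the better choice, since it adds smarter initialization and multiple restarts.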

__Determining the Optimal Number of Clusters:__
1. Elbow Method
2. Silhouette Score

## Choosing the Optimal Number of Clusters

__Elbow Method:__
- Explanation of Elbow Method
- Plotting the Within-Cluster Sum of Squares (WCSS) against the number of clusters
- Identifying the "elbow" point where the rate of decrease in WCSS slows down
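
A minimal sketch of the elbow computation, assuming synthetic data from `make_blobs` as a stand-in for scaled customer features (the dataset and the range of k are assumptions for illustration). scikit-learn exposes WCSS as the fitted model's `inertia_` attribute.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for scaled customer features (assumption, not real data)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

k_values = range(1, 9)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)  # inertia_ is the within-cluster sum of squares

for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

Plotting `wcss` against `list(k_values)` with `plt.plot(..., marker="o")` gives the elbow curve; the elbow is read off by eye, so it is a heuristic rather than an exact rule.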

__Silhouette Score:__
- Explanation of Silhouette Score
- Calculating the Silhouette Score for different numbers of clusters
- Choosing the number of clusters with the highest Silhouette Score
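
The silhouette comparison could look roughly like this, again on synthetic `make_blobs` data (an assumption for illustration). The silhouette score ranges from -1 to 1 and is only defined for at least two clusters, so the loop starts at k=2.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled customer features (assumption, not real data)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

scores = {}
for k in range(2, 9):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Candidate k with the highest average silhouette score
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```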

## Visualizing Clusters

__Techniques for Visualizing Cluster Results:__
- Scatter plots with cluster centroids
- Cluster profiles: Mean feature values for each cluster
- Cluster heatmaps: Visualizing cluster characteristics
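
As one way to build a cluster profile, the sketch below groups a small made-up customer table by cluster label and takes the mean of each feature. The column names `age` and `annual_spend` and all values are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer table (hypothetical columns and values)
customers = pd.DataFrame({
    "age": [22, 25, 24, 47, 52, 50, 33, 35],
    "annual_spend": [300, 350, 280, 1200, 1500, 1350, 700, 760],
})

# Scale first so both features contribute comparably to the distances
scaled = StandardScaler().fit_transform(customers)
customers["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)

# Cluster profile: mean of each original (unscaled) feature per cluster
profile = customers.groupby("cluster").mean()
print(profile)
```

The same `profile` table can be passed to a heatmap function to compare clusters feature by feature.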

__Interpreting and Analyzing Cluster Results:__
- Identifying distinct customer segments or groups
- Understanding the characteristics and behaviors of each cluster
- Extracting insights for business decisions and marketing strategies

## Applications in Market Segmentation

__How data science helps:__

- Businesses analyze customer data to create targeted marketing strategies that cater to specific groups (segments), improving engagement and loyalty.

__Types of customer segmentation features:__

- Demographic, Geographic, Psychographic, and Behavioural.

__Demographic__ - grouping is based on demographic variables such as age, gender, income, occupation, and education level. 

__Geographic__ - customers are grouped by location, which could be as broad as a country or as specific as a neighborhood. This helps in tailoring marketing campaigns that are culturally and regionally relevant. 

__Psychographic__ - grouping is based on lifestyle, values, attitudes, and personality traits. 

__Behavioural__ - customers are divided based on their behaviour patterns related to the business, such as purchase history, product usage frequency, brand loyalty, and user status (new, potential, or loyal customers).

__What data do I regularly segment on for Email Marketing Segmentation?__ 

Recency, Frequency, Monetary (RFM) features, time on list, time since last purchase, spend in the last 30 days, products purchased, interests (what they are clicking on), events attended, email scoring, product pages clicked (and which ones), geographic region, number of tags, number of events, and many more.
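
To make the RFM features concrete, here is a small pandas sketch. The transaction table, its column names (`customer_id`, `order_date`, `amount`), and the snapshot date are all hypothetical and stand in for whatever the real order data looks like.

```python
import pandas as pd

# Hypothetical transaction log (made-up rows and column names)
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-03-01", "2024-02-10",
         "2024-01-20", "2024-02-25", "2024-03-10"]
    ),
    "amount": [50.0, 30.0, 120.0, 20.0, 25.0, 40.0],
})
snapshot = pd.Timestamp("2024-03-15")  # reference date for recency (assumption)

rfm = orders.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                                 # number of orders
    monetary=("amount", "sum"),                                        # total spend
)
print(rfm)
```

These three columns are usually scaled before being passed to KMeans, because their units differ widely.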

__Algorithms used:__

- KMeans - a great tool for grouping similar customers.


## Libraries

### Data Manipulation

In [15]:
import pandas as pd
#from pandas_profiling import ProfileReport

### Visualization

In [12]:
import matplotlib.pyplot as plt
import plotly.express as px

### Date Manipulation

In [5]:
from datetime import date, datetime, timedelta

### Clustering

In [6]:
from sklearn.cluster import KMeans

### For Category Features

In [7]:
from category_encoders import OneHotEncoder

### For Scaling Features

In [8]:
from sklearn.preprocessing import StandardScaler

### Model Pipeline

In [9]:
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
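
One way these imports might fit together is sketched below. The step order (scale, reduce with PCA, then cluster), the parameter values, and the random placeholder data are assumptions for illustration, not necessarily the pipeline used later in this module.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder numeric feature matrix (random values, 100 customers x 5 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Scale features, project to 2 components, then cluster
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2, random_state=42),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
labels = model.fit_predict(X)
print(labels[:10])
```

Keeping the scaler inside the pipeline means new customers are scaled with the same statistics when `model.predict` is called on them.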

### Evaluation Metric

In [10]:
from sklearn.metrics import silhouette_score

### Warnings

In [11]:
import warnings as wa