# Customer Segmentation
Customer segmentation models are often used for dividing a company’s clients into different user groups. Customers in each group display shared characteristics that distinguish them from other users.

https://365datascience.com/tutorials/python-tutorials/build-customer-segmentation-models/
https://docs.google.com/spreadsheets/d/18TBSlm0Sfxuqk-ccMAt2Vn1N3KFzTyPyDTxp8qT_h7w/edit#gid=0

Here is a simple example of how companies use customer segmentation to drive sales. The dataset used later in this tutorial contains transaction information from around 4,000 customers.

Every time I visit an e-commerce site, I look for items that are on sale to add to my cart. If I want to buy an item of clothing and it isn’t currently on sale, I wait until I see a special offer before making a purchase.

Data scientists at e-commerce companies often build customer segmentation models to identify shared traits amongst their customers. After building such a model, they notice that there are a handful of customers like me who always wait for a special offer before making purchases.

They classify us into a segment called “thrifty shoppers.”

Every time a new promotion is released, the company’s marketing team sends me and every other “thrifty shopper” a curated advertisement highlighting product affordability.

Whenever I get notified of a special discount, I rush to purchase all the items I require before the promotion ends, which increases the company’s sales.

Similarly, all the platform’s customers are grouped into different segments and sent targeted promotions based on their purchase behavior.

The example above demonstrates how customer segmentation models add value to organizations.

Data scientists usually build customer segmentation models using unsupervised machine learning algorithms such as K-Means clustering or hierarchical clustering. These models can pick up on similarities between user groups that often go unnoticed by the human eye.
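To make the idea of K-Means concrete, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in NumPy. The function name `kmeans`, its parameters, and the toy feature values are assumptions made for illustration; they are not part of the dataset or of any library API.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated groups described by two made-up features.
toy = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
                [8.0, 9.0], [8.5, 9.5], [7.8, 8.7]])
centroids, labels = kmeans(toy, k=2)
print(labels)
```

Which group receives label 0 depends on the random initialization; only the grouping itself is meaningful.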

In this article, I will show you how to build a customer segmentation model in Python. You will learn to prepare data for customer segmentation and to build a K-Means model from scratch. We will also look at how RFM (recency, frequency, monetary value) is used in marketing to analyze customer value, and explore metrics for evaluating the performance of a clustering algorithm. Finally, we’ll look at how to visualize and interpret clusters for customer segmentation.


## Step 1: Prerequisites for Building a Customer Segmentation Model

In [3]:
# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Other Libraries


In [9]:
# Data
sheet_id = '18TBSlm0Sfxuqk-ccMAt2Vn1N3KFzTyPyDTxp8qT_h7w'
sheet_name = 'rfmCustomers'
url1 = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
print(url1)
df = pd.read_csv(url1, encoding='unicode_escape')
df.shape
df2 = df.copy()  # make a copy in case required later

https://docs.google.com/spreadsheets/d/18TBSlm0Sfxuqk-ccMAt2Vn1N3KFzTyPyDTxp8qT_h7w/gviz/tq?tqx=out:csv&sheet=rfmCustomers


In [34]:
#df = df2.copy()

## Step 2: Understand the Segmentation Data

In [35]:
df.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365.0,,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 08:26,2.55,17850.0,United Kingdom
1,536365.0,71053.0,WHITE METAL LANTERN,6,12/1/2010 08:26,3.39,17850.0,United Kingdom
2,536365.0,,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 08:26,2.75,17850.0,United Kingdom
3,536365.0,,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 08:26,3.39,17850.0,United Kingdom
4,536365.0,,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 08:26,3.39,17850.0,United Kingdom


In [36]:
df.columns

Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='object')

- InvoiceNo: The unique identifier of each customer invoice.
- StockCode: The unique identifier of each item in stock.
- Description: The item purchased by the customer.
- Quantity: The number of each item purchased by a customer in a single invoice.
- InvoiceDate: The purchase date.
- UnitPrice: Price of one unit of each item.
- CustomerID: Unique identifier assigned to each user.
- Country: The country from where the purchase was made.

With the transaction data above, we need to build different customer segments based on each user’s purchase behavior.

## Step 3: Preprocessing Data for Segmentation

The informative features in this dataset that tell us about customer buying behavior are “Quantity”, “InvoiceDate”, and “UnitPrice”. Using these variables, we are going to derive each customer’s RFM profile: Recency, Frequency, and Monetary Value.

RFM is commonly used in marketing to evaluate a client’s value based on their:
- Recency: How recently have they made a purchase?
- Frequency: How often have they bought something?
- Monetary Value: How much money do they spend on average when making purchases?

With the variables in this e-commerce transaction dataset, we will calculate each customer’s recency, frequency, and monetary value. These RFM values will then be used to build the segmentation model.
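As a rough illustration of the frequency and monetary value definitions above (recency is worked through in its own subsection), the sketch below aggregates a small, made-up transaction table per customer. The toy rows, the helper column `Amount`, and the output column names are assumptions; only `CustomerID`, `InvoiceNo`, `Quantity`, and `UnitPrice` come from the dataset. Counting distinct invoices as "frequency" is one reasonable reading of the definition, not the only one.

```python
import pandas as pd

# Hypothetical toy transactions that reuse the dataset's column names.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "InvoiceNo": [101, 102, 201, 201, 202, 301],
    "Quantity": [6, 2, 1, 3, 4, 10],
    "UnitPrice": [2.55, 3.39, 10.0, 1.5, 2.0, 0.5],
})

# Amount spent on each invoice line (an assumed helper column).
tx["Amount"] = tx["Quantity"] * tx["UnitPrice"]

rfm = tx.groupby("CustomerID").agg(
    frequency=("InvoiceNo", "nunique"),  # number of distinct invoices
    total_spent=("Amount", "sum"),       # total spend across all invoice lines
)

# Monetary value as the average spend per invoice.
rfm["monetary_value"] = rfm["total_spent"] / rfm["frequency"]
print(rfm)
```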

In [38]:
# Missing values
df.isnull().sum()
df.shape

(541909, 8)

In [40]:
# Drop rows where CustomerID or InvoiceDate is missing
df.dropna(subset=['CustomerID', 'InvoiceDate'], inplace=True)
print(df.isnull().sum())
print(df.shape)

InvoiceNo       4133
StockCode      15334
Description        0
Quantity           0
InvoiceDate        0
UnitPrice          0
CustomerID         0
Country            0
Date               0
dtype: int64
(172782, 9)


### Recency
To identify a customer’s recency, we need to pinpoint when each user was last seen making a purchase. The idea is to keep only the rows with the most recent purchase date for each customer, then rank every customer by when they last bought something and assign each one a recency score.
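One way to keep each customer's most recent purchase is to compare every row against that customer's latest date. The sketch below does this on a made-up table and then scores recency as whole days counted forward from the earliest retained date, so larger values mean a more recent purchase. The toy rows and the names `purchases`, `latest`, and `last_seen` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical purchases for three customers.
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "Date": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-02 09:00",
        "2010-12-07 14:57", "2011-10-31 12:25",
        "2011-12-09 12:16",
    ]),
})

# Latest purchase date per customer, aligned to the original rows.
latest = purchases.groupby("CustomerID")["Date"].transform("max")

# Keep only the rows that carry each customer's latest purchase date.
last_seen = purchases[purchases["Date"] == latest].copy()

# Recency score: whole days since the earliest retained date (higher = more recent).
last_seen["recency"] = (last_seen["Date"] - last_seen["Date"].min()).dt.days
print(last_seen.sort_values("recency"))
```

Using `.copy()` on the filtered frame avoids the chained-assignment warning when the new column is added.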


In [46]:
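# convert date column to datetime format
df['Date'] = pd.to_datetime(df['InvoiceDate'])
# rank each customer's purchases by date and keep the rows with rank 1
# note: rank() is ascending by default, so rank 1 is each customer's EARLIEST purchase;
# pass ascending=False to rank() to select the most recent purchase instead
df['rank'] = df.sort_values(['CustomerID','Date']).groupby(['CustomerID'])['Date'].rank(method='min').astype(int)
df_rec = df[df['rank']==1]
df_rec.head()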

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,Date,rank
0,536365.0,,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 08:26,2.55,17850.0,United Kingdom,2010-12-01 08:26:00,1
1,536365.0,71053.0,WHITE METAL LANTERN,6,12/1/2010 08:26,3.39,17850.0,United Kingdom,2010-12-01 08:26:00,1
2,536365.0,,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 08:26,2.75,17850.0,United Kingdom,2010-12-01 08:26:00,1
3,536365.0,,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 08:26,3.39,17850.0,United Kingdom,2010-12-01 08:26:00,1
4,536365.0,,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 08:26,3.39,17850.0,United Kingdom,2010-12-01 08:26:00,1


In [52]:
pd.options.mode.chained_assignment = None  # default='warn' Remove warning

In [53]:
# Assign a recency score to each row kept in df_rec:
# the number of whole days between the earliest date in df_rec and the row's date
df_rec.loc[:,'recency'] = (df_rec.loc[:,'Date'] - df_rec.loc[:,'Date'].min()).dt.days
print(df_rec.head())
print(df_rec.tail())
# The dataframe now has a new column called "recency":
# 0 marks the earliest date in the data, and a value of x marks a purchase made x days after it,
# so larger values mean more recent purchases

   InvoiceNo  StockCode                          Description  Quantity  \
0   536365.0        NaN   WHITE HANGING HEART T-LIGHT HOLDER         6   
1   536365.0    71053.0                  WHITE METAL LANTERN         6   
2   536365.0        NaN       CREAM CUPID HEARTS COAT HANGER         8   
3   536365.0        NaN  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4   536365.0        NaN       RED WOOLLY HOTTIE WHITE HEART.         6   

       InvoiceDate  UnitPrice  CustomerID         Country                Date  \
0  12/1/2010 08:26       2.55     17850.0  United Kingdom 2010-12-01 08:26:00   
1  12/1/2010 08:26       3.39     17850.0  United Kingdom 2010-12-01 08:26:00   
2  12/1/2010 08:26       2.75     17850.0  United Kingdom 2010-12-01 08:26:00   
3  12/1/2010 08:26       3.39     17850.0  United Kingdom 2010-12-01 08:26:00   
4  12/1/2010 08:26       3.39     17850.0  United Kingdom 2010-12-01 08:26:00   

   rank  recency  
0     1        0  
1     1        0  
2     1    

In [88]:
df1 = df_rec.sort_values(['recency','CustomerID'],ascending=False).groupby(by=['CustomerID','recency'], sort=True, dropna=True).count().reset_index()
print(df1.head()) # df1.sort_values(['recency','CustomerID'])
print(type(df1))

   CustomerID  recency  InvoiceNo  StockCode  Description  Quantity  \
0     12347.0        6         31         25           31        31   
1     12348.0      125          5          4            5         5   
2     12350.0       63         17         13           17        17   
3     12352.0       90          5          4            5         5   
4     12355.0      159         13          9           13        13   

   InvoiceDate  UnitPrice  Country  Date  rank  
0           31         31       31    31    31  
1            5          5        5     5     5  
2           17         17       17    17    17  
3            5          5        5     5     5  
4           13         13       13    13    13  
<class 'pandas.core.frame.DataFrame'>



In [87]:
# df1 is already a DataFrame (see type(df1) above), so no to_frame() conversion is needed.
# to_frame() exists on Series and Index objects, not on DataFrame.
#df1.query('recency==0')

In [48]:
print(df_rec.tail(1))

        InvoiceNo  StockCode                 Description  Quantity  \
541805   581578.0    22736.0  RIBBON REEL MAKING SNOWMEN        10   

            InvoiceDate  UnitPrice  CustomerID  Country                Date  \
541805  12/9/2011 12:16       1.65     12713.0  Germany 2011-12-09 12:16:00   

        rank  recency  
541805     1      373  


### Frequency

### Monetary Value

### Removing Outliers

### Standardization

## Step 4: Building The Customer Segmentation Model

## Step 5: Segmentation Model Interpretation and Visualization

# Segmentation Modelling: Next Steps