# Recency, Frequency, Monetary Value analysis

**Datacamp** : https://app.datacamp.com/learn/courses/customer-segmentation-in-python

**RFM segmentation**

To segment customers, we calculate three customer behavior metrics:
- Recency - how recently each customer made their last purchase,
- Frequency - how many purchases the customer made in the last 12 months,
- MonetaryValue - how much the customer spent in the last 12 months.

We will use these values to assign customers to RFM segments.
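
As a minimal sketch of these three definitions, the block below computes them on a small invented transaction table. The column names mirror the dataset used later in these notes, but the rows, the `transactions` name, and the choice of reference date (one day after the last purchase) are assumptions made for illustration.

```python
import pandas as pd

# Toy transactions, invented for illustration (not the course dataset)
transactions = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3, 3],
    'InvoiceNo': [100, 101, 102, 103, 104, 105],
    'InvoiceDate': pd.to_datetime([
        '2011-01-10', '2011-11-30', '2011-06-15',
        '2011-12-01', '2011-12-05', '2011-12-08',
    ]),
    'TotalSum': [20.0, 35.5, 12.0, 5.0, 7.5, 10.0],
})

# Assumed reference date: one day after the last recorded purchase
snapshot_date = transactions['InvoiceDate'].max() + pd.Timedelta(days=1)

# One row per customer:
#   Recency       = days between the snapshot date and the last purchase
#   Frequency     = number of distinct invoices
#   MonetaryValue = total amount spent
rfm = transactions.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda d: (snapshot_date - d.max()).days),
    Frequency=('InvoiceNo', 'nunique'),
    MonetaryValue=('TotalSum', 'sum'),
)
print(rfm)
```

Here a lower Recency means a more recent purchase, which is why recency labels are later assigned in reverse order.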

**Grouping RFM values**

The next step is to group the metric values into categories such as high, medium and low. There are several ways to do that:
- We can break customers into groups of equal size based on percentile values of each metric
- We can assign either a high or a low value to each metric based on an 80/20 Pareto split
- Or we can use threshold values for each metric that are already known from previous business insights
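
The three options above can be sketched side by side on a single metric. This is only an illustration: the Pareto split is interpreted here as "top 20% of values versus the rest", which is one possible reading, and the threshold values 200 and 300 are invented rather than taken from any business rule.

```python
import pandas as pd

spend = pd.Series([137, 335, 172, 355, 303, 233, 244, 229], name='Spend')

# 1. Equal-sized groups from percentile values (quartiles here)
quartile = pd.qcut(spend, q=4, labels=[1, 2, 3, 4])

# 2. Pareto-style split (assumed reading): values above the
#    80th percentile are 'high', everything else is 'low'
cutoff = spend.quantile(0.8)
pareto = spend.gt(cutoff).map({True: 'high', False: 'low'})

# 3. Fixed thresholds from prior knowledge (200 and 300 are invented)
threshold = pd.cut(spend, bins=[0, 200, 300, float('inf')],
                   labels=['low', 'medium', 'high'])

groups = pd.DataFrame({
    'Spend': spend,
    'Quartile': quartile,
    'Pareto': pareto,
    'Threshold': threshold,
})
print(groups.sort_values('Spend'))
```

Percentile groups always have roughly equal sizes, while the threshold groups depend entirely on where the chosen cut points fall in the data.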


# Imports

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

# RFM Segmentation

## Calculate percentile

In [11]:
# Create a simple DF
data = pd.DataFrame({'CustomerID': pd.Series(range(0,8)),
                     'Spend': [137, 335, 172, 355, 303, 233, 244, 229]
    
})
data

Unnamed: 0,CustomerID,Spend
0,0,137
1,1,335
2,2,172
3,3,355
4,4,303
5,5,233
6,6,244
7,7,229


In [12]:
# Assign each customer to a spend quartile (1 = lowest spend, 4 = highest)
spend_quartiles = pd.qcut(x=data['Spend'], q=4, labels=range(1,5))
data['Spend_Quartiles'] = spend_quartiles
data.sort_values('Spend')

Unnamed: 0,CustomerID,Spend,Spend_Quartiles
0,0,137,1
2,2,172,1
7,7,229,2
5,5,233,2
6,6,244,3
4,4,303,3
1,1,335,4
3,3,355,4


## Assigning labels

In [13]:
# Create a simple DF
data = pd.DataFrame({'CustomerID': pd.Series(range(0,8)),
                     'Recency_Days': [37, 235, 396, 72, 255, 393, 203, 133]
    
})
data

Unnamed: 0,CustomerID,Recency_Days
0,0,37
1,1,235
2,2,396
3,3,72
4,4,255
5,5,393
6,6,203
7,7,133


In this case, the more recent the customer's last purchase (the fewer days since it), the better!

In [16]:
# Create numeric labels in descending order
r_labels = list(range(4,0,-1))
r_labels

[4, 3, 2, 1]

In [19]:
# Divide into groups based on quartiles
recency_quartiles = pd.qcut(x=data['Recency_Days'], q=4, labels=r_labels)

# Create new column
data['Recency_quartiles'] = recency_quartiles
data.sort_values('Recency_Days')

Unnamed: 0,CustomerID,Recency_Days,Recency_quartiles
0,0,37,4
3,3,72,4
7,7,133,3
6,6,203,3
1,1,235,2
4,4,255,2
5,5,393,1
2,2,396,1


The quartile labels are reversed because the most recent customers are the most valuable: the lowest Recency_Days values get the highest label (4).

## Custom labels

In [21]:
# Create string labels
r_labels = ['Active', 'Lapsed', 'Inactive', 'Churned']

# Divide into groups based on quartiles
recency_quartiles = pd.qcut(x=data['Recency_Days'], q=4, labels=r_labels)

# Create new column
data['Recency_Quartile'] = recency_quartiles

# Sort values
data.sort_values('Recency_Days')

Unnamed: 0,CustomerID,Recency_Days,Recency_quartiles,Recency_Quartile
0,0,37,4,Active
3,3,72,4,Active
7,7,133,3,Lapsed
6,6,203,3,Lapsed
1,1,235,2,Inactive
4,4,255,2,Inactive
5,5,393,1,Churned
2,2,396,1,Churned


# Calculating RFM metrics

In [41]:
# Load data
online_df = pd.read_csv('data/online.csv')
online_df.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,416792,572558,22745,POPPY'S PLAYHOUSE BEDROOM,6,2011-10-25 08:26:00,2.1,14286,United Kingdom
1,482904,577485,23196,VINTAGE LEAF MAGNETIC NOTEPAD,1,2011-11-20 11:56:00,1.45,16360,United Kingdom
2,263743,560034,23299,FOOD COVER WITH BEADS SET 2,6,2011-07-14 13:35:00,3.75,13933,United Kingdom
3,495549,578307,72349B,SET/6 PURPLE BUTTERFLY T-LIGHTS,1,2011-11-23 15:53:00,2.1,17290,United Kingdom
4,204384,554656,21756,BATH BUILDING BLOCK WORD,3,2011-05-25 13:36:00,5.95,17663,United Kingdom


In [42]:
# Create TotalSum = Quantity * UnitPrice
online_df['TotalSum'] = online_df['Quantity'] * online_df['UnitPrice']
online_df = online_df.set_index('Unnamed: 0')
online_df.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,TotalSum
416792,572558,22745,POPPY'S PLAYHOUSE BEDROOM,6,2011-10-25 08:26:00,2.1,14286,United Kingdom,12.6
482904,577485,23196,VINTAGE LEAF MAGNETIC NOTEPAD,1,2011-11-20 11:56:00,1.45,16360,United Kingdom,1.45
263743,560034,23299,FOOD COVER WITH BEADS SET 2,6,2011-07-14 13:35:00,3.75,13933,United Kingdom,22.5
495549,578307,72349B,SET/6 PURPLE BUTTERFLY T-LIGHTS,1,2011-11-23 15:53:00,2.1,17290,United Kingdom,2.1
204384,554656,21756,BATH BUILDING BLOCK WORD,3,2011-05-25 13:36:00,5.95,17663,United Kingdom,17.85


In [50]:
online_df['InvoiceDate'] = pd.to_datetime(online_df['InvoiceDate'])
online_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70864 entries, 416792 to 312243
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   InvoiceNo    70864 non-null  int64         
 1   StockCode    70864 non-null  object        
 2   Description  70864 non-null  object        
 3   Quantity     70864 non-null  int64         
 4   InvoiceDate  70864 non-null  datetime64[ns]
 5   UnitPrice    70864 non-null  float64       
 6   CustomerID   70864 non-null  int64         
 7   Country      70864 non-null  object        
 8   TotalSum     70864 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(3), object(3)
memory usage: 5.4+ MB


## Data preparation

In [51]:
# Which period does the data cover?
'Min: {}; Max: {}'.format(min(online_df.InvoiceDate),
                       max(online_df.InvoiceDate))

'Min: 2010-12-01 08:26:00; Max: 2011-12-09 12:49:00'

In [53]:
# Create a hypothetical snapshot date, one day after the last invoice, as if the analysis were run right after the data ends
snapshot_date = max(online_df.InvoiceDate) + dt.timedelta(days=1)
snapshot_date

Timestamp('2011-12-10 12:49:00')

To resume: https://campus.datacamp.com/courses/customer-segmentation-in-python/recency-frequency-monetary-value-analysis?ex=4 , min -1.04

# Building RFM segments

# Analyzing RFM table