# Customer Relationship Management (CRM)

## Customer Segmentation with RFM
___

**Business Problem: An e-commerce company wants to segment its customers in order to develop marketing strategies.**
___

__Variables__

Invoice: Unique number for each invoice. If the invoice number starts with 'C', the invoice was cancelled.

StockCode: Unique number for each product.

Description: Product name.

Quantity: Indicates how many units of the product were sold.

InvoiceDate: Invoice date.

Price: Price per unit of the product, in pounds sterling.

Customer ID: Unique ID for each customer.

Country: Customer country.

**Import Libraries**

In [74]:
import datetime as dt
import pandas as pd

pd.set_option('display.float_format', lambda x: '%.3f' %x)

**Read Excel File**

In [75]:
df_ = pd.read_excel('crm_analytics/datasets/online_retail_II.xlsx', sheet_name='Year 2009-2010')

In [225]:
df = df_.copy()

## Understanding Data

**First 15 rows**

In [77]:
df.head(15)

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom
5,489434,22064,PINK DOUGHNUT TRINKET POT,24,2009-12-01 07:45:00,1.65,13085.0,United Kingdom
6,489434,21871,SAVE THE PLANET MUG,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom
7,489434,21523,FANCY FONT HOME SWEET HOME DOORMAT,10,2009-12-01 07:45:00,5.95,13085.0,United Kingdom
8,489435,22350,CAT BOWL,12,2009-12-01 07:46:00,2.55,13085.0,United Kingdom
9,489435,22349,"DOG BOWL , CHASING BALL DESIGN",12,2009-12-01 07:46:00,3.75,13085.0,United Kingdom


**Check the number of rows and columns**

In [78]:
df.shape

(525461, 8)

**Check missing values in each column**

In [79]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107927
Country             0
dtype: int64
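
Raw counts are easier to judge as shares of the total. The sketch below uses `isnull().mean()` on made-up toy data (the `toy` frame is hypothetical, not part of the dataset), and then divides the two counts reported above to show that roughly a fifth of the rows have no `Customer ID`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to illustrate the call
toy = pd.DataFrame({
    'Description': ['MUG', None, 'BOWL', 'DOORMAT'],
    'Customer ID': [13085.0, np.nan, np.nan, 13078.0],
})

# Mean of a boolean mask = share of missing values per column
missing_share = toy.isnull().mean()
print(missing_share)

# Share for the real data, using the counts printed above
# (107927 missing Customer ID values out of 525461 rows)
customer_id_missing_share = 107927 / 525461
print(round(customer_id_missing_share, 3))
```

On the real dataframe the same idea would be `df.isnull().mean()`.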

**Count the unique values of each variable**

In [80]:
df.nunique()

Invoice        28816
StockCode       4632
Description     4681
Quantity         825
InvoiceDate    25296
Price           1606
Customer ID     4383
Country           40
dtype: int64

**Add 'TotalPrice' to dataframe.**

TotalPrice = Quantity x Price

In [81]:
df['TotalPrice'] = df['Quantity'] * df['Price']

In [82]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom,83.4
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom,100.8
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom,30.0


## Data Preparation

**Remove rows with missing values from the dataframe**

In [83]:
df.shape

(525461, 9)

In [84]:
df.dropna(inplace=True)

In [85]:
df.shape

(417534, 9)

**Delete cancelled invoices**

In [86]:
df = df[~df['Invoice'].str.contains('C', na=False)]

In [87]:
df.shape

(407695, 9)
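
The filter above drops any invoice whose number contains a 'C' anywhere. Since the variable description says cancelled invoices start with 'C', a stricter prefix test is one possible alternative. This is a minimal sketch on made-up toy data (the `toy` frame and its values are hypothetical); casting to string first also covers invoice numbers that were read as integers.

```python
import pandas as pd

# Hypothetical toy data: invoice numbers are a mix of integers and strings
toy = pd.DataFrame({
    'Invoice': [489434, 'C489449', 489435, 'C489459'],
    'Quantity': [12, -12, 24, -6],
})

# Cast to string, then test only the first character
is_cancelled = toy['Invoice'].astype(str).str.startswith('C')
kept = toy[~is_cancelled]

print(kept['Invoice'].tolist())
```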

In [88]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom,83.4
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom,100.8
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom,30.0


## Calculating RFM Metrics

**RFM Metrics**

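Recency: number of days since the customer's most recent purchase, measured from the analysis date.

Frequency: number of distinct invoices (purchases) the customer has made.

Monetary: total amount of money the customer has spent.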
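
To make the three definitions concrete, here is a small sketch on made-up data. The `toy` frame, the `analysis_date` value, and the use of named aggregation are assumptions for illustration only.

```python
import datetime as dt
import pandas as pd

# Hypothetical toy data: customer 1 has two invoices, customer 2 has one
toy = pd.DataFrame({
    'Customer ID': [1, 1, 1, 2],
    'Invoice': [100, 100, 101, 102],
    'InvoiceDate': pd.to_datetime(['2010-12-01', '2010-12-01',
                                   '2010-12-05', '2010-11-20']),
    'TotalPrice': [10.0, 5.0, 20.0, 7.5],
})

# Assumed analysis date, used as the reference point for recency
analysis_date = dt.datetime(2010, 12, 11)

toy_rfm = toy.groupby('Customer ID').agg(
    # days between the analysis date and the last purchase
    recency=('InvoiceDate', lambda x: (analysis_date - x.max()).days),
    # number of distinct invoices
    frequency=('Invoice', 'nunique'),
    # total amount spent
    monetary=('TotalPrice', 'sum'),
)
print(toy_rfm)
```

Customer 1 last bought on 2010-12-05, so recency is 6 days, with 2 distinct invoices and 35.0 spent in total.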

In [210]:
df['InvoiceDate'].max()

Timestamp('2010-12-09 20:01:00')

In [90]:
today_date = dt.datetime(2010, 12, 11)

In [150]:
rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda x: (today_date - x.max()).days,
                              'Invoice': lambda x: x.nunique(),
                              'TotalPrice': lambda x: x.sum()})

rfm.columns = ['recency', 'frequency', 'monetary']

In [151]:
rfm.head()

Customer ID,recency,frequency,monetary
12346.0,165,11,372.86
12347.0,3,2,1323.32
12348.0,74,1,222.16
12349.0,43,3,2671.14
12351.0,11,1,300.93


In [152]:
rfm.shape

(4314, 3)

In [153]:
rfm.describe().T

,count,mean,std,min,25%,50%,75%,max
recency,4314.0,91.27,96.944,1.0,18.0,53.0,136.0,374.0
frequency,4314.0,4.454,8.169,1.0,1.0,2.0,5.0,205.0
monetary,4314.0,2047.289,8912.523,0.0,307.95,705.55,1722.802,349164.35


**Remove customers whose monetary value is zero from the rfm dataframe.**

In [154]:
rfm = rfm[rfm['monetary'] > 0]

In [155]:
rfm.shape

(4312, 3)

In [156]:
rfm.describe()

,recency,frequency,monetary
count,4312.0,4312.0,4312.0
mean,91.173,4.456,2048.238
std,96.861,8.17,8914.481
min,1.0,1.0,2.95
25%,18.0,1.0,307.988
50%,53.0,2.0,706.02
75%,136.0,5.0,1723.142
max,374.0,205.0,349164.35


## Calculating RFM Scores

In [174]:
rfm['recency_score'] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])

In [175]:
rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])

In [176]:
rfm['monetary_score'] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])

In [177]:
rfm.head()

Customer ID,recency,frequency,monetary,recency_score,frequency_score,monetary_score
12346.0,165,11,372.86,2,5,2
12347.0,3,2,1323.32,5,2,4
12348.0,74,1,222.16,2,1,1
12349.0,43,3,2671.14,3,3,5
12351.0,11,1,300.93,5,1,2
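
Only the frequency score is computed on `rank(method='first')`. A likely reason is that frequency has many tied values (its 25% quantile equals its minimum in the summary above), so several quantile edges coincide and `pd.qcut` refuses to bin them. The sketch below reproduces this on a made-up series (`freq` is hypothetical).

```python
import pandas as pd

# Hypothetical frequency values with many ties at 1
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 5, 8])

# Direct binning fails because several quantile edges are identical
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    direct_ok = True
except ValueError:
    direct_ok = False

# Ranking first gives every row a distinct value, so the edges are unique
scores = pd.qcut(freq.rank(method='first'), 5, labels=[1, 2, 3, 4, 5])

print(direct_ok)
print(scores.tolist())
```

One side effect to keep in mind: ties are broken by row order, so customers with the same frequency can end up with different scores.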


In [190]:
rfm['RFM Score'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str)

In [191]:
rfm.head()

Customer ID,recency,frequency,monetary,recency_score,frequency_score,monetary_score,RFM Score
12346.0,165,11,372.86,2,5,2,25
12347.0,3,2,1323.32,5,2,4,52
12348.0,74,1,222.16,2,1,1,21
12349.0,43,3,2671.14,3,3,5,33
12351.0,11,1,300.93,5,1,2,51


## Creating and Analysing RFM Segments

![image-2.png](attachment:image-2.png)

In [203]:
seg_map = {r'[1-2][1-2]': 'hibernating',
          r'[1-2][3-4]': 'at risk',
          r'[1-2]5': 'cant lose them',
          r'3[1-2]': 'about to sleep',
          r'33': 'need attention',
          r'[3-4][4-5]': 'loyal customers',
          r'41': 'promising',
          r'[4-5][2-3]': 'potential loyallists',
          r'51': 'new_customers',
          r'5[4-5]': 'champions'}

In [204]:
rfm['segments'] = rfm['RFM Score'].replace(seg_map, regex=True)

In [205]:
rfm.head()

Customer ID,recency,frequency,monetary,recency_score,frequency_score,monetary_score,RFM Score,segments
12346.0,165,11,372.86,2,5,2,25,cant lose them
12347.0,3,2,1323.32,5,2,4,52,potential loyallists
12348.0,74,1,222.16,2,1,1,21,hibernating
12349.0,43,3,2671.14,3,3,5,33,need attention
12351.0,11,1,300.93,5,1,2,51,new_customers
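
Each key in `seg_map` is a regular expression over the two-character score, with recency first and frequency second. As a sanity check, the sketch below enumerates all 25 possible codes and confirms that each one matches exactly one pattern. The helper names (`codes`, `matches`) are made up for this example.

```python
import re
import pandas as pd

seg_map = {r'[1-2][1-2]': 'hibernating',
           r'[1-2][3-4]': 'at risk',
           r'[1-2]5': 'cant lose them',
           r'3[1-2]': 'about to sleep',
           r'33': 'need attention',
           r'[3-4][4-5]': 'loyal customers',
           r'41': 'promising',
           r'[4-5][2-3]': 'potential loyallists',
           r'51': 'new_customers',
           r'5[4-5]': 'champions'}

# All possible recency/frequency codes: '11', '12', ..., '55'
codes = [f'{r}{f}' for r in range(1, 6) for f in range(1, 6)]

# For each code, collect the segments whose pattern matches it
matches = {c: [name for pat, name in seg_map.items() if re.fullmatch(pat, c)]
           for c in codes}
every_code_has_one_segment = all(len(m) == 1 for m in matches.values())

# The same mapping applied the way the notebook does it
segments = pd.Series(codes).replace(seg_map, regex=True)

print(every_code_has_one_segment)
print(dict(zip(codes, segments)))
```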


In [220]:
rfm.groupby('segments').describe()

,recency,recency,recency,recency,recency,recency,recency,recency,frequency,frequency,frequency,frequency,frequency,monetary,monetary,monetary,monetary,monetary,monetary,monetary,monetary
segments,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
about to sleep,343.0,53.819,10.286,37.0,45.0,53.0,64.0,71.0,343.0,1.201,...,1.0,2.0,343.0,441.32,416.673,24.4,194.125,317.76,535.275,3502.48
at risk,611.0,152.159,69.94,72.0,92.5,131.0,198.0,372.0,611.0,3.074,...,4.0,6.0,611.0,1188.878,1844.624,24.05,449.245,760.19,1376.575,34095.26
cant lose them,77.0,124.117,49.925,72.0,88.0,109.0,137.0,298.0,77.0,9.117,...,9.0,46.0,77.0,4099.45,5304.769,181.35,1698.56,2322.59,4009.66,26286.75
champions,663.0,7.119,4.62,1.0,3.0,7.0,11.0,15.0,663.0,12.554,...,13.0,205.0,663.0,6852.264,21556.377,75.76,1379.675,2508.32,4983.785,349164.35
hibernating,1015.0,213.886,89.932,72.0,136.0,213.0,284.0,374.0,1015.0,1.126,...,1.0,2.0,1015.0,403.978,775.89,2.95,148.505,252.48,405.05,11880.84
loyal customers,742.0,36.287,16.073,16.0,23.0,31.0,50.0,71.0,742.0,6.83,...,8.0,42.0,742.0,2746.067,3256.543,97.4,1072.79,1821.385,3174.05,50291.38
need attention,207.0,53.266,9.796,37.0,44.0,53.0,60.5,71.0,207.0,2.449,...,3.0,3.0,207.0,1060.357,1189.804,101.1,471.47,742.9,1352.33,13544.99
new_customers,50.0,8.58,4.31,1.0,5.0,8.5,11.0,15.0,50.0,1.0,...,1.0,1.0,50.0,386.199,493.321,35.4,129.232,258.825,400.688,2945.38
potential loyallists,517.0,18.793,9.731,1.0,11.0,19.0,26.0,36.0,517.0,2.017,...,3.0,3.0,517.0,729.511,836.679,10.95,303.7,524.65,891.02,12079.99
promising,87.0,25.747,6.035,16.0,21.5,25.0,30.0,36.0,87.0,1.0,...,1.0,1.0,87.0,367.087,343.886,30.3,194.975,293.74,418.75,2389.62
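
The full `describe()` output is wide, and part of the frequency block is elided. A more compact view keeps only the mean and the count per segment. This is a sketch on made-up data (`toy_rfm` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical toy data with the same column names as the rfm dataframe
toy_rfm = pd.DataFrame({
    'segments': ['champions', 'champions', 'hibernating', 'at risk'],
    'recency': [3, 7, 200, 120],
    'frequency': [12, 8, 1, 3],
    'monetary': [5000.0, 3000.0, 150.0, 900.0],
})

# Mean and number of customers per segment for each metric
summary = (toy_rfm
           .groupby('segments')[['recency', 'frequency', 'monetary']]
           .agg(['mean', 'count']))
print(summary)
```

On the real data the same call would be applied to `rfm` instead of `toy_rfm`.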


In [223]:
rfm = rfm[['segments', 'recency', 'frequency', 'monetary']]
rfm

Customer ID,segments,recency,frequency,monetary
12346.000,cant lose them,165,11,372.860
12347.000,potential loyallists,3,2,1323.320
12348.000,hibernating,74,1,222.160
12349.000,need attention,43,3,2671.140
12351.000,new_customers,11,1,300.930
...,...,...,...,...
18283.000,loyal customers,18,6,641.770
18284.000,about to sleep,67,1,461.680
18285.000,hibernating,296,1,427.000
18286.000,at risk,112,2,1296.430


## Wrapping All Steps in a Function

In [227]:
def create_rfm(dataframe, csv=False):

    # Data preparation (work on a copy so the caller's dataframe is not modified)
    dataframe = dataframe.copy()
    dataframe['TotalPrice'] = dataframe['Quantity'] * dataframe['Price']
    dataframe.dropna(inplace=True)
    dataframe = dataframe[~dataframe['Invoice'].str.contains('C', na=False)]

    # Calculating RFM metrics
    # Analysis date: two days after the last invoice date
    # (adding a timedelta avoids an invalid day number at the end of a month)
    max_day = dataframe['InvoiceDate'].max()
    today_date = dt.datetime(max_day.year, max_day.month, max_day.day) + dt.timedelta(days=2)
    rfm = dataframe.groupby('Customer ID').agg({'InvoiceDate': lambda x: (today_date - x.max()).days,
                              'Invoice': lambda x: x.nunique(),
                              'TotalPrice': lambda x: x.sum()})

    rfm.columns = ['recency', 'frequency', 'monetary']
    rfm = rfm[rfm['monetary'] > 0].copy()

    # Calculating RFM scores
    rfm['recency_score'] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
    rfm['frequency_score'] = pd.qcut(rfm['frequency'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
    rfm['monetary_score'] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])
    rfm['RFM Score'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str)

    # Creating RFM segments
    seg_map = {r'[1-2][1-2]': 'hibernating',
          r'[1-2][3-4]': 'at risk',
          r'[1-2]5': 'cant lose them',
          r'3[1-2]': 'about to sleep',
          r'33': 'need attention',
          r'[3-4][4-5]': 'loyal customers',
          r'41': 'promising',
          r'[4-5][2-3]': 'potential loyallists',
          r'51': 'new_customers',
          r'5[4-5]': 'champions'}

    rfm['segments'] = rfm['RFM Score'].replace(seg_map, regex=True)
    rfm = rfm[['segments', 'recency', 'frequency', 'monetary']]
    rfm.index = rfm.index.astype(int)

    # Write to csv only when requested
    if csv:
        rfm.to_csv('rfm.csv')

    return rfm

In [228]:
test_df = df_.copy()

In [229]:
create_rfm(test_df, csv=False)

Customer ID,segments,recency,frequency,monetary
12346,cant lose them,165,11,372.860
12347,potential loyallists,3,2,1323.320
12348,hibernating,74,1,222.160
12349,need attention,43,3,2671.140
12351,new_customers,11,1,300.930
...,...,...,...,...
18283,loyal customers,18,6,641.770
18284,about to sleep,67,1,461.680
18285,hibernating,296,1,427.000
18286,at risk,112,2,1296.430
