# RFM Analytics

RFM analysis is a marketing technique for ranking customers quantitatively by examining three aspects of their shopping behaviour:
* how recently a customer has purchased (recency)
  * Recency is usually treated as the strongest predictor of loyalty towards your brand. Customers who purchased from you recently are more likely to purchase again than those who did not.
* how often they purchase (frequency)
  * Frequency is usually treated as the second most important factor. The more often a customer purchases, the higher the chance of a repeat purchase.
* how much the customer spends (monetary)

RFM analysis is based on an extension of Pareto’s principle, which says that “80% of your business comes from 20% of your customers.”
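The 80/20 claim can be checked against any per-customer spend figures. The sketch below is illustrative only: the function name `revenue_share_of_top_customers` and the numbers in `toy_spend` are made up for this example and do not come from the transactions file.

```python
import pandas as pd

def revenue_share_of_top_customers(spend: pd.Series, top_fraction: float = 0.2) -> float:
    """Fraction of total spend contributed by the top `top_fraction` of customers."""
    ordered = spend.sort_values(ascending=False)
    # Always keep at least one customer in the "top" group
    n_top = max(1, int(round(len(ordered) * top_fraction)))
    return ordered.iloc[:n_top].sum() / ordered.sum()

# Hypothetical spend per customer (invented numbers, chosen to sum to 1000)
toy_spend = pd.Series([500.0, 300.0, 40.0, 30.0, 30.0, 25.0, 25.0, 20.0, 15.0, 15.0])
share = revenue_share_of_top_customers(toy_spend)
print(f"Top 20% of customers account for {share:.0%} of spend")
```

Running the same function on the `monetary` column of a real RFM table shows how closely a given customer base follows the rule.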

Customers who have purchased more recently, more frequently, and have spent more money are likelier to buy again. Those who have not are less valuable to the company and more likely to churn.

<img src="http://www.wiseguysmarketing.com/wp-content/uploads/2016/03/RFM.png">

In [1]:
import sys
sys.version

'3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]'

In [2]:
import datetime
now = datetime.datetime.now()
print(now)

2021-03-03 17:22:17.876501


In [3]:
import pandas as pd

## 1. Load transaction file

In [None]:
sales = pd.read_csv("https://git.davewentzel.com/demos/MLOps-E2E/-/raw/master/Lab900/transactions.csv", delimiter=',')
sales.head(10)

In [5]:
sales.describe()

Unnamed: 0,customer,purchase_quantity,product_line,product_code,quantity,price_before_discount,price_after_discount,Hour,store_code,vendor_code,amount
count,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0,4411.0
mean,3247793.0,12307.737928,2.300159,78992.072319,1.012469,35.309243,32.489602,1454.219225,10000.150306,17.462934,35.455786
std,1264739.0,8547.264548,2.048316,35480.470064,0.238245,29.400703,27.89324,267.312613,0.432612,11.18351,29.874411
min,126993.0,430.0,1.0,4.0,1.0,0.0,-4.707,904.0,10000.0,1.0,0.0
25%,2310253.0,3773.5,1.0,52874.0,1.0,11.867,10.533,1237.0,10000.0,8.0,11.867
50%,2767146.0,11765.0,2.0,90303.0,1.0,26.96,25.2,1452.0,10000.0,22.0,27.067
75%,3847057.0,16488.5,3.0,106375.5,1.0,51.733,48.2,1657.0,10000.0,27.0,51.867
max,7490767.0,27955.0,28.0,140136.0,11.0,314.667,305.12,2017.0,10002.0,90.0,336.0


In [8]:
## since this is an older dataset and recency is important, let's set the current time back a little bit
import datetime as dt
NOW = dt.datetime(2019,1,22)
print("For this experiment, the current time is now: ",NOW)

For this experiment, the current time is now:  2019-01-22 00:00:00
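A fixed reference date is one option. Another, sketched here as an assumption rather than as what this notebook does, is to derive the reference date from the data itself, for example one day after the latest purchase. The frame `toy` and the name `snapshot` are hypothetical.

```python
import pandas as pd

# Hypothetical purchase dates standing in for the real transaction file
toy = pd.DataFrame(
    {"purchase_date": pd.to_datetime(["2017-01-01", "2018-03-01", "2018-12-15"])}
)

# One day after the latest purchase, so the most recent customer gets recency 1
snapshot = toy["purchase_date"].max() + pd.Timedelta(days=1)

# Days elapsed between each purchase and the snapshot date
recency_days = (snapshot - toy["purchase_date"]).dt.days
print(snapshot.date(), recency_days.tolist())
```

Deriving the date this way keeps recency values comparable when the file is refreshed, whereas a hard-coded date has to be updated by hand.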


In [10]:
sales['purchase_date'] = pd.to_datetime(sales['purchase_date'])
sales.head(10)

Unnamed: 0,customer,purchase_date,purchase_quantity,product_line,product_code,quantity,price_before_discount,price_after_discount,Hour,store_code,vendor_code,amount
0,495501,2017-01-01,12026,2,25441,1,41.333,41.333,1639,10000,28,41.333
1,495501,2017-01-01,12026,1,23768,1,32.4,32.4,1639,10001,28,32.4
2,1797430,2017-01-01,12001,3,82509,1,0.667,0.0,1607,10002,20,0.667
3,2248471,2017-01-01,12060,2,1856,1,12.0,11.4,1830,10002,31,12.0
4,2248499,2017-01-01,12004,1,36367,1,47.587,47.587,1610,10002,23,47.587
5,2248528,2017-01-01,12024,1,87914,1,11.6,11.6,1632,10001,4,11.6
6,2248820,2017-01-01,12057,5,72607,1,10.533,10.533,1820,10000,26,10.533
7,2248820,2017-01-01,12057,4,90140,1,6.933,6.933,1820,10000,26,6.933
8,2255804,2017-01-01,12067,1,89462,1,26.587,26.587,1906,10001,26,26.587
9,2267053,2017-01-01,12053,1,79081,1,100.8,100.8,1808,10000,31,100.8


In [14]:
dfRFM = (sales
    .groupby('customer')
    .agg({'purchase_date': lambda x: (NOW - x.max()).days,  # Recency: days since the last purchase
          'quantity': lambda x: len(x),                     # Frequency: number of transaction rows
          'amount': lambda x: x.sum()})                     # Monetary value: total amount spent
)
dfRFM.head(10)

customer,purchase_date,quantity,amount
126993,174,1,26.0
214674,143,3,116.8
318696,113,2,73.734
438711,327,3,102.134
495501,438,3,108.933
918238,600,1,80.667
1090522,292,1,31.467
1315018,355,1,20.0
1419361,265,1,39.333
1419459,438,2,92.667
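The aggregation above counts transaction rows as frequency. As a sketch of an alternative, pandas named aggregation can produce the final column names directly and can also count distinct purchase dates. The frame `toy_sales` and the column names `frequency_rows` and `frequency_visits` are hypothetical and only illustrate the difference between the two counts.

```python
import datetime as dt
import pandas as pd

NOW = dt.datetime(2019, 1, 22)

# Hypothetical transactions: customer 1 buys two items on one day and one item later
toy_sales = pd.DataFrame({
    "customer": [1, 1, 1, 2],
    "purchase_date": pd.to_datetime(
        ["2018-12-01", "2018-12-01", "2019-01-12", "2018-10-14"]
    ),
    "amount": [10.0, 5.0, 20.0, 7.5],
})

rfm = toy_sales.groupby("customer").agg(
    recency=("purchase_date", lambda d: (NOW - d.max()).days),  # days since last purchase
    frequency_rows=("amount", "size"),                # one count per transaction row
    frequency_visits=("purchase_date", "nunique"),    # one count per distinct purchase date
    monetary=("amount", "sum"),                       # total amount spent
)
print(rfm)
```

Whether rows or distinct dates is the better definition of frequency depends on how the transaction file records a basket, which is not stated here.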


In [15]:
dfRFM['purchase_date'] = dfRFM['purchase_date'].astype(int)
dfRFM.rename(columns={'purchase_date': 'recency', 
                         'quantity': 'frequency', 
                         'amount': 'monetary'}, inplace=True)
dfRFM.head(10)

customer,recency,frequency,monetary
126993,174,1,26.0
214674,143,3,116.8
318696,113,2,73.734
438711,327,3,102.134
495501,438,3,108.933
918238,600,1,80.667
1090522,292,1,31.467
1315018,355,1,20.0
1419361,265,1,39.333
1419459,438,2,92.667


## 2. Analyse RFM
Start with some basic exploration: look at one customer's raw transactions, then compute the quartiles of each RFM metric.

In [17]:
customer = sales[sales['customer']==438711]
customer

Unnamed: 0,customer,purchase_date,purchase_quantity,product_line,product_code,quantity,price_before_discount,price_after_discount,Hour,store_code,vendor_code,amount
2388,438711,2018-03-01,4596,2,118557,1,44.8,44.8,1337,10000,2,44.8
2389,438711,2018-03-01,4596,1,78270,1,30.267,30.267,1337,10000,2,30.267
2390,438711,2018-03-01,4602,1,21887,1,27.067,13.733,1346,10000,2,27.067
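As a sanity check, the RFM values for this customer can be recomputed by hand from the three rows above. The sketch copies only the two columns the calculation needs and reuses the reference date of 2019-01-22 set earlier.

```python
import datetime as dt
import pandas as pd

NOW = dt.datetime(2019, 1, 22)

# The three rows shown above for customer 438711 (date and amount only)
rows = pd.DataFrame({
    "purchase_date": pd.to_datetime(["2018-03-01", "2018-03-01", "2018-03-01"]),
    "amount": [44.8, 30.267, 27.067],
})

recency = (NOW - rows["purchase_date"].max()).days  # days since the last purchase
frequency = len(rows)                               # number of transaction rows
monetary = rows["amount"].sum()                     # total amount spent
print(recency, frequency, round(monetary, 3))
```

The result should agree with the `dfRFM` row for 438711 (recency 327, frequency 3, monetary 102.134). All three rows share one purchase date, so the frequency of 3 counts line items rather than separate shopping days.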


In [24]:
quantiles = dfRFM.quantile(q=[0.25,0.5,0.75])

In [25]:
quantiles

quantile,recency,frequency,monetary
0.25,265.0,1.0,27.8265
0.5,355.0,1.0,53.067
0.75,438.0,2.0,87.7


In [26]:
dictquantiles = quantiles.to_dict()

In [27]:
dictquantiles

{'recency': {0.25: 265.0, 0.5: 355.0, 0.75: 438.0},
 'frequency': {0.25: 1.0, 0.5: 1.0, 0.75: 2.0},
 'monetary': {0.25: 27.826500000000003,
  0.5: 53.06699999999999,
  0.75: 87.69999999999999}}

In [28]:
# Work on a copy so that dfRFM keeps only the three raw RFM columns
SegmentationRFM = dfRFM.copy()

In [29]:
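# Arguments: x = value, p = column name ('recency', 'frequency' or 'monetary'),
# d = quartile cut points, looked up as d[p][q]
# Recency: a lower value is better, so the lowest quartile gets class 1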
def RClass(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
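# Arguments: x = value, p = column name ('recency', 'frequency' or 'monetary'),
# d = quartile cut points, looked up as d[p][q]
# Frequency and monetary: a higher value is better, so the highest quartile gets class 1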
def FMClass(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1

In [30]:
SegmentationRFM['R_Quartile'] = SegmentationRFM['recency'].apply(RClass, args=('recency',quantiles,))
SegmentationRFM['F_Quartile'] = SegmentationRFM['frequency'].apply(FMClass, args=('frequency',quantiles,))
SegmentationRFM['M_Quartile'] = SegmentationRFM['monetary'].apply(FMClass, args=('monetary',quantiles,))

In [31]:
SegmentationRFM['RFMClass'] = SegmentationRFM.R_Quartile.map(str) \
                            + SegmentationRFM.F_Quartile.map(str) \
                            + SegmentationRFM.M_Quartile.map(str)

In [32]:
SegmentationRFM.head(20)

customer,recency,frequency,monetary,R_Quartile,F_Quartile,M_Quartile,RFMClass
126993,174,1,26.0,1,4,4,144
214674,143,3,116.8,1,1,1,111
318696,113,2,73.734,1,2,2,122
438711,327,3,102.134,2,1,1,211
495501,438,3,108.933,3,1,1,311
918238,600,1,80.667,4,4,2,442
1090522,292,1,31.467,2,4,3,243
1315018,355,1,20.0,2,4,4,244
1419361,265,1,39.333,1,4,3,143
1419459,438,2,92.667,3,2,1,321
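The row-wise `apply` calls can also be expressed as a vectorised lookup. The sketch below is an alternative, not the notebook's method: it uses `np.searchsorted` to count how many cut points lie strictly below each value, which mirrors the `<=` comparisons in `RClass` and `FMClass`. The helper name `bucket` is hypothetical; the data and cut points are copied from the outputs above.

```python
import numpy as np
import pandas as pd

# First ten customers' values, copied from the table above
rfm = pd.DataFrame({
    "recency":   [174, 143, 113, 327, 438, 600, 292, 355, 265, 438],
    "frequency": [1, 3, 2, 3, 3, 1, 1, 1, 1, 2],
    "monetary":  [26.0, 116.8, 73.734, 102.134, 108.933,
                  80.667, 31.467, 20.0, 39.333, 92.667],
})

# Quartile cut points (0.25, 0.5, 0.75) as printed earlier in the notebook
cuts = {
    "recency":   [265.0, 355.0, 438.0],
    "frequency": [1.0, 1.0, 2.0],
    "monetary":  [27.8265, 53.067, 87.7],
}

def bucket(values, thresholds):
    """Count of thresholds strictly below each value (0 to 3)."""
    return np.searchsorted(thresholds, values, side="left")

rfm["R_Quartile"] = bucket(rfm["recency"], cuts["recency"]) + 1      # low recency -> 1
rfm["F_Quartile"] = 4 - bucket(rfm["frequency"], cuts["frequency"])  # high frequency -> 1
rfm["M_Quartile"] = 4 - bucket(rfm["monetary"], cuts["monetary"])    # high spend -> 1
rfm["RFMClass"] = (rfm["R_Quartile"].astype(str)
                   + rfm["F_Quartile"].astype(str)
                   + rfm["M_Quartile"].astype(str))
print(rfm)
```

Note that the 0.25 and 0.5 cut points for frequency are both 1.0, so no customer can land in class 3 for frequency under either implementation.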


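## Sorting by RFM Score descending
`RFMClass` is a string in which 1 is the best class for each metric, so a descending sort lists the weakest customers (444) first and the strongest (111) last.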

In [33]:
SegmentationRFM.sort_values(by=['RFMClass'], ascending=[False])

customer,recency,frequency,monetary,R_Quartile,F_Quartile,M_Quartile,RFMClass
2273695,748,1,22.133,4,4,4,444
2248528,751,1,11.600,4,4,4,444
4215108,748,1,13.333,4,4,4,444
2255804,751,1,26.587,4,4,4,444
4209466,692,1,26.000,4,4,4,444
...,...,...,...,...,...,...,...
5035631,38,4,130.254,1,1,1,111
5093099,265,3,108.400,1,1,1,111
2307972,143,7,284.401,1,1,1,111
2304731,113,3,106.666,1,1,1,111
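To list the most valuable customers first instead, one option is to sort ascending on the three quartile columns. The sketch below uses a handful of rows copied from the tables above. The `labels` mapping and the `segment` column are hypothetical names added for illustration, not part of the original analysis.

```python
import pandas as pd

# A few customers and their quartile classes, copied from the tables above
seg = pd.DataFrame(
    {
        "R_Quartile": [4, 1, 1, 3, 2],
        "F_Quartile": [4, 1, 4, 2, 1],
        "M_Quartile": [4, 1, 4, 1, 1],
    },
    index=pd.Index([2273695, 2307972, 126993, 1419459, 438711], name="customer"),
)
seg["RFMClass"] = (seg["R_Quartile"].astype(str)
                   + seg["F_Quartile"].astype(str)
                   + seg["M_Quartile"].astype(str))

# Best customers first: class 1 is best, and recency takes priority over F and M
best_first = seg.sort_values(["R_Quartile", "F_Quartile", "M_Quartile"])

# Hypothetical labels for the two extreme classes; all other classes become "other"
labels = {"111": "best customers", "444": "lost low-value customers"}
best_first["segment"] = best_first["RFMClass"].map(labels).fillna("other")
print(best_first)
```

Sorting on the integer columns avoids relying on string ordering, which only works here because every class is a single digit.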
