# RFM Analysis

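This IPython notebook explains how to perform RFM (recency, frequency, monetary value) analysis on customer purchase history data. The original version used the Sample - Superstore dataset from Tableau Software; the cells below read `rfm-med.csv`, a transaction log with one row per purchase (date, account number, amount).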

If you have suggestions or improvements please contribute on https://github.com/joaolcorreia/RFM-analysis

In [160]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

Read the sample orders file, containing all past purchases for all customers.

In [161]:
orders = pd.read_csv('rfm-med.csv',sep=',')


In [162]:
orders.head()

Unnamed: 0,Date,Timestamp,Account No,Balance,Amount,Third Party Name
0,01/01/2023,14:03,932820758,1406.201933,5.8,Medical
1,02/01/2023,16:42,285873880,3341.642906,42.44,Medical
2,02/01/2023,17:41,409370545,1474.547647,17.82,Medical
3,03/01/2023,04:10,548907062,2339.2603,9.81,Medical
4,04/01/2023,04:52,124102251,2544.42147,16.61,Medical


## Create the RFM Table

Recency is always calculated relative to a fixed point in time. The cells below use Dec 31 2023 as that reference date (`NOW`), which matches the 2023 dates in the sample file.

When running this on live data, set the reference date to the current day and include all orders up to yesterday.

In [163]:
import datetime as dt
NOW = dt.datetime(2023,12,31)
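
To make the recency arithmetic concrete, here is a minimal sketch using a made-up last-order date (the date is illustrative and not read from the file). Recency is the number of whole days between the reference date and a customer's most recent order.

```python
import datetime as dt
import pandas as pd

NOW = dt.datetime(2023, 12, 31)          # reference date used in this notebook
last_order = pd.Timestamp('2023-01-25')  # hypothetical last purchase date

# Subtracting two datetimes gives a timedelta; .days is the whole-day count
recency_days = (NOW - last_order).days
print(recency_days)
```

A smaller number means a more recent purchase, which is why low recency is treated as good later on.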

In [164]:
# Parse the day-first 'Date' strings into a datetime column named 'order_date'
orders['order_date'] = pd.to_datetime(orders['Date'], dayfirst=True)
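
The `dayfirst=True` argument matters because dates such as `02/01/2023` are ambiguous. The sketch below, using two sample strings in the same format as the `Date` column, shows how the parsed month changes with and without it.

```python
import pandas as pd

raw = pd.Series(['02/01/2023', '03/01/2023'])

# dayfirst=True reads the strings as 2 January and 3 January
day_first = pd.to_datetime(raw, dayfirst=True)

# The default treats the first field as the month: 1 February and 1 March
month_first = pd.to_datetime(raw)

print(day_first.dt.month.tolist())
print(month_first.dt.month.tolist())
```

Getting this wrong silently shifts order dates by months, which would distort every recency value.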

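Aggregate the orders per account: days since the last order (recency), number of orders (frequency), and total amount spent (monetary value).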

In [165]:
rfmTable = orders.groupby('Account No').agg({'order_date': lambda x: (NOW - x.max()).days, # Recency
                                        'Account No': lambda x: len(x),      # Frequency
                                        'Amount': lambda x: x.sum()}) # Monetary Value

rfmTable['order_date'] = rfmTable['order_date'].astype(int)
rfmTable.rename(columns={'order_date': 'recency', 
                         'Account No': 'frequency', 
                         'Amount': 'monetary_value'}, inplace=True)
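
As an alternative sketch, the same table can be built with pandas named aggregation, which assigns the final column names directly and avoids the separate rename step. The toy orders below are made-up values, and `toy` and `toy_rfm` are illustrative names; `'count'` is assumed to be an acceptable stand-in for `len` because the date column has no missing values here.

```python
import datetime as dt
import pandas as pd

NOW = dt.datetime(2023, 12, 31)

# Toy orders (hypothetical values) with the same column names as above
toy = pd.DataFrame({
    'Account No': [1, 1, 2],
    'order_date': pd.to_datetime(['2023-12-01', '2023-12-21', '2023-06-30']),
    'Amount': [10.0, 5.5, 20.0],
})

toy_rfm = toy.groupby('Account No').agg(
    recency=('order_date', lambda s: (NOW - s.max()).days),  # days since last order
    frequency=('order_date', 'count'),                       # number of orders
    monetary_value=('Amount', 'sum'),                        # total spend
)
print(toy_rfm)
```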

## Validating the RFM Table

In [166]:
rfmTable.head()

Account No,recency,frequency,monetary_value
104832000,340,1,7.31
105375973,354,1,69.03
106601471,32,8,159.38
108481285,29,26,660.89
108563213,34,17,378.77


## Determining RFM Quartiles

In [167]:
quantiles = rfmTable.quantile(q=[0.25,0.5,0.75])

In [168]:
quantiles

,recency,frequency,monetary_value
0.25,32.0,1.0,31.955
0.5,52.0,4.0,113.545
0.75,214.0,15.0,352.635


Convert the quantiles to a dictionary, which is easier to look up by column name and quantile level.

In [169]:
quantiles = quantiles.to_dict()

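Recency:
25% of customers made their last purchase 32 days ago or less.
50% of customers (the median, at the 0.5 level) made their last purchase 52 days ago or less.
75% of customers made their last purchase 214 days ago or less.

Frequency:
25% of customers made at most 1 purchase in the period.
50% of customers made at most 4 purchases.
75% of customers made at most 15 purchases.

Monetary Value:
25% of customers spent 31.955 or less in the period.
50% of customers spent 113.545 or less.
75% of customers spent 352.635 or less.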

In [170]:
quantiles

{'recency': {0.25: 32.0, 0.5: 52.0, 0.75: 214.0},
 'frequency': {0.25: 1.0, 0.5: 4.0, 0.75: 15.0},
 'monetary_value': {0.25: 31.955000000000002, 0.5: 113.545, 0.75: 352.635}}
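
The nested dictionary is keyed first by column name and then by quantile level. A small sketch on a made-up `recency` column (the values and the names `toy` and `q` are illustrative) shows the lookup pattern the scoring functions rely on.

```python
import pandas as pd

# Five hypothetical recency values
toy = pd.DataFrame({'recency': [10, 20, 30, 40, 50]})
q = toy.quantile(q=[0.25, 0.5, 0.75]).to_dict()

# Outer key is the column name, inner key is the quantile level
print(q['recency'][0.25], q['recency'][0.5], q['recency'][0.75])
```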

## Creating the RFM segmentation table

In [171]:
# Work on a copy so that rfmTable itself is left unchanged
rfmSegmentation = rfmTable.copy()

We create two scoring functions for the RFM segmentation, because a high recency value is bad while high frequency and monetary values are good.

In [172]:
# Arguments (x = value, p = column name: recency, monetary_value or frequency, d = quartiles dict)
# Lower values score better: the lowest quartile gets 1, the highest gets 4
def RClass(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
# Arguments (x = value, p = column name: recency, monetary_value or frequency, d = quartiles dict)
# Higher values score better: the highest quartile gets 1, the lowest gets 4
def FMClass(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1
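
The two functions differ only in the direction of the score, so the same logic can be sketched as one vectorized helper. This is an illustrative alternative, not part of the original notebook: `quartile_scores` is a hypothetical name, and it uses `numpy.searchsorted` to count how many cut points lie strictly below each value. The inputs are the recency and frequency values and quartile cut points shown in the tables above.

```python
import numpy as np
import pandas as pd

def quartile_scores(values, cuts, higher_is_better):
    """Score values 1-4 against three ascending cut points (hypothetical helper)."""
    # side='left' counts the cut points strictly below each value,
    # which reproduces the chain of `x <= cut` tests in RClass/FMClass
    bucket = np.searchsorted(cuts, np.asarray(values), side='left')
    return 4 - bucket if higher_is_better else bucket + 1

recency = pd.Series([340, 354, 32, 29, 34])
frequency = pd.Series([1, 1, 8, 26, 17])

r_scores = quartile_scores(recency, [32.0, 52.0, 214.0], higher_is_better=False)
f_scores = quartile_scores(frequency, [1.0, 4.0, 15.0], higher_is_better=True)
print(r_scores, f_scores)
```

With `higher_is_better=False` the helper behaves like `RClass`; with `higher_is_better=True` it behaves like `FMClass`.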


In [173]:
rfmSegmentation['R_Quartile'] = rfmSegmentation['recency'].apply(RClass, args=('recency',quantiles,))
rfmSegmentation['F_Quartile'] = rfmSegmentation['frequency'].apply(FMClass, args=('frequency',quantiles,))
rfmSegmentation['M_Quartile'] = rfmSegmentation['monetary_value'].apply(FMClass, args=('monetary_value',quantiles,))

In [174]:
rfmSegmentation['RFMClass'] = rfmSegmentation.R_Quartile.map(str) \
                            + rfmSegmentation.F_Quartile.map(str) \
                            + rfmSegmentation.M_Quartile.map(str)

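R_Quartile: quartile rank based on recency; 1 is best (most recent purchase) and 4 is worst (longest time since the last purchase).

F_Quartile: quartile rank based on frequency; 1 is best (most purchases) and 4 is worst (fewest purchases).

M_Quartile: quartile rank based on monetary_value; 1 is best (highest spend) and 4 is worst (lowest spend).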



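444: this customer's last purchase was long ago (R=4), purchase frequency is low (F=4), and spend is low (M=4). This suggests the customer is no longer very active with your business.

133: this customer purchased recently (R=1), but purchase frequency is fairly low (F=3) and spend is moderate to low (M=3).

122: this customer purchased recently (R=1), with fairly high purchase frequency (F=2) and fairly high spend (M=2). This indicates a relatively valuable customer.

112: this customer purchased recently (R=1), with very high purchase frequency (F=1) and high spend (M=2). This is a very important customer, possibly a loyal VIP who buys often and spends heavily.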
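
Reading a class code by eye gets tedious, so a small decoder can help. The helper below is a sketch only: `describe_rfm_class` and its label wording are hypothetical, and it follows the scoring used in the code above, where 1 is the best quartile and 4 the worst on every dimension.

```python
def describe_rfm_class(code):
    """Map a three-digit RFM class such as '122' to per-dimension labels."""
    labels = {
        '1': 'best quartile',
        '2': 'second quartile',
        '3': 'third quartile',
        '4': 'worst quartile',
    }
    r, f, m = code  # one character each for recency, frequency, monetary value
    return {
        'recency': labels[r],
        'frequency': labels[f],
        'monetary_value': labels[m],
    }

print(describe_rfm_class('112'))
```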

In [175]:
rfmSegmentation.head()

Account No,recency,frequency,monetary_value,R_Quartile,F_Quartile,M_Quartile,RFMClass
104832000,340,1,7.31,4,4,4,444
105375973,354,1,69.03,4,4,3,443
106601471,32,8,159.38,1,2,2,122
108481285,29,26,660.89,1,1,1,111
108563213,34,17,378.77,2,1,1,211


In [176]:
# Copy the table to the clipboard and save it to a CSV file; comment out either line if it is not needed.
rfmSegmentation.to_clipboard()
rfmSegmentation.to_csv('rfm-table-med.csv', sep=',')

Who are the top 5 customers in RFM class 111, that is, high spenders who bought recently and buy frequently?

In [177]:
rfmSegmentation[rfmSegmentation['RFMClass']=='111'].sort_values('monetary_value', ascending=False).head(5)

Account No,recency,frequency,monetary_value,R_Quartile,F_Quartile,M_Quartile,RFMClass
675806859,31,30,816.01,1,1,1,111
586315521,28,33,796.73,1,1,1,111
383625050,28,33,792.35,1,1,1,111
541639148,30,29,760.74,1,1,1,111
522188082,29,24,760.24,1,1,1,111
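
The same query can also be sketched with `DataFrame.nlargest`, which combines the sort and the `head` call. The small frame below reuses a few rows from the outputs above; `seg` and `best` are illustrative names.

```python
import pandas as pd

seg = pd.DataFrame(
    {
        'monetary_value': [660.89, 378.77, 816.01, 159.38],
        'RFMClass': ['111', '211', '111', '122'],
    },
    index=pd.Index([108481285, 108563213, 675806859, 106601471], name='Account No'),
)

# Keep only class 111, then take the rows with the highest spend
best = seg[seg['RFMClass'] == '111'].nlargest(5, 'monetary_value')
print(best)
```

Changing the class string, for example to `'444'`, would list the least engaged accounts instead.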
