# RFM Analysis

(Recency, Frequency & Monetary)

### Introduction

RFM analysis is a customer segmentation technique that uses past purchase behavior to segment customers.
It groups customers based on their transaction history: how recently, how often and how much they bought. It is a handy method for finding the best customers, understanding their behavior and then running targeted marketing campaigns to increase sales, satisfaction and customer lifetime value.

The goal of this analysis is to identify customer segments using RFM analysis and to understand how those groups differ from each other.

### Benefits of RFM analysis

* Increased customer retention
* Increased response rate
* Increased conversion rate
* Increased revenue

The RFM model combines three customer attributes to rank customers:

#### Recency (R): Who has purchased recently? The number of days since the last purchase
#### Frequency (F): Who has purchased frequently? The total number of purchases
#### Monetary Value (M): Who has spent the most? The total amount the customer has spent


A customer who bought recently gets a better recency score, one who bought many times gets a better frequency score, and one who spent a lot of money gets a better monetary score. In this notebook each score runs from 1 (best) to 4 (worst).

### How to create segments?

Find the quartile values of each RFM metric, then give every customer a score from 1 to 4 according to the quartile each of their values falls into.

from IPython.display import Image

# `path` is the data directory assigned in the import cell below
Image(filename= path+'\\rfm.PNG')

For example, consider a customer who:

- is in the group that purchased most recently (R=1),
- is in the group that purchased most frequently (F=1),
- is in the group that spent the most (M=1).

This customer belongs to RFM segment 1-1-1 (Best Customers): R=1, F=1, M=1.
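The quartile scoring described above can be sketched on toy data. The values and customer labels below are hypothetical, and `pd.qcut` is used here only as a compact stand-in for the idea; it is not the scoring code this notebook uses.

```python
import pandas as pd

customers = list("ABCDEFGH")  # hypothetical customer labels

# Toy recency values in days (not taken from sample-orders.csv)
recency = pd.Series([5, 20, 40, 60, 90, 150, 200, 400], index=customers)

# Toy total-spend values (not taken from sample-orders.csv)
monetary = pd.Series([50, 120, 300, 450, 800, 1500, 2200, 9000], index=customers)

# Low recency is good, so the lowest quartile is labelled 1
r_score = pd.qcut(recency, q=4, labels=[1, 2, 3, 4]).astype(int)

# High spend is good, so the labels are reversed: the top quartile is labelled 1
m_score = pd.qcut(monetary, q=4, labels=[4, 3, 2, 1]).astype(int)

print(pd.DataFrame({"R": r_score, "M": m_score}))
```

Customer A, the most recent buyer, gets R=1 but M=4 because A is also the lowest spender, which shows why the two directions need opposite labels.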

In [45]:
import pandas as pd


Import the sample orders file, which contains all past purchases for all customers.

In [46]:
path = '../../Data\\'
orders = pd.read_csv(path+ 'sample-orders.csv', sep =',' , encoding = "ISO-8859-1")

In [47]:
orders.head()

|   | order_date | order_id | customer | grand_total |
|---|---|---|---|---|
| 0 | 9/7/2011 | CA-2011-100006 | Dennis Kane | 378 |
| 1 | 7/8/2011 | CA-2011-100090 | Ed Braxton | 699 |
| 2 | 3/14/2011 | CA-2011-100293 | Neil Franzsisch | 91 |
| 3 | 1/29/2011 | CA-2011-100328 | Jasper Cacioppo | 4 |
| 4 | 4/8/2011 | CA-2011-100363 | Jim Mitchum | 21 |


## Create the RFM Table

Recency is measured relative to a fixed point in time. The last order date in this dataset is Dec 31 2014, so that is the date we use to calculate recency.

We treat this date as the current day (`NOW`) and count the days between it and each customer's most recent order.

In [48]:
import datetime as dt
NOW = dt.datetime(2014,12,31)

In [49]:
# Convert the order_date column to datetime
orders['order_date'] = pd.to_datetime(orders['order_date'])

Aggregate the orders per customer to compute recency, frequency and monetary value.

In [50]:
rfmTable = orders.groupby('customer').agg({'order_date': lambda x: (NOW - x.max()).days, # Recency
                                        'order_id': lambda x: len(x),      # Frequency
                                        'grand_total': lambda x: x.sum()}) # Monetary Value

rfmTable['order_date'] = rfmTable['order_date'].astype(int)
rfmTable.rename(columns={'order_date': 'recency', 
                         'order_id': 'frequency', 
                         'grand_total': 'monetary_value'}, inplace=True)
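The aggregation above can be checked on a few hand-made rows. The sketch below is an illustration only: the customers and orders are hypothetical, the column names follow `sample-orders.csv`, and it uses named aggregation (which assumes pandas 0.25 or later) rather than the dict-of-lambdas form used in the notebook.

```python
import datetime as dt
import pandas as pd

# Hypothetical toy orders; only the column names come from the notebook
toy_orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2014-12-01", "2014-12-21", "2014-06-30"]),
    "order_id": ["T-1", "T-2", "T-3"],
    "customer": ["Ann", "Ann", "Bob"],
    "grand_total": [100, 50, 20],
})

NOW = dt.datetime(2014, 12, 31)  # same reference date as the notebook

toy_rfm = toy_orders.groupby("customer").agg(
    last_order=("order_date", "max"),       # most recent order per customer
    frequency=("order_id", "count"),        # number of orders
    monetary_value=("grand_total", "sum"),  # total amount spent
)

# Recency = whole days between the reference date and the last order
toy_rfm["recency"] = (NOW - toy_rfm["last_order"]).dt.days
toy_rfm = toy_rfm[["recency", "frequency", "monetary_value"]]

print(toy_rfm)
```

Ann's last order is 10 days before the reference date, so her recency is 10, with frequency 2 and monetary value 150.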

## Validating the RFM Table

In [51]:
rfmTable.head()

| customer | recency | frequency | monetary_value |
|---|---|---|---|
| Aaron Bergman | 415 | 3 | 887 |
| Aaron Hawkins | 12 | 7 | 1744 |
| Aaron Smayling | 88 | 7 | 3050 |
| Adam Bellavance | 54 | 8 | 7756 |
| Adam Hart | 34 | 10 | 3249 |


Customer **Aaron Bergman** has frequency: 3, monetary value: $887 and recency: 415 days.

#### Explanation:  
**Aaron Bergman** last purchased 415 days before the reference date, so he is not a recent buyer. He bought 3 times from the store (frequency = 3) and spent $887 in total.

In [52]:
aaron = orders[orders['customer']=='Aaron Bergman']
aaron

|   | order_date | order_id | customer | grand_total |
|---|---|---|---|---|
| 624 | 2011-02-19 | CA-2011-152905 | Aaron Bergman | 13 |
| 665 | 2011-03-07 | CA-2011-156587 | Aaron Bergman | 310 |
| 2336 | 2013-11-11 | CA-2013-140935 | Aaron Bergman | 564 |


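Comparing the date of Aaron's last purchase (2013-11-11) with the reference date (2014-12-31) gives 415 days, which matches the recency in `rfmTable`. His three orders also sum to $887, so the RFM table is correct for this customer.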
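The same check can be done by hand with plain `datetime` arithmetic. This small sketch uses only the dates and totals printed in the output above.

```python
import datetime as dt

NOW = dt.datetime(2014, 12, 31)         # reference date used for recency
last_order = dt.datetime(2013, 11, 11)  # Aaron Bergman's most recent order
order_totals = [13, 310, 564]           # grand_total of his three orders

recency_days = (NOW - last_order).days
frequency = len(order_totals)
monetary_value = sum(order_totals)

print(recency_days, frequency, monetary_value)  # 415 3 887
```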

## Determining RFM Quartiles

In [53]:
quantiles = rfmTable.quantile(q=[0.25,0.5,0.75])

In [54]:
quantiles

|   | recency | frequency | monetary_value |
|---|---|---|---|
| 0.25 | 30.0 | 5.0 | 1145.0 |
| 0.5 | 75.0 | 6.0 | 2257.0 |
| 0.75 | 183.0 | 8.0 | 3784.0 |


In [55]:
quantiles = quantiles.to_dict()

In [56]:
quantiles

{'recency': {0.25: 30.0, 0.5: 75.0, 0.75: 183.0},
 'frequency': {0.25: 5.0, 0.5: 6.0, 0.75: 8.0},
 'monetary_value': {0.25: 1145.0, 0.5: 2257.0, 0.75: 3784.0}}

## Creating the RFM segmentation table

In [57]:
# Copy so that adding score columns does not modify rfmTable
rfmSegmentation = rfmTable.copy()


We create two functions for the RFM segmentation because a high recency value is bad, while high frequency and monetary values are good.

In [59]:
# Arguments: x = value, p = column name ('recency'), d = quartiles dict
# Lower recency is better, so the lowest quartile scores 1
def RClass(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    else:
        return 4

# Arguments: x = value, p = column name ('frequency' or 'monetary_value'), d = quartiles dict
# Higher frequency and monetary value are better, so the highest quartile scores 1
def FMClass(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    else:
        return 1


In [60]:
rfmSegmentation['R_Quartile'] = rfmSegmentation['recency'].apply(RClass, args=('recency',quantiles,))
rfmSegmentation['F_Quartile'] = rfmSegmentation['frequency'].apply(FMClass, args=('frequency',quantiles,))
rfmSegmentation['M_Quartile'] = rfmSegmentation['monetary_value'].apply(FMClass, args=('monetary_value',quantiles,))

In [61]:
rfmSegmentation['RFMClass'] = rfmSegmentation.R_Quartile.map(str) \
                            + rfmSegmentation.F_Quartile.map(str) \
                            + rfmSegmentation.M_Quartile.map(str)

In [62]:
rfmSegmentation.head()

| customer | recency | frequency | monetary_value | R_Quartile | F_Quartile | M_Quartile | RFMClass |
|---|---|---|---|---|---|---|---|
| Aaron Bergman | 415 | 3 | 887 | 4 | 4 | 4 | 444 |
| Aaron Hawkins | 12 | 7 | 1744 | 1 | 2 | 3 | 123 |
| Aaron Smayling | 88 | 7 | 3050 | 3 | 2 | 2 | 322 |
| Adam Bellavance | 54 | 8 | 7756 | 2 | 2 | 1 | 221 |
| Adam Hart | 34 | 10 | 3249 | 2 | 1 | 2 | 212 |
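As a sanity check on the table above, the scoring functions can be applied by hand to a few rows. The sketch below repeats `RClass` and `FMClass` so that it runs on its own, hard-codes the quartile thresholds printed earlier, and adds a hypothetical helper, `rfm_class`, that is not part of the notebook.

```python
# Quartile thresholds as printed in the notebook output
quantiles = {
    "recency": {0.25: 30.0, 0.5: 75.0, 0.75: 183.0},
    "frequency": {0.25: 5.0, 0.5: 6.0, 0.75: 8.0},
    "monetary_value": {0.25: 1145.0, 0.5: 2257.0, 0.75: 3784.0},
}

def RClass(x, p, d):
    """Score recency: the lowest quartile (most recent) scores 1."""
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    return 4

def FMClass(x, p, d):
    """Score frequency or monetary value: the highest quartile scores 1."""
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    return 1

def rfm_class(recency, frequency, monetary_value):
    """Hypothetical helper: build the three-digit class string for one customer."""
    return (str(RClass(recency, "recency", quantiles))
            + str(FMClass(frequency, "frequency", quantiles))
            + str(FMClass(monetary_value, "monetary_value", quantiles)))

print(rfm_class(415, 3, 887))    # Aaron Bergman -> 444
print(rfm_class(12, 7, 1744))    # Aaron Hawkins -> 123
print(rfm_class(34, 10, 3249))   # Adam Hart     -> 212
```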


To get the top 5 customers in RFM class 111, the high spenders who bought recently and frequently, filter on that class and sort by monetary value:

In [63]:
rfmSegmentation[rfmSegmentation['RFMClass']=='111'].sort_values('monetary_value', ascending=False).head(5)

| customer | recency | frequency | monetary_value | R_Quartile | F_Quartile | M_Quartile | RFMClass |
|---|---|---|---|---|---|---|---|
| Sanjit Engle | 9 | 11 | 12210 | 1 | 1 | 1 | 111 |
| John Lee | 21 | 11 | 9801 | 1 | 1 | 1 | 111 |
| Pete Kriz | 9 | 12 | 8647 | 1 | 1 | 1 | 111 |
| Harry Marie | 2 | 10 | 8237 | 1 | 1 | 1 | 111 |
| Lena Creighton | 16 | 12 | 7661 | 1 | 1 | 1 | 111 |


### Distribution of Recency, Frequency and Monetary

In [64]:
# Pending: examine the distributions of recency, frequency and monetary value

# Drawback to investigate: check each metric for outliers
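One possible shape for this pending step is sketched below. It is an assumption, not the author's analysis: the table is hypothetical toy data, and the 1.5 × IQR fence (Tukey's rule) is only one common convention for flagging outliers.

```python
import pandas as pd

# Hypothetical toy RFM table; customer E is an obvious big spender
toy_rfm = pd.DataFrame(
    {
        "recency": [10, 30, 75, 183, 200],
        "frequency": [10, 8, 6, 5, 3],
        "monetary_value": [3249, 1744, 2257, 887, 50000],
    },
    index=["A", "B", "C", "D", "E"],
)

# Summary statistics (count, mean, std, min, quartiles, max) per metric
summary = toy_rfm.describe()
print(summary)

# Tukey's rule: flag values more than 1.5 * IQR above the third quartile
q1 = toy_rfm.quantile(0.25)
q3 = toy_rfm.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Keep the customers that exceed the fence on at least one metric
outliers = toy_rfm[(toy_rfm > upper_fence).any(axis=1)]
print(outliers)
```

On the real data the same calls would be made on `rfmTable[['recency', 'frequency', 'monetary_value']]`.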
