RFM (Recency Frequency Monetary) Analysis
RFM is a method for analyzing customer value. It is commonly used in database marketing and direct marketing, and has received particular attention in the retail and professional services industries.

RFM stands for the three dimensions:

Recency – How recently did the customer purchase?
Frequency – How often do they purchase?
Monetary Value – How much do they spend?
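
As a small illustration of the three dimensions, the sketch below computes them by hand for a single customer. The toy purchases, the column names (`invoice`, `date`, `amount`) and the analysis date are all hypothetical and are not taken from the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical purchases of ONE customer (illustrative values only).
# Invoice "1001" has two lines, so it counts once for frequency.
purchases = pd.DataFrame({
    "invoice": ["1001", "1001", "1002", "1003"],
    "date": pd.to_datetime(["2011-11-20", "2011-11-20", "2011-12-01", "2011-12-08"]),
    "amount": [10.0, 5.0, 20.0, 7.5],
})

# Assumed reference date for the analysis
analysis_date = pd.Timestamp("2011-12-10")

recency = (analysis_date - purchases["date"].max()).days  # days since the last purchase
frequency = purchases["invoice"].nunique()                # number of distinct invoices
monetary = purchases["amount"].sum()                      # total amount spent

print(recency, frequency, monetary)
```

A lower recency and a higher frequency and monetary value generally indicate a more valuable customer.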

Definition of Data

* InvoiceNo: Unique Transaction Number
* StockCode: Product Code
* Description: Product Description
* Quantity: Product Quantity
* InvoiceDate: Invoice Date
* UnitPrice: Product Price (in pounds sterling)
* CustomerID: Unique Customer Number
* Country: Country Name

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# Import the required packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Load the data into a pandas DataFrame
df = pd.read_csv("../input/ecommerce-data/data.csv", encoding="ISO-8859-1")

# Work on a copy so the raw data stays untouched
df1 = df.copy()
df1.head()

In [None]:

df1.shape  # (541909, 8)
# Line total: unit price multiplied by quantity
df1["TotalPrice"] = df1["UnitPrice"] * df1["Quantity"]

In [None]:
# The TotalPrice column added above is used for the Monetary value
df1.head()

In [None]:
# Count missing values per column, then drop the rows that contain any
df1.isnull().sum()
df1.dropna(inplace=True)

In [None]:
# Verify that no missing values remain
df1.isnull().sum()

In [None]:
# Some invoices are returns (their InvoiceNo contains "C"). Inspect them before excluding them.
df[df["InvoiceNo"].str.contains("C", na=False)].head()

In [None]:
# Remove the returned invoices from the dataset
df1=df1[~df1["InvoiceNo"].str.contains("C",na=False)]
df1.shape
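
The filter above can be checked on a tiny example. The sketch below uses made-up invoice numbers and assumes that returns are marked with a leading "C"; under that assumption `str.startswith` is a slightly stricter alternative to `str.contains`, since it only matches the marker at the start of the value.

```python
import pandas as pd

# Hypothetical invoice lines (values are illustrative only)
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381"],
    "Quantity": [6, -1, 2],
})

# Assumption: a returned invoice has an InvoiceNo that starts with "C"
is_return = toy["InvoiceNo"].astype(str).str.startswith("C")

sales = toy[~is_return]    # rows kept for the RFM analysis
returns = toy[is_return]   # rows that are excluded

print(sales)
print(returns)
```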

In [None]:
# RFM metrics calculation
# First we need to choose the analysis date

df1["InvoiceDate"].max()

In [None]:
df1.info()
print(df1[0:1].values)
(df1.InvoiceDate).max()

In [None]:
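# InvoiceDate is still a string at this point, so max() returned the lexicographic maximum ('9/9/2011 9:52'), which is not necessarily the latest date.
# The analysis date is set to 2011-12-11, i.e. 2 days after a latest invoice date of 2011-12-09 (an analysis date is required for the recency calculation).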
today_date=dt.datetime(2011,12,11)
today_date
#df1.InvoiceDate=df1.InvoiceDate.astype(dt.datetime)

In [None]:
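print(today_date)
print(df1.InvoiceDate.max())
# InvoiceDate is stored as a string in month/day/year format, so it has to be parsed into a datetime
# Sanity check on a single value: days between the analysis date and that invoice date
m, d, y = df1.InvoiceDate.max().split()[0].split("/")
new_date = dt.datetime(int(y), int(m), int(d))
(today_date - new_date).days

def convert_to_datetime(_date):
    # '9/9/2011 9:52' -> datetime(2011, 9, 9); the time of day is dropped
    m, d, y = _date.split()[0].split("/")
    return dt.datetime(int(y), int(m), int(d))

# Apply the conversion to the whole column
df1.InvoiceDate = df1.InvoiceDate.apply(convert_to_datetime)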
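
A possible alternative to the manual parsing above is `pd.to_datetime`. The sketch below is not the approach used in this notebook; it assumes the strings follow a month/day/year hour:minute layout, and the sample values are made up for illustration. It also shows why `max()` on the raw strings is misleading.

```python
import pandas as pd

# Hypothetical raw values in an assumed "month/day/year hour:minute" layout
raw = pd.Series(["9/9/2011 9:52", "12/9/2011 12:50", "12/1/2010 8:26"])

# On strings, max() compares character by character, so "9/..." beats "12/..."
string_max = raw.max()

# Parse with an explicit format, then compare real timestamps
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")
true_max = parsed.max()

# Drop the time of day, keeping a datetime dtype (similar in effect to convert_to_datetime)
dates_only = parsed.dt.normalize()

print(string_max)
print(true_max)
```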

In [None]:
rfm = df1.groupby('CustomerID').agg({'InvoiceDate': lambda InvoiceDate: (today_date - InvoiceDate.max()).days,
                                     'InvoiceNo': lambda InvoiceNo: InvoiceNo.nunique(),
                                     'TotalPrice': lambda TotalPrice: TotalPrice.sum()})

In [None]:
# InvoiceDate: (today_date - InvoiceDate.max()).days  -> recency
# InvoiceNo:   InvoiceNo.nunique()                    -> frequency
# TotalPrice:  TotalPrice.sum()                       -> monetary

rfm
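
The same aggregation can also be sketched with pandas named aggregation, which names the output columns directly and avoids the lambdas. This is only an alternative formulation on made-up data; the toy values and the `analysis_date` variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical cleaned transactions (illustrative values only)
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["10", "11", "12"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-08", "2011-11-10"]),
    "TotalPrice": [10.0, 5.0, 20.0],
})
analysis_date = pd.Timestamp("2011-12-11")  # assumed analysis date

toy_rfm = toy.groupby("CustomerID").agg(
    last_purchase=("InvoiceDate", "max"),   # most recent invoice date per customer
    frequency=("InvoiceNo", "nunique"),     # number of distinct invoices
    monetary=("TotalPrice", "sum"),         # total spend
)

# Recency: whole days between the analysis date and the last purchase
toy_rfm["recency"] = (analysis_date - toy_rfm["last_purchase"]).dt.days
toy_rfm = toy_rfm[["recency", "frequency", "monetary"]]

print(toy_rfm)
```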

In [None]:
rfm.head()
rfm.columns = ['recency', 'frequency', 'monetary']
rfm.describe().T

In [None]:
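# Keep only customers with a positive total spend
rfm = rfm[rfm["monetary"] > 0].copy()
# Score recency, frequency and monetary from 1 to 5 by quintile (0-20-40-60-80-100%).
# A lower recency is better, so its labels are reversed.
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])

# rank(method="first") breaks ties so that the quintile bin edges are unique
rfm["frequency_score"] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

rfm["monetary_score"] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])

# The monetary score is left out of RFM_SCORE, which combines only recency and frequency
rfm["RFM_SCORE"] = (rfm['recency_score'].astype(str) +
                    rfm['frequency_score'].astype(str))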

rfm.head()
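
The reason the frequency score is computed on ranks rather than on the raw values can be seen on a small example. The sketch below uses a made-up frequency series in which most customers bought only once, which is an assumption chosen to trigger the problem.

```python
import pandas as pd

# Hypothetical purchase frequencies with many ties at 1
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 5, 9])

# On the raw values several quintile edges coincide, so qcut raises a ValueError
raw_failed = False
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
except ValueError as err:
    raw_failed = True
    print("qcut on raw values failed:", err)

# rank(method="first") gives every row a distinct rank, so the edges are unique
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
print(scores.tolist())
```

One consequence of this design choice is that customers with identical frequencies can land in different score groups, depending on their order in the data.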

In [None]:
# Customer segments, keyed by regex patterns over the two-digit score (recency digit first, frequency digit second)
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}
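
One way such a map could be applied is `Series.replace` with `regex=True`, which substitutes the segment name for any score that matches a pattern. The sketch below repeats the map so that it runs on its own; the toy scores and the `segment` column name are assumptions for illustration.

```python
import pandas as pd

# Same patterns as the segment map in this notebook, repeated to keep the sketch self-contained
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

# Hypothetical two-digit scores (recency digit + frequency digit)
toy = pd.DataFrame({"RFM_SCORE": ["55", "12", "33", "41", "15"]})

# Replace each score with the name of the segment whose pattern matches it
toy["segment"] = toy["RFM_SCORE"].replace(seg_map, regex=True)

print(toy)
```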

In [None]:
rfm.head()