**RFM analysis is used to understand and segment customers based on their buying behavior.**
RFM stands for recency, frequency, and monetary value. These three metrics describe a customer's engagement, loyalty, and value to a business.

The technique is used mainly in marketing.

Using RFM analysis, a business can assess each customer's:

- recency (how recently they made their last purchase)

- frequency (how often they make purchases)

- monetary value (the amount spent on purchases)
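
The three metrics above can be sketched on a small table before touching the real dataset. The toy transactions and the `snapshot` reference date below are assumptions made for illustration only; the column names mirror the dataset used later in this notebook.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from rfm_data.csv).
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-04-01", "2023-05-10", "2023-04-15",
        "2023-03-20", "2023-04-25", "2023-06-01",
    ]),
    "TransactionAmount": [50.0, 70.0, 200.0, 20.0, 30.0, 40.0],
})

# Assumed reference date: one day after the latest purchase in the toy data.
snapshot = toy["PurchaseDate"].max() + pd.Timedelta(days=1)

rfm = toy.groupby("CustomerID").agg(
    # Days between the reference date and the customer's last purchase.
    recency=("PurchaseDate", lambda d: (snapshot - d.max()).days),
    # Number of purchases made by the customer.
    frequency=("PurchaseDate", "count"),
    # Total amount spent by the customer.
    monetary=("TransactionAmount", "sum"),
)
print(rfm)
```

Customer 3 bought most recently and most often but spent the least in total, which is why the three metrics are read together rather than one at a time.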

## 1. Importing the libraries:

In [130]:
import pandas as pd
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = 'plotly_white'
from datetime import datetime
import numpy as np

## 2. Reading the dataset:

In [40]:
data = pd.read_csv("rfm_data.csv")
data.head(2)

| | CustomerID | PurchaseDate | TransactionAmount | ProductInformation | OrderID | Location |
|---|---|---|---|---|---|---|
| 0 | 8814 | 2023-04-11 | 943.31 | Product C | 890075 | Tokyo |
| 1 | 2188 | 2023-04-11 | 463.7 | Product A | 176819 | London |


In [3]:
data.shape

(1000, 6)

## 3. Calculating RFM Values:

#### 3.1 Calculating Recency:

In [62]:
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'])
data['recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days
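
Because the cell above measures recency against `datetime.now()`, the values change every time the notebook is re-run. A possible alternative, sketched here on two hypothetical rows, is to subtract the datetime column directly from a fixed reference date. The `reference_date` value is an assumption chosen for illustration, not something the dataset prescribes.

```python
import pandas as pd

# Two hypothetical rows shaped like the dataset above.
purchases = pd.DataFrame({
    "CustomerID": [8814, 2188],
    "PurchaseDate": ["2023-04-11", "2023-04-11"],
})
purchases["PurchaseDate"] = pd.to_datetime(purchases["PurchaseDate"])

# Assumed fixed reference date; use whichever date the analysis should treat as "today".
reference_date = pd.Timestamp("2023-06-30")

# Subtracting a datetime64 column from a Timestamp yields a timedelta column,
# so .dt.days gives whole days without converting to Python date objects.
purchases["recency"] = (reference_date - purchases["PurchaseDate"]).dt.days
print(purchases)
```

With a pinned reference date, the recency values (80 days for both rows here) stay the same across runs.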

#### 3.2 Calculating the Frequency:

In [42]:
frequency = data.groupby('CustomerID')['OrderID'].count().reset_index()
frequency.rename(columns={'OrderID': 'Frequency'}, inplace=True)
data = data.merge(frequency, how='left', on='CustomerID')

#### 3.3 Calculating the Monetary Value:

In [52]:
total_amount = data.groupby('CustomerID')['TransactionAmount'].sum().reset_index()
total_amount.rename(columns={'TransactionAmount':'Total Amount'}, inplace=True)
data = data.merge(total_amount, how='left', on='CustomerID')
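
The group-then-merge pattern used for Frequency and Total Amount can also be written with `groupby(...).transform`, which returns a result aligned to the original rows. This is a sketch of an equivalent approach on hypothetical orders, not a change to the cells above.

```python
import pandas as pd

# Hypothetical orders; customer 10 has two orders, customer 20 has one.
orders = pd.DataFrame({
    "CustomerID": [10, 10, 20],
    "OrderID": [1001, 1002, 1003],
    "TransactionAmount": [25.0, 75.0, 40.0],
})

# transform broadcasts each group's aggregate back onto that group's rows,
# so no rename or merge step is needed.
orders["Frequency"] = orders.groupby("CustomerID")["OrderID"].transform("count")
orders["Total Amount"] = orders.groupby("CustomerID")["TransactionAmount"].transform("sum")
print(orders)
```

Running a merge cell twice adds suffixed duplicate columns (`Frequency_x`, `Frequency_y`), whereas re-running a `transform` assignment simply overwrites the column.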

In [63]:
data.head(2)

| | CustomerID | PurchaseDate | TransactionAmount | ProductInformation | OrderID | Location | Frequency | Total Amount | recency |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8814 | 2023-04-11 | 943.31 | Product C | 890075 | Tokyo | 1 | 943.31 | 325 |
| 1 | 2188 | 2023-04-11 | 463.7 | Product A | 176819 | London | 1 | 463.7 | 325 |


## 4. Calculating RFM Scores:

#### 4.1 Defining the criteria for the RFM Values:

In [70]:
recency_scores = [5, 4, 3, 2, 1]
frequency_scores = [1, 2, 3, 4, 5]
monetary_scores = [1, 2, 3, 4, 5]

#### 4.2 Calculating the RFM Scores:

In [71]:
data['RecencyScore'] = pd.cut(data['recency'], bins=5, labels=recency_scores)
data['FrequencyScore'] = pd.cut(data['Frequency'], bins=5, labels=frequency_scores)
data['MonetaryScore'] = pd.cut(data['Total Amount'], bins=5, labels=monetary_scores)

##### Convert these new columns from categorical to numerical datatype:

In [74]:
data['RecencyScore'] = data['RecencyScore'].astype(int)
data['FrequencyScore'] = data['FrequencyScore'].astype(int)
data['MonetaryScore'] = data['MonetaryScore'].astype(int)
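
To see what `pd.cut(..., bins=5)` does with the reversed recency labels, here is a small sketch on made-up recency values. With an integer `bins` argument, `pd.cut` splits the observed range into equal-width intervals, so a single large value can leave the middle bins empty. The numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical recency values in days; 100 is an outlier.
recency_days = pd.Series([0, 10, 20, 30, 40, 100])

# Range 0..100 is split into five 20-day-wide bins.
# Labels are reversed so that the most recent purchases get the highest score.
score = pd.cut(recency_days, bins=5, labels=[5, 4, 3, 2, 1]).astype(int)
print(score.tolist())  # [5, 5, 5, 4, 4, 1]
```

Scores 3 and 2 are never assigned in this toy example because no value falls between 40 and 80 days. Skewed columns such as Frequency can behave the same way.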

## 5. RFM Segmentation:

#### 5.1 Calculating the RFM Score:

In [76]:
data['RFM_Score'] = data['RecencyScore'] + data['FrequencyScore'] + data['MonetaryScore']

#### 5.2 Creating RFM segments from the RFM Score using quantiles:

In [103]:
segment_labels = ['Low-Value', 'Mid-Value', 'High-Value']
data['Value Segment'] = pd.qcut(data['RFM_Score'], q=3, labels=segment_labels)
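
`pd.qcut` places its cut points at quantiles of the data, but because the RFM score takes only a few integer values, tied scores all land in the same segment and the segments need not be equal in size. A minimal sketch with hypothetical scores:

```python
import pandas as pd

# Hypothetical RFM scores with repeated values.
scores = pd.Series([3, 4, 5, 5, 5, 6, 6, 7, 9, 12])

segment_labels = ["Low-Value", "Mid-Value", "High-Value"]
segments = pd.qcut(scores, q=3, labels=segment_labels)

# Count how many scores fall into each segment.
print(segments.value_counts().sort_index())
```

Here the tertile edges fall at 5 and 6, so the three segments hold 5, 2, and 3 scores instead of splitting the ten values evenly.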

## 6. RFM Segmentation Distribution:

In [109]:
segmentation_count = data['Value Segment'].value_counts().reset_index()

In [110]:
segmentation_count

| | index | Value Segment |
|---|---|---|
| 0 | Low-Value | 435 |
| 1 | Mid-Value | 386 |
| 2 | High-Value | 179 |
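
The column names `index` and `Value Segment` shown above depend on the pandas version: from pandas 2.0 onward, `value_counts()` names the counts `count` and puts the original name on the index, so `reset_index()` yields `Value Segment` and `count` instead. One way to get the same column names on either version is to name both explicitly, as sketched here on a hypothetical series.

```python
import pandas as pd

# Hypothetical segment labels standing in for data['Value Segment'].
segments = pd.Series(
    ["Low-Value", "Mid-Value", "Low-Value", "High-Value", "Low-Value", "Mid-Value"],
    name="Value Segment",
)

counts = (
    segments.value_counts()
    .rename_axis("index")                    # name the label column explicitly
    .reset_index(name="Value Segment")       # name the count column explicitly
)
print(counts)
```

With the names fixed this way, plotting code that refers to `x='index'` and `y='Value Segment'` does not need to change between pandas versions.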


In [167]:
custom_colors = {
    'Low-Value': 'rgb(237, 100, 90)',
    'Mid-Value': 'rgb(47, 138, 196)',
    'High-Value': 'rgb(36, 121, 108)'
}
fig = px.bar(data_frame=segmentation_count, x='index', y='Value Segment', text='Value Segment', 
             title='RFM Value Segment Distribution',
             color='index', color_discrete_map=custom_colors)
fig.update_layout(xaxis_title='RFM Value Segments', yaxis_title='Count')
fig.update_traces(textposition='outside', showlegend=False)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()

## 7. RFM Customer Segments:

##### Now, we will categorize the customers based on these RFM scores:

In [159]:
data['RFM Customer Segments'] = ''

In [160]:
data.loc[data['RFM_Score'] >= 9, 'RFM Customer Segments'] = 'Golden Customers'
data.loc[(data['RFM_Score'] >= 6) & (data['RFM_Score'] < 9), 'RFM Customer Segments'] = 'Potential Loyal Customers'
data.loc[(data['RFM_Score'] >=5) & (data['RFM_Score'] < 6), 'RFM Customer Segments'] = 'Customers at Risk'
data.loc[(data['RFM_Score'] >=4) & (data['RFM_Score'] < 5), 'RFM Customer Segments'] = 'About to lose'
data.loc[(data['RFM_Score'] >=3) & (data['RFM_Score'] < 4), 'RFM Customer Segments'] = 'Lost'
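
The five threshold rules above can also be expressed as a single `pd.cut` call with explicit bin edges. This is a sketch of an equivalent mapping, assuming the score is an integer between 3 and 15 (three component scores of 1 to 5 each); the sample scores are hypothetical.

```python
import pandas as pd

# Hypothetical RFM scores covering every segment boundary.
rfm_score = pd.Series([3, 4, 5, 6, 8, 9, 15])

# Intervals are right-inclusive: (2, 3], (3, 4], (4, 5], (5, 8], (8, 15].
segment = pd.cut(
    rfm_score,
    bins=[2, 3, 4, 5, 8, 15],
    labels=[
        "Lost",
        "About to lose",
        "Customers at Risk",
        "Potential Loyal Customers",
        "Golden Customers",
    ],
)
print(segment.tolist())
```

Keeping the edges and labels in one place makes it harder to leave a gap or overlap between segments when the thresholds are adjusted later.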

In [161]:
customer_status = data['RFM Customer Segments'].value_counts().reset_index()
customer_status.rename(columns={'index':'Customer Segmentation', 
                                'RFM Customer Segments':'Count'}, inplace=True)

In [162]:
customer_status

| | Customer Segmentation | Count |
|---|---|---|
| 0 | Potential Loyal Customers | 503 |
| 1 | Customers at Risk | 180 |
| 2 | About to lose | 173 |
| 3 | Lost | 82 |
| 4 | Golden Customers | 62 |


#### Visualizing the Customer Segmentation:

In [166]:
color_mapping = {
    'Potential Loyal Customers': '#00cc96',
    'Customers at Risk': 'rgb(249,123,114)',
    'About to lose': 'rgb(237,100,90)',
    'Lost': '#DC3912',
    'Golden Customers': '#2CA02C'    
}

fig = px.bar(data_frame=customer_status, x='Customer Segmentation', y='Count',
             title='Customer Segmentation based on RFM Score',
            text='Count', color='Customer Segmentation', color_discrete_map=color_mapping)
fig.update_layout(xaxis_title='Customer Segmentation', yaxis_title='Count')
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_traces(showlegend=False, textposition='outside')
fig.show()

## 8. Relating RFM Value Segments to Customer Segments:

In [183]:
segment_product_counts = data.groupby(['Value Segment', 'RFM Customer Segments']).size().reset_index(name='Count')
segment_product_counts = segment_product_counts.sort_values(by='Count', ascending=False)

##### Showing the analysis as a treemap:

In [186]:
fig = px.treemap(data_frame=segment_product_counts, path=['Value Segment', 'RFM Customer Segments'],
                values='Count', title='RFM Customer Segments by Value')
fig.show()

## 9. Drilling down to our Golden Customers:

#### 9.1 Distribution of RFM values within Golden Customers:

In [190]:
golden_customers = data[data['RFM Customer Segments'] == 'Golden Customers']

In [195]:
fig = go.Figure()
fig.add_trace(go.Box(y=golden_customers['RecencyScore'], name='Recency'))
fig.add_trace(go.Box(y=golden_customers['FrequencyScore'], name='Frequency'))
fig.add_trace(go.Box(y=golden_customers['MonetaryScore'], name='Monetary'))
fig.update_layout(title='Distribution of RFM values for Golden Customers',
                 yaxis_title='RFM Values')
fig.show()

#### 9.2 The correlation of recency, frequency, and monetary scores:

In [200]:
corr_matrix = golden_customers[['RecencyScore', 'FrequencyScore', 'MonetaryScore']].corr()

In [201]:
corr_matrix

| | RecencyScore | FrequencyScore | MonetaryScore |
|---|---|---|---|
| RecencyScore | 1.0 | -0.571727 | -0.474715 |
| FrequencyScore | -0.571727 | 1.0 | 0.390657 |
| MonetaryScore | -0.474715 | 0.390657 | 1.0 |


In [205]:
heatmap = go.Figure(data=go.Heatmap(
        z=corr_matrix.values,
        x=corr_matrix.columns,
        y=corr_matrix.columns,
        colorscale='RdBu',
        colorbar=dict(title='Correlation')))

heatmap.update_layout(title='Correlation Matrix of RFM Values within Golden Customer Segment')
heatmap.show()

## 10. Recency, Frequency, and Monetary scores of all the segments:

In [214]:
segment_scores = data.groupby('RFM Customer Segments')[['RecencyScore', 'FrequencyScore', 'MonetaryScore']].mean().reset_index()

In [215]:
segment_scores

| | RFM Customer Segments | RecencyScore | FrequencyScore | MonetaryScore |
|---|---|---|---|---|
| 0 | About to lose | 1.537572 | 1.0 | 1.462428 |
| 1 | Customers at Risk | 2.344444 | 1.011111 | 1.644444 |
| 2 | Golden Customers | 3.806452 | 3.064516 | 3.225806 |
| 3 | Lost | 1.0 | 1.0 | 1.0 |
| 4 | Potential Loyal Customers | 3.918489 | 1.194831 | 1.741551 |


In [226]:
fig = go.Figure()


fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['RecencyScore'],
    name='Recency Score',
    marker_color='rgb(158,202,225)',
    text=segment_scores['RecencyScore']
))


fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['FrequencyScore'],
    name='Frequency Score',
    marker_color='rgb(94,158,217)',
    text=segment_scores['FrequencyScore']
))


fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['MonetaryScore'],
    name='Monetary Score',
    marker_color='rgb(32,102,148)',
    text=segment_scores['MonetaryScore']
))


fig.update_layout(
    title='Comparison of RFM Segments based on Recency, Frequency, and Monetary Scores',
    xaxis_title='RFM Segments',
    yaxis_title='Score',
    barmode='group',
    showlegend=True
)
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')

fig.show()