RFM analysis is a technique businesses use to understand and segment customers based on their purchasing behavior. It is widely used by data science practitioners, particularly in marketing. RFM stands for recency, frequency, and monetary value.

Through RFM analysis, a company can evaluate customers on three metrics:

- Recency: the time elapsed since their last purchase.
- Frequency: how often they make purchases.
- Monetary value: the total amount they have spent.

Together, these metrics indicate how engaged and loyal a customer is and how much that customer contributes to the business's revenue.
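As a rough illustration of the three metrics, the sketch below builds one RFM row per customer from a tiny transaction log. The customer IDs, dates, and amounts are invented, and the reference date (one day after the last transaction) is an assumption made so the result does not depend on when the code is run.

```python
import pandas as pd

# Toy transaction log; the customer IDs, dates, and amounts are made up.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-15", "2023-03-10",
    ]),
    "TransactionAmount": [50.0, 20.0, 200.0, 10.0, 15.0, 5.0],
})

# Assumed reference date: the day after the last transaction in the log.
snapshot = tx["PurchaseDate"].max() + pd.Timedelta(days=1)

# One row per customer: days since last purchase, purchase count, total spend.
rfm = tx.groupby("CustomerID").agg(
    Recency=("PurchaseDate", lambda d: (snapshot - d.max()).days),
    Frequency=("PurchaseDate", "count"),
    Monetary=("TransactionAmount", "sum"),
)
print(rfm)
```

Customer 3 bought most recently and most often but spent the least, which is the kind of contrast the three metrics are meant to expose.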

In [1]:
import pandas as pd
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"

In [2]:
data = pd.read_csv('/content/rfm_data.csv')
data.head()

Unnamed: 0,CustomerID,PurchaseDate,TransactionAmount,ProductInformation,OrderID,Location
0,8814,2023-04-11,943.31,Product C,890075,Tokyo
1,2188,2023-04-11,463.7,Product A,176819,London
2,4608,2023-04-11,80.28,Product A,340062,New York
3,2559,2023-04-11,221.29,Product A,239145,London
4,9482,2023-04-11,739.56,Product A,194545,Paris


# Calculating RFM Values

Next, I compute the recency, frequency, and monetary values for the customers.

In [3]:
from datetime import datetime

# Convert 'PurchaseDate' to datetime
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'])

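# Calculate Recency: days between each transaction and the current date.
# Note: datetime.now() makes this value depend on when the notebook is run.
data['Recency'] = (datetime.now() - data['PurchaseDate']).dt.days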

# Calculate Frequency
frequency_data = data.groupby('CustomerID')['OrderID'].count().reset_index()
frequency_data.rename(columns={'OrderID': 'Frequency'}, inplace=True)
data = data.merge(frequency_data, on='CustomerID', how='left')

# Calculate Monetary Value
monetary_data = data.groupby('CustomerID')['TransactionAmount'].sum().reset_index()
monetary_data.rename(columns={'TransactionAmount': 'MonetaryValue'}, inplace=True)
data = data.merge(monetary_data, on='CustomerID', how='left')

We started with recency, the time since a purchase. The code subtracts each row's purchase date from the current date, obtained with `datetime.now()`, and keeps the number of days. The value is computed per transaction row, so it equals "days since the last purchase" only for customers with a single transaction, and it changes with the date on which the notebook is run.

Next comes frequency. Grouping the data by 'CustomerID', we counted the 'OrderID' values to get the number of purchases made by each customer. Note that `count()` counts every row, so a repeated order ID would be counted more than once.

Monetary value uses the same grouping by 'CustomerID': we summed the 'TransactionAmount' values to get the total amount each customer has spent.

With these three columns in place, every row carries the RFM (Recency, Frequency, Monetary Value) metrics that the segmentation steps rely on.
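The difference between row-level and customer-level recency can be sketched on a few hypothetical rows that reuse the dataset's column names. The fixed `reference_date` and the use of `transform()` in place of the merges are assumptions of this sketch, not part of the original notebook.

```python
import pandas as pd

# Hypothetical rows using the same column names as the dataset above.
df = pd.DataFrame({
    "CustomerID": [10, 10, 20],
    "OrderID": [501, 502, 503],
    "PurchaseDate": pd.to_datetime(["2023-04-11", "2023-06-01", "2023-05-20"]),
    "TransactionAmount": [100.0, 40.0, 75.0],
})

# Assumption: a fixed reference date instead of datetime.now(), so reruns agree.
reference_date = pd.Timestamp("2023-06-11")

# Row-level recency: days since each individual transaction.
df["RowRecency"] = (reference_date - df["PurchaseDate"]).dt.days

# Customer-level recency: days since that customer's most recent transaction.
last_purchase = df.groupby("CustomerID")["PurchaseDate"].transform("max")
df["CustomerRecency"] = (reference_date - last_purchase).dt.days

# transform() broadcasts per-customer aggregates back onto every row.
df["Frequency"] = df.groupby("CustomerID")["OrderID"].transform("nunique")
df["MonetaryValue"] = df.groupby("CustomerID")["TransactionAmount"].transform("sum")

print(df[["CustomerID", "RowRecency", "CustomerRecency", "Frequency", "MonetaryValue"]])
```

Customer 10 has two rows with different row-level recency values (61 and 10 days) but a single customer-level value of 10 days.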

In [4]:
data.head()

Unnamed: 0,CustomerID,PurchaseDate,TransactionAmount,ProductInformation,OrderID,Location,Recency,Frequency,MonetaryValue
0,8814,2023-04-11,943.31,Product C,890075,Tokyo,398,1,943.31
1,2188,2023-04-11,463.7,Product A,176819,London,398,1,463.7
2,4608,2023-04-11,80.28,Product A,340062,New York,398,1,80.28
3,2559,2023-04-11,221.29,Product A,239145,London,398,1,221.29
4,9482,2023-04-11,739.56,Product A,194545,Paris,398,1,739.56


# Calculating RFM Scores

In [5]:
# Define scoring criteria for each RFM value
recency_scores = [5,4,3,2,1]   # Higher score for lower recency (more recent)
frequency_scores = [1,2,3,4,5] # Higher score for higher frequency
monetary_scores = [1,2,3,4,5]  # Higher score for higher monetary value

# Calculate RFM Scores
data['RecencyScore'] = pd.cut(data['Recency'], bins= 5 , labels=recency_scores)
data['FrequencyScore'] = pd.cut(data['Frequency'], bins=5, labels = frequency_scores)
data['MonetaryScore'] = pd.cut(data['MonetaryValue'], bins = 5 , labels = monetary_scores)

Recency scores run from 5 down to 1, so customers with more recent purchases (a smaller recency value) receive higher scores.

Frequency scores run from 1 to 5, with higher scores for customers who purchase more often.

Monetary scores also run from 1 to 5, with higher scores for customers who have spent more.

The scores come from `pd.cut()`, which splits the range of each metric into 5 equal-width bins and assigns each bin its label. Because the bins have equal width rather than equal counts, a skewed metric can place most customers in one or two bins.
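To see what equal-width binning does to skewed data, the sketch below scores a made-up recency series with `pd.cut()` and, for comparison, with the quantile-based `pd.qcut()`. The values are invented, and the comparison is offered only as an illustration, not as the notebook's method.

```python
import pandas as pd

# Made-up recency values (days since purchase) with one extreme outlier.
recency = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Equal-width bins: the range 1..100 is split into 5 intervals of equal length.
cut_scores = pd.cut(recency, bins=5, labels=[5, 4, 3, 2, 1]).astype(int)

# Quantile bins: each bin receives about the same number of observations.
qcut_scores = pd.qcut(recency, q=5, labels=[5, 4, 3, 2, 1]).astype(int)

print(cut_scores.value_counts().sort_index().to_dict())
print(qcut_scores.value_counts().sort_index().to_dict())
```

With equal-width bins, nine of the ten customers share the top score and the outlier alone takes the bottom one; with quantile bins, each score is given to two customers.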

`pd.cut()` returns categorical values, so we convert the scores to integers before doing arithmetic on them.

In [6]:
# Convert RFM scores to numeric type
data['RecencyScore'] = data['RecencyScore'].astype(int)
data['FrequencyScore'] = data['FrequencyScore'].astype(int)
data['MonetaryScore'] = data['MonetaryScore'].astype(int)

# RFM Value Segmentation

Let's now compute the final RFM score and assign a value segment based on it.

In [7]:
# Calculate RFM score by combining the individual scores
data['RFM_Score'] = data['RecencyScore'] + data['FrequencyScore'] + data['MonetaryScore']

# Create RFM segments based on the RFM score
segmentlabels = ['Low-Value', 'Mid-Value','High-Value']

data['Value Segment'] = pd.qcut(data["RFM_Score"], q=3 , labels = segmentlabels)

The RFM score is the sum of the recency, frequency, and monetary scores. For instance, a customer with a recency score of 3, a frequency score of 4, and a monetary score of 5 has an RFM score of 12.

We then split the RFM scores into three segments: "Low-Value", "Mid-Value", and "High-Value". The `pd.qcut()` function does this by cutting at the score quantiles, so each segment receives roughly the same number of rows; tied scores can make the groups uneven.
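The scoring and segmentation steps can be traced on a handful of hypothetical score triples. The values below are invented for illustration; the first row is the 3 + 4 + 5 = 12 example from the text.

```python
import pandas as pd

# Hypothetical score triples; row 0 is the worked example (3 + 4 + 5 = 12).
scores = pd.DataFrame({
    "RecencyScore":   [3, 1, 5, 2, 4, 1],
    "FrequencyScore": [4, 1, 5, 2, 3, 1],
    "MonetaryScore":  [5, 1, 4, 2, 3, 2],
})

# Combined score: a plain sum of the three component scores.
score_columns = ["RecencyScore", "FrequencyScore", "MonetaryScore"]
scores["RFM_Score"] = scores[score_columns].sum(axis=1)

# Tercile split: cut points are the 1/3 and 2/3 quantiles of RFM_Score.
scores["Value Segment"] = pd.qcut(
    scores["RFM_Score"], q=3, labels=["Low-Value", "Mid-Value", "High-Value"]
)
print(scores[["RFM_Score", "Value Segment"]])
```

Here the six scores are all distinct, so each segment receives exactly two rows. When many rows share the same score, the quantile edges can coincide or fall unevenly, and the segments will no longer be the same size.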

Let's examine the resulting data.

In [8]:
data.head()

Unnamed: 0,CustomerID,PurchaseDate,TransactionAmount,ProductInformation,OrderID,Location,Recency,Frequency,MonetaryValue,RecencyScore,FrequencyScore,MonetaryScore,RFM_Score,Value Segment
0,8814,2023-04-11,943.31,Product C,890075,Tokyo,398,1,943.31,1,1,2,4,Low-Value
1,2188,2023-04-11,463.7,Product A,176819,London,398,1,463.7,1,1,1,3,Low-Value
2,4608,2023-04-11,80.28,Product A,340062,New York,398,1,80.28,1,1,1,3,Low-Value
3,2559,2023-04-11,221.29,Product A,239145,London,398,1,221.29,1,1,1,3,Low-Value
4,9482,2023-04-11,739.56,Product A,194545,Paris,398,1,739.56,1,1,2,4,Low-Value


Let's now examine the distribution of the value segments.

In [9]:
# RFM Segment Distribution
segment_counts = data['Value Segment'].value_counts().reset_index()
segment_counts.columns = ['Value Segment', 'Count']

# Choose a color palette that is visually accessible
pastel_colors = px.colors.qualitative.Pastel

# Create the bar chart
fig_segment_dist = px.bar(segment_counts, x='Value Segment', y='Count',
                          color='Value Segment', color_discrete_sequence=pastel_colors,
                          title='RFM Value Segment Distribution')

# Update the layout
fig_segment_dist.update_layout(
    xaxis_title='RFM Value Segment',
    yaxis_title='Count',
    showlegend=False,
    uniformtext_minsize=8,  # Adjust minimum text size for better readability
    uniformtext_mode='hide',  # Hide text if it doesn't fit within bars
    bargap=0.1,  # Reduce the gap between bars
    plot_bgcolor='rgba(0,0,0,0)',  # Set plot background color to transparent
    xaxis=dict(
        tickmode='array',  # Specify tickmode as 'array' for manual positioning
        tickvals=list(segment_counts['Value Segment']),  # Use segment labels for tick values
        ticktext=list(segment_counts['Value Segment']),  # Use segment labels for tick text
        tickangle=45  # Rotate x-axis labels for better readability
    )
)

# Add annotations to provide additional insights
annotations = [dict(
    x=pos,
    y=count,
    text=str(count),
    xanchor='center',
    yanchor='bottom',
    showarrow=False,
    font=dict(
        color='black',
        size=10
    )
) for pos, count in zip(segment_counts['Value Segment'], segment_counts['Count'])]

fig_segment_dist.update_layout(annotations=annotations)

# Show the figure
fig_segment_dist.show()

# RFM Customer Segments

The RFM value segments calculated earlier group customers by RFM score into "Low-Value", "Mid-Value", and "High-Value". They divide the score range into three broad groups.

Now let's create a second classification, the RFM Customer Segments. These segments, such as "Champions", "Potential Loyalists", and "Can't Lose", attach a strategic label to ranges of the RFM score. Here's how to create them:

In [10]:
# Create a new column for RFM Customer Segments
data['RFM Customer Segments'] = ''

# Assign RFM segments based on the RFM score
data.loc[data['RFM_Score'] >= 9, 'RFM Customer Segments'] = 'Champions'
data.loc[(data['RFM_Score'] >= 6) & (data['RFM_Score'] < 9), 'RFM Customer Segments'] = 'Potential Loyalists'
data.loc[(data['RFM_Score'] >= 5) & (data['RFM_Score'] < 6), 'RFM Customer Segments'] = 'At Risk Customers'
data.loc[(data['RFM_Score'] >= 4) & (data['RFM_Score'] < 5), 'RFM Customer Segments'] = "Can't Lose"
data.loc[(data['RFM_Score'] >= 3) & (data['RFM_Score'] < 4), 'RFM Customer Segments'] = "Lost"

# Print the updated data with RFM segments
print(data[['CustomerID', 'RFM Customer Segments']])

     CustomerID RFM Customer Segments
0          8814            Can't Lose
1          2188                  Lost
2          4608                  Lost
3          2559                  Lost
4          9482            Can't Lose
..          ...                   ...
995        2970   Potential Loyalists
996        6669   Potential Loyalists
997        8836   Potential Loyalists
998        1440   Potential Loyalists
999        4759   Potential Loyalists

[1000 rows x 2 columns]


The code above creates a new column called "RFM Customer Segments" and fills it with a segment label chosen from each customer's RFM score.
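The same thresholds can also be written as a single `pd.cut()` call with explicit bin edges. This is an alternative sketch, not the notebook's approach; the sample scores are invented, and the upper edge of 15 assumes three component scores of at most 5 each.

```python
import pandas as pd

# Invented RFM scores covering every segment (minimum 3, maximum 15).
rfm_scores = pd.Series([3, 4, 5, 6, 8, 9, 15], name="RFM_Score")

# Right-closed bins reproduce the cut-offs used above:
# (2, 3] -> 3, (3, 4] -> 4, (4, 5] -> 5, (5, 8] -> 6..8, (8, 15] -> 9..15.
segments = pd.cut(
    rfm_scores,
    bins=[2, 3, 4, 5, 8, 15],
    labels=["Lost", "Can't Lose", "At Risk Customers",
            "Potential Loyalists", "Champions"],
)
print(pd.DataFrame({"RFM_Score": rfm_scores, "Segment": segments}))
```

Keeping the edges and labels in two adjacent lists makes it easier to check that no score falls between segments.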

# RFM Analysis

Let's analyze how customers are distributed across the RFM customer segments within each value segment.

In [11]:
# Group by 'Value Segment' and 'RFM Customer Segments' and count the occurrences.
# observed=True skips category combinations that never occur in the data.
segment_product_counts = data.groupby(['Value Segment', 'RFM Customer Segments'], observed=True).size().reset_index(name='Count')

# Sort the data by 'Count' in descending order
segment_product_counts = segment_product_counts.sort_values('Count', ascending=False)

# Create the treemap visualization
fig_treemap_segment_product = px.treemap(segment_product_counts,
                                         path=['Value Segment', 'RFM Customer Segments'],
                                         values='Count',
                                         color='Value Segment', color_discrete_sequence=px.colors.qualitative.Pastel,
                                         title='RFM Customer Segments by Value')

# Update the layout
fig_treemap_segment_product.update_layout(
    margin=dict(t=50, l=10, r=10, b=10),
    uniformtext=dict(minsize=10, mode='hide'),
)

# Show the figure
fig_treemap_segment_product.show()

Let's examine the distribution of the recency, frequency, and monetary scores within the Champions segment.

In [12]:
# Filter the data to include only the customers in the Champions segment
champions_segment = data[data['RFM Customer Segments'] == 'Champions']

# Create a box plot for each RFM value
fig = go.Figure()
fig.add_trace(go.Box(y=champions_segment['RecencyScore'], name='Recency'))
fig.add_trace(go.Box(y=champions_segment['FrequencyScore'], name='Frequency'))
fig.add_trace(go.Box(y=champions_segment['MonetaryScore'], name='Monetary'))

# Update layout and add title
fig.update_layout(
    title='Distribution of RFM Values within Champions Segment',
    yaxis_title='RFM Value',
    showlegend=True
)

# Show the figure
fig.show()

Let's examine the correlation among the recency, frequency, and monetary scores within the Champions segment.

In [13]:
# Calculate the correlation matrix
correlation_matrix = champions_segment[['RecencyScore', 'FrequencyScore', 'MonetaryScore']].corr()

# Create a heatmap to visualize the correlation matrix
fig_heatmap = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale='RdBu',
    colorbar=dict(title='Correlation')
))

# Update layout and add title
fig_heatmap.update_layout(
    title='Correlation Matrix of RFM Values within Champions Segment'
)

# Show the figure
fig_heatmap.show()

Let's compare the number of customers in each RFM customer segment.

In [14]:
import plotly.colors

# Retrieve pastel colors from Plotly
pastel_colors = plotly.colors.qualitative.Pastel

# Calculate the count of customers in each RFM segment
segment_counts = data['RFM Customer Segments'].value_counts()


comparison_fig = go.Figure(data=[go.Bar(x=segment_counts.index, y=segment_counts.values,
                             marker=dict(color=pastel_colors))])


champions_color = 'rgb(158, 202, 225)'
comparison_fig.update_traces(marker_color=[champions_color if segment == 'Champions' else pastel_colors[i]
                                 for i, segment in enumerate(segment_counts.index)],
                  marker_line_color='rgb(8, 48, 107)',
                  marker_line_width=1.5, opacity=0.6)

# Update the layout
comparison_fig.update_layout(title='Comparison of RFM Segments',
                  xaxis_title='RFM Segments',
                  yaxis_title='Number of Customers',
                  showlegend=False)

# Show the figure
comparison_fig.show()

Let's compare the average recency, frequency, and monetary scores across all segments.

In [15]:
# Calculate the average Recency, Frequency, and Monetary scores for each segment
segment_scores = data.groupby('RFM Customer Segments')[['RecencyScore', 'FrequencyScore', 'MonetaryScore']].mean().reset_index()

# Create a grouped bar chart to compare segment scores
fig = go.Figure()

# Add bars for Recency score
fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['RecencyScore'],
    name='Recency Score',
    marker_color='rgb(158,202,225)'
))

# Add bars for Frequency score
fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['FrequencyScore'],
    name='Frequency Score',
    marker_color='rgb(94,158,217)'
))

# Add bars for Monetary score
fig.add_trace(go.Bar(
    x=segment_scores['RFM Customer Segments'],
    y=segment_scores['MonetaryScore'],
    name='Monetary Score',
    marker_color='rgb(32,102,148)'
))

# Update the layout
fig.update_layout(
    title='Comparison of RFM Segments based on Recency, Frequency, and Monetary Scores',
    xaxis_title='RFM Segments',
    yaxis_title='Score',
    barmode='group',
    showlegend=True
)

# Show the figure
fig.show()

# Constructing a Python-Based RFM Analytics Dashboard

In [16]:
!pip install Dash

Collecting Dash
  Downloading dash-2.17.0-py3-none-any.whl (7.5 MB)
Collecting dash-html-components==2.0.0 (from Dash)
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting dash-core-components==2.0.0 (from Dash)
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-table==5.0.0 (from Dash)
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting retrying (from Dash)
  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Installing collected packages: dash-table, dash-html-components, dash-core-components, retrying, Dash
Successfully installed Dash-2.17.0 dash-core-components-2.0.0 dash-html-components-2.0.0 dash-table-5.0.0 retrying-1.3.4


Now let's turn the RFM analysis into a dashboard.

In [17]:
import dash
from dash import dcc, html
from dash.dependencies import Input, Output

# Initialize the Dash app
app = dash.Dash(__name__)

# Define the app layout. The className values follow Bootstrap naming, but they only
# take effect if a Bootstrap stylesheet is loaded (none is loaded here).
app.layout = html.Div([
    html.H1("RFM Analysis Dashboard", className="text-center mt-5 mb-4", style={'color': '#007BFF', 'fontWeight': 'bold'}),
    html.Div("Analyze customer segments based on RFM scores.", className="text-center mb-4"),

    # Dropdown for selecting the chart
    html.Div([
        html.Label("Select Chart:", style={'fontWeight': 'bold'}),
        dcc.Dropdown(
            id='chart-type-dropdown',
            options=[
                {'label': 'RFM Value Segment Distribution', 'value': 'segment_distribution'},
                {'label': 'RFM Customer Segments by Value (Treemap)', 'value': 'RFM_distribution'},
                {'label': 'Correlation Matrix of RFM Values within Champions Segment', 'value': 'correlation_matrix'},
                {'label': 'Comparison of RFM Segments', 'value': 'segment_comparison'},
                {'label': 'Comparison of RFM Segments based on Scores', 'value': 'segment_scores'},
            ],
            value='segment_distribution',  # Default selection
            className="form-control",
        ),
    ], className="container mb-4"),

    # Graph container
    html.Div([
        dcc.Graph(id='rfm-chart'),
    ], className="container"),
])

# Define callback to update the selected chart
@app.callback(
    Output('rfm-chart', 'figure'),
    [Input('chart-type-dropdown', 'value')]
)
def update_chart(selected_chart_type):
    if selected_chart_type == 'segment_distribution':
        return fig_segment_dist
    elif selected_chart_type == 'RFM_distribution':
        return fig_treemap_segment_product
    elif selected_chart_type == 'correlation_matrix':
        return fig_heatmap
    elif selected_chart_type == 'segment_comparison':
        return comparison_fig
    elif selected_chart_type == 'segment_scores':
        return fig

    # Return a default chart if no valid selection
    return fig_segment_dist
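The if/elif chain in the callback amounts to a lookup from dropdown value to figure, which can be sketched as a dictionary. `CHART_REGISTRY`, `DEFAULT_CHART`, and `select_chart` are hypothetical names introduced for this sketch, and the strings stand in for the figure objects so the example runs without Plotly or Dash.

```python
# Placeholder strings stand in for the Plotly figures built earlier;
# in the notebook the values would be the figure objects themselves.
CHART_REGISTRY = {
    "segment_distribution": "fig_segment_dist",
    "RFM_distribution": "fig_treemap_segment_product",
    "correlation_matrix": "fig_heatmap",
    "segment_comparison": "comparison_fig",
    "segment_scores": "fig",
}
DEFAULT_CHART = "segment_distribution"


def select_chart(selected_chart_type):
    """Return the registered chart for a dropdown value, or the default."""
    return CHART_REGISTRY.get(selected_chart_type, CHART_REGISTRY[DEFAULT_CHART])


print(select_chart("correlation_matrix"))
print(select_chart("not_a_chart"))
```

Listing the mapping in one place also makes it easier to notice that the name `fig` is reused in the notebook: by the time the dashboard runs it refers to the grouped bar chart of segment scores, not the earlier box plot.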

In [18]:
if __name__ == '__main__':
    app.run(port=8052)  # app.run() supersedes the deprecated app.run_server()
