This is my adaptation of an article written by Baris Karaman on Towards Data Science.

I used his article as a way to get familiar with grouping customers based on their recency, frequency, and monetary value—an incredibly powerful way to understand your customers and how they fit within specific segments.

This understanding can be used to guide future decisions and give non-technical stakeholders a quick, reliable, and easy-to-digest view of their customer base.

Baris's article can be found here:

https://towardsdatascience.com/data-driven-growth-with-python-part-2-customer-segmentation-5c019d150444

### RFM Segmentation

In this notebook, I'll be working through an orders dataset from a UK-based online retailer and creating different customer segments based on their R (recency), F (frequency), and M (monetary value).

Recency can be thought of in a few different ways; in today's notebook, it will be shown as the number of days since a customer's last purchase.

Frequency looks at how often a customer makes purchases within a specific time period.

Monetary value is the total revenue a customer generates within a specified time window, which is the same as their number of orders multiplied by their average order value. High-revenue customers also tend to be recent and frequent buyers.

Theoretically, we'll end up with segments like...

1. Low Value

Customers who are not very frequent buyers and are less active. They generate very low, or maybe even negative, monetary value.

2. Medium Value

Medium-value customers tend to make purchases often and are fairly active. They generate solid revenue and are important to understand.

3. High Value

Frequent, high-order volume customers. They generate a ton of revenue and are a group you never want to lose.

Let's look at recency first. We'll find each customer's most recent purchase and measure how many days have passed since then.
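As a rough sketch of that calculation, the snippet below computes recency with pandas on a tiny made-up orders table. The column names `CustomerID` and `InvoiceDate` are assumptions, as is the choice to measure from the latest purchase date in the data rather than from today.

```python
import pandas as pd

# Hypothetical toy orders table; the column names are assumptions.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-05", "2011-06-15", "2011-12-09", "2011-10-20",
    ]),
})

# Most recent purchase date for each customer.
last_purchase = orders.groupby("CustomerID")["InvoiceDate"].max()

# Reference point: the latest purchase date anywhere in the dataset.
snapshot = orders["InvoiceDate"].max()

# Days between the reference point and each customer's last purchase.
recency = (snapshot - last_purchase).dt.days.rename("Recency").reset_index()
print(recency)
```

Calling `recency["Recency"].describe()` on a table like this gives the mean, median, and maximum in one step.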

![Recency.png](attachment:Recency.png)

![Screenshot%202023-11-14%20at%2014.35.30.png](attachment:Screenshot%202023-11-14%20at%2014.35.30.png)

So, looking at our customers and their most recent purchase, the average recency is 90 days and the median is 49 days. The mean sitting well above the median tells us the distribution is skewed by a tail of customers who haven't bought in a long time. The longest it's been since a customer made a purchase is 373 days. The most populated bin is 0 to 9 days: 720 customers have made a purchase within the last week and a half.

Now, we're going to use machine learning, specifically K-means clustering, to group our customers into segments based on their recency. Before we move forward, we need to decide how many clusters or segments we want to create. This is typically based on intuition and domain expertise, but since we don't have that right now, we can use the elbow method: fit the model for a range of cluster counts and see where adding more clusters stops paying off. Let's look.

![elbow-2.png](attachment:elbow-2.png)

The elbow suggests three as the optimal number of segments. When reading an elbow graph like this, we look for the point where the curve bends and take its corresponding x-value. A case could be made for two or three here, and any real decision would depend on the business requirements. I'm actually going to choose four, as four customer segments makes the most sense to me.
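A curve like the one above can be produced by fitting K-means for several values of k and recording the inertia (the within-cluster sum of squared distances) each time. This is a minimal sketch using scikit-learn; the recency values are invented for illustration, and `random_state` and `n_init` are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical recency values in days, shaped as a single-feature matrix.
recency_days = np.array(
    [1, 3, 5, 8, 45, 50, 55, 60, 300, 320, 350, 373], dtype=float
).reshape(-1, 1)

# Fit one model per candidate k and keep its inertia.
inertia = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(recency_days)
    inertia[k] = model.inertia_

for k, value in inertia.items():
    print(k, round(value, 1))
```

Plotting `inertia` against k and looking for the bend gives the elbow.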

![Screenshot%202023-11-14%20at%2014.40.35.png](attachment:Screenshot%202023-11-14%20at%2014.40.35.png)

We've run our clustering model with four segments and assigned each customer to a segment. Segment 0 holds our least recent customers, and segment 3 holds our most recent.
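One detail worth spelling out: the label numbers K-means hands back are arbitrary, so getting "0 = worst, 3 = best" takes an explicit relabeling step. Here is one possible way to do it, sketched with a hypothetical helper `ordered_clusters` and made-up recency values; it is not necessarily how the original notebook did it.

```python
import pandas as pd
from sklearn.cluster import KMeans


def ordered_clusters(values, n_clusters, ascending):
    """Cluster a 1-D series, then relabel clusters by their mean value.

    Hypothetical helper: with ascending=True the cluster with the lowest
    mean becomes 0; with ascending=False the highest mean becomes 0.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    raw = pd.Series(model.fit_predict(values.to_frame()), index=values.index)
    # Order the raw labels by the mean of the metric inside each cluster.
    means = values.groupby(raw).mean().sort_values(ascending=ascending)
    mapping = {old: new for new, old in enumerate(means.index)}
    return raw.map(mapping)


# Made-up recency values in days.
recency = pd.Series(
    [2, 5, 9, 40, 48, 55, 150, 160, 170, 340, 360, 373], name="Recency"
)

# For recency, fewer days is better, so the highest mean gets label 0.
recency_cluster = ordered_clusters(recency, n_clusters=4, ascending=False)
print(pd.DataFrame({"Recency": recency, "RecencyCluster": recency_cluster}))
```

For frequency and revenue, where bigger is better, the same helper would be called with `ascending=True`.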

We're going to do the same thing for frequency and monetary value. 

Remember, recency measured how many days it's been since a customer's last purchase. Frequency will look at how many unique purchases were made within a time period.
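A minimal sketch of that count, assuming each purchase carries an invoice identifier in a column called `InvoiceNo` (an assumed name) and that several rows can belong to the same invoice:

```python
import pandas as pd

# Hypothetical toy data: one row per line item, column names assumed.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1", "C2"],
})

# Count distinct invoices per customer, not rows.
frequency = (
    orders.groupby("CustomerID")["InvoiceNo"]
    .nunique()
    .rename("Frequency")
    .reset_index()
)
print(frequency)
```

Using `nunique()` rather than `count()` matters here: counting rows would treat every line item on an invoice as a separate purchase.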

![Frequency.png](attachment:Frequency.png)

![Screenshot%202023-11-14%20at%2014.42.50.png](attachment:Screenshot%202023-11-14%20at%2014.42.50.png)

Segment 0 represents our least frequent customers, and segment 3 represents our most frequent customers.

You can see how many customers are in each segment and what their average number of purchases is. For segment 3, the average number of purchases is 5917, and there are only 3 customers in this segment.
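A per-segment summary of this kind can be built with a single groupby. The sketch below uses invented numbers and assumed column names (`Frequency`, `FrequencyCluster`).

```python
import pandas as pd

# Hypothetical customers with a frequency value and an assigned segment.
customers = pd.DataFrame({
    "Frequency": [1, 2, 3, 40, 50, 400],
    "FrequencyCluster": [0, 0, 0, 1, 1, 2],
})

# Number of customers and average frequency within each segment.
summary = customers.groupby("FrequencyCluster")["Frequency"].agg(["count", "mean"])
print(summary)
```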

Now we'll do the same thing for monetary value. We can call it revenue from now on.
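As an illustration, revenue per customer could be computed as below. The columns `Quantity` and `UnitPrice` are assumptions about the dataset's layout, and so is the idea that returns show up as negative quantities.

```python
import pandas as pd

# Hypothetical line items; column names and values are made up.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Quantity": [2, 1, 5, -1],   # assumed: a return is a negative quantity
    "UnitPrice": [3.0, 10.0, 1.5, 4.0],
})

# Revenue per line item, then summed per customer.
orders["Revenue"] = orders["Quantity"] * orders["UnitPrice"]
revenue = orders.groupby("CustomerID")["Revenue"].sum().reset_index()
print(revenue)
```

Under that assumption, a customer whose returns outweigh their purchases ends up with negative revenue, which is how a low-value customer can have a negative monetary value.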

![Revenue.png](attachment:Revenue.png)

![Screenshot%202023-11-14%20at%2014.46.41.png](attachment:Screenshot%202023-11-14%20at%2014.46.41.png)

Same logic as before: segment 0 holds our lowest-revenue customers, and segment 3 holds our highest.

Each customer now has a segment for each of the three metrics: recency, frequency, and monetary value. Next, we're going to combine the three into an overall score that tells us how well a customer performs across all of them. Could a customer have high revenue but low frequency and recency? We'll be able to answer questions like this after we find our combined score.

![Screenshot%202023-11-14%20at%2014.47.38.png](attachment:Screenshot%202023-11-14%20at%2014.47.38.png)

Remember that the highest segment for each metric was 3, so the maximum possible score is 9, reached by a customer who is in segment 3 for recency, frequency, and revenue.
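The overall score is just the row-wise sum of the three segment labels. A small sketch with invented customers and assumed column names:

```python
import pandas as pd

# Hypothetical segment labels (0 to 3) for three customers.
customers = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "RecencyCluster": [3, 0, 1],
    "FrequencyCluster": [3, 0, 1],
    "RevenueCluster": [3, 1, 0],
})

cluster_cols = ["RecencyCluster", "FrequencyCluster", "RevenueCluster"]

# Sum the three labels per customer; the result ranges from 0 to 9.
customers["OverallScore"] = customers[cluster_cols].sum(axis=1)
print(customers)
```

This only works as a ranking if all three labels point the same way, that is, if a higher label always means a better customer.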

Eight is the highest score that actually appears here, so customers who scored eight are our most valuable. Curiously, though, the average revenue is much higher for customers with a score of seven. Perhaps a customer who spent a massive amount a while ago is causing that.

Let's make things easier to understand by throwing our customers into three segments:

1. Low value: corresponding to a score of 0 to 2.
2. Medium value: corresponding to a score of 3 to 5.
3. High value: corresponding to a score of 6 or higher.
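The three score bands above can be applied with `pd.cut`. This is one possible sketch; the bin edges are chosen so that each band includes its upper bound, and the scores are made up.

```python
import pandas as pd

# Hypothetical overall scores.
scores = pd.Series([0, 2, 3, 5, 6, 8], name="OverallScore")

# Bins are right-inclusive: (-1, 2], (2, 5], (5, 9].
segment = pd.cut(
    scores,
    bins=[-1, 2, 5, 9],
    labels=["Low value", "Medium value", "High value"],
)
print(pd.DataFrame({"OverallScore": scores, "Segment": segment}))
```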

![newplot.png](attachment:newplot.png)

![newplot%201.png](attachment:newplot%201.png)

![newplot.png](attachment:newplot.png)

The takeaways here are fairly intuitive:

1. You can see high-value customers tend to make more purchases and drive more revenue. (graph 1)
2. You can see high-value customers tend to be the most recent buyers and have higher revenue. (graph 2)
3. You can see high-value customers tend to be the most recent and most frequent. (graph 3)


Using these segments, we can begin to take action. We can create strategies to get the most out of our high-value customers, and strategies to improve RFM among mid- and low-value customers.