Steps and considerations to run a successful segmentation with K-means, Principal Components Analysis and Bootstrap Evaluation
In this project I use a complex and feature-rich dataset to run through the practical steps you need to take, and the considerations you may face, when running a customer profiling analysis. I use the K-means clustering technique on a range of different customer attributes to look for potential sub-groups in the customer base, visually examine the clusters with Principal Components Analysis, and validate the clusters’ stability with clusterboot from the fpc R package.
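The pipeline described above can be sketched in plain NumPy. It runs K-means, projects the data onto two principal components for plotting, and applies a clusterboot-style stability check. This is a Python illustration on synthetic data, not the project's R code. The helpers `kmeans`, `pca_2d` and `bootstrap_jaccard` are hypothetical and only imitate the idea behind clusterboot's default bootstrap scheme: recluster bootstrap resamples, then record, for each original cluster, its best Jaccard similarity to any cluster in the resampled solution, averaged over resamples.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick new centres with probability proportional to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with several k-means++ starts; returns labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            # Assign each point to its nearest centre (squared Euclidean distance)
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # Move each centre to the mean of its points (keep it in place if the cluster is empty)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def pca_2d(X):
    """Project centred data onto its first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def bootstrap_jaccard(X, k, B=20, seed=1):
    """clusterboot-style check: mean best-match Jaccard per original cluster over B resamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    base = kmeans(X, k, seed=seed)
    scores = np.zeros((B, k))
    for b in range(B):
        idx = rng.choice(n, size=n, replace=True)
        boot = kmeans(X[idx], k, seed=seed + b + 1)
        # Compare only on original points present in the resample; identical rows always
        # get the same K-means label, so the first occurrence of each point is enough.
        uniq, first = np.unique(idx, return_index=True)
        boot_labels = boot[first]
        for j in range(k):
            orig = set(uniq[base[uniq] == j])
            for m in range(k):
                new = set(uniq[boot_labels == m])
                union = orig | new
                if union:
                    scores[b, j] = max(scores[b, j], len(orig & new) / len(union))
    return base, scores.mean(axis=0)


# Synthetic stand-in for a customer feature table: three well-separated groups
rng = np.random.default_rng(42)
group_means = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 8]], dtype=float)
X = np.vstack([mu + rng.normal(size=(50, 4)) for mu in group_means])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # K-means is scale-sensitive, so z-score first

labels, stability = bootstrap_jaccard(X, k=3, B=20)
coords = pca_2d(X)  # 2-D coordinates for plotting the clusters
print(np.bincount(labels), stability.round(2))
```

A mean Jaccard close to 1 means a cluster is recovered almost unchanged across resamples. Lower values flag clusters that tend to dissolve when the data are perturbed.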
I assume that I’m working with a client that wants to get a better understanding of their customer base, with particular emphasis on the monetary value each customer contributes to the business’s bottom line.
One approach that lends itself well to this kind of analysis is the popular RFM segmentation, which takes into consideration three main attributes:
- Recency – How recently did the customer purchase?
- Frequency – How often do they purchase?
- Monetary Value – How much do they spend?
This is a popular approach for good reasons: it’s easy to implement (you just need a transactional database with clients’ orders over time), and it explicitly creates sub-groups based on how much each customer contributes.
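Because RFM only needs an order history, the three attributes reduce to a per-customer group-by. Below is a minimal pandas sketch. It assumes a hypothetical `orders` table with `customer_id`, `order_date` and `order_value` columns, and it measures recency in days from the day after the last recorded order. Both the schema and the reference date are illustrative choices, not the project's actual data.

```python
import pandas as pd

# Hypothetical transactional data: one row per order (column names are illustrative)
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime(["2023-01-05", "2023-03-20", "2023-02-11",
                                  "2022-11-30", "2023-01-15", "2023-03-28"]),
    "order_value": [120.0, 80.0, 45.0, 200.0, 150.0, 60.0],
})

# Reference date for recency: the day after the last order in the data
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                            # number of orders
    monetary=("order_value", "sum"),                              # total spend
)
print(rfm)
```

A lower recency is better, while higher frequency and monetary values are better. The three columns also sit on very different scales, so they typically need standardising before they go into K-means.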
This analysis should provide a solid base for discussion with relevant business stakeholders. Normally I would present my client with a variety of customer profiles based on different combinations of customer features and formulate my own data-driven recommendations. However, it is ultimately down to them to decide how many groups they want to settle for and what characteristics each segment should have.
In this post I’m simply loading up the compiled dataset, but I’ve also written a post called Loading, Merging and Joining Datasets, where I show how I assembled the various data feeds and sorted out the likes of variable naming, new feature creation and some general housekeeping tasks. You can find the full code on this GitHub repository.
You can find the final article on my website
I've also published the article on Towards Data Science