Skip to content


Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


In this project I revisit clustering, one of my favourite analytic methods, to explore and analyse a real-world dataset that included a mix of categorical and numerical feature. This required a different approach from the classical K-means algorithm that cannot be no directly applied to categorical data.

Instead, I used the K-medoids algorithm, also known as PAM (Partitioning Around Medoids), that has the advantage of working on distances other than numerical and lends itself well to analyse mixed-type data.

The silhouette coefficient helped to establish the optimal number of clusters, whilst t-SNE ( t-distributed stochastic neighbour embedding), a dimensionality reduction technique akin Principal Component Analysis and UMAP, unveiled good separation between clusters as well as closeness of elements within clusters, confirming the segmentation relevance.

Finally, I condensed the insight generated from the analysis into a number of actionable and data-driven recommendations that, applied correctly, could help improve product sign up.