
Identifying customer segments based on their purchasing behavior using RFM analysis and K-Means clustering.


adzict/online_retail_customer_segmentation


Online Retail Dataset - Customer Segmentation


Table of Contents

  1. Project Introduction
  2. Technologies Used
  3. Methods Used
  4. Project Description
  5. Feature Notebooks and Deliverables
  6. Conclusion and Future Recommendations
  7. Licenses
  8. Contact

Project Introduction

This project aims to identify customer clusters (segments) based on their purchasing behavior using K-Means clustering.

Technologies Used

Methods Used

  • Data Processing / Data Cleaning
  • Data Analysis
  • Descriptive Statistics
  • Feature Engineering
  • Data Visualization
  • Text Preprocessing
  • Principal Component Analysis
  • Clustering Customers using K-Means Clustering
  • Evaluating Model Results
  • Reporting

Project Description

The success of any business depends on its ability to understand its customers and cater to their needs. With the rise of e-commerce and online shopping, it has become increasingly important for businesses to understand their customer base in depth. One way to gain insight into customer behavior is through clustering techniques. In this project, I aim to identify customer segments based on their purchase behavior using RFM (Recency, Frequency, Monetary value) analysis and K-Means clustering. I also focus heavily on feature engineering, since more informative features should help the clustering produce better-separated segments.
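The three RFM features can be computed per customer from raw transactions and then turned into scores. The sketch below is a minimal, stdlib-only illustration, not the notebook's code. It assumes each transaction is a `(customer_id, invoice_no, invoice_date, quantity, unit_price)` tuple, that Frequency counts distinct invoices, that Monetary value is the sum of `quantity * unit_price`, and that the analyst picks the snapshot date. The function names and the 4-bin rank-based scoring are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rfm_table(transactions, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Each transaction is assumed to be a
    (customer_id, invoice_no, invoice_date, quantity, unit_price) tuple.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, date, quantity, unit_price in transactions:
        if customer not in last_purchase or date > last_purchase[customer]:
            last_purchase[customer] = date
        invoices[customer].add(invoice)           # Frequency = distinct invoices
        spend[customer] += quantity * unit_price  # Monetary = total spend
    return {
        c: ((snapshot_date - last_purchase[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_purchase
    }


def quantile_score(value, population, n_bins=4, higher_is_better=True):
    """Score 1..n_bins from the share of the population strictly below `value`."""
    below = sum(1 for v in population if v < value)
    score = 1 + below * n_bins // len(population)
    return score if higher_is_better else n_bins + 1 - score


def rfm_scores(table, n_bins=4):
    """Map each customer to (R, F, M) scores; a recent purchase earns a high R."""
    recency, frequency, monetary = (list(col) for col in zip(*table.values()))
    return {
        c: (
            quantile_score(r, recency, n_bins, higher_is_better=False),
            quantile_score(f, frequency, n_bins),
            quantile_score(m, monetary, n_bins),
        )
        for c, (r, f, m) in table.items()
    }


if __name__ == "__main__":
    # Made-up transactions for four customers, for illustration only.
    tx = [
        ("A", "1", datetime(2011, 12, 1), 10, 2.0),
        ("A", "2", datetime(2011, 12, 5), 5, 4.0),
        ("B", "3", datetime(2011, 6, 1), 1, 3.5),
        ("C", "4", datetime(2011, 11, 1), 100, 1.0),
        ("D", "5", datetime(2011, 1, 15), 2, 1.0),
    ]
    table = rfm_table(tx, snapshot_date=datetime(2011, 12, 10))
    print(table["A"])  # (5, 2, 40.0)
    print(rfm_scores(table))
```

The rank-based score is used here instead of fixed quantile cut points because Frequency is heavily tied (many customers buy only once), and ties then simply share the same score.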

Data Sources

This is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 (1 December 2010 to 9 December 2011) for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many of its customers are wholesalers.


Attribute Information:

  • InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.
  • StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
  • Description: Product (item) name. Nominal.
  • Quantity: The quantities of each product (item) per transaction. Numeric.
  • InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
  • UnitPrice: Unit price. Numeric, Product price per unit in sterling.
  • CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
  • Country: Country name. Nominal, the name of the country where each customer resides.
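These attributes suggest the basic cleaning rules behind the preprocessing steps in the notebook outline: rows without a CustomerID cannot be assigned to a customer, InvoiceNo values starting with 'C' are cancellations, and exact duplicate rows should be counted once. The sketch below is a minimal illustration under those assumptions, using a hypothetical `Transaction` record that mirrors the attribute list. The extra filter on non-positive Quantity and UnitPrice is also an assumption, not something the source confirms.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)  # frozen makes rows hashable, so duplicates can be tracked in a set
class Transaction:
    """Hypothetical record mirroring the dataset attributes."""
    invoice_no: str
    stock_code: str
    description: str
    quantity: int
    invoice_date: datetime
    unit_price: float
    customer_id: Optional[str]  # None where CustomerID is missing
    country: str


def is_cancellation(row: Transaction) -> bool:
    """InvoiceNo values starting with 'C' mark cancelled transactions."""
    return row.invoice_no.upper().startswith("C")


def clean(rows: List[Transaction]) -> List[Transaction]:
    """Keep rows with a CustomerID, no cancellation, positive Quantity and
    UnitPrice, and drop exact duplicates (the first occurrence is kept)."""
    seen = set()
    kept = []
    for row in rows:
        if row.customer_id is None or is_cancellation(row):
            continue
        if row.quantity <= 0 or row.unit_price <= 0:
            continue  # assumption: returns and adjustments are excluded
        if row in seen:
            continue  # exact duplicate of a row already kept
        seen.add(row)
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    when = datetime(2011, 3, 1, 10, 30)
    sale = Transaction("100001", "10001", "ITEM A", 6, when, 2.5, "12345", "United Kingdom")
    rows = [
        sale,
        sale,  # exact duplicate
        Transaction("C100002", "10001", "ITEM A", -6, when, 2.5, "12345", "United Kingdom"),
        Transaction("100003", "10002", "ITEM B", 12, when, 1.25, None, "France"),
    ]
    print(len(clean(rows)))  # 1
```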

File Descriptions

Feature Notebooks and Deliverables

Structure of Notebooks

  1. Data Preprocessing and Basic EDA

        1. Imports
        2. Data
        3. Basic EDA
           3.1 Missing values
        4. Data Preprocessing
           4.1 Removing the missing values
           4.2 Checking for duplicate rows
           4.3 Outliers
        5. RFM Analysis
           5.1 Recency
           5.2 Frequency
           5.3 Monetary Value
           5.4 RFM Segmentation using scores
           5.5 Visualizing the RFM Level customers using a bar plot
        6. Clustering products into product categories
           6.1 The Elbow Method
           6.2 Visualizing the clusters
        7. Customer Segmentation using Unsupervised Learning
           7.1 PCA
           7.2 K-Means Clustering
        8. Understanding Clusters
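The elbow method (6.1) and K-Means clustering (7.2) can be illustrated with a short stdlib-only sketch of Lloyd's algorithm. It is not the notebook's implementation: the function names, the random choice of initial centroids, and the made-up RFM rows are assumptions, and the PCA step (7.1) is omitted. Features are z-scored first because otherwise Monetary value, with by far the largest range, would dominate the Euclidean distances. In practice scikit-learn's `KMeans` (k-means++ initialization and several restarts) would be the usual choice.

```python
import math
import random


def standardize(rows):
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population std; a constant column gets std 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm from k randomly chosen points.
    Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: within-cluster sum of squared distances.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow(points, max_k, seed=0):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop levels off."""
    return [kmeans(points, k, seed=seed)[2] for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # Made-up (Recency, Frequency, Monetary) rows, for illustration only.
    rfm = [(5, 12, 4200.0), (8, 10, 3900.0), (200, 1, 80.0),
           (180, 2, 120.0), (40, 4, 600.0), (35, 5, 750.0)]
    scaled = standardize(rfm)
    print([round(v, 2) for v in elbow(scaled, max_k=4)])
    centroids, labels, _ = kmeans(scaled, k=3)
    print(labels)
```

Understanding the clusters (step 8) then amounts to reading each centroid, or the mean unscaled Recency, Frequency and Monetary value of its members, as a customer profile.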

Conclusion and Future Recommendations

In conclusion, this project aimed to identify customer segments based on their purchase behavior using RFM analysis and K-Means clustering. By analyzing a transnational dataset containing all the transactions of a UK-based online retail company, I was able to gain insights into the purchasing habits of its customer base. Feature engineering played a critical role in improving the quality of the results, as it provided additional variables to use in the analysis.

The results of the analysis revealed that the customer base could be divided into several distinct segments based on their purchasing behavior. By understanding these segments, the company can tailor its marketing efforts to better meet the needs of its customers, leading to increased customer loyalty and satisfaction.

Overall, this project highlights the importance of understanding customer behavior in today's e-commerce landscape and demonstrates how clustering techniques, coupled with feature engineering, can be powerful tools for gaining insights into customer behavior.

Licenses

Database Contents License (DbCL) v1.0

Contact

Find me on LinkedIn, Twitter or adzictanja.com.
