**Goal of the project**

ABC classification is a simple technique commonly used in inventory management. It is based on the Pareto principle, or 80/20 rule, which states that roughly 80% of effects come from 20% of causes. In ABC inventory classification, this means that about 80% of product sales typically come from 20% of products, so focusing inventory management on that most important 20% can improve efficiency and profits through lower effort and fewer stockouts.

The Pareto principle applies to many things in ecommerce and marketing, including customer behaviour. ABC classification can therefore also be used for customer segmentation, since 80% of revenue or orders often comes from 20% of customers. Using ABC for customer segmentation means that marketing or sales team efforts can be concentrated on the most important customers, saving labour and marketing costs compared with treating all customers equally.
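To make the 80/20 idea concrete, the sketch below ranks customers by revenue and measures two things: the revenue share of the top 20% of customers, and the fraction of customers needed to reach 80% of revenue. The helper name `pareto_shares` and the ten revenue values are hypothetical, chosen only for illustration.

```python
import pandas as pd


def pareto_shares(values: pd.Series, top_fraction: float = 0.2,
                  target_share: float = 0.8) -> tuple[float, float]:
    """Return (share of the total held by the top `top_fraction` of entries,
    fraction of entries needed to reach `target_share` of the total)."""
    # Rank contributors from largest to smallest
    ordered = values.sort_values(ascending=False).reset_index(drop=True)
    total = ordered.sum()
    # Running share of the total as we move down the ranking
    cum_share = ordered.cumsum() / total
    n = len(ordered)
    # Number of entries in the top fraction (at least one)
    k = max(1, round(n * top_fraction))
    top_share = ordered.iloc[:k].sum() / total
    # 1-based position of the first entry at which the target is reached
    needed = min(int((cum_share < target_share).sum()) + 1, n)
    return float(top_share), needed / n


# Hypothetical revenue for ten customers
revenue = pd.Series([450, 320, 70, 50, 40, 30, 20, 10, 5, 5])
top, needed = pareto_shares(revenue)
print(f"Top 20% of customers generate {top:.0%} of revenue")
print(f"{needed:.0%} of customers are needed to reach 80% of revenue")
```

With these made-up numbers the top 20% of customers bring in 77% of revenue and 30% of customers are needed to pass 80%, which is the kind of "close to, but not exactly, 80/20" split that real data usually shows.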

In this project, I’ll create an ABC customer segmentation using the Pareto principle.

**Load the packages**

```python
# Importing libraries
import numpy as np
import pandas as pd
```

**Load the data**

In this project we’re going to use a [standard transactional dataset](https://www.kaggle.com/datasets/marian447/retail-store-sales-transactions) from Kaggle. This anonymized dataset includes 64,682 transactions of 5,242 SKUs sold to 22,625 customers over one year.
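These figures count distinct transactions, SKUs and customers rather than rows: the `df.info()` output further down reports 131,706 rows, because each row is a single line item and one transaction can contain several. A minimal sketch for checking the counts, using a hypothetical helper `dataset_overview()` and assuming the lower-cased column names set up in the loading cells:

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Count line items plus distinct transactions, SKUs and customers.

    Assumes lower-cased column names (transaction_id, sku, customer_id).
    """
    return {
        "line_items": len(df),                          # one row per item
        "transactions": df["transaction_id"].nunique(),
        "skus": df["sku"].nunique(),
        "customers": df["customer_id"].nunique(),
    }
```

Calling `dataset_overview(df)` after the loading cells would be expected to return the transaction, SKU and customer counts quoted above alongside the larger line-item count.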

```python
# Load dataset
df = pd.read_csv('../input/retail-store-sales-transactions/scanner_data.csv')
```

```python
# Rename Pandas columns to lower case
df.columns = df.columns.str.lower()
```

```python
# Examine the data
df.head()
```

|   | unnamed: 0 | date | customer_id | transaction_id | sku_category | sku | quantity | sales_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 02/01/2016 | 2547 | 1 | X52 | 0EM7L | 1.0 | 3.13 |
| 1 | 2 | 02/01/2016 | 822 | 2 | 2ML | 68BRQ | 1.0 | 5.46 |
| 2 | 3 | 02/01/2016 | 3686 | 3 | 0H2 | CZUZX | 1.0 | 6.35 |
| 3 | 4 | 02/01/2016 | 3719 | 4 | 0H2 | 549KK | 1.0 | 5.59 |
| 4 | 5 | 02/01/2016 | 9200 | 5 | 0H2 | K8EHH | 1.0 | 6.88 |

```python
# Overview of all variables, their datatypes
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131706 entries, 0 to 131705
Data columns (total 8 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   unnamed: 0      131706 non-null  int64  
 1   date            131706 non-null  object 
 2   customer_id     131706 non-null  int64  
 3   transaction_id  131706 non-null  int64  
 4   sku_category    131706 non-null  object 
 5   sku             131706 non-null  object 
 6   quantity        131706 non-null  float64
 7   sales_amount    131706 non-null  float64
dtypes: float64(2), int64(3), object(3)
memory usage: 8.0+ MB
```
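The `info()` output points to two things that could be tidied before any date-based analysis: `unnamed: 0` appears to be a leftover row counter from the CSV export, and `date` is stored as a plain object (string) column. The ABC segmentation below needs neither fix, so this step is optional. A minimal sketch with a hypothetical helper `tidy_transactions()`, assuming the dates are day-first (so `02/01/2016` is 2 January 2016); passing an explicit format means pandas raises an error on a value like `12/31/2016` instead of silently guessing.

```python
import pandas as pd


def tidy_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without the leftover row counter and with parsed dates.

    Assumption: dates are day-first strings such as 02/01/2016.
    """
    # drop() returns a new DataFrame, so the input is left unchanged
    out = df.drop(columns=["unnamed: 0"], errors="ignore")
    # Explicit day-first format; raises if a value doesn't match it
    out["date"] = pd.to_datetime(out["date"], format="%d/%m/%Y")
    return out
```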


**Create a customer dataset**

We need to take the raw transaction-item dataset and create a new dataset at the customer level. This needs to include the customer ID and the total revenue. We don’t need them for the ABC analysis itself, but while we’re here we may as well also calculate the number of orders, the number of unique SKUs and the total quantity purchased by each customer. To do this we’ll aggregate the data with the Pandas groupby( ) and agg( ) functions.

```python
# One row per customer: distinct orders and SKUs, total quantity and revenue
df_customers = df.groupby('customer_id').agg(orders = ('transaction_id', 'nunique'),
                                             skus = ('sku', 'nunique'),
                                             quantity = ('quantity', 'sum'),
                                             revenue = ('sales_amount', 'sum')).reset_index()
```
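To make the difference between the `nunique` and `sum` aggregations concrete, here is the same named-aggregation pattern applied to a tiny, made-up set of line items (the data and the `toy_customers` name are hypothetical):

```python
import pandas as pd

# Hypothetical line items: customer 1 bought two SKUs in one order,
# customer 2 bought the same SKU in two separate orders.
items = pd.DataFrame({
    "customer_id":    [1, 1, 2, 2],
    "transaction_id": [100, 100, 101, 102],
    "sku":            ["A", "B", "C", "C"],
    "quantity":       [1.0, 2.0, 1.0, 3.0],
    "sales_amount":   [5.0, 7.5, 4.0, 12.0],
})

# Same pattern as the customer-level aggregation above
toy_customers = (items.groupby("customer_id")
                      .agg(orders=("transaction_id", "nunique"),
                           skus=("sku", "nunique"),
                           quantity=("quantity", "sum"),
                           revenue=("sales_amount", "sum"))
                      .reset_index())
print(toy_customers)
```

Customer 1 ends up with one order but two SKUs, because both items were in the same transaction; customer 2 has two orders but only one distinct SKU.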

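```python
df_customers.head()
```

|   | customer_id | orders | skus | quantity | revenue |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 2.0 | 16.29 |
| 1 | 2 | 2 | 2 | 2.0 | 22.77 |
| 2 | 3 | 1 | 3 | 4.0 | 10.92 |
| 3 | 4 | 2 | 5 | 5.0 | 33.29 |
| 4 | 5 | 5 | 2 | 14.0 | 78.82 |
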

**Segment the customers using ABC**

Next, we can use the Python package abc_analysis to segment our customers. We’ll pass the revenue column as the metric and store the resulting class in a new column called abc_class.

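```python
from abc_analysis import abc_analysis
```

```python
# Run the ABC analysis on each customer's total revenue
abc = abc_analysis(df_customers['revenue'])
```

```python
# Customers in each class, as returned by abc_analysis. df_customers has a
# default RangeIndex after reset_index(), so positions and labels coincide.
a_index = abc['Aind']
b_index = abc['Bind']
c_index = abc['Cind']

condition_list = [df_customers.index.isin(a_index),
                  df_customers.index.isin(b_index),
                  df_customers.index.isin(c_index)]

choice_list = ['A', 'B', 'C']

# A string default keeps the output dtype consistent with the labels
df_customers['abc_class'] = np.select(condition_list, choice_list, default='')
```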
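If the abc_analysis package isn't available, or you want textbook fixed cut-offs instead, a common alternative is to rank customers by revenue and assign classes from the cumulative revenue share: here A up to 80% and B up to 95%. Both the thresholds and the helper name `abc_by_cumulative_share` are assumptions for illustration; this is not necessarily how abc_analysis chooses its class boundaries, so the class sizes will generally not match the ones reported in this notebook.

```python
import pandas as pd


def abc_by_cumulative_share(revenue: pd.Series, a_cut: float = 0.80,
                            b_cut: float = 0.95) -> pd.Series:
    """Label each entry A/B/C by cumulative share of total revenue.

    Entries are ranked from highest to lowest revenue. An entry is A if
    the share accumulated by all higher-ranked entries is below a_cut,
    so the entry that crosses the cut-off is still counted as A.
    Assumes the index of `revenue` is unique.
    """
    ordered = revenue.sort_values(ascending=False)
    cum_share = ordered.cumsum() / ordered.sum()
    # Share held by everyone ranked above each entry
    prev_share = cum_share.shift(fill_value=0.0)
    labels = pd.Series("C", index=ordered.index)
    labels[prev_share < b_cut] = "B"
    labels[prev_share < a_cut] = "A"
    # Return labels in the original row order
    return labels.reindex(revenue.index)
```

To compare the two approaches on the real data, something like `df_customers['abc_fixed'] = abc_by_cumulative_share(df_customers['revenue'])` followed by `pd.crosstab(df_customers['abc_class'], df_customers['abc_fixed'])` would show where the class memberships differ (the `abc_fixed` column name is hypothetical).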

```python
df_segments = df_customers.sort_values(by = ['revenue'], ascending = False)
```

```python
df_segments.head()
```

|   | customer_id | orders | skus | quantity | revenue | abc_class |
|---|---|---|---|---|---|---|
| 17470 | 17471 | 62 | 38 | 814.9 | 3985.94 | A |
| 17103 | 17104 | 55 | 86 | 407.5 | 3844.97 | A |
| 17293 | 17294 | 39 | 55 | 246.6 | 3798.39 | A |
| 15539 | 15540 | 38 | 76 | 165.0 | 2900.61 | A |
| 15676 | 15677 | 25 | 49 | 171.0 | 2765.16 | A |

**Use ABC analysis to examine the segments**

To examine the customer segments we’ll use the Pandas groupby( ) and agg( ) functions again to create a summary of statistics from the dataframe. We’ll group by abc_class and then, for each ABC class, calculate the number of unique customers and the sums of orders, per-customer SKU counts, quantity and revenue.

The data show that the 80/20 “rule” does not apply perfectly to the customers in this dataset, which is quite common. Despite the name, the Pareto principle rarely gives an exact 80/20 split, but for many metrics and businesses it comes close. Here, class A contains about 23% of customers (22.9%) and generates about 76% of the revenue.

```python
df_summary = df_segments.groupby('abc_class').agg(customers = ('customer_id', 'nunique'),
                                                  orders = ('orders', 'sum'),
                                                  skus = ('skus', 'sum'),
                                                  quantity = ('quantity', 'sum'),
                                                  revenue = ('revenue', 'sum')).reset_index()
```

```python
# Percentage share of total revenue and of all customers for each class
df_summary['retail_revenue'] = round((df_summary['revenue'] / df_summary['revenue'].sum()) * 100, 2)
df_summary['retail_customers'] = round((df_summary['customers'] / df_summary['customers'].sum()) * 100, 2)
```

```python
df_summary
```

|   | abc_class | customers | orders | skus | quantity | revenue | retail_revenue | retail_customers |
|---|---|---|---|---|---|---|---|---|
| 0 | A | 5180 | 35592 | 61053 | 132903.443 | 1199999.21 | 76.04 | 22.9 |
| 1 | B | 2309 | 6786 | 11983 | 18226.68 | 128342.44 | 8.13 | 10.21 |
| 2 | C | 15136 | 22304 | 33616 | 44494.306 | 249696.97 | 15.82 | 66.9 |
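As a quick arithmetic check, the two percentage columns can be recomputed by hand from the raw class totals in this table. The numbers below are copied from the output above; nothing else is assumed.

```python
# Class totals copied from the summary table
class_customers = {"A": 5180, "B": 2309, "C": 15136}
class_revenue = {"A": 1199999.21, "B": 128342.44, "C": 249696.97}

total_customers = sum(class_customers.values())  # all customers in the dataset
total_revenue = sum(class_revenue.values())

shares = {}
for cls in "ABC":
    pct_customers = round(class_customers[cls] / total_customers * 100, 2)
    pct_revenue = round(class_revenue[cls] / total_revenue * 100, 2)
    shares[cls] = (pct_customers, pct_revenue)
    print(f"Class {cls}: {pct_customers}% of customers, {pct_revenue}% of revenue")
```

The customer total comes to 22,625, matching the dataset description, and class A works out to 22.9% of customers and 76.04% of revenue.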
