# Customer Segmentation

Customer segmentation is the grouping of customers based on properties they share within your product or service.
Any transactional information can be used to segment customers into distinct classes.

<h5>Customer Segmentation Models</h5>

1. Recency, Frequency, Monetary (RFM).<br>
2. High-Value Customers (HVCs) based on RFM.<br>
3. Customer Status (Active or Inactive).<br>
4. Demographic (gender, age, religion, etc.).<br>
5. Behaviour (spending habits, purchasing habits, browsing habits, product rating, brand loyalty, etc.).<br>
6. Psychographic (beliefs, lifestyles, hobbies, values, etc.).<br>
7. Geographic (country, region, city, etc.).
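
As an illustration of the first model in the list, the sketch below builds an RFM table from a handful of made-up transactions. The column names mirror the dataset loaded later in this notebook; the toy rows, the `snapshot` reference date, and the choice to count distinct invoices as frequency are assumptions made for the example, not part of the original analysis.

```python
import pandas as pd

# Toy transactions (made up); column names mirror the dataset used in this notebook.
toy = pd.DataFrame({
    "CustomerID": ["A", "A", "B", "C", "C", "C"],
    "InvoiceNo": [1, 2, 3, 4, 5, 5],
    "Date": pd.to_datetime(["2011-01-05", "2011-03-01", "2011-02-10",
                            "2011-03-20", "2011-03-25", "2011-03-25"]),
    "TotalSales": [10.0, 20.0, 5.0, 7.5, 2.5, 40.0],
})

# Assumption: recency is measured from the day after the last recorded transaction.
snapshot = toy["Date"].max() + pd.Timedelta(days=1)

rfm = toy.groupby("CustomerID").agg(
    Recency=("Date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                     # distinct invoices
    Monetary=("TotalSales", "sum"),                         # total spend
)
print(rfm)
```

Lower recency and higher frequency and monetary values generally point to more valuable customers, which is the basis for the HVC model in item 2.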

<h5>Benefits of Customer Segmentation</h5>

1. Marketing strategy.<br>
2. Promotion strategy.<br>
3. Budget efficiency.<br>
4. Product development.<br>
5. Customer demand relevance.<br>
6. Customer retention.<br>
7. Branding strategy.<br>

<a href="https://looker.com/blog/creating-actionable-customer-segmentation-models">more details</a>


## Import Required Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

In [3]:
px.defaults.template = "ggplot2"
plt.style.use('ggplot')

## Load Data

In [4]:
df=pd.read_csv("../datasets/Customer Lifetime Value Online Retail.csv",encoding="cp1252")

## Explore Data

In [5]:
df.shape

(397924, 14)

In [6]:
df.head()

| | Unnamed: 0 | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | TotalSales | Date | Month | Year | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 01/12/2010 08:26 | 2.55 | 17850 | United Kingdom | 15.3 | 2010-12-01 | 2010-12 | 2010 | 1 |
| 1 | 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 01/12/2010 08:26 | 3.39 | 17850 | United Kingdom | 20.34 | 2010-12-01 | 2010-12 | 2010 | 1 |
| 2 | 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 01/12/2010 08:26 | 2.75 | 17850 | United Kingdom | 22.0 | 2010-12-01 | 2010-12 | 2010 | 1 |
| 3 | 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 01/12/2010 08:26 | 3.39 | 17850 | United Kingdom | 20.34 | 2010-12-01 | 2010-12 | 2010 | 1 |
| 4 | 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 01/12/2010 08:26 | 3.39 | 17850 | United Kingdom | 20.34 | 2010-12-01 | 2010-12 | 2010 | 1 |


In [7]:
df.describe()

| | Unnamed: 0 | InvoiceNo | Quantity | UnitPrice | CustomerID | TotalSales | Year | Day |
|---|---|---|---|---|---|---|---|---|
| count | 397924.0 | 397924.0 | 397924.0 | 397924.0 | 397924.0 | 397924.0 | 397924.0 | 397924.0 |
| mean | 278465.221859 | 560617.126645 | 13.021823 | 3.116174 | 15294.315171 | 22.394748 | 2010.934259 | 15.042181 |
| std | 152771.368303 | 13106.167695 | 180.42021 | 22.096788 | 1713.169877 | 309.055588 | 0.247829 | 8.653771 |
| min | 0.0 | 536365.0 | 1.0 | 0.0 | 12346.0 | 0.0 | 2010.0 | 1.0 |
| 25% | 148333.75 | 549234.0 | 2.0 | 1.25 | 13969.0 | 4.68 | 2011.0 | 7.0 |
| 50% | 284907.5 | 561893.0 | 6.0 | 1.95 | 15159.0 | 11.8 | 2011.0 | 15.0 |
| 75% | 410079.25 | 572090.0 | 12.0 | 3.75 | 16795.0 | 19.8 | 2011.0 | 22.0 |
| max | 541908.0 | 581587.0 | 80995.0 | 8142.75 | 18287.0 | 168469.6 | 2011.0 | 31.0 |


Check for null values

In [8]:
df.isnull().sum()

Unnamed: 0     0
InvoiceNo      0
StockCode      0
Description    0
Quantity       0
InvoiceDate    0
UnitPrice      0
CustomerID     0
Country        0
TotalSales     0
Date           0
Month          0
Year           0
Day            0
dtype: int64

Drop records with a missing CustomerID

In [9]:
df = df[pd.notnull(df['CustomerID'])]

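Keep only rows with a positive quantity

In [10]:
# .copy() avoids SettingWithCopyWarning on the column assignments that follow
df = df[df['Quantity'] > 0].copy()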

Convert CustomerID to string

In [11]:
# Casting to int first drops any trailing ".0" if the IDs were read as floats
df['CustomerID'] = df['CustomerID'].astype(int).astype(str)

Parse InvoiceDate and store the calendar date in the Date column

In [13]:
df['Date'] = pd.to_datetime(df['InvoiceDate'], format="%d/%m/%Y %H:%M").dt.date
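
The explicit `format` matters because the invoice timestamps are day-first. A minimal sketch on two sample strings (the second one is made up for illustration) shows how they are interpreted:

```python
import pandas as pd

# Day-first timestamps in the same layout as the InvoiceDate column.
raw = pd.Series(["01/12/2010 08:26", "13/01/2011 14:05"])

# With an explicit format, "01/12/2010" is read as 1 December 2010,
# not 12 January 2010.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")
print(parsed.dt.date.tolist())

# .dt.date yields Python date objects (object dtype);
# .dt.normalize() keeps a datetime64 column with the time set to midnight.
print(parsed.dt.date.dtype, parsed.dt.normalize().dtype)
```

If later steps need date arithmetic or resampling, keeping a `datetime64` column via `.dt.normalize()` may be more convenient than plain `date` objects.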

Create TotalSales column

In [14]:
df['TotalSales'] = (df['Quantity'] * df['UnitPrice']).round(2)
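
The cleaning steps above can also be gathered into one function, which makes them easier to rerun on a fresh copy of the data. This is only a sketch: `clean_transactions` is a hypothetical helper name, and the toy rows are made up to exercise each filter.

```python
import pandas as pd

def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to a copy of `raw` (hypothetical helper)."""
    # Keep rows that have a customer ID and a positive quantity.
    out = raw[raw["CustomerID"].notna() & (raw["Quantity"] > 0)].copy()
    # Store IDs as strings without a trailing ".0".
    out["CustomerID"] = out["CustomerID"].astype(int).astype(str)
    # Parse day-first timestamps and keep only the calendar date.
    out["Date"] = pd.to_datetime(out["InvoiceDate"], format="%d/%m/%Y %H:%M").dt.date
    # Line total, rounded to two decimals.
    out["TotalSales"] = (out["Quantity"] * out["UnitPrice"]).round(2)
    return out

# Made-up rows: one valid, one without a customer ID, one with a negative quantity.
toy = pd.DataFrame({
    "CustomerID": [17850.0, None, 13047.0],
    "Quantity": [6, 2, -1],
    "UnitPrice": [2.55, 1.00, 3.39],
    "InvoiceDate": ["01/12/2010 08:26", "01/12/2010 08:28", "01/12/2010 08:34"],
})
cleaned = clean_transactions(toy)
print(cleaned)
```

Only the first toy row survives, since the other two fail the CustomerID and quantity filters respectively.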

## Analysis and Visualization

Customer distribution by country

In [18]:
def cust_dist_by_country(df):
    # Count distinct customers per country; count() would tally transaction rows instead
    customer_count_df = (
        df.groupby("Country")["CustomerID"].nunique()
        .sort_values(ascending=False)
        .reset_index()
    )
    customer_count_df.columns = ['Country', 'Customers']
    # Log-scaled y axis keeps smaller countries visible next to the largest one
    fig = px.bar(customer_count_df.head(10), x='Country', y='Customers', text='Customers', color='Country', log_y=True, title='Top 10 Countries by Number of Customers')
    fig.update_layout(legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.8), autosize=True, margin=dict(t=30, b=0, l=0, r=0))
    return fig

cust_dist_by_country(df)
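
When counting customers per country on transaction-level data, `count()` and `nunique()` answer different questions: the first tallies rows, the second tallies distinct customers. A small sketch on made-up rows illustrates the gap:

```python
import pandas as pd

# Made-up transaction rows: one customer can appear on many rows.
toy = pd.DataFrame({
    "Country": ["United Kingdom"] * 4 + ["France"] * 2,
    "CustomerID": ["17850", "17850", "17850", "13047", "12583", "12583"],
})

rows_per_country = toy.groupby("Country")["CustomerID"].count()         # transaction rows
customers_per_country = toy.groupby("Country")["CustomerID"].nunique()  # distinct customers

print(pd.DataFrame({"rows": rows_per_country, "customers": customers_per_country}))
```

For a chart labelled as a customer distribution, the `nunique()` figure is usually the intended one.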

Revenue distribution by country

In [20]:
def revenue_dist_by_country(df):
    # Total revenue per country, largest first
    revenue_per_country_df = df.groupby("Country", as_index=False)["TotalSales"].sum().sort_values(by="TotalSales", ascending=False)
    revenue_per_country_df.columns = ['Country', 'TotalSales']
    revenue_per_country_df = revenue_per_country_df.round(2)
    fig = px.bar(revenue_per_country_df.head(10), x='Country', y='TotalSales', text='TotalSales', color='Country', log_y=True, title='Top 10 Countries by Revenue')
    fig.update_layout(legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.8), autosize=True, margin=dict(t=30, b=0, l=0, r=0))
    return fig

revenue_dist_by_country(df)
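
Because the bar chart uses a log-scaled y axis, large differences between countries look smaller than they are. One way to complement it, sketched here on made-up totals, is to express each country's revenue as a percentage of the overall total:

```python
import pandas as pd

# Made-up sales rows; column names follow the notebook's dataset.
toy = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "France", "Germany"],
    "TotalSales": [60.0, 20.0, 15.0, 5.0],
})

revenue = toy.groupby("Country")["TotalSales"].sum().sort_values(ascending=False)
share = (revenue / revenue.sum() * 100).round(1)  # percentage of total revenue

print(share)
```

The same two lines could be applied to the real `df` to see how concentrated revenue is in the top country.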