# Customer Relationship Management (CRM)

## Customer Segmentation with Customer Lifetime Value (CLTV)
___

Business Problem: An e-commerce company wants to segment its customers in order to develop its marketing strategy.
___

Variables

Invoice: Unique number for each invoice. If the invoice number starts with 'C', the invoice was cancelled.

StockCode: Unique number for each product.

Description: Product name.

Quantity: Indicates how many units of the product were sold.

InvoiceDate: Invoice date.

Price: Price per unit of the product (in pounds sterling).

Customer ID: Unique ID for each customer.

Country: Customer country.

**Import Libraries**

In [109]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

**Read Excel File**

In [110]:
df_ = pd.read_excel('crm_analytics/datasets/online_retail_II.xlsx', sheet_name='Year 2009-2010')

In [111]:
df = df_.copy()

**First 5 rows**

In [112]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom


## Data Preparation

**Check the data shape**

In [113]:
df.shape

(525461, 8)

**Check null values**

In [114]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107927
Country             0
dtype: int64

**Deleting null values**

In [115]:
df.dropna(inplace=True)
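Called without arguments, `dropna()` removes a row if any column is missing, so a row that only lacks a `Description` is dropped as well. If the intent is only to keep rows with a known customer, one option is to restrict the check to `Customer ID` with the `subset` parameter. A minimal sketch on a made-up three-row frame (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame: values are made up for illustration only.
toy = pd.DataFrame({
    'Description': ['MUG', None, 'LAMP'],
    'Customer ID': [13085.0, 13085.0, np.nan],
})

# Default behaviour: drop a row if ANY column is null.
dropped_any = toy.dropna()

# Alternative: drop a row only when 'Customer ID' is null.
dropped_subset = toy.dropna(subset=['Customer ID'])

print(len(dropped_any), len(dropped_subset))
```

On this sheet the two approaches appear to give the same result: 525461 rows minus the 107927 rows with a missing `Customer ID` is 417534, which matches the row count reported by `describe()` after the drop.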

In [116]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
Quantity,417534.0,12.758815,101.220424,-9360.0,2.0,4.0,12.0,19152.0
Price,417534.0,3.887547,71.131797,0.0,1.25,1.95,3.75,25111.09
Customer ID,417534.0,15360.645478,1680.811316,12346.0,13983.0,15311.0,16799.0,18287.0


**Deleting cancelled invoices**

In [117]:
# .copy() avoids a SettingWithCopyWarning when TotalPrice is added later
df = df[~df['Invoice'].str.contains('C', na=False)].copy()
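The variable description says a cancelled invoice *starts with* 'C', while the filter above matches a 'C' anywhere in the invoice number. The two may well coincide for this dataset, but a stricter prefix check is possible. The sketch below compares both on a made-up column that mixes integers and strings, as can happen when reading Excel; the value `'536C12'` is hypothetical and exists only to show the difference.

```python
import pandas as pd

# Toy invoice column mixing integers and strings.
# '536C12' is a hypothetical value used only to show the difference.
invoices = pd.Series([489434, 'C489449', '536C12'])

# Notebook-style filter: 'C' anywhere; non-string entries become False via na=False.
contains_c = invoices.str.contains('C', na=False)

# Stricter alternative: cast everything to str, then test the prefix.
starts_with_c = invoices.astype(str).str.startswith('C')

print(contains_c.tolist())
print(starts_with_c.tolist())
```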

In [118]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
Quantity,407695.0,13.586686,96.842229,1.0,2.0,5.0,12.0,19152.0
Price,407695.0,3.294188,34.756655,0.0,1.25,1.95,3.75,10953.5
Customer ID,407695.0,15368.504107,1679.7957,12346.0,13997.0,15321.0,16812.0,18287.0


**Add total price to dataframe**

In [119]:
df['TotalPrice'] = df['Quantity'] * df['Price']

In [120]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom,83.4
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom,81.0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom,100.8
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom,30.0


**Group by 'Customer ID' for unique customers**

In [121]:
cltv_cal = df.groupby('Customer ID').agg({'Invoice': lambda x: x.nunique(),
                                                'Quantity': lambda x: x.sum(),
                                                'TotalPrice': lambda x: x.sum()})

cltv_cal.columns = ['total_transaction', 'total_unit', 'total_price']
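The same aggregation can also be written with pandas named aggregation, which sets the output column names in the `agg` call itself and avoids the separate rename step. A minimal sketch on a made-up transaction table (the values are illustrative only):

```python
import pandas as pd

# Toy transactions: made-up values for illustration only.
toy = pd.DataFrame({
    'Customer ID': [1.0, 1.0, 1.0, 2.0],
    'Invoice': [100, 100, 101, 200],
    'Quantity': [2, 3, 1, 10],
    'TotalPrice': [4.0, 9.0, 5.0, 20.0],
})

# output_column=(input_column, aggregation)
summary = toy.groupby('Customer ID').agg(
    total_transaction=('Invoice', 'nunique'),  # distinct invoices per customer
    total_unit=('Quantity', 'sum'),            # units bought per customer
    total_price=('TotalPrice', 'sum'),         # revenue per customer
)
print(summary)
```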

In [122]:
cltv_cal

Customer ID,total_transaction,total_unit,total_price
12346.0,11,70,372.86
12347.0,2,828,1323.32
12348.0,1,373,222.16
12349.0,3,993,2671.14
12351.0,1,261,300.93
...,...,...,...
18283.0,6,336,641.77
18284.0,1,494,461.68
18285.0,1,145,427.00
18286.0,2,608,1296.43


## Calculating CLTV Parameters

CLTV = (Customer Value / Churn Rate) x Profit Margin

Customer Value = Average Order Value x Purchase Frequency

Average Order Value = Total Price / Total Transaction

Purchase Frequency = Total Transaction / Total Number of Customers

Churn Rate = 1 - Repeat Rate (Retention Rate)

Profit Margin = Total Price x Profit
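To see how the formulas above fit together, here is a small worked sketch on four made-up customers. The totals and the `toy` frame are assumptions for illustration; only the formulas and the profit rate of 0.1 come from this notebook.

```python
import pandas as pd

# Made-up per-customer totals, for illustration only.
toy = pd.DataFrame(
    {'total_transaction': [4, 1, 2, 1],
     'total_price': [400.0, 50.0, 300.0, 80.0]},
    index=pd.Index([1, 2, 3, 4], name='Customer ID'),
)
profit = 0.1  # profit rate, the same value used later in this notebook

n_customers = toy.shape[0]
average_order_value = toy['total_price'] / toy['total_transaction']
purchase_frequency = toy['total_transaction'] / n_customers

# Share of customers with more than one transaction.
repeat_rate = toy[toy['total_transaction'] > 1].shape[0] / n_customers
churn_rate = 1 - repeat_rate

profit_margin = toy['total_price'] * profit
customer_value = average_order_value * purchase_frequency
cltv = (customer_value / churn_rate) * profit_margin
print(cltv)
```

With these definitions, Average Order Value x Purchase Frequency simplifies to Total Price / Total Number of Customers, so the resulting CLTV is proportional to the square of each customer's Total Price. The segments therefore follow the ranking of customers by total spend.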

**Average Order Value**

In [123]:
cltv_cal['average_order_value'] = cltv_cal['total_price'] / cltv_cal['total_transaction']

In [124]:
cltv_cal.head()

Customer ID,total_transaction,total_unit,total_price,average_order_value
12346.0,11,70,372.86,33.896364
12347.0,2,828,1323.32,661.66
12348.0,1,373,222.16,222.16
12349.0,3,993,2671.14,890.38
12351.0,1,261,300.93,300.93


**Purchase Frequency**

In [125]:
cltv_cal['purchase_frequency'] = cltv_cal['total_transaction'] / cltv_cal.shape[0]

In [126]:
cltv_cal.head()

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency
12346.0,11,70,372.86,33.896364,0.00255
12347.0,2,828,1323.32,661.66,0.000464
12348.0,1,373,222.16,222.16,0.000232
12349.0,3,993,2671.14,890.38,0.000695
12351.0,1,261,300.93,300.93,0.000232


**Churn Rate and Repeat Rate (Retention Rate)**

In [127]:
repeat_rate = cltv_cal[cltv_cal['total_transaction'] > 1].shape[0] / cltv_cal.shape[0]
repeat_rate

0.6706073249884098

In [128]:
churn_rate = 1 - repeat_rate
churn_rate

0.3293926750115902
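The repeat rate is simply the share of customers with more than one transaction, so it can also be computed as the mean of a boolean Series. A minimal sketch, assuming five made-up transaction counts:

```python
import pandas as pd

# Made-up transaction counts for five customers.
total_transaction = pd.Series([11, 2, 1, 3, 1])

# Count of repeat customers divided by count of all customers.
repeat_rate_counts = (total_transaction > 1).sum() / len(total_transaction)

# Equivalent: the mean of a boolean Series is the share of True values.
repeat_rate_mean = (total_transaction > 1).mean()

churn_rate = 1 - repeat_rate_mean
print(repeat_rate_counts, repeat_rate_mean, churn_rate)
```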

**Profit margin (Profit = 0.1)**

In [129]:
cltv_cal['profit_margin'] = cltv_cal['total_price'] * 0.1

In [130]:
cltv_cal.head()

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency,profit_margin
12346.0,11,70,372.86,33.896364,0.00255,37.286
12347.0,2,828,1323.32,661.66,0.000464,132.332
12348.0,1,373,222.16,222.16,0.000232,22.216
12349.0,3,993,2671.14,890.38,0.000695,267.114
12351.0,1,261,300.93,300.93,0.000232,30.093


**Customer Value**

In [131]:
cltv_cal['customer_value'] = cltv_cal['average_order_value'] * cltv_cal['purchase_frequency']

In [132]:
cltv_cal.head()

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency,profit_margin,customer_value
12346.0,11,70,372.86,33.896364,0.00255,37.286,0.08643
12347.0,2,828,1323.32,661.66,0.000464,132.332,0.30675
12348.0,1,373,222.16,222.16,0.000232,22.216,0.051497
12349.0,3,993,2671.14,890.38,0.000695,267.114,0.619179
12351.0,1,261,300.93,300.93,0.000232,30.093,0.069757


**Customer Lifetime Value**

In [133]:
cltv_cal['cltv'] = (cltv_cal['customer_value'] / churn_rate) * cltv_cal['profit_margin']

In [134]:
cltv_cal.head()

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency,profit_margin,customer_value,cltv
12346.0,11,70,372.86,33.896364,0.00255,37.286,0.08643,9.783574
12347.0,2,828,1323.32,661.66,0.000464,132.332,0.30675,123.235455
12348.0,1,373,222.16,222.16,0.000232,22.216,0.051497,3.473263
12349.0,3,993,2671.14,890.38,0.000695,267.114,0.619179,502.110408
12351.0,1,261,300.93,300.93,0.000232,30.093,0.069757,6.372897


**Segment all customers**

In [135]:
cltv_cal['segments'] = pd.qcut(cltv_cal['cltv'], 4, ['D', 'C', 'B', 'A'])

In [136]:
cltv_cal

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency,profit_margin,customer_value,cltv,segments
12346.0,11,70,372.86,33.896364,0.002550,37.286,0.086430,9.783574,C
12347.0,2,828,1323.32,661.660000,0.000464,132.332,0.306750,123.235455,B
12348.0,1,373,222.16,222.160000,0.000232,22.216,0.051497,3.473263,D
12349.0,3,993,2671.14,890.380000,0.000695,267.114,0.619179,502.110408,A
12351.0,1,261,300.93,300.930000,0.000232,30.093,0.069757,6.372897,D
...,...,...,...,...,...,...,...,...,...
18283.0,6,336,641.77,106.961667,0.001391,64.177,0.148764,28.984429,C
18284.0,1,494,461.68,461.680000,0.000232,46.168,0.107019,14.999889,C
18285.0,1,145,427.00,427.000000,0.000232,42.700,0.098980,12.831034,C
18286.0,2,608,1296.43,648.215000,0.000464,129.643,0.300517,118.278026,B
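Once the labels exist, a quick way to sanity-check them is to profile each segment: `pd.qcut` splits on quantiles, so the four groups should be roughly equal in size, with 'D' holding the lowest CLTV values and 'A' the highest. A minimal sketch on eight made-up CLTV scores (the `toy` frame and `profile` name are illustrative, not part of the notebook):

```python
import pandas as pd

# Made-up CLTV scores for eight customers.
toy = pd.DataFrame({'cltv': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]})

# Same call shape as the notebook: four quantile bins, lowest labelled 'D'.
toy['segments'] = pd.qcut(toy['cltv'], 4, labels=['D', 'C', 'B', 'A'])

# Size, range and mean CLTV of each segment.
profile = toy.groupby('segments', observed=True)['cltv'].agg(
    ['count', 'min', 'max', 'mean']
)
print(profile)
```

The same `groupby('segments')` call could be applied to `cltv_cal` to compare the segments on `total_price` or `total_transaction`.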


## Wrapping All Steps in a Function

In [140]:
def create_cltv_cal(dataframe, profit=0.1, csv=False):
    
    # Data Preparation
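    # Drop nulls without inplace=True so the caller's dataframe is not modified
    dataframe = dataframe.dropna()
    # Remove cancelled invoices; .copy() avoids a SettingWithCopyWarning below
    dataframe = dataframe[~dataframe['Invoice'].str.contains('C', na=False)].copy()
    dataframe['TotalPrice'] = dataframe['Quantity'] * dataframe['Price']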
    
    # Group by Customer ID 
    cltv_cal = dataframe.groupby('Customer ID').agg({'Invoice': lambda x: x.nunique(),
                                                'Quantity': lambda x: x.sum(),
                                                'TotalPrice': lambda x: x.sum()})
    cltv_cal.columns = ['total_transaction', 'total_unit', 'total_price']
    
    # Calculate CLTV Parameters
    cltv_cal['average_order_value'] = cltv_cal['total_price'] / cltv_cal['total_transaction']
    cltv_cal['purchase_frequency'] = cltv_cal['total_transaction'] / cltv_cal.shape[0]
    repeat_rate = cltv_cal[cltv_cal['total_transaction'] > 1].shape[0] / cltv_cal.shape[0]
    churn_rate = 1 - repeat_rate
    cltv_cal['profit_margin'] = cltv_cal['total_price'] * profit
    cltv_cal['customer_value'] = cltv_cal['average_order_value'] * cltv_cal['purchase_frequency']
    cltv_cal['cltv'] = (cltv_cal['customer_value'] / churn_rate) * cltv_cal['profit_margin']
    
    # Segment customers into quartiles by CLTV
    cltv_cal['segments'] = pd.qcut(cltv_cal['cltv'], 4, ['D', 'C', 'B', 'A'])
    
    if csv:
        cltv_cal.to_csv('cltv.csv')
        
    return cltv_cal

In [141]:
test_df = df_.copy()

In [142]:
create_cltv_cal(test_df)

Customer ID,total_transaction,total_unit,total_price,average_order_value,purchase_frequency,profit_margin,customer_value,cltv,segments
12346.0,11,70,372.86,33.896364,0.002550,37.286,0.086430,9.783574,C
12347.0,2,828,1323.32,661.660000,0.000464,132.332,0.306750,123.235455,B
12348.0,1,373,222.16,222.160000,0.000232,22.216,0.051497,3.473263,D
12349.0,3,993,2671.14,890.380000,0.000695,267.114,0.619179,502.110408,A
12351.0,1,261,300.93,300.930000,0.000232,30.093,0.069757,6.372897,D
...,...,...,...,...,...,...,...,...,...
18283.0,6,336,641.77,106.961667,0.001391,64.177,0.148764,28.984429,C
18284.0,1,494,461.68,461.680000,0.000232,46.168,0.107019,14.999889,C
18285.0,1,145,427.00,427.000000,0.000232,42.700,0.098980,12.831034,C
18286.0,2,608,1296.43,648.215000,0.000464,129.643,0.300517,118.278026,B
