# CLTV Prediction with BG-NBD and Gamma-Gamma

1. Data Preparation
2. Expected Number of Transactions with BG-NBD Model
3. Expected Average Profit with Gamma-Gamma Model
4. Calculation of CLTV with BG-NBD and Gamma-Gamma Model
5. Creating Segments by CLTV

## Data Preparation

An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.

Dataset background:
    
    https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
    
    The Online Retail II data set comes from a UK-based online store.
    It covers sales between 01/12/2009 and 09/12/2011.
    
Variables:

    InvoiceNo: Invoice number. Unique number of each transaction. (A number starting with C marks a cancelled transaction.)
    StockCode: Product code. Unique number for each product.
    Description: Product name.
    Quantity: Number of units of the product sold on the invoice.
    InvoiceDate: Invoice date and time.
    UnitPrice: Price per unit of the product (in GBP).
    CustomerID: Unique customer number.
    Country: Name of the country where the customer lives.
    Note: in the Excel sheet used here, InvoiceNo, UnitPrice, and CustomerID appear as Invoice, Price, and Customer ID.
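
To make the role of these variables concrete, here is a minimal sketch on a hypothetical three-row frame (the values are invented for illustration, and the column names follow the Excel sheet: Invoice, Quantity, Price). It flags cancelled invoices by their leading "C" and computes the value of each invoice line.

```python
import pandas as pd

# Hypothetical toy rows; the third invoice is a cancellation (starts with "C").
toy = pd.DataFrame({
    "Invoice": [536365, 536366, "C536379"],
    "Quantity": [6, 2, -1],
    "Price": [2.55, 1.85, 27.50],
})

# Cast to str first, because the column can mix integers and strings.
is_cancelled = toy["Invoice"].astype(str).str.startswith("C")

# Value of each invoice line: units sold times unit price.
toy["TotalPrice"] = toy["Quantity"] * toy["Price"]

# Keep only genuine sales.
sales = toy[~is_cancelled]
print(sales)
```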

## Required Libraries and Functions

In [5]:
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_period_transactions

In [6]:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500) 
pd.set_option('display.float_format', lambda x: '%.4f' % x)
from sklearn.preprocessing import MinMaxScaler

In [9]:
def outlier_thresholds(dataframe, variable):
    # Despite the quartile-style names, the 1st and 99th percentiles are used,
    # so only very extreme values fall outside the returned limits.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit
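
As a rough illustration of how wide these limits are, the sketch below repeats the same arithmetic on a hypothetical series of 99 ordinary quantities plus one extreme value. The data and variable names are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical data: the quantities 1..99 plus a single extreme order of 10,000 units.
quantities = pd.Series(list(range(1, 100)) + [10_000])

q_low = quantities.quantile(0.01)    # 1st percentile
q_high = quantities.quantile(0.99)   # 99th percentile
spread = q_high - q_low

low_limit = q_low - 1.5 * spread
up_limit = q_high + 1.5 * spread

# Only the extreme order lies above the upper limit.
outliers = quantities[quantities > up_limit]
print(low_limit, up_limit, outliers.tolist())
```

Because the limits are built from the 1st and 99th percentiles rather than the quartiles, ordinary large orders are left alone and only the extreme value is flagged.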

In [10]:
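def replace_with_thresholds(dataframe, variable):
    # Cap values above the upper limit, modifying the frame in place.
    # The lower limit is computed but intentionally not applied here.
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit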
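
The capping step can be sketched on the same kind of hypothetical data. The frame below is an assumption for illustration. It uses a float column so that writing the (float) upper limit into it does not change the dtype.

```python
import pandas as pd

# Hypothetical frame: 99 ordinary quantities plus one extreme value.
toy = pd.DataFrame({"Quantity": [float(q) for q in range(1, 100)] + [10_000.0]})

q_low = toy["Quantity"].quantile(0.01)
q_high = toy["Quantity"].quantile(0.99)
up_limit = q_high + 1.5 * (q_high - q_low)

# Values above the limit are replaced by the limit itself; no rows are dropped.
toy.loc[toy["Quantity"] > up_limit, "Quantity"] = up_limit
print(toy["Quantity"].describe())
```

Capping rather than dropping keeps the customer's transaction in the data while limiting its influence on the monetary averages.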

## Reading Data

In [31]:
df_ = pd.read_excel("/Users/serdartafrali/PycharmProjects/VboBootcamp/DSMLBC8/Datasets/online_retail_II.xlsx", 
                    sheet_name="Year 2010-2011")

In [32]:
df = df_.copy()

In [34]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Quantity | 541910.0 | 9.5522 | 218.081 | -80995.0 | 1.0 | 3.0 | 10.0 | 80995.0 |
| Price | 541910.0 | 4.6111 | 96.7598 | -11062.06 | 1.25 | 2.08 | 4.13 | 38970.0 |
| Customer ID | 406830.0 | 15287.6842 | 1713.6031 | 12346.0 | 13953.0 | 15152.0 | 16791.0 | 18287.0 |


In [35]:
df.head()

| | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


In [36]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64
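
The counts above can be put in proportion with a short calculation. The numbers are taken from the `describe()` and `isnull().sum()` outputs shown in this notebook; only the variable names are introduced here.

```python
# Figures copied from the outputs above.
total_rows = 541910            # count of Quantity in df.describe()
missing_customer_id = 135080   # from df.isnull().sum()
missing_description = 1454     # from df.isnull().sum()

share_customer_id = missing_customer_id / total_rows
share_description = missing_description / total_rows

# Rows that still carry a Customer ID; this matches the Customer ID count in describe().
rows_with_customer_id = total_rows - missing_customer_id

print(f"Missing Customer ID: {share_customer_id:.2%}")
print(f"Missing Description: {share_description:.2%}")
print(f"Rows with Customer ID: {rows_with_customer_id}")
```

Roughly a quarter of the rows have no Customer ID, so dropping missing values removes a substantial part of the raw data before any customer-level modelling.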

In [37]:
# CLTV Prediction with BG-NBD and Gamma-Gamma

In [38]:
def create_cltv_p(dataframe, month=3):
    # 1. Data preprocessing
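    # dropna() returns a new frame, so the caller's DataFrame is left unchanged.
    dataframe = dataframe.dropna()
    # Invoices containing "C" are cancellations; cast to str in case the column mixes ints and strings.
    dataframe = dataframe[~dataframe["Invoice"].astype(str).str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Copy the filtered slice so the in-place edits below do not trigger SettingWithCopyWarning.
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")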
    dataframe["TotalPrice"] = dataframe["Quantity"] * dataframe["Price"]
    today_date = dt.datetime(2011, 12, 11)

    cltv_df = dataframe.groupby('Customer ID').agg(
        {'InvoiceDate': [lambda InvoiceDate: (InvoiceDate.max() - InvoiceDate.min()).days,
                         lambda InvoiceDate: (today_date - InvoiceDate.min()).days],
         'Invoice': lambda Invoice: Invoice.nunique(),
         'TotalPrice': lambda TotalPrice: TotalPrice.sum()})

    cltv_df.columns = cltv_df.columns.droplevel(0)
    cltv_df.columns = ['recency', 'T', 'frequency', 'monetary']
    cltv_df["monetary"] = cltv_df["monetary"] / cltv_df["frequency"]
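    # Keep customers with more than one invoice; copy so the column assignments below act on an independent frame.
    cltv_df = cltv_df[(cltv_df['frequency'] > 1)].copy()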
    cltv_df["recency"] = cltv_df["recency"] / 7
    cltv_df["T"] = cltv_df["T"] / 7

    # 2. Fitting the BG-NBD model (expected number of transactions)
    bgf = BetaGeoFitter(penalizer_coef=0.001)
    bgf.fit(cltv_df['frequency'],
            cltv_df['recency'],
            cltv_df['T'])

    cltv_df["expected_purc_1_week"] = bgf.predict(1,
                                                  cltv_df['frequency'],
                                                  cltv_df['recency'],
                                                  cltv_df['T'])

    cltv_df["expected_purc_1_month"] = bgf.predict(4,
                                                   cltv_df['frequency'],
                                                   cltv_df['recency'],
                                                   cltv_df['T'])

    cltv_df["expected_purc_3_month"] = bgf.predict(12,
                                                   cltv_df['frequency'],
                                                   cltv_df['recency'],
                                                   cltv_df['T'])

    # 3. Fitting the Gamma-Gamma model (expected average profit)
    ggf = GammaGammaFitter(penalizer_coef=0.01)
    ggf.fit(cltv_df['frequency'], cltv_df['monetary'])
    cltv_df["expected_average_profit"] = ggf.conditional_expected_average_profit(cltv_df['frequency'],
                                                                                 cltv_df['monetary'])

    # 4. Calculation of CLTV with BG-NBD and GG model.
    cltv = ggf.customer_lifetime_value(bgf,
                                       cltv_df['frequency'],
                                       cltv_df['recency'],
                                       cltv_df['T'],
                                       cltv_df['monetary'],
                                       time=month,  # Prediction horizon in months.
                                       freq="W",  # Time unit of recency and T (weeks).
                                       discount_rate=0.01)

    cltv = cltv.reset_index()
    cltv_final = cltv_df.merge(cltv, on="Customer ID", how="left")
    cltv_final["segment"] = pd.qcut(cltv_final["clv"], 4, labels=["D", "C", "B", "A"])

    return cltv_final
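
The aggregation in step 1 of the function can be tried on a small transaction log. The sketch below uses hypothetical invoices, dates, and amounts, and named aggregation instead of a dict of lambdas; it is an illustration of the same recency / T / frequency / monetary definitions, not a replacement for the function above.

```python
import datetime as dt
import pandas as pd

# Hypothetical transaction log: customer 1 has three invoices, customer 2 has two.
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Invoice": [1001, 1002, 1003, 1004, 1005],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-01", "2011-01-15", "2011-02-12", "2011-02-26", "2011-03-05"]),
    "TotalPrice": [10.0, 20.0, 30.0, 50.0, 70.0],
})
analysis_date = dt.datetime(2011, 3, 12)  # assumed analysis date for this toy data

summary = tx.groupby("Customer ID").agg(
    # Weeks between the customer's first and last purchase.
    recency=("InvoiceDate", lambda d: (d.max() - d.min()).days / 7),
    # Weeks between the customer's first purchase and the analysis date.
    T=("InvoiceDate", lambda d: (analysis_date - d.min()).days / 7),
    # Number of distinct invoices.
    frequency=("Invoice", "nunique"),
    # Total spend, converted to average spend per invoice below.
    monetary=("TotalPrice", "sum"),
)
summary["monetary"] = summary["monetary"] / summary["frequency"]
print(summary)
```

Note that recency here is measured per customer (first to last purchase), not from the last purchase to the analysis date as in classic RFM.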


In [39]:
df = df_.copy()

In [41]:
cltv_final = create_cltv_p(df)
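
The call above uses the default horizon of 3 months and a discount rate of 0.01. To build intuition for what the resulting `clv` column represents, here is a simplified sketch of a discounted sum of expected monthly revenue. The function `discounted_clv` and its inputs are hypothetical; in the real computation the monthly purchase counts come from the fitted BG-NBD model, so this should be read as an illustration of the idea rather than the library's implementation.

```python
def discounted_clv(expected_purchases_per_month, expected_avg_profit, discount_rate=0.01):
    """Sum each month's expected revenue, discounted back to today (illustrative only)."""
    clv = 0.0
    for month, purchases in enumerate(expected_purchases_per_month, start=1):
        # Expected revenue of the month, discounted by (1 + rate) ** month.
        clv += (purchases * expected_avg_profit) / (1 + discount_rate) ** month
    return clv


# Hypothetical customer: 0.5 expected purchases in each of 3 months, 100 GBP average profit.
clv_3_months = discounted_clv([0.5, 0.5, 0.5], 100.0)
print(round(clv_3_months, 2))  # 147.05
```

Without discounting the same customer would be worth 150 GBP over three months; the discount rate reduces later months slightly more than earlier ones.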

In [42]:
cltv_final

| | Customer ID | recency | T | frequency | monetary | expected_purc_1_week | expected_purc_1_month | expected_purc_3_month | expected_average_profit | clv | segment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12347.0000 | 52.1429 | 52.5714 | 7 | 615.7143 | 0.1413 | 0.5635 | 1.6784 | 631.9123 | 1128.4477 | A |
| 1 | 12348.0000 | 40.2857 | 51.2857 | 4 | 442.6950 | 0.0920 | 0.3668 | 1.0920 | 463.7460 | 538.8089 | B |
| 2 | 12352.0000 | 37.1429 | 42.4286 | 8 | 219.5425 | 0.1824 | 0.7271 | 2.1631 | 224.8868 | 517.5000 | B |
| 3 | 12356.0000 | 43.1429 | 46.5714 | 3 | 937.1433 | 0.0862 | 0.3435 | 1.0222 | 995.9989 | 1083.0903 | A |
| 4 | 12358.0000 | 21.2857 | 21.5714 | 2 | 575.2100 | 0.1223 | 0.4862 | 1.4388 | 631.9022 | 966.6727 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2840 | 18272.0000 | 34.8571 | 35.2857 | 6 | 513.0967 | 0.1721 | 0.6856 | 2.0369 | 529.0185 | 1146.2057 | A |
| 2841 | 18273.0000 | 36.4286 | 36.8571 | 3 | 68.0000 | 0.1043 | 0.4157 | 1.2352 | 73.4942 | 96.5648 | D |
| 2842 | 18282.0000 | 16.8571 | 18.1429 | 2 | 89.0250 | 0.1357 | 0.5392 | 1.5934 | 99.5249 | 168.5946 | D |
| 2843 | 18283.0000 | 47.5714 | 48.2857 | 16 | 130.9300 | 0.3017 | 1.2034 | 3.5831 | 132.6012 | 505.5117 | C |
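
The segment column comes from `pd.qcut`, which splits customers into four equally sized groups by CLV. A minimal sketch on hypothetical CLV values (not taken from the results above) shows how the labels are assigned from lowest to highest.

```python
import pandas as pd

# Hypothetical CLV values for eight customers.
toy_clv = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], name="clv")

# Four quantile-based bins; labels are listed from the lowest bin to the highest.
segments = pd.qcut(toy_clv, 4, labels=["D", "C", "B", "A"])
print(segments.tolist())  # ['D', 'D', 'C', 'C', 'B', 'B', 'A', 'A']
```

Because the bins are quantile-based, each segment holds about a quarter of the customers regardless of how skewed the CLV distribution is; the boundaries therefore depend on the full set of customers, not on fixed monetary cut-offs.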
