# **Customer Lifetime Value (CLV) & Churn Prediction**

## ***CLV Modeling***
**Goal:** Predict customer purchase behavior and estimate 12-month CLV using probabilistic models.

**Models Used**
- BG/NBD - Purchase frequency prediction
- Gamma-Gamma - Monetary value prediction

In [15]:
# Importing Necessary Libraries
import numpy as np
import pandas as pd

from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

In [16]:
# Loading dataset
df = pd.read_csv('cleaned_transactions.csv', parse_dates=['invoice_date'], dtype={'customer_id': str, 'invoice': str})
df.head()

,invoice,stockcode,description,quantity,price,customer_id,country,invoice_date,total_price
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,6.95,13085.0,United Kingdom,2009-12-01 07:45:00,83.4
1,489434,79323P,PINK CHERRY LIGHTS,12,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,81.0
2,489434,79323W,WHITE CHERRY LIGHTS,12,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,81.0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2.1,13085.0,United Kingdom,2009-12-01 07:45:00,100.8
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,1.25,13085.0,United Kingdom,2009-12-01 07:45:00,30.0


### **Section 1:** Prepare Data for CLV Models

Build the per-customer summary (`frequency`, `recency`, `T`, `monetary_value`) that the BG/NBD and Gamma-Gamma models take as input.

In [17]:
summary = summary_data_from_transaction_data(
    df,
    customer_id_col='customer_id',
    datetime_col='invoice_date',
    monetary_value_col='total_price',
    observation_period_end=df['invoice_date'].max()
    )

summary.head()

customer_id,frequency,recency,T,monetary_value
12346.0,7.0,400.0,725.0,11066.637143
12347.0,7.0,402.0,404.0,615.714286
12348.0,4.0,363.0,438.0,449.31
12349.0,3.0,571.0,589.0,1120.056667
12350.0,0.0,0.0,310.0,0.0
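
To make the columns above concrete, here is a minimal pure-Python sketch of how `frequency`, `recency`, `T` and `monetary_value` can be derived. It assumes daily periods and the convention that the first purchase day is excluded from both `frequency` and `monetary_value`. The toy transactions and the `rfm_summary` helper are hypothetical and only illustrate the definitions; this is not the library's implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2024, 1, 1), 50.0),
    ("A", date(2024, 1, 1), 25.0),   # same day as the first purchase
    ("A", date(2024, 1, 11), 40.0),
    ("A", date(2024, 1, 31), 60.0),
    ("B", date(2024, 1, 5), 30.0),   # one-time buyer
]
observation_end = date(2024, 3, 1)


def rfm_summary(transactions, observation_end):
    """Summarise each customer using daily periods (illustrative helper)."""
    # Collapse all purchases made on the same day into one period total.
    daily_totals = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in transactions:
        daily_totals[customer_id][day] += amount

    result = {}
    for customer_id, per_day in daily_totals.items():
        days = sorted(per_day)
        first, last = days[0], days[-1]
        # Repeat purchases are every purchase day after the first one.
        repeat_totals = [per_day[d] for d in days[1:]]
        result[customer_id] = {
            "frequency": len(days) - 1,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": (
                sum(repeat_totals) / len(repeat_totals) if repeat_totals else 0.0
            ),
        }
    return result


toy_summary = rfm_summary(transactions, observation_end)
for customer_id, row in toy_summary.items():
    print(customer_id, row)
```

Customer B ends up with `frequency = 0` and `monetary_value = 0.0`, the same pattern as customer 12350.0 in the output above.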


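The Gamma-Gamma model is only defined for customers with at least one repeat purchase (frequency > 0). BG/NBD itself accepts frequency = 0, but the one-time buyers are removed here so that both models are fitted on the same set of customers.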

In [18]:
summary = summary[summary['frequency'] > 0]

### **Section 2:** BG/NBD Model (Purchase Frequency)
BG/NBD models the probability that a customer is still active and estimates how often they will purchase in the future based on historical behavior.

In [19]:
bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(
    summary['frequency'],
    summary['recency'],
    summary['T']
)

<lifetimes.BetaGeoFitter: fitted with 4189 subjects, a: 0.07, alpha: 93.73, b: 0.64, r: 1.38>
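
The "still active" part of the model has a closed form, so it can be checked by hand. Below is a minimal sketch of the BG/NBD probability-alive expression, using the rounded parameters from the fit output above and two customers from `summary.head()`. The helper name `bgnbd_prob_alive` is hypothetical, and because the parameters are rounded to two decimals the results are only approximate. On the fitted model itself, the equivalent call should be `bgf.conditional_probability_alive(frequency, recency, T)`.

```python
def bgnbd_prob_alive(frequency, recency, T, r, alpha, a, b):
    """P(customer is still active) under BG/NBD, for a single customer."""
    if frequency == 0:
        # Without a repeat purchase the expression reduces to 1.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Rounded parameters copied from the fit output above.
params = {"r": 1.38, "alpha": 93.73, "a": 0.07, "b": 0.64}

# (frequency, recency, T) taken from summary.head().
p_12346 = bgnbd_prob_alive(7, 400, 725, **params)  # last purchase long ago
p_12347 = bgnbd_prob_alive(7, 402, 404, **params)  # last purchase recent

print(f"12346: {p_12346:.3f}")
print(f"12347: {p_12347:.3f}")
```

Both customers have the same frequency, but 12346 has been silent for 325 days (`T - recency`) while 12347 bought two days before the end of the observation window, so the second probability comes out much higher.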

Predicting purchases over the next 12 months (365 days).

In [20]:
summary['predicted_purchases_12m'] = bgf.conditional_expected_number_of_purchases_up_to_time(
    365,
    summary['frequency'],
    summary['recency'],
    summary['T']
)

### **Section 3:** Gamma-Gamma Model (Monetary Value)
Gamma-Gamma estimates the expected monetary value of a transaction assuming spending behavior is independent of purchase frequency.
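
The independence assumption can be checked before fitting. A minimal sketch, assuming a Pearson correlation close to zero is treated as acceptable (the cut-off is a judgment call, not something this notebook states). `frequency_value_correlation` is a hypothetical helper, and the five demo pairs are rounded values from `summary`, used only to show the call.

```python
import numpy as np


def frequency_value_correlation(frequency, monetary_value):
    """Pearson correlation between repeat-purchase count and average value."""
    return float(np.corrcoef(frequency, monetary_value)[0, 1])


# Demo pairs only; five customers are far too few to draw a conclusion.
demo_frequency = [7, 7, 4, 3, 8]
demo_monetary_value = [11066.64, 615.71, 449.31, 1120.06, 338.26]

corr = frequency_value_correlation(demo_frequency, demo_monetary_value)
print(f"correlation: {corr:.3f}")
```

On the full data the same check would be `frequency_value_correlation(summary['frequency'], summary['monetary_value'])`.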

In [21]:
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(
    summary['frequency'],
    summary['monetary_value']
)

<lifetimes.GammaGammaFitter: fitted with 4189 subjects, p: 3.79, q: 0.34, v: 3.69>

Predicting expected average order value.

In [22]:
summary['expected_avg_order_value'] = ggf.conditional_expected_average_profit(
    summary['frequency'],
    summary['monetary_value']
)
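
The expected average order value also has a closed form, which makes the prediction easy to sanity-check. The sketch below applies it with the rounded parameters from the fit output (`p`, `q`, `v`) to two customers whose results appear later in this notebook. The helper name is hypothetical, and the rounding means the values only approximately match the reported ones.

```python
def gamma_gamma_expected_value(frequency, monetary_value, p, q, v):
    """Conditional expectation of the average transaction value (Gamma-Gamma)."""
    return p * (v + frequency * monetary_value) / (p * frequency + q - 1)


# Rounded parameters copied from the GammaGammaFitter output above.
gg_params = {"p": 3.79, "q": 0.34, "v": 3.69}

# (frequency, monetary_value) for customers 12347.0 and 16446.0.
est_12347 = gamma_gamma_expected_value(7, 615.714286, **gg_params)
est_16446 = gamma_gamma_expected_value(1, 168469.6, **gg_params)

print(f"12347: {est_12347:.2f}")
print(f"16446: {est_16446:.2f}")
```

With `q` below 1 the denominator is smaller than `p * frequency`, so every estimate lands above the customer's observed average. The effect is largest at `frequency = 1`, which is worth keeping in mind when reading the CLV ranking.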

### **Section 4:** Calculate 12 Month CLV
CLV represents the discounted expected revenue a customer will generate over the next 12 months.

Now, combining the BG/NBD and Gamma-Gamma models, I will calculate the CLV of each customer for the next 12 months. Both models assume that customer purchase behavior remains stable over the prediction horizon.

In [23]:
summary['clv_12m'] = ggf.customer_lifetime_value(
    bgf,
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=12,  # prediction horizon in months
    freq='D',  # recency and T are measured in days
    discount_rate=0.01  # applied per month
)
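
The discounting step can be sketched independently of the fitted models. This is a minimal illustration, assuming the horizon is split into 30-day months when `freq='D'` and that `discount_rate` is applied once per month; both are my reading of how the calculation works, not something this notebook verifies. `discounted_clv` and `steady_purchases` are hypothetical names, and `steady_purchases` stands in for a fitted model's cumulative purchase prediction.

```python
def discounted_clv(cumulative_purchases, avg_order_value, months=12,
                   discount_rate=0.01, days_per_month=30):
    """Sum monthly expected revenue, discounting month t by (1 + rate) ** t.

    cumulative_purchases(days) must return the expected number of purchases
    between now and `days` days ahead.
    """
    clv = 0.0
    for month in range(1, months + 1):
        end_day = month * days_per_month
        start_day = end_day - days_per_month
        # Expected purchases that fall inside this month only.
        purchases_in_month = (
            cumulative_purchases(end_day) - cumulative_purchases(start_day)
        )
        clv += avg_order_value * purchases_in_month / (1 + discount_rate) ** month
    return clv


def steady_purchases(days):
    """Hypothetical customer who buys at a constant 0.01 purchases per day."""
    return 0.01 * days


clv_discounted = discounted_clv(steady_purchases, avg_order_value=100.0)
clv_undiscounted = discounted_clv(
    steady_purchases, avg_order_value=100.0, discount_rate=0.0
)
print(f"discounted:   {clv_discounted:.2f}")
print(f"undiscounted: {clv_undiscounted:.2f}")
```

Under these assumptions, `time=12` covers 360 days rather than the 365 used for `predicted_purchases_12m`, and a 1% monthly rate compounds to roughly 12.7% a year.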

### **Section 5:** Inspecting Results

In [24]:
summary[['predicted_purchases_12m',
         'expected_avg_order_value',
         'clv_12m']].describe().T

,count,mean,std,min,25%,50%,75%,max
predicted_purchases_12m,4189.0,4.148898,5.30524,1.5169e-08,1.413952,2.88061,5.194253,110.513634
expected_avg_order_value,4189.0,480.079325,3213.273676,9.193838,202.805326,323.273343,475.352545,204083.279792
clv_12m,4189.0,2058.456207,10239.914796,2.239632e-06,332.253783,826.576522,1845.974959,468516.057024


* Top 10 Customers by CLV

In [25]:
summary.sort_values(by='clv_12m', ascending=False).head(10)

customer_id,frequency,recency,T,monetary_value,predicted_purchases_12m,expected_avg_order_value,clv_12m
16446.0,1.0,205.0,205.0,168469.6,2.47824,204083.279792,468516.057024
18102.0,66.0,738.0,738.0,8768.193939,29.13095,8791.492357,237015.75277
14646.0,90.0,736.0,737.0,5809.905333,39.565169,5821.232246,213150.876458
17450.0,30.0,430.0,438.0,6851.561667,21.015207,6891.768915,134062.032412
14156.0,120.0,729.0,738.0,2603.090167,52.425661,2606.911527,126481.943377
14096.0,16.0,97.0,101.0,4071.434375,30.89909,4116.557982,117806.896108
14911.0,254.0,737.0,738.0,1144.437717,110.513634,1145.238975,117130.295835
12415.0,22.0,503.0,527.0,6460.528636,13.384665,6512.347415,80679.069612
13694.0,82.0,732.0,735.0,2355.009756,36.176133,2360.076751,79014.728867
17511.0,50.0,735.0,737.0,3380.4846,22.225864,3392.397013,69779.301542


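These customers have the highest predicted 12-month CLV, so they are the natural candidates for VIP treatment. Note that customer 16446.0 ranks first on the strength of a single, very large repeat purchase (frequency = 1), so that estimate should be treated with caution.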

#### Saving the CLV Output
This output will be used in:
* Churn Model
* Power BI Dashboard
* What-if Simulations

In [26]:
# Exporting CLV Predictions
summary = summary.reset_index()
summary['customer_id'] = summary['customer_id'].astype(str)

# The index was already reset above; resetting again would write an extra 'index' column.
summary.to_csv("clv_predictions.csv", index=False)
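
The `str` cast on `customer_id` does not survive the CSV round trip by itself, because pandas infers column types again when the file is read. A minimal sketch of the difference, assuming the downstream consumers read the file with pandas; the two inline rows reuse values from this notebook and stand in for `clv_predictions.csv`.

```python
import io

import pandas as pd

# Stand-in for clv_predictions.csv with two rows from this notebook.
csv_text = (
    "customer_id,clv_12m\n"
    "12346.0,22268.269061\n"
    "12347.0,3468.587381\n"
)

# Default read: customer_id is inferred as a float column.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit dtype: customer_id stays text, exactly as it was exported.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})

print(inferred["customer_id"].dtype)
print(as_text["customer_id"].tolist())
```

Passing the same `dtype` when loading the file in the churn model keeps joins on `customer_id` consistent with the transaction data, which was also read with `customer_id` as `str`.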

In [27]:
summary.head()

,customer_id,frequency,recency,T,monetary_value,predicted_purchases_12m,expected_avg_order_value,clv_12m
0,12346.0,7.0,400.0,725.0,11066.637143,2.119811,11350.085451,22268.269061
1,12347.0,7.0,402.0,404.0,615.714286,5.928618,631.994983,3468.587381
2,12348.0,4.0,363.0,438.0,449.31,3.447019,470.768204,1502.268172
3,12349.0,3.0,571.0,589.0,1120.056667,2.222522,1190.530808,2449.322654
4,12352.0,8.0,356.0,392.0,338.26125,6.746406,346.275039,2162.617254


In [28]:
summary.isnull().sum()

customer_id                 0
frequency                   0
recency                     0
T                           0
monetary_value              0
predicted_purchases_12m     0
expected_avg_order_value    0
clv_12m                     0
dtype: int64