In [0]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [0]:
!wget  http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx

In [0]:
data=pd.read_excel("Online Retail.xlsx")


As usual, we have some cleaning to do: drop rows with no CustomerID, keep only positive quantities, strip the time from InvoiceDate, and add a new column, Sales. The resulting dataframe contains only CustomerID, InvoiceDate and Sales:

In [0]:
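# Keep only the date part of the invoice timestamp.
data["InvoiceDate"] = pd.to_datetime(data["InvoiceDate"]).dt.date
# Drop rows without a customer ID and rows with non-positive quantities.
data = data[pd.notnull(data["CustomerID"])]
data = data[data["Quantity"] > 0].copy()
# Revenue per invoice line.
data["Sales"] = data["Quantity"] * data["UnitPrice"]
cols_of_interest = ["CustomerID", "InvoiceDate", "Sales"]
data = data[cols_of_interest]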

print(data.head())
print(data["CustomerID"].nunique())

For the CLV models, the following nomenclature is used:

    Frequency represents the number of repeat purchases the customer has made. This means that it’s one less than the total number of purchases.
    T represents the age of the customer in whatever time units are chosen (days, in our dataset). This is equal to the duration between a customer’s first purchase and the end of the period under study.
    Recency represents the age of the customer when they made their most recent purchase. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)
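
The three definitions above can be checked by hand for a single customer. The following is a minimal sketch using only the standard library; the purchase dates are made up for illustration, and it assumes (as the daily time unit suggests) that several invoices on the same day count as one purchase.

```python
from datetime import date

# Hypothetical purchase dates for one customer (not taken from the dataset).
purchases = [date(2011, 1, 10), date(2011, 1, 10), date(2011, 3, 1), date(2011, 6, 15)]
observation_end = date(2011, 12, 9)

# Assumption: with daily time units, invoices on the same day form one purchase.
purchase_days = sorted(set(purchases))
first, last = purchase_days[0], purchase_days[-1]

frequency = len(purchase_days) - 1      # repeat purchases only
recency = (last - first).days           # age at the most recent purchase
T = (observation_end - first).days      # age at the end of the study period

print(frequency, recency, T)
```

A customer with a single purchase day would get frequency 0 and recency 0, which matches the note in the last definition.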

The following analysis is conducted in Python using the Lifetimes package developed by Cameron Davidson-Pilon, data scientist at Shopify.

In [0]:
!pip install lifetimes
from lifetimes.plotting import *
from lifetimes.utils import *
import lifetimes


data_df = summary_data_from_transaction_data(data, 'CustomerID', 'InvoiceDate', monetary_value_col='Sales', observation_period_end='2011-12-9')

In [0]:
data_df.head()

CustomerID 12346 made only 1 purchase (no repeat), so his frequency and recency are 0, and his age is 325 days (i.e. the duration between his first purchase and the end of the period in the analysis).

In [0]:
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(8,6))
data_df.frequency.plot(kind="hist",bins=50)
print(data_df.frequency.describe())
print(sum(data_df.frequency==0)/float(len(data_df)))

Among all customers in our data, more than 35% made only one purchase (no repeat).

In [0]:
from lifetimes import BetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(data_df['frequency'], data_df['recency'], data_df['T'])
print(bgf)

Visualizing our frequency/recency matrix

Consider: a customer has made a purchase every day for four weeks straight, and then we haven’t heard from him in months. What are the chances he is still “alive”? Pretty small, right? On the other hand, a customer who historically made a purchase once a quarter, and bought again last quarter, is likely still alive. We can visualize this relationship using the frequency/recency matrix, which computes the expected number of transactions an artificial customer will make in the next time period, given his recency (age at last purchase) and frequency (the number of repeat transactions he has made).

In [0]:
from lifetimes.plotting import plot_frequency_recency_matrix
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12,8))
plot_frequency_recency_matrix(bgf)

If a customer has made 120 purchases, and his latest purchase was when he was approximately 350 days old (i.e. Recency: the duration between his first transaction and his latest transaction is 350 days), then he is our best customer (bottom-right).

Customers who have purchased a lot and purchased recently will likely be the best customers in the future. We will never have enough of them.

Customers who have purchased a lot but not recently (top-right corner) have probably gone.

There is also another type of customer, around (40, 300), who buys infrequently and whom we have not seen recently, so he might buy again. However, we are not sure whether he has gone or is just between purchases.

We can also predict which customers are almost surely still alive:

In [0]:
from lifetimes.plotting import plot_probability_alive_matrix
fig = plt.figure(figsize=(12,8))
plot_probability_alive_matrix(bgf)

Customers who have purchased recently are almost surely “alive”.

Customers who have purchased a lot but not recently are likely to have dropped out. And the more they bought in the past, the more likely it is that they have dropped out. They are represented in the upper-right.
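
To see why heavy past buyers who go quiet look "dead" to the model, it can help to evaluate the BG/NBD probability-alive expression directly. The sketch below implements the closed form as I understand it from the BG/NBD literature; the function name `probability_alive` and the parameter values are hypothetical and are not the ones fitted above.

```python
def probability_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still alive.

    x: repeat purchases (frequency), t_x: recency, T: customer age.
    r, alpha, a, b: model parameters.
    """
    if x == 0:
        # The BG/NBD model treats customers with no repeat purchase as alive.
        return 1.0
    # Odds of having dropped out; grows quickly with x when t_x is far below T.
    odds_dead = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_dead)


# Hypothetical parameter values, chosen for illustration only.
params = dict(r=0.8, alpha=70.0, a=0.1, b=2.0)

heavy_but_lapsed = probability_alive(50, 100, 350, **params)
light_but_lapsed = probability_alive(5, 100, 350, **params)
heavy_and_recent = probability_alive(50, 345, 350, **params)
print(heavy_but_lapsed, light_but_lapsed, heavy_and_recent)
```

With the same long silence (last purchase at day 100 of 350), the customer with 50 repeat purchases gets a much lower probability than the one with 5, because a long gap is far more surprising for a frequent buyer.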

We can rank customers from “highest expected purchases in the next period” to lowest. Models expose a method that predicts a customer’s expected purchases in the next period using their history.

In [0]:
t = 1
data_df['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, data_df['frequency'], data_df['recency'], data_df['T'])
data_df.sort_values(by='predicted_purchases').tail(5)

Listed above are the 5 customers the model expects to make the most purchases in the next day. The predicted_purchases column gives their expected number of purchases, while the frequency, recency and T columns hold their current RF metrics. The BG/NBD model believes these individuals will make more purchases in the near future because they are our current best customers.

In [0]:
from lifetimes.plotting import plot_period_transactions
plot_period_transactions(bgf)

Not bad, our model does not suck. So, we can continue with our analysis.

We now partition the dataset into a calibration period dataset and a holdout dataset. This is important as we want to test how our model performs on data not yet seen (just like cross-validation in machine learning practice).
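
What the split means for a single customer can be sketched by hand. The example below uses made-up purchase days and the same two cut-off dates as the cell that follows. It assumes that purchases on or before the calibration end belong to the calibration period, and that the holdout count includes every purchase day in the holdout window, not just repeats. Both are my reading of how the library behaves, not something stated here.

```python
from datetime import date

calibration_end = date(2011, 6, 8)
observation_end = date(2011, 12, 9)

# Hypothetical purchase days for one customer (one entry per day, sorted).
purchase_days = [date(2011, 1, 10), date(2011, 3, 1), date(2011, 6, 15), date(2011, 11, 2)]

# Assumption: the calibration period includes its end date.
cal = [d for d in purchase_days if d <= calibration_end]
holdout = [d for d in purchase_days if calibration_end < d <= observation_end]

# Calibration features are computed as before, but only up to calibration_end.
frequency_cal = len(cal) - 1
recency_cal = (cal[-1] - cal[0]).days
T_cal = (calibration_end - cal[0]).days

# Assumption: holdout frequency counts all purchase days in the holdout window.
frequency_holdout = len(holdout)
duration_holdout = (observation_end - calibration_end).days

print(frequency_cal, recency_cal, T_cal, frequency_holdout, duration_holdout)
```

The model is then fitted on the `_cal` columns only, and its predictions over `duration_holdout` days are compared with `frequency_holdout`.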

In [0]:
from lifetimes.utils import calibration_and_holdout_data

summary_cal_holdout = calibration_and_holdout_data(data, 'CustomerID', 'InvoiceDate',
                                        calibration_period_end='2011-06-08',
                                        observation_period_end='2011-12-9' )   
print(summary_cal_holdout.head())

In [0]:
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

bgf.fit(summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)

Based on customer history, we can now predict what an individual’s future purchases might look like:

In [0]:
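t = 10  # number of days to predict ahead
# Look the customer up in the RFM summary (indexed by CustomerID), not in the raw transactions.
individual = data_df.loc[12347]
bgf.predict(t, individual['frequency'], individual['recency'], individual['T'])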

In [0]:
from lifetimes.plotting import plot_history_alive
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12,8))
id = 14606
days_since_birth = 365
sp_trans = data.loc[data['CustomerID'] == id]
plot_history_alive(bgf, days_since_birth, sp_trans, 'InvoiceDate')

In [0]:
fig = plt.figure(figsize=(12,8))
id = 14729
days_since_birth = 365
sp_trans = data.loc[data['CustomerID'] == id]
plot_history_alive(bgf, days_since_birth, sp_trans, 'InvoiceDate')

In [0]:
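# The frequency column lives in the RFM summary, so filter data_df rather than the raw transactions.
returning_customers_summary = data_df[data_df['frequency']>0]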

print(returning_customers_summary.head())
print(len(returning_customers_summary))

In [0]:
from lifetimes import GammaGammaFitter

ggf = GammaGammaFitter(penalizer_coef = 0)
ggf.fit(returning_customers_summary['frequency'],
        returning_customers_summary['monetary_value'])
print(ggf)

In [0]:
# Use the RFM summary (data_df), which holds the frequency and monetary_value columns.
print(ggf.conditional_expected_average_profit(
        data_df['frequency'],
        data_df['monetary_value']
    ).head(10))