(clv_quickstart)=
# CLV Quickstart

Customer Lifetime Value (CLV) is the measure of a customer's contribution over time to a business. This metric is used to inform spending levels on new customer acquisition, retention, and other marketing and sales efforts, so reliable estimation is essential.

PyMC-Marketing provides tools to segment customers on their past behavior (see [RFM Segmentation](https://www.pymc-marketing.io/en/stable/api/generated/pymc_marketing.clv.utils.rfm_segments.html#pymc_marketing.clv.utils.rfm_segments)) as well as the following Buy Till You Die (BTYD) probabilistic models to predict future behavior:

* **[BG/NBD model](https://pymc-marketing.readthedocs.io/en/stable/notebooks/clv/bg_nbd.html)** for continuous time, non-contractual modeling
* **[Pareto/NBD model](https://pymc-marketing.readthedocs.io/en/stable/notebooks/clv/pareto_nbd.html)** for continuous time, non-contractual modeling with covariates
* **[Shifted BG model](https://pymc-marketing.io/en/stable/notebooks/clv/sBG.html)** for discrete time, contractual modeling
* **BG/BB model** for discrete time, non-contractual modeling
* **Exponential Gamma model** for continuous time, contractual modeling (coming soon)
* **[Gamma-Gamma model](https://pymc-marketing.readthedocs.io/en/stable/notebooks/clv/gamma_gamma.html)** for expected monetary value

This table contains a breakdown of the four BTYD modeling domains, and examples for each:

|                | **Non-contractual**      | **Contractual**         |
|----------------|--------------------------|-------------------------|
| **Continuous** | online purchases         | ad conversion time      |
| **Discrete**   | concerts & sports events | recurring subscriptions |

In this notebook we will demonstrate how to estimate future purchasing activity and CLV with the CDNOW dataset, a popular benchmarking dataset in CLV and BTYD research. Data is available [here](https://www.brucehardie.com/datasets/), with additional details [here](https://www.brucehardie.com/notes/026/).

In [None]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from arviz.labels import MapLabeller

from pymc_marketing import clv

In [None]:
az.style.use("arviz-darkgrid")

%config InlineBackend.figure_format = "retina" # nice looking plots

## Data Requirements

For all models, the following nomenclature is used:

* `customer_id` represents a unique identifier for each customer.
* `frequency` represents the number of _repeat_ purchases that a customer has made, i.e. one less than the total number of purchases.
* `T` represents a customer's "age", i.e. the duration between a customer's first purchase and the end of the period of study. In this example notebook, the units of time are in weeks.
* `recency` represents the time period when a customer made their most recent purchase. This is equal to the duration between a customer’s first and last purchase. If a customer has made only 1 purchase, their recency is 0.
* `monetary_value` represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.
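To make these definitions concrete, the sketch below derives all four quantities by hand for a single hypothetical customer. The transactions, the `observation_end` date, and the helper `rfm_by_hand` are illustrative assumptions and not part of PyMC-Marketing. The sketch also ignores details such as several purchases falling in the same time period, which a library function may treat differently.

```python
from datetime import date

# Hypothetical toy transactions for one customer: (purchase date, amount spent).
transactions = [
    (date(1997, 1, 1), 20.0),
    (date(1997, 1, 15), 30.0),
    (date(1997, 2, 12), 10.0),
]
observation_end = date(1997, 3, 26)  # assumed end of the study period


def rfm_by_hand(transactions, observation_end, days_per_unit=7):
    """Compute frequency, recency, T and monetary_value for one customer."""
    ordered = sorted(transactions)
    first_date = ordered[0][0]
    last_date = ordered[-1][0]
    # Only repeat purchases count, so the first transaction is excluded.
    repeat_amounts = [amount for _, amount in ordered[1:]]
    frequency = len(repeat_amounts)
    recency = (last_date - first_date).days / days_per_unit
    T = (observation_end - first_date).days / days_per_unit
    monetary_value = sum(repeat_amounts) / frequency if frequency else 0.0
    return {
        "frequency": frequency,
        "recency": recency,
        "T": T,
        "monetary_value": monetary_value,
    }


summary = rfm_by_hand(transactions, observation_end)
print(summary)
```

With weekly time units, this customer has two repeat purchases, a recency of 6 weeks, an age `T` of 12 weeks, and a mean repeat-purchase value of 20.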

The `rfm_summary` function can be used to preprocess raw transaction data for modeling:

In [None]:
raw_trans = pd.read_csv(
    "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/cdnow_transactions.csv"
)

raw_trans.head(5)

In [None]:
rfm_data = clv.utils.rfm_summary(
    raw_trans,
    customer_id_col="id",
    datetime_col="date",
    monetary_value_col="spent",
    datetime_format="%Y%m%d",
    time_unit="W",
)

rfm_data

It is important to note that these definitions differ from those used in RFM segmentation, where the first purchase is included, `T` is not used, and `recency` is the number of time periods since a customer's most recent purchase.
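As a rough illustration of the difference, the hypothetical helper below converts BTYD-style values into the segmentation convention described above. `to_segmentation_convention` is an assumption made for this sketch, not a library function.

```python
def to_segmentation_convention(frequency, recency, T):
    """Convert BTYD-style frequency/recency/T to segmentation-style values."""
    return {
        # Segmentation counts every purchase, including the first one.
        "frequency": frequency + 1,
        # Segmentation recency is the time elapsed since the last purchase.
        "recency": T - recency,
    }


# A customer with 2 repeat purchases, last seen at week 6 of a 12-week window.
converted = to_segmentation_convention(frequency=2, recency=6.0, T=12.0)
print(converted)
```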

To visualize data in RFM format, we can plot the recency and T of the customers with the `plot_customer_exposure` function. We see a large chunk (>60%) of customers haven't made another purchase in a while.

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
(
    rfm_data.sample(n=100, random_state=42)
    .sort_values(["recency", "T"])
    .pipe(clv.plot_customer_exposure, ax=ax, linewidth=0.5, size=0.75)
);

## Predicting Future Purchasing Behavior with the BG/NBD Model

This dataset is an example of continuous time, non-contractual transactions because it comprises purchases from an online music store. PyMC-Marketing provides two models for this use case:

- [Beta-Geometric/Negative Binomial Distribution (BG/NBD)](https://pymc-marketing.readthedocs.io/en/stable/notebooks/clv/bg_nbd.html)
- [Pareto/Negative Binomial Distribution (Pareto/NBD)](https://pymc-marketing.readthedocs.io/en/stable/notebooks/clv/pareto_nbd.html)

We will be using the BG/NBD model in this notebook because it converges quickly and works well for basic modeling tasks. However, if a more comprehensive analysis is desired, consider using the Pareto/NBD model instead due to its expanded functionality, including support for covariates.

In [None]:
bgm = clv.BetaGeoModel(data=rfm_data)
bgm.build_model()

This model has 4 parameters that specify the global frequency and dropout rates of customers.

In [None]:
bgm
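In the usual description of the BG/NBD model, `r` and `alpha` are the shape and rate of a Gamma distribution over customers' purchase rates, while `a` and `b` parameterize a Beta distribution over the probability of dropping out after a repeat purchase. The sketch below simulates that generative story with the standard library only. The function `simulate_customer` and all parameter values are assumptions chosen for illustration; this is not how PyMC-Marketing implements the model.

```python
import random


def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate (frequency, recency) for one customer observed for T periods."""
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)  # Gamma(shape=r, rate=alpha)
    dropout_prob = rng.betavariate(a, b)  # chance of leaving after a repeat purchase
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)  # waiting time to the next purchase
        if t > T:
            break  # the next purchase would fall outside the observation window
        frequency += 1
        recency = t
        if rng.random() < dropout_prob:
            break  # the customer becomes inactive after this purchase
    return frequency, recency


rng = random.Random(0)
# Arbitrary, assumed parameter values; they are not estimates from the CDNOW data.
simulated = [
    simulate_customer(r=0.5, alpha=5.0, a=1.0, b=3.0, T=39.0, rng=rng)
    for _ in range(1000)
]
share_no_repeat = sum(freq == 0 for freq, _ in simulated) / len(simulated)
print(f"Share of simulated customers without a repeat purchase: {share_no_repeat:.2f}")
```

Because dropout can only happen after a repeat purchase in this sketch, a simulated customer with `frequency == 0` is still active at the end of the window.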

The default priors for the 4 parameters follow a HalfFlat distribution, which is an improper positive uniform distribution. For small datasets this prior can yield implausible posteriors. To avoid this problem, more informative priors can be specified by defining custom PyMC distributions.

Here, we will replace the HalfFlat defaults with better-behaved HalfNormal priors with a standard deviation of 10.
Priors are customized by passing a dictionary whose keys are the names of the priors and whose values are `Prior` objects. Each `Prior` takes the name of a PyMC distribution, followed by any parameters we wish to pass to that distribution as keyword arguments.

In [None]:
from pymc_marketing.prior import Prior

model_config = {
    "a_prior": Prior("HalfNormal", sigma=10),
    "b_prior": Prior("HalfNormal", sigma=10),
    "alpha_prior": Prior("HalfNormal", sigma=10),
    "r_prior": Prior("HalfNormal", sigma=10),
}

In [None]:
bgm = clv.BetaGeoModel(
    data=rfm_data,
    model_config=model_config,
)
bgm.build_model()
bgm

Having specified the model, we can now fit it.

In [None]:
bgm.fit()
bgm.fit_summary()

We can use [ArviZ](https://python.arviz.org/en/stable/), a Python library tailored to produce visualizations for Bayesian models, to plot the posterior distribution of each parameter.

In [None]:
az.plot_posterior(bgm.fit_result);

### 1.2.1. Visualizing Predictions over Time

Let's evaluate model performance by tracking predictions against historical purchases:

In [None]:
clv.plot_expected_purchases(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
);

There is a wide discrepancy between cumulative actual and predicted purchases! This is a good indicator of extraneous customers and/or date ranges to exclude from model training, and perhaps the need for additional covariates.


Let's plot incremental purchases for more insight:

In [None]:
clv.plot_expected_purchases(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
    set_index_date=True,
    plot_cumulative=False,
);

There was a large sales bump in the first few months that is biasing model results and should be investigated. However, notice purchases flatline in the following months and the model is still able to capture this trend. Simply excluding data prior to Apr 1997 should improve performance considerably, but for pedagogical purposes we will continue with the tutorial.

### Visualizing Prediction Matrices

In [None]:
clv.plot_frequency_recency_matrix(bgm);

We can see our best customers have been active for over 60 weeks and have made over 20 purchases (bottom-right). Note the “tail” sweeping up towards the upper-left corner - these customers are infrequent and/or may not have purchased recently. What is the probability they are still active? 

In [None]:
clv.plot_probability_alive_matrix(bgm);

Note that all non-repeat customers have an alive probability of 1, which is one of the quirks of `BetaGeoModel`. In many use cases this is still a valid assumption, but if non-repeat customers are a key focus in your use case, you may want to try `ParetoNBDModel` instead.  

Looking at the probability alive matrix, we can infer that customers who have made fewer purchases are less likely to return, and may be worth targeting for retention.
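For fixed parameter values, the BG/NBD probability of being alive has a commonly cited closed form, which also shows why customers without repeat purchases get a probability of 1. The sketch below evaluates that expression for a hypothetical customer. The function name and parameter values are assumptions, and the fitted model works with full posterior draws instead of a single set of parameters.

```python
def bg_nbd_probability_alive(frequency, recency, T, r, alpha, a, b):
    """Closed-form P(alive) under BG/NBD for fixed parameter values."""
    if frequency == 0:
        # No repeat purchase means no opportunity to drop out in this model.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Arbitrary, assumed parameter values for illustration only.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

# Same purchase history, evaluated as more and more time passes without a purchase.
for T in (30.0, 40.0, 60.0):
    p = bg_nbd_probability_alive(frequency=2, recency=30.0, T=T, **params)
    print(f"T={T:>5.1f}  P(alive)={p:.3f}")
```

The probability falls as `T` grows while `recency` stays fixed, which is the pattern the matrix visualizes.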

### Ranking customers from best to worst

Having fit the model, we can ask what the expected number of purchases is for our customers over the next 10 time periods. Let's look at the four most promising customers.

In [None]:
num_purchases = bgm.expected_purchases(future_t=10)

sdata = rfm_data.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)

We can plot the uncertainty in the expected number of purchases over the next 10 time periods.

In [None]:
ids = [841, 1981, 157, 1516]
ax = az.plot_posterior(num_purchases.sel(customer_id=ids), grid=(2, 2))
for axi, customer_id in zip(ax.ravel(), ids, strict=False):
    axi.set_title(f"Customer: {customer_id}", size=20)
plt.suptitle("Expected number of purchases in the next 10 periods", fontsize=28, y=1.05);

### Predicting purchase behavior of a new customer

We can use the fitted model to predict the number of purchases for a fresh new customer.

In [None]:
az.plot_posterior(bgm.expected_purchases_new_customer(t=10).sel(customer_id=1))
plt.title("Expected purchases of a new customer in the first 10 periods");

### Customer Probability Histories

Given a customer transaction history, we can calculate their historical probability of being alive, according to our trained model. 

Let's look at active customer 1516 and assess how the probability that they will ever return changes if they make no other purchases in the next 9 time periods.

In [None]:
customer_1516 = rfm_data.loc[1515]
customer_1516

In [None]:
customer_1516_history = pd.DataFrame(
    dict(
        customer_id=np.arange(10),
        frequency=np.full(10, customer_1516["frequency"], dtype="int"),
        recency=np.full(10, customer_1516["recency"]),
        T=(np.arange(0, 10) + customer_1516["recency"]).astype("int"),
    )
)
customer_1516_history

In [None]:
p_alive = bgm.expected_probability_alive(data=customer_1516_history)

In [None]:
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

We can see that, if no purchases are made in the next 9 weeks, the model has low confidence that the customer will ever return. What if they had made one purchase in between?

In [None]:
customer_1516_history.loc[7:, "frequency"] += 1
customer_1516_history.loc[7:, "recency"] = customer_1516_history.loc[7, "T"] - 0.5
customer_1516_history

In [None]:
p_alive = bgm.expected_probability_alive(data=customer_1516_history)

In [None]:
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)
plt.axvline(customer_1516_history["recency"].iloc[-1], c="black", ls="--")

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();

In the plot above, we suppose that customer 1516 makes a purchase at week 73.5, just over 6 weeks after their last recorded purchase. We can see that the probability of the customer returning quickly goes back up!

## Estimating Customer Lifetime Value Using the Gamma-Gamma Model

Until now we’ve focused mainly on transaction frequencies and probabilities, but to estimate economic value we can use the Gamma-Gamma model.

The Gamma-Gamma model assumes at least 1 repeat transaction has been observed per customer. As such we filter out those with zero repeat purchases.

In [None]:
nonzero_data = rfm_data.query("frequency>0")
nonzero_data

If computing the monetary value from your own data, note that it is the *mean* of a given customer’s value, *not* the sum. `monetary_value` can be used to represent profit, or revenue, or any value as long as it is consistently calculated for each customer.

The Gamma-Gamma model relies upon the important assumption there is no relationship between the monetary value and the purchase frequency. In practice we need to check whether the Pearson correlation is less than 0.3:

In [None]:
nonzero_data[["monetary_value", "frequency"]].corr()
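The same check can be written out by hand to show what the correlation measures. The sketch below computes the Pearson coefficient for a small, made-up set of customers and compares it with the 0.3 threshold mentioned above. The data and the helper `pearson_correlation` are assumptions for illustration only.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical repeat-purchase counts and mean spend for six customers.
frequency = [1, 2, 3, 4, 5, 6]
monetary_value = [25.0, 40.0, 18.0, 33.0, 22.0, 35.0]

corr = pearson_correlation(frequency, monetary_value)
print(f"Pearson correlation: {corr:.3f}")
print("Independence assumption looks reasonable:", abs(corr) < 0.3)
```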

Transaction frequencies and monetary values are uncorrelated, so we can now fit our Gamma-Gamma model to predict the average spend and expected lifetime value of our customers.

The Gamma-Gamma model takes in a `data` parameter, a pandas DataFrame with 3 columns representing the customer ID, the average spend of repeat purchases, and the number of repeat purchases for that customer. As with the BG/NBD model, the model parameters are given HalfFlat priors, which can be too diffuse for small datasets. For this example we will use the default priors, but other priors can be specified just as in the BG/NBD example above.

In [None]:
gg = clv.GammaGammaModel(data=nonzero_data)
gg.build_model()
gg

By default, `fit` approximates the full Bayesian posterior using [MCMC](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) sampling provided by `pymc.sample`. If the full posterior is not needed or MCMC sampling is too slow, users can obtain a single [maximum a posteriori estimate](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) via the `fit_method` kwarg.

In [None]:
gg.fit(fit_method="map");

In [None]:
gg.fit_summary()

In [None]:
gg.fit();

In [None]:
gg.fit_summary()

In [None]:
az.plot_posterior(gg.fit_result);

### Predicting spend value of customers

Having fit our model, we can now use it to predict the conditional expected average spend of our customers, including those with zero repeat purchases.

In [None]:
expected_spend = gg.expected_customer_spend(data=rfm_data)
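For fixed parameter values, the Gamma-Gamma conditional expectation is usually written as a weighted average of the population mean spend and the customer's observed mean spend, which is why a prediction exists even for customers with no repeat purchases. The sketch below illustrates that formula. The parameter names `p`, `q`, `v` follow the conventional notation, and both the function and the values are assumptions, not output from the fitted model.

```python
def expected_average_spend(frequency, monetary_value, p, q, v):
    """Conditional expected mean spend under Gamma-Gamma for fixed parameters."""
    population_mean = v * p / (q - 1)  # requires q > 1
    # The weight on the customer's own history grows with repeat purchases.
    weight = p * frequency / (p * frequency + q - 1)
    return (1 - weight) * population_mean + weight * monetary_value


# Arbitrary, assumed parameter values for illustration only.
params = dict(p=6.0, q=4.0, v=16.0)

no_repeat = expected_average_spend(frequency=0, monetary_value=0.0, **params)
loyal = expected_average_spend(frequency=10, monetary_value=50.0, **params)
print(f"No repeat purchases: {no_repeat:.2f}")
print(f"Ten repeat purchases averaging 50: {loyal:.2f}")
```

A customer without repeat purchases is assigned the population mean, while a customer with many repeat purchases is pulled towards their own observed average.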

In [None]:
az.summary(expected_spend.isel(customer_id=range(10)), kind="stats")

In [None]:
labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(
    expected_spend.isel(customer_id=(range(10))), combined=True, labeller=labeller
)
plt.xlabel("Expected mean spend");

We can also look at the average expected mean spend across all customers:

In [None]:
az.summary(expected_spend.mean("customer_id"), kind="stats")

In [None]:
az.plot_posterior(expected_spend.mean("customer_id"))
plt.axvline(expected_spend.mean(), color="k", ls="--")
plt.title("Expected mean spend of all customers");

### Predicting spend value of a new customer

In [None]:
az.plot_posterior(gg.expected_new_customer_spend())
plt.title("Expected mean spend of a new customer");

### Estimating CLV

Finally, we can combine the GG with the BG/NBD model to obtain an estimate of the customer lifetime value. This relies on the [Discounted cash flow](https://en.wikipedia.org/wiki/Discounted_cash_flow) model, adjusting for cost of capital:

In [None]:
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    data=rfm_data,
    future_t=12,  # months
    discount_rate=0.01,  # monthly discount rate ~ 12.7% annually
    time_unit="W",  # original data is in weeks
)
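To show what the discounting does, the sketch below applies a plain discounted cash flow sum to a hypothetical stream of monthly values and converts the monthly rate to its annual equivalent. The cash flows and the helper `discounted_cash_flow` are assumptions. In particular, discounting from the end of the first period onward is a choice made for this sketch and may differ from the library's exact computation.

```python
def discounted_cash_flow(cash_flows, discount_rate):
    """Present value of cash flows received at the end of periods 1, 2, ..."""
    return sum(
        cash_flow / (1.0 + discount_rate) ** period
        for period, cash_flow in enumerate(cash_flows, start=1)
    )


monthly_rate = 0.01
annual_rate = (1.0 + monthly_rate) ** 12 - 1.0  # compounding 12 monthly periods

# Hypothetical expected monthly value (expected purchases x expected spend).
monthly_value = [10.0] * 12

present_value = discounted_cash_flow(monthly_value, monthly_rate)
print(f"Equivalent annual rate: {annual_rate:.1%}")
print(f"Undiscounted 12-month value: {sum(monthly_value):.2f}")
print(f"Discounted 12-month value: {present_value:.2f}")
```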

In [None]:
az.summary(clv_estimate.isel(customer_id=range(10)), kind="stats")

In [None]:
az.plot_forest(
    clv_estimate.isel(customer_id=range(10)), combined=True, labeller=labeller
)
plt.xlabel("Expected CLV");

According to our models, customer[6] has a much higher expected CLV than the other customers shown. There is also a large variability in this estimate that arises solely from uncertainty in the parameters of the BG/NBD and GG models.

In general, these models tend to induce a strong correlation between expected CLV and uncertainty. This modelling of uncertainty can be very useful when making marketing decisions.

In [None]:
%load_ext watermark
%watermark -n -u -v -iv -w -p pymc,pytensor