# Customer LTV
- categories: [Julia, Turing, Churn, Survival, LTV]

In [1]:
#collapse
using Turing
using Gadfly
using DataFrames, DataFramesMeta
Gadfly.set_default_plot_size(900px, 300px)

Customer Lifetime Value (LTV or CLTV) is the total dollar value a consumer will spend at a business throughout their life. The concept is as important as the definition is straightforward - businesses very often want to know which consumers are their whales and which are eating up their marketing or infrastructure budgets with little or no value returned. This is pretty tricky and there are a few approaches you can take:

### Observational

**Naive calculation**. The following will give you an average that is delightfully simple but tragically wrong:

$$\mathrm{LTV} = \frac{1}{|\mathrm{Customers}|}\sum_{\mathrm{orders}} \mathrm{Order\ Value}$$

Assuming (hmm) that LTV is constant over time, this will converge to the true average LTV as customers churn (and thus reach their final lifetime value). Until then, new customers keep weighing the average down, making it an underestimate. There are a few equations of this sort floating around the tubes.
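
To make the underestimate concrete, here is a minimal sketch on made-up numbers. The `orders` vector, the customer ids, and the `naive_ltv` helper are all invented for illustration; nothing here comes from real data.

```julia
# Hypothetical order log: (customer id, order value in dollars).
orders = [(1, 50.0), (1, 30.0), (2, 20.0), (3, 100.0), (3, 60.0), (3, 40.0)]

# Naive LTV: total order value divided by the number of distinct customers.
function naive_ltv(orders)
    customers = Set(id for (id, _) in orders)
    return sum(value for (_, value) in orders) / length(customers)
end

ltv_before = naive_ltv(orders)                      # 300 / 3 customers

# A brand-new customer arrives with a single small order and drags the average down,
# even though nobody's eventual lifetime value has changed.
ltv_after = naive_ltv(vcat(orders, [(4, 20.0)]))    # 320 / 4 customers
```

The only thing that changed between the two calls is that a new customer showed up, which is exactly the bias described above.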

**Wait and see**. Similar algorithm to the above; the major difference is that you apply it only to a small cohort from a brief window in time. Just follow along with that group and add up how much they spend. This is simple and will get to the true LTV of that cohort faster, but it's still typically too slow to be useful. By the time you know, it's months/quarters/years later (depending on the churn characteristics of your product) and most insights you might glean are no longer relevant to your product roadmap.

### Modeled

**Machine Learning** :tada:. There are a bunch of ML approaches here that can be found relatively easily online (but apparently not easy enough for me to find them again to include here). IIRC, one was using a random forest (or GBM, or whatever) to predict 

$$P(\mathrm{purchase\ in\ next\ period}|\mathcal{D})$$ 

and then in a second stage model (conditioned on the purchase outcome) predict the order value of said purchase.

It's a reasonably standard approach: decompose the problem into churn, expected future purchases, and expected value per purchase. Several approaches are tailored to this decomposition, breaking the inputs down into the so-called **RFM** metrics:

- **R**ecency: time since the last purchase,
- **F**requency: number of purchases per time period, 
- **M**onetary value: average order value.

Note that we'll use days for the time scale.
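
As a rough sketch of how these three numbers could be pulled out of one customer's order history, assuming the history is just a list of `(date, value)` pairs (the `history` data and the `rfm` helper are hypothetical, not part of this notebook):

```julia
using Dates

# Hypothetical order history for one customer: (order date, order value in dollars).
history = [(Date(2021, 1, 1), 40.0), (Date(2021, 1, 11), 60.0), (Date(2021, 1, 21), 50.0)]

# Compute (recency, frequency, monetary) as of `today`, with time measured in days.
# Assumes the first purchase happened before `today`, otherwise the frequency divides by zero.
function rfm(history, today::Date)
    dates = [d for (d, _) in history]
    amounts = [v for (_, v) in history]
    recency = Dates.value(today - maximum(dates))          # days since the last purchase
    observed_days = Dates.value(today - minimum(dates))    # days since the first purchase
    frequency = length(history) / observed_days            # purchases per day
    monetary = sum(amounts) / length(amounts)              # average order value
    return (recency = recency, frequency = frequency, monetary = monetary)
end

r = rfm(history, Date(2021, 1, 31))
```

With a per-day frequency like this, its reciprocal is the mean number of days between purchases, which is a handy form when the gap between purchases is what gets modeled.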

**Buy 'til You Die**. See [ZDNet's coverage of Nike's purchase of analytics firm Zodiac](https://www.zdnet.com/article/nikes-purchase-of-analytics-firm-zodiac-highlights-focus-on-customer-lifetime-value/) and the [BG/NBD paper](http://www.brucehardie.com/papers/bgnbd_2004-04-20.pdf).


**Custom Model**. That's what we're going to do! Fader and Hardie do a great job of making their work look harder than necessary so I can't be bothered to decode it (and anyway, [Alex did a great job](https://medium.com/ordergroove-engineering/every-customer-counts-52aa70e4f85)). That said, I'm going to take what I believe to be a similar approach:

1. Estimate churn based on **R**ecency and **F**requency.
2. Set up a super simple survival model to understand the expected number of future purchases, using samples from (1) as the churn signal.
3. Scale by **M**onetary value.

By building these out independently, we can understand the whole model by first figuring it out component by component. It also provides a quick way to make single-component adjustments that might be important. For instance, some retailers have an extremely wide spread of possible order values (e.g. Walmart, where you can buy a stick of gum or probably a boat or something). If there are orders-of-magnitude differences in purchase value, then you'd better model that out so you know exactly which consumers are likely to find themselves in that lucrative long tail. In my experience, lognormal is a decent start but the tail is still too light.

## Our Models

### Active from RF

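We sample when we expect the customer's next purchase to occur based on what we've observed of their frequency, then compare that to how long it's been since they last purchased. If we expected them to have purchased already, we count them as churned. Note that we don't have any kind of regularization and just assume F is a fine number for us. Exercise for the reader to make that more stable :smile:. Also note that the Turing code further down passes F straight to `Exponential`, which Distributions.jl parameterizes by its mean, so F is effectively treated there as the mean number of days between purchases.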

$$
\begin{aligned}
\mathrm{next\ purchase} &\sim \mathrm{Exponential}(F) \\
\mathrm{active} &= R < \mathrm{next\ purchase}
\end{aligned}
$$
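
A quick way to sanity-check this rule is to simulate it outside of Turing. The sketch below assumes F is read as the mean number of days between purchases (the Distributions.jl convention for `Exponential`); `simulate_active` is a made-up helper that uses only the `Random` standard library.

```julia
using Random

# Draw `n` hypothetical next-purchase times (in days) from an exponential with the given mean
# and return the fraction that fall after `recency`, i.e. the share of draws still "active".
function simulate_active(rng, recency, mean_days_between_purchases; n = 100_000)
    draws = mean_days_between_purchases .* randexp(rng, n)
    return count(>(recency), draws) / n
end

rng = MersenneTwister(42)
p_sim = simulate_active(rng, 10, 10.0)   # Monte Carlo estimate
p_exact = exp(-10 / 10.0)                # closed form: P(next purchase > R) = exp(-R / mean)
```

Under this reading the probability of still being active has the closed form $e^{-R/F}$, so the simulation is mostly useful as a check that the model code means what we think it means.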

### Future Purchases from RF+Active

We'd like to then take the inferences above and use them to understand churn as a function of time, or perhaps number of orders. In other words:

$$P(\mathrm{churned}_{t=i} | \mathrm{active}_{t=i-1})$$

Here we find some wrinkles. Most notably, what do we do with consumers who purchased recently, when we don't yet know whether they will churn before their next purchase? This is called censoring. It comes in several directional varieties, and this one is right-censoring: on the "right" side of our time interval, we don't yet have data on the outcome. We'll ignore that for now and instead assume "constant hazard" on the data we can observe, i.e. the rate at which users remain active ($\rho$) is constant across all time points.

$$
\begin{aligned}
\rho &\sim \mathrm{Beta}(1,1)\\
\mathrm{purchases}_{\mathrm{uncensored}} &\sim \mathrm{Geometric}(\rho)\\
(\mathrm{Future\ purchases}) &\sim 
    \begin{cases}
    \mathrm{Geometric}(\rho) & \mathrm{if\ active} \\
    \mathrm{Dirac}(0) & \mathrm{otherwise}
    \end{cases}
\end{aligned}
$$
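
Here is a small simulation of the constant-hazard idea, under one explicit assumption: $\rho$ is read as the probability of staying active after each purchase. With that reading, the number of future purchases is geometric with "success" (churn) probability $1-\rho$, which in Distributions.jl terms would be `Geometric(1 - ρ)` since that type counts failures before the first success. `sample_future_purchases` is a hypothetical helper, not the notebook's model.

```julia
using Random

# Number of further purchases before churn, assuming the customer survives each purchase
# with probability ρ. Customers already flagged as churned get zero (the Dirac(0) branch).
function sample_future_purchases(rng, ρ, active::Bool)
    active || return 0
    purchases = 0
    while rand(rng) < ρ
        purchases += 1
    end
    return purchases
end

rng = MersenneTwister(1)
ρ = 0.8
draws = [sample_future_purchases(rng, ρ, true) for _ in 1:100_000]
mean_future = sum(draws) / length(draws)   # should sit near ρ / (1 - ρ) = 4
```

The expected number of future purchases for an active customer is $\rho / (1 - \rho)$, so small changes in $\rho$ near 1 move the estimate a lot.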

### LTV from M+Future Purchases

$$
\begin{aligned}
\mathrm{Future\ value} &= \mathrm{Future\ purchases} \times \mathrm{AOV}\\
\mathrm{Lifetime\ value} &= \mathrm{Future\ value} + \mathrm{Past\ value}
\end{aligned}
$$
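
As a worked example of the arithmetic, with invented numbers and a hypothetical one-line helper:

```julia
# Lifetime value = future purchases × average order value + value already realized.
lifetime_value(future_purchases, aov, past_value) = future_purchases * aov + past_value

# Hypothetical customer: 4 expected future purchases, $50 average order, $300 spent so far.
ltv_active = lifetime_value(4, 50.0, 300.0)    # 4 * 50 + 300

# A churned customer has zero future purchases, so LTV collapses to past value.
ltv_churned = lifetime_value(0, 50.0, 300.0)
```

Because the helper is plain arithmetic, it can be applied draw by draw to sampled future purchases to get a distribution over LTV instead of a single number.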

The Kaplan-Meier estimator of the survival function $S(t)$ (the probability that life is longer than $t$) is given by:

$$\widehat {S}(t)=\prod \limits _{i:\ t_{i}\leq t}\left(1-{\frac {d_{i}}{n_{i}}}\right)$$

with $t_{i}$ a time when at least one event happened, $d_i$ the number of events (e.g., deaths) that happened at time $t_{i}$, and $n_{i}$ the number of individuals **known to have survived** (have not yet had an event or been censored) up to time $t_{i}$.

To compute this from order data, we want a table with one row per order count and the following columns:

- `order_count`
- `churned_at_order_count`: customers with `next_purchase < R` at this order count (i.e. derived from the `active` field)
- `known_active_at_order_count`: customers who made at least this many orders

From that table we then want to see, for each `order_count`, the corresponding factor of $S(t)$:

`1 - (churned_at_order_count / lag(known_active_at_order_count))`
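
The bookkeeping above can be sketched in a few lines, using the product formula for $\widehat{S}(t)$ directly with order count playing the role of time. The tallies `d` and `n` are invented, and `survival_curve` is a hypothetical helper.

```julia
# Kaplan-Meier-style survival curve over order counts.
# d[k]: customers who churned at order count k
# n[k]: customers known to be active at order count k (made at least k orders)
function survival_curve(d::Vector{Int}, n::Vector{Int})
    factors = 1 .- d ./ n          # per-step survival factor 1 - d_i / n_i
    return cumprod(factors)        # S(k) = product of the factors up to k
end

# Hypothetical tallies for 10 customers.
d = [2, 2, 3]
n = [10, 8, 5]
S = survival_curve(d, n)
```

In these made-up tallies, six customers survive their second order but only five are counted at the third: the missing one is a right-censored customer who is still active after two orders, which is how censoring enters the estimate.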

## Coding it with Turing

In [105]:
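@model function active(custs::Array{CustomerData})
    # Element type is left open so the vector can hold whatever number type the sampler passes through.
    predicted_purchase_days = Vector(undef, length(custs))
    is_active = Vector{Bool}(undef, length(custs))

    for i in eachindex(custs)
        # Exponential(θ) in Distributions.jl is parameterized by its mean θ,
        # so `frequency` acts as the mean number of days between purchases.
        predicted_purchase_days[i] ~ Exponential(custs[i].frequency)
        # Still active if the predicted next purchase falls after the days already elapsed.
        is_active[i] = predicted_purchase_days[i] > custs[i].recency
    end

    return is_active
end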

active (generic function with 1 method)

In [106]:
cust_data = [
    CustomerData(2, 10.0, 123),
    CustomerData(10, 10, 123),
    CustomerData(23, 10, 123),
    CustomerData(2, 2, 123),
    CustomerData(10, 2, 123),
    CustomerData(23, 2, 123),
];

In [107]:
iterations = 2000
ϵ = 0.05
τ = 10;

chain_ltv = sample(
    active(cust_data), 
    HMC(ϵ, τ), iterations, 
    progress=true)

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:02


Chains MCMC chain (2000×15×1 Array{Float64,3}):

Iterations        = 1:2000
Thinning interval = 1
Chains            = 1
Samples per chain = 2000
parameters        = predicted_purchase_days[1], predicted_purchase_days[2], predicted_purchase_days[3], predicted_purchase_days[4], predicted_purchase_days[5], predicted_purchase_days[6]
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
                  parameters      mean       std   naive_se      mcse        e ⋯
                      Symbol   Float64   Float64    Float64   Float64    Float ⋯

  predicted_purchase_days[1]    9.3736    9.5614     0.2138    0.7994   198.81 ⋯
  predicted_purchase_days[2]    9.8823    9.6488     0.2158    0.7492   151.43 ⋯
  predicted_purchase_days[3]   10.9717   10.6465     0.2381    0.7395   168.

In [117]:
DataFrame(hcat(generated_quantities(active(cust_data), chain_ltv)...)')

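| Row | x1 | x2 | x3 | x4 | x5 | x6 |
| --- | --- | --- | --- | --- | --- | --- |
| Type | Bool | Bool | Bool | Bool | Bool | Bool |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 |
| 5 | 1 | 0 | 0 | 0 | 0 | 0 |
| 6 | 1 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 0 | 0 | 0 | 0 | 0 |
| 8 | 1 | 1 | 0 | 0 | 0 | 0 |
| 9 | 1 | 1 | 0 | 0 | 0 | 0 |
| 10 | 1 | 1 | 0 | 0 | 0 | 0 |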


In [None]:
#collapse
plot(DataFrame(chain_ltv), x=:, Theme(alphas=[0.6]),
    Stat.density(bandwidth=0.02), Geom.polygon(fill=true, preserve_order=true),
    Coord.cartesian(xmin=0.0, xmax=1.0, ymin=0.0)
)

In [None]:
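# d[k]: customers who churned at k orders; n[k]: customers known to have reached k orders.
# `customers` and its `churned`/`ordercount` fields are placeholders for data not defined yet.
max_orders = maximum(c.ordercount for c in customers)
d, n = zeros(Int, max_orders), zeros(Int, max_orders)
for customer in customers
    customer.churned && (d[customer.ordercount] += 1)
    for i in 1:customer.ordercount
        n[i] += 1
    end
end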

In [4]:
struct CustomerData
    recency::Int64
    frequency::Float64
    money::Float64
end

v = Vector{Float64}(undef, n)

In [99]:
struct LtvGQs
    active::Vector{Bool}
end




btyd (generic function with 1 method)

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:00


Chains MCMC chain (2000×15×1 Array{Float64,3}):

Iterations        = 1:2000
Thinning interval = 1
Chains            = 1
Samples per chain = 2000
parameters        = days_to_next_predicted_purchase[1], days_to_next_predicted_purchase[2], days_to_next_predicted_purchase[3], days_to_next_predicted_purchase[4], days_to_next_predicted_purchase[5], days_to_next_predicted_purchase[6]
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
                          parameters      mean       std   naive_se      mcse  ⋯
                              Symbol   Float64   Float64    Float64   Float64  ⋯

  days_to_next_predicted_purchase[1]    9.1810    9.3432     0.2089    0.7770  ⋯
  days_to_next_predicted_purchase[2]    9.8585    9.7331     0.2176    0.8114  ⋯
  days_to_next_predicted_purchase[3]   10.624

Unnamed: 0_level_0,x1
Unnamed: 0_level_1,Tuple…
1,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
2,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
3,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
4,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
5,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
6,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
7,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
8,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
9,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"
10,"([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0])"


In [38]:
ltv_df = DataFrame(chain_ltv)

| Row | iteration | chain | acceptance_rate | days_to_next_predicted_purchase | hamiltonian_energy |
| --- | --- | --- | --- | --- | --- |
| Type | Int64 | Int64 | Float64 | Float64 | Float64 |
| 1 | 1 | 1 | 1.0 | 1.37874 | 21.8509 |
| 2 | 2 | 1 | 1.0 | 4.07641 | 15.0957 |
| 3 | 3 | 1 | 1.0 | 5.71715 | 9.18893 |
| 4 | 4 | 1 | 1.0 | 8.31854 | 7.91616 |
| 5 | 5 | 1 | 0.999205 | 7.2199 | 7.3525 |
| 6 | 6 | 1 | 1.0 | 11.8225 | 7.58637 |
| 7 | 7 | 1 | 1.0 | 8.66177 | 7.22491 |
| 8 | 8 | 1 | 1.0 | 9.47284 | 7.06928 |
| 9 | 9 | 1 | 0.999398 | 8.01086 | 7.16072 |
| 10 | 10 | 1 | 0.992705 | 19.3259 | 9.22283 |

In [None]:
ltv_df.

In [21]:
chain_ltv

Chains MCMC chain (200×10×1 Array{Float64,3}):

Iterations        = 1:200
Thinning interval = 1
Chains            = 1
Samples per chain = 200
parameters        = days_to_next_predicted_purchase
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
                       parameters      mean       std   naive_se      mcse     ⋯
                           Symbol   Float64   Float64    Float64   Float64     ⋯

  days_to_next_predicted_purchase    9.3565    3.9885     0.2820    0.2072   1 ⋯
                                                               2 columns omitted

Quantiles
                       parameters      2.5%     25.0%     50.0%     75.0%      ⋯
                           Symbol   Float64
