
Documentation for the BG/NBD model might be confusing scale and rate parameter for alpha value? #174

Open
kaybenleroll opened this issue Jun 28, 2021 · 5 comments
Labels
documentation Improvements or additions to documentation

@kaybenleroll

Hi,

Love the package, but I am a bit confused by the docs for the BG/NBD model in terms of the parameterisation of the Gamma distribution for the purchase rate.

The documentation says it has shape parameter r and scale parameter alpha, but then says the mean of this distribution is r / alpha. If alpha were the scale, the mean would be r * alpha; a mean of r / alpha suggests alpha is the inverse scale (i.e. the rate).

This would match my data as well, based on using rgamma() to randomly generate values that mimic it. Should we interpret r and alpha as the shape and rate instead?
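To make the distinction concrete, here is a small simulation (in NumPy rather than R's rgamma(); the values of r and alpha are arbitrary, not estimates from any dataset). A gamma with shape r and rate alpha has mean r / alpha, whereas with scale alpha the mean would be r * alpha:

```python
import numpy as np

# Illustrative values only, not fitted to any dataset.
r, alpha = 2.0, 4.0

rng = np.random.default_rng(42)
# NumPy's gamma sampler is shape/scale parameterized, so a
# shape/RATE gamma(r, alpha) is drawn with scale = 1 / alpha.
draws = rng.gamma(shape=r, scale=1.0 / alpha, size=1_000_000)

print(draws.mean())   # approx r / alpha = 0.5
print(r / alpha)      # 0.5, the mean under the rate reading
print(r * alpha)      # 8.0, the mean if alpha were the scale
```

If the documented mean r / alpha matches simulated data drawn with rate alpha, that supports the shape/rate reading.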

I had a look through the code to check the log-likelihood calculation, but my C++ templating is not good enough to quite follow it.

My apologies if I have misunderstood this, but I figured it was worth asking, and I'm happy to discuss further if my point is not clear. Keep up the great work!

@pschil
Collaborator

pschil commented Jul 2, 2021

I'm not as familiar with the BG/NBD as with the Pareto/NBD, but to my knowledge the heterogeneity in the transaction rates is likewise assumed to follow a shape-rate parameterized gamma distribution with r = shape and alpha = rate (the same as in the Pareto/NBD). Hence the documentation is wrong and should read "shape parameter r and rate parameter alpha".

For various reasons, mainly concerning P(alive), I would generally recommend against using the BG/NBD:

  • For one-time buyers (x=0), P(alive) = 1 always.
  • The more transactions a customer makes, the more opportunities they have to die. Compared to the Pareto/NBD, the BG/NBD often seems to substantially underestimate P(alive) for loyal customers.
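Both points can be seen directly in the BG/NBD P(alive) expression from Fader, Hardie & Lee (2005). A minimal sketch; the parameter values below are purely illustrative, not fitted to any dataset:

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(alive) under the BG/NBD (Fader, Hardie & Lee 2005).

    x   : number of repeat transactions
    t_x : time of the last repeat transaction
    T   : length of the observation period
    """
    if x == 0:
        # No repeat purchase yet => the customer never had a
        # post-purchase "coin flip" to die, so P(alive) = 1.
        return 1.0
    ratio = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + ratio)

# Illustrative parameter values (assumed, not estimated).
params = dict(r=0.24, alpha=4.41, a=0.79, b=2.43)

print(bgnbd_p_alive(x=0, t_x=0.0, T=38.9, **params))    # always 1.0
print(bgnbd_p_alive(x=2, t_x=30.4, T=38.9, **params))   # < 1
print(bgnbd_p_alive(x=2, t_x=10.0, T=38.9, **params))   # lower: long silence
```

The x = 0 branch shows the first bullet exactly; the exponent r + x shows how each additional transaction enters the "death" term, which drives the second.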

P(alive) is the probability with which the model regards a customer as alive at the estimation end T, but not much should be read into P(alive) by itself. Rather, it is an input to the much more relevant CET calculations (and, in the P/NBD, also to DERT).

I would recommend using the Pareto/NBD whenever you are in a continuous-time, non-contractual setting. To my knowledge, the BG/NBD exists mainly because it is simple enough to be implemented in a spreadsheet (i.e. management appeal), and we implemented it for completeness.

@kaybenleroll
Author

Yep, all that makes sense.

I would certainly prefer to use the P/NBD, but I have had major issues fitting it to the dataset I am using (Online Retail II from the UCI repository). The issue seems to be that there are only 2 years' worth of transactions, so is it possible that the long tail of lifetimes is not represented in the data, resulting in a lack of identifiability in the parameters?

I am planning to implement a Stan-based Bayesian model of all the above for my next workshop series, and I'll certainly keep you posted on its progress if you think it would be of interest.

Thanks for all your help. I will see about making those changes in the docs for you, if you like?

@mmeierer
Collaborator

mmeierer commented Jul 2, 2021

Thanks for pointing us to this issue in the documentation.

Adding to Patrik's comment, you might also be interested in Peter Fader's recommendation regarding the BG/NBD model. For this, please see the screenshot in the following GitHub issue: #5

@pschil
Collaborator

pschil commented Jul 4, 2021

only 2 years worth of transactions

One of the main strengths of these models is that they normally require only very few transactions and periods for good results. With periods defined as weeks, 2 years (104 periods) should be plenty.

so is it possible that the long tail of lifetimes is not represented in the data and results in a lack of identifiability in the parameters.

I'm not sure I follow. The Pareto/NBD without covariates has no identification issues. There are, however, occasional numerical instabilities if the parameters or the input data (mainly x, the number of transactions) take extreme values.
I have briefly experimented with the UCI Retail II dataset myself and did not have any issues. Note, however, that these models are conventionally applied per cohort. Customers are usually assigned to cohorts based on their first transaction (coming alive), and the model is then applied separately to each cohort.
See previous issues and the FAQ for more info: #172, #172 (comment), #146
Note that the resulting tracking plots should then look like the one on page 9 of https://cran.r-project.org/web/packages/CLVTools/vignettes/CLVTools.pdf, with a falling line after an initial spike.
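The per-cohort workflow described above might be sketched like this (a pandas sketch; the transaction log, column names, and quarterly cohort granularity are all invented for illustration):

```python
import pandas as pd

# Toy transaction log; all column names and values are hypothetical.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2010-01-05", "2010-03-02", "2010-02-10",
        "2011-04-01", "2011-05-15", "2011-06-20",
    ]),
})

# Cohort = calendar quarter of each customer's first transaction
# ("coming alive"); the granularity (quarter, month, ...) is a choice.
first = tx.groupby("customer_id")["date"].min().dt.to_period("Q")
tx["cohort"] = tx["customer_id"].map(first)

# A model such as the Pareto/NBD would then be fitted separately
# to the transactions of each cohort:
for cohort, grp in tx.groupby("cohort"):
    print(cohort, grp["customer_id"].nunique(), "customer(s)")
```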

In case the heterogeneity in the customer base cannot be adequately captured with the gamma distribution, one might solve this by splitting the customers in a cohort into further sub-groups and applying the model separately to each of them. However, in my experience, the gamma distribution's limited flexibility is in most cases beneficial for prediction because it limits overfitting.

I am planning to implement a Stan-based Bayesian model of all the above for my next workshop series

I guess you would apply the Pareto/NBD in the insurance industry. May I ask in what specific context it is used / what is being predicted with it?

There certainly already exist a few Stan implementations of the Pareto/NBD, though I am not familiar with any of them; see:
https://github.com/wpbindt/clv-model
https://gist.github.com/capers/9bd73b57e604db376e75
https://www.briancallander.com/posts/customer_lifetime_value/pareto-nbd.html

I will see about making those changes in the docs for you if you like?

Feel free to create a PR to dev, but I'm not sure what the current access rights for contributors are (I'm not the repo owner).

@mmeierer mmeierer added this to To do in v0.9.1 via automation Jul 5, 2021
@mmeierer mmeierer added the documentation (Improvements or additions to documentation) label, and added and then removed the bug (Something isn't working) label Jul 5, 2021
@kaybenleroll
Author

One of the main strengths of these models is that they normally require only very few transactions and periods for good results. With periods defined as weeks, 2 years (104 periods) should be plenty.

Ah, okay, that's interesting. My assumption was that with just two years of data the right tail of lifetimes is heavily censored, and that this might cause identifiability issues. Good to know that should be plenty.

so is it possible that the long tail of lifetimes is not represented in the data and results in a lack of identifiability in the parameters.

I'm not sure I follow. The Pareto/NBD without covariates has no identification issues. There are, however, occasional numerical instabilities if the parameters or the input data (mainly x, the number of transactions) take extreme values.
I have briefly experimented with the UCI Retail II dataset myself and did not have any issues. Note, however, that these models are conventionally applied per cohort. Customers are usually assigned to cohorts based on their first transaction (coming alive), and the model is then applied separately to each cohort.
See previous issues and the FAQ for more info: #172, #172 (comment), #146
Note that the resulting tracking plots should then look like the one on page 9 of https://cran.r-project.org/web/packages/CLVTools/vignettes/CLVTools.pdf, with a falling line after an initial spike.

Thanks for that, I will have a read. Segmenting by cohort certainly makes sense, and it leads to some interesting possibilities for adding hierarchies over these cohorts in future versions of these models.

In case the heterogeneity in the customer base cannot be adequately captured with the gamma distribution, one might solve this by splitting the customers in a cohort into further sub-groups and applying the model separately to each of them. However, in my experience, the gamma distribution's limited flexibility is in most cases beneficial for prediction because it limits overfitting.

That makes a lot of sense - I completely agree on the simpler model improving prediction robustness.

I am planning to implement a Stan-based Bayesian model of all the above for my next workshop series

I guess you would apply the Pareto/NBD in the insurance industry. May I ask in what specific context it is used / what is being predicted with it?

Of course - insurance is actually the day job, but I run a lot of more general workshops and talks as part of a Meetup group here in Dublin, and all of this is being done in that context. I'm doing a general workshop on data projects, and learning these models was one of the ways I made it interesting for myself, so the use of the model is pretty much the standard idea of getting a feel for customer value and so on.

That being said, there may be a use for this in claims development, as changes to claim amounts are effectively equivalent to non-contractual purchases, but this is little more than a half-baked idea in my head. My main reason for doing this is that I find the ideas interesting, and they appear to be used a lot in retail settings (though that is just based on the reading I have done).

There certainly already exist a few Stan implementations of the Pareto/NBD, though I am not familiar with any of them; see:
https://github.com/wpbindt/clv-model
https://gist.github.com/capers/9bd73b57e604db376e75
https://www.briancallander.com/posts/customer_lifetime_value/pareto-nbd.html

I had seen a few of those but not all of them, and they were definitely where I was planning to start, so thanks for that. I probably won't be starting this second series of workshops for at least 3-4 months, but I'll likely start working on it sooner, so I will keep you posted in case it is of interest.

I will see about making those changes in the docs for you if you like?

Feel free to create a PR to dev, but I'm not sure what the current access rights for contributors are (I'm not the repo owner).

I'm sure I can create a fork and pull request anyway, as I think the changes needed to fix it will be minor. Happy to contribute if I can.
