BGBB model - Data structures to fit models #88

Open
pschil opened this issue Apr 27, 2020 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

@pschil
Collaborator

pschil commented Apr 27, 2020

While implementing the BGBB model, it became clear that the transaction history alone does not suffice as input for all models. Because the BGBB model is for a discrete-time setting, it requires additional information on the transaction opportunities, which may differ per customer.

Providing this functionality through the existing clv.data object, which represents the full transaction history, would blur the lines of responsibility (violating the Single Responsibility Principle). It would require all kinds of internal case distinctions in clv.data (e.g. does it have transaction opportunities or not?) that hamper maintenance. It would also make the user interface considerably more complicated, although this functionality is in fact only used by a single model.

Rather, one class should do one thing only and do it well. Therefore, to simplify usage and encapsulate distinct functionality in separate objects, I suggest separating the transaction opportunity functionality from the transaction history:

clv.transactions
This is the full transaction history of each customer, to which static and dynamic covariate data can be added. This is what clv.data currently is.

clv.transaction.opportunities
A separate data structure that contains the transaction opportunities (TOs) for every customer, potentially with a duration in case a TO stretches over a period (e.g. a TO is a week). In combination with clv.transactions, this can be used to fit a discrete-time model.

Usage
Fitting a continuous-time model remains the same, while fitting a discrete-time model would require an additional input.

clv.trans <- clv.transactions(cdnow, "ymd", "w", 37)
pnbd(clv.trans)

clv.TO <- clv.transaction.opportunities(table)
bgbb(clv.trans, clv.TO)
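
To make the idea of a separate transaction-opportunity object more concrete, here is a minimal base R sketch. Everything in it is an assumption for illustration only: the column names `Id` and `date.TO`, the constructor `sketch_transaction_opportunities()`, and the helper `n_opportunities()` are hypothetical and not part of the package or of the proposal above.

```r
# Hypothetical layout: one row per customer and transaction opportunity.
opportunities <- data.frame(
  Id      = c("A", "A", "A", "B", "B"),
  date.TO = as.Date(c("2020-01-06", "2020-01-13", "2020-01-20",
                      "2020-01-13", "2020-01-20"))
)

# Hypothetical constructor (not the package API):
# validate the table and tag it with an S3 class.
sketch_transaction_opportunities <- function(data.TO) {
  stopifnot(is.data.frame(data.TO),
            all(c("Id", "date.TO") %in% names(data.TO)),
            inherits(data.TO$date.TO, "Date"))
  structure(list(data = data.TO), class = "sketch.transaction.opportunities")
}

# Hypothetical helper: number of opportunities per customer,
# which may differ between customers in a discrete-time setting.
n_opportunities <- function(x) {
  stopifnot(inherits(x, "sketch.transaction.opportunities"))
  counts <- table(x$data$Id)
  setNames(as.integer(counts), names(counts))
}

clv.TO.sketch <- sketch_transaction_opportunities(opportunities)
n.TO <- n_opportunities(clv.TO.sketch)
```

Keeping the opportunities in their own class means the transaction-history object needs no "has opportunities or not?" branching; only a discrete-time model would ask for the second object.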

Another common use case is that end users do not have the full transaction history because it can be huge. Rather, users are given a summary of all transactions pulled from some DB (last transaction, number of transactions, mean spending, etc.). To support this use case, I suggest adding the following data structures:

clv.transaction.summary
Contains the minimal information per customer needed to create the model cbs. Notably, this differs from the cbs in that the values given to create it do not already imply a time unit: the recency is not given as a number (e.g. 34) but rather calculated from dates to allow for different time units. It allows adding static covariates but not dynamic ones.

clv.cbs
Provides an additional way to fit a model, in order to reproduce results from papers (such as for the BGBB) or for expert users familiar with the models. It allows adding static covariates but not dynamic ones. This could replace the way the cbs is currently stored internally (i.e. as a simple data.table). Note that such objects are specific to one model only (i.e. in their required columns).

Usage

trans.summary <- data.table(Id=1, last.trans="2007-08-21", first.trans="2005-03-01", n.trans=8, mean.spending=41)
clv.summary <- clv.transaction.summary(trans.summary, "ymd", "weeks")
pnbd(clv.summary)

cbs.pnbd <- clv.pnbd.cbs(data.table(Id=1, recency=1, frequency=8, mean.spending=41))
pnbd(cbs.pnbd)
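
The point that a transaction summary stores dates rather than a unit-bound recency can be illustrated with a short base R sketch. The helper `recency_in()` is hypothetical and not part of the package; the column names mirror the usage example above, and the dates are made up for illustration.

```r
# Hypothetical per-customer summary with dates instead of a numeric recency.
summary.sketch <- data.frame(
  Id          = 1,
  first.trans = as.Date("2005-03-01"),
  last.trans  = as.Date("2007-08-21"),
  n.trans     = 8
)

# Hypothetical helper (not the package API): time between first and last
# transaction, expressed in the requested time unit.
recency_in <- function(first, last, unit = c("days", "weeks")) {
  unit <- match.arg(unit)
  as.numeric(difftime(last, first, units = unit))
}

# The same summary yields a recency in whichever unit the model is fit in.
recency.days  <- recency_in(summary.sketch$first.trans, summary.sketch$last.trans, "days")
recency.weeks <- recency_in(summary.sketch$first.trans, summary.sketch$last.trans, "weeks")
```

A cbs, by contrast, would store only one of these numbers, so the time unit is fixed as soon as the cbs is created.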

@bachmannpatrick @mmeierer @niels89 critique and comments?

@mmeierer mmeierer added the enhancement New feature or request label Apr 27, 2020
@mmeierer mmeierer added this to To do in v0.6 via automation Apr 27, 2020
@mmeierer mmeierer added this to the v0.6 milestone Apr 27, 2020
@pschil
Collaborator Author

pschil commented Apr 29, 2020

As discussed with Patrick, it might be more desirable to have entirely distinct classes for continuous- and discrete-time data. The reasons are to make users aware of the differences, and that the plots and summary statistics to produce are inherently different.

@mmeierer
Collaborator

mmeierer commented May 1, 2020

I see the reasons for having two different classes and agree with your line of argumentation.

@pschil pschil removed this from To do in v0.6 Jun 2, 2020
@pschil pschil added this to To do in v0.7 via automation Jun 2, 2020
@pschil pschil modified the milestones: v0.6, v0.7 Jun 2, 2020
@mmeierer mmeierer removed this from To do in v0.7 Jun 16, 2020
@mmeierer mmeierer added this to To do in v1.0 via automation Jun 16, 2020
@mmeierer mmeierer modified the milestones: v0.7, v1.0 Jun 16, 2020
@mmeierer mmeierer moved this from To do to In progress in v1.0 Jun 16, 2020
@mmeierer mmeierer changed the title Data structures to fit models BGBB model - Data structures to fit models Jun 16, 2020
@mmeierer mmeierer modified the milestones: v1.0, v0.9 Oct 2, 2020
@mmeierer mmeierer removed this from In progress in v1.0 Oct 2, 2020
@mmeierer mmeierer added this to To do in v1.1 via automation Oct 2, 2020
@mmeierer mmeierer modified the milestones: v0.9, v1.1 Jan 30, 2021
@mmeierer mmeierer added the help wanted Extra attention is needed label Mar 2, 2021
@mmeierer mmeierer removed this from the v1.1 milestone Apr 19, 2021
Projects
v1.1
To do
Development

No branches or pull requests

4 participants