PNBD Dyncov Rcpp #205
Merged
subvec with pointers to walk
cleanup includes
also pass accordingly
d_omega, Di, real walks as residual, EmptyWalk
remove _makewalks
life walks do not have delta (calculated based on i)
if there are less than 3 elements (should never be called in this case)
access walk data directly
and fix tests
cleanup todos
pschil added a commit that referenced this pull request Oct 23, 2023
* Add slash to name
* Fix Pareto/NBD print name (#201)
* Update author name
* Formula interface (#203)
* Github actions: Upgrade to v2 (#214)
* clvdata: Plot Transaction Timings (#213)
* Housekeeping: Makevars (#217)
* Run tests in parallel (#216)
* Draw other models in same plot (#215)
* PNBD Dyncov Rcpp (#205)
* Update docu: continuous discount rate (#223)
* Drop explicit c++11 specification
* Prepare release
* Fix CRAN CPU usage & fix docu
Definitions
Walks
Walk Definition
Aux walks (life and trans):
Covariates [t.x, T] = Covariates from the last transaction to the estimation end. There is a single aux walk per customer, of the same length for life and trans. It is guaranteed to exist and always contains the covariate periods in which t.x and T fall. It is created as if there were an artificial transaction at T.
Real life walk
Covariates [0, t.x] = Covariates from the first until the last transaction. Only ever used in Di(). It must not overlap with the aux walk, to avoid double counting in Di(), and is therefore defined as all covariates before the aux walk (the "difference" of the aux walk [t.x, T] and the walk [0, t.x]). As a consequence, there is no real lifetime walk for customers having t.x in their first covariate period (mostly zero-repeaters).
Real trans walk
Covariates [t_j, t_k] = Covariates from one repeat transaction to the next. There is no walk up to the first transaction and hence there are no walks for zero-repeaters. The number of walks varies per customer.
A covariate may appear twice (in two subsequent walks) if a transaction is on the covariate boundary, because that transaction is used to define two walks: transaction j is the upper boundary of one walk and the lower boundary of the next, i.e. [tp_i, tp_j] and [tp_j, tp_k].
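The pairing of transactions into real trans walks can be sketched as follows. This is only an illustration, not the code of this PR: `trans_walk_bounds` is a hypothetical helper, and it assumes that every pair of consecutive transactions (including the first transaction as the start of the first walk) defines one walk.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the PR): pair consecutive transaction
// times into walk boundaries [t_j, t_k].
// Assumption: `tx` is sorted ascending and transactions on the same
// timepoint were already aggregated, so no walk has length 0.
std::vector<std::pair<double, double>> trans_walk_bounds(const std::vector<double>& tx) {
    std::vector<std::pair<double, double>> walks;
    for (std::size_t j = 1; j < tx.size(); ++j) {
        assert(tx[j - 1] < tx[j]);
        // Transaction j-1 is the lower and transaction j the upper boundary.
        walks.emplace_back(tx[j - 1], tx[j]);
    }
    return walks;
}
```

With three transactions the middle one closes the first walk and opens the second, which is why a covariate active on that day can show up in both walks. A customer with a single transaction yields no walks at all.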
Walk Data
Walk data is stored in 4 separate data tables, in long format, to facilitate passing it as a matrix to Rcpp.
The tables are sorted by customer id, covariate date. Hence it is guaranteed that data belonging to the same walk comes in subsequent rows.
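Because rows of the same walk are contiguous, a walk can be addressed by a first and last row index instead of copying its data. The sketch below illustrates this idea only; the `WalkRows` struct, the `walk_row_ranges` function and the per-row walk id input are assumptions made for the example and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one walk as an inclusive row range.
struct WalkRows {
    int walk_id;
    std::size_t first;  // first row of the walk
    std::size_t last;   // last row of the walk (inclusive)
};

// Assumption: `walk_id_per_row` comes from a table that is sorted such
// that all rows of one walk are adjacent.
std::vector<WalkRows> walk_row_ranges(const std::vector<int>& walk_id_per_row) {
    std::vector<WalkRows> ranges;
    for (std::size_t r = 0; r < walk_id_per_row.size(); ++r) {
        if (ranges.empty() || ranges.back().walk_id != walk_id_per_row[r]) {
            ranges.push_back({walk_id_per_row[r], r, r});  // a new walk starts here
        } else {
            ranges.back().last = r;  // extend the current walk by one row
        }
    }
    return ranges;
}
```

A single pass over the sorted table is enough; if the rows were not sorted, the same walk id would produce several ranges.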
Columns
Walk Creation
All covariate data is defined for a closed interval [lower, upper] during which it is active, such that there are no gaps between covariates and subsequent covariates never overlap, i.e. [lower1, upper1], [lower2, upper2] where lower2 = upper1 + eps. It has to be stored as a closed interval to ensure that it can be matched to the relevant transaction data. For every covariate (row), the lower and upper boundaries of this period are given, hence the covariate is active in the period [tp.cov.lower, tp.cov.upper].
A single walk is defined by 2 transactions and contains the covariate data that is active from the first to the second transaction, incl. the covariate on the day of the transaction itself. A walk is created by matching the interval from the first to second transaction of the walk to the interval of the covariate data using an interval join (foverlaps(type='any', mult='all')).
Transactions on the same timepoint are aggregated in clvdata so that there are no walks of length 0, but if the last transaction is on the estimation end (t.x=T), the aux walk has its start and end on the same timepoint. The aux walk then only consists of the covariate data active on this timepoint.
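The matching rule behind the interval join can be illustrated with a plain closed-interval overlap test. This is a simplified sketch and not the data.table join used in the package: `CovPeriod` and `overlapping_rows` are hypothetical names, and timepoints are assumed to be integer days with eps = 1 day.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical closed covariate period [lower, upper], in integer days.
struct CovPeriod {
    int lower;
    int upper;
};

// Return the indices of all covariate rows active somewhere in the closed
// walk interval [from, to]. Two closed intervals overlap if each one
// starts no later than the other one ends.
std::vector<std::size_t> overlapping_rows(const std::vector<CovPeriod>& cov, int from, int to) {
    assert(from <= to);
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < cov.size(); ++i) {
        if (cov[i].lower <= to && cov[i].upper >= from) {
            rows.push_back(i);
        }
    }
    return rows;
}
```

For weekly periods [0, 6], [7, 13], [14, 20], a walk from day 6 to day 14 matches all three rows because both boundary days are included. A walk with start and end on day 7, as in the t.x=T case, matches only the single period that is active on that day.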
Differences to previous Implementation
Dyncov LL
Walk creation
Bug Fixes
Rcpp Implementation
LL
Classes
TransactionWalk, LifetimeWalk, EmptyLifetimeWalk
Represents covariate data of a single walk
Customer
All data, incl. walks, related to a single customer
Passing Data to the LL
walkinfo_{aux, real}_{life, trans}
Timing insights
Recovery Results
From 50 simulated datasets, each 104 weeks
Testing
Future improvements
Discarded Ideas
Discarded because it is hyp2f1() that drives runtime
Open, scoped out issues
0 <= d < 1 or 0 < d <= 1? #222