more efficient support for time-varying covariates in cyclopsdata for cox models #51

Open
myoung3 opened this issue Feb 9, 2021 · 1 comment

Comments

@myoung3

myoung3 commented Feb 9, 2021

From the release package documentation:

These columns are expected in the outcome object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- y (real) The outcome variable
- time (real) For models that use time (e.g. Poisson or Cox regression) this contains time (e.g. number of days)
- weights (real) (optional) Non-negative weights to apply to outcome
- censorWeights (real) (optional) Non-negative censoring weights for competing risk model; will be computed if not provided.

These columns are expected in the covariates object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- covariateId (integer) A numeric identifier of a covariate
- covariateValue (real) The value of the specified covariate

The correct way to deal with time-varying data in a Cox model is to split each individual's follow-up period into multiple intervals, one at each change in their covariate values. A time-varying dataset for Cox analysis would therefore have more than one row per person, and the above data spec would require the covariates object to have the same number of rows as the outcome object. In a Cox model with both time-varying and time-invariant variables, all of the time-invariant values would need to be repeated for every interval within each participant. A more efficient data structure would allow a time-invariant covariates object that joins to the outcome object on participant ID, along with a time-varying covariates object that links to the outcome on both participant ID and time.
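To make the storage difference concrete, here is a minimal Python sketch of the interval splitting described above. It is not Cyclops code: the toy data, the start/stop interval representation, the covariate IDs, and the helper names `split_intervals` and `expand_covariates` are all assumptions made for illustration.

```python
# Hypothetical toy data; none of these names come from Cyclops.
follow_up = {1: (100.0, 1), 2: (60.0, 0)}  # person -> (end of follow-up, event indicator)
invariant = {1: {1: 61.0, 2: 1.0}, 2: {1: 48.0, 2: 0.0}}  # person -> {covariateId: value}
varying = [(1, 0.0, 0.0), (1, 30.0, 1.0), (1, 75.0, 0.0), (2, 0.0, 1.0)]  # (person, start, value)
VARYING_ID = 3  # assumed covariateId for the single time-varying covariate


def split_intervals(follow_up, varying):
    """Split follow-up at each covariate change.

    Returns one (row_id, person, start, stop, y, value) tuple per interval.
    """
    by_person = {}
    for person, start, value in sorted(varying):
        by_person.setdefault(person, []).append((start, value))

    rows = []
    row_id = 0
    for person, (end, event) in sorted(follow_up.items()):
        # Ignore changes recorded at or after the end of follow-up.
        changes = [c for c in by_person.get(person, []) if c[0] < end]
        for i, (start, value) in enumerate(changes):
            is_last = i + 1 == len(changes)
            stop = end if is_last else changes[i + 1][0]
            row_id += 1
            # The event, if any, belongs to the final interval only.
            rows.append((row_id, person, start, stop, event if is_last else 0, value))
    return rows


def expand_covariates(rows, invariant):
    """Build long (rowId, covariateId, covariateValue) rows.

    Time-invariant values are repeated for every interval of a person.
    """
    long_rows = []
    for row_id, person, _start, _stop, _y, value in rows:
        for cov_id, cov_value in sorted(invariant[person].items()):
            long_rows.append((row_id, cov_id, cov_value))
        long_rows.append((row_id, VARYING_ID, value))
    return long_rows


intervals = split_intervals(follow_up, varying)
expanded = expand_covariates(intervals, invariant)

# Entries needed if invariant and varying covariates are stored separately.
compact_size = sum(len(values) for values in invariant.values()) + len(varying)

print(len(intervals), len(expanded), compact_size)
```

In this toy example the expanded layout stores 12 covariate entries for 4 intervals, while keeping the two tables separate needs 8. The gap grows with the number of time-invariant covariates and the number of intervals per participant.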

@msuchard
Member

Thanks for looking into this, @myoung3. I am very interested in providing both a convenient and efficient interface to cyclops for time-varying covariates. Naturally, efficiency covers both "space" (as you bring up above) and "time" (compute speed, which may decrease dramatically with the extra layer of memory indirection).

Could I entice you to work on this further with my group?

Do you have a specific use-case in mind where performance / memory-usage becomes an issue?
