Most of CRAN submission prep.

LukeDuttweiler · May 14, 2024 · cce15df · cce15df
1 parent bca173a
commit cce15df
Show file tree

Hide file tree

Showing 34 changed files with 1,872 additions and 24 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -3,3 +3,7 @@
 ^LICENSE\.md$
 ^doc$
 ^Meta$
+^README\.Rmd$
+^bibliography.bib$
+^cran-comments\.md$
+^\.github$
diff --git a/.github/.gitignore b/.github/.gitignore
@@ -0,0 +1 @@
+*.html
diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
@@ -0,0 +1,52 @@
+# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
+# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
+on:
+  push:
+    branches: [main, master]
+  pull_request:
+    branches: [main, master]
+
+name: R-CMD-check
+
+permissions: read-all
+
+jobs:
+  R-CMD-check:
+    runs-on: ${{ matrix.config.os }}
+
+    name: ${{ matrix.config.os }} (${{ matrix.config.r }})
+
+    strategy:
+      fail-fast: false
+      matrix:
+        config:
+          - {os: macos-latest,   r: 'release'}
+          - {os: windows-latest, r: 'release'}
+          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
+          - {os: ubuntu-latest,   r: 'release'}
+          - {os: ubuntu-latest,   r: 'oldrel-1'}
+
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+      R_KEEP_PKG_SOURCE: yes
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: r-lib/actions/setup-pandoc@v2
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: ${{ matrix.config.r }}
+          http-user-agent: ${{ matrix.config.http-user-agent }}
+          use-public-rspm: true
+
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          extra-packages: any::rcmdcheck
+          needs: check
+
+      - uses: r-lib/actions/check-r-package@v2
+        with:
+          upload-snapshots: true
+          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,10 +1,10 @@
 Package: skipTrack
-Title: Bayesian Hierarchical Models Adjusting for Non-Adherence in Mobile Menstrual Cycle Tracking
-Version: 0.1.0
+Title: A Bayesian Hierarchical Model that Controls for Non-Adherence in Mobile Menstrual Cycle Tracking
+Version: 0.0.1
 Authors@R: 
-    person("Luke", "Duttweiler", , "lduttweiler@hsph.harvard.edu", role = c("aut", "cre"),
+    person("Luke", "Duttweiler", , "lduttweiler@hsph.harvard.edu", role = c("aut", "cre", 'cph'),
            comment = c(ORCID = "0000-0002-0467-995X"))
-Description: Implements a Bayesian hierarchical model of the same name designed to perform inference on cycle length mean and regularity given the possibility of non-adherence in cycle length self-tracking. Currently accepts baseline continuous covariates for cycle mean length and regularity. Future updates will include include categorical covariates, time-varying covariates, and the inclusion of external information regarding tracking skips.
+Description: Implements a Bayesian hierarchical model designed to identify skips in mobiel menstrual cycle self-tracking on mobiel apps. Future developments will allow for the inclusion of covariates affecting cycle mean and regularity, as well as extra information regarding tracking non-adherence. Main methods to be outlined in a forthcoming paper, with alternative models from Li et al. (2022) <doi:10.1093/jamia/ocab182>.
 License: MIT + file LICENSE
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
@@ -20,6 +20,7 @@ Imports:
     glmnet (>= 4.1.0),
     gridExtra (>= 2.0),
     LaplacesDemon (>= 16.0.0),
+    lifecycle,
     mvtnorm (>= 1.2.0),
     optimg (>= 0.1.2),
     parallel (>= 4.0.0),

diff --git a/LICENSE b/LICENSE
@@ -1,2 +1,2 @@
-YEAR: 2023
+YEAR: 2024
 COPYRIGHT HOLDER: skipTrack authors
diff --git a/LICENSE.md b/LICENSE.md
@@ -1,6 +1,6 @@
 # MIT License
 
-Copyright (c) 2023 skipTrack authors
+Copyright (c) 2024 skipTrack authors
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/NAMESPACE b/NAMESPACE
@@ -11,6 +11,7 @@ export(skipTrack.simulate)
 export(skipTrack.visualize)
 importFrom(foreach,"%do%")
 importFrom(foreach,"%dopar%")
+importFrom(lifecycle,deprecated)
 importFrom(parallel,detectCores)
 importFrom(parallel,makeCluster)
 importFrom(stats,dbeta)

diff --git a/R/liInference.R b/R/liInference.R
@@ -19,6 +19,8 @@
 #'
 #' @return Numeric value representing the Monte Carlo estimate of the negative marginal log-likelihood.
 #'
+#' @references Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
+
 likVec <- function(pars = c(kappa = 180,
                             gamma = 6,
                             alpha = 2,

diff --git a/R/liMCMC.R b/R/liMCMC.R
@@ -1,4 +1,40 @@
-#NEEDS DOCUMENTATION
+
+#' Runs MCMC algorithm for performing inference using the model from Li et al. (2022)
+#'
+#' @description
+#' This function performs inference on cycle length data, assuming the model from Li et al. (2022). It is important to note
+#' that Li et al. does not actually use this algorithm as they target a particular analytic posterior predictive distribution,
+#' and solve directly. However, we are targeting a different posterior and thus use this MCMC to perform inference.
+#'
+#' @inheritParams skipTrack.MCMC
+#'
+#' @param S Integer. The maximum number of skips to consider possible.
+#' @param hyperparams Named numeric vector of hyperparameters containing the
+#' elements: kappa, gamma, alpha, beta. NOTE: MUST BE IN CORRECT ORDER.
+#'   - \code{kappa}: Numeric value, shape parameter of Gamma distribution for Lambda_i.
+#'   - \code{gamma}: Numeric value, rate parameter of Gamma distribution for Lambda_i.
+#'   - \code{alpha}: Numeric value, shape1 parameter of Beta distribution for Pi_i.
+#'   - \code{beta}:  Numeric value, shape2 parameter of Beta distribution for Pi_i.
+#' @param initialParams A list of initial parameter values for the MCMC algorithm.
+#' Default values are provided for pi, lambdais, piis, ss.
+#'
+#' @return A list containing the MCMC draws for each parameter at each iteration. Each element
+#' in the list is itself a list containing:
+#' \describe{
+#'   \item{ijDat}{A data.frame with updated parameters at the individual-observation level: Individual, ys, lambdais, piis, ss.}
+#'   \item{iDat}{A data.frame with updated parameters at the individual level: Individual, lambdas, pis.}
+#'   \item{kappa}{Fixed value of hyperparameter kappa.}
+#'   \item{gamma}{Fixed value of hyperparameter gamma.}
+#'   \item{alpha}{Fixed value of hyperparameter alpha.}
+#'   \item{beta}{Fixed value of hyperparamter beta.}
+#'   \item{S}{Fixed input value S.}
+#'   \item{indFirst}{A logical vector indicating the first occurrence of each individual.}
+#' }
+#'
+#' @seealso \code{\link{gibbsStepLi}}
+#'
+#' @references Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
+#'
 liMCMC <- function(Y,
                    cluster,
                    S,
@@ -43,6 +79,33 @@ liMCMC <- function(Y,
   return(fullDraws)
 }
 
+#' Gibbs Step Li - One MCMC step for the Li Model
+#'
+#'
+#'
+#' @param ijDat A data.frame with parameters at the individual-observation level: Individual, ys, lambdais, piis, ss.
+#' @param iDat A data.frame with parameters at the individual level: Individual, lambdas, pis.
+#' @param kappa Fixed value of hyperparameter kappa.
+#' @param gamma Fixed value of hyperparameter gamma.
+#' @param alpha Fixed value of hyperparameter alpha.
+#' @param beta Fixed value of hyperparamter beta.
+#' @param S Fixed input value S.
+#' @param indFirst A logical vector indicating the first occurrence of each individual.
+#'
+#' @return A list containing one MCMC draws for each parameter. Elements are:
+#' \describe{
+#'   \item{ijDat}{A data.frame with updated parameters at the individual-observation level: Individual, ys, lambdais, piis, ss.}
+#'   \item{iDat}{A data.frame with updated parameters at the individual level: Individual, lambdas, pis.}
+#'   \item{kappa}{Fixed value of hyperparameter kappa.}
+#'   \item{gamma}{Fixed value of hyperparameter gamma.}
+#'   \item{alpha}{Fixed value of hyperparameter alpha.}
+#'   \item{beta}{Fixed value of hyperparamter beta.}
+#'   \item{S}{Fixed input value S.}
+#'   \item{indFirst}{A logical vector indicating the first occurrence of each individual.}
+#' }
+#'
+#' @references Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
+#'
 gibbsStepLi <- function(ijDat, iDat, kappa, gamma, alpha, beta, S, indFirst){
   #Now i level
   newLambdais <- lapply(iDat$Individual, function(ind){

diff --git a/R/liPosteriors.R b/R/liPosteriors.R
@@ -4,10 +4,10 @@
 #' for lambda_i in the Li algorithm, given the observed values y_ij, the indicators s_ij,
 #' and the prior hyperparameters priorK and priorG.
 #'
-#' @param yij Vector of observed values for lambda_i.
-#' @param sij Vector of indicators for lambda_i.
-#' @param priorK Prior hyperparameter for K.
-#' @param priorG Prior hyperparameter for G.
+#' @param yij Vector of observed values for individual i.
+#' @param sij Vector of cycle skip indicators for individual i.
+#' @param priorK Prior hyperparameter kappa.
+#' @param priorG Prior hyperparameter gamma.
 #'
 #' @return A random draw from the posterior distribution of lambda_i.
 #'
@@ -24,9 +24,19 @@ postLambdai <- function(yij, sij, priorK, priorG){
   return(rep(dr, n))
 }
 
-#Function to compute random draw from posterior for pi_i in Li algorithm.
-#This one requires an MH step. Using the posterior of true geometric from the calculated hyperparameters as a proposal
-
+#' Compute M-H draw for pi_i in Li algorithm
+#'
+#' This performs a Metropolis-Hastings draw for pi_i, assuming s_ij follows a truncated geometric distribution with parameters
+#' pi_i and S. The proposal distribution for pi_i is Beta(alpha, beta).
+#'
+#' @param sij Vector of cycle skip indicators for individual i
+#' @param currentPii Current value of pi_i
+#' @param priorA Hyperparameter alpha.
+#' @param priorB Hyperparameter beta.
+#' @param S Maximum number of skips allowed in algorithm
+#'
+#' @return Draw for pi_i, repeated for the number of observations from individual i
+#'
 postPii <- function(sij, currentPii, priorA, priorB, S){
   #n is the number of cycles for this individual
   n <- length(sij)

diff --git a/R/skipTrack-package.R b/R/skipTrack-package.R
@@ -4,6 +4,7 @@
 ## usethis namespace: start
 #' @importFrom foreach %do%
 #' @importFrom foreach %dopar%
+#' @importFrom lifecycle deprecated
 #' @importFrom parallel detectCores
 #' @importFrom parallel makeCluster
 #' @importFrom stats dbeta

diff --git a/README.Rmd b/README.Rmd
@@ -0,0 +1,162 @@
+---
+output: github_document
+bibliography: 'bibliography.bib'
+header-includes: 
+  - \usepackage{amsmath}
+  - \usepackage{xcolor}
+---
+
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  fig.path = "man/figures/README-",
+  out.width = "100%"
+)
+```
+
+# skipTrack
+
+<!-- badges: start -->
+[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
+[![R-CMD-check](https://github.com/LukeDuttweiler/skipTrack/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/LukeDuttweiler/skipTrack/actions/workflows/R-CMD-check.yaml)
+<!-- badges: end -->
+
+Welcome to the SkipTrack Package! 
+
+SkipTrack is a Bayesian hierarchical model for self-reported menstrual cycle length data on mobile health apps. The model is an <!-- significant --> extension of the hierarchical model presented in @li2022predictive that focuses on predicting an individual's next menstrual cycle start date while accounting for cycle length inaccuracies introduced by non-adherence in user self-tracked data.
+
+## Installation
+
+```{r}
+#Install from CRAN
+install.packages('skipTrack')
+
+#Install Development Version
+devtools::install_github("LukeDuttweiler/skipTrack")
+```
+
+## Model
+
+@li2022predictive notes that apps designed to help users track their menstrual cycles "are subject to adherence artifacts that may obscure health-related conclusions: if a user forgets to track their period, their cycle length computations are inflated." This is visualized in the image below in which the numbers represent days after the initial bleeding day is recorded in the app, $\color{red}{\text{red}}$ days are bleeding days recorded by the user, and $\color{blue}{\text{blue}}$ days are bleeding days not recorded by the user. 
+
+$$\overbrace{\underbrace{\color{red}{1, 2, 3, 4}, 5,  \dots, 29}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 29 Days}, \overbrace{\underbrace{\color{red}{30, 31, 32, 33}, 34,  \dots, 61}_\text{True Cycle, 32 Days}, \underbrace{\color{blue}{62, 63, 64, 65}, 66,  \dots, 90}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 61 Days}$$
+
+The SkipTrack model extends the model given by @li2022predictive by specifying parameters for each individuals for cycle length regularity, as well as their cycle length mean, and weakening assumptions made by Li et al. on the probability of failing to track a cycle.
+
+<!--and by allowing for other sources of data to help identify associations between covariates and cycle length mean or regularity, while still accounting for skips in self-tracking adherence. -->
+
+In short, the modeling framework assumed by SkipTrack is as follows. The observed cycle lengths are represented with $y_{ij}$ where $1 \leq i \leq n$ represents an individual who has contributed $n_i$ observations, with $1 \leq j \leq n_i$. We assume that 
+
+$$
+y_{ij} \sim \text{LogNormal}\big(\mu_i + \log(c_{ij}), \tau_i\big),
+$$
+where $\mu_i$ is an individual level mean parameter, $\tau_i$ is an individual level precision parameter, and $c_{ij}$ is an integer-valued parameter representing the number of true cycles present in the observed cycle $y_{ij}$. That is, if $c_{ij} = 1$ then $y_{ij}$ is a true cycle, if $c_{ij} = 2$ then $y_{ij}$ gives the length of two true cycles added together, and so on. 
+
+We then assume 
+
+$$
+\mu_i \sim \text{Normal}(\mu, \rho) \mspace{100mu}\tau_i \sim \text{Gamma}(\theta, \phi)
+$$
+
+where $\rho$ is a precision parameter, and the Gamma distribution above is parameterized by mean ($\theta$) and rate $\phi$.
+
+<!--We then include covariates from two matrices $X$ and $Z$ (which may be, but are not necessarily, equal) by 
+
+$$
+\mu_i \sim \text{Normal}\big(X_i^T\beta, \rho\big) \mspace{100mu}\tau_i \sim \text{Gamma}\big(\exp(Z_i^T\Gamma), \phi\big) 
+$$
+where $\rho$ is a precision parameter, the Gamma distribution above is parameterized by mean and rate, and $X_i$ and $Z_i$ are the $i$th rows of $X$ and $Z$ respectively. -->
+
+This is a fully interpretable model that allows for the identification of skipping in cycle tracking, while allowing for different individual's regularities, and accounting for uncertainty in the model. A paper discussing the full model details will be published soon. 
+
+## Example Usage
+
+# Package Usage
+
+The SkipTrack package provides functions for fitting the SkipTrack model, evaluating model run diagnostics, retrieving and visualizing model results, and simulating related data. We begin our tutorial by examining some simulated data. 
+
+```{r}
+library(skipTrack)
+```
+
+First, we simulate data on 100 individuals from the SkipTrack model where each observed $y_{ij}$ value has a 75% probability of being a true cycle, a 20% probability of being two true cycles recorded as one, and a 5% probability of being three true cycles recorded as one. 
+
+```{r}
+#Simulate data
+dat <- skipTrack.simulate(n = 100, model = 'skipTrack', skipProb = c(.75, .2, .05))
+
+names(dat)
+```
+
+The result of the simulation function is simply a named list with various components. The (currently) important components are
+
+
+  * `Y`: the $y_{ij}$ values, observed outcomes
+  * `cluster`: the $i$ values, individual markers 
+  * `NumTrue`: the $c_{ij}$ values, number of true cycles in an observed cycle
+  * `Underlying`: underlying parameters pertaining to the specific model used for data simulation
+
+<!--
+* `X`: the matrix $X$, covariates for cycle length mean
+  * `Z`: the matrix $Z$, covariates for cycle length regularity
+  * `Beta`: the true values of $\beta$, parameters for cycle length mean
+  * `Gamma`: the true values of $\Gamma$, parameters for cycle length regularity
+-->
+
+Looking at the histogram of `dat$Y`, we can see a clear mixture of at least two distributions, one centered around 30 days, and another centered near 60 days (corresponding to the true cycles and observed cycles containing two true cycles respectively), which is what we expect based on our generation. 
+
+```{r, fig.align='center', fig.width = 7}
+#Histogram of observed outcomes
+hist(dat$Y, breaks = 10:150)
+```
+
+Fitting the SkipTrack model using this simulated data requires a call to the function `skipTrack.fit`. Note that because this is a Bayesian model and is fit with an MCMC algorithm, it can take some time with large datasets and a high number of MCMC reps and chains. 
+
+In this code we ask for 4 chains, each with 1000 iterations, run sequentially. Note that we recommend allowing the sampler to run longer than this (usually at least 5000 iterations per chain), but we use a short run here to save time.
+
+If `useParallel = TRUE`, the MCMC chains will be evaluated in parallel, which helps with longer runs.
+
+```{r}
+ft <- skipTrack.fit(Y = dat$Y, cluster = dat$cluster,
+                    reps = 1000, chains = 4, useParallel = FALSE)
+```
+
+Once we have the model results we are able to examine model diagnostics, visualize results from the model, and view a model summary. 
+
+### Diagnostics
+
+Multivariate, multichain MCMC diagnostics, including traceplots, Gelman-Rubin diagnostics, and effective sample size, are all available for various parameters from the model fit. These are supplied using the `genMCMCDiag` package, see that packages' documentation for details.
+
+Here we show the output of the diagnostics on the $c_{ij}$ parameters, which show that (at least for the $c_{ij}$ values) the algorithm is mixing effectively (or will be, once the algorithm runs a little longer).
+
+```{r, fig.align='center', fig.width = 7, fig.height=7}
+skipTrack.diagnostics(ft, param = 'cijs')
+```
+
+### Visualization
+
+In order to see some important plots for the SkipTrack model fit, you can simply use `plot(ft)`, and the plots are directly accessible using `skipTrack.visualize(ft)`.
+
+```{r, fig.align='center', fig.width = 7, fig.height=7}
+plot(ft)
+```
+
+### Summary
+
+A summary is available for the SkipTrack model fit with `summary(ft)`, with more detailed results accessible through `skipTrack.results(ft)`. Importantly, these results are based on a default chain burn-in value of 750 draws. This can be changed using the parameter `burnIn` for either function.
+
+```{r}
+summary(ft)
+
+summary(ft, burnIn = 500)
+```
+
+This introduction provides enough information to start fitting the SkipTrack model. For further information regarding different methods of simulating data, additional model fitting, and tuning parameters for fitting the model, please see the help pages. Additional vignettes are forthcoming.
+
+\newpage
+
+## Bibliography
+