Brent and Shruthi paper comments
LukeDuttweiler committed Jun 3, 2024
1 parent 0bd401e commit 9f9c1c1
Showing 1 changed file with 11 additions and 13 deletions.
24 changes: 11 additions & 13 deletions paper.md
@@ -26,16 +26,16 @@ bibliography: bibliography.bib

# Summary

Mobile apps that allow users to self-track menstrual cycle lengths and symptoms are now widely available and frequently used [@fox2010mobile]. Multiple studies (consider [@bull2019real; @li2020characterizing;@mahalingaiah2022design]) have taken advantage of these uniquely large data sets to gain insight into characteristics of the menstrual cycle, which is an important vital sign [@diaz2006menstruation]. Unfortunately, due to the self-tracking nature of the gathered data, recorded cycle lengths may be inflated if users do not accurately document all period dates in the app. A non-trivial number of incorrectly inflated cycle lengths in a data set will be damaging to the reliability and reproducibility of analysis results.
Mobile apps that allow users to self-track menstrual cycle lengths and symptoms are now widely available and frequently used [@fox2010mobile]. Multiple studies (consider [@bull2019real; @li2020characterizing;@mahalingaiah2022design]) have taken advantage of these uniquely large data sets to gain insight into characteristics of the menstrual cycle, which is an important vital sign [@diaz2006menstruation]. Due to the self-reported nature of the gathered data, recorded cycle lengths may be inflated if users skip tracking any cycle-related bleeding events in the app. A non-trivial number of incorrectly inflated cycle lengths in a data set will be damaging to the reliability and reproducibility of analysis results.
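As a concrete illustration of how one skipped bleeding event inflates a recorded cycle length, the sketch below computes cycle lengths as the number of days between consecutive logged period start dates. Python is used only for illustration (`skipTrack` itself is an R package), and the helper `cycle_lengths` and the dates are hypothetical. Omitting a single start date merges two 28-day cycles into one recorded 56-day cycle.

```python
from datetime import date

def cycle_lengths(period_starts):
    """Cycle length = days between consecutive logged period start dates."""
    starts = sorted(period_starts)
    return [(b - a).days for a, b in zip(starts, starts[1:])]

# Hypothetical user whose true cycles are all 28 days long.
logged = [date(2024, 1, 1), date(2024, 1, 29), date(2024, 2, 26), date(2024, 3, 25)]
print(cycle_lengths(logged))   # [28, 28, 28]

# The same user forgets to log the period that started on 2024-01-29.
skipped = [d for d in logged if d != date(2024, 1, 29)]
print(cycle_lengths(skipped))  # [56, 28]
```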

Current solutions to this problem of non-adherence in cycle tracking include removing cycles that exhibit no user-app interaction [@li2020characterizing], identifying possibly inaccurate cycles based on user-specific average cycle lengths [@li2022predictive], or *ad hoc* removal of cycles based on well-established menstrual cycle characteristics. The `skipTrack` package adapts and advances the Bayesian approach of @li2022predictive by identifying possible skips in cycle tracking based on user-specific average cycle lengths **and** user-specific cycle regularity.
Current solutions to this problem of non-adherence (skipped tracking) in cycle length reporting include removing implausibly long cycles that exhibit no user-app interaction [@li2020characterizing], identifying possibly inaccurate cycles based on user-specific average cycle lengths [@li2022predictive], or *ad hoc* removal of cycles based on well-established menstrual cycle characteristics such as average cycle length or cycle length difference. The `skipTrack` package implements a Bayesian hierarchical model that is the first to explicitly use information on both an individual's cycle length **and** regularity to identify errors in recorded cycle lengths that arise from user non-adherence in logging one or more bleeding events.

# Statement of need

Analyses involving large user-tracked menstrual cycle data sets are becoming more prevalent. Identifying skips in cycle tracking is crucially important for maintaining the validity of these studies. The `skipTrack` package provides easy-to-use software in R that can identify skips in menstrual cycle data based on a pre-specified Bayesian hierarchical model. The resulting inference on possible skipped cycles may then be included by a researcher *a priori* in an analysis, or may be used to develop a multiple-imputation scheme.
Analyses involving large user-tracked menstrual cycle data sets are becoming more prevalent. Identifying recorded cycle lengths that result from skips in tracking one or more period bleeding events (hereafter referred to as 'skipped cycles') is crucially important for maintaining the validity of these studies. The `skipTrack` package provides easy-to-use software in R that can identify skipped cycles in menstrual cycle data based on a pre-specified Bayesian hierarchical model. The resulting inference on possible skipped cycles may then be included by a researcher *a priori* in an analysis, or may be used to develop a multiple-imputation scheme.

Additionally, while based on the Bayesian hierarchical model from [@li2022predicitve], the model used by `skipTrack` includes components for both cycle length mean and regularity. This allows the model to correctly adjust for individuals with irregular cycles who are often excluded from menstrual cycle analyses, despite the important information their data contains.
Finally, several extensions to the current `skipTrack` model and software are planned. These include the addition of regression models for both cycle length mean and regularity, an auto-regressive modeling structure for sequential cycle lengths from the same individual, and a method for the inclusion of user-app interaction or other external data to help with skip identification. These updates, along with open availability and ease of use, will provide researchers easy access to high-level modeling techniques for mobile menstrual cycle data.
Additionally, while based on the Bayesian hierarchical model from [@li2022predictive], the model used by `skipTrack` includes components for both cycle length mean and regularity. This allows the model to correctly adjust for individuals with irregular cycles who are often excluded from menstrual cycle analyses, despite the important information their data contains.
Finally, the `skipTrack` model and software lend themselves to many useful extensions, including the addition of regression models for both cycle length mean and regularity, an auto-regressive modeling structure for sequential cycle lengths from the same individual, and a method for the inclusion of user-app interaction or other external data to help with skip identification. These extensions, along with open availability and ease of use, will provide researchers easy access to high-level modeling techniques for mobile menstrual cycle data.
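A minimal sketch of why regularity matters, assuming for illustration only (this is not the SkipTrack likelihood) that each user's log cycle lengths are roughly normal with a user-specific median and log-scale standard deviation. The functions `mean_only_rule` and `looks_skipped`, the 1.5x ratio, and the threshold of 2 are all hypothetical. A 45-day cycle trips the mean-only rule for both users, but is only unusual for the regular user.

```python
import math

def mean_only_rule(cycle_len, user_median, ratio=1.5):
    """Hypothetical rule: flag any cycle longer than `ratio` times the user's median."""
    return cycle_len > ratio * user_median

def looks_skipped(cycle_len, user_median, user_log_sd, threshold=2.0):
    """Hypothetical rule: flag a cycle more than `threshold` log-scale SDs above the median."""
    z = (math.log(cycle_len) - math.log(user_median)) / user_log_sd
    return z > threshold

# Two hypothetical users with the same 28-day median but different regularity.
for label, log_sd in [("regular", 0.05), ("irregular", 0.30)]:
    print(label, mean_only_rule(45, 28), looks_skipped(45, 28, log_sd))
# regular True True
# irregular True False
```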

# The SkipTrack Model

@@ -60,29 +60,27 @@ where the natural log of $\mu$ gives the overall population cycle length median,
Finally,

$$
c_{ij} \sim \text{Categorical}(\pi_1, \pi_2, \dots, \pi_{NS})
c_{ij} \sim \text{Categorical}(\pi_1, \pi_2, \dots, \pi_{K})
$$

where $\pi_k = \text{Pr}(c_{ij} = k)$ and $NS$ is the maximum number of skips allowed in the model.
where $\pi_k = \text{Pr}(c_{ij} = k)$ and $K$ is the maximum number of skips allowed in the model.
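Drawing $c_{ij}$ from the categorical distribution above amounts to one weighted draw per recorded cycle. The sketch below assumes $c_{ij}$ takes values $1, \dots, K$ with $\text{Pr}(c_{ij} = k) = \pi_k$; the function name `sample_c` and the values of $\pi_k$ are hypothetical, and this is not the package's own sampler.

```python
import random
from collections import Counter

def sample_c(pi, n, seed=None):
    """Draw n skip indicators c_ij in {1, ..., K}, where Pr(c_ij = k) = pi[k - 1]."""
    if any(p < 0 for p in pi) or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must be a probability vector")
    rng = random.Random(seed)
    return rng.choices(range(1, len(pi) + 1), weights=pi, k=n)

pi = [0.80, 0.15, 0.05]  # hypothetical probabilities, K = 3
draws = sample_c(pi, n=100_000, seed=1)
counts = Counter(draws)
for k in range(1, len(pi) + 1):
    # The empirical frequency of c_ij = k should be close to pi_k.
    print(k, counts[k] / len(draws))
```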

# Package Description

The `skipTrack` package contains tools for fitting the SkipTrack model, visualizing model results, diagnosing model convergence, and simulating example data.

The model fit is accomplished using an MCMC algorithm composed mainly of Gibbs sampling steps with a small number of Metropolis-Hastings steps. Model fitting is accomplished through an easy-to-use interface that allows users to select the number of MCMC chains to run, the number of iterations to run per chain, and the parameters used to initialize each chain. Model results may be visualized or retrieved through standard interaction functions (`summary()`, `plot()`, etc.).
In order to fit the model, the code employs a Markov Chain Monte Carlo (MCMC) algorithm composed of Gibbs sampling steps. Model fitting is accomplished through an easy-to-use interface that allows users to select the number of MCMC chains to run, the number of iterations to run per chain, and the parameters used to initialize each chain. Model results may be visualized or retrieved through standard reporting and visualization functions (`summary()`, `plot()`, etc.).
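To show what a single Gibbs step can look like for the skip indicator, the sketch below draws $c_{ij}$ from its full conditional. It assumes, as a labeled simplification rather than the package's exact likelihood, that $\log y_{ij} \mid c_{ij} = k \sim \text{Normal}(\log(k\lambda_i), 1/\tau_i)$, so a recorded length spanning $k$ cycles has median $k\lambda_i$. The names $y_{ij}$ (recorded length), $\lambda_i$ (user median cycle length), and $\tau_i$ (user log-scale precision) are assumptions. Under this assumption $\text{Pr}(c_{ij} = k \mid \cdot) \propto \pi_k \exp\{-\tfrac{\tau_i}{2}(\log y_{ij} - \log(k\lambda_i))^2\}$, since the remaining lognormal normalizing terms do not depend on $k$.

```python
import math
import random

def c_full_conditional(y, lam, tau, pi):
    """Pr(c_ij = k | y_ij, lambda_i, tau_i, pi) for k = 1..K, under the ASSUMED
    likelihood log(y_ij) ~ Normal(log(k * lambda_i), 1 / tau_i). Requires pi_k > 0."""
    log_w = [
        math.log(p) - 0.5 * tau * (math.log(y) - math.log(k * lam)) ** 2
        for k, p in enumerate(pi, start=1)
    ]
    m = max(log_w)  # subtract the max before exponentiating for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def gibbs_update_c(y, lam, tau, pi, rng):
    """One Gibbs step: draw c_ij from its full conditional."""
    probs = c_full_conditional(y, lam, tau, pi)
    return rng.choices(range(1, len(pi) + 1), weights=probs, k=1)[0]

pi = [0.80, 0.15, 0.05]  # hypothetical prior probabilities, K = 3
regular = c_full_conditional(y=45, lam=28, tau=400, pi=pi)  # log-scale sd 0.05
irregular = c_full_conditional(y=45, lam=28, tau=4, pi=pi)  # log-scale sd 0.5
print([round(p, 3) for p in regular])    # most mass moves to k = 2
print([round(p, 3) for p in irregular])  # k = 1 stays most probable
print(gibbs_update_c(45, 28, 400, pi, random.Random(0)))
```

The same 45-day record is read as a likely skip for a very regular user but as an ordinary long cycle for an irregular one, which is the behavior the mean-and-regularity structure is meant to capture.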

MCMC convergence diagnostics are multivariate and multi-chain and are provided using the R package `genMCMCDiag` @genMCMCPackage.
MCMC convergence diagnostics (traceplots, effective sample size, and the Gelman-Rubin potential scale reduction factor) are multivariate and multi-chain and are provided using the R package `genMCMCDiag` [@genMCMCPackage].
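For readers unfamiliar with the last of these diagnostics, the sketch below computes the classic univariate Gelman-Rubin potential scale reduction factor for a single scalar parameter. It is not the multivariate, generalized approach implemented in `genMCMCDiag`, and the function name `psrf` is hypothetical. Values near 1 suggest the chains agree; values well above 1 indicate chains that have not mixed.

```python
import math
import random
import statistics

def psrf(chains):
    """Classic univariate Gelman-Rubin potential scale reduction factor (R-hat)."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance
    b = n * statistics.variance(chain_means)                      # between-chain variance
    var_hat = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return math.sqrt(var_hat / w)

rng = random.Random(42)
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0, 0, 0, 3)]
print(round(psrf(mixed), 3))  # close to 1
print(round(psrf(stuck), 3))  # well above 1: one chain sits in a different region
```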

Example data may be simulated from the SkipTrack model, the generative model provided in @li2022predictive, or a provided mixture model.
Functions are included which allow a user to simulate example data from the SkipTrack model, the generative model provided in @li2022predictive, or a provided mixture model.

# Availability

A stable version of `skipTrack` is available on CRAN, and a development version is publicly available on GitHub (https://github.com/LukeDuttweiler/skipTrack).

# Acknowledgements

Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health (NIH) under award number T32ES007142. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

(Shruthi, Brent, grant info?)
Research reported in this publication was supported by the National Institute of Environmental Health Sciences (NIEHS) Grants T32ES007142, P30 ES000002, and R01 ES035106. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

# References
