
aimeertaylor/Pv3Rs


Pv3Rs

An R package developed to support model-based classification, using genetic data, of the cause of recurrent Plasmodium vivax malaria as one of the Pv3Rs:

  • Relapse
  • Reinfection
  • Recrudescence

Please be aware of the following points!

Unstable state:

The R package is currently under development and thus liable to contain many errors. A model around which it is being developed is documented in the preprint [1], which is a continuation of [2].

[1] Taylor, Foo & White, 2022

[2] Taylor & Watson et al., 2019

Prior considerations:

  • Genetic data are modelled using a Bayesian model whose prior should ideally be informative (in [2], prior estimates were generated by a time-to-event model), because the cause of recurrent P. vivax malaria is not always identifiable from genetic data alone. When the data suggest that recurrent parasites are relatively unrelated to those in all preceding infections, both reinfection and relapse are plausible; when the data suggest that recurrent parasites are clones of those in the preceding infection, both recrudescence and relapse are plausible.

  • The main Pv3Rs function, compute_posterior(), could also be used to estimate the cause of recurrent Plasmodium falciparum malaria by setting the prior probability of relapse to zero. In fact, we might rename this R package and use it as a basis for more streamlined versions, Pv3R and Pf2R.
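The role of the prior can be seen in a toy Bayes update for a single recurrence, written in base R. The helper bayes_update() and all prior and likelihood values are hypothetical and chosen only for illustration; they are not part of Pv3Rs, whose compute_posterior() derives likelihoods from genetic data.

```r
# Toy Bayes update over the three states C (recrudescence), L (relapse) and
# I (reinfection). All numbers are invented for illustration.
bayes_update <- function(prior, likelihood) {
  unnormalised <- prior * likelihood
  unnormalised / sum(unnormalised)
}

prior <- c(C = 0.1, L = 0.6, I = 0.3)

# Suppose the data make recurrent parasites look unrelated to earlier ones:
# recrudescence is ruled out, while relapse and reinfection explain the data
# equally well (equal likelihoods).
lik_unrelated <- c(C = 0, L = 1, I = 1)
post <- bayes_update(prior, lik_unrelated)
post  # L and I keep their prior ratio of 2:1

# P. falciparum analogue: relapse is impossible, so its prior is set to zero
# and the remaining prior mass is renormalised.
prior_pf <- c(C = 0.1, L = 0, I = 0.3)
prior_pf <- prior_pf / sum(prior_pf)
post_pf <- bayes_update(prior_pf, lik_unrelated)
post_pf  # all posterior mass on reinfection
```

When relapse and reinfection have equal likelihoods, the posterior splits between them exactly as the prior does, so the genetic data cannot break the tie on their own.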

Model assumptions and their implications:

  1. For a given episode, the multiplicity of infection (MOI), which is modelled as fixed rather than random, is given by the maximum per-marker allele count.
  2. There are no within-host mutations, genotyping errors or undetected alleles (this assumption renders recrudescence inference brittle).
  3. Relationship graphs are equally likely given recurrence states.
  4. Parasites are outbred.

Repeated NA values are included in per-marker allele counts to facilitate model checking. They could also be used to encode MOI estimates based on external data, although we have not yet fully explored the ramifications of doing so. In any case, markers used for recurrent-state inference ought to be as diverse as those used for MOI inference.
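A minimal sketch of the first assumption and of NA padding, assuming a hypothetical data layout in which an episode is a named list with one character vector of alleles per marker and NA entries count towards the allele count. The helpers episode_moi() and pad_to_moi() are illustrative only; check ?compute_posterior for the data format the package actually expects.

```r
# Hypothetical layout: one episode = named list of allele vectors, one per
# marker; NA entries are counted as alleles.
episode <- list(m1 = c("A", "B"),
                m2 = "C",
                m3 = c("A", "B", "D"))

# First assumption: the episode MOI is the maximum per-marker allele count.
episode_moi <- function(episode) {
  max(vapply(episode, length, integer(1)))
}

# Encode an externally estimated MOI by appending NA values to the marker
# with the most alleles until its count equals moi.
pad_to_moi <- function(episode, moi) {
  counts <- vapply(episode, length, integer(1))
  if (max(counts) > moi) stop("observed allele count exceeds the target MOI")
  i <- which.max(counts)  # first marker with the largest allele count
  episode[[i]] <- c(episode[[i]], rep(NA_character_, moi - counts[[i]]))
  episode
}

episode_moi(episode)         # 3
padded <- pad_to_moi(episode, 4)
padded$m3                    # "A" "B" "D" NA
episode_moi(padded)          # 4
```

Padding one marker suffices to raise the episode MOI under the first assumption; padding several markers would also add unobserved alleles to them, which may affect inference differently.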

We recommend a sensitivity analysis to explore the impact of genotyping errors. For example, one could compare recurrent-state estimates across data sets generated by modifying the observed data in silico under different genotyping-error rates and models.
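One way to generate such in silico data sets is sketched below under a simple, assumed error model: each observed allele is, with probability error_rate, replaced by an allele from the marker's allele set that is not already observed at that marker. The function perturb_episode(), the allele sets and the error rates are hypothetical; allele dropout or other error models could be swapped in.

```r
# Hypothetical substitution-error model: each observed (non-NA) allele is,
# with probability error_rate, replaced by an allele from the marker's allele
# set that is not currently observed at that marker (so no duplicates arise).
perturb_episode <- function(episode, allele_sets, error_rate) {
  for (m in names(episode)) {
    for (j in seq_along(episode[[m]])) {
      if (!is.na(episode[[m]][j]) && runif(1) < error_rate) {
        candidates <- setdiff(allele_sets[[m]], episode[[m]])
        if (length(candidates) > 0) {
          episode[[m]][j] <- candidates[sample.int(length(candidates), 1)]
        }
      }
    }
  }
  episode
}

# Hypothetical allele sets and observed alleles for one episode.
allele_sets <- list(m1 = c("A", "B", "C", "D"), m2 = c("A", "B", "C"))
observed <- list(m1 = c("A", "B"), m2 = "C")

# Twenty perturbed copies of the data per error rate.
set.seed(1)
error_rates <- c(0, 0.05, 0.1)
perturbed <- lapply(error_rates, function(e) {
  replicate(20, perturb_episode(observed, allele_sets, e), simplify = FALSE)
})
names(perturbed) <- paste0("error_", error_rates)
```

Each perturbed data set would then be analysed with compute_posterior(), and the spread of recurrent-state estimates across error rates indicates how sensitive conclusions are to genotyping error.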

The third assumption listed above has a small but undesirable effect on posterior estimates when relationship graphs grow in size. This is demonstrated in one of the examples of compute_posterior() and will be explained in more detail in an upcoming vignette.

Computational limits:

  • We do not recommend running compute_posterior() for data whose total genotype count exceeds eight, where the total genotype count is the sum of per-episode maximum per-marker allele counts, i.e., the sum of per-episode MOIs (see the first assumption above). If the total genotype count exceeds eight because of multiple recurrences, it might be possible to generate recurrent-state estimates for individual recurrences (this approach was used in [2]).
  • We have not yet tested the marker-count limit of compute_posterior(). If available data take too long to process, it might be necessary to estimate 3R probabilities using marker subsets. Estimates based on different subsets could be compared, and then pooled and summarised in different ways.
  • We have not yet tested the per-marker allele limit of compute_posterior(). Very high marker cardinalities could lead to very small allele frequencies and underflow problems; otherwise we do not anticipate any problems.
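The computational checks above can be sketched in base R. The MOIs, marker names and the pairing of each recurrence with the enrolment episode are hypothetical choices for illustration, not necessarily the procedure used in [2]; the last lines show, generically, why multiplying many small allele frequencies underflows while summing their logarithms does not.

```r
# Hypothetical MOIs for one patient: an enrolment episode and two recurrences.
mois <- c(enrolment = 3, recurrence_1 = 2, recurrence_2 = 4)

# Total genotype count: the sum of per-episode MOIs.
total_genotype_count <- function(mois) sum(mois)
total_genotype_count(mois)  # 9, above the recommended maximum of eight

# Hypothetical split: pair each recurrence with the enrolment episode only,
# so that each analysis stays within the limit.
pair_counts <- mois[["enrolment"]] + mois[-1]
pair_counts  # recurrence_1 = 5, recurrence_2 = 7

# Random marker subsets (6 of 12 hypothetical markers) for separate runs
# whose estimates could later be compared or pooled.
markers <- paste0("m", 1:12)
set.seed(2)
subsets <- replicate(5, sort(sample(markers, 6)), simplify = FALSE)

# Generic underflow illustration: 70 frequencies of 1e-5 multiply to 1e-350,
# below the smallest positive double, so the product is 0; the sum of logs
# remains finite.
freqs <- rep(1e-5, 70)
prod(freqs)
sum(log(freqs))
```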

Population-level allele frequencies:

compute_posterior() requires population-level allele frequencies. To avoid bias due to within-host selection of recrudescent parasites, we recommend estimating population-level allele frequencies from enrolment episodes only. That said, if most recurrences are either reinfections or relapses, both of which are draws from the mosquito population (albeit a time-lagged draw in the case of relapse), then, in the absence of systematic within-patient selection (as might occur when breakthrough infections encounter lingering drug pressure), estimates based on all episodes should be unbiased and more precise than those based on enrolment episodes only.
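A minimal sketch of estimating allele frequencies from enrolment episodes only versus all episodes, assuming a hypothetical long-format table with one row per detected allele. The estimator (the proportion of detections of each allele among all detections at a marker, counting each allele at most once per episode) is one simple choice, not one prescribed by Pv3Rs, and allele_frequencies() is a hypothetical helper.

```r
# Hypothetical genotype calls: one row per allele detected in an episode.
calls <- data.frame(
  patient = c("p1", "p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2", "p3", "p3"),
  episode = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  marker  = c("m1", "m1", "m2", "m1", "m2", "m1", "m2", "m1", "m2", "m1", "m2"),
  allele  = c("A", "B", "X", "A", "X", "C", "Y", "C", "Y", "A", "X"),
  stringsAsFactors = FALSE
)

# Per-marker frequencies: detections of each allele / all detections at the marker.
allele_frequencies <- function(calls) {
  calls <- unique(calls)  # count each allele at most once per episode
  lapply(split(calls$allele, calls$marker), function(a) {
    counts <- table(a)
    setNames(as.numeric(counts) / sum(counts), names(counts))
  })
}

fs_enrolment <- allele_frequencies(calls[calls$episode == 1, ])
fs_all <- allele_frequencies(calls)
fs_enrolment$m1  # A = 0.50, B = 0.25, C = 0.25
```

In this toy data set the two estimates differ for m1 because patient p2's recurrence repeats allele C; with real data, the choice between them should follow the reasoning above.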

Other points:

Unfortunately, the Pv3Rs model does not exploit read count data at present. However, read count data could be used to compute population-level allele frequencies, assuming they are not biased by experimental artifacts.
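If read counts are available, one hypothetical way to turn them into population-level allele frequencies is to convert each sample's reads into within-sample proportions and average those across samples. The function read_based_frequencies() and the counts below are illustrative assumptions, and the sketch presumes the read counts are free of experimental artifacts.

```r
# Hypothetical read counts at one marker: rows = samples, columns = alleles.
reads <- matrix(c( 90, 10,  0,
                    0, 50, 50,
                  100,  0,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("A", "B", "C")))

# Within-sample proportions, then an unweighted average across samples.
# Samples with zero reads at the marker should be removed beforehand.
read_based_frequencies <- function(reads) {
  proportions <- reads / rowSums(reads)
  colMeans(proportions)
}

f <- read_based_frequencies(reads)
f
```

Averaging proportions gives every sample equal weight; pooling raw reads instead would let deeply sequenced samples dominate the estimate.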

Installation

# Install or update to the latest stable version of devtools from CRAN
# I highly recommend doing this if you've recently updated R and RStudio to versions 4.3.0 and 2023.3.1.446, respectively; 
# otherwise, you might encounter problems rendering documentation
install.packages("devtools")

# Install Pv3Rs from GitHub
# I highly recommend doing this in RStudio, which bundles the pandoc needed to build vignettes
# If you're working in R outside of RStudio, you might need to install pandoc and check its path;
# otherwise, set build_vignettes = FALSE
devtools::install_github("aimeertaylor/Pv3Rs", build_vignettes = TRUE)

# Load and attach the package
library(Pv3Rs)

# View documentation and examples for main function
?compute_posterior

# Load the demo vignette
vignette("demo", package = "Pv3Rs")

# List available functions and example data sets, along with their documentation
help(package = "Pv3Rs")
