
simADPenetrance

Thomas P Spargo edited this page Dec 16, 2022 · 4 revisions

simADPenetrance function documentation

Updated 14/11/2022

The repository is maintained by Thomas Spargo (thomas.spargo@kcl.ac.uk) - please reach out with any questions.


Description

simADPenetrance is a function for running simulation studies that test how strongly age at sampling is expected to influence lifetime penetrance estimates under various circumstances. Users can pass the numeric value returned by checkOnsetVariability to the onsetRateDiff argument of simADPenetrance; this value indicates the degree of departure from equal onset variability when sampling across cohorts of people with and without a tested variant. Several other parameters can be used to customise the simulation.

The ggplot2 (version 3.4.0; ref 1), reshape2 (version 1.4.4; ref 2), and plyr (version 1.8.7; ref 3) packages are dependencies for simADPenetrance.

Usage examples

Several usage examples are given in the TimeSimulation*.R scripts available here.
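As a complement to those scripts, a minimal sketch of a single call is shown below. Argument names are taken from the Arguments section of this page. The values given for numfamilies_var, scale_novar, and numsteps are illustrative assumptions, since no defaults are stated for them here. The call only runs if simADPenetrance has already been sourced from the repository.

```r
# Arguments collected in a list so they can be inspected before running.
# Values marked "assumed" are illustrative, not documented defaults.
sim_args <- list(
  f = c(0.25, 0.50, 0.75, 1),                 # documented default
  f_compare = c(0.2, 0.4, 0.6, 0.8, 1),       # documented default
  onsetRateDiff = 1,                          # equal onset variability
  amplify_f = 3,                              # documented default
  g = 0,                                      # documented default
  states = c("fsu", "fs", "fu", "su", "au"),  # documented default
  numfamilies_var = 1000,                     # assumed
  scale_novar = 5,                            # assumed
  which_sibstructure = c(1, 2),               # documented default
  numsteps = 10                               # assumed
)

# Basic sanity checks on the penetrance vectors (all values within [0, 1])
stopifnot(
  all(sim_args$f >= 0 & sim_args$f <= 1),
  all(sim_args$f_compare >= 0 & sim_args$f_compare <= 1)
)

if (exists("simADPenetrance", mode = "function")) {
  sim_out <- do.call(simADPenetrance, sim_args)
  print(sim_out$ggfigure)      # the ggplot described under Output
  print(head(sim_out$results)) # the data frame described under Output
} else {
  message("simADPenetrance is not loaded; source it from the repository first.")
}
```

Setting benchmark=TRUE first may be worthwhile for large simulations, since run time is expected to grow with the number of families and repeats.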

Arguments

f - Numeric vector (all values must be between 0 and 1) defining the lifetime penetrance values to test. Defaults to c(0.25,0.50,0.75,1).

f_compare - Numeric vector (all values must be between 0 and 1), which may contain a single element, specifying the penetrance of alternative variants occurring in families without each tested f variant; each element of f_compare is assigned to a subset of the simulated families not harbouring f (see: scale_novar argument). Defaults to c(0.2,0.4,0.6,0.8,1).

onsetRateDiff - Define whether differences exist in rate of onset between variant and non-variant group. To test the impact of unequal onset variability between groups, set onsetRateDiff according to the numeric returned by checkOnsetVariability. Defaults to 1, indicating equal onset variability.

amplify_f - Set number of repeats for each defined f. Simulation results will be averaged across these to reduce effects attributed to pseudo-randomisation. Defaults to 3.

g - Residual disease risk in people with neither an f nor an f_compare variant. Defaults to 0.

states - Define the disease state combinations to model. Defaults to: c("fsu","fs","fu","su","au"), but any subset of these 5 options can be modelled; see the adpenetrance documentation for further details.

numfamilies_var - Number of families to simulate which harbour each variant in f.

scale_novar - Numeric indicating the scaling of families relative to the value of numfamilies_var. Generates a comparator population of families without the variant of interest (penetrance f), sized as the number of families with the variant (indicated by numfamilies_var) multiplied by the value of scale_novar. Each family generated is assigned one of the competing variants specified in f_compare (e.g. if $numfamilies\_var \times scale\_novar = 50{,}000$, and f_compare is a vector defining 5 penetrance values for competing variants, then 10,000 families are generated with each f_compare variant).

sibstructureCustom - Specify a custom sibship population structure. See here for details.

which_sibstructure - Specify which sibship distributions (sibstructure) to test; 1=UK, 2=NS, 3=Custom. Defaults to c(1,2), the two in-built sibstructures. Further details about the preset options are provided in Figure S3 of the manuscript associated with this repository (4) and briefly here.

nameSibstructureCustom - If providing a custom sibstructure (see sibstructureCustom and which_sibstructure arguments), specify a string to use in naming the custom structure in the output plot. Defaults to "Custom sibstructure".

numsteps - Number of time points (ages) across which a person may become affected after time 0.

f_cohort_only - Logical, defaults to FALSE. Run the simulation with the f cohort only, generating RX for analysis with adpenetrance based on the proportion of the modelled states across families where f occurs. If TRUE, the f_compare and scale_novar arguments are ignored.

eldestAt0 - Logical, defaults to FALSE. Is passed to the genFamily subfunction. See Details and the genFamily documentation here.

stepHazard - Logical, defaults to FALSE. Is passed to the genFamily subfunction. See Details and the genFamily documentation here.

benchmark - Logical, defaults to FALSE. Set TRUE to return a benchmark estimate of the minimum time required for the simulation study given the current arguments. Benchmarks are very approximate.
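The family counts implied by numfamilies_var, scale_novar, and f_compare can be checked with simple arithmetic before running a simulation. The sketch below reproduces the 50,000-family example from the scale_novar description. The specific values of numfamilies_var and scale_novar are assumptions chosen so that their product is 50,000, and the even split across f_compare follows that example.

```r
# Illustrative inputs (assumed values, chosen to give a product of 50,000)
numfamilies_var <- 10000
scale_novar <- 5
f_compare <- c(0.2, 0.4, 0.6, 0.8, 1)

# Size of the comparator population without the variant of interest
n_novar <- numfamilies_var * scale_novar

# Families assigned to each competing variant, assuming an even split
n_per_compare <- n_novar / length(f_compare)

print(n_novar)        # 50000
print(n_per_compare)  # 10000
```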

Details

simADPenetrance was applied in the TimeSimulation*.R scripts available here. These illustrate several scenarios across which data may be sampled. Full descriptions of these scenarios are available in the associated manuscript (4).

Two of the simADPenetrance arguments are passed to subfunctions within:

eldestAt0: If set to TRUE, adjusts the family ages such that the eldest parent is assigned age 0 at the first time of sampling, where 0 is the last time point where no family members could be affected by disease. If FALSE, the default, the youngest family member is at age 0, and thus at the first time of sampling all other family members have some probability of being affected by disease. Setting eldestAt0=TRUE is not representative of a real population since all people have parents who may have been affected at an earlier time.

stepHazard: If FALSE, disease risks at each time point are determined using the affAtAge subfunction (see here). If TRUE, disease risk at the first time of sampling is determined using the affAtAge subfunction. Thereafter, cumulative risk across each subsequent time of sampling is determined based on the additional risk across the relevant ages of onset (determined according to numsteps, onsetRateDiff, and whether the individual has the variant). Setting stepHazard=TRUE is not recommended because higher ('familial'/'sporadic') disease state proportions are systematically underrepresented under the 'stepwise' approach.
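The effect of eldestAt0 on family ages can be pictured with a small sketch. This is only a conceptual illustration of the description above and is not the genFamily implementation: the relative_age helper, the family ages, and the use of arbitrary time units are all hypothetical.

```r
# Hypothetical ages of one family at the first time of sampling (arbitrary units)
ages <- c(parent1 = 62, parent2 = 59, child1 = 33, child2 = 30)

# Hypothetical helper: shift ages so that the chosen family member sits at 0
relative_age <- function(ages, eldestAt0 = FALSE) {
  anchor <- if (eldestAt0) max(ages) else min(ages)
  ages - anchor
}

# eldestAt0 = FALSE: the youngest member is at 0 and all others are past 0,
# so they already carry some probability of being affected
print(relative_age(ages, eldestAt0 = FALSE))

# eldestAt0 = TRUE: the eldest member is at 0 and nobody is yet past 0
print(relative_age(ages, eldestAt0 = TRUE))
```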

Output

A list of two elements:

$ggfigure

A ggplot indicating sampling time on the x-axis and divergence from true penetrance estimates on the y-axis. Plot panels stratify according to other options set. Columns are sibstructures modelled and penetrance error correction approach. Rows are disease state combinations modelled.

$results

A data frame storing the data visualised in the plot. Column headers for the data frame are:

Population: The modelled population, as defined in which_sibstructure

Tailoring: Whether penetrance estimates were adjusted using:

  • Poisson: the sibship distribution generated by default within the adpenetrance function.
  • Tailored: an approximation of the sibship distribution for the population modelled.
  • No correction: 'Step-3' penetrance estimates before adjustment by either Poisson or Tailored.

TruePenetrance: The true penetrance; each row has one of the values specified in the f argument

States.Modelled: One of the 5 valid disease state combinations; each row has one of the options set in the states argument

modif: Time from first sampling, starting from 0 until the youngest member of each simulated family has passed the timepoint indicated in the numsteps argument

Estimate.difference: Difference between true and estimated penetrance, averaged across number of repetitions for each value of f (as specified in amplify_f argument).
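A short sketch of how the $results data frame might be summarised is given below. The data frame here is a dummy stand-in built only from the column names documented above. Its values, including the "UK" population label, are placeholders and are not real simulation output.

```r
# Dummy stand-in for the $results element; all values are placeholders
results <- data.frame(
  Population = "UK",
  Tailoring = rep(c("Poisson", "Tailored", "No correction"), each = 2),
  TruePenetrance = 0.5,
  States.Modelled = "fsu",
  modif = rep(c(0, 1), times = 3),
  Estimate.difference = c(0.02, 0.01, 0.01, 0.00, 0.05, 0.04)
)

# Rows describing the first time of sampling only
first_sampling <- subset(results, modif == 0)

# Largest absolute divergence from true penetrance for each correction approach
results$abs_diff <- abs(results$Estimate.difference)
worst_case <- aggregate(abs_diff ~ Tailoring, data = results, FUN = max)

print(first_sampling)
print(worst_case)
```

With real output, the same steps would apply to the $results element of the list returned by simADPenetrance.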


References

  1. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  2. Wickham, H. (2007). Reshaping Data with the reshape Package. J Stat Softw. 21(12):1-20. doi: 10.18637/jss.v021.i12
  3. Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. J Stat Softw. 40(1):1-29. doi: 10.18637/jss.v040.i01
  4. Spargo, T. P., Opie-Martin, S., Bowles, H., Lewis, C. M., Iacoangeli, A., & Al-Chalabi, A. (2022). Calculating variant penetrance from family history of disease and average family size in population-scale data. Genome Medicine 14, 141. doi: 10.1186/s13073-022-01142-7
