Skip to content

TheoPannetier/DDvTDtools

Repository files navigation

DDvTDtools

Branch Travis Codecov
master Build Status codecov.io

An R package to compare diversity-dependent (DD) and time-dependent (TD) models of diversification.

How to use DDvTDtools

DDvTDtools is an R package accompanying the manuscript:

Pannetier, T., Martinez, C., Bunnefeld, L., and Etienne, R.S. Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification. Submitted

The package is a collection of functions and objects that were used to generate the data and figures presented in the article.

Note that the simulations and maximum likelihood optimisations call upon the integration of the DD master system introduced in Etienne et al. (2012), and this can take an awful lot of time. This is specially true for parameter settings with a long age (larger trees) and with extinction, and the computation is heavier for the time-dependent model. We performed these over 1000 trees each time, so if you only want to try the neat functions of the package (thanks for the interest!) you could run the functions introduced below for just a handful of trees.

Package installation

If you haven't done so already, install the DDvTDtools package from Github and the latest CRAN release of DDD.

install.packages("DDD")
devtools::install_github("TheoPannetier/DDvTDtools")

Set up directory structure

The package works from a specific relative path, and most functions expect to access data located in pre-defined directories. For example, DDvTDtools::read_sim() expects to find the simulation output in DDvTD/data/sim. The command below will set up the required directory structure in your working directory.

DDvTDtools::set_DDvTD_dir_struct()

Step 1 - Simulate phylogenetic trees

Diversity-dependent and time-dependent trees can be simulated with DDvTDtools::run_sim(), which is simply a wrapper around DDD::dd_sim() and DDD::td_sim() that standardizes the input and formats the output.

DDvTDtools::run_sim(
  sim = "DD",
  para = 1211,
  nb_trees = 1000
)

The first argument, sim refers to the name of the simulating model, either "DD" or "TD". The second argument, para, is a four-digit number that codes the values of the four parameters (crown age, baseline speciation rate, extinction rate and carrying capacity) used in the study. 1211 for example means

  • crown age = 5 myr
  • speciation rate = 0.8
  • extinction rate = 0,
  • carrying capacity = 40

(This is the fastest setting to simulate and run maximum likelihood optimisation on.)

Calling arg_para() will print the different values used through the study.

DDvTDtools::arg_para()
#> [1] 1211 1241 2211 2241 3211 3241 4211 4241

And what values they code can be found in the documentation

?DDvTDtools::arg_para()

It is also possible to directly translate the code into proper values:

DDvTDtools::para_to_pars(1211)
#> crown_age   lambda0       mu0         K 
#>       5.0       0.8       0.0      40.0

Simulated trees are stored in DDvTD/data/sim, and can be loaded with read_sim()

trees <- DDvTDtools::read_sim(sim = "DD", para = 1211)
trees[[1]][[1]]
#> 
#> Phylogenetic tree with 32 tips and 31 internal nodes.
#> 
#> Tip labels:
#>  t1, t29, t28, t14, t5, t12, ...
#> Node labels:
#>  I31, I10, I3, I2, I1, I9, ...
#> 
#> Rooted; includes branch lengths.

Step 2 - Fit the models on simulated trees

Once phylogenetic trees have been simulated, each model should be fitted to each type of trees. This is done by DDD::dd_ML() and DDD::bd_ML(), but again, we have used a wrapper to standardise the input and output.

DDvTDtools::run_optim(
  sim = "DD", 
  para = 1211, 
  optim = "DD"
)
DDvTDtools::run_optim(
  sim = "DD", 
  para = 1211, 
  optim = "TD"
)

Argument optim is equivalent to argument sim introduced above. run_optim() will fetch the simulated trees from DDvTD/data/sim and save its output in DDvTD/data/optim. You can load these tables with

df <- DDvTDtools::read_optim_results(
  sim = "DD",
  para = 1211,
  optim = "TD",
  init_k = "true_k"
)
head(df)
#>   sim ntips crown_age true_lambda0 true_mu0 true_K mc optim init_lambda0
#> 1  DD    32         5          0.8        0     40  1    TD          0.8
#> 2  DD     2         5          0.8        0     40  2    TD          0.8
#> 3  DD    25         5          0.8        0     40  3    TD          0.8
#> 4  DD    24         5          0.8        0     40  4    TD          0.8
#> 5  DD    25         5          0.8        0     40  5    TD          0.8
#> 6  DD    13         5          0.8        0     40  6    TD          0.8
#>   init_mu0 init_K    loglik       AIC lambda0_ML       mu0_ML       K_ML
#> 1        0     40 -56.88745 119.77490  4.4302438 3.289253e-01  28.698403
#> 2        0     40   0.00000   6.00000  3.3358413 1.282290e-19   1.863737
#> 3        0     40 -41.37938  88.75875  1.2178454 2.230156e-01  29.258474
#> 4        0     40 -39.21540  84.43081  0.7200940 5.308302e-10  53.757649
#> 5        0     40 -40.47115  86.94230  1.0693765 1.881130e-09  55.000372
#> 6        0     40 -17.13625  40.27250  0.9201438 8.264041e-01 899.988625
#>   hasConverged numCycles methode optimmethod   jobID
#> 1         TRUE         1   ode45     subplex 5018656
#> 2         TRUE       Inf   ode45     subplex 4929384
#> 3         TRUE       Inf   ode45     subplex 4929384
#> 4         TRUE       Inf   ode45     subplex 4929384
#> 5         TRUE       Inf   ode45     subplex 4929384
#> 6         TRUE         5   ode45     subplex 5073785

The structure of a results data frame with a description of all the variables can be found in the documentation of DDvTDtools::results_optim_struct(). init_k denotes which value of parameter K was used to initialise the maximum likelihood optimisation. The default value, true_k, means the true value, i.e. 40. The alternative is from_n, where K was instead set to the number of tips in the tree. This is for some settings which produced large trees, for which the likelihood optimisation proved particularly tedious. The values of init_k used for each setting can be called with:

DDvTDtools::get_init_k()
#>     1211     1241     2211     2241     3211     3241     4211     4241 
#> "true_k" "true_k" "true_k" "true_k" "from_n" "true_k" "from_n" "from_n"

Optimisation using init_k = from_n is run with another function, which calls run_optim()

DDvTDtools::run_optim_from_n(
  sim = "DD", 
  para = 1211, 
  optim = "TD"
)

Step 3 - Plot the results

Fig. 1 was produced from randomly sampled numbers, and I am not reporting it here.

The plots below draw results from both models fit to both type of trees, and so require that the four combinations of sim and optim for a single value of para are present in DDvTD\data\optim.

Fig. 2 - Average lineage-through-time plots

DDvTDtools::plot_ltt_nested(para = 1211)
#> Registered S3 method overwritten by 'geiger':
#>   method            from
#>   unique.multiPhylo ape

Fig. 3 - log-likelihood ratio distributions

DDvTDtools::plot_lr(para = 1211)

About

An R package to compare diversity-dependent (DD) and time-dependent (TD) models of diversification.

Resources

License

Stars

Watchers

Forks

Packages

No packages published