An R package to compare diversity-dependent (DD) and time-dependent (TD) models of diversification.
DDvTDtools is an R package accompanying the manuscript:
Pannetier, T., Martinez, C., Bunnefeld, L., and Etienne, R.S. Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification. Submitted
The package is a collection of functions and objects that were used to generate the data and figures presented in the article.
Note that the simulations and maximum likelihood optimisations rely on the numerical integration of the DD master system introduced in Etienne et al. (2012), and this can take a very long time. This is especially true for parameter settings with an old crown age (larger trees) and with extinction, and the computation is heavier for the time-dependent model. We ran these over 1000 trees for each setting, so if you only want to try the neat functions of the package (thanks for the interest!), you can run the functions introduced below on just a handful of trees.
If you haven't done so already, install the DDvTDtools package from GitHub and the latest CRAN release of DDD.
install.packages("DDD")
devtools::install_github("TheoPannetier/DDvTDtools")
The package works from a specific relative path, and most functions expect to access data located in pre-defined directories. For example, DDvTDtools::read_sim() expects to find the simulation output in DDvTD/data/sim. The command below will set up the required directory structure in your working directory.
DDvTDtools::set_DDvTD_dir_struct()
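To give an idea of the layout involved, here is a minimal base R sketch that creates only the two directories named in this README (DDvTD/data/sim and DDvTD/data/optim). It is not the implementation of set_DDvTD_dir_struct(), which may create additional directories; the scratch location `root` is an assumption made so the example does not touch your working directory.

```r
# Illustrative stand-in, not the package's own code.
# `root` is a hypothetical scratch location under the R temporary directory.
root <- file.path(tempdir(), "DDvTD_demo")

# Only the two paths mentioned in this README are created here.
data_dirs <- file.path(root, "DDvTD", "data", c("sim", "optim"))
for (d in data_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Check that both directories now exist
dir.exists(data_dirs)
```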
Diversity-dependent and time-dependent trees can be simulated with DDvTDtools::run_sim(), which is simply a wrapper around DDD::dd_sim() and DDD::td_sim() that standardizes the input and formats the output.
DDvTDtools::run_sim(
sim = "DD",
para = 1211,
nb_trees = 1000
)
The first argument, sim, refers to the name of the simulating model, either "DD" or "TD". The second argument, para, is a four-digit number that codes the values of the four parameters (crown age, baseline speciation rate, extinction rate and carrying capacity) used in the study. For example, 1211 means:
- crown age = 5 myr
- speciation rate = 0.8
- extinction rate = 0
- carrying capacity = 40
(This is the fastest setting to simulate and run maximum likelihood optimisation on.)
Calling arg_para() will print the different values used throughout the study.
DDvTDtools::arg_para()
#> [1] 1211 1241 2211 2241 3211 3241 4211 4241
The values they encode are described in the documentation:
?DDvTDtools::arg_para()
It is also possible to directly translate the code into proper values:
DDvTDtools::para_to_pars(1211)
#> crown_age lambda0 mu0 K
#> 5.0 0.8 0.0 40.0
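As a rough illustration of how such a code is organised, the sketch below splits a four-digit code into one digit per parameter, in the order given above. `split_para()` is a hypothetical helper, not a DDvTDtools function, and it returns the digits only: the mapping from each digit to an actual parameter value is defined by the package (see para_to_pars() and the arg_para() documentation) and is not reproduced here.

```r
# Hypothetical helper (not part of DDvTDtools): split a four-digit code
# into one digit per parameter, in the order used in this README.
split_para <- function(para) {
  digits <- as.integer(strsplit(as.character(para), "")[[1]])
  stopifnot(length(digits) == 4)
  names(digits) <- c("crown_age", "lambda0", "mu0", "K")
  digits
}

# Digits of the code used throughout the examples
split_para(1211)
```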
Simulated trees are stored in DDvTD/data/sim and can be loaded with read_sim():
trees <- DDvTDtools::read_sim(sim = "DD", para = 1211)
trees[[1]][[1]]
#>
#> Phylogenetic tree with 32 tips and 31 internal nodes.
#>
#> Tip labels:
#> t1, t29, t28, t14, t5, t12, ...
#> Node labels:
#> I31, I10, I3, I2, I1, I9, ...
#>
#> Rooted; includes branch lengths.
Once phylogenetic trees have been simulated, each model should be fitted to each type of tree. This is done by DDD::dd_ML() and DDD::bd_ML(), but again, we have used a wrapper to standardise the input and output.
DDvTDtools::run_optim(
sim = "DD",
para = 1211,
optim = "DD"
)
DDvTDtools::run_optim(
sim = "DD",
para = 1211,
optim = "TD"
)
Argument optim takes the same values as argument sim introduced above. run_optim() will fetch the simulated trees from DDvTD/data/sim and save its output in DDvTD/data/optim. You can load these tables with
df <- DDvTDtools::read_optim_results(
sim = "DD",
para = 1211,
optim = "TD",
init_k = "true_k"
)
head(df)
#> sim ntips crown_age true_lambda0 true_mu0 true_K mc optim init_lambda0
#> 1 DD 32 5 0.8 0 40 1 TD 0.8
#> 2 DD 2 5 0.8 0 40 2 TD 0.8
#> 3 DD 25 5 0.8 0 40 3 TD 0.8
#> 4 DD 24 5 0.8 0 40 4 TD 0.8
#> 5 DD 25 5 0.8 0 40 5 TD 0.8
#> 6 DD 13 5 0.8 0 40 6 TD 0.8
#> init_mu0 init_K loglik AIC lambda0_ML mu0_ML K_ML
#> 1 0 40 -56.88745 119.77490 4.4302438 3.289253e-01 28.698403
#> 2 0 40 0.00000 6.00000 3.3358413 1.282290e-19 1.863737
#> 3 0 40 -41.37938 88.75875 1.2178454 2.230156e-01 29.258474
#> 4 0 40 -39.21540 84.43081 0.7200940 5.308302e-10 53.757649
#> 5 0 40 -40.47115 86.94230 1.0693765 1.881130e-09 55.000372
#> 6 0 40 -17.13625 40.27250 0.9201438 8.264041e-01 899.988625
#> hasConverged numCycles methode optimmethod jobID
#> 1 TRUE 1 ode45 subplex 5018656
#> 2 TRUE Inf ode45 subplex 4929384
#> 3 TRUE Inf ode45 subplex 4929384
#> 4 TRUE Inf ode45 subplex 4929384
#> 5 TRUE Inf ode45 subplex 4929384
#> 6 TRUE 5 ode45 subplex 5073785
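The loglik and AIC columns in the output above appear to be related by the usual formula AIC = 2k - 2 loglik. The short check below uses the six rows printed above and assumes k = 3 free parameters (lambda0, mu0 and K); this value of k is inferred from the printed numbers, not taken from the package documentation.

```r
# loglik and AIC values copied from the head(df) output above
loglik <- c(-56.88745, 0.00000, -41.37938, -39.21540, -40.47115, -17.13625)
aic_reported <- c(119.77490, 6.00000, 88.75875, 84.43081, 86.94230, 40.27250)

# Assumption: three free parameters (lambda0, mu0, K)
n_pars <- 3
aic_computed <- 2 * n_pars - 2 * loglik

# Largest discrepancy with the printed values (expected to be rounding only)
max(abs(aic_computed - aic_reported))
```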
The structure of a results data frame, with a description of all the variables, can be found in the documentation of DDvTDtools::results_optim_struct(). init_k denotes which value of parameter K was used to initialise the maximum likelihood optimisation. The default value, true_k, means the true value, i.e. 40. The alternative is from_n, where K is instead set to the number of tips in the tree. This was used for some settings that produced large trees, for which the likelihood optimisation proved particularly tedious. The values of init_k used for each setting can be called with:
DDvTDtools::get_init_k()
#> 1211 1241 2211 2241 3211 3241 4211 4241
#> "true_k" "true_k" "true_k" "true_k" "from_n" "true_k" "from_n" "from_n"
Optimisation using init_k = from_n is run with another function, which calls run_optim():
DDvTDtools::run_optim_from_n(
sim = "DD",
para = 1211,
optim = "TD"
)
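To see at a glance which settings call for run_optim_from_n(), the get_init_k() output shown earlier can be filtered. The sketch below hard-codes that output as a named vector so it runs without the package; with DDvTDtools installed, the vector would presumably come from DDvTDtools::get_init_k() instead.

```r
# Values copied from the get_init_k() output shown above
init_k <- c(
  "1211" = "true_k", "1241" = "true_k", "2211" = "true_k", "2241" = "true_k",
  "3211" = "from_n", "3241" = "true_k", "4211" = "from_n", "4241" = "from_n"
)

# Parameter codes whose optimisation was initialised from the number of tips
from_n_settings <- as.integer(names(init_k)[init_k == "from_n"])
from_n_settings
```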
Fig. 1 was produced from randomly sampled numbers, and I am not reporting it here.
The plots below draw on results from both models fitted to both types of trees, and so require that the four combinations of sim and optim for a single value of para are present in DDvTD/data/optim.
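One way to enumerate the four required combinations is sketched below. It only builds the run_optim() calls as text, using the argument names shown earlier in this README, because actually running them is slow and needs the simulated trees on disk. The variable names `combos` and `calls` are illustrative.

```r
# All four combinations of simulating model and fitted model
combos <- expand.grid(
  sim = c("DD", "TD"),
  optim = c("DD", "TD"),
  stringsAsFactors = FALSE
)

para <- 1211L

# Build the calls as text only; nothing is executed here
calls <- sprintf(
  'DDvTDtools::run_optim(sim = "%s", para = %d, optim = "%s")',
  combos$sim, para, combos$optim
)
writeLines(calls)
```

Each line printed can then be run in turn, or swapped for run_optim_from_n() for the settings initialised with from_n.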
Fig. 2 - Average lineage-through-time plots
DDvTDtools::plot_ltt_nested(para = 1211)
#> Registered S3 method overwritten by 'geiger':
#> method from
#> unique.multiPhylo ape
Fig. 3 - log-likelihood ratio distributions
DDvTDtools::plot_lr(para = 1211)
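For readers who want the numbers behind such a plot, the sketch below computes a per-tree log-likelihood ratio from two results tables. It is not the code of plot_lr(): `loglik_ratio()` is a hypothetical helper, the sign convention (DD minus TD) is an assumption, and `mc` is assumed to index the same tree in both tables. The toy data frames stand in for the output of read_optim_results() and are not results from the study.

```r
# Hypothetical helper: per-tree log-likelihood ratio between two fits,
# matched on the tree index column `mc` (assumed shared by both tables).
loglik_ratio <- function(fit_dd, fit_td) {
  merged <- merge(
    fit_dd[, c("mc", "loglik")],
    fit_td[, c("mc", "loglik")],
    by = "mc",
    suffixes = c("_DD", "_TD")
  )
  # Assumed convention: log-likelihood of the DD fit minus that of the TD fit
  data.frame(mc = merged$mc, lr = merged$loglik_DD - merged$loglik_TD)
}

# Toy values for illustration only, not results from the study
fit_dd <- data.frame(mc = 1:3, loglik = c(-10, -12, -8))
fit_td <- data.frame(mc = 1:3, loglik = c(-11, -12, -7.5))

loglik_ratio(fit_dd, fit_td)
```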