Skip to content

Commit

Permalink
Merge pull request #4 from brpetrucci/dev_traits
Browse files Browse the repository at this point in the history
Merge dev_traits with development
  • Loading branch information
brpetrucci committed Jul 6, 2023
2 parents 9c487ec + 1056e5b commit e2c2851
Show file tree
Hide file tree
Showing 21 changed files with 3,734 additions and 195 deletions.
17 changes: 0 additions & 17 deletions CITATION.cff

This file was deleted.

3 changes: 3 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ S3method(print,sim)
S3method(summary,sim)
S3method(tail,sim)
export(bd.sim)
export(bd.sim.traits)
export(binner)
export(draw.sim)
export(find.lineages)
Expand All @@ -15,7 +16,9 @@ export(make.rate)
export(phylo.to.sim)
export(rexp.var)
export(sample.clade)
export(sample.clade.traits)
export(sim.counts)
export(traits.summary)
export(var.rate.div)
import(stats)
importFrom(grDevices,col2rgb)
Expand Down
224 changes: 131 additions & 93 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,36 +1,75 @@

<!-- NEWS.md is generated from NEWS.Rmd. Please edit that file -->

# paleobuddy 1.0.0.9000
# paleobuddy 1.1.0.0

## Changes to vignettes

- `overview`: the Per Capita method estimation was changed to have a
more accurate estimate.
- `overview`: a complex example was added to the end to respond to
comments from a reviewer in the manuscript.
- `overview`: the Per Capita method estimation was changed to have a
more accurate estimate.
- `overview`: a complex example was added to the end to respond to
comments from a reviewer in the manuscript.

## Simulating to a number of extant species

- `bd.sim` now allows for simulation up to a number of species as well
as the original implementation simulating up to a certain time. To
choose which realization of a certain number of species we choose, we
use the method proposed by Stadler 2011 (see `bd.sim` documentation
for full reference), whereby we simulate to a much larger number of
species, then go back to check which periods had the desired number,
and choose one through weighted sampling using the amount of time
spent in each as the weights. Finally, we sample a uniform
distribution to decide the length of the final period between when the
species number was achieve and the end of the simulation.

## Trait-dependent dynamics

Added functions to simulate trait evolution and trait-dependent birth
death models, in particular State Speciation and Extinction (SSE)
models.

- `rexp.traits` is a particular case of `rexp.var` when the rate
varies with a discrete trait. Implemented to make the search for a
waiting time in this simpler case (i.e. when the rate is a step
function) more efficient.
- `bd.sim.musse` simulates species diversification following a MuSSE
model, where traits evolve from an Mk model and change speciation
and/or extinction in a discrete fashion. Allows for the simulation
of multiple traits, though currently the rates can only depend on
one of them.
death models, in particular State Speciation and Extinction (SSE) models
BiSSE, MuSSE, and QuaSSE. See documentation for full reference.

- `rexp.musse` is a particular case of `rexp.var` when the rate varies
with a discrete trait. Implemented to make the search for a waiting
time in this simpler case (i.e. when the rate is a step function) more
efficient.
- `bd.sim.traits` simulates species diversification following a MuSSE
model, where traits evolve from an Mk model and change speciation
and/or extinction in a discrete fashion. Allows for the simulation of
multiple traits, though currently the rates can only depend on one of
them (each rate can depend on a different trait, though). Can simulate
HiSSE as well, by setting the `nHidden` parameter, representing the
number of hidden states, to something higher than 1. Can set separate
number of states (observed or hidden), initial trait values, and
transition matrices for each trait. See the now expanded `?bd.sim` for
details. In the future, other trait-dependent diversification models
will be implemented, including relaxing the assumption of discrete
traits.
- `sample.clade.traits` simulates state-dependent fossil sampling,
following a similar algorithm as `bd.sim.traits`. It allows for a
hidden state model as well, but note that trait information (usually
coming from `bd.sim.traits`) needs to include all states as observed
for that (see examples in `?sample.clade.traits` and vignettes for
details).

## Adding sampled ancestors to a phylogenetic tree

- `make.phylo` now allows for an optional `fossils` input representing a
fossil record, and will then add these fossils to the tree as sampled
ancestors. These SAs are added as 0-length branches if the `saFormat`
input is set to `"branch"`, or as degree-2 nodes if it is set to
`"node"`. The `returnTrueExt` input optionally drops the tip
representing the true extinction time of a species, and make the last
sampled fossil of that species the fossil tip, if set to `FALSE`. Note
that this last functionality required a limited version of the
`drop.tip` function of the `APE` package to be copied. See
`?make.phylo` for reference and credits.

## Simple fixes

- `sample.clade.R`: added a small bit on help page to explain the
complication of using `adFun` with extant species.
- `make.phylo.R`: corrected bug in node labels.
- `sample.clade.R`: added a small bit on help page to explain the
complication of using `adFun` with extant species.
- `make.phylo.R`: corrected bug in node labels.

# paleobuddy 1.0.0

Expand All @@ -41,67 +80,66 @@ will be filled as new features and fixes are implemented.

## Main functions

- `rexp.var` generalizes `rexp` to take any function of time as a
rate. Also allows for a `shape` parameter, in which case it
similarly generalizes `rweibull`.
- `bd.sim` simulates species diversification with high flexibility in
allowed speciation and extinction rate scenarios. Produces a `sim`
object.
- `sample.clade` simulates fossil sampling. Similar flexibility to
`bd.sim`, though even more so in the case of age-dependent rates.
- `make.phylo` creates a bifurcating phylogenetic tree as a`phylo`
object (see [APE](https://CRAN.R-project.org/package=ape)) from a
`sim` object. Can take fossils to be added as length `0` branches.
- `draw.sim` plots a `sim` by drawing species durations and
relationships, and optionally adding fossils as time points or
ranges.
- `rexp.var` generalizes `rexp` to take any function of time as a rate.
Also allows for a `shape` parameter, in which case it similarly
generalizes `rweibull`.
- `bd.sim` simulates species diversification with high flexibility in
allowed speciation and extinction rate scenarios. Produces a `sim`
object.
- `sample.clade` simulates fossil sampling. Similar flexibility to
`bd.sim`, though even more so in the case of age-dependent rates.
- `make.phylo` creates a bifurcating phylogenetic tree as a`phylo`
object (see [APE](https://CRAN.R-project.org/package=ape)) from a
`sim` object. Can take fossils to be added as length `0` branches.
- `draw.sim` plots a `sim` by drawing species durations and
relationships, and optionally adding fossils as time points or ranges.

## Secondary functions

- `find.lineages` creates subsets of a `sim` defined as the clades
descended from one or more species present in the simulation. Needed
to e.g. generate phylogenetic trees from simulations with more than
one starting species.
- `make.rate` creates a purely time-dependent (or constant) rate based
on optional inputs. Used internally to allow for users to define
rate scenarios easily in `bd.sim`.
- `phylo.to.sim` creates a `sim` object from a `phylo` object,
provided the user makes choices to solve ambiguities on bifurcating
phylogenetic trees.
- `var.rate.div` calculates expected diversity for a given
diversification rate and set of times. Useful for testing `bd.sim`
and planning rate scenarios.
- `find.lineages` creates subsets of a `sim` defined as the clades
descended from one or more species present in the simulation. Needed
to e.g. generate phylogenetic trees from simulations with more than
one starting species.
- `make.rate` creates a purely time-dependent (or constant) rate based
on optional inputs. Used internally to allow for users to define rate
scenarios easily in `bd.sim`.
- `phylo.to.sim` creates a `sim` object from a `phylo` object, provided
the user makes choices to solve ambiguities on bifurcating
phylogenetic trees.
- `var.rate.div` calculates expected diversity for a given
diversification rate and set of times. Useful for testing `bd.sim` and
planning rate scenarios.

## S3 classes

- `sim` a class returned by `bd.sim` and used as an input for many
functions in the package. Formally, it is a named list of vectors
recording speciation time, extinction time, status (extant or
extinct), and parent information for each lineage in the simulation.
It contains the following methods: \*\* `print` gives some quick
details about number of extant and total species, and the first few
members of each vector. \*\* `head` and `tail` return the `sim`
object containing only a given number of species from the beginning
and end of its vectors, respectively. \*\* `summary` gives
quantitative details, e.g. quantiles of durations and speciation
waiting times. \*\* `plot` plots lineages through time (LTT) plots
for births, deaths, and diversity. \*\* `sim.counts` counts numbers
of births, deaths, and diversity for some given time `t`. \*\*
`is.sim` checks the object is a valid `sim` object. Used internally
for error checking.
- `sim` a class returned by `bd.sim` and used as an input for many
functions in the package. Formally, it is a named list of vectors
recording speciation time, extinction time, status (extant or
extinct), and parent information for each lineage in the simulation.
It contains the following methods: \*\* `print` gives some quick
details about number of extant and total species, and the first few
members of each vector. \*\* `head` and `tail` return the `sim` object
containing only a given number of species from the beginning and end
of its vectors, respectively. \*\* `summary` gives quantitative
details, e.g. quantiles of durations and speciation waiting times.
\*\* `plot` plots lineages through time (LTT) plots for births,
deaths, and diversity. \*\* `sim.counts` counts numbers of births,
deaths, and diversity for some given time `t`. \*\* `is.sim` checks
the object is a valid `sim` object. Used internally for error
checking.

## Data

- `temp` temperature data during the Cenozoic. Modified from
[RPANDA](https://CRAN.R-project.org/package=RPANDA).
- `co2` CO2 data during the Jurassic. Modified from
[RPANDA](https://CRAN.R-project.org/package=RPANDA).
- `temp` temperature data during the Cenozoic. Modified from
[RPANDA](https://CRAN.R-project.org/package=RPANDA).
- `co2` CO2 data during the Jurassic. Modified from
[RPANDA](https://CRAN.R-project.org/package=RPANDA).

## Vignettes

- `overview` gives a reasonably in depth look at the main features of
the package, including examples of workflows using most available
rate scenarios, and examples of applications.
- `overview` gives a reasonably in depth look at the main features of
the package, including examples of workflows using most available rate
scenarios, and examples of applications.

## Notes

Expand All @@ -120,27 +158,27 @@ the literature.

## Known issues

- The `integrate` function occasionally fails when given functions
that vary suddenly and rashly, usually happening in the case of
environmentally-dependent rates. I have tracked this error down to
numerical problems in `integrate`, and testing seems to indicate the
error does not prevent `integrate` from finding the correct result.
As such this is not currently something I intend to fix, though if
issues are found that indicate this could be a `paleobuddy` problem,
not an `integrate` problem, that could change.

- `paleobuddy` is the first package to implement time-dependent
parameters for Weibull-distributed waiting times. Since the authors
currently are not aware of an analytical solution to important
quantities in the BD process in this case, it is challenging to test
exactly. Simulation tests indicate pretty strongly that the
algorithm works, however, with one exception—in cases where shape is
time-dependent and varies dramatically, especially when close to
`0`, `rexp.var` seems to have a hard time finding the correct
waiting time distribution. When maintained within levels generally
accepted as sensible throughout the literature—around 0.8 to 3,
say—, and even a reasonable amount outside of that range, tests
indicate the algorithm functions as it should. A `testthat` routine
will be implemented in the future to formalize these claims, and
this issue is one I plan to work on soon, especially if users report
it as more prevalent than I thought.
- The `integrate` function occasionally fails when given functions that
vary suddenly and rashly, usually happening in the case of
environmentally-dependent rates. I have tracked this error down to
numerical problems in `integrate`, and testing seems to indicate the
error does not prevent `integrate` from finding the correct result. As
such this is not currently something I intend to fix, though if issues
are found that indicate this could be a `paleobuddy` problem, not an
`integrate` problem, that could change.

- `paleobuddy` is the first package to implement time-dependent
parameters for Weibull-distributed waiting times. Since the authors
currently are not aware of an analytical solution to important
quantities in the BD process in this case, it is challenging to test
exactly. Simulation tests indicate pretty strongly that the algorithm
works, however, with one exception—in cases where shape is
time-dependent and varies dramatically, especially when close to `0`,
`rexp.var` seems to have a hard time finding the correct waiting time
distribution. When maintained within levels generally accepted as
sensible throughout the literature—around 0.8 to 3, say—, and even a
reasonable amount outside of that range, tests indicate the algorithm
functions as it should. A `testthat` routine will be implemented in
the future to formalize these claims, and this issue is one I plan to
work on soon, especially if users report it as more prevalent than I
thought.
55 changes: 47 additions & 8 deletions R/bd.sim.R
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,12 @@
#' origin. Any species still living after \code{tMax} is considered extant, and
#' any species that would be generated after \code{tMax} is not present in the
#' return.
#'
#' @param N Number of species at the end of the simulation. End of the
#' simulation will be set for one of the times where \code{N} species are
#' alive, chosen from all the times there were \code{N} species alive weighted
#' by how long the simulation was in that situation. Exactly one of \code{tMax}
#' and \code{N} must be non-\code{Inf}.
#'
#' @param lShape Shape parameter defining the degree of age-dependency in
#' speciation rate. This will be equal to the shape parameter in a Weibull
Expand Down Expand Up @@ -127,6 +133,11 @@
#' \code{?sim}.
#'
#' @author Bruno do Rosario Petrucci.
#'
#' @references
#'
#' Stadler T. 2011. Simulating Trees with a Fixed Number of Extant Species.
#' Systematic Biology. 60(5):676-684.
#'
#' @examples
#'
Expand Down Expand Up @@ -161,6 +172,33 @@
#' }
#'
#' ###
#' # if we want, we can simulate up to a number of species instead
#'
#' # initial number of species
#' n0 <- 1
#'
#' # maximum simulation time
#' N <- 10
#'
#' # speciation
#' lambda <- 0.11
#'
#' # extinction
#' mu <- 0.08
#'
#' # set a seed
#' set.seed(1)
#'
#' # run the simulation, making sure we have more than 1 species in the end
#' sim <- bd.sim(n0, lambda, mu, N = N)
#'
#' # we can plot the phylogeny to take a look
#' if (requireNamespace("ape", quietly = TRUE)) {
#' phy <- make.phylo(sim)
#' ape::plot.phylo(phy)
#' }
#'
#' ###
#' # now let us complicate speciation more, maybe a linear function
#'
#' # initial number of species
Expand Down Expand Up @@ -604,12 +642,13 @@
#' @rdname bd.sim
#' @export

bd.sim <- function(n0, lambda, mu, tMax,
lShape = NULL, mShape = NULL,
envL = NULL, envM = NULL,
lShifts = NULL, mShifts = NULL,
nFinal = c(0, Inf), nExtant = c(0, Inf),
trueExt = FALSE) {
bd.sim <- function(n0, lambda, mu,
tMax = Inf, N = Inf,
lShape = NULL, mShape = NULL,
envL = NULL, envM = NULL,
lShifts = NULL, mShifts = NULL,
nFinal = c(0, Inf), nExtant = c(0, Inf),
trueExt = FALSE) {

# if we have ONLY numbers for lambda and mu, it is constant
if ((is.numeric(lambda) & length(lambda) == 1) &
Expand All @@ -619,7 +658,7 @@ bd.sim <- function(n0, lambda, mu, tMax,
m <- mu

# call bd.sim.constant
return(bd.sim.constant(n0, l, m, tMax, nFinal, nExtant, trueExt))
return(bd.sim.constant(n0, l, m, tMax, N, nFinal, nExtant, trueExt))
}

# else it is not constant
Expand All @@ -631,7 +670,7 @@ bd.sim <- function(n0, lambda, mu, tMax,
m <- make.rate(mu, tMax, envM, mShifts)

# call bd.sim.general
return(bd.sim.general(n0, l, m, tMax, lShape, mShape,
return(bd.sim.general(n0, l, m, tMax, N, lShape, mShape,
nFinal, nExtant, trueExt))
}
}
Expand Down
Loading

0 comments on commit e2c2851

Please sign in to comment.