Skip to content
Permalink
Browse files
Merge pull request #4 from bjoelle/master
Added section on sampling fossil ages with fully extinct clades
  • Loading branch information
tgvaughan committed Mar 16, 2020
2 parents 227c734 + 9858113 commit ce37fd46d0ae6cfb6222be1fc0274b5f2f2ec799
Show file tree
Hide file tree
Showing 2 changed files with 160 additions and 0 deletions.
108 main.tex
@@ -1586,6 +1586,114 @@ \section{More with XML -- changing the FBD parameterization}

Note that we have changed the priors to Exponential(1.0) for the death rate and Exponential(0.5) for the sampling rate.

\bigskip
\section{More with XML -- fully extinct clades}

We have seen in this tutorial how to co-estimate the age of the fossils with the rest of the MCMC. This can also be done for fully extinct clades, i.e. clades with no extant samples, however it is more tricky. The reason for that is that BEAST2 will always consider the youngest sample to be at age 0. This is not an issue if fossil ages are fixed, as you just need to record the age of the youngest sample as offset. It becomes a problem when we want to estimate fossil ages, as the offset may change from sample to sample, and the youngest fossil may not even be the same throughout the chain.

To fix this issue, the SA package contains a "TreeWOffset" object, which allows a tree to be recorded along with its offset. This object is not accessible through the BEAUti interface, so using it requires editing the XML file.

The first step is to locate the part of the XML where the FBD process is defined, shown in Box 7.

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<distribution id="FBD.t:bearsTree" spec="beast.evolution.speciation.SABirthDeathModel" conditionOnRhoSampling="true" diversificationRate="@birthRateFBD.t:bearsTree" origin="@originFBD.t:bearsTree" samplingProportion="@samplingRateFBD.t:bearsTree" tree="@Tree.t:bearsTree" turnover="@deathRateFBD.t:bearsTree">
<parameter id="rFBD.t:bearsTree" lower="0.0" name="removalProbability" upper="1.0">0.0</parameter>
<parameter id="rhoFBD.t:bearsTree" estimate="false" lower="0.0" name="rho" upper="1.0">1.0</parameter>
</distribution>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 7: BEAST 2 XML specification of the FBD process (old).\\
\end{center}

We need to add our treeWOffset object to the distribution, and connect it to the bears tree object. The "offset" parameter should be set to the starting age of the youngest fossil. Since this is an example and the bears dataset contains extant species, we will set the offset to $0$. The new FBD process specification is shown in Box 8. Note that we have not removed the "tree="@Tree.t:bearsTree" parameter from the FBD process, as this input is required.

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<distribution id="FBD.t:bearsTree" spec="beast.evolution.speciation.SABirthDeathModel" conditionOnRhoSampling="true" diversificationRate="@birthRateFBD.t:bearsTree" origin="@originFBD.t:bearsTree" samplingProportion="@samplingRateFBD.t:bearsTree" tree="@Tree.t:bearsTree" turnover="@deathRateFBD.t:bearsTree">
<parameter id="rFBD.t:bearsTree" lower="0.0" name="removalProbability" upper="1.0">0.0</parameter>
<parameter id="rhoFBD.t:bearsTree" estimate="false" lower="0.0" name="rho" upper="1.0">1.0</parameter>
<treeWOffset id="treeWOffset:bearsTree" spec="beast.evolution.tree.TreeWOffset" tree="@Tree.t:bearsTree" offset="0.0"/>
</distribution>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 8: BEAST 2 XML specification of the FBD process (new).\\
\end{center}

The next step is to update the "SampledNodeDateRandomWalker" operators to use our tree with offset as input. An example is shown in Boxes 9 and 10.

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<operator id="tipDatesSampler.Arctodus_simus" spec="SampledNodeDateRandomWalker" taxonset="@Arctodus_simus" tree="@Tree.t:bearsTree" weight="1.0" windowSize="1.0"/>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 9: BEAST 2 XML specification of a SampledNodeDateRandomWalker operator (old).\\
\end{center}

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<operator id="tipDatesSampler.Arctodus_simus" spec="SampledNodeDateRandomWalker" taxonset="@Arctodus_simus" tree="@Tree.t:bearsTree" treeWOffset="@treeWOffset:bearsTree" weight="1.0" windowSize="1.0"/>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 10: BEAST 2 XML specification of a SampledNodeDateRandomWalker operator (new).\\
\end{center}

Finally, we will add a logger to the trace log, shown in Box 11, in order to record the value of the offset at each sample of the chain.

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<logger id="tracelog" fileName="bearsDivtime_FBD.1.log" logEvery="1000" model="@posterior" sanitiseHeaders="true" sort="smart">
<log idref="posterior"/>
<log idref="likelihood"/>
<log idref="prior"/>
<log idref="treeLikelihood.bears_irbp_fossils"/>
<log idref="treeLikelihood.bears_cytb_fossils"/>
<log id="TreeHeight.t:bearsTree" spec="beast.evolution.tree.TreeHeightLogger" tree="@Tree.t:bearsTree"/>
<log idref="mutationRate.s:bears_cytb_fossils"/>
[...]
<log idref="Zaragocyon_daamsi.prior"/>
<log idref="freqParameter.s:bears_irbp_fossils"/>
<log idref="freqParameter.s:bears_cytb_fossils"/>
</logger>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 11: BEAST 2 XML specification of the trace log (old).\\
\end{center}

The new trace log is shown in Box 12.

\begin{center}
{\tt \scriptsize{\begin{minipage}{6.5in}
\begin{lstlisting}[language=XML]
<logger id="tracelog" fileName="bearsDivtime_FBD.1.log" logEvery="1000" model="@posterior" sanitiseHeaders="true" sort="smart">
<log idref="posterior"/>
<log idref="likelihood"/>
<log idref="prior"/>
<log idref="treeLikelihood.bears_irbp_fossils"/>
<log idref="treeLikelihood.bears_cytb_fossils"/>
<log id="TreeHeight.t:bearsTree" spec="beast.evolution.tree.TreeHeightLogger" tree="@Tree.t:bearsTree"/>
<log idref="mutationRate.s:bears_cytb_fossils"/>
[...]
<log idref="Zaragocyon_daamsi.prior"/>
<log idref="freqParameter.s:bears_irbp_fossils"/>
<log idref="freqParameter.s:bears_cytb_fossils"/>
<log id="offset" spec="beast.evolution.tree.OffsetLogger" treeWOffset="@treeWOffset:bearsTree"/>
</logger>
\end{lstlisting}
\end{minipage}}}\\\vspace{4mm}
Box 12: BEAST 2 XML specification of the trace log (new).\\
\end{center}

One last thing to note is that trees obtained using this procedure can not be summarized directly in TreeAnnotator, as they do not all have the same offset. One way to work around this problem is to add a "dummy" extant tip to each tree, thus placing all the other tips at the correct age. The trees with that additional tip can then be summarized normally using TreeAnnotator. Finally, the dummy tip can be removed to obtain the correct MCC tree.

We provide an \cl{R} script named \cl{treesWoffset.R} to help with this process. This script requires the \cl{treeio} package to be installed. The function \cl{transform.woffset.root} takes as input the .trees and .log files produced by a BEAST analysis (parameters \cl{treesfile} and \cl{logfile}, respectively), adds a dummy tip to all trees, and saves the resulting trees to the file given by \cl{outfile}.
The function \cl{convert.MCC.tree} can then be used to remove the dummy tip from the summary tree (contained in the file given by \cl{treefile}) and save the final tree to the file given by \cl{outfile}.

\bigskip
\section{Useful Links}

@@ -0,0 +1,52 @@
transform.woffset.root = function(treesfile, logfile, outfile) {
trees = ape::read.nexus(treesfile)
log = read.table(logfile ,header = T)
log = log$offset
print(median(log))
for(i in 1:length(trees)) trees[[i]]$offset = log[i]

presenttrees = lapply(trees, function(t) {
ntips = length(t$tip.label)
totn = ntips + t$Nnode

times = ape::node.depth.edgelength(t)
root_time = max(times) + t$offset

root = ntips+2
t$edge[which(t$edge > ntips)] = t$edge[which(t$edge > ntips)]+1
t$edge[which(t$edge >= root)] = t$edge[which(t$edge >= root)]+1

t$edge = rbind(c(root, root+1), c(root, ntips+1),t$edge)
t$edge.length = c(0.1 , root_time, t$edge.length)
t$tip.label = c(t$tip.label, "dummy")
t$Nnode = t$Nnode + 1

return(t)
})

ape::write.nexus(presenttrees, file= outfile)
}

convert.MCC.tree = function(treefile, outfile) {
tmp = treeio::read.beast(treefile)

tip = which(tmp@phylo$tip.label == "dummy")
tipn = which(tmp@data$node == as.character(tip))
node = tmp@phylo$edge[which(tmp@phylo$edge[,2] == tip),1]
noden = which(tmp@data$node == as.character(node))

offset = min(tmp@data$height_median[which(as.numeric(tmp@data$node) <= length(tmp@phylo$tip.label) &
as.numeric(tmp@data$node) != tip)])
print(offset)

tmp@phylo = ape::drop.tip(tmp@phylo, tip)
tmp@data = tmp@data[-c(tipn,noden),]
tmp@data$node = as.numeric(tmp@data$node)
tmp@data$node[(tmp@data$node) > tip] = tmp@data$node[(tmp@data$node) > tip] - 1
tmp@data$node[(tmp@data$node) > node] = tmp@data$node[(tmp@data$node) > node] - 1
tmp@data$node = as.character(tmp@data$node)

tmp@data$height_0.95_HPD = lapply(tmp@data$height_0.95_HPD, function(x) x-offset)
treeio::write.beast(tmp,outfile)
system(paste0("gsed -i 's%* UNTITLED%TREE1%' ", outfile))
}

0 comments on commit ce37fd4

Please sign in to comment.