-
Notifications
You must be signed in to change notification settings - Fork 1
/
treevo_ABC_vignette.Rmd
265 lines (178 loc) · 6.68 KB
/
treevo_ABC_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
---
title: "Analyzing Trait Evolution with Approximate Bayesian Computation in TreEvo"
author: "David W. Bapst"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Analyzing Trait Evolution with Approximate Bayesian Computation in TreEvo}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r,echo=FALSE}
# Control Box
set.seed(444)
multicore<-TRUE
coreLimit<-6
generation.time<-10000
```
For this vignette, we will use the tree of ammonite genera, and the continuous character data for those ammonites, taken from Raia et al. (2015, American Naturalist). This data is taken from their supplemental appendix, available at the American Naturalist website, and is available as an example dataset in R package `paleotree`.
```{r}
library(paleotree)
data(RaiaCopesRule)
```
```{r,echo=FALSE}
load("treevo_vignette_workspace_presaved.Rdata")
```
```{r}
ls()
```
The trees for the three groups examined in this paper all appear to be trees dated to the last appearance times (as opposed to the first appearance time) *and* specifically the end-boundary of the interval containing the last appearance.
First, let's plot the tree:
```{r}
plot(ladderize(ammoniteTreeRaia), show.tip.label=FALSE)
axisPhylo()
```
This tree has polytomies - we will have to deal with that. Let's randomly resolve tree using `multi2di` from ape, and then use function `addTermBranchLength` from package `paleotree`
```{r}
tree<-multi2di(ammoniteTreeRaia)
# let's apply ATBL
tree<-addTermBranchLength(tree,0.01)
# let's try plotting it again
plot(ladderize(tree), show.tip.label=FALSE)
axisPhylo()
```
Because we aren't adjust the internal edges at all, `multi2di` has just inserted a number of zero-length internal edges between internodes within the tree. This means that alternative resolutions of the tree (as multi2di is stochastic) won't create meaningfully different variance-covariance matrices, so we likely don't need to worry about resolving multiple trees in this case (because we're taking the trees used by Raia et al. at total face-value, which perhaps is the truly unsupported assumption).
Let's plot a traitgram of the data.
maybe use phytools rather than mine?
```{r, echo=FALSE, fig.show='hold'}
plotTraitgram(tree=tree, trait=sutureComplexity,
conf.int=FALSE, main="Ammonite Suture Complexity")
plotTraitgram(tree=tree, trait=shellSize,
conf.int=FALSE, main="Ammonite Shell Diameter")
```
how small is the smallest branch length, even after extending terminal branches?
```{r}
brlen<-tree$edge.length
min(brlen)
# zero-length branches!
```
```{r}
# what about not zero-length branches?
min(brlen[brlen!=0])
# in years
min(brlen[brlen!=0])*10^6
# let's look at the distribution of these edge lengths, ignoring zero-length
hist(brlen[brlen!=0], main="",xlab="edge length")
```
```{r}
# total number of zero-length branches
sum(brlen==0)
# proportion of edges that are zero-length
sum(brlen==0)/length(brlen)
```
Yeesh - nearly a quarter of branches on this tree are zero-length branches. Not looking so good.
```{r}
# number of zero-length internal branches
sum(brlen==0 & tree$edge[,2]>Ntip(tree))
# number of zero-length terminal branches
# (shouldn't be any because of our application of ATBL)
sum(brlen==0 & tree$edge[,2]<=Ntip(tree))
```
Let's use TreEvo!
```{r}
library(TreEvo)
packageVersion("TreEvo")
```
```{r}
# character data for doRun must be in matrix form
# with rows labeled with taxon names
charData<-matrix(sutureComplexity,ncol=1)
rownames(charData)<-names(sutureComplexity)
```
Let's analyze it with ordinary BM
```{r eval=FALSE}
resultsBM<-doRun_prc(
phy = tree,
traits = charData,
intrinsicFn=brownianIntrinsic,
extrinsicFn=nullExtrinsic,
startingPriorsFns="normal",
startingPriorsValues=matrix(c(mean(charData[,1]), sd(charData[,1]))),
intrinsicPriorsFns=c("exponential"),
intrinsicPriorsValues=matrix(c(10, 10), nrow=2, byrow=FALSE),
extrinsicPriorsFns=c("fixed"),
extrinsicPriorsValues=matrix(c(0, 0), nrow=2, byrow=FALSE),
generation.time=generation.time,
standardDevFactor=0.2,
plot=FALSE,
StartSims=10,
epsilonProportion=0.7,
epsilonMultiplier=0.7,
nStepsPRC=3,
numParticles=20,
jobName="typicalBMrun",
stopRule=FALSE,
multicore=multicore,
coreLimit=coreLimit,
verboseParticles=TRUE
)
```
Let's analyze it with ordinary BM with a bound
```{r eval=FALSE}
resultsBound<-doRun_prc(
phy = tree,
traits = charData,
intrinsicFn=boundaryMinIntrinsic,
extrinsicFn=nullExtrinsic,
startingPriorsFns="normal",
startingPriorsValues=matrix(c(mean(charData[,1]), sd(charData[,1]))),
intrinsicPriorsFns=c("exponential","normal"),
intrinsicPriorsValues=matrix(c(10, 10, -10, 1), nrow=2, byrow=FALSE),
extrinsicPriorsFns=c("fixed"),
extrinsicPriorsValues=matrix(c(0, 0), nrow=2, byrow=FALSE),
generation.time=generation.time,
standardDevFactor=0.2,
plot=FALSE,
StartSims=10,
epsilonProportion=0.7,
epsilonMultiplier=0.7,
nStepsPRC=3,
numParticles=20,
jobName="BMwithBoundRun",
stopRule=FALSE,
multicore=multicore,
coreLimit=coreLimit,
verboseParticles=FALSE
)
```
The function `summarizePosterior` provides us with the mean, standard deviation, and highest posterior density (HPD) for each free parameter in the posterior sample, for each individual run. We can specify the probability density we would like to receive the bounds for using the `alpha` argument.
For example, for each Brownian Motion run:
```{r eval=FALSE}
summarizePosterior(resultsBM[[1]]$particleDataFrame, alpha = 0.8)
summarizePosterior(resultsBM[[2]]$particleDataFrame, alpha = 0.8)
summarizePosterior(resultsBM[[3]]$particleDataFrame, alpha = 0.8)
```
Or just to look at the first run of the Bounded BM run:
```{r eval=FALSE}
summarizePosterior(resultsBound[[1]]$particleDataFrame, alpha = 0.8)
```
```{r eval=FALSE}
# This function calculates Effective Sample Size (ESS) on results.
# Performs the best when results are from multiple runs.
# ESS for single runs
pairwiseESS(resultsBM)
pairwiseESS(resultsBound)
# ESS for parameters shared between BM and bound
#pairwiseESS(list(resultsBM$particleDataFrame,resultsBound$particleDataFrame))
```
```{r fig.width = 7, fig.asp=0.4, eval=FALSE}
# plotPosteriors
# for each free parameter in the posterior, a plot is made of the distribution of values estimate in the last generation
# collect the particleDataFrames into lists
resultsBMpart<-lapply(resultsBM,function(x) x$particleDataFrame)
resultsBoundpart<-lapply(resultsBound,function(x) x$particleDataFrame)
plotPosteriors(particleDataFrame=resultsBMpart,
priorsMat=resultsBM[[1]]$PriorMatrix)
plotPosteriors(particleDataFrame=resultsBoundpart,
priorsMat=resultsBound[[1]]$PriorMatrix)
```