In [1]:
library(ape)
library(phytools)
library(caper)
library(geiger)
library(OUwie)

Loading required package: maps

Loading required package: MASS

Loading required package: mvtnorm

Loading required package: corpcor

Loading required package: nloptr

Loading required package: RColorBrewer



In [2]:
# set.seed(30)

In [3]:
tree <- read.tree('../phylogeny/place/fine_all.nwk')
tree


Phylogenetic tree with 5380 tips and 1961 internal nodes.

Tip labels:
  taxid71518, taxid83984, taxid2193, taxid83985, taxid71152, taxid2203, ...
Node labels:
  N1, N5, N18, N51, N79, N119, ...

Rooted; includes branch lengths.

In [4]:
data <- read.table('../phylogeny/place/fine_all.tsv', header = TRUE, sep = '\t', quote = '')
head(data, 3)

| | taxid | length | width | volume | surface | shape | species | genus | family | order | ⋯ | proteins | coding | rrnas | MILC | B | MCB | ENC | ENCprime | SCUO | hash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | taxid11 | 2.371708 | 1.0606602 | 1.783187 | 7.902917 | rod-shaped | Cellulomonas gilvus | Cellulomonas | Cellulomonadaceae | Micrococcales | ⋯ | 3206 | 91.77278 | 2 | 0.2440547 | 0.0991309 | 0.46763293 | 0.04549081 | -0.03399343 | -0.177342 | 1.21 |
| 2 | taxid14 | 10.0 | 0.4898979 | 1.8541744 | 15.390598 | rod-shaped | Dictyoglomus thermophilum | Dictyoglomus | Dictyoglomaceae | Dictyoglomales | ⋯ | 1890 | 93.77725 | 2 | 0.05881714 | -0.192631758 | -0.05707928 | 0.0218529 | -0.01104863 | -0.2228213 | 1.5 |
| 3 | taxid23 | 1.5 | 0.7 | 0.4874705 | 3.298672 | rod-shaped | Shewanella colwelliana | Shewanella | Shewanellaceae | Alteromonadales | ⋯ | 4094 | 87.38314 | 0 | 0.13502902 | -0.002838678 | 0.08282373 | 0.21534919 | -0.03105027 | -1.190212 | 1.37 |


In [5]:
# NOTE: despite the name, this is volume divided by surface (V/S), not surface/volume
data[["svratio"]] <- data[["volume"]] / data[["surface"]]

In [6]:
cols = c("volume", "surface", "svratio")

In [7]:
for (col in cols) {
    data[[paste("log", col, sep="_")]] = log10(data[[col]])
}

Binarize tree

In [8]:
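# Resolve polytomies into bifurcations; multi2di() does this at random by default
tree2 <- multi2di(tree)
# Check whether all tips are equidistant from the root
is.ultrametric(tree2)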

In [11]:
datum <- setNames(data$log_svratio, data$taxid)

In [12]:
# set.seed(42)

Comparing models of evolution

## Ornstein-Uhlenbeck model

* "It is a modification of the Brownian model with an additional parameter $\alpha$ that measures the strength of return towards a theoretical optimum."

* "It is a random walk in which trait values revert back towards some optimal value $\mu$ with an attraction strength proportional to the parameter $\alpha$."
    * "$\alpha$ is referred to as the rubber band parameter because of the way it pulls traits back towards $\mu$."
    * "$\mu$ is a long-term mean, and it is assumed that species traits evolve around this value. In **geiger**, $\mu$ is the same as the state of the root at time zero (single stationary peak)."
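
For reference, the verbal description above corresponds to the standard single-optimum OU process. The symbols $\sigma$ (the Brownian rate) and $W_t$ (a Wiener process) do not appear in the quoted text; they are introduced here as the usual notation:

$$
dX_t = \alpha\,(\mu - X_t)\,dt + \sigma\,dW_t
$$

Along a single lineage starting at $X_0$, this gives

$$
\mathbb{E}[X_t] = \mu + (X_0 - \mu)\,e^{-\alpha t}, \qquad \operatorname{Var}[X_t] = \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha t}\right),
$$

so the expected distance from the optimum halves every $t_{1/2} = \ln 2 / \alpha$ time units. As $\alpha \to 0$ the variance tends to $\sigma^2 t$, which is the Brownian case.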
    
**Interpretation:**

* $\alpha \approx 0$: trait evolution is approximately Brownian.
* $\alpha \gg 0$ (i.e. large): the trait shows some degree of non-Brownian behavior.
* $\alpha \ggg 0$ (i.e. really large): all imprint of history is lost and trait evolution is essentially a rapid burst at the present.
    * **How large?** After rescaling the tree to a height of 1: $-\log(\alpha) = 4$ is a low $\alpha$, almost Brownian; $-\log(\alpha) = -4$ is a very high $\alpha$, so the trait is pulled strongly back towards its theoretical optimum.

"$\alpha$ scales with the tree height: taller trees will have lower $\alpha$ values because there is more time for traits to return to the optimum value, and thus the strength of the pull towards the optimum can be smaller." **Solution: rescale trees to a height of 1.**

In [13]:
startTime <- Sys.time()
bm <- fitContinuous(tree2, datum, model = 'BM', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
bm$opt

Time difference of 1.681522 secs


In [14]:
startTime <- Sys.time()
eb <- fitContinuous(tree2, datum, model = 'EB', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
eb$opt

“
Parameter estimates appear at bounds:
	a”


Time difference of 3.479607 secs


In [15]:
startTime <- Sys.time()
wh <- fitContinuous(tree2, datum, model = 'white', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
wh$opt

Time difference of 2.336618 secs


In [17]:
startTime <- Sys.time()
ou <- fitContinuous(tree2, datum, model = 'OU', ncores = 24, bounds = list(alpha = c(0, 500)))
endTime <- Sys.time()
print(endTime - startTime)
ou$opt

“Non-ultrametric tree with OU model, using VCV method.”
“Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.
”
“Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.
”


Time difference of 1.541297 days


In [24]:
aic_cs <- setNames(c(AIC(bm), AIC(eb), AIC(wh), AIC(ou)), c('BM', 'EB', 'WH', 'OU'))
aic_cs

In [25]:
aic_cs.w <- aic.w(aic_cs)
aic_cs.w

BM EB WH OU 
 0  0  0  1 
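
`aic.w` converts AIC scores into Akaike weights. As a rough illustration of what that computation involves, the sketch below applies the usual formula $w_i = e^{-\Delta_i/2} / \sum_j e^{-\Delta_j/2}$, with $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$, in base R. The AIC values and the helper name `akaike_weights` are made up for the example and are not the notebook's results.

```r
# Hypothetical helper: Akaike weights from a named vector of AIC scores
akaike_weights <- function(aic) {
    delta <- aic - min(aic)          # difference from the best (lowest) AIC
    rel_lik <- exp(-0.5 * delta)     # relative likelihood of each model
    rel_lik / sum(rel_lik)           # normalise so the weights sum to 1
}

# Made-up AIC scores, for illustration only
toy_aic <- c(BM = 1200, EB = 1202, WH = 1500, OU = 1100)
w <- akaike_weights(toy_aic)
round(w, 3)
```

When the best model leads by tens of AIC units or more, as in this toy case, it takes essentially all of the weight and the others round to zero.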

Using a rescaled tree

In [28]:
tree_res <- read.tree('../phylogeny/place/fine_all_rescaled.nwk')
tree_res


Phylogenetic tree with 5380 tips and 1961 internal nodes.

Tip labels:
  taxid71518, taxid83984, taxid2193, taxid83985, taxid71152, taxid2203, ...
Node labels:
  N1, N5, N18, N51, N79, N119, ...

Rooted; includes branch lengths.
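
The tree above is read from an already rescaled file, and the rescaling step itself is not part of this notebook. As a minimal sketch, assuming that rescaling to 1 means dividing every branch length by the largest root-to-tip distance, the same effect could be obtained with `ape` alone. The toy Newick string and the name `tree_unit` are illustrative only.

```r
library(ape)

# Toy, non-ultrametric tree (illustrative only; not the notebook's data)
toy <- read.tree(text = "((A:1,B:2):1,C:4);")

# Root-to-node distances for tips and internal nodes; the maximum is the tree height
height <- max(node.depth.edgelength(toy))

# Divide all branch lengths by the height so the tallest tip sits at depth 1
tree_unit <- toy
tree_unit$edge.length <- toy$edge.length / height

max(node.depth.edgelength(tree_unit))
```

Since $\alpha$ has units of inverse time, dividing branch lengths by the tree height multiplies the corresponding $\alpha$ by that height, which is what makes values comparable across trees.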

Binarize tree

In [29]:
tree2_res <- multi2di(tree_res)
is.ultrametric(tree2_res)

Comparing models of evolution

In [30]:
startTime <- Sys.time()
bm_res <- fitContinuous(tree2_res, datum, model = 'BM', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
bm_res$opt

Time difference of 2.460463 secs


In [31]:
startTime <- Sys.time()
eb_res <- fitContinuous(tree2_res, datum, model = 'EB', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
eb_res$opt

Time difference of 2.901003 secs


In [32]:
startTime <- Sys.time()
wh_res <- fitContinuous(tree2_res, datum, model = 'white', control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95))
endTime <- Sys.time()
print(endTime - startTime)
wh_res$opt

Time difference of 2.232431 secs


In [33]:
startTime <- Sys.time()
ou_res <- fitContinuous(tree2_res, datum, model = 'OU', ncores = 24, bounds = list(alpha = c(0, 500)))
endTime <- Sys.time()
print(endTime - startTime)
ou_res$opt

“Non-ultrametric tree with OU model, using VCV method.”
“Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.
”
“Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.
”


Time difference of 1.111255 days


In [34]:
aic_cs_res <- setNames(c(AIC(bm_res), AIC(eb_res), AIC(wh_res), AIC(ou_res)), c('BM', 'EB', 'WH', 'OU'))
aic_cs_res

In [35]:
aic_cs_res.w <- aic.w(aic_cs_res)
aic_cs_res.w

BM EB WH OU 
 0  0  0  1 

In [2]:
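# -log(alpha) on the natural-log scale; the value is presumably an alpha estimate from an OU fit above
-log(2.97277120831611)

In [3]:
# Inverse transform: exp(-x) recovers alpha from x = -log(alpha)
exp(- (-1.0894945845515))

In [4]:
# For comparison: the alpha that corresponds to -log(alpha) = -1
exp(- (-1))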