Skip to content

Example 2: Missing Data Imputation and Ancestral State Reconstruction

Michael McLaren edited this page Nov 5, 2021 · 8 revisions

The phylopars function automatically reconstructs the ancestral states, imputes missing data, and computes species means according to the evolutionary model (Bruggeman et al. 2009). Reconstruction and imputation variances are also provided, the square root of which can be used to compute 95% confidence intervals for the estimates, as is illustrated below for the above dataset.

Using our simulated phylogeny and data with missing observations from the example in Example 1: Getting Started), we fit a model of evolution using the phylopars function.

library(Rphylopars)
library(phytools) # for simulating pure-birth phylogenies
set.seed(21) # Set the seed for reproducible results
trait_cov <- matrix(c(4,2,2.2,2,3,1.5,2.2,1.5,2.5),nrow = 3,ncol = 3)
tree <- pbtree(n = 20)
sim_data <- simtraits(v = trait_cov,tree = tree,nmissing = 10)
p_BM <- phylopars(trait_data = sim_data$trait_data,tree = sim_data$tree)

Next, we view our original data (with missing values)

sim_data$trait_data[,2:4] # Original data with missing values

Now, we may view the imputed missing data and imputation variance, from which we construct 95% confidence intervals:

p_BM$anc_recon[1:20,] # Data with imputed species means
p_BM$anc_var[1:20,] # Variances for each estimate
p_BM$anc_recon[1:20,] - sqrt(p_BM$anc_var[1:20,])*1.96 # Lower 95% CI
p_BM$anc_recon[1:20,] + sqrt(p_BM$anc_var[1:20,])*1.96 # Upper 95% CI

Row names of the first 1:nspecies rows correspond to species names, and the remaining row names correspond to node numbers on a postorder reordering of the tree (reorder(tree,"postorder")).

plot.phylo(reorder(tree,"postorder"))
nodelabels()

Finally, we view the reconstructed ancestral states and variances, and construct 95% confidence intervals.

p_BM$anc_recon[21:39,] # Reconstructed ancestral states for each trait
p_BM$anc_var[21:39,] # Variances for each estimate
p_BM$anc_recon[21:39,] - sqrt(p_BM$anc_var[21:39,])*1.96 # Lower 95% CI
p_BM$anc_recon[21:39,] + sqrt(p_BM$anc_var[21:39,])*1.96 # Upper 95% CI