# Gene tree inference

```python
import ipcoal
import toytree
```

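```python
sptree = toytree.rtree.unittree(ntips=6, treeheight=1e6, seed=123)
mod = ipcoal.Model(sptree, Ne=1e5, nsamples=4)
mod.sim_loci(nloci=1, nsites=1000)
# infer a gene tree for only locus=0 (raxml-ng must be installed)
gtree = ipcoal.phylo.infer_raxml_ng_tree(mod, idxs=0, diploid=True)
gtree.draw();
```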

The function `infer_raxml_ng_trees` is similar to `infer_raxml_ng_tree` above, but adds options to control parallelization when inferring many trees at once. It returns the results as a dataframe, similar to the `.df` result from simulations, which makes it easy to compare the accuracy of gene tree inference against the true genealogies at each position in the genome. (See the Cookbook/Example for calculating per-site gene tree estimation error using tree distances.)

```python
# simulate many loci under the species tree model
mod = ipcoal.Model(sptree, Ne=1e5, nsamples=4)
mod.sim_loci(nloci=100, nsites=1000)

# infer a gene tree for each locus
res = ipcoal.phylo.infer_raxml_ng_trees(mod, nworkers=4, nthreads=2, nproc=1, diploid=True)
res.head(10)
```
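The tree-distance comparison mentioned above can be illustrated without any dependencies. The following is a minimal pure-Python sketch of the unrooted Robinson-Foulds (RF) distance, written for illustration only: `splits_from_newick` and `rf_distance` are hypothetical helpers, not ipcoal or toytree functions. The parser assumes simple newick strings (no quoted labels or comments), ignores branch lengths and support values, and requires both trees to have identical tip names.

```python
def splits_from_newick(newick):
    """Return (leaf set, set of nontrivial unrooted splits) for a newick string.

    Hypothetical helper. Each split is keyed by the side that excludes a
    fixed reference leaf, so equal splits match across trees.
    """
    s = newick.strip().rstrip(";")
    stack = []    # leaf sets of clades that are still open
    clades = []   # leaf sets of every closed clade
    leaves = []
    i = 0
    while i < len(s):
        c = s[i]
        if c == "(":
            stack.append(set())
            i += 1
        elif c == ")":
            clade = stack.pop()
            clades.append(clade)
            if stack:
                stack[-1] |= clade
            i += 1
            # skip any internal label, support value or branch length
            while i < len(s) and s[i] not in ",()":
                i += 1
        elif c == "," or c.isspace():
            i += 1
        else:
            # read a leaf name, then skip its branch length
            j = i
            while j < len(s) and s[j] not in ":,()":
                j += 1
            name = s[i:j].strip()
            leaves.append(name)
            if stack:
                stack[-1].add(name)
            i = j
            while i < len(s) and s[i] not in ",()":
                i += 1

    all_leaves = frozenset(leaves)
    ref = min(all_leaves)
    splits = set()
    for clade in clades:
        side = frozenset(clade)
        if ref in side:
            side = all_leaves - side
        # keep only splits with at least two leaves on each side
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return all_leaves, splits


def rf_distance(newick1, newick2, normalize=False):
    """Unrooted Robinson-Foulds distance: splits found in one tree but not the other."""
    leaves1, splits1 = splits_from_newick(newick1)
    leaves2, splits2 = splits_from_newick(newick2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same set of tip names")
    rf = len(splits1 ^ splits2)
    if normalize:
        # maximum RF for two fully resolved unrooted trees with n tips
        max_rf = 2 * (len(leaves1) - 3)
        return rf / max_rf if max_rf > 0 else 0.0
    return rf


true_tree = "((a:1,b:1):0.5,(c:1,d:1):0.5,e:2);"
inferred = "((a,c),(b,d),e);"
print(rf_distance(true_tree, inferred))  # prints 4
```

Applied per locus, a function like this (or a library tree-distance routine) would compare each inferred tree with the true genealogy. Check that the tip names actually match first, since settings such as `diploid=True` may change how samples are labeled in the inferred trees.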

Finally, to infer a species tree from a distribution of gene trees, use the `infer_astral_tree` function. It accepts either newick strings or ToyTree objects as input. Several ASTRAL options can be set, including an imap dictionary that groups samples into species or clades.

```python
# get the trees from the dataframe above
inferred_trees = res.gene_tree

# get a mapping of {species_name: list of gene tree tips}
inferred_trees_imap = mod.get_imap_dict(diploid=True)

# get the astral result as a ToyTree, root it and draw it
atree = ipcoal.phylo.infer_astral_tree(inferred_trees, imap=inferred_trees_imap)
atree.mod.root_on_minimal_ancestor_deviation("r0").draw(ts='p');
```
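For readers who run ASTRAL directly rather than through ipcoal, the same sample-to-species grouping as the imap dictionary is given to ASTRAL-III as a mapping file (passed with `-a`) containing one `species:tip1,tip2` line per species. The sketch below converts an imap dictionary into that text and rejects any tip assigned to two species. `imap_to_astral_mapping` and the tip names are hypothetical, and this makes no claim about how ipcoal passes the imap to ASTRAL internally.

```python
def imap_to_astral_mapping(imap):
    """Render {species: [tip names]} as ASTRAL-III mapping-file text.

    Hypothetical helper. Raises ValueError if a tip appears under two species.
    """
    owner = {}   # tip name -> species it was first assigned to
    lines = []
    for species, tips in imap.items():
        for tip in tips:
            if tip in owner:
                raise ValueError(
                    f"tip {tip!r} assigned to both {owner[tip]!r} and {species!r}"
                )
            owner[tip] = species
        lines.append(f"{species}:{','.join(tips)}")
    return "\n".join(lines) + "\n"


# hypothetical tip names for two species with two samples each
imap = {"r0": ["r0_0", "r0_1"], "r1": ["r1_0", "r1_1"]}
print(imap_to_astral_mapping(imap), end="")
```

Checking for duplicate assignments up front gives a clearer error than a failed ASTRAL run would.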