## (&#x1F4D7;) ipyrad Cookbook: `tetrad` species tree inference

The `ipyrad.analysis` package includes a module and command-line tool called `tetrad` that can be used to infer quartets from very large SNP alignments, and to join the quartets into a species tree for large numbers of samples following the methodology outlined by Chifman and Kubatko (2014) and implemented in `SVDQuartets`. 

`tetrad` differs from `SVDQuartets` in several ways: (1) it uses the same mode of parallelization as `ipyrad`, so work is easy to distribute across multiple nodes of a large HPC cluster; (2) it can be run from the command-line tool or in a Jupyter notebook without having to write commands into the large sequence file as a nexus block; (3) it implements a strategy to perform bootstrap re-sampling of RAD loci, and of SNPs within RAD loci; (4) it calculates admixture statistics (i.e., ABBA-BABA) while it runs; (5) [coming soon] advanced sampling methods for large trees when the number of quartets is too large to sample exhaustively.

### Required software
The following conda commands will install all required software, after which you can import the required modules for this notebook. 

In [27]:
## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab

In [28]:
import ipyrad.analysis as ipa
import toytree

### This notebook

All code in this notebook is written in Python, which you can copy/paste into an IPython terminal to execute, or, preferably, run in a Jupyter notebook like this one. See the other analysis cookbooks for [instructions](http://ipyrad.readthedocs.io/analysis.html) on using Jupyter notebooks. 

This notebook assumes you have started an `ipcluster` instance, which can be started by running the commented command below in a separate terminal.

In [30]:
##
## ipcluster start 
##

### Load in the data

In [31]:
seqfile = "./analysis-ipyrad/pedic_outfiles/pedic.snps.phy"
mapfile = "./analysis-ipyrad/pedic_outfiles/pedic.snps.map"

### tetrad tree inference setup
The first step is to create a named `tetrad` Class object, which requires a minimum of two arguments: a name and a sequence file. The sequence file that you will typically want to enter is the `'.snps.phy'` file from `ipyrad`, which is a phylip-formatted file with *all* SNPs. You can also pass it the `'.snps.map'` file, which tells `tetrad` how the SNPs are linked within loci, so that a single SNP can be randomly sampled from each locus in each bootstrap replicate. A large number of additional parameter settings are available. You can set them when you create the `tetrad` Class object or afterwards, as we do below, where we set the `'nboots'` parameter and the `'method'` parameter, which tells `tetrad` how to sample quartets.

In [3]:
## create a tetrad Class object
tet = ipa.tetrad(name='tutorial', seqfile=seqfile, mapfile=mapfile)

loading seq array [13 taxa x 194863 bp]
max unlinked SNPs per quartet (nloci): 38079
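
To illustrate what the map file makes possible, here is a minimal sketch of drawing one SNP per locus so that the sampled SNPs are unlinked. It is not `tetrad`'s internal code: the function name `sample_unlinked_snps` and the toy `locus_ids` list are hypothetical, and the real `'.snps.map'` format is not parsed here. With one SNP drawn per locus, the number of sampled SNPs cannot exceed the number of loci, which is consistent with the "max unlinked SNPs" value reported above.

```python
import random
from collections import defaultdict


def sample_unlinked_snps(locus_ids, rng):
    """Return one randomly chosen SNP column index per locus.

    locus_ids[i] is the (hypothetical) locus label of SNP column i.
    """
    columns_by_locus = defaultdict(list)
    for column, locus in enumerate(locus_ids):
        columns_by_locus[locus].append(column)
    # draw a single column from each locus, then sort for readability
    return sorted(rng.choice(cols) for cols in columns_by_locus.values())


# toy example: 7 SNPs spread over 3 loci
locus_ids = [0, 0, 0, 1, 1, 2, 2]
rng = random.Random(42)
sampled = sample_unlinked_snps(locus_ids, rng)
print(len(sampled))  # 3, one SNP per locus
```

Calling the function again with a fresh draw from `rng` gives a different set of columns, which is the idea behind re-sampling SNPs within loci in each bootstrap replicate.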


In [4]:
## set additional parameters
tet.nboots = 10
tet.method = "all"

### Infer the tree
The `'run()'` command distributes the computation across your cluster, prints progress, and blocks until the job is complete. The results of the analysis will be written to file and also stored in your `tetrad` Class object, where they can be accessed from its `.stats` and `.trees` attributes.

In [5]:
tet.run()

host compute node: [4 cores] on oud
inferring 715 induced quartet trees
[####################] 100%  initial tree | 0:00:05 |  
[####################] 100%  boot 10      | 0:00:22 |  
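
The count of 715 in the output above matches the number of ways to choose 4 samples from the 13 taxa in this data set, as you would expect if `method = "all"` infers every possible quartet. A quick check using only the standard library (the helper name `count_quartets` is ours, not part of `tetrad`):

```python
from itertools import combinations


def count_quartets(ntaxa):
    """Count the distinct sets of four taxa among ntaxa samples."""
    return sum(1 for _ in combinations(range(ntaxa), 4))


print(count_quartets(13))  # 715
```

The number grows quickly with the number of samples (100 taxa give 3,921,225 quartets), which is why sampling strategies matter for large trees.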


### Statistics (in development)
This will be expanded in the future. For now it is empty. 

In [6]:
print(tet.stats)

n_quartets_sampled   0                   



### Draw the tree
Use the `toytree.tree()` function to load the unrooted consensus tree, and its `'draw()'` method to plot it with bootstrap support values. The `'draw()'` command returns a number of objects, the first of which is the `canvas`. To save the plot as an SVG image, pass the canvas to the `toyplot.svg.render()` function, as shown below.

In [7]:
## the newick tree files produced by tetrad
tet.trees

boots   /home/deren/Documents/ipyrad/tests/analysis-tetrad/tutorial.boots
cons    /home/deren/Documents/ipyrad/tests/analysis-tetrad/tutorial.cons
nhx     /home/deren/Documents/ipyrad/tests/analysis-tetrad/tutorial.nhx
tree    /home/deren/Documents/ipyrad/tests/analysis-tetrad/tutorial.tree

In [16]:
## load in the tree
tre = toytree.tree(tet.trees.cons)

## draw unrooted consensus tree with support values
canvas, axes = tre.draw(
    node_labels=tre.get_node_values("support"),
    node_labels_style={"font-size": "9px"},
    node_size=20,
    );

In [24]:
## save the tree figure as an SVG file
import toyplot.svg
toyplot.svg.render(canvas, "analysis-tetrad/pedic-tree.svg")

### Checkpointed analysis
If you want to add more bootstrap replicates later, simply increase the `'nboots'` attribute and execute `'run()'` again. Only the additional replicates are computed, as the progress output below shows.

In [17]:
tet.nboots = 50
tet.run()

host compute node: [4 cores] on oud
[####################] 100%  boot 50      | 0:01:21 |  
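
One way to confirm that the checkpointed run extended the earlier one is to count the trees in the bootstrap file listed under `tet.trees.boots`. The sketch below assumes that file holds one newick string per line, which is an assumption about the file layout rather than something documented here; `count_newick_trees` is a hypothetical helper.

```python
import os
import tempfile


def count_newick_trees(path):
    """Count lines that end with ';' (assumed: one newick tree per line)."""
    with open(path) as infile:
        return sum(1 for line in infile if line.strip().endswith(";"))


# self-contained demo on a temporary file holding two toy trees
with tempfile.NamedTemporaryFile("w", suffix=".boots", delete=False) as tmp:
    tmp.write("((a,b),(c,d));\n((a,c),(b,d));\n")
demo_count = count_newick_trees(tmp.name)
os.remove(tmp.name)
print(demo_count)  # 2
```

In this notebook you would call `count_newick_trees(tet.trees.boots)` and, if the one-tree-per-line assumption holds, expect a count equal to `tet.nboots`.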


In [19]:
## load in the tree
tre = toytree.tree(tet.trees.cons)

## draw the unrooted consensus tree with support values
canvas, axes = tre.draw(
    node_labels=tre.get_node_values("support"),
    node_labels_style={"font-size": "9px"},
    node_size=20,
    );

### Load existing tetrad object
Whenever you execute the `'run()'` command your `tetrad` Class object will be saved in your working directory as a JSON file (`{name}.tet.json`). You can load an existing `tetrad` object back into memory by using the `load` argument when you create a `tetrad` object. The default working directory (`workdir`) is `'./analysis-tetrad'` unless you change it. 

In [22]:
## load the old tetrad object
oldtet = ipa.tetrad(
    name="tutorial", 
    workdir="analysis-tetrad/", 
    load=True)

## draw the tree from it
toytree.tree(oldtet.trees.cons).draw();

loading seq array [13 taxa x 194863 bp]
max unlinked SNPs per quartet (nloci): 38079
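
If loading fails, a first thing to check is whether the saved JSON file is where you expect it. This sketch builds the path from the `{name}.tet.json` pattern and the default `workdir` described above; `tetrad_json_path` is a hypothetical helper, not part of `ipyrad`.

```python
import os


def tetrad_json_path(name, workdir="./analysis-tetrad"):
    """Build the expected path of a saved tetrad JSON file."""
    return os.path.join(workdir, name + ".tet.json")


path = tetrad_json_path("tutorial")
print(path)
print(os.path.exists(path))  # True only if a saved run is present
```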
