# <span style="color:gray">ipyrad-analysis toolkit:</span> RAxML

RAxML is one of the most popular tools for inferring phylogenetic trees by maximum likelihood, and it is fast even for very large data sets. The RAxML documentation is huge, and there are many options. However, I tend to use the same small number of options very frequently, which motivated me to write the `ipa.raxml()` tool to automate the process of generating RAxML command line strings, running them, and accessing the resulting tree files. The simplicity of this tool makes it easy to incorporate into other, more complex tools, for example to infer trees in sliding windows along the genome using the `ipa.treeslider` tool.



### Required software

In [1]:
# conda install ipyrad -c conda-forge -c bioconda
# conda install raxml -c conda-forge -c bioconda
# conda install ipcoal -c conda-forge

In [2]:
import ipyrad.analysis as ipa
import toytree
import ipcoal

### Download an assembled RAD-seq dataset

In [3]:
# path to an HDF5 formatted seqs file
SEQSFILE = "/tmp/oaks.seqs.hdf5"

# download example seqs file if not already present (~500 MB, takes ~5 minutes)
URL = "https://www.dropbox.com/s/c1u89nwuuv8e6ie/virentes_ref.seqs.hdf5?raw=1"
ipa.download(URL, path=SEQSFILE);

successful download: /tmp/oaks.seqs.hdf5


### Extract an alignment in PHYLIP format
By default, ipyrad produces a PHYLIP file as an output during assembly. However, you have greater flexibility in filtering your data if you post-process the full assembled database file with the `window_extracter` tool and write a PHYLIP alignment from it. See the `window_extracter` tool docs for details. Here I extract data from 10 samples and allow at most 10% missing data (`mincov=0.9`).

In [11]:
# the taxa to extract data for
INCLUDE = [
    'AR', 'BJSB3', 'FLBA140', 'TXGR3', 
    'MXSA3017', 'FLSA185', 'CRL0030', 'CUCA4', 
    'FLCK18', 'MXGT4'
]

# init the extracter tool to pull data from first chromosome
wex = ipa.window_extracter(
    data=SEQSFILE,
    name="oaks-8taxa",
    workdir="/tmp",
    scaffold_idxs=0,
    mincov=0.9,
    imap={'include': INCLUDE}
)

# display stats on alignment
display(wex.stats)

# write the alignment file
wex.run(force=True)

| | scaffold | start | end | sites | snps | missing | samples |
|---|---|---|---|---|---|---|---|
| prefilter | Qrob_Chr01 | 0 | | 890747 | 7646 | 0.46 | 10 |
| postfilter | Qrob_Chr01 | 0 | | 289252 | 4287 | 0.04 | 10 |


Wrote data to /tmp/oaks-8taxa.phy
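
To make the coverage filter more concrete, here is a minimal pure-Python sketch of site filtering on a toy alignment. It assumes that a float `mincov` means "keep a site only if at least that proportion of samples has data at it"; the function `filter_sites` and the toy sequences are hypothetical, and this is not ipyrad's implementation, so check the `window_extracter` docs for the exact behavior.

```python
def filter_sites(alignment, mincov):
    """Keep alignment columns where the covered fraction of samples >= mincov.

    Hypothetical illustration only; 'N' and '-' are treated as missing data.
    """
    names = list(alignment)
    nsamples = len(names)
    length = len(alignment[names[0]])
    keep = []
    for idx in range(length):
        column = [alignment[name][idx] for name in names]
        covered = sum(base not in "N-" for base in column)
        if covered / nsamples >= mincov:
            keep.append(idx)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# toy data: the last column has data in only 2 of 4 samples
toy = {
    "A": "ACGTN",
    "B": "ACG-N",
    "C": "ANGTA",
    "D": "ACGTA",
}
filtered = filter_sites(toy, mincov=0.75)
for name, seq in filtered.items():
    print(name, seq)
```

With `mincov=0.75` the first four columns are kept (each has data in at least 3 of 4 samples) and the last column is dropped.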


### Infer a ML tree

In [16]:
# init raxml object with input data and (optional) parameter options
rax = ipa.raxml(data="/tmp/oaks-8taxa.phy", T=4, N=20)

# print the raxml command string for posterity
print(rax.command)

# run the command, (options: block until finishes; overwrite existing)
rax.run(block=True, force=True)

/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 4 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/oaks-8taxa.phy -p 54321 -N 20 -x 12345
job test finished successfully
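
The printed command is just the keyword options mapped onto RAxML's single-letter flags. The sketch below shows one way such a string could be assembled; `build_raxml_command` is a hypothetical helper written for illustration and is not how `ipa.raxml` is implemented. The flags used (`-s`, `-n`, `-w`, `-f`, `-m`, `-T`, `-N`, `-p`, `-x`) are the ones that appear in the command above.

```python
import shlex


def build_raxml_command(binary, data, name, workdir, **flags):
    """Assemble a RAxML-style command string from keyword options.

    Hypothetical helper for illustration; flags set to None are skipped.
    """
    args = [binary, "-s", data, "-n", name, "-w", workdir]
    for flag, value in flags.items():
        if value is None:
            continue
        args.extend([f"-{flag}", str(value)])
    # quote each argument so paths with spaces stay intact
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_raxml_command(
    "raxmlHPC-PTHREADS-AVX2",
    data="/tmp/oaks-8taxa.phy",
    name="oaks-8taxa",
    workdir="/tmp",
    f="a", m="GTRGAMMA", T=4, N=20, p=54321, x=12345,
)
print(cmd)
```

Passing `x=None` in this sketch simply omits the `-x` flag.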


### Draw the inferred tree
After inferring a tree you can then visualize it in a notebook using `toytree`. 

In [31]:
# the output file location ({workdir}/RAxML_bipartitions.{name})
rax.trees.bipartitions

'/tmp/RAxML_bipartitions.oaks-8taxa'

In [26]:
# load from the .trees attribute of the raxml object, or from the saved tree file
tre = toytree.tree(rax.trees.bipartitions)

# draw the tree
rtre = tre.root("AR")
rtre.draw(tip_labels_align=True, node_sizes=18, node_labels="support");
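
The `RAxML_bipartitions` file is a newick string in which bootstrap support values are stored as internal node labels. For a quick check without drawing, the sketch below pulls those labels out with a regular expression. The function `support_values` and the toy newick string are made up for illustration; for real work, `toytree` parses the tree properly.

```python
import re


def support_values(newick):
    """Return integer labels that directly follow a closing parenthesis."""
    return [int(val) for val in re.findall(r"\)(\d+)(?=[:;,)])", newick)]


# toy tree, not output from the oak dataset
toy_newick = "((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.3)62:0.02,E:0.4);"
values = support_values(toy_newick)
print(values)
print("lowest support:", min(values))
```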

### Cookbook

Most frequently used: perform N rapid bootstrap analyses (here `N=50`) followed by a thorough ML search, under the GTRGAMMA substitution model (`-f a`).

In [52]:
rax = ipa.raxml(
    data="/tmp/oaks-8taxa.phy",
    name="oaks-8taxa-a",
    workdir="/tmp",
    m="GTRGAMMA",
    T=8,
    f="a",
    N=50,
)
print(rax.command)
rax.run(force=True)

/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 8 -m GTRGAMMA -n oaks-8taxa-a -w /tmp -s /tmp/oaks-8taxa.phy -p 54321 -N 50 -x 12345
job oaks-8taxa-a finished successfully


Another common option: perform N rapid hill-climbing ML searches from random starting trees, with no bootstrap replicates. Be sure to use the `bestTree` output (`RAxML_bestTree.{name}`) from this analysis, since it does not produce a `bipartitions` output file.

In [53]:
rax = ipa.raxml(
    data="/tmp/oaks-8taxa.phy",
    name="oaks-8taxa-d",
    workdir="/tmp",
    m="GTRGAMMA",
    T=8,
    f="d",
    N=10,
    x=None,
)
print(rax.command)
rax.run(force=True)

/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f d -T 8 -m GTRGAMMA -n oaks-8taxa-d -w /tmp -s /tmp/oaks-8taxa.phy -p 54321 -N 10
job oaks-8taxa-d finished successfully
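
Because the two run types leave different tree files behind, downstream code may want to pick whichever one exists. The sketch below assumes the `{workdir}/RAxML_{kind}.{name}` naming pattern; `pick_tree_file` is a hypothetical helper, and an empty placeholder file stands in for real RAxML output.

```python
import os
import tempfile


def pick_tree_file(workdir, name):
    """Return the bipartitions file if present, else the bestTree file."""
    for kind in ("bipartitions", "bestTree"):
        path = os.path.join(workdir, f"RAxML_{kind}.{name}")
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no RAxML tree file for run '{name}' in {workdir}")


# demonstrate with an empty placeholder file in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    open(os.path.join(tmpdir, "RAxML_bestTree.oaks-8taxa-d"), "w").close()
    chosen = pick_tree_file(tmpdir, "oaks-8taxa-d")
    print(os.path.basename(chosen))
```

The returned path could then be passed to `toytree.tree()` in the same way as `rax.trees.bipartitions` above.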


### Check your files
The `RAxML_info.{name}` file and related log files are stored in the `workdir`. Be sure to look at these for further details of your analyses.
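
If you run many analyses, it can help to pull a few key numbers out of each info file rather than reading it in full. The sketch below does this with regular expressions. `summarize_info` is a hypothetical helper, and the sample text consists of lines copied from the RAxML 8.2.12 log in this notebook; other RAxML versions may word these lines differently.

```python
import re


def summarize_info(text):
    """Extract a few summary fields from RAxML info text (None if absent)."""
    patterns = {
        "version": r"This is RAxML version (\S+)",
        "patterns": r"Alignment has (\d+) distinct alignment patterns",
        "missing_pct": r"undetermined characters in this alignment: ([\d.]+)%",
        "bootstraps": r"Executing (\d+) rapid bootstrap inferences",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        summary[key] = match.group(1) if match else None
    return summary


sample = """\
This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018.
Alignment has 5244 distinct alignment patterns
Proportion of gaps and completely undetermined characters in this alignment: 3.71%
Executing 50 rapid bootstrap inferences and thereafter a thorough ML search
"""
info = summarize_info(sample)
print(info)
```

To use it on a real run, read the file first, for example `summarize_info(open("/tmp/RAxML_info.oaks-8taxa-a").read())`.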

In [55]:
! cat /tmp/RAxML_info.oaks-8taxa-a



Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" 



This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)


Alignment has 5244 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 3.71%

RAxML rapid bootstrapping and subsequent ML search

Using 1 distinct models/data partitions with joint branch length optimization



Executing 50 rapid bootstrap inferences and thereafter a thorough ML search 

All free model parameters will be estimated by RAxML
GAMMA model of rate heterogeneity, ML estimate of alpha-parameter

GAMMA Model pa