<h2><span style="color:gray">ipyrad-analysis toolkit:</span> treeslider</h2>

<h5><span style="color:red">(Reference-only method)</span></h5>

With reference-mapped RAD loci, you can select windows of loci located close together on scaffolds and automate extracting, filtering, and submitting the resulting alignments to phylogenetic inference software (raxml or mb).

Key features:

1. Automatically concatenates ref-mapped RAD loci in sliding windows.
2. Distributes phylogenetic inference jobs in parallel.
3. Can be easily restarted from a checkpoint if interrupted.
4. Provides a tree_table (dataframe) with stats and tree results. 
5. Can be paired with other tools for further analysis of tree_table.

### Required software

In [5]:
# conda install ipyrad -c bioconda
# conda install raxml -c bioconda
# conda install toytree -c eaton-lab

In [6]:
import ipyrad.analysis as ipa
import toytree

### Short Tutorial:

The `treeslider()` tool takes the `.seqs.hdf5` database file from ipyrad as its input. You can select scaffolds by their index (integer) or by their name (string). If you don't know what these are, first load the data file without a scaffold argument and check the `.scaffold_table` attribute. If one or more samples have no data in the selected window, they will be dropped from the data set. If no samples contain data in the selected window, an error will be raised.


#### Load the data

In [7]:
# the path to your HDF5 formatted seqs file
data = "/home/deren/Downloads/ref_pop2.seqs.hdf5"

In [8]:
# check scaffold idx (row) against scaffold names
ipa.treeslider(data).scaffold_table.head()

| | scaffold_name | scaffold_length |
|---|---|---|
| 0 | Qrob_Chr01 | 55068941 |
| 1 | Qrob_Chr02 | 115639695 |
| 2 | Qrob_Chr03 | 57474983 |
| 3 | Qrob_Chr04 | 44977106 |
| 4 | Qrob_Chr05 | 70629082 |


#### Enter window and slide arguments
You may wish to use smaller window sizes. A large window size is used here so that the analysis runs faster (fewer windows in total).

In [4]:
# select a scaffold idx, window size, and slide size
ts = ipa.treeslider(
    name="chr1_w5M_s1M",
    data=data,
    workdir="analysis-treeslider",
    scaffold_idxs=[0],
    window_size=5000000,
    slide_size=1000000,
    minsnps=10,
    inference_args={"N": 10}
)
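To get a feel for how `window_size` and `slide_size` determine the number of windows, here is a minimal sketch. The helper `sliding_windows` is hypothetical and is not ipyrad's implementation. It is one windowing scheme that happens to reproduce the `nwindows=51` reported in this notebook's run log for `Qrob_Chr01` (length 55068941) with a 5 Mb window and a 1 Mb slide.

```python
def sliding_windows(scaffold_length, window_size, slide_size):
    """Enumerate (start, end) windows along one scaffold.

    Assumption (not taken from ipyrad): window starts step by
    slide_size while start < scaffold_length - window_size.
    """
    starts = range(0, scaffold_length - window_size, slide_size)
    return [(start, start + window_size) for start in starts]


# Qrob_Chr01 length from the scaffold table; window and slide from the example
windows = sliding_windows(55068941, 5000000, 1000000)
print(len(windows))             # 51
print(windows[0], windows[-1])  # (0, 5000000) (50000000, 55000000)
```

Halving `slide_size` roughly doubles the number of windows, and therefore the number of tree inference jobs, so it is worth checking the count before submitting a run.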

#### Submit jobs to run on cluster
Below is the raxml command that will be run, where each `...` marks an argument that is filled in for each window. See the advanced options below for how to modify the raxml command string.

In [5]:
# this is the tree inference command that will be used
ts.show_inference_command()

raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -n ... -w ... -s ... -p 54321 -N 10 -x 12345


In [6]:
# run tree inference in parallel
ts.run(auto=True)

Parallel connection | oud: 4 cores
building database: nwindows=51; minsnps=10
[#################   ]  88% 1:20:04 | inferring raxml trees 
Keyboard Interrupt by user



In [9]:
ts

<ipyrad.analysis.treeslider.TreeSlider at 0x7f63b63a3748>

#### Examine results
The tree table of results is also saved as a CSV-formatted file in the workdir, so you can re-load it later using pandas. The table shows how much information was in each region, and you can extract the tree from each region for plotting. To examine how phylogenetic relationships vary across the genome using data from the tree table, check out the `clade_weights()` tool.

In [13]:
ts.tree_table.head()

AttributeError: 'TreeSlider' object has no attribute 'treetable'
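Re-loading the saved table with pandas might look like the sketch below. The file name follows the `<workdir>/<name>.tree_table.csv` pattern that appears in the "existing results loaded from" message in this notebook. The rest is assumed: that the first CSV column is the saved row index, and that a column named `tree` is left empty for windows where no tree was inferred. The helper names are hypothetical.

```python
import os

import pandas as pd


def load_tree_table(path):
    """Read a saved tree table CSV into a DataFrame.

    Assumption: the first CSV column is the saved row index.
    """
    return pd.read_csv(path, index_col=0)


def windows_with_trees(tree_table, column="tree"):
    """Keep only the rows whose tree column is not missing.

    Assumption: the column is named "tree" and is empty when no
    tree was inferred for that window.
    """
    return tree_table[tree_table[column].notna()]


# path built from the workdir and name used in the example above
csv_path = os.path.join("analysis-treeslider", "chr1_w5M_s1M.tree_table.csv")
if os.path.exists(csv_path):
    table = load_tree_table(csv_path)
    print(windows_with_trees(table).head())
```

Check `table.columns` after loading to confirm the actual column names before filtering.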

<h3><span style="color:red">Advanced</span>: Restart from interrupted checkpoint</h3>

In [None]:
# select a scaffold idx, window size, and slide size
ts = ipa.treeslider(
    name="chr1_w500K_s100K",
    data=data,
    workdir="analysis-treeslider",
    scaffold_idxs=[0],
    window_size=500000,
    slide_size=100000,
    minsnps=10,
    inference_method="raxml",
)

<h3><span style="color:red">Advanced</span>: Modify the raxml command</h3>

In [15]:
# this is the tree inference command that will be used
ts.show_inference_command()

raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -n ... -w ... -s ... -p 54321 -N 100 -x 12345


In [8]:
# select a scaffold idx, window size, and slide size
ts = ipa.treeslider(
    name="chr1_w500K_s100K",
    data=data,
    workdir="analysis-treeslider",
    scaffold_idxs=[0],
    window_size=500000,
    slide_size=100000,
    minsnps=10,
    inference_method="raxml",
    inference_args={"m": "GTRCAT", "N": 10, "f": "d", 'x': None},
)

existing results loaded from /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-treeslider/chr1_w500K_s100K.tree_table.csv
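The `inference_args` dictionary above changes the model to `GTRCAT`, sets `-N 10`, switches `-f a` to `-f d`, and passes `None` for `x`. The sketch below shows one way such overrides could be merged into the default command string shown earlier. The helper `build_raxml_command` is hypothetical and is not ipyrad's implementation. That a `None` value drops the flag entirely is an assumption suggested by the `'x': None` entry.

```python
# Defaults copied from the command string printed by show_inference_command()
DEFAULT_ARGS = {
    "f": "a", "T": 2, "m": "GTRGAMMA",
    "n": "...", "w": "...", "s": "...",
    "p": 54321, "N": 100, "x": 12345,
}


def build_raxml_command(overrides, binary="raxmlHPC-PTHREADS-SSE3"):
    """Merge overrides into the defaults and render a command string.

    Assumption: a value of None removes that flag from the command.
    """
    args = dict(DEFAULT_ARGS)
    args.update(overrides)
    parts = [binary]
    for flag, value in args.items():
        if value is None:
            continue  # drop flags that were explicitly disabled
        parts.append(f"-{flag} {value}")
    return " ".join(parts)


cmd = build_raxml_command({"m": "GTRCAT", "N": 10, "f": "d", "x": None})
print(cmd)
# raxmlHPC-PTHREADS-SSE3 -f d -T 2 -m GTRCAT -n ... -w ... -s ... -p 54321 -N 10
```

Calling `ts.show_inference_command()` after creating the object remains the reliable way to confirm what will actually be run.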


<h3><span style="color:red">Advanced</span>: Cores and threads </h3>
    
By default, each alignment (window) is run using two threads, so if you have 4 cores available, two 2-threaded jobs will run at a time. You can set the number of cores and threads to use (they must be available on your machine) through the `.ipcluster` dictionary.

In [7]:
# automatically start 8 cores and run 4-threaded jobs
ts.ipcluster["threads"] = 4
ts.ipcluster["cores"] = 8
ts.run(auto=True)

{'cluster_id': 'ipp-983188349',
 'profile': 'default',
 'engines': 'Local',
 'quiet': 0,
 'timeout': 60,
 'cores': 4,
 'threads': 2,
 'pids': {0: 13020, 1: 13022, 2: 13024, 3: 13039}}
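The relationship between cores, threads, and simultaneous jobs described above is simple integer division. The helper `concurrent_jobs` below is hypothetical and only illustrates the arithmetic; it does not query or configure ipyrad.

```python
def concurrent_jobs(cores, threads):
    """Number of multi-threaded jobs that can run at the same time."""
    if threads < 1 or cores < threads:
        raise ValueError("need at least one thread and cores >= threads")
    return cores // threads


print(concurrent_jobs(4, 2))  # 2 (the default: two 2-threaded jobs on 4 cores)
print(concurrent_jobs(8, 4))  # 2 (8 cores running 4-threaded jobs)
```

Raising `threads` without raising `cores` reduces the number of windows processed in parallel, so the best setting depends on how well raxml scales with threads on your alignments.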