<h2><span style="color:gray">ipyrad-analysis toolkit:</span> window_extracter</h2>

<h5><span style="color:red">(Reference only method)</span></h5>


With reference-mapped RAD loci you can extract, concatenate, and filter the loci that fall within a given genomic window for use in downstream analyses. For example, you may wish to extract all RAD loci within 10 kb of a gene of interest to infer a phylogenetic tree from the linked SNPs in that region.

### Required software

In [1]:
# conda install ipyrad -c bioconda

In [2]:
import ipyrad.analysis as ipa
import toytree


### Short Tutorial:

The `window_extracter()` tool takes the `.seqs.hdf5` database file from ipyrad as its input. You can select scaffolds by their index (integer) or by their name (string). If you don't know these, first load the data file without a scaffold argument and inspect the `.scaffold_table` attribute. Any sample with no data in the selected window is dropped from the data set. If no sample has data in the selected window, an error is raised.

In [3]:
# the path to your HDF5 formatted seqs file
data = "/home/deren/Downloads/ref_pop2.seqs.hdf5"

In [4]:
# check scaffold idx (row) against scaffold names
ipa.window_extracter(data).scaffold_table.head()

| | scaffold_name | scaffold_length |
|---|---|---|
| 0 | Qrob_Chr01 | 55068941 |
| 1 | Qrob_Chr02 | 115639695 |
| 2 | Qrob_Chr03 | 57474983 |
| 3 | Qrob_Chr04 | 44977106 |
| 4 | Qrob_Chr05 | 70629082 |
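
To turn the "within 10 kb of a gene" example from the introduction into `start` and `end` values, the window bounds can be computed from the gene coordinates and clamped to the scaffold length reported in the table above. This is a minimal sketch in plain Python; `flanking_window` and the gene coordinates are hypothetical and are not part of ipyrad.

```python
def flanking_window(gene_start, gene_end, flank, scaffold_length):
    """Return (start, end) covering a gene plus `flank` bp on each side,
    clamped so the window never extends past the scaffold bounds."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    start = max(0, gene_start - flank)
    end = min(scaffold_length, gene_end + flank)
    return start, end


# hypothetical gene on Qrob_Chr02; the scaffold length comes from the table above
start, end = flanking_window(
    gene_start=1_200_000,
    gene_end=1_203_500,
    flank=10_000,
    scaffold_length=115_639_695,
)
print(start, end)  # 1190000 1213500
```

The resulting values could then be passed as the `start` and `end` arguments shown in the next cell.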


In [5]:
# select a scaffold idx, start, and end positions
we = ipa.window_extracter(
    data=data,
    workdir="analysis-window_extracter",
    scaffold_idx=1,
    start=1000000,
    end=1500000,
    exclude=["reference"],
)

# show stats of the window
we.stats

| | Scaffold | Start | End | nSites | nSNPs | Missing | Samples | Empty/drop samples |
|---|---|---|---|---|---|---|---|---|
| 0 | Qrob_Chr02 | 1000000 | 1500000 | 7424 | 142 | 0.17 | 29 | 0 |


In [14]:
# write the data to a phylip formatted file
we.run()

Wrote data to /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-window_extracter/scaf1-1000000-1500000.phy
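
As a quick sanity check on the written file, the alignment can be read back and its missing-data fraction recomputed. The sketch below is plain Python and makes two assumptions that are not confirmed by this notebook: the file is sequential PHYLIP (a header line with the number of samples and sites, then one `name sequence` line per sample), and missing data are coded as `N` or `-`. The helper names `parse_phylip` and `missing_fraction` are hypothetical.

```python
def parse_phylip(text):
    """Parse sequential PHYLIP text into a dict of {sample_name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntaxa, nsites = (int(x) for x in lines[0].split())
    seqs = {}
    for line in lines[1:]:
        # split on the first run of whitespace: name, then the sequence
        name, seq = line.split(None, 1)
        seqs[name] = seq.replace(" ", "")
    # check that the header agrees with the parsed alignment
    if len(seqs) != ntaxa or any(len(s) != nsites for s in seqs.values()):
        raise ValueError("header does not match alignment dimensions")
    return seqs


def missing_fraction(seqs, missing="N-"):
    """Fraction of all cells in the alignment that are missing characters."""
    total = sum(len(s) for s in seqs.values())
    nmissing = sum(s.count(char) for s in seqs.values() for char in missing)
    return nmissing / total


# small in-memory example standing in for the contents of the .phy file
example = """3 8
sampleA  ACGTNNAC
sampleB  ACGTACAC
sampleC  NNGTAC-C
"""
seqs = parse_phylip(example)
print(len(seqs), round(missing_fraction(seqs), 3))  # 3 0.208
```

For the real output, the text could be loaded with `open(we.outfile).read()` before calling `parse_phylip`.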


### Optional: infer tree from output and draw it

In [15]:
# run raxml on the phylip file 
rax = ipa.raxml(data=we.outfile, N=10)
print(rax.command)
rax.run(force=True)

raxmlHPC-PTHREADS-SSE3 -f a -T 4 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-raxml -s /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-window_extracter/scaf1-1000000-1500000.phy -p 54321 -N 10 -x 12345
job test finished successfully


In [19]:
# plot the tree for this genome window
tre = toytree.tree(rax.trees.bipartitions)
ttt = tre.collapse_nodes(min_support=50)
ttt.draw(node_labels="support");

### Cookbook

#### TODO: Filtering to reduce missing data:


In [None]:
# TODO: include only sites with sample coverage > X (no filter argument is set here yet)
we = ipa.window_extracter(
    data=data,
    workdir="analysis-window_extracter",
    scaffold_idx=1,
    start=1000000,
    end=1500000,
    exclude=["reference"],
)

# show stats of the window
we.stats
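
Until the filter in this section is filled in, the idea of keeping only well-covered sites can be illustrated as a post-processing step on the extracted alignment. The following is a plain-Python sketch, not a `window_extracter` option: `filter_sites_by_coverage` is a hypothetical helper, the alignment is assumed to be a dict of equal-length sequences, missing data are assumed to be coded as `N` or `-`, and the threshold is applied as "at least" rather than "greater than".

```python
def filter_sites_by_coverage(seqs, min_cov, missing="N-"):
    """Keep only alignment columns in which at least `min_cov` (a fraction
    between 0 and 1) of the samples have a non-missing base."""
    names = list(seqs)
    # iterate over the alignment column by column
    columns = zip(*(seqs[name] for name in names))
    kept = [
        col for col in columns
        if sum(base not in missing for base in col) / len(names) >= min_cov
    ]
    if not kept:
        # no column passed the filter: return empty sequences for all samples
        return {name: "" for name in names}
    # transpose the kept columns back into one row per sample
    rows = zip(*kept)
    return {name: "".join(row) for name, row in zip(names, rows)}


# small in-memory alignment standing in for an extracted window
seqs = {
    "sampleA": "ACGTNNAC",
    "sampleB": "ACGTACAC",
    "sampleC": "NNGTAC-C",
}
filtered = filter_sites_by_coverage(seqs, min_cov=0.75)
print(filtered)  # {'sampleA': 'GTC', 'sampleB': 'GTC', 'sampleC': 'GTC'}
```

Only the three columns with data in every sample survive here, because each of the other columns has coverage 2/3, which is below the 0.75 threshold.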

#### Output in alternative formats

In [None]:
# write the window in alternative output formats
we.run("loci")
we.run("fasta")