## ipyrad -- interactive assembly and analysis of RAD-seq data

#### *And*, a primer on Python, Jupyter, and reproducible science

### Skills in this course:

+ introduction to RAD-seq assembly
+ ipyrad command line (CLI)
+ ipyrad Python code (API)
+ introduction to Jupyter
+ introduction to parallel computing in Python

### Introduction to RAD-seq assembly

+ Short reads (usually 50-150 bp), single- or paired-end.
+ Reads at a locus usually align perfectly (fully overlapping); they are not *tiled* into contigs.
+ SNP data, plus the full sequence data.
+ Usually ~1e3 to 1e6 loci.
+ SNPs are phased within loci, but not *between* loci.
+ Loci are anonymous (*denovo*) or spatially located (*reference-mapped*).



### Available assembly software

1. Standard reference-mapping approaches (BWA + Picard + GATK + ...)
2. [STACKS](http://catchenlab.life.illinois.edu/stacks/) 
3. [pyRAD](http://github.com/dereneaton/pyrad)
4. [TASSEL-UNEAK](http://www.maizegenetics.net/tassel)
5. [ipyrad](http://ipyrad.readthedocs.io)

### Advantages of using ipyrad over the other methods:
1. Provides denovo, reference, and denovo-reference hybrid assembly methods
2. Includes alignment steps to allow for indel variation 
3. Fast and massively parallelizable (hundreds/thousands of cores)
4. Low memory footprint, e.g., compared to STACKS.
5. Branching methods support reproducibility and exploring parameter settings
6. Python API supports integration with Jupyter and scripting. 

## ipyrad online documentation

<img  src="../slide-images/MBL-1.png">

## The ipyrad command-line (CLI)

An introduction to the ipyrad setup and parameter settings.

In [3]:
%%bash

ipyrad -n tutorial


  New file 'params-tutorial.txt' created in /home/deren/websites/eaton-lab/slides/MBL



In [6]:
%%bash

cat params-tutorial.txt

------- ipyrad params file (v.0.7.5)--------------------------------------------
tutorial                       ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
./                             ## [1] [project_dir]: Project dir (made in curdir if not present)
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
                               ## [3] [barcodes_path]: Location of barcodes file
                               ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo                         ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                               ## [6] [reference_sequence]: Location of reference sequence file
rad                            ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG,                         ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (c
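
Each line of the params file shown above follows the pattern `value ## [index] [name]: description`. As a minimal sketch of how that layout could be read from a script, the snippet below parses a few of those lines into a dictionary. The function `parse_params` and the regular expression are illustrative assumptions written for this slide, not part of ipyrad; ipyrad reads the params file itself.

```python
import re

# Hypothetical helper (not part of ipyrad): matches one params line of the
# form "<value>   ## [<index>] [<name>]: <description>".
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*## \[(?P<index>\d+)\] \[(?P<name>\w+)\]: (?P<desc>.*)$"
)


def parse_params(text):
    """Return {parameter_name: value} for every parameter line in `text`."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # the header banner and blank lines do not match
            params[match.group("name")] = match.group("value").strip()
    return params


# A few lines copied from the params file printed above.
sample = """\
------- ipyrad params file (v.0.7.5)--------------------------------------------
tutorial                       ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
./                             ## [1] [project_dir]: Project dir (made in curdir if not present)
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
denovo                         ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
rad                            ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
"""

params = parse_params(sample)
for name, value in params.items():
    print(f"{name:20s} {value!r}")
```

Parameters left blank in the file (here `raw_fastq_path`) come back as empty strings, which makes it easy to spot the settings that still need to be filled in before running an assembly.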

## Phylogenetic gene/species tree inference

<img width=500 src="../slide-images/tree-inference.png">


<span class="align-right">
    <a href="http://onlinelibrary.wiley.com/doi/10.1111/nyas.12747/abstract">Liu et al. 2015</a>
</span>