## Paired-end ddRAD tutorial - CLI

This tutorial assumes that you have already finished the introductory tutorial and focuses on the primary differences between single-end and paired-end analyses. The following topics are covered here: 

+ Naming convention for paired-end files
+ Setting up the params file for paired-end data
+ Merging paired-end reads which overlap
+ Filtering paired-end data

### Get the data
First download and extract a set of example data using the commands below. This will create a directory called ``ipsimdata/`` in your current directory containing a number of test data sets. If you already downloaded these data for one of the other tutorials, you can skip this step.

In [None]:
## The curl command needs a capital O, not a zero
curl -LkO https://github.com/dereneaton/ipyrad/raw/master/tests/ipsimdata.tar.gz
tar -xvzf ipsimdata.tar.gz

The data we will analyze for this tutorial are now located in the ``ipsimdata/`` directory. Use any text editor or the command line to look at the following six files, which we will use for this tutorial. The ``sim_pairddrad_*`` data files contain a very "clean" paired ddRAD data set, while the ``sim_pairddradmerge_*`` data files contain a data set in which many paired reads overlap. In the latter case we will merge the overlapping reads during the ipyrad assembly.

+ ``sim_pairddrad_R1_.fastq.gz`` -- Illumina fastq formatted read 1 (R1) data
+ ``sim_pairddrad_R2_.fastq.gz`` -- Illumina fastq formatted read 2 (R2) data
+ ``sim_pairddrad_barcodes.txt`` -- barcode information file


+ ``sim_pairddradmerge_R1_.fastq.gz`` -- Illumina fastq formatted read 1 (R1) data
+ ``sim_pairddradmerge_R2_.fastq.gz`` -- Illumina fastq formatted read 2 (R2) data
+ ``sim_pairddradmerge_barcodes.txt`` -- barcode information file
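
The file names above suggest the naming convention for paired-end data: the two files of a pair are identical except for an ``_R1_`` or ``_R2_`` tag. The following is a minimal sketch of that idea, not ipyrad's own code; the helper name `pair_fastq_files` is hypothetical and the matching rule is an assumption based only on the example file names.

```python
def pair_fastq_files(filenames):
    """Pair each R1 file with the R2 file that differs only by the _R1_/_R2_ tag."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        if "_R1_" not in name:
            continue  # only start a pair from a read 1 file
        mate = name.replace("_R1_", "_R2_", 1)
        if mate not in names:
            raise ValueError(f"no R2 file found for {name}")
        pairs.append((name, mate))
    return pairs


# File names taken from the ipsimdata/ listing in this tutorial
files = [
    "sim_pairddrad_R1_.fastq.gz",
    "sim_pairddrad_R2_.fastq.gz",
    "sim_pairddradmerge_R1_.fastq.gz",
    "sim_pairddradmerge_R2_.fastq.gz",
]

for r1, r2 in pair_fastq_files(files):
    print(r1, "<->", r2)
```

If a read 1 file has no matching read 2 file, the sketch raises an error rather than silently treating the data as single-end.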

### Setting the parameters
As with any analysis, we start by creating a new params file for the assembly. We will create one for each of the two data sets.

In [3]:
%%bash

ipyrad -n nomerge
ipyrad -n merge


    New file `params-nomerge.txt` created in /home/deren/Documents/ipyrad/tests


    New file `params-merge.txt` created in /home/deren/Documents/ipyrad/tests



Next, edit the two params files (``params-nomerge.txt`` and ``params-merge.txt``) to tell ipyrad the location of the input data files and the barcodes file. We use the wildcard "*" in the path names to select multiple files. The data type is also set to 'pairddrad', which tells ipyrad that the data are paired-end and that each end was cut with a different cutter.
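
To see which files a wildcard path selects, the sketch below applies the two patterns used for the data sets in this tutorial to the six example file names. It assumes ordinary shell-style glob matching, as implemented by Python's `fnmatch` module; the helper `select` is hypothetical and is not part of ipyrad.

```python
from fnmatch import fnmatchcase

# The six example files described earlier in this tutorial
names = [
    "sim_pairddrad_R1_.fastq.gz",
    "sim_pairddrad_R2_.fastq.gz",
    "sim_pairddrad_barcodes.txt",
    "sim_pairddradmerge_R1_.fastq.gz",
    "sim_pairddradmerge_R2_.fastq.gz",
    "sim_pairddradmerge_barcodes.txt",
]


def select(pattern, candidates):
    """Return the candidate names that match a shell-style wildcard pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


nomerge_files = select("sim_pairddrad_*.fastq.gz", names)
merge_files = select("sim_pairddradmerge*.fastq.gz", names)

print("nomerge:", nomerge_files)
print("merge:  ", merge_files)
```

Note that the underscore before the wildcard in the first pattern matters: without it, the pattern would also match the ``sim_pairddradmerge_*`` files.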

In [None]:
## enter these changes to params-nomerge.txt in your text-editor

pairtest                              ## [1] [project_dir]
ipsimdata/sim_pairddrad_*.fastq.gz    ## [2] [raw_fastq_path]
ipsimdata/sim_pairddrad_barcodes.txt  ## [3] [barcodes_path]
pairddrad                             ## [7] [datatype]
TGCAG, AATT                           ## [8] [restriction_overhang]

In [None]:
## enter these changes to params-merge.txt in your text-editor

pairtest                                    ## [1] [project_dir]
ipsimdata/sim_pairddradmerge*.fastq.gz      ## [2] [raw_fastq_path]
ipsimdata/sim_pairddradmerge_barcodes.txt   ## [3] [barcodes_path]
pairddrad                                   ## [7] [datatype]
TGCAG, AATT                                 ## [8] [restriction_overhang]
1                                           ## [16] [filter_adapters]
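
If you prefer to script these edits rather than type them in a text editor, something like the sketch below could work. It assumes each params line has the layout shown above (a value, then `## [index] [name]`); the function `set_params` and the placeholder template lines are hypothetical, and the real params file contains more lines than shown here.

```python
import re

# Matches "<value>   ## [<index>] <rest of line>"
PARAM_LINE = re.compile(r"^(?P<value>.*?)\s*## \[(?P<index>\d+)\] (?P<rest>.*)$")


def set_params(lines, changes):
    """Return a copy of `lines` with values replaced for the given indices."""
    updated = []
    for line in lines:
        match = PARAM_LINE.match(line)
        if match and int(match.group("index")) in changes:
            value = changes[int(match.group("index"))]
            # Left-pad the value so the "##" comments stay roughly aligned
            line = f"{value:<30} ## [{match.group('index')}] {match.group('rest')}"
        updated.append(line)
    return updated


# Placeholder lines for illustration only, not real default values
template = [
    "placeholder                    ## [1] [project_dir]",
    "placeholder                    ## [7] [datatype]",
    "placeholder                    ## [16] [filter_adapters]",
]

new_lines = set_params(template, {1: "pairtest", 7: "pairddrad", 16: "1"})
print("\n".join(new_lines))
```

Lines whose index is not listed in `changes` are passed through unchanged, so the same function can be applied to both params files.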

In [5]:
%%bash

ipyrad -p params-merge.txt -s 12


 --------------------------------------------------
  ipyrad [v.0.1.74]
  Interactive assembly and analysis of RADseq data
 --------------------------------------------------
  New Assembly: merge
  ipyparallel setup: Local connection to 4 Engines

  Step1: Demultiplexing fastq data to Samples
    Saving Assembly.
  Step2: Filtering reads 
    Saving Assembly.


In [6]:
%%bash

ipyrad -p params-merge.txt -r



Summary stats of Assembly merge
------------------------------------------------
     state  reads_raw  reads_filtered
1A0      2      20000           19901
1B0      2      20000           19960
1C0      2      20000           19939
1D0      2      20000           19900
2E0      2      20000           19960
2F0      2      20000           19940
2G0      2      20000           19960
2H0      2      20000           19903
3I0      2      20000           19940
3J0      2      20000           19939
3K0      2      20000           19899
3L0      2      20000           19960


Full stats files
------------------------------------------------
step 1: ./pairtest/merge_fastqs/s1_demultiplex_stats.txt
step 2: ./pairtest/merge_edits/s2_rawedit_stats.txt
step 3: None
step 4: None
step 5: None
step 6: None
step 7: None




For more detail, look at the step 2 stats output file, which shows how many reads were removed by each filter.

In [7]:
cat ./pairtest/merge_edits/s2_rawedit_stats.txt

     reads_raw  filtered_by_qscore  filtered_by_adapter  reads_passed
1A0      20000                   0                   99         19901
1B0      20000                   0                   40         19960
1C0      20000                   0                   61         19939
1D0      20000                   0                  100         19900
2E0      20000                   0                   40         19960
2F0      20000                   0                   60         19940
2G0      20000                   0                   40         19960
2H0      20000                   0                   97         19903
3I0      20000                   0                   60         19940
3J0      20000                   0                   61         19939
3K0      20000                   0                  101         19899
3L0      20000                   0                   40         19960
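
The columns of this table are related in a simple way: the reads that pass are the raw reads minus those removed by each filter. A minimal sketch that checks this, using three rows copied from the table above (the parsing helper `parse_stats` is hypothetical, and the whitespace-delimited layout is assumed from the output shown):

```python
# Three rows copied from the step 2 stats output above
STATS = """\
     reads_raw  filtered_by_qscore  filtered_by_adapter  reads_passed
1A0      20000                   0                   99         19901
1B0      20000                   0                   40         19960
1C0      20000                   0                   61         19939
"""


def parse_stats(text):
    """Parse a whitespace-delimited stats table into {sample: {column: int}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        fields = line.split()
        table[fields[0]] = dict(zip(header, map(int, fields[1:])))
    return table


stats = parse_stats(STATS)
for sample, row in stats.items():
    removed = row["filtered_by_qscore"] + row["filtered_by_adapter"]
    # Passed reads should equal raw reads minus everything filtered out
    assert row["reads_raw"] - removed == row["reads_passed"]
    print(sample, f"{row['reads_passed'] / row['reads_raw']:.2%} of reads retained")
```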

### Read merging

During step 3, paired reads that overlap are merged.
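
To illustrate what merging overlapping reads means, here is a toy sketch. It is not how ipyrad performs merging: it requires an exact match in the overlap, whereas real read mergers tolerate mismatches and use base quality scores. The function names, the `min_overlap` parameter, and the example sequence are all made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge read1 with the reverse complement of read2 if they overlap exactly.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases is found.
    """
    r2rc = reverse_complement(read2)
    longest = min(len(read1), len(r2rc))
    # Try the longest possible overlap first, then shorter ones
    for size in range(longest, min_overlap - 1, -1):
        if read1[-size:] == r2rc[:size]:
            return read1 + r2rc[size:]
    return None


# A made-up 30 bp fragment sequenced with 20 bp reads from each end,
# so the two reads overlap by 10 bases in the middle
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])

merged = merge_pair(read1, read2, min_overlap=5)
print(merged == fragment)
```

When the fragment is longer than the combined read lengths, the reads do not overlap and `merge_pair` returns `None`; such pairs would be kept as two separate reads.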

In [9]:
%%bash

ipyrad -p params-merge.txt -s 3


 --------------------------------------------------
  ipyrad [v.0.1.74]
  Interactive assembly and analysis of RADseq data
 --------------------------------------------------
  loading Assembly: merge [~/Documents/ipyrad/tests/pairtest/merge.json]
  ipyparallel setup: Local connection to 4 Engines

  Step3: Clustering/Mapping reads
    Saving Assembly.
