# Bismark Parameter Testing

In this notebook I'll test different `bismark` parameters using a subset of my gonad methylation data.

1. Genome Preparation (already done in [this Jupyter notebook](https://github.com/RobertsLab/project-virginica-oa/blob/master/notebooks/2018-04-27-Gonad-Methylation-Bismark.ipynb))
2. Alignment
3. Deduplication
4. Methylation Extractor
5. HTML Processing Report
6. Summary Report

Primarily, I'll test different alignment settings to increase mapping efficiency.

## 0. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/project-virginica-oa/notebooks'

In [2]:
cd ../analyses/

/Users/yaamini/Documents/project-virginica-oa/analyses


In [4]:
ls

2018-01-23-MBDSeq-Labwork/              2018-06-11-DML-Analysis/
2018-04-26-Gonad-Methylation-FastQC/    2018-06-14-Gene-Enrichment-Analysis/
2018-04-27-Bismark/                     2018-10-11-MethylKit-Parameter-Testing/
2018-05-01-MethylKit/                   README.md
2018-05-29-MethylKit-Full-Samples/


In [5]:
mkdir 2018-10-03-Bismark-Parameter-Testing

In [6]:
cd 2018-10-03-Bismark-Parameter-Testing/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing


## 1. Genome Preparation

This step was already completed in [this Jupyter notebook](https://github.com/RobertsLab/project-virginica-oa/blob/master/notebooks/2018-04-27-Gonad-Methylation-Bismark.ipynb). The genome only needs to be prepared once. I will move on to the second step, alignment.

## 2. Alignment

In [6]:
! ../../../../../Shared/Apps/Bismark_v0.19.0/bismark -help



     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation, either version 3 of the License, or
     (at your option) any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU General Public License for more details.
     You should have received a copy of the GNU General Public License
     along with this program.  If not, see <http://www.gnu.org/licenses/>.



DESCRIPTION


The following is a brief description of command line options and arguments to control the Bismark
bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the
reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand
version (C->T co

 The trimmed *C. virginica* sequences I need to use are in [this OWL folder](http://owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/). Information about trimming and QC can be found in [Sam's notebook](http://onsnetwork.org/kubu4/2018/04/11/trimgalorefastqcmultiqc-trim-10bp-53-ends-c-virginica-mbd-bs-seq-fastq-data/). To run the `find` + `xargs` commands I want, I need to specify this path. I'm going to run this piece of code first just to confirm that I'm selecting the right files.

In [12]:
%%bash
find /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_*R1*.fq.gz

/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_10_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_1_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_2_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_3_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_4_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_5_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_6_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_7_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_8_s1_R1_val_1.fq.gz
/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_9_s1_R1_val_1.fq.gz


In [13]:
%%bash
find /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_*R1*.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz

zr2096_10
zr2096_1
zr2096_2
zr2096_3
zr2096_4
zr2096_5
zr2096_6
zr2096_7
zr2096_8
zr2096_9
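As a side note, the sample ID and the matching R2 file can also be derived from a single R1 path without `xargs`. This is only a sketch using one of the paths listed above; the variable names are placeholders I picked for illustration.

```bash
# One R1 path from the listing above
r1="/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_10_s1_R1_val_1.fq.gz"

# POSIX basename strips the directory and, optionally, a trailing suffix
sample_id=$(basename "$r1" _s1_R1_val_1.fq.gz)

# Build the path to the paired R2 file from the same directory and sample ID
r2="$(dirname "$r1")/${sample_id}_s1_R2_val_2.fq.gz"

echo "$sample_id"
echo "$r2"
```

The same two steps are what the `{}` placeholder does in the `xargs -I{}` commands further down: the sample ID is substituted into both the `-1` and `-2` paths.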


#### What I need for this command:

1. Find all the *C. virginica* sequence files using `find`
2. Isolate the base name from each of these files using `xargs basename`, which I will then pipe into...

*drumroll*

1. ...The path to `bismark`
2. `--non_directional`: See [this issue](https://github.com/RobertsLab/resources/issues/216)
3. `-p 4`: Number of simultaneous threads to run
4. `-u 10000`: Only align the first 10,000 reads
5. `-score_min`: This is the main parameter I want to test. Running this command with the default setting (L,0,-0.2) is stringent and led to ~20% mapping efficiency, if not lower. I will run this with three different options: L,0,-0.6; L,0,-0.9; L,0,-1.2
6. `--genome` + path to the folder with the .fa genome, which also has all of the bisulfite genome directories
7. `-1` + Path to first paired file in the Athaliana folder
8. `-2` + Path to second paired file in the Athaliana folder
9. Path to redirect standard error output.

Each run below uses the same command. Only the `-score_min` value and the name of the standard error file change.
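To get a feel for what the `-score_min` options mean: Bowtie 2 reads `L,a,b` as a linear function of read length, so the minimum alignment score is a + b * (read length). The sketch below works that out for a hypothetical 100 bp read. My trimmed reads vary in length, so the numbers are only illustrative, and `min_score` is a helper name I made up.

```bash
read_length=100   # hypothetical read length, for illustration only

# Minimum alignment score for a linear function L,a,b: a + b * read length
min_score() {
  # $1 = intercept (a), $2 = slope (b), $3 = read length
  awk -v a="$1" -v b="$2" -v x="$3" 'BEGIN { printf "%.1f\n", a + b * x }'
}

# Default setting first, then the three settings tested in this notebook
for slope in -0.2 -0.6 -0.9 -1.2; do
  printf 'L,0,%s -> minimum score %s\n' "$slope" "$(min_score 0 "$slope" "$read_length")"
done
```

A more negative slope lowers the minimum score, so alignments with more mismatches or gaps are still accepted. That is why mapping efficiency should go up as I move from -0.6 to -1.2.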

### 2a. -score_min = L,0,-0.6

In [14]:
%%bash

find /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_*R1*.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /Users/Shared/Apps/Bismark_v0.19.0/bismark \
--non_directional \
-p 4 \
-u 10000 \
-score_min L,0,-0.6 \
--genome /Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-27-Bismark/2018-04-27-Bismark-Inputs/ \
-1 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R1_val_1.fq.gz \
-2 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R2_val_2.fq.gz \
2> bismark-1003-L006.err

FastQ format assumed (by default)
Each Bowtie 2 instance is going to be run with 4 threads. Please monitor performance closely and tune down if needed!
chr NC_035780.1 (65668440 bp)
chr NC_035781.1 (61752955 bp)
chr NC_035782.1 (77061148 bp)
chr NC_035783.1 (59691872 bp)
chr NC_035784.1 (98698416 bp)
chr NC_035785.1 (51258098 bp)
chr NC_035786.1 (57830854 bp)
chr NC_035787.1 (75944018 bp)
chr NC_035788.1 (104168038 bp)
chr NC_035789.1 (32650045 bp)
chr NC_007175.2 (17244 bp)

Number of paired-end alignments with a unique best hit:	3664
Mapping efficiency:	36.6%

Sequence pairs with no alignments under any condition:	4659
Sequence pairs did not map uniquely:	1677
Sequence pairs which were discarded because genomic sequence could not be extracted:	0

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:	972	((converted) top strand)
GA/CT/CT:	890	(complementary to (converted) top strand)
GA/CT/GA:	929	(complementary to (converted) bottom strand

Mapping efficiency was 30-40%, except for sample 1 (15%). I'm going to move this output to a new subdirectory.
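Instead of scrolling through the output for each sample, the per-sample mapping efficiencies could be pulled out of the report files. This is a rough sketch that assumes each `*_PE_report.txt` file contains the same `Mapping efficiency:` line that is printed above. `summarize_mapping` is a name I made up, and the demo report is a stand-in file, not one of my real reports.

```bash
# Print "<report file><TAB><mapping efficiency>" for each paired-end report in a folder
summarize_mapping() {
  report_dir="${1:-.}"
  for report in "$report_dir"/*_PE_report.txt; do
    # The last field of the "Mapping efficiency:" line is the percentage
    efficiency=$(awk '/^Mapping efficiency:/ { print $NF }' "$report")
    printf '%s\t%s\n' "$(basename "$report")" "$efficiency"
  done
}

# Demo on a stand-in report that holds only the line of interest
demo_dir=$(mktemp -d)
printf 'Mapping efficiency:\t36.6%%\n' > "$demo_dir/example_PE_report.txt"
summary=$(summarize_mapping "$demo_dir")
echo "$summary"
```

Running `summarize_mapping .` in the folder with the real reports should give one line per sample, which would make an outlier like sample 1 easy to spot.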

In [16]:
mkdir 2018-10-03-L006-Output

In [18]:
!ls -F

2018-10-03-L006-Output/
bismark-1003-L006.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [19]:
!ls -F *.*

bismark-1003-L006.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [30]:
! mv *.* 2018-10-03-L006-Output/

In [31]:
!ls -F 2018-10-03-L006-Output/

bismark-1003-L006.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [33]:
!ls -F

2018-10-03-L006-Output/


### 2b. -score_min = L,0,-0.9

In [34]:
%%bash

find /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_*R1*.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /Users/Shared/Apps/Bismark_v0.19.0/bismark \
--non_directional \
-p 4 \
-u 10000 \
-score_min L,0,-0.9 \
--genome /Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-27-Bismark/2018-04-27-Bismark-Inputs/ \
-1 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R1_val_1.fq.gz \
-2 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R2_val_2.fq.gz \
2> bismark-1003-L009.err

FastQ format assumed (by default)
Each Bowtie 2 instance is going to be run with 4 threads. Please monitor performance closely and tune down if needed!
chr NC_035780.1 (65668440 bp)
chr NC_035781.1 (61752955 bp)
chr NC_035782.1 (77061148 bp)
chr NC_035783.1 (59691872 bp)
chr NC_035784.1 (98698416 bp)
chr NC_035785.1 (51258098 bp)
chr NC_035786.1 (57830854 bp)
chr NC_035787.1 (75944018 bp)
chr NC_035788.1 (104168038 bp)
chr NC_035789.1 (32650045 bp)
chr NC_007175.2 (17244 bp)

Number of paired-end alignments with a unique best hit:	4487
Mapping efficiency:	44.9%

Sequence pairs with no alignments under any condition:	3611
Sequence pairs did not map uniquely:	1902
Sequence pairs which were discarded because genomic sequence could not be extracted:	0

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:	1170	((converted) top strand)
GA/CT/CT:	1102	(complementary to (converted) top strand)
GA/CT/GA:	1124	(complementary to (converted) bottom str

Mapping efficiency was 40-50%, except for sample 1 (20%).

In [35]:
mkdir 2018-10-03-L009-Output

In [36]:
!ls -F *.*

bismark-1003-L009.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [37]:
!mv *.* 2018-10-03-L009-Output/

In [38]:
!ls -F 2018-10-03-L009-Output/

bismark-1003-L009.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [39]:
!ls -F

2018-10-03-L006-Output/ 2018-10-03-L009-Output/


### 2c. -score_min = L,0,-1.2

In [40]:
%%bash

find /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/zr2096_*R1*.fq.gz \
| xargs basename -s _s1_R1_val_1.fq.gz | xargs -I{} /Users/Shared/Apps/Bismark_v0.19.0/bismark \
--non_directional \
-p 4 \
-u 10000 \
-score_min L,0,-1.2 \
--genome /Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-27-Bismark/2018-04-27-Bismark-Inputs/ \
-1 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R1_val_1.fq.gz \
-2 /Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/{}_s1_R2_val_2.fq.gz \
2> bismark-1003-L012.err

FastQ format assumed (by default)
Each Bowtie 2 instance is going to be run with 4 threads. Please monitor performance closely and tune down if needed!
chr NC_035780.1 (65668440 bp)
chr NC_035781.1 (61752955 bp)
chr NC_035782.1 (77061148 bp)
chr NC_035783.1 (59691872 bp)
chr NC_035784.1 (98698416 bp)
chr NC_035785.1 (51258098 bp)
chr NC_035786.1 (57830854 bp)
chr NC_035787.1 (75944018 bp)
chr NC_035788.1 (104168038 bp)
chr NC_035789.1 (32650045 bp)
chr NC_007175.2 (17244 bp)

Number of paired-end alignments with a unique best hit:	5299
Mapping efficiency:	53.0%

Sequence pairs with no alignments under any condition:	2607
Sequence pairs did not map uniquely:	2094
Sequence pairs which were discarded because genomic sequence could not be extracted:	0

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:	1386	((converted) top strand)
GA/CT/CT:	1293	(complementary to (converted) top strand)
GA/CT/GA:	1328	(complementary to (converted) bottom str

Mapping efficiency was 50-55%, except for sample 1 (28%).

In [41]:
mkdir 2018-10-03-L012-Output

In [42]:
!ls -F *.*

bismark-1003-L012.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [43]:
!mv *.* 2018-10-03-L012-Output/

In [44]:
!ls -F 2018-10-03-L012-Output/

bismark-1003-L012.err
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


In [45]:
!ls -F

2018-10-03-L006-Output/ 2018-10-03-L009-Output/ 2018-10-03-L012-Output/


To optimize mapping efficiency, I will use `-score_min L,0,-1.2`. Sample 1 is concerning, but I will keep an eye on it. All associated files can now be found on [gannet](http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/).
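In hindsight, the three alignment cells differ only in the `-score_min` value and the label in the file names, so they could be written as one loop. The sketch below only prints the commands for a single sample instead of running them. `score_label` and `print_alignment` are helper names I made up, and I use `2>>` so that each sample would append to the same .err file, which is a small change from the single `2>` redirect above.

```bash
bismark="/Users/Shared/Apps/Bismark_v0.19.0/bismark"
genome="/Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-27-Bismark/2018-04-27-Bismark-Inputs/"
reads="/Volumes/web/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD"

# "L,0,-0.6" -> "L006", the label used in the folder and .err names above
score_label() {
  printf '%s\n' "$1" | tr -d ',.-'
}

# Print (not run) the alignment command for one sample and one -score_min value
print_alignment() {
  sample="$1"
  score="$2"
  label=$(score_label "$score")
  echo "$bismark --non_directional -p 4 -u 10000 -score_min $score" \
       "--genome $genome" \
       "-1 $reads/${sample}_s1_R1_val_1.fq.gz" \
       "-2 $reads/${sample}_s1_R2_val_2.fq.gz" \
       "2>> bismark-1003-${label}.err"
}

for score in L,0,-0.6 L,0,-0.9 L,0,-1.2; do
  print_alignment zr2096_1 "$score"
done
```

Keeping everything except the score in one place would make it harder to accidentally change another flag between runs.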

## 3. Deduplication, Sorting, and Indexing

We decided that `-score_min L,0,-1.2` was the best option moving forward. I will use the subset data generated with this parameter to test out options in `methylKit`. To do this, I need to finish the `bismark` pipeline, starting with deduplication, sorting, and indexing.

### 3a. Download files from gannet

Since I have gannet mounted on this machine, I could directly alter the files on gannet. However, I don't want to permanently change files by accident. Instead, I'll download the files from gannet, work on them locally, then put the output back on gannet. If I didn't have gannet mounted, I could also `curl` the files from the above link.

In [12]:
#Use wget to download the entire directory
! wget --no-parent -r http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/

--2018-10-11 14:16:58--  http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6433 (6.3K) [text/html]
Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/index.html'


2018-10-11 14:16:58 (95.9 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/index.html' saved [6433/6433]

Loading robots.txt; please ignore errors.
--2018-10-11 14:16:58--  http://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2018-10-11 14:16:58 E

In [14]:
! ls -F

gannet.fish.washington.edu/


In [20]:
cd gannet.fish.washington.edu/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing/gannet.fish.washington.edu


In [21]:
! ls -F

spartina/


In [22]:
cd spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing/gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output


In [23]:
! ls -F

2018-10-03-L012-Output-checksum.sha
bismark-1003-L012.err
index.html
index.html?C=D;O=A
index.html?C=D;O=D
index.html?C=M;O=A
index.html?C=M;O=D
index.html?C=N;O=A
index.html?C=N;O=D
index.html?C=S;O=A
index.html?C=S;O=D
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096

Looks like the contents I wanted are nested in their own folder, mirroring the folder structure on gannet. I'll have to move the files out of that folder.
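For future downloads, `wget` has options that should avoid the nested folders and the extra `index.html*` files in the first place. I have not run this version, so treat it as a sketch: it only prints the command. `count_path_dirs` is a helper name I made up to work out how many parent folders to cut.

```bash
url="http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/"

# Count the directory components in the URL path (everything after the host name)
count_path_dirs() {
  path="${1#*://}"   # drop the scheme
  path="${path#*/}"  # drop the host name
  path="${path%/}"   # drop the trailing slash
  printf '%s\n' "$path" | awk -F/ '{ print NF }'
}

# Cut every parent folder so that only the last one is recreated locally
cut_dirs=$(( $(count_path_dirs "$url") - 1 ))

# Printed rather than executed: -nH skips the host-name folder, --cut-dirs skips
# the parent folders, and --reject drops the index.html listing pages
echo "wget --no-parent -r -nH --cut-dirs=$cut_dirs --reject 'index.html*' $url"
```

With these options the files should land directly in a local `2018-10-03-L012-Output/` folder, which would make the moving and cleanup steps below unnecessary.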

In [31]:
cd ../../../../../

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing


In [32]:
#Make a new directory to hold all of the files from gannet
mkdir 2018-10-03-L012-Output/

In [33]:
! ls -F

2018-10-03-L012-Output/     gannet.fish.washington.edu/


In [34]:
#Move files from one directory to another
! mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output/* \
2018-10-03-L012-Output/

In [36]:
cd 2018-10-03-L012-Output/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output


In [37]:
#Check to see that all of the files were moved
! ls -F

2018-10-03-L012-Output-checksum.sha
bismark-1003-L012.err
index.html
index.html?C=D;O=A
index.html?C=D;O=D
index.html?C=M;O=A
index.html?C=M;O=D
index.html?C=N;O=A
index.html?C=N;O=D
index.html?C=S;O=A
index.html?C=S;O=D
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096

In [40]:
#Now I want to remove the now-empty directory that the files came from
! rm -r ../gannet.fish.washington.edu/

In [43]:
cd ..

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing


In [44]:
! ls -F

2018-10-03-L012-Output/


In [48]:
#I also want to remove the index.html files from the directory.
! rm 2018-10-03-L012-Output/index.html*

In [52]:
cd 2018-10-03-L012-Output/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-10-03-Bismark-Parameter-Testing/2018-10-03-L012-Output


In [53]:
!shasum -h

Usage: shasum [OPTION]... [FILE]...
Print or check SHA checksums.
With no FILE, or when FILE is -, read standard input.

  -a, --algorithm   1 (default), 224, 256, 384, 512, 512224, 512256
  -b, --binary      read in binary mode
  -c, --check       read SHA sums from the FILEs and check them
  -t, --text        read in text mode (default)
  -U, --UNIVERSAL   read in Universal Newlines mode
                        produces same digest on Windows/Unix/Mac
  -0, --01          read in BITS mode
                        ASCII '0' interpreted as 0-bit,
                        ASCII '1' interpreted as 1-bit,
                        all other characters ignored
  -p, --portable    read in portable mode (to be deprecated)

The following two options are useful only when verifying checksums:
  -s, --status      don't output anything, status code shows success
  -w, --warn        warn about improperly formatted checksum lines

  -h, --help        display this help and exit
  -v

In [54]:
#Finally, I'll verify the checksums of the downloaded files match the original checksums
!shasum -c 2018-10-03-L012-Output-checksum.sha

bismark-1003-L012.err: OK
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam: OK
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt: OK
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam: OK


Yay everything matches! Now I can move on with deduplication.

### 3b. Deduplication

In [55]:
! /Users/Shared/Apps/Bismark_v0.19.0/deduplicate_bismark -help



This script is supposed to remove alignments to the same position in the genome from the Bismark mapping output
(both single and paired-end SAM files), which can arise by e.g. excessive PCR amplification. If sequences align
to the same genomic position but on different strands they will be scored individually.

Note that deduplication is not recommended for RRBS-type experiments!

In the default mode, the first alignment to a given position will be used irrespective of its methylation call
(this is the fastest option, and as the alignments are not ordered in any way this is also near enough random).

For single-end alignments only use the start coordinate of a read will be used for deduplication.

For paired-end alignments the start-coordinate of the first read and the end coordinate of the second
read will be used for deduplication. This script expects the Bismark output to be in SAM format
(Bismark v0.6.x or higher). To deduplicate the old custom Bismark output pleas

In [56]:
%%bash
find zr2096_*_s1_R1_val_1_bismark_bt2_pe.bam

zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_2_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_3_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_4_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_5_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_6_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_7_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_8_s1_R1_val_1_bismark_bt2_pe.bam
zr2096_9_s1_R1_val_1_bismark_bt2_pe.bam


To deduplicate:

1. Path to deduplicate_bismark
2. -p: Input data is paired
3. --bam: Write the output as a .bam file
4. Path to input files. It's important to specify only the files from the alignment! If I just used `*.bam`, the command would also match any `.deduplicated.bam` files already in the directory and start to deduplicate files that are already deduplicated!
5. Path to standard error output.
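The difference between the two patterns can be checked on empty stand-in files. This is just a sketch in a temporary folder, so it doesn't touch my real data. The two file names follow the naming used in this notebook.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Empty stand-in files: one alignment output and its deduplicated counterpart
touch zr2096_1_s1_R1_val_1_bismark_bt2_pe.bam
touch zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam

# A broad pattern matches both files...
broad_matches=$(ls *.bam | wc -l | tr -d ' ')
# ...while the pattern used for deduplication matches only the alignment output
narrow_matches=$(ls zr2096_*_s1_R1_val_1_bismark_bt2_pe.bam | wc -l | tr -d ' ')

echo "*.bam matches $broad_matches files"
echo "zr2096_*_s1_R1_val_1_bismark_bt2_pe.bam matches $narrow_matches file"
```

The narrow pattern only matches names that end in `_pe.bam`, so the `.deduplicated.bam` files are left alone even if I rerun the command.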

In [61]:
!/Users/Shared/Apps/Bismark_v0.19.0/deduplicate_bismark \
-p \
--bam \
zr2096_*_s1_R1_val_1_bismark_bt2_pe.bam \
2> dedup-L012-1011.err


Now testing Bismark result file zr2096_10_s1_R1_val_1_bismark_bt2_pe.bam for positional sorting (which would be bad...)	skipping header line:	@HD	VN:1.0	SO:unsorted
skipping header line:	@SQ	SN:NC_035780.1	LN:65668440
skipping header line:	@SQ	SN:NC_035781.1	LN:61752955
skipping header line:	@SQ	SN:NC_035782.1	LN:77061148
skipping header line:	@SQ	SN:NC_035783.1	LN:59691872
skipping header line:	@SQ	SN:NC_035784.1	LN:98698416
skipping header line:	@SQ	SN:NC_035785.1	LN:51258098
skipping header line:	@SQ	SN:NC_035786.1	LN:57830854
skipping header line:	@SQ	SN:NC_035787.1	LN:75944018
skipping header line:	@SQ	SN:NC_035788.1	LN:104168038
skipping header line:	@SQ	SN:NC_035789.1	LN:32650045
skipping header line:	@SQ	SN:NC_007175.2	LN:17244
skipping header line:	@PG	ID:Bismark	VN:v0.19.0	CL:"bismark --non_directional -p 4 -u 10000 -score_min L,0,-1.2 --genome /Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-27-Bismark/2018-04-27-Bismark-Inputs/ -1 /Volumes/web/Athaliana/20180

In [62]:
! find *dedup*

dedup-L012-1011.err
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.

In [63]:
! find *deduplicated.bam

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam


### 3b. Sorting

The next two steps will allow me to use my .bam files in IGV. According to the [`igvtools`](https://software.broadinstitute.org/software/igv/igvtools_commandline) website, I need to `sort` then `index` all of my .bam files. Instead of `igvtools`, I will use `samtools`, since `igvtools` may not have the capacity to deal with paired-end read data (ie. my data).

I will use a similar `find` and `xargs` approach to loop through all of my files.

1. Identify all deduplicated .bam files using `find`
2. Isolate the basename with `xargs` to be fed into `igvtools`

THEN

1. Path to `samtools`
2. `sort`
3. Path to input .bam file
4. -o + Path to output file

In [65]:
%%bash

find *deduplicated.bam \
| xargs basename -s _s1_R1_val_1_bismark_bt2_pe.deduplicated.bam | xargs -I{} /Users/Shared/Apps/samtools-1.8/samtools \
sort {}_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam \
-o {}_dedup.sorted.bam

In [66]:
!find *dedup.sorted.bam

zr2096_10_dedup.sorted.bam
zr2096_1_dedup.sorted.bam
zr2096_2_dedup.sorted.bam
zr2096_3_dedup.sorted.bam
zr2096_4_dedup.sorted.bam
zr2096_5_dedup.sorted.bam
zr2096_6_dedup.sorted.bam
zr2096_7_dedup.sorted.bam
zr2096_8_dedup.sorted.bam
zr2096_9_dedup.sorted.bam
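
The `find` and `xargs` loop used for sorting can also be written in the notebook's Python kernel. This is only a minimal sketch of the same naming logic: the helper names `sort_command` and `sort_commands` are hypothetical, and the `samtools` path and file suffix are copied from the cell above. It builds the argument lists without running anything.

```python
from pathlib import Path

# Path and suffix copied from the sorting cell in this notebook.
SAMTOOLS = "/Users/Shared/Apps/samtools-1.8/samtools"
DEDUP_SUFFIX = "_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam"


def sort_command(bam_name, samtools=SAMTOOLS):
    """Return the `samtools sort` argument list for one deduplicated BAM."""
    if not bam_name.endswith(DEDUP_SUFFIX):
        raise ValueError(f"unexpected file name: {bam_name}")
    sample = bam_name[: -len(DEDUP_SUFFIX)]  # e.g. "zr2096_10"
    return [samtools, "sort", bam_name, "-o", f"{sample}_dedup.sorted.bam"]


def sort_commands(directory="."):
    """Build one sort command per deduplicated BAM found in `directory`."""
    names = sorted(p.name for p in Path(directory).glob(f"*{DEDUP_SUFFIX}"))
    return [sort_command(name) for name in names]
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` from the directory that holds the .bam files.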


### 3c. Indexing

Finally, I'll index my files since each .bam in IGV needs an associated index.

1. Identify all sorted .bam files using `find`
2. Strip the shared suffix with `basename` (run through `xargs`) so only the sample prefix is fed into `samtools`

THEN

1. Path to `samtools`
2. `index`
3. Path to input .bam file

In [67]:
%%bash

find *dedup.sorted.bam \
| xargs basename -s _dedup.sorted.bam | xargs -I{} /Users/Shared/Apps/samtools-1.8/samtools \
index {}_dedup.sorted.bam

In [69]:
! find *.bai

zr2096_10_dedup.sorted.bam.bai
zr2096_1_dedup.sorted.bam.bai
zr2096_2_dedup.sorted.bam.bai
zr2096_3_dedup.sorted.bam.bai
zr2096_4_dedup.sorted.bam.bai
zr2096_5_dedup.sorted.bam.bai
zr2096_6_dedup.sorted.bam.bai
zr2096_7_dedup.sorted.bam.bai
zr2096_8_dedup.sorted.bam.bai
zr2096_9_dedup.sorted.bam.bai
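
Since IGV needs an index next to every .bam, a quick check that no index is missing can save a failed load later. The sketch below assumes the naming seen in the listing above (`<name>.bam.bai` beside each `*_dedup.sorted.bam`); the function name `missing_indexes` is hypothetical.

```python
from pathlib import Path


def missing_indexes(directory="."):
    """Return names of sorted BAM files in `directory` without a .bai index."""
    missing = []
    for bam in sorted(Path(directory).glob("*_dedup.sorted.bam")):
        # The listing above shows indexes named <bam file name>.bai
        index = bam.with_name(bam.name + ".bai")
        if not index.exists():
            missing.append(bam.name)
    return missing
```

An empty list from `missing_indexes()` would mean every sorted file has its index.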


## 4. Methylation Extractor

I'll complete the methylation extraction and report steps so I can compare these results with what I get from `methylKit`.

To use `bismark_methylation_extractor`:

1. Path to `bismark_methylation_extractor`
2. -p: Indicate I have paired reads
3. --bedGraph: Create `bedGraph` file with results
4. --counts
5. --scaffolds: Bypass limitation of open filehandles
6. --multicore 8: Run the extraction on 8 cores
7. --gzip: Gzip output files to save space (not passed in the command below)
8. Path to the deduplicated .bam files generated in Step 3
9. Path for standard error output file

In [70]:
! /Users/Shared/Apps/Bismark_v0.19.0/bismark_methylation_extractor \
-p \
--bedGraph \
--counts \
--scaffolds \
--multicore 8 \
*deduplicated.bam \
2> bme-L012-1011.err


Now testing Bismark result file zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...)	zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.1
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.2
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.3
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.4
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.5
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.6
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.7
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.8

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.1.mbias
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.2.mbias
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt.3.mbias
zr2096_10_s1_R1_val_1_bismark_bt2_pe.dedupli
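
The extraction call can also be assembled as an argument list in Python, which makes it easier to see exactly which flags are passed. This is a sketch under assumptions, not the command I ran: the helper names `extractor_command` and `run_extractor` are hypothetical, while the path, flags, and error-file name are copied from the cell above.

```python
import subprocess

# Path copied from the extraction cell in this notebook.
EXTRACTOR = "/Users/Shared/Apps/Bismark_v0.19.0/bismark_methylation_extractor"


def extractor_command(bam_files, cores=8, extractor=EXTRACTOR):
    """Return the argument list for one methylation extraction run."""
    if not bam_files:
        raise ValueError("no .bam files given")
    return [
        extractor,
        "-p",                       # paired-end reads
        "--bedGraph",               # also write a bedGraph file
        "--counts",
        "--scaffolds",              # avoid the open-filehandle limit
        "--multicore", str(cores),
        *bam_files,
    ]


def run_extractor(bam_files, err_path="bme-L012-1011.err"):
    """Run the extractor and send standard error to `err_path`."""
    with open(err_path, "w") as err:
        subprocess.run(extractor_command(bam_files), stderr=err, check=True)
```

A call such as `run_extractor(sorted(glob.glob("*deduplicated.bam")))` (after `import glob`) would mirror the shell wildcard used in the cell.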

In [71]:
!gunzip *bedGraph.gz

In [72]:
!ls *bedGraph*

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.methylation_calls.merged
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bedGraph.gz.me

## 5. HTML Processing Reports

In [73]:
! /Users/Shared/Apps/Bismark_v0.19.0/bismark2report

Found 10 alignment reports in current directory. Now trying to figure out whether there are corresponding optional reports

Writing Bismark HTML report to >> zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.html <<

Redundant argument in sprintf at /Users/Shared/Apps/Bismark_v0.19.0/bismark2report line 130.
Using the following alignment report:		> zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt <
Processing alignment report zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt ...
Complete

Using the following deduplication report:	> zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt <
Processing deduplication report zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplication_report.txt ...
Complete

Using the following splitting report:		> zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt <
Processing splitting report zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated_splitting_report.txt ...
Complete

Using the following M-bias report:		> zr2096_10_s1_R1_val_1_bismark_bt2_

## 6. Summary Reports

In [74]:
! /Users/Shared/Apps/Bismark_v0.19.0/bismark2summary

No Bismark/Bowtie2 single-end BAM files detected
Found Bismark/Bowtie2 paired-end files
No Bismark/Bowtie single-end BAM files detected
No Bismark/Bowtie paired-end BAM files detected

Generating Bismark summary report from 10 Bismark BAM file(s)...
>> Reading from Bismark report: zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
>> Reading from Bismark report: zr2096

## 7. Create Checksums

In [75]:
#I'm overwriting the old checksum file since all of the checksums matched
! shasum * > 2018-10-03-L012-Output-checksum.sha

In [76]:
! cat 2018-10-03-L012-Output-checksum.sha

da39a3ee5e6b4b0d3255bfef95601890afd80709  2018-10-03-L012-Output-checksum.sha
14f1254a3c4f0f27b9f6a222c260e6fb396c4864  CHG_CTOB_zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
425a5ecd623f6c83d3feb160c70781e18c31b232  CHG_CTOB_zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
33df1472528f5c94732282617855b1115b18057b  CHG_CTOB_zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
5f5ef3831c5dc07a5c8151a057c29fcad8ed19fe  CHG_CTOB_zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
2f66b7f061d53fa89f016ff43692051b0849b201  CHG_CTOB_zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
ed1af63988afcaf0ab3665b047a564956ffb1423  CHG_CTOB_zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
f3cf582099a8b7649eee8cbf644b635b21850da0  CHG_CTOB_zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
8a822ccb8dc7972f5e1ece646101898509c06d5d  CHG_CTOB_zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.txt
bf8eee200c9af841bd7b66b9770f5e0ce0028dd4  CHG_CTOB_zr2096_8_s1_R1_val_1_bismark_bt2_pe.de
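
In the listing above, the checksum file's own entry is `da39a3ee5e6b4b0d3255bfef95601890afd80709`, which is the SHA-1 digest of empty input. That is what I'd expect when `*` matches the checksum file and the redirect empties it before `shasum` reads it. The sketch below shows one way to leave the manifest out of its own listing. It assumes SHA-1 and the two-space `digest  name` layout seen above; the function names `sha1_of` and `write_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(directory, manifest_name):
    """Write `digest  name` lines for every file except the manifest itself."""
    directory = Path(directory)
    files = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name != manifest_name
    )
    lines = [f"{sha1_of(p)}  {p.name}\n" for p in files]
    (directory / manifest_name).write_text("".join(lines))
    return lines
```

For this directory the call would be `write_checksums(".", "2018-10-03-L012-Output-checksum.sha")`.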

I'll move the files over to gannet manually.