# FastQC trimming for *C. virginica* gonad methylation data

In this notebook, I'll use FastQC to examine the quality of my *C. virginica* gonad MBDSeq data before trimming the sequences.

In [10]:
pwd

'/Users/yaamini/Documents/project-virginica-oa/notebooks'

First I will make a new directory for my analyses.

In [20]:
mkdir /Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-26-Gonad-Methylation-FastQC

Confirm I made the directory.

In [21]:
ls ../analyses/

2018-01-23-MBDSeq-Labwork/           README.md
2018-04-26-Gonad-Methylation-FastQC/


Then change my working directory to the new folder I created.

In [22]:
cd ../analyses/2018-04-26-Gonad-Methylation-FastQC/

/Users/yaamini/Documents/project-virginica-oa/analyses/2018-04-26-Gonad-Methylation-FastQC
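
The directory setup above can also be done in one repeatable step. This is a minimal sketch, assuming it runs in a shell cell (for example under `%%bash`). `ANALYSIS_ROOT` and `OUT_DIR` are hypothetical variable names, and the default root is an `analyses/` folder under the current directory rather than the real project path:

```shell
# Hypothetical variable: point this at the project's analyses/ folder.
# Defaults to ./analyses so the sketch runs anywhere.
ANALYSIS_ROOT="${ANALYSIS_ROOT:-$PWD/analyses}"
OUT_DIR="$ANALYSIS_ROOT/2018-04-26-Gonad-Methylation-FastQC"

# -p creates missing parent folders and does not fail if the directory exists
mkdir -p "$OUT_DIR"

# Move into the new directory and print the location to confirm
cd "$OUT_DIR" && pwd
```

Note that a `cd` inside a `%%bash` cell does not carry over to later cells, unlike the `cd` magic used above.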


## 1. Identify path for FastQC command-line interface

In [26]:
! ../../../../../Shared/bioinformatics/FastQC/fastqc -help


            FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

	fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] 
           [-c contaminant file] seqfile1 .. seqfileN

DESCRIPTION

    FastQC reads a set of sequence files and produces from each one a quality
    control report consisting of a number of different modules, each one of 
    which will help to identify a different potential type of problem in your
    data.
    
    If no files to process are specified on the command line then the program
    will start as an interactive graphical application.  If files are provided
    on the command line then the program will run with no user interaction
    required.  In this mode it is suitable for inclusion into a standardised
    analysis pipeline.
    
    The options for the program as as follows:
    
    -h --help       Print this help file and exit
    
    -v --version    Print the vers

## 2. Use command-line interface

Based on the help menu, I need the following information in my code:

1. Path of FastQC application: I identified the path in the above cell
2. Files to be analyzed: Sam said I can analyze the files directly from owl. The files can be found at [this link](http://owl.fish.washington.edu/nightingales/C_virginica/).
3. Number of files to process simultaneously: I'm going to process 4 files at a time
4. Directory for output files: 2018-04-26-Gonad-Methylation-FastQC (i.e. my current working directory)

In [27]:
! ../../../../../Shared/bioinformatics/FastQC/fastqc \
/Volumes/web/nightingales/C_virginica/zr2096_* \
-t 4 \
-o ../2018-04-26-Gonad-Methylation-FastQC/

Started analysis of zr2096_10_s1_R1.fastq.gz
Started analysis of zr2096_10_s1_R2.fastq.gz
Started analysis of zr2096_1_s1_R1.fastq.gz
Started analysis of zr2096_1_s1_R2.fastq.gz
Approx 5% complete for zr2096_10_s1_R1.fastq.gz
Approx 5% complete for zr2096_10_s1_R2.fastq.gz
Approx 5% complete for zr2096_1_s1_R1.fastq.gz
Approx 5% complete for zr2096_1_s1_R2.fastq.gz
Approx 10% complete for zr2096_10_s1_R1.fastq.gz
Approx 10% complete for zr2096_10_s1_R2.fastq.gz
Approx 15% complete for zr2096_10_s1_R1.fastq.gz
Approx 15% complete for zr2096_10_s1_R2.fastq.gz
Approx 10% complete for zr2096_1_s1_R1.fastq.gz
Approx 10% complete for zr2096_1_s1_R2.fastq.gz
Approx 20% complete for zr2096_10_s1_R1.fastq.gz
Approx 20% complete for zr2096_10_s1_R2.fastq.gz
Approx 15% complete for zr2096_1_s1_R1.fastq.gz
Approx 25% complete for zr2096_10_s1_R1.fastq.gz
Approx 25% complete for zr2096_10_s1_R2.fastq.gz
Approx 15% complete for zr2096_1_s1_R2.fastq.gz
Approx 30% complete for zr2096_10_s1_R1.fastq.gz
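
The four inputs listed above map onto the command line one-to-one. Below is a minimal sketch of the same call with each input as a named variable. The variable names are hypothetical, and the absolute FastQC path is inferred by resolving the relative path from the working directory shown earlier, so treat it as an assumption:

```shell
# 1. Path of the FastQC application (assumed absolute form of the relative path)
FASTQC="${FASTQC:-/Users/Shared/bioinformatics/FastQC/fastqc}"
# 2. Folder holding the files to analyze (owl, mounted as a volume)
READS_DIR="${READS_DIR:-/Volumes/web/nightingales/C_virginica}"
# 3. Number of files to process simultaneously
THREADS=4
# 4. Directory for output files (here: the current working directory)
OUT_DIR="${OUT_DIR:-.}"

# Make sure the output directory exists before FastQC writes to it
mkdir -p "$OUT_DIR"

if [ -x "$FASTQC" ]; then
    # Same call as in the cell above: every zr2096_* file, 4 at a time
    "$FASTQC" -t "$THREADS" -o "$OUT_DIR" "$READS_DIR"/zr2096_*
else
    # Skip instead of failing when FastQC is not at the assumed path
    echo "FastQC not found at $FASTQC; skipping run" >&2
fi
```

FastQC does not create the `-o` directory itself, which is why the `mkdir -p` comes first.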

My FastQC files can be found [here](https://github.com/RobertsLab/project-virginica-oa/tree/master/analyses/2018-04-26-Gonad-Methylation-FastQC).

In [29]:
ls

zr2096_10_s1_R1_fastqc.html  zr2096_5_s1_R1_fastqc.html
zr2096_10_s1_R1_fastqc.zip   zr2096_5_s1_R1_fastqc.zip
zr2096_10_s1_R2_fastqc.html  zr2096_5_s1_R2_fastqc.html
zr2096_10_s1_R2_fastqc.zip   zr2096_5_s1_R2_fastqc.zip
zr2096_1_s1_R1_fastqc.html   zr2096_6_s1_R1_fastqc.html
zr2096_1_s1_R1_fastqc.zip    zr2096_6_s1_R1_fastqc.zip
zr2096_1_s1_R2_fastqc.html   zr2096_6_s1_R2_fastqc.html
zr2096_1_s1_R2_fastqc.zip    zr2096_6_s1_R2_fastqc.zip
zr2096_2_s1_R1_fastqc.html   zr2096_7_s1_R1_fastqc.html
zr2096_2_s1_R1_fastqc.zip    zr2096_7_s1_R1_fastqc.zip
zr2096_2_s1_R2_fastqc.html   zr2096_7_s1_R2_fastqc.html
zr2096_2_s1_R2_fastqc.zip    zr2096_7_s1_R2_fastqc.zip
zr2096_3_s1_R1_fastqc.html   zr2096_8_s1_R1_fastqc.html
zr2096_3_s1_R1_fastqc.zip    zr2096_8_s1_R1_fastqc.zip
zr2096_3_s1_R2_fastqc.html   zr2096_8_s1_R2_fastqc.html
zr2096_3_s1_R2_fastqc.zip    zr2096_8_s1_R2_fastqc.zip
zr2096_4_s1_R1_fastqc.html   zr2096_9_s1_R1_fastqc.html
zr2096_4_s1_R1_fastqc.zip    zr2096_9_s
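
Rather than reading the listing by eye, the output can be checked with a short loop. This is a sketch only: `check_reports` is a hypothetical helper that counts the `*_fastqc.html` reports in a directory and flags any report whose matching `.zip` archive is missing:

```shell
# Hypothetical helper: count FastQC HTML reports in a directory and
# report any that lack a matching .zip archive.
check_reports() {
    dir="${1:-.}"
    n_html=0
    n_missing=0
    for html in "$dir"/*_fastqc.html; do
        [ -e "$html" ] || continue          # the glob matched nothing
        n_html=$((n_html + 1))
        if [ ! -e "${html%.html}.zip" ]; then
            echo "missing archive: ${html%.html}.zip"
            n_missing=$((n_missing + 1))
        fi
    done
    echo "HTML reports: $n_html, missing archives: $n_missing"
}

# Check the current working directory
check_reports .
```

If all ten samples have both R1 and R2 reads, the expected result would be 20 HTML reports and no missing archives.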

## 3. Compare FastQC results with MultiQC

MultiQC compiles existing FastQC reports into a single summary for easy comparison. First, I need to install it, which I did by running `conda install -c bioconda multiqc` in a separate Terminal window.

![multiqc-installation](https://raw.githubusercontent.com/RobertsLab/project-virginica-oa/master/images/2018-04-27-MultiQC-Installation.png)
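
Once MultiQC is installed, a typical next step is to point it at the folder of FastQC output. The following is a hedged sketch and is not taken from this notebook. `FASTQC_DIR` is a hypothetical variable, and the guard skips the run if `multiqc` is not on the `PATH`:

```shell
# Hypothetical variable: folder containing the *_fastqc.zip files
FASTQC_DIR="${FASTQC_DIR:-.}"

if command -v multiqc >/dev/null 2>&1; then
    # Scan the FastQC output and write the summary report to the same folder
    multiqc "$FASTQC_DIR" -o "$FASTQC_DIR"
else
    echo "multiqc not on PATH; install it first (conda install -c bioconda multiqc)" >&2
fi
```

By default MultiQC writes `multiqc_report.html` and a `multiqc_data/` folder to the output directory.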