# RNA-Seq Analysis Training Demo

## Overview

This short tutorial demonstrates how to run an RNA-Seq workflow on a prokaryotic data set. The workflow covers read trimming (Trimmomatic), read QC (FastQC and MultiQC), read mapping, and counting mapped reads per gene to quantify gene expression (Salmon).

![RNA-Seq workflow](images/rnaseq-workflow.png)

### STEP 1: Set Up Environment
We need to create a set of directories first.


Set up directory structure

In [None]:
!mkdir -p data
!mkdir -p data/raw_fastq
!mkdir -p data/trimmed
!mkdir -p data/fastqc
!mkdir -p data/aligned
!mkdir -p data/reference

### STEP 2: Copy FASTQ Files

In [None]:
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/raw_fastqSub/SRR13349122_1.fastq --output data/raw_fastq/SRR13349122_1.fastq
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/raw_fastqSub/SRR13349122_2.fastq --output data/raw_fastq/SRR13349122_2.fastq
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/raw_fastqSub/SRR13349128_1.fastq --output data/raw_fastq/SRR13349128_1.fastq
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/raw_fastqSub/SRR13349128_2.fastq --output data/raw_fastq/SRR13349128_2.fastq
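Before trimming, it can be worth confirming that each download completed and that the two files of a pair contain the same number of reads. The sketch below shows one way to do that in a Python cell. The helper names `count_fastq_reads` and `check_pairs` are hypothetical (they are not part of any tool used here), and the code assumes uncompressed FASTQ files with exactly four lines per record.

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    # A line count that is not a multiple of 4 suggests a truncated or
    # non-FASTQ file (for example, an error page saved by curl).
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pairs(fastq_dir, samples):
    """Return {sample: read_count}; raise if the _1 and _2 files disagree."""
    counts = {}
    for sample in samples:
        n1 = count_fastq_reads(Path(fastq_dir) / f"{sample}_1.fastq")
        n2 = count_fastq_reads(Path(fastq_dir) / f"{sample}_2.fastq")
        if n1 != n2:
            raise ValueError(f"{sample}: {n1} forward reads but {n2} reverse reads")
        counts[sample] = n1
    return counts
```

For example, `check_pairs("data/raw_fastq", ["SRR13349122", "SRR13349128"])` returns the read count for each sample, or raises an error if a file looks truncated or the pair is unbalanced.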


### STEP 3: Copy reference transcriptome files that will be used by Salmon

In [None]:
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/reference/M_chelonae_transcripts.fasta --output data/reference/M_chelonae_transcripts.fasta
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/data/reference/decoys.txt --output data/reference/decoys.txt


### STEP 4: Copy data file for Trimmomatic

In [None]:
!curl https://storage.googleapis.com/me-inbre-rnaseq-pipelinev2/config/TruSeq3-PE.fa --output TruSeq3-PE.fa

### STEP 5: Run Trimmomatic

In [None]:
!trimmomatic PE -threads 2 data/raw_fastq/SRR13349122_1.fastq data/raw_fastq/SRR13349122_2.fastq data/trimmed/SRR13349122_1_trimmed.fastq data/trimmed/SRR13349122_1_trimmed_unpaired.fastq data/trimmed/SRR13349122_2_trimmed.fastq data/trimmed/SRR13349122_2_trimmed_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
!trimmomatic PE -threads 2 data/raw_fastq/SRR13349128_1.fastq data/raw_fastq/SRR13349128_2.fastq data/trimmed/SRR13349128_1_trimmed.fastq data/trimmed/SRR13349128_1_trimmed_unpaired.fastq data/trimmed/SRR13349128_2_trimmed.fastq data/trimmed/SRR13349128_2_trimmed_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
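The Trimmomatic command is long and differs between samples only in the accession, so it can be generated rather than copied. The sketch below builds the argument list in a Python cell. The helper names `trimmomatic_pe_command` and `run_trimmomatic` are hypothetical, and the default paths and trimming steps are copied from this tutorial. The output files are listed in the order Trimmomatic's PE mode expects: paired then unpaired for read 1, followed by paired then unpaired for read 2.

```python
import subprocess


def trimmomatic_pe_command(sample, raw_dir="data/raw_fastq",
                           out_dir="data/trimmed", adapters="TruSeq3-PE.fa",
                           threads=2):
    """Build a paired-end Trimmomatic command for one sample as an argument list."""
    return [
        "trimmomatic", "PE", "-threads", str(threads),
        # Inputs: forward and reverse reads.
        f"{raw_dir}/{sample}_1.fastq",
        f"{raw_dir}/{sample}_2.fastq",
        # Outputs, in the order PE mode expects: 1P, 1U, 2P, 2U.
        f"{out_dir}/{sample}_1_trimmed.fastq",
        f"{out_dir}/{sample}_1_trimmed_unpaired.fastq",
        f"{out_dir}/{sample}_2_trimmed.fastq",
        f"{out_dir}/{sample}_2_trimmed_unpaired.fastq",
        # Trimming steps, copied from the commands in this step.
        f"ILLUMINACLIP:{adapters}:2:30:10:2:keepBothReads",
        "LEADING:3", "TRAILING:3", "MINLEN:36",
    ]


def run_trimmomatic(samples):
    """Run Trimmomatic once per sample; raises CalledProcessError on failure."""
    for sample in samples:
        subprocess.run(trimmomatic_pe_command(sample), check=True)
```

Calling `run_trimmomatic(["SRR13349122", "SRR13349128"])` would run both samples in turn, assuming `trimmomatic` is on the `PATH` of the notebook environment.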

### STEP 6: Run FastQC

In [None]:
!fastqc -o data/fastqc data/trimmed/SRR13349122_1_trimmed.fastq
!fastqc -o data/fastqc data/trimmed/SRR13349128_1_trimmed.fastq


### STEP 7: Run MultiQC

In [None]:
!multiqc -f data/fastqc


### STEP 8: Index the Transcriptome so that Trimmed Reads Can Be Mapped Using Salmon

In [None]:
!salmon index -t data/reference/M_chelonae_transcripts.fasta -p 8 -i data/reference/transcriptome_index --decoys data/reference/decoys.txt -k 31 --keepDuplicates


### STEP 9: Run Salmon to Map Reads to Transcripts and Quantify Expression Levels

In [None]:
!salmon quant -i data/reference/transcriptome_index -l SF -r data/trimmed/SRR13349122_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349122_quant
!salmon quant -i data/reference/transcriptome_index -l SF -r data/trimmed/SRR13349128_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349128_quant
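After quantification, the mapping rate is a quick sanity check on each sample. The sketch below reads it from the run metadata in a Python cell. It assumes Salmon wrote `aux_info/meta_info.json` inside each output directory with the keys `num_processed`, `num_mapped`, and `percent_mapped`, which is the layout recent Salmon versions produce; check your own output if the keys differ. The helper name `salmon_mapping_rate` is hypothetical.

```python
import json
from pathlib import Path


def salmon_mapping_rate(quant_dir):
    """Return (num_processed, num_mapped, percent_mapped) for one Salmon run.

    Assumes the metadata file aux_info/meta_info.json exists in quant_dir.
    """
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    with open(meta_path) as handle:
        meta = json.load(handle)
    return meta["num_processed"], meta["num_mapped"], meta["percent_mapped"]
```

For example, `salmon_mapping_rate("data/quants/SRR13349122_quant")` returns the number of reads processed, the number mapped, and the percentage mapped for the first sample.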


### STEP 10: Report the top 10 most highly expressed genes in the samples

In [None]:
!sort -nrk 4,4 data/quants/SRR13349122_quant/quant.sf | head -10


In [None]:
!sort -nrk 4,4 data/quants/SRR13349128_quant/quant.sf | head -10
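The `sort` commands above rank transcripts by the fourth column of `quant.sf`, which is TPM. The same ranking can be done in a Python cell, which keeps the column names attached to the values. The sketch below assumes the standard tab-separated `quant.sf` header (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); the helper name `top_expressed` is hypothetical.

```python
import csv


def top_expressed(quant_path, n=10):
    """Return (name, TPM, NumReads) for the n rows of quant.sf with the highest TPM."""
    with open(quant_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # float() also parses scientific notation, which `sort -n` does not.
    rows.sort(key=lambda row: float(row["TPM"]), reverse=True)
    return [
        (row["Name"], float(row["TPM"]), float(row["NumReads"]))
        for row in rows[:n]
    ]
```

For example, `top_expressed("data/quants/SRR13349122_quant/quant.sf")` returns the ten most highly expressed entries for the first sample as a list of tuples.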


### There will be more to come later!