# <u>RNA-Seq Analysis Phase IIa: Alignment to Reference Genome</u>
## This Notebook illustrates how to align paired-end RNA-Seq reads <br>that have already been processed through the QC pipeline.
#### Last Revision: July 2017
#### Author: Charles David
## <span style="color:red"><u>Please use this Notebook in Conjunction with the RNA-Seq Best Practices Document:</u></span>
### <a href="https://github.com/PlantandFoodResearch/BestPractices/blob/bestpractices/_bestpractices/RNASeq/RNA-Seq-Best-Practice-Guidelines_REV_July_2017.md">RNA-Seq-Best-Practice-Guidelines_REV_July_2017.md</a>

## The Raw Data files are located on PowerPlant in the following location:
### `/input/genomic/Best_Practices_Testing/RNA_Seq/PE_Data`
## The Workflow is located on PowerPlant in the following location:
### `/workspace/Best_Practices_Testing/RNA_Seq/PE_Example`
#### As Always, Please Re-Create this Workflow in YOUR OWN Workspace.

## The Raw Data files are located on PowerPlant in the following location:


`/input/genomic/plant/Actinidia/chinensis/AGRF_CAGRF15208_CB6RHANXX`

Sequence name meaning:

`SRCT08_1`: S = Simona; R = Red; CT = treatment name (control in this case); 08 = fruit age in weeks after anthesis; 1 = biological replicate.

Each sample has 4 sequencing files because it was run on 2 lanes (L001 and L002) as paired-end reads (R1 and R2).
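
The naming scheme above can be unpacked mechanically. The following is a minimal sketch, assuming every sample name follows the fixed-width pattern of `SRCT08_1` (1 + 1 + 2 + 2 characters, an underscore, then the replicate). The function name `parse_sample` and the field labels are illustrative and not part of the workflow.

```shell
# Hypothetical helper: split a sample name such as SRCT08_1 into its fields.
# Assumes fixed widths (1 + 1 + 2 + 2 characters) before the first underscore.
parse_sample() {
    code=${1%%_*}     # part before the first underscore, e.g. SRCT08
    rest=${1#*_}      # part after the first underscore
    rep=${rest%%_*}   # replicate number only, even if more fields follow
    printf 'person=%s colour=%s treatment=%s weeks=%s replicate=%s\n' \
        "$(printf '%s' "$code" | cut -c1)" \
        "$(printf '%s' "$code" | cut -c2)" \
        "$(printf '%s' "$code" | cut -c3-4)" \
        "$(printf '%s' "$code" | cut -c5-6)" \
        "$rep"
}

parse_sample SRCT08_1
parse_sample SRZH16_4_CB6RHANXX_GTGAAA_L001_R1
```

The second call shows that a full file stem can be passed as well, since everything after the replicate number is ignored.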

## <u>The Key Steps in this Section are: </u>##

#### I. Establish Data Management Structure on PowerPlant (Continuing from the QC part)
1. Make the necessary directories for the data and the analysis
2. Name the directories using standard workflow naming conventions
3. Name files using standard workflow naming conventions

#### II. Perform the Analyses: Align to Reference Genome Using the STAR Aligner
 1. Obtain reference genome and corresponding annotation file (GFF, GTF)
 2. Index the genome
 3. Align to genome:
    - Single-Pass Mode
    - Two-Pass Mode
 4. Merge uBAM and aligned BAM to produce "Clean" BAM
 5. Clean Up Workspace:
     - Delete un-needed intermediate files
     - Compress files that are still required

#### III. Assess the results for suitability in downstream analysis

## <u>Step I: Establish Data Management Structure on PowerPlant (Continuing from the QC part)</u>

### Create the analysis directories in YOUR workspace:
* 007.STAR
* 008.MBA

    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/logs`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/annotation`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/genome`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/index`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/index/logs`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/Single_Pass_Results`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/007.STAR/Two_Pass_Results`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/008.MBA`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/008.MBA/logs`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/008.MBA/Single_Pass_Results`
    * `mkdir /workspace/USER/RNA_Seq/PE_Example/008.MBA/Two_Pass_Results`
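
The directory list above can also be created in one loop. This is a sketch only: `BASE` is a placeholder for `/workspace/USER/RNA_Seq/PE_Example` and defaults to a scratch directory so that the example can be tried without touching a real workspace.

```shell
# BASE is a placeholder for /workspace/USER/RNA_Seq/PE_Example; it defaults
# to a scratch directory here so the sketch can be run safely.
BASE=${BASE:-$(mktemp -d)/PE_Example}

for sub in \
    007.STAR/logs 007.STAR/annotation 007.STAR/genome \
    007.STAR/index/logs \
    007.STAR/Single_Pass_Results 007.STAR/Two_Pass_Results \
    008.MBA/logs 008.MBA/Single_Pass_Results 008.MBA/Two_Pass_Results
do
    # -p creates missing parent directories and ignores existing ones
    mkdir -p "$BASE/$sub"
done

# List what was created
find "$BASE" -type d | sort
```

Because `mkdir -p` does not fail on existing directories, the loop can be re-run at any time.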


## <u>Step II, Part 1: Get the Genome and Annotation Files to be Used in the Alignment Process</u>
##### Note that the best results are obtained when the reference genome is of good quality, annotated, and closely related to the sample species

### Define Project Variables:
* Note that we are using the latest version of STAR: 2.5.2b
* We are also using the latest version of Picard Tools: 2.9.4

In [90]:
# Define the user and project name as variables
USER="hradxj"
PROJECTNAME="karmun_awesome_experiment"

# Define the project directory and temp subdirectory as variables
PROJECT="/workspace/$USER/$PROJECTNAME"
TEMP="$PROJECT/TEMP"

# Create the STAR working directories (this example numbers the step 009.STAR)
mkdir -p ${PROJECT}/009.STAR
mkdir -p ${PROJECT}/009.STAR/logs
mkdir -p ${PROJECT}/009.STAR/annotation
mkdir -p ${PROJECT}/009.STAR/genome
mkdir -p ${PROJECT}/009.STAR/index
mkdir -p ${PROJECT}/009.STAR/index/logs
mkdir -p ${PROJECT}/009.STAR/Single_Pass_Results
mkdir -p ${PROJECT}/009.STAR/Two_Pass_Results

# Symlink the reference annotation and genome into the project
# (-f replaces an existing link so that the cell can be re-run)
ln -sf /workspace/ComparativeDataSources/Vitis/vinifera/Genoscope_12X/Genes/Vitis_vinifera_annotation.gff3 ${PROJECT}/009.STAR/annotation
ln -sf /workspace/ComparativeDataSources/Vitis/vinifera/Genoscope_12X/Genome/reference.fasta ${PROJECT}/009.STAR/genome

# Paths used by later cells; the GTF file is created by gffread in Step II, Part 2
ANNOT=${PROJECT}/009.STAR/annotation/Vitis_vinifera_annotation.gff3
ANNOTGTF=${PROJECT}/009.STAR/annotation/Vitis_vinifera_annotation.gtf
GENOME=${PROJECT}/009.STAR/genome/reference.fasta
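
A symlink is created even when its target does not exist, so a mistyped source path only shows up later. The sketch below lists dangling links under a directory. The helper name `broken_links` and the demonstration files are illustrative; in the workflow it would be pointed at `${PROJECT}/009.STAR`.

```shell
# Report symlinks under a directory whose targets do not exist.
# With -L, find follows links, so only broken links still test as type "l".
broken_links() {
    find -L "$1" -type l
}

# Demonstration in a scratch directory (paths are illustrative only)
demo=$(mktemp -d)
touch "$demo/real.fasta"
ln -s "$demo/real.fasta" "$demo/good_link.fasta"
ln -s "$demo/does_not_exist.gff3" "$demo/bad_link.gff3"

# Prints only the path of bad_link.gff3
broken_links "$demo"
```

An empty result means every link resolves.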

#### Make appropriate directories and symlinks to files

In [19]:
# SANITY CHECK: have the directories and symlinks been made correctly?
ls -R ${PROJECT}

/workspace/hradxj/RNAseq/Red9:
000.raw		003.fastqc_smrna  006.MIA   Illumina.fa  TEMP
001.fastqc_raw	004.trimmomatic   007.STAR  Log.out
002.SMRNA	005.fastqc_trim   008.MBA   _STARtmp

/workspace/hradxj/RNAseq/Red9/000.raw:

/workspace/hradxj/RNAseq/Red9/001.fastqc_raw:
SRCT08_1_CB6RHANXX_AGTTCC_L001_R1_fastqc.html
SRCT08_1_CB6RHANXX_AGTTCC_L001_R1_fastqc.zip
SRCT08_1_CB6RHANXX_AGTTCC_L001_R2_fastqc.html
SRCT08_1_CB6RHANXX_AGTTCC_L001_R2_fastqc.zip
SRCT08_1_CB6RHANXX_AGTTCC_L002_R1_fastqc.html
SRCT08_1_CB6RHANXX_AGTTCC_L002_R1_fastqc.zip
SRCT08_1_CB6RHANXX_AGTTCC_L002_R2_fastqc.html
SRCT08_1_CB6RHANXX_AGTTCC_L002_R2_fastqc.zip
SRCT08_2_CB6RHANXX_ATGTCA_L001_R1_fastqc.html
SRCT08_2_CB6RHANXX_ATGTCA_L001_R1_fastqc.zip
SRCT08_2_CB6RHANXX_ATGTCA_L001_R2_fastqc.html
SRCT08_2_CB6RHANXX_ATGTCA_L001_R2_fastqc.zip
SRCT08_2_CB6RHANXX_ATGTCA_L002_R1_fastqc.html
SRCT08_2_CB6RHANXX_ATGTCA_L002_R1_fastqc.zip
SRCT08_2_CB6RHANXX_ATGTCA_L002_R2_fastqc.html
SRCT08_2_CB6RHANXX_ATGTCA_L002_R2_fastqc.zip
S

SRZH16_3_CB6RHANXX_ATCACG_L002_R2_fastqc.zip
SRZH16_4_CB6RHANXX_GTGAAA_L001_R1_fastqc.html
SRZH16_4_CB6RHANXX_GTGAAA_L001_R1_fastqc.zip
SRZH16_4_CB6RHANXX_GTGAAA_L001_R2_fastqc.html
SRZH16_4_CB6RHANXX_GTGAAA_L001_R2_fastqc.zip
SRZH16_4_CB6RHANXX_GTGAAA_L002_R1_fastqc.html
SRZH16_4_CB6RHANXX_GTGAAA_L002_R1_fastqc.zip
SRZH16_4_CB6RHANXX_GTGAAA_L002_R2_fastqc.html
SRZH16_4_CB6RHANXX_GTGAAA_L002_R2_fastqc.zip
SRZL08_1_CB6RHANXX_GTAGAG_L001_R1_fastqc.html
SRZL08_1_CB6RHANXX_GTAGAG_L001_R1_fastqc.zip
SRZL08_1_CB6RHANXX_GTAGAG_L001_R2_fastqc.html
SRZL08_1_CB6RHANXX_GTAGAG_L001_R2_fastqc.zip
SRZL08_1_CB6RHANXX_GTAGAG_L002_R1_fastqc.html
SRZL08_1_CB6RHANXX_GTAGAG_L002_R1_fastqc.zip
SRZL08_1_CB6RHANXX_GTAGAG_L002_R2_fastqc.html
SRZL08_1_CB6RHANXX_GTAGAG_L002_R2_fastqc.zip
SRZL08_2_CB6RHANXX_GGCTAC_L001_R1_fastqc.html
SRZL08_2_CB6RHANXX_GGCTAC_L001_R1_fastqc.zip
SRZL08_2_CB6RHANXX_GGCTAC_L001_R2_fastqc.html
SRZL08_2_CB6RHANXX_GGCTAC_L001_R2_fastqc.zip
SRZL08_2_CB6RHANXX_GGCTAC_L002_R1_fastqc.html

SRZH12_2_CB6RHANXX_GTTTCG_L002_MERGED_R2.fastq
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R1.fastq
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R2.fastq
SRZH12_3_CB6RHANXX_CACCGG_L002_MERGED_R1.fastq
SRZH12_3_CB6RHANXX_CACCGG_L002_MERGED_R2.fastq
SRZH16_1_CB6RHANXX_GATCAG_L001_MERGED_R1.fastq
SRZH16_1_CB6RHANXX_GATCAG_L001_MERGED_R2.fastq
SRZH16_1_CB6RHANXX_GATCAG_L002_MERGED_R1.fastq
SRZH16_1_CB6RHANXX_GATCAG_L002_MERGED_R2.fastq
SRZH16_2_CB6RHANXX_CAACTA_L001_MERGED_R1.fastq
SRZH16_2_CB6RHANXX_CAACTA_L001_MERGED_R2.fastq
SRZH16_2_CB6RHANXX_CAACTA_L002_MERGED_R1.fastq
SRZH16_2_CB6RHANXX_CAACTA_L002_MERGED_R2.fastq
SRZH16_3_CB6RHANXX_ATCACG_L001_MERGED_R1.fastq
SRZH16_3_CB6RHANXX_ATCACG_L001_MERGED_R2.fastq
SRZH16_3_CB6RHANXX_ATCACG_L002_MERGED_R1.fastq
SRZH16_3_CB6RHANXX_ATCACG_L002_MERGED_R2.fastq
SRZH16_4_CB6RHANXX_GTGAAA_L001_MERGED_R1.fastq
SRZH16_4_CB6RHANXX_GTGAAA_L001_MERGED_R2.fastq
SRZH16_4_CB6RHANXX_GTGAAA_L002_MERGED_R1.fastq
SRZH16_4_CB6RHANXX_GTGAAA_L002_MERGED_R2.fastq
SRZL08_1_CB6R

SRZL08_3_CB6RHANXX_ATGAGC_L001_MERGED.fastq.out
SRZL08_3_CB6RHANXX_ATGAGC_L002_MERGED.fastq.err
SRZL08_3_CB6RHANXX_ATGAGC_L002_MERGED.fastq.out
SRZL08_4_CB6RHANXX_ACTTGA_L001_MERGED.fastq.err
SRZL08_4_CB6RHANXX_ACTTGA_L001_MERGED.fastq.out
SRZL08_4_CB6RHANXX_ACTTGA_L002_MERGED.fastq.err
SRZL08_4_CB6RHANXX_ACTTGA_L002_MERGED.fastq.out
SRZL12_1_CB6RHANXX_TGACCA_L001_MERGED.fastq.err
SRZL12_1_CB6RHANXX_TGACCA_L001_MERGED.fastq.out
SRZL12_1_CB6RHANXX_TGACCA_L002_MERGED.fastq.err
SRZL12_1_CB6RHANXX_TGACCA_L002_MERGED.fastq.out
SRZL12_2_CB6RHANXX_CATTTT_L001_MERGED.fastq.err
SRZL12_2_CB6RHANXX_CATTTT_L001_MERGED.fastq.out
SRZL12_2_CB6RHANXX_CATTTT_L002_MERGED.fastq.err
SRZL12_2_CB6RHANXX_CATTTT_L002_MERGED.fastq.out
SRZL12_3_CB6RHANXX_CAGGCG_L001_MERGED.fastq.err
SRZL12_3_CB6RHANXX_CAGGCG_L001_MERGED.fastq.out
SRZL12_3_CB6RHANXX_CAGGCG_L002_MERGED.fastq.err
SRZL12_3_CB6RHANXX_CAGGCG_L002_MERGED.fastq.out
SRZL12_4_CB6RHANXX_TTAGGC_L001_MERGED.fastq.err
SRZL12_4_CB6RHANXX_TTAGGC_L001_MERGED.fa

SRZL12_3_CB6RHANXX_CAGGCG_L001_MERGED.fastq_rRNA.log
SRZL12_3_CB6RHANXX_CAGGCG_L002_MERGED.fastq_rRNA.fastq
SRZL12_3_CB6RHANXX_CAGGCG_L002_MERGED.fastq_rRNA.log
SRZL12_4_CB6RHANXX_TTAGGC_L001_MERGED.fastq_rRNA.fastq
SRZL12_4_CB6RHANXX_TTAGGC_L001_MERGED.fastq_rRNA.log
SRZL12_4_CB6RHANXX_TTAGGC_L002_MERGED.fastq_rRNA.fastq
SRZL12_4_CB6RHANXX_TTAGGC_L002_MERGED.fastq_rRNA.log
SRZL16_1_CB6RHANXX_CCGTCC_L001_MERGED.fastq_rRNA.fastq
SRZL16_1_CB6RHANXX_CCGTCC_L001_MERGED.fastq_rRNA.log
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED.fastq_rRNA.fastq
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED.fastq_rRNA.log
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED.fastq_rRNA.fastq
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED.fastq_rRNA.log
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED.fastq_rRNA.fastq
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED.fastq_rRNA.log
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED.fastq_rRNA.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED.fastq_rRNA.log
SRZL16_3_CB6RHANXX_GAGTGG_L002_MERGED.fastq_rRNA.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L0

SRZH12_1_CB6RHANXX_CACTCA_L002_MERGED_R1_fastqc.html
SRZH12_1_CB6RHANXX_CACTCA_L002_MERGED_R1_fastqc.zip
SRZH12_1_CB6RHANXX_CACTCA_L002_MERGED_R2_fastqc.html
SRZH12_1_CB6RHANXX_CACTCA_L002_MERGED_R2_fastqc.zip
SRZH12_2_CB6RHANXX_GTTTCG_L001_MERGED_R1_fastqc.html
SRZH12_2_CB6RHANXX_GTTTCG_L001_MERGED_R1_fastqc.zip
SRZH12_2_CB6RHANXX_GTTTCG_L001_MERGED_R2_fastqc.html
SRZH12_2_CB6RHANXX_GTTTCG_L001_MERGED_R2_fastqc.zip
SRZH12_2_CB6RHANXX_GTTTCG_L002_MERGED_R1_fastqc.html
SRZH12_2_CB6RHANXX_GTTTCG_L002_MERGED_R1_fastqc.zip
SRZH12_2_CB6RHANXX_GTTTCG_L002_MERGED_R2_fastqc.html
SRZH12_2_CB6RHANXX_GTTTCG_L002_MERGED_R2_fastqc.zip
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R1_fastqc.html
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R1_fastqc.zip
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R2_fastqc.html
SRZH12_3_CB6RHANXX_CACCGG_L001_MERGED_R2_fastqc.zip
SRZH12_3_CB6RHANXX_CACCGG_L002_MERGED_R1_fastqc.html
SRZH12_3_CB6RHANXX_CACCGG_L002_MERGED_R1_fastqc.zip
SRZH12_3_CB6RHANXX_CACCGG_L002_MERGED_R2_fastqc.html
SR

SRCT08_2_CB6RHANXX_ATGTCA_L002_MERGED_trimmomatic_R2.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L001_MERGED_trimmomatic_R1.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L001_MERGED_trimmomatic_R2.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L002_MERGED_trimmomatic_R1.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L002_MERGED_trimmomatic_R2.fastq
SRCT08_4_CB6RHANXX_CAGATC_L001_MERGED_trimmomatic_R1.fastq
SRCT08_4_CB6RHANXX_CAGATC_L001_MERGED_trimmomatic_R2.fastq
SRCT08_4_CB6RHANXX_CAGATC_L002_MERGED_trimmomatic_R1.fastq
SRCT08_4_CB6RHANXX_CAGATC_L002_MERGED_trimmomatic_R2.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L001_MERGED_trimmomatic_R1.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L001_MERGED_trimmomatic_R2.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L002_MERGED_trimmomatic_R1.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L002_MERGED_trimmomatic_R2.fastq
SRCT12_2_CB6RHANXX_CGATGT_L001_MERGED_trimmomatic_R1.fastq
SRCT12_2_CB6RHANXX_CGATGT_L001_MERGED_trimmomatic_R2.fastq
SRCT12_2_CB6RHANXX_CGATGT_L002_MERGED_trimmomatic_R1.fastq
SRCT12_2_CB6RHANXX_CGATGT_L002_MERGED_trimmomatic_R2.fas

SRCT08_2_CB6RHANXX_ATGTCA_L001_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_2_CB6RHANXX_ATGTCA_L001_MERGED_trimmomatic_unpaired_2.fastq
SRCT08_2_CB6RHANXX_ATGTCA_L002_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_2_CB6RHANXX_ATGTCA_L002_MERGED_trimmomatic_unpaired_2.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L001_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L001_MERGED_trimmomatic_unpaired_2.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L002_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_3_CB6RHANXX_CTTGTA_L002_MERGED_trimmomatic_unpaired_2.fastq
SRCT08_4_CB6RHANXX_CAGATC_L001_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_4_CB6RHANXX_CAGATC_L001_MERGED_trimmomatic_unpaired_2.fastq
SRCT08_4_CB6RHANXX_CAGATC_L002_MERGED_trimmomatic_unpaired_1.fastq
SRCT08_4_CB6RHANXX_CAGATC_L002_MERGED_trimmomatic_unpaired_2.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L001_MERGED_trimmomatic_unpaired_1.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L001_MERGED_trimmomatic_unpaired_2.fastq
SRCT12_1_CB6RHANXX_GGTAGC_L002_MERGED_trimmomatic_unpaired_1.f

SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L002_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_3_CB6RHANXX_GAGTGG_L002_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_4_CB6RHANXX_CAAAAG_L001_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_4_CB6RHANXX_CAAAAG_L001_MERGED_trimmomatic_unpaired_2.fastq
SRZL16_4_CB6RHANXX_CAAAAG_L002_MERGED_trimmomatic_unpaired_1.fastq
SRZL16_4_CB6RHANXX_CAAAAG_L002_MERGED_trimmomatic_unpaired_2.fastq

/workspace/hradxj/RNAseq/Red9/005.fastqc_trim:

/workspace/hradxj/RNAseq/Red9/006.MIA:
logs
metrics
SRCT08_1_CB6RHANXX_AGTTCC_L0

SRZL12_4_CB6RHANXX_TTAGGC_L002_MERGED_trimmomatic.txt
SRZL16_1_CB6RHANXX_CCGTCC_L001_MERGED_trimmomatic.txt
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic.txt
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic.txt
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic.txt
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED_trimmomatic.txt
SRZL16_3_CB6RHANXX_GAGTGG_L002_MERGED_trimmomatic.txt
SRZL16_4_CB6RHANXX_CAAAAG_L001_MERGED_trimmomatic.txt
SRZL16_4_CB6RHANXX_CAAAAG_L002_MERGED_trimmomatic.txt

/workspace/hradxj/RNAseq/Red9/007.STAR:
annotation  genome  index  logs  Single_Pass_Results  Two_Pass_Results

/workspace/hradxj/RNAseq/Red9/007.STAR/annotation:
Red5_manno_150317093734.gff3  Red5_PS1.1.69.0.gff3

/workspace/hradxj/RNAseq/Red9/007.STAR/genome:
PS1.1.68.5.fasta

/workspace/hradxj/RNAseq/Red9/007.STAR/index:
chrLength.txt	   chrName.txt	 genomeParameters.txt
chrNameLength.txt  chrStart.txt  logs

/workspace/hradxj/RNAseq/Red9/007.STAR/index/logs:
STAR.err  STAR.out

/workspace/hradxj/RNAseq/R

SRZH08_3_CB6RHANXX_AGTCAA_L002_MERGED_trimmomatic_Log.out
SRZH08_3_CB6RHANXX_AGTCAA_L002_MERGED_trimmomatic_Log.progress.out
SRZH08_3_CB6RHANXX_AGTCAA_L002_MERGED_trimmomatic_Log.std.out
SRZH08_3_CB6RHANXX_AGTCAA_L002_MERGED_trimmomatic__STARtmp
SRZH08_4_CB6RHANXX_GCCAAT_L001_MERGED_trimmomatic_Log.out
SRZH08_4_CB6RHANXX_GCCAAT_L001_MERGED_trimmomatic_Log.progress.out
SRZH08_4_CB6RHANXX_GCCAAT_L001_MERGED_trimmomatic_Log.std.out
SRZH08_4_CB6RHANXX_GCCAAT_L001_MERGED_trimmomatic__STARtmp
SRZH08_4_CB6RHANXX_GCCAAT_L002_MERGED_trimmomatic_Log.out
SRZH08_4_CB6RHANXX_GCCAAT_L002_MERGED_trimmomatic_Log.progress.out
SRZH08_4_CB6RHANXX_GCCAAT_L002_MERGED_trimmomatic_Log.std.out
SRZH08_4_CB6RHANXX_GCCAAT_L002_MERGED_trimmomatic__STARtmp
SRZH12_1_CB6RHANXX_CACTCA_L001_MERGED_trimmomatic_Log.out
SRZH12_1_CB6RHANXX_CACTCA_L001_MERGED_trimmomatic_Log.progress.out
SRZH12_1_CB6RHANXX_CACTCA_L001_MERGED_trimmomatic_Log.std.out
SRZH12_1_CB6RHANXX_CACTCA_L001_MERGED_trimmomatic__STARtmp
SRZH12_1_CB6RHAN

SRZL16_1_CB6RHANXX_CCGTCC_L001_MERGED_trimmomatic_Log.std.out
SRZL16_1_CB6RHANXX_CCGTCC_L001_MERGED_trimmomatic__STARtmp
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic_Log.out
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic_Log.progress.out
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic_Log.std.out
SRZL16_1_CB6RHANXX_CCGTCC_L002_MERGED_trimmomatic__STARtmp
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic_Log.out
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic_Log.progress.out
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic_Log.std.out
SRZL16_2_CB6RHANXX_CACGAT_L001_MERGED_trimmomatic__STARtmp
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic_Log.out
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic_Log.progress.out
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic_Log.std.out
SRZL16_2_CB6RHANXX_CACGAT_L002_MERGED_trimmomatic__STARtmp
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED_trimmomatic_Log.out
SRZL16_3_CB6RHANXX_GAGTGG_L001_MERGED_trimmomatic_Log.progress.out
SRZL16_3_CB6RHAN


/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL08_4_CB6RHANXX_ACTTGA_L002_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_1_CB6RHANXX_TGACCA_L001_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_1_CB6RHANXX_TGACCA_L002_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_2_CB6RHANXX_CATTTT_L001_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_2_CB6RHANXX_CATTTT_L002_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_3_CB6RHANXX_CAGGCG_L001_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_3_CB6RHANXX_CAGGCG_L002_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/Single_Pass_Results/SRZL12_4_CB6RHANXX_TTAGGC_L001_MERGED_trimmomatic__STARtmp:

/workspace/hradxj/RNAseq/Red9/007.STAR/

## <u>Step II, Part 2: Index the Genome Using STAR</u>
* The inputs to this step are the genome as a multi FASTA file and the annotations as a GFF or GTF file
* The outputs include the genome index files used in the alignment steps

In [91]:
# Convert the GFF3 annotation to GTF: -T selects GTF output, -g supplies the genome FASTA
/software/bioinformatics/cufflinks-2.2.1/gffread $ANNOT -g $GENOME -T -o $ANNOTGTF

No fasta index found for /workspace/hradxj/karmun_awesome_experiment/009.STAR/genome/reference.fasta. Rebuilding, please wait..
Fasta index rebuilt.


In [92]:
grep "GSVIVT01012261001" < $ANNOT

chr1	Gaze	mRNA	10731	27033	79.1689	+	.	ID=GSVIVT01012261001;Name=GSVIVT01012261001;Parent=GSVIVG01012261001;Note=Complete 1
chr1	Gaze	UTR	26778	27033	6.4577	+	.	ID=51156;Parent=GSVIVT01012261001
chr1	Gaze	CDS	26733	26777	4.2537	+	0	ID=51157;Parent=GSVIVT01012261001
chr1	Gaze	CDS	26528	26632	14.9888	+	0	ID=51158;Parent=GSVIVT01012261001
chr1	Gaze	CDS	19898	19957	0.4178	+	0	ID=51159;Parent=GSVIVT01012261001
chr1	Gaze	CDS	14421	14565	0.4178	+	2	ID=51160;Parent=GSVIVT01012261001
chr1	Gaze	CDS	12997	13119	11.6000	+	2	ID=51161;Parent=GSVIVT01012261001
chr1	Gaze	CDS	12837	12859	4.7620	+	0	ID=51162;Parent=GSVIVT01012261001
chr1	Gaze	UTR	12798	12836	2.1161	+	.	ID=51163;Parent=GSVIVT01012261001
chr1	Gaze	UTR	10731	10945	3.5151	+	.	ID=51164;Parent=GSVIVT01012261001


In [93]:
grep "GSVIVT01012261001" < $ANNOTGTF

chr1	Gaze	exon	10731	10945	3.52	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	12798	12859	2.12	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	12997	13119	11.60	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	14421	14565	0.42	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	19898	19957	0.42	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	26528	26632	14.99	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	exon	26733	27033	4.25	+	.	transcript_id "GSVIVT01012261001"; gene_id "GSVIVG01012261001"; gene_name "GSVIVG01012261001";
chr1	Gaze	CDS	12837	12859	.	+	0	transcript_id "GSVIVT01012261001"; gene_id "GSVIV

In [95]:
grep "chr1" < $GENOME

>chr1
>chr10
>chr10_random
>chr11
>chr11_random
>chr12
>chr12_random
>chr13
>chr13_random
>chr14
>chr15
>chr16
>chr16_random
>chr17
>chr17_random
>chr18
>chr18_random
>chr19
>chr1_random
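
The `grep` checks above confirm by eye that the annotation and the genome use the same sequence names (`chr1`, ...). The same comparison can be sketched for all names at once. `missing_seqnames` is a hypothetical helper, not part of STAR or gffread, and the toy files exist only for the demonstration.

```shell
# Print sequence names used in an annotation (column 1) that are absent
# from the FASTA headers. Hypothetical helper, not part of STAR or gffread.
missing_seqnames() {
    fasta=$1
    annot=$2
    work=$(mktemp -d)
    # FASTA names: header lines without ">" and without any description
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$work/fasta.names"
    # Annotation names: first tab-separated column, skipping comment lines
    grep -v '^#' "$annot" | cut -f1 | sort -u > "$work/annot.names"
    # comm -13 keeps lines found only in the second file
    comm -13 "$work/fasta.names" "$work/annot.names"
    rm -rf "$work"
}

# Toy inputs, made up for the demonstration
demo=$(mktemp -d)
printf '>chr1 toy sequence\nACGT\n>chr2\nACGT\n' > "$demo/toy.fasta"
printf 'chr1\tGaze\texon\t1\t4\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n'  > "$demo/toy.gtf"
printf 'chrUn\tGaze\texon\t1\t4\t.\t+\t.\ttranscript_id "t2"; gene_id "g2";\n' >> "$demo/toy.gtf"

# Prints chrUn, the only annotation name without a FASTA record
missing_seqnames "$demo/toy.fasta" "$demo/toy.gtf"
```

In the workflow the call would be `missing_seqnames $GENOME $ANNOTGTF`; no output means every annotated sequence is present in the genome.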


In [103]:
COMMAND="module load STAR; \
STAR \
--runMode genomeGenerate \
--limitGenomeGenerateRAM 240000000000 \
--runThreadN 32 \
--genomeFastaFiles $GENOME \
--sjdbGTFfile ${ANNOTGTF} \
--sjdbGTFtagExonParentTranscript transcript_id \
--sjdbGTFtagExonParentGene gene_id \
--genomeDir ${PROJECT}/009.STAR/index; \
module unload STAR"
echo $COMMAND;
bsub \
-J STAR_Dan \
-o ${PROJECT}/009.STAR/index/logs/%J_STAR_index.out \
-e ${PROJECT}/009.STAR/index/logs/%J_STAR_index.err \
-n 32 \
-q lowpriority \
$COMMAND

# Option not used in this run:
# --sjdbOverhang 149   (STAR recommends read length minus 1; the default is 100)


module load STAR; STAR --runMode genomeGenerate --limitGenomeGenerateRAM 240000000000 --runThreadN 32 --genomeFastaFiles /workspace/hradxj/karmun_awesome_experiment/009.STAR/genome/reference.fasta --sjdbGTFfile /workspace/hradxj/karmun_awesome_experiment/009.STAR/annotation/Vitis_vinifera_annotation.gtf --sjdbGTFtagExonParentTranscript transcript_id --sjdbGTFtagExonParentGene gene_id --genomeDir /workspace/hradxj/karmun_awesome_experiment/009.STAR/index; module unload STAR
Job <686488> is submitted to queue <lowpriority>.
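
The indexing cell leaves `--sjdbOverhang 149` commented out. The STAR manual suggests the longest read length minus 1 for this option. A candidate value can be read off the data, as in the sketch below; `max_read_length` is a hypothetical helper and the FASTQ is a toy file made up for the demonstration.

```shell
# Longest read in a FASTQ file (sequence lines are every 4th line, starting at line 2)
max_read_length() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Toy FASTQ, made up for the demonstration
fq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' > "$fq"

len=$(max_read_length "$fq")
echo "longest read: $len; candidate --sjdbOverhang: $((len - 1))"
```

On real data the function would be run on one of the trimmed `_R1.fastq` files in `004.trimmomatic`.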


In [104]:
bpeek -f 686488

<< output from stdout >>
Dec 05 09:43:07 ..... started STAR run
Dec 05 09:43:07 ... starting to generate Genome files
Dec 05 09:43:20 ... starting to sort Suffix Array. This may take a long time...
Dec 05 09:43:23 ... sorting Suffix Array chunks and saving them to disk...
Dec 05 09:46:10 ... loading chunks from disk, packing SA...
Dec 05 09:46:19 ... finished generating suffix array
Dec 05 09:46:19 ... generating Suffix Array index
Dec 05 09:47:37 ... completed Suffix Array index
Dec 05 09:47:37 ..... processing annotations GTF
Dec 05 09:47:39 ..... inserting junctions into the genome indices
Dec 05 09:48:29 ... writing Genome to disk ...
Dec 05 09:48:34 ... writing Suffix Array to disk ...
Dec 05 09:49:22 ... writing SAindex to disk
Dec 05 09:49:37 ..... finished successfully

------------------------------------------------------------
Sender: OpenLava System <openlava@aklppb31>
Subject: Job 686488: <STAR_Dan> Done

Job <STAR_Dan> was submitted from host <aklppr31> by user <hradxj>.


In [122]:
cat ${PROJECT}/009.STAR/index/logs/686488_STAR_index.err;

In [123]:
cat ${PROJECT}/009.STAR/index/logs/686488_STAR_index.out;

Dec 05 09:43:07 ..... started STAR run
Dec 05 09:43:07 ... starting to generate Genome files
Dec 05 09:43:20 ... starting to sort Suffix Array. This may take a long time...
Dec 05 09:43:23 ... sorting Suffix Array chunks and saving them to disk...
Dec 05 09:46:10 ... loading chunks from disk, packing SA...
Dec 05 09:46:19 ... finished generating suffix array
Dec 05 09:46:19 ... generating Suffix Array index
Dec 05 09:47:37 ... completed Suffix Array index
Dec 05 09:47:37 ..... processing annotations GTF
Dec 05 09:47:39 ..... inserting junctions into the genome indices
Dec 05 09:48:29 ... writing Genome to disk ...
Dec 05 09:48:34 ... writing Suffix Array to disk ...
Dec 05 09:49:22 ... writing SAindex to disk
Dec 05 09:49:37 ..... finished successfully

------------------------------------------------------------
Sender: OpenLava System <openlava@aklppb31>
Subject: Job 686488: <STAR_Dan> Done

Job <STAR_Dan> was submitted from host <aklppr31> by user <hradxj>.
Job was executed on host(

This uses genome version (1.68.5) and model version (15...34)

In [124]:
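# SANITY CHECK: is there an index in the right place, and is it a sensible size?
# INDEX is set here because the cell that defines it appears later in this notebook
INDEX=${PROJECT}/009.STAR/index
ls -s ${INDEX}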

total 6784265
     26 chrLength.txt	        26 genomeParameters.txt
     26 chrNameLength.txt        2 logs
     26 chrName.txt	   4476482 SA
     26 chrStart.txt	   1723434 SAindex
   4946 exonGeTrInfo.tab      3578 sjdbInfo.txt
   2042 exonInfo.tab	      3234 sjdbList.fromGTF.out.tab
    634 geneInfo.tab	      3234 sjdbList.out.tab
 565154 Genome		      1402 transcriptInfo.tab
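
The visual check above can be backed by a small test that the core index files exist and are non-empty. This is a sketch: `check_star_index` is a hypothetical helper, and the list of file names is taken from the listing above rather than from any formal specification.

```shell
# Check that the core index files exist and are non-empty.
# The file names are taken from the directory listing above.
check_star_index() {
    status=0
    for f in Genome SA SAindex chrName.txt chrLength.txt chrStart.txt \
             chrNameLength.txt genomeParameters.txt
    do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f"
            status=1
        fi
    done
    return $status
}

# Demonstration: a scratch directory that lacks SAindex
demo=$(mktemp -d)
for f in Genome SA chrName.txt chrLength.txt chrStart.txt chrNameLength.txt genomeParameters.txt
do
    echo placeholder > "$demo/$f"
done
check_star_index "$demo" || echo "index looks incomplete"
```

In the workflow the call would be `check_star_index ${PROJECT}/009.STAR/index` before any alignment job is submitted.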


In [125]:
#Find trimmed reads
ls ${PROJECT}/004.trimmomatic

logs
RACP005_11_S11_L002_MERGED_trimmomatic_R1.fastq
RACP005_11_S11_L002_MERGED_trimmomatic_R2.fastq
RACP005_12_S12_L002_MERGED_trimmomatic_R1.fastq
RACP005_12_S12_L002_MERGED_trimmomatic_R2.fastq
RACP005_13_S13_L002_MERGED_trimmomatic_R1.fastq
RACP005_13_S13_L002_MERGED_trimmomatic_R2.fastq
RACP005_1_S8_L002_MERGED_trimmomatic_R1.fastq
RACP005_1_S8_L002_MERGED_trimmomatic_R2.fastq
RACP005_5_S9_L002_MERGED_trimmomatic_R1.fastq
RACP005_5_S9_L002_MERGED_trimmomatic_R2.fastq
RACP005_8_S10_L002_MERGED_trimmomatic_R1.fastq
RACP005_8_S10_L002_MERGED_trimmomatic_R2.fastq
unpaired


In [126]:
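TRIMMED=${PROJECT}/004.trimmomatic
INDEX=${PROJECT}/009.STAR/index
OUT_STAR=${PROJECT}/009.STAR/Single_Pass_Results
# LOG was not defined in the original cell; the logs directory created earlier is assumed
LOG=${PROJECT}/009.STAR/logs
mkdir -p ${OUT_STAR} ${LOG}
# One prefix per sample: strip the _R1.fastq / _R2.fastq suffix and de-duplicate
PREFIXLIST=$(basename -a ${TRIMMED}/*.fastq | sed 's/_R[12]\.fastq$//' | sort -u)
echo $PREFIXLIST

for PREFIX in ${PREFIXLIST}
do
echo $PREFIX
R1=${TRIMMED}/${PREFIX}_R1.fastq
R2=${TRIMMED}/${PREFIX}_R2.fastq

# --outSAMtype writes the coordinate-sorted BAM to a file under the output prefix.
# The original cell used --outStd BAM_SortedByCoordinate, which sends the BAM to
# stdout (here, the bsub log) and itself requires --outSAMtype BAM SortedByCoordinate.
COMMAND="module load STAR; \
            STAR \
            --runThreadN 8 \
            --genomeDir ${INDEX} \
            --readFilesIn ${R1} ${R2} \
            --sjdbGTFfile ${ANNOTGTF} \
            --sjdbGTFtagExonParentGene gene_id \
            --sjdbGTFtagExonParentTranscript transcript_id \
            --outFileNamePrefix ${OUT_STAR}/${PREFIX}.bam \
            --quantMode TranscriptomeSAM GeneCounts \
            --outSAMtype BAM SortedByCoordinate"
bsub \
-J STAR \
-o ${LOG}/%J_STAR_index.out \
-e ${LOG}/%J_STAR_index.err \
-n 8 \
-q lowpriority \
$COMMAND
done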

RACP005_11_S11_L002_MERGED_trimmomatic RACP005_12_S12_L002_MERGED_trimmomatic RACP005_13_S13_L002_MERGED_trimmomatic RACP005_1_S8_L002_MERGED_trimmomatic RACP005_5_S9_L002_MERGED_trimmomatic RACP005_8_S10_L002_MERGED_trimmomatic
RACP005_11_S11_L002_MERGED_trimmomatic
Job <688673> is submitted to queue <lowpriority>.
RACP005_12_S12_L002_MERGED_trimmomatic
Job <688674> is submitted to queue <lowpriority>.
RACP005_13_S13_L002_MERGED_trimmomatic
Job <688675> is submitted to queue <lowpriority>.
RACP005_1_S8_L002_MERGED_trimmomatic
Job <688676> is submitted to queue <lowpriority>.
RACP005_5_S9_L002_MERGED_trimmomatic
Job <688677> is submitted to queue <lowpriority>.
RACP005_8_S10_L002_MERGED_trimmomatic
Job <688678> is submitted to queue <lowpriority>.
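
The loop above builds `R1` and `R2` from each prefix and assumes both mates exist. A quick pairing check before submitting jobs can be sketched as follows. `list_pairs` is a hypothetical helper; the `_R1.fastq` / `_R2.fastq` suffixes follow the naming used above, and the sample names in the demonstration are made up.

```shell
# List sample prefixes that have an R1 file, and flag any without a matching R2.
list_pairs() {
    for r1 in "$1"/*_R1.fastq; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        name=${r1##*/}                    # strip the directory
        prefix=${name%_R1.fastq}          # strip the mate suffix
        if [ -e "$1/${prefix}_R2.fastq" ]; then
            echo "OK       $prefix"
        else
            echo "NO_MATE  $prefix"
        fi
    done
}

# Demonstration with empty placeholder files
demo=$(mktemp -d)
touch "$demo/sampleA_trimmomatic_R1.fastq" "$demo/sampleA_trimmomatic_R2.fastq" \
      "$demo/sampleB_trimmomatic_R1.fastq"
list_pairs "$demo"
```

In the workflow the call would be `list_pairs ${PROJECT}/004.trimmomatic`; any `NO_MATE` line points at a sample that should not be submitted as paired-end.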


In [127]:
bpeek -f 688678

<< output from stdout >>
tail: /home/hradxj/.lsbatch/1512591782.688678.out: Stale file handle
tail: no files remaining


In [119]:
# Run STAR in two-pass mode
# Create list of files containing splice junctions
SJLIST=$(ls ${PROJECT}/009.STAR/Single_Pass_Results/*SJ.out.tab)
echo $SJLIST

/workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_11_S11_L002_MERGED_trimmomatic.bamSJ.out.tab /workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_12_S12_L002_MERGED_trimmomatic.bamSJ.out.tab /workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_13_S13_L002_MERGED_trimmomatic.bamSJ.out.tab /workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_1_S8_L002_MERGED_trimmomatic.bamSJ.out.tab /workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_5_S9_L002_MERGED_trimmomatic.bamSJ.out.tab /workspace/hradxj/karmun_awesome_experiment/009.STAR/Single_Pass_Results/RACP005_8_S10_L002_MERGED_trimmomatic.bamSJ.out.tab
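
The notebook passes every first-pass junction to the second pass unfiltered. Some users prefer to drop weakly supported novel junctions first. The sketch below shows one way to do that and is not part of the original workflow: `filter_junctions` is a hypothetical helper, the threshold of 3 reads is arbitrary, and the column meanings (column 6: 1 = annotated, 0 = not; column 7: uniquely mapping reads crossing the junction) follow the STAR manual's description of `SJ.out.tab`.

```shell
# Keep annotated junctions plus novel ones supported by at least MIN uniquely
# mapped reads. Output lines are unchanged SJ.out.tab lines.
# Usage: filter_junctions MIN file1 [file2 ...]
filter_junctions() {
    min=$1
    shift
    awk -v min="$min" '$6 == 1 || $7 >= min' "$@"
}

# Toy SJ.out.tab, made up for the demonstration (9 tab-separated columns)
sj=$(mktemp)
printf 'chr1\t100\t200\t1\t1\t1\t0\t0\t30\n'  > "$sj"   # annotated, no reads: kept
printf 'chr1\t300\t400\t1\t1\t0\t5\t0\t25\n' >> "$sj"   # novel, 5 unique reads: kept
printf 'chr1\t500\t600\t2\t2\t0\t1\t3\t12\n' >> "$sj"   # novel, 1 unique read: dropped

filter_junctions 3 "$sj"
```

The filtered output could be written to a single file and that file given to `--sjdbFileChrStartEnd` in place of the full list.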


In [120]:
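TRIMMED=${PROJECT}/004.trimmomatic
INDEX=${PROJECT}/009.STAR/index
OUT_STAR=${PROJECT}/009.STAR/Two_Pass_Results
# LOG was not defined in the original cell; the logs directory created earlier is assumed
LOG=${PROJECT}/009.STAR/logs
mkdir -p ${OUT_STAR} ${LOG}
# One prefix per sample: strip the _R1.fastq / _R2.fastq suffix and de-duplicate
PREFIXLIST=$(basename -a ${TRIMMED}/*.fastq | sed 's/_R[12]\.fastq$//' | sort -u)
echo $PREFIXLIST

for PREFIX in ${PREFIXLIST}
do
echo $PREFIX
R1=${TRIMMED}/${PREFIX}_R1.fastq
R2=${TRIMMED}/${PREFIX}_R2.fastq

# SJLIST (set in the previous cell) supplies the first-pass junctions from all samples.
# --outSAMtype writes the coordinate-sorted BAM to a file; the original cell used
# --outStd BAM_SortedByCoordinate, which sends the BAM to stdout (the bsub log).
COMMAND="module load STAR; \
            STAR \
            --runThreadN 8 \
            --genomeDir ${INDEX} \
            --readFilesIn ${R1} ${R2} \
            --sjdbGTFfile ${ANNOTGTF} \
            --sjdbGTFtagExonParentGene gene_id \
            --sjdbGTFtagExonParentTranscript transcript_id \
            --outFileNamePrefix ${OUT_STAR}/${PREFIX}.bam \
            --sjdbFileChrStartEnd ${SJLIST} \
            --quantMode GeneCounts \
            --outSAMtype BAM SortedByCoordinate"
# Request as many slots (-n) as STAR threads (--runThreadN)
bsub \
-J STAR \
-o ${LOG}/%J_STAR_index.out \
-e ${LOG}/%J_STAR_index.err \
-n 8 \
-q lowpriority \
$COMMAND
done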

RACP005_11_S11_L002_MERGED_trimmomatic RACP005_12_S12_L002_MERGED_trimmomatic RACP005_13_S13_L002_MERGED_trimmomatic RACP005_1_S8_L002_MERGED_trimmomatic RACP005_5_S9_L002_MERGED_trimmomatic RACP005_8_S10_L002_MERGED_trimmomatic
RACP005_11_S11_L002_MERGED_trimmomatic
Job <687032> is submitted to queue <lowpriority>.
RACP005_12_S12_L002_MERGED_trimmomatic
Job <687033> is submitted to queue <lowpriority>.
RACP005_13_S13_L002_MERGED_trimmomatic
Job <687034> is submitted to queue <lowpriority>.
RACP005_1_S8_L002_MERGED_trimmomatic
Job <687035> is submitted to queue <lowpriority>.
RACP005_5_S9_L002_MERGED_trimmomatic
Job <687036> is submitted to queue <lowpriority>.
RACP005_8_S10_L002_MERGED_trimmomatic
Job <687037> is submitted to queue <lowpriority>.


In [128]:
# Find location of read count files
ls ${PROJECT}/009.STAR/Two_Pass_Results | grep ReadsPerGene

RACP005_1_S8_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab
RACP005_5_S9_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab
RACP005_8_S10_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab


In [153]:
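# First create a file holding one column with all gene names, taken from one read count file.
# The paste step in the next cell relies on every read count file listing genes in the same order.
# Only one table (sense counts, column 3) is built here; an antisense table would need the same steps with column 4.
mkdir -p ${PROJECT}/011.edgeR_Vv;

awk '{print $1}' ${PROJECT}/009.STAR/Two_Pass_Results/RACP005_1_S8_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab \
> ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR-genenames.tab;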

In [154]:
# Now extract column 3 from each read count file
# (in STAR's ReadsPerGene output: counts for the 1st read strand aligned with RNA)
# Create a list of read count file names
READCOUNTFILELIST=$(ls ${PROJECT}/009.STAR/Two_Pass_Results | grep ReadsPerGene)


for READCOUNTFILE in $READCOUNTFILELIST
do
awk '{print $3}' < ${PROJECT}/009.STAR/Two_Pass_Results/${READCOUNTFILE} > ${PROJECT}/011.edgeR_Vv/${READCOUNTFILE}.col3;
done

paste ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR-genenames.tab \
${PROJECT}/011.edgeR_Vv/RACP005_1_S8_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3 \
${PROJECT}/011.edgeR_Vv/RACP005_5_S9_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3 \
${PROJECT}/011.edgeR_Vv/RACP005_8_S10_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3 \
> ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR_with_unmapped.tab


In [155]:
ls ${PROJECT}/011.edgeR_Vv

GRLaV3_Vv_EdgeR-genenames.tab
GRLaV3_Vv_EdgeR_with_unmapped.tab
RACP005_1_S8_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3
RACP005_5_S9_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3
RACP005_8_S10_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab.col3


In [156]:
head  ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR_with_unmapped.tab

N_unmapped	5053374	11312084	3028748
N_multimapping	1917215	2934146	514757
N_noFeature	4601554	10738708	1439012
N_ambiguous	78822	288253	24490
GSVIVG01012261001	14	37	1
GSVIVG01012259001	6	18	1
GSVIVG01012257001	311	1151	151
GSVIVG01012255001	755	1819	234
GSVIVG01012253001	1	0	0
GSVIVG01012250001	0	0	0


In [157]:
wc -l  ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR_with_unmapped.tab

19763 /workspace/hradxj/karmun_awesome_experiment/011.edgeR_Vv/GRLaV3_Vv_EdgeR_with_unmapped.tab


In [158]:
# Drop the four N_* summary rows at the top so that only gene rows remain
tail -n +5 ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR_with_unmapped.tab > ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR.tab

In [159]:
sed -i '1s/^/Gene\tGrape-Healthy\tGrape-Infected-1\tGrape-Infected-2\n/' ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR.tab;

In [160]:
head ${PROJECT}/011.edgeR_Vv/GRLaV3_Vv_EdgeR.tab;

Gene	Grape-Healthy	Grape-Infected-1	Grape-Infected-2
GSVIVG01012261001	14	37	1
GSVIVG01012259001	6	18	1
GSVIVG01012257001	311	1151	151
GSVIVG01012255001	755	1819	234
GSVIVG01012253001	1	0	0
GSVIVG01012250001	0	0	0
GSVIVG01012249001	0	0	0
GSVIVG01012247001	32	32	6
GSVIVG01012246001	0	0	0
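
The extract-and-paste steps above depend on every file listing genes in the same order and on a hard-coded list of samples. A single `awk` pass keyed on gene ID avoids both assumptions. The sketch below is one possible alternative, not the method used in this notebook; `merge_counts` is a hypothetical helper and the two input files are toy data.

```shell
# Merge one count column from several ReadsPerGene.out.tab files into a single
# table, keyed on gene ID rather than on line position.
# Usage: merge_counts COLUMN file1 [file2 ...]
merge_counts() {
    col=$1
    shift
    awk -v col="$col" '
        BEGIN      { OFS = "\t" }
        FNR == 1   { nfile++ }                    # a new input file starts
        /^N_/      { next }                       # skip the N_* summary rows
        nfile == 1 { order[++ngene] = $1 }        # remember gene order from the first file
                   { count[$1, nfile] = $col }
        END {
            for (i = 1; i <= ngene; i++) {
                line = order[i]
                for (j = 1; j <= nfile; j++) {
                    key = order[i] SUBSEP j
                    line = line OFS ((key in count) ? count[key] : "NA")
                }
                print line
            }
        }' "$@"
}

# Toy read count files, made up for the demonstration
demo=$(mktemp -d)
printf 'N_unmapped\t10\t10\t10\ngeneA\t5\t5\t0\ngeneB\t2\t1\t1\n' > "$demo/s1.tab"
printf 'N_unmapped\t20\t20\t20\ngeneA\t7\t6\t1\ngeneB\t0\t0\t0\n' > "$demo/s2.tab"

merge_counts 3 "$demo/s1.tab" "$demo/s2.tab"
```

A gene missing from one of the later files is reported as `NA` instead of silently shifting the rows, which is the failure mode of a positional `paste`.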


In [163]:
# Check that column 3 from the original read count files has been correctly placed into the combined read counts file
head ${PROJECT}/009.STAR/Two_Pass_Results/RACP005_1_S8_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab
head ${PROJECT}/009.STAR/Two_Pass_Results/RACP005_5_S9_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab
head ${PROJECT}/009.STAR/Two_Pass_Results/RACP005_8_S10_L002_MERGED_trimmomatic.bamReadsPerGene.out.tab

N_unmapped	5053374	5053374	5053374
N_multimapping	1917215	1917215	1917215
N_noFeature	4418881	4601554	10329255
N_ambiguous	93498	78822	377
GSVIVG01012261001	14	14	0
GSVIVG01012259001	6	6	0
GSVIVG01012257001	311	311	0
GSVIVG01012255001	755	755	0
GSVIVG01012253001	1	1	0
GSVIVG01012250001	0	0	0
N_unmapped	11312084	11312084	11312084
N_multimapping	2934146	2934146	2934146
N_noFeature	10290185	10738708	28575210
N_ambiguous	338571	288253	710
GSVIVG01012261001	37	37	0
GSVIVG01012259001	21	18	3
GSVIVG01012257001	1151	1151	0
GSVIVG01012255001	1819	1819	0
GSVIVG01012253001	0	0	0
GSVIVG01012250001	0	0	0
N_unmapped	3028748	3028748	3028748
N_multimapping	514757	514757	514757
N_noFeature	1383417	1439012	2926614
N_ambiguous	27093	24490	84
GSVIVG01012261001	1	1	0
GSVIVG01012259001	1	1	0
GSVIVG01012257001	151	151	0
GSVIVG01012255001	234	234	0
GSVIVG01012253001	0	0	0
GSVIVG01012250001	0	0	0
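
In the output above, column 3 closely tracks column 2 and column 4 is close to zero for the genes shown, which supports the choice of column 3. According to the STAR manual, column 2 holds unstranded counts, column 3 counts for the 1st read strand aligned with RNA, and column 4 counts for the 2nd read strand aligned with RNA. The same comparison can be made over all genes with a per-column total, sketched here on toy data; `strand_totals` is a hypothetical helper.

```shell
# Sum the three count columns over all gene rows (N_* summary rows excluded)
strand_totals() {
    awk '!/^N_/ { u += $2; s += $3; r += $4 }
         END    { printf "col2=%d col3=%d col4=%d\n", u, s, r }' "$1"
}

# Toy read count file, made up for the demonstration
tab=$(mktemp)
printf 'N_unmapped\t10\t10\t10\ngeneA\t14\t14\t0\ngeneB\t21\t18\t3\n' > "$tab"

strand_totals "$tab"
```

A column 3 total far above column 4 points to column 3, the reverse points to column 4, and similar totals suggest an unstranded library, for which column 2 applies.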


In [None]:
# We have derived a set of putatively differentially expressed genes. Examine the raw read counts as a sanity check

DEGENELIST="Niben101Scf00107g03008 Niben101Scf05044g02012 Niben101Scf00837g08001 Niben101Scf03937g00009 Niben101Scf06977g00014 Niben101Scf03169g00010 Niben101Scf01475g00019 Niben101Scf03506g03001 Niben101Scf02971g01006 Niben101Scf08683g00001"
echo $DEGENELIST
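
One way to carry out this sanity check is to pull the rows for the listed genes out of a combined count table. The sketch below assumes a tab-separated table with gene IDs in column 1 and a header line, like `GRLaV3_Vv_EdgeR.tab` above. `lookup_counts` is a hypothetical helper and the table is toy data.

```shell
# Print the header plus the rows whose first column exactly matches one of
# the given gene IDs.
# Usage: lookup_counts TABLE "id1 id2 ..."
lookup_counts() {
    awk -v ids="$2" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        FNR == 1 || ($1 in want)' "$1"
}

# Toy count table, made up for the demonstration
tab=$(mktemp)
printf 'Gene\tS1\tS2\ngeneA\t5\t6\ngeneB\t1\t0\ngeneAB\t9\t9\n' > "$tab"

lookup_counts "$tab" "geneA geneC"
```

Matching is exact, so `geneAB` is not returned for `geneA`. A call such as `lookup_counts TABLE "$DEGENELIST"` only returns rows when the IDs in the list use the same gene naming as the table.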

