The purpose of this repository is to analyze raw RNA-Seq data generated by Illumina sequencing, from quality assessment through co-expression analysis. The organism used in these scripts is Arabidopsis thaliana, a model plant that is widely used in plant sciences.
The purpose of the RNA-seq data is to identify the differences among 12 cytokinin (CK, a plant hormone) treatments on Arabidopsis leaves in terms of the number of differentially expressed genes, Gene Ontology terms, and co-expressed gene clusters in the CK-treated leaves during leaf senescence. Arabidopsis leaves were treated for 2 hours, 48 hours, 96 hours, or 144 hours. Samples were then taken at the designated time point for RNA extraction and sent to NovaGene Inc. for sequencing.
The following steps are typically taken:
- Ensuring the quality of reads with FastQC
- Trimming (post-cleaning) with Trimmomatic
- Trimming (after Trimmomatic) using cutadapt
- Mapping to a reference genome using HISAT
- Producing gene counts with StringTie
- Analysis of differential gene expression between treatments using edgeR
- Co-expression analysis using Tidyverse
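
The command-line part of the steps above (quality check through gene counting) can be sketched as a dry run that only builds the commands without executing them. This is a minimal illustration, not the scripts in this repository: the function name `build_pipeline_commands`, every file name and path, the trimming parameter values, the `trimmomatic` wrapper name, and the `samtools sort` step between mapping and counting are all assumptions made for the example.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(sample, reads_dir, out_dir, index_prefix,
                            annotation_gtf, adapters_fa, threads=4):
    """Build (step name, argument list) pairs for one paired-end sample.

    Nothing is executed. All paths and parameter values are placeholders
    chosen for illustration, not the settings used in this repository.
    """
    reads_dir, out_dir = Path(reads_dir), Path(out_dir)
    t = str(threads)

    # Hypothetical file naming scheme for raw and intermediate files.
    raw1 = reads_dir / f"{sample}_1.fq.gz"
    raw2 = reads_dir / f"{sample}_2.fq.gz"
    trim1p = out_dir / f"{sample}_1P.fq.gz"  # mate 1, pair survived
    trim1u = out_dir / f"{sample}_1U.fq.gz"  # mate 1, partner dropped
    trim2p = out_dir / f"{sample}_2P.fq.gz"
    trim2u = out_dir / f"{sample}_2U.fq.gz"
    cut1 = out_dir / f"{sample}_1.cut.fq.gz"
    cut2 = out_dir / f"{sample}_2.cut.fq.gz"
    sam = out_dir / f"{sample}.sam"
    bam = out_dir / f"{sample}.sorted.bam"
    gtf = out_dir / sample / f"{sample}.gtf"

    steps = [
        # Quality reports for the raw reads.
        ("fastqc", ["fastqc", "-o", out_dir, raw1, raw2]),
        # Adapter and quality trimming (assumes a `trimmomatic` wrapper on PATH).
        ("trimmomatic", ["trimmomatic", "PE", "-threads", t, "-phred33",
                         raw1, raw2, trim1p, trim1u, trim2p, trim2u,
                         f"ILLUMINACLIP:{adapters_fa}:2:30:10",
                         "SLIDINGWINDOW:4:20", "MINLEN:36"]),
        # Second trimming pass on the surviving read pairs.
        ("cutadapt", ["cutadapt", "-q", "20", "-m", "36",
                      "-o", cut1, "-p", cut2, trim1p, trim2p]),
        # Spliced alignment; --dta tailors output for transcript assemblers.
        ("hisat2", ["hisat2", "-p", t, "--dta", "-x", index_prefix,
                    "-1", cut1, "-2", cut2, "-S", sam]),
        # Assumed extra step: coordinate-sort the alignments.
        ("samtools sort", ["samtools", "sort", "-@", t, "-o", bam, sam]),
        # Estimate expression for the annotated genes only (-e).
        ("stringtie", ["stringtie", bam, "-e", "-B", "-p", t,
                       "-G", annotation_gtf, "-o", gtf]),
    ]
    return [(name, [str(arg) for arg in argv]) for name, argv in steps]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "sample01", "raw_reads", "results",
        "index/reference", "annotation.gtf", "adapters.fa")
    for name, argv in commands:
        print(f"# {name}")
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check paths and parameters for one sample before passing each argument list to something like `subprocess.run(argv, check=True)`.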

The scripts use the following languages:
- Bash
- Python
- R
There are five scripts in this repository for analysing RNA-seq data. The first performs quality assessment of the raw Illumina sequencing reads, the second performs alignment (mapping) to the reference genome and gene counting, the third performs differential gene expression analysis with edgeR, and the final script performs the co-expression analysis.
The quality assessment and mapping scripts were originally written by Dr. Tonia Schwartz, an associate professor in the biological sciences department at Auburn University, and were modified by Omar Hasannin to analyze Arabidopsis RNA-seq data. You can find the original scripts at this link: https://github.com/Schwartz-Lab-at-Auburn/FunctionalGenomicsCourse
The reference genome used in these scripts, as well as the functional annotation file for the co-expression analysis, was downloaded from TAIR.