- About this Repository
- readQC, alignment and read counting
- Analysis of RNAseq count matrix
- Miscellaneous
This is a collection of scripts to analyse RNA sequencing data from start (FASTQ or unaligned BAM files) to end (differential gene expression, heatmaps, enrichment analysis). Some chunks of code were adapted from this limma tutorial.
These scripts were developed on an HPC cluster with a SLURM scheduler and are written as array jobs; however, the backbone of the scripts should work on any Linux-based operating system. There is one script to align single-end reads and a separate one for paired-end reads. The data per sample can be distributed over several input FASTQ files. The scripts run FastQC and Qualimap alongside the STAR-based alignment of the reads. QC metrics are collected by a separate script. Finally, the reads are counted using Rsubread.
- this script helps to index the reference genome for STAR
- this script performs QC steps and STAR alignment for paired-end reads. For single-end reads use this one. The scripts create folders for the FastQC and Qualimap analyses; log files are also processed so that they are easily accessible from R.
- this script creates a text file with an overview of the collected QC metrics and cleans up the analysis directory. It moves all BAM files into one folder for the next step of the analysis.
- this script counts the reads and creates the input matrix for subsequent analysis with DESeq2 or limma voom.
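Because the data of one sample can be spread over several FASTQ files and the jobs run as a SLURM array, each array task has to pick its own sample and collect all files belonging to it. The following Python sketch only illustrates that bookkeeping; it is not taken from the repository's scripts. The file naming scheme (`<sample>_L<lane>_R<mate>.fastq.gz`) and both function names are assumptions made for this example. STAR accepts comma-separated file lists per mate in `--readFilesIn`, which is the format the second function produces.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming scheme: <sample>_L<lane>_R<1|2>.fastq.gz
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<mate>[12])\.fastq\.gz$"
)


def group_fastq_by_sample(filenames):
    """Return {sample: {"R1": [...], "R2": [...]}} in sorted filename order."""
    groups = defaultdict(lambda: {"R1": [], "R2": []})
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed scheme
        groups[match["sample"]]["R" + match["mate"]].append(name)
    return dict(groups)


def read_files_argument(groups, task_index):
    """Pick the sample for a 0-based array task index and join its files.

    Files of the same mate are joined with commas, the two mates are
    separated by a space; single-end samples yield only the R1 list.
    """
    sample = sorted(groups)[task_index]
    mates = groups[sample]
    parts = [",".join(mates["R1"])]
    if mates["R2"]:
        parts.append(",".join(mates["R2"]))
    return sample, " ".join(parts)


# Toy input; in an array job the index would come from SLURM_ARRAY_TASK_ID.
example_files = [
    "sampleA_L002_R1.fastq.gz",
    "sampleA_L001_R1.fastq.gz",
    "sampleA_L001_R2.fastq.gz",
    "sampleA_L002_R2.fastq.gz",
    "sampleB_L001_R1.fastq.gz",
    "notes.txt",
]
example_groups = group_fastq_by_sample(example_files)
print(read_files_argument(example_groups, 0))
```

Sorting the sample names gives every array task a stable index-to-sample mapping, so rerunning a single failed task processes the same sample again.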
The DESeq2 analysis pipeline does not require a lot of coding effort: it takes only a couple of R commands to get from data ingest to differential gene expression results. Thus, this script also includes initial data wrangling to get the data into shape for the DESeq2 data object and conversion of Entrez IDs to NCBI gene symbols. You can use the vst matrix in case you want to analyse the data outside of DESeq2.
The limma voom analysis pipeline as implemented in this script includes several QC plots to find outlying patterns in your count data and a batch removal procedure for known batches in the dataset. It includes setting up a design and a contrast matrix for more multifaceted datasets, along with a wrapper to convert from Entrez ID to gene symbol and to export differential gene expression results.
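To make the roles of the design and the contrast matrix concrete, here is a minimal Python sketch of the underlying idea. It is an illustration only and not the repository's R code; in limma this is typically done with `model.matrix` and `makeContrasts`. The function names, group labels and expression values below are made up for the example. For a design without intercept (one indicator column per group), the fitted coefficients are simply the group means, and a contrast is a weighted sum of those coefficients.

```python
def design_matrix(groups):
    """One-hot design without intercept: one column per group level."""
    levels = sorted(set(groups))
    rows = [[1 if group == level else 0 for level in levels] for group in groups]
    return levels, rows


def contrast(levels, numerator, denominator):
    """Coefficient weights for the comparison numerator - denominator."""
    return [
        1 if level == numerator else -1 if level == denominator else 0
        for level in levels
    ]


def group_means(values, rows):
    """Least-squares coefficients of a one-hot design are the group means."""
    means = []
    for column in range(len(rows[0])):
        members = [v for v, row in zip(values, rows) if row[column] == 1]
        means.append(sum(members) / len(members))
    return means


# Toy data for a single gene (hypothetical log2 expression values).
groups = ["control", "control", "treated", "treated"]
log2_expression = [5.0, 5.5, 7.0, 7.5]

levels, rows = design_matrix(groups)
coefficients = group_means(log2_expression, rows)
weights = contrast(levels, "treated", "control")
log2_fold_change = sum(w * c for w, c in zip(weights, coefficients))
print(levels, weights, log2_fold_change)
```

With more than two groups, each additional comparison is just another weight vector over the same coefficients, which is why a contrast matrix scales well to multifaceted designs.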
- this script draws volcano plots from lists of differentially expressed genes
- this script creates heatmaps for a given set of genes based on a vst matrix (DESeq2) or "voom$E" (the correct slot in the voom EList)
- this script creates an UpSet plot (an improved Venn diagram), including a wrapper to import multiple lists of differentially expressed genes into R.
- this script creates boxplots and this one dot plots. This is a very important sanity check for any case-control type analysis. Similar to the heatmap function, it takes a vst matrix or the "$E" slot from a voom EList.
- this script takes one or more lists of differentially expressed genes and performs overrepresentation analysis using enrichR. You can add as many of the databases listed in enrichR (MSigDB, KEGG, etc.) as you want. It returns one list of enriched terms per database; each of these files contains the enrichment results of all input gene lists. Finally, it draws a balloon plot comparing enriched terms across the gene lists. The plot consists of the top 5 enrichment results of each gene list.
- this script produces an enrichment lollipop plot for a single list of differentially expressed genes.
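For readers unfamiliar with overrepresentation analysis, the following Python sketch shows the basic statistic and the "top 5 terms per gene list" selection described above. It is an illustration under stated assumptions, not the repository's code: enrichR queries the Enrichr web service, whose scoring may differ from the plain one-sided hypergeometric test used here. The function names, term names and gene identifiers are invented for the example.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, term_size, universe_size):
    """P(X >= overlap) when list_size genes are drawn from a universe
    in which term_size genes are annotated to the term."""
    total = comb(universe_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def top_terms(gene_list, term_to_genes, universe_size, n=5):
    """Return up to n (term, overlap, p-value) tuples, smallest p-value first."""
    genes = set(gene_list)
    results = []
    for term, members in term_to_genes.items():
        overlap = len(genes & set(members))
        if overlap == 0:
            continue  # terms without any hit cannot be overrepresented
        p_value = hypergeom_pvalue(overlap, len(genes), len(members), universe_size)
        results.append((term, overlap, p_value))
    results.sort(key=lambda item: item[2])
    return results[:n]


# Toy database and gene list (hypothetical identifiers), universe of 10 genes.
toy_terms = {
    "termA": ["g1", "g2", "g3", "g4"],
    "termB": ["g1", "g7", "g8", "g9", "g10"],
    "termC": ["g5", "g6"],
}
for term, overlap, p_value in top_terms(["g1", "g2", "g3"], toy_terms, 10):
    print(term, overlap, round(p_value, 4))
```

Running `top_terms` once per input gene list and combining the results gives the kind of table that a balloon plot across gene lists is drawn from. In a real analysis the p-values would also need multiple testing correction, which this sketch omits.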