R package for processing of RNA-sequencing in the Ryten Lab.

RHReynolds/RNAseqProcessing

RNAseq processing

Aim

  1. Comparison of RNA QC run using either trimmomatic or fastp.
  2. Provision of scripts for RNA-seq processing steps, from QC to alignment and quantification.

Using the package

Installing the package

To use the package, install it from GitHub with the following lines of code:

install.packages("devtools")
library(devtools)
install_github("RHReynolds/RNAseqProcessing")

Calling tools from the command line

The executables for all tools have already been downloaded to /tools/. The scripts below assume that you can call these tools from the command line without pointing to their exact location on the server. To set this up, edit the .profile file in your home directory:

cd 
nano .profile

Then ensure that each tool (fastp, STAR, etc.) has its own export PATH line, which tells bash to look in that tool's directory under /tools/ for commands. For example, for fastp add: export PATH="/tools/fastp/:$PATH".
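
As a rough illustration of the PATH setup described above, the sketch below appends an export line to a profile file only if it is not already there, and then reports which tools bash can find. The function names add_tool_to_profile and check_tools are hypothetical helpers written for this example and are not part of the package. Only the /tools/fastp/ directory comes from the text above; the directory for any other tool is an assumption you would need to check on the server.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append an export PATH line for one tool directory
# to a profile file (default: ~/.profile), unless that exact line exists.
add_tool_to_profile() {
  local tool_dir="$1"
  local profile="${2:-$HOME/.profile}"
  local line="export PATH=\"${tool_dir}:\$PATH\""
  touch "$profile"
  # -F fixed string, -x whole-line match, -q quiet
  if ! grep -Fxq "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Hypothetical helper: report whether each named tool can be found on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use this would be to run add_tool_to_profile /tools/fastp/, reload the profile with source ~/.profile (or log in again), and then run check_tools fastp STAR to confirm that the tools resolve.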

Scripts for RNA-seq processing

| Script | Processing step | Description | Author(s) |
| --- | --- | --- | --- |
| prealignmentQC_fastp_PEadapters.R | Pre-alignment QC | Performs fastp trimming, with adapter sequence auto-detection for PE data enabled, followed by FastQC and MultiQC. If you wish to specify adapters, this flag needs to be enabled. Script not yet produced. | DZ, KD & RHR |
| prealignmentQC_fastp_notrimming.R | Pre-alignment QC | Runs fastp with trimming disabled, followed by FastQC and MultiQC. | DZ, KD & RHR |
| STAR_alignment_withReadGroups_multi2pass.R | Alignment | Performs STAR alignment, with the option of adding read groups if needed (this is important if you are planning to use your BAMs for later de-duplication with UMIs). By default, this script performs 1st pass mapping. To use it for 2nd pass mapping, supply a file of filtered junctions with the --sj_file flag. This script is primarily for use with reads of length > 75 bp. If read length is shorter, different parameters may be necessary. For details of the alignment process, read the alignment workflow. | DZ & RHR |
| STAR_splice_junction_merge.R | Alignment | Merges SJ.out.tab files from 1st pass mapping, removes duplicated splice junctions (as determined by genomic location) and outputs one SJ.out.tab file with the genomic coordinates. Also has an optional flag for filtering junctions by the number of samples they are present in. For details of the alignment process, read the alignment workflow. | RHR |
| post_alignment_QC_RSeQC.R | Post-alignment QC | (i) Sorts and indexes .bam files using samtools and (ii) runs post-alignment QC using RSeQC. For details, read the alignment workflow. | DZ, KD & RHR |
| quantification_Salmon.R | Quantification | Performs mapping-based quantification of transcripts and genes (the latter only if a transcript-to-gene map is provided). This script can be used directly after trimming, as it does not require aligned files. Instead, Salmon performs quasi-mapping prior to quantification. The benefits of using Salmon for quantification are its speed and its ability to correct for sequence-specific, GC and positional biases. This script is adapted for paired-end reads. For more details, read the quantification workflow. | RHR |
| leafcutter_ds_multi_pairwise.R | Differential splicing | Leafcutter's command line tool for differential splicing currently only permits pairwise comparisons. If a grouping variable contains more than two groups, multiple pairwise comparisons can be performed using this script, which still calls the original Leafcutter command but loops across each of the pairwise comparisons. For more details, read the leafcutter workflow. | RHR |
| leafviz_multi_pairwise.R | Differential splicing | Leafviz can be used to visualise the results of Leafcutter's differential splicing. This requires that the results have first been formatted for use. Leafcutter provides a script, prepare_results.R, which performs this formatting, albeit for only one pairwise comparison. Formatting the results of multiple pairwise comparisons requires looping across the comparisons and running prepare_results.R for each one, which is what leafviz_multi_pairwise.R does. For more details, read the leafcutter workflow. | RHR |
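
To make the splice junction merging step more concrete, here is a minimal sketch of the general idea: pool several SJ.out.tab files, collapse junctions that share a genomic location, and optionally keep only junctions seen in a minimum number of samples. It is not the code of STAR_splice_junction_merge.R. The function name merge_sj is hypothetical, and the sketch assumes that each input file is one sample and that a junction is identified by the first four columns of STAR's SJ.out.tab (chromosome, intron start, intron end, strand).

```shell
#!/usr/bin/env bash

# Hypothetical helper, not the package's implementation.
# Usage: merge_sj MIN_SAMPLES file1.SJ.out.tab [file2.SJ.out.tab ...]
# Prints the unique junctions (chr, start, end, strand) found in at
# least MIN_SAMPLES of the input files, sorted by coordinate.
merge_sj() {
  local min_samples="$1"
  shift
  awk -v min="$min_samples" '
    BEGIN { FS = OFS = "\t" }
    {
      # Assumption: genomic location = chromosome, start, end, strand
      key = $1 OFS $2 OFS $3 OFS $4
      # Count each junction at most once per input file (sample)
      if (!((FILENAME, key) in seen)) {
        seen[FILENAME, key] = 1
        n[key]++
      }
    }
    END {
      for (k in n) {
        if (n[k] >= min) print k
      }
    }
  ' "$@" | sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n
}
```

Calling merge_sj 1 with all first-pass files gives the plain de-duplicated union, while a higher first argument mimics filtering by the number of samples a junction is present in. How the real script defines its threshold and output columns should be checked in the alignment workflow.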

Example workflow
