This repository contains workflows and scripts for processing Illumina short reads from small RNA libraries, mapping the reads to the reference genome and the miRNA database, and estimating the abundance of miRNAs.
The pipeline trims sequencing adapters from raw reads and maps the trimmed reads to the reference genome and the miRNA database (including hairpins and mature miRNAs). The abundance of known and novel miRNAs is estimated from the number of mapped reads.

The Snakemake pipeline utilizes a series of tools designed for processing sequencing data:
- Trimmomatic: Trims sequencing adapters from raw reads for downstream processing.
- FastQC: Generates reports of sequence quality, GC content, length distribution and adapter content for quality control.
- FastQScreen: Generates reports of reads mapped to a set of reference databases to check the composition of the library.
- miRDeep2: Maps reads to the reference genome and the miRNA database (miRBase: https://mirbase.org/) to estimate the abundance of miRNAs.
- MultiQC: Aggregates the results from multiple bioinformatics tools across samples into a single report for visualization.

This repository presents a streamlined Snakemake pipeline to map reads of small RNA libraries and estimate the abundance of miRNAs for downstream analysis. Key features include:
- Reads Processing: Utilizes Trimmomatic to trim sequencing adapters from raw reads.
- Quality Control: Utilizes FastQC/FastQScreen to check sequencing contents and library compositions.
- Mapping and Abundance Estimation: Utilizes miRDeep2 to aggregate the results of Bowtie, RNAfold, and randfold for abundance estimation of miRNAs.
This pipeline provides a complete solution to estimate the abundance of miRNAs for downstream analysis.
Clone the Repository: Clone the new repository to your local machine, choosing the directory where you want to perform data analysis. Instructions for cloning can be found here.
Tailor the workflow to your project's requirements:
- Edit config.yaml in the config/ directory to set up the workflow execution parameters.
- Modify samples.tsv to outline your sample setup, ensuring it reflects your specific data structure and requirements.
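
Before running the workflow, it can help to confirm that both files are in place and that the sample sheet is well-formed. The following is a minimal sketch, not part of the repository: check_config is a hypothetical helper name, and the check makes no assumption about which columns samples.tsv contains. It only verifies that both files are non-empty and that every row has as many tab-separated fields as the header.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with the pipeline).
# $1: path to the workflow config file, $2: path to the sample sheet.
check_config() {
  local cfg="$1"
  local samples="$2"
  local f

  # Both files must exist and be non-empty.
  for f in "$cfg" "$samples"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done

  # The first non-empty line is treated as the header; every later
  # non-empty row must have the same number of tab-separated fields.
  awk -F'\t' '
    NF == 0 { next }
    !seen   { n = NF; seen = 1; next }
    NF != n { printf("line %d: expected %d fields, found %d\n", NR, n, NF) > "/dev/stderr"; bad = 1 }
            { rows++ }
    END     { if (rows == 0) { print "no sample rows found" > "/dev/stderr"; bad = 1 }
              exit bad + 0 }
  ' "$samples"
}
```

A possible call, assuming samples.tsv sits next to config.yaml in config/ (the README does not state its location): `check_config config/config.yaml config/samples.tsv && echo "configuration looks consistent"`.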
Install Snakemake via conda with the following command:

    conda create -c bioconda -c conda-forge -n snakemake snakemake

- Activate the Conda environment and load the Singularity module:

      conda activate snakemake
      module load singularity

- Test the Configuration: Perform a dry-run to validate your setup:

      snakemake --use-singularity --singularity-args '--bind /path/to/bind/in/singularity' --profile config/ -n

- Cluster Execution: For cluster environments (using Slurm), submit the workflow as follows:

      sbatch submit.sh

  Edit submit.sh and config/config.yaml to customize the cluster environment.

- Run the Test Dataset: Execute the workflow with a test dataset:

      cp test/config/configProject.yaml config/
      sbatch submitExample.sh

  Copy test/config/configProject.yaml to config/ and submit the job to start the workflow.

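
The dry-run and execution steps above could be chained so that the real run starts only when the dry-run succeeds. The sketch below is an illustration under assumptions, not the contents of submit.sh: run_workflow is a hypothetical function name, and it reuses only the snakemake options already shown in this README.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not the repository's submit.sh).
# $1: directory to bind into the Singularity container.
run_workflow() {
  # Rebuild the positional parameters as the shared snakemake options;
  # "$1" is expanded before `set --` replaces the argument list.
  set -- --use-singularity --singularity-args "--bind $1" --profile config/

  # Dry-run first (-n); stop here if the setup does not validate.
  if ! snakemake "$@" -n; then
    echo "dry-run failed; workflow not started" >&2
    return 1
  fi

  # Same options without -n to execute the workflow.
  snakemake "$@"
}
```

Such a function could be called as `run_workflow /path/to/bind/in/singularity` from an interactive shell or from inside a job script, after the conda environment is activated and the Singularity module is loaded.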
Upon successful execution, the pipeline generates an HTML report of known and novel miRNAs and their abundance for each sample. In addition, a MultiQC report is generated to give an overview of the quality control analysis across all samples.