GeneGenie is a modular, containerized workflow for preprocessing Mycobacterium tuberculosis RNA-seq data.
It supports both single- and paired-end sequencing formats and performs quality control, alignment, quantification, and reporting.
The pipeline is implemented in Nextflow DSL2, ensuring reproducibility and scalability.

- Input Validation: Uses `seqkit` for FASTQ validation; invalid samples are excluded and reported.
- Quality Control & Trimming: Supports `trimgalore` and `fastp` for adapter and quality trimming.
- Alignment: Offers `bowtie2` and `STAR` aligners, with automatic index generation.
- BAM Processing: SAM to BAM conversion, sorting using `samtools`.
- Quantification: Supports both `featureCounts` and `HTSeq` for gene-level quantification.
- Comprehensive Reporting: Aggregates QC and summary metrics via `MultiQC`.
- Containerized Execution: The pipeline runs in a dedicated Singularity container for reproducibility.
- Parameter Profiles: Predefined profiles for possible tool combinations.

GeneGenie-1.0/
├── containers/
├── modules/
├── output/
├── reference/
│ ├── data
│ ├── genome.gtf
│ └── genome.fasta
├── workflows
│ └── rnaseq.nf
├── genegenie.nf
├── multiqc_config.yaml
├── nextflow.config
├── nextflowrun.sh
└── README.md
This pipeline uses the following Singularity containers from BioContainers:

- `bowtie2:2.5.2--12e15c204b09f691` - Read alignment
- `star:2.7.11a--0f5e3d475719bcac` - Read alignment
- `htseq:2.0.3--3205f67d4c550865` - Read counting
- `fastp:0.23.4--b69359f46d2a8ebf` - Read quality control and trimming
- `trim-galore:0.6.10--bc38c9238980c80e` - Quality control and adapter trimming
- `samtools:1.19.2--fbfb56ef5299fcef` - SAM/BAM processing
- `picard:3.4.0--2976616e7cbd4840` - BAM processing
- `subread:2.0.6--2dd2dd526de026fd` - Feature counting
- `seqkit:2.10.0--9a5d37887d7c4e09` - Sequence toolkit (FASTQ validation)
- `multiqc:1.21--d44678e7b9933bf6` - Reporting

Example command to download the STAR container:
singularity pull oras://community.wave.seqera.io/library/star:2.7.11a--0f5e3d475719bcac
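
To fetch every listed container in one pass, a loop along the following lines may help. This is only a sketch: it assumes that all images live under the same `oras://community.wave.seqera.io/library/` prefix as the STAR example (only the STAR URI is confirmed in this README), and `container_uri`, `pull_all`, and `DRY_RUN` are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Assumption: every image shares the registry prefix shown for STAR;
# only the STAR URI is confirmed by this README.
REGISTRY="oras://community.wave.seqera.io/library"

# Image tags copied from the container list in this README.
IMAGES="
bowtie2:2.5.2--12e15c204b09f691
star:2.7.11a--0f5e3d475719bcac
htseq:2.0.3--3205f67d4c550865
fastp:0.23.4--b69359f46d2a8ebf
trim-galore:0.6.10--bc38c9238980c80e
samtools:1.19.2--fbfb56ef5299fcef
picard:3.4.0--2976616e7cbd4840
subread:2.0.6--2dd2dd526de026fd
seqkit:2.10.0--9a5d37887d7c4e09
multiqc:1.21--d44678e7b9933bf6
"

# Hypothetical helper: build the pull URI for one image tag.
container_uri() {
  printf '%s/%s\n' "$REGISTRY" "$1"
}

# Pull every image; when DRY_RUN is 1, only print the commands.
pull_all() {
  for image in $IMAGES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "singularity pull $(container_uri "$image")"
    else
      singularity pull "$(container_uri "$image")"
    fi
  done
}
```

Setting `DRY_RUN=1` before calling `pull_all` prints the ten pull commands without contacting the registry, which is a cheap way to review the URIs first.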
GeneGenie provides six predefined profiles, one for each supported tool combination. Use the `-profile` flag to select one:
| Profile | QC Tool | Aligner | Quantification |
|---|---|---|---|
| TBF | trimgalore | bowtie2 | featurecounts |
| TBH | trimgalore | bowtie2 | htseq |
| FSF | fastp | star | featurecounts |
| FSH | fastp | star | htseq |
| TSH | trimgalore | star | htseq |
| TSF | trimgalore | star | featurecounts |
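
The profile names in the table appear to encode the tool choices letter by letter: QC tool (`T` or `F`), aligner (`B` or `S`), then quantifier (`F` or `H`). The sketch below decodes a name on that assumption. `decode_profile` is a hypothetical helper, not part of GeneGenie, and whether each profile sets exactly these parameters is inferred from the table rather than confirmed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a profile name into the tool selection
# listed for it in the profile table. Rejects names not in the table.
decode_profile() {
  profile=$1
  case "$profile" in
    TBF|TBH|FSF|FSH|TSH|TSF) ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  # First letter: QC tool.
  case "$profile" in T??) qc=trimgalore ;; F??) qc=fastp ;; esac
  # Second letter: aligner.
  case "$profile" in ?B?) aligner=bowtie2 ;; ?S?) aligner=star ;; esac
  # Third letter: quantification tool.
  case "$profile" in ??F) quant=featurecounts ;; ??H) quant=htseq ;; esac
  echo "--qc_tool $qc --aligner $aligner --quantification $quant"
}
```

For example, `decode_profile FSH` prints `--qc_tool fastp --aligner star --quantification htseq`, while a name outside the table such as `FBF` is rejected with a non-zero exit status.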
Example use:
nextflow run genegenie.nf -profile TBF

Available parameters:

- `--input`: Path to input CSV file (required)
- `--read_type`: `single` or `paired` (default: `paired`)
- `--outdir`: Output directory (default: `${projectDir}/output`)
- `--genome_fasta`: Reference genome FASTA file (required)
- `--gtf`: GTF annotation file (required for quantification)
- `--qc_tool`: `trimgalore` or `fastp`
- `--aligner`: `bowtie2` or `star`
- `--quantification`: `featurecounts` or `htseq`

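
Before launching a run, it can be useful to confirm that the required files exist and that the read type is one of the two accepted values. The following is a minimal sketch of such a check, not something GeneGenie ships; `preflight` and its argument order are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of GeneGenie).
# Usage: preflight <input.csv> <genome.fasta> <annotation.gtf> [read_type]
preflight() {
  input=$1
  fasta=$2
  gtf=$3
  read_type=${4:-paired}   # mirrors the documented default
  status=0

  # --input, --genome_fasta and --gtf must point at readable regular files.
  for f in "$input" "$fasta" "$gtf"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      echo "missing or unreadable file: $f" >&2
      status=1
    fi
  done

  # --read_type accepts only 'single' or 'paired'.
  case "$read_type" in
    single|paired) ;;
    *) echo "read_type must be 'single' or 'paired', got: $read_type" >&2
       status=1 ;;
  esac

  return "$status"
}
```

A call such as `preflight samplesheet.csv genome.fasta genome.gtf single && nextflow run genegenie.nf ...` would then start the pipeline only when the check passes.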
You may override parameters at runtime like so:
nextflow run genegenie.nf -profile TSF \
--input /path/to/reference/samplesheet.csv \
--outdir /path/to/output \
--gtf /path/to/reference/genomic.gtf \
--genome_fasta /path/to/reference/genomic.fna \
--max_cpus 8 \
--max_memory 16.GB \
--max_time 48.h \
--read_type single

| Step | Tool(s) | Output Directory |
|---|---|---|
| Input Validation | seqkit | seqkit/ |
| QC & Trimming | trimgalore / fastp | trimgalore/ or fastp/ |
| Alignment | bowtie2 / STAR | bowtie2/ or star/ |
| BAM Processing | samtools | samtools/ |
| Quantification | featureCounts / HTSeq | featurecounts/ or htseq/ |
| Reporting | MultiQC | multiqc/ |
All output files are organized under the specified --outdir.

- `seqkit/validation_results.txt`: Per-sample validation status.
- `trimgalore/`, `fastp/`: Cleaned FASTQ and QC logs.
- `bowtie2/`, `star/`: Alignment files (SAM/BAM).
- `samtools/`: Sorted BAMs, alignment metrics.
- `featurecounts/`, `htseq/`: Gene count tables.
- `multiqc_report.html`: Aggregated summary report.
- `pipeline_info/`: Execution reports, timeline, trace, and DAG.

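
After a run, the output layout described above can be checked with a small script. The sketch below derives the expected subdirectories from the tool selection and reports any that are missing; `expected_dirs` and `check_outputs` are hypothetical helpers, and the assumption that every listed directory sits directly under `--outdir` is taken from the description here rather than verified against the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the output subdirectories expected for a
# given tool selection, following the step table and output list.
# Usage: expected_dirs <outdir> <qc_tool> <aligner> <quantification>
expected_dirs() {
  outdir=$1
  for d in seqkit "$2" "$3" samtools "$4" multiqc pipeline_info; do
    echo "$outdir/$d"
  done
}

# Hypothetical helper: fail and list the directories that do not exist.
check_outputs() {
  missing=$(expected_dirs "$@" | while IFS= read -r dir; do
    [ -d "$dir" ] || echo "$dir"
  done)
  if [ -n "$missing" ]; then
    echo "missing output directories:" >&2
    echo "$missing" >&2
    return 1
  fi
}
```

For a run with the `FSH` profile, `check_outputs /path/to/output fastp star htseq` would succeed only if all seven directories are present.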
Requirements:

- Nextflow 21.04.0 or higher
- Apptainer/Singularity
- Sufficient disk space for intermediate and output files

Note: Invalid samples are excluded after validation and reported in the output.
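
The Nextflow version and container-runtime requirements can be checked from the shell before a first run. The sketch below is one way to do it, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils); `version_ge` and `have_container_runtime` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V`, so the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when either container runtime named in the requirements
# (Apptainer or Singularity) is available on PATH.
have_container_runtime() {
  command -v apptainer >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1
}
```

With the installed Nextflow version stored in a variable, `version_ge "$installed_version" 21.04.0` returns success only when the minimum is met; how that version string is obtained is left to the reader.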
GeneGenie
🧞‍♂️ Reproducible, containerized, and modular workflow for M. tuberculosis RNA-seq preprocessing