Dev a #3

Merged
AAnnan merged 5 commits into main from devA
May 1, 2026


@AAnnan AAnnan commented May 1, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/templatetres branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit

Release Notes

  • New Features

    • Restructured samplesheet configuration with explicit runtime: and references: sections replacing the prior resources: block.
    • Support for modality-specific barcodes (rna_sb_barcodes, dna_sb_barcodes) and dual tagmentation modes.
  • Changed

    • Output directory names changed: split/rna_split_fastqs/ and dna_split_fastqs/, align/rna_align/, qc/TrES_Stats/.
    • TMPDIR environment variable is now required for runtime configuration.
  • Deprecated

    • Removed CLI parameters: --runtime_env_prefix and resource override flags. Configure all settings in the samplesheet instead.
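
The contract change described in these notes can be illustrated with a minimal Python sketch. It is not the pipeline's parser (that lives in lib/SamplesheetParser.groovy); the function name check_contract and the error wording are hypothetical. Only the block names runtime, references, and resources and the keys env_prefix and tmpdir come from this PR, and treating env_prefix as required is an assumption inferred from the missing_runtime_env_prefix.yaml fixture.

```python
def check_contract(sheet: dict) -> list[str]:
    """Return a list of problems found in an already-parsed samplesheet mapping."""
    problems = []
    # The legacy block is replaced by the runtime: and references: sections.
    if "resources" in sheet:
        problems.append("legacy 'resources' block found; use 'runtime' and 'references'")
    for block in ("runtime", "references"):
        if not isinstance(sheet.get(block), dict):
            problems.append(f"missing '{block}' block")
    runtime = sheet.get("runtime")
    if isinstance(runtime, dict):
        # Assumption: both keys are mandatory (tmpdir is stated as required in this PR).
        for key in ("env_prefix", "tmpdir"):
            if not runtime.get(key):
                problems.append(f"runtime.{key} is required")
    return problems


# Hypothetical example values, for illustration only.
old_style = {"resources": {"env_prefix": "/opt/env"}}
new_style = {
    "runtime": {"env_prefix": "/opt/env", "tmpdir": "/tmp"},
    "references": {"species": "human"},
}
for problem in check_contract(old_style):
    print(problem)
print(check_contract(new_style))  # prints []
```

Returning a list rather than raising on the first problem lets a caller report every contract violation in one pass, which is one reasonable design for a migration like this one.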

@AAnnan AAnnan self-assigned this May 1, 2026

coderabbitai Bot commented May 1, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a27957f0-70e6-4998-8d59-f719cce4764b

📥 Commits

Reviewing files that changed from the base of the PR and between b632595 and 6543676.

📒 Files selected for processing (100)
  • .codex
  • README.md
  • assets/ahrmad_template.yaml
  • assets/samplesheet.example.yaml
  • assets/samplesheet.real.example.yaml
  • assets/samplesheet.template.yaml
  • assets/test_realdata/samplesheet.real_dna.yaml
  • assets/test_realdata/samplesheet.real_rna.yaml
  • assets/testdata/TrESFlow_References/dna/human/blacklist.bed
  • assets/testdata/TrESFlow_References/dna/human/bwa/hg38.fa.0123
  • assets/testdata/TrESFlow_References/dna/human/bwa/hg38.fa.amb
  • assets/testdata/TrESFlow_References/dna/human/bwa/hg38.fa.ann
  • assets/testdata/TrESFlow_References/dna/human/bwa/hg38.fa.bwt.2bit.64
  • assets/testdata/TrESFlow_References/dna/human/bwa/hg38.fa.pac
  • assets/testdata/TrESFlow_References/dna/human/chrom.sizes
  • assets/testdata/TrESFlow_References/ligation_barcode_whitelist.txt
  • assets/testdata/TrESFlow_References/rna/human/star/.gitkeep
  • assets/testdata/TrESFlow_References/rna/human/star/Genome
  • assets/testdata/TrESFlow_References/rna/human/star/SA
  • assets/testdata/TrESFlow_References/rna/human/star/SAindex
  • assets/testdata/TrESFlow_References/rna/human/star/chrLength.txt
  • assets/testdata/TrESFlow_References/rna/human/star/chrName.txt
  • assets/testdata/TrESFlow_References/rna/human/star/chrNameLength.txt
  • assets/testdata/TrESFlow_References/rna/human/star/chrStart.txt
  • assets/testdata/TrESFlow_References/rna/human/star/genomeParameters.txt
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/dna/human/blacklist.bed
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/dna/human/bwa/hg38.fa.0123
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/dna/human/bwa/hg38.fa.amb
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/dna/human/bwa/hg38.fa.ann
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/dna/human/bwa/hg38.fa.pac
  • assets/testdata/TrESFlow_References_missing_bwa_sidecar/ligation_barcode_whitelist.txt
  • assets/testdata/TrESFlow_References_missing_star_file/ligation_barcode_whitelist.txt
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/Genome
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/SA
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/chrLength.txt
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/chrName.txt
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/chrNameLength.txt
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/chrStart.txt
  • assets/testdata/TrESFlow_References_missing_star_file/rna/human/star/genomeParameters.txt
  • assets/testdata/test_dna_I1.fastq
  • assets/testdata/test_dna_I2_dummy.fastq
  • assets/testdata/test_dna_I2_single.fastq
  • assets/testdata/test_dna_R1.fastq
  • assets/testdata/test_dna_R2.fastq
  • bin/run_split_reads_dna.py
  • bin/run_split_reads_rna.py
  • bin/run_tag.py
  • bin/run_tag_lig3.py
  • bin/run_tag_umi.py
  • bin/run_trim_galore.py
  • bin/tresflow_fastq_utils.py
  • conf/base.config
  • conf/test.config
  • docs/architecture/implemented_pipeline.md
  • docs/output.md
  • docs/usage.md
  • lib/RuntimeSupport.groovy
  • lib/SamplesheetParser.groovy
  • lib/WorkflowSupport.groovy
  • main.nf
  • modules/local/align_dna/main.nf
  • modules/local/bam_coverage_dna/main.nf
  • modules/local/fq_to_sam/main.nf
  • modules/local/mark_duplicates_dna/main.nf
  • modules/local/rna_coverage/main.nf
  • modules/local/rna_filtered_bam/main.nf
  • modules/local/rna_starsolo_align/main.nf
  • modules/local/split_dna_reads/main.nf
  • modules/local/split_duplicates_dna/main.nf
  • modules/local/split_rna_reads/main.nf
  • modules/local/tag_dna_cell_barcode/main.nf
  • modules/local/tag_dna_modality/main.nf
  • modules/local/tag_dna_sb/main.nf
  • modules/local/tag_rna_cell_barcode/main.nf
  • modules/local/tag_rna_sb/main.nf
  • modules/local/tag_rna_umi/main.nf
  • modules/local/trim_dna_fastqs/main.nf
  • modules/local/trim_rna_fastqs/main.nf
  • nextflow.config
  • nextflow_schema.json
  • nf-test.config
  • scripts/core_runtime/RNA_COVERAGE.sh
  • scripts/core_runtime/RNA_STARSOLO_ALIGN.sh
  • scripts/core_runtime/Split_ReadsV2.codon
  • subworkflows/local/dna_core/main.nf
  • subworkflows/local/rna_core/main.nf
  • tests/default.nf.test
  • tests/samplesheets/dual_duplicate_dna_sb.yaml
  • tests/samplesheets/dual_missing_dna_sb.yaml
  • tests/samplesheets/dual_tagmentation.yaml
  • tests/samplesheets/missing_bwa_sidecar.yaml
  • tests/samplesheets/missing_ligation_whitelist.yaml
  • tests/samplesheets/missing_references_root.yaml
  • tests/samplesheets/missing_rna_ref_dir.yaml
  • tests/samplesheets/missing_runtime_env_prefix.yaml
  • tests/samplesheets/missing_runtime_tmpdir.yaml
  • tests/samplesheets/missing_star_index_file.yaml
  • tests/samplesheets/single_explicit_modality_sb.yaml
  • tests/test_fastq_compression.py
  • workflows/treseq.nf

Walkthrough

The configuration contract is restructured from a single resources block into separate runtime (env_prefix, tmpdir) and references (species, paths for STAR/BWA indices, blacklist/chrom sizes) blocks. A new shared FASTQ utility module (tresflow_fastq_utils.py) centralizes file and compression handling across the tagging and splitting scripts. Output directory names change (e.g., split/rna_split_fastqs/, align/rna_align/). FASTQ outputs move from immediate gzip compression to uncompressed intermediate files that are compressed once at the end with pigz. Test data and samplesheet fixtures are added to cover reference validation scenarios.
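
The "write plain, compress once" pattern mentioned in the walkthrough could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the code in tresflow_fastq_utils.py: the function names compress_final and write_then_compress are hypothetical, and the fallback to Python's gzip module when pigz is not on PATH is an addition for portability that the PR does not mention. The pigz flags used (-p for the thread count, -f to overwrite) are standard pigz options.

```python
import gzip
import shutil
import subprocess
import tempfile
from pathlib import Path


def compress_final(fastq: Path, threads: int = 4) -> Path:
    """Compress a finished plain FASTQ file and return the resulting .gz path."""
    gz_path = fastq.with_name(fastq.name + ".gz")
    pigz = shutil.which("pigz")
    if pigz:
        # pigz replaces the input file with <name>.gz; -p sets the thread count.
        subprocess.run([pigz, "-f", "-p", str(threads), str(fastq)], check=True)
    else:
        # Assumed fallback (single-threaded) for machines without pigz.
        with open(fastq, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        fastq.unlink()
    return gz_path


def write_then_compress(records, out_path: Path, threads: int = 4) -> Path:
    """Write (name, seq, qual) records uncompressed, then compress the file once."""
    with open(out_path, "w") as handle:
        for name, seq, qual in records:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
    return compress_final(out_path, threads)


# Small demonstration with a made-up read.
with tempfile.TemporaryDirectory() as tmp:
    out = write_then_compress([("read1", "ACGT", "IIII")], Path(tmp) / "demo.fastq")
    with gzip.open(out, "rt") as handle:
        print(handle.readline().strip())  # prints @read1
```

Deferring compression keeps the per-record write path cheap and lets a multi-threaded compressor handle the whole file in one step, which is presumably the motivation for the --pigz-threads parameter added in this PR.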

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: README.md, docs/architecture/implemented_pipeline.md, docs/output.md, docs/usage.md | Updated samplesheet contract documentation, pipeline diagram, output directory structure, and quick-start commands to reflect new runtime/references block structure and directory naming changes. |
| Configuration Samplesheets: assets/ahrmad_template.yaml, assets/samplesheet.*.yaml, assets/test_realdata/samplesheet.*.yaml | Replaced resources block with runtime and references sections; updated reference paths and added explicit barcode configuration for modality-specific handling. |
| Test Reference Data: assets/testdata/TrESFlow_References*/* (STAR/BWA indices, blacklist, chrom sizes, whitelist), assets/testdata/test_dna_*.fastq, assets/testdata/test_dna_I*.fastq | Added comprehensive test data fixtures including mock BWA sidecar files, STAR genome files, chromosome reference data, ligation barcode whitelist, and test FASTQ files for DNA/RNA reads. |
| Shared FASTQ Utilities: bin/tresflow_fastq_utils.py | New 341-line utility module providing centralized FASTQ handling: gzip-aware file operations, FASTQ parsing/comment canonicalization, RG/tag extraction, split-output management, final compression via pigz, and operational logging. |
| Python Script Refactoring: bin/run_split_reads_*.py, bin/run_tag*.py, bin/run_trim_galore.py | Delegated FASTQ/file-handling helpers to tresflow_fastq_utils; changed output from immediate gzip to uncompressed with final pigz compression; added event logging and --pigz-threads parameter. |
| Shell Script Updates: scripts/core_runtime/RNA_*.sh, scripts/core_runtime/Split_ReadsV2.codon | Refactored to accept explicit reference directories instead of base+species; added FASTQ comment canonicalization with standardized RG/XI tags; removed gzip compression in favor of plain FASTQ outputs. |
| Groovy Library Enhancements: lib/RuntimeSupport.groovy | Added runtimeTmpdir(), validateConfiguredWritableDirectory(), shellExports() methods; extended runtimeContext() to include tmpdir; added pigz to standard runtime tools. |
| Samplesheet Parser: lib/SamplesheetParser.groovy | Introduced parseContract() returning {library_name, runtime, references, modalities, samples}; replaced resources parsing with dedicated resolveRuntime()/resolveReferences(); added modality-specific barcode group handling and barcode length validation. |
| Workflow Support: lib/WorkflowSupport.groovy | Added validateReferenceContract(), validateRnaReferences(), validateDnaReferences(), inferBwaMem2Prefix() with explicit file-based validation; removed species-based reference validation; added helper functions for required value/directory/file checks. |
| Main Workflow: main.nf | Updated to parse samplesheet contract, validate runtime/reference configuration, reject deprecated CLI parameters with explicit error messages, and pass parsed sampleRows to TRESEQ workflow. |
| Configuration Files: nextflow.config, conf/base.config, conf/test.config, nextflow_schema.json | Removed global runtime_env_prefix and resource-override parameters; increased CPU allocations for tagging processes; updated test config with DNA-specific mocked process blocks; revised schema to remove deprecated runtime options. |
| Nextflow Modules: modules/local/tag_*/main.nf, modules/local/split_*/main.nf, modules/local/trim_*/main.nf, modules/local/*_align/main.nf, modules/local/bam_*/main.nf, modules/local/fq_to_sam/main.nf | Integrated RuntimeSupport for shell exports; changed FASTQ outputs from .fq.gz/.fastq.gz to .fastq with final compression; updated publishDir paths to match new directory structure; added pigz-threads parameter and TrES_Stats output publishing. |
| Subworkflows & Workflow: subworkflows/local/rna_core/main.nf, subworkflows/local/dna_core/main.nf, workflows/treseq.nf | Updated to use meta.rna_star_index_dir instead of base+species; added support for DNA index-read selection via metadata; removed internal samplesheet parsing; added coverage_warnings output. |
| Test Suite: tests/default.nf.test, tests/samplesheets/*.yaml, tests/test_fastq_compression.py, nf-test.config | Expanded test coverage with 10+ new samplesheet fixtures validating missing/deprecated runtime/reference fields, dual tagmentation scenarios, and sidecar validation; added unit tests for FASTQ compression/canonicalization; updated smoke test assertions for new output structure. |
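
The file-based reference validation summarized for lib/WorkflowSupport.groovy can be sketched in Python as follows. This is a guess at the general idea rather than a port of inferBwaMem2Prefix(): the function names, the rule of anchoring on a single *.0123 file, and the error text are all assumptions. The suffix list is taken from the hg38.fa.* fixture files listed in this PR, and the demonstration reproduces the missing_bwa_sidecar fixture, which has no .bwt.2bit.64 file.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the hg38.fa.* index fixtures listed in this PR.
BWA_MEM2_SUFFIXES = (".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac")


def infer_bwa_mem2_prefix(index_dir: Path) -> Path:
    """Infer the index prefix from the single *.0123 file in index_dir (assumed rule)."""
    anchors = sorted(index_dir.glob("*.0123"))
    if len(anchors) != 1:
        raise ValueError(
            f"expected exactly one *.0123 file in {index_dir}, found {len(anchors)}"
        )
    # Strip the trailing '.0123': hg38.fa.0123 becomes hg38.fa
    return anchors[0].with_suffix("")


def missing_sidecars(prefix: Path) -> list[str]:
    """List the expected sidecar file names that do not exist next to prefix."""
    return [
        prefix.name + suffix
        for suffix in BWA_MEM2_SUFFIXES
        if not (prefix.parent / (prefix.name + suffix)).is_file()
    ]


# Recreate the layout of the missing_bwa_sidecar fixture with empty files.
with tempfile.TemporaryDirectory() as tmp:
    bwa_dir = Path(tmp)
    for suffix in (".0123", ".amb", ".ann", ".pac"):
        (bwa_dir / ("hg38.fa" + suffix)).touch()
    prefix = infer_bwa_mem2_prefix(bwa_dir)
    print(missing_sidecars(prefix))  # prints ['hg38.fa.bwt.2bit.64']
```

Checking for each file explicitly, instead of trusting a species name to imply a complete index, is what lets a fixture such as missing_bwa_sidecar.yaml fail early with a specific message.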

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 The samplesheet now splits in two—
runtime and references shiny new!
With pigz threads and STAR paths neat,
Our FASTQ flows are swift and fleet.
From resources deep to contracts clear,
TrESFlow refactors without fear! 🌟

@AAnnan AAnnan merged commit 2a789da into main May 1, 2026
2 of 5 checks passed
@AAnnan AAnnan deleted the devA branch May 1, 2026 10:07