feat: accept gzip-compressed fasta input #505
Open
tpall wants to merge 1 commit into WrightonLabCSU:dev from
Adds a small `DECOMPRESS_FASTA` module (`reformat.sh` from the bbmap container that other modules already use) and routes only `.gz` inputs through it via a channel branch on the `.gz` suffix. Plain fastas pass through unchanged.

Sample-name normalisation strips both the trailing `.gz` (if present) and one of `.fa`/`.fna`/`.fasta`, so `sample.fa` and `sample.fa.gz` yield the same downstream name. Outputs are identical regardless of input compression.

The default `--fasta_fmt '*.f*'` already matches both plain and `.gz` files; the schema description is updated to mention this explicitly.

Files:
- `modules/local/rename/decompress_fasta.nf` (new, 20 lines)
- `workflows/dram.nf` (channel branch + mix)
- `nextflow_schema.json` (description updates)
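The following is a minimal Python sketch of the naming rule described above, for illustration only. The actual change is Groovy inside `workflows/dram.nf`, and the helper name `sample_name` is hypothetical. The sketch drops an optional trailing `.gz` first, then strips at most one of `.fa`/`.fna`/`.fasta`. This is what makes the plain and gzipped forms converge.

```python
import re

# Hypothetical helper mirroring the rule described in the PR:
# drop an optional trailing ".gz", then one fasta extension.
_GZ_SUFFIX = re.compile(r"\.gz$")
_FASTA_EXT = re.compile(r"\.(fa|fna|fasta)$")


def sample_name(filename: str) -> str:
    """Return the downstream sample name for a plain or gzipped fasta."""
    without_gz = _GZ_SUFFIX.sub("", filename)
    return _FASTA_EXT.sub("", without_gz, count=1)


if __name__ == "__main__":
    for f in ["sample.fa", "sample.fa.gz", "bin_1.fna.gz", "bin_2.fasta"]:
        print(f, "->", sample_name(f))
```

The order of the two steps matters. If the fasta extension were stripped first, `sample.fa.gz` would not match it (the name ends in `.gz`), and removing `.gz` afterwards would leave `sample.fa` instead of `sample`.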
Summary
Accept gzip-compressed fasta inputs (`*.fa.gz`/`*.fna.gz`/`*.fasta.gz`) without requiring users to decompress them first. Plain fastas keep working unchanged.

This was one of the changes bundled into the now-closed #472, split out per the maintainer-friendly path agreed on when refiling #503 / #504.
What changed
+38 / -6 lines across three files:
- `modules/local/rename/decompress_fasta.nf` (new, 20 lines): wraps `reformat.sh` from the existing bbmap container (no new dependencies). Tagged `process_tiny`.
- `workflows/dram.nf`: branches the channel on the `.gz` suffix, decompresses only the gz branch, and mixes both back together. Sample-name stripping is unified so `sample.fa` and `sample.fa.gz` yield identical downstream names.
- `nextflow_schema.json`: `input_fasta` and `fasta_fmt` descriptions updated to mention gz support.

How it works
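The routing in `workflows/dram.nf` can be roughly approximated in Python, for illustration only. The real change is a Nextflow channel `branch` followed by `mix`, with decompression done by `reformat.sh` inside the `DECOMPRESS_FASTA` process. In this sketch, Python's `gzip` module stands in for `reformat.sh`, and `decompress_fasta` and `prepare_inputs` are hypothetical helper names.

```python
from __future__ import annotations

import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_fasta(gz_path: Path, out_dir: Path) -> Path:
    """Hypothetical stand-in for DECOMPRESS_FASTA: gunzip one file into out_dir."""
    out_path = out_dir / gz_path.name[: -len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def prepare_inputs(paths: list[Path], work_dir: Path) -> list[Path]:
    """Branch on the .gz suffix, decompress only that branch, then mix."""
    gz_branch = [p for p in paths if p.name.endswith(".gz")]
    plain_branch = [p for p in paths if not p.name.endswith(".gz")]
    # Only the gz branch is decompressed; plain files pass through untouched.
    decompressed = [decompress_fasta(p, work_dir) for p in gz_branch]
    # Nextflow's mix does not guarantee order; sort here only for determinism.
    return sorted(plain_branch + decompressed, key=lambda p: p.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.fa").write_text(">a\nACGT\n")
        with gzip.open(root / "b.fa.gz", "wt") as fh:
            fh.write(">b\nGGCC\n")
        work = root / "work"
        work.mkdir()
        mixed = prepare_inputs([root / "a.fa", root / "b.fa.gz"], work)
        print([p.name for p in mixed])  # ['a.fa', 'b.fa']
```

Because Nextflow's `mix` can emit items in any order, downstream steps should key on the sample name rather than on position. The sort above only keeps the sketch's output deterministic.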
The default `--fasta_fmt '*.f*'` already matches both plain and `.gz` files, so users with a mixed directory don't need to change their launch command.

Test plan
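A quick pre-launch sanity check for the `--fasta_fmt` claim above is a filename match. This sketch uses Python's `fnmatchcase` as an approximation of Nextflow's glob matching on bare file names; it is not Nextflow's own matcher. The file names are made up.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Default value of --fasta_fmt, as stated in the PR.
DEFAULT_FASTA_FMT = "*.f*"


def matched(names: list[str], pattern: str = DEFAULT_FASTA_FMT) -> list[str]:
    """Return the bare file names that the glob pattern would pick up."""
    return [n for n in names if fnmatchcase(n, pattern)]


if __name__ == "__main__":
    names = ["s1.fa", "s2.fa.gz", "s3.fna.gz", "s4.fasta", "README.md"]
    print(matched(names))  # ['s1.fa', 's2.fa.gz', 's3.fna.gz', 's4.fasta']
```

The pattern is broad. It matches any name containing `.f`, so a file such as `reads.fastq.gz` in the same directory would also be picked up.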
- `nextflow inspect` parses cleanly.
- With a mix of `*.fa` and `*.fa.gz` inputs: confirm both end up annotated identically and `DECOMPRESS_FASTA` only fires on the gz branch.

🤖 Generated with Claude Code