feat: Rename ID col in GFF file by madeline-scyphers · Pull Request #457 · WrightonLabCSU/DRAM

madeline-scyphers · 2025-09-23T20:11:44Z

In the Prodigal GFF file, the metadata ID field is a generated ID in the format 1_1, 1_2, 2_1, 2_2, etc. These IDs are unique only within a single file, so they collide when people concatenate all the GFFs together. In DRAM1, the IDs were replaced with SeqID_Genenumber, so that is what we are doing here.
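To make the rename concrete, here is a minimal Python sketch of the transformation. It is an illustration only, not the code in this PR: the function name `rename_gff_ids` is hypothetical, and it assumes the gene number is the part of the original ID after the last underscore and that the sequence ID is the first GFF column.

```python
def rename_gff_ids(gff_text: str) -> str:
    """Rewrite Prodigal-style IDs (e.g. ID=1_1) as ID=<SeqID>_<GeneNumber>.

    Hypothetical helper for illustration; assumes tab-separated GFF with
    nine columns and an ID=... entry in the ninth (attributes) column.
    """
    out_lines = []
    for line in gff_text.splitlines():
        # Pass comments, directives, and blank lines through unchanged.
        if not line or line.startswith("#"):
            out_lines.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Not a feature line we understand; leave it alone.
            out_lines.append(line)
            continue
        seqid = cols[0]
        attrs = cols[8].split(";")
        for i, attr in enumerate(attrs):
            if attr.startswith("ID="):
                # Assumption: gene number is the text after the last "_".
                gene_number = attr[3:].rsplit("_", 1)[-1]
                attrs[i] = f"ID={seqid}_{gene_number}"
        cols[8] = ";".join(attrs)
        out_lines.append("\t".join(cols))
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = "contigA\tProdigal_v2.6.3\tCDS\t3\t98\t5.4\t+\t0\tID=1_1;partial=10;"
    print(rename_gff_ids(sample))
```

Because the new ID embeds the sequence ID, two files that both contain `ID=1_1` no longer collide after concatenation, provided the sequence IDs themselves differ between files.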

We also replaced the Python script that parsed the GFF into a summary TSV (for later use in DRAM2) with bash parsing, which benchmarking showed to be around 10-50 times faster.
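For readers unfamiliar with the conversion, the sketch below shows roughly what a GFF-to-TSV summary step does. It is written in Python purely for readability and is not the bash parsing this PR adds; the function name `gff_to_tsv` and the output columns (`gene_id`, `seqid`, `start`, `end`, `strand`) are assumptions, since the actual summary schema is not described here.

```python
import csv
import io


def gff_to_tsv(gff_text: str) -> str:
    """Summarize GFF feature lines as a TSV string.

    Hypothetical illustration: the column set is assumed, not taken
    from DRAM2.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "seqid", "start", "end", "strand"])
    for line in gff_text.splitlines():
        # Skip comments, directives, and blank lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        # Parse "key=value;key=value" attributes into a dict.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        writer.writerow([attrs.get("ID", ""), cols[0], cols[3], cols[4], cols[6]])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "contigA\tProdigal_v2.6.3\tCDS\t3\t98\t5.4\t+\t0\tID=contigA_1;partial=10;"
    print(gff_to_tsv(sample), end="")
```

The same field extraction maps naturally onto a line-oriented shell tool, which is presumably why a bash version can avoid interpreter start-up and per-line object overhead.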

Copilot

Pull Request Overview

This PR renames ID columns in Prodigal GFF files to ensure uniqueness when multiple files are concatenated, and replaces a Python script with faster bash parsing for GFF to TSV conversion.

Replace generated IDs (1_1, 1_2, etc.) with SeqID_GeneNumber format for uniqueness
Switch from Python script to bash parsing for GFF to TSV conversion (10-50x faster)
Add ID replacement functionality using new bash script

modules/local/call/call_genes_prodigal.nf

madeline-scyphers added the enhancement New feature or request label Sep 23, 2025

github-project-automation bot added this to DRAM Sep 23, 2025

github-project-automation bot moved this to To Sort in DRAM Sep 23, 2025

madeline-scyphers requested a review from Copilot September 23, 2025 20:11

Copilot AI reviewed Sep 23, 2025

modules/local/call/call_genes_prodigal.nf (resolved)

modules/local/call/call_genes_prodigal.nf (resolved)

madeline-scyphers merged commit ab7b133 into dev Sep 23, 2025
1 check passed

github-project-automation bot moved this from To Sort to Done in DRAM Sep 23, 2025

madeline-scyphers mentioned this pull request Oct 7, 2025

prodigal gff ID data rename #453

Closed

madeline-scyphers merged 1 commit into dev from feature/gff-rename-id-column


2 participants
