features transferred with empty/invalid location #18

Open
0xaf1f opened this issue May 12, 2023 · 1 comment

Comments

@0xaf1f
Contributor

0xaf1f commented May 12, 2023

For the last contig of a 7-contig monkeypox genome assembly, RATT produces output that starts with a feature like this:

FT   repeat_region   complement()
FT                   /note="ITR"
FT                   /rpt_type=inverted
FT                   /rpt_type=terminal
FT   gene            267..1580
FT                   /locus_tag="mpox_00004"
FT                   /gene="mpox_00004"
FT   CDS             267..1580
FT                   /locus_tag="mpox_00004"
FT                   /note="Ankyrin (CPXV-017) D1L"
FT                   /codon_start=1
FT                   /product="MPXVgp004"
FT                   /protein_id="URK20443.1"
FT                   /gene="mpox_00004"

The location complement() is empty and therefore invalid, which causes parser errors when trying to read this EMBL file. Input and output files are attached.
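For anyone who wants to check their own RATT output for this problem before handing it to a parser, here is a minimal sketch. It is not part of RATT. The function name `find_invalid_locations` is made up for illustration, and the sketch assumes that a feature-key line starts with "FT" plus three spaces and that each location fits on one line. It flags locations that contain no coordinate at all, or that contain a negative coordinate.

```python
import re

# Feature-key lines look like "FT   gene            267..1580". Qualifier lines
# have only spaces between "FT" and "/qualifier", so they do not match.
FEATURE_LINE = re.compile(r"^FT   (\S+)\s*(.*)$")
# A minus sign directly before a digit, at the start or after "(", "," or ".".
NEGATIVE_COORD = re.compile(r"(?:^|[(,.])-\d")


def find_invalid_locations(lines):
    """Return (line_number, key, location) for suspicious feature locations."""
    bad = []
    for number, line in enumerate(lines, start=1):
        match = FEATURE_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a feature-key line
        key, location = match.group(1), match.group(2).strip()
        has_digit = any(ch.isdigit() for ch in location)
        # No digit means no coordinate at all, e.g. "complement()".
        if not has_digit or NEGATIVE_COORD.search(location):
            bad.append((number, key, location))
    return bad


# Lines taken from the output shown above.
sample = [
    "FT   repeat_region   complement()",
    'FT                   /note="ITR"',
    "FT   gene            267..1580",
]
print(find_invalid_locations(sample))
```

To scan a whole file, pass an open file handle instead of the list, since the function only iterates over lines.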

Command used (ran from within the output directory):

ratt -p out -t Strain ../embls ../contig7.fasta

ratt-invalid-location.tar.gz

@haessar
Collaborator

haessar commented Oct 5, 2023

Sorry for not getting back to you sooner, @0xaf1f. I had a play around with your files and found that the feature (FT) lines causing the issue were:

FT   repeat_region   1..6439
FT                   /note="ITR"
FT                   /rpt_type=inverted
FT                   /rpt_type=terminal

In fact, after removing these 4 lines from embls/mpox..ON563414.3.embl, the "complement()" seen in the original output no longer appears.

The code generating that "complement()" is in ratt_correction.pm:2206-2218, where it tries to parse the coordinates from "FT repeat_region complement(-6437..1)" (see the intermediate file output/out.UnicyclerMpox.gnl_C_L_7.embl). I assume this occurred during the Transfer step, where the coordinate range 1..6439 fell outside the bounds of the submitted sequence contig7.fasta (length 1667).
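To make the bounds problem concrete, here is a small sketch of the comparison. It is not RATT's logic, only an illustration of why neither 1..6439 nor the intermediate -6437..1 fits on a 1667 bp contig. The helper names `location_bounds` and `fits_on_sequence` are hypothetical, and the sketch assumes locations that refer only to the local sequence (no remote accession references).

```python
import re


def location_bounds(location):
    """Return (lowest, highest) coordinate in a location string, or None if it has none."""
    coords = [int(text) for text in re.findall(r"-?\d+", location)]
    if not coords:
        return None
    return min(coords), max(coords)


def fits_on_sequence(location, sequence_length):
    """True if every coordinate lies within 1..sequence_length."""
    bounds = location_bounds(location)
    if bounds is None:
        return False  # e.g. "complement()" has no coordinates at all
    lowest, highest = bounds
    return 1 <= lowest and highest <= sequence_length


CONTIG_LENGTH = 1667  # length of contig7.fasta as reported in this thread
for location in ("1..6439", "267..1580", "complement(-6437..1)", "complement()"):
    print(location, fits_on_sequence(location, CONTIG_LENGTH))
```

Only 267..1580 passes this check. The other three either exceed the contig length, go below position 1, or carry no coordinates.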

As a short-term solution, I'd recommend removing any such problem features from the input before running RATT. Longer term, there is clearly a bug in the coordinate parsing code (you might have seen the 4 "Use of uninitialized value" errors in the RATT stdout during the Correction phase), but I still need to figure out how to fix it.
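The short-term workaround can be scripted instead of editing the reference EMBL by hand. The sketch below is not a RATT feature. `drop_features` is a hypothetical helper that removes each feature you name by key and location, together with its qualifier lines. It assumes single-line locations and the usual "FT" plus three spaces layout, and you still have to decide yourself which features are the problem ones.

```python
import re

# Feature-key lines look like "FT   repeat_region   1..6439". Qualifier lines
# have only spaces between "FT" and "/qualifier", so they do not match.
FEATURE_LINE = re.compile(r"^FT   (\S+)\s*(.*)$")


def drop_features(lines, targets):
    """Yield lines, skipping each feature whose (key, location) is in targets."""
    skipping = False
    for line in lines:
        match = FEATURE_LINE.match(line.rstrip("\n"))
        if match is not None:
            # A new feature starts here: decide whether to skip it.
            skipping = (match.group(1), match.group(2).strip()) in targets
        elif not line.startswith("FT "):
            # Any non-feature line (for example "XX" or "SQ") ends a skipped feature.
            skipping = False
        if not skipping:
            yield line


# Illustrative input assembled from lines quoted in this thread.
reference = [
    "FT   repeat_region   1..6439",
    'FT                   /note="ITR"',
    "FT                   /rpt_type=inverted",
    "FT                   /rpt_type=terminal",
    "FT   gene            267..1580",
    'FT                   /locus_tag="mpox_00004"',
]
for line in drop_features(reference, {("repeat_region", "1..6439")}):
    print(line)
```

Writing the filtered lines to a copy of the reference file, rather than overwriting it, keeps the original annotation available once the parsing bug is fixed.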
