RATT 'finding' genes in the wrong location. #19

Open
MrTheronJ opened this issue Jun 15, 2023 · 1 comment
Comments

MrTheronJ commented Jun 15, 2023

Hey there! I was hoping to get some clarification regarding some issues I've come across...

ISSUE 1:

Ran RATT with the following inputs:

  • transfer type - Strain
  • Isolate - SEA08151 (.fasta)
  • Reference - H37Rv (.embl, .gbk)

Output generated:

The gene lpqG found by RATT aligns very poorly to the reference gene (lpqG_alignment.txt). RATT reports a sequence only 189 bp long, compared with 723 bp in the reference. About 800 bp upstream, I found a sequence identical to the reference (as shown in lpqG_alignment.txt). Why didn't RATT pick this sequence, which aligns significantly better?


ISSUE 2:

Ran RATT with the following inputs:

  • transfer type - Strain
  • Isolate - 1-0006 (.fasta)
  • Reference - H37Rv (.embl, .gbk)

Output generated:

The mamB gene is near the deletion of the ~9.5-kb RD1 region in H37Rv. The mamB gene found by RATT also aligns very poorly, except for the last ~30 bp (mamB_alignment.txt). I was wondering why it was called there at all; given how dissimilar the sequences are, I expected RATT not to call mamB at this position. I believe the real mamB can be found at this location: FeatureLocation(ExactPosition(2268174), ExactPosition(2272994), strand=-1).
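For anyone wanting to inspect that candidate region, here is a minimal sketch of pulling a minus-strand interval out of a genome string. It assumes the coordinates follow Biopython's FeatureLocation convention (0-based start, end-exclusive); the helper name `extract_region` is hypothetical and not part of RATT or Biopython.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(genome: str, start: int, end: int, strand: int) -> str:
    """Return genome[start:end], reverse-complemented when strand is -1.

    Coordinates are assumed to be 0-based and end-exclusive, matching
    the FeatureLocation quoted above.
    """
    seq = genome[start:end]
    if strand == -1:
        # Complement each base, then reverse to read 5'->3' on the minus strand.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy example on a six-base "genome"; the real call would pass the
# isolate sequence with start=2268174, end=2272994, strand=-1.
print(extract_region("ATGCCC", 0, 3, -1))
```

The extracted sequence could then be aligned against the reference mamB with whatever aligner is already in use.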

Thanks for your help! Any insight into these issues would be greatly appreciated!

Collaborator

haessar commented Oct 5, 2023

Apologies my response has taken so long. I've had a chance to explore this and found that the transfer type (-t option) has quite an impact in this case.
For ISSUE 1, running RATT with transfer type "Assembly" ensures gene lpqG is annotated with range (4063372, 4064094), which appears identical in length to the reference.
For ISSUE 2, RATT produced the same annotations for mamB regardless of choice of transfer type (Assembly, Species or Strain), but I did notice the following in Query/1-0006.1.Mutations.gff:
unknown BBA Synteny 2265496 2271475 0 + . note="No synteny with reference. Possible insert or too divergent"
Perhaps adjusting the nucmer parameters directly using the "Free" transfer type (see README) will ensure synteny and help with attaining the correct annotation for mamB. Is ISSUE 2 a one-off case of an incorrectly annotated gene or are there many others?
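As a rough way to check whether a candidate gene location falls inside a reported no-synteny block, the sketch below parses the Mutations.gff line quoted above and computes the overlap with the mamB location proposed in the original post. It assumes standard GFF column order (start and end in columns 4 and 5, 1-based inclusive) and that the proposed location is 0-based, end-exclusive; the names `Region`, `parse_gff_interval` and `overlap_bp` are hypothetical helpers, not part of RATT.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_gff_interval(line: str) -> Region:
    """Read start/end from a GFF-style line (columns 4 and 5)."""
    # Split on any whitespace; cap at 8 splits so the free-text
    # attribute column (which contains spaces) stays in one piece.
    fields = line.split(None, 8)
    return Region(int(fields[3]), int(fields[4]))


def overlap_bp(a: Region, b: Region) -> int:
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


line = ('unknown BBA Synteny 2265496 2271475 0 + . '
        'note="No synteny with reference. Possible insert or too divergent"')
no_synteny = parse_gff_interval(line)

# Proposed mamB location from the original post, converted from
# 0-based end-exclusive to 1-based inclusive coordinates.
candidate = Region(2268174 + 1, 2272994)

print(overlap_bp(no_synteny, candidate))
```

A non-zero overlap would suggest the proposed location lies at least partly inside the region RATT flagged as lacking synteny, which may be why the annotation was not transferred there.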
