aragorn_out_to_gff3.py: Handle introns #115

Closed
sjackman opened this issue May 7, 2015 · 14 comments

@sjackman
Contributor

sjackman commented May 7, 2015

Does aragorn_out_to_gff3.py handle introns? It doesn't appear to after scanning the code. Feature request? Here's an example of a few tRNA with introns.

3   tRNA-Val              c[883351,883869]  35      (tac)i(40,443)
5   tRNA-seC               [244644,244785]  87      (tca)i(29,57)
7   tRNA-Arg               [615338,616255]  36      (tcg)i(40,822)
2   tRNA-Ser               [324667,326116]  35      (gga)i(40,1354)
1   tRNA-Val                c[84563,87002]  2383    (tac)i(34,2347)
1   tRNA-Asp              c[114384,114687]  39      (atc)i(43,206)
4   tRNA-seC               [324290,324369]  36      (tca)i(38,6)
5   tRNA-Ser               [345441,345776]  28      (gct)i(33,251)
7   tRNA-Val               [405066,405293]  34      (cac)i(35,133)
3   tRNA-Asn               [367853,369971]  2062    (att)i(32,2028)
3   tRNA-Pro               [234486,236430]  1887    (agg)i(31,1856)
1   tRNA-Ser                 [66020,66741]  32      (gct)i(37,633)
2   tRNA-Leu                 [84137,84224]  36      (tag)i(38,10)
3   tRNA-Arg               [168865,170209]  1285    (gcg)i(32,1253)
4   tRNA-His               [175766,176602]  776     (gtg)i(34,742)
1   tRNA-seC                c[88201,88280]  36      (tca)i(38,6)
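For anyone wanting to pick these lines apart, here is a minimal Python sketch of a parser for one row of the listing above. It is not the code in aragorn_out_to_gff3.py; the regular expression is inferred only from the sample rows in this issue, and the names `LINE_RE`, `TRNAHit` and `parse_hit` are made up for illustration. It assumes a leading `c` marks a reverse-strand hit and that the optional `i(a,b)` suffix carries two integers describing the intron.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Pattern inferred from the sample rows in this issue (an assumption,
# not taken from the ARAGORN documentation).
LINE_RE = re.compile(
    r"^\s*(?P<ordinal>\d+)\s+"
    r"(?P<name>\S+)\s+"
    r"(?P<comp>c?)\[(?P<start>\d+),(?P<end>\d+)\]\s+"
    r"(?P<ac_pos>\d+)\s+"
    r"\((?P<anticodon>[a-z]+)\)"
    r"(?:i\((?P<i_a>\d+),(?P<i_b>\d+)\))?\s*$"
)


@dataclass(frozen=True)
class TRNAHit:
    ordinal: int                       # first column of the row
    name: str                          # e.g. "tRNA-Val"
    strand: str                        # "-" when the location starts with "c"
    start: int
    end: int
    anticodon_pos: int
    anticodon: str
    intron: Optional[Tuple[int, int]]  # the two numbers inside i(...), if present


def parse_hit(line: str) -> Optional[TRNAHit]:
    """Parse one row; return None for lines that do not look like a hit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    intron = None
    if m.group("i_a") is not None:
        intron = (int(m.group("i_a")), int(m.group("i_b")))
    return TRNAHit(
        ordinal=int(m.group("ordinal")),
        name=m.group("name"),
        strand="-" if m.group("comp") else "+",
        start=int(m.group("start")),
        end=int(m.group("end")),
        anticodon_pos=int(m.group("ac_pos")),
        anticodon=m.group("anticodon"),
        intron=intron,
    )
```

Rows without an `i(...)` suffix come back with `intron=None`, so downstream code can treat both cases uniformly.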
@hexylena
Collaborator

hexylena commented May 7, 2015

@sjackman it sure doesn't! I'd never seen results like that, but I only work with small viruses. How should this be marked up in the GFF3?

@hexylena
Collaborator

hexylena commented May 7, 2015

Also, I don't suppose you could link me to a fasta sequence which would produce the data with introns, so I can test?

hexylena self-assigned this May 7, 2015
hexylena added the bug label May 7, 2015
@sjackman
Contributor Author

sjackman commented May 7, 2015

Dang, you're fast. I'll get you some data.

@sjackman
Contributor Author

sjackman commented May 7, 2015

Here you go. Thanks! https://gist.github.com/sjackman/31cd4e17347ff1af488c

@hexylena
Collaborator

hexylena commented May 7, 2015

@sjackman (never had comments on gists, I'm assuming it emails you, but in case it doesn't...)

would you mind sharing the command you ran this with? As of Aragorn v1.2.36 I didn't seem to be able to produce any headers with a )i( sequence in them, despite trying a number of configurations.

@sjackman
Contributor Author

sjackman commented May 7, 2015

No, I didn't get a notification from the gist. That seems like a GitHub issue/bug.

@sjackman
Contributor Author

sjackman commented May 7, 2015

Sorry, I meant to post the shell command and forgot.

aragorn -gcbact -i -c -w -o aragorn.tsv sample.fa

@hexylena
Collaborator

hexylena commented May 7, 2015

brilliant, cheers. I'll ping you when I have a fix in place

@sjackman
Contributor Author

sjackman commented May 7, 2015

I'll mention that I've only ever seen a single intron. I have no idea whether ARAGORN could possibly output multiple introns. Thanks, Eric!
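Under that single-intron assumption, splitting a hit into two exons is plain coordinate arithmetic. The sketch below is one possible reading, not the committed fix: it assumes the first number in `i(a,b)` is the 1-based position of the intron's first base counted from the tRNA's 5' end, and the second is the intron length (the lengths in the samples are consistent with this, but the offset convention is not confirmed in this thread). The `exon_spans` and `gff3_lines` helpers, the `aragorn` source column, the `tRNA`/`exon` parent-child layout and the ID scheme are all hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple


def exon_spans(start: int, end: int, strand: str,
               intron: Optional[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive genomic exon spans in ascending order.

    ASSUMPTION: intron = (offset, length), offset being the 1-based
    position of the intron's first base from the tRNA's 5' end.
    """
    if intron is None:
        return [(start, end)]
    offset, length = intron
    if strand == "+":
        intron_start = start + offset - 1
        intron_end = intron_start + length - 1
    else:
        # On the reverse strand the 5' end sits at the genomic `end`.
        intron_end = end - offset + 1
        intron_start = intron_end - length + 1
    if not (start < intron_start and intron_end < end):
        raise ValueError("intron is not strictly inside the feature")
    return [(start, intron_start - 1), (intron_end + 1, end)]


def gff3_lines(seqid: str, name: str, start: int, end: int, strand: str,
               intron: Optional[Tuple[int, int]],
               feature_id: str = "tRNA1") -> List[str]:
    """Format a parent tRNA row plus one exon row per span (hypothetical layout)."""
    rows = [("tRNA", start, end, f"ID={feature_id};Name={name}")]
    for n, (s, e) in enumerate(exon_spans(start, end, strand, intron), 1):
        rows.append(("exon", s, e, f"ID={feature_id}.exon{n};Parent={feature_id}"))
    return [
        "\t".join([seqid, "aragorn", ftype, str(s), str(e), ".", strand, ".", attrs])
        for ftype, s, e, attrs in rows
    ]
```

If ARAGORN could ever report more than one intron, `exon_spans` would need to take a list of introns; the version here deliberately handles only the single-intron case described above.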

@hexylena
Collaborator

hexylena commented May 7, 2015

Good to know. Having never worked with anything other than bacteriophages, I appreciate the insight.

@hexylena
Collaborator

hexylena commented May 7, 2015

@sjackman not sure how familiar you are with aragorn/maker/etc., but in the GFF3 file you posted I'm having a bit of trouble rationalising a couple of the feature locations:

1   tRNA-Lys                  c[1533,4118]  34      (ttt)i(39,2512)
7   tRNA-Met                c[26588,26660]  34      (cat)
8   tRNA-Tyr                 [26850,27468]  559     (ata)i(36,523)

becomes:

1   maker   tRNA    1534    4115    100 -   .   ID=tRNA1;Parent=gene1;Name=trnK-UUU;_AED=0.48;_eAED=0.48;_QI=12|1|1|1|0|0|2|1|19
1   maker   tRNA    26588   26660   100 -   .   ID=tRNA7;Parent=gene7;Name=trnM-CAU;_AED=0.00;_eAED=0.00;_QI=0|-1|1|1|-1|0|1|1|24
1   maker   tRNA    26850   27468   100 +   .   ID=tRNA8;Parent=gene8;Name=trnV-UAC;_AED=0.48;_eAED=0.48;_QI=0|1|1|1|0|0|2|1|25

In tRNA1, the location is modified (+1 on the left-hand side and -3 on the right), whilst in tRNA7 the location is untouched. Given that the tRNA runs the full length of the feature, and that the other intron-containing tRNA (tRNA8) has an unmodified location, I can't attribute it to the presence/absence of an intron. Is this a maker bug? Are you familiar enough to say?

I was just checking my work against the "reference", when I noticed a couple of these sorts of discrepancies.
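The comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the three coordinate pairs quoted in this comment; `coordinate_deltas` is a hypothetical helper name, not something from the script.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]


def coordinate_deltas(aragorn: Span, maker: Span) -> Span:
    """Return (left shift, right shift) of the MAKER span relative to ARAGORN."""
    return (maker[0] - aragorn[0], maker[1] - aragorn[1])


# Coordinates copied from the ARAGORN rows and the MAKER GFF3 rows above.
pairs: Dict[str, Tuple[Span, Span]] = {
    "tRNA1": ((1533, 4118), (1534, 4115)),
    "tRNA7": ((26588, 26660), (26588, 26660)),
    "tRNA8": ((26850, 27468), (26850, 27468)),
}

for feature, (aragorn_span, maker_span) in pairs.items():
    left, right = coordinate_deltas(aragorn_span, maker_span)
    print(f"{feature}: left {left:+d}, right {right:+d}")
```

Only tRNA1 shows a non-zero shift, which is why the intron alone does not explain the difference.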

hexylena pushed a commit that referenced this issue May 7, 2015
Some maker data was provided, but it's technically incorrect.
I validated my results against the aragorn report, and found
inconsistencies with what Maker has done, and what ARAGORN
has done. This may be a result of maker using another tRNA caller
or just plain ol' bugs.

Nevertheless, despite the ugly code (a lot better than it was
previously), this should be technically correct.
@sjackman
Contributor Author

sjackman commented May 7, 2015

Hi, Eric. Very sorry. I should have mentioned that the GFF is generated not de novo but by aligning tRNA sequences from a different (very closely related) species to the reference by MAKER using Exonerate. So, the coordinates of the de novo ARAGORN features and the MAKER/Exonerate features may not agree exactly.

@hexylena
Collaborator

hexylena commented May 7, 2015

@sjackman no worries, not a problem. That explains the differences!

@bgruening
Owner

Seems that I missed a lot! Thanks @erasche for fixing this.
