TransDecoder missing obvious ORFs? #114

Closed
soungalo opened this issue Sep 28, 2020 · 8 comments

@soungalo

I am running TransDecoder 5.5.0 (installed using conda) to predict plant proteins from transcripts.
I used the commands:

TransDecoder.LongOrfs -t transcripts.fasta -m 1
TransDecoder.Predict -t transcripts.fasta

I notice some strange behavior. For example, one of the records in transcripts.fasta is:

>transcript:AT1G13607.1.mrna1
ATGAACAATTTTCGAATAACCATTGTTGCGTTCTTAGCCGTTCTTGTCTTCACCACAACT
GTTACGAATTCTTTGGATGAACCTAATATGGACACTATATCCAAATCCAGAGAATACAAA
TGTAAAATTGACCTGGATTGTTCAAACCACATTGCATGTAGGCATTGTTCTTATCGCAAT
TGCAAATGCGATCATGGAACCTGTAAATGCATGCCATGACTCTTAACCCACAAGGCCACA
AGCCATTGATCCAACTGCATCTTCAACTCGTCTTAATCTCTCCTATATATGTACTCTTTT
GTTTGTAATGCAAAAGAAAATAAAACATAATATTTTCAGTTGATAAACTACTAATGAAAT
ATTATACGTCAACGAAATTTAGTATATAAACTACAAAACGGCAAAAATAGCTTTCTCGAA
ACCAACAAAGTTAATTGGACAAACGACAAAAA

When I run this sequence through NCBI's ORF finder, I get several peptides, including a 72 AA long one which I assume is correct.
image
However, relevant predictions from TransDecoder's output look like this:

>transcript:AT1G13607.1.mrna1.p10 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p10  ORF type:5prime_partial len:20 (-),score=0.90 transcript:AT1G13607.1.mrna1:391-450(-)
FVVCPINFVGFEKAIFAVL*
>transcript:AT1G13607.1.mrna1.p18 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p18  ORF type:complete len:2 (+),score=0.25 transcript:AT1G13607.1.mrna1:120-125(+)
M*
>transcript:AT1G13607.1.mrna1.p8 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p8  ORF type:complete len:23 (-),score=4.50 transcript:AT1G13607.1.mrna1:192-260(-)
MQLDQWLVALWVKSHGMHLQVP*
>transcript:AT1G13607.1.mrna1.p11 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p11  ORF type:complete len:18 (-),score=1.79 transcript:AT1G13607.1.mrna1:142-195(-)
MIAFAIAIRTMPTCNVV*
>transcript:AT1G13607.1.mrna1.p12 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p12  ORF type:complete len:14 (+),score=1.40 transcript:AT1G13607.1.mrna1:289-330(+)
MYSFVCNAKENKT*

As you can see - all very short peptides, none of which is the result detected by the simple ORF finder.
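To compare these records at a glance, here is a minimal Python sketch that pulls the ORF type, length, strand, score, and coordinates out of the description fields. The regular expression and the `parse_description` helper are my own assumptions, inferred only from the five headers above and not from any TransDecoder format specification.

```python
import re

# Description fields copied from the five TransDecoder records above
# (the part of each FASTA header that follows the double space).
descriptions = [
    "ORF type:5prime_partial len:20 (-),score=0.90 transcript:AT1G13607.1.mrna1:391-450(-)",
    "ORF type:complete len:2 (+),score=0.25 transcript:AT1G13607.1.mrna1:120-125(+)",
    "ORF type:complete len:23 (-),score=4.50 transcript:AT1G13607.1.mrna1:192-260(-)",
    "ORF type:complete len:18 (-),score=1.79 transcript:AT1G13607.1.mrna1:142-195(-)",
    "ORF type:complete len:14 (+),score=1.40 transcript:AT1G13607.1.mrna1:289-330(+)",
]

# Hypothetical pattern, inferred from the records above only.
PATTERN = re.compile(
    r"ORF type:(?P<type>\S+) len:(?P<len>\d+) \((?P<strand>[+-])\),"
    r"score=(?P<score>-?[\d.]+) \S+:(?P<start>\d+)-(?P<end>\d+)\([+-]\)"
)


def parse_description(text):
    """Return the fields of one description line as a dict."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognised description: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    return {
        "type": match["type"],
        "len": int(match["len"]),
        "strand": match["strand"],
        "score": float(match["score"]),
        "start": start,
        "end": end,
        # Number of codons covered by the reported coordinates.
        "codons": (end - start + 1) // 3,
    }


records = [parse_description(d) for d in descriptions]
for r in sorted(records, key=lambda r: r["len"], reverse=True):
    print(r["type"], r["len"], r["strand"], r["score"], r["start"], r["end"])

longest = max(r["len"] for r in records)
```

In these five records the `len` value equals the number of codons spanned by the coordinates, so it appears to count the stop codon as well. The longest reported ORF has `len:23`, far short of the 72 AA peptide from ORF finder.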
Could you please help me understand why these results are missing from my output, and what can be done to obtain them with TransDecoder? Were they actually missed, or were they filtered out for some reason (e.g., labeled as low quality)?
Two things I noticed:

  1. The "correct" ORF starts at position 1 of the transcript (i.e., there is no 5' UTR). Is this expected to affect TransDecoder's prediction?
  2. In TransDecoder's results, I see peptides p8, p10, p11, p12, and p18. Where are all the others (e.g., p1-p7)?
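As a sanity check on point 1, here is a minimal Python sketch that translates the transcript from position 1 on the forward strand until the first stop codon. It assumes the standard genetic code; `translate_until_stop` and the other names are hypothetical helpers of mine, and this is neither how TransDecoder nor how ORF finder works internally.

```python
from itertools import product

# Transcript AT1G13607.1.mrna1, copied from the FASTA record above.
transcript = (
    "ATGAACAATTTTCGAATAACCATTGTTGCGTTCTTAGCCGTTCTTGTCTTCACCACAACT"
    "GTTACGAATTCTTTGGATGAACCTAATATGGACACTATATCCAAATCCAGAGAATACAAA"
    "TGTAAAATTGACCTGGATTGTTCAAACCACATTGCATGTAGGCATTGTTCTTATCGCAAT"
    "TGCAAATGCGATCATGGAACCTGTAAATGCATGCCATGACTCTTAACCCACAAGGCCACA"
    "AGCCATTGATCCAACTGCATCTTCAACTCGTCTTAATCTCTCCTATATATGTACTCTTTT"
    "GTTTGTAATGCAAAAGAAAATAAAACATAATATTTTCAGTTGATAAACTACTAATGAAAT"
    "ATTATACGTCAACGAAATTTAGTATATAAACTACAAAACGGCAAAAATAGCTTTCTCGAA"
    "ACCAACAAAGTTAATTGGACAAACGACAAAAA"
)

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(seq, start=0):
    """Translate seq from 0-based offset `start` up to the first stop codon.

    Returns (peptide, end), where `end` is the 1-based position of the last
    base of the stop codon, or None if no in-frame stop codon is found.
    """
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(peptide), i + 3
        peptide.append(aa)
    return "".join(peptide), None


peptide, orf_end = translate_until_stop(transcript, start=0)
print(peptide)
print(len(peptide), orf_end)
```

Under these assumptions the transcript opens with ATG and the first in-frame stop codon ends at position 219, which gives a complete forward-strand ORF of 72 amino acids, consistent with the ORF finder result.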

Thanks a lot!

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 28, 2020 via email

@soungalo
Author

Thanks for the quick reply. I tried with -m 50, which indeed changed the output, but I'm still not getting the expected peptide. I now get a single peptide:

>transcript:AT1G13607.1.mrna1.p3 GENE.transcript:AT1G13607.1.mrna1~~transcript:AT1G13607.1.mrna1.p3  ORF type:complete len:50 (+),score=2.34 transcript:AT1G13607.1.mrna1:77-226(+)
MNLIWTLYPNPENTNVKLTWIVQTTLHVGIVLIAIANAIMEPVNACHDS*

Running with -m 70 resulted in no peptides being predicted for this transcript.
I suspect something else is going on. Any idea what that might be?

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 28, 2020 via email

@soungalo
Author

I ran everything on the full transcriptome (28,280 transcripts).
Not sure if that's what you meant. Is there additional information somewhere that can help understand why this ORF was not predicted?

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 28, 2020 via email

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 28, 2020 via email

@soungalo
Author

Thanks. Did that and it worked fine.

@Huangyizhong

Thanks. Did that and it worked fine.
Hi there. I have the same problem when comparing against the results from ORFfinder. As we can see, the longest ORF is labeled as partial. How should this be interpreted? Should such ORFs be kept for the subsequent analysis?
Thanks.
Yizhong Huang
