benchmarking TSEBRA with BUSCO: my results are not good #6

Closed
amvarani opened this issue Aug 20, 2021 · 7 comments

Comments

@amvarani

Hi there,
I would like to describe my experience using TSEBRA on a plant genome, with BUSCO as the benchmark.
I have a repeat-masked genome and annotation results from BRAKER1 and BRAKER2.
My results are:

BRAKER1: C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614
BRAKER2: C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614

TSEBRA: C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614

The same annotated genome deposited at Phytozome: C:99.7%[S:66.9%,D:32.8%],F:0.1%,M:0.2%,n:1614

Why is TSEBRA messing up the BRAKER1 and BRAKER2 annotations?
Maybe I need to tune the configuration file?
Any help?

Thanks a lot
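
To make the comparison above easier to read, here is a minimal Python sketch that parses BUSCO short-summary strings into numbers. The function name `parse_busco` and the regular expression are illustrative assumptions, not part of BUSCO or TSEBRA; the score strings are copied from the post above.

```python
import re

# Matches BUSCO short-summary strings such as
# "C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614".
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse one BUSCO summary string into a dict (hypothetical helper)."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    scores = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


# Score strings as reported in the post above.
runs = {
    "BRAKER1": "C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614",
    "BRAKER2": "C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614",
    "TSEBRA": "C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614",
}
parsed = {name: parse_busco(summary) for name, summary in runs.items()}

for name, s in parsed.items():
    print(f"{name:8s} complete={s['C']:5.1f}%  fragmented={s['F']:5.1f}%  missing={s['M']:5.1f}%")
```

Laid out this way, the TSEBRA run stands out: completeness drops by about 19 points relative to either BRAKER run, while both the fragmented and missing fractions rise to 10.8%.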

@LarsGab
Collaborator

LarsGab commented Aug 23, 2021

Hi,

I'm sorry that TSEBRA didn't work properly for your annotation.
The problem could be that the default configuration filters out too many transcripts.
I added a less restrictive configuration to the repository at TSEBRA/config/pref_braker1.cfg, which you can use instead of default.cfg.
I hope this improves your results.
Best, Lars

@amvarani
Author

Hi Lars,
Thanks a lot for your reply.
However, after changing the configuration file to "pref_braker1.cfg", the BUSCO results are still not good:

C:79.6%[S:76.1%,D:3.5%],F:9.4%,M:11.0%,n:1614

@LarsGab
Collaborator

LarsGab commented Aug 23, 2021

It seems to me that there are quite a few transcripts in your BRAKER results that are not supported by RNA-seq or protein evidence. TSEBRA removes all of these transcripts.
I added another configuration file (keep_ab_initio.cfg) to the repository that keeps these transcripts.

@amvarani
Author

Hi there!
Well, still not good:
C:79.0%[S:75.1%,D:3.9%],F:10.8%,M:10.2%,n:1614
Could I send you my files so you can take a look?

@LarsGab
Collaborator

LarsGab commented Aug 23, 2021

Hi,
yes, please send me the files so I can take a look at the issue. My email is lars.gabriel@uni-greifswald.de
Best, Lars

@amvarani
Author

Hi there,
Finally, with the kind help of @LarsGab, I have found the problem!
I was using the EvidenceModeler scripts "augustus_GTF_to_EVM_GFF3.pl" and "gff3_file_to_proteins.pl" to convert the TSEBRA GTF file to GFF3 and then to protein FASTA format, respectively.
I noticed that the conversion done by these scripts does not work properly when BRAKER is run with the option "--alternatives-from-evidence=true".
The best solution is to use the Augustus scripts "gtf2gff.pl" and "gtf2aa.pl" instead, respectively.
Using these scripts, I finally got reasonable BUSCO scores:

C:98.4%[S:93.6%,D:4.8%],F:0.6%,M:1.0%,n:1614
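
A conversion problem like the one described above can be caught before running BUSCO by checking that every coding transcript in the GTF also appears in the protein FASTA. The following is a rough sketch only: the helper names are hypothetical, and it assumes that the GTF attribute column carries `transcript_id "..."` and that each FASTA header starts with the transcript ID, which may not hold for every converter.

```python
import re

# Assumption: GTF attributes look like: transcript_id "g1.t1"; gene_id "g1";
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_coding_transcripts(gtf_lines):
    """Return the set of transcript IDs that have at least one CDS feature."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        match = TRANSCRIPT_ID_RE.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def missing_proteins(gtf_lines, fasta_lines):
    """List coding transcripts from the GTF that have no protein record."""
    return sorted(gtf_coding_transcripts(gtf_lines) - fasta_ids(fasta_lines))


# Toy data: gene g1 has two alternative transcripts, but only one protein was written.
gtf = [
    'chr1\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t100\t260\t.\t+\t0\ttranscript_id "g1.t2"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\texon\t100\t260\t.\t+\t.\ttranscript_id "g1.t2"; gene_id "g1";\n',
]
fasta = [">g1.t1\n", "MSTP\n"]

missing = missing_proteins(gtf, fasta)
print(missing)  # → ['g1.t2']
```

For real data, open file handles can be passed in place of the lists, since both functions only iterate over lines. A non-empty result points to transcripts lost or renamed during conversion rather than to the annotation itself.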

@smallfishcui

@amvarani
Thank you for sharing! This is important to know, because I also use the two Perl scripts you used before to convert the files for the BUSCO evaluation. I will try the approach you suggested.


3 participants