benchmarking TSEBRA with BUSCO: my results are not good #6

amvarani · 2021-08-20T20:27:23Z

Hi there,
I would like to describe my experience using TSEBRA with a plant genome, using BUSCO as a benchmark.
I have a repeat-masked genome and annotation results from both BRAKER1 and BRAKER2.
My results are:

BRAKER1: C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614
BRAKER2: C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614

TSEBRA: C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614

The annotation of the same genome deposited at Phytozome: C:99.7%[S:66.9%,D:32.8%],F:0.1%,M:0.2%,n:1614
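To compare runs like these programmatically, the one-line BUSCO notation can be parsed and sanity-checked. The following is a minimal sketch, assuming the compact `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` string shown above; `parse_busco`, `is_consistent` and `BuscoScore` are hypothetical helpers, not part of BUSCO itself. The check relies on S + D = C and C + F + M = 100% (up to rounding), and it also converts the missing percentage back into an approximate number of BUSCOs out of n.

```python
import re
from dataclasses import dataclass

# Matches strings such as "C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass
class BuscoScore:
    complete: float
    single: float
    duplicated: float
    fragmented: float
    missing: float
    n: int


def parse_busco(line: str) -> BuscoScore:
    """Parse the compact BUSCO summary notation (hypothetical helper)."""
    m = BUSCO_RE.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO summary: {line!r}")
    return BuscoScore(
        complete=float(m["C"]), single=float(m["S"]), duplicated=float(m["D"]),
        fragmented=float(m["F"]), missing=float(m["M"]), n=int(m["n"]),
    )


def is_consistent(s: BuscoScore, tol: float = 0.15) -> bool:
    # S + D should equal C, and C + F + M should be ~100% (rounding allowed)
    return (abs(s.single + s.duplicated - s.complete) <= tol
            and abs(s.complete + s.fragmented + s.missing - 100.0) <= tol)


# Scores reported in this thread
results = {
    "BRAKER1": "C:97.7%[S:87.2%,D:10.5%],F:1.4%,M:0.9%,n:1614",
    "BRAKER2": "C:97.7%[S:72.6%,D:25.1%],F:0.9%,M:1.4%,n:1614",
    "TSEBRA": "C:78.4%[S:75.1%,D:3.3%],F:10.8%,M:10.8%,n:1614",
    "Phytozome": "C:99.7%[S:66.9%,D:32.8%],F:0.1%,M:0.2%,n:1614",
}

for name, line in results.items():
    s = parse_busco(line)
    # Convert the missing percentage back to an approximate count out of n
    missing_count = round(s.missing / 100 * s.n)
    print(f"{name:10s} C={s.complete:5.1f}% M={s.missing:4.1f}% "
          f"(~{missing_count} missing) consistent={is_consistent(s)}")
```

Seen as counts, the TSEBRA run misses roughly 174 of the 1614 BUSCOs, an order of magnitude more than either BRAKER run.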

Why is TSEBRA messing up the BRAKER1 and BRAKER2 annotations?
Maybe I need to tune the configuration file?
Any help?

Thanks a lot

LarsGab · 2021-08-23T08:41:00Z

Hi,

I'm sorry that TSEBRA didn't work properly for your annotation.
The problem could be that the default configuration filters too many transcripts out.
I have added a more inclusive configuration file to the repository at TSEBRA/config/pref_braker1.cfg, which you can use instead of default.cfg.
I hope this improves your results.
Best, Lars

amvarani · 2021-08-23T11:45:21Z

Hi Lars,
Thanks a lot for your reply.
However, after changing the config file to "pref_braker1.cfg", the BUSCO results are still not good:

C:79.6%[S:76.1%,D:3.5%],F:9.4%,M:11.0%,n:1614

LarsGab · 2021-08-23T13:36:49Z

It seems to me that there are quite a few transcripts in your BRAKER results that are not supported by RNA-seq or protein evidence. TSEBRA removes all of these transcripts.
I added another configuration file to the repository, keep_ab_initio.cfg, which keeps these transcripts.

amvarani · 2021-08-23T14:11:37Z

Hi there!
Well, still not good:
C:79.0%[S:75.1%,D:3.9%],F:10.8%,M:10.2%,n:1614
Could I send you my files so you can take a look?

LarsGab · 2021-08-23T15:50:30Z

Hi,
yes, please send me the files so I can take a look at the issue. My email is lars.gabriel@uni-greifswald.de
Best, Lars

amvarani · 2021-08-26T11:42:28Z

Hi there,
Finally, with the kind help of @LarsGab, I have found the problem!
I was using the EvidenceModeler scripts "augustus_GTF_to_EVM_GFF3.pl" and "gff3_file_to_proteins.pl" to convert the TSEBRA GTF file to GFF3 and then to FASTA protein format, respectively.
I noticed that the conversion made by these scripts did not work properly when BRAKER is run with the option "--alternatives-from-evidence=true".
The best solution is to use the AUGUSTUS scripts "gtf2gff.pl" and "gtf2aa.pl" for these two steps instead.
Using these scripts, I finally got a reasonable BUSCO score:

C:98.4%[S:93.6%,D:4.8%],F:0.6%,M:1.0%,n:1614
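Since the low scores came from the format conversion rather than from TSEBRA itself, a quick check before running BUSCO is to confirm that every coding transcript in the GTF ends up as a record in the protein FASTA. The sketch below assumes that CDS lines carry `gene_id`/`transcript_id` attributes and that the first token of each FASTA header is the transcript ID. That header convention is an assumption, so check it against the actual converter output. The function names are hypothetical and not part of TSEBRA, AUGUSTUS or EVidenceModeler. It also counts genes with more than one isoform, which is where the option `--alternatives-from-evidence=true` comes into play.

```python
import re
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def transcripts_per_gene(gtf_lines):
    """Map gene_id -> set of transcript_ids, using CDS features only."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip non-CDS lines (AUGUSTUS-style gene/transcript lines may lack attributes)
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        tx = TX_RE.search(fields[8])
        gene = GENE_RE.search(fields[8])
        if tx and gene:
            genes[gene.group(1)].add(tx.group(1))
    return genes


def fasta_ids(fasta_lines):
    """Assumes the first whitespace-separated token of each header is the transcript ID."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}


def compare(gtf_lines, fasta_lines):
    genes = transcripts_per_gene(gtf_lines)
    all_tx = set().union(*genes.values())
    prots = fasta_ids(fasta_lines)
    return {
        "genes": len(genes),
        "multi_isoform_genes": sum(1 for t in genes.values() if len(t) > 1),
        "transcripts": len(all_tx),
        "proteins": len(prots),
        "missing_proteins": sorted(all_tx - prots),
    }


# Toy input: gene g1 has two isoforms, but only g1.t1 reached the FASTA
gtf = [
    'chr1\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\tAUGUSTUS\tCDS\t100\t180\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t2";',
    'chr1\tAUGUSTUS\tCDS\t500\t800\t.\t-\t0\tgene_id "g2"; transcript_id "g2.t1";',
]
fasta = [">g1.t1", "MKV*", ">g2.t1", "MAL*"]
print(compare(gtf, fasta))
```

This only catches transcripts that disappear entirely. A converter that produces truncated or wrongly spliced proteins would still pass, so comparing protein lengths, or simply re-running BUSCO as done here, remains the final check.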

smallfishcui · 2022-10-30T17:17:57Z

@amvarani
Thank you for sharing! This is important to know, because I also used the two Perl scripts you used before to convert the files for the BUSCO assessment. I will try the approach you suggested.

amvarani closed this as completed Aug 26, 2021
