Low Busco and Gene count on Cannabis genome #660

megahitokiri · 2023-08-19T21:19:46Z

Hi Braker Team,

I am running BRAKER3 to annotate new Cannabis haplotypes, but I am getting a very low BUSCO score (~3%) and only 11,000 to 14,000 predicted genes, where around 40,000 are expected.

I also tried MAKER3 and got the expected number of genes, but the BUSCO score was very low there too. In genome mode, the BUSCO score is around 98%.

Do you have any recommendation on how to increase the BUSCO score and the number of predicted genes?

Here is the code that I am trying.

PROJECT="AGS106_Hap2"
GENOME="/DATA/home/jmlazaro/Projects/Annotations_Cannabis/CAP_Snakemake_Sundance/MAIN_FASTAs/AGS106_Hap2.GAP.CHR_ID.reviewed.chr_assembled.fasta"
RNA_DATA="/DATA/home/jmlazaro/Projects/Annotations_Cannabis/AGS106_Hap2/AGS106_Hap2/Minimap_Aligned/minimap2.sorted.MAPQ20.dedup.bam"
CPU_Number="48"
PROTEIN_DATASET="/DATA/home/jmlazaro/github/orthodb-clades/BRAKER3_Clades/Viridiplantae.fa"

#export sif
export BRAKER_SIF=$PWD/braker3.sif

wd=$PROJECT

singularity instance start -B ${PWD}:${PWD} ${BRAKER_SIF} Hap2

singularity exec instance://Hap2 braker.pl --genome=$GENOME --bam=$RNA_DATA --softmasking --workingdir=${wd} --GENEMARK_PATH=${ETP}/gmes --prot_seq=$PROTEIN_DATASET --threads $CPU_Number --skip_fixing_broken_genes --gff3 --verbosity 4

singularity instance stop Hap2

My augustus.hints.gtf file is here: https://sunflowergenome.org/annotations-data/assets/data/annotations/Cannabis/augustus.hints.gtf.gz

Thanks for your help or insights.

KatharinaHoff · 2023-08-21T14:22:23Z

Hard to say what is going wrong here.

However, Cannabis sativa is a case where a reference annotation for another strain exists. Proteins should map well.

I recommend downloading the annotated proteins of Cannabis sativa (all 3 strains that have an annotation), Trema orientale, and Parasponia andersonii, concatenating them, and using the result as input for GALBA. If BUSCO scores are still low, load the predictions into a genome browser and check visually what is going wrong. (Do that with the BRAKER and MAKER predictions, too.) Try to visualize the BUSCOs as well.
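
The concatenation step can be sketched as below. This is a minimal sketch, not a tested pipeline: `merge_proteins` is a hypothetical helper, every file name in the usage comment is a placeholder, and the `galba.pl` options shown should be checked against `galba.pl --help` for your installed version.

```shell
# merge_proteins OUT IN1 [IN2 ...]
# Concatenate protein FASTA files into OUT. awk '1' re-emits every line with a
# trailing newline, so an input that lacks a final newline cannot glue its last
# residue line onto the next file's header (a plain `cat` would).
merge_proteins() {
    out=$1
    shift
    awk '1' "$@" > "$out"
}

# Hypothetical usage; all file names are placeholders:
# merge_proteins relatives.fa cs_strain1.fa cs_strain2.fa cs_strain3.fa trema.fa parasponia.fa
# galba.pl --genome=genome.softmasked.fa --prot_seq=relatives.fa --threads=48
```

If several of the downloaded protein sets reuse the same sequence IDs, it may be worth prefixing the headers per source before merging; the helper above does not do that.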

BUSCO doesn't care about repeat masking. Maybe your genome is overmasked? You should see that when you look in a browser at the BUSCOs.
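
Before opening a browser, a rough first look at the masking level is to measure how much of the soft-masked FASTA is lowercase. The helper below is a sketch under the assumption that the genome is soft-masked (lowercase = masked), as the `--softmasking` flag in the original command implies; `masked_fraction` is a hypothetical name, and what fraction counts as "overmasked" depends on the species, so no threshold is suggested here.

```shell
# masked_fraction GENOME.fa
# Print the fraction of sequence characters that are lowercase (soft-masked),
# or NA if the file contains no sequence.
masked_fraction() {
    LC_ALL=C awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            total += length($0)            # all sequence characters on this line
            masked += gsub(/[a-z]/, "")    # gsub returns how many lowercase letters it removed
        }
        END {
            if (total > 0) printf "%.4f\n", masked / total
            else print "NA"
        }' "$1"
}

# Hypothetical usage:
# masked_fraction "$GENOME"
```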

Did you run the new BUSCO with miniprot support? Or compleasm? I would give compleasm a test run, see whether that remotely reproduces the genomic BUSCO scores.

megahitokiri · 2023-08-24T19:42:50Z

Thanks for your reply, Katharina. I am still testing the results. I will let you know once it is finished.

lovelynewGao · 2023-10-25T08:58:19Z

Hello, have you resolved this issue? I had the same problem when annotating my genome with BRAKER3. Could you please share any advice on improving the BUSCO score or the number of predicted genes?

KatharinaHoff · 2023-11-20T08:48:47Z

The number of predicted genes can be changed by re-running TSEBRA manually and enforcing the best previous gene set (e.g. the GeneMark or the AUGUSTUS gene set).
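
One way to decide which gene set to enforce is to count the genes in each candidate GTF first. The sketch below assumes the GTF files carry a `gene_id "..."` attribute in column 9; `count_genes` is a hypothetical helper, the paths in the usage comment are placeholders, and the TSEBRA options (`-g`, `-k`, `-e`, `-c`, `-o`) are given as I understand them, so please confirm with `tsebra.py --help`.

```shell
# count_genes FILE.gtf
# Print the number of distinct gene_id values found in column 9 of a GTF file.
count_genes() {
    awk -F '\t' '
        /^#/ { next }                                   # skip comment lines
        match($9, /gene_id "[^"]+"/) {                  # lines without gene_id are ignored
            ids[substr($9, RSTART, RLENGTH)] = 1
        }
        END { n = 0; for (k in ids) n++; print n }' "$1"
}

# Hypothetical usage; paths are placeholders for files in the BRAKER working directory:
# count_genes augustus.hints.gtf
# count_genes genemark.gtf
# tsebra.py -g genemark.gtf -k augustus.hints.gtf -e hintsfile.gff -c default.cfg -o tsebra_enforced.gtf
```

Here the set passed with `-k` is the one whose transcripts are kept in the output, while `-g` sets are filtered as usual.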

The BUSCOs may or may not improve with that.

In the case originally reported here, protein BUSCOs were too low for several pipelines compared to genome level BUSCOs. It is important to be aware that BUSCO (and compleasm) are not gene predictors. They will report the presence of a conserved protein sequence in the genome regardless of whether splice sites are valid, and whether there's a valid start and stop codon associated. You can figure this out by visualizing the BUSCOs/compleasm BUSCOs in a genome browser, next to a track with gene predictions.
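
To put a number on the gap between genome-level and protein-level BUSCOs, the completeness values can be pulled out of the one-line BUSCO summaries and subtracted. This is only a sketch: `busco_complete` and `busco_gap` are hypothetical helpers, and they assume the usual summary notation of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`.

```shell
# busco_complete SUMMARY_LINE
# Extract the "C:" (complete) percentage from a BUSCO one-line summary.
busco_complete() {
    printf '%s\n' "$1" | sed -n 's/.*C:\([0-9.]*\)%.*/\1/p'
}

# busco_gap GENOME_SUMMARY PROTEIN_SUMMARY
# Print genome-mode completeness minus protein-mode completeness, in percentage points.
busco_gap() {
    g=$(busco_complete "$1")
    p=$(busco_complete "$2")
    LC_ALL=C awk -v g="$g" -v p="$p" 'BEGIN { printf "%.1f\n", g - p }'
}

# Hypothetical usage, pasting the summary lines from the two BUSCO runs:
# busco_gap "$genome_mode_summary_line" "$protein_mode_summary_line"
```

A large gap only says that conserved sequences exist in the assembly but are missing from the predicted proteins; the browser inspection described above is still needed to see why.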

KatharinaHoff · 2023-11-27T15:47:26Z

In addition to the above advice, I have today added functionality so that BRAKER runs compleasm at genome level to generate hints for prediction with AUGUSTUS. These changes are currently only in the branch https://github.com/Gaius-Augustus/BRAKER/tree/compleasm . However, this will only pick up complete or duplicated BUSCOs without frame shifts.

I am also working on automating running TSEBRA in a way that minimizes missing BUSCOs, but that's still ongoing work.

KatharinaHoff · 2024-02-16T12:18:41Z

That branch was merged into master. The solution is documented on a poster: https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/posters/poster_PAG2024.pdf

lovelynewGao · 2024-02-28T14:26:44Z

Thank you for your detailed suggestions, which I will take to try to run my data. If I have better results, I will let you know.

KatharinaHoff added the "question" (Further information is requested) label Aug 21, 2023

KatharinaHoff self-assigned this Nov 20, 2023

KatharinaHoff closed this as completed Feb 16, 2024

zzbbf123 mentioned this issue Mar 19, 2024

missing BUSCOs in BRAKER #784

Closed
