Low Busco and Gene count on Cannabis genome #660
Comments
Hard to say what is going wrong here. However, Cannabis sativa is a case where a reference annotation for another strain exists, so proteins should map well. I recommend downloading the annotated proteins of Cannabis sativa (all 3 strains that have an annotation), Trema orientale, and Parasponia andersonii, concatenating them, and using the result as input for GALBA. If BUSCO scores are still low, visualize the predictions in a genome browser and check what is going wrong. (Do that with the BRAKER and MAKER predictions, too.) Try to visualize the BUSCOs as well. BUSCO doesn't care about repeat masking, so maybe your genome is overmasked? You should see that when you look at the BUSCOs in a browser. Did you run the new BUSCO with miniprot support, or compleasm? I would give compleasm a test run to see whether it even remotely reproduces the genomic BUSCO scores.
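A quick numeric check of the over-masking idea above is to measure how much of the assembly is soft-masked (lowercase). This is a minimal sketch, assuming a soft-masked FASTA as input; the function name `masked_fraction` is made up for illustration, it only counts lowercase a/c/g/t/n, and it does not decide what counts as "too much" masking.

```shell
# Hypothetical helper: print the percentage of soft-masked (lowercase) bases in a FASTA file.
masked_fraction() {
    LC_ALL=C awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            total  += length($0)           # count all bases on this line first
            masked += gsub(/[acgtn]/, "")  # gsub returns how many lowercase bases it removed
        }
        END {
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * masked / total
        }
    ' "$1"
}
```

Calling `masked_fraction "$GENOME"` prints a single percentage, which can then be compared with the repeat content expected for the species.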
Thanks for your reply, Katharina. I am still testing the results. I will let you know once it is finished.
Hello, have you resolved this issue? I ran into the same problem when annotating my genome with BRAKER3. Could you please share any advice on improving the BUSCO score or the number of predicted genes?
The number of predicted genes can be changed if you re-run TSEBRA manually, enforcing the best previous gene set (e.g. the GeneMark or the AUGUSTUS gene set). The BUSCOs may or may not improve with that. In the case originally reported here, protein-level BUSCO scores were too low for several pipelines compared to genome-level BUSCO scores. It is important to be aware that BUSCO (and compleasm) are not gene predictors. They will report the presence of a conserved protein sequence in the genome regardless of whether the splice sites are valid and whether there is a valid start and stop codon associated with it. You can check this by visualizing the BUSCO or compleasm hits in a genome browser, next to a track with the gene predictions.
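The manual TSEBRA re-run described above could be assembled roughly as follows. This is only a sketch: `build_tsebra_cmd` is a hypothetical helper that prints the command instead of running it, and the `-g`, `-k`, `-e`, `-c` and `-o` options are written as I understand the TSEBRA command line (gene sets, enforced gene set, hints, config, output), so please check them against `tsebra.py --help` for your version.

```shell
# Hypothetical helper: print a tsebra.py call that enforces one gene set.
# Arguments (all supplied by the caller, nothing is guessed):
#   $1 gene set(s) to combine   $2 gene set to enforce   $3 hints file
#   $4 TSEBRA config file       $5 output GTF
build_tsebra_cmd() {
    echo "tsebra.py -g $1 -k $2 -e $3 -c $4 -o $5"
}
```

For example, `build_tsebra_cmd augustus.hints.gtf genemark.gtf hintsfile.gff tsebra.cfg enforced.gtf` prints a command that can be inspected and then run by hand. The file names here are placeholders, not guaranteed BRAKER output names.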
In addition to the advice above, I have today added functionality so that BRAKER runs compleasm at genome level to generate hints for prediction with AUGUSTUS. These changes are currently only in the branch https://github.com/Gaius-Augustus/BRAKER/tree/compleasm . However, this will only pick up complete or duplicated BUSCOs without frame shifts. I am also working on automating the TSEBRA run in a way that minimizes missing BUSCOs, but that is still ongoing work.
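For anyone who wants to look at the genome-level compleasm result on its own before trying the branch, a standalone run might look like the sketch below. The wrapper name `run_compleasm` is made up; the `compleasm run` options (`-a` assembly, `-l` lineage, `-o` output directory, `-t` threads) are as I know them from the compleasm command line and should be checked against your installed version.

```shell
# Hypothetical wrapper around a genome-level compleasm run.
# Arguments: $1 assembly FASTA, $2 lineage name, $3 output directory, $4 threads
run_compleasm() {
    # Fail early with a clear message if compleasm is not installed.
    if ! command -v compleasm >/dev/null 2>&1; then
        echo "compleasm not found on PATH" >&2
        return 127
    fi
    compleasm run -a "$1" -l "$2" -o "$3" -t "$4"
}
```

A call such as `run_compleasm "$GENOME" eudicots compleasm_out 48` would be one way to use it; the lineage `eudicots` is my assumption for Cannabis, not something stated in this thread.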
That branch was merged into master. The solution is documented on a poster: https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/posters/poster_PAG2024.pdf
Thank you for the detailed suggestions; I will try them on my data. If I get better results, I will let you know.
Hi BRAKER team,
I am running BRAKER3 to annotate new Cannabis haplotypes, but I am getting a really low BUSCO score (~3%) and only 11,000 to 14,000 predicted genes, where around 40,000 are expected.
I also tried MAKER3 and got the expected number of genes, but again a really low BUSCO score. In genome mode, the BUSCO score is around 98%.
Do you have any recommendations on how to increase the BUSCO score and the number of predicted genes?
Here is the code that I am using.
PROJECT="AGS106_Hap2"
GENOME="/DATA/home/jmlazaro/Projects/Annotations_Cannabis/CAP_Snakemake_Sundance/MAIN_FASTAs/AGS106_Hap2.GAP.CHR_ID.reviewed.chr_assembled.fasta"
RNA_DATA="/DATA/home/jmlazaro/Projects/Annotations_Cannabis/AGS106_Hap2/AGS106_Hap2/Minimap_Aligned/minimap2.sorted.MAPQ20.dedup.bam"
CPU_Number="48"
PROTEIN_DATASET="/DATA/home/jmlazaro/github/orthodb-clades/BRAKER3_Clades/Viridiplantae.fa"
#export sif
export BRAKER_SIF=$PWD/braker3.sif
wd=$PROJECT
singularity instance start -B ${PWD}:${PWD} ${BRAKER_SIF} Hap2
singularity exec instance://Hap2 braker.pl --genome=$GENOME --bam=$RNA_DATA --softmasking --workingdir=${wd} --GENEMARK_PATH=${ETP}/gmes --prot_seq=$PROTEIN_DATASET --threads $CPU_Number --skip_fixing_broken_genes --gff3 --verbosity 4
singularity instance stop Hap2
My AUGUSTUS hints file is here: https://sunflowergenome.org/annotations-data/assets/data/annotations/Cannabis/augustus.hints.gtf.gz
Thanks for your help or insights.