Database update? #99
Comments
Hi, yes, that's correct. We used a 2018 database with KEGG-UniRef ID mappings during CheckM2 development, but UniRef has since decided not to include KEGG ID mappings in future updates, so CheckM2 currently uses the last available database, from 2018. Because CheckM2 relies on fast DIAMOND-based protein annotation, we haven't switched to KEGG HMM searches. We are exploring an alternative annotation system based on DRAM (or other annotation tools, e.g. STRING/eggNOG) applied to the full GTDB protein database, but that is still at the benchmarking stage. Although the annotation database is a bit old, we will keep using newly added publicly available genomes to update CheckM2 (the newest CheckM2 update, incorporating GTDB R214, should hopefully be out by the end of the month).
Thanks for the quick answer! By the way, I understand you've been looking into using KEGG pathways as part of the completeness scoring (correct me if I'm wrong). Do you think Gene Ontology (GO) might be a better fit for pathway lookups, generally speaking?
How old is the UniRef database that CheckM2 is currently using? I see a reference to 3 June 2018 in the main publication, but I'm not sure whether it has been updated since.
Am I correct in assuming that you downloaded UniRef100 and the ID mappings (https://www.uniprot.org/help/downloads), and then kept only the proteins that have a KEGG Orthology mapping?
Cheers.
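For readers following along, the filtering step asked about above could be sketched roughly as follows. This is only an illustration of the idea, not the CheckM2 authors' actual pipeline. It assumes a three-column, tab-separated mapping file (accession, ID type, ID) and FASTA headers of the form `>UniRef100_<accession> ...`; the `"KO"` type label and both function names are hypothetical choices for this example.

```python
from typing import Iterable, Iterator, Set


def accessions_with_mapping(lines: Iterable[str], id_type: str = "KO") -> Set[str]:
    """Collect accessions that have at least one mapping of the given ID type.

    Assumes each line is: accession <TAB> id_type <TAB> id.
    The default "KO" label is an assumption about how the mapping is named.
    """
    keep: Set[str] = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3 and parts[1] == id_type:
            keep.add(parts[0])
    return keep


def filter_fasta(lines: Iterable[str], keep: Set[str],
                 prefix: str = "UniRef100_") -> Iterator[str]:
    """Yield only the FASTA records whose header accession is in `keep`.

    Assumes the first header token is the prefix followed by the accession.
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            token = fields[0] if fields else ""
            accession = token[len(prefix):] if token.startswith(prefix) else token
            writing = accession in keep
        if writing:
            yield line
```

Both functions take any iterable of lines, so they can be fed open file handles and streamed without loading the full database into memory; only the set of kept accessions is held in RAM.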