Add additional gene models to the metagenome mode #24
Hi Antonio, psyched to hear this! First of all I've been aware of I've been thinking about how to allow custom metagenomic models to be passed to Pyrodigal, and actually not a lot would have to be changed for everything to work. The more complicated part would be how to store the models to allow you (or other use cases of custom models) to load them efficiently. But otherwise it would be feasible to have an
For the second question, I'd have to think about how to integrate it efficiently; I think you could actually try to count the number of extracted nodes without actually scoring them for putative gene density. But indeed, this may be a bit more out of scope compared to the metagenomic stuff.
Great to hear! I like this interface idea. If
This would make more sense than what people (me included) usually do. Although it is not within the scope right now, it would be interesting to have something like this in the future or for another project. That's a feature that is lacking in all gene callers.
It took almost a year, but I've started updating the interface to allow this. At the moment I can compile an external package that depends on
That's great! Thanks @althonos! I'm not sure I understand the interface, though. Would the gene models be packaged in a separate package and then read by pyrodigal?
Yes, I'll make a repo and invite you to that 👍
Version
Thank you! Really good idea to store the models in JSON files to avoid compilation.
I actually did something even more hacky to avoid storing them in JSON once installed 🙈
Thanks for adding these models! Is there a way to use the models with pyrodigal in meta mode on the CLI?
@rohansachdeva: I have added a CLI for
Awesome - thank you!
Hey @althonos,
Thanks for your work on pyrodigal! It is an amazing tool!
I'm wondering if you would be interested in adding additional gene models to the metagenome mode. In prodigal-gv I added a couple of models trained on genomes of giant viruses (the main reason being that none of the pre-trained models in Prodigal detect the TATATA RBS motif that is super common in those viruses) and, more importantly, models for phages that use translation table 15 (very prevalent in Crassvirales).
These models proved to be really useful in geNomad and they ended up improving the detection of giant viruses and phages with code 15 in IMG/VR. But I can see two main disadvantages of using them:
I'm planning to adopt pyrodigal for my next projects and I'd use these models a lot, but I can always change the code locally if you feel that adding additional models is not within the scope of this project. No worries :)
Somewhat related to this:
I'm doing a large-scale gene prediction for hundreds of thousands of genomes and some of them will have alternative genetic codes. I wrote a function that takes part of the genome (not the whole thing, to speed things up) and tests different translation tables (4, 11, and 15) to evaluate whether the genome potentially uses an alternative code (I just compare the gene density between the codes). Do you think a function like this could be useful in pyrodigal?
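A minimal sketch of the comparison described above, not the actual function: it takes a prefix of the genome, calls genes once per candidate translation table, and ranks the tables by coding density. The names `coding_density`, `rank_translation_tables`, and `predict_with_pyrodigal` are hypothetical, and so is the choice of a prefix as the sample. The gene caller is passed in as a callable so the ranking logic does not depend on any particular tool. The pyrodigal adapter assumes the `GeneFinder` class from pyrodigal 3 in single-genome mode, which needs a sequence of at least 20,000 bp to train on.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (begin, end) gene coordinates


def coding_density(intervals: Iterable[Interval], length: int) -> float:
    """Fraction of the sequence covered by at least one predicted gene."""
    if length <= 0:
        return 0.0
    covered = 0
    current_end = 0  # last position already counted
    for begin, end in sorted(intervals):
        begin = max(begin, current_end + 1)  # skip bases counted earlier
        end = min(end, length)
        if end >= begin:
            covered += end - begin + 1
            current_end = end
    return covered / length


def rank_translation_tables(
    sequence: str,
    predict: Callable[[str, int], Iterable[Interval]],
    tables: Sequence[int] = (4, 11, 15),
    sample_size: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (table, density) pairs sorted from densest to sparsest."""
    # Assumption: the "part of the genome" is simply a prefix of the sequence.
    sample = sequence if sample_size is None else sequence[:sample_size]
    densities = {
        table: coding_density(predict(sample, table), len(sample))
        for table in tables
    }
    return sorted(densities.items(), key=lambda item: item[1], reverse=True)


def predict_with_pyrodigal(sequence: str, table: int) -> List[Interval]:
    """Hypothetical adapter; assumes pyrodigal >= 3 is installed."""
    import pyrodigal

    finder = pyrodigal.GeneFinder()
    # Single-genome training fails on sequences shorter than 20,000 bp.
    finder.train(sequence, translation_table=table)
    return [(gene.begin, gene.end) for gene in finder.find_genes(sequence)]
```

Something like `rank_translation_tables(genome, predict_with_pyrodigal, sample_size=100_000)` would then give the tables ordered by density. How large a gap between the top table and table 11 should count as evidence of an alternative code is left open here.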
There are multiple papers where people look for alt-coded bacteria/viruses by running Prodigal multiple times and comparing the gene density. Having this implemented in an elegant and efficient solution could be very useful. Databases (NCBI, IMG, etc.) are full of Crassvirales with truncated genes.
Just an idea! Please ignore all of that if you feel it would be out of the scope for this package.
Thanks again!