Add additional gene models to the metagenome mode #24

Closed
apcamargo opened this issue Nov 5, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@apcamargo

Hey @althonos,

Thanks for your work on pyrodigal! It is an amazing tool!

I'm wondering if you would be interested in adding additional gene models to the metagenome mode. In prodigal-gv I added a couple of models trained on genomes of giant viruses (the main reason for that is that none of the pre-trained models in Prodigal detect the TATATA RBS motif that is super common in those viruses) and, more importantly, models for phages that use the translation table 15 (very prevalent in Crassvirales).

These models proved to be really useful in geNomad and they ended up improving the detection of giant viruses and phages with code 15 in IMG/VR. But I can see two main disadvantages of using them:

  • Because more models would be evaluated, the execution would be a bit slower (but this is negligible in my experience).
  • The gene predictions in metagenome mode would not match standard Prodigal anymore.

I'm planning to adopt pyrodigal for my next projects and I'd use these models a lot, but I can always change the code locally if you feel that adding additional models is not within the scope of this project. No worries :)

Somewhat related to this:

I'm doing a large-scale gene prediction for hundreds of thousands of genomes and some of them will have alternative genetic codes. I wrote a function that takes part of the genome (not the whole thing, to speed things up) and tests different translation tables (4, 11, and 15) to evaluate whether the genome potentially uses an alternative code (I just compare the gene density between the codes). Do you think a function like this could be useful in pyrodigal?

There are multiple papers where people look for alt-coded bacteria/viruses by running Prodigal multiple times and comparing the gene density. Having this implemented in an elegant and efficient solution could be very useful. Databases (NCBI, IMG, etc.) are full of Crassvirales with truncated genes.
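For illustration, the density comparison described above could be sketched as follows. This is not pyrodigal or Prodigal code: `coding_density` and `guess_table` are hypothetical helper names, stop-free stretches of at least `min_len` bases stand in for real gene calls, and the `min_gain` threshold is an arbitrary assumption. Because tables 4 and 15 each drop one stop codon from table 11, their density under this proxy can never be lower than table 11's, so the sketch only reports an alternative code when it wins by a clear margin.

```python
from typing import Dict, FrozenSet, List, Tuple

# Stop codons for the three NCBI translation tables named in the thread.
STOP_CODONS: Dict[int, FrozenSet[str]] = {
    11: frozenset({"TAA", "TAG", "TGA"}),  # standard bacterial/archaeal code
    4: frozenset({"TAA", "TAG"}),          # TGA is read as Trp
    15: frozenset({"TAA", "TGA"}),         # TAG is read as Gln
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def open_frames(seq: str, stops: FrozenSet[str], min_len: int) -> List[Tuple[int, int]]:
    """Spans of stop-free stretches (>= min_len bases) in the three forward frames."""
    spans = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stops:
                if i - start >= min_len:
                    spans.append((start, i))
                start = i + 3
        # A stretch running off the end of the fragment also counts.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - start >= min_len:
            spans.append((start, end))
    return spans


def coding_density(seq: str, table: int, min_len: int = 90) -> float:
    """Fraction of bases covered by a long stop-free stretch on either strand."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    stops = STOP_CODONS[table]
    covered = bytearray(n)
    for s, e in open_frames(seq, stops, min_len):
        covered[s:e] = b"\x01" * (e - s)
    for s, e in open_frames(reverse_complement(seq), stops, min_len):
        # Map reverse-strand coordinates back onto the forward strand.
        covered[n - e:n - s] = b"\x01" * (e - s)
    return sum(covered) / n


def guess_table(seq: str, min_len: int = 90, min_gain: float = 0.10) -> Tuple[int, Dict[int, float]]:
    """Pick table 4 or 15 only if it beats table 11 by at least min_gain."""
    densities = {t: coding_density(seq, t, min_len) for t in (11, 4, 15)}
    best = max(densities, key=densities.get)
    if best != 11 and densities[best] - densities[11] < min_gain:
        best = 11
    return best, densities
```

To keep this cheap on large collections, `guess_table` can be called on a slice of each genome rather than the whole sequence, as described above.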

Just an idea! Please ignore all of that if you feel it would be out of the scope for this package.

Thanks again!

@althonos
Owner

althonos commented Nov 6, 2022

Hi Antonio, psyched to hear this!

First of all I've been aware of prodigal-gv for some time, and I recently recommended it to a colleague working with virae!

I've been thinking about how to allow custom metagenomic models to be passed to Pyrodigal, and actually not a lot would have to be changed for everything to work. The more complicated part would be how to store the models to allow you (or other use cases of custom models) to load them efficiently. But otherwise it would be feasible to have an OrfFinder.find_genes call that takes an additional argument which would be a list of MetagenomicModel objects, or use the default ones if None is given.
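A minimal sketch of the interface proposed here, assuming nothing about how Pyrodigal is actually implemented: the classes below are toy stand-ins that share only the names used in this comment (`OrfFinder`, `find_genes`, `MetagenomicModel`). The `stop_codons` field, the `score` method, and `DEFAULT_MODELS` are hypothetical, and the scoring rule (fewer in-frame stop codons wins) is a placeholder for real model scoring.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MetagenomicModel:
    """Toy stand-in: a real model would carry full training information."""
    name: str
    translation_table: int
    stop_codons: FrozenSet[str]

    def score(self, sequence: str) -> int:
        """Placeholder score: fewer stop codons in frame 0 is better."""
        seq = sequence.upper()
        codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
        return -sum(codon in self.stop_codons for codon in codons)


# Hypothetical built-in set, used when the caller passes no models.
DEFAULT_MODELS: Tuple[MetagenomicModel, ...] = (
    MetagenomicModel("standard_11", 11, frozenset({"TAA", "TAG", "TGA"})),
)


class OrfFinder:
    def find_genes(
        self,
        sequence: str,
        models: Optional[Sequence[MetagenomicModel]] = None,
    ) -> MetagenomicModel:
        """Return the best-scoring model; a real finder would then call genes with it."""
        candidates = DEFAULT_MODELS if models is None else tuple(models)
        if not candidates:
            raise ValueError("at least one model is required")
        return max(candidates, key=lambda m: m.score(sequence))


if __name__ == "__main__":
    code_15 = MetagenomicModel("code_15", 15, frozenset({"TAA", "TGA"}))
    finder = OrfFinder()
    fragment = "ATGGCTTAGGCTTAA"
    print(finder.find_genes(fragment).name)
    print(finder.find_genes(fragment, models=[*DEFAULT_MODELS, code_15]).name)
```

Defaulting `models` to `None` keeps existing calls unchanged, while callers who want extra models pass the full list they want evaluated.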

For the second question, I'd have to think about how to integrate it efficiently; I think you could actually count the number of extracted nodes, without scoring them, as a proxy for putative gene density. But indeed, this may be a bit more out of scope compared to the metagenomic stuff.

@apcamargo
Author

> I've been thinking about how to allow custom metagenomic models to be passed to Pyrodigal, and actually not a lot would have to be changed for everything to work. The more complicated part would be how to store the models to allow you (or other use cases of custom models) to load them efficiently. But otherwise it would be feasible to have an OrfFinder.find_genes call that takes an additional argument which would be a list of MetagenomicModel objects, or use the default ones if None is given.

Great to hear! I like this interface idea. If None is used, would you also restrict the model search to a range of GC values, or would you allow users to do a full search?
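The two options in this question could be sketched as a single parameter. Everything below is hypothetical and is not how Prodigal or Pyrodigal choose models: `Model`, `gc_content`, `candidate_models`, and the `gc_window` value are illustrative names and numbers only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Model:
    """Hypothetical model record: a name plus the GC fraction of its training genome."""
    name: str
    gc: float


def gc_content(sequence: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T) only."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def candidate_models(
    models: Sequence[Model],
    sequence: str,
    gc_window: Optional[float] = 0.10,
) -> List[Model]:
    """Keep models whose GC is within gc_window of the sequence's GC.

    Passing gc_window=None disables the filter, i.e. a full search.
    """
    if gc_window is None:
        return list(models)
    gc = gc_content(sequence)
    return [m for m in models if abs(m.gc - gc) <= gc_window]
```

With this shape, the restricted search stays the default and a full search is an explicit opt-in via `gc_window=None`.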

> For the second question, I'd have to think about how to integrate it efficiently; I think you could actually count the number of extracted nodes, without scoring them, as a proxy for putative gene density. But indeed, this may be a bit more out of scope compared to the metagenomic stuff.

This would make more sense than what people (me included) usually do. Although it is not within the scope right now, it would be interesting to have something like this in the future or for another project. That's a feature that is lacking in all gene callers.

@althonos
Owner

althonos commented Sep 5, 2023

It took almost a year, but I've started updating the interface to allow this. At the moment I can compile an external package that depends on pyrodigal but uses your prodigal-gv model, but I'm working on a way that doesn't need compiling (using training info in some other format) so that it's easier to distribute :)

@apcamargo
Author

That's great! Thanks @althonos

Not sure if I understand the interface, though. The gene models would be packaged in a separate package and then read by pyrodigal?

@althonos
Owner

althonos commented Sep 6, 2023

Yes, I'll make a repo and invite you to that 👍

@althonos
Owner

althonos commented Sep 7, 2023

Version 3.0.0 of Pyrodigal now supports using user-provided metagenomic models to run gene finding in meta mode. The giant-virus models are distributed in pyrodigal-gv.

@althonos althonos closed this as completed Sep 7, 2023
@apcamargo
Author

Thank you! Really good idea to store the models in JSON files to avoid compilation.

@althonos
Owner

althonos commented Sep 8, 2023

I actually did something even more hacky to avoid storing them in JSON once installed 🙈

@rohansachdeva

Thanks for adding these models!

Is there a way to use the models with pyrodigal in meta mode on the CLI?

@althonos
Owner

@rohansachdeva: I have added a CLI for pyrodigal-gv in the latest version, v0.3.0. Use pyrodigal-gv instead of prodigal in the shell and you'll be all set 😃

@rohansachdeva

Awesome - thank you!
