Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyHMMER 0.8.0 TMD should be 0 for last node #40

Closed
jamaliki opened this issue May 18, 2023 · 5 comments
Closed

PyHMMER 0.8.0 TMD should be 0 for last node #40

jamaliki opened this issue May 18, 2023 · 5 comments
Labels
question Further information is requested

Comments

@jamaliki
Copy link

Hi,

We use your excellent software in ModelAngelo! Ever since the upgrade to pyHMMER v0.8.0, we are getting the following error during HMMAlign:

ValueError: Invalid HMM: TMD should be 0 for last node

I have attached an example of the HMM files we write. Could you please advise how we should proceed so that it works with the new version?
0.hmm.txt

@althonos
Copy link
Owner

HI @jamaliki!

I think the error is related to #36: since hmmalign doesn't automatically verify whether the given HMM is valid, there was a chance for a segmentation fault on invalid input. To mitigate this, I added an extra step to check the HMM is valid before invoking the hmmalign pipeline, and it seems to be what you're getting here.

May I ask, how did you obtain the HMMs? The error is exactly what it says, i.e. that the transition probability from match state to deletion ($T_{M \to D}$) should be zero for the last node, otherwise your HMM may be "stuck" in a deletion loop and won't align properly (I think?).

@jamaliki
Copy link
Author

I see, thank you very much. We are generating these HMM's by hand based on a neural network's output. I think I have fixed it in this commit, if you would like to take a look: commit.

Overall, I think this is closed. Thank you!

@althonos
Copy link
Owner

Yes, that should do it!

By the way, you can also create a HMM and fill the emission scores & transition probabilities from Python, and then call HMM.write to write the result, instead of manually formatting the HMMs in text format 😉

@jamaliki
Copy link
Author

Oh interesting ! Is there a tutorial on how to fill out the emission and transition probabilities?

@althonos
Copy link
Owner

althonos commented May 19, 2023

Not exactly a tutorial, but the attributes are documented: you will want to create a HMM from scratch, and then fill the transition_probabilities, insert_emissions and match_emissions arrays.

I added some goodies to v0.8.1 that was released like 5 minutes ago to help with this:

  • the pyhmmer.plan7.Transitions class, which can be used to index the transitions probabilities matrix instead of raw indices, makes the code more legible
  • the HMM.validate method, which does the same validation as hmmalign, you can call it when you're done filling the HMM, it will raise a ValueError if the contents are wrong.

For instance, you can use the following code to fill the transitions, I let you adapt the rest for the match and insert emissions -- the only thing to keep in mind is that you should put raw probabilities, not log-scaled ones, so:

from pyhmmer.easel import Alphabet
from pyhmmer.plan7 import HMM, Transitions

alphabet = Alphabet.amino()
hmm = HMM(alphabet, M=len(aa_log_probs), name=b"example")

for res_index in range(len(aa_log_probs)):
    mm = max(confidence[res_index] - delta, gamma)
    hmm.transitions_probabilities[res_index+1, Transitions.MM] = mm
    hmm.transitions_probabilities[res_index+1, Transitions.MD] = hmm.transitions_probabilities[res_index+1, Transitions.MI] = (1.0 - mm) / 2
    hmm.transitions_probabilities[res_index+1, Transitions.IM] = hmm.transitions_probabilities[res_index+1, Transitions.DM] = 1.0 - delta
    hmm.transitions_probabilities[res_index+1, Transitions.II] = hmm.transitions_probabilities[res_index+1, Transitions.DD] = delta

hmm.validate()

(the first row is for the entry probabilities, so you should start at row 1 for the remaining ones).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants