segmentation fault in hmmalign #36

Closed
ChristophKnapp opened this issue Apr 4, 2023 · 8 comments
Labels
bug Something isn't working

Comments

ChristophKnapp commented Apr 4, 2023

Hi there

This might be my fault (I'm just getting started with pyhmmer), or it might be related to the issue "new empty HMM segfaults when saved to file", but when I execute this code:

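import pyhmmer as phmm
# `shared_mount` is a path prefix defined elsewhere in the reporter's script (not shown)
with phmm.easel.SequenceFile(shared_mount+"DB/pfam_IPR002213_top19.fasta") as sf:
    sequences = sf.read_block()  # read every record as a text sequence
sequences = sequences.digitize(phmm.easel.Alphabet.amino())  # encode with the amino-acid alphabet
hmm = phmm.plan7.HMM(100, phmm.easel.Alphabet.amino())  # new, empty HMM of length 100
align = phmm.hmmer.hmmalign(hmm, sequences)  # crashes with "Segmentation fault (core dumped)"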

I'm getting a "Segmentation fault (core dumped)".

Since I would expect a meaningful error if my syntax were incorrect, reporting it here seemed like the only way to get help.

The pfam_IPR002213_top19.fasta file contains the top 19 sequences of a much larger file.

Regards

Christoph

Contributor

zdk123 commented Apr 4, 2023

You are aligning sequences to an empty HMM. What are you expecting to happen?

Author

ChristophKnapp commented Apr 4, 2023 via email

Contributor

zdk123 commented Apr 4, 2023

I was simply asking if you expected that code to work or if you were just flagging this for better error handling...

Author

ChristophKnapp commented Apr 4, 2023 via email

Contributor

zdk123 commented Apr 4, 2023

@ChristophKnapp I've re-read your comment several times and I still don't know what you need help with.

I agree the segfault is not ideal. Perhaps it comes from HMMER3, like in the other issue you linked, and can't be fixed in the Python code; even then, pyhmmer could still prevent the segfault by raising an error when an empty HMM object is used.

If you're trying to construct an HMM for use in an actual project, you either have to build an HMM from sequences yourself (see this example) or read an existing HMM from a file.
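To make "build an HMM from sequences yourself" a little more concrete, here is a toy, pure-Python sketch of the core idea: estimating one emission distribution per alignment column from a set of aligned sequences, with add-one pseudocounts. It is not pyhmmer's API (in pyhmmer this job is done by `pyhmmer.plan7.Builder`), and a real builder does much more: transition probabilities, sequence weighting, Dirichlet priors, and deciding which columns become match states. The function name `column_profile` and the toy alignment are hypothetical.

```python
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Estimate residue probabilities for each column of an alignment (toy sketch).

    Every column becomes one 'match state' here; real builders treat gap-heavy
    columns as insertions instead. Gaps ('-') are not counted, and the
    pseudocount keeps unseen residues from getting probability zero.
    """
    if not alignment:
        raise ValueError("need at least one aligned sequence")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts[aa] for aa in AMINO) + pseudocount * len(AMINO)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO})
    return profile


# Hypothetical toy alignment of three short protein fragments
toy_alignment = ["MKV-LA", "MKVALA", "MRV-LS"]
profile = column_profile(toy_alignment)
print(len(profile))                          # one distribution per column
print(max(profile[0], key=profile[0].get))   # most likely residue in column 1
```

Each column's probabilities sum to 1 by construction, which is exactly the property a hand-made "empty" model lacks.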

Owner

althonos commented Apr 4, 2023

Hi @ChristophKnapp,

I understand why Zachary's comment could come across as snappy, but he has always been very helpful and there is no reason to take it that way. Less mature projects often have confusing error paths, so people frequently report issues where the real problem is that the error itself could be clearer (for instance, the Rust language has an entire working group dedicated to improving error messages).

To address the issue itself: what you are doing is indeed not standard, and probably not something the HMMER CLI would have allowed. hmmalign (in HMMER) aligns several sequences together by using an HMM as the reference. This is really helpful when you are trying to align several new members of a protein family, for instance. By manually creating an empty HMM, you enter a path HMMER never planned for, where it can be hard to tell why you're getting a segfault.

In any case, end users like you should not get a segfault, so I will look into the code path so that it raises a proper error before crashing. On your side, you should follow Zachary's suggestion and either build an initial HMM from the input alignment or load an external HMM, if aligning your sequences to a reference HMM is what you intend to do.
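The reference-based idea behind hmmalign can be illustrated with a toy, pure-Python sketch under simplifying assumptions: each profile column holds residue probabilities, insertions and deletions get fixed penalties, and every sequence is aligned globally to the columns by dynamic programming. The output follows an A2M-like convention (uppercase for residues placed in profile columns, lowercase for insertions, `-` for skipped columns). HMMER's real hmmalign uses a full Plan7 model and its own alignment procedure; the names `align_to_profile` and `peaked_column` and all numbers are hypothetical.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def align_to_profile(seq: str, profile: list[dict[str, float]],
                     ins_penalty: float = -4.0, del_penalty: float = -4.0) -> str:
    """Globally align `seq` to the profile columns; return an A2M-like string."""
    n, m = len(seq), len(profile)
    # score[i][j]: best score placing seq[:i] against the first j profile columns
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                # log-odds of the residue in column j against a uniform 1/20 background
                p = profile[j - 1].get(seq[i - 1], 1e-9)
                candidates.append((score[i - 1][j - 1] + math.log(p * len(AMINO)), "M"))
            if i > 0:
                candidates.append((score[i - 1][j] + ins_penalty, "I"))
            if j > 0:
                candidates.append((score[i][j - 1] + del_penalty, "D"))
            score[i][j], back[i][j] = max(candidates)

    # Trace back from the last cell to recover the alignment
    out, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "M":
            out.append(seq[i - 1].upper())
            i, j = i - 1, j - 1
        elif move == "I":
            out.append(seq[i - 1].lower())
            i -= 1
        else:
            out.append("-")
            j -= 1
    return "".join(reversed(out))


def peaked_column(residue: str, p: float = 0.81) -> dict[str, float]:
    """Hypothetical column: `residue` gets probability p, the others share 1 - p."""
    rest = (1 - p) / (len(AMINO) - 1)
    return {aa: (p if aa == residue else rest) for aa in AMINO}


profile = [peaked_column(aa) for aa in "MKVLA"]
for s in ["MKVLA", "MKVQLA", "MKLA"]:
    print(align_to_profile(s, profile))
print(align_to_profile("MKV", []))  # a profile with no columns
```

Because every sequence is placed onto the same reference columns, the uppercase positions line up across sequences, which is what makes the reference useful. With a profile that has no columns, every residue ends up as an insertion: there is nothing to align against.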

althonos added the bug (Something isn't working) label Apr 4, 2023
Author

ChristophKnapp commented Apr 5, 2023 via email

Owner

This bug was caused by a combination of two things, both of which are now fixed in v0.7.4:

  • TraceAligner didn't validate its input HMMs, and neither did the underlying HMMER code, so it could crash on invalid data; this was fixed in fe6b916.
  • Creating an empty HMM from Python didn't produce a valid HMM; this was fixed in 002c68e, so that HMM.__init__ now fills in arbitrary probabilities to ensure the HMM is valid.
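The two fixes above can be illustrated with a toy, pure-Python sketch: a model whose constructor fills in arbitrary but valid (here uniform) emission probabilities, plus a `validate` method that raises a Python `ValueError` before a malformed model reaches code that would otherwise crash. `ToyHMM` and its fields are hypothetical and only cover match-state emissions; this is not the code from fe6b916 or 002c68e.

```python
import math
from dataclasses import dataclass, field

ALPHABET_SIZE = 20  # amino acids


@dataclass
class ToyHMM:
    """Hypothetical toy model holding one emission distribution per match state."""

    M: int
    emissions: list[list[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.M < 1:
            raise ValueError("a model needs at least one match state")
        if not self.emissions:
            # Valid by construction: arbitrary but well-formed (uniform) probabilities
            self.emissions = [[1.0 / ALPHABET_SIZE] * ALPHABET_SIZE for _ in range(self.M)]

    def validate(self, tolerance: float = 1e-4) -> None:
        """Raise ValueError instead of letting downstream code consume a broken model."""
        if len(self.emissions) != self.M:
            raise ValueError(f"expected {self.M} emission rows, found {len(self.emissions)}")
        for k, row in enumerate(self.emissions, start=1):
            if len(row) != ALPHABET_SIZE or any(p < 0 for p in row):
                raise ValueError(f"match state {k}: malformed distribution")
            if not math.isclose(sum(row), 1.0, abs_tol=tolerance):
                raise ValueError(f"match state {k}: probabilities sum to {sum(row):.3f}, not 1")


ToyHMM(100).validate()  # passes: the constructor filled in uniform probabilities

# A model with all-zero emissions is rejected with a Python error, not a crash
zeroed = ToyHMM(3, emissions=[[0.0] * ALPHABET_SIZE for _ in range(3)])
message = None
try:
    zeroed.validate()
except ValueError as err:
    message = str(err)
print(message)  # match state 1: probabilities sum to 0.000, not 1
```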

3 participants