nhmmer reports Alphabet mismatch #12

Closed
fbosnic opened this issue Oct 29, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@fbosnic

fbosnic commented Oct 29, 2021

I'm trying to run the nhmmer function but can't seem to get the alphabet right. Running

   import pyhmmer    
   seq1 = pyhmmer.easel.TextSequence(name=b"seq1", sequence="ACCGACA")
   seq2 = pyhmmer.easel.TextSequence(name=b"seq2", sequence="GGGCCAACA")
   rna = pyhmmer.easel.Alphabet.rna()
   dig1, dig2 = [s.digitize(rna) for s in [seq1, seq2]]
   builder = pyhmmer.plan7.Builder(rna, prior_scheme="alphabet")
   
   gen = pyhmmer.hmmer.nhmmer([dig1], [dig2], builder=builder)
   next(gen)

results in

    Traceback (most recent call last):
      File "pyhmmer_test.py", line 10, in <module>
        next(gen)
      File "/apps/conda/fbosnic/envs/test/lib/python3.8/site-packages/pyhmmer/hmmer.py", line 310, in _multi_threaded
        raise thread.error from None
      File "/apps/conda/fbosnic/envs/test/lib/python3.8/site-packages/pyhmmer/hmmer.py", line 112, in run
        self.process(index, query)
      File "/apps/conda/fbosnic/envs/test/lib/python3.8/site-packages/pyhmmer/hmmer.py", line 125, in process
        hits = self.search(query)
      File "/apps/conda/fbosnic/envs/test/lib/python3.8/site-packages/pyhmmer/hmmer.py", line 166, in search
        return self.pipeline.search_seq(query, self.sequences, self.builder)
      File "pyhmmer/plan7.pyx", line 3777, in pyhmmer.plan7.Pipeline.search_seq
      File "pyhmmer/plan7.pyx", line 3819, in pyhmmer.plan7.Pipeline.search_seq
    pyhmmer.errors.AlphabetMismatch: Expected Alphabet.amino(), found Alphabet.rna()

Am I using it correctly, or does the alphabet need to be set somewhere else as well?

As far as I could trace it, the following line in hmmer.py might be the cause:

    self.pipeline = Pipeline(alphabet=Alphabet.amino(), **options)
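
To make the suspected cause concrete, here is a minimal toy model of it. The names `ToyPipeline`, `toy_nhmmer_hardcoded` and `toy_nhmmer_from_query` are hypothetical stand-ins written for illustration, not pyhmmer's API or its internals; alphabets are plain strings here. It only shows why a pipeline whose alphabet is fixed to amino would reject an RNA query, and how deriving the alphabet from the query could avoid that.

```python
# Toy model only: these names are hypothetical and are not part of pyhmmer.

class AlphabetMismatch(ValueError):
    """Raised when a query's alphabet differs from the pipeline's alphabet."""


class ToyPipeline:
    def __init__(self, alphabet):
        self.alphabet = alphabet

    def search_seq(self, query_alphabet):
        # Same shape of check as the one reported in the traceback above.
        if query_alphabet != self.alphabet:
            raise AlphabetMismatch(
                f"Expected {self.alphabet}, found {query_alphabet}"
            )
        return []  # the toy model never finds hits


def toy_nhmmer_hardcoded(query_alphabet):
    # Alphabet fixed to "amino" whatever the input is, as in the quoted line.
    return ToyPipeline(alphabet="amino").search_seq(query_alphabet)


def toy_nhmmer_from_query(query_alphabet):
    # One possible fix (an assumption): take the alphabet from the query.
    return ToyPipeline(alphabet=query_alphabet).search_seq(query_alphabet)
```

In this model, `toy_nhmmer_hardcoded("rna")` raises `AlphabetMismatch`, while `toy_nhmmer_from_query("rna")` returns an empty hit list.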

@althonos althonos added the bug Something isn't working label Oct 29, 2021
@althonos
Owner

Hi @fbosnic,

The issue you're having does indeed come from that particular line. Since I was not testing nhmmer until now, I didn't realize there was a bug there. However, after writing more extensive test cases, I also noticed that I couldn't get pyhmmer.hmmer.nhmmer to return the same hits as the nhmmer binary. I implemented the pyhmmer.hmmer.nhmmer function on the assumption that nhmmer does the same thing as phmmer with the DNA or RNA alphabet instead of the amino alphabet. Since that may not be the case, I'll look into this issue further and see whether I need to rewrite the current implementation.

@fbosnic
Author

fbosnic commented Oct 31, 2021

Hi @althonos,

I see, thanks for the answer. I would be very interested in using the function once this is sorted out.

@althonos
Owner

Hi @fbosnic,

The difference between nhmmer and phmmer is that nhmmer uses the "long target pipeline" instead of the default pipeline used by everything else. This allows pipelined searches to run on target sequences longer than 100,000 residues, by using a sliding window to process each target.
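
As a rough illustration of the sliding-window idea only, the sketch below splits a long target into overlapping windows. The function `sliding_windows` and its `window_size` and `overlap` parameters are assumptions made for this example; this is not how HMMER or PyHMMER actually chooses or processes its windows.

```python
def sliding_windows(target_length, window_size, overlap):
    """Yield half-open (start, end) windows covering [0, target_length).

    Illustrative only: consecutive windows share `overlap` positions so that
    a match spanning a window boundary is fully contained in some window,
    provided the match is no longer than `overlap`.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    step = window_size - overlap
    start = 0
    while start < target_length:
        end = min(start + window_size, target_length)
        yield (start, end)
        if end == target_length:
            break  # the last window already reaches the end of the target
        start += step
```

For example, `list(sliding_windows(250_000, 100_000, 1_000))` gives `[(0, 100000), (99000, 199000), (198000, 250000)]`; the numbers are chosen for illustration and do not reflect HMMER's settings.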

I've released v0.4.9, which has a fixed implementation for nhmmer. The function now accepts HMM arguments in addition to DigitalSequence and DigitalMSA arguments, and will run a long target search pipeline. I have added tests to make sure the results are consistent between HMMER and PyHMMER on the same query HMM and target genome.
