Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to search HMM file with multiple profiles, ie Pfam? #23

Closed
nextgenusfs opened this issue Aug 12, 2022 · 2 comments
Closed

how to search HMM file with multiple profiles, ie Pfam? #23

nextgenusfs opened this issue Aug 12, 2022 · 2 comments
Labels
documentation Improvements or additions to documentation question Further information is requested

Comments

@nextgenusfs
Copy link

I'm probably missing something obvious, but I can't seem to figure out how to use pyhmmer to search against a database of HMM profiles, it Pfam. When I try to read the file, it does not seem to iterate over the individual profiles.

Loading sequences you can do this, which create a list of the digitized sequences in

 with pyhmmer.easel.SequenceFile('inputfile.fasta', digital=True, alphabet=alphabet) as seq_file:
    sequences = list(seq_file)

I naively assumed that the HMMFile would do the same?

with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
    hmm = list(hmm_file.read())

This errors out with TypeError: 'pyhmmer.plan7.HMM' object is not an iterator.

If I just run this:

with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
    hmm = hmm_file.read()

Then it is just the single HMM profile (the first one in the Pfam database 1-cysPrx_C). So what am I doing wrong?

@nextgenusfs
Copy link
Author

Okay, well I guess this is the way to do it.

import pyhmmer
# load sequences
alphabet = pyhmmer.easel.Alphabet.amino()
with pyhmmer.easel.SequenceFile('queries.fasta', digital=True, alphabet=alphabet) as seq_file:
    sequences = list(seq_file)
# then load Pfam and search
with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
    hits = list(pyhmmer.hmmsearch(hmm_file, sequences))

@althonos
Copy link
Owner

Hi @nextgenusfs ! Happy that you figured it out. The issue was in here:

I naively assumed that the HMMFile would do the same?

with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
   hmm = list(hmm_file.read())

If you wanted to read all the HMMs from the input file, you'd have to do:

with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
   hmms = list(hmm_file)

But what you end up doing in your second comment is even better, because it will stream the HMMs as they are read from the file, instead of pre-loading them all in a list (which, for a HMM database like Pfam, can load a lot of things in memory),

@althonos althonos added documentation Improvements or additions to documentation question Further information is requested labels Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants