multithreads running problem #161
It's not clear from your question whether you mean that your job was literally allocated only 1 core (by your cluster management software), or that you looked at CPU utilization statistics and saw ~100% rather than ~600%. If the former: see the documentation for your cluster management software (e.g. qsub). You have to give the … If the latter: hmmscan is I/O-intensive, and unless you have atypically wonderful disk speed, it will typically saturate these days at 1-2 cores, making it usually pointless (or even counterproductive) to ask for more. The default of 2 cores is set to be about right for typical use. I wouldn't recommend trying to use 6 cores (unless you have disks that can support ~3-6x typical SSD read speeds).
Hello, I'm having a similar issue. I am trying to give it more --cpu, and it appears to make no difference whether I put 1 or 28: I get no speedup. I am unsure whether the multithreading is working. Do I need to configure it differently?
Try 2. Above 2, there will typically be rapidly diminishing returns, depending on your hardware.
Thank you for your reply. It's an honor to discuss with you. Again, thank you so much for your time.
If you plot hmmsearch/phmmer/nhmmer search time as a function of the number
of cores (--cpu) used, going from one to two typically cuts search time by
almost 50%. Going from 2->3 typically gives very little benefit, and after
that search time tends to go up somewhat as you add cores. The reason for
this is that converting the raw data in a FASTA file into the data
structures HMMER uses takes about half as much time as searching a
database, and HMMER 3 only allocates one thread to processing the database
file. Therefore, once you have two worker threads (--cpu 2) consuming the
data that the parsing thread generates, adding more worker threads doesn't
help, as they wind up spending all of their time waiting for the parsing
thread to generate sequences for them to search. As the number of worker
threads gets large, performance will decrease below the 2-thread point
because the worker threads waiting for data from the parsing thread create
lock contention, which slows things down.
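To make that arithmetic concrete, here is a toy Python model of the one-parser/N-worker pipeline. It assumes fixed per-item parse and search times (parsing taking about half as long as searching, per the estimate above) and, unlike the real program, ignores lock contention, so it reproduces the plateau past 2 workers but not the slowdown. The function name is hypothetical and is not part of HMMER.

```python
"""Toy throughput model of a one-parser / N-worker search pipeline.

Assumptions (illustrative only): one parsing thread prepares items at a fixed
rate, each worker thread searches items at a fixed rate, and lock contention
is ignored, so adding workers never makes things slower in this model.
"""


def modeled_throughput(workers: int, parse_time: float, search_time: float) -> float:
    """Items per unit time for one parser feeding `workers` search threads.

    parse_time:  time for the single parser thread to prepare one item
    search_time: time for one worker thread to search one item
    """
    if workers < 1:
        raise ValueError("need at least one worker thread")
    parser_rate = 1.0 / parse_time        # the single parser caps the pipeline
    worker_rate = workers / search_time   # combined rate of all worker threads
    return min(parser_rate, worker_rate)  # the slower stage sets the pace


if __name__ == "__main__":
    # Parsing takes about half as long as searching one item.
    parse, search = 1.0, 2.0
    base = modeled_throughput(1, parse, search)
    for n in (1, 2, 3, 4, 8):
        t = modeled_throughput(n, parse, search)
        print(f"--cpu {n}: relative speed {t / base:.1f}x")
```

Run as a script, it reports 1.0x for --cpu 1 and 2.0x for every higher setting: the second worker doubles throughput, and after that the parser is the bottleneck. Setting parse_time at or above search_time, which roughly mimics the hmmscan case, leaves every worker count at the single-worker rate.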
The problem is even worse for hmmscan, as it takes about 200x as much data
to represent an HMM as it does to represent a sequence of the same length.
As a result, hmmscan is performance-limited by the time to read and parse
its input database, and typically sees no benefit from additional worker
threads.
Given that, the best way to take advantage of machines with many CPU cores
is to run multiple searches in parallel, each using 1-2 worker threads.
That's easy if you have many searches to do. If you have few searches to
perform, you could try chopping your database into pieces that you search
simultaneously, using the -Z option to set the database size (the number of
target sequences used in E-value calculations) to that of your original database.
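A minimal Python sketch of the chunk-and-`-Z` approach follows, assuming hmmsearch against a single FASTA database. The helper names (`split_fasta`, `hmmsearch_commands`, `run_in_parallel`), the file names, and the round-robin split are illustrative choices, not part of HMMER. Each chunk gets its own hmmsearch process with `--cpu 2`, and `-Z` is set to the sequence count of the original, unsplit database.

```python
"""Sketch: search chunks of one FASTA database in parallel with hmmsearch.

Assumptions: helper and file names are hypothetical; each chunk is searched
by its own hmmsearch process with --cpu 2, and -Z is set to the number of
sequences in the ORIGINAL (unsplit) database so per-sequence E-values are
computed as if the whole file had been searched at once.
"""
import subprocess
import tempfile
from pathlib import Path


def split_fasta(text: str, n_chunks: int) -> tuple[list[str], int]:
    """Split FASTA text into n_chunks strings without breaking any record.

    Returns the chunks and the total number of records (sequences).
    """
    records: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            records.append(line)        # a header line starts a new record
        elif records:
            records[-1] += line         # sequence lines join the current record
    chunks = ["" for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks] += record  # round-robin keeps chunk sizes similar
    return chunks, len(records)


def hmmsearch_commands(hmm_file: str, chunk_files: list[str],
                       total_seqs: int) -> list[list[str]]:
    """One hmmsearch argument list per chunk, each with its own table output."""
    return [
        ["hmmsearch", "--cpu", "2", "-Z", str(total_seqs),
         "--tblout", f"{chunk}.tbl", hmm_file, chunk]
        for chunk in chunk_files
    ]


def run_in_parallel(commands: list[list[str]]) -> list[int]:
    """Start every search at once, then wait for all; returns exit codes."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    demo = ">seq1\nMKVL\n>seq2\nMLLA\n>seq3\nMAAG\n"
    chunks, total = split_fasta(demo, 2)
    workdir = Path(tempfile.mkdtemp())
    paths = []
    for i, chunk in enumerate(chunks):
        path = workdir / f"chunk{i}.fa"
        path.write_text(chunk)
        paths.append(str(path))
    for cmd in hmmsearch_commands("query.hmm", paths, total):
        print(" ".join(cmd))
    # run_in_parallel(...) would launch the searches; it is not called here
    # because it needs hmmsearch on PATH and a real query.hmm.
```

The per-chunk `--tblout` tables would still need to be merged afterwards; with `-Z` fixed, their E-values are directly comparable.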
This I/O limit on performance is something of an artifact of how much disk
speeds have improved since HMMER 3 was released. At the time it was
released, disk bandwidths were typically about 200 MB/sec, and that was the
bottleneck on HMMER's performance. Today, SSDs with bandwidths of 5-7
GB/sec are common, which makes the method we use to parse database files
the bottleneck. For HMMER 4, we have a combination of a new database
format and a better parser design, which should break this bottleneck, but
HMMER 4 is not ready for general use.
Hope that helps,
-Nick
On Fri, Apr 9, 2021 at 2:01 PM Richard Allen White III wrote:
Thank you for your reply. It's an honor to discuss with you.
We mostly use Slurm. Should we include the qsub?
We are hoping to use more threads to get a speedup for testing.
Could we use 2 nodes with max threads?
Again thank you so much for your time.
Thank you very much, Nick!
Change default threads for nhmmer from 10 to 2. See <EddyRivasLab/hmmer#161>
HMMER 4
Is the HMMER 4 develop branch ready for general use now? How could I use it for multithreaded runs? Thanks.
Sorry, no, HMMER4 is not ready.
Hello!
I set --cpu 6 when I submitted my hmmscan command with qsub, but it ran on only 1 CPU. Does anyone know why, and how to run the command in parallel? Thanks a lot for your help.