multithreads running problem #161

Closed

JaneDY opened this issue Jun 25, 2019 · 8 comments

JaneDY commented Jun 25, 2019

Hello!
I set --cpu 6 when I qsub my hmmscan command, but it runs on only 1 CPU. Does anyone know why, and how to run the command in parallel? Thanks a lot for any help.

cryptogenomicon (Member) commented Jun 25, 2019

It's not clear from your question whether you mean that your job was literally allocated only 1 core (by your cluster management software), or that you looked at CPU utilization statistics and saw ~100%, not ~600%.

If the former: see the documentation for your cluster management software (i.e. qsub). You have to give the --cpu <n> option to hmmscan, and you also have to give an argument to qsub to request cores. This argument will probably be something like -pe smp <n>.
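For example, a minimal sketch of that pairing, assuming a Grid Engine cluster whose parallel environment is named `smp` (the PE name varies by site, and `Pfam-A.hmm` / `query.fasta` are placeholder file names):

```bash
#!/bin/bash
# Sketch of an SGE submission script; check your site's docs for the
# actual parallel environment name ("smp" here is an assumption).
#$ -pe smp 2         # ask the scheduler for 2 cores...
#$ -cwd

# ...and keep hmmscan's --cpu in step with what was requested.
hmmscan --cpu 2 Pfam-A.hmm query.fasta > query.out
```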

If the latter: hmmscan is I/O-intensive, and unless you have atypically fast disks, it will typically saturate these days at 1-2 cores, making it usually pointless (or even counterproductive) to ask for more. The default of 2 cores is set to be about right for typical use. I wouldn't recommend trying to use 6 cores (unless you have disks that can sustain ~3-6x typical SSD read speeds).

raw937 commented Apr 8, 2021

Hello,

I'm having a similar issue. I am trying to give it more --cpu, and it appears to make no difference whether I use 1 or 28; I get no speedup. I am unsure whether the multithreading is working. Do I need to configure it differently?

cryptogenomicon (Member) commented

Try 2. Above 2, there will typically be rapidly diminishing returns, depending on your hardware.

raw937 commented Apr 9, 2021

Thank you for your reply. It's an honor to discuss this with you.
We mostly use Slurm. Should we still include the qsub options?
We are hoping to use more threads to get a speedup for testing.
Could we use 2 nodes with max threads?

Again thank you so much for your time.

npcarter (Member) commented Apr 9, 2021 via email

If you plot hmmsearch/phmmer/nhmmer search time as a function of the number of cores (--cpu) used, going from one to two typically cuts search time by almost 50%. Going from 2 to 3 typically gives very little benefit, and after that search time tends to go up somewhat as you add cores.

The reason for this is that converting the raw data in a FASTA file into the data structures HMMER uses takes about half as much time as searching a database, and HMMER 3 only allocates one thread to processing the database file. Therefore, once you have two worker threads (--cpu 2) consuming the data that the parsing thread generates, adding more worker threads doesn't help, as they wind up spending all of their time waiting for the parsing thread to generate sequences for them to search. As the number of worker threads gets large, performance will decrease below the 2-thread point, because the worker threads waiting for data from the parsing thread create contention for lock data structures, which slows things down.

The problem is even worse for hmmscan, as it takes about 200x as much data to represent an HMM as it does to represent a sequence of the same length. As a result, hmmscan is performance-limited by the time to read and parse its input database, and typically sees no benefit from additional worker threads.

Given that, the best way to take advantage of machines with many CPU cores is to run multiple searches in parallel, each using 1-2 worker threads. That's easy if you have many searches to do. If you have few searches to perform, you could try chopping your database into pieces that you search simultaneously and using the -Z option to set the database size to the size of your original database.

This I/O limit on performance is something of an artifact of how much disk speeds have improved since HMMER 3 was released. At the time it was released, disk bandwidths were typically about 200 MB/sec, and that was the bottleneck on HMMER's performance. Today, SSDs with bandwidths of 5-7 GB/sec are common, which makes the method we use to parse database files the bottleneck. For HMMER 4, we have a combination of a new database format and a better parser design, which should break this bottleneck, but HMMER 4 is not ready for general use.

Hope that helps,

-Nick
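To make the chop-and-search-in-parallel suggestion concrete, here is a minimal sketch, assuming a Slurm cluster (per the question above). The file names (`profile.hmm`, `targets.fasta`), the chunk count, and the use of seqkit as the FASTA splitter are illustrative assumptions, not part of HMMER:

```bash
#!/bin/bash
# Sketch: search one large sequence database as 4 chunks in parallel,
# each chunk as its own 2-thread Slurm job.
# Assumptions: seqkit is installed; profile.hmm and targets.fasta are
# placeholder file names.

DB=targets.fasta
NSEQ=$(grep -c '^>' "$DB")   # sequence count of the FULL database, for -Z

# Split the FASTA into 4 roughly equal parts (any splitter works;
# seqkit split2 is shown here as one option).
seqkit split2 -p 4 -O chunks "$DB"

# -Z pins the effective database size, so each chunk's E-values come
# out the same as a single search over the whole database.
for chunk in chunks/*.fasta; do
    sbatch --cpus-per-task=2 --wrap \
      "hmmsearch --cpu 2 -Z $NSEQ profile.hmm $chunk > ${chunk%.fasta}.out"
done
```

Because -Z fixes the statistics, the per-chunk outputs can be concatenated and re-sorted afterwards as if they came from one search.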

raw937 commented Apr 9, 2021

Thank you very much Nick!

nylander added a commit to nylander/birdscanner2 that referenced this issue May 28, 2024
Change default threads for nhmmer from 10 to 2. See <EddyRivasLab/hmmer#161>
jieegao commented Oct 11, 2024

> For HMMER 4, we have a combination of a new database format and a better parser design, which should break this bottleneck, but HMMER 4 is not ready for general use.

Is the HMMER 4 develop branch ready for general use now? How could I use it for multithreaded runs? Thanks.

cryptogenomicon (Member) commented

Sorry, no, HMMER4 is not ready.
