Implement multi-processing in fastHadd to merge DQM histograms faster #10469

Merged
merged 2 commits into cms-sw:CMSSW_7_5_X on Jul 30, 2015

Conversation

dmitrijus
Contributor

(forward port of #10280)

Removed a few small optimizations:

  • the first file is no longer added to the set directly, but merged on top of an empty set, just like any other file;
  • removed the dirnames, objnames, fullnames string caches;
  • removed fullname from MicroME, since it can be generated from the other fields;
  • replaced the 'custom' merge algorithm with std::set::insert (with hints); see the sketch below.

As a result, the code is much simpler and does not increase the merging time.
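As an illustration of the hinted insertion, here is a minimal sketch, assuming a simplified MicroME ordered by its path fields; the struct layout and the mergeSorted helper are illustrative, not the actual fastHadd code:

```cpp
#include <set>
#include <string>

// Simplified stand-in for fastHadd's MicroME; the real struct also
// carries the ROOT object. Ordering is by (dirname, objname).
struct MicroME {
  std::string dirname;
  std::string objname;
  bool operator<(const MicroME &rhs) const {
    if (dirname != rhs.dirname) return dirname < rhs.dirname;
    return objname < rhs.objname;
  }
};

// Merge an already-sorted range into `out`. Passing the previous
// position as a hint lets std::set::insert place consecutive elements
// in amortized constant time when the hint is accurate, instead of
// paying O(log n) per element.
template <typename It>
void mergeSorted(std::set<MicroME> &out, It first, It last) {
  auto hint = out.begin();
  for (; first != last; ++first) {
    hint = out.insert(hint, *first);
    // In fastHadd, when the element already exists, the incoming
    // histogram is added onto the stored one instead.
  }
}
```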

Implement multi-process merging in fastHadd.

A group of processes is organized in a virtual tree.

Each process has a unique node_id (1 … number_of_proc) and forks up to two child processes, with ids 2*node_id and 2*node_id + 1, as long as the new node_id <= number_of_proc.

Each process merges all files assigned to it round-robin
(file_index % number_of_proc == node_id - 1),
plus all histograms from its child processes.
IPC is done via pipe()s.
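A minimal sketch of this process tree, assuming POSIX fork()/pipe(); the merging body of runNode is elided and the helper names are hypothetical, not the actual fastHadd functions:

```cpp
#include <sys/wait.h>
#include <unistd.h>

static void runNode(int node_id, int nprocs, int out_fd);

// Fork the child with the given id, if it is within range; returns
// the read end of its pipe, or -1 when no child was spawned.
static int spawnChild(int child_id, int nprocs) {
  if (child_id > nprocs) return -1;
  int fds[2];
  if (pipe(fds) != 0) return -1;
  if (fork() == 0) {    // child writes its merged result into fds[1]
    close(fds[0]);
    runNode(child_id, nprocs, fds[1]);
    _exit(0);
  }
  close(fds[1]);        // parent keeps only the read end
  return fds[0];
}

static void runNode(int node_id, int nprocs, int out_fd) {
  // Children of node i are 2*i and 2*i + 1, as described above.
  int left = spawnChild(2 * node_id, nprocs);
  int right = spawnChild(2 * node_id + 1, nprocs);

  // ... merge this node's own share of files, then read the
  // serialized histogram sets arriving on `left` and `right`,
  // merge them in, and write the combined set to out_fd ...
  (void)left; (void)right; (void)out_fd;

  while (wait(nullptr) > 0) {}  // reap any children
}
```

Under this sketch the root of the tree would be started as runNode(1, number_of_proc, output_fd), and all remaining nodes are created recursively by the forks above.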

This gives reasonably good latency: with 7 processes it brings the merging time down to about 11 s (see below).

I have tried using threads to merge faster, but this was futile: ROOT simply has too much global internal state, and serializing access to ROOT was not an option, since the ROOT calls take the majority of the time.

I've also tried fastParallelHadd.py (varying its parameters),
but the lowest time I could get was ~26 seconds.

All numbers were obtained with fastHadd merging 100 files; summary:

- 50.321 s, 227 MB  -> old fastHadd, no changes
- 50.198 s, 225 MB  -> new fastHadd
- 18.719 s, 550 MB  -> new fastHadd, using 3 processes
- 11.143 s, 1202 MB -> new fastHadd, using 7 processes

Memory is the sum of proportional set size (PSS) over all processes; shared pages are counted fractionally, so the sum does not double-count them.
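For reference, a small Linux-only sketch of how such a PSS total can be computed, by summing the "Pss:" lines of /proc/&lt;pid&gt;/smaps for each process in the tree (this measurement tool is not part of fastHadd; the pid list is supplied on the command line):

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Sum of all "Pss:" entries (reported in kB) from /proc/<pid>/smaps.
long pssKb(int pid) {
  std::ifstream smaps("/proc/" + std::to_string(pid) + "/smaps");
  std::string line;
  long total = 0;
  while (std::getline(smaps, line)) {
    if (line.compare(0, 4, "Pss:") == 0) {
      long kb = 0;
      std::istringstream(line.substr(4)) >> kb;
      total += kb;
    }
  }
  return total;
}

int main(int argc, char **argv) {
  long sum = 0;
  for (int i = 1; i < argc; ++i)
    sum += pssKb(std::atoi(argv[i]));
  std::cout << sum / 1024 << " MB total PSS\n";
}
```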

I will make ports to 75X and 76X if and once this is accepted.

Dmitrijus Bugelskis added 2 commits July 30, 2015 10:16
1. fastHadd: simplify merging and implement multi-process merging
   (same message as the description above).
(cherry picked from commit 92b609a)
2. Previously, ROOT would load itself anew in every forked process.
   With this change, merging uses less memory and is slightly faster.

PSS for -j 7 is ~1008 MB and the merge takes 10.923 s.

(cherry picked from commit b5439df)
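A hedged sketch of that idea, assuming the fix amounts to touching ROOT once in the parent so that forked children inherit the already-loaded libraries via copy-on-write instead of each loading ROOT on its own (the exact calls used by fastHadd may differ):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include "TH1.h"
#include "TROOT.h"

int main() {
  // Initialize ROOT once in the parent, before any fork().
  TH1::AddDirectory(false);
  gROOT->GetVersion();

  for (int i = 0; i < 2; ++i) {
    if (fork() == 0) {
      // The child reuses the parent's already-mapped ROOT pages
      // (copy-on-write), avoiding a fresh load per process.
      // ... merging work goes here ...
      _exit(0);
    }
  }
  while (wait(nullptr) > 0) {}  // reap children
  return 0;
}
```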
@cmsbuild
Contributor

A new Pull Request was created by @dmitrijus (Dmitrijus) for CMSSW_7_5_X.

Implement multi-processing in fastHadd to merge DQM histograms faster

It involves the following packages:

DQMServices/Components

@cmsbuild, @danduggan, @deguio can you please review it and eventually sign? Thanks.
@barvic this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@deguio
Contributor

deguio commented Jul 30, 2015

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs once checked with relvals in the development release cycle of CMSSW or unless it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar

davidlange6 added a commit that referenced this pull request Jul 30, 2015
  Implement multi-processing in fastHadd to merge DQM histograms faster
@davidlange6 davidlange6 merged commit 75184ba into cms-sw:CMSSW_7_5_X Jul 30, 2015