Implement multi-processing in fastHadd to merge DQM histograms faster #10469

Merged
merged 2 commits into cms-sw:CMSSW_7_5_X on Jul 30, 2015

Conversation

dmitrijus
Contributor

(forward port of #10280)

Removed a few small optimizations:

  • the first file is no longer added to the set directly, but merged on top of an empty set, just like any other file;
  • removed the dirnames, objnames, fullnames string caches;
  • removed fullname from MicroME, since it can be generated from the other fields;
  • replaced the 'custom' merge algorithm with std::set::insert (with hints); see the sketch below.

As a result, the code is much simpler and does not increase the merging time.
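As an illustration of the hinted insertion, here is a minimal sketch, assuming a simplified MicroME ordered by its path fields; the struct layout and the mergeSorted helper are illustrative, not the actual fastHadd code:

```cpp
#include <set>
#include <string>

// Simplified stand-in for fastHadd's MicroME; the real struct also
// carries the ROOT object. Ordering is by (dirname, objname).
struct MicroME {
  std::string dirname;
  std::string objname;
  bool operator<(const MicroME &rhs) const {
    if (dirname != rhs.dirname) return dirname < rhs.dirname;
    return objname < rhs.objname;
  }
};

// Merge an already-sorted range into `out`. Passing the previous
// position as a hint lets std::set::insert place consecutive elements
// in amortized constant time when the hint is accurate, instead of
// paying O(log n) per element.
template <typename It>
void mergeSorted(std::set<MicroME> &out, It first, It last) {
  auto hint = out.begin();
  for (; first != last; ++first) {
    hint = out.insert(hint, *first);
    // In fastHadd, when the element already exists, the incoming
    // histogram is added onto the stored one instead.
  }
}
```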

Implement multi-process merging in fastHadd.

A group of processes is organized in a virtual tree.

Each process has a unique node_id (1 … number_of_proc) and forks up to two child processes, with ids 2*node_id and 2*node_id + 1, as long as the new node_id <= number_of_proc.

Each process merges all files assigned to it round-robin
(file_index % number_of_proc == node_id - 1),
plus all histograms from its child processes.
IPC is done via pipe()s.
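A minimal sketch of this process tree, assuming POSIX fork()/pipe(); the merging body of runNode is elided and the helper names are hypothetical, not the actual fastHadd functions:

```cpp
#include <sys/wait.h>
#include <unistd.h>

static void runNode(int node_id, int nprocs, int out_fd);

// Fork the child with the given id, if it is within range; returns
// the read end of its pipe, or -1 when no child was spawned.
static int spawnChild(int child_id, int nprocs) {
  if (child_id > nprocs) return -1;
  int fds[2];
  if (pipe(fds) != 0) return -1;
  if (fork() == 0) {    // child writes its merged result into fds[1]
    close(fds[0]);
    runNode(child_id, nprocs, fds[1]);
    _exit(0);
  }
  close(fds[1]);        // parent keeps only the read end
  return fds[0];
}

static void runNode(int node_id, int nprocs, int out_fd) {
  // Children of node i are 2*i and 2*i + 1, as described above.
  int left = spawnChild(2 * node_id, nprocs);
  int right = spawnChild(2 * node_id + 1, nprocs);

  // ... merge this node's own share of files, then read the
  // serialized histogram sets arriving on `left` and `right`,
  // merge them in, and write the combined set to out_fd ...
  (void)left; (void)right; (void)out_fd;

  while (wait(nullptr) > 0) {}  // reap any children
}
```

Under this sketch the root of the tree would be started as runNode(1, number_of_proc, output_fd), and all remaining nodes are created recursively by the forks above.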

This gives reasonably good latency: with 7 processes it brings the merging time down to about 11 s (see below).

I have tried using threads to merge faster, but this was futile: ROOT simply has too much global internal state, and serializing access to ROOT was not an option, since the ROOT calls take the majority of the time.

I've also tried fastParallelHadd.py (varying its parameters),
but the lowest time I could get was ~26 seconds.

All numbers were obtained with fastHadd merging 100 files; summary:

- 50.321 s, 227 MB  -> old fastHadd, no changes
- 50.198 s, 225 MB  -> new fastHadd
- 18.719 s, 550 MB  -> new fastHadd, using 3 processes
- 11.143 s, 1202 MB -> new fastHadd, using 7 processes

Memory is the sum of proportional set size (PSS) over all processes; shared pages are counted fractionally, so the sum does not double-count them.
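For reference, a small Linux-only sketch of how such a PSS total can be computed, by summing the "Pss:" lines of /proc/&lt;pid&gt;/smaps for each process in the tree (this measurement tool is not part of fastHadd; the pid list is supplied on the command line):

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Sum of all "Pss:" entries (reported in kB) from /proc/<pid>/smaps.
long pssKb(int pid) {
  std::ifstream smaps("/proc/" + std::to_string(pid) + "/smaps");
  std::string line;
  long total = 0;
  while (std::getline(smaps, line)) {
    if (line.compare(0, 4, "Pss:") == 0) {
      long kb = 0;
      std::istringstream(line.substr(4)) >> kb;
      total += kb;
    }
  }
  return total;
}

int main(int argc, char **argv) {
  long sum = 0;
  for (int i = 1; i < argc; ++i)
    sum += pssKb(std::atoi(argv[i]));
  std::cout << sum / 1024 << " MB total PSS\n";
}
```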

I will make ports to 75X and 76X if and once this is accepted.

Dmitrijus Bugelskis added 2 commits July 30, 2015 10:16
1. fastHadd: simplify merging and implement multi-process merging
   (same message as the description above).
(cherry picked from commit 92b609a)
2. Previously, ROOT would load itself anew in every forked process.
   With this change, merging uses less memory and is slightly faster.

PSS for -j 7 is ~1008 MB and the merge takes 10.923 s.

(cherry picked from commit b5439df)
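A hedged sketch of that idea, assuming the fix amounts to touching ROOT once in the parent so that forked children inherit the already-loaded libraries via copy-on-write instead of each loading ROOT on its own (the exact calls used by fastHadd may differ):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include "TH1.h"
#include "TROOT.h"

int main() {
  // Initialize ROOT once in the parent, before any fork().
  TH1::AddDirectory(false);
  gROOT->GetVersion();

  for (int i = 0; i < 2; ++i) {
    if (fork() == 0) {
      // The child reuses the parent's already-mapped ROOT pages
      // (copy-on-write), avoiding a fresh load per process.
      // ... merging work goes here ...
      _exit(0);
    }
  }
  while (wait(nullptr) > 0) {}  // reap children
  return 0;
}
```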
@cmsbuild
Contributor

A new Pull Request was created by @dmitrijus (Dmitrijus) for CMSSW_7_5_X.

Implement multi-processing in fastHadd to merge DQM histograms faster

It involves the following packages:

DQMServices/Components

@cmsbuild, @danduggan, @deguio can you please review it and eventually sign? Thanks.
@barvic this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@deguio
Contributor

deguio commented Jul 30, 2015

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs once checked with relvals in the development release cycle of CMSSW or unless it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @Degano, @smuzaffar

davidlange6 added a commit that referenced this pull request Jul 30, 2015
  Implement multi-processing in fastHadd to merge DQM histograms faster
@davidlange6 davidlange6 merged commit 75184ba into cms-sw:CMSSW_7_5_X Jul 30, 2015