Shared memory transfers, parallel loading, avoid concatenates #150

JelleAalbers · 2019-04-17T20:46:36Z

This introduces three performance improvements to strax's core:

- Use shared memory rather than pickle (multiprocessing's default) for transferring numpy arrays between worker processes. The code for this is in a separate package, npshmex, which in turn uses the sharedarray package. This allows strax to parallelize over more cores. It's especially important if ParallelSourcePlugin's optimizations aren't or cannot be applied. Currently only raw_records is a ParallelSourcePlugin, while we'll often want to process starting from records.
  - On crashes, this can leave stale files in /dev/shm, which will most likely linger until the machine reboots. We will most likely make shared memory an optional behaviour in the future, so we can turn it off during normal analysis-facility use of strax but take advantage of it for (at least online) processing.
- Saving and loading (and their associated compression and decompression) are now performed in worker threads rather than all in the main thread. Since compressors/decompressors should release the GIL, save and load operations now take advantage of multiple cores, while the main thread is free to do more important things.
- We avoid some unnecessary calls to np.concatenate, for example by not merging iterators for plugins that only have one dependency. Concatenation is relatively expensive because it requires a full memory copy.
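To illustrate the second point (loading in worker threads), here is a minimal sketch. It is not the strax implementation: `load_chunks` is a hypothetical helper, and zlib from the standard library stands in for whichever compressor is actually used. The idea is only that a decompressor which releases the GIL lets several chunks decompress concurrently while the caller consumes them in order.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def load_chunks(compressed_chunks, max_workers=4):
    """Decompress chunks in worker threads and yield them in input order.

    Hypothetical helper for illustration; not part of strax.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # zlib releases the GIL while working on large buffers, so several
        # chunks can be decompressed at once. pool.map preserves input order.
        yield from pool.map(zlib.decompress, compressed_chunks)


# Stand-in for chunks on disk: three compressed byte strings.
raw_chunks = [bytes([i]) * 1_000_000 for i in range(3)]
stored = [zlib.compress(chunk) for chunk in raw_chunks]

loaded = list(load_chunks(stored))
print([len(chunk) for chunk in loaded])
```

Threads only help here because the heavy work happens outside the GIL; a pure-Python decoder would not scale the same way.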

As a test, I made and saved records and peaks from raw_records for run 001944 in the xenonnt daqtest db. This would be about 10 GB raw from the DAQ (3.3 GB compressed raw_records). The task took 74.4 seconds before these changes, and now takes 18 seconds. Note:

- This is using max_workers=10 on eb0, reading from /data/eventbuilder-sharded, and writing to /home/xedaq. Adding more workers seems to give no benefit and actually slightly hurts performance.
- We got high data rates from strax before (even higher than the post-changes rate quoted here) because we were also making raw_records, which is a ParallelSourcePlugin, and thus triggers special optimizations described in the docs linked above. This PR makes strax work quickly also without these optimizations, since they might be a bit fiddly to enable for all plugins. Moreover, we had some preliminary single-electron tail cutting enabled before, which we turned off during the DAQ test since it inadvertently removed most S2s.
- At the new rate we're processing ~550 MB_raw/sec on eb0 without ParallelSourcePlugin (though this is only from raw_records -> peaks). The rate with ParallelSourcePlugin (online processing configuration) likely also improves (though not by all that much, I would guess).
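For reference, the quoted rate follows from the numbers above. This is just the arithmetic, taking "about 10 GB" as exactly 10 GB (an assumption) and 1 GB as 1000 MB:

```python
raw_size_gb = 10.0    # "about 10 GB raw", taken as exact for this estimate
time_before_s = 74.4  # runtime before these changes
time_after_s = 18.0   # runtime after these changes

speedup = time_before_s / time_after_s
rate_before_mb_s = raw_size_gb * 1000 / time_before_s
rate_after_mb_s = raw_size_gb * 1000 / time_after_s

print(f"speedup: {speedup:.1f}x")
print(f"before:  {rate_before_mb_s:.0f} MB_raw/sec")
print(f"after:   {rate_after_mb_s:.0f} MB_raw/sec")
```

That gives roughly a 4.1x speedup and about 556 MB_raw/sec afterwards, consistent with the ~550 MB_raw/sec quoted.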

Finally, two minor changes:

- This makes data created using strax versions before Adjust file naming convention #143 (i.e. essentially everything) readable again, by checking for two metadata file names. We can remove this in the future when we no longer care about old raw data, but it's a very small overhead.
- This introduces GIL monitoring if using strax.utils.profile_threaded, using the very nice gil_load package. It does not work inside jupyter notebooks, because we have to start monitoring the GIL before starting threads, but notebooks have threads of their own.
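The backward-compatible metadata lookup amounts to trying the new file name first and falling back to the old one. A minimal sketch of that idea follows; `find_metadata` and both file names are hypothetical placeholders, not the names strax actually uses.

```python
import os
import tempfile


def find_metadata(dirname, candidates):
    """Return the path of the first candidate metadata file that exists.

    Hypothetical helper for illustration; not part of strax.
    """
    for name in candidates:
        path = os.path.join(dirname, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(
        f"No metadata file in {dirname}; tried {candidates}")


# Placeholder names: new-style first, old-style as the fallback.
candidates = ["new-style-metadata.json", "metadata.json"]

with tempfile.TemporaryDirectory() as dirname:
    # Simulate old-format data: only the old-style file is present.
    open(os.path.join(dirname, "metadata.json"), "w").close()
    found_name = os.path.basename(find_metadata(dirname, candidates))

print(found_name)
```

The extra cost is at most one failed existence check per data directory, which is why the overhead is small.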

JelleAalbers · 2019-04-20T05:51:53Z

The shared memory transfers were causing problems on macOS. Since this is an experimental new feature anyway, I made it optional and off by default for now. To enable it (and speed up low-level processing), set the context option allow_shm = True.
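As a rough picture of the kind of transfer this option toggles, here is a sketch using the standard library's multiprocessing.shared_memory (Python 3.8+). This is a swapped-in technique, not what npshmex does (it builds on sharedarray), and `to_shm` / `from_shm` are hypothetical helpers. Only a small descriptor would need to be pickled between processes; the array data itself stays in shared memory. For simplicity both sides run in one process here.

```python
import numpy as np
from multiprocessing import shared_memory


def to_shm(arr):
    """Copy arr into a new shared memory block (hypothetical helper).

    Returns the block and a small picklable descriptor for the receiver.
    """
    shm = shared_memory.SharedMemory(create=True, size=max(arr.nbytes, 1))
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm, (shm.name, arr.shape, arr.dtype)


def from_shm(descriptor):
    """Attach to the named block and copy the array out of it."""
    name, shape, dtype = descriptor
    shm = shared_memory.SharedMemory(name=name)
    try:
        # One memory copy, instead of pickling the array through a pipe.
        return np.ndarray(shape, dtype=dtype, buffer=shm.buf).copy()
    finally:
        shm.close()


data = np.arange(1_000_000, dtype=np.int64)
shm, descriptor = to_shm(data)
try:
    received = from_shm(descriptor)
finally:
    shm.close()
    # Without unlink the block outlives the process (in /dev/shm on Linux),
    # which is the stale-file risk after a crash.
    shm.unlink()

print(np.array_equal(received, data))
```

The cleanup in the `finally` block is the part a crash would skip, which is one reason to keep the feature off by default.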

Finally, I updated Travis CI to try Ubuntu Xenial (16) as well as Ubuntu Trusty (14).

JelleAalbers added 19 commits April 16, 2019 11:29

Remove unused input rechunking

5b960e3

Ensure cleanup is run after succesful iter

4254465

Make old-format data readable

1eab736

... at least for now

Remove expensive debug print statements

580547c

Avoid unnecessary array merging

327ce45

Avoid unneeded concatenate

f17a558

Attempt parallel saving

c7d237b

GIL monitoring

fd2a521

Parallel save and load

01873b5

Shared memory array passing

02b4768

Auto-unshm output, sane structured array passing

a43c760

Factor out shm code to separate package

2b19dba

Reenable test, export print_entry

3d6c40f

Try to fix parallel saving

02dc8a1

Fix parallel save/load

dd68a9e

Fix requirements file

fef293d

Make shm use optional (off by default)

a78aa89

Safe file writing

87549ce

Write to _temp first, then rename

Try ubuntu xenial on Travis

28f3ce3

JelleAalbers merged commit dcbff9f into AxFoundation:master Apr 20, 2019

JelleAalbers deleted the profile branch April 20, 2019 05:56

JelleAalbers mentioned this pull request Jun 10, 2019

Multiple outputs and restructured ParallelSourcePlugin #190

Merged
