Shared memory transfers, parallel loading, avoid concatenates #150

Merged
JelleAalbers merged 19 commits into AxFoundation:master from the profile branch on Apr 20, 2019

@JelleAalbers JelleAalbers commented Apr 17, 2019

This introduces three performance improvements to strax's core:

  • Use shared memory rather than pickle (multiprocessing's default) for transferring numpy arrays between worker processes. The code for this is in a separate package, npshmex, which in turn uses the sharedarray package. This allows strax to parallelize over more cores. It's especially important if ParallelSourcePlugin's optimizations are not, or cannot be, applied. Currently only raw_records is a ParallelSourcePlugin, while we'll often want to process starting from records.
    • On crashes, this can result in stale files in /dev/shm, which will most likely linger until a machine reboot. We will most likely make shared memory an optional behaviour in the future, so we can turn it off during normal analysis-facility use of strax but take advantage of it for (at least online) processing.
  • Saving and loading (and their associated decompression) are now performed in worker threads rather than all in the main thread. Since compressors/decompressors should release the GIL, this means save and load operations now take advantage of multiple cores, while the main thread is free to do more important things.
  • We avoid some unnecessary calls to np.concatenate, for example by not merging iterators for plugins that only have one dependency. Concatenation is relatively expensive because it requires a full memory copy.
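To illustrate the second point, here is a minimal sketch of loading with decompression done in worker threads. It is not strax's actual loader: `load_chunks` and the test data are hypothetical, and the standard-library `zlib` stands in for whichever compressor is in use. The idea only pays off when the decompressor releases the GIL while it works, as CPython's `zlib` does.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def load_chunks(compressed_chunks, max_workers=4):
    """Decompress chunks in worker threads, yielding results in input order.

    Hypothetical helper for illustration; not part of strax.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every chunk up front and yields results in order,
        # so the consumer can work on chunk i while later chunks decompress.
        yield from pool.map(zlib.decompress, compressed_chunks)


# Hypothetical data: four 1 MB chunks standing in for saved data chunks
raw = [bytes([i]) * 1_000_000 for i in range(4)]
compressed = [zlib.compress(chunk) for chunk in raw]

loaded = list(load_chunks(compressed))
```

The main thread only iterates over finished chunks, which leaves it free for other work while the pool decompresses ahead.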

As a test, I made and saved records and peaks from raw_records for run 001944 in the xenonnt daqtest db. This would be about 10 GB raw from the DAQ (3.3 GB compressed raw_records). The task took 74.4 seconds before these changes, and now takes 18 seconds. Note:

  • This is using max_workers=10 on eb0, reading from /data/eventbuilder-sharded, and writing to /home/xedaq. Adding more workers seems to give no benefit and actually slightly hurts performance.
  • We got high data rates from strax before (even higher than the post-changes rate quoted here) because we were also making raw_records, which is a ParallelSourcePlugin, and thus triggers special optimizations described in the docs linked above. This PR makes strax work quickly also without these optimizations, since they might be a bit fiddly to enable for all plugins. Moreover, we had some preliminary single-electron tail cutting enabled before, which we turned off during the DAQ test since it inadvertently removed most S2s.
  • At the new rate we're processing ~550 MB_raw/sec on eb0 without ParallelSourcePlugin (though this is only from raw_records -> peaks). The rate with ParallelSourcePlugin (online processing configuration) likely also improves (though not by all that much, I would guess).
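The rate quoted above follows directly from the timings. As a quick check, assuming the run is exactly 10 GB of raw data (the text says "about 10 GB") and 1 GB = 1000 MB:

```python
raw_size_mb = 10_000    # assumption: "about 10 GB raw" taken as exactly 10 GB
time_before_s = 74.4    # wall time before these changes
time_after_s = 18.0     # wall time after these changes

rate_before = raw_size_mb / time_before_s   # MB_raw/sec before
rate_after = raw_size_mb / time_after_s     # MB_raw/sec after
speedup = time_before_s / time_after_s

print(f"{rate_before:.0f} -> {rate_after:.0f} MB_raw/sec, {speedup:.1f}x faster")
# prints "134 -> 556 MB_raw/sec, 4.1x faster"
```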

Finally, two minor changes:

  • This makes data created using strax versions before Adjust file naming convention #143 (i.e. essentially everything) readable again, by checking for two metadata file names. We can remove this in the future when we no longer care about old raw data, but it's a very small overhead.
  • This introduces GIL monitoring if using strax.utils.profile_threaded, using the very nice gil_load package. It does not work inside jupyter notebooks, because we have to start monitoring the GIL before starting threads, but notebooks have threads of their own.

@JelleAalbers

The shared memory transfers were causing problems on macOS. Since it is an experimental new feature anyway, I made it optional and off by default for now. To enable it (and speed up low-level processing), set the context option allow_shm = True.
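For readers unfamiliar with the mechanism, here is a rough sketch of what a shared-memory transfer involves. It uses the standard library's `multiprocessing.shared_memory` (Python 3.8 and later) as a stand-in. That is an assumption made for illustration and not what npshmex or sharedarray do internally. Both ends run in one process here to keep the example short. A numpy array could wrap the same buffer via `np.ndarray(shape, dtype=..., buffer=shm.buf)`.

```python
from array import array
from multiprocessing import shared_memory

values = array("d", [1.0, 2.0, 3.0])
nbytes = len(values) * values.itemsize

# Producer: allocate a named block and copy the data into it once
producer = shared_memory.SharedMemory(create=True, size=nbytes)
producer.buf[:nbytes] = values.tobytes()

# Only this small (name, length) message would have to be pickled
# and sent to the other process, not the data itself.
message = (producer.name, len(values))

# Consumer: attach to the existing block by name and read from it
name, n = message
consumer = shared_memory.SharedMemory(name=name)
received = array("d", bytes(consumer.buf[:n * values.itemsize]))

# Cleanup: every handle is closed, and the block is unlinked exactly once.
# If a crash skips unlink(), the block lingers (on Linux, as a file in /dev/shm).
consumer.close()
producer.close()
producer.unlink()
```

The explicit `unlink()` is where the stale-file risk comes from: nothing removes the block if the owning process dies before reaching it.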

Finally, I updated Travis CI to test on Ubuntu Xenial (16.04) as well as Ubuntu Trusty (14.04).

@JelleAalbers JelleAalbers merged commit dcbff9f into AxFoundation:master Apr 20, 2019
@JelleAalbers JelleAalbers deleted the profile branch April 20, 2019 05:56