
Pre-allocate arrays to read multi-module detector data #218

Status: Merged (6 commits, merged on Oct 6, 2021)


takluyver (Member)

This should improve performance by pre-allocating the output array and reading each module's data into it, instead of concatenating large arrays, which involves copying a lot of data in memory. cc @kakhahmed
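A minimal NumPy sketch of the difference. It assumes modules are stacked along the first axis, and `fill_module` is a hypothetical stand-in for reading one module from its file, not extra_data's code. In the first version every module lands in its own temporary array and `np.stack` (a concatenation along a new axis) copies everything again; in the second, each module writes straight into its slice of an output allocated once.

```python
import numpy as np

N_MODULES = 16  # e.g. AGIPD-1M has 16 modules; only used to size this example

def fill_module(mod, out):
    """Hypothetical stand-in for reading one module's frames into `out`.

    It writes into an existing array instead of returning a new one.
    """
    out[...] = mod

def read_then_stack(n_frames, frame_shape):
    # Each module gets its own temporary array; np.stack then copies all of
    # them into a new output array, so the data briefly exists twice.
    parts = []
    for m in range(N_MODULES):
        part = np.empty((n_frames,) + frame_shape, dtype=np.float32)
        fill_module(m, part)
        parts.append(part)
    return np.stack(parts)

def read_preallocated(n_frames, frame_shape):
    # Allocate the full output once, then let each module write into its
    # own slice (out[m] is a view), so no second copy is made.
    out = np.empty((N_MODULES, n_frames) + frame_shape, dtype=np.float32)
    for m in range(N_MODULES):
        fill_module(m, out[m])
    return out

a = read_then_stack(4, (8, 8))
b = read_preallocated(4, (8, 8))
print(a.shape, np.array_equal(a, b))  # (16, 4, 8, 8) True
```

With h5py, `Dataset.read_direct(dest, source_sel=..., dest_sel=...)` is one existing way to read from a file into part of an array that has already been allocated.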

Some of the code for this is shared with the virtual CXI writing, since both now do a similar task: identifying blocks of source data that can be mapped directly to blocks of the output space.
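A sketch of that block-finding step, under stated assumptions: rows in both the source file and the output are labelled by sorted, unique train IDs, and `contiguous_blocks` is a hypothetical helper, not the function shared in extra_data. Each returned pair means "these source rows fill those output rows", so filling a pre-allocated array becomes one slice copy per block. For a virtual dataset, the same pair could instead become a single slice assignment from an h5py `VirtualSource` into a `VirtualLayout`.

```python
import numpy as np

def contiguous_blocks(src_ids, out_ids):
    """Pair runs of source rows with the consecutive output rows they fill.

    Hypothetical helper: both inputs are sorted, unique train IDs.
    Returns a list of (source_slice, output_slice) tuples.
    """
    src_ids = np.asarray(src_ids)
    out_ids = np.asarray(out_ids)
    if src_ids.size == 0 or out_ids.size == 0:
        return []
    # Where each source train would sit in the output...
    pos = np.searchsorted(out_ids, src_ids)
    # ...and whether it is actually there (clip so the lookup stays in range)
    present = out_ids[np.minimum(pos, out_ids.size - 1)] == src_ids
    src_pos = np.nonzero(present)[0]
    dst_pos = pos[present]
    if src_pos.size == 0:
        return []
    # A new block starts wherever either side stops advancing by exactly 1
    jumps = (np.diff(src_pos) != 1) | (np.diff(dst_pos) != 1)
    starts = np.concatenate(([0], np.nonzero(jumps)[0] + 1))
    ends = np.concatenate((starts[1:], [src_pos.size]))
    return [
        (slice(int(src_pos[s]), int(src_pos[e - 1]) + 1),
         slice(int(dst_pos[s]), int(dst_pos[e - 1]) + 1))
        for s, e in zip(starts, ends)
    ]


# One source file covers some trains; the output covers trains 100-109
src_ids = [100, 101, 102, 105, 106, 120]
out_ids = np.arange(100, 110)
src_data = np.arange(6) * 10   # pretend per-train values from that file
out = np.full(10, -1)

for src_sl, dst_sl in contiguous_blocks(src_ids, out_ids):
    out[dst_sl] = src_data[src_sl]   # one slice copy per block

print(out.tolist())  # [0, 10, 20, -1, -1, 30, 40, -1, -1, -1]
```

Train 120 is not in the output range, so it is dropped, and the gap at trains 103-104 splits the rest into two blocks.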

takluyver added the enhancement (New feature or request) label on Sep 9, 2021
takluyver (Member, Author) commented on Sep 9, 2021

This brings the time for getting data through AGIPD1M.get_array() in line with getting data for a single source:

[Image: timings on this branch]

Edit: This is what the timings looked like before:

[Image: timings before this change]

Note the high CPU time with the AGIPD1M interface (well above the wall time), indicating that it was using multiple cores. @kakhahmed ran a similar test in which it used 1 core and took 40 seconds of wall time. I'd guess the difference comes from different builds of NumPy or of the underlying libraries (such as MKL or other BLAS implementations). On this branch, the example takes less wall time and less CPU time.
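One way to see the CPU-time vs. wall-time distinction from Python, using only the standard library's `time.perf_counter()` (wall clock) and `time.process_time()` (CPU time of the whole process, summed over its threads). The `measure` helper and the matrix-product workload are illustrative assumptions, not the benchmark used in this PR, and the ratio will depend on the machine and on how NumPy was built.

```python
import time
import numpy as np

def measure(fn, *args):
    """Return (wall_seconds, cpu_seconds) for a single call fn(*args)."""
    wall0 = time.perf_counter()   # wall-clock time
    cpu0 = time.process_time()    # CPU time of this process, all threads
    fn(*args)
    return time.perf_counter() - wall0, time.process_time() - cpu0


a = np.random.default_rng(0).random((800, 800))
# Matrix products go through NumPy's BLAS library, which may use threads
wall, cpu = measure(np.matmul, a, a)
ratio = cpu / max(wall, 1e-9)
# A ratio clearly above 1 means more than one core was busy on average
print(f"wall {wall:.3f} s, CPU {cpu:.3f} s, CPU/wall {ratio:.1f}")
```

Running the same script against differently built NumPy installations is a simple way to check whether the threading behaviour, rather than the code being timed, explains a difference like the one described above.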

takluyver (Member, Author)

If you use pulse selection, this reads the selected pulses from the file by indexing with a NumPy array ('fancy indexing'). I suspected this was inefficient, so in branch xtdf-slice-pulses I spent a while experimenting with detecting regular selections and reading them with an h5py MultiBlockSlice (which represents an HDF5 hyperslab selection) instead.
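A sketch of what "detecting regular selections" can mean: check whether sorted pulse indices consist of equally sized runs of consecutive pulses at a constant stride, and if so return them as `(start, stride, count, block)`, the parameters `h5py.MultiBlockSlice` takes. `find_regular_blocks` and `expand_blocks` are hypothetical names; this is not the code from the xtdf-slice-pulses branch, and h5py is not imported so the sketch runs without it.

```python
import numpy as np

def find_regular_blocks(indices):
    """Return (start, stride, count, block) if `indices` consists of `count`
    equally sized runs of consecutive values at a constant stride, else None.

    Hypothetical helper; expects a 1D array of pulse indices.
    """
    idx = np.asarray(indices, dtype=np.int64)
    if idx.ndim != 1 or idx.size == 0:
        return None
    steps = np.diff(idx)
    if np.any(steps <= 0):
        return None  # only strictly increasing selections are handled
    # Split into runs of consecutive indices (step == 1 inside a run)
    breaks = np.nonzero(steps != 1)[0] + 1
    run_starts = np.concatenate(([0], breaks))
    run_lengths = np.diff(np.concatenate((run_starts, [idx.size])))
    block = int(run_lengths[0])
    if np.any(run_lengths != block):
        return None  # blocks of different sizes
    starts = idx[run_starts]
    count = int(starts.size)
    if count == 1:
        return int(starts[0]), block, 1, block
    strides = np.diff(starts)
    if np.any(strides != strides[0]):
        return None  # gaps between blocks are uneven
    return int(starts[0]), int(strides[0]), count, block


def expand_blocks(start, stride, count, block):
    """Turn block parameters back into the explicit index array."""
    block_starts = start + stride * np.arange(count)
    return (block_starts[:, None] + np.arange(block)).ravel()


# Pulses 2-3, 10-11, 18-19: three blocks of 2, every 8 pulses
params = find_regular_blocks([2, 3, 10, 11, 18, 19])
print(params)                           # (2, 8, 3, 2)
print(expand_blocks(*params).tolist())  # [2, 3, 10, 11, 18, 19]
```

When the result is not None, it could be used as `dset[h5py.MultiBlockSlice(start=start, stride=stride, count=count, block=block)]`; otherwise the reader would fall back to fancy indexing with the index array.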

However, in testing with some LPD data @daviddoji pointed me to, it didn't seem to make a big difference. So I'm leaving it out for now, since it adds extra complexity.

daviddoji

LGTM!

takluyver added this to the 1.8 milestone on Oct 1, 2021
takluyver merged commit 9ab34ad into master on Oct 6, 2021
takluyver deleted the refactor-multimod-array branch on October 6, 2021 09:26
Labels: enhancement (New feature or request)
Participants: 2