
PIA work: select only relevant experiments #1665

Merged: 6 commits, Apr 19, 2021

@rjgildea (Contributor) commented Apr 19, 2021

For HDF5 grid scans, the number of experiments is equal to the number of images in the grid scan. As a result, some places in the code loop over every experiment, even when only one image is being analysed, which can add significant overhead. For example:

From dials/array_family/flex_ext.py, lines 1205 to 1220 at commit a66ee17:

for i, expt in enumerate(experiments):
    if "imageset_id" in self:
        sel_expt = self["imageset_id"] == i
    else:
        sel_expt = self["id"] == i
    for i_panel in range(len(expt.detector)):
        sel = sel_expt & (panel_numbers == i_panel)
        centroid_position, centroid_variance, _ = centroid_px_to_mm_panel(
            expt.detector[i_panel],
            expt.scan,
            self["xyzobs.px.value"].select(sel),
            self["xyzobs.px.variance"].select(sel),
            cctbx.array_family.flex.vec3_double(sel.count(True), (1, 1, 1)),
        )
        self["xyzobs.mm.value"].set_selected(sel, centroid_position)
        self["xyzobs.mm.variance"].set_selected(sel, centroid_variance)

This loop is called via:

reflections.centroid_px_to_mm(experiments)
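To illustrate why this pattern is costly when spot finding runs on a single image of a large grid scan, here is a minimal, stdlib-only sketch contrasting "loop over every experiment" with "loop only over experiments that have reflections". It is not the PR's actual diff: the `ids` list stands in for the reflection table's `id` column and `convert` for the per-experiment work (such as `centroid_px_to_mm_panel`); both names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical per-experiment work, e.g. converting pixel centroids to mm.
# Receives the experiment index and the row indices belonging to it.
Convert = Callable[[int, List[int]], None]


def convert_every_experiment(
    ids: Sequence[int], n_experiments: int, convert: Convert
) -> None:
    """Mirrors the loop above: visits all experiments, even empty ones.

    Each iteration rescans the whole id column, so the cost is
    O(n_experiments * n_reflections).
    """
    for i in range(n_experiments):
        rows = [row for row, expt_id in enumerate(ids) if expt_id == i]
        convert(i, rows)


def convert_relevant_experiments(ids: Sequence[int], convert: Convert) -> None:
    """Visits only experiments that own at least one reflection.

    A single pass groups row indices by experiment id, so the cost is
    O(n_reflections), independent of the length of the experiment list.
    """
    rows_by_expt: Dict[int, List[int]] = defaultdict(list)
    for row, expt_id in enumerate(ids):
        rows_by_expt[expt_id].append(row)
    for i in sorted(rows_by_expt):
        convert(i, rows_by_expt[i])


if __name__ == "__main__":
    # Toy case: a 5000-experiment list where reflections exist for experiment 42 only.
    ids = [42, 42, 42]
    calls_all: List[int] = []
    calls_relevant: List[int] = []
    convert_every_experiment(ids, 5000, lambda i, rows: calls_all.append(i))
    convert_relevant_experiments(ids, lambda i, rows: calls_relevant.append(i))
    print(len(calls_all), len(calls_relevant))  # 5000 1
```

The point of the second pattern is that its cost scales with the reflections actually found rather than with the number of images in the grid scan.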

@rjgildea (Contributor Author) commented:

Test code:

$ cat pia.py
import sys
import time

from dials.command_line.find_spots_server import work


def run(filename, n):
    # Run the per-image analysis (PIA) on each of the first n images in turn
    for i in range(n):
        parameters = ['d_max=40', f'scan_range={i+1},{i+1}']
        print(f"Starting PIA on {filename}")
        start = time.time()
        results = work(filename, cl=parameters)
        runtime = time.time() - start
        # Interpolate the spot count (the original f-string printed it literally)
        print(
            f"PIA completed on {filename} with parameters {parameters}, "
            f"{results['n_spots_total']} spots found in {runtime:.2f} seconds"
        )


if __name__ == "__main__":
    run(sys.argv[1], n=10)

Using main, spot finding takes ~1.8-2.1 seconds per image (the first call is slower, at 8.21 seconds):

$ dials.python pia.py /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=1,1'], results['n_spots_total'] spots found in 8.21 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=2,2'], results['n_spots_total'] spots found in 1.83 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=3,3'], results['n_spots_total'] spots found in 1.84 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=4,4'], results['n_spots_total'] spots found in 1.83 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=5,5'], results['n_spots_total'] spots found in 1.99 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=6,6'], results['n_spots_total'] spots found in 1.87 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=7,7'], results['n_spots_total'] spots found in 1.88 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=8,8'], results['n_spots_total'] spots found in 2.12 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=9,9'], results['n_spots_total'] spots found in 2.04 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=10,10'], results['n_spots_total'] spots found in 1.96 seconds

With the changes in this PR, spot finding now takes ~1.0-1.4 seconds per image (the first call takes 5.96 seconds):

$ dials.python pia.py /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=1,1'], results['n_spots_total'] spots found in 5.96 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=2,2'], results['n_spots_total'] spots found in 1.23 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=3,3'], results['n_spots_total'] spots found in 1.20 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=4,4'], results['n_spots_total'] spots found in 1.04 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=5,5'], results['n_spots_total'] spots found in 1.37 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=6,6'], results['n_spots_total'] spots found in 1.19 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=7,7'], results['n_spots_total'] spots found in 1.26 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=8,8'], results['n_spots_total'] spots found in 1.20 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=9,9'], results['n_spots_total'] spots found in 1.05 seconds
Starting PIA on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs
PIA completed on /dls/i04/data/2021/cm28182-1/20210226/swrm/gw/gridscantest_5k_500hz_extra_PIA/gridscantest_1.nxs with parameters ['d_max=40', 'scan_range=10,10'], results['n_spots_total'] spots found in 1.26 seconds

@graeme-winter (Contributor) commented:

I can see how the difference is substantial for big grid scans, but I would welcome a similar analysis (for documentation) of, e.g., an 80-image scan.

Will test the change set in a moment and verify that it makes no difference beyond time saving.
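One way to make the "no difference beyond time saving" check concrete is to collect the `results` dict returned by `work()` for each image on both branches and compare them. A minimal sketch, assuming the per-image results from each branch have already been gathered into lists of dicts; only the `n_spots_total` key is taken from the test script above, and the helper `differing_images` is hypothetical:

```python
from typing import Any, Dict, Iterable, List


def differing_images(
    before: List[Dict[str, Any]],
    after: List[Dict[str, Any]],
    keys: Iterable[str] = ("n_spots_total",),
) -> List[int]:
    """Return the 1-based image numbers whose selected result fields differ."""
    if len(before) != len(after):
        raise ValueError("the two runs cover a different number of images")
    keys = tuple(keys)
    return [
        image
        for image, (old, new) in enumerate(zip(before, after), start=1)
        if any(old.get(k) != new.get(k) for k in keys)
    ]


if __name__ == "__main__":
    # Toy values for illustration only, not measured results.
    before = [{"n_spots_total": 12}, {"n_spots_total": 30}]
    after = [{"n_spots_total": 12}, {"n_spots_total": 31}]
    print(differing_images(before, after))  # [2]
```

An empty list means the compared fields are identical for every image, so only the runtime changed.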

@rjgildea (Contributor Author) commented Apr 19, 2021

For a 25 x 20 ROI grid scan (500 images) there is still an appreciable speedup (first run: main; second run: this PR):

$ dials.python pia.py /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs 
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=1,1'], results['n_spots_total'] spots found in 3.26 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=2,2'], results['n_spots_total'] spots found in 0.56 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=3,3'], results['n_spots_total'] spots found in 0.52 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=4,4'], results['n_spots_total'] spots found in 0.41 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=5,5'], results['n_spots_total'] spots found in 0.52 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=6,6'], results['n_spots_total'] spots found in 0.51 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=7,7'], results['n_spots_total'] spots found in 0.41 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=8,8'], results['n_spots_total'] spots found in 0.50 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=9,9'], results['n_spots_total'] spots found in 0.48 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=10,10'], results['n_spots_total'] spots found in 0.47 seconds
$ dials.python pia.py /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs 
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=1,1'], results['n_spots_total'] spots found in 3.12 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=2,2'], results['n_spots_total'] spots found in 0.45 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=3,3'], results['n_spots_total'] spots found in 0.35 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=4,4'], results['n_spots_total'] spots found in 0.35 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=5,5'], results['n_spots_total'] spots found in 0.35 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=6,6'], results['n_spots_total'] spots found in 0.33 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=7,7'], results['n_spots_total'] spots found in 0.35 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=8,8'], results['n_spots_total'] spots found in 0.43 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=9,9'], results['n_spots_total'] spots found in 0.41 seconds
Starting PIA on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs
PIA completed on /dls/i04/data/2021/cm28182-2/xraycentring/manual/xrc_26.nxs with parameters ['d_max=40', 'scan_range=10,10'], results['n_spots_total'] spots found in 0.33 seconds
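The steady-state per-image timings can be summarised from the logs in this thread. A minimal sketch: it copies the reported times, drops the first call of each run (consistently much slower, presumably one-off setup cost), and compares medians. It assumes that, for xrc_26.nxs, the first run listed is main and the second is this PR; the helper name `steady_state_median` is hypothetical.

```python
from statistics import median
from typing import Sequence

# Per-image wall-clock times (seconds) copied from the logs in this PR.
TIMINGS = {
    "gridscantest_1.nxs": {
        "main": [8.21, 1.83, 1.84, 1.83, 1.99, 1.87, 1.88, 2.12, 2.04, 1.96],
        "PR": [5.96, 1.23, 1.20, 1.04, 1.37, 1.19, 1.26, 1.20, 1.05, 1.26],
    },
    # Assumption: first run listed is main, second is this PR.
    "xrc_26.nxs": {
        "main": [3.26, 0.56, 0.52, 0.41, 0.52, 0.51, 0.41, 0.50, 0.48, 0.47],
        "PR": [3.12, 0.45, 0.35, 0.35, 0.35, 0.33, 0.35, 0.43, 0.41, 0.33],
    },
}


def steady_state_median(times: Sequence[float]) -> float:
    """Median per-image time, ignoring the slower first (warm-up) call."""
    return median(times[1:])


if __name__ == "__main__":
    for scan, runs in TIMINGS.items():
        before = steady_state_median(runs["main"])
        after = steady_state_median(runs["PR"])
        print(f"{scan}: {before:.2f} s -> {after:.2f} s ({before / after:.2f}x)")
    # gridscantest_1.nxs: 1.88 s -> 1.20 s (1.57x)
    # xrc_26.nxs: 0.50 s -> 0.35 s (1.43x)
```

Using the median rather than the mean keeps a single slow outlier image from dominating the comparison.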

@ndevenish merged commit 1df1ca8 into main on Apr 19, 2021
@ndevenish deleted the pia-work-faster branch on April 19, 2021, 15:49
@ndevenish (Member) left a comment:


I guess

ndevenish pushed a commit that referenced this pull request Apr 19, 2021
For HDF5 grid scans, the number of experiments is equal to the number of
images in the grid scan. This results in a loop over every experiment in some
places in the code, which can add significant overhead.

Bikesheds-by: Nicholas Devenish <ndevenish@gmail.com>
Bikesheds-by: Markus Gerstel <markus.gerstel@diamond.ac.uk>
DiamondLightSource-build-server added a commit that referenced this pull request Apr 19, 2021
Bugfixes
--------

- ``dials.scale``: Fix crash when full-matrix minimisation is unsuccessful due to indeterminate normal equations. (#1653)
- ``dials.scale``: Fix crash when no reflections remain after initial filtering. (#1654)
- ``dials.export``: Fix error observed with ``format=mmcif`` for narrow sweeps with low symmetry. (#1656)
- Fix image numbering inconsistency in ASCII histogram of per-image spot counts. (#1660)
- ``dials.find_spots_server``: Significant performance improvement for HDF5 grid scans. (#1665)
DiamondLightSource-build-server added a commit that referenced this pull request Apr 20, 2021