Avoid integrate memory limit #1392

Merged: 21 commits from avoid_integrate_memory_limit into main, Mar 16, 2021

Conversation

@jbeilstenedmands (Contributor)

This PR attempts to provide a workaround for the longstanding dials.integrate memory crash issues (#659).

When there is not enough memory to process the data, the reflection table is simply split into subsets that will fit within the memory limit, and each subset is processed individually for the integrating step (profile modelling is still done on all reflections). This only affects cases where processing would currently fail, at the 'cost' of multiple reads of the raw data.
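
As a rough illustration of the approach (plain Python lists stand in for a dials reflection table; the names are placeholders, not the PR's actual code):

    import math
    import random

    def split_reflections(reflections, predicted_memory, available_memory):
        """Randomly partition reflections into the fewest subsets that fit in memory."""
        n_subsets = max(1, math.ceil(predicted_memory / available_memory))
        subsets = [[] for _ in range(n_subsets)]
        for refl in reflections:
            subsets[random.randrange(n_subsets)].append(refl)
        return subsets

    # Each subset is then integrated in its own pass over the imagesets and the
    # per-subset results are concatenated, trading extra read time for bounded memory.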

I have been testing on the beta lactamase dataset initially and will now look to perform further tests; others are welcome to test also.

Set a large sigma_m to force higher memory requirements.

The following parameters have been modified:
        scan_range = 0 180
        integration {
          mp {
            nproc = 4
          }
        }
        profile {
          gaussian_rs {
            parameters {
              sigma_b = 0.04
              sigma_m = 1
            }
          }
        }
        input {
          experiments = ../../refined_experiments.json
          reflections = ../../refined.pickle
        }

Set the memory limit just above what's needed, so splitting is not triggered (i.e. current behaviour):
ulimit -v 1500000

WARN: Memory situation report:
          Available system memory (excluding swap)          :  8.1 GB
          Available swap memory                             :  1.4 GB
          Available system memory (including swap)          :  9.6 GB
          Maximum memory for processing (including swap)    :  8.6 GB
          Maximum memory for processing (excluding swap)    :  7.3 GB
          Memory ulimit detected                            :  1.5 GB
          Memory ulimit in use                              :  0.4 GB
          Available system memory (limited)                 :  1.2 GB
          Available system memory (incl. swap; limited)     :  1.2 GB
          Maximum memory for processing (exc. swap; limited):  1.0 GB
          Memory required per process                       :  0.9 GB
        Reducing number of processes from 4 to 1 due to memory constraints.

....

        Timing information for integration
        +-------------------+----------------+
        | Read time         | 30.28 seconds  |
        | Extract time      | 6.08 seconds   |
        | Pre-process time  | 0.14 seconds   |
        | Process time      | 101.34 seconds |
        | Post-process time | 0.00 seconds   |
        | Total time        | 140.20 seconds |
        +-------------------+----------------+
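
The reduction from 4 to 1 processes follows directly from the report's numbers: roughly 1.0 GB is usable and each process needs about 0.9 GB, so only one fits. Illustratively (a back-of-envelope sketch, not the actual dials logic):

    # Illustrative only: how many worker processes fit the memory budget?
    max_memory_gb = 1.0   # "Maximum memory for processing (exc. swap; limited)"
    per_process_gb = 0.9  # "Memory required per process"
    requested_nproc = 4   # integration.mp.nproc

    nproc = min(requested_nproc, max(1, int(max_memory_gb // per_process_gb)))
    print(nproc)  # -> 1, matching "Reducing number of processes from 4 to 1"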

Set the memory limit below what's needed, so the reflection table is split:
ulimit -v 1200000

        Predicted maximum memory needed exceeds available memory.
        Splitting reflection table into 2 subsets for processing
        
        Processing subset 1 of reflection table
         Split 2523 reflections overlapping job boundaries
        
  WARN: Memory situation report:
          Available system memory (excluding swap)          :  8.1 GB
          Available swap memory                             :  1.4 GB
          Available system memory (including swap)          :  9.5 GB
          Maximum memory for processing (including swap)    :  8.6 GB
          Maximum memory for processing (excluding swap)    :  7.3 GB
          Memory ulimit detected                            :  1.2 GB
          Memory ulimit in use                              :  0.4 GB
          Available system memory (limited)                 :  0.8 GB
          Available system memory (incl. swap; limited)     :  0.8 GB
          Maximum memory for processing (exc. swap; limited):  0.7 GB
          Memory required per process                       :  0.5 GB
        Reducing number of processes from 4 to 1 due to memory constraints.

....

        Timing information for integration
        +-------------------+----------------+
        | Read time         | 60.33 seconds  |
        | Extract time      | 6.05 seconds   |
        | Pre-process time  | 0.16 seconds   |
        | Process time      | 105.18 seconds |
        | Post-process time | 0.00 seconds   |
        | Total time        | 174.73 seconds |
        +-------------------+----------------+

Output datasets are identical.

Commit message (addresses #659):

In the integrator, calculate the maximum memory needed. If this exceeds the
available memory, then split the reflection table into random subsets and
process by performing multiple passes over the imagesets.

Applies only to regular integrators (not threaded) and will only take effect
in situations where processing currently exits.
codecov bot commented Sep 4, 2020

Codecov Report

Merging #1392 (c115135) into main (fdebdfd) will decrease coverage by 1.93%.
The diff coverage is 77.43%.

@@            Coverage Diff             @@
##             main    #1392      +/-   ##
==========================================
- Coverage   66.61%   64.68%   -1.94%     
==========================================
  Files         615      618       +3     
  Lines       68843    69747     +904     
  Branches     9585     9577       -8     
==========================================
- Hits        45858    45114     -744     
- Misses      21054    22845    +1791     
+ Partials     1931     1788     -143     

@graeme-winter (Contributor)

Before looking at the code: did you provide a mechanism in this to force the splitting behaviour? It seems that would be very useful for evaluating the impact on the data.

Completely behind this as a stop-gap on the road to looking at a proper solution.

@jbeilstenedmands (Contributor, Author)

There is no mechanism to force this; there should be no impact on the data, which should be exactly the same.

@graeme-winter (Contributor)

👍 @jbeilstenedmands thank you

@jbeilstenedmands jbeilstenedmands marked this pull request as ready for review September 9, 2020 08:50
@dagewa (Member) commented Sep 9, 2020

I like this approach 👍 I would like to test and provide a review, which I can do next week.

@jbeilstenedmands (Contributor, Author)

This is now ready for review. A few extra things to note: I used random splitting to divide the data as a way to quickly but approximately halve the amount of memory needed per image during processing. This could in theory be done more systematically, but I didn't feel that was necessary.
I wasn't sure of a way to explicitly test hitting a memory limit during processing, so I would welcome suggestions here. This can be manually tested for now with the n_subset_split option, but I wasn't expecting that to be included upon merging.

@Anthchirp (Member) left a comment:

Just a few minor comments.

algorithms/integration/integrator.py (review comment: outdated, resolved)
algorithms/integration/__init__.py (review comment: outdated, resolved)
algorithms/integration/integrator.py (review comment: outdated, resolved)
algorithms/integration/processor.py (review comment: outdated, resolved)
algorithms/integration/processor.py (review comment: outdated, resolved)
jbeilstenedmands and others added 4 commits September 9, 2020 12:38
Co-authored-by: Markus Gerstel <2102431+Anthchirp@users.noreply.github.com>
@Anthchirp (Member)

Sorry for the review spam. It looks like GitHub doesn't like rewrite suggestions for 40+ lines; it gave me no indication that it actually created those comments, which is how we ended up with five of them.

@graeme-winter (Contributor) commented Sep 10, 2020

To aid reviewing things like this, we now have a command-line tool which will match integration results and verify that they are identical (not polished):

dials/dials_scratch@b578b69

Grey-Area regular :) $ dials_scratch.match_identical_integration integrated.refl ../split-4/integrated.refl

returns nothing => all good

I have verified that the changes above make literally no difference to the integrated data in one case; I will now review.

@dagewa (Member) commented Sep 15, 2020

I tried testing as above with the beta lactamase data, but it tripped up - actually during the first stage with ulimit -v 1500000.

Processing subset 2 of reflection table
 Split 11061 reflections overlapping job boundaries

Memory situation report:
  Available system memory (excluding swap)          :  5.8 GB
  Available swap memory                             :  2.1 GB
  Available system memory (including swap)          :  8.0 GB
  Maximum memory for processing (including swap)    :  7.2 GB
  Maximum memory for processing (excluding swap)    :  5.3 GB
  Memory ulimit detected                            :  1.5 GB
  Memory ulimit in use                              :  1.1 GB
  Available system memory (limited)                 :  0.5 GB
  Available system memory (incl. swap; limited)     :  0.5 GB
  Maximum memory for processing (exc. swap; limited):  0.4 GB
  Memory required per process                       :  0.5 GB

Traceback (most recent call last):
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 701, in <module>
    run()
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 679, in run
    params, experiments, reference
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 556, in run_integration
    reflections = integrator.integrate()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/integrator.py", line 1270, in integrate
    processed, this_time_info = _run_processor(table)
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/integrator.py", line 1227, in _run_processor
    reflections, _, time_info = processor.process()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 324, in process
    self.manager.initialize()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 635, in initialize
    self.compute_processors()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 887, in compute_processors
    % _average_bbox_size(self.reflections)
MemoryError: Please report this error to dials-support@lists.sourceforge.net: 
          Not enough memory to run integration jobs.  This could be caused by a
          highly mosaic crystal model.  Possible solutions include increasing the
          percentage of memory allowed for shoeboxes or decreasing the block size.
          The average shoebox size is 18 x 19 pixels x 17 images - is your crystal
          really this mosaic?
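
For a sense of scale, the average shoebox quoted in the error already implies tens of kilobytes per reflection. A rough estimate, assuming float data and background plus an int mask per voxel (the real dials accounting may differ):

    # Rough order-of-magnitude estimate of per-reflection shoebox memory.
    # Assumed layout: float32 data + float32 background + int32 mask per voxel.
    nx, ny, nframes = 18, 19, 17   # average shoebox size from the error above
    bytes_per_voxel = 4 + 4 + 4
    per_shoebox_kib = nx * ny * nframes * bytes_per_voxel / 1024
    print(round(per_shoebox_kib))  # ~68 KiB per reflection, before any overheads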

@dagewa (Member) commented Sep 15, 2020

Looks like a ulimit -v 1600000 job gave me a completed 2-split run. I'll try a higher limit to get a no-split run for comparison. Meanwhile, the error above shows there is a grey area, where the split occurs but there is apparently still not enough memory to complete the job.

@jbeilstenedmands (Contributor, Author)

Thanks for the report @dagewa. Perhaps there is enough memory to process the table, but not enough to also hold everything else in memory. I believe integration has a max_memory_usage parameter ("The maximum percentage of available memory to use for allocating shoebox arrays."), so maybe I need to include this here also.
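
A sketch of that kind of adjustment (the parameter name is from the comment above; the surrounding logic is an assumption, not the actual dials code):

    import math

    # Hypothetical: fold the max_memory_usage fraction into the split decision
    # so it works from the same budget as the shoebox allocator.
    def n_subsets_needed(predicted_bytes, available_bytes, max_memory_usage=0.9):
        budget = available_bytes * max_memory_usage
        return max(1, math.ceil(predicted_bytes / budget))

    print(n_subsets_needed(1.6e9, 1.0e9))  # -> 2 with a 90% budget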

@dagewa (Member) commented Sep 15, 2020

Maybe it is worth sampling a range of ulimit -v values, say in 100 MB steps, just to see what the range of behaviour is?
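
Such a sweep could be scripted; a hypothetical helper (POSIX-only; "integrate.phil" stands in for the parameter file used above):

    # Hypothetical sweep over virtual-memory limits in 100 MB steps,
    # equivalent to running `ulimit -v N` before each dials.integrate call.
    import resource
    import subprocess

    for limit_kb in range(1200000, 1700000, 100000):
        def set_vmem(limit_bytes=limit_kb * 1024):
            # Runs in the child process before exec, like `ulimit -v`.
            resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

        print("--- ulimit -v", limit_kb, "---")
        subprocess.run(["dials.integrate", "integrate.phil"], preexec_fn=set_vmem)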

array_family/flex_ext.py (review comment: outdated, resolved)
Base automatically changed from master to main March 1, 2021 11:12
@ndevenish ndevenish merged commit 8702065 into main Mar 16, 2021
@ndevenish ndevenish deleted the avoid_integrate_memory_limit branch March 16, 2021 09:40