Avoid integrate memory limit #1392

Merged: 21 commits from avoid_integrate_memory_limit into main, Mar 16, 2021

Conversation

@jbeilstenedmands (Contributor)

This PR attempts to provide a workaround for the longstanding dials.integrate memory crash issues (#659).

When there is not enough memory to process the data, the reflection table is simply split into subsets that will fit within the memory limit, and each subset is processed individually for the integrating step (profile modelling is still done on all reflections). This only affects cases where processing would currently fail, at the 'cost' of multiple reads of the raw data.
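
As a rough illustration of the approach (plain Python lists stand in for a dials reflection table; the names are placeholders, not the PR's actual code):

    import math
    import random

    def split_reflections(reflections, predicted_memory, available_memory):
        """Randomly partition reflections into the fewest subsets that fit in memory."""
        n_subsets = max(1, math.ceil(predicted_memory / available_memory))
        subsets = [[] for _ in range(n_subsets)]
        for refl in reflections:
            subsets[random.randrange(n_subsets)].append(refl)
        return subsets

    # Each subset is then integrated in its own pass over the imagesets and the
    # per-subset results are concatenated, trading extra read time for bounded memory.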

I have been testing on the beta lactamase dataset initially and will now look to perform further tests; others are welcome to test also.

Set a large sigma_m to force higher memory requirements.

The following parameters have been modified:
        scan_range = 0 180
        integration {
          mp {
            nproc = 4
          }
        }
        profile {
          gaussian_rs {
            parameters {
              sigma_b = 0.04
              sigma_m = 1
            }
          }
        }
        input {
          experiments = ../../refined_experiments.json
          reflections = ../../refined.pickle
        }

Set the memory limit just above what's needed, so splitting is not triggered (i.e. current behaviour):
ulimit -v 1500000

WARN: Memory situation report:
          Available system memory (excluding swap)          :  8.1 GB
          Available swap memory                             :  1.4 GB
          Available system memory (including swap)          :  9.6 GB
          Maximum memory for processing (including swap)    :  8.6 GB
          Maximum memory for processing (excluding swap)    :  7.3 GB
          Memory ulimit detected                            :  1.5 GB
          Memory ulimit in use                              :  0.4 GB
          Available system memory (limited)                 :  1.2 GB
          Available system memory (incl. swap; limited)     :  1.2 GB
          Maximum memory for processing (exc. swap; limited):  1.0 GB
          Memory required per process                       :  0.9 GB
        Reducing number of processes from 4 to 1 due to memory constraints.

....

        Timing information for integration
        +-------------------+----------------+
        | Read time         | 30.28 seconds  |
        | Extract time      | 6.08 seconds   |
        | Pre-process time  | 0.14 seconds   |
        | Process time      | 101.34 seconds |
        | Post-process time | 0.00 seconds   |
        | Total time        | 140.20 seconds |
        +-------------------+----------------+
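
The reduction from 4 to 1 processes follows directly from the report's numbers: roughly 1.0 GB is usable and each process needs about 0.9 GB, so only one fits. Illustratively (a back-of-envelope sketch, not the actual dials logic):

    # Illustrative only: how many worker processes fit the memory budget?
    max_memory_gb = 1.0   # "Maximum memory for processing (exc. swap; limited)"
    per_process_gb = 0.9  # "Memory required per process"
    requested_nproc = 4   # integration.mp.nproc

    nproc = min(requested_nproc, max(1, int(max_memory_gb // per_process_gb)))
    print(nproc)  # -> 1, matching "Reducing number of processes from 4 to 1"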

Set the memory limit below what's needed, so the reflection table is split:
ulimit -v 1200000

        Predicted maximum memory needed exceeds available memory.
        Splitting reflection table into 2 subsets for processing
        
        Processing subset 1 of reflection table
         Split 2523 reflections overlapping job boundaries
        
  WARN: Memory situation report:
          Available system memory (excluding swap)          :  8.1 GB
          Available swap memory                             :  1.4 GB
          Available system memory (including swap)          :  9.5 GB
          Maximum memory for processing (including swap)    :  8.6 GB
          Maximum memory for processing (excluding swap)    :  7.3 GB
          Memory ulimit detected                            :  1.2 GB
          Memory ulimit in use                              :  0.4 GB
          Available system memory (limited)                 :  0.8 GB
          Available system memory (incl. swap; limited)     :  0.8 GB
          Maximum memory for processing (exc. swap; limited):  0.7 GB
          Memory required per process                       :  0.5 GB
        Reducing number of processes from 4 to 1 due to memory constraints.

....

        Timing information for integration
        +-------------------+----------------+
        | Read time         | 60.33 seconds  |
        | Extract time      | 6.05 seconds   |
        | Pre-process time  | 0.16 seconds   |
        | Process time      | 105.18 seconds |
        | Post-process time | 0.00 seconds   |
        | Total time        | 174.73 seconds |
        +-------------------+----------------+

Output datasets are identical.

Commit message (addresses #659):

In the integrator, calculate the maximum memory needed. If this exceeds the
available memory, then split the reflection table into random subsets and
process by performing multiple passes over the imagesets.

Applies only to regular integrators (not threaded) and will only take effect
in situations where processing currently exits.
codecov bot commented Sep 4, 2020

Codecov Report

Merging #1392 (c115135) into main (fdebdfd) will decrease coverage by 1.93%.
The diff coverage is 77.43%.

@@            Coverage Diff             @@
##             main    #1392      +/-   ##
==========================================
- Coverage   66.61%   64.68%   -1.94%     
==========================================
  Files         615      618       +3     
  Lines       68843    69747     +904     
  Branches     9585     9577       -8     
==========================================
- Hits        45858    45114     -744     
- Misses      21054    22845    +1791     
+ Partials     1931     1788     -143     

@graeme-winter (Contributor)

Before looking at the code: did you provide a mechanism in this to force the splitting behaviour? It seems that would be very useful for evaluating the impact on the data.

Completely behind this as a stop-gap on the road to looking at a proper solution.

@jbeilstenedmands (Contributor, Author)

There is no mechanism to force this; there should be no impact on the data, which should be exactly the same.

@graeme-winter (Contributor)

👍 @jbeilstenedmands thank you

@jbeilstenedmands jbeilstenedmands marked this pull request as ready for review September 9, 2020 08:50
@dagewa (Member) commented Sep 9, 2020

I like this approach 👍 I would like to test and provide a review, which I can do next week.

@jbeilstenedmands (Contributor, Author)

This is now ready for review. A few extra things to note: I used random splitting to divide the data as a way to quickly but approximately halve the amount of memory needed per image during processing. This could in theory be done more systematically, but I didn't feel that was necessary.
I wasn't sure of a way to explicitly test hitting a memory limit during processing, so I would welcome suggestions here. This can be manually tested for now with the n_subset_split option, but I wasn't expecting that to be included upon merging.

@Anthchirp (Member) left a comment:

Just a few minor comments.

algorithms/integration/integrator.py (review comment: outdated, resolved)
algorithms/integration/__init__.py (review comment: outdated, resolved)
algorithms/integration/integrator.py (review comment: outdated, resolved)
algorithms/integration/processor.py (review comment: outdated, resolved)
algorithms/integration/processor.py (review comment: outdated, resolved)
jbeilstenedmands and others added 4 commits September 9, 2020 12:38
Co-authored-by: Markus Gerstel <2102431+Anthchirp@users.noreply.github.com>
@Anthchirp (Member)

Sorry for the review spam. It looks like GitHub doesn't like rewrite suggestions for 40+ lines; it gave me no indication that it actually created those comments, which is how we ended up with five of them.

@graeme-winter (Contributor) commented Sep 10, 2020

To aid reviewing things like this, we now have a command-line tool which will match integration results and verify that they are identical (not polished):

dials/dials_scratch@b578b69

Grey-Area regular :) $ dials_scratch.match_identical_integration integrated.refl ../split-4/integrated.refl

returns nothing => all good

I have verified that the changes above make literally no difference to the integrated data in one case; I will now review.

@dagewa (Member) commented Sep 15, 2020

I tried testing as above with the beta lactamase data, but it tripped up - actually during the first stage with ulimit -v 1500000.

Processing subset 2 of reflection table
 Split 11061 reflections overlapping job boundaries

Memory situation report:
  Available system memory (excluding swap)          :  5.8 GB
  Available swap memory                             :  2.1 GB
  Available system memory (including swap)          :  8.0 GB
  Maximum memory for processing (including swap)    :  7.2 GB
  Maximum memory for processing (excluding swap)    :  5.3 GB
  Memory ulimit detected                            :  1.5 GB
  Memory ulimit in use                              :  1.1 GB
  Available system memory (limited)                 :  0.5 GB
  Available system memory (incl. swap; limited)     :  0.5 GB
  Maximum memory for processing (exc. swap; limited):  0.4 GB
  Memory required per process                       :  0.5 GB

Traceback (most recent call last):
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 701, in <module>
    run()
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 679, in run
    params, experiments, reference
  File "/home/fcx32934/sw/cctbx/build/../modules/dials/command_line/integrate.py", line 556, in run_integration
    reflections = integrator.integrate()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/integrator.py", line 1270, in integrate
    processed, this_time_info = _run_processor(table)
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/integrator.py", line 1227, in _run_processor
    reflections, _, time_info = processor.process()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 324, in process
    self.manager.initialize()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 635, in initialize
    self.compute_processors()
  File "/home/fcx32934/sw/cctbx/modules/dials/algorithms/integration/processor.py", line 887, in compute_processors
    % _average_bbox_size(self.reflections)
MemoryError: Please report this error to dials-support@lists.sourceforge.net: 
          Not enough memory to run integration jobs.  This could be caused by a
          highly mosaic crystal model.  Possible solutions include increasing the
          percentage of memory allowed for shoeboxes or decreasing the block size.
          The average shoebox size is 18 x 19 pixels x 17 images - is your crystal
          really this mosaic?
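
For a sense of scale, the average shoebox quoted in the error already implies tens of kilobytes per reflection. A rough estimate, assuming float data and background plus an int mask per voxel (the real dials accounting may differ):

    # Rough order-of-magnitude estimate of per-reflection shoebox memory.
    # Assumed layout: float32 data + float32 background + int32 mask per voxel.
    nx, ny, nframes = 18, 19, 17   # average shoebox size from the error above
    bytes_per_voxel = 4 + 4 + 4
    per_shoebox_kib = nx * ny * nframes * bytes_per_voxel / 1024
    print(round(per_shoebox_kib))  # ~68 KiB per reflection, before any overheads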

@dagewa (Member) commented Sep 15, 2020

Looks like a ulimit -v 1600000 job gave me a completed 2-split run. I'll try a higher limit to get a no-split run for comparison. Meanwhile, the error above shows there is a grey area, where the split occurs but there is apparently still not enough memory to complete the job.

@jbeilstenedmands (Contributor, Author)

Thanks for the report @dagewa. Perhaps there is enough memory to process the table, but not enough to also hold everything else in memory. I believe integration has a max_memory_usage parameter ("The maximum percentage of available memory to use for allocating shoebox arrays."), so maybe I need to include this here also.
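
A sketch of that kind of adjustment (the parameter name is from the comment above; the surrounding logic is an assumption, not the actual dials code):

    import math

    # Hypothetical: fold the max_memory_usage fraction into the split decision
    # so it works from the same budget as the shoebox allocator.
    def n_subsets_needed(predicted_bytes, available_bytes, max_memory_usage=0.9):
        budget = available_bytes * max_memory_usage
        return max(1, math.ceil(predicted_bytes / budget))

    print(n_subsets_needed(1.6e9, 1.0e9))  # -> 2 with a 90% budget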

@dagewa (Member) commented Sep 15, 2020

Maybe it is worth sampling a range of ulimit -v values, say in 100 MB steps, just to see what the range of behaviour is?
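
Such a sweep could be scripted; a hypothetical helper (POSIX-only; "integrate.phil" stands in for the parameter file used above):

    # Hypothetical sweep over virtual-memory limits in 100 MB steps,
    # equivalent to running `ulimit -v N` before each dials.integrate call.
    import resource
    import subprocess

    for limit_kb in range(1200000, 1700000, 100000):
        def set_vmem(limit_bytes=limit_kb * 1024):
            # Runs in the child process before exec, like `ulimit -v`.
            resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

        print("--- ulimit -v", limit_kb, "---")
        subprocess.run(["dials.integrate", "integrate.phil"], preexec_fn=set_vmem)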

array_family/flex_ext.py (review comment: outdated, resolved)
Base automatically changed from master to main March 1, 2021 11:12
@ndevenish ndevenish merged commit 8702065 into main Mar 16, 2021
@ndevenish ndevenish deleted the avoid_integrate_memory_limit branch March 16, 2021 09:40