
Separate independent blocks of experiments in refinement #2336

Merged

Conversation

Member

@dagewa dagewa commented Feb 6, 2023

For #2235.

This changeset identifies subsets of the experiment list that are independent (i.e. they don't share any models with other subsets) and performs refinement separately for each of these groups. This is put under the control of the user of dials.refine via the separate_independent_sets parameter (True by default).

It turned out to be much easier to do this at a high level, before any Refiners are made, than to attempt a precise analysis of the structure of the refinement problem. For that reason, this is not attempted at all if any constraints or restraints are present, as these may link the models in a way that is not clear from the experiment list alone.
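The grouping described here is essentially a connected-components problem: two experiments belong in the same block whenever they share a model. A minimal sketch of the idea (hypothetical code, not the DIALS implementation; the `independent_groups` helper and the dict-of-models representation of an experiment are invented for illustration):

```python
# Hypothetical sketch: partition experiments into independent refinement blocks.
# Each "experiment" here is a dict of model references; real DIALS experiments
# hold beam, detector, goniometer and crystal objects instead.

def independent_groups(experiments):
    """Union-find over experiments, linking any two that share a model."""
    parent = list(range(len(experiments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    seen = {}  # id(model) -> first experiment index that used it
    for i, expt in enumerate(experiments):
        for model in expt.values():
            if model is None:
                continue
            key = id(model)  # shared models compare by object identity
            if key in seen:
                union(seen[key], i)
            else:
                seen[key] = i

    groups = {}
    for i in range(len(experiments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

beam_a, beam_b = object(), object()
det_a, det_b = object(), object()
expts = [
    {"beam": beam_a, "detector": det_a},
    {"beam": beam_a, "detector": det_b},      # shares beam_a with expt 0
    {"beam": beam_b, "detector": object()},   # fully independent
]
print(independent_groups(expts))  # [[0, 1], [2]]
```

With this view, "strictly independent" simply means an experiment's component sits alone in the graph; any shared beam or detector collapses everything it touches into one block.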

Note that refinement results are not the same if refinement is done in separate groups compared to all at once. There could be many reasons for this:

  • Outlier rejection may differ
  • The number of steps to convergence may differ between each group and with the global job
  • Refinement engines may take a different path to convergence depending on the parameters available for refinement. For example, the Levenberg Marquardt algorithm uses a damping parameter, updates to which depend on the overall sum of squared residuals
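The last point can be made concrete with a toy example. In the sketch below (illustrative only; `lm_fit` and its damping heuristic are invented stand-ins, not DIALS code or a full Levenberg-Marquardt implementation), two slope parameters touch disjoint data, yet a shared residual-driven damping term makes the joint trajectory differ slightly from two separate runs:

```python
# Toy illustration: a shared damping parameter, updated from the *overall*
# sum of squared residuals, couples otherwise-independent parameters.

def lm_fit(datasets, n_steps=3, lam=1.0):
    """Fit y = a*x per dataset with one shared, residual-driven damping term."""
    params = [0.0] * len(datasets)
    for _ in range(n_steps):
        def ssq(ps):
            return sum(
                (y - a * x) ** 2
                for a, (xs, ys) in zip(ps, datasets)
                for x, y in zip(xs, ys)
            )
        before = ssq(params)
        params = [
            a + sum(x * (y - a * x) for x, y in zip(xs, ys))
                / (sum(x * x for x in xs) + lam)  # damped Gauss-Newton step
            for a, (xs, ys) in zip(params, datasets)
        ]
        # crude damping update: shrink lam in proportion to the residual drop
        lam = max(1e-9, lam * ssq(params) / before)
    return params

data1 = ([1.0, 2.0, 3.0], [2.0, 4.1, 5.9])  # slope ~2
data2 = ([1.0, 2.0, 3.0], [3.0, 5.0, 8.0])  # slope ~2.6

joint = lm_fit([data1, data2])
separate = [lm_fit([data1])[0], lm_fit([data2])[0]]
# joint and separate agree closely but are not bit-identical, because the
# damping trajectory of the joint run blends the residuals of both datasets
```

The per-parameter steps are identical mathematics in both runs; only the damping history differs, which is exactly why block-separated refinement cannot be expected to reproduce the joint result digit for digit.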

Currently a draft, to assess the impact of the change. @phyy-nx, I'm interested to hear whether this will affect any of your use cases. Note, if anything is shared between all the models (such as a beam or detector) then this will have no effect: groups of experiments must be strictly independent from other groups for the splitting to trigger. Also, as this occurs at a high level (inside the run_dials_refine function of the command line program) it does not affect other usages of Refiners, such as indexing.

I have not yet added any new tests or attempted to perform the refinement of separate groups in parallel. That could be interesting, but could be expensive in terms of memory usage.

Still to do:
- Report on which ids are in each block (table in log file)
- Rejoin results after refining each block
- Report on overall RMSDs by experiment after rejoining
- Tests
Commit note: also, return None for refiner and history if refinement has been done in disjoint blocks. In such a case, refuse to write out certain types of debugging/analysis files.
@dagewa dagewa linked an issue Feb 6, 2023 that may be closed by this pull request

codecov bot commented Feb 6, 2023

Codecov Report

Merging #2336 (21d0ea7) into main (f117929) will increase coverage by 4.36%.
The diff coverage is 46.71%.

❗ Current head 21d0ea7 differs from pull request most recent head fda43ac. Consider uploading reports for the commit fda43ac to get more accurate results

@@            Coverage Diff             @@
##             main    #2336      +/-   ##
==========================================
+ Coverage   78.62%   82.99%   +4.36%     
==========================================
  Files         603      593      -10     
  Lines       73644    68676    -4968     
  Branches    10005     9246     -759     
==========================================
- Hits        57903    56998     -905     
+ Misses      13612     9539    -4073     
- Partials     2129     2139      +10     

Member

phyy-nx commented Feb 6, 2023

Very interesting. I believe Brewster 2018, Figure 6 is exactly this use case. The blue curve is normal indexing with a slightly wrong model, the green is refining a detector model per image and the red is joint refinement with a single detector model. Both red and green go to the right location, but green has a wide distribution due to independence of each sample. This was with what would have been the old default of separate_independent_sets=False from this PR.

I deposited the expt files needed to make those plots but apparently I didn't deposit the refl files, so repeating that exactly isn't easy. I'll see what I can do to reproduce it on this branch.

Member Author

dagewa commented Feb 7, 2023

To reassure that differences in refinement are not due to a bug introduced by this change we can compare output if we use the basic Gauss–Newton engine, with no outlier rejection, and restricting the number of steps to make sure both blocks take the same number of steps (in fact, in this case, they do anyway):

dials.import $(dials.data get -q insulin)/insulin_1_0{01..22}.img
dials.find_spots imported.expt
dials.index imported.expt strong.refl \
  output.experiments=first.expt output.reflections=first.refl
dials.import $(dials.data get -q insulin)/insulin_1_0{23..45}.img
dials.find_spots imported.expt
dials.index imported.expt strong.refl \
  output.experiments=second.expt output.reflections=second.refl
dials.combine_experiments first.{expt,refl} second.{expt,refl}
dials.refine combined.{expt,refl} scan_varying=false \
  outlier.algorithm=null engine=GaussNewton max_iterations=2 \
  output.log=first.refined.log
dials.refine combined.{expt,refl} scan_varying=false \
  outlier.algorithm=null engine=GaussNewton max_iterations=2 \
  separate_independent_sets=False output.log=second.refined.log

In both cases the final RMSD table is

RMSDs by experiment:
+-------+--------+----------+----------+------------+
|   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
|    id |        |     (px) |     (px) |   (images) |
|-------+--------+----------+----------+------------|
|     0 |  11002 |  0.24532 |  0.35418 |    0.25377 |
|     1 |  11012 |  0.26789 |  0.38878 |    0.24988 |
+-------+--------+----------+----------+------------+

Member

phyy-nx commented Feb 7, 2023

Ok, here's what I did. I re-indexed run 29 of the thermolysin data in Brewster 2018, but with Z changed by 1 mm. I then grabbed 1000 images and did hierarchy level 0 dials.refine on the main branch, with 1 or N detector models. The result looks like Figure 6 from the paper:

[image: unit cell distributions, analogous to Brewster 2018 Figure 6]

Blue is before refinement, orange is refinement with N detector models and green is refinement with 1 detector model. Orange is wider than green as expected.

Next I checked out this branch and refined with 1 detector model using the default separate_independent_sets=True, and with N detector models with separate_independent_sets=False. I verified I got identical unit cell distributions as on main. Note, the latter case took 4 minutes and 5 seconds.

Finally I refined with N detector models using separate_independent_sets=True. Here, however, I hit a problem. The program ran for a while and then my interactive node at LCLS became unresponsive and I was kicked off. I moved the data to the 64-core dials.lbl.gov and tried again. The program ran for 22 minutes before it crashed with just the message `Killed`. I was watching top, and memory slowly increased until it hit at least 40 GB of usage before the process stopped.

Not sure where to go from here. I've been curious about this since we wrote that paper, because refining independently should give the same result as refining together if the models are independent, yet it doesn't, likely for the reasons you gave above @dagewa. So I hope there is a solution to this memory problem. Regardless, I've put the files in a Google Drive and shared them with you if it helps.

Member Author

dagewa commented Feb 8, 2023

Thanks @phyy-nx, that's interesting. I'm not sure why the memory usage is so high this way, but I'll try to profile it.

Member Author

dagewa commented Feb 9, 2023

I can see the memory problem. It's because of copy.deepcopy(params). I wanted that so that each refinement job got its own copy of the refinement PHIL parameters, because some of the Auto parameters would be modified by the Refiner. But guess what?

In [5]: params.input.reflections
Out[5]: [FilenameDataWrapper(filename='L785_pr2336/filtered.refl', data=<dials_array_family_flex_ext.reflection_table object at 0x7f0a9d560180>)]

Each copy of params also copies the entire reflection table (oh and the experiments too). So your job was duplicating a 30 MB reflection table 1000 times even before it got into refinement 😱
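A minimal reproduction of this pitfall, using invented `Params`/`FilenameDataWrapper` stand-ins rather than the real PHIL objects, plus one possible workaround (detach the heavy payload before the deepcopy so every per-block copy shares the same underlying table):

```python
# Sketch of the deepcopy trap: copying a parameter object also deep-copies
# any large data hanging off it. All names here are illustrative stand-ins.
import copy

class FilenameDataWrapper:
    def __init__(self, filename, data):
        self.filename = filename
        self.data = data  # stands in for a large reflection table

class Params:
    def __init__(self, wrappers):
        self.input_reflections = wrappers

big_table = list(range(100_000))  # pretend this is a 30 MB reflection table
params = Params([FilenameDataWrapper("filtered.refl", big_table)])

naive = copy.deepcopy(params)
assert naive.input_reflections[0].data is not big_table  # full copy was made!

def cheap_copy(p):
    """Deepcopy the params but share (not copy) the attached data."""
    saved = [(w, w.data) for w in p.input_reflections]
    try:
        for w, _ in saved:
            w.data = None          # detach heavy payload
        clone = copy.deepcopy(p)   # cheap: copies only the lightweight shell
    finally:
        for w, data in saved:
            w.data = data          # restore the original object
    for w, (_, data) in zip(clone.input_reflections, saved):
        w.data = data              # clone shares the same table
    return clone

clone = cheap_copy(params)
assert clone.input_reflections[0].data is big_table  # shared, not copied
```

Another route with the same effect is to seed `copy.deepcopy`'s `memo` dict with the heavy objects so they are treated as already copied.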

Member Author

dagewa commented Feb 9, 2023

I can work around that, but I've realised there is another problem. In your refine.phil you have outlier.separate_experiments=False. This is supposed to join the reflections from all experiments together for outlier rejection purposes. But the splitting is done before outlier rejection, so as a consequence this parameter is effectively ignored.

I guess it would be better to do the outlier rejection step prior to splitting, so it is really only the refinement that is done in independent sets. However, this is going to be a somewhat more significant change. At the moment, outlier rejection is done as part of the construction of a Refiner, not as a separate step.
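The intended order of operations can be sketched as follows (the MAD-based rejection below is a hypothetical illustration, not one of the actual DIALS outlier algorithms):

```python
# Sketch: reject outliers once on the pooled reflections, *then* split by
# experiment for independent refinement, so a global cut sees all experiments.
import statistics

def global_outlier_mask(residuals, nsigma=3.0):
    """Keep values within nsigma robust-sigmas of the pooled median."""
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    sigma = 1.4826 * mad  # MAD -> sigma for a normal distribution
    return [abs(r - med) <= nsigma * sigma for r in residuals]

# (experiment_id, residual) pairs pooled across all experiments
reflections = [(0, 0.1), (0, -0.2), (0, 5.0), (1, 0.15), (1, -0.1), (1, 0.05)]

mask = global_outlier_mask([r for _, r in reflections])
kept = [(eid, r) for (eid, r), ok in zip(reflections, mask) if ok]

# only now split into per-experiment blocks for refinement
blocks = {}
for eid, r in kept:
    blocks.setdefault(eid, []).append(r)
print(blocks)  # the 5.0 outlier in experiment 0 was rejected globally
```

Doing the cut per block instead would compute a separate median and spread for each experiment, which is exactly the behaviour outlier.separate_experiments=False is meant to avoid.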

Member Author

dagewa commented Feb 10, 2023

Hi @phyy-nx, with the change in 1e4763d this job:

time dials.refine L785_pr2336/filtered.{expt,refl} L785_pr2336/refine.phil outlier.algorithm=None

completes in 1m25s on my laptop and uses very little memory. So, that's as it should be. But it's still not the job you want to be able to run. I had to set outlier.algorithm=None because otherwise the SauterPoon outlier algorithm removes all reflections from the first experiment and refinement fails.

Separating the outlier rejection from the rest of the construction of the Refiner is not as bad as I first thought. I think I'll be able to do that quite easily.

Member

phyy-nx commented Feb 10, 2023

Ok, good to hear!

Commit note: split Refiner construction into two phases:

1. _build_reflection_manager_and_predictor
2. _build_refiner

This makes it possible to interrupt building the refiner after outlier rejection is done. Add a method, reflections_after_outlier_rejection, which does this.
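A hedged sketch of what this two-phase construction might look like (the class and method names echo the commit message, but the bodies are simplified stand-ins, not the real refiner code):

```python
# Illustrative two-phase builder: phase one performs outlier rejection,
# phase two builds the refiner, and callers may stop in between.

class RefinerFactory:
    def __init__(self, reflections):
        self._reflections = reflections
        self._manager = None

    def _build_reflection_manager_and_predictor(self):
        # stand-in outlier rejection: drop residuals above a fixed cut
        self._manager = [r for r in self._reflections if abs(r) < 1.0]
        return self._manager

    def reflections_after_outlier_rejection(self):
        """Run only phase one and hand back the surviving reflections."""
        if self._manager is None:
            self._build_reflection_manager_and_predictor()
        return self._manager

    def _build_refiner(self):
        # phase two reuses phase one's result rather than redoing it
        if self._manager is None:
            self._build_reflection_manager_and_predictor()
        return {"reflections": self._manager}  # stand-in Refiner object

factory = RefinerFactory([0.1, -0.3, 2.5, 0.2])
survivors = factory.reflections_after_outlier_rejection()
print(survivors)  # [0.1, -0.3, 0.2]
```

The design choice is that outlier rejection becomes an interruptible step: the caller can run it globally, repartition the survivors into independent blocks, and only then complete refiner construction per block.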
Member Author

dagewa commented Mar 1, 2023

With those last changes, dials.refine will respect outlier.separate_experiments=False and do one round of global outlier rejection before getting stuck into refining each experiment independently. I ran this job:

time dials.refine L785_pr2336/filtered.{expt,refl} L785_pr2336/refine.phil output.log=branch.log output.experiments=branch.expt output.reflections=branch.refl

and the same on main. The resulting tables of RMSDs by experiment differ only by insignificant digits.

@dagewa dagewa marked this pull request as ready for review March 1, 2023 16:56
Member Author

dagewa commented May 3, 2023

Hi @graeme-winter, I don't want to merge this without a review / testing from you, as this came from your use case in #2235

@graeme-winter
Contributor

> Hi @graeme-winter, I don't want to merge this without a review / testing from you, as this came from your use case in #2235

Noted, gimme a mo

@graeme-winter
Contributor

OK, ran a quick test against a 10-experiment lysozyme data set. On this branch:


      117.09 real       132.30 user        12.40 sys
          1348579328  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
             5327929  page reclaims
                1911  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   5  signals received
                5601  voluntary context switches
               54666  involuntary context switches
        798101697148  instructions retired
        351156830466  cycles elapsed
          1116266496  peak memory footprint

Meanwhile, on main:

      151.22 real       200.34 user        13.05 sys
          1531678720  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
             5964217  page reclaims
                 723  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   5  signals received
                7458  voluntary context switches
              154298  involuntary context switches
       1138406642559  instructions retired
        480421803314  cycles elapsed
          1347272704  peak memory footprint

Seems to have substantially reduced the wall clock time.

@graeme-winter
Contributor

Modest but probably useful reduction in memory footprint

>>> main = 1531678720
>>> branch = 1348579328
>>> main - branch
183099392
>>> _ / (1024.**2)
174.6171875
>>> main / (1024.**2)
1460.72265625
>>> branch / (1024.**2)
1286.10546875
>>> 1 - (branch / main)
0.11954164382462662

@graeme-winter
Contributor

Output (diff of the RMSD tables between the two runs):

< RMSDs by experiment:
< +-------+--------+----------+----------+------------+
< |   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
< |    id |        |     (px) |     (px) |   (images) |
< |-------+--------+----------+----------+------------|
< |     0 |   7676 |  0.13675 |  0.12385 |   0.048474 |
< |     1 |   8082 |  0.14012 |  0.12605 |   0.04502  |
< |     2 |   7039 |  0.14684 |  0.12941 |   0.050837 |
< |     3 |   6868 |  0.14581 |  0.12924 |   0.056064 |
< |     4 |   7950 |  0.14091 |  0.12468 |   0.047592 |
< |     5 |   6974 |  0.14146 |  0.12772 |   0.048678 |
< |     6 |   8174 |  0.13517 |  0.12453 |   0.043553 |
< |     8 |   8207 |  0.143   |  0.13285 |   0.052857 |
---
> |     0 |   7676 |  0.13675 |  0.12384 |   0.048474 |
> |     1 |   8083 |  0.14014 |  0.12608 |   0.045023 |
> |     2 |   7039 |  0.14694 |  0.12942 |   0.0508   |
> |     3 |   6867 |  0.14588 |  0.1291  |   0.056095 |
> |     4 |   7950 |  0.14086 |  0.12467 |   0.047603 |
> |     5 |   6972 |  0.14163 |  0.12754 |   0.048595 |
> |     6 |   8174 |  0.13507 |  0.12463 |   0.043555 |
> |     8 |   8206 |  0.143   |  0.13283 |   0.052848 |

I consider these results to be equivalent

@graeme-winter left a comment (Contributor)

Change set delivers on promise of reducing resource requirements without meaningfully affecting the quality of output -> win

Would ask: in the loop-over-data-set part of refinement, please can we print the experiment ID being refined, as a progress report for the impatient user?

Otherwise looks excellent, thank you

Member Author

dagewa commented May 3, 2023

Thank you. Looks like a useful set of suggestions / comments, which I will work through soon.

Member Author

dagewa commented May 4, 2023

I've addressed all comments now, so just waiting for the green ticks.

@dagewa dagewa merged commit 9918059 into main May 5, 2023
16 checks passed
@dagewa dagewa deleted the 2235-dialsrefine-possible-to-block-diagonalise-the-matrix branch May 5, 2023 08:13
Successfully merging this pull request closes: dials.refine: possible to block-diagonalise the matrix? (#2235)