Sparse scan-varying refinement #2022
Conversation
This is a hangover from before refactoring in 0f6b2e4
Add tests for scalar versions, add tests for mat3 * mat3.
- parts()
- dot(other)
- rotate_around_origin(direction, angle)
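As an illustration of what a rotate_around_origin(direction, angle) method computes, here is a pure-Python sketch using Rodrigues' rotation formula. This is not the scitbx implementation; the function name and signature are reused only for illustration:

```python
import math

def rotate_around_origin(v, axis, angle):
    """Rotate 3-vector v about a unit-length axis through the origin by
    angle (radians), via Rodrigues' formula. Illustrative only."""
    ax, ay, az = axis
    vx, vy, vz = v
    c, s = math.cos(angle), math.sin(angle)
    # cross product: axis x v
    cx = ay * vz - az * vy
    cy = az * vx - ax * vz
    cz = ax * vy - ay * vx
    # dot product: axis . v
    d = ax * vx + ay * vy + az * vz
    return (
        vx * c + cx * s + ax * d * (1 - c),
        vy * c + cy * s + ay * d * (1 - c),
        vz * c + cz * s + az * d * (1 - c),
    )

# Rotating (1, 0, 0) by 90 degrees about the z axis gives (0, 1, 0)
print(rotate_around_origin((1, 0, 0), (0, 0, 1), math.pi / 2))
```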
This turned out to be quite complicated. The intersection and intersection_i_seqs methods did not quite do the right thing because they require sorted input arrays, whereas here the original order of the input must be preserved. I ended up with a loop in Python to do what I want, but this should be converted to C++ for speed.
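The order-preserving intersection described above can be sketched as a short Python loop using a dict for membership lookup. This is a hypothetical helper written for illustration, not the actual DIALS code:

```python
def ordered_intersection(a, b):
    """Return the elements of a that also occur in b, preserving a's
    original (possibly unsorted) order, along with their indices in a
    and in b. Hypothetical helper, not the DIALS implementation."""
    # Map each value in b to its first index for O(1) lookup
    b_index = {v: i for i, v in enumerate(b)}
    values, i_a, i_b = [], [], []
    for i, v in enumerate(a):
        j = b_index.get(v)
        if j is not None:
            values.append(v)
            i_a.append(i)
            i_b.append(j)
    return values, i_a, i_b

# 5 precedes 7 in the result because that is their order in a
print(ordered_intersection([5, 2, 9, 7], [7, 5, 3]))
```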
Just calculate as required - it's cheap enough
This currently fails in downstream calculations, so keep the dense version around for now, to remove later once this is fixed.
With changes in _grads_model_loop, the sparse calculations are seen to give the same results as the dense versions. Changes still need to be made to _grads_detector_loop.
Use a map to look up values. Still slow and needs to move to C++
This will record the normal matrix at each step in the refinement history. In addition, leave a stub for recording the Jacobian structure in the history as well. Also tidy some of the code.
elements of the index arrays.
…al Python version
At the moment, including a test of total execution time. The Python version is 2 times *faster*.
results. Unfortunately this test fails when the inputs have random order!
version. Use this to find corner cases where the results are not the same.
The NumPy method returns sorted values, which means that it only works for selection when the input arrays are themselves sorted. This is often the case but cannot be guaranteed. Therefore this method is not suitable for use in SparseFlex.select. Added a test to demonstrate the failure of this method when the input arrays are unsorted.
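The sortedness issue described above can be seen directly with NumPy's intersect1d, whose output is always sorted; selecting with the returned indices therefore loses the original order of an unsorted input array (a general illustration of the point, independent of the flex-based DIALS code):

```python
import numpy as np

# Unsorted input: 30 appears before 20 in a
a = np.array([30, 10, 20])
b = np.array([20, 30, 40])

common, ia, ib = np.intersect1d(a, b, return_indices=True)
print(common)   # [20 30] -- sorted, regardless of the order in a
print(a[ia])    # [20 30] -- selection with ia loses a's original order
```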
For reasons I don't understand, using the unordered_map a little differently makes a dramatic difference to the execution speed. The C++ version is now the fastest: about 8 times faster than the pure Python version and 4 times faster than the NumPy version. Added a test that demonstrates execution speed.
Tagged @graeme-winter for review, and testing with multi-turn datasets
Noted, wilco, just processing a ... 10-turn data set right now
Right, this is going to be a process...
timing for 1st run, on main, in P1
Not P1, I23
Branch; i23
P1 on the branch
A reduction in wallclock time is good to see, but it is surprising there is essentially no reduction in memory requirements. With the 360° test dataset I saw a reduction to 93% of the memory requirement - so not a large reduction but measurable. I thought this would improve with more parameters, but maybe the memory is going somewhere else. I'm looking at the 10-turn datasets now...
This branch does not change the default refinement engine, only the sparseness of the Jacobian. One thing that seemed worth exploring is whether setting
No, that didn't help! Peak memory requirements for
The purpose of this branch is to properly account for sparseness during gradient calculations for scan-varying refinement. The plots indicate that has been achieved. However, it appears this only translates to a modest improvement in overall performance of
Ok, something strange is going on here. I tried a full scan-varying run on branch vs main, and I get a very significant difference in run time and memory requirements. The branch takes 1/3 of the time and uses 1/3 of the memory of main. At least, according to Command:
branch
main
The branch is better than 4 times faster and uses almost three times less memory. A difference I note from @graeme-winter's runs is that I set
Yep, something stupid is happening when
scan-varying macrocycle. Enables a big improvement in performance for default dials.refine jobs.
🙌 for code review. I had been testing setting
Sounds like a successful review process. I will update and re-run to verify |
Man needs to review dictionary under "S for shortly" |
"real soon now" 😆 |
@graeme-winter Are you still planning to run the branch against one of your test cases, or is my test sufficient?
Thanks for checking! Looks like a more than fourfold reduction in memory usage and nearly five times shorter wallclock time. Not sure why the "PR macos python38" check is hanging on amber?
Codecov Report
@@            Coverage Diff             @@
##             main    #2022      +/-   ##
==========================================
- Coverage   68.58%   68.44%   -0.14%
==========================================
  Files         649      654       +5
  Lines       75364    76396    +1032
  Branches    10793    10914     +121
==========================================
+ Hits        51686    52290     +604
- Misses      21665    22066     +401
- Partials     2013     2040      +27
This PR enables better use of sparse storage during gradient calculation for scan-varying parameterisations, avoiding the storage of a large number of explicit zero elements. For a 3600-image, 360° test dataset this reduces the overall memory requirement of a dials.refine scan_varying=true job to 93% of that required on the main branch. This saving should improve for larger scans, i.e. multi-turn data collections, as the sparseness of the Jacobian increases. As well as the memory reduction, total wall clock time for the run is reduced to 77% of the main branch (see #1800). Fixes #1800.
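The memory argument above can be illustrated with a toy coordinate-format (COO) Jacobian, where only non-zero entries are stored. The sizes below are invented for illustration and this uses plain NumPy arrays rather than the scitbx sparse types the PR actually works with:

```python
import numpy as np

# Hypothetical problem size: each observation's residual depends on only
# a few parameters, as when a reflection sees only the smoothed parameter
# values near its image number in a scan-varying parameterisation.
n_obs, n_params, nnz_per_row = 100_000, 500, 4
rng = np.random.default_rng(0)

# COO storage: keep (row, column, value) triplets for non-zeros only
rows = np.repeat(np.arange(n_obs, dtype=np.int32), nnz_per_row)
cols = rng.integers(0, n_params, size=nnz_per_row * n_obs, dtype=np.int32)
vals = rng.standard_normal(nnz_per_row * n_obs)

dense_bytes = n_obs * n_params * 8  # float64 dense Jacobian
sparse_bytes = rows.nbytes + cols.nbytes + vals.nbytes
print(f"dense:  {dense_bytes / 1e6:.0f} MB")   # 400 MB
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")  # 6.4 MB
```

The gap widens as the scan gets longer: n_params grows with the number of smoother intervals while nnz_per_row stays roughly constant, which is why multi-turn datasets should benefit most.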