Skip to content

Conversation

@garrettwrong
Copy link
Collaborator

@garrettwrong garrettwrong commented Apr 2, 2024

See #1107 . This adapts some of the CL code brought over for Cn molecules which used the 3N sync matrix method to the C1 case.

Todo

  • Add S weighting, as was optional in the Matlab code.
  • Add J weighting, as was optional in the Matlab code.
  • Speedup, its borderline unusable for experimental runs and gives little feedback...
  • Docstrings
  • Tests
  • Refactor with other 3N based classes. Probably they (Cn) could inherit from this Sync3N one.

I quickly hacked this together enough to minimally function. I'm thinking to take on the 1-3 todo items while Josh is out and pass it along to him to finish integrating.

@garrettwrong garrettwrong added enhancement New feature or request Optimization Performance or Resource Optimzation labels Apr 2, 2024
@garrettwrong garrettwrong self-assigned this Apr 2, 2024
@garrettwrong garrettwrong force-pushed the sync3n branch 2 times, most recently from b522147 to acf553d Compare April 4, 2024 15:37
@garrettwrong
Copy link
Collaborator Author

Adding some notes.

I've created a CUDA kernel for the slow operation signs_times_v and minimal (cupy) launching code along with some simple auto enabling configuration. The speedups are 100x+ for relevant problems. Presently I am keeping that work in this branch (off of sync3n): https://github.com/ComputationalCryoEM/ASPIRE-Python/tree/sync3n_cupy

There are some concerns/bugs between the python, matlab, and mex versions that need to be addressed, but they would be minor changes. Josh might be working them out in his current D2 work now anyway.

The original Mex codes of this thead-loop structure all appear to have a common threading ABA bug... I have fixed that in my CUDA version.

I will be able to mimic the kernel and supporting code I wrote for the other similar mex codes (probability and triangle scores for the same speedups, but need to work on a few other projects first before I come back to that.

@garrettwrong
Copy link
Collaborator Author

Some updates.

Today I was able to complete a rough draft of the pairs_probabilities CUDA kernel, cupy launcher, and dispatch code for that function. Speedups are similar to the last kernel 100x+. Next is the triangle_scores kernel which should be similar.

After that I can move onto merging the _cupy branch into this branch and cleaning things up. With the accelerated code we should be able to actually test the Sync3N features. FWIW, the large experiment using the pure python/numpy sync3n implementation was kicked off before I started writing the kernel code... and the handedness synchronization was only half way done yesterday...

@garrettwrong
Copy link
Collaborator Author

For 1000 images ...

Size:	1000
Allclose inds?  True
Allclose arb?  True
gpu_time: 7.763748619996477
host_time: 8093.168460645
speedup: 1042.4305135022098

@garrettwrong
Copy link
Collaborator Author

Triangle scores is also fast now:

Size:   1000
Allclose cum_scores?  True
Allclose hist_scores?  True
gpu_time: 4.986900245989091
host_time: 6058.410942732007
speedup: 1214.8650752749106

Added a test file to compare between the host python and cupy launched CUDA codes.

Broke the CUDA codes out into their own source file so that it is easier to keep it clean.

Merged sync3n_cupy into this branch.

@garrettwrong garrettwrong force-pushed the sync3n branch 2 times, most recently from c36fec1 to b05d4c2 Compare April 26, 2024 19:41
@garrettwrong
Copy link
Collaborator Author

Fixed up some typing concerns.

Added some minimal docstrings.

Added a unit test that conditionally compares the host (numpy) implementation to gpu (cupy launched cuda). This test is coded to automatically skip when cupy is not able to be imported.

@garrettwrong
Copy link
Collaborator Author

Added 3 tests to compare dummy data with results from MATLAB as a sanity check of the low level function port.

Added some minimal up casting to allow running the underlying sync3n methods in both singles and doubles. This is tested with parameterized fixture. Would still recommend using doubles with our CL codes for now.

@garrettwrong
Copy link
Collaborator Author

Added a unit test based on CL Sync 2N and tried to complete most of the docstrings.

Found a small bug in the CL sync tests (doesn't appear it was actually testing both types, just running doubles twice).

@garrettwrong garrettwrong changed the title WIP: Sync3n Sync3n Initial Add Jul 3, 2024
@garrettwrong garrettwrong requested a review from j-c-c July 3, 2024 15:58
Copy link
Collaborator

@j-c-c j-c-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Just a few things.

@garrettwrong
Copy link
Collaborator Author

Going to wait until after hack24 is reviewed/merged to retest and move this forward. (They both touch/have gpu/cupy code.)

@garrettwrong garrettwrong force-pushed the sync3n branch 3 times, most recently from 97b4368 to df83b6e Compare July 30, 2024 13:10
@garrettwrong
Copy link
Collaborator Author

Running with JSB2017 Matlab class averages.

@garrettwrong
Copy link
Collaborator Author

Running with JSB2017 Matlab class averages.

For the 80s run at 89 pixels the Sweighting solve was infeasable (I'll look into that more, happens too much). Without Sweighting we did produce a reasonable molecule. Average aligned rotation error was ~16* wrt MATLAB, FSC was 80A wrt to EMDB. Not very good, but better than before...

I noticed MATLAB was run using twice as many n_theta and must also have been using non-default shift steps. I am retrying with those changes. Unfortunately I have not found the actual configuration used in the MATLAB logs yet... still looking. I'm basing the values on lines from areas of the log, but its not certain they were the same values for the different functions. Anyway, hoping this will at least yield some improvement.

@garrettwrong garrettwrong marked this pull request as ready for review August 9, 2024 19:11
janden
janden previously approved these changes Aug 27, 2024
@garrettwrong garrettwrong merged commit 758df6b into develop Aug 28, 2024
@garrettwrong garrettwrong deleted the sync3n branch August 28, 2024 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request GPU Optimization Performance or Resource Optimzation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants