@garrettwrong
Collaborator

Adds a CUDA kernel for building common lines. Roughly an order of magnitude faster, before any tuning/optimization. Output matches the Python clmatrix. I have some concerns about whether the Python matches MATLAB; TBD.
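
For readers unfamiliar with the operation being accelerated, here is a minimal NumPy sketch of a brute-force common-line search between two images. It is an illustration under stated assumptions, not the kernel in this PR: the function name `common_line_pair` and the `(n_theta, n_rad)` array shape are hypothetical, and the in-plane shift search that real implementations typically include is omitted.

```python
import numpy as np


def common_line_pair(pf_i, pf_j):
    """Find the best-correlated pair of Fourier rays between two images.

    pf_i, pf_j: complex arrays of shape (n_theta, n_rad), one row per ray
    (hypothetical layout, chosen for this sketch only).
    Returns (k, l, score): ray k of image i, ray l of image j, and their
    normalized correlation.
    """
    # Normalize each ray to unit norm so scores are comparable.
    a = pf_i / np.linalg.norm(pf_i, axis=1, keepdims=True)
    b = pf_j / np.linalg.norm(pf_j, axis=1, keepdims=True)
    # corr[k, l] = real part of the inner product of ray k and ray l.
    corr = np.real(a @ b.conj().T)
    k, l = np.unravel_index(np.argmax(corr), corr.shape)
    return int(k), int(l), float(corr[k, l])


# Synthetic check: plant ray 7 of image i as ray 20 of image j.
rng = np.random.default_rng(0)
shape = (36, 16)
pf_i = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
pf_j = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
pf_j[20] = pf_i[7]
k, l, score = common_line_pair(pf_i, pf_j)
```

Repeating this search for every pair of images is what makes the full matrix build expensive, and every pair is independent of the others, which is why the work maps naturally onto a GPU.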

Currently has changes from other branches in review.

@garrettwrong garrettwrong added enhancement New feature or request Optimization Performance or Resource Optimzation GPU labels Aug 15, 2024
@garrettwrong garrettwrong self-assigned this Aug 15, 2024
@garrettwrong
Collaborator Author

Implemented a better PF memory layout for a pretty good speedup. This takes the common lines matrix build for the 89px JSB2017 problem from ~10 hours to about 10 minutes.

I'll need to address the unit tests etc next.

Base automatically changed from sync3n to develop August 28, 2024 14:08
@garrettwrong garrettwrong force-pushed the pcl branch 3 times, most recently from 998fb0e to a79f8c4 Compare September 24, 2024 18:31
@garrettwrong
Collaborator Author

garrettwrong commented Sep 24, 2024

Happy to report that for JSB 80S 179px class averages I am achieving a mean aligned angular distance of 0.36 degrees with the CUDA implementations at commit a79f8c4, as compared with the published MATLAB code. This is up to transposing the image data, and without S weighting or J weighting on. That is, just the base Sync3N algorithm, which includes building the CL matrix, the voting procedure, building S, and the global handedness sync.
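
As a reference point for the metric quoted above, the angular distance between two rotation matrices can be computed from the trace of their relative rotation. The sketch below assumes the two sets of rotations are already registered to a common frame (the "aligned" part, which is omitted here); the helper names `angular_distance_deg`, `mean_angular_distance_deg`, and `rot_z` are hypothetical and not taken from the codebase.

```python
import numpy as np


def angular_distance_deg(r1, r2):
    """Angle, in degrees, of the relative rotation r1^T r2."""
    cos_theta = (np.trace(r1.T @ r2) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def mean_angular_distance_deg(rots_a, rots_b):
    """Mean pairwise angular distance between two equal-length rotation lists."""
    return float(np.mean([angular_distance_deg(a, b) for a, b in zip(rots_a, rots_b)]))


def rot_z(deg):
    """Rotation about the z axis by `deg` degrees (test helper)."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


single = angular_distance_deg(rot_z(10.0), rot_z(10.36))
mean_err = mean_angular_distance_deg(
    [rot_z(0.0), rot_z(5.0)], [rot_z(0.2), rot_z(5.4)]
)
```

Here `single` comes out at roughly 0.36 degrees and `mean_err` at roughly 0.3 degrees, since the test rotations differ only by small rotations about a shared axis.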

These kernels can definitely be further optimized. I plan to revisit the S weighting soon. Couple other things coming up...

@garrettwrong
Collaborator Author

This will be in 13.1

@codecov

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 29.24528% with 75 lines in your changes missing coverage. Please review.

Project coverage is 86.92%. Comparing base (c8a51b8) to head (c47b626).
Report is 45 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aspire/abinitio/commonline_base.py | 37.70% | 38 Missing ⚠️ |
| src/aspire/abinitio/commonline_sync3n.py | 13.95% | 37 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1167      +/-   ##
===========================================
- Coverage    87.37%   86.92%   -0.46%     
===========================================
  Files          132      132              
  Lines        13639    13735      +96     
===========================================
+ Hits         11917    11939      +22     
- Misses        1722     1796      +74     
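
The project-level percentages in the diff above can be cross-checked from the hit and line counts. This is a small sketch assuming coverage is simply hits divided by lines; codecov's own calculation may handle partial lines and rounding differently, and the function name `coverage_pct` is hypothetical.

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage, rounded to two decimals."""
    return round(100.0 * hits / lines, 2)


# Counts copied from the coverage diff (base = develop, head = this PR).
base = {"lines": 13639, "hits": 11917}
head = {"lines": 13735, "hits": 11939}

base_cov = coverage_pct(base["hits"], base["lines"])
head_cov = coverage_pct(head["hits"], head["lines"])

# Misses are the lines that were not hit.
base_misses = base["lines"] - base["hits"]
head_misses = head["lines"] - head["hits"]
```

The drop follows from the counts: the PR adds 96 lines but only 22 new hits, so most of the added code (the GPU paths) is not exercised in the uploaded reports.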

@garrettwrong
Collaborator Author

First round of self review today. Want to look at it one more time and rerun larger manual tests. Then will open it up.

@garrettwrong garrettwrong changed the title WIP: CUDA build common lines CUDA build common lines Oct 11, 2024
@garrettwrong
Collaborator Author

The total (unweighted) Sync3N algorithm now takes right around 30 minutes on caf (A100 GPU) for 3000 179x179 single-precision images (80S JSB class averages). Doubles take around 45 minutes.

@garrettwrong garrettwrong marked this pull request as ready for review October 11, 2024 17:37
@garrettwrong garrettwrong requested a review from janden as a code owner October 11, 2024 17:37
@garrettwrong garrettwrong requested a review from j-c-c October 11, 2024 19:41
Collaborator

@j-c-c j-c-c left a comment

This looks great! Just a couple comments/questions.

@garrettwrong garrettwrong requested a review from j-c-c October 16, 2024 19:46
@garrettwrong
Collaborator Author

> Looks good. Just a few things.
>
> Also, codecov seems to complain that GPU paths are not tested? I assume this is due to the auth problems you were talking about last week.

Yeah, I did see that with the last push. I was hoping 1199 resolved it, but it only fixed some of the problems there.

The GPU tests are running on caf/decaf.

That auth issue impacted all uploads and is why we had no codecov reports at all for a while. However, it isn't why the caf/decaf report isn't there.

In this case, the caf/decaf codecov upload is failing (it says the report is not found). I will look into it and try to patch it in another tiny PR. I think it probably relates to not using the default directories (which codecov probably assumes). I just enabled ampere reports in 1199 in the hope of getting reports for the GPU code we've been adding (it's not coverage reporting we previously had).

@garrettwrong garrettwrong requested a review from janden October 18, 2024 15:29
@garrettwrong garrettwrong merged commit 8011e3a into develop Oct 21, 2024
36 checks passed
@garrettwrong garrettwrong deleted the pcl branch October 21, 2024 13:52
4 participants