CUDA build common lines #1167
Implemented a better PF memory layout for a pretty good speedup. This takes the common-lines matrix build for the 89px JSB2017 problem from ~10 hours to about 10 minutes. I'll need to address the unit tests etc. next.
998fb0e to a79f8c4
Happy to report that for the JSB 80S 179px class averages I am achieving a mean aligned angular distance of 0.36 degrees with the CUDA implementations. These kernels can definitely be optimized further. I plan to revisit the S weighting soon. A couple of other things are coming up...
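For readers unfamiliar with the metric, here is a minimal sketch of a mean angular distance between estimated and reference rotations. It assumes the two sets have already been registered to each other (the "aligned" part is not shown), uses plain nested lists rather than the project's own rotation classes, and all function names are hypothetical.

```python
import math


def angular_distance_deg(r1, r2):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(r1^T r2) equals the sum of elementwise products of r1 and r2.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def mean_angular_distance_deg(estimated, reference):
    """Average the per-rotation angular distance over two equal-length lists."""
    dists = [angular_distance_deg(a, b) for a, b in zip(estimated, reference)]
    return sum(dists) / len(dists)


def rot_z(deg):
    """Hypothetical helper: rotation by `deg` degrees about the z axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Two rotations about the same axis that differ by 0.36 degrees give a distance of 0.36 degrees under this definition, which is the scale of error quoted above.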
This will be in 13.1
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #1167      +/-   ##
===========================================
- Coverage    87.37%   86.92%   -0.46%
===========================================
  Files          132      132
  Lines        13639    13735      +96
===========================================
+ Hits         11917    11939      +22
- Misses        1722     1796      +74
First round of self-review today. I want to look at it one more time and rerun the larger manual tests, then I will open it up.
Total (unweighted) Sync3N algorithm is now right around 30 minutes on |
j-c-c left a comment
This looks great! Just a couple comments/questions.
Yeah, I did see that with the last push. I was hoping 1199 had resolved it, but it only fixed some of the problems there. The GPU tests are running on caf/decaf. That auth issue impacted all uploads and is why we had no codecov reports at all for a while. However, it isn't why the caf/decaf report isn't there. In this case, the caf/decaf codecov upload is failing (it says the report is not found). I will look into it and try to patch it in another tiny PR. I think it is probably related to not using the default directories (defaults that codecov probably assumes). I just enabled ampere reports in 1199 in the hope of getting reports for the GPU code we've been adding (it's not coverage reporting we previously had).
Adds a CUDA kernel for building the common-lines matrix. Approximately an order of magnitude faster, before any tuning/optimization. The output matches the Python clmatrix. I have some concerns about whether the Python matches MATLAB; TBD.
Currently has changes from other branches in review.
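As a rough illustration of what a common-lines build computes, the sketch below does a brute-force search in pure Python: for each pair of images it picks the pair of polar Fourier rays with the largest normalized correlation. It is not the code in this PR. It ignores shifts, filtering, and ray symmetry; the function names are hypothetical; and the convention that entry `[i][j]` holds the ray index in image `i` is an assumption.

```python
import math


def normalize(ray):
    """Scale a complex ray to unit L2 norm (zero rays are returned unchanged)."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in ray))
    return [v / norm for v in ray] if norm > 0 else list(ray)


def common_line(pf_i, pf_j):
    """Return (ray_i, ray_j, score) for the best-correlated pair of rays.

    pf_i and pf_j are lists of rays; each ray is a list of complex samples.
    """
    rays_i = [normalize(r) for r in pf_i]
    rays_j = [normalize(r) for r in pf_j]
    best = (-1, -1, -math.inf)
    for a, ra in enumerate(rays_i):
        for b, rb in enumerate(rays_j):
            # Real part of the Hermitian inner product of the two rays.
            score = sum((x * y.conjugate()).real for x, y in zip(ra, rb))
            if score > best[2]:
                best = (a, b, score)
    return best


def build_clmatrix(pf):
    """Fill an n x n table of common-line ray indices (-1 on the diagonal).

    Assumed convention: clmatrix[i][j] is the ray index in image i,
    and clmatrix[j][i] is the matching ray index in image j.
    """
    n = len(pf)
    clmatrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b, _ = common_line(pf[i], pf[j])
            clmatrix[i][j] = a
            clmatrix[j][i] = b
    return clmatrix
```

The doubly nested loop over rays, repeated for every image pair, is the kind of independent per-pair work that maps naturally onto GPU threads, which is consistent with the speedups reported in this thread.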