Using CUDA.jl #98
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #98   +/-   ##
=======================================
  Coverage   91.89%   91.89%
=======================================
  Files          20       20
  Lines        1506     1506
=======================================
  Hits         1384     1384
  Misses        122      122
Continue to review full report at Codecov.
Using `@views` is certainly just a convenience. We'll get better performance writing kernels. This is true even in cases where we do not use `@views` (kernels seem to be faster than naive broadcasting, at least right now, which may be an inefficiency in broadcasting that could be fixed sometime in the future). Note that if we write kernels and use KernelAbstractions, we also get a multithreaded speed-up even on algebraic operations (not just FFTs).
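To make the kernel idea concrete, here is a minimal sketch of an element-wise update written with KernelAbstractions.jl. The function names (`addrhs_kernel!`, `addrhs!`) and the workgroup size are illustrative assumptions, not code from this PR, and the launch API shown is the one in recent KernelAbstractions releases (it has changed across versions):

```julia
using KernelAbstractions

# Hypothetical element-wise update u .= u .+ dt .* rhs, written as a
# KernelAbstractions kernel so the same code runs on CPU threads or GPU.
@kernel function addrhs_kernel!(u, rhs, dt)
    i, j = @index(Global, NTuple)     # global index of this work item
    @inbounds u[i, j] += dt * rhs[i, j]
end

function addrhs!(u, rhs, dt)
    backend = get_backend(u)          # CPU() for Arrays, CUDABackend() for CuArrays
    kernel! = addrhs_kernel!(backend, 256)
    kernel!(u, rhs, dt; ndrange = size(u))
    KernelAbstractions.synchronize(backend)
    return nothing
end
```

Because the backend is inferred from the array type, the same `addrhs!` runs multithreaded on CPU `Array`s and launches a GPU kernel on `CuArray`s.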
OK, I see your point. It takes a bit away from the "Julia can seamlessly be GPU-ready" idea that I had in mind. But what you are saying is that if the developers work a bit harder here (writing kernels), then the experience will be seamless for the users, right?
@glwagner this PR is ready, but don't merge yet. First FourierFlows/FourierFlows.jl#198 needs to be merged and a new release of FourierFlows.jl made, and then we should remove
I think I was unclear. If we write kernels, they will be multithreaded and may have slightly better GPU performance than broadcasting. Multithreading is not a GPU concept; multithreading speeds up CPU computations. Note that fused broadcasts could become multithreaded in the future; they just aren't right now. I think GPU broadcasting could perhaps be improved as well. I am not implying that hand-written kernels are necessary for performant code, and I apologize if I implied that.
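The CPU-side distinction above can be sketched as follows. This is an illustrative comparison, not code from this PR; the function names are hypothetical, and the threaded version only helps when Julia is started with multiple threads (e.g. `julia -t auto`):

```julia
# A fused broadcast currently runs on a single thread:
function rhs_broadcast!(u, r, dt)
    @. u += dt * r
    return nothing
end

# An explicit loop can use all available Julia threads:
function rhs_threaded!(u, r, dt)
    Threads.@threads for i in eachindex(u, r)
        @inbounds u[i] += dt * r[i]
    end
    return nothing
end
```

A KernelAbstractions kernel gives you the second behaviour on CPU automatically, which is the multithreaded speed-up referred to above.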
The fact that I had to use `@CUDA.allowscalar` so many times (especially in the `MultiLayerQG` module) probably means something. It seems like `@views` uses `getindex` on the GPU; I was getting errors if I didn't include `@CUDA.allowscalar` in front of expressions with `@views`.
!! This should only be merged after FourierFlows/FourierFlows.jl#198 is in master. !!
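For context, a minimal sketch of the scalar-indexing behaviour being described (the array contents here are arbitrary; `CUDA.@allowscalar` is the current spelling of the same macro invoked above as `@CUDA.allowscalar`):

```julia
using CUDA

a = CUDA.rand(4, 4)

# Reading or writing a single element of a CuArray ("scalar indexing") is
# disallowed by default, because each access would be a tiny device round-trip:
# a[1, 1]                    # errors: "Scalar indexing is disallowed"

CUDA.@allowscalar a[1, 1]    # explicit opt-in for a single element

# Whole-array (or whole-view) fused broadcasts compile to one GPU kernel
# and need no @allowscalar:
@views @. a[1, :] = 2 * a[2, :]
```

When an expression under `@views` falls back to element-wise `getindex`, it hits the same guard, which is why the `@allowscalar` annotations accumulate.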
Closes #96