
Using CUDA.jl #98

Merged: 13 commits into master, Aug 30, 2020
Conversation

@navidcy (Member) commented Aug 29, 2020

  • Drops CuArrays.jl and CUDAapi.jl and uses CUDA.jl instead
  • Disallows scalar operations on GPU

The fact that I had to use @CUDA.allowscalar so many times (especially in MultiLayerQG) probably means something. It seems that @views uses getindex on the GPU; I was getting errors if I didn't put @CUDA.allowscalar in front of expressions with @views.
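For concreteness, a minimal sketch of the pattern being described (not code from this PR; the array shapes and the sliced assignment are made up for illustration). With scalar operations disallowed, a @views expression that falls back to scalar indexing errors unless wrapped in @CUDA.allowscalar:

```julia
using CUDA

CUDA.allowscalar(false)   # disallow scalar getindex/setindex! on GPU arrays

a = CUDA.zeros(4, 4)
b = CUDA.ones(4, 4)

# Plain broadcasting stays on the GPU and needs no special treatment:
@. a = 2 * b

# A @views expression may fall back to scalar indexing and then error;
# wrapping it lets the scalar operations through:
@CUDA.allowscalar @views a[2, :] .= b[2, :] .+ 1
```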

!! This should only be merged after FourierFlows/FourierFlows.jl#198 is in master. !!

Closes #96

@navidcy navidcy requested a review from glwagner August 29, 2020 11:54
codecov bot commented Aug 29, 2020

Codecov Report

Merging #98 into master will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master      #98   +/-   ##
=======================================
  Coverage   91.89%   91.89%           
=======================================
  Files          20       20           
  Lines        1506     1506           
=======================================
  Hits         1384     1384           
  Misses        122      122           
Impacted Files                Coverage Δ
src/GeophysicalFlows.jl       100.00% <ø> (ø)
src/barotropicqg.jl           100.00% <100.00%> (ø)
src/barotropicqgql.jl         100.00% <100.00%> (ø)
src/multilayerqg.jl           100.00% <100.00%> (ø)
src/twodnavierstokes.jl       100.00% <100.00%> (ø)
src/utils.jl                  100.00% <100.00%> (ø)
test/runtests.jl              89.70% <100.00%> (ø)
test/test_barotropicqg.jl     100.00% <100.00%> (ø)
test/test_barotropicqgql.jl   100.00% <100.00%> (ø)
test/test_multilayerqg.jl     100.00% <100.00%> (ø)
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 17443e2...3b7fc63.

@glwagner (Member) left a comment

Using @views is certainly just a convenience. We'll get better performance writing kernels. This is also true even in cases where we do not use @views (kernels seem to be faster than naive broadcasting, at least right now, which may be an inefficiency in broadcasting that could be fixed sometime in the future). Note that if we write kernels and use KernelAbstractions we also will get multithreaded speed up even on algebraic operations (not just FFTs).
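As a rough illustration of the kernel approach being suggested (a hypothetical sketch, not code from this repository; the kernel name and array sizes are made up, and the API shown is KernelAbstractions as of around this PR):

```julia
using KernelAbstractions

# The same kernel runs multithreaded on the CPU or, by swapping the
# device, on the GPU — replacing a broadcast like @. c = a * b.
@kernel function multiply_kernel!(c, a, b)
    i, j = @index(Global, NTuple)
    @inbounds c[i, j] = a[i, j] * b[i, j]
end

a, b, c = rand(64, 64), rand(64, 64), zeros(64, 64)

# Instantiate for the CPU with a 16×16 workgroup; using CUDADevice()
# instead of CPU() launches the identical kernel on a GPU.
kernel! = multiply_kernel!(CPU(), (16, 16))
event = kernel!(c, a, b, ndrange=size(c))
wait(event)
```

On the CPU the work is split across Julia's threads, which is the multithreaded speed-up on algebraic operations mentioned above.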

Review threads on src/barotropicqg.jl and src/barotropicqgql.jl (outdated, resolved).
@navidcy (Member, Author) commented Aug 29, 2020

> Using @views is certainly just a convenience. We'll get better performance writing kernels. This is also true even in cases where we do not use @views (kernels seem to be faster than naive broadcasting, at least right now, which may be an inefficiency in broadcasting that could be fixed sometime in the future). Note that if we write kernels and use KernelAbstractions we also will get multithreaded speed up even on algebraic operations (not just FFTs).

OK, I see your point. It takes a bit away from the "Julia can seamlessly be GPU-ready" idea that I had in mind. But what you are saying is that if the developers work a bit harder here (writing kernels), then the experience will be seamless for the users, right?
But then isn't this the same situation as with PyCUDA and the like?

@navidcy (Member, Author) commented Aug 29, 2020

@glwagner this PR is ready, but don't merge yet. First, FourierFlows/FourierFlows.jl#198 needs to be merged and a new release of FourierFlows.jl tagged; then we should remove Manifest.toml from here and add a [compat] entry in Project.toml to force GeophysicalFlows.jl to use the latest release of FourierFlows.jl.
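For example, the [compat] entry in Project.toml would look something like the following (the version bound is a placeholder, not the actual release number, which didn't exist yet when this was written):

```toml
[compat]
# Placeholder lower bound — the real entry should pin the first
# FourierFlows.jl release that includes FourierFlows.jl#198:
FourierFlows = "0.4"
```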

@glwagner (Member) commented Aug 30, 2020

> OK, I see your point. It takes a bit away from the "Julia can seamlessly be GPU-ready" idea that I had in mind. But what you are saying is that if the developers work a bit harder here (writing kernels), then the experience will be seamless for the users, right?
> But then isn't this the same situation as with PyCUDA and the like?

I think I was unclear. If we write kernels, they will be multithreaded and may have slightly better GPU performance than broadcasting. Multithreading is not a GPU concept; multithreading speeds up CPU computations.

Note that fused broadcasts could become multithreaded in the future. They just aren't right now. I think GPU broadcasting could be improved, perhaps.

I am not implying that hand-written kernels are necessary for performant code. I apologize if I implied that.

@navidcy navidcy merged commit 3b21b1d into master Aug 30, 2020
@navidcy navidcy deleted the UseCUDAjl branch September 18, 2020 05:00
Linked issue: Drop CuArrays.jl from deps and move to CUDA.jl