Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPUify the TwoDTurb module #25

Merged
merged 27 commits into from
Oct 8, 2019
Merged

GPUify the TwoDTurb module #25

merged 27 commits into from
Oct 8, 2019

Conversation

navidcy
Copy link
Member

@navidcy navidcy commented Aug 29, 2019

This PR modifies the TwoDTurb module so that it can run smoothly on both CPU and GPU.

@navidcy navidcy requested a review from glwagner August 29, 2019 22:45
@navidcy
Copy link
Member Author

navidcy commented Aug 29, 2019

It'll be good if we can remove the boilerplate in

struct Vars{Aphys, Atrans} <: TwoDTurbVars
zeta :: Aphys
u :: Aphys
v :: Aphys
zetah :: Atrans
uh :: Atrans
vh :: Atrans
end
struct ForcedVars{Aphys, Atrans} <: TwoDTurbVars
zeta :: Aphys
u :: Aphys
v :: Aphys
zetah :: Atrans
uh :: Atrans
vh :: Atrans
Fh :: Atrans
end
struct StochasticForcedVars{Aphys, Atrans} <: TwoDTurbVars
zeta :: Aphys
u :: Aphys
v :: Aphys
zetah :: Atrans
uh :: Atrans
vh :: Atrans
Fh :: Atrans
prevsol :: Atrans
end

I tried replacing it with

const physicalvars = [:zeta, :u, :v]
const transformvars = [ Symbol(var, :h) for var in physicalvars ]
const forcedvars = [:Fh]
const stochforcedvars = [:prevsol]

varspecs = cat(
  getfieldspecs(physicalvars, :(Aphys)),
  getfieldspecs(transformvars, :(Atrans)),
  dims=1)

forcedvarspecs = cat(varspecs, getfieldspecs(forcedvars, :(Atrans)), dims=1)

stochforcedvarspecs = cat(forcedvarspecs, getfieldspecs(stochforcedvars, :(Atrans)), dims=1)

struct Vars{Aphys, Atrans} <: TwoDTurbVars
  [ :( $(spec[1])::$(spec[2]) ) for spec in varspecs]
end

struct ForcedVars{Aphys, Atrans} <: TwoDTurbVars
  [ :( $(spec[1])::$(spec[2]) ) for spec in forcedvarspecs]
end

struct StochasticForcedVars{Aphys, Atrans} <: TwoDTurbVars
  [ :( $(spec[1])::$(spec[2]) ) for spec in stochforcedvarspecs]
end

but it didn't work... Did I do something wrong?

@codecov
Copy link

codecov bot commented Aug 29, 2019

Codecov Report

Merging #25 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
+ Coverage   98.63%   98.64%   +0.01%     
==========================================
  Files           6        6              
  Lines         513      518       +5     
==========================================
+ Hits          506      511       +5     
  Misses          7        7
Impacted Files Coverage Δ
src/twodturb.jl 100% <100%> (ø) ⬆️
src/utils.jl 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9621db...9276b42. Read the comment docs.

@coveralls
Copy link

coveralls commented Aug 29, 2019

Pull Request Test Coverage Report for Build 263

  • 38 of 38 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.6%) to 98.456%

Totals Coverage Status
Change from base Build 251: 0.6%
Covered Lines: 510
Relevant Lines: 518

💛 - Coveralls

@navidcy
Copy link
Member Author

navidcy commented Aug 30, 2019

Also, when a 2D grid is created on the GPU I get this message:

┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/J4c3Q/src/indexing.jl:16

and I couldn't figure out what causes it and how to make sure the code is efficient.

test/test_twodturb.jl Outdated Show resolved Hide resolved
@glwagner
Copy link
Member

@navidcy I think you want to use @eval somehow to reduce this boiler plate.

But there's another way --- we can define Vars generally, but give the optional fields special parameter types so that they can be set to nothing if need be. Not sure what the best way forward is.

@glwagner
Copy link
Member

Also, when a 2D grid is created on the GPU I get this message:

Ya that's annoying. I've been getting that too in different code using GPUifyLoops. Your guess is as good as mine.

@navidcy
Copy link
Member Author

navidcy commented Aug 30, 2019

@navidcy I think you want to use @eval somehow to reduce this boiler plate.

But there's another way --- we can define Vars generally, but give the optional fields special parameter types so that they can be set to nothing if need be. Not sure what the best way forward is.

can you try do that? I've lost my patience with @eval...

@navidcy
Copy link
Member Author

navidcy commented Aug 30, 2019

Btw, I tried running examples/twodturb_mcwilliams1984.jl on GPU and it was painfully slow...
Perhaps the diags being CPU arrays induce constant communication back and forth from GPU to CPU?

I removed saveoutput and run only with the diagnostics, i.e., stepforward!(prob, diags, nsubs). On my laptop I got:

step: 0200, t: 2, ΔE: 0.9909, ΔZ: 0.4213, τ: 0.04 min
step: 0400, t: 4, ΔE: 0.9884, ΔZ: 0.2665, τ: 0.06 min
step: 0600, t: 6, ΔE: 0.9873, ΔZ: 0.1979, τ: 0.07 min

while on bruno using GPU:

step: 0200, t: 2, ΔE: 0.9909, ΔZ: 0.4213, τ: 1.11 min
step: 0400, t: 4, ΔE: 0.9884, ΔZ: 0.2665, τ: 2.17 min
step: 0600, t: 6, ΔE: 0.9873, ΔZ: 0.1979, τ: 3.21 min

When I run without computing diagnostics on bruno, i.e., stepforward!(prob, nsubs), I got:

step: 0200, t: 2, ΔE: 1.0000, ΔZ: 1.0000, τ: 0.01 min
step: 0400, t: 4, ΔE: 1.0000, ΔZ: 1.0000, τ: 0.03 min
step: 0600, t: 6, ΔE: 1.0000, ΔZ: 1.0000, τ: 0.04 min

I think we should revisit the diagnostics.jl in FourierFlows.jl. I feel that the fact that diagnostics are saved in CPU arrays creates a bottleneck.

@glwagner
Copy link
Member

I feel that the fact that diagnostics are saved in CPU arrays creates a bottleneck.

I don't think this by itself is a problem (you can have a CPU array of CuArrays, for example). But it does sound like something is awry. Diagnostics also have have to well-written to be efficient.

@navidcy
Copy link
Member Author

navidcy commented Sep 1, 2019

I feel that the fact that diagnostics are saved in CPU arrays creates a bottleneck.

I don't think this by itself is a problem (you can have a CPU array of CuArrays, for example). But it does sound like something is awry. Diagnostics also have have to well-written to be efficient.

OK, so regarding this PR: should we merge? Anything else needed?

@navidcy navidcy changed the title GPUfy the TwoDTurb module GPUify the TwoDTurb module Sep 19, 2019
@glwagner glwagner merged commit 9d5b5a6 into master Oct 8, 2019
@glwagner glwagner deleted the GPUfyTwoDTurb branch October 8, 2019 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants