Plots replace PyPlot in more examples #75
Conversation
Codecov Report

@@           Coverage Diff            @@
##           master      #75   +/-   ##
=======================================
  Coverage   98.63%   98.63%
=======================================
  Files           6        6
  Lines         511      511
=======================================
  Hits          504      504
  Misses          7        7

Continue to review full report at Codecov.
This PR is a major improvement to the original example, and thus I support merging as-is.
I have raised a number of issues that could also be addressed before merging, but these improvements can also be saved for a subsequent PR.
if j%(1000/nsubs)==0; println(log) end

p[1][1][:z] = Array(vs.zeta)
I suppose the call to `Array` is to ensure this works on the GPU? Could be good to add a comment here, especially since this is an example.
This is exactly the problem I wanted your help with.
It's a bit complicated... if I don't use `Array()` then `@animate` does not use the first frame for vorticity. Very strange...
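A sketch of what the suggested comment could look like, assuming `vs.zeta` may live on the GPU and `p` is the Plots.jl plot object from the example:

```julia
# Convert `vs.zeta` to a CPU `Array` before handing it to Plots.jl: this is
# needed when the simulation runs on the GPU (Plots.jl cannot read device
# memory), and it also seems to work around an issue where `@animate`
# otherwise drops the first vorticity frame.
p[1][1][:z] = Array(vs.zeta)
```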
examples/barotropicqg_betaforced.jl (Outdated)

@@ -55,7 +55,6 @@ kf, dkf = 14.0, 1.5 # forcing wavenumber and width of forcing ring in wavenu
gr = TwoDGrid(nx, Lx)
x, y = gridpoints(gr)
Kr = [ gr.kr[i] for i=1:gr.nkr, j=1:gr.nl]
forcingcovariancespectrum = @. exp(-(sqrt(gr.Krsq)-kf)^2/(2*dkf^2))
This line is hard to read and somewhat inscrutable. I might infer that `kf` is a forcing wavenumber... but writing `forcing_wavenumber` would make this unambiguous. A comment that explains the point of this could help too. `dkf` could be named `forcing_bandwidth`.
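A sketch of the suggested renaming (the names `forcing_wavenumber` and `forcing_bandwidth` are the reviewer's proposals, not the PR's code):

```julia
forcing_wavenumber = 14.0  # central wavenumber of the forcing ring (was kf)
forcing_bandwidth  = 1.5   # width of the forcing ring in wavenumber space (was dkf)

# Gaussian ring in wavenumber space, centered on `forcing_wavenumber`
# with width `forcing_bandwidth`:
forcingcovariancespectrum = @. exp(-(sqrt(gr.Krsq) - forcing_wavenumber)^2 /
                                   (2 * forcing_bandwidth^2))
```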
examples/barotropicqg_betaforced.jl (Outdated)

@@ -55,7 +55,6 @@ kf, dkf = 14.0, 1.5 # forcing wavenumber and width of forcing ring in wavenu
gr = TwoDGrid(nx, Lx)
x, y = gridpoints(gr)
Kr = [ gr.kr[i] for i=1:gr.nkr, j=1:gr.nl]
Why is this line here? I'm not sure where `Kr` is used, and the next line down seems unrelated.
examples/barotropicqg_betaforced.jl (Outdated)

@@ -69,7 +68,7 @@ seed!(1234) # reset of the random number generator for reproducibility
nothing # hide
# Next we construct function `calcF!` that computes a forcing realization every timestep
- function calcFq!(Fh, sol, t, cl, v, p, g)
+ function calcFq!(Fh, sol, t, cl, vs, pr, gr)
ξ = ArrayType(dev)(exp.(2π*im*rand(Float64, size(sol)))/sqrt(cl.dt))
Does this allocate a new array every time-step?
You may want to allocate this array outside of the function, and then reference it as a global. I think declaring it `const` will ensure that it runs on the GPU.
Another option is to make `calcFq` a callable object that carries around a reference to `ξ` (rather than referencing it as a global). The complication with this approach is that I think you will need to use `Adapt` to get it to work on the GPU.
It's probably important to introduce what we think is best practice for this problem in the example. If it's complicated, I think that means we simply need to add more simple examples to bridge the gap between simple cases and this sophisticated case.
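A minimal sketch of the const-global option, under the assumptions of the example (`forcing_spectrum` and the `calcFq!` signature follow the PR; `size_of_sol` is a placeholder for the solution array's dimensions, and whether `const` suffices for the GPU path is the reviewer's conjecture):

```julia
# Preallocate the random-phase buffer once, outside the forcing function.
# `const` gives the global a concrete type, which helps the compiler
# (and, per the discussion, may be needed for GPU execution).
const ξ = ArrayType(dev)(zeros(Complex{Float64}, size_of_sol...))

function calcFq!(Fh, sol, t, cl, vs, pr, gr)
  # Fill ξ in place rather than binding a freshly allocated array each
  # time-step. Note the `rand` on the right-hand side still allocates a
  # temporary CPU array; that issue is discussed further below.
  ξ .= exp.(2π * im .* rand(Float64, size(sol))) ./ sqrt(cl.dt)
  @. Fh = ξ * sqrt(forcing_spectrum)
  nothing
end
```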
I guess referencing `ξ` as a global is probably a simple and adequate solution for this problem.
I'm worried this forcing function won't work (at least not generically, or for large problems) on the GPU if it allocates every time-step, as there is no garbage collection on the GPU.
Also, the syntax on this fairly complex line is very compact, which makes it difficult to read. Putting some white space between the arithmetic operations will make this part of the code easier to read and understand.
Nice catch!
One solution is to avoid `ξ` altogether:
Fqh .= forcing_spectrum .* ArrayType(dev)(exp.(2π*im*rand(eltype(gr), size(sol)))/sqrt(clock.dt))
But that seems cumbersome, right?
hm... I tried `@btime` and it made no difference...
Your solution still allocates memory. The allocation comes from this snippet, which is embedded within the expression above:
rand(eltype(gr), size(sol))
This call creates an array of `size(sol)`, and it always executes on the CPU as well.
Note that (the last time I checked) it is not possible to generate random numbers on the GPU within a kernel (and therefore without allocating GPU memory), so this function will be slow on the GPU. It may be best to write this code to execute only on the CPU, so that people are not misled into thinking it would be a good idea to run it on the GPU.
One way to avoid allocating memory on the CPU is probably to write a function that loops over the two dimensions of `ξ`. For example, you can fill an array with random values by writing
for i in eachindex(a)
@inbounds a[i] = rand()
end
It's also important to note that `ArrayType(GPU())(a)` creates a new array holding the data of `a` on the GPU, and therefore allocates GPU memory. Because there is no garbage collection on the GPU, the GPU will run out of memory when this function is executed many times.
Here's an example:
julia> a = zeros(256, 256)
julia> @time a .= rand(Float64, size(a));
0.000175 seconds (9 allocations: 512.297 KiB)
julia> function fillrand!(a)
for i in eachindex(a)
@inbounds a[i] = rand()
end
nothing
end
julia> @time fillrand!(a)
0.009296 seconds (6.76 k allocations: 315.136 KiB)
julia> @time fillrand!(a)
0.000686 seconds (4 allocations: 160 bytes)
`fillrand!`, which allocates 160 bytes rather than ~512 KiB, is actually slower in this case... but it could speed up time-stepping since the garbage collector won't have to be called... I'm not sure. It probably depends on the problem.
Ah, best of all:
using Random
a = zeros(256, 256)
@time rand!(a)
@time rand!(a)
I get
julia> @time rand!(a);
0.000087 seconds (4 allocations: 160 bytes)
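Putting the pieces together, a hedged sketch of a CPU-only forcing function built on `Random.rand!` (buffer names are illustrative, `forcing_spectrum` and the `calcFq!` signature follow the example, and `size_of_sol` is a placeholder for the solution array's dimensions):

```julia
using Random

# Preallocated phase buffer, created once outside the function.
const phase = zeros(Float64, size_of_sol...)

function calcFq!(Fh, sol, t, cl, vs, pr, gr)
  # rand! fills `phase` in place (~160 bytes of bookkeeping allocation,
  # per the timings above, instead of a full-size temporary array).
  rand!(phase)
  @. Fh = sqrt(forcing_spectrum) * exp(2π * im * phase) / sqrt(cl.dt)
  nothing
end
```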
@glwagner this PR has become huge. I'm tempted to merge as soon as tests and docs are built properly, and open issues regarding …
What do you reckon?
@glwagner, since this PR is about PyPlot.jl -> Plots.jl, I'm merging and moving the discussion about memory allocation and forcing functions on CPU/GPU elsewhere.