Use stress divergence and diffusive flux divergence from Turbulence Closures #245

glwagner · 2019-05-27T19:27:03Z

This PR integrates the TurbulenceClosures module into time stepping and boundary conditions.

The need to handle anisotropic transport coefficients abstractly at arbitrary boundaries introduces considerable complexity. This problem is solved by exporting NamedTuples that collect functions to calculate the diagonal components of viscosity and diffusivity at the necessary locations. The consequences of this implementation can be seen in the apply_bcs! function.
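To make the NamedTuple idea concrete, here is a minimal Julia sketch. `ToyAnisotropicDiffusivity`, `κ₁₁`, `κ₂₂`, `κ₃₃`, `diffusivities`, `top_κ` and `bottom_κ` are hypothetical names chosen for illustration. They are not the TurbulenceClosures API, whose functions also take the grid, equation of state, velocities and tracers.

```julia
# Hypothetical stand-in for a constant anisotropic closure (not the package's type).
struct ToyAnisotropicDiffusivity{T}
    κh::T  # horizontal diffusivity
    κv::T  # vertical diffusivity
end

# Each function returns one diagonal component of the diffusivity tensor at (i, j, k).
# This constant closure ignores the indices; a spatially varying one would use them.
κ₁₁(i, j, k, c::ToyAnisotropicDiffusivity) = c.κh
κ₂₂(i, j, k, c::ToyAnisotropicDiffusivity) = c.κh
κ₃₃(i, j, k, c::ToyAnisotropicDiffusivity) = c.κv

# Collect the component functions so boundary code can select the one it needs.
const diffusivities = (x = κ₁₁, y = κ₂₂, z = κ₃₃)

# Top and bottom boundaries only need the z component at the adjacent cell
# (k = 1 at the top and k = Nz at the bottom, following the indexing in this PR).
top_κ(i, j, c) = diffusivities.z(i, j, 1, c)
bottom_κ(i, j, c, Nz) = diffusivities.z(i, j, Nz, c)

c = ToyAnisotropicDiffusivity(1e-3, 1e-5)
top_κ(1, 1, c)  # the vertical diffusivity, 1.0e-5
```

A closure with spatially varying coefficients would only need new methods for `κ₁₁`, `κ₂₂`, `κ₃₃`; the boundary code that indexes `diffusivities.z` stays unchanged.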

Only the ConstantAnisotropicDiffusivity closure (corresponding to the former default) is currently tested.

In the future, we should probably make ConstantIsotropicDiffusivity the default, and remove the option to set the horizontal and vertical diffusion tensor components in the Model constructor.


ali-ramadhan · 2019-05-27T19:56:16Z

src/time_steppers.jl

@@ -152,29 +154,29 @@ function calculate_interior_source_terms!(grid::Grid, constants, eos, cfg, u, v,
                @inbounds Gu[i, j, k] = (-u∇u(grid, u, v, w, i, j, k)
                                            + Gu_cori(grid, v, fCor, i, j, k)
                                            - δx_c2f(grid, pHY′, i, j, k) / (Δx * ρ₀)
-                                            + 𝜈∇²u(grid, u, 𝜈h, 𝜈v, i, j, k)
+                                            + ∂ⱼ_2ν_Σ₁ⱼ(i, j, k, grid, closure, eos, grav, u, v, w, T, S)


Dumb question, but is Σ the strain-rate tensor?

Quite! I will add some comments to turbulence_closures.jl.
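For reference, this is the standard definition of the strain-rate tensor, together with the standard identity showing why the new term reduces to the old `𝜈∇²u` for a constant, isotropic viscosity in incompressible flow. This is textbook vector calculus, not something stated in the PR itself:

```math
\Sigma_{ij} = \frac{1}{2}\left(\partial_i u_j + \partial_j u_i\right),
\qquad
\partial_j \left(2\nu\,\Sigma_{1j}\right)
= \nu\,\partial_j \partial_j u_1 + \nu\,\partial_1\left(\partial_j u_j\right)
= \nu\,\nabla^2 u_1
\quad \text{when } \nu \text{ is constant and } \partial_j u_j = 0 .
```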

ali-ramadhan · 2019-05-27T20:01:03Z

Cool! Looks pretty neat considering how much tedious math is buried underneath. Thanks for cleaning up the Model constructor.

> In the future, we should probably make ConstantIsotropicDiffusivity the default, and remove the option to set the horizontal and vertical diffusion tensor components in the Model constructor.

+1.

Do you want to close #120 once this is merged?

ali-ramadhan · 2019-05-27T20:04:38Z

src/time_steppers.jl

+            κ_bottom = κ(i, j, grid.Nz, grid, closure, eos, g, u, v, w, T, S)
+
+               apply_z_top_bc!(top_bc,    i, j, grid, ϕ, Gϕ, κ_top,    t, iteration, u, v, w, T, S)
+            apply_z_bottom_bc!(bottom_bc, i, j, grid, ϕ, Gϕ, κ_bottom, t, iteration, u, v, w, T, S)


It looks like if you want to impose a z boundary condition that does not depend on κ, you still have to calculate κ using the full closure, which can be expensive if using an LES closure. Not sure how to get around this; probably some clever multiple dispatch?

This is probably fine for now as constant Smagorinsky isn't integrated yet, and the performance hit probably isn't big enough to worry about right now.

Hmm, yes, that is possible. It doesn't have to be clever :-P We will just pass the function that calculates the transport coefficient into apply_z_top_bc!, which dispatches on the type of its first argument (now the boundary condition). That's possibly a better design anyway.

But actually, it might get compiled away if not used anyway: the types of both boundary conditions are known to the function apply_z_bcs!.

I didn't put a significant amount of effort into this problem because I'm expecting a lot to simplify once we have halo regions.
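A minimal sketch of passing the diffusivity as a function and dispatching on the boundary-condition type, so the closure is evaluated only when a condition needs it. `FluxBC`, `ValueBC`, `boundary_flux` and `expensive_κ` are hypothetical names, not Oceananigans types, and the one-sided gradient's sign convention is illustrative only.

```julia
# Hypothetical boundary-condition types (illustration only).
struct FluxBC{T}
    flux::T    # prescribed flux: no diffusivity needed
end

struct ValueBC{T}
    value::T   # prescribed boundary value: flux depends on κ
end

# Flux condition: κ_func is never called, so an expensive closure never runs.
boundary_flux(bc::FluxBC, κ_func, i, j, ϕ_top, Δz) = bc.flux

# Value condition: evaluate κ here only, then form a one-sided gradient
# over half a cell between the boundary value and the top cell value.
boundary_flux(bc::ValueBC, κ_func, i, j, ϕ_top, Δz) =
    κ_func(i, j) * (bc.value - ϕ_top) / (Δz / 2)

# Stand-in for an expensive LES diffusivity that counts how often it runs.
const κ_calls = Ref(0)
expensive_κ(i, j) = (κ_calls[] += 1; 1e-3)

flux_a = boundary_flux(FluxBC(2.0), expensive_κ, 1, 1, 0.0, 1.0)   # κ not evaluated
calls_after_flux = κ_calls[]
flux_b = boundary_flux(ValueBC(1.0), expensive_κ, 1, 1, 0.0, 1.0)  # κ evaluated once
```

Because the method is chosen from the concrete type of the first argument, the flux-condition path contains no call to the closure at all, which is one way to get the "compiled away" behavior discussed above.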

johncmarshall54 · 2019-05-27T20:09:08Z

Shouldn't the default be anisotropic diffusion with the anisotropy scaled by delx, delz? This will be by far the most common model configuration.


glwagner · 2019-05-27T20:30:07Z

@johncmarshall54 we can make that the default.

I wouldn't expect any constant diffusivity closure to be common for LES applications, considering the ready availability of better subgrid-scale models.

Making ConstantIsotropicDiffusivity the default has a purpose: I want users to make a deliberate choice of closure (and to realize that any specification other than molecular diffusivity is, in fact, a subgrid model for turbulence), which will motivate them to think about the consequences of their choice.

glwagner · 2019-05-27T20:38:42Z

> Do you want to close #120 once this is merged?

Let's wait; I'm working on test cases for ConstantSmagorinsky now. The next PR should be the one.

codecov · 2019-05-28T00:13:04Z

Codecov Report

Merging #245 into master will decrease coverage by 4.77%.
The diff coverage is 88.54%.

@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
- Coverage   75.37%   70.59%   -4.78%     
==========================================
  Files          23       23              
  Lines         877      857      -20     
==========================================
- Hits          661      605      -56     
- Misses        216      252      +36

Impacted Files	Coverage Δ
src/Oceananigans.jl	`83.33% <ø> (+11.9%)`	⬆️
src/models.jl	`90.47% <100%> (ø)`	⬆️
src/closures/turbulence_closures.jl	`100% <100%> (ø)`	⬆️
src/closures/closure_operators.jl	`95.34% <100%> (+0.3%)`	⬆️
src/time_steppers.jl	`83.13% <100%> (+2.02%)`	⬆️
src/closures/constant_diffusivity_closures.jl	`63.33% <38.88%> (-36.67%)`	⬇️
src/model_configuration.jl	`0% <0%> (-100%)`	⬇️
src/operators/ops_regular_cartesian_grid.jl	`65.59% <0%> (-20.58%)`	⬇️
src/utils.jl	`12.5% <0%> (-13.59%)`	⬇️
src/fields.jl	`45.94% <0%> (-11.63%)`	⬇️
... and 5 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22b7121...1fdaf74.

codecov · 2019-05-28T00:13:04Z

Codecov Report

Merging #245 into master will decrease coverage by 4.11%.
The diff coverage is 88.54%.

@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
- Coverage   74.71%   70.59%   -4.12%     
==========================================
  Files          23       23              
  Lines         866      857       -9     
==========================================
- Hits          647      605      -42     
- Misses        219      252      +33

Impacted Files	Coverage Δ
src/Oceananigans.jl	`83.33% <ø> (ø)`	⬆️
src/models.jl	`90.47% <100%> (ø)`	⬆️
src/closures/turbulence_closures.jl	`100% <100%> (ø)`	⬆️
src/closures/closure_operators.jl	`95.34% <100%> (+0.3%)`	⬆️
src/time_steppers.jl	`83.13% <100%> (-0.69%)`	⬇️
src/closures/constant_diffusivity_closures.jl	`63.33% <38.88%> (-36.67%)`	⬇️
src/model_configuration.jl	`0% <0%> (-100%)`	⬇️
src/operators/ops_regular_cartesian_grid.jl	`65.59% <0%> (-20.44%)`	⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0eea137...f273962.
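The headline figures in the two Codecov reports (4.77% and 4.11%) differ in the last digit from the coverage-diff tables (-4.78% and -4.12%). A quick check with the raw hit and line counts shows both sets are reproduced if percentages are truncated, rather than rounded, to two decimals. This is an inference from these numbers only, not documented Codecov behavior; `pct` and `trunc2` are hypothetical helpers.

```julia
# Percentage of covered lines.
pct(hits, lines) = 100 * hits / lines

# Truncate (not round) to two decimal places.
trunc2(x) = trunc(100 * x) / 100

master_first  = pct(661, 877)   # master in the first report
master_second = pct(647, 866)   # master in the second report
pr            = pct(605, 857)   # this PR in both reports

# Headline: truncate the exact difference.
headline_first  = trunc2(master_first - pr)
headline_second = trunc2(master_second - pr)

# Table: difference of the already-truncated percentages.
table_first  = trunc2(master_first) - trunc2(pr)
table_second = trunc2(master_second) - trunc2(pr)
```

Rounding instead of truncating would give 70.60% for the PR, which does not match either report.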

glwagner · 2019-05-28T00:21:42Z

Any suggestions for tests I might add to get the coverage up?

glwagner · 2019-05-28T01:19:06Z

Another question: should I shower the code in a hailstorm of @inlines? @vchuravy, I know we force inlining on the GPU, but what about the CPU? Do we force-inline the innards of functions annotated with @launch as well?

glwagner · 2019-05-28T01:44:14Z

@ali-ramadhan do you mind running some benchmarks to test for performance regression under this PR?

glwagner · 2019-05-28T02:17:18Z

Lastly, I am thinking that all the docstrings in closure_operators.jl actually detract from readability and understandability. Thoughts?

ali-ramadhan · 2019-05-28T11:10:43Z

> any suggestions for tests I might add to get the coverage up?

Hmmm, I think for now it's sufficient that the regression tests pass as this PR should preserve existing functionality. If you're going to implement more rigorous/high-level LES tests in the future then the coverage will go up. And it'll probably become clearer which unit tests are needed.

> Lastly, I am thinking that all the doc strings in closure_operators.jl are actually a detriment to readability and understandability. Thoughts?

I kind of agree, but docstrings can be integrated into the documentation, and if they contain LaTeX we can view the operators alongside the math in the docs. I guess it's readable documentation vs. more readable code? Good practice says we should probably keep them, but maybe we can separate them somehow? Right now we only read the code, but in the future we may mainly be reading the docs rather than the code.
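As a small illustration of the docs-with-math option, a Julia docstring can carry inline LaTeX between double backticks, which Documenter-style tooling can render. `δx_c2f_1d` is a hypothetical 1D operator, not the package's `δx_c2f`, which takes a grid and three indices.

```julia
"""
    δx_c2f_1d(f, i)

Difference of the cell-centered values of `f` across the face at index `i`,
``δ_x f_i = f_i - f_{i-1}``.
"""
δx_c2f_1d(f, i) = f[i] - f[i - 1]

# The docstring is attached to the function and can be retrieved at runtime.
doc_text = string(@doc δx_c2f_1d)
```

One-line operators like this keep the code short while the math lives in the docstring, which is roughly the trade-off being weighed here.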

ali-ramadhan · 2019-05-28T11:13:17Z

> @ali-ramadhan do you mind running some benchmarks to test for performance regression under this PR?

Looks really bad. Still waiting on GPU results, but the CPU is ~100x slower! We can't merge this as is.

Something must be wrong somewhere. Memory allocations went from 6.00 KiB per iteration to 1.04 GiB per iteration. Allocation used to be resolution-independent, but now it increases with the number of grid points. I'll edit this comment with the full benchmark timings when they're done.

I benchmarked master yesterday so I know it's still performing well (maybe 2-3% slower because of the extra stuff and fixes we recently introduced).
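One common way to get allocations that scale with the number of grid points is a type instability in the innermost loop, so that every iteration allocates. The sketch below illustrates that failure mode in general; it is not a diagnosis of this PR's regression, and all names in it are hypothetical. An abstractly typed field (`κ::Real`) forces dynamic dispatch on each iteration, while a parametric field lets the compiler specialize.

```julia
# Abstract field type: the compiler cannot know the concrete type of κ.
struct UnstableDiffusivity
    κ::Real
end

# Parametric field type: the compiler specializes on the concrete type of κ.
struct StableDiffusivity{T}
    κ::T
end

# Simple 1D diffusion operator applied at interior points.
function diffuse!(out, ϕ, c)
    for i in 2:length(ϕ)-1
        out[i] = c.κ * (ϕ[i+1] - 2ϕ[i] + ϕ[i-1])
    end
    return out
end

# Measure inside a function so untyped globals do not pollute the count.
measure(out, ϕ, c) = @allocated diffuse!(out, ϕ, c)

ϕ = rand(1000)
out = zeros(1000)

# Warm up: the first call to each specialization includes compilation.
for c in (StableDiffusivity(1.0), UnstableDiffusivity(1.0))
    measure(out, ϕ, c)
end

stable_bytes = measure(out, ϕ, StableDiffusivity(1.0))      # typically 0
unstable_bytes = measure(out, ϕ, UnstableDiffusivity(1.0))  # grows with length(ϕ)
```

Doubling the length of `ϕ` roughly doubles `unstable_bytes`, which is the same scaling signature as the benchmark allocations reported above.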

glwagner · 2019-05-28T11:13:28Z

I'm getting an error when trying to compile constant Smagorinsky:

ERROR: LoadError: InvalidIRError: compiling #12(RegularCartesianGrid{Float64,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}, PlanetaryConstants{Float64}, LinearEquationOfState{Float64}, ConstantSmagorinsky{Float64}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float64,3,CUDAnative.AS.Global}, Forcing{typeof(Oceananigans.zero_func),typeof(Oceananigans.zero_func),typeof(Oceananigans.zero_func),typeof(Oceananigans.zero_func),typeof(Oceananigans.zero_func)}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f__apply)
Stacktrace:
 [1] overdub at /data5/glwagner/.julia/packages/Cassette/xggAf/src/context.jl:260
 [2] ν_ccc at /data5/glwagner/Projects/Oceananigans.jl/src/closures/constant_smagorinsky.jl:109
 [3] ν_Σᵢⱼ at /data5/glwagner/Projects/Oceananigans.jl/src/closures/closure_operators.jl:405
 [4] ∂x_faa at /data5/glwagner/Projects/Oceananigans.jl/src/closures/closure_operators.jl:64
 [5] ∂x_2ν_Σ₁₁ at /data5/glwagner/Projects/Oceananigans.jl/src/closures/closure_operators.jl:409
 [6] ∂ⱼ_2ν_Σ₁ⱼ at /data5/glwagner/Projects/Oceananigans.jl/src/closures/closure_operators.jl:432
 [7] calculate_interior_source_terms! at /data5/glwagner/Projects/Oceananigans.jl/src/time_steppers.jl:152
 [8] #12 at /data5/glwagner/.julia/packages/GPUifyLoops/hBRid/src/context.jl:136
Reason: unsupported dynamic function invocation (call to Cassette.overdub)

I think this is specific to the package upgrades. I also got this error when running the rayleigh_benard_passive_tracer.jl script (associated with the function that forces salinity). Mysteriously, it does not throw this error when it runs the tests. Not sure where to raise an issue (if this is an issue)...

glwagner · 2019-05-28T11:16:06Z

@ali-ramadhan are those regressions on the CPU or GPU? On the CPU, we might need inline (we need inline to elide a number of potential memory allocation points I think). Not sure about GPU.

Edit: just realized you said it's CPU.

OK, well, we might need to shower the code in @inlines, but it'd be good to get input from @vchuravy, because I feel there should be a better solution (forcing inlining on both CPU and GPU).

And of course allocation could also be a separate issue; we will see.
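A minimal sketch of the "layers of abstraction" pattern with explicit `@inline` annotations, on a periodic 1D grid. The operator names imitate the naming style in this thread but are hypothetical. Whether Julia inlines such small functions without the annotation depends on its heuristics, which is the open question above; the annotation only requests inlining.

```julia
# Layer 1: difference across face i (between cells i-1 and i), periodic in x.
@inline δx_f(f, i) = f[i] - f[mod1(i - 1, length(f))]

# Layer 2: derivative at face i.
@inline ∂x_f(f, i, Δx) = δx_f(f, i) / Δx

# Layer 3: diffusive flux divergence ∂x(κ ∂x f) at cell center i,
# built from the face derivatives on either side of the cell.
@inline ∂x_κ∂x(f, i, Δx, κ) =
    (κ * ∂x_f(f, mod1(i + 1, length(f)), Δx) - κ * ∂x_f(f, i, Δx)) / Δx

# Kernel-style loop that only calls the top layer.
function diffusion_tendency!(G, f, Δx, κ)
    for i in eachindex(f)
        G[i] = ∂x_κ∂x(f, i, Δx, κ)
    end
    return G
end

Δx = 0.1
f = [(i * Δx)^2 for i in 1:10]          # f = x², so ∂x²f = 2 away from the wrap
G = diffusion_tendency!(zeros(10), f, Δx, 1.0)
```

If the layers are fully inlined, the loop body compiles to the same three-point stencil one would write by hand, so the abstraction should cost nothing.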

ali-ramadhan · 2019-05-28T11:17:45Z

Which version of GPUifyLoops are you using? Maybe post ] status and we can compare versions. I had issues with v0.2.4 but they showed up in the tests.

ali-ramadhan · 2019-05-28T11:18:16Z

> @ali-ramadhan are those regressions on the CPU or GPU? On the CPU, we might need inline (we need inline to elide a number of potential memory allocation points I think). Not sure about GPU.

Just CPU right now. They're taking too long, so I'm just gonna kill it and run the GPU tests by themselves.

I'd be pretty surprised if @inline gives a 100x performance boost but I'm still not super familiar with writing performant Julia code.

ali-ramadhan · 2019-05-28T11:23:56Z

@glwagner You might be right about the @inline thing, as the GPU models run reasonably fast, albeit ~30% slower, but not ~100x slower.

Before:

──────────────────────────────────────────────────────────────────────────────────────────────────
            Oceananigans.jl benchmarks                    Time                   Allocations      
                                                  ──────────────────────   ───────────────────────
                Tot / % measured:                       227s / 45.6%           18.7GiB / 0.06%    

Section                                   ncalls     time   %tot     avg     alloc   %tot      avg
──────────────────────────────────────────────────────────────────────────────────────────────────
256x256x256 static ocean (CPU, Float32)       10    54.4s  52.5%   5.44s   60.0KiB  0.48%  6.00KiB
256x256x256 static ocean (CPU, Float64)       10    36.9s  35.6%   3.69s   77.8KiB  0.62%  7.78KiB
128x128x128 static ocean (CPU, Float32)       10    6.38s  6.16%   638ms   60.0KiB  0.48%  6.00KiB
128x128x128 static ocean (CPU, Float64)       10    4.04s  3.90%   404ms   77.8KiB  0.62%  7.78KiB
 64x 64x 64 static ocean (CPU, Float32)       10    748ms  0.72%  74.8ms   60.0KiB  0.48%  6.00KiB
 64x 64x 64 static ocean (CPU, Float64)       10    412ms  0.40%  41.2ms   77.8KiB  0.62%  7.78KiB
256x256x256 static ocean (GPU, Float64)       10    284ms  0.27%  28.4ms   1.59MiB  12.9%   163KiB
256x256x256 static ocean (GPU, Float32)       10    243ms  0.23%  24.3ms   1.35MiB  11.0%   139KiB
 32x 32x 32 static ocean (CPU, Float32)       10   80.3ms  0.08%  8.03ms   60.0KiB  0.48%  6.00KiB
 32x 32x 32 static ocean (CPU, Float64)       10   45.2ms  0.04%  4.52ms   77.8KiB  0.62%  7.78KiB
128x128x128 static ocean (GPU, Float64)       10   35.9ms  0.03%  3.59ms   1.59MiB  12.9%   163KiB
128x128x128 static ocean (GPU, Float32)       10   32.3ms  0.03%  3.23ms   1.35MiB  11.0%   139KiB
 64x 64x 64 static ocean (GPU, Float64)       10   6.54ms  0.01%   654μs   1.59MiB  12.9%   163KiB
 64x 64x 64 static ocean (GPU, Float32)       10   6.14ms  0.01%   614μs   1.35MiB  11.0%   139KiB
 32x 32x 32 static ocean (GPU, Float64)       10   5.77ms  0.01%   577μs   1.59MiB  12.9%   163KiB
 32x 32x 32 static ocean (GPU, Float32)       10   5.68ms  0.01%   568μs   1.35MiB  11.0%   139KiB
──────────────────────────────────────────────────────────────────────────────────────────────────

Now:

 ──────────────────────────────────────────────────────────────────────────────────────────────────
                Intermediate output                        Time                   Allocations      
                                                   ──────────────────────   ───────────────────────
                 Tot / % measured:                       116s / 69.4%            115GiB / 81.4%    

 Section                                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────────────
  64x 64x 64 static ocean (CPU, Float32)       10    70.3s  87.0%   7.03s   83.5GiB  88.9%  8.35GiB
  32x 32x 32 static ocean (CPU, Float32)       10    10.5s  13.0%   1.05s   10.4GiB  11.1%  1.04GiB
 ──────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────────────
             Oceananigans.jl benchmarks                    Time                   Allocations      
                                                   ──────────────────────   ───────────────────────
                 Tot / % measured:                      63.8s / 1.31%           13.6GiB / 0.09%    

 Section                                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────────────
 256x256x256 static ocean (GPU, Float64)       10    366ms  43.9%  36.6ms   1.79MiB  13.6%   183KiB
 256x256x256 static ocean (GPU, Float32)       10    344ms  41.2%  34.4ms   1.51MiB  11.4%   155KiB
 128x128x128 static ocean (GPU, Float64)       10   49.8ms  5.96%  4.98ms   1.79MiB  13.6%   183KiB
 128x128x128 static ocean (GPU, Float32)       10   48.6ms  5.82%  4.86ms   1.51MiB  11.4%   155KiB
  64x 64x 64 static ocean (GPU, Float64)       10   7.78ms  0.93%   778μs   1.79MiB  13.6%   183KiB
  64x 64x 64 static ocean (GPU, Float32)       10   7.66ms  0.92%   766μs   1.51MiB  11.4%   155KiB
  32x 32x 32 static ocean (GPU, Float32)       10   5.35ms  0.64%   535μs   1.51MiB  11.4%   154KiB
  32x 32x 32 static ocean (GPU, Float64)       10   4.57ms  0.55%   457μs   1.79MiB  13.6%   183KiB
 ──────────────────────────────────────────────────────────────────────────────────────────────────

glwagner · 2019-05-28T11:24:30Z

which closure is this using?

ali-ramadhan · 2019-05-28T11:26:42Z

> which closure is this using?

The model is constructed using

model = Model(N=(Nx, Ny, Nz), L=(Lx, Ly, Lz), arch=arch, float_type=float_type)

so it looks like it should be using the constructor defaults:

             ν = 1.05e-6, νh=ν, νv=ν, 
             κ = 1.43e-7, κh=κ, κv=κ, 
closure = ConstantAnisotropicDiffusivity(νh=νh, νv=νv, κh=κh, κv=κv)
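The defaults above rely on Julia evaluating keyword defaults left to right, so later keywords can refer to earlier ones. A minimal sketch of the same pattern, using a hypothetical `closure_parameters` helper that returns a NamedTuple rather than the real Model constructor:

```julia
# Later keyword defaults may refer to earlier keywords: setting ν alone sets
# both νh and νv, while νh or νv can still be overridden individually.
function closure_parameters(; ν = 1.05e-6, νh = ν, νv = ν,
                              κ = 1.43e-7, κh = κ, κv = κ)
    return (νh = νh, νv = νv, κh = κh, κv = κv)
end

defaults = closure_parameters()
custom = closure_parameters(ν = 1e-4, νv = 1e-5)
```

So `Model(N=..., L=...)` without viscosity or diffusivity keywords yields an anisotropic closure whose horizontal and vertical coefficients are equal.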

glwagner · 2019-05-28T11:27:16Z

Hmm... with a vanilla closure, the change is completely encapsulated in the addition of two layers of abstraction (we are just calling a simple diffusion operator). So, let's figure out how to make the abstractions fast. I think the slowdown for vanilla closures should be nil.

The 'abstraction slowdown' causes much larger problems with the complicated closures, so we need to solve that problem anyways.

Edit: I see your post, so what I said above holds.

ali-ramadhan · 2019-05-28T11:32:05Z

Hmmm, since this is a CPU issue, it might be a good time to just profile the code.


glwagner · 2019-05-29T11:09:26Z

@ali-ramadhan do you mind running the benchmarks again?

There are still some performance issues, especially with ConstantSmagorinsky. But I think it might be wise to deal with these in a future PR if we are happy with these changes.

ali-ramadhan · 2019-05-29T11:22:20Z

Some comments:

- CPU is back to normal speed now!
- Float32 is now just as fast as Float64 on the CPU, which is an improvement.
- GPU Float32 slowed down. It used to be 10-15% faster; now it's on par with Float64.
- Large CPU models are ~25% slower now.
- Large GPU models are ~30% slower now.

──────────────────────────────────────────────────────────────────────────────────────────────────
             Oceananigans.jl benchmarks                    Time                   Allocations      
                                                   ──────────────────────   ───────────────────────
                 Tot / % measured:                       215s / 47.9%           19.4GiB / 0.07%    

 Section                                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────────────
 256x256x256 static ocean (CPU, Float64)       10    46.2s  44.9%   4.62s   60.3KiB  0.44%  6.03KiB
 256x256x256 static ocean (CPU, Float32)       10    44.7s  43.4%   4.47s   38.9KiB  0.28%  3.89KiB
 128x128x128 static ocean (CPU, Float64)       10    5.11s  4.97%   511ms   60.3KiB  0.44%  6.03KiB
 128x128x128 static ocean (CPU, Float32)       10    5.07s  4.92%   507ms   38.9KiB  0.28%  3.89KiB
  64x 64x 64 static ocean (CPU, Float64)       10    475ms  0.46%  47.5ms   60.3KiB  0.44%  6.03KiB
  64x 64x 64 static ocean (CPU, Float32)       10    458ms  0.44%  45.8ms   38.9KiB  0.28%  3.89KiB
 256x256x256 static ocean (GPU, Float64)       10    370ms  0.36%  37.0ms   1.79MiB  13.3%   183KiB
 256x256x256 static ocean (GPU, Float32)       10    366ms  0.36%  36.6ms   1.48MiB  11.0%   152KiB
  32x 32x 32 static ocean (CPU, Float32)       10   50.6ms  0.05%  5.06ms   38.9KiB  0.28%  3.89KiB
 128x128x128 static ocean (GPU, Float64)       10   49.9ms  0.05%  4.99ms   1.79MiB  13.3%   183KiB
 128x128x128 static ocean (GPU, Float32)       10   49.3ms  0.05%  4.93ms   1.48MiB  11.0%   152KiB
  32x 32x 32 static ocean (CPU, Float64)       10   48.9ms  0.05%  4.89ms   60.3KiB  0.44%  6.03KiB
  64x 64x 64 static ocean (GPU, Float64)       10   8.39ms  0.01%   839μs   1.79MiB  13.3%   183KiB
  64x 64x 64 static ocean (GPU, Float32)       10   7.93ms  0.01%   793μs   1.48MiB  11.0%   152KiB
  32x 32x 32 static ocean (GPU, Float64)       10   5.28ms  0.01%   528μs   1.79MiB  13.3%   183KiB
  32x 32x 32 static ocean (GPU, Float32)       10   4.90ms  0.00%   490μs   1.48MiB  11.0%   151KiB
 ──────────────────────────────────────────────────────────────────────────────────────────────────

CPU Float64 -> Float32 speedup:
 32x 32x 32 static ocean: 0.966
 64x 64x 64 static ocean: 1.038
128x128x128 static ocean: 1.009
256x256x256 static ocean: 1.035

GPU Float64 -> Float32 speedup:
 32x 32x 32 static ocean: 1.078
 64x 64x 64 static ocean: 1.058
128x128x128 static ocean: 1.013
256x256x256 static ocean: 1.011

CPU -> GPU speedup:
 32x 32x 32 static ocean (Float32): 10.340
 32x 32x 32 static ocean (Float64): 9.271
 64x 64x 64 static ocean (Float32): 57.721
 64x 64x 64 static ocean (Float64): 56.647
128x128x128 static ocean (Float32): 102.893
128x128x128 static ocean (Float64): 102.454
256x256x256 static ocean (Float32): 121.837
256x256x256 static ocean (Float64): 124.820
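The speedups above are ratios of average time per iteration: the Float64 time divided by the Float32 time, or the CPU time divided by the GPU time. A minimal sketch recomputing a few of them from the table; the table values are already rounded, so the last digit can differ from the list.

```julia
# Average time per iteration in milliseconds, copied from the table above.
cpu_ms = Dict((32, Float64) => 4.89,   (32, Float32) => 5.06,
              (256, Float64) => 4620.0, (256, Float32) => 4470.0)
gpu_ms = Dict((32, Float64) => 0.528,  (32, Float32) => 0.490,
              (256, Float64) => 37.0,   (256, Float32) => 36.6)

# Speedup of configuration B over A: time(A) / time(B); > 1 means B is faster.
speedup(t_a, t_b) = t_a / t_b

cpu_f64_to_f32_32 = speedup(cpu_ms[(32, Float64)], cpu_ms[(32, Float32)])    # ≈ 0.966
gpu_f64_to_f32_32 = speedup(gpu_ms[(32, Float64)], gpu_ms[(32, Float32)])    # ≈ 1.078
cpu_to_gpu_256_f64 = speedup(cpu_ms[(256, Float64)], gpu_ms[(256, Float64)]) # ≈ 124.9
```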

glwagner · 2019-05-29T11:48:36Z

Huh, on my GPU I was getting a 2x speedup for Float32. I'll have to check that again.

The slowdown has to do with the abstractions I have introduced. 30% is a huge slowdown for one function, indicative of a major problem: probably a type inference issue?

I think that once this problem is solved the code may become faster because of the disambiguation this PR lends to the innermost kernels.

This problem becomes catastrophic for the closures, which make heavy use of the abstraction. So solving this problem is imperative.

We can restore the performance of the default closure by simply pasting the old operators into constant_diffusivity_closures.jl. However, I believe the issue with type inference is solvable.

Unfortunately, I'm in Cthulhu hell right now trying to figure it out...

I'm wondering whether these problems will vanish once we eliminate branches from the innermost functions...
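The option of pasting the old operators back can be expressed with multiple dispatch: a generic method built on the abstraction, plus a specialized method for the constant closure that dispatch picks automatically. All names below are hypothetical, and the sketch says nothing about where the observed slowdown actually comes from.

```julia
abstract type ToyClosure end

struct ToyConstantDiffusivity{T} <: ToyClosure
    κ::T
end

struct ToyVariableDiffusivity{F} <: ToyClosure
    κ::F   # function of the cell index i
end

# Diffusivity at cell i.
κ_at(c::ToyConstantDiffusivity, i) = c.κ
κ_at(c::ToyVariableDiffusivity, i) = c.κ(i)

# Generic method: flux form ∂x(κ ∂x ϕ) with κ averaged onto cell faces.
function diffusion(c::ToyClosure, ϕ, i, Δx)
    κ_right = (κ_at(c, i) + κ_at(c, i + 1)) / 2
    κ_left  = (κ_at(c, i - 1) + κ_at(c, i)) / 2
    return (κ_right * (ϕ[i+1] - ϕ[i]) - κ_left * (ϕ[i] - ϕ[i-1])) / Δx^2
end

# Specialized method: with constant κ the same stencil reduces to κ ∂x²ϕ.
diffusion(c::ToyConstantDiffusivity, ϕ, i, Δx) =
    c.κ * (ϕ[i+1] - 2ϕ[i] + ϕ[i-1]) / Δx^2

ϕ = [sin(0.3i) for i in 1:10]
fast     = diffusion(ToyConstantDiffusivity(0.5), ϕ, 5, 0.1)
generic  = invoke(diffusion, Tuple{ToyClosure, Any, Any, Any},
                  ToyConstantDiffusivity(0.5), ϕ, 5, 0.1)   # force the generic path
variable = diffusion(ToyVariableDiffusivity(i -> 0.5), ϕ, 5, 0.1)
```

The specialized method must agree with the generic one, so a test comparing the two (as with `invoke` here) guards against the hand-written operator drifting from the abstraction.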

glwagner · 2019-05-29T12:31:13Z

There are still some type problems: the Adams-Bashforth parameter is explicitly Float64.
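A minimal sketch of how a Float64 parameter leaks into a Float32 model. It assumes a modified second-order Adams-Bashforth combination of the form (3/2 + χ)Gⁿ - (1/2 + χ)Gⁿ⁻¹; that form and the names `ab2_unstable`/`ab2_stable` are assumptions for illustration, not taken from time_steppers.jl.

```julia
# Assumed AB2 combination of the current and previous tendencies.
# Both `3/2` and a Float64 χ are Float64, so Float32 inputs get promoted.
ab2_unstable(G_now, G_prev, χ) = (3/2 + χ) * G_now - (1/2 + χ) * G_prev

# Convert χ and the constants to the tendencies' element type first.
function ab2_stable(G_now::T, G_prev::T, χ) where {T<:AbstractFloat}
    χT = convert(T, χ)
    return (T(3) / 2 + χT) * G_now - (T(1) / 2 + χT) * G_prev
end

χ = 0.125                  # example value, stored as Float64
ab2_unstable(1f0, 1f0, χ)  # Float64 result
ab2_stable(1f0, 1f0, χ)    # Float32 result
```

In a kernel writing into Float32 arrays, the promoted result is converted back on every store, which can also hide the Float32 speedup on the GPU.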

Use stress divergence and diffusive flux divergence from Turbulence Closures Former-commit-id: 0b36067

glwagner added 8 commits May 27, 2019 09:13

minors edits

d8a9006

refactors model to use TurbulenceClosures instead of model configuration, cleans up code and adds comments

4d679d3

merges regression tests updated post-CliMA#241

ed1a1b5

defaults closure to ConstantAnisotropicDiffusivity

d2e177b

fixes bugs in second derivative operators and streamlines stress divergence operators

51f7ae9

adds tensor diffusivity superabstraction for arbitrary boundary conditions

016dc05

drops-in diffusive operators from TurbulenceClosures; calculates boundary transport coefficients with tensor diffusivity abstraction

7ccb4ca

merging bugfixes and misc upstream changes

f273962

Merge branch 'master' into integrate-turbulence-closures

ali-ramadhan reviewed May 27, 2019


glwagner mentioned this pull request May 28, 2019

Unsupported dynamic function invocation #248 (closed)

glwagner added 11 commits May 28, 2019 07:44

adds simple forcing script with user-defined salinity forcing

631b861

let the inlines rain down

9b26ac3

more inlines! and new differential operators

1891a88

edits simple forcing to avoid dynamic function invotation

780a6e5

adds ability to set type of constants and eos; constrains types of constants and eos to type of model

d9c5e13

fixes type instability in low-level operators

f839702

fixes bug associated with the cursed reverse indexing convention

ff3273c

restores old operators for anisotropic diffusivity, causing large perf gain (?)

7fe47fa

testing performance effects of different operators

4361c53

attempts, and failure to squeeze performance from abstract operators

8e9d5b8

reverts to abstract operators

4cda5de

Merge branch 'master' into integrate-turbulence-closures

1fdaf74

glwagner merged commit 0b36067 into CliMA:master May 29, 2019

glwagner deleted the integrate-turbulence-closures branch May 29, 2019 21:14

glwagner mentioned this pull request Jun 2, 2019

Better organization of rotation-related and buoyancy-related constants and parameters #217 (closed)

arcavaliere pushed a commit to arcavaliere/Oceananigans.jl that referenced this pull request Nov 6, 2019

Merge pull request CliMA#245 from glwagner/integrate-turbulence-closures

65f1db7

