Generalize RungeKutta3 and QuasiAdamsBashforth2 for any model #1210

francispoulin · 2020-11-25T17:08:57Z

This generalizes the runge_kutta3.jl script to work for any model. It also combines the time stepping for the fields (e.g. velocities) and the tracers into one loop. Thanks to @glwagner and @ali-ramadhan!
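Here is a minimal sketch of the idea. It is written in Python rather than Oceananigans' Julia, and it is not taken from the PR. The stepper never asks which fields are velocities and which are tracers; it just loops over every field the model exposes. The coefficients γ = (8/15, 5/12, 3/4) and ζ = (0, -17/60, -5/12) are those of a standard low-storage third-order Runge-Kutta scheme and are an assumption here. rk3_step, tendencies, and the one-float-per-field "model" are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical "model": each named field (velocity or tracer) is a single float.
Fields = Dict[str, float]
Tendencies = Callable[[Fields], Fields]

# Assumed low-storage third-order Runge-Kutta coefficients.
GAMMA = (8 / 15, 5 / 12, 3 / 4)
ZETA = (0.0, -17 / 60, -5 / 12)  # first stage has no previous tendency


def rk3_step(fields: Fields, tendencies: Tendencies, dt: float) -> Fields:
    """Advance all fields by one RK3 step, in place.

    Velocities and tracers are not distinguished: whatever keys the
    model exposes are stepped in the same loop.
    """
    previous: Optional[Fields] = None
    for gamma, zeta in zip(GAMMA, ZETA):
        current = tendencies(fields)  # tendencies at this stage
        for name in fields:
            increment = gamma * current[name]
            if previous is not None:
                increment += zeta * previous[name]
            fields[name] += dt * increment
        previous = current  # keep this stage's tendencies for the next stage
    return fields


# Usage: a "velocity" u and a "tracer" c decaying at different rates.
def decay(f: Fields) -> Fields:
    return {"u": -f["u"], "c": -2.0 * f["c"]}


state = rk3_step({"u": 1.0, "c": 1.0}, decay, dt=0.1)
```

Each field's update uses only its current and previous tendency. That means adding a new prognostic field to a model needs no change to the stepper.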

I do get the following warnings, which I would like to eliminate. Any suggestions on how I can do that?

WARNING: ignoring conflicting import of Models.fields into IncompressibleModels
WARNING: using Models.fields in module Oceananigans conflicts with an existing identifier.

francispoulin · 2020-11-25T17:44:00Z

Given that some checks were not successful, I now see that there are some problems with this PR. I glanced at the errors, but I don't pretend to understand what's going on.

ali-ramadhan · 2020-11-25T18:06:36Z

Ah, I think the method conflict warnings were because "function fields end" was being defined in both src/Oceananigans.jl and src/Models/Models.jl. I think you meant to define it only in src/Oceananigans.jl, since it's needed in src/TimeSteppers.jl (which is defined before src/Models/Models.jl).

By the time I figured this out I had already modified quite a bit, so I just pushed my changes. The warnings should be gone now!

ali-ramadhan · 2020-11-25T18:11:23Z

I think the tests almost pass! It's just a few tests in test_abstract_operations.jl that don't: https://buildkite.com/clima/oceananigans/builds/670#3ec51acd-7496-449e-afb6-ace178715cf3/14-394

Maybe they're just missing a "using Oceananigans: fields", or maybe fields needs to be exported from the Oceananigans and/or Models modules?

francispoulin · 2020-11-25T18:42:37Z

Thanks @ali-ramadhan for finding the problem and fixing it.

I will pull the updated code and try the test that you pointed out. I'm not sure whether I'll be able to figure it out, but I will try.

francispoulin

I think this is a pretty straightforward change. One benefit is that we no longer have to double up the time stepping for tracers, since they are stepped at the same time as the other fields. It also allows time stepping for virtually any model.
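The same structure carries over to QuasiAdamsBashforth2. Below is a hedged Python sketch, not the Oceananigans implementation. It applies the quasi-second-order Adams-Bashforth update U <- U + dt[(3/2 + χ)G^n - (1/2 + χ)G^(n-1)] uniformly to every field. Two details are assumptions about typical usage: the default χ = 0.1, and the forward-Euler first step (χ = -1/2, so the missing previous tendency drops out). qab2_step and the dict-based model are hypothetical.

```python
from typing import Callable, Dict, Optional

Fields = Dict[str, float]  # hypothetical: one float per named field
Tendencies = Callable[[Fields], Fields]


def qab2_step(fields: Fields, tendencies: Tendencies,
              previous: Optional[Fields], dt: float, chi: float = 0.1) -> Fields:
    """Advance all fields by one quasi-Adams-Bashforth-2 step, in place.

    `previous` holds the tendencies from the last step (None on the first
    step, which then falls back to forward Euler). Returns the current
    tendencies so the caller can pass them back in on the next step.
    """
    current = tendencies(fields)
    if previous is None:
        chi = -0.5  # (3/2 + chi) = 1 and (1/2 + chi) = 0: forward Euler
    for name in fields:
        g_old = previous[name] if previous is not None else 0.0
        fields[name] += dt * ((1.5 + chi) * current[name] - (0.5 + chi) * g_old)
    return current


# Usage: two steps of du/dt = -u with plain AB2 (chi = 0) for easy checking.
state = {"u": 1.0}
g = qab2_step(state, lambda f: {"u": -f["u"]}, None, dt=0.1, chi=0.0)
after_first = state["u"]
qab2_step(state, lambda f: {"u": -f["u"]}, g, dt=0.1, chi=0.0)
```

As with RK3, the only per-field state the stepper keeps is the previous tendency. Velocities and tracers therefore need no separate code path.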

I am happy that the tests all seem to pass and that there are no more warnings, thanks to @ali-ramadhan.

ali-ramadhan

Looks great! Less code, more features too.

I think there are some conflicts with the master branch, so you might have to git merge master into this branch, resolve the conflicts locally, and push the changes before the PR can be merged.

Sometimes the conflicts are pretty minor in which case you could resolve them directly on GitHub. But in this case GitHub says they're too complex.

francispoulin · 2020-11-26T02:15:48Z

Sounds good. Luckily those are files that I created. I was planning on fixing them up tomorrow anyhow; now there's just added incentive.

glwagner · 2020-11-26T15:46:05Z

Benchmark? The reason we combined the updates for velocities was a perceived performance gain. Probably we were wrong about that, but it'd be good to show it.

francispoulin · 2020-11-26T15:49:06Z

Do I understand that to mean that you want to test the performance of the code before and after the PR? If there are tests that I can run on both versions, I would be happy to try them on my desktop (CPU only). That probably wouldn't be as informative as running them on a better computer, though.

ali-ramadhan · 2020-11-26T17:04:19Z

So there's a script (https://github.com/CliMA/Oceananigans.jl/blob/master/benchmark/benchmark_regression.jl) that benchmarks the current branch against the master branch, but it doesn't currently print the results. I'll fix it and run it on this branch on Tartarus.

glwagner · 2020-11-26T17:20:32Z

We would like to test the performance of the version of Oceananigans in this PR against Oceananigans#master, on both the CPU and the GPU and for a variety of problem sizes. Maintaining good performance is a top priority of ours. Generally speaking, we would like to avoid performance regressions, even small ones (which, accumulated over many PRs, could become significant).
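A quick back-of-the-envelope calculation shows why small regressions matter. If each of N PRs slowed a time step by a fraction r, the total slowdown compounds as (1 + r)^N. The 1%-per-PR and 50-PR figures below are purely illustrative, not measured.

```python
def compounded_slowdown(per_pr_fraction: float, n_prs: int) -> float:
    """Total runtime multiplier after n_prs regressions of per_pr_fraction each."""
    return (1.0 + per_pr_fraction) ** n_prs


# Illustrative only: a 1% regression in each of 50 PRs.
factor = compounded_slowdown(0.01, 50)
print(f"{factor:.2f}x slower overall")
```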

francispoulin · 2020-11-26T21:08:25Z

New branch fjp/generalize-runge-kutta-3:

I did 2 trials to try and get an idea of the variance we can expect. Sorry if this is too much information.

Trial 1:

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.770 ms │   3.925 ms │   3.975 ms │   4.535 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  24.751 ms │  24.945 ms │  25.124 ms │  26.909 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 218.012 ms │ 218.721 ms │ 219.037 ms │ 220.987 ms │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   4.253 ms │   4.437 ms │   4.509 ms │   5.229 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  29.137 ms │  29.446 ms │  29.689 ms │  31.794 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 257.251 ms │ 258.619 ms │ 259.852 ms │ 270.451 ms │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.489 ms │   2.591 ms │   2.755 ms │   3.150 ms │ 814.41 KiB │  11740 │
│           GPU │     Float32 │  64 │  10.374 ms │  13.950 ms │  13.590 ms │  14.010 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 128 │  88.020 ms │ 125.190 ms │ 122.408 ms │ 133.906 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   5.323 ms │   5.438 ms │   5.431 ms │   5.573 ms │ 892.33 KiB │  11574 │
│           GPU │     Float64 │  64 │  34.741 ms │  43.748 ms │  42.586 ms │  44.978 ms │ 892.30 KiB │  11580 │
│           GPU │     Float64 │ 128 │ 279.110 ms │ 333.392 ms │ 328.209 ms │ 335.085 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 16:03:50.829] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.51499 │ 3.28804 │ 6.12735 │
│     Float32 │  64 │  1.78816 │ 3.28791 │ 6.13048 │
│     Float32 │ 128 │   1.7471 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │  0.81598 │ 2.97644 │ 6.04071 │
│     Float64 │  64 │ 0.673082 │ 2.97634 │ 6.04384 │
│     Float64 │ 128 │ 0.775721 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴──────────┴─────────┴─────────┘

Trial 2:

                                       Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.658 ms │   3.795 ms │   3.878 ms │   4.654 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  25.066 ms │  25.346 ms │  25.454 ms │  26.454 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 212.990 ms │ 213.482 ms │ 215.127 ms │ 224.655 ms │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   4.111 ms │   4.206 ms │   4.303 ms │   5.146 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  31.135 ms │  32.233 ms │  32.367 ms │  33.781 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 252.694 ms │ 262.043 ms │ 263.575 ms │ 275.524 ms │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.458 ms │   2.506 ms │   2.572 ms │   3.091 ms │ 814.19 KiB │  11726 │
│           GPU │     Float32 │  64 │  12.548 ms │  15.949 ms │  15.599 ms │  16.169 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 128 │  79.348 ms │ 117.337 ms │ 114.294 ms │ 121.564 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   5.351 ms │   5.401 ms │   5.461 ms │   5.947 ms │ 893.42 KiB │  11644 │
│           GPU │     Float64 │  64 │  37.831 ms │  38.582 ms │  38.586 ms │  39.330 ms │ 892.30 KiB │  11580 │
│           GPU │     Float64 │ 128 │ 265.749 ms │ 335.525 ms │ 326.271 ms │ 353.112 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 18:37:36.115] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.51403 │ 3.28716 │ 6.12004 │
│     Float32 │  64 │  1.58924 │ 3.28791 │ 6.13048 │
│     Float32 │ 128 │  1.81939 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │ 0.778644 │ 2.98009 │ 6.07724 │
│     Float64 │  64 │ 0.835425 │ 2.97634 │ 6.04384 │
│     Float64 │ 128 │ 0.780995 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴──────────┴─────────┴─────────┘
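One way to read the two trials is to compare their medians directly. The run-to-run change on the same branch gives a rough noise floor, against which branch-vs-master differences can be judged. The numbers below are copied from the Trial 1 and Trial 2 tables above. The helper relative_change is just illustrative.

```python
# Median times (ms) for the same branch, copied from the Trial 1 and Trial 2 tables.
trial_1 = {("CPU", "Float64", 64): 29.446, ("GPU", "Float64", 64): 43.748,
           ("GPU", "Float32", 128): 125.190}
trial_2 = {("CPU", "Float64", 64): 32.233, ("GPU", "Float64", 64): 38.582,
           ("GPU", "Float32", 128): 117.337}


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means faster)."""
    return (new - old) / old


for key in trial_1:
    print(key, f"{100 * relative_change(trial_1[key], trial_2[key]):+.1f}%")
```

Swings of roughly 6 to 12% between two runs of the same code are as large as, or larger than, most of the branch-vs-master differences in these tables.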

Old branch master:

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   3.731 ms │   4.014 ms │   4.048 ms │   4.752 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │  64 │  25.071 ms │  25.897 ms │  26.004 ms │  27.032 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 128 │ 214.549 ms │ 216.681 ms │ 218.408 ms │ 227.438 ms │ 242.42 KiB │   1876 │
│           CPU │     Float64 │  32 │   4.230 ms │   4.334 ms │   4.430 ms │   5.244 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │  64 │  28.847 ms │  29.348 ms │  29.573 ms │  30.704 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 128 │ 254.216 ms │ 254.715 ms │ 255.230 ms │ 260.031 ms │ 293.44 KiB │   1876 │
│           GPU │     Float32 │  32 │   2.474 ms │   2.625 ms │   2.764 ms │   3.510 ms │ 802.67 KiB │  11417 │
│           GPU │     Float32 │  64 │  10.381 ms │  13.617 ms │  13.292 ms │  13.719 ms │ 802.48 KiB │  11413 │
│           GPU │     Float32 │ 128 │  76.589 ms │ 114.593 ms │ 113.372 ms │ 132.651 ms │ 802.48 KiB │  11413 │
│           GPU │     Float64 │  32 │   5.366 ms │   5.420 ms │   5.439 ms │   5.610 ms │ 877.02 KiB │  11251 │
│           GPU │     Float64 │  64 │  33.735 ms │  38.491 ms │  38.027 ms │  38.614 ms │ 876.83 KiB │  11247 │
│           GPU │     Float64 │ 128 │ 293.481 ms │ 316.512 ms │ 316.715 ms │ 343.279 ms │ 876.83 KiB │  11247 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘
[2020/11/26 16:09:31.583] INFO  Writing Incompressible_model_benchmarks.html...
      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬──────────┬─────────┬─────────┐
│ Float_types │  Ns │  speedup │  memory │  allocs │
├─────────────┼─────┼──────────┼─────────┼─────────┤
│     Float32 │  32 │  1.52907 │ 3.31105 │ 6.08582 │
│     Float32 │  64 │  1.90176 │ 3.31028 │ 6.08369 │
│     Float32 │ 128 │  1.89087 │ 3.31028 │ 6.08369 │
│     Float64 │  32 │ 0.799624 │ 2.98876 │ 5.99733 │
│     Float64 │  64 │  0.76246 │ 2.98813 │  5.9952 │
│     Float64 │ 128 │ 0.804754 │ 2.98813 │  5.9952 │
└─────────────┴─────┴──────────┴─────────┴─────────┘

For this single test (clearly more needs to be done), it seems that, compared to master, the CPU -> GPU speedup is on average slightly lower, as is the GPU/CPU memory ratio.

francispoulin · 2020-11-26T22:52:24Z

New branch fjp/generalize-runge-kutta-3:

                              Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬─────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │  allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼─────────┤
│  512 │       1 │ 23.857 s │ 23.857 s │ 23.857 s │ 23.857 s │ 300.64 KiB │    1970 │
│  512 │       2 │ 19.049 s │ 19.049 s │ 19.049 s │ 19.049 s │ 127.11 MiB │ 8291846 │
│  512 │       4 │  9.636 s │  9.636 s │  9.636 s │  9.636 s │  59.15 MiB │ 3839470 │
│  512 │       8 │  5.231 s │  5.231 s │  5.231 s │  5.231 s │  25.81 MiB │ 1644097 │
│  512 │      16 │  3.759 s │  3.786 s │  3.786 s │  3.814 s │ 119.15 MiB │ 1939597 │
│  512 │      32 │  3.517 s │  3.521 s │  3.521 s │  3.524 s │  11.72 MiB │  648129 │
│  512 │      36 │  3.532 s │  3.533 s │  3.533 s │  3.534 s │  11.09 MiB │  566273 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴─────────┘
[2020/11/26 18:22:08.583] INFO  Writing Multithreading_benchmarks.html...
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │  1.2524 │ 432.949 │ 4209.06 │
│  512 │       4 │ 2.47574 │ 201.484 │ 1948.97 │
│  512 │       8 │ 4.56069 │  87.897 │ 834.567 │
│  512 │      16 │ 6.30058 │ 405.837 │ 984.567 │
│  512 │      32 │ 6.77607 │ 39.9205 │ 328.999 │
│  512 │      36 │ 6.75252 │ 37.7587 │ 287.448 │
└──────┴─────────┴─────────┴─────────┴─────────┘

Old branch master:

                             Multithreading benchmarks
┌──────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────────┬──────────┐
│ size │ threads │      min │   median │     mean │      max │     memory │   allocs │
├──────┼─────────┼──────────┼──────────┼──────────┼──────────┼────────────┼──────────┤
│  512 │       1 │ 24.654 s │ 24.654 s │ 24.654 s │ 24.654 s │ 294.28 KiB │     1930 │
│  512 │       2 │ 21.380 s │ 21.380 s │ 21.380 s │ 21.380 s │ 172.25 MiB │ 11250274 │
│  512 │       4 │  9.585 s │  9.585 s │  9.585 s │  9.585 s │  63.01 MiB │  4093201 │
│  512 │       8 │  5.417 s │  5.417 s │  5.417 s │  5.417 s │  34.91 MiB │  2242285 │
│  512 │      16 │  3.989 s │  3.991 s │  3.991 s │  3.993 s │ 123.02 MiB │  2196707 │
│  512 │      32 │  3.655 s │  3.676 s │  3.676 s │  3.698 s │  11.86 MiB │   663272 │
│  512 │      36 │  3.783 s │  3.794 s │  3.794 s │  3.804 s │ 115.48 MiB │  1646037 │
└──────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────────┴──────────┘
[2020/11/26 16:58:13.592] INFO  Writing Multithreading_benchmarks.html...
             Multithreading speedup
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ size │ threads │ speedup │  memory │  allocs │
├──────┼─────────┼─────────┼─────────┼─────────┤
│  512 │       1 │     1.0 │     1.0 │     1.0 │
│  512 │       2 │ 1.15317 │ 599.384 │ 5829.16 │
│  512 │       4 │ 2.57214 │ 219.267 │ 2120.83 │
│  512 │       8 │ 4.55103 │ 121.466 │ 1161.81 │
│  512 │      16 │ 6.17726 │ 428.071 │ 1138.19 │
│  512 │      32 │ 6.70601 │ 41.2683 │ 343.664 │
│  512 │      36 │ 6.49899 │ 401.838 │ 852.869 │
└──────┴─────────┴─────────┴─────────┴─────────┘

Seems comparable to me, but two observations:

- The new branch typically uses less memory (in every multithreaded case).
- The mean run time on the new branch tends to be lower.

ali-ramadhan · 2020-11-26T23:10:20Z

Thanks for running the benchmarks!

benchmark_incompressible_model.jl only times 10 time steps, so the statistics probably aren't super robust, but it does seem that the incompressible model has slowed down a bit in all cases...

ali-ramadhan · 2020-12-01T17:12:19Z

I ran the benchmark_incompressible_model.jl script on the master branch (twice) and on this branch (also twice), and I actually see a tiny bit of a speedup, though maybe only a significant one for larger CPU models.

It's hard to say whether that's noise; it might be due more to other processes causing small variations in runtime.

I don't think this PR slows down or speeds up the code, but it simplifies and improves the time-stepping code, so it should be merged.

There are a few more memory allocations now (due to extra kernel launches), but this shouldn't affect performance.

System info

Oceananigans v0.44.1
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V

Master branch

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   5.399 ms │   5.668 ms │   5.758 ms │   7.186 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │  64 │  36.710 ms │  37.583 ms │  37.974 ms │  41.678 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 128 │ 312.780 ms │ 313.477 ms │ 313.622 ms │ 314.726 ms │ 242.42 KiB │   1876 │
│           CPU │     Float32 │ 256 │    2.802 s │    2.819 s │    2.819 s │    2.836 s │ 242.42 KiB │   1876 │
│           CPU │     Float64 │  32 │   5.828 ms │   6.049 ms │   6.157 ms │   7.044 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │  64 │  43.084 ms │  43.619 ms │  43.650 ms │  44.363 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 128 │ 365.051 ms │ 365.317 ms │ 365.475 ms │ 366.288 ms │ 293.44 KiB │   1876 │
│           CPU │     Float64 │ 256 │    3.602 s │    3.653 s │    3.653 s │    3.703 s │ 293.44 KiB │   1876 │
│           GPU │     Float32 │  32 │   2.797 ms │   2.870 ms │   2.918 ms │   3.435 ms │ 802.70 KiB │  11419 │
│           GPU │     Float32 │  64 │   3.120 ms │   3.207 ms │   3.300 ms │   4.224 ms │ 802.52 KiB │  11415 │
│           GPU │     Float32 │ 128 │   4.019 ms │   4.066 ms │   4.192 ms │   5.244 ms │ 802.52 KiB │  11415 │
│           GPU │     Float32 │ 256 │  15.942 ms │  23.497 ms │  22.763 ms │  23.588 ms │ 802.48 KiB │  11413 │
│           GPU │     Float64 │  32 │   3.079 ms │   3.166 ms │   3.226 ms │   3.728 ms │ 877.05 KiB │  11253 │
│           GPU │     Float64 │  64 │   3.458 ms │   3.522 ms │   3.591 ms │   3.981 ms │ 876.86 KiB │  11249 │
│           GPU │     Float64 │ 128 │   4.536 ms │   4.572 ms │   4.723 ms │   6.000 ms │ 876.58 KiB │  11231 │
│           GPU │     Float64 │ 256 │  21.794 ms │  32.107 ms │  31.073 ms │  32.198 ms │ 876.83 KiB │  11247 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘

      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬─────────┬─────────┬─────────┐
│ Float_types │  Ns │ speedup │  memory │  allocs │
├─────────────┼─────┼─────────┼─────────┼─────────┤
│     Float32 │  32 │  1.9749 │ 3.31118 │ 6.08689 │
│     Float32 │  64 │   11.72 │ 3.31041 │ 6.08475 │
│     Float32 │ 128 │ 77.0954 │ 3.31041 │ 6.08475 │
│     Float32 │ 256 │  119.98 │ 3.31028 │ 6.08369 │
│     Float64 │  32 │ 1.91043 │ 2.98887 │  5.9984 │
│     Float64 │  64 │ 12.3861 │ 2.98823 │ 5.99627 │
│     Float64 │ 128 │ 79.9049 │ 2.98727 │ 5.98667 │
│     Float64 │ 256 │ 113.772 │ 2.98813 │  5.9952 │
└─────────────┴─────┴─────────┴─────────┴─────────┘

This branch

                                        Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │     memory │ allocs │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────┤
│           CPU │     Float32 │  32 │   5.444 ms │   5.636 ms │   6.018 ms │   9.357 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │  64 │  36.689 ms │  37.147 ms │  37.348 ms │  38.446 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 128 │ 314.926 ms │ 316.673 ms │ 318.545 ms │ 338.621 ms │ 247.69 KiB │   1916 │
│           CPU │     Float32 │ 256 │    2.778 s │    2.781 s │    2.781 s │    2.783 s │ 247.69 KiB │   1916 │
│           CPU │     Float64 │  32 │   5.735 ms │   6.063 ms │   6.136 ms │   7.018 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │  64 │  43.243 ms │  43.446 ms │  43.607 ms │  44.871 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 128 │ 366.596 ms │ 367.479 ms │ 367.682 ms │ 369.125 ms │ 299.80 KiB │   1916 │
│           CPU │     Float64 │ 256 │    3.281 s │    3.331 s │    3.331 s │    3.381 s │ 299.80 KiB │   1916 │
│           GPU │     Float32 │  32 │   2.888 ms │   2.939 ms │   2.994 ms │   3.485 ms │ 814.47 KiB │  11744 │
│           GPU │     Float32 │  64 │   3.148 ms │   3.224 ms │   3.293 ms │   3.913 ms │ 814.28 KiB │  11740 │
│           GPU │     Float32 │ 128 │   4.002 ms │   4.089 ms │   4.210 ms │   5.287 ms │ 814.38 KiB │  11746 │
│           GPU │     Float32 │ 256 │  16.015 ms │  23.712 ms │  22.928 ms │  23.994 ms │ 814.38 KiB │  11746 │
│           GPU │     Float64 │  32 │   3.159 ms │   3.190 ms │   3.249 ms │   3.757 ms │ 892.39 KiB │  11578 │
│           GPU │     Float64 │  64 │   3.472 ms │   3.534 ms │   3.640 ms │   4.632 ms │ 892.20 KiB │  11574 │
│           GPU │     Float64 │ 128 │   4.479 ms │   4.537 ms │   4.700 ms │   6.206 ms │ 891.98 KiB │  11560 │
│           GPU │     Float64 │ 256 │  21.481 ms │  31.610 ms │  30.599 ms │  31.686 ms │ 892.30 KiB │  11580 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────┘

      Incompressible model CPU -> GPU speedup
┌─────────────┬─────┬─────────┬─────────┬─────────┐
│ Float_types │  Ns │ speedup │  memory │  allocs │
├─────────────┼─────┼─────────┼─────────┼─────────┤
│     Float32 │  32 │ 1.91728 │ 3.28829 │ 6.12944 │
│     Float32 │  64 │ 11.5212 │ 3.28753 │ 6.12735 │
│     Float32 │ 128 │ 77.4361 │ 3.28791 │ 6.13048 │
│     Float32 │ 256 │ 117.271 │ 3.28791 │ 6.13048 │
│     Float64 │  32 │ 1.90073 │ 2.97665 │  6.0428 │
│     Float64 │  64 │ 12.2925 │ 2.97603 │ 6.04071 │
│     Float64 │ 128 │    81.0 │  2.9753 │  6.0334 │
│     Float64 │ 256 │ 105.386 │ 2.97634 │ 6.04384 │
└─────────────┴─────┴─────────┴─────────┴─────────┘
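To put a number on "a tiny bit of a speedup", here is the relative change in median time for a few rows of the Master branch and This branch tables above. The values are copied from those tables, and negative means this branch is faster. Units match within each key, so the ratios are unitless.

```python
# Median times copied from the Dec 1 "Master branch" and "This branch" tables.
# Keys are (architecture, float type, N); CPU N=256 rows are in s, GPU rows in ms.
master = {("CPU", "Float32", 256): 2.819, ("CPU", "Float64", 256): 3.653,
          ("GPU", "Float32", 32): 2.870, ("GPU", "Float64", 256): 32.107}
branch = {("CPU", "Float32", 256): 2.781, ("CPU", "Float64", 256): 3.331,
          ("GPU", "Float32", 32): 2.939, ("GPU", "Float64", 256): 31.610}

changes = {key: (branch[key] - master[key]) / master[key] for key in master}
for key, change in changes.items():
    print(key, f"{100 * change:+.1f}%")
```

The largest gain is for the Float64 CPU case with N = 256 (roughly 9%). The smallest GPU model, by contrast, is slightly slower.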

glwagner · 2020-12-01T17:25:13Z

I think a slowdown for small models and a speedup for large models makes sense, given that this PR splits one relatively large kernel into three smaller ones (two times). Seems like an acceptable trade-off to me (and also nearly unnoticeable). Why are the validation experiments failing?
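The trade-off glwagner describes can be shown with a toy illustration. A fused update touches every field in one pass (one "kernel launch"), whereas a split update issues one pass per field. In this Python sketch a "launch" is just a loop pass recorded in a list. It therefore only shows that the results are identical and that the launch count triples; real GPU launch overhead and memory effects are not modeled. fused_update and split_update are hypothetical names.

```python
from typing import Dict, List

Fields = Dict[str, List[float]]  # hypothetical: each field is a flat array


def fused_update(fields: Fields, tendencies: Fields, dt: float, launches: List[str]) -> None:
    """One pass over the grid, updating every field at each point."""
    launches.append("fused")
    n = len(next(iter(fields.values())))
    for i in range(n):
        for name in fields:
            fields[name][i] += dt * tendencies[name][i]


def split_update(fields: Fields, tendencies: Fields, dt: float, launches: List[str]) -> None:
    """One pass (one 'launch') per field, as when a kernel is split per field."""
    for name in fields:
        launches.append(name)
        f, g = fields[name], tendencies[name]
        for i in range(len(f)):
            f[i] += dt * g[i]


def make_fields() -> Fields:
    return {"u": [1.0, 2.0], "v": [3.0, 4.0], "w": [5.0, 6.0]}


G = {"u": [0.5, 0.5], "v": [-1.0, 1.0], "w": [2.0, 0.0]}
a, b = make_fields(), make_fields()
fused_launches: List[str] = []
split_launches: List[str] = []
fused_update(a, G, 0.1, fused_launches)
split_update(b, G, 0.1, split_launches)
```

Launch overhead is paid per launch. It therefore weighs most when each kernel does little work, i.e. on small grids.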

ali-ramadhan · 2020-12-01T17:27:22Z

Ah we can ignore the validation experiments pipeline failure.

It's failing because .buildkite/validation-pipeline.yml is not on this branch. I've since disabled GitHub triggers for the validation pipeline. It's now triggered every night at 3 am EST and can be triggered manually, but it needs more work (see PR #1223).

francispoulin · 2020-12-01T18:04:41Z

@ali-ramadhan Any idea why there are two failures above? Should we fix these before we merge? If it's a bit tricky, I'm happy to leave the merging up to you, as you are a co-author on this PR. Happy to chat if that would help.

ali-ramadhan · 2020-12-01T19:46:03Z

@francispoulin Ah those two failures are unrelated to this PR (it's the validation experiments pipeline failure I mentioned above) so this should be good to merge if you're happy with the PR.

francispoulin · 2020-12-01T19:48:31Z

Thank you @ali-ramadhan ! That is what I thought but wanted to make sure.

I will now go and press the big red button, at long last.

francispoulin added 5 commits November 24, 2020 16:48

My beginnings of generalizin runge-kutta-3 but having difficulties wi…

571cd76

…th Models and or fields not found.

Fixing the problems with fields and has comments fo what I think we s…

6604ad8

…hold be doing.

Trying new method that does not work.

2a808c1

Generalized time stepping seems to work. Does fields and tracers at t…

e9ce226

…he same time. Will do tests next.

Removing some unwanted files.

6f51d63

Get rid of method conflict warnings

a5aa552

francispoulin changed the title ~~Fjp/generalize runge kutta 3~~ Generalize runge kutta 3 for any model Nov 25, 2020

francispoulin and others added 4 commits November 25, 2020 14:06

First attempt at modifying qab2 and not quite working yet.

f9ad887

Export the fields method

f665903

A working version of quasiAdamsBashforth2 in the new formalism.

fba293a

Merge branch 'fjp/generalize-runge-kutta-3' of https://github.com/Cli…

87be266

…MA/Oceananigans.jl into fjp/generalize-runge-kutta-3

francispoulin changed the title ~~Generalize runge kutta 3 for any model~~ Generalize unge kutta 3 and QuasiAdamsBashforth2 for any model Nov 25, 2020

francispoulin changed the title ~~Generalize unge kutta 3 and QuasiAdamsBashforth2 for any model~~ Generalize RungeKutta3 and QuasiAdamsBashforth2 for any model Nov 25, 2020

francispoulin commented Nov 25, 2020

ali-ramadhan approved these changes Nov 26, 2020

francispoulin added 3 commits November 26, 2020 14:58

Trying to sort out conflict with new branch and master. Need to fix s…

476d6d7

…tore_tendencies.jl.

Works but there is a wait in store_tendencies.jl that needed to be co…

581272b

…mmented out.

After changing store_tendencies.jl the time-stepping seems to work. N…

eec2a5d

…ext need to look at benchmarks.

ali-ramadhan mentioned this pull request Nov 26, 2020

Running Oceananigans with 2 threads allocates the most memory #1218

Closed

Slight modification on rk3 and qab2 to avoid unnecessarily defining m…

a99d531

…atrices.

glwagner approved these changes Dec 1, 2020

francispoulin merged commit 1b2efe5 into master Dec 1, 2020

francispoulin deleted the fjp/generalize-runge-kutta-3 branch December 1, 2020 19:49
