Fixes writing of 3d-reduced `Field`s to NetCDF #2865

tomchor · 2023-01-14T02:18:20Z

This fixes the error we were getting when writing `Field`s reduced over three dimensions to disk with `NetCDFOutputWriter`, following the upstream fix discussed in Alexander-Barth/NCDatasets.jl#197.

This PR also adds a test to catch this in the future.

This currently works only on the master branch of NCDatasets, so tests should fail for now. Once a new version of NCDatasets is released I'll update the packages.

Closes #2857

glwagner · 2023-01-14T04:04:00Z

Can you use a descriptive link to the PR / issue in NCDatasets (rather than just "here")?

glwagner · 2023-01-14T04:04:22Z

The key issue is Alexander-Barth/NCDatasets.jl#197

tomchor · 2023-01-31T15:46:01Z

We're getting errors when running the tests on buildkite that I'm not getting when running on a GPU locally.

For example this:

Field boundary conditions [GPU, RectilinearGrid]: Test Failed at /net/ocean/home/data44/data5/glwagner/.buildkite-agent/builds/sverdrup-7/clima/oceananigans/test/test_computed_field.jl:475
--
  | Expression: #= /net/ocean/home/data44/data5/glwagner/.buildkite-agent/builds/sverdrup-7/clima/oceananigans/test/test_computed_field.jl:475 =# CUDA.@allowscalar all(ST.data[1:Nx, 1:Ny, 0] .== ST.data[1:Nx, 1:Ny, 1])

The above line works for me on GPU when doing include("test_computed_field.jl"), but for some reason fails on buildkite.

@glwagner @simone-silvestri any ideas as to why? The fact that I can't reproduce these errors locally is making it hard for me to solve them.

glwagner · 2023-02-01T01:24:45Z

test/test_computed_field.jl

-    @. u.data = 1 + rand()
-    @. v.data = 2 + rand()
-    @. w.data = 3 + rand()
+    CUDA.@allowscalar begin


Is there any way to do this without @allowscalar? We should be reducing where allowscalar appears in tests, not adding new tests with this.

glwagner · 2023-02-01T01:25:27Z

test/test_computed_field.jl

@@ -301,7 +303,7 @@ function computations_with_averaged_field_derivative(model)

    set!(model, T = (x, y, z) -> 3 * z)

-    return all(interior(shear)[2:3, 2:3, 2:3] .== interior(T)[2:3, 2:3, 2:3])
+    return CUDA.@allowscalar all(interior(shear)[2:3, 2:3, 2:3] .== interior(T)[2:3, 2:3, 2:3])


Same as above. We should minimize usage of @allowscalar in tests. This is one of the largest sources of technical debt in our tests and has incurred a lot of pain in the past.

glwagner · 2023-02-01T01:25:39Z

test/test_computed_field.jl

@@ -320,7 +322,7 @@ function computations_with_computed_fields(model)
    tke = Field(tke_op)
    compute!(tke)

-    return all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)


Same as above

glwagner · 2023-02-01T01:27:57Z

There are a lot of new instances of @allowscalar, but rather than adding new instances we should be refactoring the tests so they don't appear.

When we find that we have to use @allowscalar, it often indicates that our Field infrastructure is somehow deficient / doesn't support necessary operations, which causes us to resort to indexing and other syntax that requires @allowscalar.

@navidcy @simone-silvestri

tomchor · 2023-02-01T02:01:47Z

> There are a lot of new instances of @allowscalar, but rather than adding new instances we should be refactoring the tests so they don't appear.

I added these because it was the only way to make tests pass locally. However, as I mentioned in my previous comment, I can't fully reproduce the test results locally anyway, so these may well be unnecessary (since these lines might be passing on buildkite).

glwagner · 2023-02-01T02:12:05Z

test/test_computed_field.jl

@@ -320,7 +322,7 @@ function computations_with_computed_fields(model)
    tke = Field(tke_op)
    compute!(tke)

-    return all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)


Suggested change

-    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return all(interior(tke, 2:3, 2:3, 2:3) .== 9/2)
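
The difference between the two forms can be sketched on a plain CPU array. This is only an illustration under stated assumptions: `data` is a hypothetical stand-in for `interior(tke)`, and the sketch assumes that `interior(tke, 2:3, 2:3, 2:3)` behaves like a `view` of the interior, which is not stated in this thread. It uses only Base Julia and does not exercise any GPU code path.

```julia
# Hypothetical stand-in for interior(tke): a 4×4×4 array filled with 9/2.
data = fill(9/2, 4, 4, 4)

# Indexing with ranges allocates a new array holding a copy of the block.
copied = data[2:3, 2:3, 2:3]

# view wraps the same block without copying; it shares memory with data.
windowed = view(data, 2:3, 2:3, 2:3)

# Both forms support the broadcasted comparison used in the test.
@assert all(copied .== 9/2)
@assert all(windowed .== 9/2)

# Writing to the parent is visible through the view but not through the copy.
data[2, 2, 2] = 0.0
@assert windowed[1, 1, 1] == 0.0
@assert copied[1, 1, 1] == 9/2
```

On a GPU array, whether range indexing falls back to element-by-element access depends on the wrapper types involved, which is presumably why the indexed form needed @allowscalar here while the view-based form did not.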

glwagner · 2023-02-01T02:12:30Z

I suggested a syntax change that could help

tomchor · 2023-02-01T04:04:15Z

> I suggested a syntax change that could help

You're right, it does help. I'll try replacing these one by one and see if that helps with the error on buildkite. It would still be useful to figure out why I'm not getting the same errors locally, though.

glwagner · 2023-02-01T14:02:59Z

Awesome!

tomchor · 2023-02-02T17:46:08Z

Apparently the new syntax does help avoid @allowscalar instances, and things do compile locally for me, but the errors on buildkite are still there:

Computations with Averaged Fields [GPU, RectilinearGrid]: Test Failed at /net/ocean/home/data44/data5/glwagner/.buildkite-agent/builds/sverdrup-13/clima/oceananigans/test/test_computed_field.jl:583
--
  | Expression: all(interior(tke_yz) .== 9 / 2)

Any ideas on what might be causing the differences between buildkite and my local server? If someone could also run one of the failing tests on a GPU locally and see whether they get the same errors that buildkite is throwing, that would be helpful.

tomchor · 2023-02-05T15:41:58Z

Finally got the tests passing! It was something having to do with GPUCompiler.jl.

This is ready to merge/review.

tomchor · 2023-02-08T01:44:15Z

@navidcy @simone-silvestri @glwagner with #2899 being merged, this bugfix PR is pretty trivial. Can I get a review whenever any of you have the time?

glwagner · 2023-02-08T04:29:31Z

test/test_netcdf_output_writer.jl

+    set!(model, c=1)
+
+    Δt = 1/64 # Nice floating-point number
+    simulation = Simulation(model, Δt=Δt, stop_time=50Δt)


Can we do fewer steps? Our tests currently stretch the limits of our resources, so we need to be as parsimonious as possible when adding new tests. Note also that compilation cost is the main thing. If this test can be combined with another test, that'd be ideal. For example, many different NetCDF tests could share the same simulation; is there any need to run independent simulations?

glwagner · 2023-02-08

Also suggest using stop_iteration.

tomchor · 2023-02-08

Sorry @glwagner, I ended up merging before you commented. I copied the test template from others in the same file, so there are more tests that this change would apply to. Would you like me to open another PR for this?

glwagner · 2023-02-08

Right, I'm pointing out that we don't want to copy/paste test code without care, since a lot of our test code is poorly written / wasteful and our CI is straining under the pressure... :-/

glwagner · 2023-02-08

Any PRs that reduce test cost will be greatly appreciated! I can't tell if all the changes will make a big difference; you are better placed to analyze that.

tomchor · 2023-02-09

After running these a few times for this PR, I'd say that this one won't make much of a difference. But I have identified several tests that each instantiate their own model and could be merged together. That, I think, will have a more significant impact. I'll open a PR about it once some of my other PRs are merged!

glwagner · 2023-02-09

Sounds good.

fix issue and add test

2f3d345

tomchor marked this pull request as draft January 14, 2023 02:19

navidcy added the output 💾 label Jan 14, 2023

fix

24b23d3

tomchor marked this pull request as ready for review January 14, 2023 23:24

tomchor requested review from simone-silvestri, glwagner and navidcy January 14, 2023 23:25

tomchor mentioned this pull request Jan 14, 2023

Error when writing Array{Float64, 0} Alexander-Barth/NCDatasets.jl#197

Closed

tomchor added 5 commits January 19, 2023 13:12

Merge branch 'main' into tc/reduced-field-netcdf

ab9e600

update packages

2a09adf

update only ncdatasets

dd87d58

Merge branch 'main' into tc/reduced-field-netcdf

f203fec

fixes for test_computed_fields and test_time_stepping

aeb2501

glwagner reviewed Feb 1, 2023

removed some allowscalars

92d6721

tomchor mentioned this pull request Feb 4, 2023

Fixes writing of 3d-reduced Fields to NetCDF while pinning GPUArrays #2898

Closed

pin GPUCompiler to 0.16.4

b8b26d2

tomchor added 2 commits February 4, 2023 17:53

remove some allowscalar instances

702cf35

bump patch version

31de62c

tomchor requested a review from glwagner February 5, 2023 15:40

tomchor mentioned this pull request Feb 5, 2023

Upgrades KernelAbstractions with pinned GPUCompiler #2900

Closed

tomchor added 2 commits February 7, 2023 09:58

Merge branch 'main' into tc/reduced-field-netcdf

b5c2823

Update Project.toml

77672d1

simone-silvestri approved these changes Feb 8, 2023

tomchor merged commit 4940d29 into main Feb 8, 2023

tomchor deleted the tc/reduced-field-netcdf branch February 8, 2023 02:14

glwagner reviewed Feb 8, 2023

tomchor mentioned this pull request Feb 10, 2023

Refactor of some stretched rectilinear grid tests #2917

Merged
