Different results from identical tracers when using NetCDFOutputWriter #2931

Closed
tomchor opened this issue Feb 17, 2023 · 36 comments · Fixed by #2933
Labels
bug 🐞 · output 💾

Comments

@tomchor (Collaborator)

tomchor commented Feb 17, 2023

The example below creates a nonhydrostatic model with 6 identical tracers, calculates the vertical flux of those tracers in an identical way for all 6, and then writes it all to NetCDF. In the same file, it writes a vertical (x-z) slice of the fluxes using the indices keyword, and a y-average of them:

using Oceananigans
using Oceananigans.Units

grid = RectilinearGrid(size=(16, 16, 16), extent = (500, 500, 120))

n_tracers = 6
tracer_symbols = [ Symbol(:τ, i) for i in 1:n_tracers ]
model = NonhydrostaticModel(; grid, tracers = (tracer_symbols...,))
@info model

uᵢ(x, y, z) = 1e-2 * randn()
set!(model, w=uᵢ)

tracer_IC_odd(x, y, z) = sin(2π*z/grid.Lz)
for i in 1:n_tracers
    @info "Setting tracer $i"
    expression = Meta.parse("set!(model, τ$i=tracer_IC_odd)")
    eval(expression)
end

simulation = Simulation(model, Δt=30, stop_time=0.15hours)

u, v, w = model.velocities

wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key,τ) in pairs(model.tracers))

outputs_full = (; wτ...)

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full) )

outputs_xz1 = merge(outputs_full, outputs_yavg)
simulation.output_writers[:xz1_writer] = NetCDFOutputWriter(model, outputs_xz1;
                       filename = "data/TEST.nc",
                       schedule = TimeInterval(simulation.stop_time),
                       verbose=true,
                       indices = (:, 1, :),
                       overwrite_existing = true,
                       )
run!(simulation)

While the outputs should be the same (since the tracers and their advection are identical), I get different results for the y-averaged fluxes for different tracers:

[image: y-averaged fluxes for the six tracers]

Note that, while similar, tracers α=1,3,4 are different from α=2,5,6. The difference isn't large in this example, but can be made larger with more complexity in the calculations.

A couple of notes:

  • Changing the line wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key,τ) in pairs(model.tracers)) to

wτ = NamedTuple(Symbol(:w, key) => w*τ for (key,τ) in pairs(model.tracers))

gets rid of the issue. However, doing so prevents a user from using Field(op, data=scratch_data.data) to save memory, which in some cases (mine, for example) is very important.

  • Not passing the indices keyword (i.e. writing full 3D fields instead of slices) apparently gets rid of this issue.
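For readers unfamiliar with the scratch-space pattern mentioned in the first note, here is a minimal sketch of what it looks like (a hypothetical illustration using the grid from the MWE above, not the issue's actual code; it assumes Oceananigans' `Field(op; data=...)` keyword and the `Field{LX, LY, LZ}(grid)` constructor):

```julia
using Oceananigans

# Same grid shape as the MWE above, with two tracers for brevity.
grid = RectilinearGrid(size=(16, 16, 16), extent=(500, 500, 120))
model = NonhydrostaticModel(; grid, tracers=(:τ1, :τ2))
u, v, w = model.velocities

# One 3D scratch field at the same location as w*τ (Center, Center, Face),
# whose data is shared by every computed flux instead of each Field(w*τ)
# allocating its own 3D array.
scratch = Field{Center, Center, Face}(grid)

wτ = NamedTuple(Symbol(:w, key) => Field(w * τ, data=scratch.data)
                for (key, τ) in pairs(model.tracers))
```

Because all of these fields alias the same array, each `compute!` overwrites the previous field's values, which is why correct recomputation at output time matters so much in this issue.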

Here, even though the obvious easy solution is to write averages and slices separately, a user wouldn't know to do that, since this fails silently and the wrong results can be pretty subtle (as the example above hopefully illustrates). For example, it popped up in one of my simulations and it took me a while to even realize what was happening, let alone figure out the solution.

My main question is: is this expected behavior? If so, should we somehow warn users (or even throw an error) to avoid mistakes?

@simone-silvestri (Collaborator)

It seems that in one case you are calculating the y-average, and in the other you are outputting the field at (:, 1, :).
Is this true?

@tomchor (Collaborator, Author)

tomchor commented Feb 17, 2023

It seems that in one case you are calculating the y-average, and in the other you are outputting the field at (:, 1, :). Is this true?

Yes

@glwagner glwagner added the bug 🐞 Even a perfect program still has bugs label Feb 17, 2023
@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Does this only appear for 6 tracers? What about 2?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Also, if w and τ are fields why do you do Field(w*τ)?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Also, if w and τ are fields why do you do Field(w*τ)?

OK, I see you explained that you want to use scratch_data...

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Can we reduce the MWE even more? Why not Nx, Ny, Nz = 4, 4, 4? And stop_time=30?

Also, can you post the code to plot?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

I ran this:

using Oceananigans

grid = RectilinearGrid(size=(16, 16, 16), extent = (500, 500, 120))

n_tracers = 6
tracer_symbols = [ Symbol(:τ, i) for i in 1:n_tracers ]
model = NonhydrostaticModel(; grid, tracers = (tracer_symbols...,))
@info model

uᵢ(x, y, z) = 1e-2 * randn()
set!(model, w=uᵢ)

tracer_IC_odd(x, y, z) = sin(2π * z / grid.Lz)
for i in 1:n_tracers
    @info "Setting tracer $i"
    expression = Meta.parse("set!(model, τ$i=tracer_IC_odd)")
    eval(expression)
end

simulation = Simulation(model, Δt=30, stop_iteration=4)

u, v, w = model.velocities

wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key, τ) in pairs(model.tracers))

outputs_full = (; wτ...)

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full))

outputs_xz1 = merge(outputs_full, outputs_yavg)
simulation.output_writers[:xz1_writer] = NetCDFOutputWriter(model, outputs_xz1;
                       filename = "test.nc",
                       schedule = TimeInterval(simulation.stop_time),
                       verbose=true,
                       indices = (:, 1, :),
                       overwrite_existing = true,
                       )
run!(simulation)

and got

[ Info: Initializing simulation...
[ Info: Writing to NetCDF: ./test.nc...
[ Info: Computing NetCDF outputs for time index 1: ["wτ3", "wτ2_yavg", "wτ6_yavg", "wτ1", "wτ5_yavg", "wτ6", "wτ2", "wτ5", "wτ4", "wτ1_yavg", "wτ4_yavg", "wτ3_yavg"]...
[ Info: Computing wτ3 done: time=439.823 ms
[ Info: Computing wτ2_yavg done: time=3.404 seconds
[ Info: Computing wτ6_yavg done: time=3.018 seconds
[ Info: Computing wτ1 done: time=225.326 ms
[ Info: Computing wτ5_yavg done: time=2.950 seconds
[ Info: Computing wτ6 done: time=292.708 μs
[ Info: Computing wτ2 done: time=192.674 ms
[ Info: Computing wτ5 done: time=190.263 ms
[ Info: Computing wτ4 done: time=193.185 ms
[ Info: Computing wτ1_yavg done: time=1.210 seconds
[ Info: Computing wτ4_yavg done: time=2.954 seconds
[ Info: Computing wτ3_yavg done: time=2.953 seconds
[ Info: Writing done: time=17.732 seconds, size=19.5 KiB, Δsize=0.0 B
[ Info:     ... simulation initialization complete (18.528 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (30.965 seconds).
[ Info: Simulation is stopping after running for 49.565 seconds.
[ Info: Model iteration 4 equals or exceeds stop iteration 4.

and then

julia> using NCDatasets

julia> ds = NCDataset(simulation.output_writers[:xz1_writer].filepath, "r")
NCDataset: ./test.nc
Group: /

Dimensions
   zC = 16
   zF = 17
   xC = 16
   yF = 1
   xF = 16
   yC = 1
   time = 1

Variables
  zC   (16)
    Datatype:    Float64
    Dimensions:  zC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the z-direction.

  zF   (17)
    Datatype:    Float64
    Dimensions:  zF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the z-direction.

  xC   (16)
    Datatype:    Float64
    Dimensions:  xC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the x-direction.

  yF   (1)
    Datatype:    Float64
    Dimensions:  yF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the y-direction.

  xF   (16)
    Datatype:    Float64
    Dimensions:  xF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the x-direction.

  yC   (1)
    Datatype:    Float64
    Dimensions:  yC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the y-direction.

  time   (1)
    Datatype:    Float64
    Dimensions:  time
    Attributes:
     units                = seconds
     longname             = Time

  wτ3   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ2_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ6_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ1   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ5_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ6   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ2   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ5   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ4   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ1_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ4_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ3_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

Global attributes
  interval             = Inf
  Oceananigans         = This file was generated using Oceananigans v0.79.4 (DEVELOPMENT BRANCH)
  Julia                = This file was generated using Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.2.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 8 virtual cores
Environment:
  JULIA_EDITOR = code

  output time interval = Output was saved every Inf years.
  date                 = This file was generated on 2023-02-18T19:16:16.882.
  schedule             = TimeInterval

julia> ds["wτ2_yavg"][:, :, 1]
16×17 Matrix{Float64}:
 0.0   0.000239391   0.00083127    0.00119759    0.00100968    0.00105376    0.000547622    -0.000823575  -0.00120598   -0.00105608   -0.000569584  -0.000338584  -7.31754e-5   0.0
 0.0  -0.000272188  -0.000758612  -0.000936866  -0.000932501  -0.000901216  -0.000569969      0.000851883   0.0010599     0.00082886    0.000575857   0.000439241   7.574e-5     0.0
 0.0   0.000214064   0.000566256   0.000828549   0.00134003    0.00116415    0.000864495     -0.000798704  -0.000681288  -0.00016673   -8.373e-5     -0.000169139   4.80975e-6   0.0
 0.0  -9.46733e-5   -0.000369457  -0.000711505  -0.00134226   -0.00106128   -0.00056589       0.000770231   0.000740732   0.0003337     0.000169505   0.00016793    6.28015e-5   0.0
 0.0   7.24424e-6    8.88843e-5    0.000181775   0.000217206  -7.28566e-5   -0.000650438      5.28438e-5    7.66377e-5    0.000354346   0.000364643   0.000234465   3.8087e-5    0.0
 0.0  -0.000138346  -0.000478074  -0.000792918  -0.000680375  -0.000466813   0.000147319     0.000106948  -2.15028e-5   -0.000203383  -0.000232146  -0.000163638  -0.000101512  0.0
 0.0   0.00018169    0.000677117   0.00112315    0.00109023    0.000650344   0.00011895      -0.000305076  -0.000244745  -0.000376204  -0.000309182  -0.000293188   1.46846e-6   0.0
 0.0  -0.000162776  -0.000532657  -0.000719588  -0.000509252   0.000263136   0.000623105     -0.000843197  -0.00101884   -0.00104725   -0.000711489  -0.000339487  -0.000173752  0.0
 0.0   0.000216915   0.000456321   0.000635276   0.000467607  -9.47766e-6   -0.000190093      0.00045725    0.000502882   0.000982609   0.000585199   0.000491581   0.000224349  0.0
 0.0  -0.000179709  -0.000246697  -0.00039405   -0.000311943  -0.000275781  -0.000321412      0.000779653   0.00090438    0.000526134   0.000687727   0.000144865  -4.29204e-5   0.0
 0.0   0.000212135   0.00045376    0.000821791   0.000824723   0.000830667   0.000767886    -0.000756743  -0.000672515  -0.000567891  -0.000655449  -0.000291907  -5.14761e-5   0.0
 0.0  -0.000207786  -0.000570261  -0.00113736   -0.0011126    -0.00095512   -0.000668289      0.000187081  -2.50371e-5   -0.000234227  -3.30057e-5    7.37231e-5    4.74698e-5   0.0
 0.0   3.44085e-5    6.01794e-5    0.000160129  -5.71002e-5   -0.000251935  -0.000348341      0.000175676   0.000452654   0.000757255   0.000292256   1.47004e-5   -4.29739e-5   0.0
 0.0   0.000127176   0.000354777   0.000538542   0.000684584   0.000355887   0.000248227      0.000121747  -8.66234e-5   -5.15146e-5    0.000247927   0.000267242   0.000177876  0.0
 0.0  -0.000112676  -0.000247409  -0.000271409  -0.00044693    0.000199694   0.000249875     -0.000585559  -0.00072856   -0.000851675  -0.000696049  -0.000430978  -0.00022939   0.0
 0.0  -6.48691e-5   -0.0002854    -0.0005231    -0.000241099  -0.000523152  -0.000253047     0.000609541   0.0009479     0.000772055   0.000367521   0.000193174   8.25992e-5   0.0

julia> ds["wτ2_yavg"] == ds["wτ1_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ3_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ4_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ5_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ6_yavg"]
true

So it seems that all is good? So the problem comes when I continue the integration longer?
But why do the info statements appear in this order?

[ Info: Initializing simulation...
[ Info: Writing to NetCDF: ./test.nc...
[ Info: Computing NetCDF outputs for time index 1: ["wτ3", "wτ2_yavg", "wτ6_yavg", "wτ1", "wτ5_yavg", "wτ6", "wτ2", "wτ5", "wτ4", "wτ1_yavg", "wτ4_yavg", "wτ3_yavg"]...
[ Info: Computing wτ3 done: time=439.823 ms
[ Info: Computing wτ2_yavg done: time=3.404 seconds
[ Info: Computing wτ6_yavg done: time=3.018 seconds
[ Info: Computing wτ1 done: time=225.326 ms
[ Info: Computing wτ5_yavg done: time=2.950 seconds
[ Info: Computing wτ6 done: time=292.708 μs
[ Info: Computing wτ2 done: time=192.674 ms
[ Info: Computing wτ5 done: time=190.263 ms
[ Info: Computing wτ4 done: time=193.185 ms
[ Info: Computing wτ1_yavg done: time=1.210 seconds
[ Info: Computing wτ4_yavg done: time=2.954 seconds
[ Info: Computing wτ3_yavg done: time=2.953 seconds

?

@glwagner (Member)

But why do the info statements appear in this order?

Because we use Dict for outputs:

outputs = Dict(string(name) => construct_output(outputs[name], model.grid, indices, with_halos) for name in keys(outputs))

so iteration order is not deterministic.

We could use OrderedDict to change that (but I don't think that's a relevant issue in this case?)
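A quick plain-Julia illustration of the difference (no Oceananigans needed): a NamedTuple iterates its entries in construction order, while a Dict iterates in an order determined by hashing, which is why the output names come back shuffled in the log above.

```julia
# NamedTuple preserves the order in which its entries were constructed:
nt = NamedTuple(Symbol(:w, i) => i for i in 1:6)
println(collect(keys(nt)))  # [:w1, :w2, :w3, :w4, :w5, :w6]

# Dict iterates in hash order, which generally differs from insertion order:
d = Dict(string(:w, i) => i for i in 1:6)
println(collect(keys(d)))   # some hash-determined order; not guaranteed to match insertion
```

An OrderedDict from OrderedCollections.jl would restore insertion order while keeping the Dict interface, which is the change tried later in this thread.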

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

But the way @tomchor wrote the example, is outputs_yavg dependent on outputs_full? Seems like it...

E.g.,

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full))

?

@glwagner (Member)

Oh I see, so there is an assumption that val is computed correctly in here? I'm a little confused by this example; I'm having trouble understanding it.

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

I also have trouble understanding it...

Perhaps @tomchor can elaborate? Or simplify it to exemplify the issue?

@glwagner (Member)

Dependencies between fields are supposed to be accounted for. For example if we write

wc = Field(w * c)
wc_average = Field(Average(wc, dims=1))

then

compute!(wc_average)

should first call compute!(wc). For example:

function compute!(field::ReducedComputedField, time=nothing)
    reduction = field.operand
    compute_at!(reduction.operand, time)
    reduction.reduce!(field, reduction.operand)
    return field
end

where reduction.operand === wc in the above example. Thus this calls

function compute!(comp::ComputedField, time=nothing)
    # First compute `dependencies`:
    @apply_regionally compute_at!(comp.operand, time)
    @apply_regionally compute_field!(comp)
    fill_halo_regions!(comp)
    return comp
end

Note that compute_at! should always compute if needed because

function compute_at!(field::Field, time)
    if isnothing(field.status) # then always compute:
        compute!(field, time)
    # Otherwise, compute only on initialization or if field.status.time is not current,
    elseif time == zero(time) || time != field.status.time
        compute!(field, time)
        field.status.time = time
    end
    return field
end

@glwagner (Member)

However, I would also recommend using

wc_average = Average(w*c, dims=1)

because this is more efficient (usually). One can in principle save some time by constructing a computational graph for the diagnostics, but I'm not sure it's worth it most of the time...

@glwagner (Member)

I guess it's also a difference from JLD2OutputWriter, since we are allowed to use a NamedTuple there:

outputs = NamedTuple(Symbol(name) => construct_output(outputs[name], model.grid, indices, with_halos) for name in keys(outputs))

I don't remember exactly why we use Dict for NetCDF...

@glwagner (Member)

So the problem comes when I continue the integration longer?

This could be worth trying, note that

julia> using Oceananigans.Units

julia> 0.15hours / 30
18.0

So you can run for 18 steps instead of 4. (@tomchor why 18?)

@glwagner (Member)

I ran

using Oceananigans
using NCDatasets

Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))

tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)

uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)

simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)

u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))

jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
          verbose = true,
          indices = (:, 1, :),
          overwrite_existing = true)

simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = jld2_filename,
                                                    kwargs...)

simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = nc_filename,
                                                    kwargs...)

run!(simulation)

ds = Dataset(nc_filename)

Ntracers = length(tracer_names)
flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)

flux_1_nc = ds["wτ1"]
avg_flux_1_nc = ds["avg_wτ1"]
flux_1 = flux_timeseries["1"]
avg_flux_1 = average_flux_timeseries["1"]

for n = 2:Ntracers
    flux_n = flux_timeseries["$n"]
    avg_flux_n = average_flux_timeseries["$n"]

    @show "Fluxes for tracer $n:"
    @show all(flux_1[:, 1, :, :] .≈ flux_n[:, 1, :, :])
    @show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
    @show all(flux_1_nc .≈ ds["wτ$n"])
    @show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end

close(ds)

and all the fluxes and averaged fluxes are identical for both JLD2 and NetCDF output writers.

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

I also have trouble understanding it...

Perhaps @tomchor can elaborate? Or simplify it to exemplify the issue?

So you can run for 18 steps instead of 4. (@tomchor why 18?)

Sorry for the unclear example, guys, and thanks for the help. I posted this after many hours of trying to catch the culprit in a very complex simulation, and at the time I was so tired that the MWE seemed reasonable to me. Now I see it's pretty badly set up. I'm going to work a bit on this today and come up with a better MWE if we need one.

But to explain a bit better: the main goal of this snippet (other than showing the issue) is to write, in the same file, an xz-slice (at j=1) and a y-average (using Average(field, dims=(2,))).

For that, I first create a tuple of "full" fields (fields without slicing or averaging), which I call outputs_full. Then, based on that tuple, I take each element and average it in y, creating the tuple outputs_yavg.

When I pass both of those tuples (merged) to the NetCDFOutputWriter along with the option indices=(:,1,:), I get, in the same file, sliced fields (which are called wτ1, etc.) and y-averaged fields (called wτ1_yavg, etc.).

I just ran @glwagner's MWE locally and the issue doesn't appear, even though at first glance it does exactly what my MWE does, so I need to track down what the important change is.

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

So it seems that all is good? So the problem comes when I continue the integration longer?

It appears that the issue does get worse the longer you run, yes. The original MWE I posted runs for 18 time steps and looks like this:

[image: y-averaged fluxes after 18 time steps]

Running it for 2 time steps it looks like this:

[image: y-averaged fluxes after 2 time steps]

The results are still different between tracers, but not visibly so. (The order of magnitude of the differences is around 1e-5 in this case.)

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

It seems that merely switching the order in which you add the output writers in @glwagner's example makes the issue pop up. In @glwagner's original example the JLD2 writer is added first and then the NetCDF writer, and there's no issue. The example below (which is pretty much the same example, except with that order switched) fails for me:

using Oceananigans
using NCDatasets

Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))

tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)

uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)

simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)

u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))

jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
          verbose = true,
          indices = (:, 1, :),
          overwrite_existing = true)

simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = nc_filename,
                                                    kwargs...)
simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = jld2_filename,
                                                    kwargs...)

run!(simulation)

ds = Dataset(nc_filename)

Ntracers = length(tracer_names)
flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)

avg_flux_1_nc = ds["avg_wτ1"]
avg_flux_1 = average_flux_timeseries["1"]

for n = 2:Ntracers
    flux_n = flux_timeseries["$n"]
    avg_flux_n = average_flux_timeseries["$n"]

    @show "Fluxes for tracer $n:"
    @show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
    @show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end

close(ds)

This gives me:

"Fluxes for tracer $(n):" = "Fluxes for tracer 2:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 3:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = true
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = true
"Fluxes for tracer $(n):" = "Fluxes for tracer 4:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 5:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 6:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
closed NetCDF NCDataset

And indeed plotting it reveals:

[image: plot showing differing averaged fluxes between tracers]

This makes no sense to me. Does it have to do with the JLD2 writer "pre-computing" the outputs in a better way than the NetCDF writer does?

@navidcy (Collaborator)

navidcy commented Feb 20, 2023

Interesting.

Is 6 tracers necessary for this to appear?

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

Interesting.

Is 6 tracers necessary for this to appear?

I was just able to reproduce this with as few as 3 tracers, but not with 2.

@glwagner (Member)

Ok, I can reproduce this.

Here's a few observations:

  1. It works when JLD2OutputWriter computes the outputs first.
  2. When the NetCDFOutputWriter computes the outputs first, the JLD2 output is also wrong. This shows that the output is not being recomputed.
  3. The difference between JLD2 and NetCDF is that JLD2 uses a NamedTuple while NetCDF uses a Dict. With Dict, the order in which the outputs are computed is not deterministic. Yet that should not matter here, as far as I can tell. So there is a bug somewhere.
  4. I tested changing Dict to OrderedDict and the problem disappears.

We can merge this last change. But I'd also like to dig a little further to see if there isn't some more insidious bug, because I don't understand why we need deterministic computation of output. (On the other hand, I think deterministic output computation is a potentially useful feature so it makes sense to support this with NetCDFOutputWriter).

@glwagner (Member)

Also, if we do not include the computation of fluxes (in addition to averaged fluxes) here, there's no issue? Is that right?

@glwagner (Member)

Hmm, I also want to point out that this pattern is memory-inefficient. If we are only interested in the values at indices=(:, 1, :), then it is wasteful to allocate Field(w*c), etc. Instead, we should compute w*c only at the required indices.

The other ambiguous aspect of this setup is what we expect to happen when we ask for Field(Average(field, dims=2), indices=(:, 1, :)). The field is reduced in y, but we are asking for specific y-indices --- should this be allowed?

There's still a mystery here, even if the setup is a little confusing...

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

The other ambiguous aspect of this setup is what we expect to happen when we ask for Field(Average(field, dims=2), indices=(:, 1, :)). The field is reduced in y, but we are asking for specific y-indices --- should this be allowed?

That we do/allow this is a bit confusing for me also.

@glwagner (Member)

It could be consistent with the indexing behavior of Nothing location --- ie if you "slice" a field along a reduced dimension, nothing happens.

Yet still I feel that the indices kwarg isn't being used properly here...

@glwagner (Member)

glwagner commented Feb 21, 2023

I think I understand the issue now.

Here's a summary of what we are trying to do.

  1. Create a 3D computed field wc = Field(w*c)
  2. Create a reduced field that averages this field: wc_avg = Average(wc, dims=2).

Note that we usually recommend averaging an operation rather than a Field, via wc_avg = Average(w*c, dims=2), which is more efficient if we don't also require the three-dimensional result in wc. If we do require the 3D wc as well, we could compute its average on the fly in post-processing. But I guess there are use cases for precomputing the 3D field (convenience, or GPU compilation issues...)

Next, we

  1. Slice into both the reduced field and the 3D field in an output writer by passing the kwarg indices = (:, 1, :).

As discussed above, we aren't sure if this is good practice, since we're asking the output writers to "reindex" a reduced field in y along a specific y index, ie (:, 1, :). Being a reduced field, the y-index is meaningless, so this kwarg is basically ignored for the reduced field. I think a better way to write this code is to build the indices for each output individually / manually rather than using the convenience kwarg.
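A sketch of that per-output alternative (illustrative only, not this issue's exact code; it assumes the `Field(op; indices=...)` constructor suggested later in this thread):

```julia
using Oceananigans

grid = RectilinearGrid(size=(16, 16, 16), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid, tracers=:τ1)
u, v, w = model.velocities
τ1 = model.tracers.τ1

# Slice the 3D flux at construction time instead of via the writer's kwarg:
wτ1_slice = Field(w * τ1; indices=(:, 1, :))

# The reduced output needs no y-index at all:
wτ1_yavg = Average(w * τ1, dims=2)

outputs = (; wτ1=wτ1_slice, wτ1_yavg)
# ...then pass `outputs` to the output writer with no global `indices` kwarg.
```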

Nevertheless --- for this last step, the output writer creates a "view field" that slices into the original 3D field (ie it does not allocate any additional memory, but instead creates a new field whose data is a view into the parent 3D field):

return Field(user_output; indices)

this calls

Field(f::Field; indices=f.indices) = view(f, indices...) # hmm...

which calls

function Base.view(f::Field, i, j, k)
    grid = f.grid
    loc = location(f)

    # Validate indices (convert Int to UnitRange, error for invalid indices)
    window_indices = validate_indices((i, j, k), loc, f.grid)

    # Choice: OffsetArray of view of OffsetArray, or OffsetArray of view?
    # -> the first retains a reference to the original f.data (an OffsetArray)
    # -> the second loses it, so we'd have to "re-offset" the underlying data to access.
    # -> we choose the second here, opting to "reduce indirection" at the cost of "index recomputation".
    #
    # OffsetArray around a view of parent with appropriate indices:
    windowed_data = offset_windowed_data(f.data, loc, grid, window_indices)
    boundary_conditions = FieldBoundaryConditions(window_indices, f.boundary_conditions)

    return Field(loc,
                 grid,
                 windowed_data,
                 boundary_conditions,
                 window_indices,
                 f.operand,
                 f.status)
end

Note that view(f, i, j, k) passes f.status into the sliced field. This is the issue: when we then call compute!(sliced_f), the parent field f "thinks" it has been computed, even though values were only computed on (:, 1, :).

It's an easy fix since we just have to set the status of the sliced field to nothing.
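A hedged sketch of what that fix could look like (the actual change landed via PR #2933; this just modifies the view constructor quoted above, reusing Oceananigans' internal helper names from that snippet):

```julia
# Inside Oceananigans.Fields — identical to the quoted view constructor,
# except the windowed field gets status = nothing, so compute_at! on a
# view always recomputes rather than trusting the parent's status.
function Base.view(f::Field, i, j, k)
    grid = f.grid
    loc = location(f)
    window_indices = validate_indices((i, j, k), loc, f.grid)
    windowed_data = offset_windowed_data(f.data, loc, grid, window_indices)
    boundary_conditions = FieldBoundaryConditions(window_indices, f.boundary_conditions)

    status = nothing  # key change: do not inherit f.status

    return Field(loc,
                 grid,
                 windowed_data,
                 boundary_conditions,
                 window_indices,
                 f.operand,
                 status)
end
```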

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

heroic debugging 👍🏼 :)

@glwagner (Member)

@navidcy also made me wonder if we should copy boundary conditions in similar...

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

yeap, open question: #2882

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

That's some great debugging there, @glwagner. Thanks!

Yeah, I agree that passing indices alongside averages is unclear, to say the least. When I first set up the output writer to do this (with only one tracer) I was surprised that it worked out of the box, since I expected an error or warning. But since it made the code simpler and it worked, I kept it. Then this error crept up on me 😬

I'd be okay if you want to not allow that, or throw a warning or something in this case.

@glwagner (Member)

Have you tried using wc = Field(w*c, indices=(:, 1, :)) and also wc_avg = Average(w*c, dims=2)? This would allow you to avoid using the indices kwarg.

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

Have you tried using wc = Field(w*c, indices=(:, 1, :)) and also wc_avg = Average(w*c, dims=2)? This would allow you to avoid using the indices kwarg.

I haven't tried that solution specifically, but I suspect it would work. The issue with this bug isn't that there aren't workarounds (for example, one could just separate slices and averages into two different files, which is what I'm currently doing); it's that it fails silently and subtly, so it can catch users off guard.

@glwagner (Member)

No, we should fix this for sure. I'm asking because you would save a lot of memory if you avoid writing Field(w*c).

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

No we should fix this for sure. I'm asking because you would save a lot of memory if you avoid writing Field(w*c).

Ah, I see. That's a good point; I'll investigate. I'm wrapping things in Field() calls because I use the data option in all of them to reuse scratch space and save memory. But indeed, when averages are involved, that might not be the best way to go...

@glwagner (Member)

Averaging an operation does not allocate any extra memory and is more performant than precomputing a field, storing the data, and then taking the average of that.

In general, you only need 3D scratch space if you have 3D output.
