Different results from identical tracers when using NetCDFOutputWriter #2931

Closed
tomchor opened this issue Feb 17, 2023 · 36 comments · Fixed by #2933
Labels
bug 🐞 · output 💾

Comments

@tomchor (Collaborator)

tomchor commented Feb 17, 2023

The example below creates a nonhydrostatic model with 6 identical tracers, calculates the vertical flux of those tracers in an identical way for all 6, and then writes it all to NetCDF. In the same file, it writes a vertical (x-z) slice of the fluxes using the indices keyword, and a y-average of them:

using Oceananigans
using Oceananigans.Units

grid = RectilinearGrid(size=(16, 16, 16), extent = (500, 500, 120))

n_tracers = 6
tracer_symbols = [ Symbol(:τ, i) for i in 1:n_tracers ]
model = NonhydrostaticModel(; grid, tracers = (tracer_symbols...,))
@info model

uᵢ(x, y, z) = 1e-2 * randn()
set!(model, w=uᵢ)

tracer_IC_odd(x, y, z) = sin(2π*z/grid.Lz)
for i in 1:n_tracers
    @info "Setting tracer $i"
    expression = Meta.parse("set!(model, τ$i=tracer_IC_odd)")
    eval(expression)
end

simulation = Simulation(model, Δt=30, stop_time=0.15hours)

u, v, w = model.velocities

wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key,τ) in pairs(model.tracers))

outputs_full = (; wτ...)

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full) )

outputs_xz1 = merge(outputs_full, outputs_yavg)
simulation.output_writers[:xz1_writer] = NetCDFOutputWriter(model, outputs_xz1;
                       filename = "data/TEST.nc",
                       schedule = TimeInterval(simulation.stop_time),
                       verbose=true,
                       indices = (:, 1, :),
                       overwrite_existing = true,
                       )
run!(simulation)

While the outputs should be the same (since the tracers and their advection are identical), I get different results for the y-averaged fluxes for different tracers:

[image: y-averaged fluxes for the six tracers]

Note that, while similar, tracers α=1,3,4 are different from α=2,5,6. The difference isn't large in this example, but can be made larger with more complexity in the calculations.

A couple of notes:

  • Changing the line wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key,τ) in pairs(model.tracers)) to

wτ = NamedTuple(Symbol(:w, key) => w*τ for (key,τ) in pairs(model.tracers))

gets rid of the issue. However, doing so prevents a user from using Field(op, data=scratch_data.data) to save memory, which in some cases (mine, for example) is very important.

  • Not passing the indices keyword (i.e. writing full 3D fields instead of slices) apparently gets rid of this issue.
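For readers unfamiliar with the scratch-space pattern mentioned in the first note, here is a minimal sketch of what it looks like (a hypothetical illustration using the grid from the MWE above, not the issue's actual code; it assumes Oceananigans' `Field(op; data=...)` keyword and the `Field{LX, LY, LZ}(grid)` constructor):

```julia
using Oceananigans

# Same grid shape as the MWE above, with two tracers for brevity.
grid = RectilinearGrid(size=(16, 16, 16), extent=(500, 500, 120))
model = NonhydrostaticModel(; grid, tracers=(:τ1, :τ2))
u, v, w = model.velocities

# One 3D scratch field at the same location as w*τ (Center, Center, Face),
# whose data is shared by every computed flux instead of each Field(w*τ)
# allocating its own 3D array.
scratch = Field{Center, Center, Face}(grid)

wτ = NamedTuple(Symbol(:w, key) => Field(w * τ, data=scratch.data)
                for (key, τ) in pairs(model.tracers))
```

Because all of these fields alias the same array, each `compute!` overwrites the previous field's values, which is why correct recomputation at output time matters so much in this issue.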

Here, even though the obvious easy solution is to write averages and slices separately, a user wouldn't know to do that, since this fails silently and the wrong results can be pretty subtle (as the example above hopefully illustrates). For example, it popped up in one of my simulations and it took me a while to even realize what was happening, let alone figure out the solution.

My main question is: is this expected behavior? If so, should we somehow warn users (or even throw an error) to avoid mistakes?

@simone-silvestri (Collaborator)

It seems that in one case you are calculating the y-average, and in the other you are outputting the field at (:, 1, :).
Is this true?

@tomchor (Collaborator, Author)

tomchor commented Feb 17, 2023

It seems that in one case you are calculating the y-average, and in the other you are outputting the field at (:, 1, :). Is this true?

Yes

@glwagner glwagner added the bug 🐞 Even a perfect program still has bugs label Feb 17, 2023
@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Does this only appear for 6 tracers? What about 2?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Also, if w and τ are fields why do you do Field(w*τ)?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Also, if w and τ are fields why do you do Field(w*τ)?

OK, I see you explained that you want to use scratch_data...

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

Can we reduce the MWE even more? Why not Nx, Ny, Nz = 4, 4, 4? And stop_time=30?

Also, can you post the code to plot?

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

I ran this:

using Oceananigans

grid = RectilinearGrid(size=(16, 16, 16), extent = (500, 500, 120))

n_tracers = 6
tracer_symbols = [ Symbol(:τ, i) for i in 1:n_tracers ]
model = NonhydrostaticModel(; grid, tracers = (tracer_symbols...,))
@info model

uᵢ(x, y, z) = 1e-2 * randn()
set!(model, w=uᵢ)

tracer_IC_odd(x, y, z) = sin(2π * z / grid.Lz)
for i in 1:n_tracers
    @info "Setting tracer $i"
    expression = Meta.parse("set!(model, τ$i=tracer_IC_odd)")
    eval(expression)
end

simulation = Simulation(model, Δt=30, stop_iteration=4)

u, v, w = model.velocities

wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key, τ) in pairs(model.tracers))

outputs_full = (; wτ...)

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full))

outputs_xz1 = merge(outputs_full, outputs_yavg)
simulation.output_writers[:xz1_writer] = NetCDFOutputWriter(model, outputs_xz1;
                       filename = "test.nc",
                       schedule = TimeInterval(simulation.stop_time),
                       verbose=true,
                       indices = (:, 1, :),
                       overwrite_existing = true,
                       )
run!(simulation)

and got

[ Info: Initializing simulation...
[ Info: Writing to NetCDF: ./test.nc...
[ Info: Computing NetCDF outputs for time index 1: ["wτ3", "wτ2_yavg", "wτ6_yavg", "wτ1", "wτ5_yavg", "wτ6", "wτ2", "wτ5", "wτ4", "wτ1_yavg", "wτ4_yavg", "wτ3_yavg"]...
[ Info: Computing wτ3 done: time=439.823 ms
[ Info: Computing wτ2_yavg done: time=3.404 seconds
[ Info: Computing wτ6_yavg done: time=3.018 seconds
[ Info: Computing wτ1 done: time=225.326 ms
[ Info: Computing wτ5_yavg done: time=2.950 seconds
[ Info: Computing wτ6 done: time=292.708 μs
[ Info: Computing wτ2 done: time=192.674 ms
[ Info: Computing wτ5 done: time=190.263 ms
[ Info: Computing wτ4 done: time=193.185 ms
[ Info: Computing wτ1_yavg done: time=1.210 seconds
[ Info: Computing wτ4_yavg done: time=2.954 seconds
[ Info: Computing wτ3_yavg done: time=2.953 seconds
[ Info: Writing done: time=17.732 seconds, size=19.5 KiB, Δsize=0.0 B
[ Info:     ... simulation initialization complete (18.528 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (30.965 seconds).
[ Info: Simulation is stopping after running for 49.565 seconds.
[ Info: Model iteration 4 equals or exceeds stop iteration 4.

and then

julia> using NCDatasets

julia> ds = NCDataset(simulation.output_writers[:xz1_writer].filepath, "r")
NCDataset: ./test.nc
Group: /

Dimensions
   zC = 16
   zF = 17
   xC = 16
   yF = 1
   xF = 16
   yC = 1
   time = 1

Variables
  zC   (16)
    Datatype:    Float64
    Dimensions:  zC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the z-direction.

  zF   (17)
    Datatype:    Float64
    Dimensions:  zF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the z-direction.

  xC   (16)
    Datatype:    Float64
    Dimensions:  xC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the x-direction.

  yF   (1)
    Datatype:    Float64
    Dimensions:  yF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the y-direction.

  xF   (16)
    Datatype:    Float64
    Dimensions:  xF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the x-direction.

  yC   (1)
    Datatype:    Float64
    Dimensions:  yC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the y-direction.

  time   (1)
    Datatype:    Float64
    Dimensions:  time
    Attributes:
     units                = seconds
     longname             = Time

  wτ3   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ2_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ6_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ1   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ5_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ6   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ2   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ5   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ4   (16 × 1 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × yC × zF × time

  wτ1_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ4_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

  wτ3_yavg   (16 × 17 × 1)
    Datatype:    Float64
    Dimensions:  xC × zF × time

Global attributes
  interval             = Inf
  Oceananigans         = This file was generated using Oceananigans v0.79.4 (DEVELOPMENT BRANCH)
  Julia                = This file was generated using Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.2.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 8 virtual cores
Environment:
  JULIA_EDITOR = code

  output time interval = Output was saved every Inf years.
  date                 = This file was generated on 2023-02-18T19:16:16.882.
  schedule             = TimeInterval

julia> ds["wτ2_yavg"][:, :, 1]
16×17 Matrix{Float64}:
 0.0   0.000239391   0.00083127    0.00119759    0.00100968    0.00105376    0.000547622    -0.000823575  -0.00120598   -0.00105608   -0.000569584  -0.000338584  -7.31754e-5   0.0
 0.0  -0.000272188  -0.000758612  -0.000936866  -0.000932501  -0.000901216  -0.000569969      0.000851883   0.0010599     0.00082886    0.000575857   0.000439241   7.574e-5     0.0
 0.0   0.000214064   0.000566256   0.000828549   0.00134003    0.00116415    0.000864495     -0.000798704  -0.000681288  -0.00016673   -8.373e-5     -0.000169139   4.80975e-6   0.0
 0.0  -9.46733e-5   -0.000369457  -0.000711505  -0.00134226   -0.00106128   -0.00056589       0.000770231   0.000740732   0.0003337     0.000169505   0.00016793    6.28015e-5   0.0
 0.0   7.24424e-6    8.88843e-5    0.000181775   0.000217206  -7.28566e-5   -0.000650438      5.28438e-5    7.66377e-5    0.000354346   0.000364643   0.000234465   3.8087e-5    0.0
 0.0  -0.000138346  -0.000478074  -0.000792918  -0.000680375  -0.000466813   0.000147319     0.000106948  -2.15028e-5   -0.000203383  -0.000232146  -0.000163638  -0.000101512  0.0
 0.0   0.00018169    0.000677117   0.00112315    0.00109023    0.000650344   0.00011895      -0.000305076  -0.000244745  -0.000376204  -0.000309182  -0.000293188   1.46846e-6   0.0
 0.0  -0.000162776  -0.000532657  -0.000719588  -0.000509252   0.000263136   0.000623105     -0.000843197  -0.00101884   -0.00104725   -0.000711489  -0.000339487  -0.000173752  0.0
 0.0   0.000216915   0.000456321   0.000635276   0.000467607  -9.47766e-6   -0.000190093      0.00045725    0.000502882   0.000982609   0.000585199   0.000491581   0.000224349  0.0
 0.0  -0.000179709  -0.000246697  -0.00039405   -0.000311943  -0.000275781  -0.000321412      0.000779653   0.00090438    0.000526134   0.000687727   0.000144865  -4.29204e-5   0.0
 0.0   0.000212135   0.00045376    0.000821791   0.000824723   0.000830667   0.000767886    -0.000756743  -0.000672515  -0.000567891  -0.000655449  -0.000291907  -5.14761e-5   0.0
 0.0  -0.000207786  -0.000570261  -0.00113736   -0.0011126    -0.00095512   -0.000668289      0.000187081  -2.50371e-5   -0.000234227  -3.30057e-5    7.37231e-5    4.74698e-5   0.0
 0.0   3.44085e-5    6.01794e-5    0.000160129  -5.71002e-5   -0.000251935  -0.000348341      0.000175676   0.000452654   0.000757255   0.000292256   1.47004e-5   -4.29739e-5   0.0
 0.0   0.000127176   0.000354777   0.000538542   0.000684584   0.000355887   0.000248227      0.000121747  -8.66234e-5   -5.15146e-5    0.000247927   0.000267242   0.000177876  0.0
 0.0  -0.000112676  -0.000247409  -0.000271409  -0.00044693    0.000199694   0.000249875     -0.000585559  -0.00072856   -0.000851675  -0.000696049  -0.000430978  -0.00022939   0.0
 0.0  -6.48691e-5   -0.0002854    -0.0005231    -0.000241099  -0.000523152  -0.000253047     0.000609541   0.0009479     0.000772055   0.000367521   0.000193174   8.25992e-5   0.0

julia> ds["wτ2_yavg"] == ds["wτ1_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ3_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ4_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ5_yavg"]
true

julia> ds["wτ2_yavg"] == ds["wτ6_yavg"]
true

So it seems that all is good? So the problem comes when I continue the integration longer?
But why do the info statements appear in this order?

[ Info: Initializing simulation...
[ Info: Writing to NetCDF: ./test.nc...
[ Info: Computing NetCDF outputs for time index 1: ["wτ3", "wτ2_yavg", "wτ6_yavg", "wτ1", "wτ5_yavg", "wτ6", "wτ2", "wτ5", "wτ4", "wτ1_yavg", "wτ4_yavg", "wτ3_yavg"]...
[ Info: Computing wτ3 done: time=439.823 ms
[ Info: Computing wτ2_yavg done: time=3.404 seconds
[ Info: Computing wτ6_yavg done: time=3.018 seconds
[ Info: Computing wτ1 done: time=225.326 ms
[ Info: Computing wτ5_yavg done: time=2.950 seconds
[ Info: Computing wτ6 done: time=292.708 μs
[ Info: Computing wτ2 done: time=192.674 ms
[ Info: Computing wτ5 done: time=190.263 ms
[ Info: Computing wτ4 done: time=193.185 ms
[ Info: Computing wτ1_yavg done: time=1.210 seconds
[ Info: Computing wτ4_yavg done: time=2.954 seconds
[ Info: Computing wτ3_yavg done: time=2.953 seconds

?

@glwagner (Member)

But why do the info statements appear in this order?

Because we use Dict for outputs:

outputs = Dict(string(name) => construct_output(outputs[name], model.grid, indices, with_halos) for name in keys(outputs))

so iteration order is not deterministic.

We could use OrderedDict to change that (but I don't think that's a relevant issue in this case?)
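A quick plain-Julia illustration of the difference (no Oceananigans needed): a NamedTuple iterates its entries in construction order, while a Dict iterates in an order determined by hashing, which is why the output names come back shuffled in the log above.

```julia
# NamedTuple preserves the order in which its entries were constructed:
nt = NamedTuple(Symbol(:w, i) => i for i in 1:6)
println(collect(keys(nt)))  # [:w1, :w2, :w3, :w4, :w5, :w6]

# Dict iterates in hash order, which generally differs from insertion order:
d = Dict(string(:w, i) => i for i in 1:6)
println(collect(keys(d)))   # some hash-determined order; not guaranteed to match insertion
```

An OrderedDict from OrderedCollections.jl would restore insertion order while keeping the Dict interface, which is the change tried later in this thread.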

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

But the way @tomchor wrote the example, is outputs_yavg dependent on outputs_full? Seems like it...

E.g.,

outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full))

?

@glwagner (Member)

Oh I see, so there is an assumption that val is computed correctly in here? I'm a little confused by this example; I'm having trouble understanding it.

@navidcy (Collaborator)

navidcy commented Feb 19, 2023

I also have trouble understanding it...

Perhaps @tomchor can elaborate? Or simplify it to exemplify the issue?

@glwagner (Member)

Dependencies between fields are supposed to be accounted for. For example if we write

wc = Field(w * c)
wc_average = Field(Average(wc, dims=1))

then

compute!(wc_average)

should first call compute!(wc). For example:

function compute!(field::ReducedComputedField, time=nothing)
    reduction = field.operand
    compute_at!(reduction.operand, time)
    reduction.reduce!(field, reduction.operand)
    return field
end

where reduction.operand === wc in the above example. Thus this calls

function compute!(comp::ComputedField, time=nothing)
    # First compute `dependencies`:
    @apply_regionally compute_at!(comp.operand, time)
    @apply_regionally compute_field!(comp)
    fill_halo_regions!(comp)
    return comp
end

Note that compute_at! should always compute if needed because

function compute_at!(field::Field, time)
    if isnothing(field.status) # then always compute:
        compute!(field, time)
    # Otherwise, compute only on initialization or if field.status.time is not current,
    elseif time == zero(time) || time != field.status.time
        compute!(field, time)
        field.status.time = time
    end
    return field
end

@glwagner (Member)

However, I would also recommend using

wc_average = Average(w*c, dims=1)

because this is more efficient (usually). One can in principle save some time by constructing a computational graph for the diagnostics, but I'm not sure it's worth it most of the time...

@glwagner (Member)

I guess it's also a difference from JLD2OutputWriter, since we are allowed to use a NamedTuple there:

outputs = NamedTuple(Symbol(name) => construct_output(outputs[name], model.grid, indices, with_halos) for name in keys(outputs))

I don't remember exactly why we use Dict for NetCDF...

@glwagner (Member)

So the problem comes when I continue the integration longer?

This could be worth trying, note that

julia> using Oceananigans.Units

julia> 0.15hours / 30
18.0

So you can run for 18 steps instead of 4. (@tomchor why 18?)

@glwagner (Member)

I ran

using Oceananigans
using NCDatasets

Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))

tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)

uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)

simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)

u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))

jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
          verbose = true,
          indices = (:, 1, :),
          overwrite_existing = true)

simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = jld2_filename,
                                                    kwargs...)

simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = nc_filename,
                                                    kwargs...)

run!(simulation)

ds = Dataset(nc_filename)

Ntracers = length(tracer_names)
flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)

flux_1_nc = ds["wτ1"]
avg_flux_1_nc = ds["avg_wτ1"]
flux_1 = flux_timeseries["1"]
avg_flux_1 = average_flux_timeseries["1"]

for n = 2:Ntracers
    flux_n = flux_timeseries["$n"]
    avg_flux_n = average_flux_timeseries["$n"]

    @show "Fluxes for tracer $n:"
    @show all(flux_1[:, 1, :, :] .≈ flux_n[:, 1, :, :])
    @show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
    @show all(flux_1_nc .≈ ds["wτ$n"])
    @show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end

close(ds)

and all the fluxes and averaged fluxes are identical for both JLD2 and NetCDF output writers.

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

I also have trouble understanding it...

Perhaps @tomchor can elaborate? Or simplify it to exemplify the issue?

So you can run for 18 steps instead of 4. (@tomchor why 18?)

Sorry for the unclear example, guys, and thanks for the help. I posted this after many hours of trying to catch the culprit in a very complex simulation, and at the time I was so tired that the MWE seemed reasonable to me. Now I see it's pretty badly set up. I'm going to work a bit on this today and come up with a better MWE if we need one.

But to explain a bit better: the main goal of this snippet (other than showing the issue) is to write, in the same file, an xz-slice (at j=1) and a y-average (using Average(field, dims=(2,))).

For that, I first create a tuple of "full" fields (fields without slicing or averaging), which I call outputs_full. Then, based on that tuple, I take each element and average it in y, creating the tuple outputs_yavg.

When I pass both of those tuples (merged) to the NetCDFOutputWriter along with the option indices=(:,1,:), I get, in the same file, sliced fields (which are called wτ1, etc.) and y-averaged fields (called wτ1_yavg, etc.).

I just ran @glwagner's MWE locally and the issue doesn't appear, even though at first glance it does exactly what my MWE does, so I need to track down what the important change is.

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

So it seems that all is good? So the problem comes when I continue the integration longer?

It appears that the issue does get worse the longer you run, yes. The original MWE I posted runs for 18 time steps and looks like this:

[image: y-averaged fluxes after 18 time steps]

Running it for 2 time steps it looks like this:

[image: y-averaged fluxes after 2 time steps]

The results are still different between tracers, but not visibly so. (The order of magnitude of the differences is around 1e-5 in this case.)

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

It seems that merely switching the order in which you add the output writers in @glwagner's example makes the issue pop up. In @glwagner's original example the JLD2 writer is added first and then the NetCDF writer, and there's no issue. The example below (which is pretty much the same example, except with that order switched) fails for me:

using Oceananigans
using NCDatasets

Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))

tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)

uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)

simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)

u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))

jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
          verbose = true,
          indices = (:, 1, :),
          overwrite_existing = true)

simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = nc_filename,
                                                    kwargs...)
simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
                                                    filename = jld2_filename,
                                                    kwargs...)

run!(simulation)

ds = Dataset(nc_filename)

Ntracers = length(tracer_names)
flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)

avg_flux_1_nc = ds["avg_wτ1"]
avg_flux_1 = average_flux_timeseries["1"]

for n = 2:Ntracers
    flux_n = flux_timeseries["$n"]
    avg_flux_n = average_flux_timeseries["$n"]

    @show "Fluxes for tracer $n:"
    @show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
    @show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end

close(ds)

This gives me:

"Fluxes for tracer $(n):" = "Fluxes for tracer 2:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 3:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = true
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = true
"Fluxes for tracer $(n):" = "Fluxes for tracer 4:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 5:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
"Fluxes for tracer $(n):" = "Fluxes for tracer 6:"
all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :]) = false
all(avg_flux_1_nc .≈ ds["avg_wτ$(n)"]) = false
closed NetCDF NCDataset

And indeed plotting it reveals:

[image: plot showing differing averaged fluxes between tracers]

This makes no sense to me. Does it have to do with the JLD2 writer "pre-computing" the outputs in a better way than the NetCDF writer does?

@navidcy (Collaborator)

navidcy commented Feb 20, 2023

Interesting.

Is 6 tracers necessary for this to appear?

@tomchor (Collaborator, Author)

tomchor commented Feb 20, 2023

Interesting.

Is 6 tracers necessary for this to appear?

I was just able to reproduce this with as few as 3 tracers, but not with 2.

@glwagner (Member)

Ok, I can reproduce this.

Here's a few observations:

  1. It works when JLD2OutputWriter computes the outputs first.
  2. When the NetCDFOutputWriter computes the outputs first, the JLD2 output is also wrong. This shows that the output is not being recomputed.
  3. The difference between JLD2 and NetCDF is that JLD2 uses a NamedTuple while NetCDF uses a Dict. With Dict, the order in which the outputs are computed is not deterministic. Yet that should not matter here, as far as I can tell. So there is a bug somewhere.
  4. I tested changing Dict to OrderedDict and the problem disappears.

We can merge this last change. But I'd also like to dig a little further to see if there isn't some more insidious bug, because I don't understand why we need deterministic computation of output. (On the other hand, I think deterministic output computation is a potentially useful feature so it makes sense to support this with NetCDFOutputWriter).

@glwagner (Member)

Also, if we do not include the computation of fluxes (in addition to averaged fluxes) here, there's no issue? Is that right?

@glwagner (Member)

Hmm, I also want to point out that this pattern is memory-inefficient. If we are only interested in the values at indices=(:, 1, :), then it is wasteful to allocate Field(w*c), etc. Instead, we should compute w*c only at the required indices.

The other ambiguous aspect of this setup is what we expect to happen when we ask for Field(Average(field, dims=2), indices=(:, 1, :)). The field is reduced in y, but we are asking for specific y-indices --- should this be allowed?

There's still a mystery here, even if the setup is a little confusing...

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

The other ambiguous aspect of this setup is what we expect to happen when we ask for Field(Average(field, dims=2), indices=(:, 1, :)). The field is reduced in y, but we are asking for specific y-indices --- should this be allowed?

That we do/allow this is a bit confusing for me also.

@glwagner (Member)

It could be consistent with the indexing behavior of Nothing location --- ie if you "slice" a field along a reduced dimension, nothing happens.

Yet still I feel that the indices kwarg isn't being used properly here...

@glwagner (Member)

glwagner commented Feb 21, 2023

I think I understand the issue now.

Here's a summary of what we are trying to do.

  1. Create a 3D computed field wc = Field(w*c)
  2. Create a reduced field that averages this field: wc_avg = Average(wc, dims=2).

Note that we usually recommend averaging an operation rather than a Field, via wc_avg = Average(w*c, dims=2), which is more efficient if we don't also require the three-dimensional result in wc. If we do require the 3D wc as well, we could compute its average on the fly in post-processing. But I guess there are use cases for precomputing the 3D field (convenience, or GPU compilation issues...)

Next, we

  1. Slice into both the reduced field and the 3D field in an output writer by passing the kwarg indices = (:, 1, :).

As discussed above, we aren't sure if this is good practice, since we're asking the output writers to "reindex" a reduced field in y along a specific y index, ie (:, 1, :). Being a reduced field, the y-index is meaningless, so this kwarg is basically ignored for the reduced field. I think a better way to write this code is to build the indices for each output individually / manually rather than using the convenience kwarg.
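A sketch of that per-output alternative (illustrative only, not this issue's exact code; it assumes the `Field(op; indices=...)` constructor suggested later in this thread):

```julia
using Oceananigans

grid = RectilinearGrid(size=(16, 16, 16), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid, tracers=:τ1)
u, v, w = model.velocities
τ1 = model.tracers.τ1

# Slice the 3D flux at construction time instead of via the writer's kwarg:
wτ1_slice = Field(w * τ1; indices=(:, 1, :))

# The reduced output needs no y-index at all:
wτ1_yavg = Average(w * τ1, dims=2)

outputs = (; wτ1=wτ1_slice, wτ1_yavg)
# ...then pass `outputs` to the output writer with no global `indices` kwarg.
```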

Nevertheless --- for this last step, the output writer creates a "view field" that slices into the original 3D field (ie it does not allocate any additional memory, but instead creates a new field whose data is a view into the parent 3D field):

return Field(user_output; indices)

this calls

Field(f::Field; indices=f.indices) = view(f, indices...) # hmm...

which calls

function Base.view(f::Field, i, j, k)
    grid = f.grid
    loc = location(f)

    # Validate indices (convert Int to UnitRange, error for invalid indices)
    window_indices = validate_indices((i, j, k), loc, f.grid)

    # Choice: OffsetArray of view of OffsetArray, or OffsetArray of view?
    # -> the first retains a reference to the original f.data (an OffsetArray)
    # -> the second loses it, so we'd have to "re-offset" the underlying data to access.
    # -> we choose the second here, opting to "reduce indirection" at the cost of "index recomputation".
    #
    # OffsetArray around a view of parent with appropriate indices:
    windowed_data = offset_windowed_data(f.data, loc, grid, window_indices)
    boundary_conditions = FieldBoundaryConditions(window_indices, f.boundary_conditions)

    return Field(loc,
                 grid,
                 windowed_data,
                 boundary_conditions,
                 window_indices,
                 f.operand,
                 f.status)
end

Note that view(f, i, j, k) passes f.status into the sliced field. This is the issue: when we then call compute!(sliced_f), the parent field f "thinks" it has been computed, even though values were only computed on (:, 1, :).

It's an easy fix since we just have to set the status of the sliced field to nothing.
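A hedged sketch of what that fix could look like (the actual change landed via PR #2933; this just modifies the view constructor quoted above, reusing Oceananigans' internal helper names from that snippet):

```julia
# Inside Oceananigans.Fields — identical to the quoted view constructor,
# except the windowed field gets status = nothing, so compute_at! on a
# view always recomputes rather than trusting the parent's status.
function Base.view(f::Field, i, j, k)
    grid = f.grid
    loc = location(f)
    window_indices = validate_indices((i, j, k), loc, f.grid)
    windowed_data = offset_windowed_data(f.data, loc, grid, window_indices)
    boundary_conditions = FieldBoundaryConditions(window_indices, f.boundary_conditions)

    status = nothing  # key change: do not inherit f.status

    return Field(loc,
                 grid,
                 windowed_data,
                 boundary_conditions,
                 window_indices,
                 f.operand,
                 status)
end
```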

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

heroic debugging 👍🏼 :)

@glwagner (Member)

@navidcy also made me wonder if we should copy boundary conditions in similar...

@navidcy (Collaborator)

navidcy commented Feb 21, 2023

yeap, open question: #2882

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

That's some great debugging there, @glwagner. Thanks!

Yeah, I agree that passing indices alongside averages is unclear, to say the least. When I first set up the output writer to do this (with only one tracer) I was surprised that it worked out of the box, since I expected an error or warning. But since it made the code simpler and it worked, I kept it. Then this error crept up on me 😬

I'd be okay if you want to not allow that, or throw a warning or something in this case.

@glwagner (Member)

Have you tried using wc = Field(w*c, indices=(:, 1, :)) and also wc_avg = Average(w*c, dims=2)? This would allow you to avoid using the indices kwarg.

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

Have you tried using wc = Field(w*c, indices=(:, 1, :)) and also wc_avg = Average(w*c, dims=2)? This would allow you to avoid using the indices kwarg.

I haven't tried that solution specifically, but I suspect it would work. The issue with this bug isn't that there aren't workarounds (for example, one could just separate slices and averages into two different files, which is what I'm currently doing); it's that it fails silently and subtly, so it can catch users off guard.

@glwagner (Member)

No, we should fix this for sure. I'm asking because you would save a lot of memory if you avoid writing Field(w*c).

@tomchor (Collaborator, Author)

tomchor commented Feb 21, 2023

No we should fix this for sure. I'm asking because you would save a lot of memory if you avoid writing Field(w*c).

Ah, I see. That's a good point; I'll investigate. I'm wrapping things in Field() calls because I use the data option in all of them to reuse scratch space and save memory. But indeed, when averages are involved, that might not be the best way to go...

@glwagner (Member)

Averaging an operation does not allocate any extra memory and is more performant than precomputing a field, storing the data, and then taking the average of that.

In general, you only need 3D scratch space if you have 3D output.
