Different results from identical tracers when using NetCDFOutputWriter #2931
Comments
It seems that in one case you are calculating the y average and in the other you are outputting the field at |
Yes |
Does this only appear for 6 tracers? What about 2? |
Also, if |
OK, I see you explained that you want to use |
Can we reduce the MWE even more? Why not Also, can you post the code to plot? |
I run this:
using Oceananigans
grid = RectilinearGrid(size=(16, 16, 16), extent = (500, 500, 120))
n_tracers = 6
tracer_symbols = [ Symbol(:τ, i) for i in 1:n_tracers ]
model = NonhydrostaticModel(; grid, tracers = (tracer_symbols...,))
@info model
uᵢ(x, y, z) = 1e-2 * randn()
set!(model, w=uᵢ)
tracer_IC_odd(x, y, z) = sin(2π * z / grid.Lz)
for i in 1:n_tracers
@info "Setting tracer $i"
expression = Meta.parse("set!(model, τ$i=tracer_IC_odd)")
eval(expression)
end
simulation = Simulation(model, Δt=30, stop_iteration=4)
u, v, w = model.velocities
wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key, τ) in pairs(model.tracers))
outputs_full = (; wτ...)
outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full))
outputs_xz1 = merge(outputs_full, outputs_yavg)
simulation.output_writers[:xz1_writer] = NetCDFOutputWriter(model, outputs_xz1;
filename = "test.nc",
schedule = TimeInterval(simulation.stop_time),
verbose=true,
indices = (:, 1, :),
overwrite_existing = true,
)
run!(simulation)
and got
[ Info: Initializing simulation...
[ Info: Writing to NetCDF: ./test.nc...
[ Info: Computing NetCDF outputs for time index 1: ["wτ3", "wτ2_yavg", "wτ6_yavg", "wτ1", "wτ5_yavg", "wτ6", "wτ2", "wτ5", "wτ4", "wτ1_yavg", "wτ4_yavg", "wτ3_yavg"]...
[ Info: Computing wτ3 done: time=439.823 ms
[ Info: Computing wτ2_yavg done: time=3.404 seconds
[ Info: Computing wτ6_yavg done: time=3.018 seconds
[ Info: Computing wτ1 done: time=225.326 ms
[ Info: Computing wτ5_yavg done: time=2.950 seconds
[ Info: Computing wτ6 done: time=292.708 μs
[ Info: Computing wτ2 done: time=192.674 ms
[ Info: Computing wτ5 done: time=190.263 ms
[ Info: Computing wτ4 done: time=193.185 ms
[ Info: Computing wτ1_yavg done: time=1.210 seconds
[ Info: Computing wτ4_yavg done: time=2.954 seconds
[ Info: Computing wτ3_yavg done: time=2.953 seconds
[ Info: Writing done: time=17.732 seconds, size=19.5 KiB, Δsize=0.0 B
[ Info: ... simulation initialization complete (18.528 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (30.965 seconds).
[ Info: Simulation is stopping after running for 49.565 seconds.
[ Info: Model iteration 4 equals or exceeds stop iteration 4.
and then
julia> using NCDatasets
julia> ds = NCDataset(simulation.output_writers[:xz1_writer].filepath, "r")
NCDataset: ./test.nc
Group: /
Dimensions
zC = 16
zF = 17
xC = 16
yF = 1
xF = 16
yC = 1
time = 1
Variables
zC (16)
Datatype: Float64
Dimensions: zC
Attributes:
units = m
longname = Locations of the cell centers in the z-direction.
zF (17)
Datatype: Float64
Dimensions: zF
Attributes:
units = m
longname = Locations of the cell faces in the z-direction.
xC (16)
Datatype: Float64
Dimensions: xC
Attributes:
units = m
longname = Locations of the cell centers in the x-direction.
yF (1)
Datatype: Float64
Dimensions: yF
Attributes:
units = m
longname = Locations of the cell faces in the y-direction.
xF (16)
Datatype: Float64
Dimensions: xF
Attributes:
units = m
longname = Locations of the cell faces in the x-direction.
yC (1)
Datatype: Float64
Dimensions: yC
Attributes:
units = m
longname = Locations of the cell centers in the y-direction.
time (1)
Datatype: Float64
Dimensions: time
Attributes:
units = seconds
longname = Time
wτ3 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ2_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
wτ6_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
wτ1 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ5_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
wτ6 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ2 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ5 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ4 (16 × 1 × 17 × 1)
Datatype: Float64
Dimensions: xC × yC × zF × time
wτ1_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
wτ4_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
wτ3_yavg (16 × 17 × 1)
Datatype: Float64
Dimensions: xC × zF × time
Global attributes
interval = Inf
Oceananigans = This file was generated using Oceananigans v0.79.4 (DEVELOPMENT BRANCH)
Julia = This file was generated using Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin22.2.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 6 on 8 virtual cores
Environment:
JULIA_EDITOR = code
output time interval = Output was saved every Inf years.
date = This file was generated on 2023-02-18T19:16:16.882.
schedule = TimeInterval
julia> ds["wτ2_yavg"][:, :, 1]
16×17 Matrix{Float64}:
0.0 0.000239391 0.00083127 0.00119759 0.00100968 0.00105376 0.000547622 … -0.000823575 -0.00120598 -0.00105608 -0.000569584 -0.000338584 -7.31754e-5 0.0
0.0 -0.000272188 -0.000758612 -0.000936866 -0.000932501 -0.000901216 -0.000569969 0.000851883 0.0010599 0.00082886 0.000575857 0.000439241 7.574e-5 0.0
0.0 0.000214064 0.000566256 0.000828549 0.00134003 0.00116415 0.000864495 -0.000798704 -0.000681288 -0.00016673 -8.373e-5 -0.000169139 4.80975e-6 0.0
0.0 -9.46733e-5 -0.000369457 -0.000711505 -0.00134226 -0.00106128 -0.00056589 0.000770231 0.000740732 0.0003337 0.000169505 0.00016793 6.28015e-5 0.0
0.0 7.24424e-6 8.88843e-5 0.000181775 0.000217206 -7.28566e-5 -0.000650438 5.28438e-5 7.66377e-5 0.000354346 0.000364643 0.000234465 3.8087e-5 0.0
0.0 -0.000138346 -0.000478074 -0.000792918 -0.000680375 -0.000466813 0.000147319 … 0.000106948 -2.15028e-5 -0.000203383 -0.000232146 -0.000163638 -0.000101512 0.0
0.0 0.00018169 0.000677117 0.00112315 0.00109023 0.000650344 0.00011895 -0.000305076 -0.000244745 -0.000376204 -0.000309182 -0.000293188 1.46846e-6 0.0
0.0 -0.000162776 -0.000532657 -0.000719588 -0.000509252 0.000263136 0.000623105 -0.000843197 -0.00101884 -0.00104725 -0.000711489 -0.000339487 -0.000173752 0.0
0.0 0.000216915 0.000456321 0.000635276 0.000467607 -9.47766e-6 -0.000190093 0.00045725 0.000502882 0.000982609 0.000585199 0.000491581 0.000224349 0.0
0.0 -0.000179709 -0.000246697 -0.00039405 -0.000311943 -0.000275781 -0.000321412 0.000779653 0.00090438 0.000526134 0.000687727 0.000144865 -4.29204e-5 0.0
0.0 0.000212135 0.00045376 0.000821791 0.000824723 0.000830667 0.000767886 … -0.000756743 -0.000672515 -0.000567891 -0.000655449 -0.000291907 -5.14761e-5 0.0
0.0 -0.000207786 -0.000570261 -0.00113736 -0.0011126 -0.00095512 -0.000668289 0.000187081 -2.50371e-5 -0.000234227 -3.30057e-5 7.37231e-5 4.74698e-5 0.0
0.0 3.44085e-5 6.01794e-5 0.000160129 -5.71002e-5 -0.000251935 -0.000348341 0.000175676 0.000452654 0.000757255 0.000292256 1.47004e-5 -4.29739e-5 0.0
0.0 0.000127176 0.000354777 0.000538542 0.000684584 0.000355887 0.000248227 0.000121747 -8.66234e-5 -5.15146e-5 0.000247927 0.000267242 0.000177876 0.0
0.0 -0.000112676 -0.000247409 -0.000271409 -0.00044693 0.000199694 0.000249875 -0.000585559 -0.00072856 -0.000851675 -0.000696049 -0.000430978 -0.00022939 0.0
0.0 -6.48691e-5 -0.0002854 -0.0005231 -0.000241099 -0.000523152 -0.000253047 … 0.000609541 0.0009479 0.000772055 0.000367521 0.000193174 8.25992e-5 0.0
julia> ds["wτ2_yavg"] == ds["wτ1_yavg"]
true
julia> ds["wτ2_yavg"] == ds["wτ3_yavg"]
true
julia> ds["wτ2_yavg"] == ds["wτ4_yavg"]
true
julia> ds["wτ2_yavg"] == ds["wτ5_yavg"]
true
julia> ds["wτ2_yavg"] == ds["wτ6_yavg"]
true
So it seems that all is good? So the problem comes when I continue the integration longer? |
Because we use
so iteration order is not deterministic. We could use |
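To illustrate the ordering point above, here is a minimal sketch in plain Julia. It assumes the outputs are held in a `Dict` keyed by name (the container name is stripped from the comment above, so this is an assumption), and `output_names` is a made-up list for illustration. A `Dict` iterates in hash order, which generally differs from insertion order, while a `NamedTuple` keeps insertion order.

```julia
# Hypothetical output names, in the order a user would define them.
output_names = ["wτ1", "wτ2", "wτ3", "wτ1_yavg", "wτ2_yavg", "wτ3_yavg"]

# Assumption: the writer stores outputs in a Dict keyed by name.
as_dict  = Dict(name => i for (i, name) in enumerate(output_names))
as_tuple = NamedTuple(Symbol(name) => i for (i, name) in enumerate(output_names))

@show collect(keys(as_dict))   # hash order, usually not the insertion order
@show collect(keys(as_tuple))  # insertion order is preserved

# One way to make Dict iteration reproducible: iterate over sorted keys.
ordered = sort(collect(keys(as_dict)))
@show ordered
```

This would explain why the verbose log lists the outputs in a shuffled order ("wτ3", "wτ2_yavg", "wτ6_yavg", ...) rather than the order in which they were defined.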
But the way @tomchor wrote the example, is E.g., outputs_yavg = NamedTuple( Symbol(key, :_yavg)=>Average(val, dims=(2,)) for (key, val) in zip(keys(outputs_full), outputs_full)) ? |
Oh I see, so there is an assumption that |
I also have trouble understanding it... Perhaps @tomchor can elaborate? Or simplify it to exemplify the issue? |
Dependencies between fields are supposed to be accounted for. For example if we write
wc = Field(w * c)
wc_average = Field(Average(wc, dims=1))
then compute!(wc_average) should first call
Oceananigans.jl/src/Fields/field_reductions.jl Lines 72 to 77 in c929676
where
Oceananigans.jl/src/AbstractOperations/computed_field.jl Lines 64 to 72 in c929676
Note that
Oceananigans.jl/src/Fields/field.jl Lines 451 to 462 in c929676
|
However, I would also recommend using wc_average = Average(w*c, dims=1) because this is more efficient (usually). One can in principle save some time by constructing a computational graph for the diagnostics, but I'm not sure it's worth it most of the time... |
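A small sketch comparing the two patterns discussed above, using only calls that appear in this thread (`Field`, `Average`, `compute!`) plus `interior` to read the data. It assumes the Oceananigans API of the version used here (v0.79.4); the grid size and initial conditions are arbitrary choices for illustration.

```julia
using Oceananigans

grid = RectilinearGrid(size=(8, 8, 8), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid, tracers=:c)
set!(model, w=(x, y, z) -> randn(), c=(x, y, z) -> sin(2π * z))

u, v, w = model.velocities
c = model.tracers.c

# Pattern 1: store the product in a 3D field, then average that field.
wc = Field(w * c)
avg_stored = Field(Average(wc, dims=1))

# Pattern 2: average the operation directly (no 3D intermediate field).
avg_fused = Field(Average(w * c, dims=1))

# compute!(avg_stored) is expected to compute wc first, as described above.
compute!(avg_stored)
compute!(avg_fused)

@show interior(avg_stored) ≈ interior(avg_fused)
```

If dependencies are handled as described, both patterns should agree; the second one avoids allocating `wc`.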
I guess it's also a difference with
I don't remember exactly why we use |
This could be worth trying, note that
julia> using Oceananigans.Units
julia> 0.15hours / 30
18.0
So you can run for 18 steps instead of 4. (@tomchor why 18?) |
I ran
using Oceananigans
using NCDatasets
Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))
tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)
uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)
simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)
u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))
jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
verbose = true,
indices = (:, 1, :),
overwrite_existing = true)
simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
filename = jld2_filename,
kwargs...)
simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
filename = nc_filename,
kwargs...)
run!(simulation)
ds = Dataset(nc_filename)
Ntracers = length(tracer_names)
flux_timeseries = Dict("wτ$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("wτ$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)
flux_1_nc = ds["wτ1"]
avg_flux_1_nc = ds["avg_wτ1"]
flux_1 = flux_timeseries["wτ1"]
avg_flux_1 = average_flux_timeseries["wτ1"]
for n = 2:Ntracers
flux_n = flux_timeseries["wτ$n"]
avg_flux_n = average_flux_timeseries["wτ$n"]
@show "Fluxes for tracer $n:"
@show all(flux_1[:, 1, :, :] .≈ flux_n[:, 1, :, :])
@show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
@show all(flux_1_nc .≈ ds["wτ$n"])
@show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end
close(ds)
and all the fluxes and averaged fluxes are identical for both JLD2 and NetCDF output writers. |
Sorry for the unclear example, guys, and thanks for the help. I posted this after many hours of trying to catch the culprit in a very complex simulation, and at the time I was so tired that the MWE seemed reasonable to me. Now I see it's pretty badly set up. I'm gonna work a bit on this today and come up with a better MWE if we need one.
But to explain a bit better, the main goal of this snippet (other than showing the issue) is to write (in the same file) an xz-slice (at
For that I first create a tuple of "full" fields (fields without slicing or averaging, which I call
When I pass both of those tuples (merged) to the
I just ran @glwagner's MWE locally and the issue doesn't appear, even though at first it does exactly what my MWE does, so I need to track what's the important change there. |
It appears that the issue does get worse the longer you run, yes. The original MWE I posted runs for 18 time steps and it looks like this:
Running it for 2 time steps it looks like this:
The results are still different from different tracers, but not visibly so. (The order of magnitude of the differences is around 1e-5 in this case) |
It seems that merely switching the order you add the output writer in @glwagner's example makes the issue pop up. In @glwagner's original example the JLD2 writer is added first, and then the NetCDF writer, and there's no issue. The example below (which is pretty much the same example, except with that order switched) fails for me:
using Oceananigans
using NCDatasets
Nx = Ny = Nz = 16
grid = RectilinearGrid(size=(Nx, Ny, Nz), extent=(1, 1, 1))
tracer_names = Tuple(Symbol(:τ, n) for n = 1:6)
model = NonhydrostaticModel(; grid, tracers=tracer_names)
uᵢ(x, y, z) = randn()
cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(c => cᵢ for c in tracer_names)
set!(model; u=uᵢ, v=uᵢ, w=uᵢ, kw...)
simulation = Simulation(model, Δt=0.1/Nx, stop_iteration=100)
u, v, w = model.velocities
fluxes = NamedTuple(Symbol("wτ$n") => Field(w*c) for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(flux, dims=2) for (n, flux) in enumerate(fluxes))
jld2_filename = "test.jld2"
nc_filename = "test.nc"
kwargs = (schedule = IterationInterval(1),
verbose = true,
indices = (:, 1, :),
overwrite_existing = true)
simulation.output_writers[:nc] = NetCDFOutputWriter(model, merge(fluxes, averaged_fluxes);
filename = nc_filename,
kwargs...)
simulation.output_writers[:jld2] = JLD2OutputWriter(model, merge(fluxes, averaged_fluxes);
filename = jld2_filename,
kwargs...)
run!(simulation)
ds = Dataset(nc_filename)
Ntracers = length(tracer_names)
flux_timeseries = Dict("wτ$n" => FieldTimeSeries(jld2_filename, "wτ$n") for n = 1:Ntracers)
average_flux_timeseries = Dict("wτ$n" => FieldTimeSeries(jld2_filename, "avg_wτ$n") for n = 1:Ntracers)
avg_flux_1_nc = ds["avg_wτ1"]
avg_flux_1 = average_flux_timeseries["wτ1"]
for n = 2:Ntracers
flux_n = flux_timeseries["wτ$n"]
avg_flux_n = average_flux_timeseries["wτ$n"]
@show "Fluxes for tracer $n:"
@show all(avg_flux_1[:, 1, :, :] .≈ avg_flux_n[:, 1, :, :])
@show all(avg_flux_1_nc .≈ ds["avg_wτ$n"])
end
close(ds)
This gives me:
And indeed plotting it reveals:
This makes no sense to me. Does it have to do with the JLD2 writer "pre-computing" the outputs in a better way than the NetCDF writer does? |
Interesting. Is 6 tracers necessary for this to appear? |
I was just able to reproduce this with as few as 3 tracers, but not two. |
Ok, I can reproduce this. Here's a few observations:
We can merge this last change. But I'd also like to dig a little further to see if there isn't some more insidious bug, because I don't understand why we need deterministic computation of output. (On the other hand, I think deterministic output computation is a potentially useful feature so it makes sense to support this with NetCDFOutputWriter). |
Also, if we do not include the computation of fluxes (in addition to averaged fluxes) here, there's no issue? Is that right? |
Hmm I also want to point out that this pattern is inefficient with memory. If we are only interested in the values at The other ambiguous aspect of this setup is what we expect to happen when we ask for There's still a mystery here, even if the setup is a little confusing... |
That we do/allow this is a bit confusing for me also. |
It could be consistent with the indexing behavior of Yet still I feel that the |
I think I understand the issue now. Here's a summary of what we are trying to do.
Note that we usually recommend averaging an operation rather than Next, we
As discussed above, we aren't sure if this is good practice, since we're asking the output writers to "reindex" a reduced field in Nevertheless --- for this last step, the output writer creates a "view field" that slices into the original 3D field (ie it does not allocate any additional memory, but instead creates a new field whose
this calls Oceananigans.jl/src/Fields/field.jl Line 180 in e394bf7
which calls Oceananigans.jl/src/Fields/field.jl Lines 298 to 322 in e394bf7
Note that It's an easy fix since we just have to set the status of the sliced field to |
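A toy model of the mechanism described above, in plain Julia with no Oceananigans dependency. This is only my reading of the explanation, not the library's implementation: `Status`, `ToyField`, and this `compute_at!` are hypothetical names, and the assumption is that a sliced "view field" shares both the data and the status of the 3D field it slices into.

```julia
# Hypothetical stand-in for a field's "last computed at this time" marker.
mutable struct Status
    time::Float64
end

# Hypothetical field: shared storage, the rows it is responsible for, and a status.
struct ToyField
    data::Matrix{Float64}
    rows::UnitRange{Int}
    status::Union{Status, Nothing}
end

# Recompute only this field's rows; skip if the status says it is up to date.
function compute_at!(f::ToyField, source, time)
    if f.status !== nothing && f.status.time == time
        return f
    end
    f.data[f.rows, :] .= source[f.rows, :]
    f.status !== nothing && (f.status.time = time)
    return f
end

source = ones(4, 3)

# Buggy case: the slice shares the parent's status object.
full = ToyField(zeros(4, 3), 1:4, Status(NaN))
slice_shared = ToyField(full.data, 1:1, full.status)
compute_at!(slice_shared, source, 1.0)  # updates row 1, marks time 1.0 as computed
compute_at!(full, source, 1.0)          # skipped: rows 2:4 stay stale
stale_sum = sum(full.data)

# Fixed case: the slice carries no status, so it never marks the parent as computed.
full2 = ToyField(zeros(4, 3), 1:4, Status(NaN))
slice_fixed = ToyField(full2.data, 1:1, nothing)
compute_at!(slice_fixed, source, 1.0)
compute_at!(full2, source, 1.0)         # recomputed in full
fixed_sum = sum(full2.data)

@show stale_sum fixed_sum
```

In this toy the outcome depends on which field is computed first, which is at least consistent with the order-dependent behaviour reported earlier in the thread.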
heroic debugging 👍🏼 :) |
@navidcy also made me wonder if we should copy boundary conditions in |
yeap, open question: #2882 |
That's some great debugging there, @glwagner. Thanks! Yeah, I agree passing indices alongside averages is unclear to say the least. When I first set up the output writer to do this (with only one tracer) I was surprised that it worked out of the box since I expected an error or warning. But since it made code simpler and it worked, I kept it. Then this error crept up on me 😬 I'd be okay if you want to not allow that, or throw a warning or something in this case. |
Have you tried using |
I haven't tried that solution specifically, but I suspect it would work. I think the issue with this bug isn't that there aren't workarounds (for example, one could just separate slices and averages into two different files, which is what I'm currently doing) it's just that it fails silently and subtly, so it could catch users off guard. |
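For reference, the "two files" workaround mentioned above could look roughly like the sketch below. It reuses the calls from the examples in this thread and assumes the API of the version used here (v0.79.4, where the writer is called `NetCDFOutputWriter`); the filenames, the number of tracers, and the grid size are arbitrary. The fluxes are passed as operations (`w * c`) rather than wrapped in `Field`, following the recommendation earlier in the thread.

```julia
using Oceananigans
using NCDatasets  # assumption: harmless here, may be required by other versions

grid = RectilinearGrid(size=(8, 8, 8), extent=(1, 1, 1))
tracer_names = (:τ1, :τ2, :τ3)
model = NonhydrostaticModel(; grid, tracers=tracer_names)

cᵢ(x, y, z) = sin(2π * z / grid.Lz)
kw = NamedTuple(name => cᵢ for name in tracer_names)
set!(model; w=(x, y, z) -> randn(), kw...)

simulation = Simulation(model, Δt=0.1/8, stop_iteration=4)
u, v, w = model.velocities

# Slices and averages are defined separately and never merged.
fluxes = NamedTuple(Symbol("wτ$n") => w * c for (n, c) in enumerate(model.tracers))
averaged_fluxes = NamedTuple(Symbol("avg_wτ$n") => Average(w * c, dims=2)
                             for (n, c) in enumerate(model.tracers))

common = (schedule = IterationInterval(1), overwrite_existing = true)

# File 1: xz-slices at the first y index, selected with `indices`.
simulation.output_writers[:slices] = NetCDFOutputWriter(model, fluxes;
                                                        filename = "slices.nc",
                                                        indices = (:, 1, :),
                                                        common...)

# File 2: y-averages, written without `indices`.
simulation.output_writers[:averages] = NetCDFOutputWriter(model, averaged_fluxes;
                                                          filename = "averages.nc",
                                                          common...)

run!(simulation)
```

Keeping `indices` away from the averaged outputs avoids asking a writer to re-index a reduced field, which is the ambiguous part discussed above.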
No we should fix this for sure. I'm asking because you would save a lot of memory if you avoid writing |
Ah, I see. That's a good point, I'll investigate that. I'm wrapping things in |
Averaging operations does not allocate any extra memory and is more performant than precalculating a field, storing the data, and then taking the average of that. In general, you only need 3D scratch space if you have 3D output. |
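The memory argument can be seen with plain arrays too. The sketch below is only an analogy using Base Julia and `Statistics`, not how Oceananigans implements reductions: the first version materializes the full 3D product before averaging, the second averages the product on the fly and only ever stores the 2D result.

```julia
using Statistics

# Arbitrary 3D arrays standing in for w and a tracer c.
w = rand(16, 16, 17)
c = rand(16, 16, 17)

# Precompute and store the 3D product, then average over y (dimension 2).
wc = w .* c
avg_stored = dropdims(mean(wc, dims=2), dims=2)

# Average the product directly: no 3D intermediate array is created.
avg_fused = [mean(w[i, j, k] * c[i, j, k] for j in axes(w, 2))
             for i in axes(w, 1), k in axes(w, 3)]

@show size(avg_fused)
@show avg_stored ≈ avg_fused
```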
The example below creates a nonhydrostatic model with 6 identical tracers, calculates the vertical flux of those tracers in an identical way for all 6, and then writes it all to NetCDF. In the same file, it writes a vertical (x-z) slice of the fluxes using the indices flag, and a y-average of them:
While the outputs should be the same (since the tracers and their advection are identical), I get different results for the y-averaged fluxes for different tracers:
Note that, while similar, tracers α=1,3,4 are different from α=2,5,6. The difference isn't large in this example, but can be made larger with more complexity in the calculations.
A couple of notes:
wτ = NamedTuple(Symbol(:w, key) => Field(w*τ) for (key,τ) in pairs(model.tracers))
to gets rid of the issue. Although doing the above prevents a user from using
Field(op, data=scratch_data.data)
to save memory, which in some cases (my case for example) is very important.
indices
flag (i.e. writing 3D fields instead of slices) apparently gets rid of this issue.
Here, even though the obvious easy solution is to separate averages from slices when writing, a user wouldn't know that since this fails silently and the wrong results can be pretty subtle (as the example above hopefully illustrates). For example, it popped up in one of my simulations and it took me a while to even realize what was happening, let alone figure out the solution.
My main question is: is this expected behavior? If so, should we somehow warn users (or even throw an error) to avoid mistakes?