fix CSV output for SVector #193

Merged
merged 9 commits into from
Feb 2, 2022

Conversation

verseve
Member

@verseve verseve commented Jan 12, 2022

The CSV output was not correct for model parameters stored as SVector (dimension layer), for example:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, 9.24, 9.44, 9.78, 9.88

This is fixed by specifying a required internal layer index for a layered parameter:

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

If multiple layers are desired, this can be specified in separate [[csv.column]] entries.
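A quick way to see the problem in the output above is to compare the number of fields per row with the header. The following is only a minimal sketch using the Python standard library, not code from this PR; the helper name `field_count_mismatches` is made up for illustration.

```python
import csv
import io

# The incorrect output from the description: 4 header names, 7 values.
broken = (
    "time,Q,recharge,c\n"
    "2000-01-02T00:00:00, 0.7, -0.05, 9.24, 9.44, 9.78, 9.88\n"
)


def field_count_mismatches(text):
    """Return (line_number, n_fields) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    return [
        (lineno, len(row))
        for lineno, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]


mismatches = field_count_mismatches(broken)
print(mismatches)
```

With one `[[csv.column]]` entry per layer, every row has exactly one value per header name and this check returns an empty list.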

Fixed the NetCDF scalar output for a SVector which gave the following error message:
ERROR: DimensionMismatch("array could not be broadcast to match destination")

When Wflow is integrated with Delft-FEWS, an extra dimension is not allowed by the importNetcdfActivity of the General Adapter of FEWS for scalar time series. As with CSV output, an (optional) internal layer index can be provided for a layered parameter, so that FEWS can import the NetCDF scalar file.

NetCDF 3D data can be imported by Delft-FEWS if the dimension has the CF axis attribute Z; this attribute has been added to the gridded NetCDF output of Wflow.

@laurenebouaziz
Contributor

I ran the previous test case where I had the error and it is indeed fixed with the proposed changes, thanks!

Contributor

@laurenebouaziz laurenebouaziz left a comment

I ran a test case and it works fine now :)

@visr
Member

visr commented Jan 18, 2022

@laurenebouaziz what did you try, and what kind of output do you wish?

I tested this locally by adding at the end of sbm_simple.toml:

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c"
parameter = "vertical.c"

But we'll need a new test to make sure we get this right.
If I'm not mistaken the difference in this PR is

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

vs

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, 9.24, 9.44, 9.78, 9.88

These are both not good, right? The values in the second example are ok, but that would need a header like c_layer_1, c_layer_2, c_layer_3, c_layer_4, which is logic that would have to be added to this function:

Wflow.jl/src/io.jl

Lines 674 to 690 in b1f113e

"Get a Vector{String} of all columns names for the CSV header, exept the first, time"
function csv_header(cols, dataset, config)
    header = [col["header"] for col in cols]
    header = String[]
    for col in cols
        h = col["header"]::String
        if haskey(col, "map")
            mapname = col["map"]
            ids = locations_map(dataset, mapname, config)
            hvec = [string(h, '_', id) for id in ids]
            append!(header, hvec)
        else
            push!(header, h)
        end
    end
    return header
end

For gauge maps with IDs it already does that, by using $name_$id. We'd have to combine that logic with the new logic to also support writing layered parameters at gauges to a CSV. It then becomes a bit tricky, since we would be encoding multiple extra dimensions in CSV column names; at that point a multidimensional format like netCDF is just a better choice.
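To make the growth in column names concrete, here is a rough Python sketch of what combining the two expansions could look like. It is hypothetical and not how Wflow works: the `layers` argument and the `_layer_k` suffix are assumptions for illustration, and only the `header_id` part mirrors the existing `csv_header` behaviour.

```python
def expand_header(header, ids=None, layers=None):
    """Build the column names for one csv.column entry.

    ids    -- gauge IDs from a map (existing behaviour: header_id)
    layers -- hypothetical list of layer indices (NOT implemented in Wflow)
    """
    names = [header]
    if ids is not None:
        names = [f"{n}_{i}" for n in names for i in ids]
    if layers is not None:
        names = [f"{n}_layer_{k}" for n in names for k in layers]
    return names


# Two gauges and four layers already give eight columns for one entry.
names = expand_header("c", ids=[20, 31], layers=[1, 2, 3, 4])
print(names)
```

Every extra dimension multiplies the number of columns, and a reader has to split the names again to recover the gauge and layer.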

To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

If multiple layers are desired, this can be specified in separate csv.column entries.
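The "throw an error unless a layer is given" rule could be sketched as follows. This is an illustration in Python under stated assumptions, not the Julia implementation: `resolve_column` and `n_layers` are hypothetical names, and rejecting a `layer` key on a non-layered parameter is an extra assumption that was not discussed here.

```python
def resolve_column(col, n_layers):
    """Return the 0-based position in a layered value, or None for a scalar.

    col      -- dict mirroring one [[csv.column]] entry
    n_layers -- number of layers of the parameter, 0 if it is not layered
    """
    layer = col.get("layer")
    if n_layers == 0:
        if layer is not None:
            # Assumption: a layer on a non-layered parameter is a config mistake.
            raise ValueError(f"{col['parameter']} has no layer dimension")
        return None
    if layer is None:
        raise ValueError(f"{col['parameter']} is layered: specify `layer`")
    if not 1 <= layer <= n_layers:
        raise ValueError(f"layer must be in 1..{n_layers}, got {layer}")
    return layer - 1  # the config uses a 1-based index


col = {"header": "c_layer_1", "parameter": "vertical.c", "layer": 1}
position = resolve_column(col, n_layers=4)
```

Each column then always maps to exactly one scalar per time step, so the header logic does not need to change.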

@verseve
Member Author

verseve commented Jan 18, 2022

I think the following output is fine:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

Some post processing of the file in this case is of course required. Was that the approach you were following @laurenebouaziz ?

@verseve
Member Author

verseve commented Jan 18, 2022

To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

If multiple layers are desired, this can be specified in separate csv.column entries.

Commit 8e2e330 is doing this, with a general index_dim for SVectors (extra dim layer or classes).

@visr
Member

visr commented Jan 18, 2022

Some post processing of the file in this case is of course required.

But what's the value of writing CSV if we need special tools to process them? I'd really prefer it if our CSVs are sufficiently boring that they can go straight into Excel/pandas/DataFrames or other standard tools, without needing to manually unpack columns.

@laurenebouaziz
Contributor

Indeed, the changes here lead to:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

I fully agree that specifying a layer is much better to directly be able to read the csv (as we discussed for flextopo #194):

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

so great that this is now added in #194 8e2e330

What I previously used to read the CSV with [float, float, float] columns was not straightforward at all. It replaces the "[" and "]" with "'" and then reads the CSV with the quotechar option, but the lists of values in each column still have to be parsed afterwards:

import pandas as pd

filename = r"path\output.csv"

# Read in the file
with open(filename, "r") as file:
    filedata = file.read()

# Replace the brackets with a quote character
filedata = filedata.replace("[", "'").replace("]", "'")

# Write the file out again
with open(filename, "w") as file:
    file.write(filedata)

# Each bracketed list is now one quoted field that still has to be parsed
out = pd.read_csv(filename, index_col=0, parse_dates=True, quotechar="'")
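For comparison, a bracketed row could also be parsed without rewriting the file, for example with a regular expression. This is just a sketch with the standard library; `parse_row` is a made-up helper, and it assumes brackets only ever enclose a flat list of numbers.

```python
import re

line = "2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]"

# A token is either a whole bracketed group or a run without commas/brackets
# that does not start with whitespace.
TOKEN = re.compile(r"\[[^\]]*\]|[^,\[\s][^,\[]*")


def parse_row(line):
    """Split one data row; bracketed groups become lists of floats."""
    fields = []
    for tok in TOKEN.findall(line):
        tok = tok.strip()
        if tok.startswith("["):
            fields.append([float(v) for v in tok[1:-1].split(",") if v.strip()])
        else:
            fields.append(tok)
    return fields


fields = parse_row(line)
print(fields)
```

Either way this is custom parsing, which is the extra step that one column per layer avoids.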

@verseve
Member Author

verseve commented Jan 19, 2022

Thanks for the feedback @visr and @laurenebouaziz. How do we want to proceed with this PR, also related to #194?
Based on your feedback and the work in #194, we could add the following as suggested by @visr:

To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?

 [[csv.column]]
 coordinate.x = 7.378
 coordinate.y = 50.204
 header = "c_layer_1"
 parameter = "vertical.c"
 layer = 1

If multiple layers are desired, this can be specified in separate csv.column entries.

I probably still have some time this week to work on this, and also implement it for classes dimension in #194.

@visr
Member

visr commented Jan 20, 2022

Yes that would be great. If you prefer to just incorporate this as part of #194 that would be fine with me as well.

@verseve
Member Author

verseve commented Jan 20, 2022

I made the changes in this branch (for dimension layer). Could you please review @visr and @laurenebouaziz ?

@visr
Member

visr commented Jan 20, 2022

Ah I see that for netCDF scalar output the same approach as for CSV is used now. What happened with layered output to netCDF files before? Since netCDF is multidimensional, it would be nice if the different layers would by default be written to the file. Or is that a headache to support?

@verseve
Member Author

verseve commented Jan 20, 2022

With NetCDF scalar this gave the following error:
ERROR: DimensionMismatch("array could not be broadcast to match destination")

But yes, I agree it would be nice to support writing different layers at once for NetCDF. Not sure how easy it is (should also be FEWS compliant). Just checked the FEWS format and I think FEWS is not able to handle more dimensions than (time, location).

@verseve
Member Author

verseve commented Jan 21, 2022

For the NetCDF scalar approach I will first do some testing with FEWS, to check whether the scalar NetCDF import can handle an extra dimension like layer.

@verseve verseve marked this pull request as draft January 21, 2022 08:29
available for CSV. For integration with Delft-FEWS, see also [Run from Delft-FEWS](@ref),
it is recommended to write scalar data to NetCDF format since the General Adapter of
Delft-FEWS can ingest this data format directly.
`location` is required. Model parameters with the extra dimension `layer` for layered model
Contributor

small thing, but I suggest to write:
"Model parameters and variables with the extra dimension ..."

Member Author

@verseve verseve Jan 31, 2022

Thanks for the suggestion, this has been changed in commit 0a8f21b.

case a single entry can lead to multiple columns in the CSV file, which will be of the form
`header_id`, e.g. `Q_20`, for a gauge with integer ID 20. Model parameters with the extra
dimension `layer` for layered model parameters of the vertical `sbm` concept require the
specification of the layer (see also example below). If multiple layers are desired, this
Contributor

should we specify in the documentation here: "the specification of the (Julia) index of the layer"?
in staticmaps, variable c has layer coordinates of [0,1,2,3], and in order to get c_0, the config should be:

[[csv.column]]
coordinate.x = 6.255
coordinate.y = 50.012
header = "vwc_layer0_bycoord"
parameter = "vertical.vwc"
layer = 1

but I thought we wanted to link this directly to the name of the layer (in my understanding this would be 0, 1, 2, 3 in this case instead of 1, 2, 3, 4)?

Member Author

Yes, this is indeed how the dimension layer is defined internally in Wflow as part of sbm. This is also how we write the layer dimension to the gridded NetCDF output ([1,2,3,4]). I agree, it would be good to add to the text that this index is internally defined.

Member Author

@verseve verseve Jan 31, 2022

Changed this to internal layer index in commit 0a8f21b.
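For users whose staticmaps use other layer labels, the relation between a label and the internal index can be sketched like this. It is a hypothetical helper, not part of Wflow, and it assumes the labels are stored in layer order.

```python
def internal_layer_index(label, labels):
    """Map a layer coordinate label to the 1-based internal layer index."""
    return labels.index(label) + 1


# Layer coordinates as described in the review comment above.
staticmaps_labels = [0, 1, 2, 3]

# The first layer (label 0) is requested with `layer = 1` in the TOML file.
first = internal_layer_index(0, staticmaps_labels)
last = internal_layer_index(3, staticmaps_labels)
```

So with labels [0, 1, 2, 3] the valid values for `layer` are 1 to 4.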

@laurenebouaziz
Contributor

For the NetCDF scalar approach I will first do some testing with FEWS, to check whether the scalar NetCDF import can handle an extra dimension like layer.

Ok I did some first tests with csv and current implementation for scalar netcdf and it works. I am not sure what was the idea for the indexing of the layer (see comment above) for this PR?

let me know when I can further test the netcdf scalar!

verseve and others added 6 commits January 31, 2022 08:55
at a specific location (coordinate or index) the output was not correct
This is the 'old way', but I find it more ergonomic.
For instance if I start `julia --project` in the test dir, then Wflow is not there, and `test` doesn't work.

If you want to activate the test environment, the best way is to use `TestEnv.activate()`.
Fix CSV and NetCDF scalar export for layered model parameters of `sbm`. The dimension name `layer` and the label of this dimension should be provided in the output CSV/NetCDF scalar section of the TOML file.
An extra dimension is not allowed when running Wflow as part of Delft-FEWS: specification of a layer index is optional, so the General Adapter of FEWS can import this file.
raw html image path, in the hosted builds the HTML files technically live one directory down, see also  JuliaDocs/Documenter.jl#921 (comment)
@verseve
Member Author

verseve commented Jan 31, 2022

Not sure why the tests fail on Windows. I reported the issue Alexander-Barth/NCDatasets.jl#158; it seems to be related to the NetCDF_jll version.

@verseve verseve marked this pull request as ready for review February 1, 2022 11:05
@verseve verseve mentioned this pull request Feb 2, 2022
Member

@visr visr left a comment

Looks good to me. Still had a question about the Delft-FEWS support, see above.

In cfd1077 I made sure that the layer coordinates are always Float64, not Float32 or Int. X and Y are also Float64 regardless of data precision, to avoid precision issues with large coordinates.
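The precision issue with large coordinates can be illustrated by round-tripping a value through single precision. This is a small Python sketch, not the Julia code from cfd1077; the coordinate 5500000.123 is an invented example of a projected coordinate in metres.

```python
import struct


def to_float32(x):
    """Round-trip a Python float (Float64) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x = 5500000.123  # hypothetical projected coordinate in metres
error = abs(to_float32(x) - x)

small = 7.378  # longitude used in the csv.column examples
small_error = abs(to_float32(small) - small)

print(error, small_error)
```

Near 5.5 million the spacing between neighbouring Float32 values is 0.5, so decimetre detail is lost, while small longitude/latitude values are barely affected.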

@verseve verseve merged commit cbd6244 into master Feb 2, 2022
@visr visr deleted the dim-layers-csv branch February 2, 2022 16:21
Successfully merging this pull request may close these issues.

csv output at coordinate points shifts when there are layers
3 participants