fix CSV output for SVector #193

verseve · 2022-01-12T12:22:08Z

The CSV output was not correct for model parameters stored as SVector (dimension layer), for example:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, 9.24, 9.44, 9.78, 9.88

This is fixed by specifying a required internal layer index for a layered parameter:

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

If multiple layers are desired, this can be specified in separate [[csv.column]] entries.

Fixed the NetCDF scalar output for a SVector which gave the following error message:
ERROR: DimensionMismatch("array could not be broadcast to match destination")

When Wflow is integrated with Delft-FEWS, an extra dimension is not allowed by the importNetcdfActivity of the General Adapter of FEWS for scalar time series. As for CSV output, an (optional) internal layer index can be provided for a layered parameter, so that FEWS can import the NetCDF scalar file.

NetCDF 3D data can be imported by Delft-FEWS if the dimension has the CF axis attribute Z; this has been added to the gridded NetCDF output of Wflow.

laurenebouaziz · 2022-01-13T17:00:53Z

I ran the previous test case where I had the error and it is indeed fixed with the proposed changes, thanks!

laurenebouaziz

I ran a test case and it works fine now :)

visr · 2022-01-18T16:59:09Z

@laurenebouaziz what did you try, and what kind of output do you wish?

I tested this locally by adding at the end of sbm_simple.toml:

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c"
parameter = "vertical.c"

But we'll need a new test to make sure we get this right.
If I'm not mistaken the difference in this PR is

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

vs

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, 9.24, 9.44, 9.78, 9.88

These are both not good, right? The values in the second example are ok, but that would need a header like c_layer_1, c_layer_2, c_layer_3, c_layer_4, which is logic that would have to be added to this function:

Wflow.jl/src/io.jl

Lines 674 to 690 in b1f113e

    "Get a Vector{String} of all columns names for the CSV header, exept the first, time"
    function csv_header(cols, dataset, config)
        header = [col["header"] for col in cols]
        header = String[]
        for col in cols
            h = col["header"]::String
            if haskey(col, "map")
                mapname = col["map"]
                ids = locations_map(dataset, mapname, config)
                hvec = [string(h, '_', id) for id in ids]
                append!(header, hvec)
            else
                push!(header, h)
            end
        end
        return header
    end

For gauge maps with IDs it already does that, by using $name_$id. We'd have to combine that logic with the new logic to also support writing layered parameters at gauges to a CSV. It then becomes a bit tricky, since we would be encoding multiple extra dimensions in CSV column names; at that point a multidimensional format like netCDF is just a better choice.
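To illustrate why combining both expansions gets awkward, here is a minimal Python sketch (not Wflow code, which is Julia). The function `expand_header` and the `n_layers` key are hypothetical and exist only for this example; only the `header_id` naming for gauge maps comes from the existing function.

```python
def expand_header(cols, map_ids):
    """Build CSV header names, expanding gauge IDs and (hypothetically) layers.

    cols: list of dicts with "header" and optionally "map" and "n_layers".
    map_ids: dict mapping a gauge map name to its list of gauge IDs.
    """
    header = []
    for col in cols:
        names = [col["header"]]
        if "map" in col:
            # existing behaviour: one column per gauge ID, named header_id
            names = [f"{n}_{gid}" for n in names for gid in map_ids[col["map"]]]
        if "n_layers" in col:
            # hypothetical extension: one extra column per layer as well
            names = [
                f"{n}_layer_{k}"
                for n in names
                for k in range(1, col["n_layers"] + 1)
            ]
        header.extend(names)
    return header


cols = [
    {"header": "Q", "map": "gauges"},
    {"header": "c", "map": "gauges", "n_layers": 2},
]
print(expand_header(cols, {"gauges": [20, 21]}))
```

Every extra dimension multiplies the number of columns and adds another suffix to parse back out of the name, which is the complexity the comment above wants to avoid.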

To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

If multiple layers are desired, this can be specified in separate csv.column entries.
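A minimal Python sketch of the proposed rule (error unless a layer is given for a layered parameter). This is an illustration under assumptions, not the Wflow implementation: `select_value` is a hypothetical helper, and a layered value is modelled here as a plain tuple.

```python
def select_value(value, col):
    """Return one scalar for a CSV cell; layered values require `layer`."""
    if isinstance(value, (list, tuple)):
        if "layer" not in col:
            raise ValueError(
                f"parameter {col['parameter']!r} is layered; "
                "specify `layer` in the [[csv.column]] entry"
            )
        layer = col["layer"]
        if not 1 <= layer <= len(value):
            raise ValueError(f"layer {layer} is outside 1..{len(value)}")
        # `layer` is 1-based, as in the TOML example above
        return value[layer - 1]
    # non-layered parameters pass through unchanged
    return value


c = (9.24, 9.44, 9.78, 9.88)
print(select_value(c, {"parameter": "vertical.c", "layer": 1}))
```

Failing early keeps every CSV cell a single scalar, so the header never has to encode the layer dimension.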

verseve · 2022-01-18T17:45:18Z

I think the following output is fine:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

Some post processing of the file in this case is of course required. Was that the approach you were following @laurenebouaziz ?

verseve · 2022-01-18T17:59:07Z

> To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?
>
> [[csv.column]]
> coordinate.x = 7.378
> coordinate.y = 50.204
> header = "c_layer_1"
> parameter = "vertical.c"
> layer = 1
>
> If multiple layers are desired, this can be specified in separate csv.column entries.

Commit 8e2e330 is doing this, with a general index_dim for SVectors (extra dim layer or classes).

visr · 2022-01-18T18:40:13Z

> Some post processing of the file in this case is of course required.

But what's the value of writing CSV if we need special tools to process them? I'd really prefer it if our CSVs are sufficiently boring that they can go straight into Excel/pandas/DataFrames or other standard tools, without needing to manually unpack columns.

laurenebouaziz · 2022-01-19T08:31:10Z

Indeed, the changes here lead to:

time,Q,recharge,c
2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]

I fully agree that specifying a layer is much better to directly be able to read the csv (as we discussed for flextopo #194):

[[csv.column]]
coordinate.x = 7.378
coordinate.y = 50.204
header = "c_layer_1"
parameter = "vertical.c"
layer = 1

So it is great that this is now added in #194 (8e2e330).

What I previously used to read the CSV with [float, float, float] columns was not straightforward at all: it replaces the "[" and "]" with "'" and then reads the CSV with the quotechar option (but the lists of values in each column still have to be parsed afterwards):

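import pandas as pd

filename = r"path\output.csv"

# Read in the file
with open(filename, "r") as file:
    filedata = file.read()

# Replace the brackets with single quotes so they can act as quote characters
filedata = filedata.replace("[", "'").replace("]", "'")

# Write the file out again (this overwrites the original output file)
with open(filename, "w") as file:
    file.write(filedata)

# Read the CSV, using the single quote as the quote character
out = pd.read_csv(filename, index_col=0, parse_dates=True, quotechar="'")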
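For comparison, a minimal standard-library sketch that parses such a bracketed line without rewriting the file. It is only an illustration of the extra unpacking step; `parse_line` is a hypothetical helper, and it assumes brackets are never nested.

```python
import re

# Split on commas that are not inside a [...] group (assumes no nested brackets)
SPLIT = re.compile(r",\s*(?![^\[]*\])")


def parse_line(line):
    """Split one CSV line; bracketed cells become lists of floats."""
    cells = SPLIT.split(line.strip())
    out = []
    for cell in cells:
        if cell.startswith("[") and cell.endswith("]"):
            out.append([float(x) for x in cell[1:-1].split(",")])
        else:
            out.append(cell)
    return out


row = parse_line("2000-01-02T00:00:00, 0.7, -0.05, [9.24, 9.44, 9.78, 9.88]")
print(row)
```

Either way the reader needs custom parsing, which is what specifying `layer` per column avoids.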

verseve · 2022-01-19T16:15:58Z

Thanks for the feedback @visr and @laurenebouaziz. How do we want to proceed with this PR, also related to #194?
Based on your feedback and the work in #194, we could add the following as suggested by @visr:

> To avoid introducing too much complexity here that we will regret later, how about throwing an error, unless for a layered parameter, the user also specifies a layer?
>
> [[csv.column]]
> coordinate.x = 7.378
> coordinate.y = 50.204
> header = "c_layer_1"
> parameter = "vertical.c"
> layer = 1
>
> If multiple layers are desired, this can be specified in separate csv.column entries.

I probably still have some time this week to work on this, and also implement it for classes dimension in #194.

visr · 2022-01-20T10:21:47Z

Yes that would be great. If you prefer to just incorporate this as part of #194 that would be fine with me as well.

verseve · 2022-01-20T12:38:19Z

I made the changes in this branch (for dimension layer). Could you please review @visr and @laurenebouaziz ?

visr · 2022-01-20T12:44:38Z

Ah I see that for netCDF scalar output the same approach as for CSV is used now. What happened with layered output to netCDF files before? Since netCDF is multidimensional, it would be nice if the different layers would by default be written to the file. Or is that a headache to support?

verseve · 2022-01-20T13:04:37Z

With NetCDF scalar this gave the following error:
ERROR: DimensionMismatch("array could not be broadcast to match destination")

But yes, I agree it would be nice to support writing different layers at once for NetCDF. Not sure how easy it is (should also be FEWS compliant). Just checked the FEWS format and I think FEWS is not able to handle more dimensions than (time, location).

verseve · 2022-01-21T08:19:11Z

For the NetCDF scalar approach I will first do some testing with FEWS, to check if the scalar NetCDF import can handle an extra dimension like layer.

laurenebouaziz · 2022-01-21T08:36:01Z

docs/src/config.md

-available for CSV. For integration with Delft-FEWS, see also [Run from Delft-FEWS](@ref),
-it is recommended to write scalar data to NetCDF format since the General Adapter of
-Delft-FEWS can ingest this data format directly.
+`location` is required. Model parameters with the extra dimension `layer` for layered model


Small thing, but I suggest writing:
"Model parameters and variables with the extra dimension ..."

verseve · 2022-01-31

Thanks for the suggestion, this has been changed in commit 0a8f21b.

laurenebouaziz · 2022-01-21T09:31:56Z

docs/src/config.md

+case a single entry can lead to multiple columns in the CSV file, which will be of the form
+`header_id`, e.g. `Q_20`, for a gauge with integer ID 20. Model parameters with the extra
+dimension `layer` for layered model parameters of the vertical `sbm` concept require the
+specification of the layer (see also example below). If multiple layers are desired, this


Should we specify in the documentation here: "the specification of the (Julia) index of the layer"?
In staticmaps, variable c has layer coordinates [0, 1, 2, 3], and in order to get c_0, the config should be:

[[csv.column]]
coordinate.x = 6.255
coordinate.y = 50.012
header = "vwc_layer0_bycoord"
parameter = "vertical.vwc"
layer = 1

But I thought we wanted to link this directly to the name of the layer (in my understanding this would be 0, 1, 2, 3 in this case instead of 1, 2, 3, 4)?

verseve · 2022-01-21

Yes, this is indeed how the dimension layer is defined internally in Wflow as part of sbm. This is also how we write the layer dimension to the gridded NetCDF output ([1, 2, 3, 4]). I agree, it would be good to add to the text that this is internally defined.

verseve · 2022-01-31

Changed this to internal layer index in commit 0a8f21b.
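The difference between the layer coordinate in staticmaps and the internal layer index can be sketched in a few lines of Python. This is only an illustration of the off-by-one mapping discussed above; `internal_layer_index` is a hypothetical helper, not part of Wflow.

```python
def internal_layer_index(layer_coords, coord):
    """Map a layer coordinate from staticmaps to the 1-based internal index."""
    # position of the coordinate in the layer dimension, counted from 1
    return list(layer_coords).index(coord) + 1


staticmaps_layers = [0, 1, 2, 3]
# layer coordinate 0 in staticmaps corresponds to `layer = 1` in the TOML file
print(internal_layer_index(staticmaps_layers, 0))
```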

laurenebouaziz · 2022-01-21T09:35:33Z

> For the NetCDF scalar approach I will do first some testing with FEWS, to check if the scalar NetCDF import can handle an extra dimension like layer.

OK, I did some first tests with CSV and the current implementation for scalar NetCDF, and it works. I am not sure what the idea was for the indexing of the layer in this PR (see comment above).

Let me know when I can further test the NetCDF scalar output!

This is the 'old way', but I find it more ergonomic. For instance if I start `julia --project` in the test dir, then Wflow is not there, and `test` doesn't work. If you want to activate the test environment, the best way is to use `TestEnv.activate()`.

verseve · 2022-01-31T12:35:34Z

Not sure why the tests fail on Windows. I reported the following issue: Alexander-Barth/NCDatasets.jl#158; it seems to be related to the NetCDF_jll version.

visr

Looks good to me. Still had a question about the Delft-FEWS support, see above.

In cfd1077 I made sure that the layer coordinates are always Float64, not Float32 or Int. X and Y are also Float64 regardless of data precision, to avoid precision issues with large coordinates.

verseve requested review from visr and laurenebouaziz January 12, 2022 12:22

verseve self-assigned this Jan 12, 2022

verseve linked an issue Jan 12, 2022 that may be closed by this pull request

csv output at coordinate points shifts when there are layers #185

Closed

verseve force-pushed the dim-layers-csv branch from 080b0b3 to d288b38 Compare January 12, 2022 20:34

laurenebouaziz approved these changes Jan 13, 2022

visr force-pushed the dim-layers-csv branch from d288b38 to 2ab9abd Compare January 18, 2022 17:01

verseve requested a review from laurenebouaziz January 20, 2022 12:37

verseve marked this pull request as draft January 21, 2022 08:29

laurenebouaziz reviewed Jan 21, 2022

verseve and others added 6 commits January 31, 2022 08:55

fix CSV output for SVector

2ad903a

at a specific location (coordinate or index) the output was not correct

revert wrapping all CSV output in a vector

6cbba0f

CSV and NetCDF scalar export

59e171a

Fix CSV and NetCDF scalar export for layered model parameters of `sbm`. The dimension name `layer` and the label of this dimension should be provided in the output CSV/NetCDF scalar section of the TOML file.

update docs

4ede816

NetCDF scalar output

0a8f21b

An extra dimension is not allowed when running Wflow as part of Delft-FEWS: specification of a layer index is optional, so the General Adapter of FEWS can import this file.

verseve force-pushed the dim-layers-csv branch from 0898581 to 0a8f21b Compare January 31, 2022 07:56

fix display image

0d52ba9

raw html image path, in the hosted builds the HTML files technically live one directory down, see also JuliaDocs/Documenter.jl#921 (comment)

verseve marked this pull request as ready for review February 1, 2022 11:05

verseve mentioned this pull request Feb 2, 2022

Flextopo #194

Merged

6 tasks

visr added 2 commits February 2, 2022 12:43

ensure extra_dim axis is Float64 like x and y

cfd1077

run JuliaFormatter

97f40ea

visr approved these changes Feb 2, 2022

verseve merged commit cbd6244 into master Feb 2, 2022

visr deleted the dim-layers-csv branch February 2, 2022 16:21
