Inquiry API for D-Flow FM_his.nc output files #268

arthurvd · 2022-07-06T09:05:27Z

Describe the solution you'd like
Recommendation: use @d2hydro's his reader as a starting point:
https://github.com/Deltares/HYDROLIB/blob/case_management_tools/hydrolib/case_management_tools/src/cmt/utils/read_his.py

A class for D-Flow FM his.nc files.
Construct directly based on path to his.nc file
NOT all data is fully read upon construction (in particular not the time-series themselves), only the file contents is investigated/indexed.
Idea/discussion, let's introduce the term object_type to distinguish the various types of objects in the file: all structure stypes/observation point/observation cross section/lateral/sourcesink type. In the above linked d2hydro code, it is still called "layer", but I'd like to reserve that for 3D vertical layers in the future.
- An Enum will define all supported object types: global, observationPoint, observationCrossSection, weir, universalWeir, culvert, longCulvert, bridge, pump, orifice, gate, generalStructure, damBreak, compound, lateral, sourceSink.
Inquiry functions in the API that allow users to get information about what object types/counts are in the file, without yet looking at any timeseries data variables:
- get_object_count(object_type), for object_type in the above enum.
- get_object_ids(object_type), returning the character string ids for all objects/structures (coming from the variable for the right object type AND with attribute :cf_role="timeseries_id").

On hold: the following requested functionalities are on hold until we've experimented enough with xarray and decided whether we still need this or not. This concerns the actual timeseries data variables.

Click for original text

* Variable getter functions **not** for the data itself, but the **name** of data variables, for example: * `get_datavar_name(object_type, standard_name = None, basename = None)`, returning a single character string variable name that the caller can use to actually get data from that var using another package/direct netcdf access. Identification of the variable (name) is based on these checks: * if standard_name is present, it should literally match the variable's `candidatevar:standard_name` attribute. * otherwise, if basename is present, the variable name should end with that basename. * also, the variable must have at least one netcdf dimension that is exactly the so-called *instance dimension* for the requested structure type. For example, our files contain: ``` netcdf mk1d2d_his { dimensions: time = UNLIMITED ; // (1 currently) // [..] gategens = 27 ; gategen_geom_nNodes = 54 ; variables: // [..] double gategen_crest_width(time, gategens) ; gategen_crest_width:long_name = "gate crest width (via general structure)" ; gategen_crest_width:units = "m" ; gategen_crest_width:coordinates = "gategen_name" ; gategen_crest_width:geometry = "gategen_geom" ; ``` Here, `gategensg` is the instance dimension for the "gate" structure type.

Describe alternatives you've considered
Packages such as xarray are still in scope for the data reading, but this issue/feature request is mainly about the implementation of CF-convention-based indexing/inquiry of the D-Flow FM his file contents.

Some further refinement of this issue may be needed, please contact me (@arthurvd) for discussion.

The text was updated successfully, but these errors were encountered:

priscavdsluis · 2022-07-18T13:16:50Z

@arthurvd, thanks for setting some ideas!
I do have some questions/remarks:

I think that instance_type is a little too generic, maybe we can use structure_type or location_type or is that not an accurate name?
The history files that I am inspecting currently do not have a standard name for most variables. Could you give an example of the below situation. Why not query the full variable name?

otherwise, if basename is present, the variable name should end with that basename.

I also notice that some variables have the same standard name. For example, both weirgen_s1up and weirgen_s1dn as well as orifice_s1up and orifice_s1dn have the same standard name: sea_surface_height. Is that correct, and how do we deal with that?
Bonus question: do we have some documentation on the standard names of the variables?

veenstrajelmer · 2022-10-06T10:12:07Z

As a user, my main issue with history files is currently that .sel() is not available for stations/generalstructures. This is easy to work around with via this method, and maybe this is already good enough. However, it would be convenient to query via station name and e.g. also via a depth of -5m (arbitrary value). The code I am using currently is this:

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')

statname_str = 'station_name_str'

def convert_stat_names(ds):
    #convert station names to strings, potentially as new variable
    ds[statname_str] = ds['station_name'].str.decode('utf-8',errors='ignore').str.strip() #TODO: this is currently necessary for .isin() on station name. Maybe inplace on station_name instead (overwrite variable)
    if statname_str!='station_name':
        ds = ds.set_coords(statname_str)
    return ds

file_nc = r'p:\11202512-h2020_impaqt\07_Mediterranean_model\MedSea_impaqt_model\computations_final\r013_waq\DFM_OUTPUT_MedSea_impaqt_FM\MedSea_impaqt_FM_0000_his.nc'
data_xr = xr.open_mfdataset(file_nc,preprocess=convert_stat_names)
data_xr_stationlist_pd = data_xr[statname_str].to_series()

stations_list = ['MO_TS_MO_ATHOS','MO_TS_MO_LESVO','MO_TS_MO_SKYRO','IOC_thes','farm_impaqt','obs 14']
idx_stations = np.where(data_xr_stationlist_pd.isin(stations_list))[0]

print('plot waterlevel from his')
data_fromhis_xr = data_xr['waterlevel'].isel(stations=idx_stations) #TODO: would be nice if possible to use sel instead
fig, ax = plt.subplots()
data_fromhis_xr.plot.line(ax=ax,x='time')
ax.legend(data_fromhis_xr[statname_str].to_series()) #TODO: would be nice to not have to do this manually, currently default legend is 0 to 5

print('plot salinity from his')
data_fromhis_xr = data_xr['salinity'].isel(stations=idx_stations,laydim=20)
fig, ax = plt.subplots()
data_fromhis_xr.plot.line('-',ax=ax,x='time')
ax.legend(data_fromhis_xr[statname_str].to_series()) 

print('plot salinity over depth')
data_fromhis_xr = data_xr['salinity'].isel(stations=idx_stations).sel(time='2015-01-04')
fig, ax = plt.subplots()
data_fromhis_xr.T.plot.line('-',ax=ax,y='zcoordinate_c')
ax.legend(data_fromhis_xr[statname_str].to_series())

Edit: this code is currently outdated. Label based indexing works now, see comment below.

veenstrajelmer · 2022-10-28T13:27:07Z

Today I created a preprocess function for hisfiles called preprocess_hisnc(): https://github.com/openearth/dfm_tools/blob/master/dfm_tools/xarray_helpers.py#L12

It can be used as an xarray argument like this:
data_xr = xr.open_mfdataset(file_nc, preprocess=preprocess_hisnc)

Some extension might be needed, but I think this was the most important feature needed to nicely interact with hisfiles. I created an example script for reading/selecting/plotting hisfiles here:
https://github.com/openearth/dfm_tools/blob/master/tests/examples/postprocess_gethismodeldata.py

arthurvd · 2022-11-02T09:32:27Z

In response to @priscavdsluis:

I think that instance_type is a little too generic, maybe we can use structure_type or location_type or is that not an accurate name?

I've updated the issue description to use object_type.

The following questions are put on hold until further notice, but I'll answer with my original idea in mind:

The history files that I am inspecting currently do not have a standard name for most variables. Could you give an example of the below situation. Why not query the full variable name?

otherwise, if basename is present, the variable name should end with that basename.

The idea was to avoid to much hardcoding. Instead of querying 'weirgen_s1up' and 'uniweir_s1up', you would query for (ObjectType.weir, basename = 's1up') and (ObjectType.universalWeir, basename = 's1up').

I also notice that some variables have the same standard name. For example, both weirgen_s1up and weirgen_s1dn as well as orifice_s1up and orifice_s1dn have the same standard name: sea_surface_height. Is that correct, and how do we deal with that?

You are right, the only way via our API is then to use a basename. Or do full hardcoding in an xarray dataset instead:

data_xr = xr.open_dataset(ncfile, ...)
data_xr.waterlevel.sel(stations=stations_requested)
data_xr.orifice_s1up.sel(orifice=orifices_requested)

Bonus question: do we have some documentation on the standard names of the variables?

No, the user manual only lists the MDU settings and the long name.

arthurvd · 2023-03-28T13:15:38Z

@rbruijnshkv: please check out the above example code by veenstrajelmer (Oct 6, 2022) #268 (comment) for example code that reads timeseries from his files.

arthurvd added the type: enhancement Improvements to existing functionality label Jul 6, 2022

arthurvd added this to To do in HYDROLIB-core via automation Jul 6, 2022

arthurvd self-assigned this Jul 15, 2022

priscavdsluis moved this from To do to In progress in HYDROLIB-core Jul 18, 2022

priscavdsluis moved this from In progress to To do in HYDROLIB-core Jul 18, 2022

arthurvd added this to the Release 0.4 milestone Sep 20, 2022

priscavdsluis moved this from To do to Reviewer follow up in HYDROLIB-core Sep 27, 2022

arthurvd moved this from Reviewer follow up to To do in HYDROLIB-core Sep 29, 2022

arthurvd moved this from To do to In progress in HYDROLIB-core Nov 2, 2022

veenstrajelmer mentioned this issue Dec 14, 2022

Fixes related to hydrolib-core issues Deltares/dfm_tools#214

Open

73 tasks

tim-vd-aardweg modified the milestones: Release 0.4, Release 0.5 Dec 15, 2022

priscavdsluis moved this from In progress to To do in HYDROLIB-core Dec 15, 2022

arthurvd modified the milestones: Release 0.5, Release 1.0 Apr 4, 2023

rhutten removed this from To do in HYDROLIB-core Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inquiry API for D-Flow FM_his.nc output files #268

Inquiry API for D-Flow FM_his.nc output files #268

arthurvd commented Jul 6, 2022 •

edited

Loading

priscavdsluis commented Jul 18, 2022 •

edited

Loading

veenstrajelmer commented Oct 6, 2022 •

edited

Loading

veenstrajelmer commented Oct 28, 2022

arthurvd commented Nov 2, 2022

arthurvd commented Mar 28, 2023

Inquiry API for D-Flow FM_his.nc output files #268

Inquiry API for D-Flow FM_his.nc output files #268

Comments

arthurvd commented Jul 6, 2022 • edited Loading

priscavdsluis commented Jul 18, 2022 • edited Loading

veenstrajelmer commented Oct 6, 2022 • edited Loading

veenstrajelmer commented Oct 28, 2022

arthurvd commented Nov 2, 2022

arthurvd commented Mar 28, 2023

arthurvd commented Jul 6, 2022 •

edited

Loading

priscavdsluis commented Jul 18, 2022 •

edited

Loading

veenstrajelmer commented Oct 6, 2022 •

edited

Loading