Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inquiry API for D-Flow FM_his.nc output files #268

Open
arthurvd opened this issue Jul 6, 2022 · 5 comments
Open

Inquiry API for D-Flow FM_his.nc output files #268

arthurvd opened this issue Jul 6, 2022 · 5 comments
Assignees
Labels
type: enhancement Improvements to existing functionality
Milestone

Comments

@arthurvd
Copy link
Member

arthurvd commented Jul 6, 2022

Describe the solution you'd like
Recommendation: use @d2hydro's his reader as a starting point:
https://github.com/Deltares/HYDROLIB/blob/case_management_tools/hydrolib/case_management_tools/src/cmt/utils/read_his.py

  • A class for D-Flow FM his.nc files.
  • Construct directly based on path to his.nc file
  • NOT all data is fully read upon construction (in particular not the time-series themselves), only the file contents is investigated/indexed.
  • Idea/discussion, let's introduce the term object_type to distinguish the various types of objects in the file: all structure stypes/observation point/observation cross section/lateral/sourcesink type. In the above linked d2hydro code, it is still called "layer", but I'd like to reserve that for 3D vertical layers in the future.
    • An Enum will define all supported object types: global, observationPoint, observationCrossSection, weir, universalWeir, culvert, longCulvert, bridge, pump, orifice, gate, generalStructure, damBreak, compound, lateral, sourceSink.
  • Inquiry functions in the API that allow users to get information about what object types/counts are in the file, without yet looking at any timeseries data variables:
    • get_object_count(object_type), for object_type in the above enum.
    • get_object_ids(object_type), returning the character string ids for all objects/structures (coming from the variable for the right object type AND with attribute :cf_role="timeseries_id").

On hold: the following requested functionalities are on hold until we've experimented enough with xarray and decided whether we still need this or not. This concerns the actual timeseries data variables.

Click for original text * Variable getter functions **not** for the data itself, but the **name** of data variables, for example: * `get_datavar_name(object_type, standard_name = None, basename = None)`, returning a single character string variable name that the caller can use to actually get data from that var using another package/direct netcdf access. Identification of the variable (name) is based on these checks: * if standard_name is present, it should literally match the variable's `candidatevar:standard_name` attribute. * otherwise, if basename is present, the variable name should end with that basename. * also, the variable must have at least one netcdf dimension that is exactly the so-called *instance dimension* for the requested structure type. For example, our files contain: ``` netcdf mk1d2d_his { dimensions: time = UNLIMITED ; // (1 currently) // [..] gategens = 27 ; gategen_geom_nNodes = 54 ; variables: // [..] double gategen_crest_width(time, gategens) ; gategen_crest_width:long_name = "gate crest width (via general structure)" ; gategen_crest_width:units = "m" ; gategen_crest_width:coordinates = "gategen_name" ; gategen_crest_width:geometry = "gategen_geom" ; ``` Here, `gategensg` is the instance dimension for the "gate" structure type.

Describe alternatives you've considered
Packages such as xarray are still in scope for the data reading, but this issue/feature request is mainly about the implementation of CF-convention-based indexing/inquiry of the D-Flow FM his file contents.

Some further refinement of this issue may be needed, please contact me (@arthurvd) for discussion.

@arthurvd arthurvd added the type: enhancement Improvements to existing functionality label Jul 6, 2022
@arthurvd arthurvd added this to To do in HYDROLIB-core via automation Jul 6, 2022
@arthurvd arthurvd self-assigned this Jul 15, 2022
@priscavdsluis priscavdsluis moved this from To do to In progress in HYDROLIB-core Jul 18, 2022
@priscavdsluis priscavdsluis moved this from In progress to To do in HYDROLIB-core Jul 18, 2022
@priscavdsluis
Copy link
Contributor

priscavdsluis commented Jul 18, 2022

@arthurvd, thanks for setting some ideas!
I do have some questions/remarks:

  1. I think that instance_type is a little too generic, maybe we can use structure_type or location_type or is that not an accurate name?

  2. The history files that I am inspecting currently do not have a standard name for most variables. Could you give an example of the below situation. Why not query the full variable name?

otherwise, if basename is present, the variable name should end with that basename.

  1. I also notice that some variables have the same standard name. For example, both weirgen_s1up and weirgen_s1dn as well as orifice_s1up and orifice_s1dn have the same standard name: sea_surface_height. Is that correct, and how do we deal with that?

  2. Bonus question: do we have some documentation on the standard names of the variables?

@arthurvd arthurvd added this to the Release 0.4 milestone Sep 20, 2022
@priscavdsluis priscavdsluis moved this from To do to Reviewer follow up in HYDROLIB-core Sep 27, 2022
@arthurvd arthurvd moved this from Reviewer follow up to To do in HYDROLIB-core Sep 29, 2022
@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Oct 6, 2022

As a user, my main issue with history files is currently that .sel() is not available for stations/generalstructures. This is easy to work around with via this method, and maybe this is already good enough. However, it would be convenient to query via station name and e.g. also via a depth of -5m (arbitrary value). The code I am using currently is this:

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')

statname_str = 'station_name_str'

def convert_stat_names(ds):
    #convert station names to strings, potentially as new variable
    ds[statname_str] = ds['station_name'].str.decode('utf-8',errors='ignore').str.strip() #TODO: this is currently necessary for .isin() on station name. Maybe inplace on station_name instead (overwrite variable)
    if statname_str!='station_name':
        ds = ds.set_coords(statname_str)
    return ds

file_nc = r'p:\11202512-h2020_impaqt\07_Mediterranean_model\MedSea_impaqt_model\computations_final\r013_waq\DFM_OUTPUT_MedSea_impaqt_FM\MedSea_impaqt_FM_0000_his.nc'
data_xr = xr.open_mfdataset(file_nc,preprocess=convert_stat_names)
data_xr_stationlist_pd = data_xr[statname_str].to_series()

stations_list = ['MO_TS_MO_ATHOS','MO_TS_MO_LESVO','MO_TS_MO_SKYRO','IOC_thes','farm_impaqt','obs 14']
idx_stations = np.where(data_xr_stationlist_pd.isin(stations_list))[0]

print('plot waterlevel from his')
data_fromhis_xr = data_xr['waterlevel'].isel(stations=idx_stations) #TODO: would be nice if possible to use sel instead
fig, ax = plt.subplots()
data_fromhis_xr.plot.line(ax=ax,x='time')
ax.legend(data_fromhis_xr[statname_str].to_series()) #TODO: would be nice to not have to do this manually, currently default legend is 0 to 5

print('plot salinity from his')
data_fromhis_xr = data_xr['salinity'].isel(stations=idx_stations,laydim=20)
fig, ax = plt.subplots()
data_fromhis_xr.plot.line('-',ax=ax,x='time')
ax.legend(data_fromhis_xr[statname_str].to_series()) 

print('plot salinity over depth')
data_fromhis_xr = data_xr['salinity'].isel(stations=idx_stations).sel(time='2015-01-04')
fig, ax = plt.subplots()
data_fromhis_xr.T.plot.line('-',ax=ax,y='zcoordinate_c')
ax.legend(data_fromhis_xr[statname_str].to_series())

Edit: this code is currently outdated. Label based indexing works now, see comment below.

@veenstrajelmer
Copy link
Collaborator

Today I created a preprocess function for hisfiles called preprocess_hisnc(): https://github.com/openearth/dfm_tools/blob/master/dfm_tools/xarray_helpers.py#L12

It can be used as an xarray argument like this:
data_xr = xr.open_mfdataset(file_nc, preprocess=preprocess_hisnc)

Some extension might be needed, but I think this was the most important feature needed to nicely interact with hisfiles. I created an example script for reading/selecting/plotting hisfiles here:
https://github.com/openearth/dfm_tools/blob/master/tests/examples/postprocess_gethismodeldata.py

@arthurvd
Copy link
Member Author

arthurvd commented Nov 2, 2022

In response to @priscavdsluis:

  1. I think that instance_type is a little too generic, maybe we can use structure_type or location_type or is that not an accurate name?

I've updated the issue description to use object_type.

The following questions are put on hold until further notice, but I'll answer with my original idea in mind:

  1. The history files that I am inspecting currently do not have a standard name for most variables. Could you give an example of the below situation. Why not query the full variable name?

otherwise, if basename is present, the variable name should end with that basename.

The idea was to avoid to much hardcoding. Instead of querying 'weirgen_s1up' and 'uniweir_s1up', you would query for (ObjectType.weir, basename = 's1up') and (ObjectType.universalWeir, basename = 's1up').

  1. I also notice that some variables have the same standard name. For example, both weirgen_s1up and weirgen_s1dn as well as orifice_s1up and orifice_s1dn have the same standard name: sea_surface_height. Is that correct, and how do we deal with that?

You are right, the only way via our API is then to use a basename. Or do full hardcoding in an xarray dataset instead:

data_xr = xr.open_dataset(ncfile, ...)
data_xr.waterlevel.sel(stations=stations_requested)
data_xr.orifice_s1up.sel(orifice=orifices_requested)
  1. Bonus question: do we have some documentation on the standard names of the variables?

No, the user manual only lists the MDU settings and the long name.

@arthurvd arthurvd moved this from To do to In progress in HYDROLIB-core Nov 2, 2022
@priscavdsluis priscavdsluis moved this from In progress to To do in HYDROLIB-core Dec 15, 2022
@arthurvd
Copy link
Member Author

@rbruijnshkv: please check out the above example code by veenstrajelmer (Oct 6, 2022) #268 (comment) for example code that reads timeseries from his files.

@arthurvd arthurvd modified the milestones: Release 0.5, Release 1.0 Apr 4, 2023
@rhutten rhutten removed this from To do in HYDROLIB-core Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Improvements to existing functionality
Projects
None yet
Development

No branches or pull requests

4 participants