# Reading hydro hdf5 files from DSM2

This notebook is an example of using pydsm to read DSM2 h5 output.

The time series are loaded as pandas DataFrames with a datetime index and columns of variable type (e.g. flow, stage, ec). This is similar to the objects returned by pyhecdss reads.

In addition to the model state as time series, the HDF file also contains the input tables as interpreted by DSM2. I say interpreted because it also holds important derived tables, such as the virtual cross-sections, which are the geometry DSM2 finally uses even though the user specifies the physical geometry in the input files.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import h5py
import pydsm.io
#%conda install matplotlib
# Turn on ones below if in debug or development mode
#%load_ext autoreload
#%autoreload 2

## Opening an H5 file
This provides the handle to the HDF5 file. 

In [None]:
filename='../../tests/historical_v82.h5'
h5f=h5py.File(filename,'r')

## Reading the H5 file with h5py

An HDF5 file consists of Groups and Datasets. 
Groups are like dicts with keys and values, and Datasets are like arrays with some slicing abilities.

An HDF5 file has a concept of path, similar to a file path.
For example, the hydro h5 (HDF) file has 'hydro' as its topmost Group.
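
As a minimal sketch of the path idea, the cell below builds a tiny in-memory HDF5 file and shows that walking Group by Group reaches the same Dataset as a single path lookup. The file name `toy_hydro.h5` and the values stored in it are made up for illustration; only the `/hydro/geometry/channel_number` layout mirrors the DSM2 file.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File('toy_hydro.h5', 'w', driver='core', backing_store=False) as f:
    # Intermediate groups ('hydro', 'geometry') are created automatically.
    f.create_dataset('/hydro/geometry/channel_number', data=np.array([1, 2, 4]))

    # Walk down one Group at a time, dict style ...
    by_steps = f.get('hydro').get('geometry').get('channel_number')
    # ... or address the Dataset directly by its path.
    by_path = f['/hydro/geometry/channel_number']

    by_steps_name = by_steps.name
    by_path_name = by_path.name
    top_level = list(f.keys())
    values = by_path[:].tolist()  # slicing a Dataset returns a numpy array

print(top_level, by_steps_name, values)
```

Either style works on the DSM2 file; the path form is shorter once the layout is known.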

In [None]:
print('Topmost group of hydro h5 file: \n',list(h5f.keys()))

In [None]:
print('Children of hydro group:\n',list(h5f.get('hydro').keys()))
print('DSM2 Input Tables are children of input:\n',list(h5f.get('hydro').get('input').keys()))
print('DSM2 geometry are children of geometry: \n', list(h5f.get('hydro').get('geometry').keys()))
print('DSM2 time series output of state are children of data: \n', list(h5f.get('hydro').get('data').keys()))

In [None]:
bf=h5f.get('hydro').get('input').get('boundary_flow')
pd.DataFrame(bf[:])

In [None]:
input_tables=pydsm.io.list_groups_as_df(filename, '/hydro/input')
for table in input_tables[0]:
    path='/hydro/input/'+str(table)
    print(path)
    display(pydsm.io.read_table_as_df(filename, path))

In [None]:
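# '/input' does not exist at the top level (get returns None); the input tables live under '/hydro/input'
print(h5f.get('/hydro/input'))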

# Hydro data file structure
DSM2 Hydro HDF5 stores data under three groups:
 * /hydro/data
 * /hydro/input
 * /hydro/geometry
 
The next cell prints out the tables available under each group.

In [None]:
group_paths=['/hydro/input','/hydro/data','/hydro/geometry']
for path in group_paths:
    print(path)
    for key in h5f.get(path).keys():
        print('    ',key)

## Channel indices to numbers
The data in Datasets under /hydro/data is typically indexed by time, channel index and, where needed, upstream/downstream location.
The channel index can be mapped to the channel number by looking it up in /hydro/geometry/channel_number.
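
When several channels are read at once, the lookup has one extra wrinkle: h5py documents that index lists used for selection must be in increasing order. The sketch below uses made-up channel numbers and a plain numpy array as a stand-in for a Dataset; it sorts the indices before reading and then restores the requested column order.

```python
import numpy as np

# Made-up channel numbers; the position in the array is the channel index.
channel_numbers = np.array([1, 2, 4, 441, 169])
number2index = {int(num): idx for idx, num in enumerate(channel_numbers)}

wanted = [441, 2, 169]                                  # requested channel numbers
indices = np.array([number2index[n] for n in wanted])   # channel indices, unsorted
order = np.argsort(indices)                             # permutation that sorts them
sorted_indices = indices[order]                         # increasing, safe for h5py

# Stand-in for a (time x channel) Dataset: 2 time steps, 5 channels.
data = np.arange(10).reshape(2, 5)
block = data[:, sorted_indices]                         # read with increasing indices

# Undo the sort so the columns follow the order of `wanted`.
restored = block[:, np.argsort(order)]
print(restored)
```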

In [None]:
channel_numbers=pd.DataFrame(h5f.get('/hydro/geometry/channel_number')[:])
print(channel_numbers)
channel_index2number=channel_numbers[0].to_dict()
index=157
print('The channel number for index:',index, ' should be 169. It is ',channel_index2number[index])
channel_number2index= {value: key for key, value in channel_index2number.items()}
print('The channel index for number:', 169, ' should be ',index,'. It is ',channel_number2index[169])


In [None]:
# np.str was only an alias for the builtin str and has been removed from recent NumPy releases
channel_location=pd.DataFrame(h5f.get('/hydro/geometry/channel_location')[:],dtype=str)
display(channel_location)


## Extracting time series data
Extracting data can then be done using the channel numbers. All data arrays have time as the first axis. The start time and time interval are available in the attrs along with other metadata.

Flow data shape is *time* x *channel index* x *channel location*

 * the start time is available in the attribute "start_time"
 * the mapping from channel index to channel number is explained above
 * the channel location (upstream/downstream) is available in /hydro/geometry/channel_location
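
To make the layout concrete, here is a small sketch on synthetic data. It is not pydsm's implementation: the array values, the lookup lists and the helper `time_window_to_slice` are all hypothetical. It picks one channel and location out of a *time* x *channel index* x *channel location* array, rebuilds the time axis from a start time and interval, and turns a time window into a row slice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a flow Dataset: 4 time steps, 3 channels, 2 locations.
flow = np.arange(4 * 3 * 2, dtype=np.float32).reshape(4, 3, 2)

# Hypothetical lookup tables mirroring channel_number and channel_location.
channel_numbers = [1, 2, 441]
channel_locations = ['upstream', 'downstream']
channel_index = channel_numbers.index(441)
location_index = channel_locations.index('upstream')

# Time axis rebuilt from a start time and a fixed interval.
start_time = pd.Timestamp('1990-01-02 00:00')
interval = pd.Timedelta('30min')
times = pd.date_range(start=start_time, periods=flow.shape[0], freq=interval)

series = pd.Series(flow[:, channel_index, location_index],
                   index=times, name='441-UPSTREAM')


def time_window_to_slice(window_start, window_end, start_time, interval, nsteps):
    """Rows whose time stamps fall inside [window_start, window_end]."""
    first = int(np.ceil((window_start - start_time) / interval))
    last = int(np.floor((window_end - start_time) / interval)) + 1
    first = min(max(first, 0), nsteps)      # clip to the available rows
    last = min(max(last, first), nsteps)
    return slice(first, last)


window = time_window_to_slice(pd.Timestamp('1990-01-02 00:30'),
                              pd.Timestamp('1990-01-02 01:00'),
                              start_time, interval, flow.shape[0])
print(series.iloc[window])
```

Computing the row slice first means only the requested window has to be read from a large Dataset, rather than loading the whole time axis and filtering afterwards.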

In [None]:
channel_location[0].str.upper()

In [None]:
flowdata = h5f.get('/hydro/data/channel flow')
print(flowdata.shape)
#
interval_string=flowdata.attrs['interval'][0].decode('UTF-8')
model=flowdata.attrs['model'][0].decode('UTF-8')
model_version=flowdata.attrs['model_version'][0].decode('UTF-8')
start_time=pd.to_datetime(flowdata.attrs['start_time'][0].decode('UTF-8'))
print('Start time: ',start_time)
print('time interval: ',interval_string)
print('Model: ',model)
print('Model Version: ',model_version)
#
print('Slicing along time for channel number: 441')
channel_id=441
location='UPSTREAM'
channel_index=channel_number2index[channel_id] # dict lookup; filtering channel_numbers[channel_numbers[0]==441] is the slow way
# Take the row label of the matching location so it can be used as an integer index
location_index=channel_location.index[channel_location[0].str.upper()==location][0]
darr=flowdata[:,channel_index,location_index]
ts441=pd.DataFrame(darr,
                   columns=[str(channel_id)+'-'+location],
                   index=pd.date_range(start_time,freq='30min',periods=darr.shape[0]),dtype=np.float32)

In [None]:
ts441['01jan1990':'10jan1990'].plot()

In [None]:
for key in flowdata.attrs.keys():
    print(key, flowdata.attrs[key])

In [None]:
pd.DataFrame(h5f['/hydro/input/boundary_flow'][:],dtype=str)

In [None]:
import pydsm.io
x=pydsm.io.read_table_as_df(filename,'/hydro/input/boundary_flow')
display(x)


In [None]:
cb=pydsm.io.read_table_as_df(filename,'/hydro/geometry/channel_bottom')
display(cb)
print('Channel Bottom for Channel Number: ',441)
print(cb[channel_number2index[441]])

In [None]:
f5=h5py.File(filename,'r')
catable=f5['/hydro/data/channel area']

In [None]:
pydsm.io.read_table_attr(filename,'/hydro/data/channel area' )

In [None]:
table_metadata=pydsm.io.read_table_attr(filename,'/hydro/data/channel area')
display(table_metadata)
pd.to_timedelta(str(table_metadata['interval'].astype(str)[0]))
pd.to_datetime(table_metadata['start_time'].astype(str)[0])


In [None]:
class TableMetaData:
    pass
tmd=TableMetaData()
tmd.table_name='/hydro/data/channel area'
tmd.interval=pd.to_timedelta('30min')
tmd.start_time=pd.to_datetime('1990-01-02 00:00:00')
tmd.dimension_labels=table_metadata['DIMENSION_LABELS'].astype('str')
from ast import literal_eval as make_tuple
tmd.shape=make_tuple(table_metadata['shape'])
print(tmd.shape)


In [None]:
s=pydsm.io._convert_time_to_table_slice("01jan1980","01jan1991",tmd.interval,tmd.start_time,tmd.shape[0])
print(s)
x=catable[s,[501,502],0]
#pydsm.io.read_table_as_df(filename,"/hydro/data/channel area",s)

In [None]:
# One time stamp per row actually read, so the index length always matches the data
bf=pd.DataFrame(data=np.array(x),
                index=pd.date_range(start=tmd.start_time+tmd.interval,freq=tmd.interval,periods=x.shape[0]))

In [None]:
catable

In [None]:
bf.plot()

In [None]:
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/external_flow_names'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/hydro_comp_point'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/node_flow_connections'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/qext'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/reservoir_flow_connections'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/reservoir_names'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/reservoir_node_connect'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/stage_boundaries'))
display(pydsm.io.read_table_as_df(filename,'/hydro/geometry/transfer_names'))


In [None]:
pd.DataFrame(h5f.get('/hydro/geometry/channel_location')[:],dtype=str)

In [None]:
%load_ext autoreload
%autoreload 2


In [None]:
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [15, 5]

In [None]:
import pydsm.hydroh5

In [None]:
hydroh5=pydsm.hydroh5.HydroH5(filename)

In [None]:
flow4up=hydroh5.get_channel_flow('4','upstream')
flow8down=hydroh5.get_channel_flow('8','downstream')
ax1=flow4up.plot()
flow8down.plot(ax=ax1)

In [None]:
area4up=hydroh5.get_channel_area('4','upstream')
area8down=hydroh5.get_channel_area('8','downstream')
ax1=area4up.plot()
area8down.plot(ax=ax1)

In [None]:
vel4up=(flow4up/area4up)
vel8down=(flow8down/area8down)
ax1=vel4up.plot()
vel8down.plot(ax=ax1)

In [None]:
vel4up.index.freqstr