# Reading hydro hdf5 files from DSM2

This notebook is an example of using pydsm to read DSM2 h5 output.

The time series are loaded as pandas DataFrames with a datetime index and columns of variable type (e.g. flow, stage, ec). This is similar to the objects that pyhecdss returns when reading.

In addition to the state of the model as time series, the HDF file also contains the input tables as interpreted by DSM2. I say interpreted because it also has important tables such as the virtual cross-sections, which are the geometry finally used by DSM2 even though the user specifies the physical geometry in the input files.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import h5py
# main import 
from pydsm.hydroh5 import HydroH5
# Turn on ones below if in debug or development mode
#%load_ext autoreload
#%autoreload 2

## Opening an H5 file
Creating a HydroH5 object provides the handle to the HDF5 file.

In [2]:
filename='../../tests/data/historical_v82.h5'
hydro=HydroH5(filename)

# Hydro data file structure
DSM2 Hydro HDF5 stores data under three groups:
 * /hydro/data
 * /hydro/input
 * /hydro/geometry
 


## Display channels

The method get_channels() returns a data frame indexed by the internal channel index. The first column contains the external channel id that is referenced in the DSM2 input files.

In [3]:
hydro.get_channels()

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5
...,...
516,575
517,700
518,701
519,702


## Reservoirs
The reservoirs table lists the names of the reservoirs.

In [4]:
hydro.get_reservoirs()

Unnamed: 0,name
0,bethel
1,clifton_court
2,discovery_bay
3,franks_tract
4,liberty
5,mildred


## External Flows

These are the external flows defined in the input files. For example, all the boundary flow inputs, including the diversions, seepage and returns at nodes, are available from this table.


?Need reference to dsm2 docs here

In [5]:
hydro.get_qext()

Unnamed: 0,name,attach_obj_name,attached_obj_type,attached_obj_no
0,calaveras,21,2,20
1,cosumnes,446,2,428
2,moke,447,2,429
3,north_bay,273,2,402
4,sac,330,2,295
...,...,...,...,...
785,stockton,15,2,15
786,dicu_div_bbid,clifton_court,3,2
787,dicu_drain_bbid,clifton_court,3,2
788,dicu_seep_bbid,clifton_court,3,2


## Get Data Tables

These are the tables that contain time series data. Each table has a corresponding get_* method; those are described below.

In [6]:
hydro.get_data_tables()

['channel flow',
 'channel area',
 'channel stage',
 'channel avg area',
 'qext flow',
 'reservoir flow',
 'reservoir height',
 'transfer flow']

## Channel indices to numbers
The data in the datasets under /hydro/data is typically indexed by time, channel index and, if needed, upstream/downstream location.
The channel index can be mapped to the channel number by looking up that information in /hydro/geometry/channel_number.
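
The lookup described above can be sketched in plain Python. The dictionary below is a hypothetical stand-in for /hydro/geometry/channel_number, filled with a few of the index and number pairs that get_channels() displays in this notebook; with a real file the full array would come from the HDF5 dataset instead.

```python
# Hypothetical stand-in for /hydro/geometry/channel_number.
# Keys are internal channel indices, values are external channel numbers
# (pairs copied from the get_channels() output in this notebook).
number_of_index = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5,
                   516: 575, 517: 700, 518: 701, 519: 702}

# Invert the mapping to go from channel number to internal index.
index_of_number = {number: index for index, number in number_of_index.items()}

def channel_index(number):
    """Return the internal index for an external channel number (int or str)."""
    try:
        return index_of_number[int(number)]
    except KeyError:
        raise ValueError(f"unknown channel number: {number}")

print(channel_index(700))  # internal index of channel 700
print(channel_index("1"))  # string ids are accepted as well
```

The internal index, not the channel number, is what selects a slice along the channel axis of the data arrays.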

## Extracting time series data
Data can then be extracted using the channel numbers. All data arrays have time as the first axis. The start time and time interval are available in the attrs along with other metadata.

Flow data shape is *time* x *channel index* x *channel location*:

 * the start time is available in the attribute "START_TIME"
 * the mapping of channel index to channel number is explained above
 * the channel location (upstream/downstream) is available in /hydro/geometry/channel_location
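
As a minimal sketch of how a start time and a fixed interval define the time axis, the snippet below rebuilds the first few timestamps. The start time and the 30-minute step are assumed values that match the channel flow output in this notebook; how they are encoded in the HDF5 attributes is not shown here, and `build_time_axis` is a hypothetical helper, not part of pydsm.

```python
from datetime import datetime, timedelta

def build_time_axis(start, interval, n_steps):
    """Return the timestamp of each row along the time axis."""
    return [start + i * interval for i in range(n_steps)]

# Assumed values, chosen to match the output displayed in this notebook.
start = datetime(1990, 1, 2, 0, 0)
interval = timedelta(minutes=30)

times = build_time_axis(start, interval, n_steps=5)
for t in times:
    print(t)
```

Row `i` of a data array then corresponds to `start + i * interval`.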

### get_* methods

Each of the data tables has a corresponding get_* method.
For example, to get the channel flow data use the method shown below.

The time window is an optional argument that retrieves only part of the time series.

In [7]:
up1 = hydro.get_channel_flow('1','upstream')
down1 = hydro.get_channel_flow('1','downstream')
pd.concat([up1,down1],axis=1)

Unnamed: 0,1-upstream,1-downstream
1990-01-02 00:00:00,2600.000000,2400.000000
1990-01-02 00:30:00,207.439545,1069.039551
1990-01-02 01:00:00,1274.626099,1000.641602
1990-01-02 01:30:00,1156.880615,1157.169312
1990-01-02 02:00:00,1167.104126,1245.222656
...,...,...
1990-01-30 22:00:00,1208.349243,1205.972046
1990-01-30 22:30:00,1208.321655,1203.835205
1990-01-30 23:00:00,1208.271362,1201.660400
1990-01-30 23:30:00,1208.193848,1200.023438


Use the timewindow argument to retrieve only part of the time series

In [8]:
# downstream flow for channel 2, restricted to a time window
down2 = hydro.get_channel_flow(2,'downstream','05JAN1990 0000 - 07JAN1990 0445')
down2

Unnamed: 0,2-downstream
1990-01-05 00:00:00,1172.085571
1990-01-05 00:30:00,1170.742676
1990-01-05 01:00:00,1169.590088
1990-01-05 01:30:00,1168.591675
1990-01-05 02:00:00,1167.637939
...,...
1990-01-07 02:00:00,1211.994995
1990-01-07 02:30:00,1210.896851
1990-01-07 03:00:00,1209.269531
1990-01-07 03:30:00,1207.549438


In [9]:
hydro.get_channel_stage(1,'upstream','08JAN1990 - 10JAN1990')

Unnamed: 0,1-upstream
1990-01-08 00:00:00,7.170351
1990-01-08 00:30:00,7.165919
1990-01-08 01:00:00,7.156087
1990-01-08 01:30:00,7.148658
1990-01-08 02:00:00,7.142569
...,...
1990-01-09 21:30:00,7.093923
1990-01-09 22:00:00,7.094852
1990-01-09 22:30:00,7.095624
1990-01-09 23:00:00,7.096228


## Hydro Input Tables
The hydro .h5 file contains many (though not all) of the input tables (*.inp). A complete listing of those tables can be read from the echo files. See this [notebook to read input](dsm2_read_input.ipynb).

In [10]:
hydro.get_input_tables()

['/hydro/input/boundary_flow',
 '/hydro/input/boundary_stage',
 '/hydro/input/channel',
 '/hydro/input/channel_ic',
 '/hydro/input/envvar',
 '/hydro/input/gate',
 '/hydro/input/gate_pipe_device',
 '/hydro/input/gate_weir_device',
 '/hydro/input/input_gate',
 '/hydro/input/input_transfer_flow',
 '/hydro/input/io_file',
 '/hydro/input/layers',
 '/hydro/input/operating_rule',
 '/hydro/input/oprule_expression',
 '/hydro/input/oprule_time_series',
 '/hydro/input/output_channel',
 '/hydro/input/output_gate',
 '/hydro/input/output_reservoir',
 '/hydro/input/reservoir',
 '/hydro/input/reservoir_connection',
 '/hydro/input/reservoir_ic',
 '/hydro/input/reservoir_vol',
 '/hydro/input/scalar',
 '/hydro/input/source_flow',
 '/hydro/input/source_flow_reservoir',
 '/hydro/input/transfer',
 '/hydro/input/xsect',
 '/hydro/input/xsect_layer']

To read the contents of any of the above tables, simply use the get_input_table method.

In [11]:
hydro.get_input_table('/hydro/input/channel')

Unnamed: 0,chan_no,length,manning,dispersion,upnode,downnode
0,1,19500,0.035,360.0,1,2
1,2,14000,0.028,360.0,2,3
2,3,13000,0.028,360.0,3,4
3,4,14050,0.028,360.0,4,5
4,5,12350,0.028,360.0,5,6
...,...,...,...,...,...,...
516,575,13000,0.022,1800.0,328,357
517,700,10000,0.033,360.0,700,330
518,701,10000,0.033,360.0,701,700
519,702,10000,0.033,360.0,702,701
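
The upnode and downnode columns describe how channels connect. As a rough sketch, the snippet below uses a few rows copied from the channel table above to find the channels that begin where a given channel ends. The helper names are hypothetical, and only the sampled rows are searched, so results for a real model require the full table.

```python
# A few rows copied from the channel table above: (chan_no, upnode, downnode).
sample_channels = [
    (1, 1, 2),
    (2, 2, 3),
    (3, 3, 4),
    (4, 4, 5),
    (5, 5, 6),
]

def channels_leaving_node(node, channels):
    """Return the channel numbers whose upstream end is the given node."""
    return [chan_no for chan_no, upnode, _ in channels if upnode == node]

def downstream_neighbors(chan_no, channels):
    """Return the channels that start at the downstream node of chan_no."""
    downnode_of = {c: down for c, _, down in channels}
    return channels_leaving_node(downnode_of[chan_no], channels)

print(downstream_neighbors(1, sample_channels))
```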


## Hydro geometry input
Hydro also contains geometry information, such as the mapping of internal channel ids to external ones.

In [12]:
hydro.get_geometry_tables()

['/hydro/geometry/channel_bottom',
 '/hydro/geometry/channel_location',
 '/hydro/geometry/channel_number',
 '/hydro/geometry/external_flow_names',
 '/hydro/geometry/hydro_comp_point',
 '/hydro/geometry/node_flow_connections',
 '/hydro/geometry/qext',
 '/hydro/geometry/reservoir_flow_connections',
 '/hydro/geometry/reservoir_names',
 '/hydro/geometry/reservoir_node_connect',
 '/hydro/geometry/stage_boundaries',
 '/hydro/geometry/transfer_names']

Channel bottom elevations are needed especially when looking at channel stage. They have to be used in conjunction with the stage to calculate water depths.

In [13]:
channels=['1','331','441']
hydro.get_channel_bottom(channels)

Unnamed: 0,upstream,downstream
1,3.502402,2.509
331,-13.584,-15.414
441,-69.570999,-52.325001
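
A minimal sketch of the depth calculation, assuming that stage and channel bottom are reported in the same units and against the same vertical datum. The numbers are copied from this notebook (the first stage value shown for 1-upstream and the upstream bottom of channel 1); `depth_from_stage` is a hypothetical helper.

```python
def depth_from_stage(stage, bottom_elevation):
    """Water depth = stage (water surface elevation) minus channel bottom elevation."""
    return stage - bottom_elevation

# Values copied from this notebook: channel 1, upstream end.
stage_upstream = 7.170351    # first stage value shown for 1-upstream
bottom_upstream = 3.502402   # upstream bottom elevation of channel 1

depth = depth_from_stage(stage_upstream, bottom_upstream)
print(round(depth, 6))
```

Because the function only subtracts, it also works when `stage` is a pandas Series or DataFrame of stage values and `bottom_elevation` is a single number.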


Hydro does its computations at certain points along each channel, and those are available from the table below.

In [14]:
hydro.get_geometry_table('/hydro/geometry/hydro_comp_point')

Unnamed: 0,comp_index,channel,distance
0,1,1,0.0
1,2,1,6500.0
2,3,1,13000.0
3,4,1,19500.0
4,5,2,0.0
...,...,...,...
1218,1219,520,5000.0
1219,1220,520,10000.0
1220,1221,521,0.0
1221,1222,521,5000.0
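
To see how the computational points are laid out, the rows can be grouped by channel and the distance between consecutive points computed. The sketch below does this for the first rows of the table above; the helper names are hypothetical, and the channel column is used exactly as listed in the table.

```python
from collections import defaultdict

# Rows copied from the table above: (comp_index, channel, distance).
sample_points = [
    (1, 1, 0.0),
    (2, 1, 6500.0),
    (3, 1, 13000.0),
    (4, 1, 19500.0),
    (5, 2, 0.0),
]

def distances_by_channel(points):
    """Group the point distances by channel, sorted from upstream to downstream."""
    grouped = defaultdict(list)
    for _, channel, distance in points:
        grouped[channel].append(distance)
    return {channel: sorted(values) for channel, values in grouped.items()}

def spacings(distances):
    """Return the distance between each pair of consecutive points."""
    return [b - a for a, b in zip(distances, distances[1:])]

grouped = distances_by_channel(sample_points)
print(spacings(grouped[1]))
```

For the first channel the last point sits at 19500.0, which matches the length listed for channel 1 in the channel input table.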
