# Timestamps, periods and the HEC convention

HEC (and perhaps others) timestamps period data at the end of the period: the value stamped at a given time covers the interval from the previous timestamp up to and including that time.

pandas handles this by distinguishing a timestamp (a point in time) from a period (a span of time). Furthermore, a pandas period is closed at its start by default, whereas in HEC it is closed at its end.
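
A small sketch of the timestamp versus period distinction, using only core pandas calls. The dates are arbitrary illustrations and are not taken from any HEC data set.

```python
import pandas as pd

# A timestamp is a single instant.
stamp = pd.Timestamp("2000-01-01 00:00")

# A period is a span of time; a daily period covers the whole day.
day = pd.Period("2000-01-01", freq="D")

print(day.start_time)  # first instant of the span
print(day.end_time)    # last instant of the span, just before the next midnight

# The period is labelled by its start, so the next midnight
# already belongs to the following period.
next_midnight = pd.Timestamp("2000-01-02 00:00")
print(next_midnight.to_period("D"))
```

Under the HEC convention, by contrast, a value stamped at that next midnight would belong to the day that just ended.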

## Time formats (standards and military conventions)

ISO 8601 is a standard for dates and times. See its sections on [durations](https://en.wikipedia.org/wiki/ISO_8601#Durations) and [time intervals](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals).

An important distinction is that the current [ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601#Times) has no 2400 in its HHmm (hour/minute) representation, whereas HEC-DSS stores midnight with a 2400 designation, and its libraries parse that timestamp as midnight of the next day. In summary, 2400 is interpreted as midnight (0000) of the next day.

In fact, 24:00 is now explicitly disallowed by ISO 8601:
```
Earlier versions of the standard allowed "24:00" corresponding to the end of a day, 
but this is explicitly disallowed by the 2019 revision.
```

Military time allows 2400, which is interpreted as 0000 of the next day. See this [Wikipedia reference](https://en.wikipedia.org/wiki/24-hour_clock#Midnight_00:00_and_24:00).
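
The 2400 rule can be sketched with the standard library alone. The helper name `parse_hec_time` is hypothetical and is not part of pyhecdss or HEC-DSS; it assumes the `DDMMMYYYY HHMM` layout used in this notebook and English month abbreviations.

```python
from datetime import datetime, timedelta


def parse_hec_time(text):
    """Parse 'DDMMMYYYY HHMM', treating 2400 as 0000 of the next day.

    Hypothetical helper for illustration only; not a pyhecdss function.
    """
    date_part, time_part = text.split()
    # %b expects an English month abbreviation such as 'Jan' (locale assumption)
    day = datetime.strptime(date_part.title(), "%d%b%Y")
    if time_part == "2400":
        # End of this day is the same instant as the start of the next day
        return day + timedelta(days=1)
    hours, minutes = int(time_part[:2]), int(time_part[2:])
    return day + timedelta(hours=hours, minutes=minutes)


print(parse_hec_time("31JAN1991 2400"))
print(parse_hec_time("01JAN1999 0100"))
```

Adding a full day to the parsed date, rather than passing hour 24 to `datetime`, is needed because `datetime` only accepts hours 0 to 23.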

## Period and Instantaneous data types

HEC-DSS reads the data, its units and a string giving the data type, which starts with "INST" for instantaneous data or "PER" for period data.

A regularly sampled time series in HEC is either INST-VAL (timestamped) or PER-AVER (or another PER- prefixed type), which designates period data.

pyhecdss reads "INST" data as is into timestamp-indexed data.

However, for "PER" data the timestamp is moved to the left (backwards) by the length of the period, and the data returned is period-indexed. For example, 31JAN1991 2400 for monthly data is interpreted as the period 01JAN1991: the 31JAN1991 2400 timestamp is first moved back by one month to 31DEC1990 2400, which is the same instant as 01JAN1991 0000, and that timestamp is then converted to a period with a 1 month interval.
The reverse is applied to period data on writes.
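
The monthly example can be traced step by step in plain pandas. This is only a sketch of the arithmetic described above, not the code pyhecdss runs internally.

```python
import pandas as pd

# 31JAN1991 2400 is the same instant as 01FEB1991 0000
end_stamp = pd.Timestamp("1991-02-01 00:00")

# Move the timestamp back by the length of the period (1 month)
start_stamp = end_stamp - pd.DateOffset(months=1)
print(start_stamp)

# Convert the start timestamp to a monthly period
period = start_stamp.to_period("M")
print(period)

# Reverse direction (as on writes): the start of the following
# period is the end-of-period timestamp we began with
restored = (period + 1).start_time
print(restored)
```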

## Period operations

If you use the default pandas conventions, each period includes its starting timestamp and not its ending one. For HEC-style (end-of-period timestamped) data, pass closed="right" when [resampling (pandas function)](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling) the data.

One catch is that you need to use kind="timestamp" in the resample call and then convert the result back with to_period() (pandas function).
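
The effect of closed="right" can be seen by comparing it with the pandas default on an hourly series like the one used later in this notebook. This sketch uses plain pandas only; it sets label="left" explicitly and calls to_period("D") afterwards instead of passing kind="timestamp", which is an assumption about an equivalent spelling rather than the call pydsm makes.

```python
import numpy as np
import pandas as pd

# Hourly values 0..99, first stamp at 01:00 (end-of-hour stamping).
# A Timedelta frequency sidesteps alias differences between pandas versions.
index = pd.date_range("2000-01-01 01:00", periods=100, freq=pd.Timedelta(hours=1))
ts = pd.DataFrame({"values": np.arange(100.0)}, index=index)

# pandas default: each day is [00:00, 24:00), so the value stamped at
# the next midnight falls into the following day
default_avg = ts.resample("1D").mean()

# HEC style: each day is (00:00, 24:00], so the midnight value closes the day
hec_avg = ts.resample("1D", closed="right", label="left").mean().to_period("D")

print(default_avg.head(2))
print(hec_avg.head(2))
```

With the default, the first day averages the values 0 to 22; with closed="right" it averages 0 to 23.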

**TLDR; just use the pydsm.functions tsmath module which has functions for per_aver, per_max, per_min**

In [1]:
import pyhecdss
import pandas as pd
import numpy as np
import pydsm
from pydsm.functions import tsmath

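First compare a timestamp index with a period index at the same 15 minute frequency. Then define a function that generates a simple increasing time series from 0 to 99 with timestamps regularly placed at 1 hour intervals.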

In [8]:
pd.date_range('01JAN1999 0100',periods=4,freq='15T')

DatetimeIndex(['1999-01-01 01:00:00', '1999-01-01 01:15:00',
               '1999-01-01 01:30:00', '1999-01-01 01:45:00'],
              dtype='datetime64[ns]', freq='15T')

In [9]:
pd.period_range('01JAN1999 0100',periods=4,freq='15T')

PeriodIndex(['1999-01-01 01:00', '1999-01-01 01:15', '1999-01-01 01:30',
             '1999-01-01 01:45'],
            dtype='period[15T]')

In [2]:
def linear_timeseries():
    '''
    A simple increasing time series to use for averaging functions
    ```
    01JAN2000 0100 - 0
    01JAN2000 0200 - 1
    ....
    01JAN2000 2200 - 21
    01JAN2000 2300 - 22
    02JAN2000 0000 - 23
    ```
    '''
    nvals=100
    return pd.DataFrame(np.arange(0,nvals), columns=['values'], index=pd.date_range(start='01JAN2000 0100',periods=nvals,freq='H'))


In [13]:
ts=linear_timeseries()
ts.iloc[0:24]

Unnamed: 0,values
2000-01-01 01:00:00,0
2000-01-01 02:00:00,1
2000-01-01 03:00:00,2
2000-01-01 04:00:00,3
2000-01-01 05:00:00,4
2000-01-01 06:00:00,5
2000-01-01 07:00:00,6
2000-01-01 08:00:00,7
2000-01-01 09:00:00,8
2000-01-01 10:00:00,9


Use the tsmath module's per_aver, per_max or per_min functions to get the average, max and min respectively in the HEC convention.
The interval is passed to pandas [resample](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling) and should be a DateOffset, Timedelta or str.

In [14]:
dfall = pd.concat([tsmath.per_aver(ts.to_period(), interval='D'),
           tsmath.per_max(ts, interval='D'),
           tsmath.per_min(ts, interval='D')], axis=1)
dfall.columns=['avg','max','min']
dfall

Unnamed: 0,avg,max,min
2000-01-01,11.5,23,0
2000-01-02,35.5,47,24
2000-01-03,59.5,71,48
2000-01-04,83.5,95,72
2000-01-05,97.5,99,96


Storing these time series in DSS and retrieving them is demonstrated below.

The HEC-DSS API only supports the double type, so convert the values to doubles first.

In [19]:
ts=ts.astype('double')

In [20]:
import os
# Remove any earlier copy of the file; os.remove raises PermissionError
# (WinError 32 on Windows) if another process still has the file open
if os.path.exists('hecavg.dss'):
    os.remove('hecavg.dss')
with pyhecdss.DSSFile('hecavg.dss',create_new=True) as d:
    d.write_rts('/AVG/TS/C//1HOUR/F/',ts,'unk','INST-VAL')

In [None]:
ts_avg = tsmath.per_aver(ts,'1D')
ts_max = tsmath.per_max(ts,'1D')
ts_min = tsmath.per_min(ts, '1D')

In [21]:
with pyhecdss.DSSFile('hecavg.dss') as d:
    d.write_rts('/AVG/TS/C-PER-AVER//1DAY/F/',ts_avg,'unk','PER-AVER')
    d.write_rts('/AVG/TS/C-PER-MAX//1DAY/F/',ts_max,'unk','PER-AVER')
    d.write_rts('/AVG/TS/C-PER-MIN//1DAY/F/',ts_min,'unk','PER-AVER')

In [22]:
matching = pyhecdss.get_ts('hecavg.dss','/AVG/TS/C-PER-AVER//1DAY/F/')
ts_avg_read=next(matching)
print('Units: ',ts_avg_read.units, ' | Period Type: ',ts_avg_read.period_type)
ts_avg_read.data

Units:  unk  | Period Type:  PER-AVER


Unnamed: 0,/AVG/TS/C-PER-AVER/01JAN2000 - 01JAN2000/1DAY/F/
2000-01-01,11.5
2000-01-02,35.5
2000-01-03,59.5
2000-01-04,83.5
2000-01-05,97.5


In [23]:
pd.concat([ts_avg,ts_avg_read.data],axis=1)

Unnamed: 0,values,/AVG/TS/C-PER-AVER/01JAN2000 - 01JAN2000/1DAY/F/
2000-01-01,11.5,11.5
2000-01-02,35.5,35.5
2000-01-03,59.5,59.5
2000-01-04,83.5,83.5
2000-01-05,97.5,97.5


A slightly more detailed explanation is below.
The [resample](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling) function is used with particular options to conform to the HEC convention.

For example, as shown below, you can use the other grouping functions with resample to do other operations with these conventions.
The resample_hec_style() function is a convenience for calling resample with the HEC conventions.

In [24]:
ts.resample('1D', closed='right',kind='timestamp').sum().to_period()
tsmath.resample_hec_style(ts,'1D').sum().to_period()

Unnamed: 0,values
2000-01-01,276.0
2000-01-02,852.0
2000-01-03,1428.0
2000-01-04,2004.0
2000-01-05,390.0


In [28]:
tsmath.resample_hec_style(ts,'4H').sum().to_period()

Unnamed: 0,values
2000-01-01 00:00,6.0
2000-01-01 04:00,22.0
2000-01-01 08:00,38.0
2000-01-01 12:00,54.0
2000-01-01 16:00,70.0
2000-01-01 20:00,86.0
2000-01-02 00:00,102.0
2000-01-02 04:00,118.0
2000-01-02 08:00,134.0
2000-01-02 12:00,150.0


In [27]:
ts

Unnamed: 0,values
2000-01-01 01:00:00,0.0
2000-01-01 02:00:00,1.0
2000-01-01 03:00:00,2.0
2000-01-01 04:00:00,3.0
2000-01-01 05:00:00,4.0
...,...
2000-01-05 00:00:00,95.0
2000-01-05 01:00:00,96.0
2000-01-05 02:00:00,97.0
2000-01-05 03:00:00,98.0
