Generic insert peak function #2

Open
narlesky opened this issue Dec 6, 2021 · 8 comments

narlesky commented Dec 6, 2021

Hi @danhamill thanks for sharing this repo! It's great to see this algorithm implemented in Python -- so much more accessible!!

As I was reading through the peak insertion logic, I had a question about the calculation for how the daily volume is reapportioned when a new peak is added. Does this make sense as a way to express what you've written? I was curious about the origin of the 0.9 factor -- is that an approximation of 22 / 24? Thanks in advance! - Carly

import pandas as pd

def insert_peak_generic(daily_accumulation, hourly_accumulation, date, value, at_hour=None):
  """
  Accept daily_accumulation, hourly_accumulation, date and value. Insert
  hourly peak flow of magnitude value at <at_hour> of date. Remaining daily
  volume is proportionally divided between the preceding and following
  portions of the remaining day. Return new hourly_accumulation.

  Arguments:
    daily_accumulation (pandas.Series): daily accumulation indexed by pandas.Period (daily)
    hourly_accumulation (pandas.Series): hourly accumulation indexed by pandas.Period (hourly)
    date (pandas.Period): date of tabulated peak value
    value (int64?): tabulated peak value at specified date
    at_hour (int): hour of day to insert peak, 0 (12am) to 23 (11pm)
  Returns:
    hourly_accumulation (pandas.Series): updated with the inserted peak
  """
  beginning_daily_sum = daily_accumulation[date]
  ending_daily_sum = daily_accumulation[date + 1]
  daily_flow = ending_daily_sum - beginning_daily_sum
  start_hour = pd.Period(freq="H", year=date.year, month=date.month, day=date.day, hour=at_hour)
  end_hour = start_hour + 1
  # end_hour = pd.Period(freq="H", year=date.year, month=date.month, day=date.day, hour=at_hour + 1)

  # hourly average flow from daily peak value
  value_hourly = value / 24.0

  # 11 am: Insert hourly peak flow of magnitude value at noon of date. Remaining daily
  # volume proportionally divided between preceding and following portions of remaining day.
  if at_hour == 11:
    reapportion = (daily_flow - value_hourly) * 11 / 23

  # 10 pm: Insert hourly peak flow of magnitude value at 11 pm of date. Remaining daily
  # volume applied to first 23 hours of day.
  elif at_hour == 22:
    reapportion = daily_flow - value_hourly - value_hourly * 0.9

  # 11 pm: Insert hourly peak flow of magnitude value at midnight of date. Remaining daily
  # volume applied to first 23 hours of day.
  elif at_hour == 23:
    reapportion = daily_flow - value_hourly - value_hourly

  # 12 am: Insert hourly peak flow of magnitude value at 0000 of date. Remaining daily
  # volume applied to remaining 23 hours of day.
  elif at_hour == 0:
    reapportion = 0 # + (daily_flow - value_hourly)

  # 1 am: Insert hourly peak flow of magnitude value at 0100 of date. Remaining daily
  # volume applied to remaining 23 hours of day.
  elif at_hour == 1:
    reapportion = value_hourly * 0.9 # + (daily_flow - value_hourly)

  else:
    raise NotImplementedError(f'insert_peak_generic not implemented for at_hour={at_hour}')

  # update hourly_accumulation series values
  hourly_accumulation[start_hour] = beginning_daily_sum + reapportion
  hourly_accumulation[end_hour] = hourly_accumulation[start_hour] + value_hourly
  return hourly_accumulation
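
As a quick sanity check on the 22/24 guess, here are the two constants side by side:

# Comparing the 0.9 factor against 22/24
print(22 / 24)   # 0.9166..., close to but not exactly 0.9
print(0.9 * 24)  # 21.6, i.e. 0.9 corresponds to 21.6 of 24 hours
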
@danhamill

@narlesky That is a good question. I asked Rob and he also could not remember why he used a value of 0.9 in the at_hour == 22 and at_hour == 1 elif statements.

Thinking about what the function is intended to do, I think you found an issue worth exploring. For example, the same logic could be rewritten as:

  elif at_hour == 22:
    reapportion = daily_flow - 1.9 * value_hourly
  elif at_hour == 23:
    reapportion = daily_flow - 2 * value_hourly

which does not make much sense. I will test and get back to you.
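
For instance, with made-up numbers, a peak that is large relative to the daily volume drives the reapportioned remainder negative, which would make the accumulation series step backwards:

# Hypothetical values, just to show the pathology
daily_flow = 2400.0    # total daily volume
value_hourly = 2000.0  # value / 24 for a large tabulated peak
print(daily_flow - 1.9 * value_hourly)  # -1400.0: accumulation would decrease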


narlesky commented Jan 6, 2022

Thanks @danhamill!


danhamill commented Mar 17, 2022

@narlesky Things have been busy but I have made a little progress simplifying the code. See Spline_from_Pandas.py

The first thing I wanted to get rid of was all the faux-date business. I think treating everything as a pd.Period will avoid any date overflow issues with pd.Timestamp.
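
For what it's worth, here is a minimal illustration of the overflow issue (assuming nanosecond-backed timestamps, the pandas 1.x default):

import pandas as pd

# Nanosecond-resolution timestamps only span roughly 1677-2262
try:
    pd.to_datetime('3000-01-01')
except pd.errors.OutOfBoundsDatetime as err:
    print('Timestamp overflow:', err)

# Periods at daily or hourly frequency have no such bound
p = pd.Period('3000-01-01', freq='H')
print(p, p + 1)  # 3000-01-01 00:00, then 3000-01-01 01:00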

The next thing I was working on was getting away from the text file inputs and outputs. My goal is to read and write directly from DSS. The spline function in Spline_from_Pandas accepts a dataframe with two columns. Right now I have the flow column hard coded, which is undesirable. Passing a dataframe implies I am reading from DSS outside of the method, but I think we could make this an object to simplify the usage of the algorithm. Here is how I was reading in the time series:

import pandas as pd

from pydsstools.heclib.dss import HecDss
from core.Spline_from_Pandas import spline

def get_daily_ts(path, dss_file):
    with HecDss.Open(dss_file) as fid:
        ts = fid.read_ts(path, trim_missing=False)
        times = ts.pytimes
        values = ts.values
    tmp = pd.DataFrame(data = {'date': times, 'flow': values.copy(), 'site': path.split("/")[2]})
    tmp.date = tmp.date - pd.DateOffset(hours = 1)
    return tmp

df_daily = get_daily_ts(path, dss_file)
df_hourly = spline(df_daily)
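
One possible shape for that object (a sketch only; the class name and the flow_col parameter are hypothetical, and it assumes spline keeps its dataframe signature):

import pandas as pd
from pydsstools.heclib.dss import HecDss
from core.Spline_from_Pandas import spline

class DailyToHourly:
    """Hypothetical wrapper bundling the DSS read with the spline call."""

    def __init__(self, dss_file, flow_col='flow'):
        self.dss_file = dss_file
        self.flow_col = flow_col  # would replace the hard-coded flow column

    def read_daily(self, path):
        # same read as get_daily_ts, but producing the two-column frame spline expects
        with HecDss.Open(self.dss_file) as fid:
            ts = fid.read_ts(path, trim_missing=False)
            times = ts.pytimes
            values = ts.values.copy()
        df = pd.DataFrame({'date': times, self.flow_col: values})
        df['date'] = df['date'] - pd.DateOffset(hours=1)
        return df

    def to_hourly(self, path):
        return spline(self.read_daily(path))

Usage would then collapse to something like DailyToHourly(dss_file).to_hourly(path).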

Another thing I have yet to tackle is to investigate other types of splines. This:

spline_function = interpolate.splrep(masked_dates.compressed(),
    hourly_accumulation.dropna().values, s=0)

is by no means optimal. A cubic spline like this can dip down before it rises again, and when you fit it to an accumulation time series those dips become negative flows once the interpolated hourly accumulation is converted back to a hydrograph by successive differences.
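
One monotone alternative worth comparing is scipy's PCHIP interpolator, which is shape-preserving: for a non-decreasing accumulation the interpolant is non-decreasing, so the differenced hourly flows stay non-negative. A sketch with made-up numbers:

import numpy as np
from scipy import interpolate

# Made-up accumulation data: hours since start vs. accumulated volume
t = np.array([0.0, 24.0, 48.0, 72.0])
acc = np.array([0.0, 1200.0, 1500.0, 4000.0])  # non-decreasing

# PCHIP preserves the monotonicity of the data, unlike an unconstrained cubic spline
pchip = interpolate.PchipInterpolator(t, acc)
hourly_acc = pchip(np.arange(0.0, 73.0))
hourly_flow = np.diff(hourly_acc)
assert (hourly_flow >= -1e-9).all()  # no negative flows (up to float noise)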

Lately I have been working on local flows, which aren't the best to test with since there can be large stretches of the time series without any local flows.

@narlesky

@danhamill thanks for the update! What a funny coincidence that I had just visited this repo and CriticalDuration earlier today, starred them, and shared them with our GitHub org! I definitely understand the balancing act between text and DSS inputs. We are still working with CSVs a decent portion of the time, but I can see why the pydsstools integration would be so valuable!

@danhamill

Haha. I saw all the activity on git and remembered I hadn't committed lately. I was mid-way through an update and got sidetracked.

@narlesky

Well it's much appreciated!

@danhamill

@narlesky off topic, but I came across Qiusheng Wu's GitHub profile and he has some really great open source GIS repositories. Check out leafmap and geemap.

@narlesky

@danhamill Oh cool! I used to work on a project with a similar theme based on Mapbox. I'll have to check it out!
