# Incorporating Historical Information into Flood Frequency Analysis

Here we'll show how to incorporate historical flood information into flood frequency analysis by combining it with the systematic (gauged) record at [USGS gauge 09357500](https://waterdata.usgs.gov/nwis/inventory?site_no=09357500&agency_cd=USGS), the Animas River at Howardsville, Colorado. This gauge has a 47-year systematic record spanning water years (WY) 1936-1982.

[England et al. (2003)](https://www.sciencedirect.com/science/article/pii/S0022169403001410?casa_token=5mUAFml5gLwAAAAA:4Bz--Z0WNxOWyN0a-FD4XuTbBPW6YiOMwtvuYtW-SLU6tmJkgJH3bxKrFzhn6AXN-9TzrbFziww) report a historical peak of 2470 cfs at this location 953 years ago. We'll use 2470 cfs as the perception threshold and 953 years as the length of the historical record, and see how this information influences our estimates of the LP3 parameters and the 100-year flood at this location.
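In the notation of the EMA code later in this notebook, the systematic record contributes s observed annual peaks, while the historical period of h years only records floods at or above the threshold; the remaining h - k historical years are treated as having unobserved peaks below it. A minimal sketch of that bookkeeping for this gauge, assuming the 2470 cfs flood is the only historical peak at or above the threshold (k = 1, consistent with the single value later passed as `hist_data`):

```python
# Record lengths for the Animas River at Howardsville example
s = 1982 - 1936 + 1   # systematic record, WY 1936-1982 inclusive
h = 953               # historical period length (years)
k = 1                 # assumption: one historical peak (2470 cfs) at or above the threshold

n_total = s + h       # total years represented in the EMA moment estimates
n_censored = h - k    # historical years whose peaks are only known to be below the threshold

print(s, n_total, n_censored)  # 47 1000 952
```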

We'll again use the `dataretrieval` library to download the data, but this time we'll download only the annual maxima (`service='peaks'`) rather than the daily values (`service='dv'`).
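For comparison, if you only had daily values, you could build an annual-maximum series yourself by grouping on the water year (October 1 through September 30, labeled by the calendar year in which it ends). A minimal sketch on a small synthetic daily series (the flows and the helper name `water_year` are made up for illustration); note that the annual maximum of daily means can be no larger than the instantaneous annual peak that `service='peaks'` returns:

```python
import numpy as np
import pandas as pd

def water_year(dates: pd.DatetimeIndex) -> np.ndarray:
    """Water year label: October-December belong to the next calendar year's water year."""
    return np.where(dates.month >= 10, dates.year + 1, dates.year)

# Hypothetical daily mean flows (cfs) covering WY 1937 and WY 1938
dates = pd.date_range("1936-10-01", "1938-09-30", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(rng.gamma(shape=2.0, scale=100.0, size=len(dates)), index=dates)

# Plant a known maximum in each water year so the result is easy to check
daily.loc[pd.Timestamp("1937-06-15")] = 5000.0   # WY 1937
daily.loc[pd.Timestamp("1937-11-02")] = 4000.0   # November, so it falls in WY 1938

annual_max = daily.groupby(water_year(daily.index)).max()
print(annual_max)
```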

```python
!pip install dataretrieval
```

```python
import pandas as pd
import dataretrieval.nwis as nwis

# Animas River at Howardsville, CO
flow_df = nwis.get_record(sites='09357500', service='peaks', start='1936-05-25', end='1982-06-28')
flow_df.head()
```

We only need the `peak_va` column (the annual peak streamflow in cfs), so let's select it.

```python
flow_df = flow_df["peak_va"]
flow_df.head()
```

To fit a log-Pearson Type III (LP3) distribution, first import the classes and functions from `utils.py`. Since `utils.py` uses the `lmoments3` library, we'll need to install it here as well.
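For reference, LP3 means that X = ln Q (the notebook uses natural logs via `np.log`) follows a Pearson Type III distribution, i.e. a three-parameter gamma distribution. In the shape α, location ξ and rate β parameterization used by the `Gamma` class, the density, the method-of-moments relations in terms of the mean μ, standard deviation σ and skew γ of the log flows, and the T-year flood are:

$$
f_X(x) = \frac{|\beta|}{\Gamma(\alpha)}\,\big[\beta(x-\xi)\big]^{\alpha-1} e^{-\beta(x-\xi)}, \qquad X = \ln Q
$$

$$
\alpha = \frac{4}{\gamma^2}, \qquad \beta = \operatorname{sign}(\gamma)\,\frac{\sqrt{\alpha}}{\sigma}, \qquad \xi = \mu - \frac{\alpha}{\beta}
$$

$$
\hat{q}_T = \exp\!\left[F_X^{-1}\!\left(1 - \tfrac{1}{T}\right)\right]
$$

The 3-parameter `MOM` branch of `fit` uses these relations with β > 0 only, which is why negatively skewed log flows are handled by fitting −ln Q instead.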

```python
!pip install lmoments3
```

```python
from google.colab import drive

# allow access to google drive
drive.mount('/content/drive')

!cp "drive/MyDrive/Colab Notebooks/CE6280/CodingExamples/utils.py" .
from utils import *
```

For future homework assignments, you can add the code below to `utils.py`; I'm pasting it here rather than including it there so that `utils.py` doesn't contain the answers to your first homework 😉.

Below I've included a `Gamma` class whose `fit` method can estimate the parameters of the gamma distribution from both systematic and historical information using the Expected Moments Algorithm (EMA). This class can also be added to `utils.py` for your future homework. You'll also need to write a similar function to estimate the parameters with the Approximate Moments Algorithm (AMA).
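The EMA loop in `fit` iterates the moment updates below, written to match the code term by term: x′ are systematic peaks below the (log-space) threshold τ, y′ are all observed peaks at or above τ (systematic or historical), s is the systematic record length, h the historical period length, and k the number of historical peaks at or above τ. The conditional expectations are evaluated under the current parameter estimates, and the loop repeats until the relative change in (α, β, ξ) falls below the tolerance.

$$
\hat\mu = \frac{\sum_i x'_i + \sum_j y'_j + (h-k)\,E[X \mid X<\tau]}{s+h}
$$

$$
\hat\sigma = \sqrt{\frac{c_2\left[\sum_i (x'_i-\hat\mu)^2 + \sum_j (y'_j-\hat\mu)^2\right] + (h-k)\,E[(X-\hat\mu)^2 \mid X<\tau]}{s+h}}
$$

$$
\hat\gamma = \frac{c_3\left[\sum_i (x'_i-\hat\mu)^3 + \sum_j (y'_j-\hat\mu)^3\right] + (h-k)\,E[(X-\hat\mu)^3 \mid X<\tau]}{(s+h)\,\hat\sigma^3}
$$

$$
c_2 = \frac{s+k}{s+k-1}, \qquad c_3 = \frac{(s+k)^2}{(s+k-1)(s+k-2)}
$$

$$
\alpha = \frac{4}{\hat\gamma^2}, \qquad \beta = \operatorname{sign}(\hat\gamma)\,\frac{\sqrt{\alpha}}{\hat\sigma}, \qquad \xi = \hat\mu - \frac{\alpha}{\beta}
$$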

```python
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt

# Distribution is the base class from utils.py; its findMoments() method is
# expected to set self.xbar, self.var and self.skew from the data
class Gamma(Distribution):
  def __init__(self):
    super().__init__()
    self.alpha = None  # shape
    self.xi = None     # location
    self.beta = None   # rate (scale = 1/beta); negative when the data are negatively skewed

  def fit(self, data, method, npars, initialize=True, hist_data=None, h=None, thres=None, params0=None, tolerance=None, maxiter=None):
    assert method in ('MLE', 'MOM', 'EMA'), "method must = 'MLE', 'MOM' or 'EMA'"
    assert npars == 2 or npars == 3, "npars must = 2 or 3"

    self.findMoments(data)
    if method == 'MLE':
      if not initialize:
        if npars == 2:
          shape, loc, scale = ss.gamma.fit(data, floc=0)
        elif npars == 3:
          shape, loc, scale = ss.gamma.fit(data)
      else:
        # start the MLE search from the MOM estimate of the shape parameter
        if npars == 2:
          self.fit(data, 'MOM', 2)
          shape, loc, scale = ss.gamma.fit(data, self.alpha, floc=0)
        elif npars == 3:
          self.fit(data, 'MOM', 3)
          shape, loc, scale = ss.gamma.fit(data, self.alpha)

      self.alpha = shape
      self.xi = loc
      self.beta = 1/scale
    elif method == 'MOM':
      if npars == 2:
        self.alpha = self.xbar**2/self.var
        self.beta = self.xbar/self.var
        self.xi = 0
      elif npars == 3:
        # assumes positive skew (beta > 0); fit the negated data for negative skew
        self.alpha = 4/self.skew**2
        self.beta = np.sqrt(self.alpha/self.var)
        self.xi = self.xbar - self.alpha/self.beta
    elif method == 'EMA':
      required = (hist_data, h, thres, params0, tolerance, maxiter)
      assert all(v is not None for v in required), "EMA requires hist_data, h, thres, params0, tolerance and maxiter"

      x = data       # systematic record
      y = hist_data  # historical peaks
      s = len(x)

      # Get vector of all peaks at or above the threshold and find how many are in the historical record
      y_prime = np.concatenate((y[y >= thres], x[x >= thres]))
      k = np.sum(y >= thres)

      # Get vector of all observed peaks below the threshold (those during the systematic record, the last s years)
      # Find how many peaks in the systematic record, m, equaled or exceeded the threshold
      x_prime = x[x < thres]
      m = s - len(x_prime)

      # Fit the distribution parameters using EMA
      # Step 1: initial parameter estimates alpha_hat, beta_hat, xi_hat come from params0
      # (e.g., MOM using only the systematic record, with beta < 0 for negative skew)
      alpha_hat = params0.alpha
      beta_hat = params0.beta
      xi_hat = params0.xi

      # Step 2: Use current parameter estimates to find expected moments below the threshold.
      # Then compute "new" sample moments based on observed floods and expected moments of unseen floods
      c2 = (s + k) / (s + k - 1)
      c3 = (s + k)**2 / ((s + k - 1)*(s + k - 2))
      for i in range(maxiter):
        shape_args = (alpha_hat,)
        # If beta_hat < 0, scipy's gamma cannot represent X directly, so work with
        # Z = -X ~ gamma(alpha, loc=-xi, scale=-1/beta), using X < thres <=> Z > -thres

        # find expected value below the threshold with current parameter estimates
        if beta_hat > 0:
          E_xH_below = ss.gamma.expect(lambda z: z, shape_args, loc=xi_hat, scale=1/beta_hat, ub=thres, conditional=True)
        else:
          E_xH_below = -ss.gamma.expect(lambda z: z, shape_args, loc=-xi_hat, scale=-1/beta_hat, lb=-thres, conditional=True)

        # update estimate of the mean
        mu_new = (np.sum(x_prime) + np.sum(y_prime) + (h-k)*E_xH_below) / (s + h)

        # find expected value of (x - mu_new)^2 below the threshold with current parameter estimates
        if beta_hat > 0:
          E_xH_below_2 = ss.gamma.expect(lambda z: (z-mu_new)**2, shape_args, loc=xi_hat, scale=1/beta_hat, ub=thres, conditional=True)
        else:
          E_xH_below_2 = ss.gamma.expect(lambda z: (z+mu_new)**2, shape_args, loc=-xi_hat, scale=-1/beta_hat, lb=-thres, conditional=True)

        # update estimate of the standard deviation
        sigma_new = np.sqrt((c2 * (np.sum((x_prime - mu_new)**2) + np.sum((y_prime - mu_new)**2)) + (h-k)*E_xH_below_2) / (s+h))

        # find expected value of (x - mu_new)^3 below the threshold with current parameter estimates
        if beta_hat > 0:
          E_xH_below_3 = ss.gamma.expect(lambda z: (z-mu_new)**3, shape_args, loc=xi_hat, scale=1/beta_hat, ub=thres, conditional=True)
        else:
          E_xH_below_3 = -ss.gamma.expect(lambda z: (z+mu_new)**3, shape_args, loc=-xi_hat, scale=-1/beta_hat, lb=-thres, conditional=True)

        # update estimate of the skewness
        gamma_new = (c3*(np.sum((x_prime - mu_new)**3) + np.sum((y_prime - mu_new)**3)) + (h-k)*E_xH_below_3) / ((s+h)*sigma_new**3)

        # Step 3: Use new moments to find corresponding "new" estimates of alpha, xi and beta
        alpha_new = 4/gamma_new**2
        beta_new = np.sign(gamma_new) * np.sqrt(alpha_new)/sigma_new
        xi_new = mu_new - alpha_new/beta_new

        # calculate relative difference between old ("hat") and "new" parameter estimates
        alphaDiff = 0.5 * np.abs((alpha_new - alpha_hat) / (alpha_new + alpha_hat))
        xiDiff = 0.5 * np.abs((xi_new - xi_hat) / (xi_new + xi_hat))
        betaDiff = 0.5 * np.abs((beta_new - beta_hat) / (beta_new + beta_hat))
        totalDiff = alphaDiff + xiDiff + betaDiff

        # update old ("hat") estimates with "new" estimates
        alpha_hat = alpha_new
        beta_hat = beta_new
        xi_hat = xi_new

        # Step 4: convergence test
        # exit loop if total difference between past and current estimates is within the tolerance
        if totalDiff < tolerance:
          break

      # return parameter estimates from EMA loop
      self.alpha = alpha_hat
      self.xi = xi_hat
      self.beta = beta_hat

  def findReturnPd(self, T):
    # T-year quantile: the value with non-exceedance probability 1 - 1/T
    q_T = ss.gamma.ppf(1-1/T, self.alpha, self.xi, 1/self.beta)
    return q_T

  def plotHistPDF(self, data, xmin, xmax, title):
    x = np.linspace(xmin, xmax, 100)
    f_x = ss.gamma.pdf(x, self.alpha, self.xi, 1/self.beta)

    plt.hist(data, density=True)
    plt.plot(x, f_x)
    plt.xlim([xmin, xmax])
    plt.title(title)
    plt.xlabel('Flow')
    plt.ylabel('Probability Density')
    plt.show()
```
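When the log-space skew is negative, β < 0 and `ss.gamma` (which requires a positive scale) cannot represent X directly, so the EMA branch works with the reflection Z = −X, a gamma variable with location −ξ and scale −1/β, and uses E[X | X < τ] = −E[Z | Z > −τ] and E[(X − m)³ | X < τ] = −E[(Z + m)³ | Z > −τ]. A minimal sketch that checks both identities against direct numerical integration of the reflected density; the parameter values are arbitrary and chosen only for illustration:

```python
import numpy as np
import scipy.stats as ss
from scipy.integrate import quad

# Arbitrary illustrative parameters: X = xi + G/beta with G ~ Gamma(alpha), beta < 0, so X <= xi
alpha, xi, beta = 4.0, 8.0, -5.0
tau = 7.5   # threshold (log space)
m = 7.0     # centering value, playing the role of mu_new in the EMA loop

# Reflection used in the EMA loop: Z = -X ~ gamma(alpha, loc=-xi, scale=-1/beta), X < tau <=> Z > -tau
e1_reflected = -ss.gamma.expect(lambda z: z, args=(alpha,), loc=-xi, scale=-1/beta,
                                lb=-tau, conditional=True)
e3_reflected = -ss.gamma.expect(lambda z: (z + m)**3, args=(alpha,), loc=-xi, scale=-1/beta,
                                lb=-tau, conditional=True)

# Direct route: f_X(x) = f_Z(-x); integrate over x < tau (mass below xi + 40/beta is negligible)
f_x = lambda x: ss.gamma.pdf(-x, alpha, loc=-xi, scale=-1/beta)
lower = xi + 40.0 / beta
prob, _ = quad(f_x, lower, tau)
e1_direct = quad(lambda x: x * f_x(x), lower, tau)[0] / prob
e3_direct = quad(lambda x: (x - m)**3 * f_x(x), lower, tau)[0] / prob

print(e1_reflected, e1_direct)
print(e3_reflected, e3_direct)
```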

What are the parameter and 100-year flood estimates using only the systematic record, fitting LP3 with MOM?

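```python
dist = Gamma()
# LP3: fit a 3-parameter gamma (Pearson III) to the log flows. If the log-space skew
# is negative, fit -log(Q) instead so the fitted gamma has positive skew
if ss.skew(np.log(flow_df),bias=False) > 0:
  dist.fit(np.log(flow_df), "MOM", 3)
  q100 = np.exp(dist.findReturnPd(100))
  dist.plotHistPDF(np.log(flow_df), 6, 8, "LP3 MOM Fit")
else:
  dist.fit(-np.log(flow_df), "MOM", 3)
  # the 0.99 quantile of log(Q) is minus the 0.01 quantile of -log(Q);
  # findReturnPd(1/0.99) evaluates the ppf at 1 - 0.99 = 0.01
  q100 = np.exp(-dist.findReturnPd(1/0.99))
  dist.plotHistPDF(-np.log(flow_df), -8, -6, "LP3 MOM Fit")

print("alpha: %0.2f" % dist.alpha)
print("xi: %0.2f" % dist.xi)
print("beta: %0.2f" % dist.beta)
print("q100: %0.0f cfs" % q100)
```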
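As a cross-check on the branching above, `scipy.stats.pearson3` is parameterized directly by the skew, with `loc` equal to the mean and `scale` equal to the standard deviation, and it accepts negative skew, so no reflection is needed. A minimal sketch, assuming the sample moments are the mean, the `ddof=1` standard deviation and the bias-corrected skew of the log flows (the estimators inside `findMoments` in `utils.py` may differ); the function names and the synthetic flows are hypothetical:

```python
import numpy as np
import scipy.stats as ss

def lp3_mom_quantile(flows, T):
    """T-year flood from an LP3 method-of-moments fit in log space, via pearson3."""
    x = np.log(flows)
    g = ss.skew(x, bias=False)
    return np.exp(ss.pearson3.ppf(1 - 1/T, g, loc=x.mean(), scale=x.std(ddof=1)))

def lp3_mom_quantile_gamma(flows, T):
    """Same quantile via the gamma parameterization, reflecting when the skew is negative."""
    x = np.log(flows)
    mu, sigma, g = x.mean(), x.std(ddof=1), ss.skew(x, bias=False)
    alpha = 4 / g**2
    beta = np.sqrt(alpha) / sigma
    if g > 0:
        xi = mu - alpha / beta
        return np.exp(ss.gamma.ppf(1 - 1/T, alpha, loc=xi, scale=1/beta))
    # fit Y = -x (positive skew, mean -mu); the p-quantile of x is minus the (1-p)-quantile of Y
    xi = -mu - alpha / beta
    return np.exp(-ss.gamma.ppf(1/T, alpha, loc=xi, scale=1/beta))

rng = np.random.default_rng(42)
flows_pos = np.exp(6 + rng.gamma(3.0, 0.3, size=60))  # hypothetical flows, positive log skew
flows_neg = np.exp(8 - rng.gamma(3.0, 0.3, size=60))  # hypothetical flows, negative log skew
for flows in (flows_pos, flows_neg):
    print(lp3_mom_quantile(flows, 100), lp3_mom_quantile_gamma(flows, 100))
```

Both routes should give the same 100-year flood for either skew sign; `pearson3` simply hides the reflection.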

How do these estimates change when we fit LP3 with EMA and include the historical flood?

```python
# Historical flood and perception threshold (cfs)
hist_data = np.array([2470])
thres = 2470

# Initial EMA estimates: MOM on the systematic record only. For negative log-space skew,
# fit -log(Q), then negate beta and xi so the parameters describe log(Q) itself (beta < 0)
params0 = Gamma()
if ss.skew(np.log(flow_df),bias=False) < 0:
  params0.fit(-np.log(flow_df), "MOM", 3)
  params0.beta = -params0.beta
  params0.xi = -params0.xi
else:
  params0.fit(np.log(flow_df), "MOM", 3)

dist = Gamma()
dist.fit(np.log(np.array(flow_df)), "EMA", 3, hist_data=np.log(hist_data), h=953, thres=np.log(thres),
         params0=params0, tolerance=0.0001, maxiter=100)
if dist.beta > 0:
  q100 = np.exp(dist.findReturnPd(100))
  dist.plotHistPDF(np.log(flow_df), 6, 8, "LP3 EMA Fit")
  T_hist = 1 / (1 - ss.gamma.cdf(np.log(hist_data[0]), dist.alpha, dist.xi, 1/dist.beta))
else:
  # reflect back to -log(Q) so that scipy's gamma (which needs a positive scale) can be used
  dist.beta = -dist.beta
  dist.xi = -dist.xi
  q100 = np.exp(-dist.findReturnPd(1/0.99))
  dist.plotHistPDF(-np.log(flow_df), -8, -6, "LP3 EMA Fit")
  # P(Q > 2470) = P(-log(Q) < -log(2470))
  T_hist = 1 / ss.gamma.cdf(-np.log(hist_data[0]), dist.alpha, dist.xi, 1/dist.beta)

print("alpha: %0.2f" % dist.alpha)
print("xi: %0.2f" % dist.xi)
print("beta: %0.2f" % dist.beta)
print("q100: %0.0f cfs" % q100)
print("Return period of historical flood: %0.0f years" % T_hist)
```
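The return-period calculation above relies on P(Q > q) = P(X > ln q) for X = ln Q, which for β < 0 becomes P(Z < −ln q) with Z = −X. A minimal sketch that checks this against `scipy.stats.pearson3`, converting gamma parameters to mean, standard deviation and skew; the helper names are hypothetical, the parameters arbitrary, and `ss.gamma.sf` is used in place of `1 - cdf` (equivalent, but more accurate far in the tail):

```python
import numpy as np
import scipy.stats as ss

def return_period(q, alpha, xi, beta):
    """Return period of flow q under LP3 with log-space gamma parameters (alpha, xi, beta)."""
    x = np.log(q)
    if beta > 0:
        return 1 / ss.gamma.sf(x, alpha, loc=xi, scale=1/beta)
    # beta < 0: Z = -X is gamma(alpha, loc=-xi, scale=-1/beta), and Q > q <=> Z < -ln q
    return 1 / ss.gamma.cdf(-x, alpha, loc=-xi, scale=-1/beta)

def return_period_pearson3(q, alpha, xi, beta):
    """Same quantity via pearson3, parameterized by the mean, std and skew of ln Q."""
    mean = xi + alpha / beta
    std = np.sqrt(alpha) / abs(beta)
    skew = 2 * np.sign(beta) / np.sqrt(alpha)
    return 1 / ss.pearson3.sf(np.log(q), skew, loc=mean, scale=std)

q = np.exp(7.5)
print(return_period(q, 4.0, 6.0, 5.0), return_period_pearson3(q, 4.0, 6.0, 5.0))    # positive skew
print(return_period(q, 4.0, 8.0, -5.0), return_period_pearson3(q, 4.0, 8.0, -5.0))  # negative skew
```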

Including the historical flood shifts the 100-year flood estimate down slightly, from 2023 cfs to 1977 cfs; at other locations, historical information may shift the estimate more. The historical flood of 2470 cfs, observed 953 years ago, has an estimated return period of 1029 years, not far from the empirical estimate implied by the combined record length of 47 + 953 = 1000 years.
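To make that empirical comparison concrete, one simple option is the Weibull plotting position, which assigns the r-th largest of n annual peaks a return period of (n + 1)/r. A minimal sketch, assuming the 2470 cfs flood ranks first in the combined 47 + 953 = 1000 years (an assumption not verified here); plotting positions designed for censored historical data, or other formulas, would give somewhat different values:

```python
def weibull_return_period(rank, n_years):
    """Empirical return period from the Weibull plotting position p = rank / (n_years + 1)."""
    return (n_years + 1) / rank

s, h = 47, 953   # systematic and historical record lengths (years)
T_emp = weibull_return_period(rank=1, n_years=s + h)
print(T_emp)  # 1001.0
```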