# Goodness of Fit Tests for Model Adequacy and Quantifying Uncertainty

In [None]:
!pip install lmoments3

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
from google.colab import drive

# allow access to google drive
drive.mount('/content/drive')

!cp "drive/MyDrive/Colab Notebooks/CE6280/CodingExamples/utils.py" .
from utils import *

maxQ = pd.read_csv("drive/MyDrive/Colab Notebooks/CE6280/Data/Problem6.17.csv")

## Goodness of Fit Tests

### Test for Non-zero Skewness
First, to determine whether we should fit a skewed distribution, we should test if the skewness of the data is different from 0. The test statistic for this test, $z$, is the following, which should have an approximately standard normal distribution under the null hypothesis of zero skew:  
$$z = \frac{\hat{\gamma}}{\sqrt{\frac{6(n-2)}{(n+1)(n+3)}}}$$  
where $\hat{\gamma}$ is the sample skewness coefficient.
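
As a quick sanity check on the claim that $z$ is roughly standard normal when the data have no skew, here is a minimal sketch on synthetic normal samples. The sample size, number of replicates, and seed are arbitrary assumptions chosen for illustration, not values from the homework data.

```python
import numpy as np
import scipy.stats as ss

# Hypothetical settings: 5000 synthetic samples of n = 50 normal observations
rng = np.random.default_rng(0)
n, reps = 50, 5000
samples = rng.normal(size=(reps, n))

# Sample skewness of each synthetic sample and the corresponding test statistic
g = ss.skew(samples, axis=1, bias=False)
z = g / np.sqrt(6*(n-2) / ((n+1)*(n+3)))

z_mean = z.mean()
z_std = z.std(ddof=1)
# Share of samples rejected by a two-sided test at the 5% level
frac_reject = np.mean(np.abs(z) > ss.norm.ppf(0.975))

print("Mean of z: %0.3f" % z_mean)
print("Std of z: %0.3f" % z_std)
print("Two-sided rejection rate at 5%%: %0.3f" % frac_reject)
```

If the approximation holds, the mean should be near 0, the standard deviation near 1, and the rejection rate near 5%.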

In [None]:
def skewtest(data):
  g = ss.skew(data,bias=False)
  print("Sample skewness: %0.2f" % g)
  n = len(data)
  z = g / np.sqrt(6*(n-2) / ((n+1)*(n+3)))
  if z > 0:
    p_one_sided = 1 - ss.norm.cdf(z)
  else:
    p_one_sided = ss.norm.cdf(z)

  p_two_sided = 2*p_one_sided

  return p_one_sided, p_two_sided

p_one_sided, p_two_sided = skewtest(maxQ["Flow"])

print("One-sided p-value: %.2e" % p_one_sided)
print("Two-sided p-value: %.2e" % p_two_sided)

Whether the test is one- or two-sided (one-sided is probably appropriate for untransformed flood data), we firmly reject the null hypothesis that the data have 0 skew!  

What about the log-transformed data?

In [None]:
p_one_sided, p_two_sided = skewtest(np.log(maxQ["Flow"]))

print("One-sided p-value: %.2e" % p_one_sided)
print("Two-sided p-value: %.2e" % p_two_sided)

For this we should probably use a 2-sided test, but even then, the log-transformed data has significantly positive skew. This suggests even the log of the data is not normal, so a log-normal distribution still may not sufficiently fit our data.  

### Test for distribution fit with Kolmogorov-Smirnov (K-S) Test

Let's statistically test the fit of the LN2 and LN3 distributions estimated with MOM, MLE and L-moments (Lmom) using a K-S test. The test statistic for this test is $D$, the maximum distance between the fitted and empirical distributions:  
$$\require{amsmath}  
D = \underset{x}{\max} |F_{empirical}(x) - F_{fitted}(x)| $$

In [None]:
methods = ["MOM", "MLE", "Lmom"]
npars = [2, 3]
p = ss.mstats.plotting_positions(maxQ["Flow"])
p = np.sort(p)

for method in methods:
  for npar in npars:
    LN = LogNormal()
    LN.fit(maxQ["Flow"], method, npar)
    result = ss.kstest(maxQ["Flow"], ss.lognorm.ppf(p, s=LN.sigma, loc=LN.tau, scale=np.exp(LN.mu)), alternative='two-sided')
    print("p-value of 2-sided K-S test for LN%d %s fit: %f" % (npar, method, result.pvalue))

According to the K-S test, we cannot reject the hypothesis that the data came from any of these distributions, but we are most confident in the LN3 MLE and LN3 Lmom fits and least confident in the MOM fits for LN2 and LN3.  
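
For comparison, `ss.kstest` also accepts a CDF function as its second argument, in which case it runs a one-sample test against that distribution and computes $D$ directly from the definition above. The sketch below illustrates this on synthetic data; the parameter values `sigma`, `tau` and `mu` are made-up assumptions standing in for a fitted `LogNormal` object. Note that K-S p-values tend to be optimistic when the parameters were estimated from the same data being tested.

```python
import numpy as np
import scipy.stats as ss

# Hypothetical lognormal parameters (stand-ins for LN.sigma, LN.tau, LN.mu)
sigma, tau, mu = 0.6, 0.0, 8.0
rng = np.random.default_rng(1)
data = ss.lognorm.rvs(s=sigma, loc=tau, scale=np.exp(mu), size=60, random_state=rng)

# One-sample K-S test: args are passed to the CDF as (s, loc, scale)
result = ss.kstest(data, ss.lognorm.cdf, args=(sigma, tau, np.exp(mu)))

# Same statistic by hand: largest gap between the empirical and fitted CDFs
x_sorted = np.sort(data)
n = len(x_sorted)
F = ss.lognorm.cdf(x_sorted, sigma, tau, np.exp(mu))
i = np.arange(1, n+1)
D_manual = max(np.max(i/n - F), np.max(F - (i-1)/n))

print("D from kstest: %0.4f, D by hand: %0.4f" % (result.statistic, D_manual))
print("p-value: %0.3f" % result.pvalue)
```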

### Test for distribution fit with Probability Plot Correlation Coefficient (PPCC) Test

The PPCC test is more powerful, though, meaning it is more likely to reject the null hypothesis when the data don't actually come from that distribution. What does the Probability Plot Correlation Coefficient (PPCC) test conclude for each fit?

There is a Python function for finding the PPCC under different shape parameters of a distribution: [ppcc_plot](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ppcc_plot.html). The related `ppcc_max` function returns the shape parameter that results in the greatest PPCC. But neither performs a Monte Carlo-based hypothesis test to assess whether your data could plausibly have come from that distribution. So we will have to write our own code for that.
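
To show what `ppcc_plot` returns, here is a small sketch on synthetic lognormal data. The true shape of 0.5, the search range of 0.1 to 1.5, and the grid size are arbitrary assumptions for illustration.

```python
import numpy as np
import scipy.stats as ss

# Synthetic sample from a lognormal with a hypothetical shape parameter of 0.5
rng = np.random.default_rng(2)
x = ss.lognorm.rvs(s=0.5, size=200, random_state=rng)

# PPCC evaluated on a grid of 29 candidate shape parameters between 0.1 and 1.5
svals, ppcc = ss.ppcc_plot(x, 0.1, 1.5, dist='lognorm', N=29)

# Shape parameter on the grid with the highest correlation
best_shape = svals[np.argmax(ppcc)]
print("Best shape on grid: %0.2f (PPCC = %0.4f)" % (best_shape, ppcc.max()))
```

This gives a PPCC value for each candidate shape, but no p-value, which is why the Monte Carlo step is still needed.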

In [None]:
def ppccTest(data, fitted_dist, distname, title, m=10000):
  x_sorted = np.sort(data)
  p_observed = ss.mstats.plotting_positions(x_sorted)
  if distname == 'LN': # you'll add elif statements for alternative distributions for your next homework
    # get fitted quantiles
    x_fitted = ss.lognorm.ppf(p_observed, s=fitted_dist.sigma, loc=fitted_dist.tau, scale=np.exp(fitted_dist.mu))

    # generate m synthetic samples of n observations
    rhoVector = np.zeros(m)
    for i in range(m):
      np.random.seed(i)
      x = ss.lognorm.rvs(s=fitted_dist.sigma, loc=fitted_dist.tau, scale=np.exp(fitted_dist.mu), size=len(data))
      rhoVector[i] = np.corrcoef(np.sort(x), x_fitted)[0,1]
  else:
    raise ValueError("Unsupported distname: %s" % distname)

  # calculate test statistic
  rho = np.corrcoef(x_sorted, x_fitted)[0,1]

  # calculate p-value of test statistic from Monte Carlo simulation:
  # share of samples (synthetic plus observed) with a PPCC at or below the observed PPCC
  count = np.sum(rhoVector > rho)
  p_value = 1 - count/(m+1)

  # make Q-Q plot
  plt.scatter(x_sorted,x_fitted,color='b')
  plt.plot(x_sorted,x_sorted,color='r')
  plt.xlabel('Observations')
  plt.ylabel('Fitted Values')
  plt.title(title)
  plt.show()

  return rho, p_value

for method in methods:
  for npar in npars:
    LN = LogNormal()
    LN.fit(maxQ["Flow"], method, npar)
    rho, p_value = ppccTest(maxQ["Flow"], LN, 'LN', 'QQ plot for LN' + str(npar) + ' fit with ' + method)
    print("p-value of PPCC test for LN%d %s fit: %f" % (npar, method, p_value))

You can see the p-values are much lower for these tests, so we're closer to rejecting the hypothesis that the data came from these distributions. The best fit is LN3 Lmom, followed closely by LN3 MLE. But the Q-Q plots show that the fits clearly aren't great, so it would be good to quantify uncertainty in quantile estimates from these fits, such as the 100-year flood.  

## Quantifying Uncertainty in Quantile Estimates

In the slides, we showed how to do this theoretically for the LN2 distribution, but not the LN3 distribution, so for that we'll use bootstrapping. You can add these functions to the LogNormal class in `utils.py` for the future, replacing `dist` with `self`, and write similar functions for the other distributions on your next homework.  
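
The percentile bootstrap idea can be sketched as follows. This is a minimal illustration under stated assumptions: it fits an LN2 distribution from the mean and standard deviation of the logs (a stand-in for the `LogNormal` class in `utils.py`), resamples with plain NumPy, and uses synthetic flows. The function name `bootstrap_quantile_ci` is hypothetical.

```python
import numpy as np
import scipy.stats as ss

def bootstrap_quantile_ci(data, p, ci=95, m=1000, seed=0):
  '''Percentile-bootstrap CI for the p-th quantile of an LN2 fit (illustrative only)'''
  rng = np.random.default_rng(seed)
  data = np.asarray(data, dtype=float)
  n = len(data)
  z_p = ss.norm.ppf(p)
  # draw m resamples of size n with replacement
  idx = rng.integers(0, n, size=(m, n))
  logs = np.log(data[idx])
  # refit LN2 to each resample and estimate the p-th quantile
  q = np.exp(logs.mean(axis=1) + z_p*logs.std(axis=1, ddof=1))
  # confidence bounds are empirical quantiles of the bootstrapped estimates
  alpha = (100.0-ci)/100.0
  LB, UB = np.quantile(q, [alpha/2, 1-alpha/2])
  return LB, UB

# synthetic "annual maximum flows" with hypothetical log-space mean 8.0 and std 0.6
rng = np.random.default_rng(3)
flows = np.exp(rng.normal(8.0, 0.6, size=60))
point = np.exp(np.log(flows).mean() + ss.norm.ppf(0.99)*np.log(flows).std(ddof=1))
LB, UB = bootstrap_quantile_ci(flows, 0.99)
print("100-yr flood estimate: %0.0f, 95%% CI: (%0.0f, %0.0f)" % (point, LB, UB))
```

The same pattern applies to LN3: only the refitting step inside the resampling changes.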

For bootstrapping, we'll load a function from the `astropy` library, which we have to install first.

In [None]:
!pip install astropy

The analytical confidence interval for the p-th quantile of the LN2 distribution, $\exp(x_p)$, is:  
$$CI = \Bigg(\exp\bigg(x_p - z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{n}\big(1+0.5z_p^2\big)}\bigg),\ \exp\bigg(x_p + z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{n}\big(1+0.5z_p^2\big)}\bigg) \Bigg)$$  
where  
$$x_p = \hat{\mu} + z_p\hat{\sigma}$$  
is the p-th quantile in log space, $z_p$ is the p-th quantile of the standard normal distribution, and $\hat{\mu}$ and $\hat{\sigma}$ are the fitted parameters of the LN2 distribution.
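
As a worked example of this formula, the sketch below evaluates the interval for made-up parameter values. The numbers $\hat{\mu}=8.0$ and $\hat{\sigma}=0.6$ and the function name `ln2_quantile_ci` are hypothetical, not results from the flow data.

```python
import numpy as np
import scipy.stats as ss

def ln2_quantile_ci(mu, sigma, n, p, ci=95):
  '''Analytical CI for the p-th quantile of an LN2 distribution fit to n observations'''
  alpha = (100.0-ci)/100.0
  z_p = ss.norm.ppf(p)
  z_crit = ss.norm.ppf(1-alpha/2)
  x_p = mu + z_p*sigma  # p-th quantile in log space
  half_width = z_crit * np.sqrt(sigma**2 * (1+0.5*z_p**2)/n)
  return np.exp(x_p - half_width), np.exp(x_p + half_width)

# hypothetical fitted parameters; compare a short and a long record
mu, sigma, p = 8.0, 0.6, 0.99
estimate = np.exp(mu + ss.norm.ppf(p)*sigma)
LB50, UB50 = ln2_quantile_ci(mu, sigma, 50, p)
LB200, UB200 = ln2_quantile_ci(mu, sigma, 200, p)
print("100-yr flood estimate: %0.0f" % estimate)
print("95%% CI with n=50: (%0.0f, %0.0f)" % (LB50, UB50))
print("95%% CI with n=200: (%0.0f, %0.0f)" % (LB200, UB200))
```

The interval is symmetric in log space, so it is asymmetric around the estimate in real space, and it narrows as the record length $n$ grows.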

In [None]:
from astropy.stats import bootstrap

def calcCI(dist, data, p, CI, method, npars, seed, m=1000):
  '''Function for finding `CI`% confidence interval of
  `p`-th percentile of distribution `dist`
  with `npars` parameters fit to `data` using `method`
  with random seed `seed` if bootstrapping'''
  n = len(data)
  alpha = (100.0-CI)/100.0
  if npars == 2:
    # calculate theoretical confidence interval using formula from slides
    z_p = ss.norm.ppf(p)
    z_crit = ss.norm.ppf(1-alpha/2)
    x_p = dist.mu + z_p*dist.sigma
    LB = np.exp(x_p - z_crit * np.sqrt(dist.sigma**2 * (1+0.5*z_p**2)/n))
    UB = np.exp(x_p + z_crit * np.sqrt(dist.sigma**2 * (1+0.5*z_p**2)/n))
    return LB, UB
  elif npars == 3:
    # calculate confidence interval from m bootstrap samples
    np.random.seed(seed)
    sample = bootstrap(data, m)
    q_sample = np.zeros(m)-999 # initialize estimates at -999
    # estimate p-th percentile by fitting distribution to each bootstrap sample
    for i in range(m):
      LN = LogNormal()
      try:
        LN.fit(sample[i,:], method, npars)
        q_sample[i] = LN.findReturnPd(1/(1-p))
      except Exception:
        pass # leave failed fits at -999 so they are filtered out below

    # sort bootstrap samples and find confidence interval from empirical quantiles of sampled estimates
    q_sample = np.sort(q_sample)
    keep_indices = np.intersect1d(np.where(~np.isnan(q_sample))[0],np.where(q_sample != -999)[0]) # remove failed fits
    q_sample = q_sample[keep_indices]
    LB = q_sample[int((alpha/2)*len(q_sample))]
    UB = q_sample[int((1-alpha/2)*len(q_sample))]
    return LB, UB, q_sample

for method in methods:
  for npar in npars:
    LN = LogNormal()
    LN.fit(maxQ["Flow"], method, npar)
    # calculate 95% confidence interval on 100-yr flood
    if npar == 2:
      LB, UB = calcCI(LN, maxQ["Flow"], 0.99, 95, method, npar, 1923)
    elif npar == 3:
      LB, UB, q_sample = calcCI(LN, maxQ["Flow"], 0.99, 95, method, npar, 1923)

    print("95%% confidence interval for 100-yr flood estimate from LN%d %s fit: %f, %f" % (npar, method, LB, UB))

This returned a crazy upper bound on the 95% CI for the LN3 MLE fit! This is an example of how MLE fits can be unstable. That fit initialized the MLE search with the Lmom estimates. Let's try scaling the data and then fitting the distribution.

In [None]:
method = 'MLE'
npar = 3
LN = LogNormal()
# standardize the data before fitting
flow_mean = np.mean(maxQ["Flow"])
flow_std = np.std(maxQ["Flow"],ddof=1)
z = (maxQ["Flow"]-flow_mean) / flow_std
LN.fit(z, method, npar)
# calculate 95% confidence interval on 100-yr flood from the scaled data, then back-transform
LB, UB, q_sample = calcCI(LN, z, 0.99, 95, method, npar, 1923)
LB = LB*flow_std + flow_mean
UB = UB*flow_std + flow_mean
q_sample = q_sample*flow_std + flow_mean
print("95%% confidence interval for 100-yr flood estimate from LN%d %s fit: %f, %f" % (npar, method, LB, UB))
print("All sorted bootstrapped estimates of 100-yr flood:")

We still got a crazy upper bound, but we also got an error in finding the log of the data minus the lower bound, so clearly there were some problematic parameter estimates.  

Let's look at all of the 100-year flood estimates.

In [None]:
print(q_sample)

There is a jump from estimates being on the order of $10^4$ to $10^9$! How often does this happen?


In [None]:
np.mean(q_sample > 1E5) # fraction of bootstrapped estimates above 1E5

This happens in about 15% of the bootstrapped samples, so it's not just a few cases. Given this instability, we probably shouldn't rely on the bootstrapped uncertainty in the LN3 estimates for flood design!