[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aebtehaj/Hydrologic-Design_Notebook/blob/main/Chapter9.ipynb)

<a id="0"></a>
<div style="text-align: center; color: #1877F2">
  <h1>Hydrologic Design 4501</h1>
</div>

<div style="text-align: center; font-weight: bold;">
  Chapter 9: Statistical Hydrology
</div>

<div style="text-align: center; margin-bottom: 0.5em;">
  Mohammadali Olyaei and Ardeshir Ebtehaj
</div>

<div style="text-align: center; font-weight: bold;">
  University of Minnesota
</div>

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9.png" alt="Title" style="width: auto; height: 40px;">
</div>

<a id="1"></a>
# 1- Introduction

Hydrologic processes can be partly explained by deterministic, physically based models; however, some processes are still not well understood and must be characterized with statistical models. Statistical models explain hydrologic processes through probabilistic representations of their historical observations.

A random variable can be explained with a probability distribution, which is a parametric function that characterizes the probability of occurrence of the random variable.

A random variable ($X$) is a variable whose possible values are outcomes of a random process.

There are two types of random variables:

a) A **discrete random variable** takes only a countable number of values.

b) A **continuous random variable** can take an uncountably infinite number of values within an interval.


- A finite set of observations $X_1 ,X_2 ,\ldotp \ldotp \ldotp ,X_n$ of a random variable is called a **sample set**. 
- The space from which all samples can be drawn is called the **sample space**, often denoted by $\Omega$. In other words, the sample space of a random experiment is the set of all possible outcomes. 
- A subset of the sample space (i.e., a set of outcomes of an experiment) is called an **event**. 

For example, $\Omega =\left\lbrace x|x=0,1,2,\ldotp \ldotp \ldotp ,10\right\rbrace$ can represent a sample space, where $A_x =\left\lbrace x|1\le x\le 6\right\rbrace$ is an event.

The box below contains three orange (o) and two blue (b) discs, so the sample space is $\Omega =\left\lbrace \textrm{blue},\textrm{orange}\right\rbrace$ with the following probabilities:

$p\left(b\right)=\frac{2}{5}$

$p\left(o\right)=\frac{3}{5},$ where $p\left(o\right)+p\left(b\right)=1$

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F1.png" alt="Figure 1" style="width: auto; height: 300px;">
</div>

- **Total Probability:**

$$p\left(A_1\right)+p\left(A_2\right)+\ldots+p\left(A_n\right)=p\left(\Omega\right)=1$$

where $A_1 ,\ldots ,A_n$ are disjoint events whose union forms the entire sample space.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F2.png" alt="Figure 2" style="width: auto; height: 300px;">
</div>

- **Complementarity:**

$p\left(A\right)=1-p\left(\Omega -A\right)=1-p\left(\bar{A} \right)$          

$\bar{A}$: complement of $A$


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F3.png" alt="Figure 3" style="width: auto; height: 300px;">
</div>


**Conditional Probability:** Suppose we have two events $A$ and $B$. The conditional probability $p(A|B)$ is the probability of event $A$ given that event $B$ has already occurred. 

$$p\left(A|B\right)=\frac{p\left(A\cap B\right)}{p\left(B\right)}$$

where $p\left(A\cap B\right)$ is the **joint probability**, which is shown with a solid hatch in the following sample space. The joint probability is sometimes denoted by p(A, B) as well.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F4.png" alt="Figure 4" style="width: auto; height: 300px;">
</div>

If two events are **independent**, we have,


$$p\left(A|B\right)=p\left(A\right)=\frac{p\left(A\cap B\right)}{p\left(B\right)}$$

and thus $p\left(A\cap B\right)=p\left(A\right)\cdot p\left(B\right)$.


**Marginal Probability:** The probability of an event without conditioning on other random variables in the sample space. For example, if the events $B_i$ are disjoint, their union covers the sample space, and $A\cap B_i \neq \varnothing$ (i.e., $A$ and $B_i$ are not disjoint), then the marginal probability of $A$ is $p\left(A\right)=\sum_{B_i } p\left(A\cap B_i \right)$. This representation is an extended version of the law of total probability.

**Example:** Let us assume that total annual precipitation amounts in different years are independent random variables ($X$), with $p\left(X\le 40''\right)=a$ and $p\left(X\le 30''\right)=b$. Then, for two different years, $p\left(X_1 \le 40'' \cap X_2 \le 30''\right)=ab$.

Moreover, if $p\left(X\le 35''\right)=0.333$ and $p\left(X>45''\right)=0.275$, then by complementarity $p\left(35''< X\le 45''\right)=1-0.333-0.275=0.392$.

**Frequency Histogram vs Probability Distribution**

If we have a finite number $n$ of independent and identically distributed (iid) samples of $X$, we can first determine the range of the samples and then divide it into discrete intervals of size $\Delta x$. Then, we can count the number of values ($n_i$) that fall within $\left\lbrack x_i ,x_i +\Delta x\right\rbrack$ and divide it by the total number of samples ($n$) to obtain the frequency of occurrence within each interval as follows:

$$f_s \left(x_i \le X\le x_i +\Delta x\right)\cong \frac{n_i }{\;n}$$


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F5.png" alt="Figure 5" style="width: auto; height: 300px;">
</div>

<p style="text-align: center;">Probability histogram (mass function) of discrete random numbers (left) versus probability density function of continuous random variables (right)</p>

As $n\to \infty$ and $\Delta x\to 0$, the histogram normalized by the interval width, $f_s /\Delta x$, approaches the probability density function (PDF) $f_X \left(x\right)$, where $\int_{-\infty }^{+\infty } f_X \left(u\right)\,du=1$ and the probability of occurrence over an interval is

$$p\left(x_i \le X\le x_i +\Delta x\right)=\int_{x_i }^{x_i +\Delta x} f_X \left(u\right)\,du$$

Note that $f_s \left(\cdot \right)$ is the frequency function and $f_X \left(u\right)$ refers to the probability density function (PDF).
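The counting step above can be sketched in a few lines of Python. The sample values, the interval width, and all variable names below are made up for illustration and are not taken from any data set in this chapter. The relative frequencies play the role of $f_s$, and dividing them by the interval width gives a histogram estimate of the density $f_X$.

```python
# Illustrative sample values (made up for this sketch)
samples = [0.2, 0.7, 1.1, 1.4, 1.6, 2.3, 2.9, 3.5]

x_min = 0.0   # left edge of the first interval
dx = 1.0      # interval width (Delta x)
m = 4         # number of intervals

# Count the number of samples n_i that fall in each interval
counts = [0] * m
for x in samples:
    i = min(int((x - x_min) / dx), m - 1)  # a value on the right edge goes to the last interval
    counts[i] += 1

n = len(samples)
freq = [n_i / n for n_i in counts]   # relative frequency f_s of each interval
density = [f / dx for f in freq]     # histogram estimate of the PDF f_X

print(counts, freq, density)
```

The relative frequencies sum to one by construction, which mirrors the requirement that the PDF integrates to one.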

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F6.png" alt="Figure 6" style="width: auto; height: 500px;">
</div>

<p style="text-align: center;">Cumulative frequency function (left) and cumulative distribution function (CDF) (right)</p>

The cumulative frequency and cumulative distribution functions are


$$F_s \left(x_i \right)=\sum_{j=1}^i f_s \left(x_j \right) \Longrightarrow \textrm{cumulative frequency function} $$

$$F_X \left(x_i \right)=\int_{-\infty }^{x_i } f_X \left(u\right)\,du\Longrightarrow \textrm{cumulative distribution function (CDF)}$$


As a result, we can have,


$$p\left(x_i \le X\le x_i +\Delta x\right)=F_X \left(x_i +\Delta x\right)-F_X \left(x_i \right)=\int_{-\infty }^{x_i +\Delta x} f_X \left(u\right)\textrm{du}-\int_{-\infty }^{x_i } f_X \left(u\right)\textrm{du}$$

**Moments of a Probability Distribution:**

Moments of a probability distribution are **statistical parameters** that can be used to extract essential information about the **position, spread and shape** of a probability distribution.

**First-order** moment of $f_X (x)$:

$$\mu=\mathbb{E}(X)=\int_{-\infty}^{\infty} u f_X(u)\,du$$

**Second-order** central moment of $f_X (x)$:

$$\sigma^2=\mathbb{E}\left[(X-\mu)^2\right]=\int_{-\infty}^{\infty} (u-\mu)^2f_X(u)\,du$$

**Third-order** central moment of $f_X (x)$:

$$\gamma=\mathbb{E}\left[(X-\mu)^3\right]=\int_{-\infty}^{\infty} (u-\mu)^3f_X(u)\,du$$


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F7.png" alt="Figure 7" style="width: auto; height: 300px;">
</div>

<p style="text-align: center;"> Mean (μ) is a measure of central tendency or location. Standard deviation ($\sigma$) is a measure of width or dispersion of the random variable around its mean. The third order central moment ($\gamma$) is a measure of symmetry or skewness of the random variable. Densities with $\gamma >0$
are positively skewed and those with $ \gamma <0$ are negatively skewed </p>

**Percentile or Quantile:**

A **percentile** $x_p$ of a distribution is the value below which a given percentage of the probability mass falls. For example, the 95th percentile is the value below which 95% of the probability mass of a distribution is located, i.e., $p(X \le x_{95}) = 0.95$.

**Example:**

$$\mu = \bar{x}= \mathbb{E}(X)=\sum_{i=1}^3 x_if_s(x_i)=1 \times 0.3 + 2 \times 0.5 + 3 \times 0.2 = 1.9$$

$$ \sigma^2= \mathbb{E}\left[(X-\mu)^2\right]= (1-1.9)^2 \times 0.3+(2-1.9)^2 \times 0.5+(3-1.9)^2 \times 0.2 = 0.81 \times 0.3 + 0.01 \times 0.5 + 1.21 \times 0.2 = 0.49$$

$$ \gamma= \mathbb{E}\left[(X-\mu)^3\right]= (1-1.9)^3 \times 0.3+(2-1.9)^3 \times 0.5+(3-1.9)^3 \times 0.2 = 0.048$$

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F8.png" alt="Figure 8" style="width: auto; height: 300px;">
</div>
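The moments in the example above can be checked with a short script. This is only a sketch of the discrete sums; the list names `values` and `probs` are illustrative.

```python
# Values and their frequencies from the worked example
values = [1, 2, 3]
probs = [0.3, 0.5, 0.2]

# First-order moment (mean)
mean = sum(x * p for x, p in zip(values, probs))

# Second- and third-order central moments
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
third_moment = sum((x - mean) ** 3 * p for x, p in zip(values, probs))

print(round(mean, 3), round(variance, 3), round(third_moment, 3))
```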

We often normalize the statistical moments to make them more meaningful.

**Coefficient of variation:**

$$ CV=\frac{\sigma}{\mu} $$


**Coefficient of skewness:**


$$ C_s=\frac{ \mathbb{E}\left[(X-\mu)^3\right]}{\sigma^3} $$

Note: When the sample size is small, the discrete approximations of the moments may be biased. To obtain **unbiased** estimates, the following formulas should be used:

$$ \sigma_x^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\mu)^2 $$

$$ C_s=\frac{ n\sum_{i=1}^n (x_i-\mu)^3 }{(n-1)(n-2)\sigma_x^3}$$

where $C_s<0$ indicates negative skewness and $C_s>0$ indicates positive skewness.

In the above unbiased sample statistics the sample mean is $\mu=\frac{\sum_{i=1}^n x_i}{n}$.
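A minimal sketch of these sample statistics, assuming a plain Python list as input. The function name `sample_stats` and the test values are hypothetical and chosen only for illustration (a symmetric sample, so the skewness coefficient should vanish).

```python
import math

def sample_stats(x):
    """Return the sample mean, standard deviation (n-1 denominator),
    coefficient of variation, and skewness coefficient C_s."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    std = math.sqrt(var)
    cs = n * sum((xi - mean) ** 3 for xi in x) / ((n - 1) * (n - 2) * std ** 3)
    cv = std / mean
    return mean, std, cv, cs

# Illustrative symmetric sample (made up for this sketch)
mean, std, cv, cs = sample_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, std, cv, cs)
```

Note that the skewness formula requires at least three samples, since it divides by $(n-2)$.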

## Common PDFs in hydrology I

**Normal Distribution:**

$$ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

$$ F_X(x) = \int_{-\infty}^{x} f_X(u)\,du $$

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F9.png" alt="Figure 9" style="width: auto; height: 300px;">
</div>

<p style="text-align: center;"> PDF (left) and CDF (right) of the Gaussian or Normal distribution </p>

$$ \mathbb{E}(X)=\mu $$
$$ \mathbb{E}\left[(X-\mu)^2\right]=\sigma^2 $$
$$ \mathbb{E}\left[(X-\mu)^3\right]=0\;\;\textrm{and}\;\;\mathbb{E}\left[(X-\mu)^4\right]=3\sigma^4 $$

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F10.png" alt="Figure 10" style="width: auto; height: 500px;">
</div>

<p style="text-align: center;"> Distribution of the probability mass of a normal distribution based on different values of
its standard deviation </p>

When $x$ is drawn from the normal distribution and we define the standard normal variable as $z = \frac{x-\mu}{\sigma}$, which is often called the **z-score**, the distribution of $z$ has zero mean and a standard deviation equal to one. This distribution is called the **standard normal** distribution and has the following analytical form:


$$ f_Z(z) = \frac{1}{\sqrt{2\pi}}exp(\frac{-z^2}{2}) $$


The cumulative distribution function (CDF) of the standard normal is:

$$ \Phi_Z(z)=\int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}}e^{\frac{-u^2}{2}}du  $$

The error function and its complement are defined as follows:

$$ erf(x)=\frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}du, \;\;\;\;\; erfc(x)=1-erf(x)=\frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}du  $$

The values of the error function for different input values are given in the following table and can be obtained using the $\operatorname{erf}(x)$ function in MATLAB and Python (in the scipy.special module). 

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F11.png" alt="Figure 11" style="width: auto; height: 300px;">
</div>

<p style="text-align: center;"> Values of the error function </p>


We can use the error function to compute the CDF of the standard normal density function $\Phi_Z(z)$ as follows:


$$ \Phi_Z(z)=\frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{z} e^{\frac{-u^2}{2}}du, \;\;\;\;\; erf(z)=\frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-u^2}du  $$

If we apply the change of variable $u=\frac{t}{\sqrt{2}}$, and thus $du=\frac{dt}{\sqrt{2}}$, the limits become $u = 0 \Rightarrow t = 0$ and $u=z \Rightarrow t=\sqrt{2}z$. Applying this change of variable to the error function, we get:

$$ erf(z)=\frac{2}{\sqrt{\pi}} \int_{0}^{\sqrt{2}z} e^{\frac{-t^2}{2}}\frac{dt}{\sqrt{2}} = \frac{2}{\sqrt{2\pi}} \int_{0}^{\sqrt{2}z} e^{\frac{-t^2}{2}}dt $$

$$ =2 \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sqrt{2}z} e^{\frac{-t^2}{2}}dt - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{\frac{-t^2}{2}}dt  \right) $$

Thus,

$$ erf(z)=2\left(\Phi(\sqrt{2}z) - \Phi(0)  \right)=2\left(\Phi(\sqrt{2}z) - \frac{1}{2}  \right) $$

which results in,

$$ \Phi_z(\sqrt{2}z) = \frac{1+erf(z)}{2} $$

Replacing $z$ with $\frac{z}{\sqrt{2}}$, we get

$$ \Phi_z(z)= \frac{1+erf\left(\frac{z}{\sqrt{2}}\right)}{2} $$

Therefore, if $x \sim N(\mu, \sigma)$ and we want to compute $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$, we can use the following steps:

$$ z=\frac{x-\mu}{\sigma} \Rightarrow F_X(x)=\Phi_z(z)=\frac{1+erf\left(\frac{z}{\sqrt{2}}\right)}{2}$$

**MATLAB**: $\textrm{normcdf}(x,\mu,\sigma)$

**Python**: $\textrm{scipy.stats.norm.cdf}(x,\mu,\sigma)$ (in the scipy.stats module)

**Example**: Assume that $X$ is from a normal distribution with $\mu=2$ and $\sigma=3$. What is the probability $\textrm{prob}(2.5 \leq x \leq 5) = F_X(5) - F_X (2.5)$?

$$ x=2.5 \Rightarrow z=\frac{2.5-2}{3}=0.167 $$

$$ x=5 \Rightarrow z=\frac{5-2}{3}=1 \Rightarrow p_r(0.167 \leq z \leq 1)= \Phi(1) - \Phi(0.167) $$

$$ \Phi(1)=\frac{1+erf(\frac{1}{\sqrt{2}})}{2}=0.8413 $$

$$ \Phi(0.167)=\frac{1+erf(\frac{0.167}{\sqrt{2}})}{2}=0.5663 \Rightarrow \textrm{prob}(0.167 \leq z \leq 1)= \Phi(1) - \Phi(0.167) \simeq 0.2752$$
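The example above can be reproduced with the error function from Python's standard library. This is a sketch only; the helper names `std_normal_cdf` and `normal_cdf` are illustrative and stand in for library routines such as the normal CDF functions mentioned earlier.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma), computed through the z-score."""
    return std_normal_cdf((x - mu) / sigma)

mu, sigma = 2.0, 3.0
prob = normal_cdf(5.0, mu, sigma) - normal_cdf(2.5, mu, sigma)
print(round(prob, 4))
```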

**Log-normal Distribution:**

Hydrologic variables are often skewed. A logarithmic transformation often makes them more symmetric and allows a more robust estimation of their statistics. If a random variable $Y=\log X$ is normally distributed, then $X$ has a log-normal distribution.

$$ f_X(x) = \frac{1}{x \sigma_y \sqrt{2 \pi}}\exp\left(-\frac{(\log x - \mu_y)^2}{2\sigma_y^2}\right), \;\; x>0 $$

where $\mu_y$ and $\sigma_y$ are the mean and standard deviation of $Y = \log X$.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F12.png" alt="Figure 12" style="width: auto; height: 500px;">
</div>

<p style="text-align: center;"> The PDF (left) and CDF (right) of the log-normal density function </p>

**Exponential Distribution:**

Some hydrologic processes, such as the time between precipitation events, can be described by an exponential distribution.

$$ f_X(x) = \lambda e^{-\lambda x} \;\;\; x \geq 0 \;\;  \lambda > 0 $$

$$ F_X(x)=1-e^{-\lambda x} $$

$$ \mu_x=\frac{1}{\lambda}\;\;\;\; \sigma_x^2=\frac{1}{\lambda^2}$$

where $\mu_x$ and $\sigma_x^2$ are the mean and variance of $X$.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F13.png" alt="Figure 13" style="width: auto; height: 450px;">
</div>

<p style="text-align: center;"> The PDF (left) and CDF (right) of an exponential density function </p>

**Gamma Distribution:** A two parameter Gamma distribution is defined as follows:

$$ f_X(x) = \frac{\lambda^\beta x^{\beta-1}e^{-\lambda x}}{\Gamma(\beta)} \;\;\; \text{for}\;\; x \geq 0 \;\; \text{and} \;\;  \lambda, \beta > 0 $$

where $\lambda$ is a rate (inverse scale) parameter that controls the width and $\beta$ characterizes the shape. The following relationships hold between the parameters and the first- and second-order moments:

$$ \lambda=\frac{\mu_x}{\sigma_x^2}\;\;\;\;\textrm{and}\;\;\; \beta=\frac{\mu_x^2}{\sigma_x^2}.$$

The gamma function is $\Gamma (\alpha) = (\alpha -1 )!$ for positive integers and $\Gamma (\alpha) = \int_{0}^{\infty} x^{\alpha -1}e^{-x}dx$ for positive real numbers.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F14.png" alt="Figure 14" style="width: auto; height: 450px;">
</div>

<p style="text-align: center;"> The PDF (left) and CDF (right) of the gamma density function, which spans a probability continuum from the exponential to the Gaussian PDFs </p>

**Gumbel Distribution:** 

$$ f_X(x) =\frac{1}{\beta} \exp \left[ -\frac{x-u}{\beta}-\exp\left(-\frac{x-u}{\beta}\right) \right], \;\;\; -\infty<x<\infty $$

$$u \simeq \mu_x - 0.5772 \beta, \;\;\; \beta=\frac{\sqrt{6}\sigma_x}{\pi}$$

where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of $x$.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F15.png" alt="Figure 15" style="width: auto; height: 450px;">
</div>

<p style="text-align: center;"> The PDF (left) and CDF (right) of the Gumbel distribution </p>

**Pearson Type III Distribution:** 

This density is a three-parameter gamma density function that has an extra location parameter $\epsilon$.


$$ f_X(x) =\frac{\lambda^\beta (x-\epsilon)^{\beta-1}e^{-\lambda (x-\epsilon)}}{\Gamma(\beta)}, \;\;\; x \geq \epsilon $$

$$ \lambda = \frac{\sqrt{\beta}}{\sigma_x}, \;\;\; \beta=\left(\frac{2}{C_s}\right)^2, \;\;\; \epsilon= \mu - \sigma_x \sqrt{\beta}   $$

$\mu:$ mean of x

$\sigma :$ standard deviation of x

$C_s:$ skewness coefficient of x.

**Log Pearson Type III Distribution:**

$$ f_X(x) =\frac{\lambda^\beta (\log x-\epsilon)^{\beta-1}e^{-\lambda (\log x-\epsilon)}}{x\Gamma(\beta)}, \;\;\; \log x \geq \epsilon $$

$$ \lambda= \frac{\sqrt{\beta}}{\sigma_y}, \;\;\; \beta=\left( \frac{2}{C_{sy}}  \right)^2, \;\;\; \epsilon= \mu_y-\sigma_y \sqrt{\beta}  $$

where $y = \log x$. The log-Pearson type III distribution is used in the United States for flood frequency analysis.

**Chi-Squared Distribution:** 

The chi-squared distribution has the following density function with $\kappa \in \mathbb{N}$ (a positive integer) **degrees of freedom**:

$$ f_{\kappa}(x) =\frac{1}{2^{\frac{\kappa}{2}}\Gamma(\frac{\kappa}{2})}x^{\frac{\kappa}{2} -1}e^{-\frac{x}{2}}$$

$$ \mu=\kappa, \;\;\; \sigma_X^2=2\kappa, \;\;\; C_s=\sqrt{\frac{8}{\kappa}}$$

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F16.png" alt="Figure 16" style="width: auto; height: 400px;">
</div>

<p style="text-align: center;"> The PDF (left) and CDF (right) of the Chi-Squared distribution </p>


Note: If $z_1, \ldots , z_\kappa$ are independent standard normal random variables, then the sum of their squares $\sum_{i=1}^{\kappa} z_i^2$ follows the chi-squared distribution with **$\kappa$ degrees of freedom**.

Note that observation or simulation errors can often be well represented by the normal density function. This means that the sum of squared standardized errors has a chi-squared distribution.

## Fitting a probability distribution I

**Method of Moments:**

When we have a set of random samples, we often want a parametric representation of their distribution. To obtain it, we need to **first choose a PDF and then estimate its parameters such that the density represents the observed samples well**. After choosing the PDF, we can compute the unbiased sample statistics and use them to estimate the parameters of the distribution as functions of these statistics.


$$ \mu=\frac{\sum x_i}{n} \;\;\; \sigma_x^2=\frac{1}{n-1}\sum_i(x_i-\mu)^2 \;\;\; C_s=\frac{n\sum_{i=1}^n (x_i-\mu)^3}{(n-1)(n-2)\sigma_x^3}$$

For example, for a normal and/or an exponential distribution we have:

$$ f_X(x)=\frac{1}{\sqrt{2\pi}\sigma_x}exp(-\frac{(x-\mu)^2}{2\sigma_x^2})\;\;\;\textrm{normal distribution} $$

$$ f_X(x)=\lambda e^{-\lambda x} \;\;\; \lambda=\frac{1}{\mu} $$

**Example**: Assume that $x_i=\{2.4, 4.25, 0.77, 13.22, 3.55, 1.37\}$ are drawn from an exponential density function. What is the best estimate of the $\lambda$ parameter based on the method of moments?

$$ \mu=\frac{2.4 + 4.25 + 0.77 + 13.22 + 3.55 + 1.37}{6}=\frac{25.56}{6}=4.26 $$

$\lambda=\frac{1}{\mu}=0.235$ and thus $f_X(x)=0.235e^{-0.235x}$
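The method-of-moments estimate for the exponential distribution can be sketched as follows, using the six samples from the example. The function name `exponential_pdf` is illustrative.

```python
import math

# Samples from the example
x = [2.4, 4.25, 0.77, 13.22, 3.55, 1.37]

# Method of moments for the exponential distribution: lambda = 1 / mean
mean = sum(x) / len(x)
lam = 1.0 / mean

def exponential_pdf(value, rate):
    """Exponential PDF f_X(x) = rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * value) if value >= 0 else 0.0

print(round(mean, 2), round(lam, 3), round(exponential_pdf(1.0, lam), 3))
```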

However, the question is, **how can we choose the best probability density function?** In other words, we need a **goodness-of-fit test** to compare candidate PDFs and identify the most representative density function.

**A Recap on Hypothesis Testing**

The **null hypothesis** is the hypothesis that there is no relationship between two events or random variables. Rejecting the null hypothesis indicates that a relationship or a dependency could exist at a given significance level. We can also test the **hypothesis** that there is a relationship or dependency between events or variables, or that a **random variable is drawn from a specific density**.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F17.png" alt="Figure 17" style="width: auto; height: 450px;">
</div>

Figure 17: A schematic showing the concept of hypothesis testing at significance level $\alpha$. Here, the hypothesis is that $x$ is drawn from the chi-squared PDF, where $\alpha$ is the **probability of exceedance**. When $\chi_\kappa^2 \leq x_\alpha$, it is likely that the computed statistic is drawn from a chi-squared density. Thus, we can **NOT REJECT** the hypothesis at significance level $\alpha$. However, when $\chi_\kappa^2 > x_\alpha$, the computed statistic may not belong to the chi-squared density. Thus, we can **REJECT** the hypothesis at significance level $\alpha$. A common value is $\alpha=0.05$, where $x_{\alpha=0.05}$ denotes the $(1 - \alpha) \times 100 = 95$th **percentile** or 0.95 **quantile** of the PDF.

**Chi-Squared test**: The chi-square goodness-of-fit test determines if sampled data are drawn from a specified probability distribution, with parameters estimated from the data.

Let’s assume that we have a set of $n$ samples $\{x_j\}_{j=1}^n$ and want to test the hypothesis that they are drawn from a specific PDF $f_X(x)$. We can divide the observation domain into a set of intervals $[x_{i-1}, x_i]$ and compute the frequency of occurrence within each interval as $f_s(x_i)=\frac{n_i}{n}$ (i.e., the probability histogram), where

$n$: is the total number of observations and

$n_i$: is the number of observations that fall within $[x_{i-1}, x_i]$.


For a chosen probability model, the probability of the same interval is $f_X(x_i)=p(x_{i-1} \le X \le x_i)= F_X(x_i)- F_X(x_{i-1})$, where $F_X(\cdot)$ is the CDF of the chosen probability model. If we assume that our histogram has $m$ intervals, we can compute the following statistic, which represents the error between the **observed** $nf_s(x_i)$ and **expected** $nf_X(x_i)$ numbers of occurrences, the latter obtained from the chosen probability model:


$$ \chi_\kappa^2= \sum_{i=1}^m \frac{[nf_s(x_i)-nf_X(x_i)]^2}{nf_X(x_i)}=\sum_{i=1}^m \frac{n[f_s(x_i)-f_X(x_i)]^2}{f_X(x_i)}  $$


If we assume that the error $e_i = f_s(x_i) - f_X(x_i)$ is drawn from the Gaussian distribution, then it can be theoretically shown that $\chi_\kappa^2$ has a chi-squared distribution with degrees of freedom $\kappa = m - n_p - 1$, where $m$ is the number of intervals and $n_p$ denotes the number of fitted parameters for the chosen distribution (e.g., $n_p = 2$ for the Gaussian).

**Example**: We have a record of 69 years of annual precipitation data (inches) with sample mean $\mu = 39.77$ [in] and standard deviation $\sigma = 9.17$ [in]. The hypothesis is that the annual precipitation data are drawn from a Gaussian distribution. Use the chi-squared test to decide whether this hypothesis can be rejected.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F18a.png" alt="Figure 18a" style="width: auto; height: 350px;">
  <img src="Figures/Ch9/Ch9F18b.png" alt="Figure 18b" style="width: auto; height: 350px;">

</div>

For example in the above table, for the 4th interval, we have

$$f_s(x \in [30,35])= \frac{14}{69}= 0.203$$

$$F_s (x_4) = 0.014 + 0.029 + 0.087 + 0.203 = 0.333$$

Let’s fit a Gaussian distribution with the following statistics:

$$ \mu = 39.77'', \;\;\; \sigma_x=9.17''$$

Having the above parameters for a Gaussian density, we can compute the CDF at the upper bounds of the third and fourth intervals, and thus the probability of the fourth interval, as follows:

$$ i=4 \Rightarrow z_4=\frac{x_4-\mu}{\sigma}= \frac{35-39.77}{9.17}= -0.52$$

$$\frac{1+erf(\frac{z_4}{\sqrt{2}})}{2}= \Phi_z(z_4= -0.52)= F_X(x_4=35)= 0.301$$

$$i=3 \Rightarrow z_3=\frac{30-39.77}{9.17}= -1.065$$

$$\Phi_z(z_3= -1.065)= F_X(x_3=30)=0.143$$

$$f_X(x \in [x_3,x_4])= F_X(x_4)- F_X(x_3)= 0.301- 0.143= 0.158$$

$$\textrm{Python} \Rightarrow f_X(x \in [x_{i-1},x_i])= F_X(x_i)- F_X(x_{i-1})= \textrm{stats.norm.cdf}(x_i, \mu, \sigma)- \textrm{stats.norm.cdf}(x_{i-1}, \mu, \sigma)$$

$$\textrm{MATLAB} \Rightarrow f_X(x \in [x_{i-1},x_i])= F_X(x_i)- F_X(x_{i-1})= \textrm{normcdf}(x_i, \mu, \sigma)- \textrm{normcdf}(x_{i-1}, \mu, \sigma)$$

And in the last column for $i = 4$, we have $\chi_\kappa^2(x_4)=\frac{n[f_s(x_4)-f_X(x_4)]^2}{f_X(x_4)}=0.891$, and the sum over all intervals is $\chi_\kappa^2=\sum_{i=1}^m\frac{n[f_s(x_i)-f_X(x_i)]^2}{f_X(x_i)}=2.377$, where $m = 10$, $n_p = 2$, and the degrees of freedom are $\kappa = m - n_p - 1 = 7$. The critical chi-squared value for significance level $\alpha=0.05$ can be obtained from [available tables](https://www.medcalc.org/manual/chi-square-table.php) or existing software tools. In the problem at hand, $x_{\alpha=0.05}=14.1$, which is the inverse of the CDF of the chi-squared distribution with $\kappa=7$ at a non-exceedance probability of $1 - \alpha = 0.95$.

---

Note: if you have access to a computer program that can compute $F(x_i)$ directly from the mean ($\mu =39.77$ in) and standard deviation ($\sigma=9.17$ in) of the normal distribution fitted to the annual rainfall values, you don’t need to compute $z_i$. In Python, $f(x_i) = F(x_i) - F(x_{i-1}) = \textrm{stats.norm.cdf} (x_i,\mu,\sigma) - \textrm{stats.norm.cdf} (x_{i-1}, \mu, \sigma)$ (in MATLAB the command is $\textrm{normcdf}$ with the same inputs), where $\mu = 39.77$ [in] and $\sigma = 9.17$ [in] in the above example. Moreover, the inverse of the chi-squared CDF can be obtained using the following commands:

Python:

$$x_\alpha=14.1=\textrm{scipy.stats.chi2.ppf}(0.95, 7)$$

MATLAB:

$$14.1 = \textrm{chi2inv}(0.95, 7)$$

Excel: 

$$14.1 = \textrm{CHISQ.INV}(0.95,7)$$

In MATLAB, the above chi-squared test can be done with $\textrm{h = chi2gof(x)}$, in which $h = 0$ means that we cannot reject the null hypothesis that the samples are drawn from a Gaussian density at the $5\%$ significance level. In Python, the command is $\textrm{scipy.stats.chisquare}$(observed_freq, expected_freq), which compares the observed frequencies (counts) of the data against the expected frequencies.

---

In the above example, the hypothesis is that the precipitation observations are drawn from the normal distribution, and we have to check whether this hypothesis can be rejected. Since $\chi_\kappa^2 = 2.377 < x_{\alpha} = 14.1$, we can **NOT reject the hypothesis. Therefore, the normal distribution could be an acceptable distribution for a parametric representation of the precipitation data.**
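As a rough sketch of the fourth-interval calculation and the final decision, the snippet below uses only Python's standard library (`math.erf`) in place of a library normal CDF. The helper names `normal_cdf` and `chi2_term` are illustrative and not part of any library. The total statistic (2.377) and the critical value (14.1) are copied from the example rather than recomputed, because the full table is only available as a figure. The single-interval term computed here may differ slightly from the tabulated value because of rounding.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma) through the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_term(n, f_s, f_x):
    """Contribution of one interval to the chi-squared statistic."""
    return n * (f_s - f_x) ** 2 / f_x

n = 69                      # years of record
mu, sigma = 39.77, 9.17     # fitted normal parameters [in]

# Fourth interval [30, 35] in, with 14 observations
f_s4 = 14 / n
f_x4 = normal_cdf(35.0, mu, sigma) - normal_cdf(30.0, mu, sigma)
term4 = chi2_term(n, f_s4, f_x4)

# Decision, using the totals reported in the example
chi2_total = 2.377          # sum over all m = 10 intervals (from the table)
chi2_critical = 14.1        # 95th percentile for kappa = 7 (from the text)
reject = chi2_total > chi2_critical

print(round(f_s4, 3), round(f_x4, 3), reject)
```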

## Frequency Analysis I

Frequency analysis in hydrology is used to characterize **extreme hydrologic events** that cannot be explained by physical modeling alone. For example, physical models cannot directly tell us the **magnitude of a flood** with a return period of 100 years. In this case, probability theory can help. To that end, we have to find **probability distributions of hydrologic extremes**, such as the **maximum annual streamflow**. It is important to note that when we fit a probability density function to a finite number of samples, we assume that those samples are **independent and identically distributed (i.i.d.)**. For example, streamflow rates from today and tomorrow can be considered two samples from the same distribution, but it is very likely that they are not independent. **However, maximum annual streamflow rates, for example, can be considered independent random variables.**

**Extreme events:** An extreme event is an event that has a very low probability of occurrence and a long return period.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F19.png" alt="Figure 19" style="width: auto; height: 350px;">
</div>

<p style="text-align: center;"> The extreme value $x_T$ with return period $T$ can be inferred from the probability distribution of independent and identically distributed annual or block maxima of the event (e.g., annual floods). Note that $x_T$ is the $100(1 - p)\textrm{th}$ percentile of $f_X(x)$, where $p$ denotes the probability of exceedance. </p>

The following table represents the **annual daily maximum discharge in cfs** of a river station for almost five decades.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F20.png" alt="Figure 20" style="width: auto; height: 350px;">
</div>

For example, assuming $x_T =50{,}000$ cfs, we can see that the maximum annual flow exceeds this threshold nine times, which gives the following eight recurrence intervals:

$$t_1=4,\; t_2=1,\; t_3=1,\; t_4=16,\; t_5 = 3,\; t_6 = 6,\; t_7 = 5,\; t_8 = 5\;\textrm{yr}$$

Therefore, the return period is $T=\mathbb{E}(t)= \frac{\sum t_i}{n}= \frac{41}{8}=5.125$ years.
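The empirical return period above is simply the average of the recurrence intervals, which can be sketched as follows (variable names are illustrative):

```python
# Recurrence intervals (years) between successive exceedances of 50,000 cfs
intervals = [4, 1, 1, 16, 3, 6, 5, 5]

# Return period T as the mean recurrence interval, and its reciprocal
T = sum(intervals) / len(intervals)
p_exceed = 1.0 / T   # annual probability of exceedance

print(T, round(p_exceed, 3))
```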

Let us assume that the probability of **exceedance** of an extreme event is denoted by $p$. If an extreme event occurs in the $t^{th}$ year, it means that there were $t - 1$ preceding years without any extreme event. Therefore, the probability distribution of the recurrence interval $t$ may be expressed as follows:

$$p_T(t)=(1-p)^{t-1}p $$

The expected value of $t$, i.e., the return period, can be calculated as

$$T=\mathbb{E}(t)= \sum_{t=1}^{\infty} t(1-p)^{t-1}p$$
$$=p+2(1-p)p+3(1-p)^2p+4(1-p)^3p+...$$
$$=p[1+2(1-p)+3(1-p)^2+4(1-p)^3+...]$$
$$=\frac{p}{[1-(1-p)]^2}=\frac{1}{p}$$

Therefore, $T = \mathbb{E}(t) = \frac{1}{p}$, and thus the return period is equal to the inverse of the probability of exceedance of $x_T$,

$$ \frac{1}{T}=p=1-F_X(x_T)\;\;\; \textrm{and thus}\;\;\; x_T=F_X^{-1}(1-p)$$
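The result $T = 1/p$ can also be checked numerically by truncating the infinite sum at a large number of terms. This is a sketch only; the function name `expected_recurrence` and the truncation length `t_max` are assumptions made for illustration.

```python
def expected_recurrence(p, t_max=10000):
    """Truncated sum of t * (1 - p)**(t - 1) * p for t = 1, ..., t_max."""
    return sum(t * (1.0 - p) ** (t - 1) * p for t in range(1, t_max + 1))

# Compare the truncated series with 1/p for two exceedance probabilities
for p in (0.2, 0.01):
    print(p, expected_recurrence(p), 1.0 / p)
```

The truncation error is negligible as long as $(1-p)^{t_{max}}$ is very small, so very rare events (very small $p$) would need a larger `t_max`.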

**Example**: For an $x_T = 50000\;\textrm{[cfs]}$ , the return period can be obtained as $T = \mathbb{E}(t) = 5.1$ yr, which results in $p =\frac{1}{T}= 0.195$.

---
Question: What is the probability that a $T$-year return period event will occur at least once in $N$ years?

$$P_r(X<x_T \;\;\textrm{for N consecutive years})=(1-p)(1-p)\cdots=(1-p)^N$$

$$P_r(X \geq x_T \;\;\textrm{at least once in N years})=1-(1-p)^N=1-\left(1-\frac{1}{T}\right)^N$$

---

**Example:** Estimate the probability that annual maximum discharge $Q$ will exceed 50000 cfs at least once during the next 5 years.

$$ p=\frac{1}{T} = \frac{1}{5.1} = 0.195 $$

$p(Q \geq 50000\;\textrm{cfs})$ at least once in the next five years $=1-(1-0.195)^5 \simeq 0.66$
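The probability of at least one exceedance in $N$ years can be wrapped in a small helper. The function name `prob_at_least_once` is illustrative; the first call uses the return period from the example, and the second is an additional illustration for a 100-year event over 100 years.

```python
def prob_at_least_once(T, N):
    """Probability that a T-year event occurs at least once in N years,
    assuming independent years with exceedance probability p = 1/T."""
    p = 1.0 / T
    return 1.0 - (1.0 - p) ** N

risk_5yr = prob_at_least_once(5.125, 5)     # example above
risk_100yr = prob_at_least_once(100, 100)   # 100-year event in 100 years

print(round(risk_5yr, 2), round(risk_100yr, 3))
```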

**Extreme Value Distributions:**
Two classes of distributions are commonly used in hydrologic extreme value analyses:

- (1) Extreme value type I,II,III
- (2) Log-Pearson Type III

**Extreme value type I** or the Gumbel distribution is often used for rainfall frequency analysis as follows:

$$ f_X(x)= \frac{1}{\beta}exp[-\frac{x-u}{\beta}-\exp(-\frac{x-u}{\beta})]\;\;\; -\infty<x<\infty $$

$$ u=\mu_x-0.5772\beta $$

$$ \beta=\frac{\sqrt{6}\sigma_x}{\pi} $$

$$ F_X(x)= \exp[-\exp(-\frac{x-u}{\beta})]\;\;\; -\infty<x<\infty $$

If we define a reduced variable as

$$ y=\frac{x-u}{\beta} $$

we have:

$$ F_X(x)=\exp[-\exp(-y)] \Rightarrow y_T=-\ln\left[\ln\left(\frac{1}{F_X(x_T)}\right)\right]\;\;\; (1) $$

We showed that


$$ \frac{1}{T}=p= \int_{x_T}^\infty f_X(x)dx=1-\int_{-\infty}^{x_T} f_X(x)dx$$

$$ = 1-F_X(x_T) \Rightarrow F_X(x_T)=\frac{T-1}{T},\;\;\; (2)$$

from (1) & (2) we have,

---

$$ y_T=-\ln[\ln(\frac{T}{T-1})] $$
$$ X_T=u+\beta y_T $$

---

Therefore, given the return period $T$, we can compute $y_T$ and, knowing the parameters of the Gumbel distribution for the annual maxima from the method of moments, obtain the associated extreme event $x_T$ as shown above.

**Example:** The annual maximum values of 10-minute-duration rainfall (inches) at a specific location are presented below. Calculate the 10-minute-duration maximum rainfall for the 5- and 10-year return periods.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F21.png" alt="Figure 21" style="width: auto; height: 350px;">
</div>

$$ \mu=0.649'' \;\;\; \sigma_x=0.177'' \;\;\; \beta=\frac{\sqrt{6} \sigma_x}{\pi}=0.138 $$

$$ u=\mu_x-0.5772\beta=0.569 $$
$$ y_T=-\ln[\ln(\frac{T}{T-1})] $$
$$ y_T=-\ln[\ln(\frac{5}{4})]=1.5 $$
$$ X_T=u+\beta y_T= 0.569+0.138 \times 1.50= 0.78'' $$

The same approach can be used for $T = 10$ years.
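The Gumbel calculation can be sketched as a small function that takes the sample mean, the sample standard deviation, and the return period. The function name `gumbel_quantile` is illustrative; the constant 0.5772 and the parameter relations are those given above.

```python
import math

def gumbel_quantile(mean, std, T):
    """Extreme value x_T of a Gumbel distribution fitted by the method of moments."""
    beta = math.sqrt(6.0) * std / math.pi        # scale parameter
    u = mean - 0.5772 * beta                     # location parameter
    y_T = -math.log(math.log(T / (T - 1.0)))     # reduced variate
    return u + beta * y_T

# Sample statistics of the annual maximum 10-minute rainfall [in]
mean, std = 0.649, 0.177

x_5 = gumbel_quantile(mean, std, 5)
x_10 = gumbel_quantile(mean, std, 10)
print(round(x_5, 2), round(x_10, 2))
```

The function requires $T > 1$, because the reduced variate is undefined for $T = 1$.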

**Log-Pearson Type III:** is used for flood frequency analysis in the United States. Recall that because the Gumbel CDF can be inverted analytically, we have a closed-form expression for its percentiles and thus could find an analytical expression for the extreme values of annual rainfall maxima ($x_T$ for return period $T$) as follows:

$$ P_r(X \geq x_T)=p \Rightarrow x_T=F_X^{-1}(1-p) $$

However, the CDFs of many distributions cannot be inverted in closed form. For computational convenience, we use a **frequency factor** $K_T$ and express the percentile of the distribution for return period $T$ as follows:

$$ x_T=\mu_x+K_T \sigma_x, $$

where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of the random variable $x$. When the random variable (e.g., maximum annual flood) has significant positive skewness, we take $y = \log_{10} x$ and then we have

$$ y_T=\mu_y+K_T \sigma_y. $$

Once $y_T$ is found, one can obtain $x_T =10^{y_T}$.

For complex distributions such as the log-Pearson type III, the relationship between $K_T$ and $T$ is given in pre-calculated tables as a function of the coefficient of skewness of the log-transformed variable. In the tables shown below, the frequency factor $K_T$ of the log-Pearson type III distribution is given for return periods $T = 2, 5, 10, 25, 50, 100, 200\;\textrm{yr}$ as a function of the sample skewness coefficient ($C_s$) of the log-transformed data.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F22a.png" alt="Figure 22a" style="width: 800px; height: auto;">
  <img src="Figures/Ch9/Ch9F22b.png" alt="Figure 22b" style="width: 800px; height: auto;">

</div>

**Example:** Sixteen years of annual maximum river flows are given in the following table. Calculate the annual maximum discharge with a 50-year return period using the log-Pearson type III distribution.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F23.png" alt="Figure 23" style="width: 800px; height: auto;">
</div>

The statistics of the annual maxima in the log-scale ($y = \log x$) are as follows:

$$ \mu_y=\frac{\sum y_i}{n}=4.27, \;\;\; \sigma_y=\sqrt{\frac{\sum(y_i-\mu_y)^2}{n-1}}=0.4027, \;\;\; C_s=\frac{ n\sum_{i=1}^n (y_i-\mu_y)^3 }{(n-1)(n-2)\sigma_y^3}=-0.0696$$

For $C_s = -0.0696$, linear interpolation between the table entries at $C_s = 0$ ($K_T = 2.054$) and $C_s = -0.1$ ($K_T = 2.000$) gives

$$ K_{T=50} \simeq 2.054+ \frac{2.000-2.054}{(-0.1-0)} \times(-0.0696-0)=2.016$$

Therefore, $y_{T=50}=\mu_y+K_{T=50}\sigma_y=4.2743+2.016 \times 0.4027=5.0863$.

Thus, $x_{T=50}=10^{5.0863}=121,990\;\textrm{[cfs]}$
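The frequency-factor calculation in this example can be sketched as follows. It is only an illustration: the helper `interpolate` is hypothetical, and the two bracketing table entries ($K_T = 2.054$ at $C_s = 0$ and $K_T = 2.000$ at $C_s = -0.1$) are assumed to be the values read from the frequency factor table.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Statistics of the log10-transformed annual maxima quoted in the example
mu_y, sigma_y, c_s = 4.2743, 0.4027, -0.0696

# K_T for T = 50 yr at the two skewness values that bracket c_s
# (assumed table entries: C_s = 0 -> 2.054, C_s = -0.1 -> 2.000)
k_50 = interpolate(c_s, 0.0, 2.054, -0.1, 2.000)

y_50 = mu_y + k_50 * sigma_y   # 50-year event in log space
x_50 = 10.0 ** y_50            # back-transform to discharge [cfs]

print(f"K_50 = {k_50:.3f}, y_50 = {y_50:.4f}, x_50 = {x_50:,.0f} cfs")
```

Small differences from the hand calculation are expected because the text rounds intermediate values.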

## Water Resources Council Method I

The values of frequency factors $K_T$ are very sensitive to the coefficients of skewness $C_s$. To avoid overestimation or underestimation of floods and financial consequences, the Water Resources Council recommends a method that leads to a more accurate and robust estimate of the coefficient of skewness ($C_w$) as follows:

$$ C_w=wC_s+(1-w)C_m $$

where

- $C_s$ is the sample skewness,
- $C_m$ is the map (regional) skewness,
- $C_w$ is the weighted skewness (a more robust estimate), and
- $w$ is an optimal weight that interpolates between $C_s$ and $C_m$.

Assuming that $C_s$ and $C_m$ are independent random variables, the variance of $C_w$ is

$$ \mathbb{Var}(C_w)=w^2\mathbb{Var}(C_s)+(1-w)^2\mathbb{Var}(C_m), $$

and the optimal weight is defined as the value of $w$ that minimizes this variance. Setting the derivative to zero,

$$ \frac{d}{dw}[\mathbb{Var}(C_w)]=2w\,\mathbb{Var}(C_s)-2(1-w)\,\mathbb{Var}(C_m)=0, $$

leads to the optimal value of

$$ w=\frac{\mathbb{Var}(C_m)}{\mathbb{Var}(C_s)+\mathbb{Var}(C_m)} $$

This value of $w$ is a minimizer of the variance of $C_w$ because $\frac{d^2}{dw^2}[\mathbb{Var}(C_w)]=2[\mathbb{Var}(C_s)+\mathbb{Var}(C_m)]>0$.

Therefore, a robust estimate of the skewness coefficient is:

---

$$ C_w= \frac{\mathbb{Var}(C_m)C_s+\mathbb{Var}(C_s)C_m}{\mathbb{Var}(C_m)+\mathbb{Var}(C_s)} $$

---

The Water Resources Council recommends using $\mathbb{Var}(C_m) = 0.3025$ and provides a regional map for $C_m$.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F24.png" alt="Figure 24" style="width: 800px; height: auto;">
</div>

<p style="text-align: center;"> Generalized map skewness coefficients (Cm) of annual maximum streamflow from U.S.
Water Resources Council (1981) </p>


For the log-Pearson type III distribution and $n$ years of annual maximum streamflow, the variance of the sample skewness can be obtained from the following formula:

---

$$ \mathbb{Var}(C_s)=10^{A-B \log_{10}\left(\frac{n}{10}\right)} $$

---

$$ A=\begin{cases}
          -0.33+0.08|C_s| & |C_s| \leq 0.90  \\
          -0.52+0.30|C_s| & |C_s| > 0.90  \\
\end{cases} \;\;\;\; B=\begin{cases}
          0.94-0.26|C_s| & |C_s| \leq 1.5  \\
          0.55 & |C_s| > 1.5  \\
\end{cases}  $$

where $n$ is the number of years of data.

**Example:** Determine the flood with return period of 100 years from the following maximum annual streamflow data near Austin, TX, where $C_m = −0.3$.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F25.png" alt="Figure 25" style="width: 600px; height: auto;">
</div>


Taking the logarithm of the data, $y = \log_{10} x$, the statistics of $y$ are

$$ \mu_y=3.6388 \;\;\; \sigma_y=\sqrt{\frac{\sum(y_i-\mu_y)^2}{n-1}}=0.4439 \;\;\; C_s=\frac{ n\sum_{i=1}^n (y_i-\mu_y)^3 }{(n-1)(n-2)\sigma_y^3}=-1.244 $$

$ |C_s|>0.9 \Rightarrow  A = -0.52 + 0.30|C_s| = -0.52 + 0.30 \times 1.244 = -0.147. $

$ |C_s| \leq 1.5 \Rightarrow  B = 0.94 - 0.26|C_s| = 0.94 - 0.26 \times 1.244 = 0.617. $

Thus, with $n = 16$, we have $\mathbb{Var}(C_s)=10^{-0.147-0.617 \log_{10}\left(\frac{16}{10}\right)}=0.533$ and $\mathbb{Var}(C_m)=0.303$.

The optimal weight can be calculated as follows:

$$ w=\frac{\mathbb{Var}(C_m)}{\mathbb{Var}(C_s)+\mathbb{Var}(C_m)} = \frac{0.303}{0.303+0.533}= 0.362.$$

Then a robust estimate of the coefficient of skewness is

$$ C_w=wC_s+(1-w)C_m = 0.362 \times (−1.244) + 0.638 \times (−0.3) = −0.64.$$

From the table of the frequency factors for the Log-Pearson Type III distribution, for negative skewness and $T = 100$ years, we have

$$ C_w=-0.6 \rightarrow K_T=1.88 $$

$$ C_w=-0.7 \rightarrow K_T=1.806 $$

By linear interpolation at $C_w = -0.64$, one obtains the frequency factor $K_T = 1.850$, which leads to

$$ y_T = \mu_y + K_T \sigma_y = 3.639 + 1.85 \times 0.4439 = 4.46 \rightarrow  Q_{T=100} = 10^{4.46} \approx 28,900 \;\;\textrm{[cfs]}. $$
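The weighted-skewness procedure used in this example can be sketched as below. The function names `var_sample_skew` and `weighted_skew` are hypothetical; the sketch assumes $n = 16$ years of record and the two table entries for $T = 100$ years quoted above.

```python
import math

def var_sample_skew(c_s, n):
    """Variance of the sample skewness for the log-Pearson type III distribution."""
    a = -0.33 + 0.08 * abs(c_s) if abs(c_s) <= 0.90 else -0.52 + 0.30 * abs(c_s)
    b = 0.94 - 0.26 * abs(c_s) if abs(c_s) <= 1.50 else 0.55
    return 10.0 ** (a - b * math.log10(n / 10.0))

def weighted_skew(c_s, c_m, n, var_c_m=0.3025):
    """Return the weighted skewness C_w and the optimal weight w."""
    var_c_s = var_sample_skew(c_s, n)
    w = var_c_m / (var_c_s + var_c_m)      # minimizes Var(C_w)
    return w * c_s + (1.0 - w) * c_m, w

c_w, w = weighted_skew(c_s=-1.244, c_m=-0.3, n=16)

# K_T for T = 100 yr by linear interpolation between the quoted table
# entries (C_w = -0.6 -> 1.880, C_w = -0.7 -> 1.806)
k_100 = 1.880 + (1.806 - 1.880) * (c_w - (-0.6)) / (-0.7 - (-0.6))

# Back-transform using the log-space mean and standard deviation of the example
q_100 = 10.0 ** (3.6388 + k_100 * 0.4439)

print(f"w = {w:.3f}, C_w = {c_w:.3f}, K_100 = {k_100:.3f}, Q_100 = {q_100:,.0f} cfs")
```

The result is close to the hand calculation; the small difference comes from rounding $y_T$ to two decimals in the text before taking the antilog.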

## Design Storms

As we discussed, we collect the maximum total precipitation amount for a specific duration (e.g., 10-minute rainfall) in each year and then use the extreme value type I distribution to obtain extreme precipitation events ($x_T$) for a specific return period.

This calculation was carried out by the U.S. Weather Bureau (now the National Weather Service) and published as the *Rainfall Frequency Atlas of the United States*, **Technical Paper No. 40**. These maps are available for durations of 30 minutes to 24 hours and return periods of 1 to 100 years (see the figures below).

As a result, based on the project design specifications, the design engineer shall interpolate between isohyets, which are lines of equal extreme precipitation amount.

For shorter durations of 5 to 60 minutes, there is another report called NOAA Technical Memorandum NWS HYDRO-35. It provides maps of precipitation depths for 5-, 15-, and 60-minute durations and return periods of 2 and 100 years for 37 eastern states.

Currently, the [Precipitation Frequency Data Server (PFDS)](https://hdsc.nws.noaa.gov/pfds/) can be used to obtain design precipitation values for different return periods almost anywhere in the United States.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F26.png" alt="Figure 26" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> 1-year, 30-minute rainfall (in) in the United States as presented in U.S. Weather Bureau Technical Paper No. 40 </p>

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F27.png" alt="Figure 27" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> 100-year, 24-hour rainfall (in) in the United States as presented in U.S. Weather Bureau Technical Paper No. 40 </p>

**Intensity-Duration-Frequency (IDF) curves**

The intensity of precipitation is the rainfall depth per unit time (e.g., mm/hr or in/hr):

$$ i=\frac{R_d}{T_d} $$

where $R_d$ is the rainfall depth (e.g., mm, in) and $T_d$ is the duration, usually in hours. The frequency is often expressed in terms of the return period $T$, which is the average length of time between precipitation events that equal or exceed the design magnitude.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F28.png" alt="Figure 28" style="width: 500px; height: auto;">
</div>

<p style="text-align: center;"> A schematic of rainfall IDF curves </p>

We discussed that, using the extreme value type I distribution, we can compute extreme precipitation values ($x_T$) for a specific duration (e.g., 10 minutes) and different return periods. The results of these calculations can be summarized in a plot called the Intensity-Duration-Frequency (IDF) curves.

For a specific return period, an IDF curve is typically represented by the following parametric equation:

$$ i=\frac{c}{(T_d)^e+f}\;\; (\frac{\textrm{in}}{\textrm{hr}}) $$

where $T_d$ is the rainfall duration and $c$, $e$, and $f$ are empirical coefficients that can be obtained from historical observations of rainfall.


For example, for a 10-year return period in Atlanta, with $T_d$ expressed in minutes, these coefficients are $c = 97.5$, $e = 0.83$, and $f = 6.88$. Therefore, the intensity of a 20-minute design rainfall with a 10-year return period in Atlanta is:


$$ i=\frac{97.5}{(20)^{0.83}+6.88} \approx 5.16\;\; (\frac{\textrm{in}}{\textrm{hr}}) $$
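As a quick check, the parametric IDF equation can be evaluated with a short function. This is a minimal sketch; the function name `idf_intensity` is hypothetical, and the duration is assumed to be in minutes, as in the Atlanta example.

```python
def idf_intensity(t_d, c, e, f):
    """Rainfall intensity [in/hr] from i = c / (t_d**e + f), with t_d in minutes."""
    return c / (t_d ** e + f)

# Atlanta coefficients for a 10-year return period, as quoted in the text
i_20 = idf_intensity(20.0, c=97.5, e=0.83, f=6.88)
print(f"i(20 min) = {i_20:.2f} in/hr")
```

The same function can be called for any other duration on the curve, for example `idf_intensity(60.0, c=97.5, e=0.83, f=6.88)` for a 1-hour storm.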

**Design precipitation hyetographs from IDF curves:**

There are several methods to obtain the design hyetograph from IDF curves. Here, we explain the **alternating block** method through an example.


**Example:** Determine the design precipitation hyetograph, with 20-minute intervals, for a $2\frac{1}{3}$-hour (140-minute) storm in Atlanta using the IDF curve $i = \frac{c}{T_d^e + f}$, where $c=97.5$, $e=0.83$, and $f=6.88$.

From this IDF curve, we can compute the rainfall intensities over 20-minute intervals as follows:


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F29.png" alt="Figure 29" style="width: 1000px; height: auto;">
</div>

To shape the design hyetograph, we alternate the incremental depths around their maximum in descending order, as shown in the figure below. The result is not unique: it depends on whether we start the alternation from the left- or right-hand side of the maximum incremental depth.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F30.png" alt="Figure 30" style="width: 500px; height: auto;">
</div>
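The alternating block procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the maximum incremental depth is placed in the middle interval, and the `first_side` argument selects whether the alternation starts to the right or to the left of the maximum.

```python
def idf_intensity(t_d, c=97.5, e=0.83, f=6.88):
    """Rainfall intensity [in/hr] for duration t_d [min] (Atlanta coefficients)."""
    return c / (t_d ** e + f)

def alternating_block(total_minutes, dt, first_side="right"):
    """Return (durations, hyetograph) with incremental depths [in] per interval."""
    n = total_minutes // dt
    durations = [dt * (k + 1) for k in range(n)]
    # Cumulative depth [in] = intensity [in/hr] * duration [hr]
    cumulative = [idf_intensity(t) * t / 60.0 for t in durations]
    increments = [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, n)]

    ordered = sorted(increments, reverse=True)   # descending incremental depths
    hyetograph = [0.0] * n
    center = (n - 1) // 2
    hyetograph[center] = ordered[0]              # maximum goes in the middle
    left, right = center - 1, center + 1
    place_right = first_side == "right"
    for depth in ordered[1:]:
        # Use the requested side unless it has no free interval left
        if (place_right and right < n) or left < 0:
            hyetograph[right] = depth
            right += 1
        else:
            hyetograph[left] = depth
            left -= 1
        place_right = not place_right
    return durations, hyetograph

durations, hyetograph = alternating_block(total_minutes=140, dt=20)
for t, d in zip(durations, hyetograph):
    print(f"{t - 20:3d}-{t:3d} min: {d:.3f} in")
```

Calling `alternating_block(140, 20, first_side="left")` gives the mirrored arrangement, which illustrates why the design hyetograph is not unique.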

**Rational Method:**

The rational method is a simple method for estimating basin outflow for different land uses and rainfall return periods. In this method, the outflow rate of a drainage basin is determined as follows:

$$ Q=ciA $$

$Q:$ outflow [cfs]

$c:$ runoff coefficient $0 \leq c \leq 1$

$i:$ rainfall intensity $[\frac{\textrm{in}}{\textrm{hr}}]$

$A:$ area of the basin [acres; 1 acre = $43560\;\textrm{ft}^2$]


We can divide the basin into $m$ smaller sub-basins with more uniform land use and then obtain the outflow based on the following equation:


$$ Q=i\sum_{j=1}^m c_jA_j $$


The runoff coefficient ($c_j$) for different land uses and rainfall return periods may be obtained from the following table.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F31.png" alt="Figure 31" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> Values of the runoff coefficients in the rational method </p>


The rational method is widely used in the design of storm sewer systems. To that end, we first need to define the design storm in terms of its total depth and duration. The total depth of the design rainfall can be obtained from NOAA rainfall design maps (e.g., the Technical Paper 40 maps shown earlier) or from regional IDF curves, and is used to calculate the design intensity $i = \frac{R_d}{T_d}$. The design duration ($T_d$) is the time of concentration ($t_c$) of the basin that drains water into the sewer system,

$$ T_d=t_c=\sum_{i=1}^n \frac{L_i}{V_i}, $$

where $L_i$ and $V_i$ are the length and flow velocity of the $i$-th segment along the flow path.

**Example:** Determine the required pipe diameter for a storm sewer drainage system for a $T = 5$ yr return period, where $A = 5$ acres, $c = 0.6$, and $t_c = T_d = 10$ minutes. The design precipitation intensity is given by the following IDF equation,


$$ i=\frac{120T^{0.2}}{T_d+30}=\frac{120 \times 5^{0.2}}{10+30}=4.14\;\; \frac{\textrm{in}}{\textrm{hr}}, $$

$$ Q=ciA=0.6 \times 4.14 \times 5= 12.42\;\; \textrm{cfs}. $$


The elevation of the basin changes by $3$ ft over a length of $500$ ft. Thus, the slope is $S_0 = \frac{3}{500}=0.006$, and Manning's roughness coefficient is $n = 0.02$. Using the Manning equation, with $A$ now denoting the flow cross-sectional area and $R$ the hydraulic radius, we have


$$ Q=\frac{1.49}{n}S_0^{\frac{1}{2}}AR^{\frac{2}{3}} $$

For a circular pipe of diameter $D$ flowing full, $A = \frac{\pi D^2}{4}$ and $R = \frac{D}{4}$, so that

$$ Q=\frac{1.49}{n}S_0^{\frac{1}{2}} \frac{\pi D^2}{4} \left(\frac{D}{4}\right)^{\frac{2}{3}}= \frac{0.463}{n}S_0^{\frac{1}{2}}D^{\frac{8}{3}}$$

$$ \textrm{and thus}\;\; D=\left(\frac{2.16nQ}{\sqrt{S_0}}\right)^{\frac{3}{8}}= \left(\frac{2.16 \times 0.02 \times 12.42}{\sqrt{0.006}}\right)^{\frac{3}{8}} \approx 2\; [\textrm{ft}].$$
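The storm sewer sizing steps above can be sketched as follows. This is a minimal illustration, assuming the design intensity of 4.14 in/hr from the example and a circular pipe flowing full; the function names are hypothetical.

```python
import math

def rational_peak_flow(i, subbasins):
    """Peak outflow [cfs] from Q = i * sum(c_j * A_j).

    i         : rainfall intensity [in/hr]
    subbasins : list of (runoff coefficient c_j, area A_j in acres)
    """
    return i * sum(c * a for c, a in subbasins)

def full_pipe_diameter(q, n, s0):
    """Diameter [ft] of a circular pipe flowing full (Manning, U.S. customary units)."""
    return (2.16 * n * q / math.sqrt(s0)) ** (3.0 / 8.0)

# Single basin from the example: c = 0.6, A = 5 acres, i = 4.14 in/hr
q = rational_peak_flow(4.14, [(0.6, 5.0)])
d = full_pipe_diameter(q, n=0.02, s0=3.0 / 500.0)

print(f"Q = {q:.2f} cfs, D = {d:.2f} ft")
```

Passing several `(c_j, A_j)` pairs to `rational_peak_flow` covers the case of a basin divided into sub-basins with different land uses.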

## Flood Frequency Analysis using HEC-SSP I

The HEC-SSP software by the [U.S. Army Corps of Engineers](http://www.hec.usace.army.mil/software/hec-ssp/download.aspx) can be used for flood frequency analysis using the available standard methods. Here, we provide an example using streamflow data for the Mississippi River at the Grand Rapids station.

Step 1 (optional): Add a shapefile map to HEC-SSP. Go to “Maps”, click on “Add Maps Layer”, and choose a shapefile that represents the boundary of the study region.


<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F32.png" alt="Figure 32" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> Adding shape file of the U.S. to HEC-SSP </p>


Step 2: Go to the “Data” tab and then click on “New”. Get the data of annual maximum river flows in Minnesota from the USGS website (e.g., 05211000, Mississippi River at Grand Rapids).

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F33.png" alt="Figure 33" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> Annual maximum flood [cfs] at 05211000, Mississippi River at Grand Rapids </p>


Step 3: Go to the “Analysis” tab, click on “New”, and then select the Bulletin 17 Flow Frequency analysis.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F34.png" alt="Figure 34" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> Bulletin 17B flow frequency analysis options in HEC-SSP </p>


Step 4: Click the “Compute” button and then plot the curve. From the curve, given that $p = \frac{1}{T}$, we can read the flow discharge for different return periods.

<div style="text-align: center;">
  <img src="Figures/Ch9/Ch9F35.png" alt="Figure 35" style="width: 1000px; height: auto;">
</div>

<p style="text-align: center;"> Extreme flow versus its probability of exceedance $p = \frac{1}{T}$ at the USGS station 05211000 </p>
