# Why A 100 Year Flood Can Occur Every Year. Calculate Exceedance Probability and Return Periods in Python

## Introduction to flood frequency analysis

One way to analyse timeseries data, particularly for events like floods, is to calculate the frequency of events of different magnitudes. You have likely heard the term "100-year flood". It actually refers to the flood magnitude that has a probability of exceedance of 1/100 in any given year (i.e., a 1% chance). This is why a 100-year flood event can occur 2 years in a row.

In this lesson you will learn how "100-year floods" (and other flood frequencies) are calculated using some basic statistics. Let's define 2 terms:
1. Exceedance probability: the probability that an event of a given magnitude or greater occurs.
2. Recurrence interval: the average time between exceedances, which is the inverse of the exceedance probability.
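
The two definitions can be sketched in a few lines of Python. This is a minimal illustration rather than the lesson's own code: the discharge values are made up, and the rank / (n + 1) estimate of exceedance probability (the Weibull plotting position) is one common convention, assumed here because this section does not specify one.

```python
# Hypothetical annual maximum discharges (cubic feet per second), one per year
annual_max_cfs = [1200.0, 450.0, 980.0, 3100.0, 760.0, 1500.0, 620.0, 890.0, 2050.0]


def exceedance_table(values):
    """Rank values from largest to smallest and estimate, for each one,
    the exceedance probability and the recurrence interval.

    Assumes the rank / (n + 1) plotting position and no tied values.
    """
    n = len(values)
    rows = []
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        probability = rank / (n + 1)        # chance of being equalled or exceeded in a year
        recurrence_years = 1 / probability  # inverse of the exceedance probability
        rows.append((value, rank, probability, recurrence_years))
    return rows


for value, rank, probability, recurrence_years in exceedance_table(annual_max_cfs):
    print(f"{value:8.0f} cfs  rank {rank}  p = {probability:.2f}  T = {recurrence_years:.1f} yr")
```

With only 9 years of values, the largest event is assigned a recurrence interval of about 10 years, which shows how strongly the estimate depends on record length.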

### Important consideration

- The above definitions assume that flood events in the timeseries are independent (i.e., the event magnitudes are not correlated with each other in time).

In this project, we will be interpreting annual maximum floods. How valid do you think the above assumption is for annual maxima?

- In this project, we will be asking you to construct and interpret plots of recurrence intervals. Do you think the processes that drive floods are periodic? If so, over what timescales?
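
One rough way to probe the independence assumption is to check whether each year's maximum is correlated with the previous year's. The sketch below uses randomly generated values standing in for real annual maxima (an assumption for illustration only) and the pandas Series.autocorr() method. A lag-1 value near zero is consistent with independence but does not prove it.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for 70 years of annual maximum discharge (not real data)
rng = np.random.default_rng(seed=42)
annual_max = pd.Series(
    rng.lognormal(mean=6.0, sigma=0.8, size=70),
    index=pd.RangeIndex(1950, 2020, name="year"),
    name="annual_max_cfs",
)

# Pearson correlation between each year's maximum and the previous year's
lag1 = annual_max.autocorr(lag=1)
print(f"Lag-1 autocorrelation: {lag1:.3f}")
```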

## What is recurrence interval?

100-year floods can happen 2 years in a row

Statistical techniques, through a process called frequency analysis, are used to estimate the probability of the occurrence of a given precipitation event. The recurrence interval is based on the probability that the given event will be equalled or exceeded in any given year. For example, assume there is a 1 in 50 chance that 6.60 inches of rain will fall in a certain area in a 24-hour period during any given year. Thus, a rainfall total of 6.60 inches in a consecutive 24-hour period is said to have a 50-year recurrence interval. Likewise, using a frequency analysis, there is a 1 in 100 chance that a streamflow of 15,000 cubic feet per second (ft³/s) will occur during any year at a certain streamflow-measurement site. Thus, a peak flow of 15,000 ft³/s at the site is said to have a 100-year recurrence interval. Rainfall recurrence intervals are based on both the magnitude and the duration of a rainfall event, whereas streamflow recurrence intervals are based solely on the magnitude of the annual peak flow.

At least 10 years of data are required to perform a frequency analysis for the determination of recurrence intervals. Of course, the more years of historical data the better: a hydrologist will have more confidence in an analysis of a river with 30 years of record than in one based on 10 years of record.

Recurrence intervals for the annual peak streamflow at a given location change if there are significant changes in the flow patterns at that location, possibly caused by an impoundment or diversion of flow. The effects of development (conversion of land from forested or agricultural uses to commercial, residential, or industrial uses) on peak flows are generally much greater for low-recurrence-interval floods than for high-recurrence-interval floods, such as 25-, 50-, or 100-year floods. During these larger floods, the soil is saturated and does not have the capacity to absorb additional rainfall. Under these conditions, essentially all of the rain that falls, whether on paved surfaces or on saturated soil, runs off and becomes streamflow.

## How can we have 2 "100-year floods" in less than 2 years?

This question points out the importance of proper terminology. The term "100-year flood" is used in an attempt to simplify the definition of a flood that statistically has a 1% chance of occurring in any given year. Likewise, the term "100-year storm" is used to define a rainfall event that statistically has this same 1% chance of occurring. In other words, over the course of 1 million years, these events would be expected to occur 10,000 times. But just because it rained 10 inches in 1 day last year doesn't mean it can't rain 10 inches in 1 day again this year.
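
The arithmetic behind these statements is short. The sketch below assumes, as the lesson does, that years are independent and that the 1% chance stays the same every year. The function names are made up for illustration.

```python
def expected_events(annual_probability, years):
    """Expected number of exceedances over a span of years."""
    return annual_probability * years


def chance_of_at_least_one(annual_probability, years):
    """Probability of one or more exceedances in a span of independent years."""
    return 1 - (1 - annual_probability) ** years


p = 0.01  # a "100-year" event

print(expected_events(p, 1_000_000))   # expected count in 1 million years
print(p * p)                           # chance of an event in each of 2 given consecutive years
print(chance_of_at_least_one(p, 30))   # chance of at least one event in 30 years
```

Under these assumptions the chance of at least one 100-year flood in a 30-year span is roughly 26%, so such an event is far from unlikely within a few decades.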

## What is an annual exceedance probability?

The USGS and other agencies often refer to the percent chance of occurrence as an Annual Exceedance Probability, or AEP. An AEP is always a fraction between 0 and 1. So a 0.2 AEP flood has a 20% chance of occurring in any given year, and this corresponds to a 5-year recurrence-interval flood. Recurrence-interval terminology tends to be more understandable for flood intensity comparisons. However, AEP terminology reminds the observer that a rare flood does not reduce the chances of another rare flood within a short time period.
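
Converting between the two ways of describing the same flood is a one-line calculation. A minimal sketch follows; the helper names are hypothetical and not part of any library.

```python
def recurrence_interval(aep):
    """Recurrence interval in years for an annual exceedance probability (0 < aep <= 1)."""
    if not 0 < aep <= 1:
        raise ValueError("AEP must be greater than 0 and no more than 1")
    return 1 / aep


def annual_exceedance_probability(recurrence_years):
    """Annual exceedance probability for a recurrence interval of 1 year or more."""
    if recurrence_years < 1:
        raise ValueError("Recurrence interval must be at least 1 year")
    return 1 / recurrence_years


print(recurrence_interval(0.2))             # the 0.2 AEP flood from the text
print(annual_exceedance_probability(100))   # the "100-year flood"
```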

## Calculate probability in Python

You will use streamflow data to explore the probabilities of events of different magnitudes (discharge is measured in cubic feet per second). You will want long historic records to make your statistical inference more robust.

You will use the hydrofunctions Python package to access streamflow data via an API from the US Geological Survey National Water Information System website.

In [14]:
import os
import urllib
import requests
import math
import matplotlib.pyplot as plt
import seaborn as sns
import earthpy as et
import hydrofunctions as hf
import pandas as pd

# Datetime conversion registration
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Get the data & set work dir
data = et.data.get_data("colorado-flood")
os.chdir(os.path.join(et.io.HOME, "earth-analytics"))

# Prettier plotting with seaborn
sns.set(font_scale=1.5, style="whitegrid")

## Find a station of interest

The hf.draw_map() function allows you to explore the stations in a particular area visually.

You will use gage 06730500. This gage along Boulder Creek survived the 2013 flood event and has one of the longest timeseries datasets along Boulder Creek.

In [3]:
# Create map of stations
hf.draw_map()

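You can request data for all stations located in Colorado by passing stateCd="CO" to hf.NWIS() and calling get_data(). The siteName attribute used below to list station names is not available in the hydrofunctions version that produced this output, so the last line raises an AttributeError.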

In [7]:
# Request data for all stations in Colorado
PR = hf.NWIS(stateCd="CO").get_data()

# List the names for the first 5 sites in Colorado, USA
PR.siteName[0:5]

Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&stateCd=CO


AttributeError: 'NWIS' object has no attribute 'siteName'

## Download stream gage data

### Mean daily vs instantaneous stream flow data

There are 2 kinds of streamflow timeseries data that the USGS provides online:
1. Mean daily streamflow: Mean daily streamflow is useful because it is a complete timeseries (except for days when the gage fails) and thus retains all recorded streamflow events over the period of record.
2. Annual maximum instantaneous streamflow: Instantaneous data are not averaged over the entire day, but instead reflect continuous variation in the flood hydrograph recorded by the stream gage. As such, annual maximum instantaneous streamflow data are useful because they retain the maximum values of discharge recorded in a given year.

You will download the mean daily discharge data. The code for these data is "dv" when using the hydrofunctions Python package.
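
Because the lesson works with annual maxima, a daily series eventually has to be reduced to one value per year. A minimal sketch with pandas follows, using a synthetic daily series in place of the downloaded data (the column name discharge is an assumption). Note that the result is the largest daily mean in each year, which is generally lower than the annual maximum instantaneous value described above.

```python
import numpy as np
import pandas as pd

# Synthetic daily mean discharge for three full years (not real gage data)
days = pd.date_range("2001-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(seed=0)
daily = pd.DataFrame(
    {"discharge": rng.gamma(shape=2.0, scale=50.0, size=len(days))},
    index=days,
)

# Largest daily mean value within each calendar year
annual_max = daily["discharge"].groupby(daily.index.year).max()
annual_max.index.name = "year"
print(annual_max)
```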

### Get data using hydrofunctions API interface for Python

To begin, define the start and end dates of the period that you'd like to download. Also define the site ID. Use USGS 06730500 as your selected site. This stream gage survived the 2013 flood event in Colorado. It also has a long record of measurement that will be helpful when calculating recurrence intervals and exceedance probability values below.

## Station selection

In general, to select stream gages for flood frequency analysis you will want to carefully examine the metadata for candidate stations to check for the time period of operation, record completeness, and other comments on gage operation that might impact your interpretation of statistical results (e.g., Is there a dam upstream? When was it built? Are there other flow diversions? Did the gage malfunction during some event?).
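
Record completeness is one of the checks that is easy to automate. The sketch below counts the daily values present in each year of a synthetic series with a deliberately inserted gap. It assumes that missing days appear as NaN rows rather than being absent from the index, and the 95% threshold is an arbitrary illustration, not a USGS standard.

```python
import numpy as np
import pandas as pd

# Synthetic daily record with a 60-day gap in 2002 (not real gage data)
days = pd.date_range("2001-01-01", "2003-12-31", freq="D")
discharge = pd.Series(100.0, index=days, name="discharge")
discharge.loc["2002-05-01":"2002-06-29"] = np.nan

# Share of days in each calendar year that have a recorded value
observed = discharge.groupby(discharge.index.year).count()  # non-missing values
expected = discharge.groupby(discharge.index.year).size()   # all rows in the year
completeness = observed / expected

# Flag years that fall below an arbitrary 95% completeness threshold
incomplete_years = completeness[completeness < 0.95].index.tolist()
print(completeness.round(3))
print("Years below threshold:", incomplete_years)
```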

There are 2 subsets of USGS gages that have been specially identified for hydro-climatic analyses because station records are of high quality, cover a long time period, and human modification of the watershed (e.g., due to flow regulation or urban development) is minimal: (1) Hydro-Climatic Data Network - 2009 and (2) Geospatial Attributes of Gages for Evaluating Streamflow.

For this project, we followed the lead of scientists assessing the significance of the 2013 Colorado floods using methods similar to the ones introduced in this project.

In [8]:
# Define the site number and the start and end dates that you are interested in
site = "06730500"
start = "1946-05-10"
end = "2018-08-29"

# Request data for that site and time period
longmont_resp = hf.get_nwis(site, "dv", start, end)

Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=06730500&startDT=1946-05-10&endDT=2018-08-29
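
The printed URL shows what the request looks like: a query to the USGS daily-values web service with the site number and date range as query parameters. As a rough illustration (this is not how hydrofunctions itself is implemented), the same URL can be assembled with the standard library:

```python
from urllib.parse import urlencode

base_url = "https://waterservices.usgs.gov/nwis/dv/"
params = {
    "format": "json,1.1",   # the comma is percent-encoded as %2C
    "sites": "06730500",
    "startDT": "1946-05-10",
    "endDT": "2018-08-29",
}

url = f"{base_url}?{urlencode(params)}"
print(url)
```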


### View site and metadata info

You can explore the metadata for the site using the get_nwis() function. Below we request the metadata for the site and the "dv" (daily value) data. Recall from above that dv is the mean daily value; iv provides the instantaneous values.

In [45]:
# Request data for the site and time period
longmont_resp = hf.get_nwis(site, "dv", start, end)

# Convert the response to a json in order to use the extract_nwis_df function
longmont_resp = longmont_resp.json()

# Get metadata about the data
hf.get_nwis(site, "dv").json()

Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=06730500&startDT=1946-05-10&endDT=2018-08-29
Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=06730500


{'name': 'ns1:timeSeriesResponseType',
 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType',
 'scope': 'javax.xml.bind.JAXBElement$GlobalScope',
 'value': {'queryInfo': {'queryURL': 'http://waterservices.usgs.gov/nwis/dv/format=json%2C1.1&sites=06730500',
   'criteria': {'locationParam': '[ALL:06730500]',
    'variableParam': 'ALL',
    'parameter': []},
   'note': [{'value': '[ALL:06730500]', 'title': 'filter:sites'},
    {'value': '[mode=LATEST, modifiedSince=null]',
     'title': 'filter:timeRange'},
    {'value': 'methodIds=[ALL]', 'title': 'filter:methodId'},
    {'value': '2020-06-21T11:09:09.880Z', 'title': 'requestDT'},
    {'value': 'a185f160-b3af-11ea-b630-6cae8b663fb6', 'title': 'requestId'},
    {'value': 'Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information.',
     'title': 'disclaimer'},
    {'value': 'vaas01', 'title': 'server'}]},
  'timeSeries': [{'sourceInfo': {'siteName': 'BOULDER CREEK AT MOUTH
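
The response is a deeply nested dictionary, so reading a single field means walking down through several keys. The sketch below uses a small hand-made dictionary that mirrors only the keys visible in the output above. The site name is a placeholder, and a real response contains many more fields.

```python
# Hand-made stand-in for the parsed JSON response (placeholder values)
response = {
    "value": {
        "queryInfo": {"criteria": {"locationParam": "[ALL:06730500]"}},
        "timeSeries": [
            {"sourceInfo": {"siteName": "PLACEHOLDER SITE NAME"}},
        ],
    }
}

# Collect the site name attached to each time series in the response
site_names = [
    series["sourceInfo"]["siteName"]
    for series in response["value"]["timeSeries"]
]
print(site_names)
```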

Now extract the data from the response as a pandas DataFrame.

In [46]:
# In this version of hydrofunctions, extract_nwis_df returns a tuple:
# the data first, then a dictionary of site metadata
longmont_extract = hf.extract_nwis_df(longmont_resp)
type(longmont_extract)

tuple

In [47]:
# Unpack the tuple to get the DataFrame on its own
longmont_discharge, longmont_meta = longmont_extract

In [48]:
longmont_discharge.head()

Hydrofunctions imports your data into a pandas DataFrame with a datetime index. However, you may find the column headings to be too long.

In [49]:
# Rename columns (assumes the DataFrame has exactly 2 columns: values, then flags)
longmont_discharge.columns = ["discharge", "flag"]

# View first 5 rows
longmont_discharge.head()
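
Assigning a list to .columns only works when the list has exactly as many names as the DataFrame has columns. A minimal sketch of a rename that checks this first, using a toy DataFrame whose long column names are invented for illustration (they are not the real NWIS headings):

```python
import pandas as pd

# Toy DataFrame with invented long column names (not the real NWIS headings)
toy = pd.DataFrame(
    {
        "AGENCY:SITE:PARAMETER": [12.0, 15.5, 14.2],
        "AGENCY:SITE:PARAMETER_qualifiers": ["A", "A", "P"],
    },
    index=pd.to_datetime(["2018-08-27", "2018-08-28", "2018-08-29"]),
)


def rename_columns(df, new_names):
    """Return a copy of df with new column names, failing early on a count mismatch."""
    if len(df.columns) != len(new_names):
        raise ValueError(
            f"Expected {len(df.columns)} column names, got {len(new_names)}"
        )
    renamed = df.copy()
    renamed.columns = new_names
    return renamed


renamed = rename_columns(toy, ["discharge", "flag"])
print(renamed.head())
```

Returning a copy leaves the original headings intact, which can help when you later need to trace a column back to its source.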