# Visualization of Streamflow Conditions at Streamgages

This notebook demonstrates the use of the [hyswap](https://doi-usgs.github.io/hyswap/) Python package to calculate streamflow percentiles and then visualize streamflow conditions at multiple streamgages.

This example notebook relies on the [dataretrieval](https://github.com/DOI-USGS/dataRetrieval) package for downloading streamflow information from USGS NWIS and on the [geopandas](https://geopandas.org/) package for mapping functionality.


In [1]:
# Run commented lines below to install geopandas and mapping dependencies from within the notebook
#import sys
#!{sys.executable} -m pip install geopandas folium mapclassify

In [2]:
from dataretrieval import nwis
import hyswap
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

from tqdm import tqdm # used for progress bar indicators
import geopandas # has dependencies of folium and mapclassify to create maps within this notebook
import warnings
warnings.filterwarnings('ignore') # suppress all warnings, including those raised by dataretrieval

**NOTE:** The `tqdm` package is used in for-loops in this notebook to show a data download progress bar, which may be informative to the user. The setting below (`disable_tqdm`) determines whether this progress bar is displayed when the notebook renders. It is set to `True` when rendering the notebook in the `hyswap` GitHub documentation site. To see the progress bars in this notebook, set `disable_tqdm=False`.

In [3]:
disable_tqdm=True

## Define Helper Functions
The `hyswap` package provides functionality for calculating non-interpretive streamflow statistics but does not provide functionality for correcting invalid data or geospatial capabilities for mapping. Here we set up some simple helper functions that can be reused throughout the notebook to QAQC data and create maps.

In [4]:
# Data QAQC function for provisional NWIS data
def qaqc_nwis_data(df, data_col):
    # replace invalid -999999 values with NA
    df[data_col] = df[data_col].replace(-999999, np.nan)
    # add any additional QAQC steps needed
    return df

In [5]:
def create_gage_condition_map(gage_df, flow_data_col, map_schema, streamflow_data_type):
        # Format date and set to str type for use in map tooltips
        if flow_data_col == '00060':
                gage_df['Date'] = gage_df['datetime'].dt.strftime('%Y-%m-%d %H:%M')
        elif flow_data_col == '00060_Mean':
                gage_df['Date'] = gage_df['datetime'].dt.strftime('%Y-%m-%d')
        gage_df = gage_df.drop('datetime', axis=1)
        # create colormap for map from hyswap schema
        schema = hyswap.utils.retrieve_schema(map_schema)
        flow_cond_cmap = schema['colors']
        if 'low_color' in schema:
                flow_cond_cmap = [schema['low_color']] + flow_cond_cmap
        if 'high_color' in schema:
                flow_cond_cmap = flow_cond_cmap + [schema['high_color']]
        # if creating a drought map, set handling of non-drought flows
        if map_schema in ['WaterWatch_Drought', 'NIDIS_Drought']:
                gage_df['flow_cat'] = gage_df['flow_cat'].cat.add_categories('Other')
                gage_df.loc[gage_df['flow_cat'].isnull(), 'flow_cat'] = 'Other'
                flow_cond_cmap = flow_cond_cmap + ['#e3e0ca'] # light taupe
        # set NA values to "Not Ranked" category
        gage_df['flow_cat'] = gage_df['flow_cat'].cat.add_categories('Not Ranked')
        gage_df.loc[gage_df['est_pct'].isna(), 'flow_cat'] = 'Not Ranked'
        flow_cond_cmap = flow_cond_cmap + ['#d3d3d3'] # light grey
        # renaming columns with user friendly names for map
        gage_df = gage_df.rename(columns={flow_data_col:'Discharge (cfs)',
                                                'est_pct':'Estimated Percentile',
                                                'site_no':'USGS Gage ID',
                                                'station_nm':'Streamgage Name',
                                                'flow_cat':'Streamflow Category'})
        # convert dataframe to geopandas GeoDataFrame
        gage_df = geopandas.GeoDataFrame(gage_df, 
                             geometry=geopandas.points_from_xy(gage_df.dec_long_va,
                                                               gage_df.dec_lat_va), 
                             crs="EPSG:4326").to_crs("EPSG:5070")
        # Create map
        m = gage_df.explore(column="Streamflow Category",
                                cmap=flow_cond_cmap,
                                tooltip=["USGS Gage ID", "Streamgage Name", "Streamflow Category", "Discharge (cfs)", "Estimated Percentile", "Date"],
                                tiles="CartoDB Positron",
                                marker_kwds=dict(radius=5),
                                legend_kwds=dict(caption=streamflow_data_type + '<br> Streamflow  Category'))
        return m #returns a folium map object

## Data Downloading and Processing
This section uses an example state to select streamgages for generating various flow condition maps. The past days selected later in the notebook are specific to the state of Vermont (VT), but the notebook can be run for any state.

### Find all stream sites active in the last week within a State
A minimum of 10 years of flow records should be available for calculating streamflow percentiles, so the search would ideally be limited to streamgages that were also operational more than approximately 10 years ago. The query below applies only the recent-activity filter (sites active within the last week).
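The 10-year record-length check can be sketched in plain Python. This is a minimal illustration rather than the notebook's method: `has_long_record`, `MIN_RECORD_YEARS`, and the site keys and dates are hypothetical names and values, not NWIS output.

```python
from datetime import date

MIN_RECORD_YEARS = 10  # minimum record length suggested in the text


def has_long_record(first_record, today, min_years=MIN_RECORD_YEARS):
    """Return True if the record began at least `min_years` before `today`."""
    # Shift the year back; fall back to Feb 28 when `today` is Feb 29
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:
        cutoff = today.replace(year=today.year - min_years, day=28)
    return first_record <= cutoff


# Hypothetical first-record dates keyed by site (not real NWIS values)
first_record_by_site = {
    "site_A": date(1940, 10, 1),
    "site_B": date(2019, 6, 15),
}
today = date(2024, 1, 1)
long_record_sites = [
    site for site, start in first_record_by_site.items()
    if has_long_record(start, today)
]
print(long_record_sites)
```

In this sketch only `site_A` passes the check, because `site_B` has fewer than 10 years of record as of the assumed date.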

In [6]:
#| tbl-cap: List of streamgage sites active within the last week
state = 'VT'
# Query NWIS for what streamgage sites were active within the last week
active_nwis_sites, _ = nwis.what_sites(stateCd=state, parameterCd='00060', period="P1W", siteType='ST')
# The second query (gages with records from more than 10 years ago) is not run here,
#  so the filter below is commented out and all recently active gages are kept
sites = active_nwis_sites#[active_nwis_sites['site_no'].isin(nwis_sites['site_no'])]
display(sites)

Unnamed: 0,agency_cd,site_no,station_nm,site_tp_cd,dec_lat_va,dec_long_va,coord_acy_cd,dec_coord_datum_cd,alt_va,alt_acy_va,alt_datum_cd,huc_cd
0,USGS,1133000,"EAST BRANCH PASSUMPSIC RIVER NEAR EAST HAVEN, VT",ST,44.633942,-71.897594,S,NAD83,943.34,0.21,NAVD88,1080102
1,USGS,1134500,"MOOSE RIVER AT VICTORY, VT",ST,44.511723,-71.837314,S,NAD83,1103.46,0.16,NAVD88,1080102
2,USGS,1135100,"POPE BROOK TRIBUTARY (W-9), NR NORTH DANVILLE, VT",ST,44.490611,-72.161767,S,NAD83,1720.57,0.08,NAVD88,1080102
3,USGS,1135150,"POPE BROOK (SITE W-3) NEAR NORTH DANVILLE, VT",ST,44.476167,-72.124543,S,NAD83,1141.03,0.14,NAVD88,1080102
4,USGS,1135300,"SLEEPERS RIVER (SITE W-5) NEAR ST. JOHNSBURY, VT",ST,44.435335,-72.038429,S,NAD83,641.27,0.01,NAVD88,1080102
5,USGS,1135500,"PASSUMPSIC RIVER AT PASSUMPSIC, VT",ST,44.365615,-72.039261,S,NAD83,487.79,0.12,NAVD88,1080102
6,USGS,1138500,"CONNECTICUT RIVER AT WELLS RIVER, VT",ST,44.153397,-72.041758,S,NAD83,399.37,0.01,NAVD88,1080101
7,USGS,1139000,"WELLS RIVER AT WELLS RIVER, VT",ST,44.150341,-72.065091,S,NAD83,505.2,0.12,NAVD88,1080103
8,USGS,1139800,"EAST ORANGE BRANCH AT EAST ORANGE, VT",ST,44.092842,-72.335653,S,NAD83,1177.59,0.07,NAVD88,1080103
9,USGS,1141500,"OMPOMPANOOSUC RIVER AT UNION VILLAGE, VT",ST,43.79007,-72.254813,S,NAD83,415.0,20.0,NGVD29,1080103


### Retrieve Streamflow Data from NWIS
For the sites identified above, download all historical daily streamflow data (1900 through 2023 Water Years). 

In [7]:
# create a python dictionary of dataframes by site id number
flow_data = {}

for StaID in tqdm(sites['site_no'], disable=disable_tqdm, desc="Downloading NWIS Flow Data for Sites"):
    flow_data[StaID] = nwis.get_record(sites=StaID, parameterCd='00060', start="1900-01-01", end="2023-10-01", service='dv')

### Calculate Variable Streamflow Percentile Thresholds
For the sites identified above, calculate streamflow percentile thresholds at 0, 1, 5, 10, ... , 90, 95, 99, 100 percentiles
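To illustrate what "variable" (day-of-year) thresholds mean, the sketch below groups made-up flows by calendar day and computes percentiles within each group. It assumes simple linear interpolation between sorted values, which is not necessarily the estimator `hyswap` uses; `percentile` and `thresholds_by_day` are hypothetical helper names.

```python
from collections import defaultdict


def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an ascending list (pct in 0-100)."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def thresholds_by_day(records, levels):
    """Group (date_string, flow) pairs by month-day and compute thresholds."""
    by_day = defaultdict(list)
    for day, flow in records:
        by_day[day[5:]].append(flow)  # 'YYYY-MM-DD' -> 'MM-DD'
    return {
        md: {lvl: percentile(sorted(vals), lvl) for lvl in levels}
        for md, vals in by_day.items()
    }


# Made-up flows (cfs) for two calendar days across five years
records = [
    ("2019-01-01", 30.0), ("2020-01-01", 50.0), ("2021-01-01", 40.0),
    ("2022-01-01", 70.0), ("2023-01-01", 60.0),
    ("2019-07-01", 5.0), ("2020-07-01", 9.0), ("2021-07-01", 7.0),
    ("2022-07-01", 3.0), ("2023-07-01", 11.0),
]
table = thresholds_by_day(records, levels=[25, 50, 75])
print(table["01-01"])
print(table["07-01"])
```

Each calendar day gets its own set of thresholds, so a flow is later ranked against what is typical for that day rather than for the whole year.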

In [8]:
# Define the percentile levels (thresholds) to calculate.
# Intervals of 5 or less are recommended to have sufficient levels to interpolate between in later calculations. 
# Note that 0 and 100 percentile levels are ignored. Refer to min and max values returned instead
percentile_levels = np.concatenate((np.array([1]), np.arange(5,96,5), np.array([99])), axis=0)
print(percentile_levels)

[ 1  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 99]


In [9]:
percentile_values = {}
for StaID in tqdm(sites['site_no'], disable=disable_tqdm, desc="Processing Sites"):
    if '00060_Mean' in flow_data[StaID].columns:
        # Filter data as only approved data in NWIS should be used to calculate statistics
        df = hyswap.utils.filter_approved_data(flow_data[StaID], '00060_Mean_cd')
        percentile_values[StaID] = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
            df, '00060_Mean', percentiles=percentile_levels)
    else:
        print('No standard discharge data column found for site ' + StaID + ', skipping')

No standard discharge data column found for site 434928072192701, skipping


In [10]:
#| tbl-cap: Sample of calculated variable streamflow percentile thresholds for first site in list
# View percentile thresholds for the first site
display(percentile_values[list(percentile_values.keys())[0]].head())

month_day,min,p01,p05,p10,p15,p20,p25,p30,p35,p40,...,p80,p85,p90,p95,p99,max,mean,count,start_yr,end_yr
01-01,29.0,,31.14,33.2,41.2,44.6,46.0,50.4,57.6,63.2,...,110.0,130.8,145.0,169.0,,378.0,85.3,63,1940,2023
01-02,28.0,,31.0,34.2,40.2,44.0,46.0,49.6,57.2,60.6,...,120.2,127.0,144.2,178.4,,350.0,84.32,63,1940,2023
01-03,27.0,,30.2,35.4,40.0,43.6,45.0,48.4,54.2,61.6,...,108.4,118.2,133.0,205.0,,300.0,81.88,63,1940,2023
01-04,28.0,,30.2,35.8,40.64,43.8,45.0,51.6,60.32,66.0,...,104.6,114.0,130.4,184.4,,220.0,78.39,63,1940,2023
01-05,26.0,,29.2,35.8,40.34,42.8,45.0,52.4,59.2,66.76,...,94.2,103.8,118.0,160.0,,425.0,79.83,63,1940,2023


## Create a Current Flow Conditions Map for Daily Mean Streamflow

### Retrieve most recent (yesterday) daily mean streamflow
Download data from NWIS and calculate corresponding streamflow percentile for the most recent daily mean discharge

In [11]:
yesterday = datetime.strftime(datetime.now(tz=ZoneInfo("US/Eastern")) - timedelta(1), '%Y-%m-%d')
recent_dvs = nwis.get_record(sites=sites['site_no'].tolist(), parameterCd='00060', start=yesterday, end=yesterday, service='dv')
recent_dvs = qaqc_nwis_data(recent_dvs, '00060_Mean')

### Categorize streamflow based on calculated percentile values
Calculate estimated streamflow percentile for the new data by interpolating against the previously calculated percentile threshold levels.
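The interpolation step can be illustrated with a small pure-Python sketch. It assumes linear interpolation between adjacent thresholds and clamps values outside the threshold range to the lowest or highest level; `estimate_percentile` is a hypothetical helper and the threshold values are made up, so this is not a reproduction of the `hyswap` implementation.

```python
from bisect import bisect_right


def estimate_percentile(value, levels, thresholds):
    """Linearly interpolate a percentile for `value`.

    `levels` are ascending percentile levels and `thresholds` are the
    matching ascending flow values for one calendar day.
    """
    if value <= thresholds[0]:
        return float(levels[0])   # assumption: clamp below the lowest threshold
    if value >= thresholds[-1]:
        return float(levels[-1])  # assumption: clamp above the highest threshold
    i = bisect_right(thresholds, value)  # thresholds[i-1] <= value < thresholds[i]
    x0, x1 = thresholds[i - 1], thresholds[i]
    y0, y1 = levels[i - 1], levels[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


# Made-up thresholds (cfs) for a single calendar day
levels = [0, 25, 50, 75, 100]
thresholds = [29.0, 46.0, 70.0, 100.0, 378.0]

print(estimate_percentile(58.0, levels, thresholds))   # halfway between p25 and p50
print(estimate_percentile(20.0, levels, thresholds))   # below the minimum
print(estimate_percentile(500.0, levels, thresholds))  # above the maximum
```

Closely spaced percentile levels reduce the error introduced by interpolating linearly between thresholds.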

In [12]:
# estimate percentiles
df = pd.DataFrame()
for StaID, site_df in recent_dvs.groupby(level="site_no", group_keys=False):
    if StaID in list(percentile_values.keys()):
        if not percentile_values[StaID].isnull().all().all():
            percentiles = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
            site_df,'00060_Mean', percentile_values[StaID])
            df = pd.concat([df, percentiles])
# categorize streamflow by the estimated streamflow percentiles
df = hyswap.utils.categorize_flows(df, 'est_pct', schema_name="NWD")
df = df.reset_index(level='datetime')
# Prep Data for mapping by joining site information and flow data  
gage_df = pd.merge(sites, df, how="right", on="site_no")
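Conceptually, categorization maps each estimated percentile to a labeled bin. The sketch below shows one way to do that with `bisect`; the bin edges and labels are hypothetical and are not taken from the `NWD` schema or any other `hyswap` schema.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; the actual hyswap schemas may differ
BIN_EDGES = [10, 25, 75, 90]  # percentile boundaries between categories
LABELS = ["Much below normal", "Below normal", "Normal",
          "Above normal", "Much above normal"]


def categorize(pct):
    """Map an estimated percentile to a label; None stays unranked."""
    if pct is None:
        return "Not Ranked"
    # A percentile equal to an edge falls into the upper bin
    return LABELS[bisect_right(BIN_EDGES, pct)]


for pct in [5, 37.5, 95, None]:
    print(pct, "->", categorize(pct))
```

Sites without an estimated percentile are kept and labeled rather than dropped, mirroring the "Not Ranked" handling in the mapping helper defined earlier.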

### Create Map of Streamflow Conditions

In [13]:
#| fig-cap: Map showing most recent daily mean streamflow and corresponding flow conditions
map = create_gage_condition_map(gage_df, '00060_Mean', 'NWD', 'Current Daily Mean')
display(map)

### Create Map of Streamflow Conditions using Alternative Categorization Schema

In [14]:
#| fig-cap: Map showing most recent daily mean streamflow and corresponding flow conditions using a brown-blue schema

# Reuse gage_df from the previous step; only the map color schema changes
map = create_gage_condition_map(gage_df, '00060_Mean', 'WaterWatch_BrownBlue', 'Current Daily Mean')
display(map)

## Create a "Real-Time" Flow Conditions Map for Instantaneous Streamflow

### Retrieve most recent instantaneous streamflow records
Download data from NWIS and calculate corresponding streamflow percentile for the most recent instantaneous discharge measurement

In [15]:
recent_ivs = nwis.get_record(sites=sites['site_no'].tolist(), parameterCd='00060', service='iv')
recent_ivs = qaqc_nwis_data(recent_ivs, '00060')

### Categorize streamflow based on calculated percentile values
Calculate estimated streamflow percentile for the new instantaneous data by interpolating against the previously calculated percentile threshold levels from daily streamflow records.

In [16]:
# estimate percentiles
df = pd.DataFrame()
for StaID, site_df in recent_ivs.groupby(level="site_no", group_keys=False):
    if StaID in list(percentile_values.keys()):
        if not percentile_values[StaID].isnull().all().all():
            percentiles = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
            site_df,'00060', percentile_values[StaID])
            df = pd.concat([df, percentiles])
# categorize streamflow by the estimated streamflow percentiles
df = hyswap.utils.categorize_flows(df, 'est_pct', schema_name="NWD")
df = df.tz_convert(tz='US/Eastern')
df = df.reset_index(level='datetime')
# Prep Data for mapping by joining site information and flow data  
gage_df = pd.merge(sites, df, how="right", on="site_no")

### Create Map of Real-Time Streamflow Conditions

In [17]:
#| fig-cap: Map showing real-time streamflow conditions

map = create_gage_condition_map(gage_df, '00060', 'NWD', 'Real-Time Instantaneous')
display(map)

## Create a Current Flow Conditions Map for n-Day Daily Streamflow

### Retrieve daily streamflow records for past 7 days
Download data from NWIS and calculate corresponding streamflow percentiles for each day

In [18]:
past_dvs = nwis.get_record(
    sites=sites['site_no'].tolist(), 
    parameterCd='00060',
    start=datetime.strftime(datetime.now(tz=ZoneInfo("US/Eastern")) - timedelta(7), '%Y-%m-%d'),
    end=yesterday,
    service='dv'
)
past_dvs = qaqc_nwis_data(past_dvs, '00060_Mean')
past_dvs_7d = past_dvs.copy()
for StaID, new_df in past_dvs.groupby(level=0):
    past_dvs_7d.loc[StaID] = hyswap.utils.rolling_average(new_df, '00060_Mean', 7).round(2)

past_dvs_7d = past_dvs_7d.dropna()
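The reason a single row per site remains after dropping missing values can be seen in a small pure-Python sketch. It assumes the rolling average is a trailing window that yields no value until the window is full; `trailing_mean` is a hypothetical stand-in, not the `hyswap` function, and the flows are made up.

```python
def trailing_mean(values, window):
    """Trailing mean; positions with fewer than `window` values give None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # window not yet full
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(round(sum(chunk) / window, 2))
    return out


daily = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]  # made-up daily flows (cfs)
print(trailing_mean(daily, 7))
```

With seven days of input and a 7-day window, only the last position has a full window, so it is the only one that receives an average.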

### Calculate 7-day average streamflow and corresponding variable percentile thresholds

In [19]:
flow_data_7d = {}
for StaID in tqdm(sites['site_no'], disable=disable_tqdm):
    if '00060_Mean' in flow_data[StaID].columns:
        flow_data_7d[StaID] = hyswap.utils.rolling_average(flow_data[StaID], '00060_Mean', 7).round(2)
    else:
        print('No standard discharge data column found for site ' + StaID + ', skipping')

No standard discharge data column found for site 434928072192701, skipping


In [20]:
percentile_values_7d = {}
for StaID in tqdm(sites['site_no'], disable=disable_tqdm, desc="Processing"):
    if '00060_Mean' in flow_data[StaID].columns:
        # Filter data as only approved data in NWIS should be used to calculate statistics
        df = hyswap.utils.filter_approved_data(flow_data_7d[StaID], '00060_Mean_cd')
        percentile_values_7d[StaID] = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
            df, '00060_Mean', percentiles=percentile_levels)
    else:
        print('No standard discharge data column found for site ' + StaID + ', skipping')

No standard discharge data column found for site 434928072192701, skipping


### Categorize streamflow based on calculated percentile values
Calculate estimated streamflow percentile for the new data by interpolating against the previously calculated percentile threshold levels.

In [21]:
# estimate percentiles
df = pd.DataFrame()
for StaID, site_df in past_dvs_7d.groupby(level="site_no", group_keys=False):
    if StaID in list(percentile_values_7d.keys()):
        # compare 7-day averages against the 7-day percentile thresholds
        if not percentile_values_7d[StaID].isnull().all().all():
            percentiles = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
            site_df,'00060_Mean', percentile_values_7d[StaID])
            df = pd.concat([df, percentiles])
# categorize streamflow by the estimated streamflow percentiles
df = hyswap.utils.categorize_flows(df, 'est_pct', schema_name="NWD")
# keep only most recent 7-day average flow for plotting
df = df[df.index.get_level_values('datetime') == yesterday]
df = df.reset_index(level='datetime')
# Prep Data for mapping by joining site information and flow data  
gage_df = pd.merge(sites, df, how="right", on="site_no")

### Create Map of 7-Day Average Streamflow Conditions

In [22]:
#| fig-cap: Map showing most recent 7-day average streamflow and corresponding flow conditions

map = create_gage_condition_map(gage_df, '00060_Mean', 'NWD', 'Current 7-Day Average')
display(map)

## Create a Drought Conditions Map for a Previous Day's Streamflow

### Retrieve daily streamflow records from a past day
Download data from NWIS and calculate corresponding streamflow percentiles for the given day's streamflow

In [23]:
past_day = "2023-05-30"

past_dvs = nwis.get_record(sites=sites['site_no'].tolist(),
                              parameterCd='00060',
                              start=past_day,
                              end=past_day,
                              service='dv')
past_dvs = qaqc_nwis_data(past_dvs, '00060_Mean')

### Categorize streamflow based on calculated percentile values

In [24]:
# Calculate estimated streamflow percentile for the new data by interpolating against
# the previously calculated percentile threshold levels
df = pd.DataFrame()
for StaID, site_df in past_dvs.groupby(level="site_no", group_keys=False):
    if StaID in list(percentile_values.keys()):
        if not percentile_values[StaID].isnull().all().all():
            percentiles = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
            site_df,'00060_Mean', percentile_values[StaID])
            df = pd.concat([df, percentiles])
# categorize streamflow by the estimated streamflow percentiles
df = hyswap.utils.categorize_flows(df, 'est_pct', schema_name="WaterWatch_Drought")
df = df.reset_index(level='datetime')
# Prep Data for mapping by joining site information and flow data  
gage_df = pd.merge(sites, df, how="right", on="site_no")

### Create Map of Streamflow Drought Conditions

In [25]:
#| fig-cap: Map showing historical daily mean streamflow and corresponding flow conditions using a drought categorization schema
map = create_gage_condition_map(gage_df, '00060_Mean', 'WaterWatch_Drought', 'Daily Mean')
display(map)

## Create a Flood Conditions Map for a Past Day's Streamflow
This example uses fixed percentiles that are calculated across all days of the year together rather than by day of year. Flow categories are therefore relative to absolute streamflow levels rather than to what is normal for that day of the year.
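The difference between the two approaches can be shown with a toy calculation. The sketch below ranks the same flow against one season's history and against the pooled history, using a simple empirical rank (share of values at or below the flow) rather than the estimator `hyswap` uses; `percent_at_or_below` and all flow values are made up for illustration.

```python
def percent_at_or_below(history, value):
    """Share (0-100) of historical flows that are <= value."""
    return 100.0 * sum(1 for v in history if v <= value) / len(history)


# Made-up flows (cfs): a low-flow season and a high-flow season
summer = [4.0, 5.0, 6.0, 7.0, 8.0]
spring = [40.0, 50.0, 60.0, 70.0, 80.0]
flow = 8.0

seasonal_rank = percent_at_or_below(summer, flow)         # relative to that season
pooled_rank = percent_at_or_below(summer + spring, flow)  # relative to all days
print(seasonal_rank, pooled_rank)
```

The flow is the highest on record for the low-flow season but only mid-range when all days are pooled, which is why fixed percentiles are better suited to flagging flows that are high in absolute terms.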

### Retrieve daily streamflow records from a past day
Download data from NWIS and calculate corresponding fixed streamflow percentiles for the given day's streamflow

In [26]:
past_day = "2023-07-10"

past_dvs = nwis.get_record(sites=sites['site_no'].tolist(),
                           parameterCd='00060',
                           start=past_day,
                           end=past_day,
                           service='dv')
past_dvs = qaqc_nwis_data(past_dvs, '00060_Mean')

In [27]:
fixed_percentile_values = {}

for StaID in tqdm(sites['site_no'], disable=disable_tqdm):
    if '00060_Mean' in flow_data[StaID].columns:
        # Filter data as only approved data in NWIS should be used to calculate statistics
        df = hyswap.utils.filter_approved_data(flow_data[StaID], '00060_Mean_cd')
        if not df.empty:
            fixed_percentile_values[StaID] = hyswap.percentiles.calculate_fixed_percentile_thresholds(
                df['00060_Mean'], percentiles=percentile_levels)
        else:
            print(StaID + ' has no approved data, skipping')
    else:
        print(StaID + ' does not have standard discharge data column, skipping')

01135100 has no approved data, skipping


434928072192701 does not have standard discharge data column, skipping


### Categorize streamflow based on calculated percentile values

In [28]:
# estimate percentiles
for StaID in past_dvs.index.get_level_values(0):
    if StaID in list(fixed_percentile_values.keys()):
        past_dvs.at[(StaID, past_day), 'est_pct'] = hyswap.percentiles.calculate_fixed_percentile_from_value(
            past_dvs.at[(StaID, past_day), '00060_Mean'], fixed_percentile_values[StaID])
# categorize streamflow by the estimated streamflow percentiles
df = hyswap.utils.categorize_flows(past_dvs, 'est_pct', schema_name="WaterWatch_Flood")
df = df.reset_index(level='datetime')
# Prep Data for mapping by joining site information and flow data  
gage_df = pd.merge(sites, df, how="right", on="site_no")

### Create Map of Streamflow High-Flow Conditions

In [29]:
#| fig-cap: Map showing historical daily mean streamflow and corresponding flow conditions using a high-flow categorization schema
map = create_gage_condition_map(gage_df, '00060_Mean', 'WaterWatch_Flood', 'Daily Mean')
display(map)