# Regime classification
This Notebook reads discharge observations for a test river basin in order to classify the basin's regime using circular statistics ([Burn et al., 2010](https://doi.org/10.1002/hyp.7625)). Based on a combination of three peak-event identification metrics, it identifies whether the basin has a nival (i.e., snowmelt-driven) regime. For the analysis presented in Arnal et al. (2023) - add paper link - we selected basins across Canada and the USA with a nival regime according to all three metrics.

Decisions:
- We perform a linear interpolation of the daily discharge data before running the regime classification, to fill in small data gaps of at most 15 days. See user-specified variables below.
- The nival regime definition (i.e., start and end doy & minimum regularity) was pre-defined by Paul Whitfield (USask) from expert knowledge of Canadian hydrology. See user-specified variables below.
- The water year definition: October 1st to September 30th. See user-specified variables below.
- For the peak over threshold calculations, the threshold used is the minimum value of all annual maxima.
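
The gap-filling decision in the list above can be illustrated with a minimal, dependency-free sketch. It assumes that "a gap of maximum 15 days" means a run of at most 15 consecutive missing daily values with valid observations on both sides. The function name `fill_small_gaps` is hypothetical and is not part of `functions.py`, and the actual `regime_classification` implementation may treat gap lengths or series edges differently.

```python
import math


def fill_small_gaps(values, max_gap_days=15):
    """Linearly interpolate runs of NaN no longer than max_gap_days.

    Longer runs, and runs touching either end of the series, are left as NaN.
    (Hypothetical helper for illustration only.)
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if not math.isnan(filled[i]):
            i += 1
            continue
        # Find the end of this run of missing values
        # (j is the index of the first valid value after the run)
        j = i
        while j < n and math.isnan(filled[j]):
            j += 1
        gap_length = j - i
        has_both_neighbours = i > 0 and j < n
        if has_both_neighbours and gap_length <= max_gap_days:
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                weight = (k - (i - 1)) / (j - (i - 1))
                filled[k] = left + weight * (right - left)
        i = j
    return filled


nan = float("nan")
daily_flow = [1.0, nan, nan, 4.0, nan, nan, nan, nan, 9.0]
# The 2-day gap is filled; the 4-day gap exceeds max_gap_days and stays missing
print(fill_small_gaps(daily_flow, max_gap_days=2))
```

With daily data held in xarray, `DataArray.interpolate_na` offers a `max_gap` argument for the same purpose, but note that it measures a gap as the coordinate distance between the valid values on either side, which is one day longer than the count of missing days.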

The "Variables" section below is the only section a user will need to modify for testing different options for most of these decisions.

Notes:
- There are no checks for water year completeness, which shouldn't be an issue with the datasets I provide as the freshet periods are captured on either end (i.e., the streamflow data starts on 1st January and ends on 31st December).
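
Since the Notebook does not check water-year completeness, the following sketch shows one way such a check could look for the October 1st to September 30th water year. Both helper names are hypothetical, and labelling a water year by the calendar year in which it ends is an assumed convention; the workflow's own functions may label water years differently.

```python
from datetime import date, timedelta


def water_year(day, start_month=10, start_day=1):
    """Return the water year of a date, labelled by the calendar year in which it ends (assumed convention)."""
    starts_this_year = date(day.year, start_month, start_day)
    return day.year + 1 if day >= starts_this_year else day.year


def is_complete_water_year(dates_with_data, year, start_month=10, start_day=1):
    """Check that every day of the given water year is present in dates_with_data."""
    first = date(year - 1, start_month, start_day)
    last = date(year, start_month, start_day) - timedelta(days=1)
    expected_days = (last - first).days + 1
    observed = {d for d in dates_with_data if first <= d <= last}
    return len(observed) == expected_days


# 30 September 2020 and 1 October 2020 fall in different water years
print(water_year(date(2020, 9, 30)), water_year(date(2020, 10, 1)))

# A full record from 1 October 2019 to 30 September 2020 is a complete water year 2020
full_record = [date(2019, 10, 1) + timedelta(days=k) for k in range(366)]
print(is_complete_water_year(full_record, 2020))
```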

## Modules, settings, and functions

In [1]:
# Import required modules
import logging
%matplotlib notebook
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import os
from pprint import pprint
import sys
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import xarray as xr

In [2]:
# Add scripts to the system path
sys.path.append('../scripts')

# Set up logging, configured for this workflow (see utilities.py)
from utilities import setup_logging, read_settings
setup_logging()

# Set up logging for this notebook
logger = logging.getLogger()

# Suppress misc. comments from being added to the log file
logging.getLogger('matplotlib.font_manager').disabled = True
logging.getLogger('matplotlib.pyplot').disabled = True

%load_ext autoreload
%autoreload 2

2025-01-28 10:14:29,948 - root - INFO - Logging setup complete. Log file: C:\Users\lauri\PycharmProjects\FROSTBYTE_PREVAH\logs\data_driven_forecasting_20250128_101429.log


In [3]:
# Save Notebook name to the log file
logger.debug('Notebook: 1_RegimeClassification')

In [4]:
# Read settings file
settings = read_settings('../settings/config_test_case.yaml', log_settings=True)
pprint(settings)

2025-01-28 10:14:30,142 - root - INFO - Settings logged from ../settings/config_test_case.yaml


{'SWE_obs_path': '../PREVAH/input_data/SWE_prevah_m3.nc',
 'basins_dem_path': '../PREVAH/input_data/MERIT_Hydro_dem_',
 'basins_shp_path': '../PREVAH/input_data/ebene_40km_with_ids.shp',
 'domain': 'V61',
 'glacier_component_path': '../PREVAH/input_data/GL_prevah.nc',
 'output_data_path': '../PREVAH/output_data_test/',
 'plots_path': '../PREVAH/output_plots_test/',
 'precip_obs_path': '../PREVAH/input_data/P_prevah_m3.nc',
 'streamflow_obs_path': '../PREVAH/input_data/Q_prevah_m3.nc'}


In [5]:
# Import required functions
from functions import regime_classification, polar_plot

## Variables

In [6]:
# Set user-specified variables
#test_basin_id = 'V1161'  # Set basin_id for testing
test_basin_id = settings['domain']  # Can be overridden with test_basin_id = <string of the test basin ID>; make sure that this ID is in the input data files
nival_start_doy_default = 60 # nival regime starting day of year, corresponds to the 1st of March
nival_end_doy_default = 213  # nival regime ending day of year, corresponds to the 1st of August
nival_regularity_threshold_default = 0.65  # nival regime minimum regularity threshold
month_start_water_year_default, day_start_water_year_default = 10, 1  # water year start
month_end_water_year_default, day_end_water_year_default = 9, 30  # water year end
max_gap_days_default = 15  # max. number of days for gaps allowed in the daily streamflow data for the linear interpolation

In [7]:
# Save the user-specified variables to the log file
logger.debug(f'test basin ID: {test_basin_id}')
logger.debug(f'nival regime start DOY: {nival_start_doy_default}')
logger.debug(f'nival regime end DOY: {nival_end_doy_default}')
logger.debug(f'regularity threshold: {nival_regularity_threshold_default}')
logger.debug(f'water year start (month/day): {month_start_water_year_default}/{day_start_water_year_default}')
logger.debug(f'water year end (month/day): {month_end_water_year_default}/{day_end_water_year_default}')
logger.debug(f'linear interpolation maximum gap (days): {max_gap_days_default}')

## Read data

In [8]:
# Read the daily streamflow data as a Dataset and select the test basin's outlet
Qobs_ds = xr.open_dataset(settings['streamflow_obs_path'])
display(Qobs_ds)
Qobs_testbasin_ds = Qobs_ds.where(Qobs_ds.Station_ID==test_basin_id, drop=True)
Qobs_testbasin_ds = Qobs_testbasin_ds.set_index({"Station_ID":"Station_ID"})

display(Qobs_testbasin_ds)

In [9]:
# Plot a climatological hydrograph for the basin
display(Qobs_testbasin_ds)
fig = plt.figure()
streamflow_data_da = Qobs_testbasin_ds.Flow
doy_mean = streamflow_data_da.groupby("time.dayofyear").mean(skipna=True)
plt.plot(np.arange(366), doy_mean.values, color='b')
plt.ylabel('mean climatological streamflow [m3/s]')
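# Place ticks on the first day of each month (0-based positions in a leap year, matching the 366-day x-axis)
month_start_positions = [0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335]
plt.xticks(month_start_positions, ['1st Jan', '1st Feb', '1st Mar', '1st Apr', '1st May', '1st Jun', '1st Jul', '1st Aug', '1st Sep', '1st Oct', '1st Nov', '1st Dec'], rotation=45)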
plt.tight_layout();


For this and any subsequent interactive plots, don't forget to press the "Stop Interaction" button on the top left of the plot once you're done exploring the results. Otherwise new plots will overwrite existing plots.

In [10]:
# Save the figure
fig.savefig(settings['plots_path']+"hydrograph_basin"+test_basin_id+".png", dpi=300)

In [11]:
# Close the figure - please run this as it will ensure that we're not overloading the memory unnecessarily
plt.close(fig)

## Circular statistics for regime classification
Below, we calculate three streamflow statistics that can be used to identify peak streamflow events: streamflow annual maxima, peaks over threshold (POT), and centre of mass (or centre of volume). We use a combination of three statistics to get a more robust output.

### Annual maxima

In [12]:
# Regime classification based on streamflow annual maxima (flag=1)
regime_annualmax_gdf, theta_rad_events_annualmax = regime_classification(Qobs_testbasin_ds, start_water_year=(month_start_water_year_default, day_start_water_year_default), max_gap_days=max_gap_days_default, flag=1)

display(regime_annualmax_gdf)
print(theta_rad_events_annualmax)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.525738 | 0.879289 | 204.94916 |


[3.70105436 3.70105436 3.58055491 3.33043156 3.37398444 3.21905658
 4.0797669  3.58793915 3.94205325 3.11577134 3.33955603 3.48493611
 3.28791341 3.18462817 3.49448388 3.84544675 4.40683682 3.8215538
 3.59776912 4.53213366 4.97490563 2.9952719  3.49448388 3.84544675
 4.02812428 3.08134293 3.78712539 3.34759873 3.40841285 3.30512761
 3.35677023 3.15875983 3.78712539 3.61498333 3.20184238 3.02142244
 3.8043396  2.80591563 3.06412873 3.09009113 3.25348499 2.4444173
 5.11261928]


This outputs a table (regime_annualmax_gdf) showing the average angular value (in radians) for all peak events combined (circular_stats_theta_rad), the regularity (circular_stats_regularity), and the average peak day of year (mean_peak_doy). The regularity is a measure of the spread in the dates of occurrence of peak events, and varies from zero to one. Larger values indicate a higher level of regularity (i.e., less spread in the dates). The list of numbers (theta_rad_events_annualmax) shows the angular values of the individual peak events, which are needed for the plot below.
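
To make these three statistics concrete, here is a minimal sketch of the usual directional-statistics calculation: each peak date is mapped to an angle on the unit circle, and the direction and length of the mean vector give the mean date and the regularity. The function name `circular_stats` and the fixed 365-day year are assumptions for illustration; `regime_classification` may handle year length and other details differently.

```python
import math


def circular_stats(peak_doys, days_in_year=365):
    """Return (mean angle in radians, regularity, mean peak day of year).

    Hypothetical helper; assumes a fixed year length for the angle conversion.
    """
    # Map each day of year onto the unit circle
    thetas = [doy * 2 * math.pi / days_in_year for doy in peak_doys]
    x_bar = sum(math.cos(t) for t in thetas) / len(thetas)
    y_bar = sum(math.sin(t) for t in thetas) / len(thetas)
    # Direction of the mean vector, wrapped to [0, 2*pi)
    theta_mean = math.atan2(y_bar, x_bar) % (2 * math.pi)
    # Length of the mean vector: near 0 = dates spread around the year, 1 = all on the same day
    regularity = math.hypot(x_bar, y_bar)
    # Convert the mean angle back to a day of year
    mean_doy = theta_mean * days_in_year / (2 * math.pi)
    return theta_mean, regularity, mean_doy


# Tightly clustered late-spring peaks give a regularity close to 1
print(circular_stats([150, 160, 170]))

# Peaks on either side of the new year average to late December,
# whereas an arithmetic mean of 350 and 10 would wrongly give day 180
print(circular_stats([350, 10]))
```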

In [13]:
# Polar plot
fig = polar_plot(theta_rad_events_annualmax, regime_annualmax_gdf.circular_stats_regularity.values, flag=0, nival_start_doy=nival_start_doy_default, nival_end_doy=nival_end_doy_default, nival_regularity_threshold=nival_regularity_threshold_default)


This is a polar plot showing the dates of all individual peak events (i.e., here annual maxima) as dark squares. The average (i.e., circular mean) of these events is shown as the blue circle, giving an indication of the average date of peak events (position along the circle), as well as its regularity (position along the radius; with 0 at the centre of the circle and 1 at the edge of the circle). The basin's regime is classified as being nival if it falls within the light blue band (i.e., average peak between 1st March and 1st August, and a regularity of at least 0.65).

In [14]:
# Save the figure
fig.savefig(settings['plots_path']+"polar_plot_annualmax_basin"+test_basin_id+".png", dpi=300)


In [15]:
# Close the figure - please run this as it will ensure that we're not overloading the memory unnecessarily
plt.close(fig)

We will now go through the same process, but instead of the streamflow annual maxima, we will use the peaks over threshold (POT) metric to identify peak events.

### Peaks over threshold (POT)
The threshold used for this streamflow peak metric is the smallest annual maximum over the historical period. All peaks above this threshold will be counted as peak events to classify the basin's regime.
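
The threshold rule can be sketched on toy data as follows. The helper `pot_events` is hypothetical, and counting every day at or above the threshold as a separate event is an assumption made for illustration; the workflow's own peak definition may differ.

```python
def pot_events(daily_flow_by_year):
    """Return (threshold, events), where events are (year, day_index) pairs at or above the threshold.

    Hypothetical helper; the threshold is the smallest of the annual maxima.
    """
    annual_maxima = {year: max(flows) for year, flows in daily_flow_by_year.items()}
    threshold = min(annual_maxima.values())
    events = [
        (year, day)
        for year, flows in daily_flow_by_year.items()
        for day, flow in enumerate(flows)
        if flow >= threshold
    ]
    return threshold, events


# Toy record: three "years" of four daily values each
toy_flows = {
    2001: [1.0, 5.0, 9.0, 4.0],
    2002: [2.0, 6.0, 3.0, 1.0],
    2003: [7.0, 8.0, 2.0, 1.0],
}
threshold, events = pot_events(toy_flows)
print(threshold)  # smallest annual maximum
print(events)     # one or more events per year
```

Because the threshold is the smallest annual maximum, every year contributes at least one event under this assumption, and years with several high-flow days contribute more.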

In [16]:
# Regime classification based on POT (flag=2)
regime_POT_gdf, theta_rad_events_POT = regime_classification(Qobs_testbasin_ds, start_water_year=(month_start_water_year_default, day_start_water_year_default), max_gap_days=max_gap_days_default, flag=2)

display(regime_POT_gdf)
print(theta_rad_events_POT)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.488912 | 0.919467 | 202.790841 |


[2.75427301 2.77148722 2.80591563 ... 2.4444173  2.4616315  2.47884571]


The statistics already differ from those obtained with the annual maxima: the regularity is higher (0.919 versus 0.879) and the mean peak day of year is about two days earlier (202.8 versus 204.9).

In [17]:
# Polar plot
fig = polar_plot(theta_rad_events_POT, regime_POT_gdf.circular_stats_regularity.values, flag=0, nival_start_doy=nival_start_doy_default, nival_end_doy=nival_end_doy_default, nival_regularity_threshold=nival_regularity_threshold_default)


Note that the POT metric yields more events than the other two metrics, for two reasons: several events can be identified in a single year, and years with missing values are kept for POT but discarded when calculating the annual maxima and the centre of mass.

In [18]:
# Save the figure
fig.savefig(settings['plots_path']+"polar_plot_POT_basin"+test_basin_id+".png", dpi=300)

In [19]:
# Close the figure - please run this as it will ensure that we're not overloading the memory unnecessarily
plt.close(fig)

Finally, we will go through the same process using the centre of mass (COM) to identify peak events.

### Centre of mass (COM)
The centre of mass is the date by which 50% of the total water-year streamflow volume has passed.
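
As a minimal sketch of this definition, the centre of mass can be found by accumulating the daily flows of one water year until half of the total is reached. The function name `centre_of_mass_index` is hypothetical, and returning the first day on which the cumulative flow reaches (rather than strictly exceeds) half the total is an assumption.

```python
from itertools import accumulate


def centre_of_mass_index(daily_flow):
    """Return the 0-based index of the first day on which cumulative flow reaches half the total.

    Hypothetical helper; daily_flow holds the daily values of a single water year, in order.
    """
    half_total = sum(daily_flow) / 2
    for day, cumulative in enumerate(accumulate(daily_flow)):
        if cumulative >= half_total:
            return day
    raise ValueError("daily_flow must not be empty")


# Toy "water year" of seven daily values with a melt peak in the middle
toy_year = [1.0, 1.0, 2.0, 6.0, 8.0, 4.0, 2.0]
print(centre_of_mass_index(toy_year))
```

Because this metric integrates the whole water year rather than picking a single extreme day, it tends to vary less from year to year than the date of the annual maximum.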

In [20]:
# Regime classification based on COM (flag=3)
regime_COM_gdf, theta_rad_events_COM = regime_classification(Qobs_testbasin_ds, start_water_year=(month_start_water_year_default, day_start_water_year_default), max_gap_days=max_gap_days_default, flag=3)

display(regime_COM_gdf)
print(theta_rad_events_COM)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.386388 | 0.965425 | 196.848648 |


[3.49448388 3.42562706 3.49448388 3.50210329 3.51169809 3.33955603
 3.51169809 3.46776894 3.33955603 3.39119865 3.44284126 3.46776894
 3.30512761 3.42562706 3.37398444 3.26176286 3.59776912 3.39119865
 3.46005547 3.27893004 3.33955603 3.30512761 3.44284126 3.63944067
 3.35677023 3.2706992  3.28791341 3.43343459 3.46005547 3.33955603
 3.33955603 3.22742852 3.42562706 3.33955603 3.20184238 3.27893004
 3.21905658 3.15019976 3.15019976 3.19309417 3.21905658 3.13298555
 5.12983348]


Again, the statistics differ from those of the other two metrics: the centre of mass gives the highest regularity (0.965) and the earliest mean peak day of year (196.8).

In [21]:
# Polar plot
fig = polar_plot(theta_rad_events_COM, regime_COM_gdf.circular_stats_regularity.values, flag=0, nival_start_doy=nival_start_doy_default, nival_end_doy=nival_end_doy_default, nival_regularity_threshold=nival_regularity_threshold_default)


In [22]:
# Save the figure
fig.savefig(settings['plots_path']+"polar_plot_COM_basin"+test_basin_id+".png", dpi=300)

In [23]:
# Close the figure - please run this as it will ensure that we're not overloading the memory unnecessarily
plt.close(fig)

In [24]:
display(regime_annualmax_gdf)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.525738 | 0.879289 | 204.94916 |


In [25]:
display(regime_POT_gdf)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.488912 | 0.919467 | 202.790841 |


In [26]:
display(regime_COM_gdf)

| Unnamed: 0 | area | Station_ID | geometry | circular_stats_theta_rad | circular_stats_regularity | mean_peak_doy |
|---|---|---|---|---|---|---|
| 0 | 44605590.0 | V61 | POINT (2597090.44884 1098408.52311) | 3.386388 | 0.965425 | 196.848648 |
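
The three result tables can be combined into a single nival check, as sketched below. The function `is_nival` is hypothetical and not part of `functions.py`; it applies the nival band described earlier (mean peak between day of year 60 and 213, regularity of at least 0.65), and treating the band limits as inclusive is an assumption. The input values are copied from the tables above.

```python
def is_nival(mean_peak_doy, regularity, start_doy=60, end_doy=213, min_regularity=0.65):
    """True if the mean peak date and the regularity both fall inside the nival band (hypothetical helper)."""
    return start_doy <= mean_peak_doy <= end_doy and regularity >= min_regularity


# (mean_peak_doy, circular_stats_regularity) for basin V61, from the three tables above
metrics = {
    "annual maxima": (204.94916, 0.879289),
    "POT": (202.790841, 0.919467),
    "COM": (196.848648, 0.965425),
}

results = {name: is_nival(doy, reg) for name, (doy, reg) in metrics.items()}
nival_by_all_metrics = all(results.values())

print(results)
print(nival_by_all_metrics)
```

Requiring agreement across all three metrics mirrors the basin selection rule stated in the introduction.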
