![](./figures/Logo.PNG)

## In this part of the tutorial, you will
* use several process-based signatures to assess model performance
* learn how signatures provide diagnostic insights
* add your own signature 

- - -

# 2 c - Signatures

- - -

## 1 About signatures

Statistical metrics quantify model fit, but a model can achieve high metric values while representing the system unrealistically. Process-based signatures provide an alternative or complementary strategy for model evaluation with diagnostic potential. Ideally, signatures quantify underlying processes and therefore let the modeller compare how process dynamics are represented in the model and in the real system. We can thus investigate where and when a model is an inadequate representation of the underlying system and, equally important, how the model might be improved. [(Gupta et al., 2008)](https://doi.org/10.1002/hyp.6989)

In this tutorial, we use several signatures to analyse observed and modelled river discharge:
* The **runoff ratio (RR)** is the fraction of precipitation that leaves the catchment as streamflow, computed here as mean discharge divided by mean precipitation. The remainder is mostly lost to evapotranspiration or stored in the catchment.
* The **baseflow index (BI)** is the proportion of streamflow that is baseflow, i.e. the slowly varying contribution from groundwater and other delayed sources. Here, baseflow is separated from the hydrograph with a recursive digital filter.
* The **recession constant** describes how fast discharge decreases during the recession phase, i.e. the decline in streamflow following a peak flow event.
* The **lag time** is the delay between peak rainfall and the corresponding peak discharge in a river or catchment, reflecting the time it takes precipitation to reach the stream.
* The **slope of the flow duration curve (FDC)** describes how quickly flow changes with exceedance probability. We use the central part of the FDC, between the 33rd and 66th flow percentiles. A steep slope indicates a variable, flashy flow regime, whereas a flat slope indicates a damped regime with large storage.
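
Two of these definitions can be sanity-checked on synthetic flows with known properties. The sketch below is illustrative only: the exponential recession, the time scale `K` and the two flow regimes are assumptions, not catchment data. For a pure exponential recession $Q_t = Q_{peak}\,e^{-t/K}$, the rate $\ln(Q_{end}/Q_{peak})/\Delta t$ recovers $-1/K$. For two regimes with the same median flow, the more variable one has the steeper (more negative) central FDC slope.

```python
import numpy as np

# Assumption: synthetic flows with known properties, not catchment data.

# 1) Recession: a pure exponential recession Q_t = Q_peak * exp(-t / K)
K = 5.0                                   # hypothetical recession time scale [days]
t = np.arange(11)                         # 10 days after the peak
Q_rec = 10.0 * np.exp(-t / K)
# decay rate per time step, with the sign convention ln(Q_end / Q_peak) / dt
recession_rate = np.log(Q_rec[-1] / Q_rec[0]) / (len(Q_rec) - 1)
print(f"recession rate: {recession_rate:.3f} per day (expected {-1 / K:.3f})")

# 2) Central FDC slope for two regimes with the same median flow (1 mm/day)
z = np.linspace(-3, 3, 1001)              # deterministic standardized anomalies
Q_flashy = np.exp(1.0 * z)                # highly variable regime
Q_damped = np.exp(0.2 * z)                # strongly buffered regime


def central_fdc_slope(q):
    """Flow at the 33.3rd percentile minus flow at the 66.6th percentile (<= 0)."""
    return np.percentile(q, 33.3) - np.percentile(q, 66.6)


print(f"flashy: {central_fdc_slope(Q_flashy):.3f}, damped: {central_fdc_slope(Q_damped):.3f}")
```

Because the probability interval (33.3rd to 66.6th percentile) is fixed, the flow difference is proportional to the slope, so it can be compared between catchments without dividing by the interval.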

<center>
    <img src="./figures/signatures.png" style="width:50%">
</center>

<div style="background:#e0f2fe; padding: 1%; border:1mm solid SkyBlue; color:black">
    <h4><span>&#129300 </span>Task I: Signatures</h4>
    In the last tutorial, you learned about four statistical metrics (Bias, RMSE, KGE, NSE) for evaluating model fit. As you have seen for yourself, calibrations with different parameter sets can yield similar or even identical metric values. To address this problem and to constrain models to physically reasonable representations, we can additionally use signatures for model evaluation.
    <ol>
        <li>How do different signatures represent physical processes?</li>
        <li>How are the signatures related to components of the evaluation metrics, such as the KGE?</li>
        <li>How many signatures do we need to complement model evaluation, and what does this number depend on?</li>
    </ol>
</div>

_PUT YOUR ANSWERS HERE_

<div style="background:#e0f2fe; padding: 1%; border:1mm solid SkyBlue; color:black">
</div>

## 2 Using hydrological signatures

**Import packages**

```python
import pandas as pd
import seaborn as sns
import numpy as np
from scipy.signal import argrelextrema
from scipy.special import boxcox1p
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
from matplotlib.lines import Line2D
import sys
sys.path.append('src/')
import HBV  # local HBV model module in src/
from ipywidgets import interact, Dropdown, FloatSlider, Checkbox, HTML
from IPython.display import display  # built into Jupyter; imported explicitly for clarity
```

**Defining functions**

```python
def runoff_ratio(runoff, precip):
    """
    Calculate the ratio of mean runoff to mean precipitation.
    """
    return round(np.divide(np.mean(runoff), np.mean(precip)), 3)


def calc_percentile(data, x):  # used in "slope_of_flow_duration_curve"
    """
    Find the x-th percentile of the data from the flow duration curve,
    i.e. the flow that is exceeded with probability 1 - x/100.
    """
    p = 1 - x/100  # transform to get exceedance probability
    # get ranks as a proxy for exceedance probabilities
    data_tmp = data[~np.isnan(data)]  # remove NaN values
    data_sorted = np.sort(data_tmp)  # ascending order
    data_ranked = np.linspace(1, len(data_tmp), len(data_tmp))  # unique rank 1..n for every measurement (ties broken arbitrarily)
    FDC = 1 - data_ranked / len(data_ranked)  # flow duration curve: exceedance probability of each sorted value

    # find x-th flow percentile: last 0-based position whose exceedance probability is still >= p
    indices = np.arange(len(FDC))
    bound_x = int(np.max(indices[FDC >= p]))
    data_x = data_sorted[bound_x]

    return data_x


def slope_of_flow_duration_curve(data):
    """
    Calculate the central slope of the flow duration curve as the flow at the
    33.3rd percentile minus the flow at the 66.6th percentile. The result is <= 0;
    a more negative value indicates a steeper curve, i.e. a more variable flow regime.
    """
    lower_percentile = 33.3
    upper_percentile = 66.6
    return round(calc_percentile(data, lower_percentile) - calc_percentile(data, upper_percentile), 3)


def baseflow_index(data):
    """
    Compute the baseflow index: total baseflow divided by total streamflow.
    Baseflow is separated with a one-parameter recursive digital filter (Lyne-Hollick form).
    """
    a = 0.925  # filter coefficient: https://www.scirp.org/(S(351jmbntvnsjt1aadkposzje))/journal/paperinformation.aspx?paperid=83002#return41
    baseflow_t0 = 0
    runoff_t0 = 0
    l_baseflow = []
    for i, runoff_t1 in enumerate(data):
        if i > 0:
            # compute baseflow: Q_b_t1 = a * Q_b_t0 + ((1 - a) / 2) * (Q_t1 + Q_t0)
            baseflow_t1 = (a * baseflow_t0) + (((1-a)/2) * (runoff_t1 + runoff_t0))
            baseflow_t1 = min(baseflow_t1, runoff_t1)  # baseflow cannot exceed total streamflow
            baseflow_t0 = baseflow_t1  # for next time step: migrate t1 to t0
            l_baseflow.append(baseflow_t1)
        runoff_t0 = runoff_t1  # for next time step: migrate t1 to t0
    return round(np.sum(l_baseflow) / np.sum(data[1:]), 3)


def find_peak_to_min(data):  # used in "recession_constant"
    """
    Identify the index of the peak value and of the first local minimum after it.
    """
    id_max = np.argmax(data)  # get id of the max value in data
    data_cropped = data[id_max:]  # crop data (cropped data starts with maximum)
    # argrelextrema returns positions relative to data_cropped; [0][0] is the first local minimum
    id_next_min = id_max + argrelextrema(data_cropped, np.less)[0][0]
    return id_max, id_next_min


def recession_constant(data):
    """
    Calculate the recession constant from the highest peak to the next local minimum,
    ln(Q_min / Q_peak) / timesteps, i.e. the (negative) exponential decay rate per time step.
    """
    id_peak, id_next_min = find_peak_to_min(data)
    timesteps = id_next_min - id_peak
    peak = data[id_peak]
    next_min = data[id_next_min]  # first local minimum after the peak
    return round(- np.log(peak/next_min) / timesteps, 3)  # https://docs.niwa.co.nz/library/public/HHPP8.pdf


def lag_time(data, data_obs, precip, search_range=100):
    """
    Calculate the lag time (in time steps) between the last local precipitation maximum
    before the observed peak flow and the first local maximum of `data` at or after the
    observed peak flow (searched within `search_range` time steps).
    """
    id_peak_flow_obs = np.argmax(data_obs)  # get id of peak observed flow
    precip_cropped = precip[:id_peak_flow_obs+2]  # crop precip
    id_prior_max_precip = argrelextrema(precip_cropped, np.greater)[0][-1]  # find id of local max precip before peak observed flow
    # in line above: using [0] to get the first array, and [-1] to get the last element of that array
    data_cropped = data[id_peak_flow_obs-1:id_peak_flow_obs+search_range+2]  # crop data
    # in line above:
    #  - start slice at id_peak_flow_obs-1 to allow max to be found (needs the value before the max to determine it correctly)
    #  - end slice at id_peak_flow_obs+search_range+2
    id_data_cropped = argrelextrema(data_cropped, np.greater)[0][0] - 1  # get id of the first max value in cropped data
    # in line above: using [0] to get the first array, and the next [0] to get the first element of that array
    id_data = id_peak_flow_obs + id_data_cropped
    return round(id_data - id_prior_max_precip, 3)


def hbv(par, precip, temp, evap):
    """
    Run the HBV snow routine and runoff simulation; return simulated runoff.
    """
    # Run HBV snow routine
    p_s, _, _ = HBV.snow_routine(par[:4], temp, precip)
    # Run HBV runoff simulation
    Case = 1  # for now we assume that the preferred path in the upper zone is runoff (Case = 1), it can be set to percolation (Case = 2)
    ini = np.array([0, 0, 0])  # initial state
    runoff_sim, _, _ = HBV.hbv_sim(par[4:], p_s, evap, Case, ini)
    return runoff_sim
```
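
The filter in `baseflow_index` is written directly in terms of baseflow. The sketch below checks numerically that this form matches the more familiar quickflow form of the Lyne-Hollick filter, $q_t = a\,q_{t-1} + \frac{1+a}{2}(Q_t - Q_{t-1})$ with baseflow $Q_t - q_t$, when both start consistently ($b_0 = 0$, $q_0 = Q_0$). The helper names and the example hydrograph are hypothetical, and the constraint $b \le Q$ is deliberately left out. The last lines show that this constraint matters: for constant flow, the unconstrained filter converges to the flow itself, so unconstrained "baseflow" alone says little about the baseflow share.

```python
import numpy as np

A = 0.925  # same filter coefficient as in baseflow_index


def baseflow_recursive(flow, a=A):
    """Baseflow form (no b <= Q constraint): b_t = a*b_{t-1} + (1-a)/2 * (Q_t + Q_{t-1}), b_0 = 0."""
    b = np.zeros(len(flow))
    for t in range(1, len(flow)):
        b[t] = a * b[t - 1] + (1 - a) / 2 * (flow[t] + flow[t - 1])
    return b


def baseflow_from_quickflow(flow, a=A):
    """Lyne-Hollick quickflow filter q_t = a*q_{t-1} + (1+a)/2 * (Q_t - Q_{t-1}); baseflow = Q - q."""
    q = np.zeros(len(flow))
    q[0] = flow[0]  # q_0 = Q_0 corresponds to b_0 = 0
    for t in range(1, len(flow)):
        q[t] = a * q[t - 1] + (1 + a) / 2 * (flow[t] - flow[t - 1])
    return flow - q


# hypothetical hydrograph [mm/day]
flow = np.array([1.0, 3.0, 8.0, 5.0, 3.0, 2.0, 1.5, 1.2, 1.1, 1.0])
print(np.allclose(baseflow_recursive(flow), baseflow_from_quickflow(flow)))  # prints True

# Unit gain for constant flow: without the constraint, baseflow converges to Q itself
steady = baseflow_recursive(np.full(500, 2.0))
print(round(steady[-1], 6))
```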

**Create and display an interactive menu for selecting the catchment**

```python
# DO NOT ALTER! code to select the catchment

catchment_names = ["Medina River, TX, USA", "Siletz River, OR, USA", "Trout River, BC, Canada"]
dropdown = Dropdown(
    options=catchment_names,
    value=catchment_names[0],
    description='Catchment:',
    disabled=False
)

display(dropdown)
```


**Read catchment data and prepare model input**

```python
# Read catchment data for the selected catchment
catchment_name = dropdown.value
file_dic = {catchment_names[0]: "camels_08178880", catchment_names[1]: "camels_14305500", catchment_names[2]: "hysets_10BE007"}
df_obs = pd.read_csv(f"data/{file_dic[catchment_name]}.csv")
# Make sure the date is interpreted as a datetime object -> makes temporal operations easier
df_obs["date"] = pd.to_datetime(df_obs["date"], format="%Y-%m-%d")
# Select time frame
start_date = '2003-01-01'  # the first year is used as spin-up; evaluation is done for the time series after spin-up
end_date = '2004-12-30'

# Index frame by date
df_obs.set_index('date', inplace=True)
# Select time frame
df_obs = df_obs.loc[start_date:end_date]
# Reformat the date for plotting
df_obs["date"] = df_obs.index.strftime('%b-%d-%y')
# Reindex
df_obs = df_obs.reset_index(drop=True)
# Select snow, precip, PET, streamflow and T
df_obs = df_obs[["snow_depth_water_equivalent_mean", "total_precipitation_sum", "potential_evaporation_sum", "streamflow", "temperature_2m_mean", "date"]]
# Rename variables
df_obs.columns = ["Snow [mm/day]", "P [mm/day]", "PET [mm/day]", "Q [mm/day]", "T [C]", "Date"]

# Prepare the data input for the HBV model
P = df_obs["P [mm/day]"].to_numpy()
evap = df_obs["PET [mm/day]"].to_numpy()
temp = df_obs["T [C]"].to_numpy()

# Load calibrated parameters
params_calibrated = pd.read_csv("./data/calibrated_parameters - HBV.csv")
params_calibrated = params_calibrated[(params_calibrated.catchment_name == catchment_name) & (params_calibrated.objective_function == "nse")]  # use only this catchment and the NSE-calibrated parameters
```
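
The comment on `start_date` states that the first year serves as spin-up. One possible way to restrict signature calculations to the evaluation period is a boolean date mask. The model would still run over the full period so that the spin-up year initializes its stores, and only the signature calculation uses the mask. The sketch below is an assumption, not part of the tutorial code. It uses a synthetic stand-in for `df_obs` and a hypothetical `eval_start` date, and it builds the mask while the index is still a `DatetimeIndex`, because the cell above later converts the dates to strings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_obs before the dates are converted to strings
dates = pd.date_range("2003-01-01", "2004-12-30", freq="D")
df = pd.DataFrame({"P [mm/day]": np.full(len(dates), 4.0),
                   "Q [mm/day]": np.full(len(dates), 1.0)}, index=dates)

eval_start = pd.Timestamp("2004-01-01")  # hypothetical: first day after the one-year spin-up
eval_mask = df.index >= eval_start       # boolean array, True for evaluation days

# Apply the same mask to every series entering a signature
P_eval = df.loc[eval_mask, "P [mm/day]"].to_numpy()
Q_eval = df.loc[eval_mask, "Q [mm/day]"].to_numpy()
print(len(P_eval), np.mean(Q_eval) / np.mean(P_eval))  # evaluation days and runoff ratio
```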

**Using signatures to evaluate HBV results**

```python
def your_own_signature(SWE):
    """SWE is the snow water-equivalent, i.e. the amount of water that is stored as snow in the catchment.
       The function is called once with the observed SWE and once with the simulated SWE."""
    # TODO: implement your own idea here
    return "implement"
```

```python
# DO NOT ALTER! code to calculate and plot the signatures

param_names = ["Ts", "CFMAX", "CFR", "CWH", "BETA", "LP", "FC", "PERC", "K0", "K1", "K2", "UZL", "MAXBAS"]
lower       = [-3, 0, 0, 0, 0, 0.3, 1, 0, 0.05, 0.01, 0.005, 0, 1] # lower bounds for the parameters
upper       = [3, 20, 1, 0.8, 7, 1, 2000, 100, 2, 1, 0.1, 100, 6]  # upper bounds for the parameters

# widgets for easy input
params = {param: FloatSlider(value=params_calibrated.round(1).iloc[0, j+3], min=xmin, max=xmax, step=0.1, description=param) for j, xmin, xmax, param in zip(range(13), lower, upper, param_names)}

@interact(scalelog=Checkbox(value=False, description="Log Scale for Flow Curve"), lmbda=FloatSlider(min=-2, max=2, step=0.1, value=1, description="BC Lambda"), sep=HTML("HBV Parameters ---", description="---"), **params)
def signature_function(scalelog, lmbda=0, sep="", **params):

    # run HBV simulation
    params = np.array(list(params.values()))
    Q_sim = hbv(params, P, temp, evap)
    Q_obs = df_obs["Q [mm/day]"].values

    # run snow component for HBV
    _, SWE_sim, _ = HBV.snow_routine(params[:4], temp, P)
    SWE_sim = SWE_sim[:,0] + SWE_sim[:,1]

    signatures = ["Runoff Ratio", "Central Slope", "Baseflow Index", "Fast Recession Constant", "Lag Time", "Your Snow Signature"]
    results_sim = [runoff_ratio(Q_sim, P), slope_of_flow_duration_curve(Q_sim), baseflow_index(Q_sim), recession_constant(Q_sim), lag_time(Q_sim, Q_obs, P), your_own_signature(SWE_sim)]
    results_obs = [runoff_ratio(Q_obs, P), slope_of_flow_duration_curve(Q_obs), baseflow_index(Q_obs), recession_constant(Q_obs), lag_time(Q_obs, Q_obs, P), your_own_signature(df_obs["Snow [mm/day]"])]
    df_results = pd.DataFrame({"Observed": results_obs, "Simulated": results_sim}, index=signatures)

    # --- PLOTS ---

    plt.figure(figsize=(25, 5))

    # OBSERVED AND SIMULATED RUNOFF (using BoxCox)
    plt.subplot(131)
    plt.title(f"Hydrograph (with BC Lambda = {lmbda:.1f})")
    lineobs, = plt.plot(df_obs["Date"], boxcox1p(Q_obs, lmbda), color="black")
    linesim, = plt.plot(df_obs["Date"], boxcox1p(Q_sim, lmbda), color="C3")
    plt.gca().xaxis.set_major_locator(mdate.MonthLocator(interval=2))
    plt.xticks(rotation=45)
    plt.xlabel("Date")
    plt.ylabel("Q [mm/day]")

    # PRECIPITATION
    ax = plt.twinx()
    ax.invert_yaxis()
    pbar = ax.bar(df_obs["Date"], df_obs["P [mm/day]"], color="skyblue")
    ax.set_ylabel("P [mm/day]")
    plt.legend([lineobs, linesim, pbar], ["Observed", "Simulated", "Precipitation"], loc="upper left")

    # FLOW DURATION CURVES
    plt.subplot(132)
    sns.ecdfplot(y=Q_obs, color="black", complementary=True, ax=plt.gca(), label="Observed")
    sns.ecdfplot(y=Q_sim, color="red",   complementary=True, ax=plt.gca(), label="Simulated")
    plt.title("Flow Duration Curve")
    plt.ylabel("Q [mm/day]")
    plt.legend()
    if scalelog:
        plt.loglog()

    plt.subplot(133)
    plt.plot(df_obs["Date"], df_obs["Snow [mm/day]"], color="black", label="Observed")
    plt.plot(df_obs["Date"], SWE_sim[1:], color="red", label="Simulated")
    plt.title("Snow Water-Equivalent")
    plt.gca().xaxis.set_major_locator(mdate.MonthLocator(interval=2))
    plt.xticks(rotation=45)
    plt.xlabel("Date")
    plt.ylabel("SWE [mm]")
    plt.legend()

    plt.gcf().suptitle(catchment_name, fontweight="bold")
    plt.tight_layout()
    plt.show()

    # display signatures
    display(df_results)
```

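The hydrograph panel plots `boxcox1p(Q, lmbda)` from `scipy.special`, which computes $((1+Q)^\lambda - 1)/\lambda$ and reduces to $\ln(1+Q)$ for $\lambda = 0$. The sketch below uses hypothetical flow values to show the effect of the "BC Lambda" slider. With $\lambda = 1$ the flows stay unchanged, and smaller $\lambda$ compresses peaks relative to low flows, so low-flow mismatches become easier to see.

```python
import numpy as np
from scipy.special import boxcox1p

Q = np.array([0.1, 0.5, 2.0, 10.0, 50.0])  # hypothetical flows [mm/day]

ratios = []
for lmbda in (1.0, 0.5, 0.0):
    # boxcox1p(x, lmbda) = ((1 + x)**lmbda - 1) / lmbda, and log1p(x) for lmbda = 0
    Qt = boxcox1p(Q, lmbda)
    ratios.append(Qt[-1] / Qt[0])  # transformed peak flow relative to transformed low flow
    print(lmbda, np.round(Qt, 3), "peak/low ratio:", round(ratios[-1], 1))
```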

<div style="background:#e0f2fe; padding: 1%; border:1mm solid SkyBlue; color:black">
    <h4><span>&#129300 </span>Task II: Applying Signatures to HBV</h4>
    Above you will find the simulated and observed hydrographs for the catchment that you selected (remember to rerun the code when you change the catchment). The middle plot shows the flow duration curve, i.e. the exceedance probability (complementary cumulative distribution function) of the observed and simulated runoff. The right plot shows the observed and simulated snow water-equivalent. You can again transform the data in the hydrograph using the Box-Cox transformation with the lambda parameter, and you can enable logarithmic scaling of the flow duration curve.
    <ol>
        <li>Which processes in the HBV model are linked to which signatures? Which parameters affect certain signatures?</li>
    </ol>
    Please change the catchment to the Trout or Siletz River. In the cell above the plotting code, you will find an empty function, <code>your_own_signature</code>. It is called once with the observed snow water-equivalent, i.e. the amount of water stored in the snowpack, and once with the simulated snow water-equivalent.
    <ol start="2">
        <li>What could you use as a snow-related signature?</li>
        <li>Can you implement this signature?</li>
    </ol>
</div>

_PUT YOUR ANSWERS HERE_

<div style="background:#e0f2fe; padding: 1%; border:1mm solid SkyBlue; color:black">
</div>