![](./figures/Logo.PNG)

## In this part of the tutorial, you will
* use several process-based signatures to assess model performance
* learn how signatures provide diagnostic insights
* add your own signature 

- - -

# 2 c - Signatures

- - -

## 1 About signatures

Statistical metrics quantify model fit, but it is often possible to achieve high metric values with an unrealistic model. Process-based signatures offer an alternative or complementary strategy for model evaluation that has diagnostic potential. Ideally, signatures quantify the underlying processes and thereby allow the modeller to compare how process dynamics are represented in the model with how they occur in the real system. We can thus investigate where and when a model is an inadequate representation of the underlying system and, equally important, how the model might be improved [(Gupta et al., 2008)](https://doi.org/10.1002/hyp.6989).

In this tutorial, we use several signatures to analyse observed and modelled river discharge:
* The **runoff ratio** is the fraction of precipitation that leaves the catchment as streamflow (here: mean discharge divided by mean precipitation over the evaluation period). The remainder is lost to evapotranspiration or goes into storage.
* The **slope of the flow duration curve (FDC)** describes how quickly flow magnitude changes with exceedance probability and thus characterises streamflow variability. A steep slope indicates a flashy, highly variable flow regime; a flat slope indicates a damped response, typically caused by large storage. Here, the central part of the FDC is used: the slope is approximated by the difference between the flows at the 33.3rd and 66.6th percentiles.
* The **baseflow index** is the ratio of baseflow to total streamflow, i.e. the share of streamflow that originates from slow pathways, typically groundwater discharge. Here, baseflow is separated from the hydrograph with a one-parameter recursive digital filter.
* The **recession constant** describes how fast discharge declines during the recession phase, i.e. after a peak flow event. Here, it is computed between the highest peak and the next local minimum of the hydrograph.
* The **lag time** is the delay between peak rainfall and the corresponding peak discharge, reflecting the time it takes for precipitation to reach the river and contribute to streamflow.
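Written out, the signatures correspond roughly to the formulas below. This is a sketch of what the functions in Section 2 aim to compute, not a formal definition: $Q_t$ and $P_t$ are daily discharge and precipitation [mm/day] over the evaluation period, $Q_{x\%}$ is the flow not exceeded $x$ % of the time, and $a$ is the filter parameter (0.925 in this tutorial).

$$
\begin{aligned}
\text{RR} &= \frac{\bar{Q}}{\bar{P}} = \frac{\sum_t Q_t}{\sum_t P_t} \\
S_\text{FDC} &\approx Q_{33.3\%} - Q_{66.6\%} \\
Q_{b,t} &= a\,Q_{b,t-1} + \frac{1-a}{2}\,\left(Q_t + Q_{t-1}\right), \qquad \text{BFI} = \frac{\sum_t Q_{b,t}}{\sum_t Q_t} \\
k &= \frac{\ln\left(Q_{\min} / Q_{\text{peak}}\right)}{t_{\min} - t_{\text{peak}}} \\
t_\text{lag} &= t_{Q,\text{peak}} - t_{P,\text{peak}}
\end{aligned}
$$

The baseflow recursion is algebraically equivalent to the widely used Lyne-Hollick filter written for quickflow, $q_t = a\,q_{t-1} + \frac{1+a}{2}\,(Q_t - Q_{t-1})$ with $Q_{b,t} = Q_t - q_t$. The recession constant $k$ is negative because $Q_{\min} < Q_{\text{peak}}$.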

---

### <div class="blue"><span style="color:blue">Exercise section</span></div>
### Exercise 1

(a) In the previous exercise, you used NSE and KGE. How are signatures related to the NSE/KGE components (bias, variability, correlation)? To what extent are the NSE/KGE components diagnostic?

* Answer

(b) Discuss with your neighbour: How many signatures do we actually need for a modelling study? What does the number depend on?

* Answer

---

## 2 Using hydrological signatures

**Import packages**

```python
import sys

import matplotlib.dates as mdate
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display
from ipywidgets import Dropdown, interact
from matplotlib.lines import Line2D
from scipy.signal import argrelextrema

sys.path.append('src/')  # HyMod is a local module in the src/ folder
import HyMod
```

**Defining functions**

```python
def runoff_ratio(runoff, precip):
    # Mean streamflow divided by mean precipitation; both series must cover the same period
    return round(np.mean(runoff) / np.mean(precip), 3)


def calc_percentile(data, x):  # used in "slope_of_flow_duration_curve"
    # Returns (approximately) the x-th flow percentile, i.e. the flow not exceeded x % of the time
    p = 1 - x / 100  # transform to the corresponding exceedance probability
    data_tmp = data[~np.isnan(data)]  # remove NaN values
    data_sorted = np.sort(data_tmp)  # ascending order
    # Ranks 1..n as a proxy for exceedance probabilities (tied values get distinct ranks)
    data_ranked = np.linspace(1, len(data_tmp), len(data_tmp))
    FDC = 1 - data_ranked / len(data_ranked)  # exceedance probability of each sorted value (flow duration curve)

    # Find the x-th flow percentile: the 1-based position of the last value with an exceedance
    # probability >= p, used as a 0-based index, selects the next (slightly larger) flow value
    indices = np.linspace(1, len(FDC), len(FDC))
    bound_x = int(np.max(indices[FDC >= p]))
    data_x = data_sorted[bound_x]

    return data_x


def slope_of_flow_duration_curve(data):
    # Central slope of the FDC, approximated as the difference between the 33.3rd and 66.6th
    # flow percentiles (not normalised by the probability interval; the value is <= 0)
    lower_percentile = 33.3
    upper_percentile = 66.6
    return round(calc_percentile(data, lower_percentile) - calc_percentile(data, upper_percentile), 3)


def baseflow_index(data):
    # One-parameter recursive digital filter (Lyne-Hollick type), single forward pass
    a = 0.925  # coefficient: https://www.scirp.org/(S(351jmbntvnsjt1aadkposzje))/journal/paperinformation.aspx?paperid=83002#return41
    baseflow_t0 = 0
    runoff_t0 = 0
    l_baseflow = []
    for i, runoff_t1 in enumerate(data):
        if i > 0:
            # Compute baseflow: Q_b_t1 = a*Q_b_t0 + ((1-a)/2) * (Q_t1 + Q_t0)
            baseflow_t1 = (a * baseflow_t0) + (((1 - a) / 2) * (runoff_t1 + runoff_t0))
            baseflow_t1 = min(baseflow_t1, runoff_t1)  # baseflow cannot exceed total streamflow
            baseflow_t0 = baseflow_t1  # for next time step: migrate t1 to t0
            l_baseflow.append(baseflow_t1)
        runoff_t0 = runoff_t1  # for next time step: migrate t1 to t0
    # Index = total baseflow / total streamflow over the same time steps
    return round(np.sum(l_baseflow) / np.sum(data[1:]), 3)


def find_peak_to_min(data):  # used in "recession_constant"
    id_max = np.argmax(data)  # index of the highest peak
    data_cropped = data[id_max:]  # cropped data starts at the peak
    # Local minima of the cropped series ([0] selects the array of indices)
    id_minima = argrelextrema(data_cropped, np.less)[0]
    if len(id_minima) > 0:
        id_next_min = id_max + id_minima[0]  # first local minimum after the peak
    else:
        id_next_min = len(data) - 1  # no local minimum: flow recedes until the end, use the last value
    return id_max, id_next_min


def recession_constant(data):
    id_peak, id_next_min = find_peak_to_min(data)
    timesteps = id_next_min - id_peak  # duration of the recession [days]
    peak = data[id_peak]
    next_min = data[id_next_min]  # first local minimum after the peak
    # Rate of exponential decline; negative because flow decreases (https://docs.niwa.co.nz/library/public/HHPP8.pdf)
    return round(-np.log(peak / next_min) / timesteps, 3)


def lag_time(data, data_obs, precip, search_range=100):
    # data, data_obs and precip must cover the same time steps
    id_peak_flow_obs = np.argmax(data_obs)  # index of the observed peak flow
    # Crop precipitation (+2 so that a precipitation maximum on the peak day itself can be detected)
    precip_cropped = precip[:id_peak_flow_obs + 2]
    # Last local precipitation maximum before the observed peak ([0]: array of indices, [-1]: last element)
    id_prior_max_precip = argrelextrema(precip_cropped, np.greater)[0][-1]
    # Crop data to a window that starts one step before the observed peak
    # (argrelextrema needs the preceding value to detect a maximum at the peak itself)
    # and ends at id_peak_flow_obs + search_range + 2
    data_cropped = data[id_peak_flow_obs - 1:id_peak_flow_obs + search_range + 2]
    # First local maximum in the window ([0][0]); -1 corrects for the window starting one step early
    id_data_cropped = argrelextrema(data_cropped, np.greater)[0][0] - 1
    id_data = id_peak_flow_obs + id_data_cropped
    return int(id_data - id_prior_max_precip)  # lag in time steps [days]
```
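The central slope above is a plain difference of two flows, so its magnitude scales with the size of the river. A common alternative in the literature normalises by the probability interval and uses log-transformed flows, which makes catchments of different size easier to compare. The sketch below is one possible formulation, not the tutorial's method; `fdc_slope_log` and its default percentiles are hypothetical choices, and it assumes strictly positive flows (the log of zero flow is undefined).

```python
import numpy as np


def fdc_slope_log(q, lower=33.0, upper=66.0):
    """Hypothetical helper: slope of the FDC between two exceedance
    probabilities (in %), computed on log-transformed flows.

    Assumes strictly positive flows (log of zero is undefined).
    """
    q = np.asarray(q, dtype=float)
    q = q[~np.isnan(q)]  # ignore missing values
    # The flow exceeded `lower` % of the time is the (100 - lower)-th percentile
    q_high = np.percentile(q, 100 - lower)  # flow exceeded `lower` % of the time (higher flow)
    q_low = np.percentile(q, 100 - upper)   # flow exceeded `upper` % of the time (lower flow)
    # Change in log flow per unit of exceedance probability (positive for a falling FDC)
    return (np.log(q_high) - np.log(q_low)) / ((upper - lower) / 100)


# Usage: a flashy and a damped synthetic regime with a similar median flow
rng = np.random.default_rng(42)
flashy = np.exp(rng.normal(0.0, 1.5, 730))
damped = np.exp(rng.normal(0.0, 0.3, 730))
print(fdc_slope_log(flashy), fdc_slope_log(damped))
```

The flashy series yields the steeper (larger) slope. Unlike the difference-based version, this value does not change if all flows are multiplied by a constant.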


**Create and display an interactive menu for selecting the catchment**

```python
catchment_names = ["Medina River, TX, USA", "Siletz River, OR, USA", "Trout River, BC, Canada"]
dropdown = Dropdown(
    options=catchment_names,
    value=catchment_names[0],
    description='Catchment:',
    disabled=False,
)

display(dropdown)
```


**Read catchment data and prepare model input**

```python
# Get the catchment selected in the dropdown menu (re-run this cell after changing the selection)
catchment_name = dropdown.value
# Read catchment data
file_dic = {catchment_names[0]: "camels_08178880", catchment_names[1]: "camels_14305500", catchment_names[2]: "hysets_10BE007"}
df_obs = pd.read_csv(f"data/{file_dic[catchment_name]}.csv")
# Make sure the date is interpreted as a datetime object -> makes temporal operations easier
df_obs["date"] = pd.to_datetime(df_obs["date"], format="%Y-%m-%d")
# Time frame: the first year is used as spin-up; evaluation is done for the period after spin-up
start_date = '2003-01-01'
end_date = '2004-12-30'

# Index frame by date
df_obs.set_index('date', inplace=True)
# Select time frame
df_obs = df_obs.loc[start_date:end_date].copy()
# Reformat the date for plotting
df_obs["date"] = df_obs.index.strftime('%b-%d-%y')
# Reindex
df_obs = df_obs.reset_index(drop=True)
# Select snow, precip, PET, streamflow and T
df_obs = df_obs[["snow_depth_water_equivalent_mean", "total_precipitation_sum", "potential_evaporation_sum", "streamflow", "temperature_2m_mean", "date"]]
# Rename variables
df_obs.columns = ["Snow [mm/day]", "P [mm/day]", "PET [mm/day]", "Q [mm/day]", "T [C]", "Date"]

# Prepare the data input for the model
P = df_obs["P [mm/day]"].to_numpy()
evap = df_obs["PET [mm/day]"].to_numpy()
temp = df_obs["T [C]"].to_numpy()
```
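The positional split `iloc[365:]` used in the signature cell relies on the spin-up year having exactly 365 rows. A date-based selection makes the spin-up explicit and stays correct for leap years or other start dates. A minimal sketch with placeholder data; the frame `df` and the name `spin_up_end` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering the same period as above (values are placeholders)
dates = pd.date_range("2003-01-01", "2004-12-30", freq="D")
df = pd.DataFrame({"P [mm/day]": np.ones(len(dates))}, index=dates)

spin_up_end = "2003-12-31"  # last day of the spin-up year
df_spin_up = df.loc[:spin_up_end]      # label slicing includes the end date
df_eval = df.loc["2004-01-01":]        # evaluation period

# For this particular period, the positional split selects the same rows
assert df_eval.index.equals(df.iloc[365:].index)
print(len(df_spin_up), len(df_eval))
```

Both parts contain 365 days here; with a different start date, only the date-based version would still split at the intended day.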


**Using signatures to evaluate HyMod results**

```python
@interact(Sm=(0, 400, 1), beta=(0, 2, 0.01), alpha=(0, 1, 0.01), Rs=(8.0, 200.0, 0.5), Rf=(1.0, 7.0, 0.1))
def signature_function(Sm=200, beta=0.32, alpha=0.45, Rs=150, Rf=2.6):
    signatures = ["Runoff ratio", "Central slope", "Baseflow index", "Fast recession constant", "Lag time"]
    param = np.array([Sm, beta, alpha, 1/Rs, 1/Rf])
    sim, states, fluxes = HyMod.hymod_sim(param, P, evap)  # run HyMod for the full period (incl. spin-up)
    df_model = pd.DataFrame({'Q [mm/day]': sim, 'ET [mm/day]': fluxes.T[0], 'Date': df_obs["Date"].to_numpy()})
    # Discard the spin-up year: evaluate only the period after spin-up
    df_model_eval = df_model.iloc[365:]
    sim = df_model_eval["Q [mm/day]"].values
    df_obs_eval = df_obs.iloc[365:]
    obs = df_obs_eval["Q [mm/day]"].values
    P_eval = df_obs_eval["P [mm/day]"].values  # precipitation for the same period as sim and obs
    # Calculate signatures
    results_sim = [runoff_ratio(sim, P_eval), slope_of_flow_duration_curve(sim), baseflow_index(sim), recession_constant(sim), lag_time(sim, obs, P_eval)]
    results_obs = [runoff_ratio(obs, P_eval), slope_of_flow_duration_curve(obs), baseflow_index(obs), recession_constant(obs), lag_time(obs, obs, P_eval)]
    df_results = pd.DataFrame({"Signature": signatures, "Observed": results_obs, "Simulated": results_sim})

    # Plot results
    plt.close()
    fig, axes = plt.subplots(1, 2, figsize=(20, 3))
    fig.suptitle(catchment_name)  # set figure title
    # Plot the simulated and observed Q
    sns.lineplot(data=df_model_eval, x="Date", y="Q [mm/day]", color="red", ax=axes[0])
    sns.lineplot(data=df_obs_eval, x="Date", y="Q [mm/day]", color="black", ax=axes[0])
    # Add a second y-axis on the right-hand side and plot the precipitation as inverted bars
    a1 = axes[0].twinx()
    sns.barplot(data=df_obs_eval, x="Date", y="P [mm/day]", ax=a1, label="P", color="dodgerblue", alpha=0.5)
    a1.invert_yaxis()
    # Show only the main (monthly) ticks
    locator = mdate.MonthLocator()
    plt.gca().xaxis.set_major_locator(locator)
    axes[0].tick_params(axis='x', labelrotation=45)
    # Add custom legend
    custom_lines = [Line2D([0], [0], color="black", lw=2), Line2D([0], [0], color="red", lw=2),
                    Line2D([0], [0], color="dodgerblue", lw=2)]
    axes[0].legend(custom_lines, ['Observed discharge', 'Simulated discharge', 'Precipitation (P)'],
                   bbox_to_anchor=(0, 1, 1, 0), loc="lower left")
    # Set subplot title
    axes[0].set_title("Hydrograph")
    # Plot the flow duration curves
    sns.ecdfplot(df_model_eval, y="Q [mm/day]", complementary=True, ax=axes[1], color="red")
    sns.ecdfplot(df_obs_eval, y="Q [mm/day]", complementary=True, ax=axes[1], color="black")
    axes[1].set_title("Flow duration curve")
    axes[1].set_xlabel("Exceedance probability")

    plt.show()  # display the figure
    print(df_results)  # print results
```
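Comparing the "Observed" and "Simulated" columns by eye becomes tedious when several signatures change at once. One option is to add a relative error per signature; the sketch below assumes a table with the same column names as `df_results`, and `add_relative_error` is a hypothetical helper, not part of the tutorial code. The values in the usage example are placeholders, not results from this tutorial.

```python
import numpy as np
import pandas as pd


def add_relative_error(df_results):
    """Hypothetical helper: add the relative error of each simulated signature (in %)."""
    out = df_results.copy()
    obs = out["Observed"].astype(float)
    sim = out["Simulated"].astype(float)
    # Divide by |obs| so the sign always means "simulated larger (+) or smaller (-)";
    # the error is undefined (NaN) where the observed signature is zero
    out["Rel. error [%]"] = (100 * (sim - obs) / obs.abs().replace(0, np.nan)).round(1)
    return out


# Usage with placeholder values (not results from this tutorial)
demo = pd.DataFrame({
    "Signature": ["Runoff ratio", "Fast recession constant", "Lag time"],
    "Observed": [0.5, -0.2, 0.0],
    "Simulated": [0.6, -0.1, 2.0],
})
print(add_relative_error(demo))
```

Dividing by the absolute observed value keeps the sign interpretable for negative signatures such as the recession constant. Inside `signature_function`, `print(add_relative_error(df_results))` could replace the final `print(df_results)`.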


---

### <div class="blue"><span style="color:blue">Exercise section</span></div>
### Exercise 2 

(a) How do you think the signatures used here relate to the processes (and thus parameters) included in HyMod (see figure below)?

* Answer

(b) Can you implement an additional signature for very dry and for snow-driven systems (see pictures below)?



![](./figures/Hymod_fig_cropped.PNG)
![https://commons.wikimedia.org/wiki/File:NachalParan1.jpg, https://commons.wikimedia.org/wiki/File:Snow_Melts_into_River_near_Iceberg_Lake_in_Glacier_National_Park.jpg](./figures/dry_snow_river.PNG)

---

## Jupyter format settings

```python
%%html
<style>.blue {background-color: #8dc9fc;}</style>
```