![](./figures/Logo.PNG)

## In this part of the tutorial, you will
* use metrics to assess simulation performance
* study scatter plots of multiple objective functions

- - -

# 2b - Statistical Evaluation Metrics

- - -

## 1. Introducing Bias, RMSE, NSE and KGE

In tutorial 2a, we relied on visual inspection to learn about model performance and to judge the fit of the model output to the observed runoff. For some parameter sets, it can be difficult to tell visually which one gives the best result. In this tutorial, we will use evaluation metrics, which enable a more robust comparison between model runs with different parameterizations.

We start with a simple metric of closeness: the **bias**.

**Bias**: Bias is the consistent deviation of simulation results from observed values. It indicates the model's tendency to systematically overestimate or underestimate the target variable.

Let $y_i$ represent the observed value and $\hat{y}_i$ denote the simulated value. The bias is calculated as:

$$
\text{Bias} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)
$$

where $n$ is the total number of data points.
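
A minimal sketch of the bias formula, using made-up runoff values (not tutorial data). Because the formula subtracts the simulated from the observed values, a positive bias here means the simulation is too low on average.

```python
import numpy as np

# Hypothetical daily runoff in mm/day, invented for illustration
obs = np.array([2.0, 4.0, 6.0, 8.0])
sim = np.array([1.0, 3.0, 5.0, 7.0])  # simulation is always 1 mm/day too low

# Bias as defined above: mean of (observed - simulated)
bias_value = np.mean(obs - sim)
print(bias_value)  # 1.0
```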

Next, we will look at three additional metrics: **Root Mean Square Error (RMSE)**, **Kling-Gupta Efficiency (KGE)**, and **Nash-Sutcliffe Efficiency (NSE)**.

**Root Mean Square Error (RMSE)**: RMSE is the square root of the average squared difference between simulated and observed values, in other words the square root of the mean squared error (MSE).

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

where $y_i$ represents the observed value, $\hat{y}_i$ denotes the simulated value, and $n$ is the total number of data points.
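
The RMSE formula can be followed step by step on a small example. The values below are invented for illustration only.

```python
import numpy as np

# Hypothetical daily runoff in mm/day, invented for illustration
obs = np.array([2.0, 4.0, 6.0, 8.0])
sim = np.array([3.0, 3.0, 8.0, 6.0])

errors = obs - sim             # [-1.,  1., -2.,  2.]
mse = np.mean(errors ** 2)     # (1 + 1 + 4 + 4) / 4 = 2.5
rmse_value = np.sqrt(mse)      # square root of the MSE

print(mse)                     # 2.5
print(round(rmse_value, 3))    # 1.581
```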

**Kling-Gupta Efficiency (KGE)**: KGE is a hydrological metric that assesses the performance of hydrological models by measuring the correlation, bias, and variability of their predictions against observed hydrograph data. It allows evaluation of the model's accuracy, timing, and volume representation.

$$
\text{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}
$$

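where $r$ is the Pearson correlation coefficient between simulated and observed values, $\alpha$ (alpha) is the ratio of the standard deviation of the simulated values to that of the observed values, and $\beta$ (beta) is the ratio of the simulated mean to the observed mean. A perfect simulation has $r = \alpha = \beta = 1$ and therefore $\text{KGE} = 1$.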
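
To see how the three components enter the KGE, the sketch below uses a made-up simulation that is 20 % too high everywhere. The ratios are taken as simulated over observed, which is how the `kge` function later in this tutorial computes them.

```python
import numpy as np

# Hypothetical runoff values, invented for illustration
obs = np.array([1.0, 2.0, 4.0, 3.0, 2.0])
sim = 1.2 * obs  # perfectly timed, but 20 % too high

r = np.corrcoef(obs, sim)[0, 1]      # Pearson correlation
alpha = np.std(sim) / np.std(obs)    # variability ratio (sim / obs)
beta = np.mean(sim) / np.mean(obs)   # ratio of means (sim / obs)
kge_value = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

print(round(r, 3), round(alpha, 3), round(beta, 3), round(kge_value, 3))
# 1.0 1.2 1.2 0.717
```

The correlation is perfect, so the whole KGE penalty in this example comes from the variability and mean ratios.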

**Nash-Sutcliffe Efficiency (NSE)**: NSE measures the proportion of the observed variance that is explained by the model results. It is particularly useful for evaluating streamflow predictions. An NSE of 1 indicates a perfect fit between the model and observed data, an NSE of 0 means the model is only as good as using the mean of the observed values as the prediction, and negative values indicate that the model performs worse than that mean.

$$
\text{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

where $y_i$ represents the observed value, $\hat{y}_i$ denotes the simulated value, $n$ is the total number of data points, and $\bar{y}$ is the mean of the observed values.
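
The reference points of the NSE can be checked numerically. The helper `nse_direct` below is a hypothetical name for a direct transcription of the formula above, and the data are invented for illustration.

```python
import numpy as np

def nse_direct(obs, sim):
    """NSE written exactly as in the formula above (illustrative helper)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([1.0, 2.0, 4.0, 3.0, 2.0])

nse_perfect = nse_direct(obs, obs)                          # simulation equals observation
nse_mean = nse_direct(obs, np.full_like(obs, obs.mean()))   # always predict the observed mean
nse_offset = nse_direct(obs, obs + 3.0)                     # right shape, large constant offset

print(nse_perfect)        # 1.0
print(nse_mean)           # 0.0
print(nse_offset < 0)     # True
```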

<div style="background:#e0f2fe; padding: 1%; border: 1mm solid SkyBlue">
    <h4><span>&#129300; </span>Your Turn I: Understanding the Metrics</h4>
    <ol>
        <li>Can you think of a reason why one would prefer RMSE to MSE?</li>
        <li>What are likely limitations of such metrics? What do they miss?</li>
    </ol>
</div>

## 2. Using Bias, RMSE, NSE and KGE

**Import packages**

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import scipy.stats  # explicit submodule import so that scipy.stats.boxcox is available
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import sys
sys.path.append('src/')
import HyMod
from ipywidgets import interact, Dropdown 

**Defining Bias, RMSE, NSE and KGE**

By the way, the red string at the start of each function, enclosed in three <code style="color:darkred">"""</code>, is called a [docstring](https://realpython.com/documenting-python-code/#documenting-your-python-code-base-using-docstrings). It describes the function and can also document its arguments and return value. We suggest writing a docstring whenever a function gets more complicated or its arguments aren't immediately clear.

In Jupyter Notebook (or Lab) the documentation can be accessed by pressing `Shift` + `Tab` (both Windows and Mac) when the cursor is placed in the function call. This also works for all other functions and modules, e.g. _numpy_, _scipy_, _matplotlib_, ...
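
Outside of the `Shift` + `Tab` shortcut, a docstring can also be read programmatically. The function `double` below is a hypothetical example, not part of the tutorial code.

```python
import inspect

def double(x):
    """Return twice the input value x."""
    return 2 * x

# inspect.getdoc returns the cleaned-up docstring as a string
doc_text = inspect.getdoc(double)
print(doc_text)  # Return twice the input value x.

# help() prints the signature together with the docstring
help(double)
```

In a Jupyter cell, typing `double?` shows the same information.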

In [2]:
def bias(obs, sim):
    """
    Calculate the Bias between observed and simulated values.
    
    Bias measures the consistent deviation of simulation results from observed values,
    indicating whether the model systematically overestimates or underestimates the target variable.
    """
    return np.mean(np.subtract(obs, sim))  # Mean of observation values minus simulation results

def rmse(obs, sim):
    """
    Calculate the Root Mean Square Error (RMSE) between observed and simulated values.
    
    RMSE measures the square root of the average squared differences between predicted and actual values.
    """
    return np.sqrt(np.mean(np.square(np.subtract(obs, sim))))

def nse(obs, sim):
    """
    Calculate the Nash-Sutcliffe Efficiency (NSE) between observed and simulated values.
    
    NSE measures the proportion of the observed variance explained by the model results.
    A perfect NSE of 1 indicates a perfect fit, while negative values suggest worse performance than using the mean of observed values.
    """
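    # NSE via its decomposition into correlation, variability and bias terms,
    # which is algebraically equivalent to 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    r_nse = np.corrcoef(obs, sim)[0][1]  # Pearson correlation coefficient
    alpha_nse = np.divide(np.std(sim), np.std(obs))  # Ratio of standard deviations (sim / obs)
    beta_nse = np.divide(np.subtract(np.mean(sim), np.mean(obs)), np.std(obs))  # Mean difference scaled by std of obs
    nse_value = 2 * alpha_nse * r_nse - np.square(alpha_nse) - np.square(beta_nse)
    return nse_value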

def kge(obs, sim):
    """
    Calculate the Kling-Gupta Efficiency (KGE) between observed and simulated values.
    
    KGE assesses model performance by measuring correlation, bias, and variability against observed data.

    Returns:
    tuple: (correlation coefficient, variation ratio, bias ratio, KGE value)
    """
    r_kge = np.corrcoef(obs, sim)[0][1]  # Pearson correlation coefficient
    alpha_kge = np.divide(np.std(sim), np.std(obs))  # Variation ratio
    beta_kge = np.divide(np.mean(sim), np.mean(obs))  # Bias ratio (sim / obs)
    kge = 1 - np.sqrt(np.square(r_kge - 1) + np.square(beta_kge - 1) + np.square(alpha_kge - 1))
    return round(r_kge, 3), round(alpha_kge, 3), round(beta_kge, 3), round(kge, 3)

def kge_only(obs, sim):
    """
    Calculate the Kling-Gupta Efficiency (KGE) between observed and simulated values.
    
    Returns only the KGE value, excluding intermediate metrics.
    """
    _, _, _, kge_value = kge(obs, sim)
    return kge_value
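
The `nse` function above does not evaluate the ratio of sums from Section 1 directly. Instead it uses a decomposition into correlation, variability ratio and a scaled mean difference. The sketch below checks on synthetic data (random numbers, not tutorial data) that both forms give the same value when the population standard deviation, which is the default of `np.std`, is used.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=1.5, size=365)      # synthetic "observed" runoff
sim = 0.8 * obs + rng.normal(0.0, 0.5, size=365)     # synthetic "simulated" runoff

# Formula from Section 1
nse_textbook = 1 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Decomposition as used in the nse function above
r = np.corrcoef(obs, sim)[0, 1]
alpha = np.std(sim) / np.std(obs)
beta_n = (np.mean(sim) - np.mean(obs)) / np.std(obs)
nse_decomposed = 2 * alpha * r - alpha ** 2 - beta_n ** 2

print(np.isclose(nse_textbook, nse_decomposed))  # True
```

Writing NSE this way makes visible that it is built from the same kinds of components as KGE (correlation, variability and bias), only combined differently.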

**Create and display dropdown for selecting catchment**

In [3]:
# DO NOT ALTER! code to select the catchment

catchment_names = ["Trout River, BC, Canada", "Medina River, TX, USA", "Siletz River, OR, USA"]
dropdown = Dropdown(
    options=catchment_names,
    value=catchment_names[0],
    description='Catchment:',
    disabled=False)

display(dropdown)


**Read catchment data and prepare input data for model**

In [4]:
# Read catchment data
catchment_name = dropdown.value
# Map each catchment name to its data file
file_dic = {catchment_names[0]: "hysets_10BE007", catchment_names[1]: "camels_08178880", catchment_names[2]: "camels_14305500"}
df_obs = pd.read_csv(f"data/{file_dic[catchment_name]}.csv")
# Make sure the date is interpreted as a datetime object -> makes temporal operations easier
df_obs.date = pd.to_datetime(df_obs['date'], format='%Y-%m-%d')
# Select time frame
start_date = '2002-10-01'
end_date = '2003-09-30'

# Index frame by date
df_obs.set_index('date', inplace=True)
# Slice to the selected time frame (copy avoids pandas' SettingWithCopyWarning when adding columns below)
df_obs = df_obs.loc[start_date:end_date].copy()
# Reformat the date for plotting
df_obs["date"] = df_obs.index.map(lambda s: s.strftime('%b-%d-%y'))
# Reindex
df_obs = df_obs.reset_index(drop=True)
# Select snow, precip, PET, streamflow and T
df_obs = df_obs[["snow_depth_water_equivalent_mean", "total_precipitation_sum","potential_evaporation_sum","streamflow", "temperature_2m_mean", "date"]]
# Rename variables
df_obs.columns = ["Snow [mm/day]", "P [mm/day]", "PET [mm/day]", "Q [mm/day]", "T [C]", "Date"]

# Prepare the data input for both models
P = df_obs["P [mm/day]"].to_numpy()
evap = df_obs["PET [mm/day]"].to_numpy()
temp = df_obs["T [C]"].to_numpy()    
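
The later cells take the last 365 simulated values (`[-365:]`), so it helps to know that label-based slicing on a datetime index includes both end dates. The sketch below illustrates this on a synthetic frame; the values are made up, and only the column name `streamflow` is borrowed from the data files.

```python
import numpy as np
import pandas as pd

# Synthetic daily frame covering two calendar years
dates = pd.date_range("2002-01-01", "2003-12-31", freq="D")
df = pd.DataFrame({"date": dates, "streamflow": np.arange(len(dates), dtype=float)})
df = df.set_index("date")

# Same time frame as in the cell above; both end points are included
water_year = df.loc["2002-10-01":"2003-09-30"]

print(len(water_year))  # 365
print(water_year.index[0].date(), water_year.index[-1].date())  # 2002-10-01 2003-09-30
```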

**Using bias and RMSE to evaluate HyMOD results**

In [5]:
@interact(
    Sm_blue=(0, 400, 1), beta_blue=(0, 2, 0.01), alfa_blue=(0, 1, 0.01), Rs_blue=(8.0, 200.0, 0.5), Rf_blue=(1.0, 7.0, 0.5),
    Sm_red=(0, 400, 1), beta_red=(0, 2, 0.01), alfa_red=(0, 1, 0.01), Rs_red=(8.0, 200.0, 0.5), Rf_red=(1.0, 7.0, 0.5), 
    lmbda = (0.01, 1.0, 0.01)
)
def compare_hymod_runs(Sm_blue=200, beta_blue=1, alfa_blue=0.5, Rs_blue=50, Rf_blue=6, 
                       Sm_red=150, beta_red=0.7, alfa_red=0.7, Rs_red=40, Rf_red=3,
                       lmbda=0.2):
    # Run HyMOD simulation 1 (blue)
    param_blue = np.array([Sm_blue, beta_blue, alfa_blue, 1/Rs_blue, 1/Rf_blue])
    q_sim_blue, states_blue, fluxes_blue = HyMod.hymod_sim(param_blue, P, evap)
    # Make Dataframe from results
    df_model_blue = pd.DataFrame({'Q [mm/day]': q_sim_blue, 'ET [mm/day]': fluxes_blue.T[0], 'Date': df_obs["Date"].to_numpy()})

    # Run HyMOD simulation 2 (red)
    param_red = np.array([Sm_red, beta_red, alfa_red, 1/Rs_red, 1/Rf_red])
    q_sim_red, states_red, fluxes_red = HyMod.hymod_sim(param_red, P, evap)
    # Make Dataframe from results
    df_model_red = pd.DataFrame({'Q [mm/day]': q_sim_red, 'ET [mm/day]': fluxes_red.T[0], 'Date': df_obs["Date"].to_numpy()})

    # The Box-Cox transform is given by:
    # y = (x**lmbda - 1) / lmbda,  for lmbda != 0
    #     log(x),                  for lmbda = 0
    # reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.boxcox.html
    df_obs["BC-Q [mm/day]"] = scipy.stats.boxcox(df_obs['Q [mm/day]'].array, lmbda=lmbda)
    df_model_red["BC-Q [mm/day]"] = scipy.stats.boxcox(df_model_red['Q [mm/day]'].array, lmbda=lmbda)
    df_model_blue["BC-Q [mm/day]"] = scipy.stats.boxcox(df_model_blue['Q [mm/day]'].array, lmbda=lmbda)
    
    # Plot results
    plt.close()
    fig, ax = plt.subplots(1, 1, figsize=(20, 4))

    # Plot the simulated and observed Q
    sns.lineplot(data=df_model_blue, x="Date", y="Q [mm/day]", label="HyMOD (blue)", color="blue")
    sns.lineplot(data=df_model_red, x="Date", y="Q [mm/day]", label="HyMOD (red)", color="red")
    sns.lineplot(data=df_obs, x="Date", y="Q [mm/day]", color="black", label="Observed")

    # Show only the main ticks
    locator = mdate.MonthLocator()
    plt.gca().xaxis.set_major_locator(locator)
    ax.set_title(catchment_name)
    
    # Display the figure
    plt.show()

    # Calculate metrics
    bias_blue = round(bias(df_obs["Q [mm/day]"], df_model_blue["Q [mm/day]"]), 3)
    rmse_blue = round(rmse(df_obs["Q [mm/day]"], df_model_blue["Q [mm/day]"]), 3)
    bc_rmse_blue = round(rmse(df_obs["BC-Q [mm/day]"], df_model_blue["BC-Q [mm/day]"]), 3)  

    bias_red = round(bias(df_obs["Q [mm/day]"], df_model_red["Q [mm/day]"]), 3)
    rmse_red = round(rmse(df_obs["Q [mm/day]"], df_model_red["Q [mm/day]"]), 3)
    bc_rmse_red = round(rmse(df_obs["BC-Q [mm/day]"], df_model_red["BC-Q [mm/day]"]), 3)

    # Print metric results
    print(f"Blue model run: Bias = {bias_blue}, RMSE = {rmse_blue}, B.C.-RMSE = {bc_rmse_blue}")
    print(f"Red  model run: Bias = {bias_red}, RMSE = {rmse_red}, B.C.-RMSE = {bc_rmse_red}")


<div style="background:#e0f2fe; padding:1%; border: 1mm solid SkyBlue">
    <h4><span>&#129300; </span>Your Turn II: Apply and Discuss these Metrics</h4>
    <ol>
        <li>Describe the model performance evaluated with RMSE and Box-Cox transformed RMSE.</li>
        <li>What do the metrics capture? What do they miss?</li>
        <li>What value does additionally assessing the bias have?</li>
    </ol>
</div>
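
The Box-Cox transform applied in the comparison above can also be tried out on its own. The sketch below uses made-up runoff values (not tutorial data) and checks that `scipy.stats.boxcox` with a fixed `lmbda` matches the formula quoted in the code comment, and that a very small `lmbda` behaves approximately like the logarithm.

```python
import numpy as np
from scipy import stats

# Hypothetical runoff values spanning three orders of magnitude
q = np.array([0.1, 1.0, 10.0, 100.0])
lmbda = 0.2

bc_scipy = stats.boxcox(q, lmbda=lmbda)      # transform with a fixed lambda
bc_manual = (q ** lmbda - 1) / lmbda         # formula for lmbda != 0

print(np.allclose(bc_scipy, bc_manual))      # True

# For lmbda close to 0 the transform approaches log(x)
bc_small = stats.boxcox(q, lmbda=1e-6)
print(np.allclose(bc_small, np.log(q), atol=1e-4))  # True

# Spread of the raw and the transformed values
raw_range = q.max() - q.min()
bc_range = bc_scipy.max() - bc_scipy.min()
print(bc_range < raw_range)                  # True
```

The transformed values are spread over a much narrower range than the raw ones, so the largest values carry less weight in a squared-error metric computed afterwards.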

**Multiple objectives with bias and RMSE**

In [6]:
@interact(lmbda = (0.01, 1.0, 0.01))    
def multiple_objectives(lmbda=0.2):
    # Define bounds for the parameters
    bounds = {"Sm": (0, 400), "beta": (0, 2), "alfa": (0, 1), "Rs": (8, 200), "Rf": (1, 7)}
    
    # Create random parameter sets within the given ranges
    n = 200
    mult = n  # multiplier for integer generation, after which the integers are divided by mult to get the samples
    samples_Sm = np.array(random.sample(range(bounds["Sm"][0]*mult, bounds["Sm"][1]*mult), n)) / mult  # generate array of random samples within bounds
    samples_beta = np.array(random.sample(range(bounds["beta"][0]*mult, bounds["beta"][1]*mult), n)) / mult
    samples_alfa = np.array(random.sample(range(bounds["alfa"][0]*mult, bounds["alfa"][1]*mult), n)) / mult
    samples_Rs = np.array(random.sample(range(bounds["Rs"][0]*mult, bounds["Rs"][1]*mult), n)) / mult
    samples_Rf = np.array(random.sample(range(bounds["Rf"][0]*mult, bounds["Rf"][1]*mult), n)) / mult
    parameter_sets = np.column_stack((samples_Sm, samples_beta, samples_alfa, samples_Rs, samples_Rf))  # stack the generated samples as columns
    
    df_obs["BC-Q [mm/day]"] = scipy.stats.boxcox(df_obs['Q [mm/day]'].array, lmbda=lmbda)  # Box-Cox transform observations
    
    # Run HyMod for all the parameter sets and compute the metrics
    l_bias = []
    l_rmse = []
    l_bc_rmse = []
    for set_id, parameter_set in enumerate(parameter_sets):
        # Run HyMod and get the output
        Sm, beta, alfa, Rs, Rf = parameter_set
        runoff_sim, states, fluxes = HyMod.hymod_sim([Sm, beta, alfa, 1/Rs, 1/Rf], P, evap)
        df_model = pd.DataFrame({'Q [mm/day]': runoff_sim[-365:], 'ET [mm/day]': fluxes.T[0][-365:], 'Date': df_obs["Date"].to_numpy()})
        l_bias.append(abs(round(bias(df_obs["Q [mm/day]"], df_model["Q [mm/day]"]), 3)))  # calculate bias, round result and append to list
        l_rmse.append(round(rmse(df_obs["Q [mm/day]"], df_model["Q [mm/day]"]), 3))  # calculate rmse, round result and append to list 
        
        df_model["BC-Q [mm/day]"] = scipy.stats.boxcox(df_model['Q [mm/day]'].array, lmbda=lmbda)  # Box-Cox transform simulation results
        l_bc_rmse.append(round(rmse(df_obs["BC-Q [mm/day]"], df_model["BC-Q [mm/day]"]), 3))  # calculate bc rmse, round result and append to list

    df_results = pd.DataFrame({'abs(Bias)': l_bias, 'RMSE': l_rmse, 'B.C.-RMSE': l_bc_rmse})  # create dataframe from metric results
    
    # Plot results
    plt.close()
    fig, axes = plt.subplots(1, 3, figsize=(20, 4))  # create three subplots
    sns.scatterplot(df_results, x='RMSE', y='abs(Bias)', ax=axes[0])
    sns.scatterplot(df_results, x='B.C.-RMSE', y='abs(Bias)', ax=axes[1])
    sns.scatterplot(df_results, x='RMSE', y='B.C.-RMSE', ax=axes[2])
    plt.show()


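
The cell above draws its parameter sets by sampling distinct integers and dividing them by `mult`. As an alternative sketch, and not the method used in this tutorial, the same bounds can be sampled continuously with NumPy's random generator. The fixed seed is an assumption added here for reproducibility.

```python
import numpy as np

# Same bounds as in the cell above
bounds = {"Sm": (0, 400), "beta": (0, 2), "alfa": (0, 1), "Rs": (8, 200), "Rf": (1, 7)}
n = 200

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
lower = np.array([b[0] for b in bounds.values()], dtype=float)
upper = np.array([b[1] for b in bounds.values()], dtype=float)

# One row per parameter set, one column per parameter (order as in `bounds`)
parameter_sets = rng.uniform(lower, upper, size=(n, len(bounds)))

print(parameter_sets.shape)  # (200, 5)
```

Unlike `random.sample`, this draws values independently, so repeated values are theoretically possible, and the samples are not restricted to a grid with spacing `1/mult`.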

**NSE, KGE and their common parts**

In [7]:
@interact(
    Sm_blue=(0, 400, 1), beta_blue=(0, 2, 0.01), alfa_blue=(0, 1, 0.01), Rs_blue=(8.0, 200.0, 0.5), Rf_blue=(1.0, 7.0, 0.5),
    Sm_red=(0, 400, 1), beta_red=(0, 2, 0.01), alfa_red=(0, 1, 0.01), Rs_red=(8.0, 200.0, 0.5), Rf_red=(1.0, 7.0, 0.5)
)    
def compare_hymod_runs(Sm_blue=200, beta_blue=1, alfa_blue=0.5, Rs_blue=50, Rf_blue=6, 
                       Sm_red=150, beta_red=0.7, alfa_red=0.7, Rs_red=40, Rf_red=3):
    # Run HyMOD simulation 1 (blue)
    param_blue = np.array([Sm_blue, beta_blue, alfa_blue, 1/Rs_blue, 1/Rf_blue])
    q_sim_blue, states_blue, fluxes_blue = HyMod.hymod_sim(param_blue, P, evap)
    df_model_blue = pd.DataFrame({'Q [mm/day]': q_sim_blue[-365:], 'ET [mm/day]': fluxes_blue.T[0][-365:], 'Date': df_obs["Date"].to_numpy()})

    # Run HyMOD simulation 2 (red)
    param_red = np.array([Sm_red, beta_red, alfa_red, 1/Rs_red, 1/Rf_red])
    q_sim_red, states_red, fluxes_red = HyMod.hymod_sim(param_red, P, evap)
    df_model_red = pd.DataFrame({'Q [mm/day]': q_sim_red[-365:], 'ET [mm/day]': fluxes_red.T[0][-365:], 'Date': df_obs["Date"].to_numpy()})
    
    # Plot results
    plt.close()
    fig, ax = plt.subplots(1, 1, figsize=(20, 4))

    # Plot the simulated and observed Q
    sns.lineplot(data=df_model_blue, x="Date", y="Q [mm/day]", label="HyMOD (blue)", color="blue")
    sns.lineplot(data=df_model_red, x="Date", y="Q [mm/day]", label="HyMOD (red)", color="red")
    sns.lineplot(data=df_obs, x="Date", y="Q [mm/day]", color="black", label="Observed")

    # Show only the main ticks
    locator = mdate.MonthLocator()
    plt.gca().xaxis.set_major_locator(locator)
    ax.set_title(catchment_name)
    
    # Display the figure
    plt.show()

    # Calculate metrics
    nse_blue = round(nse(df_obs["Q [mm/day]"], df_model_blue["Q [mm/day]"]), 3)
    r_kge_blue, alpha_kge_blue, beta_kge_blue, kge_blue = kge(df_obs["Q [mm/day]"], df_model_blue["Q [mm/day]"])

    nse_red = round(nse(df_obs["Q [mm/day]"], df_model_red["Q [mm/day]"]), 3)
    r_kge_red, alpha_kge_red, beta_kge_red, kge_red = kge(df_obs["Q [mm/day]"], df_model_red["Q [mm/day]"])

    # Print metric results
    print(f"Blue model run: NSE = {nse_blue}, KGE = {kge_blue}, (KGE components: r = {r_kge_blue}, alpha = {alpha_kge_blue}, beta = {beta_kge_blue})")
    print(f"Red  model run: NSE = {nse_red}, KGE = {kge_red}, (KGE components: r = {r_kge_red}, alpha = {alpha_kge_red}, beta = {beta_kge_red})")


<div style="background:#e0f2fe; padding:1%; border: 1mm solid SkyBlue">
    <h4><span>&#129300; </span>Your Turn III: Comparing NSE and KGE</h4>
    <ol>
        <li>Compare the values of NSE and KGE. Under what circumstances does either of them come closer to their common optimal value of 1?</li>
        <li>Which component of KGE dominates the result? (Reminder: in KGE, r is the correlation coefficient, alpha is the ratio of the simulated to the observed standard deviation, and beta is the ratio of the simulated to the observed mean.)</li>
    </ol>
    Additional Assignments
    <ol start=3>
        <li>Create a multi-objective scatter plot with NSE and KGE on the x- and y-axis (similar to the scatter plots in Exercise 3 above).</li>
        <li>Create a new function (RMEE - "root mean exponent error") that alters the RMSE function: replace the square by an integer exponent that is also applied to the root. Implement its use and make a scatter plot of RMEE and RMSE.</li>
    </ol>
</div>

**Multiple objectives with NSE and KGE**

In [8]:
# implement your code here (you may of course copy and paste parts of the python cell above)
# beware that you will not need to run this cell interactively => remove the lines with "@interact..." and "def..."
# you also will not need the Box-Cox transformation