![](./figures/Logo.PNG)

## In this part of the tutorial, you will
* use several evaluation metrics to assess simulation performance.
* learn about the relevance of choosing an evaluation metric.
* discuss the value of calibration-validation.

- - -

# Tutorial 2b - Evaluation metrics: RMSE, KGE and NSE

- - -

## 1 Introducing RMSE, KGE and NSE


In Tutorial 4, we relied on visual inspection to learn about the model performance and the fit of the model output to the observed runoff. For some parameter sets, it can be difficult to judge visually which one gives the best result. In this tutorial, we will use evaluation metrics, which enable a more robust comparison between model runs with different parameterizations.

Now we will look at three more metrics: **root mean square error (RMSE)**, **Kling-Gupta Efficiency (KGE)** and **Nash-Sutcliffe Efficiency (NSE)**.

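**Root Mean Square Error (RMSE)**: RMSE is the square root of the average squared difference between the simulated values and the corresponding observed values (in other words, the square root of the mean squared error, MSE). It has the same units as the data and equals 0 for a perfect fit; larger values indicate larger errors.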

   ![RMSE Equation](https://latex.codecogs.com/svg.image?\text{RMSE}&space;=&space;\sqrt{\frac{1}{n}&space;\sum_{i=1}^{n}(obs_i&space;-&space;sim_i)^2})

   where **obs<sub>i</sub>** represents the observed value, **sim<sub>i</sub>** denotes the simulated value, and **n** is the total number of data points.
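
As a small worked example, the following sketch computes the RMSE step by step with NumPy. The `obs` and `sim` arrays are made-up values for illustration only, not data from the tutorial catchments.

```python
import numpy as np

# Hypothetical observed and simulated runoff (mm/day), for illustration only
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.0, 2.0, 3.0, 6.0])

# Squared differences, their mean (the MSE), and its square root (the RMSE)
squared_errors = (obs - sim) ** 2
mse = squared_errors.mean()
rmse_value = np.sqrt(mse)
print(rmse_value)
```

Only the last time step is wrong (by 2 mm/day), yet it alone determines the result: squaring gives large errors more weight than small ones.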

**Kling-Gupta Efficiency (KGE)**: KGE is a hydrological metric that assesses the performance of hydrological models by measuring the correlation, bias, and variability of their predictions against observed hydrograph data. It allows evaluation of the model's accuracy, timing, and volume representation.

   ![KGE Equation](https://latex.codecogs.com/svg.image?\text{KGE}&space;=&space;1&space;-&space;\sqrt{(r-1)^2&space;+&space;(a-1)^2&space;+&space;(b-1)^2})

   where **r** is the Pearson correlation coefficient between observed and simulated values, **a** is the ratio of the simulated to the observed standard deviation, and **b** is the ratio of the simulated to the observed mean.
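
To see how the three components enter the KGE, consider a minimal sketch with made-up values in which the simulation is exactly twice the observation. The arrays and variable names are illustrative assumptions and are not taken from the tutorial data.

```python
import numpy as np

# Hypothetical observed runoff and a simulation that doubles every value
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = 2.0 * obs

r = np.corrcoef(obs, sim)[0, 1]   # correlation (timing and shape)
a = np.std(sim) / np.std(obs)     # variability ratio
b = np.mean(sim) / np.mean(obs)   # bias ratio
kge_value = 1 - np.sqrt((r - 1) ** 2 + (a - 1) ** 2 + (b - 1) ** 2)
print(r, a, b, kge_value)
```

Here the correlation is perfect (r is 1 up to floating-point rounding), but the variability and the mean are both overestimated by a factor of 2, so the KGE drops to 1 - √2 ≈ -0.41.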


**Nash-Sutcliffe Efficiency (NSE)**: NSE measures the proportion of the observed variance that is explained by the model results. It is particularly useful for evaluating streamflow predictions. An NSE value of 1 indicates a perfect fit between the model and observed data, a value of 0 means the model performs only as well as the mean of the observed values, and negative values suggest the model performs worse than simply using that mean.

![NSE Equation](https://latex.codecogs.com/svg.image?\text{NSE}&space;=&space;1&space;-&space;\frac{\sum_{i=1}^{n}(obs_i&space;-&space;sim_i)^2}{\sum_{i=1}^{n}(obs_i&space;-&space;\bar{obs})^2})


   where **obs<sub>i</sub>** represents the observed value, **sim<sub>i</sub>** denotes the simulated value, **obs** with a bar is the mean of the observed values, and **n** is the total number of data points.
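
The reference values of the NSE can be illustrated with a short sketch. The helper name `nse_direct` and the arrays are assumptions made for this example; the function simply evaluates the equation above.

```python
import numpy as np

def nse_direct(obs, sim):
    # 1 minus the ratio of squared model errors to squared deviations from the observed mean
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Hypothetical observed runoff (mm/day), for illustration only
obs = np.array([1.0, 2.0, 3.0, 4.0])

nse_perfect = nse_direct(obs, obs.copy())                   # simulation equals observation
nse_mean = nse_direct(obs, np.full_like(obs, obs.mean()))   # simulation is the observed mean
nse_reversed = nse_direct(obs, obs[::-1])                   # simulation is the reversed series
print(nse_perfect, nse_mean, nse_reversed)
```

The three cases give 1, 0 and -3: a perfect fit, a simulation that is no better than the observed mean, and one that is clearly worse than it.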

---

### <div class="blue"><span style="color:blue">Exercise section</span></div>
### Exercise 1 - Understanding RMSE, KGE and NSE
Why are such metrics helpful? 

* Answer

What are likely limitations of such metrics? What do they miss?

* Answer

---

## 2 Using RMSE, KGE and NSE

**Import packages**

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import sys
sys.path.append('src/')
import HBV
import HyMod
from ipywidgets import interact, Dropdown

**Defining RMSE, KGE and NSE**

In [2]:
def rmse(obs, sim):
    return np.sqrt(np.mean(np.square(np.subtract(obs, sim))))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency (NSE)
    # range: negative infinity to 1
    # optimal value: 1
    # computed from its decomposition into correlation, variability and bias terms
    r_nse = np.corrcoef(obs, sim)[0][1] # pearson correlation coefficient
    alpha_nse = np.divide(np.std(sim), np.std(obs)) # variation ratio
    beta_nse = np.divide(np.subtract(np.mean(sim), np.mean(obs)), np.std(obs)) # bias normalized by the std of obs
    nse_value = 2 * alpha_nse * r_nse - np.square(alpha_nse) - np.square(beta_nse)
    # alternative: nse_value = 1-(np.divide(np.sum(np.square(np.subtract(obs, sim))), np.sum(np.square(np.subtract(obs, np.mean(obs))))))
    return nse_value

def kge(obs, sim):
    # The Kling-Gupta Efficiency (KGE)
    # range: negative infinity to 1
    # optimal value: 1
    r = np.corrcoef(obs, sim)[0][1] # pearson correlation coefficient
    alpha = np.divide(np.std(sim), np.std(obs)) # variation ratio
    beta = np.divide(np.mean(sim), np.mean(obs)) # bias
    kge_value = 1 - np.sqrt(np.square(r - 1) + np.square(beta - 1) + np.square(alpha - 1))
    # returns the three components together with the KGE itself (a tuple of four values)
    return r, alpha, beta, kge_value

dict_metrics = {"RMSE": rmse, "KGE": kge, "NSE": nse}

def calculate_metrics(dict_metrics, obs, sim):
    metric_results = []
    for key in dict_metrics:
        metric_result = dict_metrics[key](obs, sim)
        metric_results.append({"Metric": key, "Result": metric_result})
    # merge metric results
    df_metric_results = pd.DataFrame.from_records(metric_results)
    return df_metric_results
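
The `nse` function above is written in terms of a correlation, a variability and a bias term, while its commented "alternative" line is the direct formula from Section 1. The following sketch, which assumes synthetic random data and the hypothetical helper names `nse_decomposed` and `nse_direct`, suggests that the two forms give the same value:

```python
import numpy as np

def nse_decomposed(obs, sim):
    # NSE from correlation (r), variability ratio (alpha) and normalized bias (beta_n)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta_n = (np.mean(sim) - np.mean(obs)) / np.std(obs)
    return 2 * alpha * r - alpha ** 2 - beta_n ** 2

def nse_direct(obs, sim):
    # NSE as 1 minus the ratio of squared errors to squared deviations from the observed mean
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Synthetic "observed" and "simulated" series, for illustration only
rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 1.5, size=365)
sim = 0.8 * obs + rng.normal(0.0, 0.5, size=365)

print(nse_decomposed(obs, sim), nse_direct(obs, sim))
```

The decomposed form makes it easier to see which aspect of the fit (timing, variability or bias) lowers the score.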

**Create and display interactive menus for selecting catchment**

In [3]:
catchment_names = ["Medina River, TX, USA", "Siletz River, OR, USA", "Trout River, BC, Canada"]
dropdown = Dropdown(
    options=catchment_names,
    value=catchment_names[0],
    description='Catchment:',
    disabled=False)

display(dropdown)

**Read catchment data**

In [4]:
# Read catchment data
catchment_name = dropdown.value
# Map each catchment name to its data file and read it
file_dic = {catchment_names[0]: "camels_08178880", catchment_names[1]: "camels_14305500", catchment_names[2]: "hysets_10BE007"}
df_obs = pd.read_csv(f"data/{file_dic[catchment_name]}.csv")
# Make sure the date is interpreted as a datetime object -> makes temporal operations easier
df_obs.date = pd.to_datetime(df_obs['date'], format='%Y-%m-%d')
# Select time frame
start_date = '2002-10-01'
end_date = '2003-09-30'

# Index frame by date
df_obs.set_index('date', inplace=True)
# Select time frame
df_obs = df_obs[start_date:end_date]
# Reformat the date for plotting
df_obs["date"] = df_obs.index.map(lambda s: s.strftime('%b-%d-%y'))
# Reindex
df_obs = df_obs.reset_index(drop=True)
# Select snow, precip, PET, streamflow and T
df_obs = df_obs[["snow_depth_water_equivalent_mean", "total_precipitation_sum","potential_evaporation_sum","streamflow", "temperature_2m_mean", "date"]]
# Rename variables
df_obs.columns = ["Snow [mm/day]", "P [mm/day]", "PET [mm/day]", "Q [mm/day]", "T [C]", "Date"]

**Prepare input data for models and define plotting function for model and metric results**

In [5]:
# Prepare the data input for both models
P = df_obs["P [mm/day]"].to_numpy()
evap = df_obs["PET [mm/day]"].to_numpy()
temp = df_obs["T [C]"].to_numpy()

def plot_results(df_obs, df_model, label):
    plt.close()
    fig, ax = plt.subplots(1, 1, figsize=(20, 4))

    # Plot the simulated and observed Q
    sns.lineplot(data=df_model, x="Date", y="Q [mm/day]", label=label)
    sns.lineplot(data=df_obs, x="Date", y="Q [mm/day]", color="black", label="Observed")

    # Show only the main ticks
    locator = mdate.MonthLocator()
    plt.gca().xaxis.set_major_locator(locator)
    ax.set_title(catchment_name)
    
    # Display the figure
    plt.show()

**Using RMSE, KGE and NSE to evaluate HyMOD results**

In [6]:
@interact(Sm = (0, 400, 1), beta = (0, 2, 0.01), alpha = (0, 1, 0.01), Rs = (0, 1, 0.01), Rf = (0, 1, 0.01))    
def oat_hymod_function(Sm=200, beta=1, alpha=0.5, Rs=0.5, Rf=0.5):
    # Run HyMOD simulation
    param = np.array([Sm, beta, alpha, Rs, Rf]) # Sm (mm), beta (-), alpha (-), Rs (-), Rf (-)
    q_sim, states, fluxes = HyMod.hymod_sim(param, P, evap)
    # Make Dataframe from results
    df_model = pd.DataFrame({'Q [mm/day]': q_sim[-365:], 'ET [mm/day]': fluxes.T[0][-365:], 'Date': df_obs["Date"].to_numpy()})

    # Plot results
    plot_results(df_obs, df_model, "HyMOD")

    # Calculate metrics from simulation results
    df_metric_results = calculate_metrics(dict_metrics, df_obs["Q [mm/day]"], df_model["Q [mm/day]"])
    print(df_metric_results)

---

### <div class="blue"><span style="color:blue">Exercise section</span></div>
### Exercise 2 

2.1 Speculate on what kinds of deviations from the observed runoff RMSE, KGE and NSE are able to pick up (and which ones they would likely miss).

* Answer

2.2 Are you satisfied with what the RMSE was able to achieve? Is there a reason why it might not be the best choice for an objective function?

* Answer

2.3 Why is the RMSE a metric that can capture how well quick-response processes are represented?

* Answer

2.4 Are the performances captured by the metrics linked to specific catchments or models? Can you judge how a model deviates from the observed based only on one of the error metrics?

* Answer

2.5 How much would you trust a model evaluation that used only the bias in its calibration-validation process?

* Answer
  
2.6 What conclusions can you draw for your own future model evaluations and calibrations (even if they are not for hydrological models)?

* Answer



### Additional tasks

1 Answer the questions of Exercise 2 using the metrics to evaluate the HBV model (below). 

2 Can you come up with your own metric? Start by thinking back to the lecture on visual evaluation and the different goals described there.

* Answer

---

## Jupyter format settings

In [8]:
%%html 
<style>.blue {background-color: #8dc9fc;}</style>