# Benchmarking initial model performance

## Introduction

This notebook guides you through benchmarking hydrological models within the CONFLUENCE framework against several simple benchmarks from the literature. Model benchmarking is a critical step that evaluates the quality of model simulations by comparing their performance with that of simple alternative predictors, such as the long-term mean flow.

Key steps covered in this notebook include:

1. Pre-processing the benchmarking data
2. Calculating the benchmark datasets for the simulation period
3. Visualising the comparison of the model simulations with the benchmarks and summarising the results

In this notebook we focus on benchmarking the primary model chosen for your project (e.g., SUMMA) with the HydroBM benchmarking library, but the same principles apply to other models and benchmarking approaches.
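The comparison at the heart of benchmarking can be sketched in a few lines of plain Python. This is a minimal sketch assuming the Nash-Sutcliffe efficiency (NSE) as the score and the constant mean-flow predictor (one of the benchmarks HydroBM runs, listed as `mean_flow` in the step 2 log) as the benchmark. The helper names `nse` and `skill_vs_benchmark` are hypothetical and not part of CONFLUENCE or HydroBM, and the flows are toy numbers.

```python
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    obs_mean = fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def skill_vs_benchmark(model_score, benchmark_score, perfect_score=1.0):
    """Share of the gap between benchmark and perfect score closed by the model.

    0 means 'no better than the benchmark', 1 means a perfect fit.
    Undefined if the benchmark is already perfect (division by zero).
    """
    return (model_score - benchmark_score) / (perfect_score - benchmark_score)


# Toy data (illustrative only, not from the Bow at Banff run)
obs = [5.0, 8.0, 20.0, 45.0, 30.0, 12.0]
model = [6.0, 9.0, 17.0, 40.0, 33.0, 11.0]

# The mean-flow benchmark predicts the observed mean at every timestep
mean_flow_benchmark = [fmean(obs)] * len(obs)

print(nse(mean_flow_benchmark, obs))  # 0.0
print(skill_vs_benchmark(nse(model, obs), nse(mean_flow_benchmark, obs)))
```

By construction the mean-flow benchmark scores an NSE of exactly 0, so any model with NSE > 0 beats it. The skill score rescales the model's score relative to a given benchmark, which makes harder benchmarks (for example monthly or daily climatologies) comparable on one scale.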

## First we import the libraries and functions we need

In [1]:
import sys
from pathlib import Path
from typing import Dict, Any
import logging
import yaml # type: ignore

# Add the directory two levels up (which contains the utils package) to the import path
current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

from utils.evaluation_util.evaluation_utils import Benchmarker # type: ignore
from utils.dataHandling_utils.data_utils import BenchmarkPreprocessor # type: ignore

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## Check configurations

Next, print the configuration settings and check that all the settings this notebook needs are defined.

In [2]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)

# Print the settings relevant to this run
print(f"FORCING_DATASET: {config['FORCING_DATASET']}")
print(f"EASYMORE_CLIENT: {config['EASYMORE_CLIENT']}")
print(f"FORCING_VARIABLES: {config['FORCING_VARIABLES']}")
print(f"EXPERIMENT_TIME_START: {config['EXPERIMENT_TIME_START']}")

FORCING_DATASET: ERA5
EASYMORE_CLIENT: easymore cli
FORCING_VARIABLES: longitude,latitude,time,LWRadAtm,SWRadAtm,pptrate,airpres,airtemp,spechum,windspd
EXPERIMENT_TIME_START: 2010-01-01 01:00
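Reading values with `config['KEY']` stops at the first missing key with a `KeyError`. A small check that reports every missing setting at once can save a few re-runs. This is a minimal sketch: `missing_config_keys` is a hypothetical helper, and `REQUIRED_KEYS` simply lists the keys that the later cells of this notebook read.

```python
def missing_config_keys(config, required):
    """Return required keys that are absent from the config or set to None / an empty string."""
    return [key for key in required if config.get(key) in (None, "")]


# Keys read by the path, preprocessing and benchmarking cells of this notebook
REQUIRED_KEYS = [
    "CONFLUENCE_DATA_DIR",
    "DOMAIN_NAME",
    "FORCING_START_YEAR",
    "FORCING_END_YEAR",
]

# Illustrative config with one setting missing
example_config = {
    "CONFLUENCE_DATA_DIR": "/data",
    "DOMAIN_NAME": "Bow_at_Banff",
    "FORCING_START_YEAR": 2010,
}
print(missing_config_keys(example_config, REQUIRED_KEYS))  # ['FORCING_END_YEAR']
```

In the notebook itself you would call `missing_config_keys(config, REQUIRED_KEYS)` right after loading `config_active.yaml` and fix the file if the returned list is not empty.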


## Define default paths

Next, define the paths to the data directories used by the preprocessing and benchmarking steps, and create those directories if they do not already exist.

In [6]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directories
evaluation_results = project_dir / 'evaluation'
benchmarking_plots = project_dir / 'plots' / 'benchmarking'

# Make sure the new directories exist
evaluation_results.mkdir(parents=True, exist_ok=True)
benchmarking_plots.mkdir(parents=True, exist_ok=True)
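The directory layout built above can be checked outside CONFLUENCE. The sketch below wraps the same path logic in a hypothetical helper, `make_benchmark_dirs`, and uses a temporary directory in place of `CONFLUENCE_DATA_DIR` so it runs anywhere. It shows that `parents=True` creates the intermediate folders and `exist_ok=True` makes re-running the cell safe.

```python
import tempfile
from pathlib import Path


def make_benchmark_dirs(data_dir, domain_name):
    """Build the evaluation and plot paths for a domain and create them if missing."""
    project_dir = Path(data_dir) / f"domain_{domain_name}"
    evaluation_results = project_dir / "evaluation"
    benchmarking_plots = project_dir / "plots" / "benchmarking"
    for directory in (evaluation_results, benchmarking_plots):
        # parents=True creates missing intermediate folders; exist_ok=True avoids
        # FileExistsError when the directory is already there
        directory.mkdir(parents=True, exist_ok=True)
    return evaluation_results, benchmarking_plots


with tempfile.TemporaryDirectory() as tmp:
    first = make_benchmark_dirs(tmp, "Bow_at_Banff")
    second = make_benchmark_dirs(tmp, "Bow_at_Banff")  # second call does not raise
    print(first == second, all(p.is_dir() for p in first))  # True True
```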

## 1. Pre-process the benchmarking data

In [7]:
# Preprocess data for benchmarking
preprocessor = BenchmarkPreprocessor(config, logger)
benchmark_data = preprocessor.preprocess_benchmark_data(f"{config['FORCING_START_YEAR']}-01-01", f"{config['FORCING_END_YEAR']}-12-31")

2024-10-21 22:02:32,257 - INFO - Starting benchmark data preprocessing
2024-10-21 22:02:32,425 - INFO - Loaded streamflow data with shape: (118099, 1)
2024-10-21 22:05:47,610 - INFO - Loaded forcing data with variables: ['latitude', 'longitude', 'hruId', 'airpres', 'LWRadAtm', 'SWRadAtm', 'precipitation', 'temperature', 'spechum', 'windspd']
2024-10-21 22:05:47,632 - INFO - Merged data shape: (70135, 3)
2024-10-21 22:05:47,638 - INFO - Filtered data shape: (70135, 3)
2024-10-21 22:05:47,653 - INFO - Data statistics:
         streamflow   temperature  precipitation
count  70135.000000  70135.000000   70135.000000
mean      40.081286    270.366730      94.530307
std       46.004491     10.502746     222.341854
min        5.214286    234.621429       0.000000
25%        9.612819    262.730499       0.014337
50%       18.461614    270.453491      10.557440
75%       54.671425    278.303268      81.689040
max      503.056587    298.729370    7382.431626
2024-10-21 22:05:48,095 - INFO - Benc
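The log shows the streamflow record (118,099 rows) being merged with the forcing data into 70,135 rows, which is consistent with keeping only timestamps present in both datasets. The sketch below illustrates that idea with the standard library. It assumes an inner join on timestamps followed by a date-window filter; this is a reading of the log, not the documented behaviour of `BenchmarkPreprocessor`, and `merge_and_filter` is a hypothetical name.

```python
from datetime import datetime, timedelta


def merge_and_filter(streamflow, forcing, start, end):
    """Inner-join two {timestamp: value} series and keep rows within [start, end].

    `forcing` maps timestamps to (temperature, precipitation) tuples.
    `start` and `end` are 'YYYY-MM-DD' strings; the whole end day is included.
    Returns a sorted list of (timestamp, streamflow, temperature, precipitation).
    """
    start_dt = datetime.fromisoformat(start)
    # Exclusive upper bound at midnight after `end`, so hourly steps on the end day are kept
    end_dt = datetime.fromisoformat(end) + timedelta(days=1)
    shared = sorted(streamflow.keys() & forcing.keys())  # timestamps present in both
    return [
        (t, streamflow[t], *forcing[t])
        for t in shared
        if start_dt <= t < end_dt
    ]


# Toy hourly series: flow covers hours 0-3, forcing covers hours 1-5
flow = {datetime(2010, 1, 1, h): 8.0 + h for h in range(4)}
forcing = {datetime(2010, 1, 1, h): (245.0, 0.0) for h in range(1, 6)}
rows = merge_and_filter(flow, forcing, "2010-01-01", "2010-12-31")
print(len(rows))  # 3
```

The window strings follow the same pattern as the notebook's call, for example `f"{start_year}-01-01"` and `f"{end_year}-12-31"`.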

## 2. Run benchmarking scripts

In [11]:
# Run benchmarking
benchmarker = Benchmarker(config, logger)
benchmark_results = benchmarker.run_benchmarking(benchmark_data, f"{config['FORCING_END_YEAR']}-12-31")


2024-10-21 22:15:27,825 - INFO - Starting hydrobm benchmarking
2024-10-21 22:15:27,837 - INFO - input data: <xarray.Dataset> Size: 2MB
Dimensions:        (index: 70135)
Coordinates:
  * index          (index) datetime64[ns] 561kB 2010-12-31T17:00:00 ... 2018-...
Data variables:
    streamflow     (index) float64 561kB 8.1 8.139 8.188 ... 9.158 9.162 9.165
    temperature    (index) float32 281kB 245.4 246.7 249.1 ... 256.2 255.3 254.3
    precipitation  (index) float64 561kB 0.0 0.0 3.071 12.14 ... 0.0 0.0 0.0 0.0
2024-10-21 22:15:27,838 - INFO - Running benchmarks ['mean_flow', 'median_flow', 'annual_mean_flow', 'annual_median_flow', 'monthly_mean_flow', 'monthly_median_flow', 'daily_mean_flow', 'daily_median_flow', 'rainfall_runoff_ratio_to_all', 'rainfall_runoff_ratio_to_annual', 'rainfall_runoff_ratio_to_monthly', 'rainfall_runoff_ratio_to_daily', 'rainfall_runoff_ratio_to_timestep', 'monthly_rainfall_runoff_ratio_to_monthly', 'monthly_rainfall_runoff_ratio_to_daily', 'monthly_rain



2024-10-21 22:16:01,713 - INFO - Finished running benchmarks
2024-10-21 22:16:03,500 - INFO - Benchmark flows saved to /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/evaluation/benchmark_flows.csv
2024-10-21 22:16:03,504 - INFO - Benchmark scores saved to /home/darri/data/CONFLUENCE_data/domain_Bow_at_Banff/evaluation
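The benchmark names in the log (`monthly_mean_flow`, `daily_mean_flow`, and so on) describe simple climatological predictors. The sketch below shows one plausible reading of `monthly_mean_flow`: every timestep is predicted by the mean observed flow of its calendar month, computed from data up to a cut-off date. It is an illustrative interpretation, not HydroBM's implementation; the function name is hypothetical, and treating `cal_end` like the end date passed to `run_benchmarking` is an assumption, since this notebook does not show what `Benchmarker` does with that argument.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def monthly_mean_flow_benchmark(times, flows, cal_end):
    """Predict each timestep's flow as the mean observed flow for its calendar month.

    Monthly means use only timesteps on or before `cal_end` (an assumed
    calculation cut-off) and are then applied to every timestep.
    """
    by_month = defaultdict(list)
    for t, q in zip(times, flows):
        if t <= cal_end:
            by_month[t.month].append(q)
    monthly_mean = {month: fmean(values) for month, values in by_month.items()}
    # Months with no data before cal_end have no estimate and yield None
    return [monthly_mean.get(t.month) for t in times]


times = [datetime(2010, 1, 15), datetime(2010, 7, 15), datetime(2011, 1, 15), datetime(2011, 7, 15)]
flows = [6.0, 120.0, 8.0, 100.0]
print(monthly_mean_flow_benchmark(times, flows, datetime(2011, 12, 31)))
# [7.0, 110.0, 7.0, 110.0]
```

A daily variant would group by day of year instead of by month; the more detailed the climatology, the harder the benchmark is to beat.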


## 3. Visualise and summarise the benchmarking results

In [13]:
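# Initialise the benchmarking visualiser.
# NOTE: benchmarkingVisualiser is not imported in the first cell, which is why this
# cell raised the NameError below; import it from its CONFLUENCE utils module first.
bmv = benchmarkingVisualiser(config, logger)

# Run the visualisation
bmv.vizualise_streamflow()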

NameError: name 'benchmarkingVisualiser' is not defined
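Once the visualiser is importable, the plots can be complemented by a short text summary of which benchmarks the model beats. This minimal sketch assumes a higher-is-better metric (such as NSE or KGE) and a dictionary of benchmark scores; in practice those scores would be read from the benchmark scores saved to the evaluation directory (see the step 2 log). The helper `summarise_against_benchmarks` is hypothetical and the scores are illustrative placeholders, not results from this run.

```python
def summarise_against_benchmarks(model_score, benchmark_scores):
    """Split benchmarks into those the model beats and those it does not (higher score = better)."""
    beaten = sorted(name for name, score in benchmark_scores.items() if model_score > score)
    not_beaten = sorted(name for name, score in benchmark_scores.items() if model_score <= score)
    return beaten, not_beaten


# Illustrative placeholder scores only
scores = {"mean_flow": 0.0, "monthly_mean_flow": 0.72, "daily_mean_flow": 0.81}
beaten, not_beaten = summarise_against_benchmarks(0.78, scores)
print("Model beats:", beaten)
print("Model does not beat:", not_beaten)
```

A model that fails to beat the seasonal benchmarks is usually worth investigating further before its simulations are relied upon.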