# Benchmarking initial model performance

## Introduction

This notebook guides you through benchmarking hydrological models within the CONFLUENCE framework using several simple benchmarks from the literature. Model benchmarking is a critical step that evaluates the quality of the model simulations by comparing them against a range of alternative reference predictions.

Key steps covered in this notebook include:

1. Pre-processing the benchmarking data
2. Calculating the benchmark datasets for the simulation period
3. Visualising the comparison of the model simulations to the benchmarks and summarizing the results

In this notebook we focus on benchmarking the primary model chosen for your project (e.g., SUMMA) using the HydroBM benchmarking library, but the principles apply to other models and benchmarking paradigms as well.
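
To make the idea of a benchmark concrete, the sketch below scores a simulation and a mean-flow benchmark with the Nash-Sutcliffe efficiency (NSE). It is a standalone illustration: the function names `nse` and `mean_flow_benchmark` and the toy streamflow values are assumptions made for this example, and it does not use the HydroBM or CONFLUENCE APIs.

```python
from statistics import mean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - (squared error) / (spread of observations around their mean)."""
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def mean_flow_benchmark(calibration_obs, n_steps):
    """Hypothetical benchmark: predict the calibration-period mean flow at every time step."""
    return [mean(calibration_obs)] * n_steps


# Toy streamflow values, illustrative only (not real data)
calibration_obs = [2.0, 4.0, 6.0, 8.0]
evaluation_obs = [3.0, 5.0, 7.0, 5.0]
model_sim = [3.5, 4.5, 6.5, 5.5]

benchmark_sim = mean_flow_benchmark(calibration_obs, len(evaluation_obs))
model_score = nse(evaluation_obs, model_sim)
benchmark_score = nse(evaluation_obs, benchmark_sim)
print(f"model NSE: {model_score:.3f}, benchmark NSE: {benchmark_score:.3f}")
```

A model is only informative if it scores better than such a benchmark on the same period; a model that cannot beat the mean flow adds little beyond what the observations already tell us.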

## First we import the libraries and functions we need

In [None]:
import sys
from pathlib import Path
from typing import Dict, Any
import logging
import yaml # type: ignore

current_dir = Path.cwd()
parent_dir = current_dir.parent.parent
sys.path.append(str(parent_dir))

from utils.evaluation_util.evaluation_utils import Benchmarker # type: ignore
from utils.dataHandling_utils.data_utils import BenchmarkPreprocessor # type: ignore  

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## Check configurations

Now we should print our configuration settings and make sure that we have defined all the settings we need. 

In [None]:
config_path = Path('../../0_config_files/config_active.yaml')
with open(config_path, 'r') as config_file:
    config = yaml.safe_load(config_file)
    print(f"FORCING_DATASET: {config['FORCING_DATASET']}")
    print(f"EASYMORE_CLIENT: {config['EASYMORE_CLIENT']}")
    print(f"FORCING_VARIABLES: {config['FORCING_VARIABLES']}")
    print(f"EXPERIMENT_TIME_START: {config['EXPERIMENT_TIME_START']}")
    print(f"EXPERIMENT_TIME_END: {config['EXPERIMENT_TIME_END']}")
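
Printing settings shows their values but does not flag a setting that is missing altogether. One possible way to check for that is sketched below. The helper `find_missing_settings` is hypothetical, the key list covers only the settings read by the cells in this notebook, and `example_config` is a stand-in with placeholder values so that the sketch runs on its own.

```python
def find_missing_settings(config, required_keys):
    """Return the required keys that are absent from config or set to None."""
    return [key for key in required_keys if config.get(key) is None]


# Settings read by the cells in this notebook
required_keys = [
    "CONFLUENCE_DATA_DIR",
    "DOMAIN_NAME",
    "EXPERIMENT_TIME_START",
    "EXPERIMENT_TIME_END",
]

# Stand-in dictionary with placeholder values; in the notebook, pass the loaded `config` instead
example_config = {
    "CONFLUENCE_DATA_DIR": "/path/to/data",
    "DOMAIN_NAME": "example_domain",
    "EXPERIMENT_TIME_START": "2010-01-01 00:00",
}

missing = find_missing_settings(example_config, required_keys)
print(f"Missing settings: {missing}")
```

An empty list means every required setting is defined, so the remaining cells can read them safely.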

## Define default paths

Now let's define the paths to the data directories and create them before we run the preprocessing scripts.

In [None]:
# Main project directory
data_dir = config['CONFLUENCE_DATA_DIR']
project_dir = Path(data_dir) / f"domain_{config['DOMAIN_NAME']}"

# Data directories
evaluation_results = project_dir / 'evaluation'
benchmarking_plots = project_dir / 'plots' / 'benchmarking'

# Make sure the new directories exist
evaluation_results.mkdir(parents=True, exist_ok=True)
benchmarking_plots.mkdir(parents=True, exist_ok=True)
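
The two `mkdir` arguments are what make this cell safe to re-run. The sketch below demonstrates their effect in a temporary directory; the domain name `domain_example` is a placeholder chosen for this illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical domain name, used only for this illustration
    project_dir = Path(tmp) / "domain_example"
    plots_dir = project_dir / "plots" / "benchmarking"

    # parents=True also creates 'domain_example' and 'plots' along the way
    plots_dir.mkdir(parents=True, exist_ok=True)
    # exist_ok=True makes a repeated call a no-op instead of raising FileExistsError
    plots_dir.mkdir(parents=True, exist_ok=True)

    # Collect everything that was created, relative to the temporary root
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
    print(created)
```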

## 1. Pre-Process the benchmarking data

In [None]:
# Preprocess data for benchmarking
preprocessor = BenchmarkPreprocessor(config, logger)
# Expand the experiment period to whole calendar years
start_year = config.get('EXPERIMENT_TIME_START').split('-')[0]
end_year = config.get('EXPERIMENT_TIME_END').split('-')[0]
benchmark_data = preprocessor.preprocess_benchmark_data(f"{start_year}-01-01", f"{end_year}-12-31")
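
The date arguments passed to the preprocessor are derived by taking the year from each experiment timestamp and expanding it to the first or last day of that year. The sketch below isolates that step. The helper name `full_year_bounds` is hypothetical, and the timestamps are placeholders that assume the settings are stored as strings beginning with a four-digit year, such as `YYYY-MM-DD hh:mm`.

```python
def full_year_bounds(time_start, time_end):
    """Expand a start/end timestamp pair to whole calendar years."""
    start_year = time_start.split("-")[0]
    end_year = time_end.split("-")[0]
    return f"{start_year}-01-01", f"{end_year}-12-31"


# Placeholder timestamps, assuming a 'YYYY-MM-DD hh:mm' string format
start_date, end_date = full_year_bounds("2010-06-15 00:00", "2012-03-31 23:00")
print(start_date, end_date)
```

Note that this relies on the settings being strings: if the YAML loader returned `datetime` objects instead, `.split` would not be available and the year would have to be read from the `.year` attribute.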

## 2. Run benchmarking scripts

In [None]:
# Run benchmarking
benchmarker = Benchmarker(config, logger)
benchmark_results = benchmarker.run_benchmarking(benchmark_data, f"{config.get('EXPERIMENT_TIME_END').split('-')[0]}")


## 3. Visualise and summarise the benchmarking

In [None]:
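# Initialize the benchmarking visualiser
# Note: benchmarkingVisualiser is not imported in the cells above; import it from its module before running this cell
bmv = benchmarkingVisualiser(config, logger)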

# Run the visualisation 
bmv.vizualise_streamflow()