# Demonstration of HydroEval's API

This notebook gives a simple example of how to use the `HydroEval` API to evaluate simulated streamflow timeseries against observed ones.

## 1. Load observed and simulated streamflow timeseries

Two example files are provided in the folder `examples/` so that anyone can reproduce this tutorial. Because they are NetCDF files, we use the Python package `netCDF4` to read them. HydroEval itself is independent of the file format you work with: it only requires numpy arrays as inputs for the observed and simulated timeseries.

In [None]:
from netCDF4 import Dataset

# load the observed timeseries
with Dataset('examples/catchment.obs.flow.nc', 'r', format='NETCDF4') as f:  # read the NetCDF file
    observed_flow = f.variables['flow'][:]  # this is the observed discharge timeseries
    observed_dt = f.variables['DateTime'][:]  # this is the timestamp series for the observed period

# load the simulated timeseries
with Dataset('examples/catchment.sim.flow.nc', 'r', format='NETCDF4') as f:  # read the NetCDF file
    simulated_flow = f.variables['flow'][:]  # this is the simulated discharge timeseries
    simulated_dt = f.variables['DateTime'][:]  # this is the timestamp series for the simulated period

It is a good idea to check that your simulated and observed datasets actually cover the same period:

In [None]:
import numpy as np

# check that the two timestamp arrays are identical (i.e. same periods)
if not np.array_equal(observed_dt, simulated_dt):
    raise ValueError('The observed and simulated periods do not match.')

## 2. Calculate any available objective function

Now that the datasets are loaded in memory, it is time to use HydroEval to evaluate the fit between the observed and simulated streamflow timeseries. To do so, import `hydroeval`, which gives you access to the `evaluator` Python function as well as to all the objective functions implemented in HydroEval (also as Python functions). To evaluate the fit, call the `evaluator` function with three mandatory positional arguments, in this order: the Python function corresponding to the objective function to use (e.g. `kge`, `nse`, etc.), the numpy array for the simulated timeseries (which can contain multiple timeseries), and the numpy array for the observed timeseries (which can contain only one timeseries). The examples below follow these requirements:

In [None]:
from hydroeval import evaluator, kge, rmse

# use the evaluator with the Kling-Gupta Efficiency (objective function 1)
my_kge = evaluator(kge, simulated_flow, observed_flow)

# use the evaluator with the Kling-Gupta Efficiency for inverted flow series (objective function 2)
my_kge_inv = evaluator(kge, simulated_flow, observed_flow, transform='inv')

# use the evaluator with the Root Mean Square Error (objective function 3)
my_rmse = evaluator(rmse, simulated_flow, observed_flow)

It is important to be aware that HydroEval performs pairwise deletion when the observed streamflow timeseries contains missing values. Missing values should be set to `nan` (Not A Number) in the observed numpy array, so that HydroEval knows the positions of the values to delete in both the observed and the simulated timeseries.
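To make the idea of pairwise deletion concrete, here is a minimal numpy-only sketch. It does not reproduce HydroEval's internal code; the toy arrays and the variable names (`observed`, `simulated`, `mask`, `rmse_manual`) are hypothetical and only illustrate what dropping the same positions from both series means.

```python
import numpy as np

# hypothetical toy data: the second observation is missing
observed = np.array([1.2, np.nan, 3.4, 2.8])
simulated = np.array([1.0, 2.0, 3.0, 3.0])

# keep only the time steps where an observation is available
mask = ~np.isnan(observed)
observed_valid = observed[mask]
simulated_valid = simulated[mask]  # the same positions are dropped here

# root mean square error computed on the remaining pairs only
rmse_manual = np.sqrt(np.mean((simulated_valid - observed_valid) ** 2))
```

Note that the mask is built from the observed array only, which is why missing values need to be marked as `nan` in the observations.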

It should also be noted that `kge` and `kgeprime` return four values for each simulated timeseries: the KGE or KGE' value itself, followed by its three components (r/$\alpha$/$\beta$ for KGE, and r/$\gamma$/$\beta$ for KGE'). In contrast, `kge_c2m` and `kgeprime_c2m` return only one value, namely the corresponding bounded KGE value.
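As an illustration of what the four values represent, the sketch below computes the original KGE and its three components with plain numpy, using the usual definitions (r: linear correlation, $\alpha$: ratio of standard deviations, $\beta$: ratio of means). The helper `kge_components` is hypothetical and is not part of HydroEval; the order and layout of the values returned by HydroEval itself should be checked against its documentation.

```python
import numpy as np


def kge_components(sim, obs):
    """Hypothetical helper: return (KGE, r, alpha, beta) for two 1-D arrays."""
    r = np.corrcoef(sim, obs)[0, 1]      # linear correlation
    alpha = np.std(sim) / np.std(obs)    # variability ratio
    beta = np.mean(sim) / np.mean(obs)   # bias ratio
    kge_value = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge_value, r, alpha, beta


# toy example: the simulation is exactly twice the observation
obs_demo = np.array([1.0, 2.0, 3.0, 4.0])
sim_demo = 2 * obs_demo
kge_demo, r_demo, alpha_demo, beta_demo = kge_components(sim_demo, obs_demo)
```

In this toy case the correlation is perfect, but both the variability ratio and the bias ratio are off, which is exactly the kind of diagnosis the three components are meant to support.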

HydroEval only accepts 1-dimensional or 2-dimensional numpy arrays as input. The observed array can contain only one time series of flows, so it must be either a 1-dimensional array or a 2-dimensional array with one of its two dimensions of size 1. HydroEval can evaluate multiple simulated time series against the observed time series at once (using vectorised calculations). The simulated array can therefore be 2-dimensional with both dimensions of size greater than 1, provided that one of them is the time dimension and that its length matches that of the observed time series.

Both observed and simulated arrays must have the same orientation, that is to say their time dimensions must lie along the same axis. By default, HydroEval expects the time dimension to be on the first axis (i.e. `axis=0`) for both observed and simulated arrays (when they are 2-dimensional). If this is not the case with your dataset, set the `axis` keyword argument to `1` (for the second axis). Alternatively, you can transpose your arrays before giving them to HydroEval.

For multi-component objective functions such as KGE and its variants, the orientation of the input array is preserved in the output array (i.e. the time dimension is reduced to the number of components in the objective function). For single-component objective functions such as NSE, the value returned is either a scalar if only one simulated time series is evaluated, or a 1-dimensional numpy array if several simulated time series are evaluated.
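The orientation rules can be illustrated with a small numpy-only sketch. The function `nse_along_time` is a hypothetical stand-in written for this example (it is not HydroEval's implementation); it computes the Nash-Sutcliffe Efficiency for several simulated series at once, with the time dimension on either the first or the second axis.

```python
import numpy as np


def nse_along_time(sim, obs, axis=0):
    """Hypothetical helper: NSE of each simulated series against one observed series."""
    if axis == 1:
        sim = sim.T                # bring the time dimension to the first axis
    obs = obs.reshape(-1, 1)       # single observed series as a column
    num = np.sum((sim - obs) ** 2, axis=0)
    den = np.sum((obs - obs.mean()) ** 2, axis=0)
    return 1 - num / den


observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # shape (5,)
simulated = np.column_stack(
    [observed, observed + 1, observed * 2]
)                                                          # shape (5, 3), time on axis 0

nse_time_first = nse_along_time(simulated, observed, axis=0)
nse_time_second = nse_along_time(simulated.T, observed, axis=1)  # shape (3, 5) input
```

Both calls return one value per simulated series, and the two results are identical because only the orientation of the input differs. With HydroEval, the corresponding choice is made through the `axis` keyword argument of `evaluator`, or by transposing the arrays beforehand.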