# Calibration of a catchment using multisite multiobjective composition


## About this document


In [None]:
from swift2.doc_helper import pkg_versions_info

print(pkg_versions_info("This document was generated from a jupyter notebook"))

## Use case

This vignette demonstrates how one can calibrate a catchment using multiple gauging points available within this catchment. This only covers the definition of the calibration, **not the execution**. The sample data in the package is a small subset of hourly data to keep things (data size, execution time) small.

This is a joint calibration weighting multiple objectives, possibly sourced from different modelling objects in the semi-distributed structure, and thus a whole-of-catchment calibration technique. A staged, cascading calibration is also supported and is described in another vignette.

In [None]:
from swift2.utils import mk_full_data_id
from swift2.classes import CompositeParameteriser, HypercubeParameteriser, Simulation
# from swift2.wrap.ffi_interop import debug_msd
import xarray as xr
import pandas as pd
import numpy as np

In [None]:
import swift2.doc_helper as std

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
from cinterop.timeseries import TIME_DIMNAME, slice_xr_time_series, pd_series_to_xr_series

In [None]:
from cinterop.timeseries import xr_ts_start, xr_ts_end
import datetime as dt

In [None]:
%matplotlib inline

## Data

The sample data that comes with the package contains a model definition for the South Esk catchment, including a short subset of the climate and flow record data.

<img src="south_esk.png" alt="drawing" width="600"/>


In [None]:
model_id = 'GR4J'
site_id = 'South_Esk'

In [None]:
se_climate = std.sample_series(site_id=site_id, var_name='climate')
se_flows = std.sample_series(site_id=site_id, var_name='flow')

In [None]:
se_climate.head(3)

In [None]:
se_climate.tail(3)

## Base model creation 

In [None]:
simulation = std.sample_catchment_model(site_id=site_id, config_id='catchment')
# simulation = swap_model(simulation, 'MuskingumNonLinear', 'channel_routing')
simulation = simulation.swap_model('LagAndRoute', 'channel_routing')

The names of the climate series are already set to the climate input identifiers of the model simulation, so setting them as inputs is easy:

In [None]:
simulation.play_input(se_climate)
simulation.set_simulation_span(xr_ts_start(se_climate), xr_ts_end(se_climate))
simulation.set_simulation_time_step('hourly')

Next we define the parameters, free or fixed. For now (this may change) we use the package calibragem, a companion to SWIFT.

In [None]:
std.configure_hourly_gr4j(simulation)

## Parameterisation

We define a function that creates a realistic feasible parameter space. The parameteriser is relatively sophisticated, but it is not the main purpose of this vignette, so we do not describe the process of defining and creating parameterisers in great detail.

In [None]:
from swift2.utils import c, paste0, rep
import swift2.parameteriser as sp
import swift2.helpers as hlp


def create_meta_parameteriser(simulation:Simulation, ref_area=250, time_span=3600):  
    time_span = int(time_span)
    parameteriser = std.define_gr4j_scaled_parameter(ref_area, time_span)
  
    # Let's define _S0_ and _R0_ parameters such that for each GR4J model instance, _S = S0 * x1_ and _R = R0 * x3_
    p_states = sp.linear_parameteriser(
                      param_name=c("S0","R0"), 
                      state_name=c("S","R"), 
                      scaling_var_name=c("x1","x3"),
                      min_p_val=c(0.0,0.0), 
                      max_p_val=c(1.0,1.0), 
                      value=c(0.9,0.9), 
                      selector_type='each subarea')
  
    init_parameteriser = p_states.make_state_init_parameteriser()
    parameteriser = sp.concatenate_parameterisers(parameteriser, init_parameteriser)
    
    hlp.lag_and_route_linear_storage_type(simulation)
    hlp.set_reach_lengths_lag_n_route(simulation)

    lnrp = hlp.parameteriser_lag_and_route()
    parameteriser = CompositeParameteriser.concatenate(parameteriser, lnrp, strategy='')
    return parameteriser



In [None]:
parameteriser = create_meta_parameteriser(simulation)

We have built a parameteriser for jointly parameterising:

* GR4J parameters in transformed spaces ($log$ and $asinh$)
* GR4J initial soil moisture store conditions ($S_0$ and $R_0$)
* A "lag and route" streamflow routing scheme in transform space.

There is even more happening there, because on top of the GR4J parameter transformation we scale some parameters in proportion to catchment area and time step length. But this is beside the point of this vignette: refer for instance to the vignette about tied parameters to know more about parameter transformation and composition of parameterisers.
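As a rough illustration of what a transformed parameter space means, here is a minimal sketch in plain Python, independent of `swift2`. The function names are hypothetical, and the base-10 logarithm is an assumption (the base used by the package is not stated here). It also shows the relation $S = S_0 \times x_1$ used for the initial store level.

```python
import math

def to_transformed(x1, x2):
    """Map GR4J x1 and x2 to an assumed transformed space (log10 and asinh)."""
    return {"log_x1": math.log10(x1), "asinh_x2": math.asinh(x2)}

def from_transformed(p):
    """Back-transform to the original GR4J parameter space."""
    return {"x1": 10.0 ** p["log_x1"], "x2": math.sinh(p["asinh_x2"])}

# Illustrative values only, not calibrated parameters
original = {"x1": 350.0, "x2": -0.5}
transformed = to_transformed(**original)
recovered = from_transformed(transformed)

# Initial production store level as a fraction of its capacity: S = S0 * x1
S0 = 0.9
S_init = S0 * recovered["x1"]
```

The optimiser searches in the transformed space, and the model receives the back-transformed values. Since $asinh(0) = 0$, a transformed value of 0 for `asinh_x2` corresponds to $x_2 = 0$.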


In [None]:
parameteriser

Let us check that we can apply the parameteriser and use its methods.

In [None]:
parameteriser.set_parameter_value('asinh_x2', 0)
parameteriser.apply_sys_config(simulation)
simulation.exec_simulation()

We are now ready to enter the main topic of this vignette, i.e. setting up a weighted multi-objective for calibration purposes.

## Defining weighted multi-objectives

The sample gauge flow data contains gauge identifiers as column headers:

In [None]:
se_flows.head()

The nodes of the simulation network are arbitrarily identified as '1' to '43':

In [None]:
simulation.describe()["nodes"].keys()

We "know" that we can associate the node identifiers 'node.{i}' with gauge identifiers (note to doc maintainers: manually extracted from legacy swiftv1 NodeLink files)

In [None]:
gauges = c( '92106', '592002', '18311', '93044',    '25',   '181')
node_ids = paste0('node.', c('7',   '12',   '25',   '30',   '40',   '43'))   
# names(gauges) = node_ids
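The commented-out line above hints at naming the gauges by node identifier. A minimal sketch of that association with a plain dictionary follows. It uses ordinary Python lists instead of the `c` and `paste0` helpers, assuming `paste0` concatenates strings like its R namesake, and `gauge_for_node` is a hypothetical name.

```python
gauges = ['92106', '592002', '18311', '93044', '25', '181']
node_ids = ['node.' + i for i in ['7', '12', '25', '30', '40', '43']]

# Gauge identifier for each node identifier, paired in the order listed above
gauge_for_node = dict(zip(node_ids, gauges))

# The first two nodes and the gauges observed at these nodes
calibration_points = node_ids[:2]
calibration_gauges = [gauge_for_node[n] for n in calibration_points]
```

Such a lookup keeps node and gauge identifiers aligned when selecting observation series for a subset of calibration points.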

First, let us try the Nash-Sutcliffe efficiency (NSE), for simplicity (no additional parameters needed). We will set up NSE calculations at two points (nodes) in the catchment. Any model state from a link, node or subarea could be a point for statistics calculation.
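For reference, the standard NSE formula is $1 - \sum (o_t - m_t)^2 / \sum (o_t - \bar{o})^2$. The sketch below computes it in plain Python. It is not the `swift2` implementation, and skipping time steps with missing values is an assumption made for this illustration.

```python
import math

def nse(observed, modelled):
    """Nash-Sutcliffe efficiency over the time steps where both values are present."""
    pairs = [(o, m) for o, m in zip(observed, modelled)
             if not (math.isnan(o) or math.isnan(m))]
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    sse = sum((o - m) ** 2 for o, m in pairs)          # squared model errors
    sst = sum((o - mean_obs) ** 2 for o, _ in pairs)   # variability of observations
    return 1.0 - sse / sst

# Made-up values for illustration
obs = [1.0, 2.0, float("nan"), 4.0, 3.0]
perfect = nse(obs, obs)                         # model reproduces observations
mean_model = nse(obs, [2.5] * len(obs))         # model is the mean of observations
poor = nse(obs, [4.0, 0.0, 1.0, 1.0, 6.0])      # model worse than the mean
```

A perfect model scores 1, a model no better than the mean of the observations scores 0, and worse models score below 0, so NSE is a statistic to maximise.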

The function `multi_statistic_definition` in the module `swift2.statistics` is used to create multisite multiobjective definitions.

In [None]:
import swift2.statistics as ssf

In [None]:
ssf.multi_statistic_definition?

In [None]:
span = simulation.get_simulation_span()
span

In [None]:
s = span['start']
e = span['end']

In [None]:
calibration_points = node_ids[:2]
mvids = mk_full_data_id(calibration_points, 'OutflowRate')
mvids

In [None]:
statspec = ssf.multi_statistic_definition( 
  model_var_ids = mvids, 
  statistic_ids = rep('nse', 2), 
  objective_ids = calibration_points, 
  objective_names = paste0("NSE-", calibration_points), 
  starts = [s, s + pd.DateOffset(hours=3)],
  ends =  [e + pd.DateOffset(hours=-4), e + pd.DateOffset(hours=-2)])

We now have our set of statistics defined:

In [None]:
statspec
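Each objective carries its own evaluation window through `starts` and `ends`. The sketch below reproduces the offset arithmetic with the standard library only. The span is hypothetical (the real one comes from `simulation.get_simulation_span()`), and counting both window ends as included is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical hourly simulation span
s = datetime(2010, 11, 1, 0)
e = datetime(2010, 11, 20, 23)

# Same offsets as in the statistic definition above
starts = [s, s + timedelta(hours=3)]
ends = [e - timedelta(hours=4), e - timedelta(hours=2)]

# Number of hourly time steps in each window, both ends included
window_hours = [int((end - start).total_seconds() // 3600) + 1
                for start, end in zip(starts, ends)]
```

Distinct windows are useful when gauges have records of different lengths, or when a warm-up period should be excluded at some sites only.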

To create a multisite objective evaluator we use the `create_multisite_objective` method of the `simulation` object. We have our statistics defined. The method also requires observations, and weights used to combine the statistics into a single objective.

In [None]:
simulation.create_multisite_objective?

In [None]:
observations = [
  se_flows[gauges[0]],
  se_flows[gauges[1]]
]

In [None]:
w = {calibration_points[0]: 1.0, calibration_points[1]: 2.0} # weights (internally normalised to a total of 1.0)
w

In [None]:
moe = simulation.create_multisite_objective(statspec, observations, w)
moe.get_score(parameteriser)

We can get the value of each objective. The two NSE scores below are negative. Note above that the composite objective is positive, i.e. the opposite of the weighted average. This is because the composite objective is always minimisable (a design choice, at least at the time of writing).
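The relation between the individual scores and the composite can be sketched as follows. This is an illustration of the behaviour described above, not the package's code: `composite_score` is a hypothetical helper and the NSE values are made up.

```python
def composite_score(scores, weights):
    """Negated weighted average of maximisable scores, so that lower is better."""
    total = sum(weights.values())
    # Weights are normalised to sum to 1.0 before averaging
    weighted_mean = sum(weights[k] / total * scores[k] for k in scores)
    return -weighted_mean

weights = {"node.7": 1.0, "node.12": 2.0}
scores = {"node.7": -4.0, "node.12": -1.0}   # made-up negative NSE values
composite = composite_score(scores, weights)
```

With these values the weighted average is $(1/3)(-4) + (2/3)(-1) = -2$, so the minimisable composite is 2.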

In [None]:
moe.get_scores(parameteriser)

## Log-likelihood multiple objective

Now, let's move on to use the log-likelihood instead of NSE. This adds one level of complexity compared to the above. Besides calibrating the hydrologic parameters of GR4J and flow routing, with the log-likelihood we introduce a set of parameters $(a, b, m, s, ...)$ for each calibration point. The statistic definition is similar and still straightforward:

In [None]:
statspec = ssf.multi_statistic_definition(  
  model_var_ids = mvids, 
  statistic_ids = rep('log-likelihood', 2), 
  objective_ids=calibration_points, 
  objective_names = paste0("LL-", calibration_points), 
  starts = rep(span['start'], 2), 
  ends = rep(span['end'], 2) )

statspec

But while we can create the calculator, we cannot evaluate it against the hydrologic parameters alone, as defined in the previous section:

In [None]:
obj = simulation.create_multisite_objective(statspec, observations, w)
### This cannot work if the statistic is the log-likelihood:
# obj.get_scores(parameteriser)

If we try to do the above we end up with the error message 'Mandatory key expected in the dictionary but not found: b'.

For this to work we need to include the parameters of the log-likelihood, for each calibration point.

In [None]:
maxobs = np.nanmax(observations[0].values) # nanmax ignores missing (NaN) observations; np.max would return NaN
censor_threshold = 0.01
censopt = 0.0
calc_m_and_s = 1.0 # i.e. TRUE

#		const string LogLikelihoodKeys::KeyAugmentSimulation = "augment_simulation";
#		const string LogLikelihoodKeys::KeyExcludeAtTMinusOne = "exclude_t_min_one";
#		const string LogLikelihoodKeys::KeyCalculateModelledMAndS = "calc_mod_m_s";
#		const string LogLikelihoodKeys::KeyMParMod = "m_par_mod";
#		const string LogLikelihoodKeys::KeySParMod = "s_par_mod";

p = sp.create_parameteriser('no apply')
# Note: exampleParameterizer is also available
p.add_to_hypercube( 
          pd.DataFrame.from_dict( dict(Name=c('b','m','s','a','maxobs','ct', 'censopt', 'calc_mod_m_s'),
          Min   = c(-30, 0, 1,    -30, maxobs, censor_threshold, censopt, calc_m_and_s),
          Max   = c(0,   0, 1000, 1, maxobs, censor_threshold, censopt, calc_m_and_s),
          Value = c(-7,  0, 100,  -10, maxobs, censor_threshold, censopt, calc_m_and_s)) ))

p.as_dataframe()

In [None]:
func_parameterisers = [p, p.clone()] # one log-likelihood parameteriser per calibration point (NSE needed none)
catchment_pzer = parameteriser
fp = sp.create_multisite_obj_parameteriser(func_parameterisers, calibration_points, prefixes=paste0(calibration_points, '.'), mix_func_parameteriser=None, hydro_parameteriser=catchment_pzer)
fp.as_dataframe()
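The `prefixes` argument keeps the statistic parameters of each calibration point distinct. The sketch below shows the naming this is expected to produce, assuming each name is simply the prefix followed by the parameter name. This is an assumption: the output of `fp.as_dataframe()` shows the actual names.

```python
# Log-likelihood parameter names, as added to the hypercube above
stat_param_names = ["b", "m", "s", "a", "maxobs", "ct", "censopt", "calc_mod_m_s"]

calibration_points = ["node.7", "node.12"]
prefixes = [cp + "." for cp in calibration_points]

# One prefixed copy of each parameter per calibration point (assumed naming scheme)
per_site_names = [prefix + name for prefix in prefixes for name in stat_param_names]
```

Without prefixes, the two sets of log-likelihood parameters would share the same names and could not be calibrated independently for each gauge.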

In [None]:
moe = obj

To get the overall weighted multiobjective score (for which lower is better):

In [None]:
moe.get_score(fp)

To get the individual log-likelihood score for each gauge:

In [None]:
moe.get_scores(fp)