# Workflow to identify the important inputs on which to focus computationally-intensive calibration by means of GSA

### Andres Peñuela, Valentina Noacco, Francesca Pianosi, Thorsten Wagener 
### (University of Bristol)

This document provides:
* a brief introduction to Global Sensitivity Analysis (GSA);
* a workflow to apply GSA using the SAFE (Sensitivity Analysis For Everybody) toolbox ([References](#References) 1-2).

In this example we apply a GSA method to the calibration of a simple hydrological model.

## What is Global Sensitivity Analysis and why should we use it?

**Global Sensitivity Analysis** is a set of mathematical techniques which investigate how the uncertainty in the model inputs influences the variability of the model outputs ([References](#References) 3-4).

The benefits of applying GSA are:

* **Better understanding of the model**: Evaluate the model behaviour beyond default set-up
    
* **“Sanity check” of the model**: Test whether the model "behaves well" (model validation)
    
* **Priorities for uncertainty reduction**: Identify the important inputs on which to focus computationally-intensive calibration, acquisition of new data, etc. 
    
* **More transparent and robust decisions**: Understand how assumptions about uncertain inputs reflect on the modelling outcome and thus on model-informed decisions


## How Global Sensitivity Analysis works

GSA investigates how the uncertainty in the model input factors influences the variability of the model output.

An '**input factor**' is any element that can be changed before running the model. In general, input factors could be the model's parameters, forcing input data, but also the very equations implemented in the model or other set-up choices (for instance, the spatial resolution) needed for the model execution on a computer.

An '**output**' is any variable that is obtained after the model execution.

The main steps to perform GSA are summarised in the figure below.

In [1]:
# Optional: display the workflow figure (assumes how_GSA_works.png is in the working directory)
#from IPython.display import Image
#Image(filename="how_GSA_works.png", width=800, height=800)

a. The input factors are sampled from their ranges of variability.

b. The model is run once for each sampled combination of input factors.

c. The output samples so obtained can be used to characterise the output uncertainty, for instance through a probability distribution or scatter plots.

d. GSA is applied to the input and output samples in order to obtain a set of sensitivity indices. The sensitivity indices measure the relative influence of each input factor on the uncertainty of the output ([References](#References) 4,5).

# PART II: Priorities for uncertainty reduction or Screening

Screening refers to the identification of those input factors, if any, which have no influence on the model output and therefore can be fixed to any value within their feasible range with negligible implications on the output. It is a preliminary step to inform a subsequent calibration, which is tailored to the subset of influential parameters.

### Step 1: Import Python libraries

In [2]:
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
from plotly.subplots import make_subplots
from ipywidgets import widgets

from HyMOD import hymod

### Step 2: Define and setup the model
<left><img src="HyMOD_diagram_simple.png" width="700px">
HYMOD is a parsimonious rainfall-runoff model based on the theory of runoff yield under infiltration excess. The model structure is illustrated in the figure below.

The basin is represented as a group of storages whose capacities follow a cumulative distribution F(C), where C is the storage capacity, which can vary from 0 to Cmax.
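
Written out, and consistent with the expression used in the storage code cells later in this notebook (`1 - (1 - c/c_max)**beta`), the distribution takes the form below. The symbols follow the parameter table, with $\beta$ the degree of spatial variability of the storage capacity.

$$
F(C) = 1 - \left(1 - \frac{C}{C_{max}}\right)^{\beta}, \qquad 0 \le C \le C_{max}
$$

For $\beta = 1$ this reduces to $F(C) = C / C_{max}$, i.e. storage capacities uniformly distributed between 0 and $C_{max}$.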

<left><img src="HyMOD_diagram.png" width="700px">

Define: 
- the input factors whose influence is to be analysed with GSA;
- their range of variability (chosen by expert judgement or from available data or previous studies).

In [3]:
data = [["mm"  , 1   , 100 , "Maximum storage capacity"],
        ["-"   , 0   , 2   , "Degree of spatial variability of the c_max"],
        ["-"   , 0.01, 0.99, "Factor distributing slow and quick flows"],
        ["-", 0.01, 0.10, "Fractional discharge of the slow release reservoir"],
        ["-", 0.10, 0.99, "Fractional discharge of the quick release reservoirs"]]
model_param = pd.DataFrame(data, 
                           columns=["Unit", "Min value", "Max value", "Description"],
                           index = ["c_max" , "beta" , "alpha", "K_s", "K_q"])
model_param

| | Unit | Min value | Max value | Description |
|---|---|---|---|---|
| c_max | mm | 1.0 | 100.0 | Maximum storage capacity |
| beta | - | 0.0 | 2.0 | Degree of spatial variability of the c_max |
| alpha | - | 0.01 | 0.99 | Factor distributing slow and quick flows |
| K_s | - | 0.01 | 0.1 | Fractional discharge of the slow release reser... |
| K_q | - | 0.1 | 0.99 | Fractional discharge of the quick release rese... |


### Interactive manual calibration

In [4]:
nsteps = 120 # days
dates = pd.date_range(start = '2000-01-01', periods = nsteps)
P = 20 * np.random.random(nsteps)
PET = 5 * np.random.random(nsteps)
init = [20,20,10,10,10]

#### Function to update the Hymod simulation when changing the parameters with the sliders

In [5]:
def update_sim(c_max, beta, alpha, K_s, K_q):
    model = hymod(c_max.value, beta.value, alpha.value, K_s.value, K_q.value)
    ER, Q = model.simulation(P, PET, init)
    return ER, Q

#### Function to update the figure when changing the parameters with the sliders

In [6]:
def update_figure(change):
    # Run the model once per slider change and reuse the result in both figures
    ER, Q = update_sim(c_max, beta, alpha, K_s, K_q)
    with fig_hyd.batch_animate(duration=1000):
        fig_hyd.data[1].y = Q  # trace 1 is the simulated hydrograph
    with fig_sto.batch_animate(duration=1000):
        c = np.arange(0, c_max.value + 1, 1)
        fig_sto.data[0].x = 1 - (1 - c/c_max.value)**beta.value
        fig_sto.data[0].y = c
        fig_sto.data[1].line.width = ER[0]
        fig_sto.data[2].marker.size = ER[0]*3

#### Definition of the sliders

In [7]:
# c_max: Maximum soil moisture storage capacity (mm)
c_max = widgets.FloatSlider(min=model_param.loc['c_max','Min value'],
                            max=model_param.loc['c_max','Max value'],
                            value=50, step = 1,
                            description = 'c_max: Maximum soil moisture storage capacity (mm)',
                            continuous_update=False,
                          style = {'description_width': '350px'} ,layout={'width': '550px'})
c_max.observe(update_figure,names = 'value')
# beta: Degree of spatial variability of the c_max
beta = widgets.FloatSlider(min=model_param.loc['beta','Min value'],
                           max=model_param.loc['beta','Max value'],
                           value=1, step = 0.01,
                           description = 'beta: Degree of spatial variability of the c_max',
                           continuous_update=False,
                          style = {'description_width': '350px'} ,layout={'width': '550px'})
beta.observe(update_figure,names = 'value')
# alpha: Factor distributing slow and quick flows
alpha = widgets.FloatSlider(min=model_param.loc['alpha','Min value'],
                            max=model_param.loc['alpha','Max value'],
                            value=0.5, step = 0.01, 
                            description = 'alpha: Factor distributing slow (s) and quick (q) flows (q/s)',
                            continuous_update=False,
                          style = {'description_width': '350px'} ,layout={'width': '550px'})
alpha.observe(update_figure,names = 'value')
# K_s: Fractional discharge of the slow release reservoir
K_s = widgets.FloatSlider(min=model_param.loc['K_s','Min value'],
                          max=model_param.loc['K_s','Max value'],
                          value=0.05, step = 0.01, 
                          description = 'K_s: Fractional discharge of slow release reservoir',
                          continuous_update=False,
                          style = {'description_width': '350px'} ,layout={'width': '550px'})
K_s.observe(update_figure,names = 'value')
# K_q: Fractional discharge of the quick release reservoir
K_q = widgets.FloatSlider(min=model_param.loc['K_q','Min value'],
                          max=model_param.loc['K_q','Max value'],
                          value=0.55, step = 0.01,
                          description = 'K_q: Fractional discharge of quick release reservoir',
                          continuous_update=False,
                          style = {'description_width': '350px'} ,layout={'width': '550px'})
K_q.observe(update_figure,names = 'value')

#### Observed hydrograph

In [8]:
c_max_obs = np.random.uniform(model_param.loc['c_max','Min value'], model_param.loc['c_max','Max value'])
beta_obs  = np.random.uniform(model_param.loc['beta','Min value'],  model_param.loc['beta','Max value'])
alpha_obs = np.random.uniform(model_param.loc['alpha','Min value'], model_param.loc['alpha','Max value'])
K_s_obs   = np.random.uniform(model_param.loc['K_s','Min value'],   model_param.loc['K_s','Max value'])
K_q_obs   = np.random.uniform(model_param.loc['K_q','Min value'],   model_param.loc['K_q','Max value'])

In [9]:
model_obs = hymod(c_max_obs, beta_obs, alpha_obs, K_s_obs, K_q_obs)
ER_obs, Q_obs = model_obs.simulation(P, PET, init)


Casting complex values to real discards the imaginary part



#### Figure: hydrographs

In [10]:
### Figure with two traces: simulated and observed hydrographs ###
model = hymod(c_max.value, beta.value, alpha.value, K_s.value, K_q.value)
ER_sim, Q_sim = model.simulation(P, PET, init)
sim_hyd = go.Scatter(x=dates, y=Q_sim, name='sim hyd')
obs_hyd = go.Scatter(x=dates, y=Q_obs, name='obs hyd')
fig_hyd = go.FigureWidget(data   = [obs_hyd,sim_hyd],
                          layout = go.Layout(xaxis = dict(title = 'date'),
                                             yaxis = dict(range = [0,20],
                                                          title = 'Q')))

#### Storage

In [11]:
# Include the end point c_max so that F_c reaches 1, as in update_figure
c = np.arange(0,c_max.value+1,1)
F_c = 1 - (1 - c/c_max.value)**beta.value

#### Figure: storage capacity

In [12]:
sto_trace = go.Scatter(x=F_c, y=c, name=None, fill="tozeroy", fillcolor = 'sandybrown',mode='none')
arrow_line = go.Scatter(x=[0.01,0.4], y=[50,50], mode = 'lines', line = dict(color = 'blue', width = ER_sim[0]),opacity = 0.5)
arrow_head = go.Scatter(x=[-0.01], y=[50], mode = 'markers', opacity = 0.5,marker = dict(symbol = 'triangle-right', size = ER_sim[0]*3, color = 'blue'))
sto_layout = go.Layout(xaxis = dict(title = 'F(c)',autorange='reversed',range = [1,0], showgrid = False),
                       yaxis = dict(range = [0,model_param.loc['c_max','Max value']],
                                    title = 'mm', showgrid = False),
                       width=350, height=350,plot_bgcolor=None,showlegend=False,
                       annotations = [dict(x = 0.55, y = 5, text = 'Soil storage capacity',showarrow=False),
                                      dict(x = 0.3, y = 60, text = 'Effective rain',showarrow=False)])
fig_sto = go.FigureWidget(data   = [sto_trace,arrow_line,arrow_head],
                          layout = sto_layout)

#### Figure plot

In [13]:
widgets.VBox([widgets.HBox([widgets.VBox([c_max,beta,alpha,K_s, K_q])]),
              fig_sto,fig_hyd])


### Step 3: Sample the input space
The number of model runs (N) typically increases proportionally to the number of input factors (M), and also depends on the GSA method chosen.

As a rule of thumb, the most frugal methods (e.g. the Elementary Effect Test) may require more than 10 model runs per input factor, while the more expensive methods (e.g. variance-based and density-based methods) may require more than 1000 model runs per input factor.
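
The sampling step can be sketched as follows. This is a minimal illustration, not the SAFE toolbox routine: it assumes Latin hypercube sampling with uniform distributions over the ranges in the parameter table, and the function name `latin_hypercube`, the seed and the sample size `N = 200 * M` are choices made here for illustration only.

```python
import numpy as np

# Parameter ranges copied from the model_param table above
# Order: c_max, beta, alpha, K_s, K_q
x_min = np.array([1.0, 0.0, 0.01, 0.01, 0.10])
x_max = np.array([100.0, 2.0, 0.99, 0.10, 0.99])

def latin_hypercube(n_samples, x_min, x_max, rng):
    """Draw n_samples points, one per equal-probability stratum of each input."""
    n_inputs = len(x_min)
    # One random point inside each of the n_samples strata of [0, 1), per input
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_inputs))) / n_samples
    # Shuffle the strata independently for each input to decouple the columns
    for j in range(n_inputs):
        u[:, j] = rng.permutation(u[:, j])
    # Rescale from the unit hypercube to the parameter ranges
    return x_min + u * (x_max - x_min)

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
M = len(x_min)                       # number of input factors
N = 200 * M                          # illustrative sample size (an assumption)
X = latin_hypercube(N, x_min, x_max, rng)
print(X.shape)  # (1000, 5)
```

Each row of `X` is one combination of input factors to be passed to the model in the next step.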

### Step 4: Run the model
For each sampled combination of input factors, we run the model and save the associated model output.
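
A sketch of this loop is given below. To keep it runnable without the `HyMOD` module, it uses a hypothetical stand-in called `toy_model` (a crude soil store feeding two linear reservoirs, which ignores `beta`); it is not the HyMOD implementation. Using the RMSE against a synthetic "observed" hydrograph as the scalar output is also an assumption made here, since the text does not name the output metric.

```python
import numpy as np

def toy_model(x, P):
    """Hypothetical two-reservoir stand-in for HyMOD (beta is ignored here)."""
    c_max, beta, alpha, K_s, K_q = x
    soil = slow = quick = 0.0
    Q = np.zeros(len(P))
    for t, p in enumerate(P):
        excess = max(soil + p - c_max, 0.0)  # rain that the soil store cannot hold
        soil = 0.9 * (soil + p - excess)     # crude 10% loss per step, illustration only
        quick += alpha * excess              # split the excess between the two reservoirs
        slow += (1.0 - alpha) * excess
        q_quick, q_slow = K_q * quick, K_s * slow
        quick -= q_quick
        slow -= q_slow
        Q[t] = q_quick + q_slow
    return Q

def rmse(sim, obs):
    """Root mean squared error between two flow series."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

rng = np.random.default_rng(seed=1)          # arbitrary seed, for reproducibility only
P = 20 * rng.random(120)                     # synthetic rainfall, as in the notebook above
x_min = np.array([1.0, 0.0, 0.01, 0.01, 0.10])
x_max = np.array([100.0, 2.0, 0.99, 0.10, 0.99])

N = 500                                      # illustrative sample size
X = x_min + rng.random((N, len(x_min))) * (x_max - x_min)  # plain uniform sampling
Q_obs = toy_model(x_min + 0.5 * (x_max - x_min), P)        # synthetic "observations"

# One model run per sampled combination; the saved output is the RMSE
Y = np.array([rmse(toy_model(x, P), Q_obs) for x in X])
print(X.shape, Y.shape)  # (500, 5) (500,)
```

With the real model, `toy_model` would be replaced by a call that builds `hymod(...)` from each row of `X` and runs `simulation(P, PET, init)`, as in the cells above.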

### Step 5: Apply the Regional Sensitivity Analysis (RSA) method
RSA sorts the output samples and splits them into a user-defined number of groups. It then identifies the sub-samples of the input space that produced the outputs in each group and computes the empirical cumulative distribution function (CDF) of each input factor within each sub-sample. Finally, the sensitivity index of each input factor is defined as a summary statistic (for example the mean) of the maximum vertical distance between the CDFs of the various groups.

Here we divide the output samples into 10 groups, where each group contains the same number of samples.
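
The procedure described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under stated assumptions, not the SAFE toolbox function: the name `rsa_groups` is hypothetical, the index is taken as the mean over all pairs of groups of the maximum vertical distance between CDFs, and the input/output samples are synthetic so that the expected ranking is known in advance.

```python
import numpy as np
from itertools import combinations

def rsa_groups(X, Y, n_groups=10):
    """Mean over all pairs of groups of the maximum vertical distance between CDFs."""
    N, M = X.shape
    # Sort the runs by output and split them into groups of (nearly) equal size
    groups = np.array_split(np.argsort(Y), n_groups)
    indices = np.zeros(M)
    for i in range(M):
        grid = np.sort(X[:, i])  # points at which every CDF is evaluated
        # Empirical CDF of input i within each output group
        cdfs = [np.searchsorted(np.sort(X[g, i]), grid, side="right") / len(g)
                for g in groups]
        # Maximum vertical distance for every pair of groups
        dists = [np.max(np.abs(a - b)) for a, b in combinations(cdfs, 2)]
        indices[i] = np.mean(dists)
    return indices

# Synthetic check: the output depends strongly on x0, weakly on x1, not at all on x2
rng = np.random.default_rng(seed=2)  # arbitrary seed, for reproducibility only
X = rng.random((2000, 3))
Y = 5.0 * X[:, 0] + X[:, 1]
S = rsa_groups(X, Y, n_groups=10)
print(np.round(S, 2))
```

The index of the non-influential input is small but not exactly zero, because CDFs estimated from finite sub-samples never coincide perfectly. This is one reason for the robustness check in Step 8.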

### Step 6: Check model behaviour by visualising input/output samples
Scatter plots of the output against each input factor in turn are used to visualise how the output responds to that factor.

### Step 7: Plot sensitivity indices
The sensitivity indices for the RSA method are based on the maximum vertical distance between each pair of CDFs.

### Step 8: Assess robustness by bootstrapping
In order to assess the robustness of the estimated sensitivity indices, bootstrapping is performed (here we resample 100 times). The 95% confidence intervals of the sensitivity indices are plotted below.
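
A minimal sketch of the bootstrap is shown below, assuming the confidence intervals are taken as the 2.5th and 97.5th percentiles of the resampled indices (the text does not state how the intervals are computed). The helper `rsa_index` is a hypothetical compact version of the RSA definition in Step 5, and the samples are synthetic.

```python
import numpy as np
from itertools import combinations

def rsa_index(x, y, n_groups=10):
    """RSA index of one input: mean over group pairs of the max distance between CDFs."""
    groups = np.array_split(np.argsort(y), n_groups)
    grid = np.sort(x)
    cdfs = [np.searchsorted(np.sort(x[g]), grid, side="right") / len(g) for g in groups]
    return np.mean([np.max(np.abs(a - b)) for a, b in combinations(cdfs, 2)])

rng = np.random.default_rng(seed=3)  # arbitrary seed, for reproducibility only
N, n_boot = 1000, 100                # 100 resamples, as stated in the text
X = rng.random((N, 3))
Y = 5.0 * X[:, 0] + X[:, 1]          # synthetic output; x2 has no influence

S_boot = np.zeros((n_boot, X.shape[1]))
for b in range(n_boot):
    rows = rng.integers(0, N, size=N)  # resample the model runs with replacement
    S_boot[b] = [rsa_index(X[rows, i], Y[rows]) for i in range(X.shape[1])]

S_mean = S_boot.mean(axis=0)
S_lb, S_ub = np.percentile(S_boot, [2.5, 97.5], axis=0)  # 95% confidence bounds
for i in range(X.shape[1]):
    print(f"x{i}: {S_mean[i]:.2f} [{S_lb[i]:.2f}, {S_ub[i]:.2f}]")
```

No additional model runs are needed: the bootstrap reuses the existing input/output samples. Wide or overlapping intervals suggest that the sample size N is too small to rank the input factors reliably.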

### Step 9: Visualise input factors interactions
In order to investigate the interactions between input factors, we plot pairs of inputs against each other, colouring each point by the value taken by the output.

### <a id="References"></a>References


RSA is based on the function created as part of the SAFE Toolbox by F. Pianosi, F. Sarrazin and T. Wagener at Bristol University (2015). Please refer to the `Licence` file in the SAFE toolbox. 

1) [SAFE Toolbox Website](https://www.safetoolbox.info/)

2) [Introductory paper to SAFE - Pianosi et al. (2015)](https://www.sciencedirect.com/science/article/pii/S1364815215001188)

3) [A review of available methods and workflows for Sensitivity Analysis - Pianosi et al. (2016)](https://www.sciencedirect.com/science/article/pii/S1364815216300287)

4) [What has Global Sensitivity Analysis ever done for us? A systematic review to support scientific advancement and to inform policy-making in earth system modelling - Wagener and Pianosi (2019)](https://www.sciencedirect.com/science/article/pii/S0012825218300990)

5) [Practical guide through the critical choices needed for Global Sensitivity Analysis - Noacco et al. (2019)](https://www.sciencedirect.com/science/article/pii/S2215016119302572?via%3Dihub)