# Interactive Jupyter Notebooks for the visual analysis of critical choices in Global Sensitivity Analysis

### Valentina Noacco, Andres Peñuela-Fernandez, Francesca Pianosi, Thorsten Wagener 
### (University of Bristol)


[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This document provides:
* a brief introduction to Global Sensitivity Analysis (GSA);
* a workflow to assess one of the critical choices needed to set up a GSA application. Here we use the SAFE (Sensitivity Analysis For Everybody) toolbox ([References](#References) 1-2).

In this example we apply the PAWN GSA method (the method called in the code of Part II) to the rainfall-runoff HyMod model.


# PART I: Introduction

## What is Global Sensitivity Analysis and why should we use it?

**Global Sensitivity Analysis** is a set of mathematical techniques which investigate how the uncertainty in the model inputs influences the variability of the model outputs ([References](#References) 3-4).

The benefits of applying GSA are:

* **Better understanding of the model**: Evaluate the model behaviour beyond default set-up
    
* **“Sanity check” of the model**: Test whether the model "behaves well" (model validation)
    
* **Priorities for uncertainty reduction**: Identify the important inputs on which to focus computationally-intensive calibration, acquisition of new data, etc. 
    
* **More transparent and robust decisions**: Understand how assumptions about uncertain inputs reflect on the modelling outcome and thus on model-informed decisions


## How Global Sensitivity Analysis works

GSA investigates how the uncertainty in the model input factors influences the variability of the model output.

An '**input factor**' is any element that can be changed before running the model. In general, input factors could be the model's parameters, forcing input data, but also the very equations implemented in the model or other set-up choices (for instance, the spatial resolution) needed for the model execution on a computer.

An '**output**' is any variable that is obtained after the model execution.

The main steps to perform GSA are summarised in the figure below.
<img src="how_GSA_works.png" width="800px">
a. The input factors are sampled from their ranges of variability.

b. The model is run once for each sampled combination of input factors.

c. The output samples so obtained can be used to characterise the output uncertainty, for instance through a probability distribution or scatter plots.

d. GSA is applied to the input and output samples in order to obtain a set of sensitivity indices. The sensitivity indices measure the relative influence of each input factor on the uncertainty of the output ([References](#References) 4,5).
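
Steps a to d can be sketched on a toy problem. This is only an illustration under stated assumptions: `toy_model` is a hypothetical function, and the index used here (the share of output variance explained by the mean output within equal-size bins of one input) is a simple variance-based measure chosen for brevity, not the method applied later in this notebook.

```python
import numpy as np

def toy_model(x):
    """Hypothetical model: strong effect of x[0], weak effect of x[1], no effect of x[2]."""
    return 4.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

def binned_variance_index(x_i, y, n_bins=10):
    """Share of Var(y) explained by the mean of y within equal-size bins of x_i."""
    order = np.argsort(x_i)
    bins = np.array_split(y[order], n_bins)
    means = np.array([b.mean() for b in bins])
    sizes = np.array([b.size for b in bins])
    var_of_means = np.sum(sizes * (means - y.mean()) ** 2) / y.size
    return var_of_means / y.var()

rng = np.random.default_rng(0)
N, M = 5000, 3
lower = np.zeros(M)
upper = np.ones(M)

# (a) sample the input factors from their ranges of variability
X = lower + (upper - lower) * rng.random((N, M))
# (b) run the model once for each sampled combination
Y = np.array([toy_model(x) for x in X])
# (c) characterise the output uncertainty, here with a few percentiles
Y_percentiles = np.percentile(Y, [5, 50, 95])
# (d) compute one sensitivity index per input factor
S = np.array([binned_variance_index(X[:, i], Y) for i in range(M)])
```

The index of the first input should be close to 1, and the index of the third input close to 0, because the toy output does not depend on it.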

# PART II: Workflow application
### Step 1: Import Python modules

In [1]:
from __future__ import division, absolute_import, print_function

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy.stats as st
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from ipywidgets import widgets

# Import SAFE modules:
#import os
#mydir = "C:/Users/valen/OneDrive - University of Bristol/proj_SAFEVAL/SAFEPy/SAFEpython_v0.0.0"
#os.chdir(mydir + "/SAFEpython")

import SAFEpython.PAWN as PAWN # module to perform PAWN
import SAFEpython.plot_functions as pf # module to visualize the results
from SAFEpython.model_execution import model_execution # module to execute the model
from SAFEpython.sampling import AAT_sampling, AAT_sampling_extend # module to perform the input sampling
from SAFEpython.util import aggregate_boot  # function to aggregate the bootstrap results
from SAFEpython import HyMod


### Test model: Rainfall-runoff Hymod model

Before applying GSA, let's have a brief overview of the HyMod model, which is a parsimonious rainfall-runoff model based on the theory of runoff yield under infiltration excess. 

HyMod (Boyle 2001; Wagener et al. 2001) is composed of a soil moisture accounting routine and a flow routing routine, which in turn is composed of a fast and a slow routing pathway.

The model structure is illustrated in the figure below.

<img src="Hymod_scheme.png" width="700px">


### The following steps will be performed below:

### Step 2: Set up the model

Define: 
- the input factors whose influence is to be analysed with GSA, 
- their range of variability (choice made by expert judgement, available data or previous studies),
- choice of their distributions.

In [2]:
M = 5 # number of input factors
N = 2000 # number of samples
distr_par  = [np.nan] * M
col_names  = ["$S_M$" , "$beta$" , "$alpha$", "$R_s$", "$R_F$"] # input factors of interest
distr_fun  = st.uniform # uniform distribution
samp_strat = 'lhs' # Latin Hypercube
fun_test   = HyMod.hymod_nse

In [3]:
# range of variability
data = [["mm", 1,   400, "Maximum storage capacity"],
        ["-",  0,   2,   "Degree of spatial variability of the $S_M$"],
        ["-",  0,   1,   "Factor distributing slow and quick flows"],
        ["-",  0,   0.1, "Fractional discharge of the slow release reservoir"],
        ["-",  0.1, 1,   "Fractional discharge of the quick release reservoirs"]]
model_inputs = pd.DataFrame(data, 
                           columns=["Unit", "Min value", "Max value", "Description"],
                           index = col_names)
model_inputs

| | Unit | Min value | Max value | Description |
|---|---|---|---|---|
| $S_M$ | mm | 1.0 | 400.0 | Maximum storage capacity |
| $beta$ | - | 0.0 | 2.0 | Degree of spatial variability of the $S_M$ |
| $alpha$ | - | 0.0 | 1.0 | Factor distributing slow and quick flows |
| $R_s$ | - | 0.0 | 0.1 | Fractional discharge of the slow release reser... |
| $R_F$ | - | 0.1 | 1.0 | Fractional discharge of the quick release rese... |


In [4]:
# Load data:
data = pd.read_csv('hist_clim_data.csv')
rain = np.array(data['Rain'])[0:365] # 1-year simulation
evap = np.array(data['PET'])[0:365]
flow = np.array(data['Rain'])[0:365]*10
warmup = 30 # Model warmup period (days)

In [5]:
class setup_model:
    def __init__(self, x1, x2, x3, x4, x5):
        # The shape parameters of the uniform distribution are the lower limit and 
        # the difference between lower and upper limits:
        self.xmin = [x1.value[0], x2.value[0], x3.value[0], x4.value[0], x5.value[0]]
        self.xmax = [x1.value[1], x2.value[1], x3.value[1], x4.value[1], x5.value[1]]
        for i in range(M):
            distr_par[i] = [self.xmin[i], self.xmax[i] - self.xmin[i]]
        self.distr_par = distr_par
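
The `[lower, upper - lower]` convention in the comment above follows the `loc`/`scale` parameterisation of `scipy.stats.uniform`, whose support is `[loc, loc + scale]`. A minimal check, using the range of $S_M$ from the table in Step 2 (the variable names are illustrative only):

```python
import scipy.stats as st

lower, upper = 1.0, 400.0          # range of S_M from the table in Step 2
par = [lower, upper - lower]       # [loc, scale], as stored in distr_par
dist = st.uniform(*par)

# End points of the distribution: the 0% and 100% quantiles
support = dist.ppf([0.0, 1.0])

# Common mistake: passing the upper limit itself as the second parameter
# shifts the upper end of the support to lower + upper
wrong_upper = st.uniform(lower, upper).ppf(1.0)
```

Here `support` spans 1 to 400 as intended, whereas `wrong_upper` is 401.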

### Step 3: Sample inputs space

The number of model runs ($N$) typically increases in proportion to the number of input factors ($M$) and also depends on the GSA method chosen. 

In [6]:
class sample_input:
    def __init__(self,distr_par):
        self.X = AAT_sampling(samp_strat, M, distr_fun, distr_par, N)
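
For readers unfamiliar with the `'lhs'` strategy, the idea of Latin Hypercube sampling can be sketched in plain NumPy. This is not the SAFE implementation behind `AAT_sampling`; `latin_hypercube` is a hypothetical helper that only handles uniform distributions, and the bounds are the ones listed in the table in Step 2.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, rng):
    """Draw n_samples points with exactly one point per equal-width stratum in each dimension."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_dims = lower.size
    # One random position inside each of the n_samples strata of [0, 1), per dimension
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in each dimension to decouple the columns
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    # Rescale from the unit hypercube to the requested ranges
    return lower + u * (upper - lower)

rng = np.random.default_rng(42)
lower = [1, 0, 0, 0, 0.1]      # 'Min value' column of the table in Step 2
upper = [400, 2, 1, 0.1, 1]    # 'Max value' column of the table in Step 2
X_lhs = latin_hypercube(2000, lower, upper, rng)

# On the unit hypercube, every stratum index appears once per column
U = latin_hypercube(50, [0, 0, 0], [1, 1, 1], rng)
strata = np.sort(np.floor(U * 50).astype(int), axis=0)
```

Compared with plain random sampling, this stratification guarantees that the full range of every input factor is covered even for a moderate number of samples.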

### Step 4: Run the model

For each sampled input factors combination, we run the model and save the associated model output.

In [7]:
class run_model:
    def __init__(self,X):
        self.Y = model_execution(fun_test, X, rain, evap, flow, warmup)
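
Conceptually, this step evaluates one scalar output per row of the input sample. The sketch below illustrates that pattern with hypothetical stand-ins (`run_all`, `toy_model_nse`); it is not the SAFE `model_execution` code, and it assumes that the output is the Nash-Sutcliffe efficiency, as the name `hymod_nse` suggests.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 for a perfect fit, 0 for a fit no better than the mean of obs."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def toy_model_nse(x, forcing, obs, warmup):
    """Hypothetical linear 'model', scored against observations after discarding a warm-up period."""
    sim = x[0] * forcing + x[1]
    return nse(sim[warmup:], obs[warmup:])

def run_all(fun, X, *args):
    """Evaluate fun once per row of X and return the outputs as a column vector."""
    return np.array([fun(x, *args) for x in X]).reshape(-1, 1)

rng = np.random.default_rng(1)
forcing = rng.random(100)
obs = 2.0 * forcing + 0.5      # synthetic observations generated with x = (2.0, 0.5)

X_toy = np.array([[2.0, 0.5],  # the parameter set that generated the observations
                  [1.0, 0.5],
                  [2.0, 0.0]])
Y_toy = run_all(toy_model_nse, X_toy, forcing, obs, 10)
```

Only the first row reproduces the synthetic observations, so it is the only one with an efficiency of 1.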

### Step 6: Check model behaviour by visualising input/output samples

Scatterplots are plotted to visualise the behaviour of the output over each input factor in turn.

Definition of interactivity

In [8]:
def update_figures(change):
    with fig1.batch_update():
        distr_par = setup_model(x1, x2, x3, x4, x5).distr_par
        X = sample_input(distr_par).X
        Y = run_model(X).Y
        KS_median, _, _, KS_dummy = PAWN.pawn_indices(X, Y, n, dummy = True)

        # Traces 0..M-1 hold the input factors; trace M holds the dummy input
        for i in range(M):
            fig1.data[i].y = [KS_median[i]]
        fig1.data[M].y = KS_dummy

Definition of the sliders

In [16]:
x1 = widgets.FloatRangeSlider(value = [50, 150], min = 0, max = 200, step = 1, description = model_inputs.index[0], 
                              readout_format = '.1f', continuous_update=False)
x1.observe(update_figures,names = 'value')

x2 = widgets.FloatRangeSlider(value = [0.5, 1.5], min = 0, max = 2, step = 0.1, description = model_inputs.index[1], 
                              readout_format = '.1f', continuous_update=False)
x2.observe(update_figures,names = 'value')

x3 = widgets.FloatRangeSlider(value = [0.2, 0.8], min = 0, max = 1, step = 0.1, description = model_inputs.index[2], 
                              readout_format = '.1f', continuous_update=False)
x3.observe(update_figures,names = 'value')

# Two decimals in the readout, otherwise a step of 0.01 is not visible
x4 = widgets.FloatRangeSlider(value = [0.02, 0.08], min = 0, max = 0.1, step = 0.01, description = model_inputs.index[3], 
                              readout_format = '.2f', continuous_update=False)
x4.observe(update_figures,names = 'value')

x5 = widgets.FloatRangeSlider(value = [0.2, 0.8], min = 0, max = 1, step = 0.1, description = model_inputs.index[4], 
                              readout_format = '.1f', continuous_update=False)
x5.observe(update_figures,names = 'value')

Nboot = widgets.IntSlider(value = 2, min = 2, max = 200, step = 1, description = "Nboot", 
                              continuous_update=False)
Nboot.observe(update_figures,names = 'value')

In [17]:
n=10
distr_par = setup_model(x1, x2, x3, x4, x5).distr_par
X = sample_input(distr_par).X
Y = run_model(X).Y
KS_median, _, _, KS_dummy = PAWN.pawn_indices(X, Y, n, dummy = True)

### Step 7: Plot sensitivity indices

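The sensitivity indices plotted here are the `KS_median` values returned by `PAWN.pawn_indices`: for each input factor, the median over the conditioning intervals of the maximum vertical distance (the Kolmogorov-Smirnov statistic) between the unconditional and the conditional output CDFs.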
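
The maximum vertical distance between two CDFs is the Kolmogorov-Smirnov (KS) statistic. The sketch below computes it between the distribution of all outputs and the distribution of the outputs obtained when one input is restricted to a sub-interval of its range, which is our reading of what `KS_median` and `n` in the code above refer to. It is a simplified illustration on a hypothetical toy output, not the SAFE implementation.

```python
import numpy as np

def ks_distance(a, b):
    """Maximum vertical distance between the empirical CDFs of samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

def conditional_ks(x, y, n_intervals=10):
    """KS distance between all outputs and the outputs within each equal-size interval of x."""
    order = np.argsort(x)
    return np.array([ks_distance(y, y[idx])
                     for idx in np.array_split(order, n_intervals)])

rng = np.random.default_rng(0)
N = 4000
x_strong = rng.random(N)       # input that drives the output
x_none = rng.random(N)         # input with no effect on the output
y = 3.0 * x_strong + 0.1 * rng.standard_normal(N)

# Summarise the distances over the intervals with the median
ks_strong = np.median(conditional_ks(x_strong, y))
ks_none = np.median(conditional_ks(x_none, y))
```

Restricting an influential input shifts the output distribution, so `ks_strong` is large, while `ks_none` stays small and reflects only sampling noise. This is also why an index for a dummy input is a useful reference level.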

Definition of the figure

In [18]:
fig1 = go.FigureWidget(layout = dict(width=700, height=500,showlegend = False))
for i in range(M):
    fig1.add_trace(go.Box(y=[KS_median[i]], name = model_inputs.index[i]))

fig1.add_trace(go.Box(y=KS_dummy, name = 'dummy'))
fig1.layout.yaxis.range=[0,1]

#### Plot the interactive figures + sliders

In [19]:
widgets.VBox([widgets.VBox([widgets.VBox([x1,x2,x3,x4,x5]),fig1])])


### <a id="References"></a>References


The analysis is based on functions created as part of the SAFE Toolbox by F. Pianosi, F. Sarrazin and T. Wagener at Bristol University (2015). Please refer to the `Licence` file in the SAFE toolbox. 

1) [SAFE Toolbox Website](https://www.safetoolbox.info/)	

2) [Introductory paper to SAFE - Pianosi et al. (2015)](https://www.sciencedirect.com/science/article/pii/S1364815215001188)

3) [A review of available methods and workflows for Sensitivity Analysis - Pianosi et al. (2016)](https://www.sciencedirect.com/science/article/pii/S1364815216300287)

4) [What has Global Sensitivity Analysis ever done for us? A systematic review to support scientific advancement and to inform policy-making in earth system modelling - Wagener and Pianosi (2019)](https://www.sciencedirect.com/science/article/pii/S0012825218300990)

5) [Practical guide through the critical choices needed for Global Sensitivity Analysis - Noacco et al. (2019)](https://www.sciencedirect.com/science/article/pii/S2215016119302572?via%3Dihub)