# Setting up a basic EasyVVUQ campaign

If this is your first Jupyter Notebook, you can execute code cells by selecting them and pressing `Shift+Enter`. Keep in mind that the order of execution matters whenever later cells depend on things done in earlier ones.

**Note**: if you installed EasyVVUQ using a virtual environment, make sure that:

1) you activated the virtual environment, and
2) you started this notebook with `myenv/bin/jupyter-lab`.

EasyVVUQ is a Python library designed to facilitate verification, validation and uncertainty quantification (VVUQ) for a wide variety of simulations. It was first conceived and developed within the EU-funded [VECMA](https://www.vecma.eu/) (Verified Exascale Computing for Multiscale Applications) project, and further developed in the UK-funded [SEAVEA project](https://www.seavea-project.org/).

Although convenient for simple cases, EasyVVUQ is particularly well suited to situations where the simulations are computationally expensive, heterogeneous computing resources are necessary, the sampling space is very large, or book-keeping is prohibitively complex. It coordinates execution using an efficient database, it is fault tolerant, and all progress can be saved.

Here are some examples of questions EasyVVUQ can answer about your code:

* Given the uncertainties in input parameters, what is the distribution of the output?
* What percentage of the output variance does each input parameter contribute?
* Can a surrogate model be constructed that is cheaper to evaluate than the complete simulation?

In this tutorial we focus on the Polynomial Chaos and Stochastic Collocation samplers. We will test them on the following advection-diffusion equation:

\begin{align*}
\boxed{
\frac{du}{dx} - \frac{1}{Pe}\frac{d^2u}{dx^2} = f}
\end{align*}

This ODE describes the velocity $u(x)$ on the spatial domain $x\in[0,1]$. Homogeneous boundary conditions are applied: $u(0)=u(1)=0$. The solution $u$ depends on two parameters:

* $Pe$: the so-called **Peclet number**, defined as the ratio between the rate of advection and the rate of diffusion,
* $f$: the constant forcing term.

A numerical solver (finite elements) of this equation is found in `advection_diffusion_model/advection_diffusion.py`.
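
For constant $Pe > 0$ and $f$, this boundary value problem also has a closed-form solution, $u(x) = f\left(x - \frac{e^{Pe\,x}-1}{e^{Pe}-1}\right)$, which can serve as a sanity check on any numerical result. The sketch below is not part of the tutorial's solver; the function name `exact_solution` and the coarse grid are chosen here purely for illustration.

```python
import math


def exact_solution(x, Pe=100.0, f=1.0):
    """Closed-form solution of u' - u''/Pe = f with u(0) = u(1) = 0."""
    # Written with decaying exponentials so that a large Pe does not overflow:
    # (exp(Pe*x) - 1) / (exp(Pe) - 1)
    #     == (exp(Pe*(x - 1)) - exp(-Pe)) / (1 - exp(-Pe))
    boundary_layer = (math.exp(Pe * (x - 1.0)) - math.exp(-Pe)) / (1.0 - math.exp(-Pe))
    return f * (x - boundary_layer)


# Evaluate on a coarse grid; the steep drop near x = 1 is the boundary layer.
grid = [i / 10 for i in range(11)]
profile = [exact_solution(x) for x in grid]
for x, u in zip(grid, profile):
    print(f"x = {x:.1f}  u = {u:.4f}")
```

For large $Pe$ the profile follows $u \approx f\,x$ over most of the domain and drops to zero in a thin layer near $x=1$.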

**Note**: While we are using a Python UQ library on a Python ODE model, this does not need to be the case. Models written in other programming languages are supported as well. The only requirements of EasyVVUQ are:

* The model can be executed from the command line.
* The model reads its input values from a file.
* The model stores its output values to a (CSV / HDF5 / JSON) file.

The input file of our advection-diffusion model is a simple JSON file, but the particular format is not important. It is located in `advection_diffusion_model/input.json`. The input file looks like this:

In [None]:
with open('./advection_diffusion_model/input.json', 'r') as f:
    print(f.read())

We can see that the default values of $Pe$ and $f$ are 100 and 1 respectively. Our goal now is:

**Study the effect of uncertainties in $Pe$ and $f$ on the velocity profile $u(x)$**

We require the following imports to do so:

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt

import chaospy as cp
import easyvvuq as uq
from easyvvuq.actions import CreateRunDirectory, Encode, Decode, ExecuteLocal, Actions

### Flags

Here are some flags that we'll use:

* `HOME`: simply the current directory,
* `WORK_DIR`: this is where all the EasyVVUQ ensemble runs will be stored,
* `CAMPAIGN_NAME`: this is the name of the EasyVVUQ campaign, explained later.

In [None]:
# home directory
HOME = os.getcwd()
# work directory, where the EasyVVUQ campaign directory will be placed
WORK_DIR = '/tmp'
# EasyVVUQ Campaign name
CAMPAIGN_NAME = 'adv_diff'

### Define parameter space

We will have to define a dictionary that minimally describes the type (typically `float` or maybe `integer`) of each input, as well as its default value. Below we create the `params` dict for the Peclet number $Pe$ and forcing term $f$.

In [None]:
# Define parameter space
params = {
    "Pe": {
        "type": "float",
        "default": 100.0},
    "f": {
        "type": "float",
        "default": 1.0}}

### Choose input distributions

Here we assign a probability density function to inputs from the `params` dict.  EasyVVUQ uses the [Chaospy](https://chaospy.readthedocs.io/en/master/) library for this purpose. A list of available distributions can be found [here](https://chaospy.readthedocs.io/en/master/reference/distribution/index.html).

All parameters that we actually want to vary are stored in the `vary` dict. These must also occur in the `params` dict. The converse is not true: if we only wish to vary a subset of the inputs in `params`, we simply omit the inputs we do not want to change from `vary`. These excluded inputs will automatically be assigned their default value in all ensemble runs.

**Assignment**: Assign a uniform distribution to both inputs, using the `cp.Uniform` distribution, bounded to $\pm10\%$ of the default values:

In [None]:
vary = 
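
As a small worked example of the arithmetic only, the sketch below computes the $\pm10\%$ bounds from the defaults in `params`. It deliberately stops short of building the distributions; the names `REL_WIDTH` and `bounds` are introduced here for illustration and are not part of EasyVVUQ or Chaospy.

```python
# Same structure as the `params` dict defined earlier in this notebook.
params = {
    "Pe": {"type": "float", "default": 100.0},
    "f": {"type": "float", "default": 1.0},
}

REL_WIDTH = 0.10  # +/-10% around each default

# (lower, upper) bounds per input: the two numbers from which a uniform
# distribution on [lower, upper] would be built.
bounds = {
    name: (spec["default"] * (1 - REL_WIDTH), spec["default"] * (1 + REL_WIDTH))
    for name, spec in params.items()
}

for name, (lower, upper) in bounds.items():
    print(f"{name}: [{lower:.3g}, {upper:.3g}]")
```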

### Input file encoding

We require some way of feeding new parameter values to whatever simulation code we are examining. This is done by creating an **input template**. The `GenericEncoder` class can read this template and fill it with input samples drawn from the distributions of the `vary` dict. It takes 3 arguments:

* `template_fname` (string): the name of the input template
* `delimiter` (string): the character that marks a flag in the template for the encoder to replace
* `target_filename` (string): the name of the input file, in our case `input.json`

In the input template, every delimiter should be followed by a parameter name (one of the keys of `vary`), such that the encoder can replace each flag with an input value.
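
The substitution idea can be illustrated with the standard library's `string.Template`, whose default delimiter happens to be `$`. This is only a sketch of the concept, not EasyVVUQ's implementation, and the template text below is hypothetical: it assumes the JSON keys are named `Pe` and `f`.

```python
import json
from string import Template

# Hypothetical template text; the key names are assumed, not taken from input.json.
template_text = '{"Pe": $Pe, "f": $f}'

# One input sample, for example as drawn from the distributions in `vary`.
sample = {"Pe": 95.0, "f": 1.05}

# Replace every $name flag with the corresponding sample value.
filled = Template(template_text).substitute(sample)
print(filled)

# The filled text is again a valid JSON input file.
decoded = json.loads(filled)
```

Each run in the ensemble receives its own filled copy of the template, written under the target filename.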

**Assignment**: 

* Set the delimiter to `"$"`
* Copy the input file `advection_diffusion_model/input.json` to `advection_diffusion_model/input.template`
* Change the default parameters in the template into flags that the encoder can read.
* Create an encoder object using the class below. Again, the arguments are written above, and you can also use shift+tab to view the docstring of the class.
  

In [None]:
encoder = uq.encoders.GenericEncoder(...)

### Local ensemble execution

Our advection-diffusion model is just a toy model that can be executed locally. For this purpose we can use the EasyVVUQ `ExecuteLocal` class. It simply takes the command-line instruction as a string argument.

**Assignment**: write the proper command-line instruction, making sure you use an absolute path to the advection-diffusion model

In [None]:
cmd = 
execute = ExecuteLocal(cmd)

### Output file decoding

The model writes the solution $u(x)$ to a CSV output file `output.csv`, which has a single column with header `u`:

In [None]:
# print output file
with open('./advection_diffusion_model/output.csv', 'r') as f:
    print(f.read())

To read the output, we will use the `SimpleCSV` decoder. It takes 2 arguments:

* `target_filename` (string): the name of the output file
* `output_columns` (list of strings): the column names of the CSV file that we wish to load
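
Conceptually, decoding a CSV file means turning the requested columns into numeric arrays. The sketch below shows that idea with the standard library only; it is not the `SimpleCSV` implementation, and the CSV text is made-up stand-in data rather than real model output.

```python
import csv
import io

# Stand-in for the contents of output.csv: one column with header `u`.
csv_text = "u\n0.0\n0.25\n0.5\n0.0\n"

output_columns = ["u"]

# Collect each requested column as a list of floats, one entry per row.
reader = csv.DictReader(io.StringIO(csv_text))
decoded = {col: [] for col in output_columns}
for row in reader:
    for col in output_columns:
        decoded[col].append(float(row[col]))

print(decoded)
```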

**Assignment**: Fill out the decoder arguments:

In [None]:
decoder = uq.decoders.SimpleCSV(...)

**Note**: the encoder and decoder we have used will often suffice, yet are fairly basic. More elaborate encoders/decoders are available, see [here](https://github.com/UCL-CCS/EasyVVUQ/blob/dev/tutorials/encoder_decoder_tutorial.ipynb) for a tutorial. If you want to work with outputs of various sizes, you can use the `uq.decoders.HDF5` decoder, which takes the same arguments as the CSV decoder above.

### Actions: creating a sequence of steps

A typical forward UQ workflow consists of the following steps:

1) Create directories for the different runs in the ensemble
2) Encode the input files
3) Execute the ensemble
4) Decode the output
5) Perform postprocessing on the results

Steps 1-4 are strung together in an `Actions` object as follows:

In [None]:
actions = Actions(CreateRunDirectory(WORK_DIR, flatten=True),
                  Encode(encoder), 
                  execute,
                  Decode(decoder))

These actions will be executed in a UQ campaign.

### Campaign: putting everything together

A campaign is the central EasyVVUQ object which combines all elements. It requires the working directory, a name, the `params` definition and the defined `actions`.

In [None]:
campaign = uq.Campaign(work_dir=WORK_DIR, name=CAMPAIGN_NAME,
                       params=params, actions=actions)

### Selecting a sampling method

The sampling method is added separately to the campaign. In this case we use a Polynomial Chaos sampler.

**Question**: the polynomial order is set to 3. How many times will we sample the advection-diffusion model?

In [None]:
sampler = uq.sampling.PCESampler(vary=vary, polynomial_order=3)
campaign.set_sampler(sampler)
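
For the question above, a quick count can be made under one assumption: that the sampler uses a full tensor-product quadrature grid with `polynomial_order + 1` nodes per input. Sparse grids or other quadrature rules would give a different number, so treat this as a sketch of the reasoning rather than a statement about every sampler configuration.

```python
from itertools import product

polynomial_order = 3
n_inputs = 2  # Pe and f

# Assumption: order + 1 quadrature nodes per input, combined as a full grid.
nodes_per_input = polynomial_order + 1
grid = list(product(range(nodes_per_input), repeat=n_inputs))

# Each grid point corresponds to one model evaluation.
n_runs = len(grid)
print(n_runs)
```

The count grows as `(polynomial_order + 1) ** n_inputs`, which is why full tensor grids become expensive when many inputs are varied.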

### Executing the actions

Finally, to execute everything that we defined in `actions`, run the following command:

In [None]:
campaign.execute().collate()

### Retrieving the raw results

The raw results are stored in a Pandas dataframe. We can retrieve this via:

In [None]:
df = campaign.get_collation_result()
df

### Analysis: postprocessing the results

The following command runs the post-processing subroutines:

In [None]:
results = campaign.analyse(qoi_cols=['u'])

Getting code samples:

In [None]:
# advection diffusion runs
code_samples = results.samples['u'].values
# spatial domain of 1 run
xx = np.linspace(0, 1, code_samples.shape[1])

### Plotting moments

The following command shows a list of supported statistics:

In [None]:
results.supported_stats()

**Assignment**: using `results.describe(qoi = <output name>, statistic = <stat name>)` extract the mean and standard deviation. Use this to make a plot of the uncertainty around the mean, as a function of the spatial domain `xx`.

**Question**: This is a common way of visualizing the uncertainty. Is it always a good idea?
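
One thing to keep in mind when answering: a mean $\pm$ standard deviation band implicitly suggests a symmetric distribution. The toy numbers below are made up (they are not model output) and show a band that extends below zero even though every sample is positive.

```python
import statistics

# Made-up, strictly positive samples with one large outlier (skewed).
samples = [0.1, 0.1, 0.2, 0.2, 0.3, 3.0]

sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)

# Lower and upper edge of the mean +/- one standard deviation band.
lower, upper = sample_mean - sample_std, sample_mean + sample_std
print(f"mean = {sample_mean:.3f}, std = {sample_std:.3f}")
print(f"band = [{lower:.3f}, {upper:.3f}], smallest sample = {min(samples)}")
```

For skewed outputs, percentiles may describe the spread more faithfully than a symmetric band.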

In [None]:
mean = 

### Fast sampling using the surrogate

The command below extracts the polynomial chaos expansion (PCE), which can be used as a surrogate model for the real code.

**Assignment**: create a `random_inputs` dictionary `{'Pe': 1000 random Pe samples, 'f': 1000 random f samples}`. You can sample the inputs by `vary['Pe'].sample(1000)`.

Feed these random inputs to the surrogate and plot the surrogate samples alongside the real code samples, which were extracted above. Does the surrogate provide decent looking samples?

In [None]:
surrogate = results.surrogate()



### Plotting the Sobol indices

The command below will extract the first-order Sobol indices.
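
As a reminder of what these indices measure, the first-order Sobol index of input $X_i$ is $S_i = \mathrm{Var}(\mathbb{E}[Y \mid X_i]) / \mathrm{Var}(Y)$, the fraction of the output variance explained by $X_i$ alone. The sketch below evaluates this analytically for a toy additive model with independent uniform inputs; the model and the coefficients `a` and `b` are illustrative assumptions and unrelated to the advection-diffusion equation.

```python
def uniform_variance(lower, upper):
    """Variance of a uniform distribution on [lower, upper]."""
    return (upper - lower) ** 2 / 12.0


# Toy additive model Y = a * X1 + b * X2 with independent uniform inputs.
a, b = 2.0, 1.0
var_x1 = uniform_variance(0.0, 1.0)
var_x2 = uniform_variance(0.0, 1.0)

# For an additive model, Var(E[Y | X1]) = a^2 Var(X1), and likewise for X2.
var_y = a**2 * var_x1 + b**2 * var_x2
sobol_first = {
    "X1": a**2 * var_x1 / var_y,
    "X2": b**2 * var_x2 / var_y,
}
print(sobol_first)
```

In this additive case the first-order indices sum to one; when inputs interact, the sum falls below one and the remainder is attributed to higher-order indices.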

**Assignment**: Plot these in the spatial domain. What can you conclude about the importance of `Pe`?

In [None]:
sobols_first = results.sobols_first()


### Switching samplers

**Assignment**: Repeat the analysis, except this time with the Stochastic Collocation sampler. Hint: don't overcomplicate, you could complete this assignment in 30 seconds or less.