The purpose of this notebook is to guide the user through the src package. The package is intended to be run from the command line, without modification of intermediate output. Some of the script-generated objects (such as the trained emulators and MCMC chains) can be accessed in other Python scripts or Jupyter notebooks; the details are below. Most of the user input goes into specifying objects in the \__init\__ file.

The remainder of the document provides the bash commands that execute the scripts of src. Note that for bash commands to be run in Jupyter Notebook, they must be preceded by "!"; when running the commands in a terminal, leave out the exclamation point.

## Setup - \__init\__ file

The majority of the work for the user comes in setting up the \__init\__.py file. This script is imported by all other scripts, and this is where the user supplies their own data.

This script need not be called by the user, but it must be edited by the user. Listed below are the objects which must be changed along with a short description, though the user is encouraged to read the documentation for specifics.

* _systems_ - List of strings containing the collision systems involved. See the documentation for specifics.
* _keys_ - List of strings containing the input parameters
* _labels_ - List of strings containing the LaTeX labels for the input parameters
* _ranges_ - List of tuples containing the minimum and maximum for each input parameter
* *design_array* - numpy array containing the design. If the default of None is left unchanged, the code will generate a design from a Latin hypercube.
* *data_list* - Dictionary of computer model output. Should be of the form \[collision_system\]\[observable\]\[subobservable\]\[{'Y':, 'x':}\]. This __must__ be changed from None.
  * 'Y' is the 2D numpy array of output with rows corresponding to the rows of design_array
  * 'x' is the 1D numpy array of indexing values of the columns of 'Y'
* *exp_data_list* - Dictionary of experimental data. Should be of the form \[collision_system\]\[observable\]\[subobservable\]\[{'y':, 'x':, 'yerr': {'stat':, 'sys':}}\]. This __must__ be changed from None.
  * 'y' is the 1D numpy array of experimental output
  * 'x' is the 1D numpy array of indexing values of the entries of 'y'
  * 'yerr' is a dictionary with keys 'stat' and 'sys'
    * 'stat' is a 1D numpy array of statistical errors of the experimental data
    * 'sys' is a 1D numpy array of systematic errors of the experimental data
* *exp_cov* - 2D numpy array containing the experimental covariance matrix. It is recommended that the user specify this.
  * If left unspecified (default), it will be calculated in _mcmc.py_ as follows:
    * Each block between observables will be independent (0 matrix)
    * Within an observable, blocks of subobservables will be independent unless specified in _mcmc.py_
    * Within a subobservable, the diagonal will be the sum of statistical and systematic error, and the $ij$th off-diagonal entry will be the covariance calculated from the distance between $x_i$ and $x_j$ using a squared-exponential covariance function.
* *observables* - List of 2-tuples containing observable/subobservable pairs
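
As a rough illustration of the shapes involved, the sketch below fills in these objects with placeholder values. The parameter names, observable names, and all numbers are hypothetical and chosen only for illustration, and the small `latin_hypercube` helper is a stand-in written for this example, not the package's own design generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical names and values throughout; replace with your own.
systems = ['PbPb5020']
keys = ['param_a', 'param_b']
labels = [r'$\alpha$', r'$\beta$']
ranges = [(0.01, 0.3), (0.05, 0.35)]


def latin_hypercube(npoints, ranges, rng):
    """Illustrative Latin hypercube: one point per equal-width bin per dimension."""
    ndim = len(ranges)
    # Stratified samples in [0, 1): row k falls in bin k of every column.
    unit = (np.arange(npoints)[:, None] + rng.random((npoints, ndim))) / npoints
    # Shuffle each column independently so bins are paired at random.
    for d in range(ndim):
        unit[:, d] = rng.permutation(unit[:, d])
    lo = np.array([r[0] for r in ranges])
    hi = np.array([r[1] for r in ranges])
    return lo + unit * (hi - lo)


design_array = latin_hypercube(20, ranges, rng)  # shape (20, 2)

# Placeholder "model output": one row per design point, one column per x value.
x = np.array([2.5, 7.5, 15.0, 25.0])
Y = design_array[:, :1] * x + design_array[:, 1:]  # shape (20, 4)
data_list = {'PbPb5020': {'obs': {'sub': {'Y': Y, 'x': x}}}}

# Placeholder "experimental data" on the same x grid.
y = 0.15 * x + 0.2
exp_data_list = {
    'PbPb5020': {'obs': {'sub': {
        'y': y,
        'x': x,
        'yerr': {'stat': 0.05 * np.abs(y), 'sys': 0.10 * np.abs(y)},
    }}}
}

observables = [('obs', 'sub')]
```

The key consistency checks are that 'Y' has as many rows as *design_array* and as many columns as 'x', and that 'y', 'stat', and 'sys' all have the same length as the experimental 'x'.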

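The default covariance structure described above can be sketched roughly as follows. This is only one plausible reading of that description, not the code in _mcmc.py_: it assumes the errors enter as variances, that only the systematic part is correlated across $x$, and that a correlation length `corr_length` is supplied by the user. The function names are hypothetical.

```python
import numpy as np


def subobservable_cov(x, stat, syst, corr_length):
    """Covariance block for one subobservable (illustrative assumptions only).

    Diagonal: stat**2 + syst**2.
    Off-diagonal (i, j): syst_i * syst_j * exp(-(x_i - x_j)**2 / (2 * corr_length**2)).
    """
    x = np.asarray(x, dtype=float)
    stat = np.asarray(stat, dtype=float)
    syst = np.asarray(syst, dtype=float)
    dx = x[:, None] - x[None, :]
    # Squared-exponential correlation between points x_i and x_j.
    corr = np.exp(-0.5 * (dx / corr_length) ** 2)
    return np.diag(stat ** 2) + np.outer(syst, syst) * corr


def block_diagonal(blocks):
    """Stack independent blocks along the diagonal; cross-blocks stay zero."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        stop = start + b.shape[0]
        out[start:stop, start:stop] = b
        start = stop
    return out


# Example with made-up numbers: two independent subobservables on the same grid.
x = np.array([1.0, 2.0, 3.0, 4.0])
stat = 0.1 * np.ones(4)
syst = 0.2 * np.ones(4)
block = subobservable_cov(x, stat, syst, corr_length=1.0)
exp_cov = block_diagonal([block, block])  # shape (8, 8)
```

A matrix built this way is symmetric and positive definite as long as the statistical errors are nonzero, which is what the likelihood evaluation needs.
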
## Building the Emulators - the emulator module

With design and data specified in the \__init\__ file, training the emulators is simple. Run the following command:

In [3]:
! python3 -m src.emulator

[INFO][emulator] training emulator for system PbPb5020 (10 PC, 0 restarts)
[0.29 0.3 ]
Emulator design:
[[0.01       0.05      ]
 [0.16058499 0.08485314]
 [0.30073974 0.13216592]
 [0.04547517 0.29132764]
 [0.15654149 0.19207646]
 [0.06263697 0.3387965 ]
 [0.1226533  0.14429201]
 [0.19871829 0.29998946]
 [0.07360878 0.10498264]
 [0.119071   0.27228267]
 [0.02518607 0.21059398]
 [0.24523197 0.24578168]
 [0.05462338 0.18687813]
 [0.01564432 0.25668644]
 [0.09639561 0.35834248]
 [0.01073974 0.15849388]
 [0.07697239 0.23242124]
 [0.04220857 0.11574906]
 [0.09144587 0.05834248]
 [0.03163994 0.07155832]
 [0.02060277 0.31787042]
 [0.3        0.35      ]
 [0.01       0.35      ]
 [0.3        0.05      ]]
[INFO][emulator] writing cache file cache/emulator/PbPb5020.pkl
PbPb5020
10 PCs explain 0.99998 of variance
GP 0: 0.99206 of variance, LML = 4.7553, kernel: 6.07**2 * RBF(length_scale=[0.264, 0.646]) + WhiteKernel(noise_level=0.00345)
GP 1: 0.00645 of variance, LML = -22.392, kernel: 2.25**2 * 

To change the number of principal components from the default of 10, add the --npc flag. For example, to train the emulators on 3 components, run the following command:

In [5]:
! python3 -m src.emulator --npc 3

PbPb5020
10 PCs explain 0.99998 of variance
GP 0: 0.99206 of variance, LML = 4.7553, kernel: 6.07**2 * RBF(length_scale=[0.264, 0.646]) + WhiteKernel(noise_level=0.00345)
GP 1: 0.00645 of variance, LML = -22.392, kernel: 2.25**2 * RBF(length_scale=[0.0987, 0.22]) + WhiteKernel(noise_level=0.0583)
GP 2: 0.00048 of variance, LML = -33.544, kernel: 0.00316**2 * RBF(length_scale=[2.88, 3]) + WhiteKernel(noise_level=0.958)
GP 3: 0.00034 of variance, LML = -33.544, kernel: 0.00316**2 * RBF(length_scale=[1.63, 2.15]) + WhiteKernel(noise_level=0.958)
GP 4: 0.00021 of variance, LML = -33.544, kernel: 0.00316**2 * RBF(length_scale=[2.37, 3]) + WhiteKernel(noise_level=0.958)
GP 5: 0.00020 of variance, LML = -33.544, kernel: 0.00316**2 * RBF(length_scale=[1.55, 3]) + WhiteKernel(noise_level=0.958)
GP 6: 0.00012 of variance, LML = -33.544, kernel: 0.00316**2 * RBF(length_scale=[0.161, 0.03]) + WhiteKernel(noise_level=0.958)
GP 7: 0.00006 of variance, LML = -33.544, kernel: 0.00316**2 * RBF
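
One way to pick a value for --npc is to look at how much variance each principal component carries. The sketch below does this with a plain NumPy SVD on a synthetic output matrix. The function names and the test data are hypothetical, and the package may preprocess the outputs (for example by scaling them) before its own PCA, so the fractions it prints can differ from what this sketch gives on the same data.

```python
import numpy as np


def explained_variance_fractions(Y):
    """Fraction of total variance carried by each principal component of Y."""
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def npc_for_target(Y, target=0.99):
    """Smallest number of components whose cumulative fraction reaches target."""
    cumulative = np.cumsum(explained_variance_fractions(Y))
    npc = int(np.searchsorted(cumulative, target)) + 1
    return min(npc, len(cumulative))


# Synthetic example: 30 design points, 8 outputs driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
loadings = rng.normal(size=(2, 8))
Y = latent @ loadings + 0.01 * rng.normal(size=(30, 8))

fractions = explained_variance_fractions(Y)
npc = npc_for_target(Y, target=0.99)
print('cumulative fractions:', np.round(np.cumsum(fractions), 5))
print('components needed for 99%:', npc)
```

In the emulator summaries above, the trailing components each explain a tiny fraction of the variance and their fitted kernels are dominated by the WhiteKernel term, which is the kind of pattern that suggests a smaller --npc may be sufficient.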

The user can also control the number of restarts in the optimizer that estimates the GP hyperparameters. This is done with the --nrestarts flag.

__Important__: Once the emulators have been trained, they will be cached. After they are cached, the commands above will only print summaries to the console and will not retrain the emulators. To retrain the emulators, either delete the cached emulators or use the --retrain flag:

In [6]:
! python3 -m src.emulator --retrain

[INFO][emulator] training emulator for system PbPb5020 (10 PC, 0 restarts)
[0.29 0.3 ]
Emulator design:
[[0.01       0.05      ]
 [0.16058499 0.08485314]
 [0.30073974 0.13216592]
 [0.04547517 0.29132764]
 [0.15654149 0.19207646]
 [0.06263697 0.3387965 ]
 [0.1226533  0.14429201]
 [0.19871829 0.29998946]
 [0.07360878 0.10498264]
 [0.119071   0.27228267]
 [0.02518607 0.21059398]
 [0.24523197 0.24578168]
 [0.05462338 0.18687813]
 [0.01564432 0.25668644]
 [0.09639561 0.35834248]
 [0.01073974 0.15849388]
 [0.07697239 0.23242124]
 [0.04220857 0.11574906]
 [0.09144587 0.05834248]
 [0.03163994 0.07155832]
 [0.02060277 0.31787042]
 [0.3        0.35      ]
 [0.01       0.35      ]
 [0.3        0.05      ]]
[INFO][emulator] writing cache file cache/emulator/PbPb5020.pkl
PbPb5020
10 PCs explain 0.99998 of variance
GP 0: 0.99206 of variance, LML = 4.7553, kernel: 6.07**2 * RBF(length_scale=[0.264, 0.646]) + WhiteKernel(noise_level=0.00345)
GP 1: 0.00645 of variance, LML = -22.392, kernel: 2.25**2 * 

### Accessing trained emulators in Jupyter

The user may wish to access the emulator outside of the scripts in this tutorial. Practical reasons include prediction or sampling for validation methods, or use in a more specialized analysis. To access the cached emulator for the system 'PbPb5020' (for example), run the following lines in a Python environment:

In [7]:
from src import emulator
em = emulator.Emulator('PbPb5020')

[INFO][emulator] training emulator for system PbPb5020 (10 PC, 0 restarts)
[0.29 0.3 ]
Emulator design:
[[0.01       0.05      ]
 [0.16058499 0.08485314]
 [0.30073974 0.13216592]
 [0.04547517 0.29132764]
 [0.15654149 0.19207646]
 [0.06263697 0.3387965 ]
 [0.1226533  0.14429201]
 [0.19871829 0.29998946]
 [0.07360878 0.10498264]
 [0.119071   0.27228267]
 [0.02518607 0.21059398]
 [0.24523197 0.24578168]
 [0.05462338 0.18687813]
 [0.01564432 0.25668644]
 [0.09639561 0.35834248]
 [0.01073974 0.15849388]
 [0.07697239 0.23242124]
 [0.04220857 0.11574906]
 [0.09144587 0.05834248]
 [0.03163994 0.07155832]
 [0.02060277 0.31787042]
 [0.3        0.35      ]
 [0.01       0.35      ]
 [0.3        0.05      ]]


The object _em_ will now have all the functionality of the Emulator class, trained on the data specified in the \__init\__ file.

## Performing Calibration - the mcmc module

This module runs the MCMC scheme to calibrate the inputs to the experimental data, using the trained emulators as a statistical surrogate for the expensive computer model. The script uses the Python package emcee, which implements an affine-invariant ensemble sampler. The sampler requires a number of "walkers", which each run a chain in parallel, as well as a number of iterations to run each walker. A number of burn-in steps to be discarded must also be specified (these allow the sampler to "warm up"). To run this module with 500 walkers, 200 burn-in steps, and 300 post-burn-in steps, run the following command:

In [8]:
! python3 -m src.mcmc --nwalkers 500 --nburnsteps 200 300

  from ._conv import register_converters as _register_converters
[INFO][mcmc] no existing chain found, starting initial burn-in
[INFO][mcmc] running 500 walkers for 100 steps
[INFO][mcmc] step 10: acceptance fraction: mean 0.4218, std 0.1807, min 0.0000, max 0.9000
[INFO][mcmc] step 20: acceptance fraction: mean 0.4330, std 0.1345, min 0.0500, max 0.8000
[INFO][mcmc] step 30: acceptance fraction: mean 0.4487, std 0.1102, min 0.1000, max 0.7667
[INFO][mcmc] step 40: acceptance fraction: mean 0.4632, std 0.1003, min 0.1750, max 0.7500
[INFO][mcmc] step 50: acceptance fraction: mean 0.4759, std 0.0956, min 0.1800, max 0.7200
[INFO][mcmc] step 60: acceptance fraction: mean 0.4889, std 0.0913, min 0.2000, max 0.7500
[INFO][mcmc] step 70: acceptance fraction: mean 0.5011, std 0.0889, min 0.2143, max 0.7571
[INFO][mcmc] step 80: acceptance fraction: mean 0.5119, std 0.0863, min 0.2500, max 0.7625
[INFO][mcmc] step 90: acceptance fraction: mean 0.5221, std 0.0821, min 0.2444, max 0.7556
[INFO]

To run an additional 100 steps (for example) with the same walkers, simply remove the --nwalkers and --nburnsteps flags:

In [9]:
! python3 -m src.mcmc 100

  from ._conv import register_converters as _register_converters
[INFO][mcmc] restarting from last point of existing chain
[INFO][mcmc] running 500 walkers for 100 steps
[INFO][mcmc] step 10: acceptance fraction: mean 0.6476, std 0.1863, min 0.1000, max 1.0000
[INFO][mcmc] step 20: acceptance fraction: mean 0.6499, std 0.1415, min 0.1500, max 1.0000
[INFO][mcmc] step 30: acceptance fraction: mean 0.6524, std 0.1150, min 0.3000, max 0.9000
[INFO][mcmc] step 40: acceptance fraction: mean 0.6542, std 0.1016, min 0.3250, max 0.8750
[INFO][mcmc] step 50: acceptance fraction: mean 0.6564, std 0.0935, min 0.3800, max 0.8600
[INFO][mcmc] step 60: acceptance fraction: mean 0.6554, std 0.0862, min 0.3667, max 0.8667
[INFO][mcmc] step 70: acceptance fraction: mean 0.6531, std 0.0798, min 0.3571, max 0.8429
[INFO][mcmc] step 80: acceptance fraction: mean 0.6523, std 0.0756, min 0.3500, max 0.8375
[INFO][mcmc] step 90: acceptance fraction: mean 0.6542, std 0.0713, min 0.3333, max 0.8333
[INFO][mcmc

__Note__: To start a new chain from scratch (rather than continuing the existing one), you must first delete the chain.hdf file in the _mcmc/_ directory.
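
For intuition about what walkers, burn-in steps, and the acceptance fraction in the log mean, here is a toy affine-invariant "stretch move" sampler written in plain NumPy. It is a simplified sketch of the kind of algorithm emcee implements, not emcee itself and not the code in _mcmc.py_; the function name, the standard-normal target, and all settings are assumptions made for illustration.

```python
import numpy as np


def stretch_move_sampler(log_prob, p0, nsteps, a=2.0, seed=0):
    """Toy ensemble sampler using the stretch move (illustration only)."""
    rng = np.random.default_rng(seed)
    walkers = np.array(p0, dtype=float)
    nwalkers, ndim = walkers.shape
    logp = np.array([log_prob(w) for w in walkers])
    chain = np.empty((nsteps, nwalkers, ndim))
    naccepted = np.zeros(nwalkers)
    for step in range(nsteps):
        for k in range(nwalkers):
            # Pick a partner walker j different from k.
            j = rng.integers(nwalkers - 1)
            if j >= k:
                j += 1
            # Draw the stretch factor z from g(z) ~ 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            logp_new = log_prob(proposal)
            # Acceptance probability: min(1, z**(ndim - 1) * p(new) / p(old)).
            log_accept = (ndim - 1) * np.log(z) + logp_new - logp[k]
            if np.log(rng.random()) < log_accept:
                walkers[k] = proposal
                logp[k] = logp_new
                naccepted[k] += 1
        chain[step] = walkers
    return chain, naccepted / nsteps


def log_prob(theta):
    """Stand-in target: standard normal in each dimension."""
    return -0.5 * np.sum(theta ** 2)


nwalkers, ndim = 50, 2
nburnsteps, nsteps = 200, 300
rng = np.random.default_rng(42)
p0 = rng.uniform(-1.0, 1.0, size=(nwalkers, ndim))

chain, acceptance = stretch_move_sampler(log_prob, p0, nburnsteps + nsteps)
# Discard the burn-in steps, then flatten walkers and steps into rows.
posterior_samples = chain[nburnsteps:].reshape(-1, ndim)
print('acceptance fraction: mean %.4f' % acceptance.mean())
```

Each walker contributes one row per retained step, so the flattened array here has nwalkers * nsteps rows. In the real module the log-probability would come from the emulator predictions and the experimental covariance instead of the stand-in target.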

### Accessing posterior samples in Jupyter

The user may wish to access the posterior samples outside of the scripts in this tutorial. Practical reasons include making specific plots or computing posterior estimates of functions of the parameters. To access a saved chain, run the following commands in a Python environment:

In [10]:
from src import mcmc
chain = mcmc.Chain()
posterior_samples = chain.load()

The object _chain_ is an instance of the Chain class, while *posterior_samples* is a 2D numpy array where each row is a draw from the joint posterior distribution. For marginal posteriors, simply use single columns of *posterior_samples*.
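
As a small sketch of what can be done with such an array, the code below computes marginal medians and 90% credible intervals, plus the posterior of a derived quantity. Because a saved chain is not available here, `posterior_samples` is replaced by a synthetic stand-in with the same layout (one row per draw, one column per parameter); the parameter names and the `summarize` helper are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the array returned by chain.load().
rng = np.random.default_rng(1)
posterior_samples = rng.normal(
    loc=[0.15, 0.20], scale=[0.02, 0.05], size=(5000, 2)
)


def summarize(samples, names, level=90):
    """Median and central credible interval for each column of samples."""
    tail = (100 - level) / 2
    summary = {}
    for i, name in enumerate(names):
        lower, median, upper = np.percentile(
            samples[:, i], [tail, 50, 100 - tail]
        )
        summary[name] = {'lower': lower, 'median': median, 'upper': upper}
    return summary


summary = summarize(posterior_samples, ['param_a', 'param_b'])
for name, s in summary.items():
    print('%s: %.3f (%.3f, %.3f)' % (name, s['median'], s['lower'], s['upper']))

# Posterior of a function of the parameters: evaluate it on every draw.
product = posterior_samples[:, 0] * posterior_samples[:, 1]
print('median of product: %.4f' % np.median(product))
```

Evaluating a function row by row on the joint draws preserves the correlations between parameters, which would be lost if the marginal summaries were combined instead.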

## Plotting Results - the plots module

This package contains some basic plotting tools to visualize different aspects of the analysis. Plots are saved in the _plots_ directory. To create a plot, simply add its name as a positional argument. For example, to make the "posterior" plot, run the following command:

In [11]:
! python3 -m src.plots posterior

  from ._conv import register_converters as _register_converters
[INFO][plots] generating plot: posterior
  (prop.get_family(), self.defaultFamily[fontext]))
[INFO][plots] wrote plots/posterior.pdf


To see the names of the available plots, run:

In [12]:
! python3 -m src.plots --help

  from ._conv import register_converters as _register_converters
usage: plots.py [-h] [PLOT [PLOT ...]]

generate plots

positional arguments:
  PLOT        {observables_design, observables_posterior, posterior, design,
              gp, diag_emu} (default: all)

optional arguments:
  -h, --help  show this help message and exit


The available plots are described below:

* __observables_design__
    * Model observables at design points, with experimental data plotted as reference.
    * __IMPORTANT__: For different observables than the example, change the dictionary in \_observables_plot()
* __observables_posterior__
    * Model observables at 100 draws from the posterior, with experimental data plotted as reference.
    * __IMPORTANT__: For different observables than the example, change the dictionary in \_observables_plot()
* __posterior__
    * Pairplot of posteriors for all calibration inputs. The diagonal displays the marginal densities. The lower off-diagonal displays pairwise scatter plots.
* __design__
    * Projection of a Latin hypercube (LH) design into two dimensions. Change the keys within the function to the two inputs you want to project onto.
* __gp__
    * Conditioning a Gaussian process. Simple example plots with dummy data.
* __diag_emu__
    * Diagnostic: plots of each principal component vs each input parameter, overlaid by emulator predictions at several points in design space.
    * Use this to check how well the emulators track the design points and whether the uncertainty and shape of the predictions are reasonable.

__Note__: The user may observe that _plots.py_ contains additional plotting functions. Some of these are helper functions, but many are copied over from the original distribution from which this one is forked. Most of the undocumented functions are hard-coded for that project's observables, so support for them was excluded from this package.