# Tutorial 3: PRISM class properties
In this tutorial, we use the knowledge gained in the previous two tutorials to explore how the instance/class properties of the `Pipeline`, `Emulator` and `ModelLink` classes can be used for various tasks.
This includes inspecting the state of the emulator, modifying the pipeline operations, and more.
It is assumed here that the reader has successfully completed the first two tutorials ([Basic usage](1_basic_usage.ipynb) and [ModelLink subclasses](2_modellink_subclasses.ipynb)) and has a basic understanding of Python instance/class properties.

For this tutorial, we will use the `LineLink` class definition that was introduced in the previous tutorial.
Therefore, before we can begin, we have to define this class again, initialize the required classes and construct the first iteration:

In [1]:
# Imports
import numpy as np
from prism import Pipeline
from prism.emulator import Emulator
from prism.modellink import ModelLink

# LineLink class definition
class LineLink(ModelLink):
    # Define default model parameters (optional)
    def get_default_model_parameters(self):
        par_dict = {
            'A': [-10, 10, 3],  # Intercept in [-10, 10] with estimate of 3
            'B': [0, 5, 1.5]}   # Slope in [0, 5] with estimate of 1.5
        return(par_dict)

    # Override call_model abstract method
    def call_model(self, emul_i, par_set, data_idx):
        # Calculate the value on a straight line for requested data_idx
        vals = par_set['A']+np.array(data_idx)*par_set['B']
        return(vals)

    # Override get_md_var abstract method
    def get_md_var(self, emul_i, par_set, data_idx):
        # Calculate the model discrepancy variance
        # For a straight line, this value can be set to a constant
        return(1e-4*np.ones_like(data_idx))

# LineLink initialization
model_data = {1: [4.5, 0.1],    # f(1) = 4.5 +- 0.1
              2.5: [6.8, 0.1],  # f(2.5) = 6.8 +- 0.1
              -2: [0, 0.1]}     # f(-2) = 0 +- 0.1
modellink_obj = LineLink(model_data=model_data)

# Pipeline initialization
pipe = Pipeline(modellink_obj, working_dir='prism_line')

# Construction of first iteration
pipe.construct(1)


PIPELINE DETAILS

GENERAL
-------------------------------
Working directory              	'prism_line'
Emulator type                  	'default'
ModelLink subclass             	LineLink
Emulation method               	Regression + Gaussian
Mock data used?                	No

ITERATION
-------------------------------
Emulator iteration             	1
Construction completed?        	Yes
Plausible regions?             	Yes
Projections available?         	No
-------------------------------
# of model evaluation samples  	100 ([100])
# of plausible/analyzed samples	150/32000
% of parameter space remaining 	0.469%
# of active/total parameters   	2/2
# of emulated data points      	3
# of emulator systems          	3
-------------------------------

PARAMETER SPACE
-------------------------------
*A: [-10.0, 10.0] (3.00000)
*B: [  0.0,  5.0] (1.50000)
Emulator iteration 1 has already been fully constructed. Skipping construction process.


## Pipeline properties
The `Pipeline` class holds all information that is required to perform all operations in *PRISM* that do not directly modify the emulator (which is contained in the `Emulator` class).
This includes making projection figures; analyzing an emulator iteration (which does not modify the emulator itself); telling the `Emulator` how to construct an iteration; handling system paths; and many utility methods.
As such, we are able to change most of the underlying `Pipeline` properties at any given moment.
This allows us to modify the operations in the `Pipeline` without making direct changes to the `Emulator` (which could render specific results invalid).
Note that many parameters of the `Pipeline` can be set during initialization by using the *prism_par* argument (see [PRISM parameters](https://prism-tool.readthedocs.io/en/latest/user/descriptions.html#prism-parameters) for their descriptions).

When a new emulator is being constructed, the `Pipeline` object will generate a Latin-Hypercube design of `n_sam_init` samples, which by default is set to:

In [2]:
pipe.n_sam_init

500

These samples are then evaluated in the model and used to construct an emulator.
However, for a model as simple as the one in our `LineLink`, this is likely overkill.
So, we could set it to, let's say, $100$ and reconstruct the first iteration of the emulator:

In [3]:
pipe.n_sam_init = 100
pipe.construct(1, force=True)

Finished obtaining and distributing model realization data in 0.0528 seconds, averaging 0.000528 seconds per model evaluation.
Finished construction of emulator iteration in 0.75 seconds.
Finished analysis of emulator iteration in 0.72 seconds, averaging 2240.40 emulator evaluations per second.
There is 0.375% of parameter space remaining.

PIPELINE DETAILS

GENERAL
-------------------------------
Working directory              	'prism_line'
Emulator type                  	'default'
ModelLink subclass             	LineLink
Emulation method               	Regression + Gaussian
Mock data used?                	No

ITERATION
-------------------------------
Emulator iteration             	1
Construction completed?        	Yes
Plausible regions?             	Yes
Projections available?         	No
-------------------------------
# of model evaluation samples  	100 ([100])
# of plausible/analyzed samples	6/1600
% of parameter space remaining 	0.375%
# of active/total parameters   	2/2
# of emu

  self.analyze()


Here, we reconstructed the first iteration of the emulator by first setting `n_sam_init` to $100$ and then calling the `construct()`-method with *force=True* (which overrides the default behavior of skipping construction if already finished).
Although the part of parameter space that is still remaining is fairly similar to what it was before, we can see that the evaluation rate of the emulator has significantly increased.
This could potentially be very beneficial to us if we were to evaluate the emulator many times in later iterations.

However, we can also see that *PRISM* is warning us that there are probably not enough plausible samples to construct a more accurate emulator iteration.
The reason for this is that such a small part of parameter space remains plausible that we have to evaluate many more samples to obtain a decent number of plausible ones.
The number of samples that are evaluated in the emulator during an analysis is influenced by the iteration number, the number of model parameters `n_par` and the base evaluation number `base_eval_sam`.
While we obviously cannot change the iteration number or the number of model parameters, we can change the base evaluation number.
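As a rough illustration of how these quantities combine, the sketch below reproduces the number of analyzed samples reported above for the first iteration ($1600$ for a base evaluation number of $800$ and two parameters). The helper name and the simple product are assumptions inferred from that output, not *PRISM*'s actual implementation, and the dependence on the iteration number is deliberately left out because it is not shown here.

```python
def n_eval_sam_first_iteration(base_eval_sam, n_par):
    """Hypothetical helper (not part of PRISM).

    Assumes that, for the first iteration, the number of emulator
    evaluation samples equals base_eval_sam times n_par, which matches
    the 6/1600 reported above for base_eval_sam=800 and n_par=2.
    """
    return base_eval_sam * n_par


# Two model parameters (A and B) and the current base evaluation number
print(n_eval_sam_first_iteration(800, 2))
```

Under this assumption, raising the base evaluation number scales the number of analyzed samples in the first iteration proportionally.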

At this moment, the base evaluation number is set to:

In [4]:
pipe.base_eval_sam

800

As we used $100$ samples for the first iteration and we want to have at least this many plausible samples for the next iteration, let's set `base_eval_sam` to $16000$ and reanalyze the iteration:

In [5]:
pipe.base_eval_sam = 16000
pipe.analyze()

Finished analysis of emulator iteration in 15.17 seconds, averaging 2110.81 emulator evaluations per second.
There is 0.516% of parameter space remaining.

PIPELINE DETAILS

GENERAL
-------------------------------
Working directory              	'prism_line'
Emulator type                  	'default'
ModelLink subclass             	LineLink
Emulation method               	Regression + Gaussian
Mock data used?                	No

ITERATION
-------------------------------
Emulator iteration             	1
Construction completed?        	Yes
Plausible regions?             	Yes
Projections available?         	No
-------------------------------
# of model evaluation samples  	100 ([100])
# of plausible/analyzed samples	165/32000
% of parameter space remaining 	0.516%
# of active/total parameters   	2/2
# of emulated data points      	3
# of emulator systems          	3
-------------------------------

PARAMETER SPACE
-------------------------------
*A: [-10.0, 10.0] (3.00000)
*B: [  0.0,  5.

And now we have enough plausible samples for the next iteration.
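The choice of $16000$ can be motivated with a quick estimate based on the earlier analysis, in which $6$ out of $1600$ samples were plausible. The sketch below is only back-of-the-envelope arithmetic: it assumes that the plausible fraction stays roughly the same when more samples are evaluated, and that the number of analyzed samples in the first iteration is the base evaluation number times the number of parameters (as the outputs above suggest).

```python
import math

n_plausible = 6     # plausible samples found with base_eval_sam = 800
n_analyzed = 1600   # samples analyzed in that run
n_target = 100      # plausible samples we would like to have
n_par = 2           # number of model parameters (A and B)

# Evaluations needed if the plausible fraction (6/1600) stays the same
required_evals = math.ceil(n_target * n_analyzed / n_plausible)

# Assumption: analyzed samples = base_eval_sam * n_par in iteration 1
required_base = math.ceil(required_evals / n_par)

print("Required emulator evaluations: %i" % required_evals)
print("Required base evaluation number: %i" % required_base)
```

A value of $16000$ sits comfortably above this estimate, which leaves some margin for the scatter in the number of plausible samples between runs.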

If we wanted to, we could check how many samples have been evaluated, how many are plausible and what they are:

In [6]:
print("Number of evaluated samples in iteration 1: %i" % (pipe.n_eval_sam[1]))
print("Number of plausible samples in iteration 1: %i" % (pipe.n_impl_sam[1]))
print("Plausible samples: %s" % (pipe.impl_sam))

Number of evaluated samples in iteration 1: 32000
Number of plausible samples in iteration 1: 165
Plausible samples: [[2.8521875  1.27085938]
 [3.0446875  1.70039063]
 [2.8284375  1.29882812]
 [2.7453125  1.64257812]
 [3.1709375  1.31914062]
 [2.8771875  1.68789063]
 [2.7721875  1.38960938]
 [3.2334375  1.58335938]
 [3.1521875  1.38867188]
 [2.5040625  1.62226563]
 [2.6771875  1.76601563]
 [2.9871875  1.36023438]
 [3.8153125  1.08226563]
 [2.3684375  1.76570312]
 [2.9078125  1.27304688]
 [3.2353125  1.48742188]
 [3.1209375  1.71757812]
 [3.3265625  1.26351563]
 [3.1371875  1.55554688]
 [3.2109375  1.65117187]
 [2.2403125  1.88601562]
 [3.2115625  1.38460937]
 [3.3046875  1.33789062]
 [2.6965625  1.75148437]
 [3.1703125  1.62054688]
 [2.6490625  1.57445312]
 [2.9409375  1.57054688]
 [3.0528125  1.34742187]
 [2.8746875  1.35882813]
 [2.1271875  2.01429688]
 [2.4340625  1.84554687]
 [3.1753125  1.69742188]
 [3.2378125  1.46710937]
 [2.9946875  1.31414062]
 [3.0015625  1.67492187]
 [2.4378

The two numbers in the output show us the same thing as the `details()` overview, but we can also see that all the plausible samples are quite clustered.
This suggests that the next emulator iteration is going to be much more accurate than the current one.
Note that another way to increase the number of plausible samples in an iteration is to change the implausibility parameters, as shown in the first tutorial.
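One quick way to put a number on this clustering is to compare the extent of the plausible samples with the full parameter ranges. The sketch below uses only the first five rows printed above, copied by hand, so it underestimates the extent of the full set; with a live pipeline, `pipe.impl_sam` would be used instead.

```python
import numpy as np

# First five plausible samples copied from the output above (columns: A, B)
impl_sam = np.array([[2.8521875, 1.27085938],
                     [3.0446875, 1.70039063],
                     [2.8284375, 1.29882812],
                     [2.7453125, 1.64257812],
                     [3.1709375, 1.31914062]])

# Parameter ranges as defined in LineLink: A in [-10, 10], B in [0, 5]
par_rng = np.array([[-10.0, 10.0],
                    [0.0, 5.0]])

# Extent of the samples in each parameter, relative to the full range
span = impl_sam.max(axis=0) - impl_sam.min(axis=0)
frac = span / (par_rng[:, 1] - par_rng[:, 0])

for name, f in zip(['A', 'B'], frac):
    print("%s: samples span %.1f%% of the full range" % (name, 100*f))
```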

A few other useful properties are:

In [7]:
print("Path to emulator HDF5-file: %r" % (pipe.hdf5_file))
print("Is file-logging enabled? %s" % (pipe.do_logging))
print("Are parameters split into active and passive parameters? %s" % (pipe.do_active_anal))
print("Bound ModelLink object: %s" % (pipe.modellink))

Path to emulator HDF5-file: '/home/evandervelden/stack/PhD/PRISM/tutorials/prism_line/prism.hdf5'
Is file-logging enabled? True
Are parameters split into active and passive parameters? True
Bound ModelLink object: LineLink(model_parameters={'A': [-10.0, 10.0, 3.0], 'B': [0.0, 5.0, 1.5]}, model_data={1: [4.5, 0.1], 2.5: [6.8, 0.1], -2: [0.0, 0.1]})


## Emulator properties
Whereas it can be useful to look at the `Pipeline` properties, they do not tell us anything about the state of the emulator itself.
Although they cannot be modified, looking at the `Emulator` properties can give us a lot of information.
However, because the underlying algorithms of *PRISM* involve a lot of math, these properties can be a bit harder to understand.
These properties can be accessed with `pipe.emulator.xxx`.

Probably the most interesting and useful property to look at is the `poly_terms` property:

In [8]:
pipe.emulator.poly_terms[1]

{1: {'A': 1.0, 'B': 0.9999999999999998},
 2.5: {'A': 1.0000000000000002, 'B': 2.500000000000001},
 -2: {'A': 1.0000000000000002, 'B': -2.0}}

This dictionary shows us what the polynomial terms are for calculating every data point in the first emulator iteration.
So, if we remember that our model is given by the function $f(x) = A+Bx$, then we can immediately see that the polynomial terms are very close to what the real terms should look like, where $A$ is multiplied by unity and $B$ is multiplied by whatever the value of $x$ is for that data point.
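To make this comparison concrete, the sketch below evaluates the polynomial terms printed above at the parameter estimates ($A = 3$, $B = 1.5$) and compares the result with the model $f(x) = A+Bx$. It assumes that each term is simply the coefficient multiplying the named parameter, as described above, and it ignores any other contribution to the emulator (such as the Gaussian part), so it is an illustration rather than an emulator evaluation.

```python
# Polynomial terms copied from the output of pipe.emulator.poly_terms[1]
poly_terms = {
    1: {'A': 1.0, 'B': 0.9999999999999998},
    2.5: {'A': 1.0000000000000002, 'B': 2.500000000000001},
    -2: {'A': 1.0000000000000002, 'B': -2.0}}

# Parameter estimates from the LineLink definition
par_set = {'A': 3.0, 'B': 1.5}


def line_model(x, par_set):
    """The straight-line model f(x) = A + B*x used by LineLink."""
    return par_set['A'] + par_set['B']*x


# Compare the polynomial value with the model value at every data point
diffs = {}
for x, terms in poly_terms.items():
    poly_val = sum(coef*par_set[name] for name, coef in terms.items())
    diffs[x] = abs(poly_val - line_model(x, par_set))
    print("x = %4s: polynomial = %.6f, model = %.6f"
          % (x, poly_val, line_model(x, par_set)))
```

The differences collected in `diffs` are at the level of floating-point rounding, which is what we would expect if the polynomial terms capture the model.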

The performance of the emulator iteration can also be checked by looking at the amount of variance in the model outputs that it could not explain with the polynomial functions.
This variance is called the *residual variance*, and a general rule is that the lower it is, the more accurate the emulator is:

In [9]:
pipe.emulator.rsdl_var[1]

[3.7581826562794764e-31, 5.624578254225814e-30, 3.400976577634087e-30]

These values tell us essentially the same thing as the `poly_terms` property, just in a different way: the emulator is capturing nearly all model behavior.
The difference between these two properties, however, is that we usually do not know what the polynomial terms should look like, while we do know that a low residual variance is a good thing.
Therefore, it is usually best to view both at the same time, where the `rsdl_var` tells us how much we should believe that the `poly_terms` are accurate.
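To illustrate what a residual variance of this size means, the sketch below fits a linear polynomial in $A$ and $B$ to the outputs of the straight-line model at a single data point ($x = 2.5$) and computes the variance that the fit leaves unexplained. This is a stand-in using plain least squares in NumPy on uniformly drawn samples. It is not how *PRISM* performs its regression, and taking the mean squared residual as the residual variance is an assumption made for this example.

```python
import numpy as np

# Draw 100 random parameter sets within the LineLink parameter ranges
rng = np.random.default_rng(0)
A = rng.uniform(-10, 10, 100)
B = rng.uniform(0, 5, 100)

# Evaluate the straight-line model f(x) = A + B*x at x = 2.5
x = 2.5
y = A + B*x

# Least-squares fit of y as a linear combination of A and B
design = np.column_stack([A, B])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Variance in the model outputs that the fit could not explain
residuals = y - design @ coef
rsdl_var = np.mean(residuals**2)

print("Fitted coefficients for A and B:", coef)
print("Residual variance:", rsdl_var)
```

Because the model is exactly linear in its parameters, the fitted coefficients come out at $1$ and $2.5$ up to rounding, and the residual variance is negligibly small, mirroring the behavior of `poly_terms` and `rsdl_var` above.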