# Tutorial 2: ModelLink subclasses
In this tutorial, the `ModelLink` abstract base class is introduced and an example is given of how to wrap a model by writing a `ModelLink` subclass.
It is assumed here that the reader has successfully completed the previous tutorial ([Basic usage](1_basic_usage.ipynb)) and understands the basics of Python (sub)classes.
For a more detailed overview of the `ModelLink` abstract base class and its properties, see the [ModelLink crash course](https://prism-tool.readthedocs.io/en/latest/user/modellink_crash_course.html).

Before we get started, let's import all definitions that we are going to need in this tutorial:

In [None]:
import numpy as np
from prism import Pipeline
from prism.modellink import GaussianLink, ModelLink, test_subclass

## ModelLink abstract base class
To help *PRISM* users with wrapping their models and making them callable by the `Pipeline`, *PRISM* provides the `ModelLink` *abstract base class*.
In Python, an abstract base class is a special type of (base) class whose sole purpose is to be subclassed.
It cannot be initialized on its own (unlike normal classes or base classes, like the `Pipeline` class), but instead provides a basic "skeleton" of what the subclass should look like.
This usually includes many properties that are automatically set during initialization; helper functions that we (or internal operations) can use to write/use the subclass; and several *abstract methods*.
An abstract method is a method in an abstract base class that MUST be overridden by the subclass before it can be initialized (which is why an abstract base class cannot be initialized, as its abstract methods have not been overridden).
We can think of an abstract base class as a check-list of all items and properties that its subclasses must have, with most items being handled automatically.
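
The behavior described above can be reproduced with Python's standard `abc` module.
The sketch below is independent of *PRISM*; the class names `Skeleton`, `Incomplete` and `Complete` are hypothetical and only illustrate that a subclass cannot be initialized until all abstract methods have been overridden.

```python
from abc import ABC, abstractmethod


# Hypothetical abstract base class (not part of PRISM)
class Skeleton(ABC):
    # Helper method that every subclass inherits automatically
    def describe(self):
        return "%s returns %s" % (self.__class__.__name__, self.call())

    # Abstract method that MUST be overridden by a subclass
    @abstractmethod
    def call(self):
        pass


# Subclass that does not override the abstract method
class Incomplete(Skeleton):
    pass


# Subclass that overrides the abstract method
class Complete(Skeleton):
    def call(self):
        return 42


# Initializing the abstract base class or an incomplete subclass fails
for cls in (Skeleton, Incomplete):
    try:
        cls()
    except TypeError as error:
        print("Cannot initialize %s: %s" % (cls.__name__, error))

# A subclass with all abstract methods overridden works as usual
print(Complete().describe())
```

Note that `describe()` is written once in the base class, yet works for any subclass that provides `call()`, which is the same division of labor that `ModelLink` uses.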

Below is a minimal example of what the structure of a `ModelLink` subclass looks like:
```python
# ExampleLink class definition
class ExampleLink(ModelLink):
    # Extend class constructor
    def __init__(self, *args, **kwargs):
        # Perform any custom operations here
        pass

        # Set ModelLink flags (name, call_type, MPI_call)
        pass

        # Call superclass constructor
        super().__init__(*args, **kwargs)

    # Override default model parameters (optional)
    def get_default_model_parameters(self):
        par_dict = {}
        return(par_dict)

    # Override default model data (optional)
    def get_default_model_data(self):
        data_dict = {}
        return(data_dict)

    # Override call_model abstract method
    def call_model(self, emul_i, par_set, data_idx):
        # Perform operations for obtaining the model output
        # Following is provided:
        # 'emul_i': Requested iteration
        # 'par_set': Requested sample(s) dict
        # 'data_idx': Requested data point(s)
        pass

    # Override get_md_var abstract method
    def get_md_var(self, emul_i, par_set, data_idx):
        # Perform operations for obtaining the model discrepancy variance
        # Following is provided:
        # 'emul_i': Requested iteration
        # 'par_set': Requested sample dict
        # 'data_idx': Requested data point(s)
        pass
```
The definition of the `ExampleLink` class above shows that the `ModelLink` class contains a few basic utility methods and two abstract methods that must be overridden: `call_model()` (wrapper function for calling the model) and `get_md_var()` (calculates the model discrepancy variance).
As both methods are very important, detailed descriptions of them are given in [Wrapping a model](https://prism-tool.readthedocs.io/en/latest/user/modellink_crash_course.html#call-model) and [Model discrepancy variance](https://prism-tool.readthedocs.io/en/latest/user/modellink_crash_course.html#md-var), respectively.

We can check the list of definitions bound to the `ModelLink` class by executing:

In [None]:
[prop for prop in dir(ModelLink) if not prop.startswith('__')]

This shows us that the few definitions that are overridden in the `ExampleLink` class are far from all the definitions that the `ModelLink` class has.
Most of the definitions we see in this list are either class properties or utility functions that are used by the `Pipeline`.

### Basic properties
Before we can write a `ModelLink` subclass, we first have to understand what exactly is happening in the `ExampleLink` class given above.
Since every model is different, with some requiring preparations in order to work properly, the constructor method (`__init__()`) may be extended to include any custom code to be executed when the subclass is initialized.
The superclass constructor (`__init__()` of `ModelLink`) must always be called, as it sets several important flags and properties, but the time at which this is done does not matter for *PRISM* itself.
During the initialization of the `Emulator` class (initialized automatically by `Pipeline`), it is checked whether or not the superclass constructor of a provided `ModelLink` instance was called (to prevent this from being forgotten).

Besides executing custom code, three properties/flags can be set in the constructor, which have the following default values if the extended constructor does not set them:
```python
self.name = self.__class__.__name__ # Set instance name to the name of the class
self.call_type = 'single'           # Request single model calls
self.MPI_call = False               # Request only controller calls 
```

The first property, `name`, defines the name of the `ModelLink` instance, which by default is set to the name of the subclass.
This name is used by the `Emulator` class during initialization to check if a constructed emulator is linked to the proper `ModelLink` instance, in order to avoid causing mismatches.
If we want to use the same `ModelLink` subclass for different models (for example, with different parameter spaces), then it is recommended that we add an identifier for each model to this name.

The other two properties, `call_type` and `MPI_call`, are flags that tell *PRISM* how the `call_model()`-method should be used.
They are mostly important when using sophisticated models in MPI and are best left unset in simple cases.
By default, *PRISM* requests samples one-by-one (in serial), which is the easiest to implement for the user.
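
To picture how flags set in an extended constructor can coexist with these defaults, here is a minimal sketch using a hypothetical stand-in class called `BaseLink`.
It is not *PRISM*'s implementation; it only assumes that the superclass constructor fills in a default for every flag that the subclass constructor did not set itself, and the non-default values used in `CustomLink` are purely illustrative.

```python
# Hypothetical stand-in for a base class with optional flags (not PRISM code)
class BaseLink(object):
    def __init__(self):
        # Only set a default if the subclass constructor did not set the flag
        if not hasattr(self, 'name'):
            self.name = self.__class__.__name__
        if not hasattr(self, 'call_type'):
            self.call_type = 'single'
        if not hasattr(self, 'MPI_call'):
            self.MPI_call = False


# Subclass that leaves all flags unset
class SimpleLink(BaseLink):
    pass


# Subclass that sets two flags before calling the superclass constructor
class CustomLink(BaseLink):
    def __init__(self):
        # Illustrative values: a name with an identifier and a non-default call type
        self.name = 'CustomLink_wide_space'
        self.call_type = 'multi'
        super().__init__()


simple = SimpleLink()
custom = CustomLink()
print(simple.name, simple.call_type, simple.MPI_call)
print(custom.name, custom.call_type, custom.MPI_call)
```

In this sketch, `SimpleLink` ends up with all three defaults, while `CustomLink` keeps its own `name` and `call_type` and only receives the default for `MPI_call`.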

Finally, the `ModelLink` class has three methods that can be overridden for adding utility to the class (of which two are shown in the `ExampleLink` class definition).
The important ones, `get_default_model_parameters()` and `get_default_model_data()`, return dictionaries containing the default model parameters and model data to use in this `ModelLink` instance, respectively.
By overriding these methods, we can hard-code the use of specific parameters or comparison data, avoiding having to provide them when initializing the `ModelLink` subclass.
Additionally, if we provide a parameter or data point during initialization for which a default also exists, the provided information overrides the default.

We can find an example of this in the `GaussianLink` class, which already has default parameters defined (as mentioned in the previous tutorial):

In [None]:
model_data = {3: [3.0, 0.1]}
modellink_obj = GaussianLink(model_data=model_data)
modellink_obj

If we now initialize the `GaussianLink` class using a custom set of parameters, its defaults will be overridden as shown by its modified representation:

In [None]:
model_parameters = {'A1': [-5, 7, 2]}
modellink_obj = GaussianLink(model_parameters=model_parameters, model_data=model_data)
modellink_obj
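
Conceptually, this override resembles a dictionary update: entries provided during initialization replace defaults with the same name.
The sketch below mimics this with plain dictionaries. It is not *PRISM*'s internal code, the parameter names and values are hypothetical, and it assumes that defaults that are not provided are kept unchanged.

```python
# Hypothetical default parameters: [lower bound, upper bound, estimate]
default_pars = {'A': [-10, 10, 3],
                'B': [0, 5, 1.5]}

# Parameters provided by the user during initialization
provided_pars = {'A': [-5, 7, 2]}

# Provided entries take precedence over defaults with the same name
final_pars = dict(default_pars)
final_pars.update(provided_pars)
print(final_pars)
```

Copying the defaults before updating leaves `default_pars` itself untouched, so the same defaults can be reused for another instance.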

## Writing a ModelLink subclass
Now that we have a basic understanding of how to write a `ModelLink` subclass, let's use the template of the `ExampleLink` class from before to make a `ModelLink` subclass for a straight line model, defined as $f(x) = A+Bx$:

In [None]:
# LineLink class definition
class LineLink(ModelLink):
    # Define default model parameters (optional)
    def get_default_model_parameters(self):
        par_dict = {
            'A': [-10, 10, 3],  # Intercept in [-10, 10] with estimate of 3
            'B': [0, 5, 1.5]}   # Slope in [0, 5] with estimate of 1.5
        return(par_dict)

    # Override call_model abstract method
    def call_model(self, emul_i, par_set, data_idx):
        # Calculate the value on a straight line for requested data_idx
        vals = par_set['A']+np.array(data_idx)*par_set['B']
        return(vals)

    # Override get_md_var abstract method
    def get_md_var(self, emul_i, par_set, data_idx):
        # Calculate the model discrepancy variance
        # For a straight line, this value can be set to a constant
        return(1e-4*np.ones_like(data_idx))

Here, we created a `ModelLink` subclass called `LineLink`.
As the `LineLink` class is quite simple, it is not necessary to make any adjustments to the class constructor, so we simply removed it.
We have defined default parameters for our straight line model to avoid having to provide them when we initialize the `LineLink` class.
In the `call_model()`-method, we implemented the algorithm for calculating the value on a straight line.
Although generally not recommended, we used a very basic description for calculating the model discrepancy variance.
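
To make the arithmetic in `call_model()` concrete, the following stand-alone sketch evaluates the same straight line without *PRISM* or NumPy.
The function name `line_model` and the chosen x-values are illustrative, and it assumes that `par_set` maps each parameter name to a single value and that `data_idx` is a list of x-values.

```python
# Stand-alone version of the straight line model (no PRISM or NumPy needed)
def line_model(par_set, data_idx):
    # Evaluate f(x) = A + B*x for every requested x in data_idx
    return [par_set['A'] + par_set['B']*x for x in data_idx]


# Use the parameter estimates from the default parameters as a single sample
par_set = {'A': 3, 'B': 1.5}

# Illustrative x-values at which the line is evaluated
data_idx = [1, 2.5, -2]

vals = line_model(par_set, data_idx)
print(vals)
```

The result contains one value per entry in `data_idx`, in the same order, just like the array returned by `call_model()` in the `LineLink` class.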

In order to help users with writing their `ModelLink` subclass, *PRISM* provides a utility function called `test_subclass()` (`prism.modellink.test_subclass`).
This function takes a `ModelLink` subclass and all arguments that must be provided to it, and tests if it can be initialized and used properly.
If this is the case, it returns the created instance of the provided `ModelLink` subclass, which can then be used in the `Pipeline`.

So, let's see if we have written our `LineLink` class properly:

In [None]:
data_dict = {1: [4.5, 0.1],    # f(1) = 4.5 +- 0.1
             2.5: [6.8, 0.1],  # f(2.5) = 6.8 +- 0.1
             -2: [0, 0.1]}     # f(-2) = 0 +- 0.1
modellink_obj = test_subclass(LineLink, model_data=data_dict)

As no errors are being raised, it seems that we indeed wrote it correctly.
In case we had made a mistake, the `test_subclass()`-function would have raised an error telling us what exactly went wrong and what probably caused this.

Now that we have our own custom `ModelLink` instance, we can initialize the `Pipeline` (this time using a specific working directory to avoid clashing with the previous one):

In [None]:
pipe = Pipeline(modellink_obj, working_dir='prism_line')
pipe

Like before, as no errors are being raised, *PRISM* is ready to start emulating.