# 04 Parallelization and Advanced Topics

This notebook demonstrates different options for running ibicus bias adjustment on larger areas and in larger computing environments, using the built-in parallelization and the integration with dask. The second part looks at some advanced topics: logging, and extending the package with your own methods.


## 1. Running ibicus in larger environments: parallelization and dask

ibicus comes with an integrated parallelization option building upon the `multiprocessing` module. It also integrates easily with dask to run in HPC environments. In this notebook, we demonstrate these options using a CDFt and QuantileMapping debiaser.

In [2]:
from ibicus.debias import CDFt, QuantileMapping

Let's get some testing data. For an explanation of the steps please refer to the "Getting started" notebook:

In [3]:
import numpy as np

def get_data(variable, data_path = "testing_data/"):
    # Load in the data 
    data = np.load(f"{data_path}{variable}.npz", allow_pickle = True)
    # Return arrays
    return data["obs"], data["cm_hist"], data["cm_future"], {"time_obs": data["time_obs"], "time_cm_hist": data["time_cm_hist"], "time_cm_future": data["time_cm_future"]}

In [4]:
obs, cm_hist, cm_future, dates = get_data("tas")

### 1.1. Parallelization

Parallelization can be activated in the existing ibicus functionalities by simply specifying `parallel = True` in the `debiaser.apply`-function:

In [5]:
debiaser = CDFt.from_variable("tas")
debiased_cm_future = debiaser.apply(obs, cm_hist, cm_future, **dates, parallel = True, nr_processes = 8)



The number of processes that run in parallel can be controlled using the `nr_processes` option. The default is 4 processes. For more details see the [ibicus API reference](https://ibicus.readthedocs.io/en/latest/reference/debias.html#ibicus.debias.Debiaser). Note that no progressbar is shown in parallelized execution.

We recommend using parallelization if users are interested in speeding up the execution of bias adjustment on a single machine.
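
One way to pick a value for `nr_processes` is to cap it at the number of CPU cores and at the number of grid cells. The sketch below assumes that the work is distributed over locations; the helper `suggest_nr_processes` and its `reserve_cores` argument are hypothetical and not part of ibicus:

```python
import os


def suggest_nr_processes(n_locations, reserve_cores=1):
    """Suggest a process count: keep some cores free, never exceed n_locations."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    usable = max(1, available - reserve_cores)
    return max(1, min(usable, n_locations))


# A 50 x 50 grid has 2500 locations to distribute over the processes
nr_processes = suggest_nr_processes(n_locations=50 * 50)
print(nr_processes)
```

The result could then be passed on as `debiaser.apply(..., parallel = True, nr_processes = nr_processes)`.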

### 1.2. Dask

For some problems the speedup provided by the simple parallelization presented above does not provide enough flexibility: for example if users are interested in scaling debiasing in an HPC environment on many machines or if the observation and climate model data does not fit into RAM. 

To address these issues, ibicus integrates easily with `dask`. `dask` is an open-source Python library for parallel computing that allows users to scale their Python code from multi-core machines to large clusters. It is integrated in both `xarray` and `iris` (see here for the [xarray dask integration](https://docs.xarray.dev/en/stable/user-guide/dask.html) and here for [the iris one](https://scitools-iris.readthedocs.io/en/latest/userguide/real_and_lazy_data.html)). In both libraries, it is possible to extract the underlying dask arrays needed for computation.

For an introduction to dask see [here](https://tutorial.dask.org/00_overview.html), and for a practical introduction on how to use dask on an HPC cluster see [this tutorial](https://www.youtube.com/watch?v=FXsgmwpRExM&t=441s). We will only use the `dask.array` module here:

In [3]:
import dask.array as da

Let's get some larger testing data:

In [7]:
obs = da.from_array(np.random.normal(270, 20, size = 50*50*10000).reshape((50, 50, 10000)), chunks=(5, 10, 10000))
cm_hist = da.from_array(np.random.normal(265, 15, size = 50*50*10000).reshape((50, 50, 10000)), chunks=(5, 10, 10000))
cm_future = da.from_array(np.random.normal(280, 30, size = 50*50*10000).reshape((50, 50, 10000)), chunks=(5, 10, 10000))

For our purposes it is crucial that the dask arrays are **chunked in the spatial dimension** meaning chunks can be defined in the first two dimensions, but always need to include the full time dimension at each location. This is required to calculate the climatology at each location.
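
The chunking requirement can be checked before starting a long run. A dask array reports its chunk layout in the `.chunks` attribute as a tuple of tuples, one per dimension. The following minimal sketch works on such a tuple directly, so it runs without dask; the helper `has_full_time_chunks` is hypothetical and assumes, as in the arrays above, that time is the last dimension:

```python
def has_full_time_chunks(chunks, time_axis=-1):
    """Return True if the time axis is held in one single chunk.

    `chunks` is the tuple of tuples reported by the `.chunks` attribute
    of a dask array, e.g. ((5, 5), (10,), (10000,)).
    """
    return len(chunks[time_axis]) == 1


# Chunk layout of a (50, 50, 10000) array created with chunks=(5, 10, 10000)
good = ((5,) * 10, (10,) * 5, (10000,))
# Same array, but with the time axis split into two chunks of 5000 steps
bad = ((5,) * 10, (10,) * 5, (5000, 5000))

print(has_full_time_chunks(good))  # True
print(has_full_time_chunks(bad))   # False
```

With real data the check would read `has_full_time_chunks(obs.chunks)`.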

Given correctly chunked arrays, dask can be applied by mapping the `debiaser.apply` function over all chunks using e.g. `map_blocks`:

In [8]:
debiaser = QuantileMapping.from_variable("tas")

collection = da.map_blocks(debiaser.apply, obs, cm_hist, cm_future, dtype=obs.dtype, progressbar = False, parallel = False)
debiased_cm_future = collection.compute(num_workers=8)

It is also possible to use other dask mapping functions such as `blockwise`. To use the ibicus `apply` function together with dask it is important to specify two arguments:

- `progressbar = False` otherwise the progressbar output will fill the output log. A dask progressbar can be used by importing `dask.diagnostics.ProgressBar`.
- `parallel = False` (default) because otherwise ibicus parallelisation will interfere with the dask one. 

For bias adjustment methods whose apply function requires additional information such as time/dates, this can be passed as keyword arguments to `map_blocks`. For very big runs it is also recommended to specify `failsafe = True`, so that if the debiaser fails at some locations the output for the other ones can still be saved. When doing so it is even more important to check the logs for any errors and to evaluate the output carefully.
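
The idea behind a failsafe run can be illustrated with plain Python. This is a conceptual sketch only, not the ibicus implementation: the function names, the logger name and the choice to fill failed locations with NaN are all assumptions made for illustration.

```python
import logging
import math

logger = logging.getLogger("failsafe_sketch")  # hypothetical logger name


def apply_failsafe(apply_location, locations):
    """Apply `apply_location` to each location; on failure log it and fill with NaN."""
    results = []
    for index, series in enumerate(locations):
        try:
            results.append(apply_location(series))
        except Exception:
            # Record the traceback, then continue with the remaining locations
            logger.exception("Location %d failed", index)
            results.append([math.nan] * len(series))
    return results


def shift_by_mean(series):
    """Toy adjustment: subtract the mean. Raises TypeError on missing (None) values."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


output = apply_failsafe(shift_by_mean, [[1.0, 2.0, 3.0], [1.0, None, 3.0], [4.0, 6.0]])
print(output)
```

The second location fails and is filled with NaN, while the first and third are still adjusted. This is why the logs and the output need a careful check after a failsafe run.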

Dask itself provides a big variety of customization options and we recommend checking those out.

## 2. What about logging and warnings?

A brief note on logging and warnings: when ibicus encounters issues during code execution, a warning or error is raised, and the standard Python tools for handling these can be used. ibicus also writes logs during execution and logs errors in failsafe mode. The logs are written to the "ibicus" logger (`ibicus.utils.get_library_logger()`), and `ibicus.utils` provides some options to set the logging level for ibicus. The logging output can be handled in the usual way as described by the [logging library](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial): it can be formatted, written to file, ignored, etc.
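
As a minimal sketch using only the standard `logging` module, the "ibicus" logger can be given its own level, handler and format. The format string and the in-memory buffer are illustrative choices, and the record emitted at the end is a stand-in for the records ibicus would write during a real run:

```python
import io
import logging

# Fetch the logger by the name given above, using only the standard library
ibicus_logger = logging.getLogger("ibicus")
ibicus_logger.setLevel(logging.INFO)

# Collect records in memory; a logging.FileHandler would write them to disk instead
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
ibicus_logger.addHandler(handler)

# Stand-in record to show the resulting format
ibicus_logger.info("example record")
print(log_buffer.getvalue())
```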

## 3. Creating your own bias adjustment methods

By building upon the common framework and interface developed in the ibicus package, it is straightforward to implement your own bias adjustment methods when using the package. A new bias adjustment method can be set up as an attrs-child-class of the abstract `Debiaser`-class ([see here for the documentation](https://ibicus.readthedocs.io/en/latest/reference/debias.html#ibicus.debias.Debiaser)). A child class needs to include two functions:

-  an `apply_location()` function which applies an initialised debiaser at one location. Arguments are 1d-vectors of obs, cm_hist, and cm_future representing observations, and climate model values during the reference (cm_hist) and future period (cm_future). Additionally kwargs passed to the debiaser apply()-function are passed down to the apply_location()-function.

- a `from_variable()` function which initialises a debiaser with default arguments given a climatic variable either as str or member of the `Variable`-class. kwargs are meant to overwrite default arguments for this variable. Given a dict of default arguments with variables of the `Variable` class as keys and dict of default arguments as values the `cls._from_variable()`-function can be used to automatically map variable arguments to default settings.

Given these two functions are provided, the abstract debiaser class then takes care of setup, iterating the application of the method over locations, parallelization, input sanitization, etc.

Alternatively, a user can create a subclass of the `SeasonalRunningWindowDebiaser` or `SeasonalAndFutureRunningWindowDebiaser` class. This enables the new method to be applied in a running window: either a window over the year to account for seasonality (`SeasonalRunningWindowDebiaser`), or a window over the year and over the years of the future period (`SeasonalAndFutureRunningWindowDebiaser`) to account for seasonality and smooth out trends:
- For the `SeasonalRunningWindowDebiaser` the subclass needs specification of an `apply_on_seasonal_window` function instead of an `apply_location`-function. 
- Similarly for the `SeasonalAndFutureRunningWindowDebiaser` the subclass needs specification of an `apply_on_seasonal_and_future_window` function instead of an `apply_location`-function. 
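
To make the idea of a seasonal running window concrete, the sketch below selects all time steps whose day of year falls within a window around a given center day, wrapping around the turn of the year. It is an illustration of the concept only: the function and its parameters are hypothetical, and the window handling inside ibicus may differ.

```python
def window_indices(days_of_year, center, window_length, days_in_year=365):
    """Indices of all time steps whose day of year lies in the window around `center`.

    Distances wrap around the turn of the year, so a window centered on
    day 1 also includes the last days of December.
    """
    half = window_length // 2
    selected = []
    for index, day in enumerate(days_of_year):
        distance = abs(day - center)
        distance = min(distance, days_in_year - distance)  # circular distance
        if distance <= half:
            selected.append(index)
    return selected


# Two years of daily data, days numbered 1..365
days = list(range(1, 366)) * 2
idx = window_indices(days, center=1, window_length=31)
print(len(idx))  # 62: 31 days per year, taken from both years
```

A method such as `apply_on_seasonal_window` would then receive only the values inside such a window, pooled over all years.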

Below is an example of how a new version of LinearScaling could be set up using the `SeasonalRunningWindowDebiaser`:

In [None]:
import attrs
import numpy as np

# Import the SeasonalRunningWindowDebiaser from ibicus to subclass
from ibicus.debias import SeasonalRunningWindowDebiaser

# Define the new debiaser as an attrs-subclass. Slotted classes don't work well with inheritance so we use slots=False
@attrs.define(slots=False)
class LinearScaling(SeasonalRunningWindowDebiaser):

    # Define an argument of the debiaser: the type of transformation used
    delta_type: str = "additive"

    # Define the from_variable-method to initialize the debiaser. 
    @classmethod
    def from_variable(cls, variable, delta_type = "additive", **kwargs):
        # Pass kwargs on to the constructor so that further arguments are not silently dropped
        return cls(variable = variable, delta_type = delta_type, **kwargs)
        
    # Define the apply_on_seasonal_window method to apply the debiaser
    def apply_on_seasonal_window(self, obs, cm_hist, cm_future, **kwargs):
        
        # Depending on delta_type apply a different transformation
        if self.delta_type == "additive":
            return cm_future - (np.mean(cm_hist) - np.mean(obs))
        
        elif self.delta_type == "multiplicative":
            return cm_future * (np.mean(obs) / np.mean(cm_hist))
        
        else:
            raise ValueError('self.delta_type needs to be one of ["additive", "multiplicative"].')


We can then instantiate and apply the class as follows over a grid of locations:

In [22]:
debiaser = LinearScaling.from_variable("tas", delta_type = "additive")
output = debiaser.apply(np.random.random((100, 3, 3))+280, np.random.random((100, 3, 3))+282, np.random.random((100, 3, 3))+284)

100%|███████████████████████████████████████████| 9/9 [00:00<00:00, 5183.84it/s]


Class-attributes such as the `delta_type` can also be set up as `attrs.field` attributes. This has the advantage of enabling the automatic checking and sanitization of inputs. For example we could write the `delta_type`-definition as:

```python
delta_type: str = attrs.field(default="additive", validator=attrs.validators.in_(["additive", "multiplicative"]))
```

In this example, the object can only be created if `delta_type` is either *additive* or *multiplicative*. Otherwise, an error is raised.

Furthermore, the user can also define default settings and experimental default settings for different variables and use the `cls._from_variable()` function to map variable arguments to their settings when implementing a new method, as shown in the following example:

In [None]:
from ibicus.variables import tas, pr, hurs, psl

# Define default setting and experimental default settings:
default_settings = {tas: {"delta_type": "additive"}, pr: {"delta_type": "multiplicative"},}
experimental_default_settings = {hurs: {"delta_type": "multiplicative"}, psl: {"delta_type": "additive"}}

@attrs.define(slots=False)
class LinearScaling(SeasonalRunningWindowDebiaser):
    delta_type: str = "additive"

    @classmethod
    def from_variable(cls, variable, **kwargs):
        # Use the cls._from_variable helper function to map a variable onto its settings
        return cls._from_variable(cls, variable, default_settings, experimental_default_settings, **kwargs)
        
    def apply_on_seasonal_window(self, obs, cm_hist, cm_future, **kwargs):
        if self.delta_type == "additive":
            return cm_future - (np.mean(cm_hist) - np.mean(obs))
        elif self.delta_type == "multiplicative":
            return cm_future * (np.mean(obs) / np.mean(cm_hist))
        else:
            raise ValueError('self.delta_type needs to be one of ["additive", "multiplicative"].')


This allows instantiation and application as follows:

In [30]:
debiaser = LinearScaling.from_variable("psl")
output = debiaser.apply(np.random.random((100, 3, 3))+1, np.random.random((100, 3, 3))+2, np.random.random((100, 3, 3))+3)

  return cls._from_variable(cls, variable, default_settings, experimental_default_settings, **kwargs)
100%|███████████████████████████████████████████| 9/9 [00:00<00:00, 4026.96it/s]


The LinearScaling debiaser set up here includes a running window functionality. If this is not required then we could also subclass the `Debiaser` instead of `SeasonalRunningWindowDebiaser` to set up a new debiaser.