# Parallel Simulations
When it comes to training machine learning models, more data generally means better performance. Consequently, the ability to generate data quickly is invaluable. In this tutorial, we'll explore how to use parallel computing to speed up data generation, focusing on running the same experiment for many simulations with different parameter sets.

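Please note that Jupyter Notebook is not an ideal environment for parallel computing. Although this tutorial is presented as a Jupyter Notebook for convenience, you will not be able to copy the following code blocks into an ``.ipynb`` file and run them. On platforms that start worker processes with the spawn method (the default on Windows and macOS), ``multiprocessing`` requires the workers to import the functions they execute, which does not work for functions defined interactively in a notebook. The code blocks in this tutorial, including their output, were copied from a script written in the [Spyder IDE](https://www.spyder-ide.org/).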

To start, we import the required modules. The tools used for parallelization (``os``, ``time``, ``functools``, and ``multiprocessing``) all come from Python's standard library, so you do not need any additional dependencies to run BATMODS-lite code in parallel.

```python
import os, time
from functools import partial
from multiprocessing import Pool

import numpy as np
import bmlite as bm
```

## Set Up the Runner
A ``multiprocessing.Pool`` applies a function to each item of an iterable, so the function must accept a single positional argument per task. Below, we create a wrapper function called ``runner`` whose single positional argument is ``theta``, a dictionary that maps model parameter names to their values. The function also accepts arbitrary keyword arguments through ``**kwargs`` so that additional inputs can be passed to ``runner``. Later, we will use ``partial`` from ``functools`` to fix these keyword arguments ahead of time so that the pool only has to supply ``theta``.

Inside ``runner``, we extract the simulation object ``sim`` and the experiment ``exp`` from the keyword arguments. We use ``pop`` so that each one is removed from the keyword dictionary as it is extracted. That way, the remaining key/value pairs in ``kwargs`` can be passed directly to ``run_CC`` as options for the ``IDASolver``.
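
Because ``runner`` receives its keyword arguments through ``**kwargs``, Python builds a fresh dictionary on every call, so the ``pop`` calls never remove ``sim`` or ``exp`` from the caller's dictionary. The stand-alone sketch below illustrates this with a hypothetical ``demo_runner`` and placeholder values (including a hypothetical ``rtol`` solver option) rather than real BATMODS-lite objects.

```python
def demo_runner(theta: dict, **kwargs) -> dict:
    # kwargs is a new dict built for this call, so pop() only affects this copy
    sim = kwargs.pop('sim')
    exp = kwargs.pop('exp')
    # Whatever remains would be forwarded to the solver (hypothetical options)
    return {'sim': sim, 'exp': exp, 'solver_kwargs': kwargs}


shared = {'sim': 'placeholder-sim', 'exp': {'C_rate': -2.}, 'rtol': 1e-6}

first = demo_runner({}, **shared)
second = demo_runner({}, **shared)  # still works: 'sim' and 'exp' were not removed

print(first['solver_kwargs'])  # {'rtol': 1e-06}
print(sorted(shared))          # ['exp', 'rtol', 'sim']
```

This is why the same ``kwargs`` dictionary can be reused for every call in the series loop later in the tutorial.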

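The ``for`` loop within ``runner`` sets the model parameters from ``theta``. Each key has the form ``'domain.attribute'`` (for example, ``'an.i0_deg'``), so the loop splits the key at the dot, looks up the domain object on ``sim``, and sets the attribute on that domain. After all of the parameters have been updated, we call ``sim.pre()`` to update the mesh, pointers, and other pre-processed quantities, in case any of the updated parameters require re-running the pre-processor.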

```python
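def runner(theta: dict, **kwargs) -> object:

    # Extract the simulation and experiment; the rest are solver options
    sim = kwargs.pop('sim')
    exp = kwargs.pop('exp')

    # Set each 'domain.attribute' parameter, e.g., 'an.i0_deg' -> sim.an.i0_deg
    for k, v in theta.items():
        domain_name, attr = k.split('.', 1)
        setattr(getattr(sim, domain_name), attr, v)

    # Re-run the pre-processor in case the new parameters affect it
    sim.pre()

    sol = sim.run_CC(exp, **kwargs)

    return sol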
```
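
The parameter-setting loop relies on each ``theta`` key naming a domain attribute of ``sim`` (such as ``sim.an``) followed by a parameter on that domain. The sketch below mimics this with ``types.SimpleNamespace`` stand-ins instead of a real BATMODS-lite simulation, so the objects, the hypothetical ``set_params`` helper, and the starting values of ``1.0`` are all assumptions for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a simulation with two electrode domains
sim = SimpleNamespace(
    an=SimpleNamespace(i0_deg=1.0, Ds_deg=1.0),
    ca=SimpleNamespace(i0_deg=1.0, Ds_deg=1.0),
)


def set_params(sim, theta: dict) -> None:
    """Apply 'domain.attribute' -> value pairs to the matching domain objects."""
    for name, value in theta.items():
        domain_name, attr = name.split('.', 1)  # 'an.i0_deg' -> ('an', 'i0_deg')
        domain = getattr(sim, domain_name)      # e.g., sim.an
        setattr(domain, attr, value)            # e.g., sim.an.i0_deg = 0.1


set_params(sim, {'an.i0_deg': 0.1, 'ca.Ds_deg': 10.0})
print(sim.an.i0_deg, sim.ca.Ds_deg)  # 0.1 10.0
```

One consequence worth knowing: for ordinary Python objects, ``setattr`` creates a new attribute when the name does not already exist, so a misspelled parameter name may be silently added rather than raising an error. An unknown domain prefix, by contrast, raises an ``AttributeError`` from ``getattr``.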

## Create Parameter List
The ``bmlite.math`` module includes a function, ``param_combinations``, that generates the ``theta`` values for the ``runner`` function above. The code below shows an example of how to use it. Its inputs are a list of model parameter names (each prefixed by its domain, such as ``an`` or ``ca``) and a list of value arrays, one per parameter. In this case, we vary the exchange current density and diffusion coefficient degradation factors for both electrodes. We let each of these four parameters take the value $0.1$ or $10$, resulting in $2^4 = 16$ total simulations. This block also initializes the simulation and experiment and stores them in a keyword dictionary.

```python
sim = bm.SPM.Simulation()
exp = {'C_rate': -2., 't_min': 0., 't_max': 1350., 'Nt': 150}

params = ['an.i0_deg', 'an.Ds_deg', 'ca.i0_deg', 'ca.Ds_deg']

values = []
for i in range(len(params)):
    values.append(np.linspace(0.1, 10., 2))

theta_list = bm.math.param_combinations(params, values)

kwargs = {}
kwargs['sim'] = sim
kwargs['exp'] = exp
```
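
To make the full-factorial idea concrete, the sketch below builds an equivalent list of ``theta`` dictionaries with the standard library's ``itertools.product``. It is a hypothetical ``param_combinations_sketch``, not the BATMODS-lite implementation; we assume ``param_combinations`` returns one dictionary per combination (which is what ``runner`` expects), and the ordering it uses may differ from this sketch. Plain lists stand in for ``np.linspace(0.1, 10., 2)`` so the example has no third-party dependencies.

```python
from itertools import product


def param_combinations_sketch(params: list, values: list) -> list:
    """Hypothetical helper: every combination of values, as a list of dicts."""
    return [dict(zip(params, combo)) for combo in product(*values)]


params = ['an.i0_deg', 'an.Ds_deg', 'ca.i0_deg', 'ca.Ds_deg']
values = [[0.1, 10.0] for _ in params]  # same values as np.linspace(0.1, 10., 2)

theta_list = param_combinations_sketch(params, values)
print(len(theta_list))  # 16
print(theta_list[0])    # {'an.i0_deg': 0.1, 'an.Ds_deg': 0.1, 'ca.i0_deg': 0.1, 'ca.Ds_deg': 0.1}
```

The number of simulations is the product of the number of values per parameter, so adding a third value to each of the four parameters would raise the count from $2^4 = 16$ to $3^4 = 81$.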

Since we are using the default simulation ``.yaml`` file, you will see the following output print to the console when you execute this code block. The message is a simple warning and can be ignored. 

```text
[BATMODS WARNING]
    SPM Simulation: Using a default yaml
```

## Time in Series
To compare our parallelization results against a baseline, we will run the $16$ simulations both in series and in parallel. The code below runs each simulation in series, using the ``time`` module to track how long data generation takes. The solutions are stored in a list named ``series``. If you do not need to keep all of the results in memory, you could instead call ``sol.save_sliced()`` inside the ``for`` loop to save each result as it is produced.

```python
start = time.time()
series = []
for theta in theta_list:
    sol = runner(theta, **kwargs)
    series.append(sol)

print(f'time_in_series: {time.time() - start:.3f} s')
```
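
``time.time()`` reports wall-clock time and is adequate for a coarse comparison like this one. If you prefer a clock intended for measuring intervals, the standard library's ``time.perf_counter()`` is monotonic and has higher resolution. Below is a minimal sketch of a reusable timer; the ``timed`` helper is hypothetical, and a cheap arithmetic loop stands in for the simulations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Hypothetical helper: record the elapsed seconds of a block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f'{label}: {results[label]:.3f} s')


timings = {}
with timed('time_in_series', timings):
    total = sum(i * i for i in range(100_000))  # stand-in for the simulation loop
```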

The output of this code block is shown below. The $16$ simulations took about $28.6$ seconds in total, or roughly $1.8$ seconds per simulation. In the next section, we show that running in parallel can substantially reduce this time.

```text
time_in_series: 28.630 s
```

## Time in Parallel
The code below shows how to call the ``runner`` function defined at the top of the tutorial in parallel. Note two important details in this code block. First, the code is wrapped in an ``if __name__ == '__main__':`` statement. ``multiprocessing`` requires this guard because, when worker processes are started with the spawn method (the default on Windows and macOS), each worker re-imports the main script; without the guard, every worker would attempt to start its own pool of workers. For the same reason, any other module-level code in the script, such as the series loop above, is re-executed in each spawned worker unless it is also placed under the guard. Second, ``partial`` creates a new function called ``process`` that behaves like ``runner`` but with the keyword arguments already filled in, so it only needs the ``theta`` argument. This is how we pass extra values to ``runner`` even though ``Pool.imap`` supplies only one argument per task.
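
To see what ``partial`` does in isolation, the sketch below freezes a keyword argument on a hypothetical ``toy_runner`` (a stand-in for ``runner``) and checks that the resulting function can be called with only a ``theta`` dictionary, which is the call shape ``Pool.imap`` uses.

```python
from functools import partial


def toy_runner(theta: dict, **kwargs) -> float:
    """Hypothetical stand-in for runner: scale the sum of the parameter values."""
    scale = kwargs.pop('scale')
    return scale * sum(theta.values())


# Freeze the keyword arguments; process now takes only theta
process = partial(toy_runner, scale=2.0)

theta_list = [{'a': 1.0, 'b': 2.0}, {'a': 3.0, 'b': 4.0}]
print([process(theta) for theta in theta_list])  # [6.0, 14.0]
print(process.keywords)                          # {'scale': 2.0}
```

The frozen keywords survive repeated calls even though ``toy_runner`` pops from ``kwargs``, because each call receives its own copy of the keyword dictionary.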

After setting up ``process``, we determine the number of CPUs on the machine with ``os.cpu_count()``, which reports the number of logical cores. Starting more worker processes than this gives no additional speedup, but you can always use fewer. The machine used to run this code reports $12$ cores, and we used all of them. Keep in mind that starting worker processes and transferring data between them has overhead, so for a small number of simulations, using many cores can take longer than running in series. When you run tens, hundreds, or even thousands of simulations, however, more cores generally pay off.
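
``os.cpu_count()`` can return ``None`` when the number of CPUs cannot be determined, and ``Pool`` rejects a pool size below $1$. A small hypothetical helper that handles both cases, and also avoids starting more workers than there are simulations, might look like the sketch below. The cap on the number of tasks is our own suggestion, not part of the original script.

```python
import os
from typing import Optional


def choose_workers(n_tasks: int, max_workers: Optional[int] = None) -> int:
    """Hypothetical helper: pick a pool size no larger than the CPU or task count."""
    cpus = os.cpu_count() or 1          # cpu_count() may return None
    if max_workers is not None:
        cpus = min(cpus, max_workers)   # optional user-imposed limit
    return max(1, min(cpus, n_tasks))   # extra workers would sit idle


print(choose_workers(16))
```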

As with the series runs, we store the solutions from the parallel simulations in a list, named ``parallel``. The pool's ``imap`` method returns results in the same order as its inputs, so the solutions in ``parallel`` line up with ``theta_list``.
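
The ordering guarantee can be checked without launching separate processes by using ``multiprocessing.dummy.Pool``, a thread-backed pool that exposes the same ``imap`` and ``imap_unordered`` methods. In the sketch below, a hypothetical ``slow_square`` task sleeps longer for earlier inputs, so the tasks finish roughly in reverse order. ``imap`` still yields results in input order, while ``imap_unordered`` yields them as they complete.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # threads, same API as Pool


def slow_square(x: int) -> int:
    """Hypothetical task: earlier inputs take longer to finish."""
    time.sleep(0.05 * (4 - x))
    return x * x


inputs = [0, 1, 2, 3]

with ThreadPool(4) as pool:
    ordered = list(pool.imap(slow_square, inputs))
    completed = list(pool.imap_unordered(slow_square, inputs))

print(ordered)    # [0, 1, 4, 9]
print(completed)  # typically [9, 4, 1, 0], i.e., completion order
```

If you do not need the solutions to line up with ``theta_list``, ``imap_unordered`` lets you handle (or save) each result as soon as it is ready.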

```python
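if __name__ == '__main__':
    start = time.time()

    # Freeze the keyword arguments so the pool sees a one-argument function
    process = partial(runner, **kwargs)

    cpus = os.cpu_count()

    # The context manager shuts the pool down when the block exits;
    # imap yields solutions in the same order as theta_list
    parallel = []
    with Pool(cpus) as pool:
        for sol in pool.imap(process, theta_list):
            parallel.append(sol)

    print(f'time_in_parallel: {time.time() - start:.3f} s')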
```

Below, you can see that the same $16$ simulations took just under $12$ seconds in parallel, or roughly $0.74$ seconds per simulation. Running in series therefore took about $2.4$ times as long as running the same set of simulations in parallel. Because we ran the model only $16$ times here, the absolute time savings may not seem like much. However, assuming the average time per simulation stays about the same, a $12$-hour block would yield roughly $24{,}000$ results in series versus nearly $59{,}000$ results in parallel.
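
The throughput estimate follows directly from the two measured times. The sketch below reproduces that arithmetic from the timings reported in this tutorial; the $12$-hour extrapolation assumes the per-simulation cost stays constant, which will not hold exactly in practice.

```python
series_total, parallel_total, n_sims = 28.630, 11.784, 16  # reported timings (s)

per_sim_series = series_total / n_sims      # ≈ 1.79 s per simulation
per_sim_parallel = parallel_total / n_sims  # ≈ 0.74 s per simulation
speedup = series_total / parallel_total     # ≈ 2.43

window = 12 * 3600  # 12 hours, in seconds
print(f'speedup: {speedup:.2f}x')
print(f'series:   {window / per_sim_series:,.0f} results in 12 h')
print(f'parallel: {window / per_sim_parallel:,.0f} results in 12 h')
```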

```text
time_in_parallel: 11.784 s
```