# Parallel

The executed version of this tutorial is at https://elephant.readthedocs.io/en/latest/tutorials/parallel.html

The `elephant.parallel` module provides a simple interface for parallelizing multiple calls to any user-specified function. The typical use case is calling a function many times with different parameters.

## Available executors

`elephant.parallel` offers three executors. Which one to choose depends on whether you work on a laptop or PC, or on a cluster with many nodes and MPI installed.

* `ProcessPoolExecutor` is a wrapper of Python's built-in `concurrent.futures.ProcessPoolExecutor`. It is recommended for laptops and personal computers;
* `MPIPoolExecutor` is a wrapper of `mpi4py.futures.MPIPoolExecutor`. It is recommended for cluster nodes with MPI-2 installed;
* `MPICommExecutor` is a wrapper of `mpi4py.futures.MPICommExecutor`. It is a legacy MPI-1 counterpart of `MPIPoolExecutor` and is recommended only for cluster nodes that do not support the MPI-2 protocol.

Besides these three, a `SingleProcess` executor is available as a fall-back option to test executions in a single process (no speedup).

All the classes listed above have the same API and can be used interchangeably.

## How to use

Let's say you want to call some function `my_function()` for each element in a list `iterables_list` like so:

(eq. 1) `results = [my_function(arg) for arg in iterables_list]`.

If the implementation of `my_function` is not itself parallelized, you can obtain the results by computing `my_function()` asynchronously for each element of the argument list. The result of eq. 1 is then equivalent to

(eq. 2) `results = Executor().execute(my_function, iterables_list)`,

where `Executor` can be any of the executors listed above. For more information about parallel executors in Python, refer to https://docs.python.org/3/library/concurrent.futures.html.
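The equivalence between eq. 1 and eq. 2 can be illustrated with the standard library alone. The sketch below is not `elephant.parallel` code: it uses `concurrent.futures.ThreadPoolExecutor` as a stand-in so that it runs without Elephant or MPI installed, and `my_function` and `iterables_list` are hypothetical placeholders. It only shows that mapping a function over a pool returns the results in the same order as the sequential list comprehension.

```python
from concurrent.futures import ThreadPoolExecutor


def my_function(arg):
    # Hypothetical stand-in for any function of one argument.
    return arg ** 2


iterables_list = [1, 2, 3, 4]

# eq. 1: sequential evaluation.
sequential = [my_function(arg) for arg in iterables_list]

# eq. 2, sketched with the standard library: map() submits every call
# to the pool and yields the results in the order of the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled = list(pool.map(my_function, iterables_list))

print(pooled == sequential)
```

A process-based pool follows the same pattern, with the additional requirement that the function and its arguments can be pickled.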

## Examples

### Example 1. Computing the mean firing rate

The `mean_firing_rate()` function in `elephant.statistics` takes one spiketrain as input. Let's parallelize it by computing the firing rates of 8 random spiketrains.

In [None]:
import numpy as np
import quantities as pq

from elephant.parallel import SingleProcess, MPIPoolExecutor, ProcessPoolExecutor, MPICommExecutor
from elephant.spike_train_generation import homogeneous_poisson_process
from elephant.statistics import mean_firing_rate, time_histogram

In [None]:
rate = 10 * pq.Hz
spiketrains = [homogeneous_poisson_process(rate, t_stop=10*pq.s) for _ in range(8)]

We start with a sanity check by computing the mean firing rate of the spiketrains with the `SingleProcess` executor, which runs in the main process with no parallelization.

In [None]:
firing_rate0 = SingleProcess().execute(mean_firing_rate, spiketrains)
firing_rate0

Let's verify that all three other executors produce the same result, but now with parallelization turned on.

In [None]:
firing_rate1 = ProcessPoolExecutor().execute(mean_firing_rate, spiketrains)
firing_rate1

In [None]:
firing_rate2 = MPIPoolExecutor().execute(mean_firing_rate, spiketrains)
firing_rate2

In [None]:
firing_rate3 = MPICommExecutor().execute(mean_firing_rate, spiketrains)
firing_rate3

All executors produce identical output, as intended.

### Example 2. Custom functions and positional arguments

Sometimes you might want to iterate over the second (or third, etc.) argument of a function. To do this, you need to create a custom function that passes its first input argument into the right position of the original function. Below is an example of how to compute time histograms of spiketrains with different `binsize`s (the second argument).

In [None]:
# step 1: initialize the first argument - spiketrains
spiketrains = [homogeneous_poisson_process(rate, t_stop=10*pq.s) for _ in range(8)]

# step 2: define your custom function
def my_custom_function(binsize):
    # specify all custom key-word options here
    return time_histogram(spiketrains, binsize, output='counts')

In [None]:
binsize_list = np.linspace(0.1, 1, num=8) * pq.s

time_hist = ProcessPoolExecutor().execute(my_custom_function, binsize_list)

`time_hist` contains 8 AnalogSignals - one AnalogSignal per `binsize` from `binsize_list`.
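An alternative to writing a wrapper by hand is `functools.partial`, which binds the leading positional arguments and any keyword options, leaving the next positional argument free for the iteration. The sketch below is an illustration under assumptions: `count_in_bins` is a hypothetical pure-Python stand-in for a two-argument function such as `time_histogram`, and the built-in `map` stands in for an executor. Whether a `partial` object can be passed to a particular `elephant.parallel` executor has not been verified here.

```python
from functools import partial


def count_in_bins(values, bin_width, t_stop=10.0):
    """Hypothetical stand-in: count the values in bins of equal width."""
    n_bins = int(round(t_stop / bin_width))
    counts = [0] * n_bins
    for value in values:
        # Clamp to the last bin so that value == t_stop is still counted.
        index = min(int(value / bin_width), n_bins - 1)
        counts[index] += 1
    return counts


values = [0.5, 1.5, 2.5, 7.0, 9.9]

# Bind the first positional argument and a keyword option; the single
# remaining positional argument (bin_width) is the one to iterate over.
bound = partial(count_in_bins, values, t_stop=10.0)

bin_widths = [1.0, 2.0, 5.0]
histograms = list(map(bound, bin_widths))

for width, counts in zip(bin_widths, histograms):
    print(width, counts)
```

A `partial` object can be pickled whenever the wrapped function and the bound arguments can, which is what process-based pools generally require.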

### Benchmark

Finally, let's see if `ProcessPoolExecutor` brings any speedup, compared to sequential processing, to Example 2.

In [None]:
import warnings
warnings.filterwarnings("ignore")

# initialize the iteration list
binsize_list = np.linspace(0.1, 1, 100) * pq.s

In [None]:
# sequential processing
%timeit [time_histogram(spiketrains, binsize) for binsize in binsize_list]

In [None]:
# with parallelization
%timeit ProcessPoolExecutor(max_workers=4).execute(my_custom_function, binsize_list)

## Old version. Will not be included

Comment: we cannot set the random seed to generate the same 100 spiketrains. This might be confusing for those who don't understand how the processes are spawned in Python.

Let's generate homogeneous Poisson processes in parallel.

The `homogeneous_poisson_process` function requires one mandatory argument, `rate`, and has several keyword arguments with default values. We will call this function multiple times, each time with a different input `rate`.


In [None]:
import numpy as np
import quantities as pq

from elephant.parallel import SingleProcess, MPIPoolExecutor, ProcessPoolExecutor, MPICommExecutor
from elephant.spike_train_generation import homogeneous_poisson_process
from elephant.statistics import mean_firing_rate


Let's create 50 spiketrains with firing rates in the range `[0.1, 10]` hertz in a time interval of `[0, 10]` seconds.

In [None]:
rates = np.linspace(0.1, 10, 50) * pq.Hz

We start with a sanity check with a `SingleProcess` executor.

In [None]:
spiketrains1 = SingleProcess().execute(homogeneous_poisson_process, rates,
    t_start=0*pq.s, t_stop=10*pq.s, as_array=True)

Then we just switch to `ProcessPoolExecutor`, `MPIPoolExecutor`, or `MPICommExecutor` without changing anything else.

In [None]:
spiketrains2 = ProcessPoolExecutor().execute(homogeneous_poisson_process, rates,
    t_start=0*pq.s, t_stop=10*pq.s, as_array=True)

spiketrains3 = MPIPoolExecutor().execute(homogeneous_poisson_process, rates,
    t_start=0*pq.s, t_stop=10*pq.s, as_array=True)

spiketrains4 = MPICommExecutor().execute(homogeneous_poisson_process, rates,
    t_start=0*pq.s, t_stop=10*pq.s, as_array=True)


All these sets of spiketrains 1-4 are identical. Let's print the first 10 entries.

In [None]:
spiketrains2[:10]

In [None]:
spiketrains3[:10]

In [None]:
spiketrains4[:10]