# Cell Division Code Example

In this notebook we're going to look through a simple but complete example of how we might write a parallel piece of code to simulate the population of cells over time.

## The Problem

We are going to simulate a population of cells over time. The cells have an average lifetime, but the actual time a cell lives is random. If $l$ is the mean lifetime of a cell, then the probability density for the cell ending its life at time $t$ after its birth is given by:

$P(t) = \frac{1}{l}\textrm{e}^{-\frac{t}{l}}$.

When a cell ends its life, it will either die or split into two cells. In our simulation, we'll use a mean lifetime of 1, and we'll assume the cell has a probability of splitting of 0.55 and a probability of dying of 0.45. As the lifetime and the fate of each cell are non-deterministic, the same starting conditions will lead to different outcomes each time the simulation is run.
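As a minimal sketch of these two random choices, the snippet below draws a lifetime and a fate for many cells and checks that the sample averages look right. It assumes the lifetimes are exponentially distributed with mean $l$; the function name `sample_cell_fate` and the constants are made up for this illustration and are not taken from the example scripts.

```python
import random

MEAN_LIFETIME = 1.0  # l, the mean lifetime used in the text
P_SPLIT = 0.55       # probability that a cell splits at the end of its life


def sample_cell_fate(rng, mean_lifetime=MEAN_LIFETIME, p_split=P_SPLIT):
    """Return (lifetime, fate) for one cell; fate is "split" or "die"."""
    # expovariate takes the rate (1 / mean), not the mean itself.
    lifetime = rng.expovariate(1.0 / mean_lifetime)
    fate = "split" if rng.random() < p_split else "die"
    return lifetime, fate


rng = random.Random(42)  # fixed seed only so that the sketch is repeatable
samples = [sample_cell_fate(rng) for _ in range(10_000)]
mean_sampled_lifetime = sum(t for t, _ in samples) / len(samples)
split_fraction = sum(fate == "split" for _, fate in samples) / len(samples)
print(f"mean lifetime ~ {mean_sampled_lifetime:.2f}")
print(f"split fraction ~ {split_fraction:.2f}")
```

With this many samples the two printed values should sit close to 1 and 0.55 respectively.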

As cells are more likely to split than to die, on average, the population will grow as in the figure below:

<p align="center">
<img src="resources/single_growing_population.png" alt="A figure showing a realisation in which the population grows." class="center">
</p>
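A rough way to quantify this average growth, assuming exponentially distributed lifetimes with mean $l$ and writing $p$ for the splitting probability (a symbol introduced only for this sketch): each cell ends its life at rate $1/l$, and that event changes the population by $+1$ with probability $p$ and by $-1$ with probability $1-p$. The mean population $m(t)$, starting from a single cell, then satisfies

$\frac{\mathrm{d}m}{\mathrm{d}t} = \frac{2p-1}{l}\,m, \qquad m(0) = 1, \qquad \textrm{so} \qquad m(t) = \textrm{e}^{\frac{(2p-1)t}{l}}$.

With $p = 0.55$ and $l = 1$ this gives $m(t) = \textrm{e}^{0.1t}$, so the mean grows slowly, roughly doubling every 7 time units. Individual realisations can differ greatly from this mean.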

However, there is a significant chance that the population will die out, potentially very quickly:

<p align="center">
<img src="resources/single_quick_death.png" alt="A figure showing a realisation in which the population quickly dies out." class="center">
</p>
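The chance of eventual extinction can be estimated with a standard branching-process argument, sketched here with $p$ for the splitting probability and $q$ for the probability that the descendants of a single cell eventually die out (both symbols are introduced only for this sketch). The first cell either dies, with probability $1-p$, or splits, with probability $p$, in which case both daughter lines must die out independently:

$q = (1-p) + p\,q^2, \qquad \textrm{which factorises as} \qquad (q-1)\left(p\,q-(1-p)\right) = 0$.

For $p > 0.5$ the relevant root is $q = \frac{1-p}{p}$. With $p = 0.55$ this gives $q = \frac{0.45}{0.55} \approx 0.82$, so only about 18% of populations started from a single cell survive in the long run. At any finite time the survival probability is somewhat higher than this limit.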

We would like to find out how the mean and standard deviation of the population change over time. We would also like to calculate the survival probability, which is the probability that the population is non-zero at a given time.

## Serial Simulation

We can simulate this problem by running a number of different realisations of the simulation. Each of these realisations will be an independent simulation of a possible history of the population of cells as a function of time. Once we have run many realisations, we can use the collection of realisations to calculate the mean and standard deviation of the population at each time, as well as the survival probability.
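The reduction over realisations can be sketched with NumPy as follows. The array `populations` is invented data for illustration (one row per realisation, one column per output time); the variable names are assumptions and need not match the example scripts.

```python
import numpy as np

# Hypothetical results: 4 realisations (rows) at 3 output times (columns).
populations = np.array([
    [1, 2, 4],
    [1, 0, 0],
    [2, 3, 3],
    [0, 0, 0],
])

# Reduce over axis 0 (the realisations) to get one value per output time.
mean_population = populations.mean(axis=0)
std_population = populations.std(axis=0)  # population std; pass ddof=1 for the sample std
survival_probability = (populations > 0).mean(axis=0)

print(mean_population)       # mean population at each time
print(std_population)        # spread between realisations at each time
print(survival_probability)  # fraction of realisations still alive
```

Comparing against zero gives a boolean array, and its mean along the realisation axis is the fraction of realisations with a non-zero population.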

A serial implementation of the simulation can be found in [`06_cell_population_example/serial_version.py`](06_cell_population_example/serial_version.py). We don't need to look into the details of most of the functions, but it's worth understanding what each function does:

- `generate_next_population`: This function calculates the number of cells that will be present at a certain time after the current time for a single cell. This uses a random number generator to determine if a cell will die, split, or survive the time period.
- `run_realisation`: This function runs a single realisation of the simulation. It starts with a single cell at time 0 and calculates the population at each of a number of output times using `generate_next_population`.
- `run_single_realisation`: This function runs a single realisation of the system using `run_realisation` and then calculates the mean and plots the output.
- `run_multiple_realisations`: This function runs several realisations of the system using `run_single_realisation` and then calculates the mean and standard deviation of the population at each time, as well as the survival probability.
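To make the roles of the first two functions more concrete, here is one possible way such a simulation could be written. It is a sketch under the assumption of exponentially distributed lifetimes, not the code in `serial_version.py`, and the names `descendants_after` and `population_history` are hypothetical. Because exponential lifetimes are memoryless, every cell alive at an output time can be treated as newborn for the next interval.

```python
import random


def descendants_after(duration, rng, mean_lifetime=1.0, p_split=0.55):
    """Number of living descendants of one cell after `duration`."""
    alive = 0
    remaining = [duration]  # time still to be simulated for each pending cell
    while remaining:
        time_left = remaining.pop()
        lifetime = rng.expovariate(1.0 / mean_lifetime)
        if lifetime >= time_left:
            alive += 1  # the cell survives to the end of the interval
        elif rng.random() < p_split:
            # The cell splits: both daughters inherit the leftover time.
            remaining.extend([time_left - lifetime] * 2)
        # Otherwise the cell dies and contributes nothing.
    return alive


def population_history(output_times, rng):
    """Population at each output time, starting from one cell at time 0."""
    history, population, previous_time = [], 1, 0.0
    for t in output_times:
        step = t - previous_time
        population = sum(descendants_after(step, rng) for _ in range(population))
        history.append(population)
        previous_time = t
    return history


print(population_history([1.0, 2.0, 3.0, 4.0, 5.0], random.Random(0)))
```

The cost of each step scales with the number of living cells, which is why a growing population takes much longer to simulate than one that dies out.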

The script contains code which runs two single realisations: one which always dies out and one which always grows (the results are guaranteed by manually choosing the seed of the random number generator). The script then runs 100 realisations of the system. The runtimes are printed to the console. When I ran the code, the output was:

* Single realisation that always dies: 0.12s
* Single realisation that always grows: 2.13s
* 100 realisations: 91s

The time to simulate a growing population is significantly longer than the time to simulate a dying population as there are more cells to simulate. Just under 20% of the simulations see a growing population. As 100 realisations is not very many, we should expect both the outputs and the runtime of the multi-realisation run to vary each time it is run.
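The fixed-seed runs described above rely on a seeded random number generator always producing the same sequence. The sketch below illustrates the idea with a deliberately simplified stand-in for a realisation, which only tracks the population after each split-or-die event. The function `final_population` and the seed search are assumptions made for this illustration, not the approach used in the script.

```python
import random
import time


def final_population(seed, max_events=10_000, p_split=0.55):
    """Toy realisation: apply split (+1) or die (-1) events to one cell."""
    rng = random.Random(seed)
    population = 1
    for _ in range(max_events):
        if population == 0:
            break  # extinct: nothing left to simulate
        population += 1 if rng.random() < p_split else -1
    return population


# Search for one seed of each kind, then reuse those seeds from now on.
dying_seed = next(s for s in range(1000) if final_population(s) == 0)
growing_seed = next(s for s in range(1000) if final_population(s) > 0)

start = time.perf_counter()
first_run = final_population(growing_seed)
elapsed = time.perf_counter() - start

# The same seed reproduces exactly the same outcome.
assert final_population(growing_seed) == first_run
print(f"seed {dying_seed} dies out, seed {growing_seed} grows to {first_run}")
print(f"growing run took {elapsed:.4f}s")
```

Here `time.perf_counter` is used for timing because it is intended for measuring short durations.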

## Queue Implementation

This problem is a good candidate for parallelisation as each realisation is independent of the others. As each realisation can take a while to run, the overhead due to communication is also likely to be small compared to the time taken to run the realisation.

As we parallelise the code, we want to keep the interface of the functions a user might call, specifically `run_single_realisation` and `run_multiple_realisations`, as similar as possible. This means it will take minimal effort to adapt existing tests and profiling, and any places where the code is called in existing projects will not need to be changed.

Our first attempt at parallelising the code, in the file `06_cell_population_example/queue.py`, uses a queue to collect the results produced by a number of processes. To do this we create the new function `run_n_realisation_queue`, which is similar to the old function `run_multiple_realisations` but puts the results of all the realisations it performs on a queue as a 2D NumPy array, holding the population at each time for each realisation. This function is called by each process. The function `run_multiple_realisations` is adapted to create the queue, start the processes, collect the results from the queue, and process the results.

When altering `run_multiple_realisations` we have made the number of processes an optional argument with a default value of 1. This means that calls made to the function without specifying the number of processes will still work, making integration of the new function into existing projects easier.
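The overall pattern can be sketched with Python's `multiprocessing` module as below. This is not the code in `queue.py`: the names `toy_history`, `realisations_worker` and `run_multiple_realisations_sketch` are hypothetical, the realisation is a toy stand-in, plain lists are used instead of NumPy arrays, and the number of realisations is assumed to divide evenly between the processes.

```python
import multiprocessing as mp
import random


def toy_history(n_times, rng, p_split=0.55):
    """Toy stand-in for one realisation: the population after each of
    n_times split-or-die events (not a time-based model)."""
    population, history = 1, []
    for _ in range(n_times):
        if population > 0:
            population += 1 if rng.random() < p_split else -1
        history.append(population)
    return history


def realisations_worker(n_realisations, n_times, seed, queue):
    """Run a batch of realisations and put the whole batch on the queue."""
    rng = random.Random(seed)  # one seed per process so batches differ
    batch = [toy_history(n_times, rng) for _ in range(n_realisations)]
    queue.put(batch)


def run_multiple_realisations_sketch(n_realisations, n_times=20, n_processes=1):
    """Split the realisations over n_processes and gather the results."""
    queue = mp.Queue()
    per_process = n_realisations // n_processes  # assumes an exact split
    processes = [
        mp.Process(target=realisations_worker,
                   args=(per_process, n_times, seed, queue))
        for seed in range(n_processes)
    ]
    for process in processes:
        process.start()
    # Read one batch per process before joining, so that no child is left
    # blocked while flushing its results into the queue.
    histories = []
    for _ in processes:
        histories.extend(queue.get())
    for process in processes:
        process.join()
    return histories


if __name__ == "__main__":
    # The old-style call still works thanks to the default n_processes=1.
    print(len(run_multiple_realisations_sketch(100)))
    print(len(run_multiple_realisations_sketch(100, n_processes=4)))
```

Giving `n_processes` a default value means callers that never pass it keep their old behaviour, which is the point made in the paragraph above.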

This implementation doesn't alter the runtime of the single realisations, but it decreases the runtime of the 100 realisations from around 90s to around 40s on 4 cores. This is a decent speedup, but the code is not 4 times faster. Part of the reason for this becomes apparent when we run the code, which prints a message when each process has finished its quarter of the realisations. Typically, the processes finish at significantly different times. In one example I just ran, the four processes finished after 11, 27, 34 and 42 seconds. This is because the realisations do not all take the same amount of time to run: a realisation in which the cell population quickly dies out takes around 5% of the runtime of a realisation in which the population grows. If one process happens to simulate 10 realisations out of 25 where the cell population grows, it will take significantly longer to run than a process where only 2 grow. The figure below shows a hypothetical example of how the time each process spends on each realisation might vary.

<p align="center">
<img src="resources/queue_process_time.png" alt="The amount of time each process might spend performing each realisation." class="center">
</p>

Once a process has finished its realisations it will terminate and its physical core will sit idle. The code is left waiting for the slowest process to finish, meaning progressively fewer of the cores are active as the code runs. This is a common problem when parallelising code, known as load imbalance, and it is the main reason why the code is not 4 times faster when run on 4 cores.
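A quick calculation with the finish times quoted above shows how much the imbalance costs. This is only a back-of-the-envelope sketch: the variable names are made up, and the serial time comes from a different run, so the numbers are indicative rather than exact.

```python
# Finish times (in seconds) of the four processes in the run quoted above.
finish_times = [11, 27, 34, 42]
serial_time = 91  # runtime of the serial version for 100 realisations

n_processes = len(finish_times)
wall_time = max(finish_times)            # the run ends with the slowest process
busy_time = sum(finish_times)            # total time the cores spent working
balanced_time = busy_time / n_processes  # wall time if the work were shared evenly
utilisation = busy_time / (n_processes * wall_time)
speedup = serial_time / wall_time

print(f"wall time: {wall_time}s, perfectly balanced: {balanced_time}s")
print(f"core utilisation: {utilisation:.0%}, speedup: {speedup:.2f}x")
```

On these figures the cores are busy for only about two thirds of the run, and an even split of the same work would have finished in under 30 seconds.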