# MPI real world example

## Learning Objectives

By the end of this lesson, learners will be able to:

- Parallelize the fractal generation problem using MPI to improve performance and distribute the workload across multiple processes.
- Download and run both the serial and parallel versions of the Julia set fractal code.
- Observe performance improvements when switching from serial to parallel execution with MPI.
  
## MPI real world example problem

In a previous lesson we saw *multi-processing* used to generate the Julia set. An alternative approach is to use *message passing*.

As mentioned earlier, this is a relatively simple problem to parallelise. If we run the program with multiple processes, all we need to do to divide the work is to split the complex grid between the processes. In previous sections we covered an MPI function that does exactly this: the `scatter` method of the MPI communicator.
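To see how the work is divided before `scatter` hands out the pieces, the sketch below runs serially (no MPI needed) and shows what `np.array_split` produces. The small 10 x 6 grid and the process count of 4 are hypothetical stand-ins chosen for illustration; the lesson's real grid comes from `complex_grid(extent, cells)`.

```python
import numpy as np

# Hypothetical stand-ins: a small 10 x 6 complex grid and 4 MPI processes
n_procs = 4
real = np.linspace(-1.5, 1.5, 6)
imag = np.linspace(-1.0, 1.0, 10)
grid = real[np.newaxis, :] + 1j * imag[:, np.newaxis]  # shape (10, 6)

# array_split divides along axis 0 (rows); 10 rows over 4 parts -> 3, 3, 2, 2
chunks = np.array_split(grid, n_procs)

for rank, chunk in enumerate(chunks):
    # With comm.scatter, the process with this rank would receive this chunk
    print(f"rank {rank}: shape={chunk.shape}")
```

Unlike `np.split`, `np.array_split` does not require the number of rows to be divisible by the number of processes: the leftover rows go one each to the lowest ranks, so the chunk sizes differ by at most one row.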

We can directly take the example from the previous chapter and apply it to the complex mesh creation function:

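```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

if comm.Get_rank() == 0:
    # Only the root rank builds the full grid
    # (complex_grid, extent and cells come from the serial version of the code)
    grid = complex_grid(extent, cells)
    # Split the rows into one roughly equal chunk per process
    grid = np.array_split(grid, comm.Get_size())
else:
    # Non-root ranks have nothing to send; they receive their chunk below
    grid = None

# Every rank (including the root) receives one chunk of the grid
grid = comm.scatter(grid, root=0)
```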

Here we are following the same pattern of initialising data on the root rank, splitting it into roughly equal parts and scattering them to all the different ranks. Each rank can then apply the Julia set function to its own part of the mesh; this part of the code doesn't need to change at all!
To complete the process, we need to gather the data back into a single array. We can do this with the communicator's `gather` method, followed by concatenating the resulting array:

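```python
# On the root, gather returns a list with one chunk per rank (in rank order);
# on every other rank it returns None
fractal = comm.gather(fractal, root=0)
if fractal is not None:
    # Stitch the chunks back together into the full fractal array
    fractal = np.concatenate(fractal)
```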
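Because every point of the Julia set is computed independently, splitting the grid by rows, processing each chunk separately and concatenating the chunks in rank order should give exactly the same array as the serial computation. The sketch below checks this without MPI by simulating the ranks with a loop. `complex_grid` and `julia` here are hypothetical stand-ins for the lesson's functions (their implementations and parameters such as `c` and `max_iter` are assumptions), so treat it as an illustration of the pattern, not the lesson's code.

```python
import numpy as np

def complex_grid(extent, cells):
    # Hypothetical stand-in: a square grid of complex numbers
    # spanning [-extent, extent] along both axes
    axis = np.linspace(-extent, extent, cells)
    return axis[np.newaxis, :] + 1j * axis[:, np.newaxis]

def julia(z, c=-0.8 + 0.156j, max_iter=64):
    # Hypothetical stand-in: for each point, the iteration at which |z|
    # first exceeded 2 (or max_iter if it never did)
    z = z.copy()
    counts = np.full(z.shape, max_iter, dtype=int)
    for i in range(max_iter):
        escaped = (np.abs(z) > 2) & (counts == max_iter)
        counts[escaped] = i
        bounded = counts == max_iter
        # Only iterate points that have not escaped yet (avoids overflow)
        z[bounded] = z[bounded] ** 2 + c
    return counts

n_procs = 4  # plays the role of comm.Get_size()
grid = complex_grid(1.5, 101)

# "Scatter": split on the root, one chunk per simulated rank
chunks = np.array_split(grid, n_procs)
# Each rank runs the unchanged Julia function on its own chunk
partial_results = [julia(chunk) for chunk in chunks]
# "Gather" + concatenate on the root, in rank order
fractal = np.concatenate(partial_results)

serial = julia(grid)
print("matches serial result:", np.array_equal(fractal, serial))
```

This only works because `array_split` produces contiguous blocks of rows and `gather` returns the results in rank order, so concatenating them restores the original row order.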

With this method we have distributed the work of the function across multiple processes and still ended up with the same result. Let's use `time` to see whether this has increased the speed of the program:

```
$ time python parallel_fractal.py
python parallel_fractal.py  21.52s user 14.17s system 93% cpu 38.368 total

$ time mpirun -n 4 python parallel_fractal.py
mpirun -n 4 python parallel_fractal.py  37.23s user 21.70s system 370% cpu 15.895 total
```

We can see that running the problem in parallel has greatly reduced the run time, but that the speed increase is not directly proportional to the resources we are using (i.e. using 4 processes doesn't make the program 4 times faster). This is mainly due to the extra overhead introduced by MPI communication, which can be quite expensive (as mentioned in previous chapters).
The way a program's performance changes with the number of processes it runs on is often referred to as its "scaling behaviour". Determining how your problem scales across multiple processes is a useful exercise, and is helpful when it comes to porting your code to a larger-scale HPC machine.
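Two common ways to quantify scaling are the speedup, `S = T1 / Tp`, and the parallel efficiency, `E = S / p`, where `Tp` is the wall-clock time on `p` processes. The sketch below applies them to the `total` times reported above. Treating the single-process run of `parallel_fractal.py` as `T1` is an assumption; a truly serial version without MPI might be slightly faster.

```python
# Wall-clock ("total") times reported by `time` above, in seconds
serial_time = 38.368    # python parallel_fractal.py (1 process)
parallel_time = 15.895  # mpirun -n 4 python parallel_fractal.py
n_procs = 4

speedup = serial_time / parallel_time  # how many times faster
efficiency = speedup / n_procs         # fraction of the ideal linear speedup

print(f"speedup    = {speedup:.2f}x (ideal: {n_procs}x)")  # speedup    = 2.41x (ideal: 4x)
print(f"efficiency = {efficiency:.0%}")                   # efficiency = 60%
```

Repeating the measurement for several process counts and plotting speedup or efficiency against `p` gives a simple picture of the program's scaling behaviour.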

### Download Complete Parallel File 
[Download complete parallel_fractal_example file](complete_files/mpi_fractal_complete.py)



