# Multiprocessing

The ```multiprocessing``` package has a similar interface to the ```threading``` module but, instead of spawning threads, it spawns processes. These are separate operating system processes, each with its own memory space. This means that sharing information between processes is more complicated than sharing information between threads. However, it also means that race conditions are less likely and, as each process has its own Python interpreter and its own GIL, the GIL does not stop the processes from running at the same time. This allows code to be executed in parallel.

## Spawning Processes

The main class is the ```Process``` class. We can create a new instance of this class using ```Process(target=func, args=(arg1, arg2))```. We can then start the process using ```p.start()``` and wait for it to finish using ```p.join()```. For example:

In [None]:
import multiprocessing

def greeting(processes_number):
    print(f'Hello from process number {processes_number}')

processes = []

print(__name__)

if __name__ == '__main__':
    for i in range(2):
        p = multiprocessing.Process(target=greeting, args=(i,))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

    print('Main process is done')

__main__
Main process is done


Much of this is similar to what you've already done with threads, but there are a few important differences. 

The first is that the output of the print statement in ```greeting``` is not displayed under the code cell. This is because it is run in a separate process, so its output is not captured and displayed by the Jupyter notebook. A copy of this code is found in the file [```03_multiprocessing_scripts/print_example.py```](03_multiprocessing_scripts/print_example.py). You can run this code and see that the output of all processes is captured by the terminal and displayed there.

The second thing to note is the line ```if __name__ == '__main__':```. This is needed because, when a new process is spawned, it runs the script again from the beginning (this is the behaviour of the ```spawn``` start method, which is the default on Windows and macOS). The new process has a separate memory space, so it must run the code again in order for the function (in this case ```greeting```) to be defined there.

To explain this, we need to consider the built-in variable ```__name__```. This variable is created automatically when Python is run and takes different values in different circumstances. In a piece of code which is being run directly, it has the value ```__main__```. In a piece of code which is being run as the main script of a new process, it has the value ```__mp_main__```. You can check the values of ```__name__``` by running [```03_multiprocessing_scripts/print_example.py```](03_multiprocessing_scripts/print_example.py).

This means that the condition ```__name__ == '__main__'``` is only true in the main script and not in any new processes that are spawned, so the code inside the block is only run by the main script. This prevents each new process from spawning more processes of its own, which would otherwise repeat without end. We also place the code that waits for the processes to finish, and the final call to the ```print``` function, inside the if-block so that they are only run in the main script.
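
As a minimal sketch of the behaviour described above, the script below reports the value of ```__name__``` and the name of the process, in both the parent and a child. The helper names ```report``` and ```child``` are assumptions made for illustration. With the ```spawn``` start method the child should report ```__mp_main__```. With the ```fork``` start method (historically the default on Linux) the child is a copy of the parent, so it still reports ```__main__```. It needs to be run as a script rather than in a notebook for the child's output to be visible.

```python
import multiprocessing


def report():
    # Combine the module name seen by this process with the process name
    return f'{__name__} / {multiprocessing.current_process().name}'


def child():
    print(f'Child:  {report()}')


if __name__ == '__main__':
    # Show which start method this platform uses by default
    print(f'Start method: {multiprocessing.get_start_method()}')
    print(f'Parent: {report()}')

    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
```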

## Getting Values from Processes

Just like a thread created with the ```threading``` module, any value returned from a function called in a process will be lost. There are a few ways we can communicate between processes. We'll look at ```Pipe```, ```Queue```, ```Value``` and ```Array```.

### Pipes

A pipe is a two-way communication channel between two processes. We can create a pipe using ```Pipe()```, which returns the two ends of the pipe as a pair of connection objects. These can be passed to two processes to allow communication between them. By default, communication is allowed in both directions. We can then use the ```send()``` and ```recv()``` methods of the connection objects to send and receive data through the pipe. When the ```recv()``` method is called, the process will wait until data is available to be received.

Note that there is a maximum size of data which can be sent through the pipe (this may be around 32MB depending on operating system).
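
The short sketch below exercises both ends of a pipe inside a single process, which is enough to see ```send()```, ```recv()``` and ```poll()``` at work without spawning anything. It also creates a one-way pipe with ```Pipe(duplex=False)```, where the first connection returned can only receive and the second can only send. The variable names are chosen for illustration only.

```python
import multiprocessing

# A two-way (duplex) pipe: either end can send and receive
end_a, end_b = multiprocessing.Pipe()

# Any picklable object can be sent
end_a.send({'message': 'hello', 'values': [1, 2, 3]})
received = end_b.recv()
print(received)

# Reply in the other direction along the same pipe
end_b.send('thanks')
reply = end_a.recv()
print(reply)

# poll() reports whether data is waiting, without blocking
nothing_waiting = end_a.poll()
print(nothing_waiting)

# A one-way pipe: the first end only receives, the second only sends
receiver, sender = multiprocessing.Pipe(duplex=False)
sender.send(42)
one_way_value = receiver.recv()
print(one_way_value)
```

Only small messages are sent here. Larger messages sent and received by the same process could block, as nothing is reading from the other end while ```send()``` is running.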

The example below uses a pipe to communicate between the main process and a child process:

```python
import multiprocessing
import numpy

def calculate_sum(conn):
    # Wait to receive an array from the parent process
    array = conn.recv()
    # Calculate the sum of the array
    result = numpy.sum(array)
    # Send the result back to the parent process
    conn.send(result)

if __name__ == '__main__':
    # Create a Pipe() object
    # This function returns a pair of connection objects connected by a pipe
    parent_conn, child_conn = multiprocessing.Pipe()
    # Create a process and pass the child connection object to it
    # The process will implement the calculate_sum function
    p = multiprocessing.Process(target=calculate_sum, args=(child_conn,))
    # Start the process
    p.start()
    # Send an array to the child process
    parent_conn.send(numpy.arange(1, 6))
    # Receive the result from the child process
    print(parent_conn.recv())
    # Wait until the process is finished
    p.join()
    print('Main process is done')
```

The above code will not work in a Jupyter notebook due to incompatibilities between ```multiprocessing``` and Jupyter. However, you can run this code in a Python script and see that the result is printed to the terminal. A copy of this code is found in the file [```03_multiprocessing_scripts/pipe_example.py```](03_multiprocessing_scripts/pipe_example.py) which you can run.

In the above code, the parent process creates a pipe and passes one end of the pipe to the child process. The parent process then sends an array to the child process. The child process receives the array, calculates the sum of the array and sends the result back to the parent process. The parent process then receives the result and prints it.

This method of communicating requires careful thought about the order in which processes need to communicate with each other, to make sure data is sent and received in the correct order. This can be difficult to manage in more complex programs. If the work is not evenly balanced between the processes, it can also lead to processes waiting for data from another process, reducing the benefits of parallel execution. Once we have more than one child process, we will need to create a pipe for each pair of processes that need to communicate, further increasing complexity.

### Deadlocks

A deadlock is a situation where two or more processes are waiting for each other before progressing. This can happen under a number of conditions in concurrent programming. One possible cause of deadlocks is when two processes are waiting for each other to send data through a pipe. The following code is an adapted version of the code above but without the call to ```parent_conn.send``` in the main process:

```python
import multiprocessing
import numpy

def calculate_sum(conn):
    # Wait to receive an array from the parent process
    array = conn.recv()
    # Calculate the sum of the array
    result = numpy.sum(array)
    # Send the result back to the parent process
    conn.send(result)

if __name__ == '__main__':
    # Create a Pipe() object
    # This function returns a pair of connection objects connected by a pipe
    parent_conn, child_conn = multiprocessing.Pipe()
    # Create a process and pass the child connection object to it
    # The process will implement the calculate_sum function
    p = multiprocessing.Process(target=calculate_sum, args=(child_conn,))
    # Start the process
    p.start()
    # Receive the result from the child process
    print(parent_conn.recv())
    # Wait until the process is finished
    p.join()
    print('Main process is done')
```

This code can be run in the file [```03_multiprocessing_scripts/deadlock_example.py```](03_multiprocessing_scripts/deadlock_example_example.py). You will see that the code hangs and does not finish. This is because the parent process is waiting for the child process to send data through the pipe and the child process is waiting for the parent process to send data through the pipe. This is a deadlock. Care should be taken to avoid situations like this in concurrent programming.
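
One way to guard against hanging forever, sketched below, is to call ```poll()``` with a timeout before calling ```recv()```. The helper ```recv_with_timeout``` and the one-second timeout are assumptions made for illustration, and the child uses the built-in ```sum``` rather than NumPy so that the sketch has no extra dependencies.

```python
import multiprocessing


def calculate_sum(conn):
    # Wait to receive a list from the parent process, then send back its sum
    values = conn.recv()
    conn.send(sum(values))


def recv_with_timeout(conn, timeout):
    # poll() waits up to `timeout` seconds for data to arrive
    if conn.poll(timeout):
        return conn.recv()
    # Nothing arrived in time, so give up instead of blocking forever
    return None


if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=calculate_sum, args=(child_conn,))
    p.start()

    # As in the deadlock example, the parent "forgets" to send any data
    result = recv_with_timeout(parent_conn, timeout=1)

    if result is None:
        print('No reply from the child process, terminating it')
        # The child is still blocked in recv(), so stop it explicitly
        p.terminate()
    else:
        print(result)

    p.join()
    print('Main process is done')
```

A timeout does not fix the underlying ordering mistake, but it turns a silent hang into something the program can detect and report.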

### Queues

A queue is a data structure which allows for communication between many processes. We can create a queue using ```multiprocessing.Queue()```. We can then use the ```put()``` and ```get()``` methods to add and remove items from the queue. The data will be stored in a First In First Out (FIFO) order. The example below shows how data is added to and removed from a queue using only the main process.

In [2]:
import multiprocessing

queue = multiprocessing.Queue()

queue.put(1)
queue.put(2)
queue.put(3)

print(queue.get())
print(queue.get())
print(queue.get())

1
2
3


A queue may be passed to multiple different processes, and each process with access to the ```Queue``` can add data to the queue or retrieve data from it. If many processes may add data to a ```Queue``` at the same time, the exact order in which they add data is not guaranteed, as the order of execution across different processes is not guaranteed. This limits the way in which a ```Queue``` can be used, as it may not be clear which process a piece of data is from.
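
One common way around this, sketched below under the assumption that each worker is given an identifying number, is to put a tuple of ```(worker_id, result)``` on the queue so that the receiver can tell the results apart. The function ```square_worker``` and its arguments are hypothetical names used for illustration.

```python
import multiprocessing


def square_worker(worker_id, value, queue):
    # Tag the result with the id of the worker that produced it
    queue.put((worker_id, value ** 2))


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    n_processes = 3

    processes = []
    for i in range(n_processes):
        p = multiprocessing.Process(target=square_worker, args=(i, i + 10, queue))
        p.start()
        processes.append(p)

    # Results may arrive in any order, so store them by worker id
    results = {}
    for _ in range(n_processes):
        worker_id, result = queue.get()
        results[worker_id] = result

    for p in processes:
        p.join()

    # Print in worker order, regardless of arrival order
    for worker_id in sorted(results):
        print(f'Worker {worker_id} returned {results[worker_id]}')
```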

When the ```get``` method is called, the execution of the code will block (meaning "wait") until data is available in the queue. This means we don't need to worry about whether the computations required to put data in the queue have been completed when we call the ```get``` method. However, we do need to make sure that at least as much data is added to the queue as is removed from it. If we try to remove more data from the queue than will be added to it, the code will block indefinitely.
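
If it is not certain how many items will arrive, ```get()``` accepts a ```timeout``` argument and raises ```queue.Empty``` (from the standard library ```queue``` module) when nothing arrives in time. The sketch below wraps this in a hypothetical helper, ```get_or_default```, and runs in a single process.

```python
import multiprocessing
import queue


def get_or_default(q, timeout, default=None):
    try:
        # Wait at most `timeout` seconds for an item
        return q.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time, so fall back to the default
        return default


if __name__ == '__main__':
    q = multiprocessing.Queue()
    q.put('first item')

    print(get_or_default(q, timeout=1))
    # The queue is now empty, so this waits for one second and then gives up
    print(get_or_default(q, timeout=1, default='nothing to collect'))
```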

The queue is thread and process safe, meaning that it can be used to communicate between many processes without the need for locks. The example below shows how we can use a ```Queue``` to collect the results from an arbitrary number of processes:

```python
import numpy as np
import multiprocessing
import time

# Note the start time
start_time = time.time()

def find_smallest_multiple(n_data, factor, queue):
    # This function generates n_data random integers and finds the smallest multiple of factor

    # Initially we have found no multiples of factor
    result = None

    # Create the random data
    data = np.random.randint(1, 1000, n_data)

    for d in data:
        # Loop over the data and check if it's a multiple of factor
        if d % factor == 0:
            # If it is, check if it's the smallest we've found so far
            if result is None or d < result:
                # Update the result
                result = d

    # After considering each value, put the result in the queue
    queue.put(result)

if __name__ == '__main__':
    # Set up the problem data
    n_processes = 2
    n_data = int(1e6)
    factor = 7
    n_data_per_process = n_data // n_processes

    # Set up the queue
    queue = multiprocessing.Queue()

    for i in range(n_processes):
        # Spawn and start the processes
        p = multiprocessing.Process(target=find_smallest_multiple, args=(n_data_per_process, factor, queue))
        p.start()

    # We haven't found any multiples of factor yet
    result = None

    for i in range(n_processes):
        # Get each result from the queue
        # The code will pause here while the main process waits for each child process to finish
        r = queue.get()

        if r is not None and (result is None or r < result):
            # Skip processes that found no multiple (r is None)
            # If r is smaller than the current result, update it
            result = r

    # Note the end time and print the elapsed time
    end_time = time.time()
    print(f'Time taken: {end_time - start_time}')

    print(f'The smallest multiple of {factor} in the data is {result}')
```

This code can be run in the file [```03_multiprocessing_scripts/queue_example.py```](03_multiprocessing_scripts/queue_example.py). In the main process we create a ```Queue``` object and pass it to each of the child processes. Each child process calculates the smallest multiple of a given factor in a subset of the data and adds the result to the ```Queue```. 

The main process collects the same number of items from the ```Queue``` as there are child processes. Initially, the processes won't have completed their calculations and added them to the ```Queue```, so the main process will block until the data is available. As the result from each process is added to the ```Queue```, the main process will collect the data and process it. This sort of task is particularly well suited to a ```Queue``` as the order in which the data is added to the ```Queue``` is not important. We don't need to wait for the processes to finish as the ```queue.get()``` method will automatically block until data is available. As a result, we also don't need to create a list of the processes.

We can observe the performance of the code by changing the number of processes and the size of the data:

<p align="center">
<img src="resources/queue_smallest_factor.png" alt="A figure showing the runtime for different numbers of processes as a function of n_data" class="center">
</p>

When we completely remove the multiprocessing and run the code in a single process, we can see that the runtime is much shorter for low values of ```n_data```. This is because spawning processes takes some time, slowing down the code. However, as the size of ```n_data``` increases, this overhead becomes less significant and at around 100,000,0000 data points, the performance of the multiprocessing code equals that of the serial implementation. For 10,000,000,000 pieces of data, the multiprocessing implementation with both 4 and 8 cores is around 4 times faster than the serial implementation.

### Values

The ```multiprocessing``` module also provides a way to share data between processes using the ```Value``` class. This class creates a variable which references the same location in our computer's memory for each process. This means that changes to the variable in one process will be reflected in all other processes.

The data stored in a ```Value``` will be in the form of a ```ctypes``` object. This is a C-style data type which is used to store data in memory. The standard Python interpreter is itself written in C, which is why these types are used here. When we create a ```Value``` object, we need to specify the type of data we want to store. We can import the different types from the ```ctypes``` module (which is part of the Python Standard Library). The most common types are:

- ```ctypes.c_int```: A signed integer (32-bit on most platforms)
- ```ctypes.c_double```: A double precision floating point number
- ```ctypes.c_bool```: A boolean value
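
As a small sketch of these types in use, the code below creates one ```Value``` of each kind in a single process and reads it back through its ```value``` attribute. It also shows that a one-character type code of the kind used by the ```array``` module (for example ```'i'``` for a signed integer or ```'d'``` for a double) can be given in place of the ```ctypes``` type. The variable names are for illustration only.

```python
import ctypes
import multiprocessing

count = multiprocessing.Value(ctypes.c_int, 3)
ratio = multiprocessing.Value(ctypes.c_double, 0.25)
finished = multiprocessing.Value(ctypes.c_bool, False)

# Values are read and written through the .value attribute
count.value += 1
finished.value = True
print(count.value, ratio.value, finished.value)

# The same kinds of value can be created from type codes
count_from_code = multiprocessing.Value('i', 3)
ratio_from_code = multiprocessing.Value('d', 0.25)
print(count_from_code.value, ratio_from_code.value)

# The size of a C type in bytes depends on the platform
print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_double))
```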

We can retrieve and set the value of a ```Value``` object using its ```value``` attribute. The example below shows how we can create a shared ```Value``` object and increment it in a child process:

```python
import multiprocessing
import ctypes

# Create a shared memory value
# It is an integer with an initial value of 0
v = multiprocessing.Value(ctypes.c_int, 0)

def increment(v):
    v.value += 1

if __name__ == '__main__':
    # Create a process that increments the value
    p = multiprocessing.Process(target=increment, args=(v,))
    p.start()
    p.join()

    # Print the value
    print(v.value)

```

This code can be run in the file [```03_multiprocessing_scripts/value_example.py```](03_multiprocessing_scripts/value_example.py). 

As the data in a ```Value``` is shared between processes, it is now possible to encounter race conditions as we did with threads. However, the ```Value``` class has a built-in lock, which we can access with the ```get_lock``` method and use to prevent this. We can use the ```acquire()``` and ```release()``` methods of the lock to acquire and release it, as below:

In [None]:
import multiprocessing
import ctypes

# Create a shared memory value
# It is an integer with an initial value of 0
v = multiprocessing.Value(ctypes.c_int, 0)

# Get the lock
v.get_lock().acquire()
# Do our calculations altering the value
v.value += 1
# Release the lock
v.get_lock().release()

print(v.value)

1


We can also use the lock as a context manager in a ```with``` statement, which acquires the lock on entry and releases it on exit:

In [9]:
import multiprocessing
import ctypes

# Create a shared memory value
# It is an integer with an initial value of 0
v = multiprocessing.Value(ctypes.c_int, 0)

with v.get_lock():
    # Perform our calculations altering the value in the indented code
    v.value += 1

print(v.value)

1


This is a more Pythonic way of managing locks and is less error-prone, as we cannot forget to release the lock.
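
To see why the lock matters, note that ```v.value += 1``` is a read followed by a write, so two processes can read the same old value and one of the updates is lost. The sketch below, using the hypothetical names ```add_many``` and ```run```, increments a shared counter many times from several processes, with and without the lock. With the lock the total should always equal the expected value. Without it the total will often, though not always, be smaller. It should be run as a script.

```python
import ctypes
import multiprocessing


def add_many(counter, n_increments, use_lock):
    for _ in range(n_increments):
        if use_lock:
            # Only one process at a time can run the indented line
            with counter.get_lock():
                counter.value += 1
        else:
            # Unprotected read-modify-write: updates can be lost
            counter.value += 1


def run(n_processes, n_increments, use_lock):
    # Start n_processes workers on one shared counter and return the total
    counter = multiprocessing.Value(ctypes.c_int, 0)
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, n_increments, use_lock))
        for _ in range(n_processes)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return counter.value


if __name__ == '__main__':
    n_processes = 4
    n_increments = 20_000
    print(f'Expected:     {n_processes * n_increments}')
    print(f'With lock:    {run(n_processes, n_increments, use_lock=True)}')
    print(f'Without lock: {run(n_processes, n_increments, use_lock=False)}')
```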

The example below shows how we can use both ways of acquiring the lock to increment a ```Value``` object safely across multiple processes:

```python
import multiprocessing
import ctypes

def increment(v):
    # Manually acquire and release the lock
    v.get_lock().acquire()
    v.value += 1
    v.get_lock().release()

    # Use the context manager to acquire and release the lock
    with v.get_lock():
        v.value += 100

if __name__ == '__main__':
    # Create a shared memory value
    # It is an integer with an initial value of 0
    v = multiprocessing.Value(ctypes.c_int, 0)

    # Create n_process processes which increment the value
    n_process = 8
    processes = []
    for i in range(n_process):
        p = multiprocessing.Process(target=increment, args=(v,))
        p.start()
        processes.append(p)

    # Wait for every process to finish, not just the last one created
    for p in processes:
        p.join()

    # Print the value
    print(v.value)
```

This code can be run in the file [```03_multiprocessing_scripts/value_lock_example.py```](03_multiprocessing_scripts/value_lock_example.py).

### Arrays

The ```Array``` class of the ```multiprocessing``` module is similar to the ```Value``` class but allows us to store more than one value in a shared memory location. We can create an ```Array``` object using ```multiprocessing.Array()```. We need to specify the type of data we want to store and either the size of the array or a sequence of initial values. We can use the same ```ctypes``` types as we did with the ```Value``` class. Here we set the initial values using a tuple, as below:

In [None]:
import multiprocessing
import ctypes

# Create a shared memory array
# It is an array of 5 floats with an initial value of 0
a = multiprocessing.Array(ctypes.c_double, (0, 0, 0, 0, 0))

# We can access a single value from the array using an index
print(a[1])

# We can modify a single value in the array using an index
a[1] = 7

# We can access every value in the array using ':' as an index
print(a[:])

# We can iterate over the array
for x in a:
    print(x)

0.0
[0.0, 7.0, 0.0, 0.0, 0.0]
0.0
7.0
0.0
0.0
0.0


In the example above, we also saw how we can access and modify data in the array.

An ```Array``` has a lock in the same way as a ```Value```, which we can access with its ```get_lock``` method.
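
An ```Array``` can also be created from a size rather than from a sequence of initial values, in which case its elements start at zero. The brief sketch below assumes an array of six integer counters, chosen for illustration, and runs in a single process. It also uses the lock around a read-modify-write update, in the same way as for a ```Value```.

```python
import ctypes
import multiprocessing

# Passing an integer creates an array of that length, filled with zeros
counts = multiprocessing.Array(ctypes.c_int, 6)
print(len(counts))
print(counts[:])

# Slices can be assigned as well as read
counts[0:3] = [1, 2, 3]
print(counts[:])

# As with Value, the built-in lock protects a read-modify-write update
with counts.get_lock():
    counts[5] += 1
print(counts[:])
```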

The code below shows how we can create an array which keeps track of the number of times each number has been rolled on a six-sided die: