# Parallel Processing on Multiple Threads

(C) 2023 by [Damir Cavar](http://damir.cavar.me/)


In this notebook we will create a function that takes one parameter and runs *n* times in parallel using a [Pool](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) from the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module, which is part of the Python standard library.

We import the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) module:

In [1]:
import multiprocessing as mp

In the following we create a function that takes one parameter ```elements``` to generate a tuple of value pairs that are unique combinations of all the items in ```elements```.

This function is in the ```worker.py``` file and needs to be imported as a module in Jupyter to be able to use parallel processing. Do not try to define the function within Jupyter and call it in a ```multiprocessing``` setting within the same notebook. This will not work.

**Remember:**

For a ```multiprocessing``` environment to work, in the current version of Jupyter, including Jupyter in Visual Studio Code, you have to import the worker function from an external file.
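The contents of ```worker.py``` are not shown in this notebook. A minimal sketch of what such a module could look like, assuming it is built on ```itertools.combinations``` from the standard library. Only the function names and the printed message are taken from this notebook; the function bodies are an assumption. The second function is the one imported further below.

```python
# worker.py (hypothetical contents; the original file is not shown here)
from itertools import combinations


def get_combinations(elements):
    """Return all unique pairs of items in elements as a tuple of tuples."""
    print("Running combinations computation...")
    return tuple(combinations(elements, 2))


def get_combinations_n(elements, n):
    """Return all unique combinations of length n as a tuple of tuples."""
    return tuple(combinations(elements, n))
```

Keeping the functions at the top level of the module matters, since the worker processes look them up by module and function name.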

In [2]:
from worker import get_combinations

If we want to maximize performance and use all available threads, we can query the number of threads in our computer and allocate as many parallel processes. We can determine the number of CPUs (or rather [threads](https://en.wikipedia.org/wiki/Thread_(computing))) in our computer using the [cpu_count](https://docs.python.org/3/library/multiprocessing.html?highlight=cpu_count#miscellaneous) function:

In [3]:
my_processes = mp.cpu_count()
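Note that ```cpu_count``` reports the logical CPUs of the whole machine, which is not necessarily the number the current process may use, for example inside a container or under a job scheduler. A small sketch of how this could be checked, assuming a platform that provides ```os.sched_getaffinity``` (Linux, for example) and falling back to the total otherwise. The variable names ```total_cpus``` and ```usable_cpus``` are made up for this illustration.

```python
import multiprocessing as mp
import os

# Logical CPUs reported for the whole machine.
total_cpus = mp.cpu_count()

# CPUs the current process is allowed to run on. os.sched_getaffinity
# is not available on every platform, so fall back to the total count.
if hasattr(os, "sched_getaffinity"):
    usable_cpus = len(os.sched_getaffinity(0))
else:
    usable_cpus = total_cpus

print(f"Logical CPUs: {total_cpus}, usable by this process: {usable_cpus}")
```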

We can create some sample data to submit to the ```get_combinations``` function. The function takes an iterable data structure as a parameter. We can submit tuples of varying length:

In [4]:
test_parameter = tuple(range(6))

Calling ```get_combinations``` with the ```test_parameter``` tuple will result in ```(n(n-1))/2``` unique pairs, and for ```n=6``` this is ```15``` unique number pairs:

In [5]:
get_combinations( test_parameter )

Running combinations computation...


((0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (0, 5),
 (1, 2),
 (1, 3),
 (1, 4),
 (1, 5),
 (2, 3),
 (2, 4),
 (2, 5),
 (3, 4),
 (3, 5),
 (4, 5))
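The pair count can be double-checked without enumerating the pairs. A quick sketch using ```math.comb``` (available since Python 3.8), which computes the binomial coefficient directly; the variable names are chosen for this illustration only:

```python
import math

n = 6
# Closed form for the number of unique pairs.
pairs_formula = n * (n - 1) // 2
# The binomial coefficient "n choose 2" gives the same number.
pairs_comb = math.comb(n, 2)

print(pairs_formula, pairs_comb)  # prints: 15 15
```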

In the following we create twice as many such tuples as CPUs or threads available to test the parallel processing with the [Pool](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) class. We could in principle randomize the content of the individual tuples of length ```6```, but for demonstration purposes this is not necessary. The resulting tuple in ```data``` will contain twice as many tuples as threads in our computer. Each tuple will be of length ```6```, but it is irrelevant how long the individual tuples are for the ```get_combinations``` function. Any length of ```2``` or more will return a useful result.

In [4]:
data = tuple( tuple(range(6)) for i in range(my_processes * 2) )
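If randomized content were wanted, one possible sketch uses ```random.sample``` to draw six distinct numbers per tuple. The value range ```range(100)```, the seed, and the names ```rng```, ```number_of_tuples``` and ```random_data``` are arbitrary choices for this illustration, and a fixed number of tuples stands in for ```my_processes * 2```:

```python
import random

rng = random.Random(42)  # seeded only to make the sketch repeatable
number_of_tuples = 8     # stand-in for my_processes * 2

# Each inner tuple holds six distinct numbers drawn from 0..99.
random_data = tuple(
    tuple(rng.sample(range(100), 6)) for _ in range(number_of_tuples)
)

print(random_data[0])
```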

To start the worker processes that compute the unique combinations, we create a [Pool](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) object and set the ```processes``` value to the number of threads in our computer, as stored in the ```my_processes``` variable. We can run the pool by calling its ```map``` method with the function we want it to call and the sequence of parameters, one per function call:

In [5]:
p = mp.Pool(processes=my_processes)
result = p.map(get_combinations, data)
print(f"Number of threads: {my_processes}")
print(f"Number of results: {len(result)}")

Number of threads: 24
Number of results: 48


In [6]:
for r in result:
	print(r)

((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1), (0, 2), (0, 3), (0, 4),

We can wrap this sequence of calls in a ```with```-statement as such:

In [7]:
with mp.Pool(processes=my_processes) as p:
    print(p.map(get_combinations, data))

[((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)), ((0, 1), (0, 2), (0, 3)
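When the notebook code is moved into a standalone script, two details matter: the pool should be created under an ```if __name__ == "__main__":``` guard, and it should be shut down when the work is done. Leaving a ```with``` block calls ```terminate()``` on the pool, which is safe in the cell above because ```map``` has already returned all results. Without ```with```, the pool can be shut down explicitly. A minimal sketch, assuming ```math.factorial``` as a stand-in worker (it is importable from a module, in line with the rule stated earlier); the name ```run_pool``` is hypothetical:

```python
import math
import multiprocessing as mp


def run_pool(values, processes=2):
    """Map math.factorial over values with an explicitly managed pool."""
    pool = mp.Pool(processes=processes)
    try:
        return pool.map(math.factorial, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    print(run_pool(range(6)))  # prints: [1, 1, 2, 6, 24, 120]
```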

## Calling a Worker Function with Multiple Parameters

What if we want to call a function like ```get_combinations``` with a sequence of numbers and, in addition, specify how long each unique combination should be?

In the function ```get_combinations_n``` we process a sequence of items in the ```elements``` parameter and return a tuple of all unique combinations of length ```n```.

Again, remember, we need to import this function from an external file for it to work in Jupyter. 

In [8]:
from worker import get_combinations_n

To create a data set where we randomly vary the length of the unique tuples that need to be returned by the combination function, we will utilize the ```random``` module:

In [10]:
import random

We want to make sure that the random length of the unique tuples is picked from a defined list specified in ```lengths```. Be careful with the length value, since the generated data structures could be large and the computation can take very long.

In [11]:
lengths = ( 2, 3, 4 )

We now generate a data structure that contains a sequence of tuples, each holding the two parameters for one ```get_combinations_n``` function call. The first element in the tuple is a tuple of length ```6``` with numbers. The second element is a number picked at random from the ```lengths``` sequence.

In [12]:
data = tuple( (tuple(range(6)), random.choice(lengths)) for i in range(my_processes * 2) )

We can print out the last element in ```data``` to verify the data structure:

In [14]:
print(data[-1])

((0, 1, 2, 3, 4, 5), 3)
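Because ```random.choice``` is used without a seed here, every run produces a different mix of lengths. If repeatable test data is preferred, one option is a dedicated, seeded ```random.Random``` instance. In this sketch the seed value ```2023```, the helper name ```make_data```, and the fixed tuple count are illustrative assumptions:

```python
import random

lengths = (2, 3, 4)


def make_data(count, seed):
    """Build (elements, n) parameter tuples with a repeatable choice of n."""
    rng = random.Random(seed)
    return tuple((tuple(range(6)), rng.choice(lengths)) for _ in range(count))


data_a = make_data(8, seed=2023)
data_b = make_data(8, seed=2023)

print(data_a == data_b)  # prints: True
```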


As with the example above, we can create a [Pool](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) object and call the ```get_combinations_n``` function with the two parameters that are picked from the ```data``` sequence. However, instead of the ```map``` function that maps single elements from a list to parameters of a function call, we want to unwrap the tuple with parameters from the ```data``` element and submit each tuple element as one of the parameters to the function. We are using the [starmap](https://docs.python.org/3/library/multiprocessing.html?highlight=starmap#module-multiprocessing.pool) method on the [Pool](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) object for that.

In [15]:
p = mp.Pool(processes=my_processes)
result = p.starmap(get_combinations_n, data)
print(f"Number of threads: {my_processes}")
print(f"Number of results: {len(result)}")

Number of threads: 24
Number of results: 48
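The difference between ```map``` and ```starmap``` can be seen in a small standalone sketch. It assumes ```math.comb``` as a stand-in worker with two parameters, which counts the combinations instead of listing them; since ```math.comb``` is importable from a module, the restriction on worker functions is respected. The names ```parameters```, ```expected``` and ```counts``` are chosen for this illustration.

```python
import math
import multiprocessing as mp

# Each tuple holds the two arguments for one call: math.comb(6, 2), ...
parameters = [(6, 2), (6, 3), (6, 4)]

# Sequential equivalent of what starmap does with each tuple.
expected = [math.comb(*args) for args in parameters]

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        # starmap unpacks every tuple into positional arguments.
        # pool.map(math.comb, parameters) would instead pass each tuple
        # as a single argument and fail with a TypeError.
        counts = pool.starmap(math.comb, parameters)
    print(counts)  # prints: [15, 20, 15]
```

The three counts correspond to the number of tuples in a result line for lengths ```2```, ```3``` and ```4```.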


We can print out the resulting data structure. Most likely the results will contain tuples of different lengths, as listed in ```lengths``` above.

In [17]:
for r in result:
	print(r)

((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 2, 3), (0, 2, 4), (0, 2, 5), (0, 3, 4), (0, 3, 5), (0, 4, 5), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5))
((0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 2, 3), (0, 2, 4), (0, 2, 5), (0, 3, 4), (0, 3, 5), (0, 4, 5), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5))
((0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 2, 5), (0, 1, 3, 4), (0, 1, 3, 5), (0, 1, 4, 5), (0, 2, 3, 4), (0, 2, 3, 5), (0, 2, 4, 5), (0, 3, 4, 5), (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5))
((0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5))
((0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 2, 5), (0, 1, 3, 4), (0, 1, 3, 5), (0, 1, 4, 5), (0, 2, 3, 4), (0, 2,

I hope this small example helps you understand how running parallel computations within a Jupyter notebook can speed up the processing and exploit the properties of your hardware environment.

If you have any questions or suggestions, please let me know and see [my personal homepage](http://damir.cavar.me/) for contact details.
