# Basic usage of the multiprocessing package in Python, with simple examples

The `multiprocessing` package is part of Python's standard library. It is important to distinguish between a *Thread* and a *Process*. *Thread* objects run concurrently within the same process and share memory. In standard CPython builds only one thread executes Python bytecode at a time, so threads are an easy way to scale tasks that are I/O bound rather than CPU bound. Each *Process*, by contrast, is a separate operating-system process with its own memory, so nothing is shared unless it is passed explicitly.
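
To make the memory distinction concrete, here is a minimal sketch. The names `counter`, `increment`, `run_in_thread` and `run_in_process` are hypothetical and exist only for this illustration. A thread modifies the parent's data, whereas a child process modifies only its own copy.

```python
import multiprocessing
import threading

# Module-level state, used only for this illustration (hypothetical name).
counter = {'value': 0}


def increment():
    """Add 1 to the counter that lives in whichever process runs this."""
    counter['value'] += 1


def run_in_thread():
    """Run increment() in a thread and return the parent's counter."""
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter['value']


def run_in_process():
    """Run increment() in a child process and return the parent's counter."""
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter['value']


if __name__ == '__main__':
    # The thread shares memory with the parent, so the change is visible here.
    print('after thread :', run_in_thread())
    # The child process changed only its own copy; the parent's value is unchanged.
    print('after process:', run_in_process())
```

To get data back from a child process, it has to be returned or sent explicitly, which is what the `Pool` methods later in this post do.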

The `multiprocessing` package offers two main ways to run tasks, the `Process` and `Pool` classes. Each `Process` object starts one new process that runs one task, whereas a `Pool` object creates a fixed number of worker processes and hands tasks to whichever workers become available.
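
The difference can be sketched as follows. This is only an illustration under simple assumptions: `report` and `square_with_pid` are hypothetical tasks, and the pool size of 2 is arbitrary. With `Process`, three tasks start three processes. With `Pool`, six tasks are shared by two workers, so at most two distinct worker PIDs appear.

```python
import multiprocessing
import os


def report(task_id):
    """Hypothetical task for Process: print which process ran it."""
    print('task', task_id, 'ran in PID', os.getpid())


def square_with_pid(x):
    """Hypothetical task for Pool: return the result and the worker's PID."""
    return x * x, os.getpid()


if __name__ == '__main__':
    # Process: three tasks, three new processes
    procs = [multiprocessing.Process(target=report, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    # Pool: six tasks shared by two workers
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square_with_pid, range(6))
    print('squares    :', [value for value, _ in results])
    print('worker PIDs:', sorted({pid for _, pid in results}))
```

A `Process` target cannot return a value directly, which is why `report` prints instead, while `Pool.map` collects the return values for you.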

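Here is a simple example illustrating the basic usage of the `Process` and `Pool` classes. It is important to keep the entire creation and execution of the multiprocessing code behind `if __name__ == '__main__':`. With the spawn start method, each child process imports the main module, and any unguarded code would run again in every child.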

In [5]:
import multiprocessing
import time
import os

def do_calculation(data: int) -> int:
    """Sleep for data/2 seconds, then return data doubled."""
    print('Sleeping for {} seconds at Process name {}, ID {}'.format(
        data / 2, multiprocessing.current_process().name, os.getpid()))
    time.sleep(data / 2)
    return data * 2


def start_process():
    """Pool initializer: report when and where each worker starts."""
    timestamp = time.time() % 100
    print('Initiating at time {:.4f} s, at Process name {}, ID {}'.format(
        timestamp, multiprocessing.current_process().name, os.getpid()))

We first set a baseline by running the tasks serially. To record execution time, we use `time.time()` or `time.perf_counter()`. Unlike `time.process_time()`, which measures the CPU time of the current process and excludes time spent sleeping, `time.perf_counter()` measures elapsed wall-clock time and includes time spent sleeping.
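
The difference between the two clocks is easy to see around a call to `time.sleep()`. The helper `measure_sleep` below is a hypothetical name used only for this sketch.

```python
import time


def measure_sleep(seconds):
    """Return (wall-clock time, CPU time) spent around a sleep."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == '__main__':
    wall, cpu = measure_sleep(0.5)
    # The wall-clock time includes the sleep; the CPU time stays close to zero.
    print('perf_counter: {:.4f} s, process_time: {:.4f} s'.format(wall, cpu))
```

Because `do_calculation` spends its time sleeping, `time.process_time()` would report almost nothing for the examples in this post.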

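It is important to pay attention to where the timestamps are taken. The built-in function `map(fun, iter)` returns a lazy iterator, and `fun` is only executed when the iterator is consumed (e.g. in a loop, or when converted to a list). To illustrate, in the following example `do_calculation` is not executed when `builtin_outputs` is created. Each call runs only when the loop requests the next item, which is why the "Sleeping" and "printing" lines alternate in the output.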

In [6]:
inputs = list(range(5))
builtin_outputs = map(do_calculation, inputs)
for i in builtin_outputs:
    print('printing', i)   

Sleeping for 0.0 seconds at Process name MainProcess, ID 55634
printing 0
Sleeping for 0.5 seconds at Process name MainProcess, ID 55634
printing 2
Sleeping for 1.0 seconds at Process name MainProcess, ID 55634
printing 4
Sleeping for 1.5 seconds at Process name MainProcess, ID 55634
printing 6
Sleeping for 2.0 seconds at Process name MainProcess, ID 55634
printing 8
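
The practical consequence for timing can be sketched as follows, using a hypothetical `slow_double` function: a timer wrapped around the `map()` call alone measures almost nothing, because the work happens only when the iterator is consumed.

```python
import time


def slow_double(x):
    """Hypothetical slow task: wait 0.1 s, then return x doubled."""
    time.sleep(0.1)
    return x * 2


t0 = time.perf_counter()
lazy = map(slow_double, range(3))      # creates the iterator; nothing runs yet
t_created = time.perf_counter() - t0

t0 = time.perf_counter()
results = list(lazy)                   # the three calls happen here
t_consumed = time.perf_counter() - t0

print('creating the iterator took {:.4f} s'.format(t_created))
print('consuming it took          {:.4f} s'.format(t_consumed))
```

This is why the serial baseline takes the end timestamp only after `list(builtin_outputs)` has been evaluated.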



As the next cell shows, the execution time of the serial version is close to what we expect: the sleeps add up to 0 + 0.5 + 1.0 + 1.5 + 2.0 = 5 s.

In [7]:
if __name__ == '__main__':
    # establish baseline with serial programming
    print('Input   :', inputs)
    t_start = time.perf_counter()
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))
    t_end = time.perf_counter()
    print('Serial programming takes {:.6f} seconds'.format(t_end-t_start))

Input   : [0, 1, 2, 3, 4]
Sleeping for 0.0 seconds at Process name MainProcess, ID 55634
Sleeping for 0.5 seconds at Process name MainProcess, ID 55634
Sleeping for 1.0 seconds at Process name MainProcess, ID 55634
Sleeping for 1.5 seconds at Process name MainProcess, ID 55634
Sleeping for 2.0 seconds at Process name MainProcess, ID 55634
Built-in: [0, 2, 4, 6, 8]
Serial programming takes 5.015839 seconds


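Now we try out three `Pool` methods: `map`, `apply` and `apply_async`. A pool can be managed either with a `with`-block, which calls `.terminate()` on exit, or manually by calling `.close()` once no more tasks will be submitted. After `.close()` or `.terminate()`, call `.join()` to block the main process until the worker processes have exited. The method `map` calls the target function once per item of an iterable, passing the item as the single argument, and blocks until all results are ready. The method `apply` runs a single call and blocks until its result is ready, so calls made in a loop execute one after another. In contrast, `apply_async` returns an `AsyncResult` object immediately, and the result is retrieved later with its `.get()` method.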

```
if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count() * 2

    # Pool.map: spread the inputs over the workers, block until all results are ready
    t0 = time.time()
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks
    pool.join()  # wrap up current tasks
    t1 = time.time()
    print('Pool with map:', pool_outputs, 'time: ', t1-t0)
    print('*************')

    # Pool.apply: each call blocks until its result is ready, so the tasks run one by one
    t0a = time.time()
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )
    pool_applyoutputs = [pool.apply(do_calculation, args=(i,)) for i in inputs]
    pool.close()  # no more tasks
    pool.join()  # wrap up current tasks
    t1a = time.time()
    print('Pool with apply:', pool_applyoutputs, 'time: ', t1a-t0a)
    print('*************')

    # Pool.apply_async: submit all tasks first, then collect the results with .get()
    t0b = time.time()
    with multiprocessing.Pool(processes=pool_size, initializer=start_process) as pool:
        pool_applyprocesses = [pool.apply_async(do_calculation, args=(i,)) for i in inputs]
        pool_asyncresults = [q.get() for q in pool_applyprocesses]
    # Leaving the with-block has already terminated the pool, so join() returns quickly
    print('Before: Time {} at Process {}'.format(time.time(), multiprocessing.current_process().name))
    pool.join()
    print('After: Time {} at Process {}'.format(time.time(), multiprocessing.current_process().name))
    t1b = time.time()
    print('Pool with apply_async:', pool_asyncresults, 'time: ', t1b-t0b)
```
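
One detail of the `with`-block is worth isolating. Because leaving the block terminates the pool, the `.get()` calls have to happen inside it. The sketch below shows that pattern under simple assumptions: `slow_square` and `submit_and_collect` are hypothetical names, and the pool size of 4 and the 10-second timeout are arbitrary choices.

```python
import multiprocessing
import time


def slow_square(x):
    """Hypothetical task: wait briefly, then return x squared."""
    time.sleep(0.2)
    return x * x


def submit_and_collect(pool, values):
    """Submit every task without waiting, then gather the results in order."""
    pending = [pool.apply_async(slow_square, args=(v,)) for v in values]
    # Each item is an AsyncResult; get() blocks until that task has finished
    # and raises multiprocessing.TimeoutError if it takes longer than the timeout.
    return [p.get(timeout=10) for p in pending]


if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        # Results are collected while the pool is still alive.
        squares = submit_and_collect(pool, range(4))
    print(squares)
```

All tasks are submitted before the first `.get()` is called, so they can run at the same time on different workers.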


Here is the output from one run. Process names, IDs and timings will vary between runs and machines:
```
Initiating at time 20.6167 s, at Process name SpawnPoolWorker-1, ID 39884
Sleeping for 0.0 seconds at Process name SpawnPoolWorker-1, ID 39884
Sleeping for 0.5 seconds at Process name SpawnPoolWorker-1, ID 39884
Initiating at time 20.7503 s, at Process name SpawnPoolWorker-7, ID 39890
Sleeping for 1.0 seconds at Process name SpawnPoolWorker-7, ID 39890
Initiating at time 20.7779 s, at Process name SpawnPoolWorker-2, ID 39885
Sleeping for 1.5 seconds at Process name SpawnPoolWorker-2, ID 39885
Initiating at time 20.8255 s, at Process name SpawnPoolWorker-4, ID 39887
Sleeping for 2.0 seconds at Process name SpawnPoolWorker-4, ID 39887
Initiating at time 20.8278 s, at Process name SpawnPoolWorker-6, ID 39889
Initiating at time 20.8720 s, at Process name SpawnPoolWorker-5, ID 39888
Initiating at time 20.8942 s, at Process name SpawnPoolWorker-3, ID 39886
Initiating at time 20.9789 s, at Process name SpawnPoolWorker-8, ID 39891
Pool with map: [0, 2, 4, 6, 8] time:  3.2909700870513916
*************
Initiating at time 24.0436 s, at Process name SpawnPoolWorker-12, ID 39897
Sleeping for 0.0 seconds at Process name SpawnPoolWorker-12, ID 39897
Sleeping for 0.5 seconds at Process name SpawnPoolWorker-12, ID 39897
Initiating at time 24.1203 s, at Process name SpawnPoolWorker-10, ID 39895
Initiating at time 24.1743 s, at Process name SpawnPoolWorker-9, ID 39894
Initiating at time 24.1896 s, at Process name SpawnPoolWorker-16, ID 39901
Initiating at time 24.2187 s, at Process name SpawnPoolWorker-13, ID 39898
Initiating at time 24.2523 s, at Process name SpawnPoolWorker-11, ID 39896
Initiating at time 24.2536 s, at Process name SpawnPoolWorker-14, ID 39899
Initiating at time 24.2545 s, at Process name SpawnPoolWorker-15, ID 39900
Sleeping for 1.0 seconds at Process name SpawnPoolWorker-10, ID 39895
Sleeping for 1.5 seconds at Process name SpawnPoolWorker-9, ID 39894
Sleeping for 2.0 seconds at Process name SpawnPoolWorker-16, ID 39901
Pool with apply: [0, 2, 4, 6, 8] time:  6.2480692863464355
*************
Initiating at time 30.4251 s, at Process name SpawnPoolWorker-21, ID 39906Initiating at time 30.4251 s, at Process name SpawnPoolWorker-17, ID 39902

Sleeping for 0.0 seconds at Process name SpawnPoolWorker-17, ID 39902
Sleeping for 0.5 seconds at Process name SpawnPoolWorker-21, ID 39906
Sleeping for 1.0 seconds at Process name SpawnPoolWorker-17, ID 39902
Initiating at time 30.4488 s, at Process name SpawnPoolWorker-19, ID 39904
Sleeping for 1.5 seconds at Process name SpawnPoolWorker-19, ID 39904
Initiating at time 30.4516 s, at Process name SpawnPoolWorker-20, ID 39905
Sleeping for 2.0 seconds at Process name SpawnPoolWorker-20, ID 39905
Initiating at time 30.5086 s, at Process name SpawnPoolWorker-22, ID 39907
Initiating at time 30.5723 s, at Process name SpawnPoolWorker-23, ID 39908
Initiating at time 30.6217 s, at Process name SpawnPoolWorker-24, ID 39909
Initiating at time 30.6505 s, at Process name SpawnPoolWorker-18, ID 39903
Before: Time 1627847932.470075 at Process MainProcess
After: Time 1627847932.4701571 at Process MainProcess
Pool with apply_async: [0, 2, 4, 6, 8] time:  2.604996919631958
```
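
The timings above can be sanity-checked with a little arithmetic. The sketch below computes lower bounds from the sleep durations alone and ignores the cost of starting the worker processes, so the measured times are expected to be somewhat larger.

```python
# Sleep duration of each task: do_calculation sleeps for data / 2 seconds
sleep_times = [i / 2 for i in range(5)]

# Tasks run one after another (pool.apply in a loop): the sleeps add up
serial_lower_bound = sum(sleep_times)

# Tasks run at the same time with enough workers (map, apply_async):
# the longest single task dominates
parallel_lower_bound = max(sleep_times)

print('one at a time:', serial_lower_bound, 's')
print('in parallel  :', parallel_lower_bound, 's')
```

This matches the run shown above: `apply` took about 6.2 s against a 5 s lower bound, while `map` and `apply_async` took about 3.3 s and 2.6 s against a 2 s lower bound. The remainder is mostly the overhead of spawning the workers.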