# Multithreading versus Multiprocessing

Multiprocessing and multithreading are two different ways to achieve parallelism in Python. In this notebook, we will compare the two approaches and see when to use one over the other.


## The Global Interpreter Lock (GIL)

The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. This lock is necessary mainly because CPython's memory management is not thread-safe. The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.

The GIL is a performance bottleneck in multithreaded programs that spend most of their time holding the GIL while interpreting CPython bytecode. This is why Python threads are often called "fake" threads. They provide little or no speedup for CPU-bound pure-Python code, but they are still useful for I/O-bound tasks.
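The effect can be observed with a small timing experiment. This is a minimal sketch, assuming a standard CPython build with the GIL enabled; `count_down`, `run_sequential`, and `run_threaded` are illustrative names that do not appear elsewhere in this notebook. It runs the same pure-Python loop several times, first one after another and then in separate threads.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL while it executes."""
    while n > 0:
        n -= 1


def run_sequential(n, repeats):
    """Run count_down `repeats` times in the calling thread and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run count_down in `repeats` threads at once and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential: {run_sequential(n, 4):.2f} s")
    print(f"threaded  : {run_threaded(n, 4):.2f} s")
```

On a build with the GIL, the two timings are usually close to each other, because only one thread can execute bytecode at any moment. The exact numbers depend on the machine and the Python version.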

## Multiprocessing

A process is an instance of a program that is running on your computer. When you run a program, the operating system creates a new process and runs the program in that process. Each process runs in its own memory space and is independent of the other processes. 

Each process consists of:
* An executable program
* The associated data needed by the program (variables, etc.)
* The execution context of the program (the program counter, stack, etc.)

Multiprocessing is a technique that uses multiple processes to achieve parallelism. It is a great way to achieve true parallelism, especially when you have multiple CPU cores. In Python multiprocessing, each process has its own instance of the Python interpreter, which means that each process has its own GIL (Global Interpreter Lock). Because the processes do not contend for a single lock, they can run Python code on different cores at the same time and avoid the GIL bottleneck. The downside of multiprocessing is that processes use more memory and are slower to start than threads, and data passed between processes has to be serialized.
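The separate memory spaces can be made visible with a short experiment. This is a minimal sketch written as a standalone script and assuming the standard-library `multiprocessing` module (the notebook cells use the `multiprocess` package instead); `counter` and `increment_and_report` are illustrative names.

```python
import multiprocessing

counter = 0  # module-level state; every process works on its own copy


def increment_and_report(queue):
    """Increment the child's copy of `counter` and send the new value to the parent."""
    global counter
    counter += 1
    queue.put(counter)


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=increment_and_report, args=(queue,))
    child.start()
    child_value = queue.get()  # read before join() so the child can flush its queue
    child.join()
    print("counter in child :", child_value)  # 1
    print("counter in parent:", counter)      # 0, the parent's copy is untouched
```

The child's change never reaches the parent, which is why results have to be sent back explicitly through a queue, a pipe, or the return value of a pool method.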


When to use Multiprocessing:
* When you have CPU-bound tasks
* When you have multiple CPU cores
* When you want to achieve true parallelism
* When you want to avoid the Global Interpreter Lock (GIL)



## Multithreading


A thread is the smallest unit of execution within a process. A process can have multiple threads running as part of it. Each thread shares the same memory space and context and is therefore lightweight compared to processes.

Multithreading is a technique that uses multiple threads within one process to achieve concurrency. Threads share the same memory space and resources (in the case of Python, the same Python interpreter) and are therefore 'lightweight' compared to processes. Multithreading is a great way to achieve concurrency, especially when you have I/O-bound tasks.

Multithreading is not suitable for CPU-bound tasks because of the Global Interpreter Lock (GIL) in CPython. The GIL prevents the interpreter from executing Python bytecode in more than one thread at a time, even on a multi-core machine. This means that multithreading does not achieve true parallelism for pure-Python code. Despite this, multithreading can be quite useful for I/O-bound tasks, as it can significantly improve the performance of your program by allowing it to execute other tasks while waiting for I/O operations to complete.
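The overlap of waiting times can be sketched as follows. The example assumes that `time.sleep` is an acceptable stand-in for a blocking I/O call (it releases the GIL while waiting, as a socket or file read would); `fake_download` and `download_all` are hypothetical helpers introduced only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item, delay=0.2):
    """Stand-in for an I/O call; the thread waits without holding the GIL."""
    time.sleep(delay)
    return f"item-{item}"


def download_all(items, max_workers=4):
    """Run fake_download for every item in a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_download, items))  # map() preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = download_all(range(4))
    print(results)
    print(f"elapsed: {elapsed:.2f} s for four 0.2 s waits")
```

With four workers the four waits overlap, so the elapsed time is close to a single wait rather than the sum of all four.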

When to use Multithreading:

* When you have I/O-bound tasks that spend a lot of time waiting for I/O operations to complete
* When you want to avoid the overhead of creating and managing processes
* When you want to use a shared resource
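Sharing a resource between threads usually requires some synchronization, because a read-modify-write such as `value += 1` is not atomic. The following is a minimal sketch using `threading.Lock`; `SafeCounter` and `count_with_threads` are illustrative names, not part of this notebook.

```python
import threading


class SafeCounter:
    """A counter shared by several threads, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below a single critical section.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads, increments_per_thread):
    """Increment one shared counter from several threads and return the final value."""
    counter = SafeCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads(8, 10_000))  # 80000
```

Because all threads see the same `SafeCounter` object, no copying or message passing is needed; the lock is the only extra cost.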

## Which is faster?

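The `multiprocessing` module does not remove the Global Interpreter Lock (GIL). Instead, each process runs its own interpreter with its own GIL, so the processes can execute Python code in parallel on separate cores. This makes multiprocessing faster than multithreading for CPU-bound tasks, provided the work is large enough to outweigh the cost of starting processes and exchanging data with them. Multithreading is usually faster than multiprocessing for I/O-bound tasks that do not require much CPU, even though the GIL stops the threads from running Python code in parallel, because threads are cheaper to create and share memory directly.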
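One way to check this on a given machine is to run the same CPU-bound function in a thread pool and in a process pool. This is a sketch under assumptions: it uses `concurrent.futures` from the standard library rather than the `multiprocess` package imported in this notebook, it is meant to be run as a standalone script, and `sum_of_squares` and `timed_map` are hypothetical helper names.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    """CPU-bound pure-Python work: sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def timed_map(executor_class, inputs, max_workers=4):
    """Map sum_of_squares over inputs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_class(max_workers=max_workers) as executor:
        results = list(executor.map(sum_of_squares, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [1_000_000] * 4
    _, thread_time = timed_map(ThreadPoolExecutor, inputs)
    _, process_time = timed_map(ProcessPoolExecutor, inputs)
    print(f"threads  : {thread_time:.2f} s")
    print(f"processes: {process_time:.2f} s")
```

On a multi-core machine with a GIL-enabled interpreter, the process pool typically finishes sooner for inputs of this size, but the result depends on the core count, the start-up cost of processes on the platform, and the size of the work.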

In the next few sections, we will see some examples to compare multiprocessing and multithreading in Python. First, we will import the required modules and set some useful variables.

In [1]:
import multiprocess 
import threading
import logging
import time

logging.basicConfig(format="%(asctime)s: %(message)s", level=logging.INFO, datefmt="%H:%M:%S")

num_of_cores = multiprocess.cpu_count()
num_of_threads = num_of_cores # multiply by 2 for Intel x86_64 architecture



## Multiprocessing

In [2]:
%%time 

def process_function(name):
    logging.info("Process %s: starting", name)
    time.sleep(2)
    logging.info("Process %s: finishing", name)
    return name

logging.info("Main    : before creating process")

with multiprocess.Pool(num_of_cores) as p:
    results_list = p.map(process_function, range(num_of_cores))

logging.info("Main    : after creating process")

results_list


10:56:19: Main    : before creating process
10:56:19: Process 0: starting
10:56:19: Process 1: starting
10:56:19: Process 2: starting
10:56:19: Process 3: starting
10:56:19: Process 4: starting
10:56:19: Process 5: starting
10:56:19: Process 7: starting
10:56:19: Process 6: starting
10:56:19: Process 8: starting
10:56:19: Process 9: starting
10:56:19: Process 10: starting
10:56:19: Process 11: starting
10:56:19: Process 15: starting
10:56:19: Process 16: starting
10:56:19: Process 17: starting
10:56:19: Process 12: starting
10:56:19: Process 18: starting
10:56:19: Process 13: starting
10:56:19: Process 14: starting
10:56:19: Process 20: starting
10:56:19: Process 19: starting
10:56:19: Process 22: starting
10:56:19: Process 21: starting
10:56:19: Process 24: starting
10:56:19: Process 28: starting
10:56:19: Process 23: starting
10:56:19: Process 26: starting
10:56:19: Process 30: starting
10:56:19: Process 25: starting
10:56:19: Process 31: starting
10:56:19: Process 29: starting
10:56

CPU times: user 3.94 ms, sys: 44.5 ms, total: 48.4 ms
Wall time: 2.05 s


[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31]

## Multithreading

In [3]:
%%time

data = []  # shared list; every thread appends its result here

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)
    data.append(name)  # threads share memory, so the main thread sees this append
    return name  # ignored by threading.Thread; kept for symmetry with process_function


logging.info("Main    : before creating threads")
threads = [threading.Thread(target=thread_function, args=(x,)) for x in range(num_of_threads)]
logging.info("Main    : before running threads")
for thread in threads:
    thread.start()

logging.info("Main    : waiting for the threads to finish")
# join() blocks until the thread finishes, which is necessary before aggregating the results
for thread in threads:
    thread.join()
logging.info("Main    : all done")

data

10:56:21: Main    : before creating threads
10:56:21: Main    : before running threads
10:56:21: Thread 0: starting
10:56:21: Thread 1: starting
10:56:21: Thread 2: starting
10:56:21: Thread 3: starting
10:56:21: Thread 4: starting
10:56:21: Thread 5: starting
10:56:21: Thread 6: starting
10:56:21: Thread 7: starting
10:56:21: Thread 8: starting
10:56:21: Thread 9: starting
10:56:21: Thread 10: starting
10:56:21: Thread 11: starting
10:56:21: Thread 12: starting
10:56:21: Thread 13: starting
10:56:21: Thread 14: starting
10:56:21: Thread 15: starting
10:56:21: Thread 16: starting
10:56:21: Thread 17: starting
10:56:21: Thread 18: starting
10:56:21: Thread 19: starting
10:56:21: Thread 20: starting
10:56:21: Thread 21: starting
10:56:21: Thread 22: starting
10:56:21: Thread 23: starting
10:56:21: Thread 24: starting
10:56:21: Thread 25: starting
10:56:21: Thread 26: starting
10:56:21: Thread 27: starting
10:56:21: Thread 28: starting
10:56:21: Thread 29: starting
10:56:21: Thread 30: st

CPU times: user 13.4 ms, sys: 8.66 ms, total: 22 ms
Wall time: 2.01 s


[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31]
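Appending to a shared list works, but the order of the results then depends on which thread finishes first. An alternative is to let a thread pool collect the return values. This is a minimal sketch assuming `concurrent.futures.ThreadPoolExecutor` from the standard library; the shortened `delay` and the `run_threads` helper are illustrative choices, not part of the cell above.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor


def thread_function(name, delay=0.1):
    """Same shape as the worker above, but the result is returned instead of appended."""
    logging.info("Thread %s: starting", name)
    time.sleep(delay)
    logging.info("Thread %s: finishing", name)
    return name


def run_threads(num_of_threads):
    """Run thread_function once per thread and return the results in input order."""
    with ThreadPoolExecutor(max_workers=num_of_threads) as executor:
        # map() yields results in the order of the inputs, not in completion order.
        return list(executor.map(thread_function, range(num_of_threads)))


if __name__ == "__main__":
    logging.basicConfig(format="%(asctime)s: %(message)s", level=logging.INFO, datefmt="%H:%M:%S")
    print(run_threads(4))  # [0, 1, 2, 3]
```

Leaving the `with` block waits for all worker threads, so no explicit `join()` calls are needed.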

## Will the need for the GIL be eliminated in the future?

In 2023, [Python Enhancement Proposal 703](https://peps.python.org/pep-0703/), which proposes making the GIL optional in CPython, was accepted. This means that CPython is expected to move toward builds without the GIL, which would let threads execute Python bytecode in parallel and make multithreading useful for CPU-bound tasks as well.

An experimental no-GIL build is planned for Python 3.13, which is expected to be released in 2024. Because removing the GIL risks breaking existing code and extension modules, the change is being rolled out gradually, and it is expected to take about 5 years before the GIL is removed from the main version of Python.
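Whether a given interpreter runs with or without the GIL can be checked at runtime. This is a hedged sketch: it assumes that `sys._is_gil_enabled()` is available (it was added in Python 3.13 and is absent from older versions) and that the `Py_GIL_DISABLED` build variable identifies a no-GIL build; `gil_status` and `is_free_threaded_build` are illustrative helper names.

```python
import sys
import sysconfig


def gil_status():
    """Describe whether the GIL is active in the running interpreter."""
    # sys._is_gil_enabled() exists only on Python 3.13 and later.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        return "enabled (this Python version always uses the GIL)"
    return "enabled" if check() else "disabled"


def is_free_threaded_build():
    """Return True if this interpreter was built with support for running without the GIL."""
    # The variable is 1 on no-GIL builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


if __name__ == "__main__":
    print("GIL:", gil_status())
    print("Free-threaded build:", is_free_threaded_build())
```

The two checks answer different questions: the build variable says whether the interpreter can run without the GIL, while the runtime check says whether the GIL is active at the moment.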