## Threading vs Multiprocessing
There are two common ways to run code concurrently and speed up a program: threads and multiple processes.

## Multiprocessing
Multiprocessing is a technique where you create multiple processes, each with its own Python interpreter and memory space, to achieve true parallelism. Python's multiprocessing module allows you to create and manage multiple processes. Since each process runs its own interpreter with its own Global Interpreter Lock (GIL), the GIL does not prevent the processes from running at the same time. This makes multiprocessing suitable for CPU-bound tasks, where multiple processes can run on different CPU cores in parallel.


## Process

Program: A program is a set of instructions and code stored in a file or memory that defines a specific task or set of tasks to be carried out by a computer. It is essentially a passive entity, a series of instructions waiting to be executed.

Process: A process, on the other hand, is the active execution of a program. It represents a running instance of a program in a computer's memory. A process includes not only the program's code but also various resources like memory, registers, and system resources that are allocated to that program while it's running. Each process is isolated from other processes, meaning that they cannot directly access each other's memory or resources without proper inter-process communication mechanisms.

Key facts:

- A new process is started independently from the first process

    Each process is created as an independent instance of the program, meaning it has its own execution flow and memory space. Processes do not directly influence or interact with each other.

- Takes advantage of multiple CPUs and cores

    Processes can run in parallel on different CPU cores, which allows them to take full advantage of a multi-core CPU. This is especially beneficial for CPU-bound tasks that require a lot of computational power.

- Separate memory space

    Each process has its own isolated memory space. Changes made to memory in one process do not affect other processes. This isolation is crucial for maintaining data integrity and avoiding conflicts.

- Memory is not shared between processes

    Processes do not share memory by default. If you want to share data between processes, you need to use inter-process communication (IPC) mechanisms explicitly, such as pipes, queues, or shared memory.

- One GIL (Global Interpreter Lock) for each process, i.e. avoids the GIL limitation

    Python's Global Interpreter Lock (GIL) lets only one thread at a time execute Python code within a single process. Since each process has its own Python interpreter, and therefore its own GIL, processes do not block each other, allowing true parallel execution in Python.

- Great for CPU-bound processing

    Multiprocessing is particularly effective for tasks that require a lot of CPU processing power. It allows these tasks to be distributed across multiple processes and CPU cores.

- Child processes are interruptible/killable

    You can easily terminate or interrupt a child process without affecting the main process or other child processes. This provides a way to manage and control the execution of processes.

- Starting a process is slower than starting a thread

    Creating a new process is a more resource-intensive operation than starting a thread. This is due to the overhead of setting up separate memory spaces and resources for each process.

- Larger memory footprint

    Each process consumes additional memory for its separate memory space and resources, making them heavier in terms of memory usage compared to threads.

- IPC (inter-process communication) is more complicated

    Communicating between processes requires the use of IPC mechanisms like pipes, queues, or shared memory. This complexity arises from the need to manage data sharing between isolated memory spaces.

## Threads

A thread is an entity within a process that can be scheduled for execution (also known as a "lightweight process"). A process can spawn multiple threads. The main difference from processes is that all threads within a process share the same memory.

Key facts:

- Multiple threads can be spawned within one process:

    A single process can create and manage multiple threads. These threads can execute concurrently within the same memory space of the process.

- Memory is shared between all threads:

    Threads within a process share the same memory space. This shared memory can be both an advantage and a source of potential issues, such as race conditions.

- Starting a thread is faster than starting a process:

    Creating a new thread is a faster and less resource-intensive operation compared to creating a new process. Threads share the same resources and memory space of the parent process, making them more lightweight.

- Great for I/O-bound tasks:

    Threads are suitable for I/O-bound tasks that spend a significant amount of time waiting for input/output operations, such as reading/writing files or making network requests. Multiple threads can work together to handle concurrent I/O operations efficiently.

- Lightweight - low memory footprint:

    Threads are lightweight because they share the memory and resources of the parent process. This makes them more memory-efficient compared to processes.

- One GIL for all threads, i.e. threads are limited by GIL:

    In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, even in a multi-threaded program. This means that multithreading usually does not speed up CPU-bound tasks in Python, and the overhead of switching between threads can even make them slower, as only one thread can execute Python code at any given moment.

- Not interruptible/killable -> be careful with memory leaks:

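    Python's threading module offers no API to kill a thread from outside, so threads are not as easily interruptible as processes. A thread ends only when its target function returns or raises an exception, which means you need to be cautious when managing threads to avoid potential memory leaks or unclean resource handling.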

- Increased potential for race conditions:

    Because threads share the same memory space, there's an increased potential for race conditions, which occur when multiple threads access and modify shared data simultaneously. Proper synchronization mechanisms, like locks or semaphores, are needed to prevent race conditions.
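
To make the last point concrete, here is a minimal sketch of protecting shared data with a lock. The helper `run_counter` and its parameters are hypothetical names chosen for this example. Several threads increment one shared counter, and the `Lock` ensures that each read-modify-write step completes before another thread can start its own.

```python
from threading import Thread, Lock


def run_counter(num_threads, increments_per_thread):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}  # shared by all threads of this process
    lock = Lock()

    def worker():
        for _ in range(increments_per_thread):
            with lock:  # only one thread at a time may update the counter
                counter["value"] += 1

    threads = [Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_counter(num_threads=4, increments_per_thread=10_000))  # 40000
```

Without the `with lock:` block the result is not guaranteed, because two threads may read the same old value and both write back the same new one.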

## Multiprocessing in Python

Use the multiprocessing module. Its Process API closely mirrors the Thread API of the threading module.

```python
from multiprocessing import Process
import os


def square_numbers():
    for i in range(1000):
        result = i * i


if __name__ == "__main__":
    processes = []
    num_processes = os.cpu_count() or 1  # cpu_count() can return None

    # create processes and assign a function to each process
    for i in range(num_processes):
        # pass arguments with args=(a, b) if the target function takes any
        process = Process(target=square_numbers)
        processes.append(process)

    # start all processes
    for process in processes:
        process.start()

    # wait for all processes to finish
    # block the main thread until these processes are finished
    for process in processes:
        process.join()
```

## When is Multiprocessing useful
It is useful for CPU-bound tasks that perform many CPU operations on a large amount of data and require a lot of computation time. With multiprocessing you can split the data into equal parts and process them in parallel on different CPUs.

Example: Calculate the square numbers for all numbers from 1 to 1000000. Divide the numbers into equal sized parts and use a process for each subset.
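
One possible way to write this example is sketched below, assuming a `multiprocessing.Pool` is acceptable in place of hand-managed `Process` objects. The helpers `split_into_chunks` and `square_chunk` are hypothetical names introduced for the illustration.

```python
import os
from multiprocessing import Pool


def split_into_chunks(numbers, num_chunks):
    """Split a list into at most num_chunks contiguous parts of near-equal size."""
    chunk_size = max(1, -(-len(numbers) // num_chunks))  # ceiling division
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def square_chunk(chunk):
    """CPU-bound work carried out inside one worker process."""
    return [n * n for n in chunk]


if __name__ == "__main__":
    numbers = list(range(1, 1_000_001))
    num_processes = os.cpu_count() or 1  # cpu_count() can return None

    chunks = split_into_chunks(numbers, num_processes)
    with Pool(processes=num_processes) as pool:
        # map() hands the chunks to the workers and returns results in input order
        squared_chunks = pool.map(square_chunk, chunks)

    squares = [sq for chunk in squared_chunks for sq in chunk]
    print(squares[:5], squares[-1])  # [1, 4, 9, 16, 25] 1000000000000
```

Each chunk and its result are pickled and sent between processes, so for work this cheap the transfer cost can outweigh the gain. The pattern pays off when the per-item computation is heavier.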

## Threading in Python

Use the threading module.

Note: The following example usually won't benefit from multiple threads since it is CPU-bound. It only demonstrates how to use the threading API.

```python
from threading import Thread


def square_numbers():
    for i in range(1000):
        result = i * i


if __name__ == "__main__":
    threads = []
    num_threads = 10

    # create threads and assign a function to each thread
    for i in range(num_threads):
        thread = Thread(target=square_numbers)
        threads.append(thread)

    # start all threads
    for thread in threads:
        thread.start()

    # wait for all threads to finish
    # block the main thread until these threads are finished
    for thread in threads:
        thread.join()
```

## When is Threading useful
Despite the GIL, threading is useful for I/O-bound tasks where your program has to talk to slow devices, like a hard drive or a network connection. A thread releases the GIL while it waits for such a device, so other threads can do useful work in the meantime.

Example: Download website information from multiple sites. Use a thread for each site.
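
A minimal sketch of this pattern follows. To keep it runnable without network access, the request is simulated with `time.sleep`, which is an assumption of this example; the function names and the example.com URLs are placeholders.

```python
import time
from threading import Thread


def fetch(site, delay, results):
    """Stand-in for a network request: the thread just waits, as it would on real I/O."""
    time.sleep(delay)
    results[site] = f"content of {site}"


def download_all(sites, delay=0.5):
    """Start one thread per site and collect the results in a shared dict."""
    results = {}
    threads = [Thread(target=fetch, args=(site, delay, results)) for site in sites]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    sites = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    start = time.perf_counter()
    pages = download_all(sites)
    elapsed = time.perf_counter() - start
    # The waits overlap, so this takes roughly 0.5 s rather than 1.5 s
    print(f"fetched {len(pages)} pages in {elapsed:.2f} s")
```

In a real program the body of `fetch` would perform the actual request, for example with `urllib.request.urlopen`.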

## GIL - Global interpreter lock

This is a mutex (or a lock) that allows only one thread at a time to hold control of the Python interpreter. This means that only one thread can execute Python bytecode at any moment, even in a multi-threaded program running on a multi-core machine.

### Why is it needed?
It is needed because memory management in CPython (the reference implementation of Python) is not thread-safe. Python uses reference counting for memory management. This means that objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released. The problem is that this reference count variable needs protection from race conditions where two threads increase or decrease its value simultaneously. If this happens, it can either leak memory that is never released or, worse, release memory while a reference to the object still exists.
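
The reference count described above can be observed with `sys.getrefcount`. The sketch below is specific to CPython, and the helper name `refcount_delta` is made up for this example. It looks only at the change in the count, since the absolute numbers depend on the interpreter version.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows when a second name refers to it."""
    data = []                       # a fresh list object
    before = sys.getrefcount(data)
    alias = data                    # a second reference to the same object
    after = sys.getrefcount(data)
    return after - before


if __name__ == "__main__":
    print(refcount_delta())  # prints 1 on CPython
```

Every such increment and decrement is a read-modify-write on the same counter, which is exactly the kind of operation that needs protection when several threads run at once.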

### How to avoid the GIL
The GIL is very controversial in the Python community. The main way to avoid the GIL is to use multiprocessing instead of threading. Another (however inconvenient) solution is to avoid the CPython implementation and use a Python implementation without a GIL, like Jython or IronPython. A third option is to move parts of the application out into binary extension modules, i.e. use Python as a wrapper for third-party libraries (e.g. in C/C++). This is the path taken by NumPy and SciPy.
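
The third option can be tried with the standard library alone. The `hashlib` module is implemented in C, and its documentation states that the GIL is released while hashing data larger than 2047 bytes. The sketch below relies on that documented behaviour; the helper name `hash_in_threads` and the buffer sizes are arbitrary choices for this example.

```python
import hashlib
from threading import Thread


def hash_in_threads(buffers):
    """Hash each buffer in its own thread and return the hex digests in input order."""
    digests = [None] * len(buffers)

    def worker(index):
        # The hashing itself runs in C code, which releases the GIL for large inputs
        digests[index] = hashlib.sha256(buffers[index]).hexdigest()

    threads = [Thread(target=worker, args=(i,)) for i in range(len(buffers))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return digests


if __name__ == "__main__":
    buffers = [bytes([i]) * 10_000_000 for i in range(4)]  # four 10 MB buffers
    for digest in hash_in_threads(buffers):
        print(digest[:16])
```

Because the threads spend most of their time inside C code that does not hold the GIL, they can run on several cores at once even though the program never leaves the threading module.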