# General idea

Concurrent programming is different from parallel programming. Parallelism is when computations actually run at the same instant, for example on two different cores. Concurrency is about dealing with multiple tasks over the same period, which can be done even on a single core by queueing the tasks and interleaving them. So, parallelism is a special case of concurrency, according to Rob Pike.

Creating a thread or a process is not cheap, so we usually avoid starting a new one for every task. Instead, we put a thread in an infinite loop that waits for data to process, which makes it a worker. For this to work, we need a way to send and receive data (messages) between threads.
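
A minimal sketch of the worker pattern described above, using `queue.Queue` from the standard library to pass messages between threads. The names `worker` and `square_all`, and the use of `None` as a "stop" sentinel, are illustrative assumptions, not something prescribed by these notes.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    # Infinite loop: block until a message arrives, process it, repeat.
    while True:
        item = jobs.get()
        if item is None:  # sentinel (an assumption of this sketch): stop the worker
            break
        results.put(item * item)


def square_all(numbers):
    jobs = queue.Queue()
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for n in numbers:
        jobs.put(n)       # send work to the worker thread
    jobs.put(None)        # ask the worker to finish
    thread.join()
    # A single worker handles the jobs in FIFO order, so the results keep the input order.
    return [results.get() for _ in numbers]


if __name__ == '__main__':
    print(square_all([1, 2, 3]))
```

The thread is created once and reused for every message, which is the point of the pattern: the cost of starting it is paid a single time.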

# The jargon

- Concurrency: The ability to handle multiple pending tasks, making progress on them one at a time or in parallel. Also called multitasking.
- Parallelism: The ability to execute multiple computations at the same time. Requires a multicore CPU, multiple CPUs, or a cluster of machines.
- Execution unit: An object that runs code concurrently, with its own state and call stack. Python natively supports three kinds: processes, threads, and coroutines.
- Process: An instance of a running program, using memory and a slice of CPU time. An operating system routinely keeps hundreds of processes running, each with its own private memory space. Processes talk to each other through pipes, sockets, or memory-mapped files. All of these channels carry only raw bytes, so Python objects must be serialized before they are sent. That is not cheap, and not every Python object can be serialized. A process can create subprocesses, which run isolated from the parent process and from each other.
- Thread: An execution unit inside a process. A process starts with a single main thread, which can call OS APIs to create more threads. All threads of a process share the same memory space, so objects shared between them do not need to be serialized. The downside is that data can be corrupted when more than one thread updates the same object at the same time.
- Coroutine: A function that can suspend its own execution and resume later, using the keywords `yield` or `await`. Classic coroutines are built from generator functions; native coroutines are defined with `async def`. Coroutines generally run in a single thread, under the supervision of an event loop. Any blocking code in a coroutine blocks every other coroutine on that loop, unlike threads and processes, which are preempted by the scheduler. On the other hand, a coroutine consumes fewer resources than a thread or a process.
- Queue: A data structure, usually first in, first out (FIFO), used to pass data between execution units.
- Lock: An object that execution units use to signal whether some shared data is currently available to be read or written. The most common kind is a mutex (mutual exclusion), which only one execution unit can hold at a time.
- Contention: Competition among multiple execution units for access to the same resource.
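
To make the "Thread", "Lock" and "Contention" entries concrete, here is a small sketch in which several threads update one shared object. The `Counter` class and the `count_with_threads` helper are hypothetical names made up for this illustration; `threading.Lock` is the standard library mutex.

```python
import threading


class Counter:
    """A shared object; the lock makes each increment atomic."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time can hold the lock, so the
        # read-modify-write below cannot interleave with another thread.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads: int, n_increments: int) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == '__main__':
    print(count_with_threads(4, 10_000))
```

The threads contend for the same lock, so they take turns: the result is always correct, at the price of some waiting.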

# How does it work in Python?

1. Every instance of the Python interpreter is a process. You can start new Python processes with the `multiprocessing` or `concurrent.futures` libraries. The `subprocess` library is meant for launching processes that run external programs.
2. The Python interpreter uses a single thread to run both the user code and the garbage collector. You can start new threads with the `threading` or `concurrent.futures` libraries.
3. Access to object reference counts and other internal interpreter state is controlled by the GIL (Global Interpreter Lock). At any moment, only one thread can hold the GIL, i.e., only one thread can execute Python code, regardless of how many cores the CPU has.
4. By default, the interpreter makes the running thread release the GIL every 5 ms, to prevent a thread from holding it indefinitely.
5. We cannot control the GIL directly from Python code, but code written against the Python/C API can release it.
6. Every Python standard library function that performs a syscall (a call to a kernel function) releases the GIL. This includes disk reads and writes, network transfers, and `time.sleep()`. Many CPU-intensive functions in libraries such as NumPy and SciPy also release the GIL.
7. Extensions that use the Python/C API can start non-Python threads, which are not affected by the GIL.
8. The effect of the GIL on network programming is small, because network latency is far higher than memory read/write latency and the I/O functions release the GIL while they wait.
9. Contention over the GIL slows down Python threads that do intensive processing. For this type of task, sequential single-threaded code is simpler and faster.
10. To run CPU-intensive Python code on multiple cores, you need to use multiple processes.
11. The GIL does not come into play between coroutines driven by the same event loop, because they all run in the same thread.
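
Items 4 and 6 can be observed from plain Python. The sketch below is only an illustration: `nap` and `timed_naps` are hypothetical helper names, while `sys.getswitchinterval()` and `time.sleep()` are standard library calls. It reads the interval at which the interpreter asks the running thread to release the GIL (0.005 s by default in CPython), and then shows that threads blocked in `time.sleep()` overlap instead of running one after another.

```python
import sys
import time
from threading import Thread


def nap(seconds: float) -> None:
    # time.sleep() releases the GIL, so other threads can run meanwhile.
    time.sleep(seconds)


def timed_naps(count: int, seconds: float) -> float:
    """Start `count` sleeping threads and return the total elapsed time."""
    threads = [Thread(target=nap, args=(seconds,)) for _ in range(count)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0


if __name__ == '__main__':
    print(f'switch interval: {sys.getswitchinterval()} s')
    # If the naps ran sequentially this would take about 4 * 0.2 s.
    print(f'4 naps of 0.2 s took {timed_naps(4, 0.2):.2f} s')
```

Replacing `nap` with a CPU-bound loop would remove most of the overlap, which is what item 9 describes.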

In [2]:
# Concurrent hello world

# This program displays a spinning character in the terminal (in a separate thread) while a slow function runs in the main thread

import itertools
import time
from threading import Thread, Event

def spin(msg: str, done: Event) -> None:
    for char in itertools.cycle(r'\|/-'): # infinite loop
        status = f'\r{char} {msg}'
        print(status, end='', flush=True)
        if done.wait(.1):
            break
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')

def slow() -> int:
    time.sleep(2)
    return 42

def supervisor() -> int:
    done = Event()
    spinner = Thread(target=spin, args=('thinking!', done))
    print(f'spinner object: {spinner}')
    spinner.start()
    result = slow()
    done.set()
    spinner.join()
    return result

def main() -> None:
    result = supervisor()
    print(f'Answer: {result}')

if __name__ == '__main__':
    main()

spinner object: <Thread(Thread-6 (spin), initial)>
\ thinking!

| thinking! Answer: 42


In [3]:
# Similar example, but using multiprocessing

import itertools
import time
from multiprocessing import Process, Event
from multiprocessing import synchronize

# same as before
def spin(msg: str, done: synchronize.Event) -> None:
    for char in itertools.cycle(r'\|/-'): # infinite loop
        status = f'\r{char} {msg}'
        print(status, end='', flush=True)
        if done.wait(.1):
            break
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')

# same as before
def slow() -> int:
    time.sleep(2)
    return 42

def supervisor() -> int:
    done = Event()
    spinner = Process(target=spin, args=('thinking!', done))
    print(f'spinner object: {spinner}') 
    spinner.start()
    result = slow()
    done.set()
    spinner.join()
    return result

def main() -> None:
    result = supervisor()
    print(f'Answer: {result}')

if __name__ == '__main__':
    main()

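# Question: why did the spinner not show up when this ran in a notebook (.ipynb), although it works as a script?
# A likely cause, assuming the "spawn" start method (the default on Windows and macOS): the child process
# has to import `spin` from `__main__`, and functions defined in a notebook cell cannot be imported that way.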




spinner object: <Process name='Process-1' parent=22568 initial>
Answer: 42


In [4]:
# Same program, but with coroutines
import itertools
import asyncio

async def spin(msg: str) -> None:
    for char in itertools.cycle(r'\|/-'): # infinite loop
        status = f'\r{char} {msg}'
        print(status, end='', flush=True)
        try:
            await asyncio.sleep(0.1)
        except asyncio.CancelledError:
            break
        blanks = ' ' * len(status)
        print(f'\r{blanks}\r', end='')

async def slow() -> int:
    await asyncio.sleep(2)
    return 42

def main() -> None:
    result = asyncio.run(supervisor())
    print(f'Answer: {result}')

async def supervisor() -> int:
    spinner = asyncio.create_task(spin('thinking!'))
    print(f'spinner object: {spinner}')
    result = await slow()
    spinner.cancel()
    return result

if __name__ == '__main__':
    main()

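# This also failed in the notebook but worked as a script: Jupyter already runs an event loop,
# so `asyncio.run()` raises the error below. In a notebook cell, `await supervisor()` can be used directly instead.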

RuntimeError: asyncio.run() cannot be called from a running event loop

# GIL impact

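For a CPU-intensive function like the one below, the problem with coroutines is not the GIL itself: a coroutine only hands control back to the event loop when it reaches an `await` that actually suspends. This function never does, so while it runs no other coroutine can make progress and the spinner animation is not displayed, just like when we call `time.sleep(3)` instead of `await asyncio.sleep(3)`. Threads do not have this problem, because the interpreter makes the running thread release the GIL periodically, and neither do processes, because each process has its own interpreter and its own GIL. With coroutines, we need to be aware of this.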

```python
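import math

# Declared with `async def`, but it never awaits: once started, it runs to completion without yielding to the event loop
async def is_prime(n):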
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
    return True
```

A stopgap solution is to call `await asyncio.sleep(0)` every n iterations of the `for` loop, which gives the event loop a chance to run the other coroutines, but this slows down the computation.
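
A sketch of that stopgap, under a few assumptions: the `yield_every` parameter and the `demo` and `ticker` helpers are hypothetical names introduced here, and the interval of 5,000 iterations is arbitrary. The `ticker` task stands in for the spinner and simply counts how many turns it gets while `is_prime` runs.

```python
import asyncio
import math


async def is_prime(n: int, yield_every: int = 5_000) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for count, i in enumerate(range(3, root + 1, 2), start=1):
        if n % i == 0:
            return False
        if count % yield_every == 0:
            # Suspend briefly so the event loop can run other tasks.
            await asyncio.sleep(0)
    return True


async def demo(n: int, yield_every: int) -> tuple:
    ticks = 0

    async def ticker() -> None:
        # Stand-in for the spinner: counts each turn it gets from the event loop.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    task = asyncio.create_task(ticker())
    result = await is_prime(n, yield_every)
    task.cancel()
    return result, ticks


if __name__ == '__main__':
    n = 2**31 - 1  # a Mersenne prime, so the loop runs to the end
    print(asyncio.run(demo(n, 5_000)))    # is_prime yields now and then
    print(asyncio.run(demo(n, 10**9)))    # is_prime never yields
```

In the second call the interval is larger than the number of loop iterations, so `is_prime` never suspends and the ticker gets no turn at all before it is cancelled.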



In [7]:
# Example: a process pool implementation to compute prime numbers

import sys
import math
from time import perf_counter
from typing import NamedTuple
from multiprocessing import Process, SimpleQueue, cpu_count, queues

NUMBERS = [2, 142702110479723, 299593572317531, 3333333333333301, 3333333333333333]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
    return True

class PrimeResult(NamedTuple):
    n: int
    prime: bool
    elapsed: float

JobQueue = queues.SimpleQueue[int]
ResultQueue = queues.SimpleQueue[PrimeResult]

def check(n: int) -> PrimeResult:
    t0 = perf_counter()
    res = is_prime(n)
    return PrimeResult(n, res, perf_counter() - t0)

def worker(jobs: JobQueue, results: ResultQueue) -> None:
    while n := jobs.get():
        results.put(check(n))
    results.put(PrimeResult(0, False, 0.0))

def start_jobs(
    procs: int, jobs: JobQueue, results: ResultQueue
) -> None:
    for n in NUMBERS:
        jobs.put(n)
    for _ in range(procs):
        proc = Process(target=worker, args=(jobs, results))
        proc.start()
        jobs.put(0)

def main() -> None:
    procs = cpu_count()
    print(f'Checking {len(NUMBERS)} numbers with {procs} processes:')
    t0 = perf_counter()
    jobs: JobQueue = SimpleQueue()
    results: ResultQueue = SimpleQueue()
    start_jobs(procs, jobs, results)
    checked = report(procs, results)
    elapsed = perf_counter() - t0
    print(f'{checked} checks in {elapsed:.2f}s')

def report(procs: int, results: ResultQueue) -> int:
    checked = 0
    procs_done = 0
    while procs_done < procs:
        n, prime, elapsed = results.get()
        if n == 0:
            procs_done += 1
        else:
            checked += 1
            label = 'P' if prime else ' '
            print(f'{n:16}  {label} {elapsed:9.6f}s')
    return checked

if __name__ == '__main__':
    main()

# This also did not work in the notebook (only the header line below was printed), but it worked as a script

Checking 5 numbers with 8 processes:
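
For comparison, the `concurrent.futures` library mentioned earlier can manage the worker processes and the queues for us. The following is a minimal sketch, not a replacement for the hand-written pool above: `check_all` is a hypothetical helper name, the short list of numbers is chosen only so that the example finishes quickly, and, like the cell above, it is meant to be run as a script.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    # Same trial-division test as in the cell above.
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
    return True


def check_all(numbers: list) -> dict:
    # The executor starts the worker processes, hands out the numbers and
    # returns the results in input order; no sentinel values are needed.
    with ProcessPoolExecutor() as executor:
        return dict(zip(numbers, executor.map(is_prime, numbers)))


if __name__ == '__main__':
    for n, prime in check_all([2, 9, 2**31 - 1]).items():
        label = 'P' if prime else ' '
        print(f'{n:16}  {label}')
```

By default `ProcessPoolExecutor()` sizes the pool from the number of available CPUs, which plays the role of `cpu_count()` in the manual version.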


# Python in the multicore world

See the book for examples of using Python in web servers, machine learning, and so on.