# Programming with Python

## Lecture 09: Concurrency 1

### Armen Gabrielyan

#### Yerevan State University / ASDS

#### 19 Apr, 2025

This section is heavily influenced by the following:

*References:*

- Fluent Python, Luciano Ramalho

## Important concepts

> Concurrency is about dealing with lots of things at once.
>
> Parallelism is about doing lots of things at once.
> 
> Not the same, but related.
> 
> One is about structure, one is about execution.
> 
> Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.
>
> — Rob Pike, co-inventor of the Go language

### Sequential

Tasks are performed one after another, in a strict order.

### Concurrency

The ability to handle multiple pending tasks, making progress one at a time or in parallel (if possible), so that each of them eventually succeeds or fails. A single-core CPU is capable of concurrency if it runs an OS scheduler that interleaves the execution of the pending tasks. Also known as multitasking.

### Parallelism

The ability to execute multiple computations at the same time. This requires a multicore CPU, multiple CPUs, a GPU, or multiple computers in a cluster.


### Execution unit 

General term for objects that execute code concurrently, each with independent state and call stack. Python natively supports three kinds of execution units: processes, threads, and coroutines.

### Process

An instance of a computer program while it is running, using memory and a slice of the CPU time. Modern desktop operating systems routinely manage hundreds of processes concurrently, with each process isolated in its own private memory space. Processes communicate via pipes, sockets, or memory-mapped files, all of which can only carry raw bytes. Python objects must be serialized (converted) into raw bytes to pass from one process to another. This is costly, and not all Python objects are serializable. A process can spawn subprocesses, each called a child process. These are also isolated from each other and from the parent. Processes allow preemptive multitasking: the OS scheduler preempts (i.e., suspends) each running process periodically to allow other processes to run. This means that, in theory, a frozen process can’t freeze the whole system.
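
The serialization step described above can be sketched with the standard `pickle` module, which `multiprocessing` relies on to move objects between processes. This is only an illustration: the `record` dictionary and the lambda are made-up sample values, not part of the lecture material.

```python
import pickle

# A dict of built-in types can be converted to raw bytes and back.
record = {"id": 1, "tags": ["a", "b"]}  # hypothetical sample data
payload = pickle.dumps(record)
restored = pickle.loads(payload)
print(type(payload).__name__, restored == record)

# Not every object is serializable: pickle stores functions by their
# importable name, and a lambda has no such name.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print("lambda pickled:", lambda_pickled)
```

The round trip through `dumps` and `loads` is the extra work that every object pays when it crosses a process boundary.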

### Thread

An execution unit within a single process. When a process starts, it uses a single thread: the main thread. A process can create more threads to operate concurrently by calling operating system APIs. Threads within a process share the same memory space, which holds live Python objects. This allows easy data sharing between threads, but can also lead to corrupted data when more than one thread updates the same object concurrently. Like processes, threads also enable preemptive multitasking under the supervision of the OS scheduler. A thread consumes fewer resources than a process doing the same job.
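
As a minimal sketch of the shared memory space, the threads below all append to the same list object without any copying or serialization. The `worker` function and the thread names are hypothetical, chosen only for this illustration.

```python
import threading

shared = []  # one list object, visible to every thread in the process

def worker(label):
    # Each thread appends to the very same list.
    shared.append((label, threading.current_thread().name))

threads = [
    threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(threading.main_thread().name)
print(sorted(shared))
```

A single `list.append` call is safe here, but compound updates to shared objects generally need a lock.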

### CPU

#### Cores
A CPU (Central Processing Unit) core is the processing unit within a computer's CPU that reads and executes program instructions. The more cores a CPU has, the more instructions it can process simultaneously, improving the computer's overall performance. Depending on how the software is designed, cores can each work on a different task or cooperate on the same task. This is known as parallel processing.

> #### Hyperthreads
> Hyper-Threading is Intel's proprietary technology that allows a single CPU core to handle multiple threads simultaneously. It does this by duplicating the sections of the processor that store the architectural state, but not the main execution resources. As a result, the operating system and applications see more cores than physically exist and can schedule more threads at once. This can improve the efficiency of CPU resource usage and increase throughput.
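
From Python, the number of CPUs the operating system exposes can be queried with `os.cpu_count()`. It reports logical CPUs, so on a machine with Hyper-Threading enabled the value is usually larger than the number of physical cores. A small sketch:

```python
import os

# Number of logical CPUs; may be None if it cannot be determined.
logical_cpus = os.cpu_count()
print("logical CPUs:", logical_cpus)
```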


#### Multiprocessing vs. Multithreading

Multiprocessing involves using two or more CPUs (cores) within a single computer system to process instructions simultaneously. Each processor works on a different task. This setup is powerful for tasks that can be divided into smaller, independent tasks that can run in parallel.

Multithreading, on the other hand, refers to the ability of a CPU (whether it has a single core or multiple cores) to provide multiple threads of execution within a single process. This is beneficial for improving the performance of a single application by dividing it into smaller, more manageable tasks that can be executed simultaneously.

### Coroutine

A function that can suspend itself and resume later. In Python, classic coroutines are built from generator functions, and native coroutines are defined with `async def`. Python coroutines usually run within a single thread under the supervision of an event loop, which runs in that same thread. Asynchronous programming frameworks such as `asyncio`, `Curio`, or `Trio` provide an event loop and supporting libraries for nonblocking, coroutine-based I/O. Coroutines support *cooperative multitasking*: each coroutine must explicitly cede control with the `yield` or `await` keyword, so that another may proceed concurrently (but not in parallel). This means that any blocking code in a coroutine blocks the event loop and all other coroutines, in contrast with the preemptive multitasking supported by processes and threads. On the other hand, each coroutine consumes fewer resources than a thread or process doing the same job.
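
Cooperative multitasking can be sketched with two native coroutines that hand control back to the event loop after every step. The names `worker`, `main` and `log` are hypothetical; `asyncio.sleep(0)` is used only as an explicit point where the coroutine cedes control.

```python
import asyncio

async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        # await hands control back to the event loop,
        # so the other coroutine gets a turn.
        await asyncio.sleep(0)

async def main():
    log = []
    # Both coroutines run in the same thread, interleaved by the event loop.
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log

log = asyncio.run(main())
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1']
```

If `worker` called a blocking function such as `time.sleep()` instead of awaiting, the other coroutine would not run until that call returned.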

### Queue

A data structure that lets us put and get items, usually in FIFO order: first in, first out. Queues allow separate execution units to exchange application data and control messages, such as error codes and signals to terminate. The implementation of a queue varies according to the underlying concurrency model: the `queue` module in Python’s standard library provides queue classes to support threads, while the `multiprocessing` and `asyncio` packages implement their own queue classes. The `queue` and `asyncio` packages also include queues that are not FIFO: `LifoQueue` and `PriorityQueue`.
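
A minimal sketch of a thread-safe queue carrying both data and a control message. The `SENTINEL` value used as a "terminate" signal and the `consumer` function are assumptions made for this example.

```python
import queue
import threading

SENTINEL = None  # hypothetical control message meaning "no more work"

def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)

q = queue.Queue()
results = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
for n in [1, 2, 3]:
    q.put(n)
q.put(SENTINEL)  # signal the consumer to terminate
t.join()
print(results)  # FIFO order is preserved: [2, 4, 6]

# LifoQueue returns items in reverse order of insertion.
lifo = queue.LifoQueue()
for n in [1, 2, 3]:
    lifo.put(n)
lifo_order = [lifo.get() for _ in range(3)]
print(lifo_order)  # [3, 2, 1]
```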

### Lock 

An object that execution units can use to synchronize their actions and avoid corrupting data. While updating a shared data structure, the running code should hold an associated lock. This signals other parts of the program to wait until the lock is released before accessing the same data structure. The simplest type of lock is also known as a mutex (for mutual exclusion). The implementation of a lock depends on the underlying concurrency model. 
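
For threads, the idea can be sketched with `threading.Lock`. The shared `counter` and the `increment` function are hypothetical. Without the lock, `counter += 1` is a read-modify-write sequence that threads could interleave, losing updates.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the block below.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Using the lock as a context manager (`with counter_lock:`) guarantees it is released even if the protected code raises an exception.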

### Contention

Dispute over a limited asset. Resource contention happens when multiple execution units try to access a shared resource—such as a lock or storage. There’s also CPU contention, when compute-intensive processes or threads must wait for the OS scheduler to give them a share of the CPU time.

### CPU-bound

Tasks that heavily use the CPU, doing lots of calculations or data processing.

#### Examples:

- Calculating prime numbers
- Image or video processing
- Data encryption/decryption
- Running machine learning algorithms

#### Bottleneck:

The CPU itself — the task is limited by how fast your processor can compute.

### IO-bound

Tasks that spend time waiting for external resources — like reading files, querying databases or making network requests.

#### Examples:

- Downloading web pages
- Reading/writing to disk
- Making API/database calls
- Waiting for user input

#### Bottleneck:

I/O latency — the program is idle waiting for data, not doing CPU work.

## Global interpreter lock (GIL)

The **Global Interpreter Lock (GIL)** is a mechanism used in CPython to ensure that only one thread executes Python bytecode at a time, even on multi-core systems.

### Why

- CPython's memory management (especially garbage collection via reference counting) isn't thread-safe by default.
- The GIL simplifies memory management by avoiding the need for fine-grained locking.

### Downsides

- Multi-threaded Python programs can't fully utilize multiple CPU cores for CPU-bound tasks.
- If you have multiple threads doing heavy computation (e.g. number crunching), they’ll effectively run one at a time.
- It can be a bottleneck in high-performance or real-time applications.

### When is it not a problem?

- For I/O-bound tasks (like file access, networking or database queries), threads can release the GIL while waiting, so concurrency still works well.
- Example: Using `threading` with `requests` / `httpx` to fetch multiple URLs concurrently can still be effective.
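
The effect can be sketched without a network connection by letting `time.sleep()` stand in for I/O latency, since it also releases the GIL. The `fake_download` function and the URLs are placeholders, not real requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    time.sleep(0.2)  # stands in for network latency; releases the GIL
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(4)]  # hypothetical URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    pages = list(executor.map(fake_download, urls))
elapsed = time.perf_counter() - start

print(pages[0])
# The four waits overlap, so the total is close to one wait, not four.
print(f"elapsed: {elapsed:.2f}s")
```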

## Threads, processes and GIL

1. Each instance of the Python interpreter is a process. You can start additional Python processes using the `multiprocessing` or `concurrent.futures` libraries. Python’s `subprocess` library is designed to launch processes to run external programs, regardless of the languages used to write them. 

2. The Python interpreter uses a single thread to run the user’s program and the memory garbage collector. You can start additional Python threads using the `threading` or `concurrent.futures` libraries. 

3. Access to object reference counts and other internal interpreter state is controlled by a lock, the Global Interpreter Lock (GIL). Only one Python thread can hold the GIL at any time. This means that only one thread can execute Python code at any time, regardless of the number of CPU cores. 

4. To prevent a Python thread from holding the GIL indefinitely, Python’s bytecode interpreter pauses the current Python thread every *5 ms* by default, releasing the GIL. The interval can be read with `sys.getswitchinterval()` and changed with `sys.setswitchinterval()`. The thread can then try to reacquire the GIL, but if there are other threads waiting for it, the OS scheduler may pick one of them to proceed.

5. When we write Python code, we have no control over the GIL. But a built-in function or an extension written in C—or any language that interfaces at the Python/C API level—can release the GIL while running time-consuming tasks. 

6. Every Python standard library function that makes a *syscall* releases the GIL. This includes all functions that perform disk I/O, network I/O, and `time.sleep()`. Many CPU-intensive functions in the `NumPy`/`SciPy` libraries, as well as the compressing/decompressing functions from the `zlib` and `bz2` modules, also release the GIL.

7. Extensions that integrate at the Python/C API level can also launch other non-Python threads that are not affected by the GIL. Such GIL-free threads generally cannot change Python objects, but they can read from and write to the memory underlying objects that support the `buffer` protocol, such as `bytearray`, `array.array`, and `NumPy` arrays.

8. The effect of the GIL on network programming with Python threads is relatively small, because the I/O functions release the GIL, and reading or writing to the network always implies high latency—compared to reading and writing to memory. Consequently, each individual thread spends a lot of time waiting anyway, so their execution can be interleaved without major impact on the overall throughput. That’s why David Beazley says: “Python threads are great at doing nothing.”

9. Contention over the GIL slows down compute-intensive Python threads. Sequential, single-threaded code is simpler and faster for such tasks.

10. To run CPU-intensive Python code on multiple cores, you must use multiple Python processes.
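
As a rough sketch of the last point, CPU-bound work can be spread over worker processes with `concurrent.futures.ProcessPoolExecutor`. The `is_prime` and `check_all` functions and the input numbers are hypothetical. The `if __name__ == "__main__":` guard is needed on platforms that start worker processes by re-importing the main module.

```python
import math
from concurrent.futures import ProcessPoolExecutor

def is_prime(n):
    """Trial division: deliberately CPU-bound."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def check_all(numbers):
    # Each call to is_prime runs in a worker process with its own GIL.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(is_prime, numbers))

if __name__ == "__main__":
    numbers = [2, 15, 97, 7919, 7920]  # hypothetical inputs
    print(check_all(numbers))  # [True, False, True, True, False]
```

Arguments and results are pickled to cross the process boundary, so this pays off only when each task does enough computation to outweigh that overhead.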

## Overview of `threading` and `multiprocessing`

### `threading` Module
- Used for **concurrent tasks** within the same process.
- Good for **I/O-bound** operations (e.g., file I/O, network).
- Limited by the **GIL**: no true parallelism for CPU-bound tasks.
- `start()`: begins thread execution.
- `join()`: waits for thread to finish.

---

### `multiprocessing` Module
- Used for **parallel tasks** across multiple processes.
- Best for **CPU-bound** operations (e.g., heavy computation).
- Each process runs independently, **bypassing the GIL**.
- `start()`: launches a new process.
- `join()`: waits for process to complete.


Use **`threading`** for I/O-bound tasks, **`multiprocessing`** for CPU-bound tasks. In both, `start()` runs the work, `join()` waits for it to finish.
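
A minimal sketch of the `start()` / `join()` pattern with `threading.Thread`. The `task` function and its delays are made up for illustration, with `time.sleep()` simulating an I/O wait.

```python
import threading
import time

def task(name, delay, done):
    time.sleep(delay)  # simulated I/O wait
    done.append(name)

done = []
t1 = threading.Thread(target=task, args=("slow", 0.2, done))
t2 = threading.Thread(target=task, args=("fast", 0.05, done))

t1.start()  # returns immediately; the work runs in the background
t2.start()
t1.join()   # block until t1 has finished
t2.join()   # block until t2 has finished

# After both joins, both tasks are guaranteed to have completed.
print(done)
```

`multiprocessing.Process` accepts the same `target` and `args` arguments and exposes the same `start()` and `join()` methods, but a plain list like `done` would not be shared between processes.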

## Simple examples

### Sequential execution

**See practical example 1**.

### Multi-threading

**See practical example 2**.

### Multi-processing

**See practical example 3**.