# Programming with Python

## Lecture 10: Concurrency 2

### Armen Gabrielyan

#### Yerevan State University / ASDS

#### 26 Apr, 2025

# Multi-threading

## `threading` module

### Overview

The `threading` module builds higher-level threading interfaces on top of the lower-level `_thread` module.

**CPython implementation detail:** In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

Here are the primary methods and attributes defined on a `threading.Thread` object:

| Method / Attribute      | Signature                         | Description                                                                                                                                                   |
|-------------------------|-----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **`__init__`**          | `Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)` | Constructor. You almost always supply `target`, optionally `args`/`kwargs`, and you can name the thread or mark it as a daemon (`daemon` is keyword-only).                                             |
| **`start`**             | `start()`                         | Arrange for the thread’s `run()` method to be invoked in a new thread of control. Returns immediately.                                                        |
| **`run`**               | `run()`                           | The “worker” function. By default calls `self._target(*self._args, **self._kwargs)`. You override this in subclasses instead of passing `target`.            |
| **`join`**              | `join(timeout=None)`              | Block the calling thread until this thread terminates (or until `timeout` seconds pass).                                                                     |
| **`is_alive`**          | `is_alive()`                      | Return `True` if the thread is still executing.                                                                                                              |
| **`name`**  | `name` (property)       | Get or set the thread’s name (useful for debugging/logging).                                                                                                  |
| **`daemon`**            | `daemon` (property)                   | Boolean flag. If `True`, the thread won’t keep the process alive. Must be set before `start()`.                                                              |
| **`ident`**             | `ident` (read-only)               | Thread identifier (a nonzero integer) available once the thread has been started, or `None` if not yet started.                                                               |
| **`native_id`**         | `native_id` (read‑only)           | Native (kernel-level) thread ID, if supported by the platform.                                                                                 |


The `threading` module also provides module-level functions. For example, `threading.current_thread()` returns the current `Thread` object, corresponding to the caller’s thread of control. If the caller’s thread of control was not created through the `threading` module, a dummy thread object with limited functionality is returned.
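As a minimal sketch of the methods in the table above, the following creates one thread, starts it, and waits for it. The function names `greet` and `run_demo` and the thread name `worker-1` are made up for illustration.

```python
import threading


def greet(name, results):
    # Runs in the worker thread; record which thread executed the call.
    results.append((name, threading.current_thread().name))


def run_demo():
    results = []
    worker = threading.Thread(target=greet, args=("Ada", results), name="worker-1")
    print(worker.is_alive())   # False: the thread has not been started yet
    worker.start()             # returns immediately; run() executes in the new thread
    worker.join()              # block until the worker terminates
    print(worker.is_alive())   # False: the thread has finished
    return results


print(run_demo())  # [('Ada', 'worker-1')]
```

Calling `join()` before reading `results` is what guarantees the worker's entry is already there.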

**See practical example 1**.


### Overriding `run()` via subclassing

Use subclassing when you need per-thread state or more complex behavior; otherwise, passing `target`/`args` to the constructor is often simpler.
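A small sketch of the subclassing approach, assuming a hypothetical `SumThread` class that keeps its input and its result as per-thread state on the instance:

```python
import threading


class SumThread(threading.Thread):
    """Hypothetical worker that sums a list and stores the result on itself."""

    def __init__(self, numbers):
        super().__init__()      # the base constructor must run before start()
        self.numbers = numbers
        self.total = None       # per-thread state, filled in by run()

    def run(self):
        # start() arranges for this method to be executed in the new thread.
        self.total = sum(self.numbers)


threads = [SumThread([1, 2, 3]), SumThread([10, 20])]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([t.total for t in threads])  # [6, 30]
```

Note that the code calls `start()`, not `run()`; calling `run()` directly would execute the method in the calling thread.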

**See practical example 2**.


### Thread-local data

**The Problem:**

In multi-threaded applications, you typically work with two kinds of variables:

1.  **Local variables:** These exist only within a function's scope and are naturally private to the thread executing that function call. However, they disappear when the function returns.
2.  **Global variables:** These are accessible by *all* threads within the process. While convenient for sharing data, modifying shared global variables requires careful use of locks (`threading.Lock`) to prevent race conditions, adding complexity.

**What if you need data that:**

* Persists across function calls *within the same thread* (like a global variable)?
* But is *isolated* and *private* to each specific thread (like a local variable)?


**The Solution: `threading.local()`**

Python's `threading` module offers the `local` class to create objects that manage thread-local data, that is, data whose values are specific to each thread. To manage thread-local data, create an instance of `threading.local` (or a subclass) and store attributes on it:

```python
mydata = threading.local()
mydata.x = 1
```

The instance’s values will be different for separate threads.
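To make the isolation visible, here is a short sketch in which the main thread and two worker threads all use the same `threading.local` instance. The helper names `worker` and `run_demo` are illustrative only.

```python
import threading

mydata = threading.local()
mydata.x = 1  # set in the main thread only


def worker(value, seen):
    # A new thread starts with an empty namespace: the main thread's x is not visible.
    seen.append(hasattr(mydata, "x"))
    mydata.x = value  # private to this thread


def run_demo():
    seen = []
    threads = [threading.Thread(target=worker, args=(v, seen)) for v in (10, 20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The workers' assignments did not touch the main thread's value.
    return seen, mydata.x


print(run_demo())  # ([False, False], 1)
```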

**See practical example 3**.

### Race condition

A **race condition** happens when two or more threads access shared data at the same time, and the result depends on the order of execution — which is not predictable.

Think of two people writing on the same paper at the same time without coordinating. You could end up with gibberish.

It's a problem because threads may:

- Read stale or incorrect values
- Overwrite each other’s work
- Cause inconsistent or unexpected results
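The lost-update problem can be provoked on purpose. The sketch below is an artificial illustration: the `time.sleep()` call between the read and the write is only there to make the unlucky interleaving likely, and the names `unsafe_increment` and `run_demo` are made up.

```python
import threading
import time

counter = 0


def unsafe_increment():
    global counter
    current = counter       # 1. read the shared value
    time.sleep(0.01)        # 2. pause, so other threads read the same stale value
    counter = current + 1   # 3. write back, possibly overwriting another update


def run_demo(n_threads=5):
    global counter
    counter = 0
    threads = [threading.Thread(target=unsafe_increment) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run_demo())  # usually less than 5, because updates are lost
```

The exact result depends on thread scheduling, which is the point: the outcome is not predictable.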

**See practical example 4**.

### Synchronization

**Synchronization** is the key to managing shared resources in multi-threaded programs. The term covers the mechanisms that ensure that no more than one thread (or process) executes a particular portion of a program at a time; that portion is known as the **critical section**.

While one thread is executing the critical section, any other thread that wants to enter it has to wait until the first one finishes. **Allowing only one thread to execute the critical section at a time** is what prevents race conditions and inconsistent results when multiple threads access shared resources.

Let's discuss some of the synchronization mechanisms.

#### 1. Lock / mutual exclusion (mutex)

One of the most common ways to synchronize threads is with a lock. In the `threading` module, the `threading.Lock` class provides a simple way to create and work with locks. Its main API is:

- `threading.Lock()`: Creates and returns a new lock object, initially unlocked.
- `acquire(blocking=True, timeout=-1)`: Acquires the lock, so that only the thread holding it can execute the critical section. The optional `blocking` argument specifies whether the calling thread should wait for the lock:
  - When `blocking=False`, the calling thread does not wait: it returns `False` immediately if the lock is already held, or acquires the lock and returns `True` otherwise.
  - When `blocking=True` (the default), the calling thread blocks until the lock is released, then acquires it and returns `True`.
- `release()`: Releases the lock so that another waiting thread can acquire it.

Common pattern:

```python
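import threading

lock = threading.Lock()


def critical_section_code():
    """Hypothetical placeholder for the work that touches shared data."""


# Acquire the lock before entering `try`: if acquire() itself fails,
# we must not call release() on a lock we do not hold
lock.acquire()
try:
    # Critical section - only one thread at a time can execute this code
    critical_section_code()
finally:
    # Always release the lock, even if an exception occurs
    lock.release()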
```

`Lock` implements the context manager protocol, so it is better practice to use a `with` statement:

```python
import threading

lock = threading.Lock()


def critical_section_code():
    """Hypothetical placeholder for the work that touches shared data."""


with lock:  # Automatically acquires the lock and releases it on exit
    critical_section_code()  # Critical section - only one thread at a time can execute this code
```
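As a slightly fuller sketch, here is a shared counter whose read-modify-write step is protected by a lock. The names `safe_increment` and `run_demo` are illustrative, and the `time.sleep()` call is only there to show that a pause inside the critical section no longer loses updates.

```python
import threading
import time

counter = 0
lock = threading.Lock()


def safe_increment():
    global counter
    with lock:                  # only one thread at a time gets past this line
        current = counter
        time.sleep(0.001)       # a pause here can no longer cause a lost update
        counter = current + 1


def run_demo(n_threads=5):
    global counter
    counter = 0
    threads = [threading.Thread(target=safe_increment) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run_demo())  # 5

# Non-blocking acquire: returns False instead of waiting when the lock is held.
lock.acquire()
print(lock.acquire(blocking=False))  # False
lock.release()
```

The price of correctness is that the increments now run one after another, so the critical section should be kept as short as possible.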

**See practical example 5**.

#### 2. Semaphore

This is one of the oldest synchronization primitives in the history of computer science, invented by the early Dutch computer scientist Edsger W. Dijkstra.

A semaphore manages an internal counter which is decremented by each `acquire()` call and incremented by each `release()` call. The counter can never go below zero; when `acquire()` finds that it is zero, it blocks, waiting until some other thread calls `release()`.

Use cases:

- **Resource pool**: Allow only 5 simultaneous DB connections
- **Rate limiting**: Max 2 API calls at once
- **Thread-safe batching**: Limit how many threads can download files at once

The `threading` module provides the `threading.Semaphore` class for managing semaphores.
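A minimal sketch of the resource-pool use case, assuming a limit of two concurrent workers. `MAX_CONCURRENT`, `limited_task` and `run_demo` are hypothetical names, and `time.sleep()` stands in for real work such as a database query.

```python
import threading
import time

MAX_CONCURRENT = 2
slots = threading.Semaphore(MAX_CONCURRENT)
state_lock = threading.Lock()   # protects the two bookkeeping counters below
active = 0
peak = 0


def limited_task():
    global active, peak
    with slots:                 # acquire(): counter - 1, blocks while it is 0
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)        # stand-in for real work
        with state_lock:
            active -= 1
    # leaving the outer with-block calls release(): counter + 1


def run_demo(n_threads=6):
    global active, peak
    active = peak = 0
    threads = [threading.Thread(target=limited_task) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(run_demo())  # never more than MAX_CONCURRENT
```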

**See practical example 6**.

#### 3. Event

An event is used for signalling between threads: one thread waits, another signals.

An event object manages an internal flag that can be set to true with the `set()` method and reset to false with the `clear()` method. The `wait()` method blocks until the flag is true.

The `threading` module provides `threading.Event`.
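A short sketch of the wait/signal pattern with one waiting thread. The names `ready`, `waiter` and `run_demo` are made up for this illustration.

```python
import threading

ready = threading.Event()
log = []


def waiter():
    ready.wait()                # blocks until another thread calls ready.set()
    log.append("worker: saw the signal")


def run_demo():
    ready.clear()               # make sure the flag starts out false
    log.clear()
    t = threading.Thread(target=waiter)
    t.start()
    log.append("main: preparing")
    ready.set()                 # set the flag to true and wake the waiting thread
    t.join()
    return list(log)


print(run_demo())  # ['main: preparing', 'worker: saw the signal']
```

Because the worker cannot get past `wait()` until `set()` is called, the order of the two log entries is fixed.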

**See practical example 7**.

#### 4. Queue

Queuing is a concept that is widely used in concurrent programming. A **queue** is a data structure that holds a collection of elements. Elements are added at the end of the queue (enqueuing) and removed from the beginning (dequeuing). It works in a first-in, first-out (FIFO) manner, meaning that the element that entered first is removed first.

<div style="text-align: center;">
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Data_Queue.svg/1200px-Data_Queue.svg.png" alt="Index" width="400" height="400"/>
</div>


The `queue` module in Python provides a thread-safe implementation of the queue data structure. A `queue.Queue` can be given a maximum number of elements (`maxsize`), and it offers the following high-level API:
- `get()`: This method returns the next element of the calling queue object and removes it from the queue object. If optional args `block` is `True` (the default) and `timeout` is `None` (the default), block if necessary until an item is available. If `timeout` is a positive number, it blocks at most `timeout` seconds and raises the `queue.Empty` exception if no item was available within that time.
- `get_nowait()`: This method is equivalent to `get()` with the `block` parameter set to `False`, meaning it returns an item if one is immediately available and raises the `queue.Empty` exception otherwise.
- `put()`: This method adds a new element to the calling queue object. By default it blocks while the queue is full.
- `qsize()`: This method returns the number of current elements in the calling queue object (that is, its size)
- `empty()`: This method returns a boolean, indicating whether the calling queue object is empty
- `full()`: This method returns a boolean, indicating whether the calling queue object is full
- `task_done()`: This method indicates that a formerly enqueued task is complete. Used by queue consumer threads. For each `get()` used to fetch a task, a subsequent call to `task_done()` tells the queue that the processing on the task is complete.
- `join()`: This method blocks until all items in the queue have been gotten and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls `task_done()` to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, `join()` unblocks.

Sometimes it is undesirable to have as many threads as there are tasks. If we have a large number of tasks, it is inefficient to spawn an equally large number of threads and have each thread execute only one task. It is often better to have a **fixed number of threads (commonly known as a thread pool)** that work through the tasks cooperatively.

This is where a queue comes in. The threads in the pool do not hold any information about the tasks they should execute; instead, the tasks are stored in a queue (the task queue), and items from the queue are handed to individual members of the pool. When a thread finishes a task and the task queue still contains items, the next item goes to the thread that just became available.
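The design described above can be sketched with `queue.Queue` and a handful of worker threads. This is one possible arrangement, not the only one: the `None` sentinel used to stop the workers, the pool size `NUM_WORKERS`, and the function names are choices made for this example.

```python
import queue
import threading

NUM_WORKERS = 3


def worker(tasks, results):
    while True:
        item = tasks.get()              # blocks until a task is available
        if item is None:                # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((item, item * item))
        tasks.task_done()               # one task_done() per get()


def run_pool(numbers):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)                 # one sentinel per worker
    tasks.join()                        # wait until every item is marked done
    for t in threads:
        t.join()
    squares = {}
    while not results.empty():          # safe here: all workers have finished
        n, sq = results.get()
        squares[n] = sq
    return squares


print(run_pool(range(5)))
```

Which worker handles which number varies from run to run, but every task is processed exactly once.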

**See practical example 8**.

## IO-bound task

In Python, I/O-bound tasks are well suited to `threading`: most I/O operations (disk reads/writes, network calls, database queries) release the GIL while they are waiting, so threads can make progress concurrently. Here is how the options compare:

- **Threading**  
  - **GIL release**: Most blocking I/O operations in the standard library (e.g., `socket`, file I/O, many database drivers) relinquish the GIL while waiting, so another thread can run.  
  - **Low overhead**: Threads share memory and have much lower start‑up and context‑switch costs than processes.  
  - **Ease of sharing state**: Because threads live in the same address space, shared data structures require only locks or queues, not inter-process communication (IPC).  
  - **Good for high‑latency tasks**: If your workload is dominated by waiting for network responses, disk reads, or other slow operations, threading can boost throughput without much complexity.

- **Multiprocessing**  
  - **Still works**: Each process has its own interpreter and its own GIL, so blocking I/O in one process never holds up another, but using multiple processes adds IPC overhead and higher memory usage.  
  - **Overkill for I/O**: Spinning up separate interpreters usually isn’t worth it if you’re not CPU‑bound; threads are lighter and simpler.

- **Asynchronous I/O (async/await)**  
  - **Single-threaded concurrency**: Uses an event loop to switch between tasks when they await I/O.  
  - **Even lighter than threads**: No context‑switching at the OS level; perfect for handling *huge* numbers of concurrent connections (web servers, chat clients).  
  - **Requires non‑blocking APIs**: You need libraries designed for asyncio or third‑party frameworks (e.g., `aiohttp`, `asyncpg`).

### Simple web scraper

To demonstrate the difference in execution time between sequential and multithreaded approaches, we'll simulate downloading content from multiple URLs.

Requesting content over a network is an I/O-bound task and is well suited to multi-threading.

Install `httpx` first, for example by running `!pip install httpx` in a notebook cell.

The following examples download websites sequentially and with multiple threads. Multi-processing would also work, but since this is an I/O-bound task, multi-threading is the better fit; multi-processing adds extra overhead.
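The timing difference can be previewed without touching the network. The sketch below does not use `httpx`; it simulates each download with `time.sleep()`, which releases the GIL like real blocking I/O does. The URLs are placeholders and `fake_download` is a hypothetical stand-in for a real request.

```python
import threading
import time

URLS = [f"https://example.com/page/{i}" for i in range(5)]  # placeholder URLs


def fake_download(url, results):
    time.sleep(0.2)             # simulated network latency
    results[url] = len(url)     # stand-in for the size of the response


def run_sequential():
    results = {}
    start = time.perf_counter()
    for url in URLS:
        fake_download(url, results)
    return time.perf_counter() - start, results


def run_threaded():
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(url, results))
               for url in URLS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


seq_time, _ = run_sequential()
thr_time, _ = run_threaded()
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

The sequential version waits for the five simulated downloads one after another, while the threaded version waits for them at the same time.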

**See practical example 9**.

## Python experimental support for free threading

Starting with version `3.13`, CPython offers an experimental feature called free threading, in which the Global Interpreter Lock (GIL) is disabled. This enables true parallel execution of threads across multiple CPU cores, allowing programs that are designed to use threading to take full advantage of multi-core systems and potentially see performance improvements.

However, this mode is still in development and may contain bugs. Additionally, programs that run in a single thread may experience a noticeable performance decrease.
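To find out which mode an interpreter is running in, one option is to combine `sysconfig.get_config_var("Py_GIL_DISABLED")` with `sys._is_gil_enabled()`. The helper name `gil_status` is made up for this sketch, and the fallback for interpreters older than 3.13 is an assumption (they always have a GIL).

```python
import sys
import sysconfig


def gil_status():
    # Py_GIL_DISABLED is 1 for free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled


build, enabled = gil_status()
print(f"free-threaded build: {build}, GIL currently enabled: {enabled}")
```

The two values can differ: a free-threaded build may still run with the GIL enabled.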

This was introduced in [PEP 703 – Making the Global Interpreter Lock Optional in CPython](https://peps.python.org/pep-0703/). See [documentation](https://docs.python.org/3/howto/free-threading-python.html) for installation guides and more information.

**See practical example 10**.

## References

- [threading — Thread-based parallelism](https://docs.python.org/3/library/threading.html)
- [Python experimental support for free threading](https://docs.python.org/3/howto/free-threading-python.html)

# Multi-processing

## `multiprocessing` module

### Overview

The `multiprocessing` module provides a way to create new processes using an API similar to the `threading` module. Unlike threads, it uses separate subprocesses, which allows it to bypass the Global Interpreter Lock (GIL) and take full advantage of multiple CPU cores. This enables true parallel execution and works on both POSIX systems and Windows.

In addition to thread-like functionality, `multiprocessing` includes features not found in the `threading` module. One notable feature is the `Pool` class, which simplifies running a function in parallel across a collection of inputs—this is known as data parallelism.
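A minimal sketch of data parallelism with `Pool.map()`; the function `square` and the pool size of 4 are arbitrary choices for illustration.

```python
from multiprocessing import Pool


def square(n):
    # Must be defined at module level so worker processes can import it.
    return n * n


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() splits the inputs across the workers and returns results in input order.
        print(pool.map(square, range(6)))  # [0, 1, 4, 9, 16, 25]
```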

`multiprocessing.Process` objects represent activity that is run in a separate process. The `multiprocessing.Process` class has equivalents of all the methods of `threading.Thread`.

The `if __name__ == '__main__'` part is necessary in multi-processing as you can see in the following example. This is to make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
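A small sketch of starting one child process and getting a value back from it. The helper names `describe` and `report` are hypothetical; the `Process` and `Queue` calls mirror the `threading.Thread` API shown earlier.

```python
import os
from multiprocessing import Process, Queue


def describe(label):
    return (label, os.getpid())


def report(label, out):
    # Runs in the child process; send the child's PID back to the parent.
    out.put(describe(label))


if __name__ == "__main__":
    out = Queue()
    child = Process(target=report, args=("child", out))
    child.start()
    label, child_pid = out.get()    # read the result before join()
    child.join()
    print(label, child_pid != os.getpid())  # child True
```

Everything that starts processes sits under the `__main__` guard, so a child that re-imports this module does not start children of its own.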

**See practical example 11**.

### Contexts and start methods

#### 1. **`spawn`**
- **How it works:** Starts a *fresh Python interpreter process*.
- **Pros:** Clean slate—only essential resources are inherited.
- **Cons:** Slower startup.
- **Default on:** **Windows** and **macOS**.

#### 2. **`fork`**
- **How it works:** Uses `os.fork()`. Child is a clone of the parent.
- **Pros:** Very fast.
- **Cons:** Not safe with threads. Can crash on **macOS**.
- **Default on:** POSIX systems other than macOS up to Python 3.13; from Python 3.14 the default there is `forkserver`.
- **Deprecation:** Since Python 3.12, calling `os.fork()` in a process that has multiple threads emits a `DeprecationWarning`, because forking a multi-threaded process is problematic.

#### 3. **`forkserver`**
- **How it works:** Starts a **server** process which handles forking. Because the server process is single-threaded, it is generally safe for it to use `os.fork()`.
- **Pros:** Safer than `fork`, faster than `spawn`. No excess resources inherited.
- **Cons:** Requires OS support for file descriptor passing.
- **Available on:** POSIX platforms that support passing file descriptors over Unix pipes (e.g., Linux).
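Which of these start methods exist on a given machine can be checked at runtime. The helper name `describe_start_methods` is made up for this sketch.

```python
import multiprocessing as mp


def describe_start_methods():
    available = mp.get_all_start_methods()  # start methods supported on this platform
    # Note: get_start_method() also locks in the default if none was set yet.
    current = mp.get_start_method()
    return available, current


if __name__ == "__main__":
    available, current = describe_start_methods()
    print("available:", available)
    print("in use:", current)
```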

Here are two ways to select a start method.

#### Option 1: `set_start_method()` (set once per program)

```python
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    mp.set_start_method('spawn')  # 'spawn', 'fork', or 'forkserver'
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

#### Option 2: `get_context()` (preferred for libraries or multiple modes)

This avoids conflicts with other parts of the app or external libraries.

```python
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

## Inter-Process Communication (IPC)

Inter-Process Communication (IPC) refers to mechanisms that allow processes to exchange data and coordinate their actions. These mechanisms are essential for building complex systems where multiple processes need to work together. Synchronization primitives, shared memory, queues and pipes are some of the ways to organize IPC.

### Synchronization between processes

The `multiprocessing` module offers the same synchronization tools as the `threading` module. For example, you can use a **lock** to prevent multiple processes from writing to the console at the same time, which avoids jumbled output.

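The example below starts ten processes in a `for` loop; each one takes the lock before printing. The lock is created once in the parent and passed to every child, because with the `spawn` start method a module-level lock would be re-created separately in each child:

```python
from multiprocessing import Process, Lock


def safe_print(lock, number):
    # The lock is passed in explicitly so that every child shares the same lock,
    # whichever start method is used.
    with lock:
        print('hello world', number)


if __name__ == '__main__':
    lock = Lock()
    processes = [Process(target=safe_print, args=(lock, i)) for i in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()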
```

### Exchanging objects between processes

When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks.

For passing messages one can use `multiprocessing.Pipe()` (for a connection between two processes) or a queue (which allows multiple producers and consumers).

### `multiprocessing.Pipe([duplex])`

Returns a pair `(conn1, conn2)` of `multiprocessing.connection.Connection` objects representing the ends of a pipe.

If `duplex` is `True` (the default) then the pipe is bidirectional. If `duplex` is `False` then the pipe is unidirectional: `conn1` can only be used for receiving messages and `conn2` can only be used for sending messages.

Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

The `send()` method serializes the object using `pickle`, and `recv()` re-creates the object on the other end.
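A minimal sketch of a request/reply exchange over a duplex pipe. The function name `child_main` and the shape of the reply dictionary are invented for this example.

```python
from multiprocessing import Pipe, Process


def child_main(conn):
    request = conn.recv()               # blocks until the parent sends something
    conn.send({"echo": request, "length": len(request)})


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()    # duplex=True: both ends can send and receive
    p = Process(target=child_main, args=(child_conn,))
    p.start()
    parent_conn.send([1, "two", 3.0])   # pickled, sent, and re-created in the child
    print(parent_conn.recv())           # {'echo': [1, 'two', 3.0], 'length': 3}
    p.join()
```

Each process uses only its own end of the pipe, which avoids the corruption risk mentioned above.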

#### Key Characteristics

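- **Byte Stream:** At the operating-system level, a pipe typically carries an unstructured stream of bytes. The writing process sends bytes, and the reading process receives bytes, often without inherent message boundaries, so the reader needs to know how to interpret the stream (e.g., reading until a newline character or reading a fixed number of bytes). The `Connection` objects returned by `multiprocessing.Pipe()` add message framing on top of this, so each `recv()` returns one whole object.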
- **Kernel-Managed Buffer:** The operating system manages a buffer for the pipe. If the writer produces data faster than the reader consumes it, the data accumulates in the buffer. If the buffer fills up, the writer will block (wait) until the reader consumes some data. Conversely, if the reader tries to read from an empty pipe, it will block until the writer sends data.
- **Synchronization:** The blocking behaviour provides implicit synchronization between the producer (writer) and consumer (reader).

**See practical example 12**.

### Queues

- Similar to pipes, these are another mechanism for Inter-Process Communication (IPC).
- **Message-Oriented:** Unlike pipes which handle byte streams, message queues typically handle discrete messages. The sender enqueues a whole message, and the receiver dequeues a whole message. This preserves message boundaries.
- **Many-to-Many:** Often, multiple processes can write to the same queue, and multiple processes can read from it (though often a message is consumed by only one reader).

In Python, the `multiprocessing.Queue`, `multiprocessing.SimpleQueue` and `multiprocessing.JoinableQueue` types are multi-producer, multi-consumer FIFO queues modelled on the `queue.Queue` class in the standard library. They differ in that `multiprocessing.Queue` lacks the `task_done()` and `join()` methods introduced into Python 2.5’s `queue.Queue` class.

If you use `JoinableQueue` then you must call `JoinableQueue.task_done()` for each task removed from the queue or else the semaphore used to count the number of unfinished tasks may eventually overflow, raising an exception.

One difference from other Python queue implementations is that `multiprocessing` queues serialize all objects that are put into them using `pickle`. The object returned by the `get()` method is a re-created object that does not share memory with the original object.

Multi-processing queues are thread-safe and process-safe.
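A short producer/consumer sketch with `JoinableQueue` for the tasks and a plain `Queue` for the results. The `None` sentinel, the two-worker setup, and the names `process_item` and `consumer` are assumptions made for this illustration.

```python
from multiprocessing import JoinableQueue, Process, Queue


def process_item(item):
    return (item, item.upper())


def consumer(tasks, results):
    while True:
        item = tasks.get()
        if item is None:                # sentinel: stop this consumer
            tasks.task_done()
            break
        results.put(process_item(item))
        tasks.task_done()               # required once per get() on a JoinableQueue


if __name__ == "__main__":
    tasks, results = JoinableQueue(), Queue()
    workers = [Process(target=consumer, args=(tasks, results)) for _ in range(2)]
    for w in workers:
        w.start()
    words = ["pipe", "queue", "lock"]
    for word in words:
        tasks.put(word)
    for _ in workers:
        tasks.put(None)                 # one sentinel per consumer
    tasks.join()                        # wait until every task is marked done
    collected = dict(results.get() for _ in words)
    for w in workers:
        w.join()
    print(collected)
```

The results are read from the queue before the workers are joined, so no child is left waiting to flush data that nobody consumes.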

**See practical example 13**.