ref: 
- https://realpython.com/intro-to-python-threading/#producer-consumer-threading

In [None]:
import sys
sys.version

'3.7.12 (default, Sep 10 2021, 00:21:48) \n[GCC 7.5.0]'

# `threading.Thread()`

In [None]:
%%writefile single_thread.py
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,))
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    # x.join()
    logging.info("Main    : all done")

Writing single_thread.py


In [None]:
# The program does not exit until the non-daemon Thread 1 finishes, even without .join()
! python single_thread.py

16:20:59: Main    : before creating thread
16:20:59: Main    : before running thread
16:20:59: Thread 1: starting
16:20:59: Main    : wait for the thread to finish
16:20:59: Main    : all done
16:21:01: Thread 1: finishing


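A `daemon` thread is shut down abruptly when the program exits. One way to think about it is as a thread that runs in the background without you having to worry about shutting it down. A non-daemon thread, by contrast, keeps the program alive until it finishes, which is why `single_thread.py` above still logs "finishing" after "all done".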
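
As a small sketch of the flag itself, using only the standard `threading` module: the `daemon` attribute can be set in the constructor and read back later, and a thread that does not set it inherits the value from the thread that created it. The function name `background_task` is a made-up placeholder, not part of the referenced tutorial.

```python
import threading
import time

def background_task():
    # Hypothetical placeholder for work that may be abandoned at exit.
    time.sleep(0.1)

regular = threading.Thread(target=background_task)
daemon = threading.Thread(target=background_task, daemon=True)

# The flag is a plain attribute; it must be set before start() is called.
print("regular.daemon:", regular.daemon)
print("daemon.daemon:", daemon.daemon)

daemon.start()
regular.start()

# join() waits for a thread to finish regardless of its daemon flag.
daemon.join()
regular.join()
print("daemon still alive:", daemon.is_alive())
```

Calling `.join()` on a daemon thread is what makes `daemon_thread_join.py` below log "finishing" even though the thread is a daemon.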

In [None]:
%%writefile daemon_thread.py
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,), daemon=True)
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    # x.join()
    logging.info("Main    : all done")

Writing daemon_thread.py


In [None]:
! python daemon_thread.py

16:20:41: Main    : before creating thread
16:20:41: Main    : before running thread
16:20:41: Thread 1: starting
16:20:41: Main    : wait for the thread to finish
16:20:41: Main    : all done


In [None]:
%%writefile daemon_thread_join.py
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,), daemon=True)
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    x.join()
    logging.info("Main    : all done")

Overwriting daemon_thread_join.py


In [None]:
! python daemon_thread_join.py

16:56:52: Main    : before creating thread
16:56:52: Main    : before running thread
16:56:52: Thread 1: starting
16:56:52: Main    : wait for the thread to finish
16:56:54: Thread 1: finishing
16:56:54: Main    : all done


In [None]:
%%writefile multithreading.py
# 1. Create a `Thread` object.
# 2. Call `.start()`.
# 3. Keep a list of the Thread objects and call `.join()` on each one to wait for it.

import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    threads = list()
    for index in range(3):
        logging.info("Main    : create and start thread %d.", index)
        x = threading.Thread(target=thread_function, args=(index,))
        threads.append(x)
        x.start()

    for index, thread in enumerate(threads):
        logging.info("Main    : before joining thread %d.", index)
        thread.join()
        logging.info("Main    : thread %d done", index)

Overwriting multithreading.py


In [None]:
! python multithreading.py

17:05:33: Main    : create and start thread 0.
17:05:33: Thread 0: starting
17:05:33: Main    : create and start thread 1.
17:05:33: Thread 1: starting
17:05:33: Main    : create and start thread 2.
17:05:33: Thread 2: starting
17:05:33: Main    : before joining thread 0.
17:05:35: Thread 0: finishing
17:05:35: Main    : thread 0 done
17:05:35: Main    : before joining thread 1.
17:05:35: Thread 1: finishing
17:05:35: Thread 2: finishing
17:05:35: Main    : thread 1 done
17:05:35: Main    : before joining thread 2.
17:05:35: Main    : thread 2 done


# `ThreadPoolExecutor()`

In [2]:
%%writefile thread_pool.py
from concurrent.futures import ThreadPoolExecutor
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    with ThreadPoolExecutor(max_workers=3) as executor:
        executor.map(thread_function, range(3))

Overwriting thread_pool.py


In [3]:
! python thread_pool.py

11:12:05: Thread 0: starting
11:12:05: Thread 1: starting
11:12:05: Thread 2: starting
11:12:07: Thread 1: finishing
11:12:07: Thread 0: finishing
11:12:07: Thread 2: finishing


## `map` vs. `submit`

`executor.submit`
- Usage: This method schedules a callable (like a function) to be executed and returns a `Future` object representing the execution of the callable. You can submit multiple calls independently.
- Return Value: You gain more control over the individual tasks since each call to `submit` returns a separate `Future` object. This allows you to handle the result or exception of each specific task once it has completed.
- Example Use Case: It's useful when you want to start tasks in a loop and manage their results or exceptions individually:
```python
futures.append(
    executor.submit(
        self.download_file_from_s3,
        bucket_name=bucket_name,
        key=key,
        file_path=file_path
     )
)
```

`executor.map`
- Usage: This method applies a function to one or more iterables of arguments and schedules all the calls up front. The call itself returns immediately; iterating over the returned iterator blocks until each result is ready, and results come back in the same order as the inputs.
- Return Value: An iterator over the results, in input order. If a call raised an exception, it is re-raised when that result is retrieved from the iterator.
- Example Use Case: It's simpler and cleaner when you don't need to manage `Future` objects individually and can handle the results as a whole:
```python
results = executor.map(
    self.download_file_from_s3,
    [bucket_name] * n,
    keys,  # Iterable of keys
    file_paths  # Iterable of file paths
)
```
**Preferred Usage**
- The choice between submit and map depends on your specific use case:
    - Use `submit` when you need fine-grained control over individual tasks, especially if their execution might fail or if you want to handle results as they come in.
    - Use `map` for simplicity and when tasks are similar and can be processed in a batch, especially when you don’t need to handle exceptions separately for each task.
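
The difference in error handling can be sketched as follows. This is a minimal illustration, not taken from the referenced tutorial: `risky_square` is a hypothetical task that deliberately fails for one input.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_square(x):
    # Hypothetical task: fails for one input to show error handling.
    if x == 3:
        raise ValueError("bad input: 3")
    return x * x

inputs = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit: one Future per call, so each failure can be handled on its own.
    future_to_input = {executor.submit(risky_square, x): x for x in inputs}
    submit_results = {}
    submit_errors = {}
    for future in as_completed(future_to_input):
        x = future_to_input[future]
        try:
            submit_results[x] = future.result()
        except ValueError as exc:
            submit_errors[x] = str(exc)

    # map: results arrive in input order; the first exception is re-raised
    # when its result is retrieved, which ends the iteration.
    map_results = []
    try:
        for value in executor.map(risky_square, inputs):
            map_results.append(value)
    except ValueError:
        pass

print(submit_results)
print(submit_errors)
print(map_results)
```

With `submit`, the three successful results are kept and the one failure is recorded separately. With `map`, iteration stops at the failing input, so the result for `4` is never retrieved.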

In [54]:
%%writefile thread_pool.py
from concurrent.futures import ThreadPoolExecutor, as_completed
import logging
import threading
import time

max_workers = 5

def func(x):
    """I/O bound task"""
    time.sleep(2)
    return x, x**x


if __name__ == "__main__":
    final_results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = []
        for val in range(1, 15):
            futures.append(executor.submit(func, x=val))
        for f in as_completed(futures):
            res = f.result()
            print(res)
            final_results.append(res)


Overwriting thread_pool.py


In [55]:
! python thread_pool.py

(1, 1)
(4, 256)
(2, 4)
(3, 27)
(5, 3125)
(6, 46656)
(7, 823543)
(8, 16777216)
(9, 387420489)
(10, 10000000000)
(11, 285311670611)
(12, 8916100448256)
(14, 11112006825558016)
(13, 302875106592253)


In [64]:
# step by step review:

from concurrent.futures import ThreadPoolExecutor, as_completed
import time 


def func(x):
    """I/O bound task"""
    time.sleep(2)
    return x, x**x

executor = ThreadPoolExecutor(max_workers=3)
executor


<concurrent.futures.thread.ThreadPoolExecutor at 0x108e68650>

In [65]:
futures = []
futures.append(executor.submit(func, 10))
futures

[<Future at 0x108e56850 state=running>]

In [66]:
from concurrent.futures import Future

type(executor.submit(func, 10)) == Future

True

In [74]:
from typing import Generator

isinstance(as_completed(futures), Generator)

True

In [75]:
as_completed(futures)

<generator object as_completed at 0x108ea1300>

In [67]:
obj = []
for f in as_completed(futures):
    obj.append(f.result())

In [68]:
obj

[(10, 10000000000)]

# Race Condition

- `.submit(function, *args, **kwargs)`

In [None]:

from concurrent.futures import ThreadPoolExecutor
import logging
import time

class FakeDatabase:
    def __init__(self):
        self.value = 0

    def update(self, name):
        """
        Simulate reading a value from a database,
        doing some computation on it, and then writing a new value back to the database.
        """
        logging.info("Thread %s: starting update", name)
        local_copy = self.value
        local_copy += 1
        time.sleep(0.1)
        self.value = local_copy
        logging.info("Thread %s: finishing update", name)


if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    database = FakeDatabase()
    logging.info("Testing update. Starting value is %d.", database.value)
    with ThreadPoolExecutor(max_workers=2) as executor:
        for index in range(2):
            executor.submit(database.update, index)
    logging.info("Testing update. Ending value is %d.", database.value)

17:29:27: Testing update. Starting value is 0.
17:29:27: Thread 0: starting update
17:29:27: Thread 1: starting update
17:29:27: Thread 0: finishing update
17:29:27: Thread 1: finishing update
17:29:27: Testing update. Ending value is 1.


# Basic Synchronization Using `Lock`

To solve your race condition above, you need to find a way to allow only one thread at a time into the `read-modify-write` section of your code. 

The most common way to do this is called `Lock` in Python. In some other languages this same idea is called a `mutex`. `Mutex` comes from **MUTual EXclusion**, which is exactly what a Lock does.

- A `Lock` is an object that acts like a **hall pass**. Only one thread at a time can have the `Lock`. Any other thread that wants the `Lock` must wait until the owner of the Lock gives it up.
- The basic functions to do this are `.acquire()` and `.release()`. A thread will call `my_lock.acquire()` to get the lock. If the lock is already held, the calling thread will wait until it is released. There’s an important point here. If one thread gets the lock but never gives it back, your program will be stuck. You’ll read more about this later.
- Python’s Lock will also operate as a context manager, so you can use it in a with statement, and it gets released automatically when the with block exits for any reason.
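
The two styles described in the list above can be sketched side by side. This is an illustrative example, assuming a hypothetical `Counter` class; it is not the `FakeDatabase` used elsewhere in this notebook.

```python
import threading

class Counter:
    """Hypothetical shared counter protected by a Lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_explicit(self):
        self._lock.acquire()
        try:
            self.value += 1
        finally:
            # Runs even if the body raises, so the lock is never left held.
            self._lock.release()

    def increment_with(self):
        # Same acquire/release pairing, handled by the context manager.
        with self._lock:
            self.value += 1

    def is_locked(self):
        return self._lock.locked()

counter = Counter()
threads = [threading.Thread(target=counter.increment_explicit) for _ in range(50)]
threads += [threading.Thread(target=counter.increment_with) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
print(counter.is_locked())
```

Both methods protect the same read-modify-write step; the `with` form is shorter and cannot forget the release.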





In [None]:

%%writefile fixrace.py
from concurrent.futures import ThreadPoolExecutor
import logging
import time
import threading

class FakeDatabase:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # adding Lock

    def locked_update(self, name):
        logging.info("Thread %s: starting update", name)
        logging.debug("Thread %s about to lock", name)
        with self._lock:
            logging.debug("Thread %s has lock", name)
            local_copy = self.value
            local_copy += 1
            time.sleep(0.1)
            self.value = local_copy
            logging.debug("Thread %s about to release lock", name)
        logging.debug("Thread %s after release", name)
        logging.info("Thread %s: finishing update", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.DEBUG,
                        datefmt="%H:%M:%S")
    # logging.getLogger().setLevel(logging.DEBUG)

    database = FakeDatabase()
    logging.info("Testing update. Starting value is %d.", database.value)
    with ThreadPoolExecutor(max_workers=2) as executor:
        for index in range(2):
            executor.submit(database.locked_update, index)
    logging.info("Testing update. Ending value is %d.", database.value)

Overwriting fixrace.py


In [None]:
! python fixrace.py

10:32:40: Testing update. Starting value is 0.
10:32:40: Thread 0: starting update
10:32:40: Thread 0 about to lock
10:32:40: Thread 0 has lock
10:32:40: Thread 1: starting update
10:32:40: Thread 1 about to lock
10:32:40: Thread 0 about to release lock
10:32:40: Thread 0 after release
10:32:40: Thread 0: finishing update
10:32:40: Thread 1 has lock
10:32:40: Thread 1 about to release lock
10:32:40: Thread 1 after release
10:32:40: Thread 1: finishing update
10:32:40: Testing update. Ending value is 2.


## Deadlock
As you saw, if the Lock has already been acquired, a second call to `.acquire()` will wait until the thread that is holding the Lock calls `.release()`. If that release never happens, the waiting thread is stuck forever: a deadlock. Deadlocks typically arise from one of two subtle causes:

1. An implementation bug where a Lock is not released properly
    - This happens occasionally, but using a `Lock` as a context manager greatly reduces how often. Write code that uses context managers whenever possible, as they help avoid situations where an exception skips over the `.release()` call.
2. A design issue where a utility function needs to be called by functions that might or might not already have the Lock
    - The design issue can be a bit trickier in some languages. Thankfully, Python threading has a second object, called **`RLock`**, that is designed for just this situation. It allows **a thread to `.acquire()` an `RLock` multiple times before it calls `.release()`**. That thread is still required to call `.release()` the same number of times it called `.acquire()`, but it should be doing that anyway.


`Lock` and `RLock` are two of the basic tools used in threaded programming to prevent race conditions.
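
A minimal sketch of the second case, assuming a hypothetical `Account` class whose `deposit()` is called both directly and from another method that already holds the lock. The last few lines use a non-blocking `acquire(blocking=False)` to show, without hanging, that a plain `Lock` refuses a second acquire from the same thread.

```python
import threading

class Account:
    """Hypothetical example of a utility method reused under the same lock."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.RLock()

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        with self._lock:
            # Re-entering the same RLock from the owning thread does not block.
            self.deposit(amount)
            self.deposit(amount)

account = Account()
account.deposit(5)
account.deposit_twice(10)
print(account.balance)

plain = threading.Lock()
plain.acquire()
# A non-blocking second acquire on a plain Lock fails instead of hanging.
second_try = plain.acquire(blocking=False)
plain.release()
print(second_try)
```

If `Account` used `threading.Lock()` instead, `deposit_twice()` would block forever on its first inner `deposit()` call.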

In [None]:
# Deadlock: the second acquire() blocks forever, so this cell hangs until it is interrupted
import threading

l = threading.Lock()
print("before first acquire")
l.acquire()
print("before second acquire")
l.acquire()
print("acquired lock twice")

before first acquire
before second acquire


KeyboardInterrupt: ignored

# Producer-Consumer Threading
ref: https://realpython.com/intro-to-python-threading/#producer-consumer-threading

## Producer-Consumer Using Lock

In [None]:
%%writefile producer_consumer_lock.py
#!/usr/bin/env python3
import concurrent.futures
import logging
import random
import threading

SENTINEL = object()


class Pipeline_clean:
    """ (without logging)
    Class to allow a single element pipeline between producer and consumer.
    """
    def __init__(self):
        self.message = 0
        self.producer_lock = threading.Lock()
        self.consumer_lock = threading.Lock()
        self.consumer_lock.acquire()

    def get_message(self, name):
        self.consumer_lock.acquire()
        message = self.message
        self.producer_lock.release()
        return message

    def set_message(self, message, name):
        self.producer_lock.acquire()
        self.message = message
        self.consumer_lock.release()


class Pipeline:
    """
    Class to allow a single element pipeline
    between producer and consumer.
    """

    def __init__(self):
        self.message = 0
        self.producer_lock = threading.Lock()
        self.consumer_lock = threading.Lock()
        self.consumer_lock.acquire()

    def get_message(self, name):
        logging.debug("%s:about to acquire getlock", name)
        self.consumer_lock.acquire()
        logging.debug("%s:have getlock", name)
        message = self.message
        logging.debug("%s:about to release setlock", name)
        self.producer_lock.release()
        logging.debug("%s:setlock released", name)
        return message

    def set_message(self, message, name):
        logging.debug("%s:about to acquire setlock", name)
        self.producer_lock.acquire()
        logging.debug("%s:have setlock", name)
        self.message = message
        logging.debug("%s:about to release getlock", name)
        self.consumer_lock.release()
        logging.debug("%s:getlock released", name)


def producer(pipeline):
    """Pretend we're getting a message from the network."""
    for index in range(10):
        message = random.randint(1, 101)
        logging.info("Producer got message: %s", message)
        pipeline.set_message(message, "Producer")

    # Send a sentinel message to tell consumer we're done
    pipeline.set_message(SENTINEL, "Producer")


def consumer(pipeline):
    """Pretend we're saving a number in the database."""
    message = 0
    while message is not SENTINEL:
        message = pipeline.get_message("Consumer")
        if message is not SENTINEL:
            logging.info("Consumer storing message: %s", message)


if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
    logging.getLogger().setLevel(logging.DEBUG)

    pipeline = Pipeline()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.submit(producer, pipeline)
        executor.submit(consumer, pipeline)

Overwriting producer_consumer_lock.py


In [None]:
! python producer_consumer_lock.py

11:02:59: Producer got message: 10
11:02:59: Producer:about to acquire setlock
11:02:59: Consumer:about to acquire getlock
11:02:59: Producer:have setlock
11:02:59: Producer:about to release getlock
11:02:59: Producer:getlock released
11:02:59: Consumer:have getlock
11:02:59: Producer got message: 46
11:02:59: Consumer:about to release setlock
11:02:59: Producer:about to acquire setlock
11:02:59: Consumer:setlock released
11:02:59: Producer:have setlock
11:02:59: Consumer storing message: 10
11:02:59: Producer:about to release getlock
11:02:59: Consumer:about to acquire getlock
11:02:59: Producer:getlock released
11:02:59: Consumer:have getlock
11:02:59: Producer got message: 98
11:02:59: Consumer:about to release setlock
11:02:59: Producer:about to acquire setlock
11:02:59: Consumer:setlock released
11:02:59: Producer:have setlock
11:02:59: Consumer storing message: 46
11:02:59: Producer:about to release getlock
11:02:59: Consumer:about to acquire getlock
11:02:59: Producer:getlock re

## Producer-Consumer Using Queue

If you want to be able to handle more than one value in the pipeline at a time, you’ll need a data structure for the pipeline that allows the number to grow and shrink as data backs up from the producer.

- `Event`: The `threading.Event` object allows one thread to signal an event while many other threads can be waiting for that event to happen. The key usage in this code is that the threads waiting for the event do not necessarily need to stop what they are doing; they can simply check the status of the `Event` every once in a while with `.is_set()`.

- `Queue`: The core devs who wrote the standard library knew that a `Queue` is frequently used in multi-threading environments and incorporated all of that locking code inside the `Queue` itself. `Queue` is **thread-safe**.

ref: https://realpython.com/intro-to-python-threading/#producer-consumer-using-queue
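
The combination of the two objects can be sketched in a few lines. This is a simplified variation, not the tutorial's code: the names `stop_event`, `work_queue`, and `worker` are made up, and the worker uses `get(timeout=...)` (an assumption added here) so that it returns to checking the event regularly instead of blocking indefinitely on an empty queue.

```python
import queue
import threading

stop_event = threading.Event()
work_queue = queue.Queue(maxsize=5)
seen = []

def worker():
    # Keep going until a stop was requested AND the queue has been drained.
    while not stop_event.is_set() or not work_queue.empty():
        try:
            item = work_queue.get(timeout=0.05)
        except queue.Empty:
            # Nothing arrived in time; loop around and re-check the event.
            continue
        seen.append(item)

t = threading.Thread(target=worker)
t.start()

for i in range(3):
    work_queue.put(i)  # Queue does its own locking; no explicit Lock needed.

stop_event.set()
t.join()
print(seen)
```

Because there is a single consumer and `Queue` is FIFO, the items are collected in the order they were put.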

In [None]:
%%writefile producer_consumer_queue.py
import concurrent.futures
import logging
import queue
import random
import threading
import time

def producer(queue, event):
    """Pretend we're getting a number from the network."""
    while not event.is_set():
        message = random.randint(1, 101)
        logging.info("Producer got message: %s", message)
        queue.put(message)

    logging.info("Producer received event. Exiting")

def consumer(queue, event):
    """Pretend we're saving a number in the database."""
    while not event.is_set() or not queue.empty():
        message = queue.get()
        logging.info(
            "Consumer storing message: %s (size=%d)", message, queue.qsize()
        )

    logging.info("Consumer received event. Exiting")

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    pipeline = queue.Queue(maxsize=10)
    event = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.submit(producer, pipeline, event)
        executor.submit(consumer, pipeline, event)

        time.sleep(0.1)
        logging.info("Main: about to set event")
        event.set()  # All threads waiting for it to become true are awakened

Writing producer_consumer_queue.py


In [None]:
! python producer_consumer_queue.py

11:05:40: Producer got message: 5
11:05:40: Producer got message: 84
11:05:40: Producer got message: 6
11:05:40: Producer got message: 87
11:05:40: Producer got message: 26
11:05:40: Producer got message: 64
11:05:40: Producer got message: 19
11:05:40: Producer got message: 36
11:05:40: Producer got message: 95
11:05:40: Producer got message: 72
11:05:40: Producer got message: 83
11:05:40: Consumer storing message: 5 (size=9)
11:05:40: Producer got message: 82
11:05:40: Consumer storing message: 84 (size=9)
11:05:40: Producer got message: 32
11:05:40: Consumer storing message: 6 (size=9)
11:05:40: Producer got message: 92
11:05:40: Consumer storing message: 87 (size=9)
11:05:40: Producer got message: 26
11:05:40: Consumer storing message: 26 (size=9)
11:05:40: Producer got message: 18
11:05:40: Consumer storing message: 64 (size=9)
11:05:40: Producer got message: 17
11:05:40: Consumer storing message: 19 (size=9)
11:05:40: Producer got message: 79
11:05:40: Consumer storing message: 36

Example 2:
ref: https://towardsdatascience.com/dive-into-queue-module-in-python-its-more-than-fifo-ce86c40944ef

In [None]:
from threading import Thread
import time
import queue

q = queue.Queue()
SENTINEL = "END"

def producer(queue):
    for i in range(5):
        # time.sleep(0.01)  # Would the output order change if this line were uncommented?
        print(f"Insert element {i}")
        queue.put(i)
    queue.put(SENTINEL)
    print("Insert sentinel")

def consumer(queue):
    while True:
        item = queue.get()
        if item != SENTINEL:
            print(f"Retrieve element {item}")
            queue.task_done()
        else:
            print("Receive SENTINEL, the consumer will be closed.")
            queue.task_done()
            break

threads = [
    Thread(target=producer, args=(q,)),
    Thread(target=consumer, args=(q,)),
]

for thread in threads:
    thread.start()

q.join()


Insert element 0
Insert element 1
Insert element 2
Insert element 3
Insert element 4
Insert sentinel
Retrieve element 0
Retrieve element 1
Retrieve element 2
Retrieve element 3
Retrieve element 4
Receive SENTINEL, the consumer will be closed.
