# Chapter 7: Concurrency and Parallelism

**Concurrency** enables a computer to do many different things **seemingly** at the same time.  For example, on a computer with one CPU core, the operating system rapidly changes which program is running on the single processor.  In doing so, it interleaves execution of the programs, providing the illusion that the programs are running simultaneously.

**Parallelism**, in contrast, involves **actually** doing many different things at the same time.  A computer with multiple CPU cores can execute multiple programs simultaneously.  Each CPU core runs the instructions of a separate program, allowing each program to make forward progress during the same instant.

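Within a single program, concurrency is a tool that makes it easier for programmers to solve certain types of problems: it lets many distinct paths of execution, such as separate streams of I/O, make forward progress in a way that seems both simultaneous and independent.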

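The key difference between parallelism and concurrency is **speedup**.  When two distinct paths of execution in a program make forward progress in parallel, the time it takes to do the total work can be cut in half, so the program runs up to twice as fast.  Concurrent programs, in contrast, may run thousands of separate paths of execution seemingly in parallel yet provide no speedup for the total work.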

Threads support a relatively small amount of concurrency, while coroutines enable vast numbers of concurrent functions.  It can be very difficult to make concurrent Python code truly run in parallel.  

## Item 52: Use `subprocess` to Manage Child Processes

Python has pretty robust libraries for running and managing child processes.  This makes it a great language for gluing together other tools, such as command-line utilities.  When existing shell scripts get complicated, as they often do over time, graduating them to a rewrite in Python for the sake of readability and maintainability is a natural choice.

Child processes started by Python are able to run in parallel, enabling you to use Python to consume all of the CPU cores of a machine and maximize the throughput of programs.  Although Python itself may be CPU bound, it's easy to use Python to drive and coordinate CPU-intensive workloads.

The best option for managing child processes is to use the `subprocess` built-in module.  

```python
import subprocess

result = subprocess.run(
    ['echo', 'Hello from the child!'],
    capture_output=True,
    encoding='utf-8')

result.check_returncode()  # No exception means it exited cleanly
print(result.stdout)
```

```
Hello from the child!
```

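As a small extension of this (a sketch, not part of the original example), `run` can raise the error for you: passing `check=True` makes it raise `subprocess.CalledProcessError` when the child exits with a non-zero status, and the exception still carries the captured output.  The child here is a throwaway Python program launched via `sys.executable`, an assumption made so the sketch doesn't depend on `echo` being installed.

```python
import subprocess
import sys

# Hypothetical child: prints a line, then exits with status 3
child_code = "print('about to fail'); raise SystemExit(3)"

status = None
output = None
try:
    subprocess.run(
        [sys.executable, '-c', child_code],
        capture_output=True,
        encoding='utf-8',
        check=True)  # Raise CalledProcessError on a non-zero exit status
except subprocess.CalledProcessError as e:
    status = e.returncode
    output = e.stdout.strip()  # Output captured before the failure

print('Exit status:', status)
print('Captured output:', output)
```

This saves you from remembering to call `check_returncode` yourself after every `run`.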


Child processes run independently from their parent process, the Python interpreter.  If I create a subprocess using the `Popen` class instead of the `run` function, I can poll the child process status periodically while Python does other work:

```python
import subprocess
import time

proc = subprocess.Popen(['sleep', '1'])
while proc.poll() is None:
    print('Working...')
    # Some time-consuming work here
    time.sleep(0.3)

print('Exit status', proc.poll())
```

```
Working...
Working...
Working...
Working...
Exit status 0
```


Decoupling the child process from the parent frees up the parent process to run many child processes in parallel.  Here, I do this by starting all the child processes together with `Popen` upfront:

```python
import time

start = time.time()
sleep_procs = []
for _ in range(10):
    # Use this line instead to make this example work on Windows
    # proc = subprocess.Popen(['sleep', '1'], shell=True)
    proc = subprocess.Popen(['sleep', '1'])
    sleep_procs.append(proc)

for proc in sleep_procs:
    proc.communicate()

end = time.time()
delta = end - start
print(f'Finished in {delta:.3} seconds')
```

```
Finished in 1.01 seconds
```

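The `sleep` command used above isn't available on every platform.  A minimal, portable sketch of the same fan-out, assuming only that the running Python interpreter can itself be launched as a child via `sys.executable`, starts several sleeping interpreters at once and only then waits for them:

```python
import subprocess
import sys
import time

SLEEP_SECONDS = 0.5
CHILD_COUNT = 4
child_code = f'import time; time.sleep({SLEEP_SECONDS})'

start = time.perf_counter()

# Start every child first, without waiting on any of them
children = [
    subprocess.Popen([sys.executable, '-c', child_code])
    for _ in range(CHILD_COUNT)
]

# Then wait for all of them to exit
for child in children:
    child.wait()

elapsed = time.perf_counter() - start
print(f'{CHILD_COUNT} children finished in {elapsed:.2f} seconds')
```

Because the children sleep at the same time, the total should stay close to one child's 0.5 seconds plus interpreter startup, rather than the 2 seconds a serial loop would take.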

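You can also pipe data from a Python program into a subprocess and retrieve its output.  Here, I use the `openssl` command-line tool to encrypt some random data, starting all three child processes before reading any of their output: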

```python
import os
# On Windows, after installing OpenSSL, you may need to
# alias it in your PowerShell path with a command like:
# $env:path = $env:path + ";C:\Program Files\OpenSSL-Win64\bin"

def run_encrypt(data):
    env = os.environ.copy()
    env['password'] = 'zf7ShyBhZOraQDdE/FiZpm/m/8f9X+M1'
    proc = subprocess.Popen(
        ['openssl', 'enc', '-des3', '-pass', 'env:password'],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)
    proc.stdin.write(data)
    proc.stdin.flush()  # Ensure that the child gets input
    return proc

procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_encrypt(data)
    procs.append(proc)

for proc in procs:
    out, _ = proc.communicate()
    print(out[-10:])
```

```
b'\x19P\x90\xd7\xba\xc9\x1f\x96^;'
b'\xdc\xac\xae\x02\x19sz\xf9\xa4\x16'
b's}\xcdA\xd6\x9aF\x90U\x9f'
```

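`Popen` can also chain child processes together like a UNIX pipeline, by passing one child's `stdout` pipe as the next child's `stdin`.  In this sketch both stages are small Python programs run with `sys.executable`, hypothetical stand-ins for real tools such as `openssl`: the first prints numbered lines and the second upper-cases whatever it reads.

```python
import subprocess
import sys

producer_code = "for i in range(3): print(f'line {i}')"
upper_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

# First stage writes into a pipe
producer = subprocess.Popen(
    [sys.executable, '-c', producer_code],
    stdout=subprocess.PIPE)

# Second stage reads directly from the first stage's pipe
upper = subprocess.Popen(
    [sys.executable, '-c', upper_code],
    stdin=producer.stdout,
    stdout=subprocess.PIPE)

# Close the parent's copy so the producer sees a broken pipe
# if the second stage exits early
producer.stdout.close()

out, _ = upper.communicate()
producer.wait()
print(out.decode())
```

The data flows from child to child without passing through the Python parent, just as it would in a shell pipeline.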

Use the subprocess module to run child processes and manage their input and output streams.  

Child processes run in parallel with the Python interpreter, enabling you to maximize your usage of CPU cores.  

Use the `run` convenience function for simple usage, and the `Popen` class for advanced usage like UNIX-style pipelines.

Use the `timeout` parameter of the `communicate` method to avoid deadlocks and hanging child processes.  

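A minimal sketch of that last point: pass `timeout` to `communicate`, catch `subprocess.TimeoutExpired`, and stop the child yourself, because the exception does not kill it.  The child here is a hypothetical stand-in that just sleeps for ten seconds.

```python
import subprocess
import sys

# Stand-in for a child process that hangs
proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(10)'])

try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    proc.terminate()  # The child is still running; stop it explicitly
    proc.wait()       # Reap it so no zombie process is left behind

# On POSIX systems this is typically -15 (terminated by SIGTERM)
print('Exit status', proc.returncode)
```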
## Item 53: Use Threads for Blocking I/O, Avoid for Parallelism  

The standard implementation of Python is called CPython.  CPython runs a Python program in two steps:
1. It parses and compiles the source text into **bytecode**, which is a low-level representation of the program as 8-bit instructions.  (As of Python 3.6, it's technically 16-bit **wordcode**.)
2. CPython then runs the bytecode using a stack-based interpreter.  The bytecode interpreter has state that must be maintained and coherent while the Python program executes.  CPython enforces coherence with a mechanism called the **global interpreter lock (GIL)**.  

Essentially, the GIL is a mutual exclusion lock (mutex) that prevents CPython from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread.  Such an interruption could corrupt the interpreter state (e.g. garbage collection reference counts) if it comes at an unexpected time.  The GIL prevents these interruptions and ensures that every bytecode instruction works correctly with the CPython implementation and its C-extension modules.  

The downside of the GIL is that it stops multiple threads from running Python code on multiple CPU cores at once.  In languages like C++ or Java, having multiple threads of execution means a program can use multiple CPU cores at the same time; in CPython, the GIL allows only one thread to make progress at a time.  This means that if you reach for threads to do parallel computation and speed up your Python programs, you will be disappointed.

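A small sketch of that disappointment, assuming a standard CPython build with the GIL (the experimental free-threaded builds behave differently): factoring a few numbers with one thread per number should take about as long as doing it serially, or a little longer, because only one thread can execute Python bytecode at a time.  The `factorize` function and the numbers are illustrative choices.

```python
import time
from threading import Thread

def factorize(number):
    # Deliberately naive, CPU-bound trial division
    return [i for i in range(1, number + 1) if number % i == 0]

NUMBERS = [213907, 214991, 151658, 185226]

start = time.perf_counter()
serial_results = [factorize(n) for n in NUMBERS]
serial_time = time.perf_counter() - start

class FactorizeThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
        self.factors = None

    def run(self):
        self.factors = factorize(self.number)

start = time.perf_counter()
threads = [FactorizeThread(n) for n in NUMBERS]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded_time = time.perf_counter() - start
threaded_results = [t.factors for t in threads]

# On a GIL build, expect these two times to be roughly equal
print(f'Serial:   {serial_time:.3f} seconds')
print(f'Threaded: {threaded_time:.3f} seconds')
```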
Despite these shortcomings, Python supports threads for two good reasons:
1. Multiple threads make it easy for a program to seem like it's doing multiple things at the same time, without you having to interleave the work manually.
2. Threads let a program deal with blocking I/O, which happens when Python does certain types of system calls.

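Blocking I/O includes things like reading and writing files, interacting with networks, and communicating with devices like displays.  Threads help handle blocking I/O by insulating a program from the time it takes for the operating system to respond to requests.  For example, here I simulate a slow system call with `select`, asking the operating system to block for 0.1 seconds, and run it five times in a row: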

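```python
import select
import socket
import time
from threading import Thread

def slow_systemcall():
    # Block for up to 0.1 seconds waiting on the read end of a
    # socket pair that never receives data. (An unconnected
    # socket.socket() can be reported as ready immediately on
    # Linux, in which case select() would not block at all.)
    reader, writer = socket.socketpair()
    with reader, writer:
        select.select([reader], [], [], 0.1)

start = time.time()

for _ in range(5):
    slow_systemcall()

end = time.time()
delta = end - start
print(f'Took {delta:.3f} seconds')
```

Each call blocks for 0.1 seconds, so running five of them serially takes roughly half a second, and the main thread can't do anything else in the meantime.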


When you find yourself needing to do blocking I/O and computation simultaneously, it's time to consider moving your system calls to threads.  Here, I start each system call in its own thread while the main thread computes something else:


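```python
start = time.time()

# Run each blocking system call in its own thread
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)

def compute_helicopter_location(index):
    pass  # Stand-in for real computation

# The main thread computes while the system calls block
for i in range(5):
    compute_helicopter_location(i)

for thread in threads:
    thread.join()

end = time.time()
delta = end - start
print(f'Took {delta:.3f} seconds')
```

Because the five calls now block at the same time, the total should be close to 0.1 seconds, roughly five times faster than running them serially.  The GIL doesn't get in the way here: CPython releases the GIL while a thread is blocked in a system call like `select`, so the threads' waiting overlaps with each other and with the main thread's computation.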


Python threads can't run in parallel on multiple CPU cores because of the global interpreter lock (GIL).

Python threads are still useful despite the GIL because they provide an easy way to do multiple things seemingly at the same time.  

Use Python threads to make multiple system calls in parallel.  This allows you to do blocking I/O at the same time as computation.

## Item 54: Use `Lock` to Prevent Data Races in Threads

The global interpreter lock will not protect you from data races in threads.  Although only one Python thread runs at a time, **a thread's operations on data structures can be interrupted between any two bytecode instructions.**

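To see where such an interruption can land, the standard `dis` module can list the bytecode of a method like `Counter.increment` (the class is repeated here so the sketch stands alone).  The single `+=` statement compiles to several instructions: load the attribute, add, and store the result back, and another thread can run between the load and the store.  Exact instruction names vary between CPython versions.

```python
import dis

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset  # One line of Python, several instructions

dis.dis(Counter.increment)  # Print the disassembly
opnames = [ins.opname for ins in dis.get_instructions(Counter.increment)]
```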
```python
from threading import Barrier, Thread

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset

def worker(sensor_index, how_many, counter):
    # I have a barrier in here so the workers synchronize
    # when they start counting, otherwise it's hard to get a race
    # because the overhead of starting a thread is high.
    BARRIER.wait()
    for _ in range(how_many):
        # Read from the sensor
        # Nothing actually happens here, but this is where
        # the blocking I/O would go.
        counter.increment(1)

BARRIER = Barrier(5)

how_many = 10**5
counter = Counter()

threads = []
for i in range(5):
    thread = Thread(target=worker,
                    args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f'Counter should be {expected}, got {found}')
```

```
Counter should be 500000, got 444585
```

The exact count differs from run to run, and on recent CPython versions this particular race may be much harder to reproduce.  The code is still incorrect: nothing guarantees that no other thread runs between reading and writing `self.count`.


To prevent data races and other forms of data structure corruption, Python includes a robust set of tools in the `threading` built-in module.  The simplest and most useful of them is the `Lock` class, which is a mutual-exclusion lock (mutex).  

By using a lock, I can have the `Counter` class protect its current value against simultaneous accesses from multiple threads.  Using the lock in a `with` statement (a `Lock` is a **context manager**) works much like a C++ lock guard: the lock is acquired at the start of the block and automatically released when the block exits, even if an exception is raised.  This keeps you from accidentally leaving the lock held, although it won't protect you from every kind of deadlock.

```python
from threading import Lock

class LockingCounter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        with self.lock:
            self.count += offset

BARRIER = Barrier(5)
counter = LockingCounter()

threads = []  # Start from a fresh list of threads
for i in range(5):
    thread = Thread(target=worker,
                    args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f'Counter should be {expected}, got {found}')
```

```
Counter should be 500000, got 500000
```

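The `with self.lock:` block behaves like an explicit `acquire` followed by a `release` in a `finally` clause, which is why the lock is released even if the protected code raises.  A minimal sketch of that equivalent form (the `ExplicitLockingCounter` and `count_up` names are just for illustration):

```python
from threading import Lock, Thread

class ExplicitLockingCounter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        self.lock.acquire()
        try:
            self.count += offset
        finally:
            self.lock.release()  # Runs even if the update raises

def count_up(counter, how_many):
    for _ in range(how_many):
        counter.increment(1)

counter = ExplicitLockingCounter()
threads = [Thread(target=count_up, args=(counter, 10_000)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.count)  # 50000
```

The `with` form is shorter and harder to get wrong, which is why it's the usual choice.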

Even though Python has the GIL, you're still responsible for protecting against data races between the threads in your program.  

Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without mutexes.  

Use the `Lock` class from the `threading` built-in module to enforce your program's invariants between multiple threads.

## Item 55: Use `Queue` to Coordinate Work Between Threads

Python programs that do many things concurrently often need to coordinate their work.  One of the most useful arrangements for concurrent work is a pipeline of functions.  

The handing off of work between pipeline phases can be modeled as a thread-safe **producer-consumer queue.**

For example, say that I want to build a system that will take a constant stream of images from a digital camera, resize them, and then add them to a photo gallery online.  Such a program could be split into three phases of a pipeline.  New images are retrieved in the first phase.  The downloaded images are passed through the resize function in the second phase.  The resized images are then consumed by the upload function in the final phase.

### Let's take a look at a naive way to do this:

```python
from collections import deque
from threading import Lock
from threading import Thread
import time

def download(item):
    return item

def resize(item):
    return item

def upload(item):
    return item

class MyQueue:
    def __init__(self):
        self.items = deque()
        self.lock = Lock()

    def put(self, item):
        with self.lock:
            self.items.append(item)

    def get(self):
        with self.lock:
            return self.items.popleft()


class Worker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.polled_count = 0
        self.work_done = 0

    def run(self):
        while True:
            self.polled_count += 1
            try:
                item = self.in_queue.get()
            except IndexError:
                time.sleep(0.01)  # No work to do
            except AttributeError:
                # The magic exit signal
                return
            else:
                result = self.func(item)
                self.out_queue.put(result)
                self.work_done += 1


download_queue = MyQueue()
resize_queue = MyQueue()
upload_queue = MyQueue()
done_queue = MyQueue()
threads = [
    Worker(download, download_queue, resize_queue),
    Worker(resize, resize_queue, upload_queue),
    Worker(upload, upload_queue, done_queue),
]

for thread in threads:
    thread.start()

for _ in range(1000):
    download_queue.put(object())

while len(done_queue.items) < 1000:
    # Do something useful while waiting
    time.sleep(0.1)
# Stop all the threads by causing an exception in their
# run methods.
for thread in threads:
    thread.in_queue = None
    thread.join()

processed = len(done_queue.items)
polled = sum(t.polled_count for t in threads)
print(f'Processed {processed} items after '
      f'polling {polled} times')
```

```
Processed 1000 items after polling 3033 times
```


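When the worker functions vary in their respective speeds, an earlier phase can prevent progress in later phases, backing up the pipeline.  This causes later phases to starve and constantly check their input queues for new work in a tight loop.  The outcome is that worker threads waste CPU time doing nothing useful: they're constantly raising and catching `IndexError` exceptions.  This naive version has other problems too.  The main thread has to busy-wait on `done_queue` to find out when all the work is finished, stopping the workers relies on the hack of replacing their input queues with `None` to trigger an `AttributeError`, and if a later phase falls behind, its input queue can grow without bound until the program runs out of memory.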

The `Queue` class from the `queue` built-in module provides all of the functionality you need to solve the above problem and more.  `Queue` eliminates the busy waiting in the worker by making the `get` method block until new data is available.

### Let's take a look at the better way to implement a pipeline:

```python
from queue import Queue
from threading import Thread

my_queue = Queue()

def consumer():
    print('Consumer waiting')
    my_queue.get()              # Runs after put() below
    print('Consumer done')

thread = Thread(target=consumer)
thread.start()

print('Producer putting')
my_queue.put(object())          # Runs before get() above
print('Producer done')
thread.join()
```

```
Consumer waiting
Producer putting
Producer done
Consumer done
```

To limit how far a producer can get ahead of its consumers, I can give the `Queue` a buffer size.  With a buffer size of 1, the producer's second `put` blocks until the consumer has taken the first item:

```python
import time

my_queue = Queue(1)             # Buffer size of 1

def consumer():
    time.sleep(0.1)             # Wait
    my_queue.get()              # Runs second
    print('Consumer got 1')
    my_queue.get()              # Runs fourth
    print('Consumer got 2')
    print('Consumer done')

thread = Thread(target=consumer)
thread.start()

my_queue.put(object())          # Runs first
print('Producer put 1')
my_queue.put(object())          # Runs third
print('Producer put 2')
print('Producer done')
thread.join()
```

```
Producer put 1
Consumer got 1
Producer put 2
Producer done
Consumer got 2
Consumer done
```

Finally, the consumer can call `task_done` after processing an item, and the producer can call `join` to wait for that.  Both threads print at the same time here, so their output can interleave, as in the first line of the output:

```python
in_queue = Queue()

def consumer():
    print('Consumer waiting')
    work = in_queue.get()       # Done second
    print('Consumer working')
    # Doing work
    print('Consumer done')
    in_queue.task_done()        # Done third

thread = Thread(target=consumer)
thread.start()

print('Producer putting')
in_queue.put(object())         # Done first
print('Producer waiting')
in_queue.join()                # Done fourth
print('Producer done')
thread.join()
```

```
Consumer waitingProducer putting
Producer waiting

Consumer working
Consumer done
Producer done
```

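Blocking is not the only option.  A sketch of the non-blocking and timeout variants of the same `Queue` API: with a buffer size of 1, `put_nowait` raises `queue.Full` instead of waiting, and `get` with a `timeout` raises `queue.Empty` if nothing arrives in time.

```python
import queue

bounded = queue.Queue(maxsize=1)  # Buffer size of 1
bounded.put_nowait('first')

try:
    bounded.put_nowait('second')  # Buffer is already full
except queue.Full:
    full_raised = True
else:
    full_raised = False

item = bounded.get()  # Takes 'first'; the queue is now empty

try:
    bounded.get(timeout=0.05)  # Waits up to 50 ms, then gives up
except queue.Empty:
    empty_raised = True
else:
    empty_raised = False

print(item, full_raised, empty_raised)  # first True True
```

These variants are useful when a thread should do something else, such as check a shutdown flag, rather than block forever.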

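As the last example shows, the `Queue` class can also track the progress of work using the `task_done` method.  This lets you wait for a phase's input queue to drain and eliminates the need to poll the last phase of a pipeline.  Putting these pieces together, the `ClosableQueue` subclass below adds a `close` method that puts a special sentinel object into the queue, and an `__iter__` method that yields items until it sees that sentinel, calling `task_done` for every item so that `join` works.  Each `StoppableWorker` then simply loops over its input queue and exits when the queue is closed: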

```python
class ClosableQueue(Queue):
    SENTINEL = object()

    def close(self):
        self.put(self.SENTINEL)

    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # Cause the thread to exit
                yield item
            finally:
                self.task_done()

class StoppableWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        for item in self.in_queue:
            result = self.func(item)
            self.out_queue.put(result)

download_queue = ClosableQueue()
resize_queue = ClosableQueue()
upload_queue = ClosableQueue()
done_queue = ClosableQueue()
threads = [
    StoppableWorker(download, download_queue, resize_queue),
    StoppableWorker(resize, resize_queue, upload_queue),
    StoppableWorker(upload, upload_queue, done_queue),
]

for thread in threads:
    thread.start()

for _ in range(1000):
    download_queue.put(object())

# Close each phase in order, waiting for it to drain before closing the next
download_queue.close()
download_queue.join()
resize_queue.close()
resize_queue.join()
upload_queue.close()
upload_queue.join()
print(done_queue.qsize(), 'items finished')

for thread in threads:
    thread.join()
```

```
1000 items finished
```


This approach can be extended to using multiple worker threads per phase, which can increase I/O parallelism and speed up this type of program significantly.

```python
def start_threads(count, *args):
    threads = [StoppableWorker(*args) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads

def stop_threads(closable_queue, threads):
    for _ in threads:
        closable_queue.close()

    closable_queue.join()

    for thread in threads:
        thread.join()

download_queue = ClosableQueue()
resize_queue = ClosableQueue()
upload_queue = ClosableQueue()
done_queue = ClosableQueue()

download_threads = start_threads(
    3, download, download_queue, resize_queue)
resize_threads = start_threads(
    4, resize, resize_queue, upload_queue)
upload_threads = start_threads(
    5, upload, upload_queue, done_queue)

for _ in range(1000):
    download_queue.put(object())

stop_threads(download_queue, download_threads)
stop_threads(resize_queue, resize_threads)
stop_threads(upload_queue, upload_threads)

print(done_queue.qsize(), 'items finished')
```

```
1000 items finished
```


Pipelines are a great way to organize sequences of work, especially I/O-bound programs, that run concurrently using multiple Python threads.

Be aware of the many problems in building concurrent pipelines: busy waiting, how to tell workers to stop, and potential memory explosion.

The `Queue` class has all of the facilities you need to build robust pipelines: blocking operations, buffer sizes, and joining.


## Item 56: Know How to Recognize When Concurrency is Necessary

The most common types of concurrency coordination are **fan-out** (generating new units of concurrency), and **fan-in** (waiting for existing units of concurrency to complete).  Let's look at how this can be used to perform I/O while also running *The Game of Life*.

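In miniature, fan-out and fan-in with threads look like the sketch below: start one thread per unit of work, then join them all.  The `fake_io` function is a hypothetical stand-in for blocking I/O, simulated with `time.sleep`, which releases the GIL while it waits.

```python
import time
from threading import Thread

def fake_io(index, results):
    time.sleep(0.1)                # Simulated blocking I/O
    results[index] = index * index

results = [None] * 5
threads = []
start = time.perf_counter()

for i in range(5):                 # Fan out: one new thread per unit of work
    thread = Thread(target=fake_io, args=(i, results))
    thread.start()
    threads.append(thread)

for thread in threads:             # Fan in: wait for every unit to finish
    thread.join()

elapsed = time.perf_counter() - start
print(results, f'{elapsed:.2f} seconds')
```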
```python
ALIVE = '*'
EMPTY = '-'

class Grid:
    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.rows = []
        for _ in range(self.height):
            self.rows.append([EMPTY] * self.width)

    def get(self, y, x):
        return self.rows[y % self.height][x % self.width]

    def set(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state

    def __str__(self):
        output = ''
        for row in self.rows:
            for cell in row:
                output += cell
            output += '\n'
        return output


grid = Grid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)
print(grid)
```

```
---*-----
----*----
--***----
---------
---------
```



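We now need a way to retrieve the status of neighboring cells.  The `count_neighbors` function counts how many of a cell's eight neighbors are alive; its `get` parameter is a function that returns a cell's state, so the counting logic doesn't depend on the `Grid` class directly.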

```python
def count_neighbors(y, x, get):
    n_ = get(y - 1, x + 0)  # North
    ne = get(y - 1, x + 1)  # Northeast
    e_ = get(y + 0, x + 1)  # East
    se = get(y + 1, x + 1)  # Southeast
    s_ = get(y + 1, x + 0)  # South
    sw = get(y + 1, x - 1)  # Southwest
    w_ = get(y + 0, x - 1)  # West
    nw = get(y - 1, x - 1)  # Northwest
    neighbor_states = [n_, ne, e_, se, s_, sw, w_, nw]
    count = 0
    for state in neighbor_states:
        if state == ALIVE:
            count += 1
    return count
```

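We now define the logic for the game based on the game's three rules:
1. A living cell dies if it has fewer than two living neighbors.
2. A living cell dies if it has more than three living neighbors.
3. An empty cell becomes alive if it has exactly three living neighbors.

Then `step_cell` applies these rules to a single cell, `simulate` steps the whole grid forward one generation, and `ColumnPrinter` prints several generations side by side: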

```python
def game_logic(state, neighbors):
    if state == ALIVE:
        if neighbors < 2:
            return EMPTY     # Die: Too few
        elif neighbors > 3:
            return EMPTY     # Die: Too many
    else:
        if neighbors == 3:
            return ALIVE     # Regenerate
    return state

def step_cell(y, x, get, set):
    state = get(y, x)
    neighbors = count_neighbors(y, x, get)
    next_state = game_logic(state, neighbors)
    set(y, x, next_state)

def simulate(grid):
    next_grid = Grid(grid.height, grid.width)
    for y in range(grid.height):
        for x in range(grid.width):
            step_cell(y, x, grid.get, next_grid.set)
    return next_grid

class ColumnPrinter:
    def __init__(self):
        self.columns = []

    def append(self, data):
        self.columns.append(data)

    def __str__(self):
        row_count = 1
        for data in self.columns:
            row_count = max(
                row_count, len(data.splitlines()) + 1)

        rows = [''] * row_count
        for j in range(row_count):
            for i, data in enumerate(self.columns):
                line = data.splitlines()[max(0, j - 1)]
                if j == 0:
                    padding = ' ' * (len(line) // 2)
                    rows[j] += padding + str(i) + padding
                else:
                    rows[j] += line

                if (i + 1) < len(self.columns):
                    rows[j] += ' | '

        return '\n'.join(rows)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate(grid)

print(columns)
```

```
    0     |     1     |     2     |     3     |     4    
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------
```


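This works great for a program that can run in one thread on a single machine, but let's imagine that the program's requirements have changed and now I need to do some I/O (for example, with a socket) from within the `game_logic` function:

```python
def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here.
    # my_socket is a placeholder for a connected socket.
    data = my_socket.recv(100)
    ...
```

The problem with adding blocking I/O directly to `game_logic` is that every cell now waits for its own I/O in turn.  If the I/O latency is 100 milliseconds and the grid has 45 cells, each generation takes at least 4.5 seconds (45 × 0.1 seconds), and that time grows linearly with the number of cells.  To keep the game responsive, the I/O for all of the cells in a generation needs to happen concurrently: fan out to start the I/O for each cell, then fan in to wait for all of it to finish before moving on to the next generation.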
## Item 57: Avoid Creating New `Thread` Instances for On-demand Fan-out

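Threads are the natural first tool to reach for in order to do parallel I/O in Python.  However, they have significant downsides when you try to use them for fanning out to many concurrent lines of execution.  Let's continue with the Game of Life example from above, this time starting a thread per cell to fan out and joining them all to fan in.  Because all of those threads write to the same next-generation grid, the grid needs a lock: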


```python
from threading import Lock, Thread

# ALIVE, EMPTY, Grid, and count_neighbors are unchanged from Item 56.

class LockingGrid(Grid):
    def __init__(self, height, width):
        super().__init__(height, width)
        self.lock = Lock()

    def __str__(self):
        with self.lock:
            return super().__str__()

    def get(self, y, x):
        with self.lock:
            return super().get(y, x)

    def set(self, y, x, state):
        with self.lock:
            return super().set(y, x, state)

def game_logic(state, neighbors):
    # In the real program, this is where the blocking I/O
    # (e.g., a socket read) would happen.
    if state == ALIVE:
        if neighbors < 2:
            return EMPTY     # Die: Too few
        elif neighbors > 3:
            return EMPTY     # Die: Too many
    else:
        if neighbors == 3:
            return ALIVE     # Regenerate
    return state

def step_cell(y, x, get, set):
    state = get(y, x)
    neighbors = count_neighbors(y, x, get)
    next_state = game_logic(state, neighbors)
    set(y, x, next_state)

def simulate_threaded(grid):
    next_grid = LockingGrid(grid.height, grid.width)

    threads = []
    for y in range(grid.height):
        for x in range(grid.width):
            args = (y, x, grid.get, next_grid.set)
            thread = Thread(target=step_cell, args=args)
            thread.start()  # Fan out
            threads.append(thread)

    for thread in threads:
        thread.join()       # Fan in

    return next_grid
```

```python
# ColumnPrinter is unchanged from Item 56.

grid = LockingGrid(5, 9)            # Changed
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate_threaded(grid)  # Changed

print(columns)
```

```
    0     |     1     |     2     |     3     |     4    
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------
```


This works as expected, and the I/O is now parallelized between the threads.  However, this code has three big problems:
1. The `Thread` instances require special tools to coordinate with each other safely.  This makes the code that uses threads harder to reason about than the procedural, single-threaded code from Item 56.  This complexity makes the threaded code more difficult to extend and maintain over time.
2. Threads require a lot of memory: about 8 MB per executing thread.  On many computers, that amount of memory doesn't matter for the 45 threads I'd need in this example, but if the grid grew to 100,000 cells, I would need 100,000 threads (roughly 800 GB at that rate), which would exhaust system memory.  Running a thread per concurrent activity just won't scale.
3. Starting a thread is costly, and threads have a negative performance impact when they run due to context switching between them.  In this case, all of the threads are started and stopped each generation of the game, which has high overhead and will increase latency beyond the expected I/O time of 100 milliseconds.

This code would also be very difficult to debug if something went wrong.  For example, imagine that the `game_logic` function raises an exception, which is highly likely given the generally flaky nature of I/O:

```python
def game_logic(state, neighbors):
    raise OSError('Problem with I/O')
```

I can test what this would do by running a `Thread` instance pointed at this function and redirecting the `sys.stderr` output from the program to an in-memory `io.StringIO` buffer:

```python
import contextlib
import io

fake_stderr = io.StringIO()
with contextlib.redirect_stderr(fake_stderr):
    thread = Thread(target=game_logic, args=(ALIVE, 3))
    thread.start()
    thread.join()

print(fake_stderr.getvalue())
```

```
Exception in thread Thread-295:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-39-e22627dd5612>", line 2, in game_logic
OSError: Problem with I/O
```



An `OSError` exception is raised as expected, but somehow the code that created the `Thread` and called `join` on it is unaffected.  This is because the `Thread` class will independently catch any exceptions that are raised by the target function and then write their traceback to `sys.stderr`.  Such exceptions are never re-raised to the caller that started the thread in the first place.

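One way to work around this, sketched here as an assumption rather than a standard API, is a `Thread` subclass that stores any exception raised by its target and re-raises it from `join`.  The `PropagatingThread` and `failing_io` names are hypothetical.  (Since Python 3.8, `threading.excepthook` also lets you customize how uncaught thread exceptions are reported, but it still doesn't re-raise them in the caller.)  Note that the exception only surfaces once the caller gets around to calling `join`.

```python
from threading import Thread

class PropagatingThread(Thread):
    """Hypothetical helper: re-raises the target's exception on join()."""

    def __init__(self, target, args=()):
        super().__init__()
        self._func = target
        self._func_args = args
        self.exception = None

    def run(self):
        try:
            self._func(*self._func_args)
        except Exception as e:  # Save it instead of printing to stderr
            self.exception = e

    def join(self, timeout=None):
        super().join(timeout)
        if self.exception is not None:
            raise self.exception

def failing_io():
    raise OSError('Problem with I/O')

caught = None
thread = PropagatingThread(target=failing_io)
thread.start()
try:
    thread.join()
except OSError as e:
    caught = e
    print('Caught in caller:', e)
```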
Given all of these issues, it's clear that threads are not the solution if you need to constantly create and finish new concurrent functions.

Threads have many downsides: they're costly to start and run if you need a lot of them, they each require a significant amount of memory, and they require special tools like `Lock` instances for coordination.

Threads do not provide a built-in way to raise exceptions back in the code that started a thread or that is waiting for one to finish, which makes them difficult to debug.


## Item 58: Understand How Using `Queue` for Concurrency Requires Refactoring