- Item 36: Use `subprocess` to Manage Child Processes
- Item 37: Use `Threads` for Blocking I/O, Avoid for Parallelism
- Item 38: Use Lock to Prevent Data Races in `Threads`
- Item 39: Use `Queue` to Coordinate Work Between `Threads`
- Item 40: Consider Coroutines to Run Many Functions Concurrently
- Item 41: Consider `concurrent.futures` for True Parallelism

In [1]:
# Preamble to mimic the book's environment
import logging
from pprint import pprint
from sys import stdout as STDOUT

## Item 36: Use `subprocess` to Manage Child Processes

In [3]:
# Example: subprocess
import subprocess
proc = subprocess.Popen(
    ['echo', 'Hello from the child!'],
    stdout=subprocess.PIPE)
out, err = proc.communicate()  # Read the child's output and wait for it to exit (err is None: stderr isn't piped)
print(out.decode('utf-8'))

Hello from the child!



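When you only need to run a command to completion and capture its output, `subprocess.run` wraps the `Popen`/`communicate` pair in a single call (it exists since Python 3.5; `capture_output` and `text` need Python 3.7+). This is a minimal sketch. To keep it portable, it launches the current interpreter through `sys.executable` rather than `echo`:

```python
import subprocess
import sys

# run() starts the child, collects its output streams, and waits for it to exit.
result = subprocess.run(
    [sys.executable, '-c', "print('Hello from the child!')"],
    capture_output=True,  # Same as stdout=PIPE, stderr=PIPE
    text=True,            # Decode the output to str instead of bytes
    check=True,           # Raise CalledProcessError on a non-zero exit status
)
print(result.returncode)
print(result.stdout, end='')
```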
In [4]:
# Child processes will run independently from their parent process
from time import sleep, time
proc = subprocess.Popen(['sleep', '0.3'])
while proc.poll() is None:
    print('Working...')
    # Some time-consuming work here
    sleep(0.2)

print('Exit status', proc.poll())

Working...
Working...
Exit status 0


In [5]:
# The parent process is free to run many child processes in parallel.
def run_sleep(period):
    proc = subprocess.Popen(['sleep', str(period)])
    return proc

start = time()
procs = []
for _ in range(10):
    proc = run_sleep(0.1)
    procs.append(proc)


for proc in procs:
    proc.communicate()
end = time()
print('Finished in %.3f seconds' % (end - start))

Finished in 0.196 seconds

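`Popen` returns as soon as the child starts. If you launch every child before waiting on any of them, the total wall-clock time stays close to that of the longest child instead of the sum of all of them. The sketch below shows the same fan-out/fan-in pattern using `wait()`, which is enough when no pipes are attached. `run_sleep_py` is a hypothetical helper that sleeps inside a Python child, so the example doesn't depend on a `sleep` binary:

```python
import subprocess
import sys
from time import time

def run_sleep_py(period):
    # Hypothetical helper: a child interpreter that just sleeps for `period` seconds
    return subprocess.Popen(
        [sys.executable, '-c', f'import time; time.sleep({period})'])

period = 0.2
count = 5

start = time()
procs = [run_sleep_py(period) for _ in range(count)]  # Fan out: start them all
exit_codes = [proc.wait() for proc in procs]          # Fan in: wait for each one
elapsed = time() - start

print('Exit codes:', exit_codes)
print('Finished in %.3f seconds (serial would take ~%.1f)' % (elapsed, period * count))
```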

In [15]:
# Pipe data from your Python program into a subprocess and retrieve its output.
import os

def run_openssl(data):
    env = os.environ.copy()
    # On POSIX, env values may be bytes (Windows expects str)
    env['password'] = b'\xe24U\n\xd0Ql3S\x11'
    proc = subprocess.Popen(
        ['openssl', 'enc', '-des3', '-pass', 'env:password'],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)
    proc.stdin.write(data)
    proc.stdin.flush()  # Ensure the child gets input
    return proc


procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    procs.append(proc)


for proc in procs:
    out, err = proc.communicate()
    print(out[-10:])

b'\x08\x03\xc2L\x1e\xae\xfd\xb4\nu'
b'U\x00\x01U\xd3g\x05c/Q'
b'\xbce\n\xe9U\xb0\xfe\x86\xd2A'

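Writing to `proc.stdin` by hand works here because each input is only 10 bytes. With larger inputs there is a risk of deadlock. The parent can block writing into a full stdin pipe while the child blocks writing into a full stdout pipe that nobody is reading. Passing the data through `communicate(input=...)` avoids this because `subprocess` then writes and reads concurrently. In this sketch, a hypothetical child that reverses its input stands in for `openssl`:

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a filter program: echo stdin back, reversed.
REVERSE = 'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read()[::-1])'

def run_reverse(data):
    proc = subprocess.Popen(
        [sys.executable, '-c', REVERSE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)
    # communicate() writes `data`, closes stdin, and drains stdout at the same
    # time, so neither side waits forever on a full pipe buffer.
    out, _ = proc.communicate(input=data)
    return out

data = os.urandom(1_000_000)  # Far larger than a typical pipe buffer
out = run_reverse(data)
print(len(out), out == data[::-1])
```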

In [17]:
# Create chains of parallel processes, just like UNIX pipes
def run_md5(input_stdin):
    proc = subprocess.Popen(
        ['md5sum'],
        stdin=input_stdin,
        stdout=subprocess.PIPE)
    return proc


input_procs = []
hash_procs = []
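for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    input_procs.append(proc)
    hash_proc = run_md5(proc.stdout)
    hash_procs.append(hash_proc)
    # Close the parent's copy of the pipe so md5sum is its only reader and
    # proc.communicate() below doesn't consume the encrypted data itself.
    proc.stdout.close()
    proc.stdout = None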


for proc in input_procs:
    proc.communicate()
for proc in hash_procs:
    out, err = proc.communicate()
    print(out.strip())

b'0e4e433631c0afe4ae93a66daca9b677  -'
b'ace0f8d933ad03b46ec81c50dc09d658  -'
b'9cc345e9be44ff1ce20b3d946bf5bd5e  -'

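When one child's `stdout` becomes another child's `stdin`, the parent still holds its own handle on that pipe. Close it, and set the attribute to `None` so a later `communicate()` doesn't try to read from it. After that, the downstream child is the only reader. The sketch below is a portable version of the chain using two hypothetical Python children: a pass-through producer and a hashing consumer. The consumer computes SHA-256 with `hashlib` instead of running `md5sum`:

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical child programs used only for this illustration
PASS_THROUGH = 'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())'
HASH = ('import hashlib, sys; '
        'print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest())')

data = os.urandom(1000)

producer = subprocess.Popen(
    [sys.executable, '-c', PASS_THROUGH],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    [sys.executable, '-c', HASH],
    stdin=producer.stdout, stdout=subprocess.PIPE)

# Drop the parent's copy of the pipe so only the consumer reads from it
producer.stdout.close()
producer.stdout = None

producer.communicate(input=data)  # Send the data, close stdin, wait for exit
out, _ = consumer.communicate()   # Collect the digest once the pipe hits EOF

digest = out.decode().strip()
print(digest == hashlib.sha256(data).hexdigest())
```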

In [19]:
# Use the timeout parameter so a hung child can't block the parent forever
proc = run_sleep(10)
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    proc.terminate()
    proc.wait()

print('Exit status', proc.poll())

Exit status -15

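On POSIX, `terminate()` sends `SIGTERM`, which is why the exit status above is -15. A child is allowed to catch or ignore that signal. A common hardening is to fall back to `kill()` when the child hasn't exited within a grace period. The sketch below assumes a short grace period is acceptable; `stop_child` and `grace` are hypothetical names:

```python
import subprocess
import sys

def stop_child(proc, grace=1.0):
    # Hypothetical helper: ask politely first, then force the child to exit.
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; the child cannot catch it
        return proc.wait()

proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(10)'])
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    status = stop_child(proc)
else:
    status = proc.returncode

print('Exit status', status)
```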

### Things to Remember
- Use the `subprocess` module to run child processes and manage their input and output streams.
- **Child processes run in parallel** with the Python interpreter, enabling you to **maximize your CPU usage**.
- Use the `timeout` parameter with `communicate` to **avoid deadlocks** and **hanging child processes**.

## Item 37: Use `Threads` for Blocking I/O, Avoid for Parallelism

In [25]:
# Factoring a set of numbers in serial takes quite a long time.
def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i


from time import time
numbers = [2139079, 1214759, 1516637, 1852285]
start = time()
for number in numbers:
    list(factorize(number))
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.472 seconds


In [28]:
# Using threads: although Python supports multiple threads of execution, the GIL
# lets only one of them make forward progress at a time, so this is no faster than
# the serial version. This shows the GIL's effect in the standard CPython interpreter.
from threading import Thread

class FactorizeThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        self.factors = list(factorize(self.number))


start = time()
numbers = [2139079, 1214759, 1516637, 1852285]
threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)


for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.486 seconds


In [29]:
import select, socket

# Creating the socket is specifically to support Windows. Windows can't do
# a select call with an empty list.
# Running this system call in serial requires a linearly increasing amount of time.
def slow_systemcall():
    select.select([socket.socket()], [], [], 0.1)


start = time()
for _ in range(5):
    slow_systemcall()
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.513 seconds


In [30]:
# Run the blocking system calls in parallel threads while the main thread keeps working
start = time()
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)


def compute_helicopter_location(index):
    # Placeholder for computation the main thread does while the threads block
    pass

for i in range(5):
    compute_helicopter_location(i)
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 0.103 seconds

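You can get the same overlap without managing `Thread` objects by hand. `concurrent.futures.ThreadPoolExecutor` starts the threads for you, and leaving the `with` block joins them. In this sketch, `time.sleep` stands in for the blocking system call: like `select`, it releases the GIL while it waits. `blocking_call` is a hypothetical name:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(index):
    # Hypothetical stand-in for blocking I/O; sleep releases the GIL while waiting
    time.sleep(0.1)
    return index

start = time.time()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in input order, even though the calls overlap
    results = list(executor.map(blocking_call, range(5)))
elapsed = time.time() - start

print(results)
print('Took %.3f seconds' % elapsed)
```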

### Things to Remember
- Python `threads` **can’t run bytecode in parallel on multiple CPU cores** in the standard CPython interpreter because of the global interpreter lock (**GIL**).
- Python `threads` are **still useful** despite the GIL because they provide an easy way **to do multiple things at seemingly the same time**.
- **Use Python threads** to make **multiple system calls in parallel**. This allows you to do **blocking I/O** at the same time as computation.

## Item 38: Use Lock to Prevent Data Races in Threads

In [35]:
# Write a program that counts many things in parallel
class Counter(object):
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset


def worker(sensor_index, how_many, counter):
    # I have a barrier in here so the workers synchronize
    # when they start counting, otherwise it's hard to get a race
    # because the overhead of starting a thread is high.
    BARRIER.wait()
    for _ in range(how_many):
        # Read from the sensor
        counter.increment(1)


from threading import Barrier, Thread
BARRIER = Barrier(5)
def run_threads(func, how_many, counter):
    threads = []
    for i in range(5):
        args = (i, how_many, counter)
        thread = Thread(target=func, args=args)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()


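# The count found varies from run to run, and with some CPython versions this
# exact code may rarely show the race, but the unsynchronized update is still a bug.
how_many = 10**5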
counter = Counter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))


Counter should be 500000, found 378708


In [36]:
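# `counter.count += offset` looks like a single operation...
offset = 5
counter.count += offset

# ...but Python actually performs it as three separate steps:
value = getattr(counter, 'count')
result = value + offset
setattr(counter, 'count', result)

# A thread can be suspended between these steps, so this interleaving of
# Thread A and Thread B loses one of the two increments:
# Running in Thread A
value_a = getattr(counter, 'count')
# Context switch to Thread B
value_b = getattr(counter, 'count')
result_b = value_b + 1
setattr(counter, 'count', result_b)
# Context switch back to Thread A
result_a = value_a + 1
setattr(counter, 'count', result_a)  # Overwrites Thread B's update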

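You can confirm that `counter.count += offset` is several steps rather than one by disassembling it with the standard `dis` module. The exact opcode names differ across CPython versions (`INPLACE_ADD` before 3.11, `BINARY_OP` from 3.11 on). In every version, though, the attribute load, the addition, and the attribute store are separate instructions, and Python makes no guarantee that another thread can't run between them. `increment_count` is a hypothetical function used only for inspection:

```python
import dis

def increment_count(counter, offset):
    # Hypothetical function containing the same statement as Counter.increment
    counter.count += offset

# Collect just the opcode names, then print the full disassembly
opnames = [instr.opname for instr in dis.get_instructions(increment_count)]
print(opnames)
dis.dis(increment_count)
```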
In [37]:
# Use a Lock so only one thread at a time can update the count
from threading import Lock

class LockingCounter(object):
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        with self.lock:
            self.count += offset


BARRIER = Barrier(5)
counter = LockingCounter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))

Counter should be 500000, found 500000

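The statement `with self.lock:` is shorthand for acquiring the lock and releasing it in a `finally` block, so the lock is freed even if the protected code raises. The sketch below spells out that expanded form. It checks the total with a plain `Thread` workload (no barrier). `ExplicitLockingCounter` and `count_many` are hypothetical names:

```python
from threading import Lock, Thread

class ExplicitLockingCounter:
    # Hypothetical variant of LockingCounter that spells out what `with` does
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        self.lock.acquire()
        try:
            self.count += offset
        finally:
            self.lock.release()  # Runs even if the update raises

def count_many(counter, how_many):
    for _ in range(how_many):
        counter.increment(1)

counter = ExplicitLockingCounter()
threads = [Thread(target=count_many, args=(counter, 10_000)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print('Counter should be %d, found %d' % (5 * 10_000, counter.count))
```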

### Things to Remember
- Even though Python has a **global interpreter lock**, you’re still **responsible for protecting against data races** between the threads in your programs.
- Your **programs will corrupt their data structures** if you **allow multiple threads** to modify the same objects without locks.
- The **Lock class** in the `threading` built-in module is Python’s standard mutual exclusion lock implementation.