# subprocess

Sometimes there is a need to **launch a program from your Python code**

*For example*: our testing system runs the tests for your code, and we need to make sure the tests ran correctly

<center>
<img src="https://blag.felixhummel.de/_images/process_stdin_stdout_stderr_return-code.png" alt="process" width=800 />
</center>

In [1]:
import subprocess

In [None]:
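# Simplified code; build_dir is assumed to be defined elsewhere
print('\033[93mRunning tests...\033[0m')
try:
    subprocess.run(
        ['pytest', '-p', 'no:cacheprovider', '--tb=no'],
        cwd=str(build_dir),
        timeout=60,
        check=True,
    )
except subprocess.CalledProcessError as exc:
    # pytest exited with a non-zero code (check=True turns that into an exception)
    print(f'Tests failed with exit code {exc.returncode}')
except subprocess.TimeoutExpired as exc:
    # the process did not finish within the timeout
    print(f'Tests timed out after {exc.timeout} seconds')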

In [None]:
# recommended to use whenever it's possible
subprocess.run

# for more complex cases
subprocess.Popen

In [None]:
class Popen:    
    def __init__(
        self,
        args,           # string or sequence to execute
        stdin=None, stdout=None, stderr=None,
        cwd=None,       # set current working directory
        env=None,       # set environment variables
        text=False,     # if True, streams are str (default encoding) instead of bytes
        shell=False,    # handle args as shell command
        ...             # and many other arguments
    ): ...

Documentation: https://docs.python.org/3/library/subprocess.html
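
A minimal sketch of a few of these arguments in action. The child here is just the current Python interpreter (`sys.executable`), and the variable name `GREETING` is made up for the illustration:

```python
import os
import subprocess
import sys
import tempfile

# Child program: print one environment variable and the working directory
child_code = 'import os; print(os.environ["GREETING"]); print(os.getcwd())'

with tempfile.TemporaryDirectory() as tmp_dir:
    process = subprocess.Popen(
        [sys.executable, '-c', child_code],
        cwd=tmp_dir,                              # the child starts in this directory
        env={**os.environ, 'GREETING': 'hello'},  # parent's environment plus one extra variable
        stdout=subprocess.PIPE,                   # capture stdout instead of inheriting it
        text=True,                                # decode the output to str
    )
    stdout, _ = process.communicate()
    greeting, child_cwd = stdout.splitlines()
    same_dir = os.path.samefile(child_cwd, tmp_dir)

print(greeting, same_dir)
```

Passing `env` replaces the child's whole environment, which is why the sketch copies `os.environ` first.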

In [6]:
process = subprocess.Popen(['ls', '-l'])  # in practice, use os.listdir
process

<Popen: returncode: None args: ['ls', '-l']>

total 72
-rw-r--r--@ 1 Rodion.Khvorostov  staff  35718 Oct 29 19:22 lec09.ipynb


An **exit code** is a number a process returns when it finishes, indicating whether it succeeded (usually `0`) or failed (non-zero).
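
As a small illustration, the sketch below launches the current interpreter (`sys.executable`, used only as a portable stand-in for "some program") twice and reads the exit code from `returncode`:

```python
import subprocess
import sys

# A child that finishes normally
ok = subprocess.run([sys.executable, '-c', 'pass'])
# A child that explicitly exits with code 3
failed = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'])

print(ok.returncode)      # 0
print(failed.returncode)  # 3
```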


In [7]:
# check the process state without waiting:
# poll() returns the exit code, or None if the process is still running
# "to poll" something means to repeatedly check its state without waiting (OS terminology)
process.poll()

0

In [8]:
# wait with timeout
process.wait(timeout=1)  # timeout in secs

0
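
The difference between `poll()` and `wait()` is easier to see with a child that runs for a while. A sketch, assuming a one-second sleep is long enough for the first `poll()` to happen while the child is still alive:

```python
import subprocess
import sys

# A child that sleeps for one second (stand-in for a slow program)
process = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])

still_running = process.poll()        # None: the child has not finished yet
exit_code = process.wait(timeout=10)  # blocks until exit; raises TimeoutExpired after 10 s
after_exit = process.poll()           # the child is done, so poll() returns the exit code

print(still_running, exit_code, after_exit)
```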

In [10]:
stdout, stderr = process.communicate()
stdout  # None: stdout was not redirected to a pipe :(

In [15]:
process = subprocess.Popen(['echo', 'something'], stdout=subprocess.PIPE, text=True)
stdout, _ = process.communicate()

In [16]:
print(stdout)

something



In [19]:
# communicate(input='...') together with Popen(..., stdin=PIPE) passes data to the child's stdin
# communicate(timeout=1) sets a timeout in seconds
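
A sketch of both options together. The child is a one-line Python program (an assumption made for portability) that upper-cases whatever arrives on its stdin:

```python
import subprocess
import sys

# Child program: read all of stdin and print it upper-cased
child_code = 'import sys; print(sys.stdin.read().upper(), end="")'

process = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdin=subprocess.PIPE,   # required for communicate(input=...)
    stdout=subprocess.PIPE,  # required to read the child's output
    text=True,               # input and output are str, not bytes
)
# Send the input, close stdin, and wait up to 10 seconds for the child to exit
stdout, _ = process.communicate(input='hello\n', timeout=10)
print(stdout)
```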

In [None]:
def run(
    *popenargs,
    check=False,           # check process exit code and raise on failure
    capture_output=False,  # set stdout=PIPE and stderr=PIPE
    timeout=None,          # passed to Popen.communicate(), raise on timeout
    input=None,            # passed to Popen.communicate() as stdin
    **popenkwargs
) -> subprocess.CompletedProcess: ...

In [21]:
subprocess.run(['bc'], input=b'2 * 3\n', capture_output=True)

CompletedProcess(args=['bc'], returncode=0, stdout=b'6\n', stderr=b'')

At the moment, `subprocess` is the recommended way to launch other processes in Python

# threading

<center>
<img alt="threads" src="https://www.backblaze.com/blog/wp-content/uploads/2017/08/diagram-thread-process-1.png" width="800px" />
</center>

**Thread** — an execution stream within a process

Threads within a single process share **common** memory
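
A minimal sketch of what shared memory means in practice: every thread appends to the same list object, and the main thread sees all the results. The names `shared` and `worker` are illustrative.

```python
import threading

shared = []  # one list object, visible to every thread in the process

def worker(num: int) -> None:
    shared.append(num)  # modifies the same list the main thread sees

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(shared))  # [0, 1, 2, 3, 4]
```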

<div align="center"><img src="https://i.imgur.com/nlhI00n.png?1" width="650px"/></div>

<center>
<img src="http://www.openrtos.net/implementation/TaskExecution.gif" alt="concurrency" width="800px" />
</center>

At any given moment, a **single** CPU core executes **exactly one** thread

Multiple cores can execute multiple threads literally **at the same time**

<center>
<img src="https://www.backblaze.com/blog/wp-content/uploads/2017/08/diagram-thread-concurrency.png" alt="parallelism" width=800 />
</center>

However, for CPU-bound work it **doesn't make sense** to create more threads than you have CPU cores (if the goal is to increase performance)

In [23]:
import threading

In [None]:
def greeter(num: int) -> None:
    print(f'Hello {num}') # -> PRINT(TEXT) & PRINT(NEWLINE)

In [25]:
def run_threads(count: int) -> None:
    threads = [
        threading.Thread(target=greeter, args=(i,))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

In [27]:
run_threads(4)

Hello 0Hello 1
Hello 2
Hello 3



The following example is from Raymond Hettinger's talk (Python core developer):
https://www.youtube.com/watch?v=Bv25Dwe84g0

In [None]:
count = 0

def counter() -> None:
    global count
    count += 1  # same as count = count + 1: a read followed by a write
    print(f'{count} ', end='')

In [43]:
threads = [threading.Thread(target=counter) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

1 2 3 4 5 6 7 8 9 10 

There's a bug lurking in the slide's code that hasn't manifested yet

<center>
<img src="https://s3.amazonaws.com/s3-blogs.mentor.com/colinwalls/files/2018/05/RTC-520x118.png" alt="context-switching" />
</center>
Reminder: threads are constantly being context-switched by the OS

Let's increase the thread's lifetime using `time.sleep`

In [44]:
import time
import random

In [47]:
count = 0

def counter() -> None:
    global count
    old_count = count
    time.sleep(random.randint(0, 1))
    count = old_count + 1
    print(f'{count} ', end='')

In [48]:
count = 0
threads = [threading.Thread(target=counter) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

1 2 3 4 5 6 7 1 6 6 

This is the classic problem of multithreaded code — a **race** (aka **race condition**)
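
The random sleeps make the result differ from run to run. As a sketch, the same lost update can be made deterministic by swapping the sleep for a `threading.Barrier` (a substitution made here for illustration, not the original example): no thread writes until every thread has read the old value.

```python
import threading

THREAD_COUNT = 5
count = 0
barrier = threading.Barrier(THREAD_COUNT)

def counter() -> None:
    global count
    old_count = count      # every thread reads 0 ...
    barrier.wait()         # ... because nobody writes until all threads have read
    count = old_count + 1  # so every thread writes 1

threads = [threading.Thread(target=counter) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(count)  # 1, not 5: four of the five increments were lost
```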

### Solution

A **lock** is a synchronization mechanism that allows only one thread at a time to access a shared resource, preventing race conditions.

In [None]:
count = 0
lock = threading.Lock()

def counter() -> None:
    global count
    # the one-line version is protected the same way: with lock: count += 1
    with lock:
        # now only one thread can be here at a time
        old_count = count
        time.sleep(random.randint(0, 1))
        count = old_count + 1
        print(f'{count} ', end='')

In [50]:
threads = [threading.Thread(target=counter) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

1 2 3 4 5 6 7 8 9 10 

### Approach with `queue.Queue`

<center>
<img src="https://pengphy.files.wordpress.com/2010/09/image5b95d.png?w=1400" alt="queue" width=500 />
</center>

### Task
Compute the sum of a sequence of numbers using N threads (<= the number of CPU cores)

In [51]:
import queue
from collections.abc import Sequence

In [52]:
def adder(array: Sequence[int], part_id: int, thread_count: int, queue_out: queue.Queue) -> None:
    # sum every thread_count-th element starting at index part_id and put the result in the queue
    queue_out.put(sum(array[i] for i in range(part_id, len(array), thread_count)))

# thread_count is the number of threads to use
# part_id is the id of the current thread

# if thread_count is 8 and part_id is 3, then range(part_id, len(array), thread_count) is [3, 11, 19...]
# this way we split the array into 8 parts and each thread computes the sum of its part

In [53]:
def sum_using_threads(array: Sequence[int], thread_count: int) -> int:
    queue_out = queue.Queue()
    threads = [
        threading.Thread(target=lambda i=i: adder(array, i, thread_count, queue_out))
        for i in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    results = []
    for thread in threads:
        results.append(queue_out.get())
        thread.join()
    return sum(results)
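
The strided split used by `adder` can be checked on its own. A small sketch (with a made-up length of 20) showing that the parts cover every index exactly once:

```python
array_len = 20
thread_count = 8

# Indices handled by each thread: part_id, part_id + thread_count, ...
parts = [
    list(range(part_id, array_len, thread_count))
    for part_id in range(thread_count)
]
print(parts[3])  # [3, 11, 19]

# Together the parts contain every index once, so no element is skipped or counted twice
all_indices = sorted(i for part in parts for i in part)
print(all_indices == list(range(array_len)))  # True
```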

In [54]:
array = [1 for _ in range(10_000_000)]

In [55]:
%%timeit
sum(array[i] for i in range(len(array)))  # sum(array)

330 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [56]:
%%timeit
sum_using_threads(array, 8)

322 ms ± 9.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Reason for the lack of speedup: **GIL (Global Interpreter Lock)**

The GIL lets only one thread execute Python bytecode at a time, which makes Python threads useless for parallelizing CPU-bound computations

In [57]:
import requests

In [58]:
urls = [
    'https://ya.ru', 'https://www.google.com',
    'https://www.python.org', 'https://isocpp.org',
    'https://habr.com', 'https://news.ycombinator.com'
]

In [59]:
def read_url(url: str) -> str:
    return requests.get(url).text

In [60]:
%%timeit
for url in urls:
    read_url(url)

1.74 s ± 193 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [62]:
%%timeit
readers = [
    threading.Thread(target=lambda url=url: read_url(url))
    for url in urls
]
for reader in readers:
    reader.start()
for reader in readers:
    reader.join()

654 ms ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


For certain types of tasks — **I/O‑bound** — Python threads are effective
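
The same effect can be reproduced without the network. In the sketch below, `time.sleep` stands in for waiting on a response (an assumption made to keep the example self-contained); like real blocking I/O, it releases the GIL while waiting, so the threads wait concurrently.

```python
import threading
import time

def fake_io(delay: float) -> None:
    time.sleep(delay)  # stands in for waiting on a network response

delays = [0.2] * 5

# Sequential: the waits add up
start = time.perf_counter()
for delay in delays:
    fake_io(delay)
sequential = time.perf_counter() - start

# Threaded: the waits overlap
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(delay,)) for delay in delays]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded = time.perf_counter() - start

print(f'sequential: {sequential:.2f} s, threaded: {threaded:.2f} s')
```

Exact timings depend on the machine, but the threaded version should take roughly as long as one wait rather than five.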

<center>
<img alt="IO-bound" src="https://risingstack-blog.s3.amazonaws.com/2016/Jun/non_async_blocking_operations_example_in_node_hero_1459856858194-1466683867567.png" width=700 />
</center>

**NB**: It can also be useful to use threads to separate execution paths.

Although this won't give you a speedup, organizing complex code this way can be more convenient.

For example, rendering animation, background activities, and other lightly loaded tasks.

optional

# multiprocessing

The GIL prevents multiple threads of one process from executing Python code in parallel

<center>
<img src="https://s3.amazonaws.com/media-p.slid.es/uploads/299675/images/1413349/Screen_Shot_2015-05-23_at_15.58.31.png" alt="GIL" width=600 />
</center>

But the GIL can't affect separate processes, so in theory you can use multiple processes to parallelize computations!

In [63]:
import multiprocessing

In [64]:
accumulator = []

def worker() -> None:
    accumulator.append('item')

In [66]:
processes = [multiprocessing.Process(target=worker) for _ in range(5)]
for p in processes:
    p.start()
for p in processes:
    p.join()
    
accumulator

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'worker' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 13

[]

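Processes don't share memory: each child process works on its own copy of `accumulator`, so the list in the parent stays empty, and you need a mechanism for exchanging data between them

The `AttributeError` in the output is a separate issue. With the `spawn` start method (the default on macOS and Windows), each child starts a fresh interpreter and has to find `worker` by name in its own `__main__` module, where functions defined in a notebook cell don't exist.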

`multiprocessing.Pool` provides such a ready‑made mechanism

In [1]:
import multiprocessing

def multiplier(x: int) -> int:
    return x * 2

In [None]:
with multiprocessing.Pool() as pool:
    result = pool.map(multiplier, range(10))
result

Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'multiplier' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
Process SpawnPoolWorker-3:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py"
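
The same `Pool.map` call can be sketched as a standalone script, where the worker function lives in a real file that child processes are able to import. The `main` wrapper is an illustrative name; the `if __name__ == '__main__':` guard keeps children from re-running the pool setup when they import the file.

```python
import multiprocessing

def multiplier(x: int) -> int:
    return x * 2

def main() -> list[int]:
    # Pool() starts one worker process per CPU core by default
    with multiprocessing.Pool() as pool:
        # map sends the items to the workers and returns results in input order
        return pool.map(multiplier, range(10))

if __name__ == '__main__':
    # Runs only when the file is executed directly, not when a child imports it
    print(main())
```

In a notebook, a common workaround is to put `multiplier` into a separate `.py` file and import it from there.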

Back to the summation task

In [7]:
size = 10_000_000
array = [1 for _ in range(size)]

In [8]:
%%timeit
sum(array)

28.7 ms ± 941 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [9]:
import multiprocessing

process_count = multiprocessing.cpu_count()  # 8
part_size = size // process_count
array_parts = [
    array[i * part_size: (i + 1) * part_size]
    for i in range(process_count)
]

In [10]:
with multiprocessing.Pool(process_count) as pool:
    %timeit pool.map(sum, array_parts)

104 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [11]:
one_part = array[0 * part_size: (0 + 1) * part_size]
%timeit sum(one_part)

2.85 ms ± 19 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


A lot of time is spent on interprocess communication :(
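
One way to get a feel for the cost: `Pool.map` sends each argument to a worker by pickling it. The sketch below only compares the size of the pickled payloads and does not start any processes; the exact byte counts depend on the pickle protocol.

```python
import pickle

part = [1] * 1_250_000  # one of 8 parts of a 10_000_000-element list

list_payload = len(pickle.dumps(part))      # what has to be sent to pass the data itself
int_payload = len(pickle.dumps(len(part)))  # what has to be sent to pass only a size

print(f'list: {list_payload} bytes, int: {int_payload} bytes')
```

Sending megabytes per task dwarfs a few milliseconds of summation, while sending a single integer is almost free.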

Let's try not to transfer a lot of data between processes

In [12]:
def sum_n(n: int) -> int:
    return sum(1 for _ in range(n))

In [13]:
%%timeit
sum_n(size)

282 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [14]:
import multiprocessing

process_count = 8
with multiprocessing.Pool(process_count) as pool:
    %timeit pool.map(sum_n, (part_size for _ in range(process_count)))

Process SpawnPoolWorker-37:
Process SpawnPoolWorker-39:
Process SpawnPoolWorker-38:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'sum_n' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
  File "/Library/Frameworks/Python.framew

KeyboardInterrupt: 

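When the workers start correctly (for example, when the code is run as a script, so that child processes can import `sum_n`), there's a noticeable speedup, but the multiplier is smaller than the number of CPU cores :(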

### Summary

Spawning processes is expensive

Transferring data between processes is also expensive

Therefore, if there's a lot of data to exchange and the task isn't too heavy, it's better to avoid `multiprocessing`

Otherwise, use `multiprocessing` — it is straightforward