# Lab6 11/1/23

# Comparing single-threaded, multi-threaded, and multi-process execution

* https://medium.com/towards-artificial-intelligence/the-why-when-and-how-of-using-python-multi-threading-and-multi-processing-afd1b8a8ecca
* https://medium.com/analytics-vidhya/multithreading-and-multiprocessing-in-python-1f773d1d160d

## 1. Single-threaded, single-process

In [1]:
import urllib.request
from concurrent.futures import ThreadPoolExecutor

urls = [
  'http://www.python.org',
  'https://docs.python.org/3/',
  'https://docs.python.org/3/whatsnew/3.7.html',
  'https://docs.python.org/3/tutorial/index.html',
  'https://docs.python.org/3/library/index.html',
  'https://docs.python.org/3/reference/index.html',
  'https://docs.python.org/3/using/index.html',
  'https://docs.python.org/3/howto/index.html',
  'https://docs.python.org/3/installing/index.html',
  'https://docs.python.org/3/distributing/index.html',
  'https://docs.python.org/3/extending/index.html',
  'https://docs.python.org/3/c-api/index.html',
  'https://docs.python.org/3/faq/index.html'
  ]



In [2]:
%%timeit

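results = []
for url in urls:
    # Each request blocks until the server responds, so the URLs are fetched one at a time
    with urllib.request.urlopen(url) as src:
        # Note: only the response object is stored, and it is closed when the
        # with block exits; call src.read() here to keep the page body
        results.append(src)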

1.33 s ± 156 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## 2. Multi-threading

 * Well suited to I/O-bound tasks (such as fetching URLs)
 * Most of the run time is spent waiting on the network, a database, a file, etc.
 * While one thread waits on I/O, other threads can run, so multiple URLs can be fetched concurrently
 * https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
 * `ThreadPoolExecutor` is a higher-level interface for pushing tasks to background threads without blocking the calling thread, while still being able to retrieve their results when needed.
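
The last point can be illustrated with `submit()` and `as_completed()`. This is only a minimal sketch: `fake_fetch` and the `example.com` URLs are hypothetical stand-ins, and `time.sleep` simulates the network wait so that the example runs without internet access.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url, delay=0.05):
    """Hypothetical stand-in for a network request: sleeps instead of contacting a server."""
    time.sleep(delay)  # the thread is idle here, as it would be while waiting on I/O
    return url, len(url)  # pretend the "response size" is the length of the URL


# Hypothetical URLs, used only as task labels
urls = [f"https://example.com/page/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future immediately; the calling thread is not blocked
    futures = [executor.submit(fake_fetch, url) for url in urls]
    # as_completed() yields each Future as soon as its result is ready
    sizes = dict(f.result() for f in as_completed(futures))
elapsed = time.perf_counter() - start

print(f"fetched {len(sizes)} pages in {elapsed:.2f} s")
```

With 4 workers the 8 simulated requests overlap, so the total should be well below the sum of the individual delays.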

In [3]:
import urllib.request
from concurrent.futures import ThreadPoolExecutor

urls = [
  'http://www.python.org',
  'https://docs.python.org/3/',
  'https://docs.python.org/3/whatsnew/3.7.html',
  'https://docs.python.org/3/tutorial/index.html',
  'https://docs.python.org/3/library/index.html',
  'https://docs.python.org/3/reference/index.html',
  'https://docs.python.org/3/using/index.html',
  'https://docs.python.org/3/howto/index.html',
  'https://docs.python.org/3/installing/index.html',
  'https://docs.python.org/3/distributing/index.html',
  'https://docs.python.org/3/extending/index.html',
  'https://docs.python.org/3/c-api/index.html',
  'https://docs.python.org/3/faq/index.html'
  ]

In [4]:
%%timeit

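with ThreadPoolExecutor(4) as executor:
    # map() schedules one urlopen call per URL across 4 worker threads;
    # leaving the with block waits for all of them to finish
    results = executor.map(urllib.request.urlopen, urls)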

377 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
%%timeit

with ThreadPoolExecutor(16) as executor:
    results = executor.map(urllib.request.urlopen, urls)

208 ms ± 68.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## 3. Multi-processing

 * Speed up CPU-bound tasks with the `multiprocessing` module
 * Useful for scripts that spend most of their time computing on the CPU (mathematical computations, image processing, etc.)
 * If the computations are independent of one another, split them up among the available CPU cores
 * The best number of independent workers depends on the number of cores, RAM, other processes currently running, etc.
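
The idea of splitting independent computations among cores might be sketched as follows. The helper names `split_range` and `sum_of_squares` are hypothetical, and the sum of squares is only a stand-in for a CPU-bound task; using `cpu_count() - 1` workers is a rule of thumb, not a fixed recommendation.

```python
from multiprocessing import Pool, cpu_count


def split_range(n, parts):
    """Split range(n) into `parts` contiguous ranges of near-equal size."""
    step, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # the first `extra` chunks get one additional element
        stop = start + step + (1 if i < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


def sum_of_squares(chunk):
    """CPU-bound stand-in task: each chunk is processed independently."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    workers = max(1, cpu_count() - 1)  # leave one core free for other processes
    chunks = split_range(100_000, workers)
    with Pool(workers) as p:
        # each worker process handles one chunk; partial results are combined below
        partial_sums = p.map(sum_of_squares, chunks)
    print(sum(partial_sums))
```

Handing each worker one large chunk, rather than one number at a time, keeps the cost of passing data between processes small compared with the computation itself.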

In [10]:
from multiprocessing import Pool, cpu_count

from if_prime import if_prime
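
The `if_prime` module is not shown in this lab. A minimal sketch of what `if_prime.py` might contain, assuming the function returns `True` for primes so that summing its results counts them (the actual implementation may differ):

```python
import math


def if_prime(n):
    """Return True if n is prime, else False (hypothetical stand-in for the lab's module)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # trial division by odd numbers up to the integer square root of n
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True
```

Keeping the function in its own module, rather than defining it in a notebook cell, lets worker processes import it, which matters on platforms where `multiprocessing` starts workers with the spawn method.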

In [11]:
%%time

sum_primes = 0

for i in range(1000000):
    sum_primes += if_prime(i)

CPU times: user 954 ms, sys: 1.84 ms, total: 956 ms
Wall time: 959 ms


In [12]:
%%time


with Pool(2) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

CPU times: user 55.2 ms, sys: 22.5 ms, total: 77.7 ms
Wall time: 563 ms


In [13]:
%%time

with Pool(4) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

CPU times: user 56.8 ms, sys: 32.6 ms, total: 89.4 ms
Wall time: 339 ms


In [14]:
%%timeit

with Pool(8) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

213 ms ± 3.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [15]:
%%timeit

with Pool(16) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

260 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [16]:
%%timeit

with Pool(32) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

382 ms ± 3.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [17]:
%%timeit

with Pool(64) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

634 ms ± 5.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
%%timeit

with Pool(cpu_count()-1) as p:
    answer = sum(p.map(if_prime, list(range(1000000))))

218 ms ± 3.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
