In [1]:
import numpy as np, pandas as pd, os, time

### 1. Introduction

**Parallelism** means performing multiple tasks or calculations at the same time, e.g. on different CPUs or cores.

**Concurrency** means the ability to execute tasks out of order, possibly at the same time. In a concurrent application, two tasks can start, run, and complete in overlapping time periods, i.e. Task-2 can start before Task-1 has completed.

How concurrency is achieved varies across architectures. In a single-core environment, concurrency is achieved via context switching. In a multi-core environment, it can also be achieved through parallelism.

The **synchronous** programming model creates and executes tasks in order: the next task starts only after the current task has finished.

The **asynchronous** programming model allows task switching: new tasks can be started without waiting for current tasks to finish.
- The asynchronous programming model helps to achieve concurrency. In a multi-threaded environment, it also allows parallelism.
- It can be achieved via context switching between OS threads, or via cooperative multitasking in user space.
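
To make the difference between the two models concrete, here is a minimal sketch that records the order in which two simulated tasks start and end. The function names (sync_task, async_task, run_sync, run_async) and the delays are illustrative assumptions, not part of any library. Note that asyncio.run cannot be called from a notebook cell where an event loop is already running; there, one would await the coroutine instead.

```python
import asyncio
import time

def sync_task(name, delay, log):
    log.append(f"{name} start")
    time.sleep(delay)            # blocks: nothing else can run meanwhile
    log.append(f"{name} end")

async def async_task(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)   # hands control back so other tasks can run
    log.append(f"{name} end")

def run_sync():
    # Synchronous model: B starts only after A has finished
    log = []
    sync_task("A", 0.05, log)
    sync_task("B", 0.01, log)
    return log

async def _run_both(log):
    await asyncio.gather(async_task("A", 0.05, log),
                         async_task("B", 0.01, log))

def run_async():
    # Asynchronous model: B starts while A is still waiting
    log = []
    asyncio.run(_run_both(log))
    return log
```

In the synchronous run the tasks never overlap, while in the asynchronous run B starts (and, with these delays, finishes) before A ends.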

---

<img src="Sync.png" width="80%"> 

<img src="Async.png" width="80%"> 

To summarize:
- Single Threaded and Multi-Threaded -> The environment of task execution. CPUs, cores, etc.
- Concurrency and Parallelism -> The way tasks are executed in the environment. 
- Synchronous and Asynchronous -> Programming model.

The above introduction is derived from this nice article: https://medium.com/swift-india/concurrency-parallelism-threads-processes-async-and-sync-related-39fd951bc61d

#### 1.1 What does Python offer?


- Parallelism across different CPUs using multiprocessing.Process and concurrent.futures.ProcessPoolExecutor.
- Concurrency, but unfortunately not parallelism, using the *threading* module and concurrent.futures.ThreadPoolExecutor.
    - Threads in general allow parallelism, but in standard CPython builds the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time.
- Asynchronous programming (cooperative multitasking) using the asyncio library.

### 2. Multiprocessing

There are two main libraries for multiprocessing: 
- concurrent.futures, which provides a high-level API and is easier to use
- multiprocessing, which gives more flexibility at the expense of more boilerplate code

Below I give two small examples using each of the libraries.

#### 2.1 Using concurrent.futures
- Provides ProcessPoolExecutor and ThreadPoolExecutor classes
    - Allows creating a pool of processes or threads. Distributing tasks to the pool and managing its processes is taken care of by the library.
    
- There are two ways to assign tasks to and gather results from the processes in the pool.
    - Executor.map function
    - Executor.submit function

**Executor.map(func, *iterables, timeout=None, chunksize=1)**
- Similar to map(func, *iterables)
- The callable func is called once per item of the iterables, and the calls are distributed across the processes in the pool. Results are returned in input order.

In [30]:
result = map(lambda x: x**2, [1,2,3,4,5])
list(result)

[1, 4, 9, 16, 25]

In [33]:
from concurrent.futures import ProcessPoolExecutor

# When processes are forked (the fork start method), each worker gets its own copy of global data; changes are not shared between processes
DIM = 10000

def worker(vector):
    time.sleep(0.01)  # simulates expensive work; comment this line out and rerun to compare
    return np.linalg.norm(vector)

def parallel():
    vectors = [np.random.rand(DIM) for i in range(1000)]
    with ProcessPoolExecutor(max_workers=4) as executor:  # using more workers than CPUs is usually not a good idea
        result_iter = executor.map(worker, vectors) 
        result = sum(result_iter)
        print(f'Sum of Norms: {result}')

Let's look at how fast it is compared to the sequential approach.

In [3]:
%%timeit -n 1 -r 5
parallel()

Sum of Norms: 57738.268572516885
Sum of Norms: 57733.0925081861
Sum of Norms: 57720.342560219
Sum of Norms: 57719.221496151535
Sum of Norms: 57730.91516878594
3.11 s ± 33.8 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)


In [4]:
def sequential():
    vectors = [np.random.rand(DIM) for i in range(1000)]
    result = sum(worker(v) for v in vectors)
    print(f'Sum of Norms: {result}')

In [5]:
%%timeit -n 1 -r 5
sequential()

Sum of Norms: 57745.07947708336
Sum of Norms: 57728.51933057455
Sum of Norms: 57737.54008351761
Sum of Norms: 57739.985365237895
Sum of Norms: 57746.92520893234
11.7 s ± 173 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)


Remarks:
- In the runs above, 4 workers give roughly a 3.8x speedup (11.7 s vs 3.11 s).
- Parallelism pays off most when the execution time per task is much larger than the communication time between processes.
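
One way to reduce communication overhead for many small tasks is the chunksize argument of Executor.map, which sends items to the workers in batches rather than one at a time. The sketch below is only an illustration: the helper name total_sqrt is hypothetical, math.sqrt stands in for a real workload, and the "fork" start method is an assumption that holds on POSIX systems only.

```python
import math
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def total_sqrt(n, chunksize):
    # Assumption: the "fork" start method is available (POSIX only)
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        # Items are shipped to the workers in batches of `chunksize`,
        # so fewer messages cross the process boundary
        return sum(executor.map(math.sqrt, range(n), chunksize=chunksize))
```

For example, total_sqrt(1000, chunksize=100) submits 10 batches of 100 items instead of 1000 separate tasks.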

---

**Executor.submit(func, *args, **kwargs)**
- Schedules the callable, func, to be executed as func(*args, **kwargs) and returns a Future object representing the execution of the callable.
- A Future object encapsulates the asynchronous execution of a callable: results as well as exceptions are wrapped into the object.


In [40]:
from concurrent.futures import as_completed

# An example showing how exceptions raised in a worker are captured and sent back to the parent process
def worker(inp):
    if np.random.random() < 0.3:
        raise Exception('Bad luck!')
    return {'pid': os.getpid(), 'result': inp*inp}

def master():
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(worker, i) for i in range(10)}
        
        while len(futures) > 0:
            completed_futures = [f for f in futures if f.done()]
            for future in completed_futures:
                exception = future.exception()
                if exception is None:
                    print('Result: %s' % future.result())
                else:
                    print('Exception: %s' % exception)
                futures.remove(future)

In [41]:
master()

Result: {'pid': 39487, 'result': 0}
Result: {'pid': 39490, 'result': 9}
Exception: Bad luck!
Exception: Bad luck!
Exception: Bad luck!
Result: {'pid': 39488, 'result': 1}
Result: {'pid': 39487, 'result': 64}
Result: {'pid': 39489, 'result': 4}
Result: {'pid': 39488, 'result': 81}
Exception: Bad luck!


The following is an easier way to do the same thing, using as_completed:

In [23]:
from concurrent.futures import as_completed

def worker(inp):
    if np.random.random() < 0.3:
        raise Exception('Bad luck!')
    return {'pid': os.getpid(), 'result': inp*inp}

def master():
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(worker, i) for i in range(10)]
        for future in as_completed(futures):
            exception = future.exception()
            if exception is None:
                print('Result: %s' % future.result())
            else:
                print('Exception: %s' % exception)

In [24]:
master()

Result: {'pid': 39398, 'result': 0}
Result: {'pid': 39400, 'result': 4}
Result: {'pid': 39401, 'result': 9}
Result: {'pid': 39399, 'result': 1}
Exception: Bad luck!
Result: {'pid': 39400, 'result': 64}
Result: {'pid': 39400, 'result': 81}
Exception: Bad luck!
Exception: Bad luck!
Exception: Bad luck!


Remarks:
- The Future object has other useful methods, such as add_done_callback.
- The *timeout* argument of the result and exception methods is also very useful.
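
A minimal sketch of both remarks follows. It uses ThreadPoolExecutor only to keep the example portable (the Future API is the same for both executors); the names slow_square and demo and the delays are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow_square(x, delay):
    time.sleep(delay)
    return x * x

def demo():
    events = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        future = executor.submit(slow_square, 3, 0.3)
        # The callback is called with the finished Future as its only argument
        future.add_done_callback(lambda f: events.append(("done", f.result())))
        try:
            future.result(timeout=0.01)  # too short: raises TimeoutError
        except TimeoutError:
            events.append(("timeout", None))
        value = future.result()          # no timeout: blocks until finished
    # Leaving the with-block waits for the worker, so the callback has run by now
    return value, events
```

A timeout does not cancel the task: the Future keeps running, and a later call to result() still returns its value.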

#### 2.3 Using multiprocessing

In [8]:
from multiprocessing import Queue, Process
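
The cell above only imports Queue and Process. As a rough sketch of how the two are commonly combined, the example below starts a few worker processes that read tasks from one queue and write results to another. The names worker and run, the None sentinel, and the use of the "fork" start method (POSIX only) are assumptions made for this illustration.

```python
import multiprocessing as mp

def worker(task_queue, result_queue):
    # Pull tasks until the sentinel None arrives, then exit
    for x in iter(task_queue.get, None):
        result_queue.put((x, x * x))

def run(inputs, n_workers=2):
    ctx = mp.get_context("fork")  # assumption: POSIX system
    task_queue, result_queue = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(task_queue, result_queue))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for x in inputs:
        task_queue.put(x)
    for _ in procs:
        task_queue.put(None)      # one sentinel per worker
    # Drain the results before joining, so workers never block on a full queue
    results = dict(result_queue.get() for _ in inputs)
    for p in procs:
        p.join()
    return results
```

Compared with ProcessPoolExecutor, the queues, sentinels, and joins must all be managed by hand, which is the extra boilerplate mentioned earlier.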

#### 2.4 Pure python vs zeromq

An excellent guide to ZeroMQ, a lightweight networking library and concurrency framework:
- http://zguide.zeromq.org/page:all

To install zeromq and python bindings:
- conda install zeromq pyzmq

### 3. Asynchronous Programming using asyncio

#### 3.0 An example of undesirable context-switching when using Threading

In [9]:
from concurrent.futures import ThreadPoolExecutor

def print_func(s):
    for i in range(10):
        print(s)
        time.sleep(0.1)

    
with ThreadPoolExecutor(max_workers=3) as executor:  # change num of workers
    executor.submit(print_func, '0'*50)
    executor.submit(print_func, '1'*50)

00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
0000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111

1111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000

11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
0000000000000000000000000000000

#### 3.1 Generators
Generators are functions that generate values. A regular function returns a value and then its scope is destroyed; when we call it again, it starts from scratch. A generator function, in contrast, can yield a value and pause its execution. Control is returned to the calling scope, and execution resumes from the same point when the next value is requested.

In [10]:
def gen_squares(max_value=None):
    n = 0
    while True:
        yield n * n
        n += 1
        # With max_value=None the generator never stops
        if max_value is not None and n * n > max_value:
            break

A generator function doesn’t directly return any values; calling it returns a generator object, which is an iterator. We can call next() on it to get values one at a time, or iterate over it with a for loop.

In [11]:
squares = gen_squares()

In [12]:
next(squares)

0

In [13]:
squares_upto_100 = gen_squares(max_value=100)

In [14]:
next(squares_upto_100)

0

In [15]:
for s in gen_squares(max_value=200):
    print(s)

0
1
4
9
16
25
36
49
64
81
100
121
144
169
196
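
To see the pause-and-resume behaviour explicitly, the following sketch logs what happens inside and outside a generator. The names countdown and trace are hypothetical and used only for this illustration.

```python
def countdown(n, log):
    log.append("started")          # runs only on the first next() call
    while n > 0:
        yield n                    # pause here and hand n to the caller
        log.append(f"resumed after {n}")
        n -= 1
    log.append("finished")

def trace():
    log = []
    gen = countdown(2, log)        # creates the generator; no code runs yet
    log.append("created")
    log.append(next(gen))          # runs up to the first yield
    log.append(next(gen))          # resumes after the first yield
    try:
        next(gen)                  # the body ends, so StopIteration is raised
    except StopIteration:
        log.append("StopIteration")
    return log
```

Calling trace() returns ['created', 'started', 2, 'resumed after 2', 1, 'resumed after 1', 'finished', 'StopIteration'], which shows that the body runs only between next() calls.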


#### 3.2 Naive task switching using generators

In [17]:
from itertools import cycle

def print_func(s):
    for i in range(10):
        print(s)
        time.sleep(np.random.random())
        yield None
        
def print_func2():
    for i in range(20):
        print('+'*50)
        time.sleep(np.random.random())
        yield None

def loop(generators):
    for gen in cycle(generators): # [A, B ,C] -> A, B, C, A, B, C, ...
        try:
            next(gen)  # run until the next yield; the yielded value is ignored
        except StopIteration:
            break  # naive: stops everything as soon as the first generator finishes

loop([print_func('0'*50), 
      print_func('1'*50), 
      print_func2()])

00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
1111111111111111111111111111111

#### 3.3 Async programming using Coroutines

- Event Loop
- Coroutines
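
As a minimal sketch of how the two pieces fit together: calling a coroutine function only creates a coroutine object, and the event loop is what actually runs it. The names double, main, and run_demo are illustrative assumptions. asyncio.run starts a fresh event loop, so it works in a plain script but not inside a notebook cell where a loop is already running; there, one would await main() instead.

```python
import asyncio

async def double(x):
    await asyncio.sleep(0.01)   # give control back to the event loop
    return 2 * x

async def main():
    coro = double(1)                      # nothing has run yet
    is_coro = asyncio.iscoroutine(coro)
    first = await coro                    # the event loop now runs it
    # gather runs both coroutines concurrently and keeps the input order
    rest = await asyncio.gather(double(2), double(3))
    return is_coro, first, rest

def run_demo():
    # asyncio.run creates an event loop, runs main() to completion, and closes it
    return asyncio.run(main())
```

Here run_demo() returns (True, 2, [4, 6]).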

Revisiting the context-switching example

In [18]:
import asyncio

async def print_func(s):
    for i in range(10):
        print(s)
        await asyncio.sleep(np.random.random())
    
asyncio.ensure_future(print_func('0'*50))
asyncio.ensure_future(print_func('1'*50))
asyncio.ensure_future(print_func('+'*50))

<Task pending coro=<print_func() running at <ipython-input-18-1e6a37addcb3>:3>>

00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
11111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
++++++++++++++++++++++++++++++++++++++++++++++++++
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
++++++++++++++++++++++++++++++++++++++++++++++++++
11111111111111111111111111111111111111111111111111
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
++++++++++++++++++++++++++++++++++++++++++++++++++
11111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111
++++++++++++++++++++++++++++++++++++++++++++++++++
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
0000000000000000000000000000000

#### 3.5 Small example: Fetching webpages

In [19]:
import requests

URLS = ['https://facebook.com',
        'https://github.com',
        'https://google.com',
        'https://microsoft.com',
        'https://yahoo.com']

def sequential(urls):
    start = time.time()
    for url in urls:
        r = requests.get(url)
    print('Elapse time: %s' % (time.time()-start))

In [20]:
sequential(URLS)

Elapse time: 10.780002117156982


In [21]:
import asyncio
from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        resp = await response.read()
        return resp
    
async def fetch_all(urls):
    tasks = []
    start = time.time()
    async with ClientSession() as session:
        for url in urls:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task) 
        await asyncio.gather(*tasks) 
    print('Elapse time: %s' % (time.time()-start))

In [22]:
future = asyncio.ensure_future(fetch_all(URLS)) # not blocking
loop = asyncio.get_event_loop()
loop.run_until_complete(future) # blocking

RuntimeError: This event loop is already running

The error occurs because Jupyter already runs an event loop, so run_until_complete cannot be called on it. Inside a notebook it is enough to schedule the coroutine with ensure_future (recent IPython versions also allow awaiting it directly in a cell):

In [None]:
asyncio.ensure_future(fetch_all(URLS))

Elapse time: 1.7943570613861084


### 4. Cool JupyterLab Trick