Pierre Navaro - [Institut de Recherche Mathématique de Rennes](https://irmar.univ-rennes1.fr) - [CNRS](http://www.cnrs.fr/)

[![nbviewer](https://img.shields.io/badge/render-nbviewer-orange.svg)](http://nbviewer.jupyter.org/github/pnavaro/big-data/blob/master/03.ParallelComputation.ipynb)

# Parallel Computation

## Parallel computers
- Multiprocessor/multicore: several processors work on data stored in shared memory
- Cluster: several processor/memory units work together by exchanging data over a network
- Co-processor: a general-purpose processor delegates specific tasks to a special-purpose processor (GPU, Xeon Phi,...)


## Parallel Programming
- Decomposition of the complete task into independent subtasks and the data flow between them.
- Distribution of the subtasks over the processors minimizing the total execution time.
- For clusters: distribution of the data over the nodes minimizing the communication time.
- For multiprocessors: optimization of the memory access patterns minimizing waiting times.
- Synchronization of the individual processes.

## MapReduce

In [1]:
from time import sleep
def f(x):
    sleep(1)
    return x*x
L = list(range(8))
L

[0, 1, 2, 3, 4, 5, 6, 7]

In [2]:
%time sum([f(x) for x in L])

CPU times: user 10 ms, sys: 0 ns, total: 10 ms
Wall time: 8.02 s


140

In [3]:
%time sum(map(f,L))

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 8.02 s


140
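
In the cells above, `map` applies `f` to every item and `sum` combines the mapped values into a single result, which is the "reduce" step. A minimal sketch of the same computation with an explicit reduce, using `functools.reduce` from the standard library; the names `square` and `total` are illustrative and not part of the notebook, and the `sleep` call is left out so the example runs instantly:

```python
from functools import reduce
from operator import add

def square(x):
    """Same computation as f above, without the artificial delay."""
    return x * x

L = list(range(8))

mapped = map(square, L)          # map step: lazily yields 0, 1, 4, 9, ...
total = reduce(add, mapped, 0)   # reduce step: ((0 + 0) + 1) + 4 + ...
print(total)
```

Because each call to `square` is independent of the others, the map step is the part that can be distributed over several workers.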

## Multiprocessing 

<p>
<font color=red> This first part with multiprocessing does not work
    on Windows </font>
    </p>


The `multiprocessing` module allows the programmer to fully leverage multiple processors.
- The `Pool` object parallelizes the execution of a function across multiple input values.
- The `if __name__ == '__main__'` part is necessary: worker processes may import the main module, and the guard keeps them from creating the pool again.
- The `Pool` class provides a `map` method that automatically partitions the input and distributes it to a user-specified function running in a pool of worker processes.

In [4]:
from multiprocessing import cpu_count

cpu_count()

4

In [5]:
%%time 
from multiprocessing import Pool

if __name__ == '__main__': # Executed only on main process.
    with Pool() as p:
        print(sum(p.map(f, L))) # Apply f on L sequence and sum


140
CPU times: user 30 ms, sys: 20 ms, total: 50 ms
Wall time: 2.08 s


- `Pool()` launches one worker process per CPU reported by `os.cpu_count()` (logical cores, not necessarily physical processors).
- `pool.map(...)` divides the input list into chunks and puts the tasks (function + chunk) on a queue.
- Each worker process takes a task (function + a chunk of data), runs `map(function, chunk)`, and puts the result on a result list.
- `pool.map` on the main process waits until all tasks are handled and returns the concatenation of the result lists.
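
The steps above can be modelled in plain Python. The sketch below is a sequential stand-in written for illustration only: `split_into_chunks` and `model_pool_map` are hypothetical helpers, not part of `multiprocessing`, and the real `Pool` picks its own chunk sizes (or uses the `chunksize` argument of `map`) and runs the chunks in separate processes.

```python
def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks of similar size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        stop = start + size + (1 if k < extra else 0)
        if stop > start:                      # skip empty chunks
            chunks.append(items[start:stop])
        start = stop
    return chunks

def model_pool_map(function, items, n_workers=4):
    """Sequential model of pool.map: chunk, map each chunk, concatenate."""
    chunks = split_into_chunks(items, n_workers)                 # the tasks
    partial_results = [list(map(function, c)) for c in chunks]   # one list per task
    return [y for part in partial_results for y in part]         # concatenation

def square(x):
    return x * x

print(split_into_chunks(list(range(8)), 4))
print(model_pool_map(square, list(range(8))))
```

The order of the results matches the order of the input because the chunks are contiguous and concatenated in order, which is also what `pool.map` guarantees.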

### Exercise 3.1

- Use the `paragraph` function from the `lorem` module to create a text.
- Create a list of words from it.
- Use the `map` method of `multiprocessing.Pool` to compute the length of each word.
- Compare the time with the sequential version.


In [6]:
%%time
from lorem import paragraph

words_list = paragraph().lower().replace('.','').split()
print(*map(len,words_list))

4 7 7 7 5 5 2 7 10 4 3 5 7 7 6 3 8 8 5 3 5 4 10 5 10 3 4 7 3 7 10
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 5.99 ms


In [7]:
%%time 
from multiprocessing import Pool

if __name__ == '__main__': # Executed only on main process.
    with Pool() as p:
        results = p.map(len, words_list)  # Apply len to each word of words_list

print(*results)

4 7 7 7 5 5 2 7 10 4 3 5 7 7 6 3 8 8 5 3 5 4 10 5 10 3 4 7 3 7 10
CPU times: user 20 ms, sys: 10 ms, total: 30 ms
Wall time: 137 ms


## Thread and Process: Differences

- A process is an instance of a running program.
- A process may contain one or more threads, but a thread cannot contain a process.
- A process has a self-contained execution environment with its own memory space.
- An application running on your computer may be a set of cooperating processes.

- A thread exists within a process; every process has at least one.
- Multiple threads in a process share resources, which makes communication between threads efficient.
- On a multi-core system, threads can run concurrently, with each core executing a separate thread.
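
A small sketch of the shared-memory point, using only the standard `threading` module. The names `shared`, `lock`, and `record` are illustrative. Every thread appends to the same list and reports the same process id, because all of them live inside one process:

```python
import os
import threading

shared = []                 # one list, visible to every thread
lock = threading.Lock()     # protects the list while it is modified

def record(tag):
    """Append this thread's tag and the current process id to the shared list."""
    with lock:
        shared.append((tag, os.getpid()))

threads = [threading.Thread(target=record, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for _, pid in shared}
print(len(shared), "entries written from", len(pids), "process")
```

Separate processes would each work on their own copy of `shared`, so results have to be sent back explicitly.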




## The Global Interpreter Lock (GIL)

- The Python interpreter is not thread safe.
- A few critical internal data structures may only be accessed by one thread at a time. Access to them is protected by the GIL.
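- Attempts at removing the GIL from Python failed for a long time. The main difficulty is maintaining the C API for extension modules. Since Python 3.13 an experimental free-threaded build (PEP 703) can run without the GIL, but it is not the default.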
- Multiprocessing avoids the GIL by having separate processes which each have an independent copy of the interpreter data structures.
- The price to pay: serialization of tasks, arguments, and results.
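
To make the serialization cost concrete, here is a minimal sketch with the standard `pickle` module, which `multiprocessing` relies on to send tasks to workers. The variable names are illustrative. A task (function + chunk) is turned into bytes and rebuilt on the other side; a `lambda` cannot be sent this way because it has no importable name.

```python
import pickle

# A named function such as the builtin len is pickled by reference
# (module and name only); the arguments are pickled by value.
task = pickle.dumps((len, ["ab", "cde"]))
function, chunk = pickle.loads(task)
print(list(map(function, chunk)))

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x * x)
    lambda_is_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_is_picklable = False
print(lambda_is_picklable)
```

This is why functions passed to a process pool are usually defined at module level, and why large arguments or results can dominate the run time.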

## Futures

The `concurrent.futures` module provides a high-level interface for asynchronously executing callables.



The asynchronous execution can be performed with:
- **threads**, using ThreadPoolExecutor, 
- separate **processes**, using ProcessPoolExecutor. 
Both implement the same interface, which is defined by the abstract Executor class.
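
Besides `map`, the `Executor` interface offers `submit`, which schedules a single call and returns a `Future`. A minimal sketch with `ThreadPoolExecutor`; the function `square` and the dictionary names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit schedules one call and immediately returns a Future
    futures = {executor.submit(square, x): x for x in range(8)}
    results = {}
    # as_completed yields each Future as soon as its result is ready
    for future in as_completed(futures):
        results[futures[future]] = future.result()

print(sum(results.values()))
```

Unlike `map`, `as_completed` returns results in completion order, so the example keeps a mapping from each future back to its input.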

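On Windows, `ProcessPoolExecutor` may fail in an interactive session such as a notebook, because functions defined there cannot be imported by the worker processes. Windows users can install
[loky](https://github.com/tomMoral/loky), which provides a compatible `ProcessPoolExecutor`.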

In [8]:
#!pip install loky  # Windows users will need to install loky

In [9]:
%%time
from concurrent.futures import ProcessPoolExecutor
# from loky import ProcessPoolExecutor  # for Windows users
e = ProcessPoolExecutor()

results = sum(e.map(f, L))
print(results)

140
CPU times: user 20 ms, sys: 10 ms, total: 30 ms
Wall time: 2.03 s


In [10]:
%%time
from concurrent.futures import ThreadPoolExecutor
e = ThreadPoolExecutor()

results = sum(e.map(f, L))
print(results)

140
CPU times: user 10 ms, sys: 0 ns, total: 10 ms
Wall time: 1.02 s
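
The thread version finishes in about one second because `sleep` releases the GIL while it waits, and the default thread pool is larger than the number of cores (in recent Python versions, `min(32, cpu count + 4)` workers), so all eight calls can wait at the same time. A sketch that makes the pool size explicit; `slow_square` and `timed_sum` are hypothetical helpers, and a shorter delay is used to keep the run brief:

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def slow_square(x, delay=0.1):
    sleep(delay)   # sleeping releases the GIL, so other threads can run
    return x * x

def timed_sum(max_workers, values):
    """Return (sum of squares, elapsed seconds) for a given pool size."""
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        total = sum(executor.map(slow_square, values))
    return total, perf_counter() - start

values = list(range(8))
for workers in (1, 4, 8):
    total, elapsed = timed_sum(workers, values)
    print(workers, "worker(s):", total, f"{elapsed:.2f} s")
```

With a CPU-bound function instead of `sleep`, adding threads would not give the same gain, since only one thread can execute Python bytecode at a time.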


### Exercise 3.2

- Use `ProcessPoolExecutor` to compute each word length.
- Use `map` to apply `len` function on each word of a text created with lorem module.

In [11]:
from lorem import text
from concurrent.futures import ProcessPoolExecutor
#from loky import ProcessPoolExecutor  # for Windows

texte = text()

e = ProcessPoolExecutor(4)
word_lengths = e.map(len, texte.split())
print(*word_lengths)

5 8 6 7 6 7 6 4 5 7 2 6 9 5 2 5 6 7 10 4 8 2 3 6 7 6 9 4 7 5 3 4 7 10 7 8 8 11 3 10 4 8 5 11 3 4 4 3 4 4 7 3 5 4 5 5 3 7 6 6 6 4 6 2 7 6 10 7 6 4 6 8 7 5 5 7 7 4 3 7 7 4 2 3 7 6 7 5 5 7 7 6 7 8 4 6 3 8 7 8 5 6 7 7 11 6 5 11 11 11 4 7 6 7 4 5 7 8 4 3 10 2 6 5 5 3 4 3 6 8 3 3 6 5 7 3 2 4 4 5 7 5 8 4 8 3 2 3 3 7 8 4 3 3 9 7 3 8 5 4 5 4 3 10 7 7 4 4 4 7 5 3 11 7 8 5 5 4 2 10 6 11 5 4 7 7 6 11 3 3 5 7 7 6 4 8 4 5 8 11 4 2 7 5 10 7 7


### Exercise 3.3

Same as exercise 3.2 but use `ThreadPoolExecutor`.

In [12]:
from lorem import text
from concurrent.futures import ThreadPoolExecutor

texte = text()

e = ThreadPoolExecutor(4)
word_lengths = e.map(len, texte.split())
print(*word_lengths)

3 4 8 11 6 2 6 12 6 7 5 4 7 10 7 8 6 8 7 10 8 8 7 8 5 7 6 5 11 7 10 4 7 10 6 4 4 7 6 7 6 6 3 5 3 5 6 12 3 6 8 6 10 4 3 7 5 11 5 3 6 8 4 8 2 7 7 8 7 4 2 6 4 7 8 5 8 8 5 8 3 3 6 6 4 8 7 12 6 3 5 4 7 5 7 8 6 3 7 9 7 5 7 4 3 7 6 8 10 7 7 5 6 10 4 5 5 7 5 5 6 6 5 3 7 5 3 3 3 7 6 5 8 6 2 7 3 3 7 8 5 5 7 4 6 4 7 8 5 4 7 4 5 4 10 3 5 8 7 5 3 4 11 6 4 4 7 7 10 8 3 8 5 4 10 7 11 7 7 5 8 4 7 7 6 4 4 10 2 3 10 2 10 4 2 10 10 4 7 3 10 3 8 9 11 7 4 7 3 4 8 5 11 7 11 8 3 5 6 5 8 5 5 7 6 5 5 6 7 4 6 8 5 11 5 7 3 4 3 2 10 12 7 10 10 9 6 8 11 2 5 10 3 11 6 4 5 6 8 7 5 3 5 10 5 6 5 3 3 7


# Map

This version of the `words` function contains some improvements and prints
the name of the process in which the function is executed.

In [13]:
import string
import multiprocessing as mp  # Windows users should comment this line
def words_mp(file):
    """
    Check if file is utf8
    Read a text file and return a sorted list of (word, 1) values.
    """
    # Windows users should comment this line below
    print(mp.current_process().name, 'reading', file)
    translator = str.maketrans('', '', string.punctuation)
    output = []
    try:
        with open(file) as f:
            for line in f:   
                line = line.strip()
                line = line.translate(translator)
                for word in line.split():
                    if word.isalpha():
                        word = word.lower()
                        output.append((word, 1))
                        
    except UnicodeDecodeError as err:
        print("Some error occurred decoding file %s: %s" % (file, err))
                
    output.sort()
    return output

words_mp('sample.txt')

MainProcess reading sample.txt


[('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolor', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('etincidunt', 1),
 ('etincidunt', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('labore', 1),
 ('labore', 

# Partition
Before the parallel reduce operation, the data must be grouped by key in a container. We create a function named `partition_mp` that stores the key/value pairs from `words_mp` in a `defaultdict` from the `collections` module. The output has the form:
[('word1', [1, 1]), ('word2', [1]), ('word3', [1, 1, 1])]

In [14]:
import collections
def partition_mp(mapped_values):
    """
        Organize the mapped values by their key.
        Returns an unsorted sequence of tuples 
        with a key and a sequence of values.
    """
    partitioned_data = collections.defaultdict(list)
    for key, value in mapped_values:
        partitioned_data[key].append(value)
    return partitioned_data.items()

In [15]:
partition_mp(words_mp('sample.txt'))

MainProcess reading sample.txt


dict_items([('aliquam', [1, 1, 1, 1, 1, 1, 1, 1]), ('amet', [1, 1, 1]), ('consectetur', [1, 1, 1, 1, 1]), ('dolor', [1, 1, 1, 1, 1, 1, 1, 1, 1]), ('dolore', [1, 1, 1, 1, 1, 1, 1]), ('dolorem', [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), ('eius', [1, 1, 1, 1, 1, 1, 1]), ('est', [1, 1, 1, 1, 1, 1, 1, 1]), ('etincidunt', [1, 1]), ('ipsum', [1, 1]), ('labore', [1, 1, 1, 1, 1, 1, 1, 1]), ('magnam', [1, 1, 1, 1, 1, 1, 1]), ('modi', [1, 1, 1, 1, 1, 1, 1]), ('neque', [1, 1, 1, 1, 1, 1]), ('non', [1, 1, 1, 1, 1]), ('numquam', [1, 1, 1, 1, 1, 1, 1, 1, 1]), ('porro', [1, 1, 1, 1, 1]), ('quaerat', [1, 1, 1, 1]), ('quiquia', [1, 1, 1, 1, 1, 1, 1]), ('quisquam', [1, 1, 1, 1, 1, 1]), ('sed', [1, 1, 1, 1, 1, 1, 1, 1, 1]), ('sit', [1, 1, 1]), ('tempora', [1, 1, 1, 1, 1, 1]), ('ut', [1, 1, 1, 1, 1, 1, 1, 1]), ('velit', [1, 1, 1, 1, 1, 1]), ('voluptatem', [1, 1, 1, 1, 1, 1])])

# Reduce

In [16]:
def reduce_mp(item):
    """Convert the partitioned data for a word to a
    tuple containing the word and the number of occurrences.
    """
    word, occurrences = item
    return (word, len(occurrences))

In [17]:
for item in partition_mp(words_mp('sample.txt')):
    print(reduce_mp(item))

MainProcess reading sample.txt
('aliquam', 8)
('amet', 3)
('consectetur', 5)
('dolor', 9)
('dolore', 7)
('dolorem', 10)
('eius', 7)
('est', 8)
('etincidunt', 2)
('ipsum', 2)
('labore', 8)
('magnam', 7)
('modi', 7)
('neque', 6)
('non', 5)
('numquam', 9)
('porro', 5)
('quaerat', 4)
('quiquia', 7)
('quisquam', 6)
('sed', 9)
('sit', 3)
('tempora', 6)
('ut', 8)
('velit', 6)
('voluptatem', 6)
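
Taken together, the map, partition, and reduce steps amount to counting words. As a cross-check, `collections.Counter` from the standard library gives the same counts. The sketch below works on a small in-memory sample; the `mapped` pairs are made up for illustration and do not come from `sample.txt`:

```python
import collections

# Hypothetical output of a map step: sorted (word, 1) pairs
mapped = [("amet", 1), ("dolor", 1), ("dolor", 1),
          ("sit", 1), ("sit", 1), ("sit", 1)]

# Partition: group the values by key
partitioned = collections.defaultdict(list)
for word, one in mapped:
    partitioned[word].append(one)

# Reduce: one (word, count) tuple per key
reduced = [(word, len(ones)) for word, ones in partitioned.items()]

# Counter performs the grouping and the counting in a single pass
counted = collections.Counter(word for word, _ in mapped)

print(reduced)
print(sorted(counted.items()) == sorted(reduced))
```

The three-step version is longer, but its map and reduce steps are independent function calls, which is what makes them easy to hand to a pool of workers.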


In [18]:
%%time
import itertools   
import glob

# Sequential version

filenames = glob.glob("sample0*.txt")
mapped_values = map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = map(reduce_mp, partitioned_data)

MainProcess reading sample00.txt
MainProcess reading sample01.txt
MainProcess reading sample02.txt
MainProcess reading sample03.txt
MainProcess reading sample04.txt
MainProcess reading sample05.txt
MainProcess reading sample06.txt
MainProcess reading sample07.txt
CPU times: user 0 ns, sys: 10 ms, total: 10 ms
Wall time: 29.3 ms


In [19]:
print(*results)

('adipisci', 52) ('aliquam', 47) ('amet', 58) ('consectetur', 45) ('dolor', 51) ('dolore', 59) ('dolorem', 58) ('eius', 44) ('est', 53) ('etincidunt', 54) ('ipsum', 67) ('labore', 68) ('magnam', 60) ('modi', 69) ('neque', 57) ('non', 55) ('numquam', 57) ('porro', 46) ('quaerat', 48) ('quiquia', 51) ('quisquam', 61) ('sed', 54) ('sit', 48) ('tempora', 59) ('ut', 60) ('velit', 61) ('voluptatem', 61)


### Exercise 3.4

Write a parallel program that combines the three functions above with `multiprocessing`. It reads all the "sample\*.txt" files. Some hints:
- Map and reduce steps are parallel.
- See how `itertools.chain(*mapped_values)` is used in notebook exercise 01.6.
- Compare the execution time with the notebook 01 version.

In [20]:
%%time
import itertools   
import glob
import operator
from multiprocessing import Pool

filenames = glob.glob("sample0*.txt")

# Parallel version
if __name__ == '__main__':
    with Pool() as p:
        mapped_values = p.map(words_mp, filenames)
        partitioned_data = partition_mp(itertools.chain(*mapped_values))
        results = p.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

ForkPoolWorker-22 reading sample01.txt
ForkPoolWorker-24 reading sample03.txt
ForkPoolWorker-21 reading sample00.txt
ForkPoolWorker-23 reading sample02.txt
ForkPoolWorker-23 reading sample05.txt
ForkPoolWorker-21 reading sample06.txt
ForkPoolWorker-22 reading sample04.txt
ForkPoolWorker-24 reading sample07.txt
[('modi', 69), ('labore', 68), ('ipsum', 67), ('quisquam', 61), ('velit', 61), ('voluptatem', 61), ('magnam', 60), ('ut', 60), ('dolore', 59), ('tempora', 59), ('amet', 58), ('dolorem', 58), ('neque', 57), ('numquam', 57), ('non', 55), ('etincidunt', 54), ('sed', 54), ('est', 53), ('adipisci', 52), ('dolor', 51), ('quiquia', 51), ('quaerat', 48), ('sit', 48), ('aliquam', 47), ('porro', 46), ('consectetur', 45), ('eius', 44)]
CPU times: user 40 ms, sys: 10 ms, total: 50 ms
Wall time: 133 ms


### Exercise 3.5

- Replace `multiprocessing` with `concurrent.futures` functions.
- Try both `ProcessPoolExecutor` and `ThreadPoolExecutor`.

In [21]:
%%time
import itertools   
import glob
import operator
from concurrent.futures import ThreadPoolExecutor

# Parallel version
e = ThreadPoolExecutor()
filenames = glob.glob("sample0*.txt")
mapped_values = e.map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = e.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

MainProcess reading MainProcesssample00.txt 
reading sample01.txt
MainProcess reading sample02.txt
MainProcess reading sample03.txt
MainProcess MainProcessreading sample04.txt
 reading MainProcess readingsample05.txt 
sample06.txt
MainProcess reading sample07.txt
[('modi', 69), ('labore', 68), ('ipsum', 67), ('quisquam', 61), ('velit', 61), ('voluptatem', 61), ('magnam', 60), ('ut', 60), ('dolore', 59), ('tempora', 59), ('amet', 58), ('dolorem', 58), ('neque', 57), ('numquam', 57), ('non', 55), ('etincidunt', 54), ('sed', 54), ('est', 53), ('adipisci', 52), ('dolor', 51), ('quiquia', 51), ('quaerat', 48), ('sit', 48), ('aliquam', 47), ('porro', 46), ('consectetur', 45), ('eius', 44)]
CPU times: user 10 ms, sys: 30 ms, total: 40 ms
Wall time: 39.1 ms


In [22]:
%%time
import itertools   
import glob
import operator
from concurrent.futures import ProcessPoolExecutor

# Parallel version
e = ProcessPoolExecutor()
filenames = glob.glob("sample0*.txt")
mapped_values = e.map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = e.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

Process-26 reading sample01.txt
Process-28 reading sample03.txt
Process-27 reading sample02.txt
Process-25 reading sample00.txt
Process-26 reading sample05.txt
Process-28 reading sample06.txt
Process-25 reading sample07.txt
Process-27 reading sample04.txt
[('modi', 69), ('labore', 68), ('ipsum', 67), ('quisquam', 61), ('velit', 61), ('voluptatem', 61), ('magnam', 60), ('ut', 60), ('dolore', 59), ('tempora', 59), ('amet', 58), ('dolorem', 58), ('neque', 57), ('numquam', 57), ('non', 55), ('etincidunt', 54), ('sed', 54), ('est', 53), ('adipisci', 52), ('dolor', 51), ('quiquia', 51), ('quaerat', 48), ('sit', 48), ('aliquam', 47), ('porro', 46), ('consectetur', 45), ('eius', 44)]
CPU times: user 40 ms, sys: 20 ms, total: 60 ms
Wall time: 76.9 ms


For multi-process computations you can use either `multiprocessing.Pool` or the `concurrent.futures` executors, which behave more or less identically.

However, most library designers today are coordinating around the second interface, so it is wise to move over.

`concurrent.futures.ProcessPoolExecutor` is suitable for simple parallelism across many files. It gives a speed boost when processing each file takes long enough to outweigh the cost of starting processes and serializing data.
Describing each task as a function call makes it possible to use tools like `map` for parallelism.
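
Because both executors implement the same interface, the choice between threads and processes can be reduced to a single argument. A minimal sketch under that assumption; `count_words` and `total_words` are hypothetical helpers, not functions from the notebook:

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(line):
    """Task for one input item: number of whitespace-separated words."""
    return len(line.split())

def total_words(executor_class, lines, max_workers=2):
    """Run count_words over lines with any Executor implementation."""
    with executor_class(max_workers=max_workers) as executor:
        return sum(executor.map(count_words, lines))

lines = ["lorem ipsum dolor", "sit amet", "consectetur"]
print(total_words(ThreadPoolExecutor, lines))
```

Passing `ProcessPoolExecutor` instead of `ThreadPoolExecutor` requires no other change to `total_words`; in a script, that call should sit under `if __name__ == '__main__':`.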

## Increase volume of data

### Getting the data

[The Latin Library](http://www.thelatinlibrary.com/) contains a huge collection of freely accessible Latin texts. We collect the links on the Latin Library's homepage, ignoring those that are not associated with a particular author.




In [23]:
%%file credentials.py
login = "your sesame login"
password = "your sesame password"

Overwriting credentials.py


In [24]:
from bs4 import BeautifulSoup
from urllib.request import *
from credentials import *


#Uncomment lines below to enable proxy
#proxy_url = login+':'+password+'@192.168.192.17:8080'
#proxy = ProxyHandler({'http': 'http://'+proxy_url, 'https': 'https://'+proxy_url })
#auth = HTTPBasicAuthHandler()
#opener = build_opener(proxy, auth, HTTPHandler)
#install_opener(opener)

base_url = "http://www.thelatinlibrary.com/"
home_content = urlopen(base_url)

soup = BeautifulSoup(home_content, "lxml")
author_page_links = soup.find_all("a")
author_pages = [ap["href"] for i, ap in enumerate(author_page_links) if i < 49]

Create a list of all links pointing to Latin texts. The Latin Library uses a special format which makes it easy to find the corresponding links: All of these links contain the name of the text author.

In [25]:
ap_content = list()
for ap in author_pages:
    ap_content.append(urlopen(base_url + ap))


In [26]:
book_links = list()
for path, content in zip(author_pages, ap_content):
    author_name = path.split(".")[0]
    ap_soup = BeautifulSoup(content, "lxml")
    book_links += ([link for link in ap_soup.find_all("a", {"href": True}) if author_name in link["href"]])

print(book_links[:5])


In [27]:
from urllib.error import HTTPError  # not exported by `from urllib.request import *`

texts = list()
num_pages = 100

for i, bl in enumerate(book_links[:num_pages]):
    print("Getting content " + str(i + 1) + " of " + str(num_pages), end="\r", flush=True)
    try:
        content = urlopen(base_url + bl["href"]).read() 
        texts.append(content)
    except HTTPError as err:
        print("Unable to retrieve " + bl["href"] + ".")
        continue

We split the text at periods to convert it into sentences.

In [28]:
%%time
sentences = list()

for i, text in enumerate(texts):  # i is the index of the current document
    print("Document " + str(i + 1) + " of " + str(len(texts)), end="\r", flush=True)
    textSoup = BeautifulSoup(text, "lxml")
    paragraphs = textSoup.find_all("p", attrs={"class":None})
    prepared = ("".join([p.text.strip().lower() for p in paragraphs[1:-1]]))
    for t in prepared.split("."):
        part = "".join([c for c in t if c.isalpha() or c.isspace()])
        sentences.append(part.strip())

print(sentences[0])

### Exercise 3.6

Parallelize this last step using `concurrent.futures`.
Do not try to print too much text, or the notebook will raise an error.

In [29]:
import json
with open('sentences.txt', 'w') as outfile:
    json.dump(sentences, outfile)

In [30]:
%%time
from concurrent.futures import ProcessPoolExecutor
import itertools

def convert_to_sentences(text):
    """Split one HTML document into cleaned, lower-case sentences."""
    sentences = list()
    # No progress message here: the loop index i is not available in a worker
    textSoup = BeautifulSoup(text, "lxml")
    paragraphs = textSoup.find_all("p", attrs={"class":None})
    prepared = ("".join([p.text.strip().lower() for p in paragraphs[1:-1]]))
    for t in prepared.split("."):
        part = "".join([c for c in t if c.isalpha() or c.isspace()])
        sentences.append(part.strip())
    return sentences
        
e = ProcessPoolExecutor()
result = e.map(convert_to_sentences, texts)

print(next(result)[0])
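
The download loop earlier in this section is I/O-bound, so it can also be parallelized, this time with threads. The sketch below is one possible approach, not the notebook's solution: `fetch` and `fetch_all` are hypothetical helpers, and the `fetch_one` parameter exists only so the download function can be swapped out.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    """Download one page; return None instead of raising on network errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read()
    except OSError:   # URLError and HTTPError are subclasses of OSError
        return None

def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch all urls concurrently, keeping only the successful downloads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pages = executor.map(fetch_one, urls)   # results keep the order of urls
        return [page for page in pages if page is not None]
```

With the variables of this notebook, a call could look like `texts = fetch_all([base_url + bl["href"] for bl in book_links[:num_pages]])`. Threads are sufficient here because the workers spend most of their time waiting for the network.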

## References

- [Using Conditional Random Fields and Python for Latin word segmentation](https://medium.com/@felixmohr/using-python-and-conditional-random-fields-for-latin-word-segmentation-416ca7a9e513)