Pierre Navaro - [Institut de Recherche Mathématique de Rennes](https://irmar.univ-rennes1.fr) - [CNRS](http://www.cnrs.fr/)

[![nbviewer](https://img.shields.io/badge/render-nbviewer-orange.svg)](http://nbviewer.jupyter.org/github/pnavaro/big-data/blob/master/03.ParallelComputation.ipynb)

# Parallel Computation

## Parallel computers
- Multiprocessor/multicore: several processors work on data stored in shared memory
- Cluster: several processor/memory units work together by exchanging data over a network
- Co-processor: a general-purpose processor delegates specific tasks to a special-purpose processor (GPU, Xeon Phi,...)


## Parallel Programming
- Decomposition of the complete task into independent subtasks and the data flow between them.
- Distribution of the subtasks over the processors minimizing the total execution time.
- For clusters: distribution of the data over the nodes minimizing the communication time.
- For multiprocessors: optimization of the memory access patterns minimizing waiting times.
- Synchronization of the individual processes.

## MapReduce

In [1]:
from time import sleep
def f(x):
    sleep(1)
    return x*x
L = list(range(8))
L

[0, 1, 2, 3, 4, 5, 6, 7]

In [2]:
%time sum([f(x) for x in L])

CPU times: user 2.34 ms, sys: 2.9 ms, total: 5.24 ms
Wall time: 8.03 s


140

In [3]:
%time sum(map(f,L))

CPU times: user 2.69 ms, sys: 3.23 ms, total: 5.92 ms
Wall time: 8.02 s


140
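The two cells above show the map step followed by the built-in `sum`. As a minimal sketch of the full map/reduce pattern, the reduction can also be written explicitly with `functools.reduce`. The `sleep` call is dropped here so the example runs instantly, and the helper name `add` is ours.

```python
from functools import reduce

def f(x):
    # Same function as above, without the artificial delay
    return x * x

def add(a, b):
    # Binary operation used to combine two partial results
    return a + b

L = list(range(8))
mapped = map(f, L)              # map step: apply f to every element
total = reduce(add, mapped, 0)  # reduce step: fold the mapped values into one
print(total)
```

Because `add` is associative, partial sums could be computed on separate chunks and combined afterwards, which is what makes the reduce step a candidate for parallel execution.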

## Multiprocessing 

<p>
<font color=red> This first part with multiprocessing does not work
    on Windows </font>
    </p>


The `multiprocessing` module allows the programmer to fully leverage multiple processors.
- The `Pool` object parallelizes the execution of a function across multiple input values.
- The `if __name__ == '__main__'` guard is necessary in scripts: on platforms that start workers by re-importing the main module (such as Windows), it prevents each worker from creating its own pool.
- The `Pool` class provides a `map` method. It automatically partitions the input and distributes it to a user-specified function running in a pool of worker processes.

In [4]:
from multiprocessing import cpu_count

cpu_count()

8

In [5]:
%%time 
from multiprocessing import Pool

if __name__ == '__main__': # Executed only on main process.
    with Pool() as p:
        print(sum(p.map(f, L))) # Apply f on L sequence and sum


140
CPU times: user 15.3 ms, sys: 29 ms, total: 44.3 ms
Wall time: 1.08 s


- `Pool()` launches one worker process per CPU reported by `os.cpu_count()`, which counts logical cores rather than physical processors.
- `p.map(...)` divides the input list into chunks and puts the tasks (function + chunk) on a queue.
- Each worker process takes a task (function + a chunk of data), runs `map(function, chunk)`, and puts the result on a result list.
- `p.map` on the main process waits until all tasks are handled and returns the concatenation of the result lists.
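The steps above can be imitated sequentially in plain Python. This is only an illustration of the chunk/map/concatenate logic, not the actual implementation of `multiprocessing.Pool`; the helper `split_into_chunks` and the chunk size of 3 are assumptions made for the example.

```python
def split_into_chunks(seq, chunksize):
    """Hypothetical helper: cut seq into consecutive chunks of at most chunksize items."""
    return [seq[i:i + chunksize] for i in range(0, len(seq), chunksize)]

def square(x):
    return x * x

data = list(range(8))

# One task per chunk: the function together with the data it applies to
tasks = [(square, chunk) for chunk in split_into_chunks(data, 3)]

# Each worker would run map(function, chunk) on the tasks it picks up
partial_results = [list(map(func, chunk)) for func, chunk in tasks]

# The main process concatenates the partial result lists in input order
results = [y for part in partial_results for y in part]
print(results)
```

With a real pool, the chunk size can be passed through the `chunksize` argument of `Pool.map`; larger chunks mean fewer tasks to serialize, at the cost of coarser load balancing.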

### Exercise 3.1

- Use the `paragraph` function from the `lorem` module to create a text.
- Create a list of words from it.
- Use the `map` method of `multiprocessing.Pool` to compute each word length.
- Compare the time with the sequential version.


In [6]:
%%time
from lorem import paragraph

words_list = paragraph().lower().replace('.','').split()
print(*map(len,words_list))

3 3 5 3 6 4 5 2 7 5 6 2 10 2 3 2 8 2 7 5 7 10 5 4 5 3 4 7 5 5 8 3 10 3 5 8 4 6 3 3 7 2 7
CPU times: user 2.59 ms, sys: 2.19 ms, total: 4.78 ms
Wall time: 3.51 ms


In [7]:
%%time 
from multiprocessing import Pool

if __name__ == '__main__': # Executed only on main process.
    with Pool() as p:
        results = p.map(len, words_list)  # Apply len to each word of the list

print(*results)

3 3 5 3 6 4 5 2 7 5 6 2 10 2 3 2 8 2 7 5 7 10 5 4 5 3 4 7 5 5 8 3 10 3 5 8 4 6 3 3 7 2 7
CPU times: user 17.1 ms, sys: 34.7 ms, total: 51.9 ms
Wall time: 144 ms


## Thread and Process: Differences

- A process is an instance of a running program.
- A process may contain one or more threads, but a thread cannot contain a process.
- A process has a self-contained execution environment with its own memory space.
- An application running on your computer may be a set of cooperating processes.

- A thread exists within a process; every process has at least one.
- Multiple threads in a process share resources, which makes communication between threads efficient.
- Threads can run concurrently on a multi-core system, with each core executing a separate thread simultaneously.
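As a small sketch of the resource sharing described above, the threads below all append to the same list and all report the same process id. The names `shared`, `lock` and `record` are ours.

```python
import os
import threading

shared = []              # lives in the memory space of the process
lock = threading.Lock()  # serializes the appends from the threads

def record(label):
    # Every thread writes into the same list object
    with lock:
        shared.append((label, os.getpid()))

threads = [threading.Thread(target=record, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

labels = sorted(label for label, _ in shared)
pids = {pid for _, pid in shared}
print(labels, len(pids))
```

A separate process would receive its own copy of `shared`, so its appends would not be visible in the parent; results have to be sent back explicitly.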




## The Global Interpreter Lock (GIL)

- The CPython interpreter is not thread safe.
- A few critical internal data structures may only be accessed by one thread at a time. Access to them is protected by the GIL.
- Attempts at removing the GIL from Python failed for a long time. The main difficulty is maintaining the C API for extension modules. Since Python 3.13 an experimental free-threaded build without the GIL is available (PEP 703), but the GIL remains enabled in the default build.
- Multiprocessing avoids the GIL by having separate processes which each have an independent copy of the interpreter data structures.
- The price to pay: serialization of tasks, arguments, and results.
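The serialization cost mentioned in the last point comes from `pickle`, which `multiprocessing` uses to send tasks and results between processes. The sketch below performs the round trip by hand within a single process; it also shows that a lambda cannot be pickled, which is why functions sent to a process pool are normally defined with `def` at module level.

```python
import pickle

task = (len, "parallel")           # a function and its argument
payload = pickle.dumps(task)       # bytes that would be sent to a worker
func, arg = pickle.loads(payload)  # what the worker would rebuild
result = func(arg)
print(len(payload) > 0, result)

try:
    pickle.dumps(lambda x: x * x)  # lambdas have no importable name
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False
print(lambda_ok)
```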

## Futures

The `concurrent.futures` module provides a high-level interface for asynchronously executing callables.



The asynchronous execution can be performed with:
- **threads**, using ThreadPoolExecutor, 
- separate **processes**, using ProcessPoolExecutor. 
Both implement the same interface, which is defined by the abstract Executor class.

In a notebook on Windows, `ProcessPoolExecutor` may fail because worker processes cannot import functions defined interactively. Windows users can install
[loky](https://github.com/tomMoral/loky) instead.

In [8]:
#!pip install loky  # Windows users will need to install loky

In [9]:
%%time
from concurrent.futures import ProcessPoolExecutor
# from loky import ProcessPoolExecutor  # for Windows users
e = ProcessPoolExecutor()

results = sum(e.map(f, L))
print(results)

140
CPU times: user 12.9 ms, sys: 26.9 ms, total: 39.8 ms
Wall time: 1.04 s


In [10]:
%%time
from concurrent.futures import ThreadPoolExecutor
e = ThreadPoolExecutor()

results = sum(e.map(f, L))
print(results)

140
CPU times: user 3.65 ms, sys: 4.69 ms, total: 8.34 ms
Wall time: 1.01 s
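Besides `map`, both executors expose `submit`, which returns a `Future` object for a single call. A minimal sketch with threads, collecting results in completion order through `as_completed` (the `sleep` is omitted to keep it fast):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def f(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as e:
    # Map each Future back to the input it was created from
    futures = {e.submit(f, x): x for x in range(8)}
    squares = {}
    for future in as_completed(futures):
        squares[futures[future]] = future.result()

print(sum(squares.values()))
```

`as_completed` yields futures as they finish, so results may arrive in any order; keeping the input in a dictionary makes it possible to match each result to its argument.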


### Exercise 3.2

- Use `ProcessPoolExecutor` to compute each word length.
- Use `map` to apply the `len` function to each word of a text created with the `lorem` module.

In [11]:
from lorem import text
from concurrent.futures import ProcessPoolExecutor
#from loky import ProcessPoolExecutor  # for Windows

texte = text()

e = ProcessPoolExecutor(4)
word_lengths = e.map(len, texte.split())
print(*word_lengths)

7 2 3 3 10 8 6 5 3 3 7 8 5 2 7 6 7 10 11 7 5 8 5 2 5 10 4 11 6 3 5 5 5 7 4 3 5 11 4 6 4 6 2 2 5 6 8 7 5 7 3 7 6 4 6 5 8 7 5 4 5 6 7 10 5 7 7 7 10 3 5 5 7 4 4 5 7 10 7 7 7 8 5 3 6 7 5 8 8 7 4 5 5 5 6 2 4 7 5 8 7 6 3 5 7 5 4 3 3 3 5 4 7 5 8 5 7 4 8 7 3 3 3 7 3 4 7 6 2 2 3 6 10 4 5 4 9 8 3 4 7 7 5 6 5 8 2 6 4 5 6 4 7 4 6 6 8 4 6 8 11 3 6 3 10 5 10 8 7 10 7 4 10 8 7 6 7 3 8 6 3 7 2 11 7 8


### Exercise 3.3

Same as exercise 3.2 but use `ThreadPoolExecutor`.

In [12]:
from lorem import text
from concurrent.futures import ThreadPoolExecutor

texte = text()

e = ThreadPoolExecutor(4)
word_lengths = e.map(len, texte.split())
print(*word_lengths)

7 5 7 4 5 8 5 10 6 3 6 2 6 6 4 3 4 7 5 7 5 3 8 6 3 7 3 8 8 5 7 7 7 6 5 5 5 7 8 7 4 3 5 7 5 5 6 10 6 4 6 3 7 5 4 3 7 6 3 3 7 4 4 8 7 7 5 10 7 4 7 4 6 6 7 5 4 10 5 4 4 5 5 5 7 8 5 3 7 4 7 7 6 5 8 2 4 5 7 4 2 5 3 3 7 7 2 5 7 5 8 11 8 6 3 11 4 4 3 5 8 8 7 3 3 11 7 7 5 3 8 4 4 3 6 6 3 7 7 6 4 7 9 3 4 4 10 3 7 6 7 11 3 10 6 10 4 7 3 2 9 4 4 7 6 8 3 11 10 8 8 5 3 6 4 5 4 5 5 3 3 7 7 8 12 8 5 7 7 3 9 2 4 7 7 8 8 4 6 11 6 8 5 6 7 8 7 5 5 8 5 11 8 5 5 6 5 4 7 7 6 10 2 6 3 4 5 2 8


# Map

This version of `words` contains some improvements and prints the name of the process in which the function is executed.

In [13]:
import string
import multiprocessing as mp  # Windows users should comment this line
def words_mp(file):
    """
    Read a text file and return a sorted list of (word, 1) values.
    A message is printed if the file cannot be decoded.
    """
    # Windows users should comment this line below
    print(mp.current_process().name, 'reading', file)
    translator = str.maketrans('', '', string.punctuation)
    output = []
    try:
        with open(file) as f:
            for line in f:   
                line = line.strip()
                line = line.translate(translator)
                for word in line.split():
                    if word.isalpha():
                        word = word.lower()
                        output.append((word, 1))
                        
    except UnicodeDecodeError as err:
        print("Some error occurred decoding file %s: %s" % (file, err))
                
    output.sort()
    return output

words_mp('sample.txt')

MainProcess reading sample.txt


[('adipisci', 1),
 ('adipisci', 1),
 ('adipisci', 1),
 ('adipisci', 1),
 ('adipisci', 1),
 ('adipisci', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('aliquam', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('amet', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('consectetur', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolore', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('dolorem', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('eius', 1),
 ('est', 1),
 ('est', 1),
 ('est', 1),
 ('etincidunt', 1),
 ('etincidunt', 1),
 ('etincidunt', 1),
 ('etincidunt', 1),
 ('etincidunt', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('ipsum', 1),
 ('labore', 1),
 ('labore', 1),
 ('labore', 1),
 ('labore', 1),
 ('labore', 1)
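To see what the inner loop of `words_mp` does without reading a file, here is a sketch applying the same cleaning rules to a single line held in memory. The helper `words_from_line` and the sample sentence are made up for the illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character
translator = str.maketrans('', '', string.punctuation)

def words_from_line(line):
    """Hypothetical helper mirroring the inner loop of words_mp for one line."""
    line = line.strip().translate(translator)
    # Keep purely alphabetic tokens only, lowercased, paired with a count of 1
    return [(word.lower(), 1) for word in line.split() if word.isalpha()]

pairs = words_from_line("Lorem ipsum, dolor sit amet; ipsum 42!\n")
print(sorted(pairs))
```

The token `42` is dropped by the `isalpha()` test, and the repeated word `ipsum` appears twice in the output, once per occurrence.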

# Partition
Before the parallel reduce operation, the data must be grouped by key in a container. We create a function named `partition_mp` that stores the key/value pairs from `words_mp` in a `defaultdict` from the `collections` module. The output has the form:
[('word1', [1, 1]), ('word2', [1]), ('word3', [1, 1, 1])]

In [14]:
import collections
def partition_mp(mapped_values):
    """
        Organize the mapped values by their key.
        Returns an unsorted sequence of tuples 
        with a key and a sequence of values.
    """
    partitioned_data = collections.defaultdict(list)
    for key, value in mapped_values:
        partitioned_data[key].append(value)
    return partitioned_data.items()

In [15]:
partition_mp(words_mp('sample.txt'))

MainProcess reading sample.txt


dict_items([('adipisci', [1, 1, 1, 1, 1, 1]), ('aliquam', [1, 1, 1]), ('amet', [1, 1, 1, 1, 1, 1, 1]), ('consectetur', [1, 1, 1, 1, 1, 1, 1, 1]), ('dolore', [1, 1, 1, 1]), ('dolorem', [1, 1, 1, 1, 1, 1, 1]), ('eius', [1, 1, 1, 1, 1]), ('est', [1, 1, 1]), ('etincidunt', [1, 1, 1, 1, 1]), ('ipsum', [1, 1, 1, 1, 1, 1, 1]), ('labore', [1, 1, 1, 1, 1, 1, 1]), ('magnam', [1, 1, 1, 1, 1, 1, 1]), ('modi', [1, 1]), ('neque', [1, 1, 1, 1, 1, 1, 1, 1, 1]), ('non', [1, 1, 1, 1, 1, 1]), ('numquam', [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), ('porro', [1, 1, 1, 1, 1]), ('quaerat', [1, 1, 1, 1, 1, 1, 1, 1, 1]), ('quiquia', [1, 1, 1, 1, 1, 1, 1]), ('quisquam', [1, 1, 1, 1]), ('sed', [1, 1, 1, 1, 1, 1, 1]), ('sit', [1, 1, 1, 1, 1, 1, 1, 1]), ('tempora', [1, 1, 1, 1, 1, 1]), ('ut', [1, 1, 1, 1, 1, 1, 1, 1]), ('velit', [1, 1, 1, 1, 1, 1, 1]), ('voluptatem', [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])])
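Since `words_mp` returns a sorted list, the grouping could alternatively be sketched with `itertools.groupby`. This is a different technique from the `defaultdict` used above; `partition_groupby` is a hypothetical name, and the input must be sorted by key, which no longer holds once the outputs of several files are chained together.

```python
import itertools
import operator

def partition_groupby(sorted_pairs):
    """Hypothetical variant of partition_mp; the input must be sorted by key."""
    grouped = itertools.groupby(sorted_pairs, key=operator.itemgetter(0))
    return [(key, [value for _, value in group]) for key, group in grouped]

pairs = sorted([('word1', 1), ('word3', 1), ('word2', 1),
                ('word1', 1), ('word3', 1), ('word3', 1)])
print(partition_groupby(pairs))
```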

# Reduce

In [16]:
def reduce_mp(item):
    """Convert the partitioned data for a word to a
    tuple containing the word and the number of occurrences.
    """
    word, occurrences = item
    return (word, len(occurrences))

In [17]:
for occurrences in partition_mp(words_mp('sample.txt')):
    print(reduce_mp(occurrences))

MainProcess reading sample.txt
('adipisci', 6)
('aliquam', 3)
('amet', 7)
('consectetur', 8)
('dolore', 4)
('dolorem', 7)
('eius', 5)
('est', 3)
('etincidunt', 5)
('ipsum', 7)
('labore', 7)
('magnam', 7)
('modi', 2)
('neque', 9)
('non', 6)
('numquam', 13)
('porro', 5)
('quaerat', 9)
('quiquia', 7)
('quisquam', 4)
('sed', 7)
('sit', 8)
('tempora', 6)
('ut', 8)
('velit', 7)
('voluptatem', 12)


In [18]:
%%time
import itertools   
import glob

# Sequential version

filenames = glob.glob("sample0*.txt")
mapped_values = map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = map(reduce_mp, partitioned_data)  # lazy: evaluated when results is consumed

MainProcess reading sample00.txt
MainProcess reading sample01.txt
MainProcess reading sample02.txt
MainProcess reading sample03.txt
MainProcess reading sample04.txt
MainProcess reading sample05.txt
MainProcess reading sample06.txt
MainProcess reading sample07.txt
CPU times: user 6.24 ms, sys: 3.09 ms, total: 9.33 ms
Wall time: 12.8 ms


In [19]:
print(*results)

('adipisci', 54) ('aliquam', 68) ('amet', 63) ('consectetur', 51) ('dolor', 56) ('dolore', 51) ('dolorem', 57) ('eius', 61) ('est', 71) ('etincidunt', 63) ('ipsum', 68) ('labore', 50) ('magnam', 47) ('modi', 54) ('neque', 66) ('non', 56) ('numquam', 50) ('porro', 63) ('quaerat', 40) ('quiquia', 53) ('quisquam', 60) ('sed', 43) ('sit', 54) ('tempora', 66) ('ut', 40) ('velit', 57) ('voluptatem', 43)
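For comparison, the partition and reduce steps can be collapsed into a single pass with `collections.Counter`. This is a sequential sketch under our own naming (`count_words`), not the approach the exercises ask for; it is mainly handy for checking the counts produced by the parallel versions.

```python
from collections import Counter

def count_words(pairs):
    """Hypothetical one-step replacement for partition_mp followed by reduce_mp."""
    counts = Counter()
    for word, value in pairs:
        counts[word] += value
    return sorted(counts.items())

pairs = [('amet', 1), ('ipsum', 1), ('amet', 1), ('dolor', 1), ('amet', 1)]
print(count_words(pairs))
```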


### Exercise 3.4

Write a parallel program that uses the three functions above with `multiprocessing`. It reads all the "sample\*.txt" files. Some hints:
- The map and reduce steps are parallel.
- See how `itertools.chain(*mapped_values)` is used in notebook exercise 01.6.
- Compare the execution time with the notebook 01 version.

In [20]:
%%time
import itertools   
import glob
import operator
from multiprocessing import Pool

filenames = glob.glob("sample0*.txt")

# Parallel version
if __name__ == '__main__':
    with Pool() as p:
        mapped_values = p.map(words_mp, filenames)
        partitioned_data = partition_mp(itertools.chain(*mapped_values))
        results = p.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

ForkPoolWorker-34 reading sample01.txt
ForkPoolWorker-36 reading sample03.txt
ForkPoolWorker-33 reading sample00.txt
ForkPoolWorker-35 reading sample02.txt
ForkPoolWorker-39 reading sample06.txt
ForkPoolWorker-38 reading sample05.txt
ForkPoolWorker-37 reading sample04.txt
ForkPoolWorker-40 reading sample07.txt
[('est', 71), ('aliquam', 68), ('ipsum', 68), ('neque', 66), ('tempora', 66), ('amet', 63), ('etincidunt', 63), ('porro', 63), ('eius', 61), ('quisquam', 60), ('dolorem', 57), ('velit', 57), ('dolor', 56), ('non', 56), ('adipisci', 54), ('modi', 54), ('sit', 54), ('quiquia', 53), ('consectetur', 51), ('dolore', 51), ('labore', 50), ('numquam', 50), ('magnam', 47), ('sed', 43), ('voluptatem', 43), ('quaerat', 40), ('ut', 40)]
CPU times: user 17.9 ms, sys: 34.1 ms, total: 51.9 ms
Wall time: 142 ms


### Exercise 3.5

- Replace `multiprocessing` with `concurrent.futures` functions.
- Try both `ProcessPoolExecutor` and `ThreadPoolExecutor`.

In [21]:
%%time
import itertools
import glob
import operator
from concurrent.futures import ThreadPoolExecutor

# Parallel version with threads
e = ThreadPoolExecutor()
filenames = glob.glob("sample0*.txt")
mapped_values = e.map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = e.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

MainProcess reading sample00.txt
MainProcess reading sample01.txt
MainProcess reading sample02.txt
MainProcess reading sample03.txt
MainProcess reading sample04.txt
MainProcessMainProcess reading sample06.txt
MainProcess reading sample05.txt
 reading sample07.txt
[('est', 71), ('aliquam', 68), ('ipsum', 68), ('neque', 66), ('tempora', 66), ('amet', 63), ('etincidunt', 63), ('porro', 63), ('eius', 61), ('quisquam', 60), ('dolorem', 57), ('velit', 57), ('dolor', 56), ('non', 56), ('adipisci', 54), ('modi', 54), ('sit', 54), ('quiquia', 53), ('consectetur', 51), ('dolore', 51), ('labore', 50), ('numquam', 50), ('magnam', 47), ('sed', 43), ('voluptatem', 43), ('quaerat', 40), ('ut', 40)]
CPU times: user 11.6 ms, sys: 6.2 ms, total: 17.8 ms
Wall time: 14.8 ms


In [22]:
%%time
import itertools
import glob
import operator
from concurrent.futures import ProcessPoolExecutor

# Parallel version with processes
e = ProcessPoolExecutor()
filenames = glob.glob("sample0*.txt")
mapped_values = e.map(words_mp, filenames)
partitioned_data = partition_mp(itertools.chain(*mapped_values))
results = e.map(reduce_mp, partitioned_data)
print(sorted(results, key=operator.itemgetter(1), reverse=True))

Process-41 reading sample00.txt
Process-43 reading sample02.txt
Process-45 reading sample04.txt
Process-44 reading sample03.txt
Process-42 reading sample01.txt
Process-46 reading sample05.txt
Process-47 reading sample06.txt
Process-48 reading sample07.txt
[('est', 71), ('aliquam', 68), ('ipsum', 68), ('neque', 66), ('tempora', 66), ('amet', 63), ('etincidunt', 63), ('porro', 63), ('eius', 61), ('quisquam', 60), ('dolorem', 57), ('velit', 57), ('dolor', 56), ('non', 56), ('adipisci', 54), ('modi', 54), ('sit', 54), ('quiquia', 53), ('consectetur', 51), ('dolore', 51), ('labore', 50), ('numquam', 50), ('magnam', 47), ('sed', 43), ('voluptatem', 43), ('quaerat', 40), ('ut', 40)]
CPU times: user 31.4 ms, sys: 42.5 ms, total: 73.8 ms
Wall time: 68.5 ms


Both `multiprocessing.Pool` and the `concurrent.futures` executors can be used for multi-processing computations; they behave more or less identically.

However, most library designers are now coordinating around the second interface, so it is wise to move over.

`concurrent.futures.ProcessPoolExecutor` is suitable for simple parallelism across many files, where it can bring a speed boost. On the small sample files used above, the process-based versions are actually slower than the sequential one (about 142 ms and 68 ms versus 13 ms), because process startup and serialization dominate.
Describing each task as a function call makes it possible to use tools like `map` for parallelism.
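The cells above create executors without shutting them down. A minimal sketch of the same `map` call using the executor as a context manager, which waits for pending work and releases the workers on exit (threads are used here so that the example runs on any platform; the word list is made up):

```python
from concurrent.futures import ThreadPoolExecutor

words = "neque porro quisquam est qui dolorem ipsum".split()

with ThreadPoolExecutor(max_workers=2) as e:
    # Consume the lazy iterator while the executor is still open
    lengths = list(e.map(len, words))

print(lengths)
```

The same pattern applies to `ProcessPoolExecutor`, since both classes implement the `Executor` interface.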

## Increase volume of data

### Getting the data

[The Latin Library](http://www.thelatinlibrary.com/) contains a huge collection of freely accessible Latin texts. We collect the links on the Latin Library's homepage, ignoring those that are not associated with a particular author.




In [24]:
%%file credentials.py
login = "your sesame login"
password = "your sesame password"

Overwriting credentials.py


In [25]:
from bs4 import BeautifulSoup
from urllib.request import *
from credentials import *


#Uncomment lines below to enable proxy
#proxy_url = login+':'+password+'@192.168.192.17:8080'
#proxy = ProxyHandler({'http': 'http://'+proxy_url, 'https': 'https://'+proxy_url })
#auth = HTTPBasicAuthHandler()
#opener = build_opener(proxy, auth, HTTPHandler)
#install_opener(opener)

base_url = "http://www.thelatinlibrary.com/"
home_content = urlopen(base_url)

soup = BeautifulSoup(home_content, "lxml")
author_page_links = soup.find_all("a")
author_pages = [ap["href"] for ap in author_page_links[:49]]  # keep the first 49 links only

Create a list of all links pointing to Latin texts. The Latin Library uses a special format which makes it easy to find the corresponding links: All of these links contain the name of the text author.

In [26]:
ap_content = list()
for ap in author_pages:
    ap_content.append(urlopen(base_url + ap))

In [27]:
book_links = list()
for path, content in zip(author_pages, ap_content):
    author_name = path.split(".")[0]
    ap_soup = BeautifulSoup(content, "lxml")
    book_links += ([link for link in ap_soup.find_all("a", {"href": True}) if author_name in link["href"]])

print(book_links[:5])

[<a href="ammianus/14.shtml">Liber XIV</a>, <a href="ammianus/15.shtml">Liber XV</a>, <a href="ammianus/16.shtml">Liber XVI</a>, <a href="ammianus/17.shtml">Liber XVII</a>, <a href="ammianus/18.shtml">Liber XVIII</a>]
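The filtering rule used above (keep a link when its `href` contains the author name) can be tried offline. The sketch below swaps BeautifulSoup for the standard library `html.parser` so that it needs no installation and no network access; the page content and the path `ammianus.html` are made up to imitate an author page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:      # skip anchors without href
                self.hrefs.append(href)

# Made-up page content, imitating the structure of an author page
page = ('<a href="ammianus/14.shtml">Liber XIV</a>'
        '<a href="index.html">Home</a>'
        '<a name="top">Top</a>'
        '<a href="ammianus/15.shtml">Liber XV</a>')

author_name = "ammianus.html".split(".")[0]   # assumed author page path
collector = LinkCollector()
collector.feed(page)
book_hrefs = [h for h in collector.hrefs if author_name in h]
print(book_hrefs)
```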


In [32]:
from urllib.error import HTTPError  # not exported by `from urllib.request import *`

texts = list()
num_pages = 10

for i, bl in enumerate(book_links[:num_pages]):
    print("Getting content " + str(i + 1) + " of " + str(num_pages), end="\r", flush=True)
    try:
        content = urlopen(base_url + bl["href"]).read() 
        texts.append(content)
    except HTTPError as err:
        print("Unable to retrieve " + bl["href"] + ".")
        continue

Getting content 10 of 10

We split the text at periods to convert it into sentences.

In [33]:
%%time
sentences = list()

for i, text in enumerate(texts):
    print("Document " + str(i + 1) + " of " + str(len(texts)), end="\r", flush=True)
    textSoup = BeautifulSoup(text, "lxml")
    paragraphs = textSoup.find_all("p", attrs={"class":None})
    prepared = ("".join([p.text.strip().lower() for p in paragraphs[1:-1]]))
    for t in prepared.split("."):
        part = "".join([c for c in t if c.isalpha() or c.isspace()])
        sentences.append(part.strip())

print(sentences[0])

post emensos insuperabilis expeditionis eventus languentibus partium animis quas periculorum varietas fregerat et laborum nondum tubarum cessante clangore vel milite locato per stationes hibernas fortunae saevientis procellae tempestates alias rebus infudere communibus per multa illa et dira facinora caesaris galli qui ex squalore imo miseriarum in aetatis adultae primitiis ad principale culmen insperato saltu provectus ultra terminos potestatis delatae procurrens asperitate nimia cuncta foedabat
CPU times: user 829 ms, sys: 18.1 ms, total: 847 ms
Wall time: 844 ms
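The cleaning rule of the loop above can be checked on a short string. In this sketch the helper `split_sentences` and the sample text are ours; the HTML parsing step is left out.

```python
def split_sentences(prepared):
    """Hypothetical helper: same cleaning rule as the loop above, on a plain string."""
    sentences = []
    for t in prepared.split("."):
        # Keep letters and whitespace only, then trim both ends
        part = "".join(c for c in t if c.isalpha() or c.isspace())
        sentences.append(part.strip())
    return sentences

result = split_sentences("gallia est omnis divisa, in partes tres. quarum unam incolunt belgae.")
print(result)
```

Note that a text ending with a period yields a trailing empty string, so the `sentences` list may contain empty entries that are worth filtering out before further processing.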


### Exercise 3.6

Parallelize this last step using `concurrent.futures`.
Do not try to print too much text, or the notebook will raise an error.

In [34]:
import json
with open('sentences.txt', 'w') as outfile:
    json.dump(sentences, outfile)

In [35]:
%%time
from concurrent.futures import ProcessPoolExecutor
import itertools

def convert_to_sentences(text):
    """Convert the HTML content of one document into a list of sentences."""
    # No progress message here: the loop index is not available in the workers
    sentences = list()
    textSoup = BeautifulSoup(text, "lxml")
    paragraphs = textSoup.find_all("p", attrs={"class":None})
    prepared = ("".join([p.text.strip().lower() for p in paragraphs[1:-1]]))
    for t in prepared.split("."):
        part = "".join([c for c in t if c.isalpha() or c.isspace()])
        sentences.append(part.strip())
    return sentences
        
e = ProcessPoolExecutor()
result = e.map(convert_to_sentences, texts)

print(next(result)[0])

post emensos insuperabilis expeditionis eventus languentibus partium animis quas periculorum varietas fregerat et laborum nondum tubarum cessante clangore vel milite locato per stationes hibernas fortunae saevientis procellae tempestates alias rebus infudere communibus per multa illa et dira facinora caesaris galli qui ex squalore imo miseriarum in aetatis adultae primitiis ad principale culmen insperato saltu provectus ultra terminos potestatis delatae procurrens asperitate nimia cuncta foedabat
CPU times: user 23.5 ms, sys: 48.5 ms, total: 72 ms
Wall time: 131 ms
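`e.map` returns one list of sentences per document. To obtain a single flat list, as in the sequential version, the per-document lists can be chained together. The sketch below uses a simplified stand-in for `convert_to_sentences` that works on plain strings, made-up documents, and threads instead of processes so that it runs anywhere.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def to_sentences(text):
    """Simplified stand-in for convert_to_sentences, working on plain strings."""
    return [s.strip() for s in text.split(".") if s.strip()]

docs = ["alpha beta. gamma delta.", "epsilon. zeta eta."]

with ThreadPoolExecutor() as e:
    per_document = e.map(to_sentences, docs)   # one list per document, in input order
    all_sentences = list(itertools.chain.from_iterable(per_document))

print(all_sentences)
```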


## References

- [Using Conditional Random Fields and Python for Latin word segmentation](https://medium.com/@felixmohr/using-python-and-conditional-random-fields-for-latin-word-segmentation-416ca7a9e513)