# The concurrent.futures module

The concurrent.futures module was added in Python 3.2. It is essentially an abstraction layer on top of the threading and multiprocessing modules that makes them easier to work with, at the cost of some flexibility.

There are two main classes in this module: ThreadPoolExecutor (for managing threads) and ProcessPoolExecutor (for managing processes). 

A future (a Future object) represents the result of a call running in a thread or process before it has finished; you can think of it as a pending result.
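As a minimal sketch of what a "pending result" looks like in practice, the snippet below submits one call to a pool and inspects the returned future. The `square` function is a made-up example and not part of the original chapter.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(x):
    """Hypothetical work function used only for illustration."""
    return x * x


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns immediately with a Future, not with the value itself
    future = executor.submit(square, 7)
    is_future = isinstance(future, Future)

    # result() blocks until the call has finished, then returns its value
    value = future.result()

    # once result() has returned, the future reports that it is done
    finished = future.done()

print(is_future, value, finished)
```

Calling `result()` is the point where the caller actually waits; until then the main thread is free to do other work.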

## Creating a Pool

Here we reuse the code from the asyncio chapter, where we download URLs, but replace asyncio with the concurrent.futures module:

In [3]:
# Imports
import os
import urllib.request

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed


def downloader(url):
    """
    Downloads the specified URL and saves it to disk
    """
    
    # Validate the URL before opening a connection
    filename = os.path.basename(url)
    ext = os.path.splitext(url)[1]
    if not ext:
        raise RuntimeError('URL does not contain an extension')

    # Use urllib.request to download the file in 1 KiB chunks;
    # the context managers close both the connection and the file
    with urllib.request.urlopen(url) as req, open(filename, 'wb') as file_handle:
        while True:
            chunk = req.read(1024)
            if not chunk:
                break
            file_handle.write(chunk)
    msg = 'Finished downloading {filename}'.format(filename=filename)
    return msg

In [6]:
# Downloading files asynchronously using the ThreadPoolExecutor subclass

def main(urls):
    """
    Create a thread pool and download specified urls
    """
    
    # Use ThreadPoolExecutor class as context manager with 5 workers
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Use list comprehension to create futures
        futures = [executor.submit(downloader, url) for url in urls]
        # print out results as they are completed
        for future in as_completed(futures):
            print(future.result())

if __name__ == '__main__':
    # specify pdf files to download
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
    
    # run our function to execute async program
    main(urls)

Finished downloading f1040sb.pdf
Finished downloading f1040.pdf
Finished downloading f1040es.pdf
Finished downloading f1040ez.pdf
Finished downloading f1040a.pdf
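Since `downloader` raises a RuntimeError when a URL has no extension, it can help to see how such an error surfaces through a future. The sketch below is only an illustration: `check_extension` is a hypothetical stand-in that repeats the extension check without touching the network, and the example.com URLs are placeholders. Keying a dictionary by future is a common way to recover which URL a completed future belongs to.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_extension(url):
    """Hypothetical stand-in for downloader(): same check, no network access."""
    ext = os.path.splitext(url)[1]
    if not ext:
        raise RuntimeError('URL does not contain an extension')
    return 'Would download {}'.format(os.path.basename(url))


def collect(urls):
    """Run check_extension on every URL and separate successes from failures."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Map each future back to the URL it was created for
        future_to_url = {executor.submit(check_extension, url): url
                         for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception raised inside the worker
                results[url] = future.result()
            except RuntimeError as exc:
                errors[url] = str(exc)
    return results, errors


if __name__ == '__main__':
    ok, failed = collect(["http://example.com/a.pdf",
                          "http://example.com/noext"])
    print(ok)
    print(failed)
```

Without the try/except, the first failing future would raise inside the loop and the remaining results would never be printed.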


In [9]:
# Using the map function instead of a list comprehension to clean the code up a bit

def main(urls):
    """
    Create a thread pool and download specified urls
    """
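    with ThreadPoolExecutor(max_workers=5) as executor:
        # map() yields results in the same order as urls; chunksize is
        # omitted because it has no effect with ThreadPoolExecutor
        return executor.map(downloader, urls, timeout=60)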

if __name__ == '__main__':
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
    results = main(urls)
    for result in results:
        print(result)

Finished downloading f1040.pdf
Finished downloading f1040a.pdf
Finished downloading f1040ez.pdf
Finished downloading f1040es.pdf
Finished downloading f1040sb.pdf
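The two outputs above differ in order: `as_completed` yields futures as they finish, while `map` yields results in the order of its input. A small sketch makes the difference visible without any downloads. The `slow_echo` function and the delay values are assumptions chosen purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_echo(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


delays = [0.5, 0.1, 0.3]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() preserves input order, even though 0.1 finishes first
    mapped = list(executor.map(slow_echo, delays))

    # as_completed() yields each future as soon as it finishes,
    # so the shortest delay normally comes out first
    futures = [executor.submit(slow_echo, d) for d in delays]
    completed = [f.result() for f in as_completed(futures)]

print(mapped)
print(completed)
```

Use `map` when the results must line up with the inputs, and `submit` with `as_completed` when you want to act on each result as early as possible.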


## Deadlocks

Deadlocks are a pitfall of concurrent.futures. They occur when a task running in the pool blocks on the result of another future that can never start because no worker is free. Neither task can make progress, so the program never finishes.

In [10]:
# Creating a deadlock
# Calling wait_forever as we do below makes one future wait on the result of another!
# It never finishes because the pool has only 1 thread, and that thread is busy waiting

def wait_forever():
    """
    This function will wait forever if there's only one
    thread assigned to the pool
    """
    
    my_future = executor.submit(zip, [1, 2, 3], [4, 5, 6])
    result = my_future.result()
    print(result)

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(wait_forever)

In [11]:
# Restructuring the code above so that it works
# Here we return the inner future and call .result() on it from the main thread

def wait_forever():
    """
    Despite its name, this version no longer waits: it submits the
    inner task and returns the future without blocking on it
    """
    my_future = executor.submit(zip, [1, 2, 3], [4, 5, 6])

    return my_future

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)
    fut = executor.submit(wait_forever)

    result = fut.result()
    print(list(result.result()))

[(1, 4), (2, 5), (3, 6)]
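The deadlock can also be observed without hanging the interpreter by putting a timeout on the inner `result()` call. The following is a sketch under a few assumptions: the `outer` function, the 0.5 second timeout and the use of `sum` as the inner task are illustrative choices, not part of the original example.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def outer(executor):
    """Submit an inner task to the same pool and wait for it, with a timeout."""
    inner = executor.submit(sum, [1, 2, 3])
    try:
        # With a single worker this wait cannot succeed: the only
        # thread is busy running outer() itself
        return inner.result(timeout=0.5)
    except FutureTimeout:
        return 'timed out waiting on inner future'


# One worker: the inner task cannot start until outer() gives up
with ThreadPoolExecutor(max_workers=1) as pool:
    single = pool.submit(outer, pool).result()

# Two workers: a second thread is available to run the inner task
with ThreadPoolExecutor(max_workers=2) as pool:
    double = pool.submit(outer, pool).result()

print(single)
print(double)
```

A timeout only turns the hang into a visible error. The more robust fix is the one shown earlier: do not block on a future from inside a task that occupies a worker of the same pool.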


Once again, as a rule of thumb: if a task is network or I/O bound and does little computation locally, using threads is fine. If the task is computationally expensive, use processes instead, because in standard CPython builds the global interpreter lock prevents threads from running Python bytecode in parallel.
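Because both executors share the same interface, switching from threads to processes is usually a one-line change. The sketch below assumes a made-up CPU-bound function, `count_primes`, and small illustrative limits. It is meant to show the pattern, not to serve as a benchmark.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(executor_cls, limits, max_workers=2):
    """Run count_primes over `limits` using the given executor class."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == '__main__':
    limits = [20000, 30000]
    # Same call, different executor: threads share one interpreter,
    # processes each get their own and can use separate CPU cores
    print(run_in_pool(ThreadPoolExecutor, limits))
    print(run_in_pool(ProcessPoolExecutor, limits))
```

With ProcessPoolExecutor the submitted function and its arguments must be picklable, and the `if __name__ == '__main__':` guard matters because worker processes may re-import the main module.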