<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Threading-and-Benchmarking-in-Python" data-toc-modified-id="Threading-and-Benchmarking-in-Python-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Threading and Benchmarking in Python</a></span><ul class="toc-item"><li><span><a href="#Code-Challenge:-Timer" data-toc-modified-id="Code-Challenge:-Timer-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Code Challenge: <code>Timer</code></a></span></li><li><span><a href="#Code-Challenge:-single_thread_download" data-toc-modified-id="Code-Challenge:-single_thread_download-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Code Challenge: <code>single_thread_download</code></a></span></li><li><span><a href="#Code-Challenge:-thread_handler" data-toc-modified-id="Code-Challenge:-thread_handler-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Code Challenge: <code>thread_handler</code></a></span></li><li><span><a href="#Code-Challenge:-threaded_download" data-toc-modified-id="Code-Challenge:-threaded_download-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Code Challenge: <code>threaded_download</code></a></span></li></ul></li></ul></div>

# Threading and Benchmarking in Python
In many domains of computation, dividing work across multiple threads can save us real-world time. There is ongoing debate over whether it is appropriate to split the task of downloading a large file across multiple threads. To some extent, it depends on the resource you'd like to download and the server that hosts it. Proponents of multi-threaded downloading tend to claim that most servers limit how many packets per second are sent to any given client, and that making multiple requests across different threads bypasses this throttling. Opponents claim that download bottlenecks are more likely to come from local network bandwidth limits, which multi-threaded approaches won't address.

Whether multi-threaded downloading is worth our time depends on a number of factors, including our network bandwidth and any throttling enforced by the server we are requesting resources from. In the skill builder below, we will create two functions for downloading resources from the internet: one that uses a single thread and another that uses multiple threads. Additionally, we will create a `Timer` context manager that times how long our tasks take.

*Run the cell below to import dependencies!*

In [None]:
from time import time
from math import ceil
import requests
import threading
import os

LANDSAT_URL = (
    'https://landsat-pds.s3.amazonaws.com/c1/L8/139/045/'
    'LC08_L1TP_139045_20170304_20170316_01_T1/'
    'LC08_L1TP_139045_20170304_20170316_01_T1_B8.TIF'
)

## Code Challenge: `Timer`
First, let's create our timer. We would like to be able to use our timer like so:

```python
with Timer('Creating 10,000 new files'):
    for i in range(10000):
        f = open('file{}.txt'.format(i), 'wb')
        f.close()
```

The keyword `with` is used to create a *context manager*. A context manager is an instance of a class which implements the `__enter__` and `__exit__` special methods. Upon entering the `with` block, the `__enter__` method on the instance will be called. After the block of code is finished executing, the `__exit__` method will be called.
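
To make the order of these calls concrete, here is a minimal sketch. The `Announcer` class is hypothetical and not part of this exercise; it only records when each special method runs.

```python
class Announcer:
    """Hypothetical context manager that records the order of calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # the value bound by "with ... as name"

    def __exit__(self, exception_type, exception_value, traceback):
        self.events.append('exit')
        return False  # do not suppress exceptions raised inside the block


with Announcer() as announcer:
    announcer.events.append('body')

print(announcer.events)
```

Note that `__exit__` is also called when the block raises an exception, which is why it receives the exception details as arguments.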

We would like our timer to accomplish two things:
1. Log how much time the code block took to execute. In the example above, we would expect something like the following to print to the console: "Creating 10,000 new files took 11.04 seconds to complete".
1. Clean up after the code we are testing by deleting any files that were created in our current working directory during the execution of the code block. In this example, all 10,000 files created should be deleted upon exiting the `with` block.

In [None]:
class Timer:
    def __init__(self, task):
        self.task = task

    def __enter__(self):
        self.start = time()
        # Snapshot the working directory so we can tell which files are new.
        self.dircontents = set(os.listdir())
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        self.end = time()
        print(f"{self.task} took {self.end - self.start:.2f} seconds to complete.")
        # Remove only the files created while the block was running.
        for f in os.listdir():
            if f not in self.dircontents and os.path.isfile(f):
                os.remove(f)

*You may test your timer with the cell below* 

In [None]:
with Timer('Making 10 files'):
    for i in range(10):
        f = open('file{}.txt'.format(i), 'w+b')
        f.close()

## Code Challenge: `single_thread_download`
* input:
    1. `url` - a string that holds the location of the resource to download
    1. `filename` - the name of a new file to create which should hold the contents of the url

`single_thread_download` will not have any output, but should make an HTTP request and write the content of the response to a new file.

In [None]:
def single_thread_download(url, filename):
    pass

## Code Challenge: `thread_handler`
The threading module allows us to create a new thread and pass it a function to execute. Below, we will write the function that we'd like each thread to run.

* input:
    1. `start_byte` - an integer representing the first byte to begin downloading from
    1. `end_byte` - an integer representing the last byte to end downloading at
    1. `url` - a string
    1. `filename` - a string, the name of the file to write the retrieved content to
    
`thread_handler` won't have any output. Your function should make an HTTP GET request (you may use the `requests` module) to the url provided. You should pass custom headers that specify which portion of the resource you would like to download (look into the [Range header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range)).
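
As a hint for the header format, here is a minimal sketch of building a `Range` header. The helper name `build_range_header` is hypothetical; the point is only the `bytes=start-end` syntax, in which both ends are inclusive.

```python
def build_range_header(start_byte, end_byte):
    """Return a headers dict requesting bytes start_byte..end_byte (inclusive)."""
    return {'Range': 'bytes={}-{}'.format(start_byte, end_byte)}


headers = build_range_header(0, 10)
print(headers)
# These headers could then be passed along, e.g. requests.get(url, headers=headers).
# A server that honors the range replies with status 206 (Partial Content).
```

Because both ends are inclusive, `bytes=0-10` asks for 11 bytes rather than 10.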

Once the (partial) resource is retrieved, `thread_handler` should write to the file specified by `filename`. You should begin writing at `start_byte` (look into [seek](http://python-reference.readthedocs.io/en/latest/docs/file/seek.html)).
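
The sketch below illustrates writing at an offset with `seek`, using made-up byte strings in place of downloaded content. The helper `write_at_offset` is hypothetical and is one possible approach, not the required solution.

```python
import os
import tempfile


def write_at_offset(filename, offset, data):
    """Write data into filename starting at byte position offset."""
    # 'r+b' updates an existing file without truncating it; fall back to
    # 'wb' when the file does not exist yet.
    mode = 'r+b' if os.path.exists(filename) else 'wb'
    with open(filename, mode) as f:
        f.seek(offset)
        f.write(data)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.bin')
    write_at_offset(demo_path, 5, b'world')  # the second chunk arrives first
    write_at_offset(demo_path, 0, b'hello')  # the first chunk fills the gap
    with open(demo_path, 'rb') as f:
        result = f.read()

print(result)
```

When several threads share one file, the existence check above can race. Creating the file once before starting any threads is one way to avoid that.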

In [None]:
def thread_handler(start_byte, end_byte, url, filename):
    pass

*If your `thread_handler` is working, you should have a new file in this directory. Don't worry if you can't open the file: because we only downloaded the first few bytes, it shouldn't be openable.*

In [None]:
ex_thread = threading.Thread(target=thread_handler, args=(0, 10, LANDSAT_URL, 'landsat_ex.tiff'))
ex_thread.start()
ex_thread.join()  # wait for the download to finish before moving on

*You may use the cell below to remove the file we just created to test our thread handler*

In [None]:
os.remove('landsat_ex.tiff')

## Code Challenge: `threaded_download`
* input:
    1. `url` - same as earlier
    1. `filename` - same as earlier
    1. `n_threads` - number of threads to use in downloading the resource located at `url`
    
`threaded_download` shouldn't have any output. First, make an HTTP HEAD request to the url provided. This HTTP method retrieves the headers that would normally be included in the response to a GET request, which lets us view the metadata about a resource without actually retrieving the resource itself.

Use the response to our HEAD request to determine the number of bytes in the resource stored at `url` (it should be under the key "content-length"). Each thread we create should be responsible for downloading `ceil(content-length / n_threads)` bytes.
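
The arithmetic can be sketched as follows. The helper `byte_ranges` is hypothetical, and it assumes the content length has already been converted to a positive integer (header values arrive as strings, so something like `int(response.headers['content-length'])` would be needed first).

```python
from math import ceil


def byte_ranges(content_length, n_threads):
    """Split content_length bytes into inclusive (start, end) pairs, one per thread."""
    chunk_size = ceil(content_length / n_threads)
    ranges = []
    for start in range(0, content_length, chunk_size):
        # Range ends are inclusive, and the last chunk may be shorter.
        end = min(start + chunk_size, content_length) - 1
        ranges.append((start, end))
    return ranges


print(byte_ranges(10, 4))
```

For 10 bytes and 4 threads, each chunk is `ceil(10 / 4) = 3` bytes, so the last thread only has a single byte left to fetch.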

Create and start each thread. Afterwards, call `join` on each thread to ensure the function doesn't return until every thread has completed its task.
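
The start-then-join pattern looks roughly like the sketch below. The worker `record_square` is a hypothetical stand-in for `thread_handler`, chosen so the example runs without a network connection.

```python
import threading

results = []
results_lock = threading.Lock()


def record_square(n):
    """Hypothetical worker: stores n squared in the shared results list."""
    with results_lock:
        results.append(n * n)


threads = [threading.Thread(target=record_square, args=(n,)) for n in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # blocks until that thread has finished

print(sorted(results))
```

Starting every thread before joining any of them lets the workers run concurrently; joining inside the first loop would run them one at a time.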

In [None]:
def threaded_download(url, filename, n_threads=4):
    pass

*The code below should time how long it takes to download a ~230MB file using a single thread. If your timer works correctly, it should also take care of removing the file we just downloaded.*

In [None]:
with Timer('Single threaded download'):
    single_thread_download(LANDSAT_URL, 'landsat_one_thread.tiff')

*The code below should time how long it takes to download a ~230MB file using four threads.*

In [None]:
with Timer('Four threaded download'):
    threaded_download(LANDSAT_URL, 'landsat_4_threads.tiff')