# Downloading files with Python

This short tutorial explains how to download a file using Python. Specifically, we will be downloading a dataset from [data.bl.uk](https://data.bl.uk/), primarily so that we can use these datasets in some of our more complex tutorials.

We will be using the [requests](http://docs.python-requests.org/en/master/) library, which provides methods for sending HTTP requests using Python. The library makes it easy to add content like form data, headers and URL parameters.
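As a rough illustration of how headers and URL parameters are attached, the sketch below builds a request without sending it and inspects the result. The `q` parameter and its value are hypothetical and are not part of any documented data.bl.uk API; they are there only to show how the query string is assembled.

```python
import requests

# Build (but do not send) a GET request. The 'q' parameter is a
# hypothetical example, not a documented data.bl.uk query option.
req = requests.Request(
    'GET',
    'https://data.bl.uk/',
    params={'q': 'newspapers'},
    headers={'User-agent': 'bl-digischol'},
)
prepared = req.prepare()

# The parameters are encoded into the URL and the headers are attached.
print(prepared.url)
print(prepared.headers['User-agent'])
```

Preparing the request this way makes no network call, so it is a safe way to check what would be sent.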

This tutorial assumes some basic familiarity with Python and a command-line interface.

## Install the required libraries

First, we need to install the following Python libraries:

1. [requests](https://github.com/requests/requests) for sending our HTTP GET request.
2. [tqdm](https://github.com/tqdm/tqdm) for displaying a progress bar while downloading the file.

These libraries can be installed from PyPI by opening a terminal window and running `pip install requests tqdm`.
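Once installation has finished, a quick check along the following lines can confirm that both libraries are importable. This is a minimal sketch; the helper name `missing_packages` is our own and is not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(['requests', 'tqdm'])
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages are installed.')
```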

## Write the script

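The script has four steps: import the libraries, declare the dataset URL and download directory, create that directory if needed, and stream the file to disk while showing a progress bar.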

First, we import the required libraries for use in our application.

```python
import os
import tqdm
import requests
```

Next, we declare the URL of the dataset and the directory to download it into.

```python
DATASET_URL = 'https://data.bl.uk/iad/iad-xml.zip'
DATA_DIR = './data'
```

Ensure that our downloads directory exists, creating it if necessary.

```python
# exist_ok=True makes this a no-op when the directory is already there.
os.makedirs(DATA_DIR, exist_ok=True)
```

Now we can download our dataset.

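```python
def download_dataset(url, directory):
    """Download url into directory, skipping files that already exist."""
    download_fn = url.split('/')[-1]
    download_path = os.path.join(directory, download_fn)
    if os.path.exists(download_path):
        return
    headers = {'User-agent': 'bl-digischol'}
    r = requests.get(url, stream=True, headers=headers)
    # Stop on HTTP errors rather than saving an error page to disk.
    r.raise_for_status()
    # Content-Length may be absent; tqdm accepts total=None in that case.
    content_length = r.headers.get('Content-Length')
    total_size = int(content_length) if content_length else None
    with open(download_path, 'wb') as f:
        # Track progress in bytes so the bar matches what was written.
        with tqdm.tqdm(total=total_size, desc='Downloading', unit='B',
                       unit_scale=True, unit_divisor=1024) as bar:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    bar.update(len(chunk))

download_dataset(DATASET_URL, DATA_DIR)
```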
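Because the function skips any file that already exists, an interrupted download would leave a partial file that is never fetched again. One possible way around this, sketched below under our own assumptions, is to write to a temporary file and rename it only once every chunk has been written. The helper `write_chunks_atomically` is hypothetical and is not part of requests or tqdm; it accepts any iterable of byte chunks so that it can be tried without a network connection.

```python
import os
import tempfile


def write_chunks_atomically(chunks, path):
    """Write an iterable of byte chunks to path via a temporary file."""
    directory = os.path.dirname(path) or '.'
    # Create the temporary file in the target directory so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as f:
            for chunk in chunks:
                if chunk:
                    f.write(chunk)
        # os.replace moves the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial file so a later run starts from scratch.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path
```

Inside `download_dataset`, the chunks argument could be `r.iter_content(chunk_size=1024)`, so that the final path appears only after a complete download.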