This week’s tutorial will cover retrieving data from web APIs, loading
data from text files, and constructing DataFrames.

## Setup

1.  Make a new notebook for this week
2.  What’s the first thing to do? RENAME IT!
3.  Name it `week8.ipynb`

## Making web requests

Let’s import the `requests` library, which provides a simple set of
functions for making web requests.

> Normally we’d have to install `requests` using `pip`, but Colab
> already has it installed.

In [None]:
import requests

-   Let’s use a geocoding web API to get standard address fields for a
    given address query.
-   We use `requests.get()` to make an HTTP `GET` request, which is the
    standard method for requests to retrieve data.
    -   Other methods include `POST` and `PUT`, which are commonly used
        for submitting new data or data updates to an API.
-   The URL to use and expected parameters for the API are documented
    at:
    [nominatim.org/release-docs/develop/api/Overview](https://nominatim.org/release-docs/develop/api/Overview/)

In [None]:
r = requests.get(
    'https://nominatim.openstreetmap.org/search',
    params={
        'q': '221B Baker Street, London',
        'format': 'jsonv2',
        'addressdetails': 1,
    },
)
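To see roughly what `params` does, the sketch below builds a query string from the same dictionary using `urlencode` from Python’s standard library. This is only an illustration: `requests` does the encoding for us, and the exact escaping it applies may differ in small details.

```python
from urllib.parse import urlencode

# Same parameters as the request above
params = {
    'q': '221B Baker Street, London',
    'format': 'jsonv2',
    'addressdetails': 1,
}

# urlencode() escapes spaces, commas and other special characters
query_string = urlencode(params)
full_url = 'https://nominatim.openstreetmap.org/search?' + query_string
print(full_url)
```

After making a real request, `r.url` shows the URL that `requests` actually used.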

We can check the status of the response (`200` means the request was
successful):

In [None]:
r.status_code

We can look at the text returned by the API:

In [None]:
r.text

-   The response text is formatted as JSON, a common data format used by
    APIs.
-   `requests` can convert the JSON data to a structure of Python
    strings, numbers, lists and dictionaries:

In [None]:
r.json()
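The conversion is similar to calling `json.loads()` from the standard library on the response text. The sketch below uses a made-up, heavily shortened JSON string (an assumption for illustration, not real API output) to show how JSON arrays become lists and JSON objects become dictionaries.

```python
import json

# Hypothetical, shortened response text (not real API output)
sample_text = '[{"name": "Example Place", "address": {"city": "London", "postcode": "AB1 2CD"}}]'

results = json.loads(sample_text)

print(type(results))        # the outer JSON array becomes a list
first_result = results[0]   # each JSON object becomes a dictionary
print(first_result['address']['city'])
```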

### Handling Errors

What happens if we change our request to specify an invalid format?

In [None]:
r = requests.get(
    'https://nominatim.openstreetmap.org/search',
    params={
        'q': '221B Baker Street, London',
        'format': 'oops',
        'addressdetails': 1,
    },
)

The text is not the JSON we expect:

In [None]:
r.text

And the status code of `400` indicates a failure - specifically a “Bad
Request”:

In [None]:
r.status_code

We can tell `requests` to raise an **exception** if the response has an
error status code (any `4xx` or `5xx` code):

In [None]:
r.raise_for_status()

-   If we want our program to continue even when an exception is raised,
    we can use a **try/except** statement to execute some code in the
    case of an exception.
-   If any line of code in the `try` clause raises an exception, Python
    will stop executing the `try` block and execute the `except` block
    instead.
-   The `as ex` saves the exception itself in a variable called `ex` so
    that we can get more details from it.
    -   We can use any variable name we want, but `ex` is conventional.

In [None]:
try:
    r.raise_for_status()
except requests.HTTPError as ex:
    print(f'Failed request: {ex}')

-   We only want to catch exceptions we expect might happen:
    -   Wrap as few lines of code as possible.
    -   Only catch the types of exception we are expecting.
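Both guidelines can be seen in a small sketch that does not need a web request. The function `parse_house_number` is a hypothetical helper invented for this example: only the line that might fail sits inside `try`, and only `ValueError` is caught.

```python
def parse_house_number(text):
    """Return text as an integer, or None if it is not a whole number."""
    try:
        # Only the line that might fail goes inside the try block
        number = int(text)
    except ValueError as ex:
        # Only ValueError is caught; other exception types still propagate
        print(f'Could not parse {text!r}: {ex}')
        return None
    return number

print(parse_house_number('221'))
print(parse_house_number('221B'))
```

Passing something unexpected, such as `None`, raises a `TypeError` that is not caught, so the mistake is not silently hidden.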

## Defining a request function

-   Taking what we’ve learned, let’s define a function to provide a
    simple interface for getting the details of an address.
-   Note that:
    -   We add a `sleep` for 1 second, as the API we are using has a
        rate limit of 1 request per second.
        -   We might get an error or be blocked if we make requests
            faster than that.
    -   In the case of an error, we return the special value `None`.

In [None]:
from time import sleep

def get_address_details(address):
    """Given a loosely-formatted address string,
    return a dictionary of standard address details.

    If the request fails, None is returned."""
    r = requests.get(
        'https://nominatim.openstreetmap.org/search',
        params={
            'q': address,
            'format': 'jsonv2',
            'addressdetails': 1,
        },
    )
    # Avoid hitting the API rate limit
    sleep(1)

    try:
        r.raise_for_status()
    except requests.HTTPError as ex:
        print(f'Failed request: {ex}')
        return None

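    # The search endpoint returns a list of matches (possibly empty)
    results = r.json()
    if not results:
        # No match found: treat it like a failed request
        print(f'No match found for: {address}')
        return None
    return results[0]['address']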

get_address_details('221B Baker Street, London')
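Sleeping for a fixed second after every request is simple, but it also waits when enough time has already passed. One possible alternative, sketched below, is to wait only for the time remaining since the previous call. `RateLimiter` is a hypothetical helper written for this example, and the 0.2 second interval is used only to keep the demonstration quick.

```python
import time

class RateLimiter:
    """Hypothetical helper: makes calls to wait() return at least
    min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_call = None

    def wait(self):
        now = time.monotonic()
        if self.last_call is not None:
            # Sleep only for whatever is left of the interval
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()

limiter = RateLimiter(min_interval=0.2)  # short interval, for demonstration only
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # a request would be made right after this call
elapsed = time.monotonic() - start
print(f'{elapsed:.2f} seconds for 3 calls')
```

For the API in this tutorial, the interval would be 1 second, with `limiter.wait()` called just before each `requests.get()`.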

## Converting a file of addresses to a DataFrame of address details

-   Now let’s process a text file of addresses to produce a DataFrame of
    address details

-   Download `addresses.txt` from:
    [pynoon.github.io/curriculum/week_8/addresses.txt](https://pynoon.github.io/curriculum/week_8/addresses.txt)

-   Click the folder icon on the left side of the Colab interface, then
    use the upload button to upload `addresses.txt`

-   Now we can use `open()` to load the file.

-   `open()` should be used with a `with` statement so that the file is
    automatically closed when we’re finished with it:

In [None]:
with open('addresses.txt') as addresses_file:
    addresses = addresses_file.readlines()

`.readlines()` has provided us with a list of strings, one for each
line in the file (each string keeps its trailing newline character):

In [None]:
addresses
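If the trailing newlines or any blank lines get in the way, a list comprehension can clean them up. The sketch below uses `io.StringIO` with made-up contents as a stand-in for an opened file, so it runs without `addresses.txt`.

```python
import io

# io.StringIO stands in for an opened text file; the contents are made up
fake_file = io.StringIO('1 Example Road, Sampletown\n\n2 Example Road, Sampletown\n')
raw_lines = fake_file.readlines()

# strip() removes the trailing newline; the `if` skips blank lines
clean_lines = [line.strip() for line in raw_lines if line.strip()]

print(raw_lines)
print(clean_lines)
```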

We can use a list comprehension to apply our function to each address,
producing a list of corresponding address details:

In [None]:
address_details = [get_address_details(address) for address in addresses]

Because `get_address_details` returns `None` when a request fails, we
should remove any `None` values from the list of details:

In [None]:
address_details = [
    address_detail for address_detail in address_details
    if address_detail is not None
]
address_details

-   `pd.DataFrame` can be used to construct a DataFrame from a list of
    dictionaries like `address_details`.
-   Each dictionary represents one row: its keys give the column names
    and its values give the values for that row.

In [None]:
import pandas as pd

address_df = pd.DataFrame(address_details)
address_df
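The dictionaries do not all need the same keys. In the sketch below, which uses made-up rows rather than real API results, the second dictionary has no `postcode` key, so pandas fills that cell with a missing value (`NaN`).

```python
import pandas as pd

# Made-up rows; the second dictionary has no 'postcode' key
rows = [
    {'road': 'Example Road', 'city': 'Sampletown', 'postcode': 'AB1 2CD'},
    {'road': 'Sample Street', 'city': 'Exampleville'},
]

example_df = pd.DataFrame(rows)
print(example_df)

# True wherever a row had no value for the column
print(example_df['postcode'].isna())
```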

## Reflection

-   In this tutorial, we:
    -   Took a list of addresses in a simple text file.
    -   Retrieved more details for each address from an API.
    -   And reformatted those details into a DataFrame, which we could
        export to a CSV file, or even an SQL database.
-   This kind of data transformation is a common task that Python is
    very helpful for.
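As a rough sketch of the export step, the example below writes a small made-up DataFrame to CSV text and to a table in a temporary in-memory SQLite database. The table name `addresses` and the sample rows are assumptions chosen for illustration.

```python
import io
import sqlite3

import pandas as pd

# Made-up rows standing in for real address details
example_df = pd.DataFrame([
    {'road': 'Example Road', 'city': 'Sampletown'},
    {'road': 'Sample Street', 'city': 'Exampleville'},
])

# Write CSV text to an in-memory buffer; pass a filename to write a file instead
csv_buffer = io.StringIO()
example_df.to_csv(csv_buffer, index=False)
print(csv_buffer.getvalue())

# Write the same rows to a table in a temporary in-memory SQLite database
connection = sqlite3.connect(':memory:')
example_df.to_sql('addresses', connection, index=False)
row_count = connection.execute('SELECT COUNT(*) FROM addresses').fetchone()[0]
print(row_count)
connection.close()
```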