# Web APIs
## Introduction

*[Chapter 12](https://automatetheboringstuff.com/2e/chapter12) in the book is about web scraping, but focuses mostly on controlling a browser or reading/parsing HTML web pages. These are options we only resort to if there is no API (application programming interface) available. APIs allow machines, programs or web services to communicate with each other through well-defined interfaces using structured data. This notebook concentrates on this most straightforward way of communication and is based on book chapter 12 only in small parts.*

Computers are arguably most interesting when they can be connected to and communicate with each other over a network. In this notebook, we'll look at how to use APIs to request the data we want from a web service. As a test web service to answer our requests, we'll use [reqres.in](https://reqres.in/) and its API. To generate the HTTP requests and receive the server responses, the [Requests](https://requests.readthedocs.io) library is an excellent choice.

**For an introduction to REST APIs, this article is a great first read: [Python and REST APIs: Interacting With Web Services](https://realpython.com/api-integration-in-python/).** It is fairly long, so you might want to focus on these sections: [REST APIs and Web Services](https://realpython.com/api-integration-in-python/#rest-apis-and-web-services) and [REST and Python: Consuming APIs](https://realpython.com/api-integration-in-python/#rest-and-python-consuming-apis) (GET only).

**There is a quickstart guide on how to use the Requests library here: [Requests Library Quickstart](https://requests.readthedocs.io/en/latest/user/quickstart).** These sections are especially relevant: [Make a Request](https://requests.readthedocs.io/en/latest/user/quickstart/#make-a-request), [Passing Parameters In URLs](https://requests.readthedocs.io/en/latest/user/quickstart/#passing-parameters-in-urls), [JSON Response Content](https://requests.readthedocs.io/en/latest/user/quickstart/#json-response-content), [Response Status Codes](https://requests.readthedocs.io/en/latest/user/quickstart/#response-status-codes), [Timeouts](https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts) and [Errors and Exceptions](https://requests.readthedocs.io/en/latest/user/quickstart/#errors-and-exceptions). If you don't want to read everything at once, just remember to come back here whenever a new topic comes up in the notebook.

### Optional resources

- [httpbin: Testing HTTP Requests](https://httpbin.org)
- [HTTP Status Codes Explained](https://httpstatuses.com/)
- [HTTP Cats](https://http.cat/)
- [opendata.swiss: Swiss Open Government Data](https://opendata.swiss)
- The omitted chapters of the resources mentioned above

## Summary

### Basic Usage

To ask a server with a REST API for a particular piece of data, we can query the corresponding *resource*. The `requests` module can be used to send an HTTP GET request to the path of the resource, which is called an *endpoint*.

In the following example, the selected resource is the user `1`, and the corresponding API endpoint is therefore `https://reqres.in/api/users/1`.

After sending the request, the `get` method waits for a response from the server. If all goes well and no errors occurred, the selected resource (our piece of data) is contained in the `response` object. The response can then be decoded according to the expected encoding of the data. For example, the response might contain plain text, but for APIs (mostly machine-to-machine communication) JSON encoding is more common. The response object returned by functions like `requests.get` has a [`.json()` method](https://requests.readthedocs.io/en/latest/user/quickstart/#json-response-content) which decodes the data as JSON automatically.

In our example, the data encoding is JSON as well. We can use the structured nature of JSON to our advantage and directly select the specific data field we are interested in, `data` in this case.

In [None]:
import requests

response = requests.get("https://reqres.in/api/users/1")
response.json()["data"]

Usually it is also possible to get a list of resources by going up a level in the API hierarchy. This request returns a number of users as a list of user objects.

In [100]:
response = requests.get("https://reqres.in/api/users")
response.json()["data"]

[{'id': 1,
  'email': 'george.bluth@reqres.in',
  'first_name': 'George',
  'last_name': 'Bluth',
  'avatar': 'https://reqres.in/img/faces/1-image.jpg'},
 {'id': 2,
  'email': 'janet.weaver@reqres.in',
  'first_name': 'Janet',
  'last_name': 'Weaver',
  'avatar': 'https://reqres.in/img/faces/2-image.jpg'},
 {'id': 3,
  'email': 'emma.wong@reqres.in',
  'first_name': 'Emma',
  'last_name': 'Wong',
  'avatar': 'https://reqres.in/img/faces/3-image.jpg'},
 {'id': 4,
  'email': 'eve.holt@reqres.in',
  'first_name': 'Eve',
  'last_name': 'Holt',
  'avatar': 'https://reqres.in/img/faces/4-image.jpg'},
 {'id': 5,
  'email': 'charles.morris@reqres.in',
  'first_name': 'Charles',
  'last_name': 'Morris',
  'avatar': 'https://reqres.in/img/faces/5-image.jpg'},
 {'id': 6,
  'email': 'tracey.ramos@reqres.in',
  'first_name': 'Tracey',
  'last_name': 'Ramos',
  'avatar': 'https://reqres.in/img/faces/6-image.jpg'}]

If a large number of resources is returned in such a list, the server might divide them into pages, and each page needs to be requested separately. This is also the case for this test API, as the examples on its website show. The server response then includes additional attributes: `page`, `per_page`, `total` and `total_pages`.

In [5]:
res = requests.get("https://reqres.in/api/users").json()
print(f"We got page {res['page']} of {res['total_pages']} pages")
print(f"We got {res['per_page']} of total {res['total']} users")

We got page 1 of 2 pages
We got 6 of total 12 users
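To get every resource, a client can keep requesting the next page until `page` reaches `total_pages`. The sketch below assumes a hypothetical callable `fetch_page(page)` that returns the decoded JSON of one page; to keep it runnable offline, it is demonstrated with a fake fetcher that serves 12 made-up users in pages of 6, mirroring the numbers above.

```python
from typing import Callable


def collect_all_pages(fetch_page: Callable[[int], dict]) -> list:
    """Gather the "data" lists of all pages into one list.

    fetch_page(page) is a hypothetical callable returning the decoded JSON
    of one page, i.e. a dict with "page", "total_pages" and "data".
    """
    items = []
    page = 1
    while True:
        body = fetch_page(page)
        items.extend(body["data"])
        # Stop once the server reports that this was the last page
        if body["page"] >= body["total_pages"]:
            break
        page += 1
    return items


# Offline stand-in for the server: 12 fake users split into pages of 6
def fake_fetch_page(page: int) -> dict:
    per_page, total = 6, 12
    users = [{"id": i} for i in range(1, total + 1)]
    start = (page - 1) * per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "total_pages": (total + per_page - 1) // per_page,
        "data": users[start:start + per_page],
    }


all_users = collect_all_pages(fake_fetch_page)
print(len(all_users))  # 12
```

Against the live API, `fetch_page` could be something like `lambda page: requests.get("https://reqres.in/api/users", params={"page": page}, timeout=5).json()`; how the `params` argument works is covered in the Parameters section.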


### Parameters

But how can we request the second page? We need to supply an additional *parameter* with our request: `page=2`. Parameters can be passed to the request method as a dictionary, where each key/value pair represents a parameter and its value.

In [7]:
import requests

parameters = {"page": 2}
res = requests.get("https://reqres.in/api/users", params=parameters).json()
print(f"We got page {res['page']} of {res['total_pages']} pages")
print(f"We got {res['per_page']} of total {res['total']} users")
res["data"]

We got page 2 of 2 pages
We got 6 of total 12 users
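Requests turns the `params` dictionary into the query string of the URL, the part after `?`. The sketch below reproduces this with the standard library's `urllib.parse.urlencode`, which produces the same `key=value&key=value` format; treat it as an approximation of what Requests sends rather than its exact code path. The `first_name` parameter is hypothetical and only illustrates how spaces get escaped. If Requests is installed, `requests.Request("GET", url, params=parameters).prepare().url` shows the real URL without sending anything.

```python
from urllib.parse import urlencode

base_url = "https://reqres.in/api/users"
parameters = {"page": 2, "per_page": 3}

# Encode the dict as key=value pairs joined by "&"
query = urlencode(parameters)
url = f"{base_url}?{query}"
print(url)  # https://reqres.in/api/users?page=2&per_page=3

# Special characters are escaped automatically (a space becomes "+")
print(urlencode({"first_name": "Mary Ann"}))  # first_name=Mary+Ann
```

Passing a dictionary instead of gluing strings together by hand avoids mistakes with `?`, `&` and escaping.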


We could also change other parameters, such as the number of users per page, which caused us this headache in the first place: `per_page=12`.

In [8]:
parameters = {"per_page": 12}
res = requests.get("https://reqres.in/api/users", params=parameters).json()
print(f"We got page {res['page']} of {res['total_pages']} pages")
print(f"We got {res['per_page']} of total {res['total']} users")

We got page 1 of 1 pages
We got 12 of total 12 users


However, for larger data sets, dividing the results into pages is useful, because otherwise the server responses might get too large. We should therefore choose a sensible compromise between the number of requests and the size of each response.
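The trade-off can be made concrete with a little arithmetic: with `total` resources and `per_page` items per page, fetching everything takes the ceiling of `total / per_page` requests. The helper name `pages_needed` is made up for this sketch, and the 1000-item data set is hypothetical.

```python
import math


def pages_needed(total: int, per_page: int) -> int:
    """Number of page requests needed to fetch `total` items."""
    return math.ceil(total / per_page)


# The reqres.in numbers from above: 12 users in total
print(pages_needed(12, 6))    # 2 (smaller responses, more requests)
print(pages_needed(12, 12))   # 1 (a single, larger response)
print(pages_needed(1000, 6))  # 167 requests for a hypothetical larger data set
```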

### Connection Errors

If Requests is not able to complete the request, it raises an exception. This might happen if the domain the request was sent to is unavailable or if there is a problem with name resolution (DNS). In these cases, the exception is of type `requests.ConnectionError`.

The exception can be caught, and the program can then exit gracefully, try again later or send a different request, depending on the situation.

In [61]:
import requests

try:
    requests.get("http://invalid.")
except requests.ConnectionError as e:
    print(e)

HTTPConnectionPool(host='invalid.', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2483ff3d0>: Failed to establish a new connection: [Errno -2] Name or service not known'))


Another reason a request might fail is that the host server itself is unavailable. In this case the program might hang for a long time, because the `get` method is blocking and Requests does not set a timeout by default. If this is not the desired behavior, a timeout can be set, and a `requests.Timeout` exception is raised if the request times out.

For the test server we are using, a `delay` parameter can be given, which will delay the server response for the specified number of seconds. If this parameter is set to a higher value than the timeout parameter in the `get` method, the request will time out.

In [135]:
parameters = {"delay": 5}
try:
    requests.get("http://reqres.in/api/users/1", params=parameters, timeout=3)
except requests.Timeout as e:
    print(e)

HTTPSConnectionPool(host='reqres.in', port=443): Read timed out. (read timeout=3)
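"Try again later" can be automated with a retry loop that waits a little longer after each failure (exponential backoff). The sketch below is a generic helper of our own, not part of Requests; the name `retry` and its parameters are assumptions. It is demonstrated offline with a hypothetical flaky function that raises Python's built-in `ConnectionError` twice before succeeding.

```python
import time


def retry(func, exceptions, attempts=3, delay=1.0):
    """Call func() and retry on the given exception type(s).

    The delay doubles after each failed attempt; the last exception is
    re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            if attempt == attempts:
                raise  # give up and let the caller handle it
            print(f"Attempt {attempt} failed ({error}), retrying in {delay} s")
            time.sleep(delay)
            delay *= 2  # exponential backoff


# Offline demo: a hypothetical function that fails twice, then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated outage")
    return "ok"


result = retry(flaky, ConnectionError, attempts=3, delay=0.01)
print(result)  # ok
```

With Requests, the call could look like `retry(lambda: requests.get("https://reqres.in/api/users/1", timeout=(3.05, 10)), (requests.ConnectionError, requests.Timeout))`, where the timeout tuple sets separate connect and read timeouts. Note that `requests.ConnectionError` is Requests' own exception class, not the built-in `ConnectionError` used in the offline demo.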


### HTTP Status Codes

Even if the request reached its destination and we received a response from the server, the response might still contain an error. The connection works fine, but the server might not be able or willing to handle our request. This is indicated by the *HTTP status code* of the response. For example, a successful response to a GET request comes with status code 200. If a requested resource cannot be found, the status code is 404.

Explanations of different HTTP status codes can be found here: [HTTP Status Codes](https://httpstatuses.com/)

In [72]:
import requests

response = requests.get("http://reqres.in/api/users/1")
response.status_code

200

In [82]:
response = requests.get("http://reqres.in/api/users/42")
response.status_code

404
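Status codes are grouped by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client errors and 5xx server errors. Python's standard library knows the standard reason phrases through `http.HTTPStatus`, so a small helper (the name `describe_status` is our own) can describe a code offline, as a quick alternative to looking it up on the page linked above.

```python
from http import HTTPStatus

# Category of a status code, keyed by its first digit
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({CATEGORIES[code // 100]})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
print(describe_status(503))  # 503 Service Unavailable (server error)
```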

While it is possible to check the HTTP status code of the response like this, often we just want to know whether the request was successful or not. In this case it is more convenient to raise an exception if the server responded with an error. This can be done by calling the `raise_for_status()` method on the response, which raises a `requests.HTTPError` if the status code indicates an error. The exception can then be caught, or propagated for higher-level code to handle.

In [76]:
response = requests.get("http://reqres.in/api/users/42")
response.raise_for_status()

HTTPError: 404 Client Error: Not Found for url: https://reqres.in/api/users/42
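Roughly speaking, `raise_for_status()` treats codes from 400 to 599 as errors and builds a message like the one above from the status code, the reason phrase sent by the server and the URL. The sketch below is a simplified re-creation for illustration only, not Requests' actual source; the names `check_status` and `SketchHTTPError` are hypothetical. It also shows the choice between catching the error locally and letting it propagate.

```python
class SketchHTTPError(Exception):
    """Hypothetical stand-in for requests.HTTPError."""


def check_status(status_code: int, reason: str, url: str) -> None:
    """Raise SketchHTTPError for 4xx/5xx codes, do nothing otherwise."""
    if 400 <= status_code < 500:
        kind = "Client Error"
    elif 500 <= status_code < 600:
        kind = "Server Error"
    else:
        return  # 1xx, 2xx and 3xx are not treated as errors
    raise SketchHTTPError(f"{status_code} {kind}: {reason} for url: {url}")


check_status(200, "OK", "https://reqres.in/api/users/1")  # passes silently

try:
    check_status(404, "Not Found", "https://reqres.in/api/users/42")
except SketchHTTPError as error:
    # Handle it here, or drop the try/except to propagate it to the caller
    print(error)  # 404 Client Error: Not Found for url: https://reqres.in/api/users/42
```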

## Exercises

### Exercise 1: Where's George?

In one of the earlier exercise sessions, you had to find Waldo. Now we need your help again, this time to find George. We don't even have his last name, but luckily we know what he looks like.

Write a function `find_user()` that takes as an argument a first name (like George) and searches for everyone with that first name in the [reqres.in](https://reqres.in/) user database. Return a Python dictionary with all the results, where the last name is the key and the link to the avatar picture is the value. You can assume that no two people have the same first and last names.

Use the [reqres.in API documentation](https://reqres.in/api-docs/) to make yourself familiar with how the API works.

Expected output (pretty-printed for readability, your function should return a dict):

```python
>>> find_user("George")
{
    "Bluth": "https://reqres.in/img/faces/1-image.jpg",
    "Edwards": "https://reqres.in/img/faces/11-image.jpg",
}
```

For this exercise, you can assume that the HTTP request will always succeed.

In [None]:
import requests

def find_user(first_name):
    # TODO: Connect to API and find matching people
    pass

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [None]:
find_user("George")

### Exercise 2: Save the Lawn

You're soon leaving for your vacation, but you worry about your beloved award-winning English lawn. If it doesn't rain enough while you're away, it will dry out and there will be ugly brown patches. To save your precious lawn, you bought a computer-controlled sprinkler system. Sadly, the system is not very smart and cannot detect by itself whether watering is necessary. You will have to control it yourself with some intelligent code.

But you're lucky, because the Swiss meteorological service (MeteoSchweiz) offers an open data API from which you can get information about the amount of precipitation (rain) in your area! You can use the following API endpoint to get this data for the last 72 hours for a number of weather stations, updated hourly:

https://data.geo.admin.ch/ch.meteoschweiz.messwerte-niederschlag-72h/ch.meteoschweiz.messwerte-niederschlag-72h_en.json

Implement a function `check_precipitation()` that takes a weather station name as a string argument. The function shall return the station's precipitation over the last 72 hours as a float, in the same measurement unit as the meteo data: *mm* (millimeters of rain).

```python
>>> check_precipitation("Jona")  # actual return value will differ, depending on the current weather
6.7
```

If the HTTP request fails, make sure that an **appropriate `HTTPError` exception is raised** (**Hint:** You don't need to raise the exception manually).

It can be helpful to gain an initial understanding of a more complex data set like this one by exploring a sample with a suitable viewer. There is a sample JSON file of this data set already downloaded for you to inspect in the same folder as this notebook. You can open and explore the sample with JupyterLab or Firefox, both will present the data as a hierarchical structure with elements that can be expanded and collapsed to focus on specific parts.
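If you prefer exploring from Python, a small recursive helper can print the key hierarchy and value types of any decoded JSON. The function name `outline` is our own, and the `sample` dict below is a made-up, reqres-like example, not the structure of the MeteoSchweiz data. For lists, only the first element is expanded to keep the output short.

```python
def outline(obj, indent=0):
    """Return indented lines describing the keys and types of nested JSON data."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(outline(value, indent + 1))
    elif isinstance(obj, list) and obj:
        # Show only the first element, assuming list items share a structure
        lines.append(f"{pad}[0] of {len(obj)}: {type(obj[0]).__name__}")
        lines.extend(outline(obj[0], indent + 1))
    return lines


# Hypothetical sample data, only to demonstrate the output format
sample = {"page": 1, "data": [{"id": 1, "first_name": "George"}]}
print("\n".join(outline(sample)))
```

To apply it to the sample file, load it with `json.load()` from an open file handle and pass the result to `outline()`.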

In [None]:
import requests

def check_precipitation(station_name):
    # TODO: Connect to API and return result
    pass

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [None]:
check_precipitation('Jona')

# Feedback form

We'd like to get some feedback for this lab! To give us feedback, double-click the cells below and edit them in the appropriate places:

- Replace `[ ]` by `[x]` to cross checkboxes, they should look like this once you finish editing:
  * [ ] uncrossed
  * [x] crossed
- Add additional text where indicated (optional)

**Difficulty:**

The difficulty of the materials in this lab was:

- [ ] Much too difficult
- [ ] A little too difficult
- [ ] Just right
- [ ] A little too easy
- [ ] Much too easy

**Time:**

For one block (usually multiple labs), you should spend around 4h at home and 4h in the course. There are four labs in this block, so we'd expect you to spend a total of **around 2h on this one (both reading and solving)**.

For the materials in *this lab*, do you think you spent:

- [ ] Much more time
- [ ] A little more time
- [ ] About the scheduled amount of time
- [ ] A little less time
- [ ] Much less time

**Any topics you found especially enjoyable or difficult in this lab?**

<!-- Write below this line -->

**Anything else you'd like to tell us?**

<!-- Write below this line -->

# Submit

First, **save this file** (no grey dot should be visible in the tab above). Then, run the cell below to submit your work and see the results. You can submit as often as you like.

In case of problems:
- *Don't panic!*
- If you're in a course, show the error to your instructor.
- If the **tests failed** and you suspect an issue in the tests:
    * Mail your instructor, Cc `florian.bruhin@ost.ch` (if instructor != florian)
    * **No attachments** necessary.
- If the **submission failed** (error message, etc.):
    * Mail your instructor, Cc `florian.bruhin@ost.ch` (if instructor != florian)
    * Attach a screenshot of the issue
    * Attach the notebook (File > Download).

In [None]:
!submit web-apis.ipynb