# Simple Python Web API Interfacing Tutorial
> An attempt at creating a tutorial for accessing web APIs with Python3 

## What is an API?

An API is an *Application Programming Interface*. APIs existed long before the internet. **Setting program flags on the command line** or writing in a [DSL (domain-specific language)](https://en.wikipedia.org/wiki/Domain-specific_language) that is interpreted by another program both count as using a program's API. Programming language libraries (such as [*pandas*](https://pandas.pydata.org/)) are also APIs, since you interface with the library to access the code written inside it. I won't be discussing DSLs or libraries in this tutorial.

Web APIs allow for developers and applications to programmatically access data on a website or other web-based service. As an example, the [GitHub REST API](https://docs.github.com/en/rest) allows developers to pull down information about a GitHub repository as JSON content. This allows the developer to get all of the information that GitHub has on the repository without opening up the repository GitHub page and manually recording the information.

APIs not only allow a user to retrieve data from a service but also to send data to it. Retrieving data is done via the *GET* HTTP method, and writing is typically done via the *POST* HTTP method. More about these methods can be found [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods). For this tutorial, I'll only be focussing on APIs written for retrieving, or *GET*ing, data from a web service.

APIs can also require authentication or user credentials. There are several different ways that an API can enforce this, but for the sake of this tutorial, we will only be looking at simple authentication, where a token is sent along with the request. If the API that you are planning to use requires [OAuth2](https://oauth.net/2/) or a different authentication practice than the one covered here, I'm sorry, but you are on your own. Please visit [StackOverflow](https://stackoverflow.com/) with any questions that you have about other authentication practices.

## Requirements for This Tutorial

You will need:
- [Python3](https://www.python.org/downloads/)
- [*requests*](https://docs.python-requests.org/en/latest/)

You can install *requests* via one of the following commands:

`pip install requests` or `pip install -r requirements.txt`

## urllib: The Hard Way to Get Data

[*urllib*](https://docs.python.org/3/library/urllib.html) is a built-in Python library that allows Python to communicate with websites. It is the worst way to do so but is handy if you are constrained on space or need to work in a pure vanilla Python environment.

To use *urllib*, you need the *urlopen* function from the *urllib.request* module. On its own, however, that only gets you the raw response body. If you need the data as a dictionary, you will need the [*json*](https://docs.python.org/3/library/json.html) built-in library as well. Furthermore, to properly document your code with type hints, you will need the *HTTPResponse* class from the *http.client* module and the *Request* class from the *urllib.request* module.

In other words, to properly get JSON data from a web service, you need two different libraries. To properly document your code, you need an additional two classes.

I do not encourage this method, as there are faster ways to do the same thing, as you will see. But to appreciate how much easier those are, you should at least be exposed to *urllib* first.

In [None]:
from http.client import HTTPResponse
from urllib.request import Request
import json
import urllib.request

### Simple API Access (No Authentication)

If the API endpoint that you are getting data from doesn't require authentication, you can access it by passing the URL into the `urllib.request.urlopen()` method.

This will return an *HTTPResponse* object, not a dict or str object. *HTTPResponse* objects contain much more information about the data that is being sent back from the server, including [HTTP status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status), response headers, and other server-side information. However, as I only care about getting the data from an *HTTPResponse*, I will not be going into detail on how to access that information in this tutorial.

To extract the data from an *HTTPResponse* object, we need to first read it, then decode it, and finally convert it into a dict object.

Calling the `read()` method on an *HTTPResponse* object returns a Python *bytes* object. This is similar to a string, but it is practically useless for most applications until it has been decoded. To convert a *bytes* object into a *str*, call the `decode()` method on it. `decode()` is a method of *bytes* itself, not of the *HTTPResponse* class, so it can be used on any *bytes* object.

After running `decode()`, we have a *str* object. While it is possible to extract data from this, it isn't the best option. A better option is to convert the *str* object into a *dict* object. Since we are working with JSON data, and JSON objects are formatted similarly to Python *dict* objects, we can convert our decoded JSON *str* into a proper Python *dict* via the `json.loads()` function.

Once that is done, you can finally start working with the data you got from the API.
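The three steps above (read, decode, convert) can be tried without touching the network. The sketch below uses a hard-coded *bytes* literal as a stand-in for what `resp.read()` would return. The payload and its field names are made up for illustration and are not real GitHub output.

```python
import json

# Stand-in for the bytes that resp.read() would return.
# This payload is invented for illustration; it is not real API output.
raw: bytes = b'{"name": "example", "stars": 42}'

# Step 1: bytes -> str. decode() defaults to UTF-8; it is spelled out here.
text: str = raw.decode("utf-8")

# Step 2: str -> dict. json.loads() parses a JSON document held in a string.
data: dict = json.loads(text)

# Show the type at each stage, then read one value out of the dict.
print(type(raw).__name__, type(text).__name__, type(data).__name__)
print(data["stars"])
```

Once the data is a *dict*, individual values can be looked up by key instead of being sliced out of a string by hand.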

In [None]:
resp: HTTPResponse = urllib.request.urlopen("https://api.github.com/repos/numpy/numpy/community/profile")
data: dict = json.loads(resp.read().decode())
print(data)

### Authenticated API Access

If your API requires you to be authenticated to use the data, then you will most likely have to pass header information. This header information differs from API to API, as each web service is different. For this tutorial, we are using the GitHub REST API, which allows authentication via either OAuth2 or a GitHub Personal Access Token. I'll be going over how to use a [GitHub Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token#creating-a-token) here with *urllib*.

Unlike before, we need to pass header information to the API to authenticate against GitHub's servers. Passing a bare URL string to `urllib.request.urlopen()` as before gives us nowhere to attach headers: the URL is wrapped in a `urllib.request.Request` object internally, and we never get to touch it. Therefore, we need to create our own.

We can create one by passing a URL to the `Request` constructor. From there, we can add headers to the `Request` object by calling its `add_header()` method once for each header we need. This method takes two positional arguments: the header name (the key), followed by the header value.

After doing so, we can pass the Request object into the urllib.request.urlopen() method and rerun our dict conversion steps to get back usable data.
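Because a `Request` object is only a description of the call until it is handed to `urlopen()`, it can be built and inspected locally first. The following is a minimal sketch of that idea. The token is a placeholder string, and the `Accept` header is an optional extra added here only to show a second `add_header()` call.

```python
from urllib.request import Request

# Placeholder value for illustration; substitute your own token before sending.
pat: str = "YOUR_PERSONAL_ACCESS_TOKEN"

req: Request = Request("https://api.github.com/repos/numpy/numpy/community/profile")

# add_header(name, value): one call per header.
req.add_header("Authorization", f"token {pat}")
# Optional extra header, included only to demonstrate a second call.
req.add_header("Accept", "application/vnd.github+json")

# Nothing has been sent yet, so the request can be checked safely.
print(req.get_method())                 # HTTP method that urlopen() would use
print(req.has_header("Authorization"))  # whether the header was attached
print(req.header_items())               # all headers as (name, value) pairs
```

Inspecting the request this way is a cheap check that the headers are attached before the request is passed to `urllib.request.urlopen()`.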

In [None]:
# Replace the placeholder with your own token; never commit a real token to source control.
pat: str = "YOUR_PERSONAL_ACCESS_TOKEN"
req: Request = Request("https://api.github.com/repos/numpy/numpy/community/profile")
req.add_header("Authorization", f"token {pat}")
resp: HTTPResponse = urllib.request.urlopen(req)
data: dict = json.loads(resp.read().decode())
print(data)

## requests: The Easy Way

I think urllib is awful. I'm not a fan of how much typing it takes to get the output I need when accessing APIs. I also don't like that I then have to jump through hoops to convert the output into a usable format.

A simpler way of doing all of this is via the requests library. This library makes everything that I just did extremely easy, while also adding more user-friendly options and tools that I will not cover.

urllib does have these options and tools, but they are scattered across multiple classes and modules, which makes them unfriendly to use, particularly if the task at hand is short.

To get started with requests, simply import the requests library. To properly document your code, import the Response class from the requests.models module.

In [None]:
from requests.models import Response
import requests

### Simple API Access (No Authentication)

Simply call the requests.get() function and pass in your API endpoint as an argument to get data from the API.

The result comes back as a requests Response object. This class has a built-in method, json(), that parses the JSON response body into a dict.

It truly is that simple.

In [None]:
resp: Response = requests.get("https://api.github.com/repos/numpy/numpy/community/profile")
data: dict = resp.json()
print(data)

### Authenticated API Access

To access an API that requires authentication headers, create a dict object with the header name as the dict key and the header value as the dict value. Pass this into the requests.get() function with the keyword parameter headers.

Convert to a JSON dict as before and you're done.
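One way to keep a token out of the source code is to build the headers dict in a small helper that reads the token from the environment. This is a sketch, not part of requests: the function name `github_auth_headers` and the `GITHUB_TOKEN` environment variable are assumptions made for this example.

```python
import os
from typing import Optional


def github_auth_headers(token: Optional[str] = None) -> dict:
    """Build the headers dict for a token-authenticated request.

    Hypothetical helper: the function name and the GITHUB_TOKEN
    environment variable are assumptions made for this sketch.
    """
    # Prefer an explicitly passed token, else fall back to the environment.
    token = token or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("No token given and GITHUB_TOKEN is not set")
    # Header name is the dict key, header value is the dict value.
    return {"Authorization": f"token {token}"}


# Example with a dummy value, so nothing secret lives in the source.
print(github_auth_headers("dummy-token"))
```

The returned dict can then be passed to requests.get() through the headers keyword parameter.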

In [None]:
# The header name must be "Authorization"; replace the placeholder with your own token.
header: dict = {"Authorization": "token YOUR_PERSONAL_ACCESS_TOKEN"}
resp: Response = requests.get("https://api.github.com/repos/numpy/numpy/community/profile", headers=header)
data: dict = resp.json()
print(data)

## Summary

So, is requests easier? Yes.

Is urllib irrelevant? No.

urllib is important if you need to work in a tight system environment where external dependencies are not encouraged or allowed, or if you need to work closely with how Python accesses the internet. But in my opinion (and I'm biased towards ease of use), requests should be your go-to for connecting Python to APIs as it is simpler, more readable, and offers better documentation should you need it.

Hopefully, this tutorial helped. If not, please file a GitHub issue here.