# API WTF
API stands for Application Programming Interface. Originally a concept from computer science, APIs have now grown into something that many apprentices will have heard of and some might be using directly.

## Brief computer science definition
If you have ever used `import` to pull a library into your Python code, you have been using an API. Libraries give you extra functions and features that you can then use. How exactly each of those functions *actually works* is hidden (a computer scientist would say "abstracted") away from you, which is great because normally we couldn't care less what happens behind the scenes, we just want to get stuff done.

For example:

In [1]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ms-robot-please/Python-for-Data-Science/master/enrollment_forecast.csv')
df.head()

Unnamed: 0,year,roll,unem,hgrad,inc
0,1,5501,8.1,9552,1923
1,2,5945,7.0,9680,1961
2,3,6629,7.3,9731,1979
3,4,7556,7.5,11666,2030
4,5,8716,7.0,14675,2112


We say a library like `pandas` ***exposes*** a number of functions that we can use: in the above code these were `.read_csv()` and `.head()` both of which do a whole bunch of stuff for us that we take for granted.

If we focus on the terms *Application*, *Programming* and *Interface*: in this example `pandas` can be considered our application because we can use it (*apply* it) to get stuff done. We interact with it by writing a *program* that uses the provided *interface* which includes functions like `.read_csv()`.
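To make the idea of *exposing* an interface a bit more concrete, here is a minimal sketch of our own tiny "library". The function names (`describe_temperature` and `_to_celsius`) are invented for illustration and do not come from any real package. The point is only that a user calls the exposed function and never needs to look at the helper behind it.

```python
def _to_celsius(fahrenheit):
    # Implementation detail: the leading underscore is a Python convention
    # meaning "not part of the public interface".
    return (fahrenheit - 32) * 5 / 9


def describe_temperature(fahrenheit):
    """The exposed function: the only thing a user of this code needs to call."""
    celsius = _to_celsius(fahrenheit)
    return f"{celsius:.1f} degrees C"


# The caller just uses the interface and ignores how the conversion is done.
print(describe_temperature(212))
```

If the conversion inside `_to_celsius` were rewritten tomorrow, code that only calls `describe_temperature` would carry on working, which is the whole appeal of an interface.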

This original concept, which described how different applications could interface with each other programmatically, is still important and relevant to our apprentices, some of whom will be very aware that different internal systems in their workplace communicate with each other through APIs. In the Internet age it has also expanded to cover how we can interact with systems over the web.

## Examples of APIs

Nearly all the services that most of us are familiar with on the internet offer some sort of API; see for example this article on [15 APIs developers need to know](https://www.creativebloq.com/web-design/apis-developers-need-know-121518469). Some of our apprentices *might* be interested in some of the services in that article, but perhaps not. There are literally thousands of different APIs out in the world that serve various purposes, to the extent that sites such as [API list](https://apilist.fun/) now exist to try and keep track of them all.

### Some potentially interesting things I found

[Sheetsu](https://docs.sheetsu.com/?python#introduction) - create your own API from a Google Spreadsheet. Gotta think about the data protection but potentially *very* interesting!

Google have lots of their own APIs for example [Gmail API](https://developers.google.com/gmail/api/) or [Show your Data in a Google Map with Python](https://thedatafrog.com/en/articles/show-data-google-map-python/)

## Key concepts and terms

Whilst different APIs provide different services, most of them work in the same way, so if you can get your head around some basic concepts you will be able to support learners with pretty much any API-based situation they might throw at you.

Some terms to be familiar with:

**HTTP** - this is the protocol that makes the web work. You will also be familiar with its secure cousin, HTTPS. A big reason for the web being such a success is that HTTP is brilliant for transferring data between systems, so APIs usually use HTTP/HTTPS for that exact reason. We will learn a little bit more about HTTP as we go here.

**JSON** - you may or may not have come across JSON, it is just a way of structuring data which is very similar to how dictionaries work in Python. Many APIs give you data in this format.

**REST** - this is more niche and stands for REpresentational State Transfer. Many APIs that conform to REST principles are described as RESTful. What this actually means is pretty irrelevant really, if you want to know more see [something like this](https://restfulapi.net/)
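To illustrate the JSON point from the list above, here is a small sketch using Python's built-in `json` module. The JSON text in `raw` is made up for the example; it just shows how JSON values map onto ordinary Python objects.

```python
import json

# JSON arrives from an API as plain text...
raw = '{"number": 42, "found": true, "tags": ["trivia", "maths"], "note": null}'

# ...and json.loads() turns it into ordinary Python objects.
data = json.loads(raw)

print(type(data))      # the outer JSON object becomes a dict
print(data["number"])  # JSON numbers become int or float
print(data["found"])   # JSON true/false become True/False
print(data["tags"])    # JSON arrays become lists
print(data["note"])    # JSON null becomes None

# json.dumps() goes the other way, from Python objects back to JSON text.
as_text = json.dumps({"ok": True, "value": None})
print(as_text)
```

Note the small spelling differences: JSON uses `true`, `false` and `null`, whereas Python uses `True`, `False` and `None`.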

## Less theory, more fun

### Example 1: Numbers API

There are lots of "fun" APIs to play with, in this case we are going to have a go with the Numbers API, which is a database of facts about different numbers: http://numbersapi.com/

Generally when working with an API for the first time I would suggest the following approach:

1. RTFM
1. Try it in your browser
1. Try it in Python

#### Read The F***ing Manual

Every API should have some sort of documentation which tells us what it can do (its *application*) and how (its *interface*). An HTTP based RESTful API will look *something* like this:

`http://some.api-site.com/{value}`

or:

`http://some.api-site.com/something/?value={value}`

where `{value}` is some kind of parameter value that we want to request. With the latter approach, if we want multiple parameters we can join them using the `&` symbol, for example:

`http://some.api-site.com/something/?value={value}&second={value2}&third={value3}`
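Rather than gluing `?`, `=` and `&` together by hand, the query string can be built from a dictionary. The sketch below uses `urlencode` from the standard library; the address is the placeholder from the text above and the parameter values are invented. The `requests` library can do the same job for you if you pass a dictionary as `params=` to `requests.get()`.

```python
from urllib.parse import urlencode

# Placeholder address from the text above; this is not a real service.
base = "http://some.api-site.com/something/"

# Made-up parameter names and values, purely for illustration.
params = {"value": "abc", "second": 2, "third": "x y"}

# urlencode() joins the pairs with & and encodes awkward characters
# (here the space in "x y" becomes a +).
query = urlencode(params)
url = f"{base}?{query}"
print(url)
```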

Let's have a look at what we can do with the Numbers API by reading http://numbersapi.com/

#### Try it in your browser

Before we start faffing around with code, testing the API and getting it all to work in a browser is the place to start. If it works in Chrome, it should work in Python just the same.

Some examples:

http://numbersapi.com/42

http://numbersapi.com/42?json


#### Try it in Python

If you have done any web scraping you will have come across the `requests` library, which makes HTTP requests for us (hence the name). That basically means downloading stuff, like this:

In [2]:
import requests
response = requests.get('http://numbersapi.com/42')
response.text

'42 is the answer to the Ultimate Question of Life, the Universe, and Everything.'

Simple! According to the Numbers API documentation we can also get this API to give us the result in JSON instead of just plain text:

> Include the query parameter `json` or set the HTTP header `Content-Type` to `application/json` to return the fact and associated meta-data as a JSON object

If you are wondering what an *HTTP header* is, hold that thought, we will come back to headers (much) later. For the time being just know that we can do this with `requests` by adding `json=True`, which has the side effect of setting that `Content-Type` header for us.

In [3]:
response = requests.get('http://numbersapi.com/42', json=True)
response.json() # was .text before when the response was plain text, now it is json use .json()

{'text': '42 is the number of museums in Amsterdam (Netherlands has the highest concentration of museums in the world).',
 'number': 42,
 'found': True,
 'type': 'trivia'}

As the documentation said in the quote above, we can achieve the same result by: "*include the query parameter* `json`"

In [4]:
response = requests.get('http://numbersapi.com/42?json') # same as before but with ?json on the end
response.json() # makes sense - note the brackets though

{'text': '42 is the number of spots (or pips, circular patches or pits) on a pair of standard six-sided dice.',
 'number': 42,
 'found': True,
 'type': 'trivia'}

As mentioned earlier, JSON is a lot like a dictionary in Python and can be used as such, for example:

In [5]:
response = requests.get('http://numbersapi.com/42?json') # get another random 42 fact
data = response.json()
data['text']

'42 is the number of kilometers in a marathon.'

In [6]:
data['type']

'trivia'

#### Handling errors

Making this code robust is a massive topic. One easy thing we should do is check the HTTP status code in the response. If it is 200 then our request worked OK; if it is *not* 200 then something else is going on, which might be an error such as a 404 error (not found).

In [8]:
response = requests.get('http://numbersapi.com/bunny?json') # this.... won't work
# check the status code is 200
if response.status_code == 200:
    data = response.json()
    print(data['text'])
else:
    # raise an exception
    response.raise_for_status()

HTTPError: 404 Client Error: Not Found for url: http://numbersapi.com/bunny?json

See https://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes for more details on this.

In the rest of these examples I am going to be lazy and *not* check the status code to keep the code to a minimum, but know that ignoring the status code is not a good idea.
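If the numeric codes themselves are unfamiliar, Python's standard library knows their names. The sketch below is only an illustration: `describe_status` is a hypothetical helper written for this example, not something provided by `requests`. It uses `http.HTTPStatus` to turn a code into its standard phrase and a rough category.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"  # usually something wrong with our request
    elif code >= 500:
        kind = "server error"  # something went wrong at the API's end
    else:
        kind = "other"         # informational or redirect codes
    return f"{code} {status.phrase}: {kind}"


for code in (200, 404, 500):
    print(describe_status(code))
```

As a rule of thumb, a 4xx code suggests checking the URL, parameters or key we sent, whereas a 5xx code suggests waiting and trying again later.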

### Example 2: postcodes.io

Do stuff with postcodes!

1. RTFM: https://postcodes.io/
1. Try it in your browser: https://postcodes.io/postcodes/W1U6QQ
1. Try it in Python:

In [9]:
response = requests.get('https://postcodes.io/postcodes/W1U6QQ') # this API *only* returns json
response.json()

{'status': 200,
 'result': {'postcode': 'W1U 6QQ',
  'quality': 1,
  'eastings': 527955,
  'northings': 181790,
  'country': 'England',
  'nhs_ha': 'London',
  'longitude': -0.157153,
  'latitude': 51.520543,
  'european_electoral_region': 'London',
  'primary_care_trust': 'Westminster',
  'region': 'London',
  'lsoa': 'Westminster 008C',
  'msoa': 'Westminster 008',
  'incode': '6QQ',
  'outcode': 'W1U',
  'parliamentary_constituency': 'Cities of London and Westminster',
  'admin_district': 'Westminster',
  'parish': 'Westminster, unparished area',
  'admin_county': None,
  'admin_ward': 'Marylebone High Street',
  'ced': None,
  'ccg': 'NHS North West London',
  'nuts': 'Westminster',
  'codes': {'admin_district': 'E09000033',
   'admin_county': 'E99999999',
   'admin_ward': 'E05000641',
   'parish': 'E43000236',
   'parliamentary_constituency': 'E14000639',
   'ccg': 'E38000256',
   'ccg_id': 'W2U3Z',
   'ced': 'E99999999',
   'nuts': 'TLI32',
   'lsoa': 'E01004712',
   'msoa': 'E02

#### GET versus POST

In the https://postcodes.io/ documentation, some of the methods are listed as GET whereas some are listed as POST. So far, we have been using GET without even realising it. Note the `.get()` in something like `response = requests.get('https://postcodes.io/postcodes/W1U6QQ')`

If we need to do POST instead, we can use `.post()` but need to add what is sometimes called a *payload* to the request. From the postcodes.io documentation:

> Bulk lookup postcodes
>
> POST api.postcodes.io/postcodes
>
> {
>
>  "postcodes" : ["OX49 5NU", "M32 0JG", "NE30 1DP"]
>
> }

We can do this in Python with something like this:

In [10]:
payload = {"postcodes" : ["OX49 5NU", "M32 0JG", "NE30 1DP"]}
# note how we are using .post() here with the address from the quote above:
response = requests.post('https://api.postcodes.io/postcodes', data=payload)
response.json()

{'status': 200,
 'result': [{'query': 'OX49 5NU',
   'result': {'postcode': 'OX49 5NU',
    'quality': 1,
    'eastings': 464438,
    'northings': 195677,
    'country': 'England',
    'nhs_ha': 'South Central',
    'longitude': -1.069876,
    'latitude': 51.6562,
    'european_electoral_region': 'South East',
    'primary_care_trust': 'Oxfordshire',
    'region': 'South East',
    'lsoa': 'South Oxfordshire 011B',
    'msoa': 'South Oxfordshire 011',
    'incode': '5NU',
    'outcode': 'OX49',
    'parliamentary_constituency': 'Henley',
    'admin_district': 'South Oxfordshire',
    'parish': 'Brightwell Baldwin',
    'admin_county': 'Oxfordshire',
    'admin_ward': 'Chalgrove',
    'ced': 'Chalgrove and Watlington',
    'ccg': 'NHS Oxfordshire',
    'nuts': 'Oxfordshire CC',
    'codes': {'admin_district': 'E07000179',
     'admin_county': 'E10000025',
     'admin_ward': 'E05009735',
     'parish': 'E04008109',
     'parliamentary_constituency': 'E14000742',
     'ccg': 'E38000136',


Dictionaries are nice, pandas dataframes are nicer:

In [11]:
# NB we imported pandas already
payload = {"postcodes" : ["OX49 5NU", "M32 0JG", "NE30 1DP"]}
response = requests.post('https://api.postcodes.io/postcodes', data=payload)
data = response.json()
# throw away 'status' and focus on 'result'
results = data['result']
df = pd.DataFrame(results)
df

Unnamed: 0,query,result
0,OX49 5NU,"{'postcode': 'OX49 5NU', 'quality': 1, 'eastin..."
1,M32 0JG,"{'postcode': 'M32 0JG', 'quality': 1, 'easting..."
2,NE30 1DP,"{'postcode': 'NE30 1DP', 'quality': 1, 'eastin..."


... this is a bit sub-optimal. `pd.json_normalize()` makes it better ...

In [12]:
# NB we imported pandas already
payload = {"postcodes" : ["OX49 5NU", "M32 0JG", "NE30 1DP"]}
response = requests.post('https://api.postcodes.io/postcodes', data=payload)
data = response.json()
# throw away 'status' and focus on 'result'
results = data['result']
# ********* use the wonderful pd.json_normalize() here instead to do this better *************
df = pd.json_normalize(results)
df

Unnamed: 0,query,result.postcode,result.quality,result.eastings,result.northings,result.country,result.nhs_ha,result.longitude,result.latitude,result.european_electoral_region,...,result.codes.admin_ward,result.codes.parish,result.codes.parliamentary_constituency,result.codes.ccg,result.codes.ccg_id,result.codes.ced,result.codes.nuts,result.codes.lsoa,result.codes.msoa,result.codes.lau2
0,OX49 5NU,OX49 5NU,1,464438,195677,England,South Central,-1.069876,51.6562,South East,...,E05009735,E04008109,E14000742,E38000136,10Q,E58001732,TLJ14,E01028601,E02005968,E07000179
1,M32 0JG,M32 0JG,1,379988,395476,England,North West,-2.302836,53.455654,North West,...,E05000829,E43000163,E14000979,E38000187,02A,E99999999,TLD34,E01006187,E02001261,E08000009
2,NE30 1DP,NE30 1DP,1,435958,568671,England,North East,-1.439269,55.011303,North East,...,E05001130,E43000176,E14001006,E38000127,99C,E99999999,TLC22,E01008561,E02001753,E08000022
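To get a feel for what the flattening is doing, here is a rough pure-Python sketch of the same idea. This is *not* how pandas implements `pd.json_normalize()`; `flatten` is a hypothetical helper written for this example, and the nested record is a cut-down version of the postcode data above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {
    "query": "OX49 5NU",
    "result": {"postcode": "OX49 5NU", "codes": {"ccg": "E38000136"}},
}
flat = flatten(nested)
print(flat)
```

Each nested key ends up as a dotted name such as `result.codes.ccg`, which is the same naming pattern as the column headings in the dataframe above.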


### Example 3: Weather API
Let's get the current weather for each of the locations that we got in example 2. Like many APIs, this one is free to use, but you do have to register in order to get a key, which we then include in each request.

RTFM: https://www.weatherapi.com/docs/

Test: http://api.weatherapi.com/v1/current.json?q=51.6562,-1.0698&key=78223cef73204c2787f230717221503

Python:

In [13]:
key = '78223cef73204c2787f230717221503' # this key is registered to Chris (!)

In [14]:
for i in range(len(df)):
    row = df.loc[i]
    lat = row['result.latitude']
    lng = row['result.longitude']
    url = f'http://api.weatherapi.com/v1/current.json?q={lat},{lng}&key={key}' # f notation is great
    print(url)
    response = requests.get(url)
    print(response.json())

http://api.weatherapi.com/v1/current.json?q=51.6562,-1.069876&key=78223cef73204c2787f230717221503
{'location': {'name': 'Brightwell Baldwin', 'region': 'Oxfordshire', 'country': 'United Kingdom', 'lat': 51.66, 'lon': -1.07, 'tz_id': 'Europe/London', 'localtime_epoch': 1647434844, 'localtime': '2022-03-16 12:47'}, 'current': {'last_updated_epoch': 1647433800, 'last_updated': '2022-03-16 12:30', 'temp_c': 10.0, 'temp_f': 50.0, 'is_day': 1, 'condition': {'text': 'Patchy light rain', 'icon': '//cdn.weatherapi.com/weather/64x64/day/293.png', 'code': 1180}, 'wind_mph': 4.3, 'wind_kph': 6.8, 'wind_degree': 110, 'wind_dir': 'ESE', 'pressure_mb': 1016.0, 'pressure_in': 30.0, 'precip_mm': 0.0, 'precip_in': 0.0, 'humidity': 94, 'cloud': 75, 'feelslike_c': 8.6, 'feelslike_f': 47.4, 'vis_km': 5.0, 'vis_miles': 3.0, 'uv': 3.0, 'gust_mph': 8.3, 'gust_kph': 13.3}}
http://api.weatherapi.com/v1/current.json?q=53.455654,-2.302836&key=78223cef73204c2787f230717221503
{'location': {'name': 'Gorse Hill', 're

In [15]:
# do something cleverer, like put current condition into our dataframe

conditions = []

for i in range(len(df)):
    row = df.loc[i]
    lat = row['result.latitude']
    lng = row['result.longitude']
    url = f'http://api.weatherapi.com/v1/current.json?q={lat},{lng}&key={key}' # f notation is great
    response = requests.get(url)
    data = response.json()
    condition = data['current']['condition']['text']
    conditions.append(condition)
    
df['condition'] = conditions
df

Unnamed: 0,query,result.postcode,result.quality,result.eastings,result.northings,result.country,result.nhs_ha,result.longitude,result.latitude,result.european_electoral_region,...,result.codes.parish,result.codes.parliamentary_constituency,result.codes.ccg,result.codes.ccg_id,result.codes.ced,result.codes.nuts,result.codes.lsoa,result.codes.msoa,result.codes.lau2,condition
0,OX49 5NU,OX49 5NU,1,464438,195677,England,South Central,-1.069876,51.6562,South East,...,E04008109,E14000742,E38000136,10Q,E58001732,TLJ14,E01028601,E02005968,E07000179,Patchy light rain
1,M32 0JG,M32 0JG,1,379988,395476,England,North West,-2.302836,53.455654,North West,...,E43000163,E14000979,E38000187,02A,E99999999,TLD34,E01006187,E02001261,E08000009,Partly cloudy
2,NE30 1DP,NE30 1DP,1,435958,568671,England,North East,-1.439269,55.011303,North East,...,E43000176,E14001006,E38000127,99C,E99999999,TLC22,E01008561,E02001753,E08000022,Overcast
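Two small refinements are worth sketching here: building the URL from a dictionary of parameters, and reading the key from an environment variable instead of typing it into the notebook. Both `build_weather_url` and the variable name `WEATHER_API_KEY` are assumptions made up for this example; only the address and the `q` and `key` parameters come from the cells above.

```python
import os
from urllib.parse import urlencode


def build_weather_url(lat, lng, key):
    """Build a current-weather URL for a latitude/longitude pair."""
    query = urlencode({"q": f"{lat},{lng}", "key": key})
    return f"http://api.weatherapi.com/v1/current.json?{query}"


# Hypothetical environment variable name; falls back to a placeholder string.
key = os.environ.get("WEATHER_API_KEY", "your-key-here")

url = build_weather_url(51.6562, -1.069876, key)
print(url)

# The request itself would then be made as before, for example:
# response = requests.get(url)
```

Note that the comma between latitude and longitude comes out as `%2C`. That is URL encoding at work, and the server decodes it back to a comma.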


## Other stuff

### URL encoding

When doing something like this:

`https://postcodes.io/postcodes/W1U6QQ`

we are using a URL, and there are rules around things like special characters. Note how the postcode does *not* have a space in it here. This is because you cannot have a literal space in a URL. If you must have a space you need to encode it, either as `%20` or, as many services also accept, with a `+` symbol like this:

`https://postcodes.io/postcodes/W1U+6QQ`

This is an example of "URL encoding" and you can get the computer to do the hard work for you. Let's imagine you are using a URL shortening service, so you need to send it a URL, e.g.

`https://fakeurlshortener.com/?url=https://en.wikipedia.org/wiki/Kitten#/media/File:Juvenile_Ragdoll.jpg`

this would not work as we need to URL encode the colons and #, which we can do like this:

In [16]:
import urllib.parse
url = 'https://en.wikipedia.org/wiki/Kitten#/media/File:Juvenile_Ragdoll.jpg'
encoded = urllib.parse.quote(url)
print(encoded)

https%3A//en.wikipedia.org/wiki/Kitten%23/media/File%3AJuvenile_Ragdoll.jpg


so the following would, in theory, work if this were a real service

`https://fakeurlshortener.com/?url=https%3A//en.wikipedia.org/wiki/Kitten%23/media/File%3AJuvenile_Ragdoll.jpg`
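A few variations on the same theme are sketched below, all using real functions from `urllib.parse`. By default `quote()` leaves `/` alone, which suits paths. Passing `safe=""` encodes slashes as well, which is usually the safer choice when a whole URL is being sent as a parameter value. The shortener address is the same made-up one as above.

```python
from urllib.parse import quote, quote_plus, urlencode

# Two ways of encoding a space.
with_percent = quote("W1U 6QQ")     # space becomes %20
with_plus = quote_plus("W1U 6QQ")   # space becomes +
print(with_percent, with_plus)

long_url = "https://en.wikipedia.org/wiki/Kitten#/media/File:Juvenile_Ragdoll.jpg"

# safe="" tells quote() to encode the slashes too.
fully_encoded = quote(long_url, safe="")
print(fully_encoded)

# urlencode() builds the whole "url=..." part in one go (fake service, as above).
shortener_url = "https://fakeurlshortener.com/?" + urlencode({"url": long_url})
print(shortener_url)
```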

### HTTP headers

We parked this concept when we first looked at making a request and setting `json=True` in our GET request, which meant that the response will be JSON instead of plain text:

```python
response = requests.get('http://numbersapi.com/42', json=True)
response.json()
```

Strictly speaking `json=True` is not a dedicated switch: it tells `requests` to send a small JSON body, and as a side effect `requests` sets the `Content-Type` value in the HTTP header to `application/json`, which is what the Numbers API looks for. A bit like an email header, the HTTP header includes some extra settings that we can use to control the request. Many APIs require you to set something in the header. For example, the very real world [bit.ly URL shortening API](https://dev.bitly.com/api-reference#createBitlink) requires you to register an account, which gives you a token that you must include in the header like this:

```python
import requests

headers = {
    'Authorization': 'Bearer abcdefghijklmnop1234567890',
    'Content-Type': 'application/json',
}

data = '{ "long_url": "https://dev.bitly.com", "domain": "bit.ly", "group_guid": "Ba1bc23dE4F" }'

response = requests.post('https://api-ssl.bitly.com/v4/shorten', headers=headers, data=data)
```

We have seen `.post()` with `data=` before.

What is new is `headers=`. You can see that the header is constructed before the request is made using a dictionary. In this example the header has two parts: `Authorization` and `Content-Type`. I have made up the value of the token used in the authorization; this is where you would include the token the service provided to you when you signed up. Don't miss out the `Bearer` before the token!!

Requests that send a body usually include a `Content-Type` in the header. When we use `json=True`, `requests` sets the `Content-Type` to `'application/json'` for us, in a similar way to the bit.ly example above.

Note that `headers=` can be used in exactly the same way with both GET and POST.

For more information on setting headers start at https://docs.python-requests.org/en/latest/user/quickstart/#custom-headers
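Typing a JSON body out as a string is fiddly, so one option is to build both the header and the body from ordinary Python values. The sketch below is only an illustration: `build_request_parts` is a hypothetical helper, the token is made up, and the body fields are a subset of those in the bit.ly example above.

```python
import json


def build_request_parts(token, long_url):
    """Return (headers, body) for a JSON POST that needs a bearer token."""
    headers = {
        "Authorization": f"Bearer {token}",   # note the word Bearer and the space
        "Content-Type": "application/json",
    }
    # json.dumps() produces the JSON text, so quotes and commas cannot go wrong.
    body = json.dumps({"long_url": long_url, "domain": "bit.ly"})
    return headers, body


headers, body = build_request_parts("abcdefghijklmnop1234567890", "https://dev.bitly.com")
print(headers)
print(body)

# These would then be passed to requests in the same way as before:
# response = requests.post('https://api-ssl.bitly.com/v4/shorten', headers=headers, data=body)
```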


### OAuth

Many APIs require you to register with them to gain an OAuth token. The basic premise of this is that you then use the token in the HTTP header as described above, but it is a bit more complex than that and beyond the scope of today. One such service is Twitter, who have a whole course on how to use their API with Python, including how to do the OAuth stuff, here:

https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research


### Libraries / SDKs

Some APIs have gone beyond just providing RESTful HTTP endpoints that we hit using `requests` and have built whole Python libraries which can be installed using `!pip install` and provide helper methods to handle things like OAuth and/or application-specific stuff for you. Using these libraries can be a help, but sometimes frankly they just add an extra layer of complexity for little gain, and many apprentices will not be in a position where they can download and use additional libraries.

Another name sometimes given to these additional libraries is an *SDK* or *Software Development Kit*. This is just an even fancier name for the above.

### cURL

You might see references to a tool called *cURL*, which is a command line tool (most at home on Linux, though also available on macOS and Windows), which is why many people will never have heard of it. It is just another way of downloading stuff from a URL, which is why it comes up a lot in the context of APIs. However, it does pretty much the same stuff that we do in Python using `requests`.

For example, here is a POST request to the bit.ly API that we have seen already, but using cURL:

```bash
curl \
-H 'Authorization: Bearer abcdefghijklmnop1234567890' \
-H 'Content-Type: application/json' \
-X POST \
-d '{
  "long_url": "https://dev.bitly.com",
  "domain": "bit.ly",
  "group_guid": "Ba1bc23dE4F"
}' \
https://api-ssl.bitly.com/v4/shorten
```

and here again is exactly the same thing but using `requests`

```python
import requests

headers = {
    'Authorization': 'Bearer abcdefghijklmnop1234567890',
    'Content-Type': 'application/json',
}

data = '{ "long_url": "https://dev.bitly.com", "domain": "bit.ly", "group_guid": "Ba1bc23dE4F" }'

response = requests.post('https://api-ssl.bitly.com/v4/shorten', headers=headers, data=data)
```
