<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-an-API?" data-toc-modified-id="What-is-an-API?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is an API?</a></span></li><li><span><a href="#What-is-HTTP?" data-toc-modified-id="What-is-HTTP?-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>What is <code>HTTP</code>?</a></span><ul class="toc-item"><li><span><a href="#HTTP-request-types" data-toc-modified-id="HTTP-request-types-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span><code>HTTP</code> request types</a></span></li><li><span><a href="#HTTP-status-codes" data-toc-modified-id="HTTP-status-codes-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>HTTP status codes</a></span></li></ul></li><li><span><a href="#The-requests-library" data-toc-modified-id="The-requests-library-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>The <code>requests</code> library</a></span><ul class="toc-item"><li><span><a href="#API-endpoints" data-toc-modified-id="API-endpoints-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>API endpoints</a></span></li><li><span><a href="#Rate-Limiting" data-toc-modified-id="Rate-Limiting-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Rate Limiting</a></span></li></ul></li><li><span><a href="#Query-strings" data-toc-modified-id="Query-strings-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Query strings</a></span></li><li><span><a href="#Optional:-Persisting-API-results-to-a-DataFrame" data-toc-modified-id="Optional:-Persisting-API-results-to-a-DataFrame-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Optional: Persisting API results to a <code>DataFrame</code></a></span></li><li><span><a href="#Optional:-API-keys-and-version-control-software" data-toc-modified-id="Optional:-API-keys-and-version-control-software-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Optional: API keys and version control software</a></span></li></ul></div>

# What is an API?

An API (or 'application programming interface') is a way for one piece of software to talk to another piece of software. In this lesson we will write and run some `Python` code to go and fetch data from another piece of software running on another machine. An API that is reached over the web in this way is called a **web API**.

You can think of web APIs as being like websites tailored for computers. A website is designed for humans to use, whereas an API is designed for other pieces of software to use. Many of the things that you would do by interacting with a website, software can also do by interacting with an API.


# What is `HTTP`?

To interact with APIs we will be using `HTTP`. You will definitely be familiar with the letters 'HTTP': every web address starts with `http://` (or `https://` for secure connections). But what does 'HTTP' mean?

`HTTP` stands for HyperText Transfer Protocol: this is an example of a **networking protocol**. Basically, `HTTP` is a set of conventions or rules for sending messages between computers. The rules of `HTTP` define the form that the messages can take. There are other networking protocols available, but `HTTP` is the one that powers the web, and so it is obviously very widespread. 

We won't go into a lot of detail about how `HTTP` works. But for now you need to understand that when you visit a website your computer sends a **request** to another computer running the website. In return, the other computer will send a **response** back to your computer. Taken together, we call this the **request-response cycle**.

The computer sending the request is called the **client** and the computer that returns the response is known as the **server**.

![](images/http-requests.png)

If this seems simple, that's because it is! One of the important principles of HTTP is that it is **stateless**: that is, the server doesn't retain any information about the state of the client between requests. Think of the server as having 'amnesia' about the requests it has received from a client, and the responses it has sent back to a client.  

## `HTTP` request types

There are several different types of request that you can send. By far the most common is a `GET` request. This simply asks for a piece of data in response and doesn't change anything on the server. Other types of request include `POST`, where you send some data to the server; or `DELETE` where you request to remove data from the server. 

For example, if you wanted to be added to a mailing list, you enter your e-mail address on a website and click the 'Submit' button. At this point, your computer (the client) sends a `POST` request containing data (your e-mail address) to the server running the website. The server adds your e-mail to a database, then returns a response telling the client that the e-mail was successfully added. If you later wanted to be taken off the mailing list, your computer might send a `DELETE` request to the server. 

Taken as a whole, you might hear the various request types being called `HTTP` **verbs**, because these are the **'actions'** the client can request the server to perform.
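To make the verbs concrete, here is a minimal sketch that builds (but never sends) the mailing-list requests described above, using only `Python`'s standard library. The URL and e-mail address are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and address, for illustration only
url = "https://example.com/mailing-list"
form_data = urlencode({"email": "ada@example.com"}).encode("utf-8")

# Building a Request object does not contact the server
subscribe = Request(url, data=form_data, method="POST")
unsubscribe = Request(url, data=form_data, method="DELETE")
lookup = Request(url)  # no data and no explicit method, so the verb is GET

for request in (lookup, subscribe, unsubscribe):
    print(request.get_method(), request.full_url)
```

The same URL appears in all three requests: it is the verb that tells the server which action is being asked for.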

## HTTP status codes

Every `HTTP` response will include a status code. If your request succeeded, the code will usually be **200** (more generally, any code starting with `2` signals success). Other numbers signify different outcomes:

* Any code starting with `4` means that there was an error in the request you sent. The most famous of these is the `404` error ('Not Found'), which I'm sure you'll have seen before.
* A code starting with `5` means there's a problem with the server, so it can't send you a proper response back. If you get a `5XX` error, there's probably not much you can do to fix it, aside from contacting the administrator of the web service you are trying to use.
* A code starting with `3` means the request has been redirected.
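The grouping by first digit can be sketched in a few lines. This is only an illustration: `describe_status` is a hypothetical helper, and the standard phrases are looked up with `http.HTTPStatus` from the standard library.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the code, its standard phrase and the family it belongs to."""
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of a three-digit code identifies its family
    family = families.get(code // 100, "other")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not every number is a registered status code
        phrase = "unknown code"
    return f"{code} {phrase} ({family})"

for code in (200, 301, 404, 503):
    print(describe_status(code))
```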

# The `requests` library

Now you know how `HTTP` requests work, let's learn how to make them in `Python`.

We're going to use the `requests` package for this: you may need to install it. If so, remember that `Python` packages are installed from the command line.

```
conda install requests
```

In [1]:
import requests

As a simple example, we're going to make use of the Star Wars API. This provides data relating to characters, ships, planets etc in the Star Wars films. Have a look at the documentation available on the [website](http://swapi.dev/). 

Let's start by sending a `GET` request using the function `requests.get()`. When you run this function an `HTTP` request is sent - so you will need to be connected to the internet for this to work! 

We're going to make the first request that you can see on the Star Wars API website and ask for data about Luke Skywalker. 

In [2]:
response = requests.get("http://swapi.dev/api/people/1/")

We can then check the status code of the response  

In [3]:
response.status_code

200

A status code of 200 indicates that the request was successful. Let's see what the API has sent back 

In [4]:
response.text

'{"name":"Luke Skywalker","height":"172","mass":"77","hair_color":"blond","skin_color":"fair","eye_color":"blue","birth_year":"19BBY","gender":"male","homeworld":"http://swapi.dev/api/planets/1/","films":["http://swapi.dev/api/films/1/","http://swapi.dev/api/films/2/","http://swapi.dev/api/films/3/","http://swapi.dev/api/films/6/"],"species":[],"vehicles":["http://swapi.dev/api/vehicles/14/","http://swapi.dev/api/vehicles/30/"],"starships":["http://swapi.dev/api/starships/12/","http://swapi.dev/api/starships/22/"],"created":"2014-12-09T13:50:51.644000Z","edited":"2014-12-20T21:17:56.891000Z","url":"http://swapi.dev/api/people/1/"}'

Can we see this in a more structured form? Yes: the API sends back data in **JavaScript Object Notation (`JSON`)** form, and the `.json()` method of the response parses that text into a regular `Python` `dictionary`.

In [5]:
json = response.json()
json

{'name': 'Luke Skywalker',
 'height': '172',
 'mass': '77',
 'hair_color': 'blond',
 'skin_color': 'fair',
 'eye_color': 'blue',
 'birth_year': '19BBY',
 'gender': 'male',
 'homeworld': 'http://swapi.dev/api/planets/1/',
 'films': ['http://swapi.dev/api/films/1/',
  'http://swapi.dev/api/films/2/',
  'http://swapi.dev/api/films/3/',
  'http://swapi.dev/api/films/6/'],
 'species': [],
 'vehicles': ['http://swapi.dev/api/vehicles/14/',
  'http://swapi.dev/api/vehicles/30/'],
 'starships': ['http://swapi.dev/api/starships/12/',
  'http://swapi.dev/api/starships/22/'],
 'created': '2014-12-09T13:50:51.644000Z',
 'edited': '2014-12-20T21:17:56.891000Z',
 'url': 'http://swapi.dev/api/people/1/'}
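The step from `JSON` text to a dictionary can be sketched without any network access, using the standard library's `json.loads`. This is an approximation of what `.json()` gives you, not a description of how `requests` is implemented; `raw_text` is a shortened copy of the text shown earlier.

```python
from json import loads

# A shortened copy of the text returned for Luke Skywalker
raw_text = (
    '{"name":"Luke Skywalker","height":"172",'
    '"films":["http://swapi.dev/api/films/1/"],"species":[]}'
)

parsed = loads(raw_text)

print(type(parsed).__name__)            # the JSON object becomes a dictionary
print(parsed["name"])
print(type(parsed["films"]).__name__)   # JSON arrays become lists
print(type(parsed["height"]).__name__)  # numbers sent in quotes stay strings
```

Note that `height` arrives as the string `'172'`, not a number, so some type conversion is usually needed before doing arithmetic on API data.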

<hr style="border:8px solid black"> </hr>

***

**<u>Task - 5 mins</u>**

* Extract the name 'Luke Skywalker' from the `JSON` response data
* Look at what has been returned for Luke's 'homeworld'. How do you think we can get further details of this planet? Write code to try it!

**Solution**

In [6]:
name = json['name']
name

'Luke Skywalker'

In [7]:
homeworld_url = json['homeworld']
json = requests.get(homeworld_url).json()
json

{'name': 'Tatooine',
 'rotation_period': '23',
 'orbital_period': '304',
 'diameter': '10465',
 'climate': 'arid',
 'gravity': '1 standard',
 'terrain': 'desert',
 'surface_water': '1',
 'population': '200000',
 'residents': ['http://swapi.dev/api/people/1/',
  'http://swapi.dev/api/people/2/',
  'http://swapi.dev/api/people/4/',
  'http://swapi.dev/api/people/6/',
  'http://swapi.dev/api/people/7/',
  'http://swapi.dev/api/people/8/',
  'http://swapi.dev/api/people/9/',
  'http://swapi.dev/api/people/11/',
  'http://swapi.dev/api/people/43/',
  'http://swapi.dev/api/people/62/'],
 'films': ['http://swapi.dev/api/films/1/',
  'http://swapi.dev/api/films/3/',
  'http://swapi.dev/api/films/4/',
  'http://swapi.dev/api/films/5/',
  'http://swapi.dev/api/films/6/'],
 'created': '2014-12-09T13:50:49.641000Z',
 'edited': '2014-12-20T20:58:18.411000Z',
 'url': 'http://swapi.dev/api/planets/1/'}

***

<hr style="border:8px solid black"> </hr>

## API endpoints

Notice above that we first made a `GET` request to a URL of the form `http://swapi.dev/api/people/1/`. Later, for Luke's homeworld, we made a request to a URL `http://swapi.dev/api/planets/1/`. 

Look at the similarities and differences between these two URLs:

* The **root** (or base) of both is the same: `http://swapi.dev/api/`
* They differ in what comes **after** the root: `people/1/` and `planets/1/`

Here, `people/1/` and `planets/1/` are different examples of what are called **endpoints** of the API: when we send a `GET` request to `people/1/`, we are asking for information on the item with `id = 1` in the `people` collection. Think of this as being like a table in `SQL`: imagine a big table of people in Star Wars, all with differing `id` numbers. When we send a `GET` request to `planets/1/`, the API interprets this as a request to a general endpoint `/planets/id/` in which we are asking for the `id=1` item in the `planets` collection. It is very common for an API to offer **multiple endpoints**: each endpoint referring to a class of 'item' or 'entity' within the context of the API. 
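The root-plus-endpoint pattern is easy to capture in a small function. The sketch below only builds URL strings, it does not send anything; `endpoint_url` is a hypothetical helper name.

```python
BASE_URL = "http://swapi.dev/api/"

def endpoint_url(resource, item_id=None):
    """Build the URL for a whole collection, or for one item in it."""
    url = f"{BASE_URL}{resource}/"
    if item_id is not None:
        url += f"{item_id}/"
    return url

print(endpoint_url("people", 1))   # Luke Skywalker's URL
print(endpoint_url("planets", 1))  # his homeworld's URL
print(endpoint_url("starships"))   # the collection as a whole
```

The strings produced can be passed straight to `requests.get()`.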

You will sometimes hear endpoints referred to as **resources**. In fact, the terms resource and endpoint differ subtly in terms of meaning, but the difference need not concern us here. 

<hr style="border:8px solid black"> </hr>

***

**<u>Task - 2 mins</u>**

Have a look at the `Documentation` for the Star Wars API (top right of `http://swapi.dev`). What **resources** (or **endpoints**) does the API offer?

**Solution**

The resources offered are: `films`, `people`, `planets`, `species`, `starships` and `vehicles`. We can actually see these by sending a `GET` request to the `root` endpoint

In [8]:
requests.get('http://swapi.dev/api/').json()

{'people': 'http://swapi.dev/api/people/',
 'planets': 'http://swapi.dev/api/planets/',
 'films': 'http://swapi.dev/api/films/',
 'species': 'http://swapi.dev/api/species/',
 'vehicles': 'http://swapi.dev/api/vehicles/',
 'starships': 'http://swapi.dev/api/starships/'}

***

<hr style="border:8px solid black"> </hr>

## Rate Limiting

Almost every API you use will have a **rate limit**: a limit on the number of requests you can make in a given time. Every time you make a request you are using the API's server, which costs the people running the API a very small amount of money. The Star Wars API has a very generous rate limit of 10,000 requests per day. **Be careful, however: it is surprisingly easy to go over an API rate limit if you automatically generate and send requests**.
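One simple way to stay under a limit when generating requests in a loop is to pause between them. The sketch below assumes that spacing requests evenly is acceptable for your use case; `throttled` is a hypothetical helper, not part of `requests`, and the loop only prints where a real request would go.

```python
import time

def throttled(items, min_interval):
    """Yield items no faster than one every min_interval seconds."""
    last = None
    for item in items:
        if last is not None:
            # Sleep for whatever remains of the interval since the last item
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item

urls = [f"http://swapi.dev/api/people/{i}/" for i in range(1, 4)]

start = time.monotonic()
for url in throttled(urls, min_interval=0.05):
    print("would request", url)  # a real loop would call requests.get(url) here
elapsed = time.monotonic() - start
```

With three URLs and a 0.05 second interval, the loop takes roughly 0.1 seconds, because the pause happens between requests rather than before the first one.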

# Query strings

Fairly often API requests will look like this:

```
https://swapi.dev/api/people/?search=luke&format=wookiee
```

with a question mark `?`, followed by key-value pairs separated by ampersands `&`. The part after the `?` is what is known as a **query string**. 
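To see the parts of such a URL, it can be taken apart with the standard library's `urllib.parse`. A small sketch, with no request being sent:

```python
from urllib.parse import parse_qs, urlsplit

url = "https://swapi.dev/api/people/?search=luke&format=wookiee"
parts = urlsplit(url)

print(parts.path)   # the endpoint part of the URL
print(parts.query)  # everything after the '?'

# parse_qs turns the query string into a dictionary,
# with a list of values for each key
pairs = parse_qs(parts.query)
print(pairs)
```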

While it's totally fine to write query strings yourself and use them in requests:

In [9]:
response = requests.get("https://swapi.dev/api/people/?search=luke&format=wookiee")

response.status_code

200

the `requests` library can handle generating query strings for you. You do this by passing in a `dictionary` via the `params=` argument

In [10]:
response = requests.get("https://swapi.dev/api/people/",
  params = {"search":"luke", "format":"wookiee"}
)

response.status_code

200

In [11]:
response.text

'{"oaoohuwhao":1,"whwokao":whhuanan,"akrcwohoahoohuc":whhuanan,"rcwochuanaoc":[{"whrascwo":"Lhuorwo Sorroohraanorworc","acwoahrracao":"172","scracc":"77","acraahrc_oaooanoorc":"rhanoowhwa","corahwh_oaooanoorc":"wwraahrc","worowo_oaooanoorc":"rhanhuwo","rhahrcaoac_roworarc":"19BBY","rrwowhwaworc":"scraanwo","acooscwoohoorcanwa":"acaoaoak://cohraakah.wawoho/raakah/akanrawhwoaoc/1/","wwahanscc":["acaoaoak://cohraakah.wawoho/raakah/wwahanscc/1/","acaoaoak://cohraakah.wawoho/raakah/wwahanscc/2/","acaoaoak://cohraakah.wawoho/raakah/wwahanscc/3/","acaoaoak://cohraakah.wawoho/raakah/wwahanscc/6/"],"cakwooaahwoc":[],"howoacahoaanwoc":["acaoaoak://cohraakah.wawoho/raakah/howoacahoaanwoc/14/","acaoaoak://cohraakah.wawoho/raakah/howoacahoaanwoc/30/"],"caorarccacahakc":["acaoaoak://cohraakah.wawoho/raakah/caorarccacahakc/12/","acaoaoak://cohraakah.wawoho/raakah/caorarccacahakc/22/"],"oarcworaaowowa":"2014-12-09T13:50:51.644000Z","wowaahaowowa":"2014-12-20T21:17:56.891000Z","hurcan":"acaoaoak://cohr

It's generally a good idea to use the query string builder in `requests`, for the following reasons: 

* It helps you avoid mistakes
* Query strings must form part of a valid URL, which can be hard to ensure manually if you need to use special characters or spaces
* It's easier to write functions that call APIs using the query builder
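The point about special characters can be illustrated with `urllib.parse.urlencode` from the standard library. This is a sketch of the kind of encoding a query builder has to do; `requests` performs its own encoding when you pass `params=`, so treat the output here as indicative rather than as exactly what `requests` sends.

```python
from urllib.parse import urlencode

params = {"search": "Darth Vader", "format": "wookiee"}
query = urlencode(params)  # the space is encoded for us
full_url = "https://swapi.dev/api/people/?" + query
print(full_url)

# An ampersand inside a value must be escaped, otherwise it would be
# read as the separator between two key-value pairs
tricky = urlencode({"search": "R2-D2 & C-3PO"})
print(tricky)
```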

<hr style="border:8px solid black"> </hr>

***

**<u>Task - 5 mins</u>**

Using **query strings**, see if you can find any information on:

* 'Darth Vader' (try the `people` resource)
* Any `starships` containing the word 'star' in 'name' or 'model' (don't worry, these are the default search fields for this resource). How many starships are returned?

**Solution**

In [12]:
requests.get("https://swapi.dev/api/people/",
  params = {"search":"Darth Vader"}
).json()

{'count': 1,
 'next': None,
 'previous': None,
 'results': [{'name': 'Darth Vader',
   'height': '202',
   'mass': '136',
   'hair_color': 'none',
   'skin_color': 'white',
   'eye_color': 'yellow',
   'birth_year': '41.9BBY',
   'gender': 'male',
   'homeworld': 'http://swapi.dev/api/planets/1/',
   'films': ['http://swapi.dev/api/films/1/',
    'http://swapi.dev/api/films/2/',
    'http://swapi.dev/api/films/3/',
    'http://swapi.dev/api/films/6/'],
   'species': [],
   'vehicles': [],
   'starships': ['http://swapi.dev/api/starships/13/'],
   'created': '2014-12-10T15:18:20.704000Z',
   'edited': '2014-12-20T21:17:50.313000Z',
   'url': 'http://swapi.dev/api/people/4/'}]}

In [13]:
star_ships = requests.get("https://swapi.dev/api/starships/",
  params = {"search": "star"}
).json()

len(star_ships['results'])

10
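The search response for Darth Vader carried `count`, `next` and `previous` keys alongside `results`, which suggests that long result sets are split into pages. If so, `len()` of `results` counts only the items on the current page, and `count` holds the total. Here is a hedged sketch of how all pages could be gathered by following `next`; `collect_results` is a hypothetical helper, and the two pages below are made-up stand-ins so the example runs without a network connection.

```python
def collect_results(first_url, fetch_json):
    """Follow 'next' links page by page, gathering every item in 'results'."""
    items = []
    url = first_url
    while url is not None:
        page = fetch_json(url)
        items.extend(page["results"])
        url = page["next"]  # None on the last page
    return items

# Made-up pages that mimic the shape of the API's responses
fake_pages = {
    "page-1": {"count": 3, "next": "page-2", "previous": None,
               "results": [{"name": "A"}, {"name": "B"}]},
    "page-2": {"count": 3, "next": None, "previous": "page-1",
               "results": [{"name": "C"}]},
}

all_items = collect_results("page-1", fake_pages.__getitem__)
print(len(all_items), "items gathered from", len(fake_pages), "pages")
```

Against the real API, the second argument could be something like `lambda url: requests.get(url).json()`, bearing the rate limit in mind.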

***

<hr style="border:8px solid black"> </hr>

# Optional: Persisting API results to a `DataFrame` 

Let's see how to load API results into a `pandas` `DataFrame`. The easiest way is to pass in a `list` of `dictionaries` (this is a recurring theme). But what if we want to trim the dictionaries down before we save them, i.e. we don't want to persist all of the keys and values the API has provided? Say we want to keep only the ships' `name`, `model`, `length` and `crew`. The easiest way to do that is by nesting a `dictionary` comprehension inside a `list` comprehension. 

In [14]:
ships = star_ships['results']

keys_to_keep = ['name', 'model', 'length', 'crew']
ships_snipped = [{key:val for key, val in ship.items() if key in keys_to_keep} for ship in ships]
ships_snipped

[{'name': 'Star Destroyer',
  'model': 'Imperial I-class Star Destroyer',
  'length': '1,600',
  'crew': '47,060'},
 {'name': 'Death Star',
  'model': 'DS-1 Orbital Battle Station',
  'length': '120000',
  'crew': '342,953'},
 {'name': 'Executor',
  'model': 'Executor-class star dreadnought',
  'length': '19000',
  'crew': '279,144'},
 {'name': 'Calamari Cruiser',
  'model': 'MC80 Liberty type Star Cruiser',
  'length': '1200',
  'crew': '5400'},
 {'name': 'B-wing',
  'model': 'A/SF-01 B-wing starfighter',
  'length': '16.9',
  'crew': '1'},
 {'name': 'Naboo fighter',
  'model': 'N-1 starfighter',
  'length': '11',
  'crew': '1'},
 {'name': 'Naboo Royal Starship',
  'model': 'J-type 327 Nubian royal starship',
  'length': '76',
  'crew': '8'},
 {'name': 'Scimitar', 'model': 'Star Courier', 'length': '26.5', 'crew': '1'},
 {'name': 'Jedi starfighter',
  'model': 'Delta-7 Aethersprite-class interceptor',
  'length': '8',
  'crew': '1'},
 {'name': 'Republic attack cruiser',
  'model': 'Se

This looks very complicated, but break it down mentally into two steps:

1. `[(do something with ship) for ship in ships]`
2. `(do something with ship) = {add key:val for each key, val in ship if key in keys_to_keep}` 
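The two steps can also be written out separately, which some people find easier to read. In this sketch `keep_keys` is a hypothetical helper, and `other_field` is a made-up stand-in for the extra keys the API returns; the other values are taken from the output above.

```python
def keep_keys(record, keys):
    """Step 2: build a new dictionary holding only the wanted keys."""
    return {key: val for key, val in record.items() if key in keys}

sample_ships = [
    {"name": "Star Destroyer", "length": "1,600", "crew": "47,060",
     "other_field": "..."},
    {"name": "B-wing", "length": "16.9", "crew": "1",
     "other_field": "..."},
]
wanted = ["name", "length", "crew"]

# Step 1: apply the helper to every ship in the list
trimmed = [keep_keys(ship, wanted) for ship in sample_ships]
print(trimmed)
```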

Now `pd.DataFrame()` will accept a `list` of `dictionaries` as input

In [15]:
import pandas as pd
ships = pd.DataFrame(ships_snipped)
ships

| | name | model | length | crew |
|---|---|---|---|---|
| 0 | Star Destroyer | Imperial I-class Star Destroyer | 1600.0 | 47060 |
| 1 | Death Star | DS-1 Orbital Battle Station | 120000.0 | 342953 |
| 2 | Executor | Executor-class star dreadnought | 19000.0 | 279144 |
| 3 | Calamari Cruiser | MC80 Liberty type Star Cruiser | 1200.0 | 5400 |
| 4 | B-wing | A/SF-01 B-wing starfighter | 16.9 | 1 |
| 5 | Naboo fighter | N-1 starfighter | 11.0 | 1 |
| 6 | Naboo Royal Starship | J-type 327 Nubian royal starship | 76.0 | 8 |
| 7 | Scimitar | Star Courier | 26.5 | 1 |
| 8 | Jedi starfighter | Delta-7 Aethersprite-class interceptor | 8.0 | 1 |
| 9 | Republic attack cruiser | Senator-class Star Destroyer | 1137.0 | 7400 |


And it just remains to do some final tidying up of the datatypes of the `length` and `crew` columns

In [16]:
ships.loc[:, ['length', 'crew']] = ships[['length', 'crew']].replace(',', '', regex=True)
ships = ships.astype({'length': 'float', 'crew': 'int'})
ships

| | name | model | length | crew |
|---|---|---|---|---|
| 0 | Star Destroyer | Imperial I-class Star Destroyer | 1600.0 | 47060 |
| 1 | Death Star | DS-1 Orbital Battle Station | 120000.0 | 342953 |
| 2 | Executor | Executor-class star dreadnought | 19000.0 | 279144 |
| 3 | Calamari Cruiser | MC80 Liberty type Star Cruiser | 1200.0 | 5400 |
| 4 | B-wing | A/SF-01 B-wing starfighter | 16.9 | 1 |
| 5 | Naboo fighter | N-1 starfighter | 11.0 | 1 |
| 6 | Naboo Royal Starship | J-type 327 Nubian royal starship | 76.0 | 8 |
| 7 | Scimitar | Star Courier | 26.5 | 1 |
| 8 | Jedi starfighter | Delta-7 Aethersprite-class interceptor | 8.0 | 1 |
| 9 | Republic attack cruiser | Senator-class Star Destroyer | 1137.0 | 7400 |
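The tidying step boils down to removing the thousands separators and then converting the type. Here is the same logic in plain `Python`, as a sketch without `pandas`; `to_number` is a hypothetical helper and the inputs are values from the raw API output shown earlier.

```python
def to_number(text, kind=float):
    """Strip thousands separators from a numeric string, then convert it."""
    return kind(text.replace(",", ""))

print(to_number("1,600"))        # length of the Star Destroyer
print(to_number("47,060", int))  # its crew
print(to_number("16.9"))         # values without a comma pass through unchanged
```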


# Optional: API keys and version control software

Many APIs will require users to submit an **API key** with their requests. This is certainly true for paid APIs, but often for many free APIs too, as it helps the API administrators track usage. This key is **unique** to you as a user: you typically get an API key by filling out a form on the API website.

If you are using version control software (e.g. `git` and `GitHub`), then **you must make sure that you do not add your API keys in plain text to any repository**. If your keys are made available in an online repository, you run the risk of others using your allocated resources, or even worse, using your keys in an unauthorised or unethical way.

A common way of avoiding committing API keys is to keep them in a separate file, and add that file to `.gitignore`. You might have a file called `api_keys.py`, that looks like this:

In [17]:
api_key_for_first_API = 'dCtXRjNnhm8qbXnY'
api_key_for_second_API = 'Gvmeywn2F8d8GxfX'

And in your main code you can use import to run the file at the start. We want to store `api_keys.py` somewhere that **will never be included in a project repository**, so I have created a directory `api_keys` in my home directory to store these files. I add this directory to the `path` (the list of places `Python` searches when we ask it to `import` something), and then import the file 

In [18]:
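import sys
sys.path.insert(1, '/home/del/api_keys')
# import the key variables by name so they can be used directly below
from api_keys import api_key_for_first_API, api_key_for_second_API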

Now you can use the variables `api_key_for_first_API` and `api_key_for_second_API` in your code **without having their actual values revealed**:  all that appears in your code are the variable names, and not the actual key strings that they contain.

In [19]:
api_key_for_first_API

'dCtXRjNnhm8qbXnY'
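The whole pattern can be tried end to end without touching your real keys. In the sketch below a temporary directory stands in for the private folder, and the module name `demo_api_keys`, the placeholder key and the `mask` helper are all made up for illustration. Masking is an extra precaution: printing a key in full, as above, leaves it in the notebook's saved output.

```python
import sys
import tempfile
from pathlib import Path

# A temporary directory stands in for a private folder outside any repository
key_dir = Path(tempfile.mkdtemp())
(key_dir / "demo_api_keys.py").write_text(
    "api_key_for_first_API = 'not-a-real-key'\n"
)

# Add the folder to Python's search path, then import from it
sys.path.insert(1, str(key_dir))
from demo_api_keys import api_key_for_first_API

def mask(key, visible=4):
    """Show only the first few characters of a key."""
    return key[:visible] + "*" * (len(key) - visible)

print(mask(api_key_for_first_API))
```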

This is a quick, low-tech approach to the problem. However, it is easy to get wrong: if you change the location of your API keys, you need to remember to update the `.gitignore` file (if you are using `git`). Also, anyone with access to your laptop can access your API keys. For a better solution that is somewhat beyond our discussion here, have a look at the `keyring` package.