# Introduction to Web APIs
## Part I: HTTP

### Learning Objectives

- Describe the HTTP protocol
- Describe the role of a client and server
- Identify the most common HTTP verbs (request methods) and their uses
- Describe what an API is and name some examples
- Use the `requests` library in Python to send an HTTP request and examine the response

<a id='what-is-api'></a>
## What is an API?

---

An application programming interface (API) is a set of routines, protocols, and tools for interacting with a system. It specifies how to interact with a software component.

APIs are a way developers make data (or tools, or other resources) available for reuse by other developers. 

Some examples include:

- Libraries that post content on Twitter, Facebook, Yelp, or LinkedIn
- Web services for accessing currency or stock prices
- Python modules
- Access to built-in functions of a mobile device (GPS readings, saving data to a file, etc.)


In the context of data science, APIs are a common way to interact with data hosted by third parties. They most often take the form of **web APIs**.

<a id='api-examples'></a>

### Examples of APIs: Facebook

Facebook provides an API for interacting with its service. At a glance, you can:

- View your posts.
- View websites, people, posts, and pages that you've liked.
- View activity on apps from you and your friends.
  - Movies watched.
  - Music listened to.
  - Games played.
- View places traveled/check ins.
- Maintain or build relationships.


<a id='yelp'></a>
### Examples of APIs: Yelp

Yelp provides a way for developers to access:

- Reviews.
  - Services.
  - Restaurants, bars, and cafes.
  - Businesses.
- Business metadata.


<a id='echonest'></a>
### Examples of APIs: Echonest

Echonest consolidates access to many entertainment service APIs in one place. It has a huge list of features and connected services, including:

- Spotify
- Pandora
- Rdio
- Gracenote
- SoundHound
- Shazam

Some Echonest features include:

- Music waveform identification (like Shazam or SoundHound's music ID).
- Playlist recommendations.
- Detailed artist, album, and track lookup.
  - Artist biographies, origins, contemporaries, and noteworthy accomplishments.
  - Official Twitter, website, and social media links.
  - BPM, mood, popularity, and genre(s).
  - Images, videos, and media.
- Detailed movie, actor, and product lookup.
- Concert schedules and ticket metadata.

<a id='http'></a>
## Hypertext Transfer Protocol (HTTP)

---

HTTP is a protocol (a system of rules) that determines how information is passed around the web.

It defines the format of the messages passed between clients and servers.



HTTP is just one of many network protocols in use on the internet:

* IP
* FTP
* SMTP
* TCP
* UDP
* SSH
* SSL
* TLS
* POP3
* IMAP
* IRC
* DNS
* DHCP
* BitTorrent
* and many more

### The HTTP Client

Clients send requests to servers and receive responses in return.

Some types of clients include:

* Browsers: Chrome, Firefox, and Safari.
* Command line programs: [curl](http://curl.haxx.se/docs/) and [wget](http://www.gnu.org/software/wget/manual/wget.html).
* Application code: Python's `requests`, Scrapy, and Mechanize.


### The HTTP Server


Servers receive requests from clients and send back responses.



<a id='web-app'></a>
### Web Server vs Web Applications

---

Web applications are programs that run on a web server, process the requests the server receives, and generate responses.


1. A client sends an HTTP request to an HTTP server running on a remote machine.
   * The _hostname_ given in the URL indicates which server will receive the request.
2. The HTTP server processes the HTTP request. This may entail passing the request to a web application, which creates an HTTP response.
3. The response gets sent back to the client.
4. The client processes the response.
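
The four steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration that uses only the standard library (`http.server` and `http.client`) and runs both sides on the local machine instead of contacting a remote one. The `HelloHandler` class, the `/hello` path, and the `127.0.0.1` address are assumptions made for the example; the handler stands in for a web application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical web application: builds a response for every GET request."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Step 1: the client sends a request; the host and port select the server.
connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/hello")

# Steps 2 and 3 happen on the server; step 4: the client processes the response.
response = connection.getresponse()
status = response.status
text = response.read().decode("utf-8")

connection.close()
server.shutdown()
server.server_close()

print(status, text)
```

Running the script prints `200 You asked for /hello`: the status code and body that the handler generated for the client's request.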




![](assets/request_response.png)

<a id='http-request'></a>
## HTTP Request

---

What's in a request?

- URL
- HTTP request method (like `GET`)
- Headers (Content-Type, etc.)
- Sometimes a body

<a id='request-structure'></a>
### HTTP Request Structure

```
[http request method] [URL] [http version]  
[list of headers]

[request body]
```


#### HTTP Request Method Example (No Body)

    GET http://vermonster.com HTTP/1.1
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding:gzip,deflate,sdch
    Accept-Language:en-US,en;q=0.8
    Connection:keep-alive
    Host:vermonster.com
    User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1659.2 Safari/537.36
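
To make the structure concrete, here is a small sketch that assembles a request message by hand. The `build_request` helper is hypothetical and exists only for illustration; real clients do this for you. The example puts just the path (`/`) in the request line and carries the hostname in the `Host` header, which is the form clients commonly send when talking directly to a server.

```python
def build_request(method, target, headers, body=""):
    """Assemble a raw HTTP/1.1 request message as a string (illustrative helper)."""
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join([request_line, *header_lines, "", body])


raw = build_request(
    "GET",
    "/",
    {"Host": "vermonster.com", "Accept": "text/html", "Connection": "keep-alive"},
)
print(raw)
```

Because this `GET` request has no body, the message ends with the blank line that follows the headers.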



<a id='request-methods'></a>
### HTTP Request Methods:

* **`GET`** => Retrieve a resource.  
* **`POST`** => Create a resource.  
* **`PATCH`** or **`PUT`** => Update an existing resource (**`PATCH`** applies a partial change, while **`PUT`** replaces the whole resource).  
* **`DELETE`** => Delete a resource.  
* **`HEAD`** => Retrieve the headers for a resource.

Of these, **`GET`** is the one we'll be using most.
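
As a rough illustration of how the methods map to actions, the sketch below builds one request object per method with the standard library's `urllib.request.Request`. Nothing is sent over the network. The `http://example.com/articles` endpoint and the JSON payload are hypothetical and stand in for whatever resource a real API would expose.

```python
import json
import urllib.request

BASE_URL = "http://example.com/articles"  # Hypothetical endpoint; nothing is sent.

payload = json.dumps({"title": "Hello"}).encode("utf-8")

examples = {
    # With no data and no explicit method, urllib defaults to GET.
    "read": urllib.request.Request(f"{BASE_URL}/1"),
    # Supplying data without an explicit method makes the request a POST.
    "create": urllib.request.Request(BASE_URL, data=payload),
    "update": urllib.request.Request(f"{BASE_URL}/1", data=payload, method="PATCH"),
    "delete": urllib.request.Request(f"{BASE_URL}/1", method="DELETE"),
    "headers only": urllib.request.Request(f"{BASE_URL}/1", method="HEAD"),
}

for action, request in examples.items():
    print(f"{action:>12}: {request.get_method()} {request.full_url}")
```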

<a id='http-response'></a>
## HTTP Response

---

What's in a response?

- Status code (200 OK, 404 Not Found, etc.)
- Headers
- Body (most of the time)


When a client sends a request, the server sends back a response; the standard format for this response is:

```
[http version] [status] [reason]  
[list of headers]

[response body] # Typically HTML
```
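
The format above can be taken apart with a few string operations. The following is a simplified sketch, not a full HTTP parser: the `RAW_RESPONSE` text and the `parse_response` helper are made up for illustration, and the helper assumes a well-formed message with one `name: value` pair per header line.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 22\r\n"
    "\r\n"
    "<h1>Hello, world!</h1>"
)


def parse_response(raw):
    """Split a raw HTTP response into its status line parts, headers, and body."""
    # The first blank line separates the head (status line + headers) from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return version, int(status), reason, headers, body


version, status, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, status, reason)
print(headers)
print(body)
```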

<a id='response-types'></a>
### Response Types Overview

**[Status codes](https://httpstatuses.com/)** have standard meanings. 

#### Status Code Groups
* 1×× Informational
* 2×× Success
* 3×× Redirection
* 4×× Client Error
* 5×× Server Error

Here are a few specific status codes to know about:

|Code|Reason|
|:---|:-----|
|200|OK|
|301|Moved Permanently|
|302|Found (previously "Moved Temporarily")|
|400|Bad Request|
|403|Forbidden|
|404|Not Found|
|429|Too Many Requests|
|500|Internal Server Error|
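
Python's standard library knows these codes too. The sketch below looks up the standard reason phrase with `http.HTTPStatus` and derives the group from the first digit of the code. The `GROUPS` dictionary and the `describe` helper are names invented for this example.

```python
from http import HTTPStatus

# The first digit of a status code identifies its group.
GROUPS = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return the status group and standard reason phrase for a status code."""
    return GROUPS[code // 100], HTTPStatus(code).phrase


for code in (200, 301, 404, 429, 500):
    group, phrase = describe(code)
    print(f"{code}: {phrase} ({group})")
```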



## HTTP is stateless

- State (or context) is never saved between requests
- The server will forget who you are each time
- Each new request needs to provide the right context

![](assets/stateless_server.png)

### This can be solved with cookies

<img src="assets/cookies.png" style="width: 100px" >

![](assets/state_cookies.png)
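
The idea can be simulated without any network traffic. In this sketch, `handle_request` is a hypothetical stateless handler: everything it knows about the client comes from the headers of the current request. The `visits` cookie name is an assumption made for the example, and the standard library's `http.cookies.SimpleCookie` is used to read the `Cookie` header.

```python
from http.cookies import SimpleCookie


def handle_request(headers):
    """Hypothetical stateless handler: it keeps nothing between calls."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    # Without a cookie, the server has no way to know this client has visited before.
    visits = int(cookie["visits"].value) + 1 if "visits" in cookie else 1
    response_headers = {"Set-Cookie": f"visits={visits}"}
    return response_headers, f"Visit number {visits}"


# First request: the client has no cookie yet.
headers1, body1 = handle_request({})

# Second request: the client sends back the cookie it was given.
headers2, body2 = handle_request({"Cookie": headers1["Set-Cookie"]})

# Third request: the client omits the cookie, so the server "forgets" it.
headers3, body3 = handle_request({})

print(body1, body2, body3, sep="\n")
```

The server itself stores nothing: the client carries the context and supplies it again with each request.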

<a name="dev-tools"></a>
### Examining Requests and Responses in the Chrome Dev Tools

---

Let's explore HTTP resources. We'll start by looking at HTTP requests and responses using the Chrome Inspector.

* In Chrome, open the developer tools:
    - Right-click the page and select "Inspect" ("Inspect Element" in older versions of Chrome)
    - OR press `cmd` + `opt` + `i` on macOS
    - OR press `shift` + `ctrl` + `i` on Linux or Windows
* Select the Network tab. It should look something like this:

<img src="assets/chrome_inspector.png" width="750px"/>

If you don't see the Network tab as an option, you might have to click the `>>` arrows at the top to reveal more tabs:

<img src="assets/network_tab.png" width="250px"/>

* Next, go to the URL https://generalassemb.ly/.

You should be able to see a few HTTP requests and responses in the Network tab. For each request you'll see a **path**, **method**, **status**, **type**, and **size**, along with information about how long it took to get each of these resources.
  * Most of this information comes from the HTTP request and response.
  * Some HTTP requests are for CSS, JavaScript, and images that are referenced by the HTML.
  * Select `generalassemb.ly` in the path column on the far left.
  * Select the Headers tab. **Headers** are metadata properties of an HTTP request or response, separate from the body of the message.

<a id="python-requests"></a>
### Making Requests Using the `requests` Library

To get familiar with the `requests` library, we will make a simple request and inspect the most common attributes of the response it returns.

> There's also a great [quickstart guide](http://docs.python-requests.org/en/master/user/quickstart/).

Import the library and choose a URL to request:

    import requests

    url = "https://generalassemb.ly/"

Send a `GET` request. Displaying the result shows the status code of the response:

    result = requests.get(url)
    result

    <Response [200]>

The headers that `requests` sent with the request are available on `result.request.headers`:

    result.request.headers

    {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

The headers that the server sent back are available on `result.headers` (the output shown here is truncated):

    result.headers

    {'Connection': 'keep-alive', 'Server': 'nginx', 'Date': 'Wed, 18 Dec 2019 20:24:26 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Status': '200 OK', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Strict-Transport-Security': 'max-age=15552000', 'X-Xss-Protection': '1; mode=block', 'X-Request-Id': 'ce6a4e62-5831-465a-9e75-7388b9b35776', 'Etag': 'W/"54945185bad39cb63c1d97933c7bcb9c"', 'X-Frame-Options': 'ALLOW-FROM https://*.optimizely.com', 'X-Runtime': '0.158930', 'X-Content-Type-Options': 'nosniff', 'Set-Cookie': 'metro=%7B%22metro%22%3A%7B%22id%22%3A21%2C%22name%22%3A%22Toronto%22%2C%22slug%22%3A%22toronto%22%7D%2C%22geolocated_location%22%3A%22Toronto%2C+Canada%2C+North+America%22%2C%22distance_from_metro%22%3A2.662056046343685%2C%22latitude%22%3A43.6653%2C%22longitude%22%3A-79.4343%2C%22prioritize_online%22%3Afalse%7D; domain=.generalassemb.ly; path=/; expires=Wed, 18 Mar 2020 20:24:26 -0000; secure, analytics_uuid=fe89d86c-70a1-45f8-82c

The body of the response is available as a string on `result.text`:

    result.text

