# HTTP
>HTTP (HyperText Transfer Protocol) is the foundation of the World Wide Web, governing how information is requested and transmitted across the internet. It establishes a standardised set of rules for communication in a client-server relationship. HTTP is located in the application layer of the TCP/IP model. It runs on top of other protocols, mainly TCP in the transport layer.

This notebook will explore the structure and function of HTTP, including the request-response model, session management, how to make programmatic HTTP requests, and an introduction to APIs.

## Resources and URIs

HTTP allows a browser to interact with different *resources* hosted by servers, which can be web pages or parts of web pages like text data or images. A complete document is reconstructed from the different parts fetched, for instance, text, layout description, images, videos, scripts, and more. 

To do this the browser needs both the identity and the location of the resources. These two pieces of information are described by a *URI (Uniform Resource Identifier)*.

The most common form of URI is a *URL (Uniform Resource Locator)*. For example, `https://google.com` is a URL that can be typed into a browser to get the Google homepage (the resource).

## The Request-Response Model
The basis of HTTP is a *request* and *response* interaction, where each request and response separately are known as *HTTP messages*. The entire interaction runs as follows:

1. The client sends a request to the server

2. The server receives and processes the request

3. The server then formulates and sends a response back to the client

4. The client interprets the response and displays the relevant data <br></br>

The request and response each comprise some different elements in order to complete their functions, but they both share the following components:

- **Headers:** Metadata about the message in key-value pairs, which can be used to communicate information such as authentication details, what web browser the client is using, and what data is being transferred.

- **Body:** The body of an HTTP message contains data to be transferred from the client to the server, or vice versa. For example, in a request this might include form data. In a response it could contain the HTML of a webpage, an image, or any other data from the server.

- **Version:** The version of the HTTP protocol being used for this request and response. HTTP/3 is the most recent version.
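
To make the shared components concrete, here is a minimal sketch that splits a raw message into its first line, headers, and body. It assumes the plain-text layout used by HTTP/1.1 (later versions use a binary framing), and the response text is written by hand for illustration rather than captured from a real server. The function name `parse_http_message` is hypothetical.

```python
def parse_http_message(raw: str) -> dict:
    """Split a raw HTTP/1.1 message into its start line, headers, and body."""
    # A blank line (CRLF CRLF) separates the head of the message from the body
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")

    headers = {}
    for line in header_lines:
        # Each header is a "Name: value" pair
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    return {"start_line": start_line, "headers": headers, "body": body}


# A hypothetical response, written out by hand for illustration
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "Hello, World!"
)

message = parse_http_message(raw_response)
print(message["start_line"])               # HTTP/1.1 200 OK
print(message["headers"]["Content-Type"])  # text/html
print(message["body"])                     # Hello, World!
```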

### The Request

<p align="center"> <img src="images/http-request.png" height="405" width="604"/> </p>

In addition to the elements already discussed above, the following are also included in an HTTP request:

#### HTTP Method 
HTTP operates through a set of methods that dictates the action to be performed on a given resource. Commonly used methods are:

- `GET`: The `GET` method requests data to be sent from the server to the client. When you enter a URL into your web browser you are making a `GET` request for the website information.

- `POST`: This method allows data to be sent from a client to the server. For example, filling out a form with personal details. When you press 'Submit', a `POST` request is being made.

- `PUT`: A `PUT` request also sends data from the client to the server. The main difference is that `PUT` is idempotent: multiple occurrences of the same `PUT` request have the same effect as a single one, whereas repeating the same `POST` request means the same data is sent and processed multiple times. Also, in a `PUT` request the client specifies the exact URL to be created or updated with the supplied data, whereas in a `POST` request how the information is utilised is left up to the server.

- `DELETE`: This method deletes the specified resource
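
The difference between `PUT` and `POST` can be sketched without any networking. The example below is only a toy model: the dictionary `store` stands in for a server's data, and the `put` and `post` functions are hypothetical helpers, not part of any library.

```python
import itertools

# A hypothetical in-memory "server" used only to illustrate the semantics
store = {}
_ids = itertools.count(1)


def put(path: str, data: dict) -> None:
    """PUT: create or replace the resource at the exact path given by the client."""
    store[path] = data


def post(collection: str, data: dict) -> str:
    """POST: the server decides where the new resource lives."""
    path = f"{collection}/{next(_ids)}"
    store[path] = data
    return path


# Repeating the same PUT leaves the store in the same state as doing it once
put("/users/42", {"name": "Ada"})
put("/users/42", {"name": "Ada"})
print(len(store))  # 1

# Repeating the same POST creates a new resource each time
post("/users", {"name": "Ada"})
post("/users", {"name": "Ada"})
print(len(store))  # 3
```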

#### Resource URI

Consider the following URI as an example: `http://www.example.com:80/path/to/file.html?key1=value1#anchor`

- **Protocol:** `http://` is the protocol. It indicates the browser must use HTTP for this request. Usually it is HTTP but browsers can also handle other protocols such as `ftp://` for the File Transfer Protocol, or `mailto:` to open a mail client. 

- **Domain:** `www.example.com` is the domain name. This is the human-readable name that DNS translates into an IP address.

- **Port:** `:80` is the port in this example. It identifies the network endpoint on the web server that listens for requests, rather than a location where the resource is stored. It is usually omitted if the web server uses the standard ports of the HTTP protocol (80 for HTTP and 443 for HTTPS).

- **Path:** `/path/to/file.html` is the path to the resource itself on the web server

- **Query:** `?key1=value1` is the query string, a list of extra key-value parameters provided to the web server. These are used by the web server to decide in more detail what it should send back. For example, when you Google something, the search term becomes part of a query: `https://www.google.com/search?q=something`.

- **Anchor:** `#anchor` points to a specific part of the resource itself. Also known as a *fragment*, it could be a particular heading on a web page, for example.
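
The same breakdown can be reproduced with Python's standard library. This short sketch uses `urllib.parse.urlsplit` on the example URI above; note that the library calls the protocol the *scheme* and the anchor the *fragment*.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://www.example.com:80/path/to/file.html?key1=value1#anchor")

print(parts.scheme)           # http
print(parts.hostname)         # www.example.com
print(parts.port)             # 80
print(parts.path)             # /path/to/file.html
print(parse_qs(parts.query))  # {'key1': ['value1']}
print(parts.fragment)         # anchor
```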
    
#### Request Headers
These are some common things to be included as request headers:

- **`Content-Type`:** The media type of the data in the request body. It informs the server or client about how to interpret and handle the data. Common values for the `Content-Type` header include:

    - `application/json`: This indicates that the request or response body contains data in JSON format

    - `application/xml`: This signifies that the data is in XML format. XML (eXtensible Markup Language) is another popular format for representing structured data.

- **`Authorization`:** This header carries credentials or tokens to prove the identity of the client making the request. Note that the header name uses the American spelling.

- **`Accept`:** This allows the client to specify the media types it can accept in the response. It informs the server about the preferred format of the response data, for example: `Accept: application/json`.

- **`User-Agent`:** Identifies the client application or user agent that is making the request. It provides information about the software, device, and platform used by the client. 

    For example: `User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36`

    In this example, the `User-Agent` indicates that the request is coming from a web browser running on 64-bit Windows 10 (`Windows NT 10.0`). The browser is Google Chrome version 91.0.4472.124. The `Mozilla`, `AppleWebKit`, and `Safari` tokens are included for historical compatibility and do not mean the request comes from those browsers.
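
As an illustration of how a server might act on the `Accept` header, the sketch below picks a response format from the client's stated preferences. It is a simplified model: the `q` weights are part of the HTTP specification but are not covered above, wildcards such as `*/*` are ignored, and the function names and header values are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, quality) pairs, best first."""
    preferences = []
    for item in header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # the default weight when no q parameter is given
        for param in params:
            if param.startswith("q="):
                quality = float(param[2:])
        preferences.append((media_type, quality))
    # sorted() is stable, so equal weights keep the order the client sent
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_media_type(accept_header: str, available: List[str]) -> Optional[str]:
    """Return the client's most preferred media type that the server can produce."""
    for media_type, quality in parse_accept(accept_header):
        if quality > 0 and media_type in available:
            return media_type
    return None


# Hypothetical request headers, using the header names described above
request_headers = {
    "Accept": "application/xml;q=0.5, application/json",
    "User-Agent": "example-client/1.0",
}

chosen = choose_media_type(
    request_headers["Accept"], ["application/json", "application/xml"]
)
print(chosen)  # application/json
```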


#### Request Body
The body of an HTTP request contains any information that needs to be sent to the server. It is optional: requests that send data to the server, such as `PUT` or `POST` requests, typically require a body, but methods that only fetch information from the server usually do not need one.

Bodies can be broadly divided into two categories:

- **Single-resource bodies:** One single file, defined by the two headers: `Content-Type` and `Content-Length`

- **Multi-resource bodies:** Multiple parts to the body, each containing a different bit of information. This is typically associated with HTML forms.
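
The following sketch builds a single-resource body for a simple form submission and derives the two headers that describe it. The form fields are made up, and `application/x-www-form-urlencoded` is one standard media type for form data that is not otherwise discussed in this notebook.

```python
from urllib.parse import urlencode

# Hypothetical form data to send in the body of a POST request
form = {"name": "Ada Lovelace", "email": "ada@example.com"}

# Encode the fields as key=value pairs joined by "&", then convert to bytes
body = urlencode(form).encode("ascii")

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    # Content-Length is the size of the body in bytes
    "Content-Length": str(len(body)),
}

print(body)  # b'name=Ada+Lovelace&email=ada%40example.com'
print(headers["Content-Length"])
```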

### The Response
<p align="center"> <img src="images/http-response.png" height="373" width="751"/> </p>

#### Status Line 
This is the first line of an HTTP response. It contains the following information:
    
- **Protocol and Version:** `HTTP/1.1` in the image above

- **Status Code:** A three-digit code which indicates whether the request was successful or not, and why. These are the groups of status codes:

    - **100 - 199 (Informational):** The server acknowledges and is processing the request

    - **200 - 299 (Success):** The server successfully received, understood, and processed the request

    - **300 - 399 (Redirection):** The server received the request, but there’s a redirect to somewhere else, which means the requested resource is in a different location

    - **400 - 499 (Client Error):** The server couldn’t find (or reach) the requested information, most likely due to something wrong or missing from the request on the client's side
    
    - **500 - 599 (Server Error):** The client made a valid request, but the server failed to complete the request
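
Because the first digit of a status code identifies its group, the grouping above can be expressed in a few lines. This sketch uses the standard library's `http.HTTPStatus` to look up the usual reason phrase for each code; `status_class` is a hypothetical helper.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to the group it belongs to."""
    groups = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    # The first digit of the code identifies its group
    return groups.get(code // 100, "Unknown")


for code in (200, 301, 404, 503):
    # HTTPStatus supplies the standard reason phrase for each code
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```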

#### Response Headers
Response headers use the same format as request headers: key-value pairs of strings. These are some examples of response headers:

- **`Cache-Control`:** This header specifies whether and for how long the response may be cached (stored for reuse by the client or by intermediate caches)

- **`Location`:** The location header is used to redirect the client to a different URL. This is commonly used for HTTP redirects, such as when a user submits a form and is redirected to a confirmation page.
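
As a small illustration of reading response headers, the sketch below parses a `Cache-Control` value into its individual directives. The header values are hypothetical, and `parse_cache_control` is a helper written for this example, not a library function.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a dictionary of directives."""
    directives = {}
    for item in value.split(","):
        name, _, argument = item.strip().partition("=")
        # Directives without an argument (e.g. "public") are stored as True
        directives[name.lower()] = argument if argument else True
    return directives


# Hypothetical response headers
response_headers = {
    "Cache-Control": "public, max-age=3600",
    "Location": "https://www.example.com/confirmation",
}

cache = parse_cache_control(response_headers["Cache-Control"])
print(cache)  # {'public': True, 'max-age': '3600'}
```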

#### Response Body
The response body contains the actual requested information. In most web requests, this is HTML data that a web browser will translate into a webpage. For example, in the above image the body is HTML data displaying the text "Hello, World!".

The data in a response body is often translated into a format that is more efficient for storage or transmission, such as JSON, a process known as *serialisation*. The receiver then translates it back into a usable structure, which is known as *deserialisation*.
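
A minimal round trip with Python's built-in `json` module shows the idea. The dictionary used here is made-up sample data.

```python
import json

# A Python dictionary, as a server-side program might hold it in memory
data = {"title": "Sample Slide Show", "slides": 2}

# Serialise: convert the dictionary to a JSON string suitable for a response body
serialised = json.dumps(data)
print(type(serialised).__name__)  # str

# Deserialise: the receiver converts the JSON string back into a dictionary
restored = json.loads(serialised)
print(restored == data)  # True
```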

## HTTP Cookies
>A *cookie* is a small piece of data sent from a server to a web browser that is used to communicate state information. Browsers can store cookies, create new cookies, modify existing ones, and send them back to the server with later requests. So while the core of HTTP itself is stateless, HTTP cookies allow for stateful sessions. This technique is a type of *session management*; other techniques include communicating values in URL query parameters or headers.

One example use case of HTTP cookies is to retain a user's sign-in state while they are logged in to a website. The user can then navigate to other pages of the website and still receive a tailored experience, since the browser sends the stored state information back to the server with each request.
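
The exchange can be sketched with the standard library's `http.cookies` module. The cookie name `session_id` and its value are hypothetical; a real server would generate an unpredictable identifier.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header that accompanies a response
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"  # hypothetical session identifier
server_cookie["session_id"]["path"] = "/"
server_cookie["session_id"]["httponly"] = True
set_cookie_header = server_cookie.output()
print(set_cookie_header)

# Client side: on later requests the browser sends the name and value back
# in a Cookie header, which the server can parse again
received = SimpleCookie()
received.load("session_id=abc123")
print(received["session_id"].value)  # abc123
```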

## HTTPS (HyperText Transfer Protocol Secure)
>HTTPS is an extension of HTTP that adds an extra layer of security through encryption. It provides protection of sensitive data from unauthorised access. 

These are the key aspects of HTTPS:

- **Encryption:** HTTPS uses the encryption protocol *TLS (Transport Layer Security)*, the successor to the now-deprecated *SSL (Secure Sockets Layer)*, which operates between the application layer and the transport layer of the TCP/IP suite to encrypt the data exchanged between the client and server. This ensures that the information transmitted cannot be easily intercepted or deciphered by unauthorised parties.

- **Digital Certificates:** HTTPS relies on digital certificates issued by trusted Certificate Authorities (CAs) to verify the authenticity of the server. Certificate Authorities are organisations that are trusted to issue digital certificates by the makers of browsers and operating systems. The certificates establish a secure connection and enable the client to trust that it is communicating with the intended server.

- **Port:** HTTPS typically uses port 443 for communication, while HTTP uses port 80. The use of a different port helps differentiate between secure and non-secure connections.
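
Python's `ssl` module reflects these aspects in its default client settings. The sketch below only inspects a default context and does not open a connection; the `DEFAULT_PORTS` dictionary is a hypothetical convenience for this example.

```python
import ssl

# Standard ports for each scheme, as described above
DEFAULT_PORTS = {"http": 80, "https": 443}

# A client-side TLS configuration with the library's recommended defaults
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA...
print(context.verify_mode == ssl.CERT_REQUIRED)  # True

# ...and the certificate must match the host name being contacted
print(context.check_hostname)  # True
```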

## Making HTTP Requests Programmatically
It's important for a developer to be able to make HTTP requests outside the scope of a web browser. This allows for testing and debugging of web applications and infrastructure, and for automation: HTTP requests can be integrated into scripts or applications to allow for more complex workflows.

### cURL
`cURL` is a tool used for transferring data to and from URLs on the command line. By default it performs a `GET` request to the given URL. For example:

- Open a terminal and type `curl parrot.live`. You should be rewarded with a dancing ASCII parrot as the response.

- `curl cheat.sh/curl` responds with a cheatsheet for `cURL` itself

#### cURL Options
`cURL` is a simple tool but there are various flag options which can customise the output:

- **`-o <file>`:** This flag writes the output to a file 

- **`-v`:** The verbose flag makes `cURL` print a detailed log of its operations, including the request and response headers

- **`-i`:** Include the HTTP response headers in the output

### The Python Requests Library
Python provides several libraries and modules that make it straightforward to send HTTP requests. These libraries abstract the complexities of HTTP communication, allowing developers to interact with web services using simple and intuitive code. One commonly used library for making HTTP requests in Python is `requests`.

Before using the `requests` library, you need to install it. Run the following command to install the `requests` library:
 
```
pip install requests
```

The example below demonstrates how to make a `GET` request to retrieve some simple JSON information stored on a web page using the `get()` function.

```python
import requests

# Send a GET request to the URL to retrieve the JSON information
response = requests.get('http://httpbin.org/json')

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Access the response data
    data = response.text
    print(data)
else:
    # If the request was not successful, print the status code and response text
    print(f"Request failed with status code: {response.status_code}")
    print(f"Response Text: {response.text}")
```

```
{
  "slideshow": {
    "author": "Yours Truly",
    "date": "date of publication",
    "slides": [
      {
        "title": "Wake up to WonderWidgets!",
        "type": "all"
      },
      {
        "items": [
          "Why <em>WonderWidgets</em> are great",
          "Who <em>buys</em> WonderWidgets"
        ],
        "title": "Overview",
        "type": "all"
      }
    ],
    "title": "Sample Slide Show"
  }
}
```

We use `requests.get()` to send a `GET` request to the resource at `http://httpbin.org/json`. The response from the server is stored in the `response` object. If the request was successful (status code 200), we print the JSON data. If the request was not successful, we print the status code and the response text, which provides additional information about the error.

## HTTP APIs
>An *API (Application Programming Interface)* is a set of rules and protocols that allow different software applications and web services to communicate with each other, acting like a bridge. APIs can use specific, custom URLs called *endpoint URLs* which map to a third-party application or resource that a client can then interact with. An API then allows you to create routes within this endpoint that enable more granular organisation and management of the services and data within that resource. 

The term API is a broad concept used in all areas of programming, but this lesson is specifically about a type of API known as an *HTTP API* (often loosely referred to as a *RESTful API*), which is an API that allows communication between web services using HTTP.

As a concrete example, consider this scenario: you are building an application for a hotel's guests, and you want to add a feature to display the current weather conditions. Instead of 'reinventing the wheel' by building a system to gather and process raw meteorological data, you can use an API from a service like the Met Office, which uses the endpoint URL `http://metoffice-api.gov.uk/data` (not the actual Met Office API). 

Your application sends a request to the Met Office's API with the specifics about what data it needs, such as the location you want to know the weather for. The specific route you would use might then be `http://metoffice-api.gov.uk/data/locationId`. The API then processes your request and returns the information you requested, which your application can then display to the user.
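
Building such a route in code might look like the sketch below. It reuses the illustrative endpoint from the scenario (not a real Met Office URL); the location ID `3772` and the `units` query parameter are made up for the example.

```python
from urllib.parse import quote, urlencode

# The illustrative endpoint from the text above (not a real Met Office URL)
BASE_URL = "http://metoffice-api.gov.uk/data"


def build_route(location_id: str, **query: str) -> str:
    """Build the URL for one location, with optional query parameters."""
    # Escape the ID so that it is safe to use as a single path segment
    url = f"{BASE_URL}/{quote(location_id, safe='')}"
    if query:
        url += "?" + urlencode(query)
    return url


print(build_route("3772"))                  # http://metoffice-api.gov.uk/data/3772
print(build_route("3772", units="metric"))  # http://metoffice-api.gov.uk/data/3772?units=metric
```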

### API Authentication
Authentication in APIs involves verifying the identity of a client before granting access to protected resources. It ensures that only authorised individuals or entities can interact with the API and perform specific actions. Authentication typically occurs during the initial stages of an API request, where the client provides credentials to prove its identity.

The primary objectives of authentication in APIs are:

- **Identity Verification:** Ensuring that the client's claimed identity is valid and can be trusted
- **Access Control:** Granting or denying access to specific resources based on the client's authenticated identity

#### Bearer Token Authentication
One of the most common forms of authentication for APIs is the use of *bearer tokens*, which are credentials that represent a client's authorisation for accessing protected resources. These are some of the most common types of authentication tokens: 

- **Access Tokens:** Access tokens are temporary credentials issued to authenticated clients after a successful authentication process. Access tokens are used to authorise subsequent API requests. They are typically short-lived and have an expiration time. Access tokens can be more secure than API keys as they have a limited validity period, reducing the risk if they are compromised.

- **Refresh Tokens:** Refresh tokens are additional tokens issued alongside access tokens. They are used to obtain new access tokens when the original access token expires. Refresh tokens are usually long-lived and can be used to maintain an authenticated session without requiring the client to re-enter their credentials. Refresh tokens need to be securely stored and transmitted since they grant the ability to obtain new access tokens.

- **API Keys:** API keys are unique identifiers issued to clients by the API provider, so that clients can include them in API requests to authenticate themselves. API keys are typically long alphanumeric strings and often do not expire until they are revoked. Clients include the API key in the request headers, for example in a custom header such as `X-API-Key`.
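
A bearer token is conventionally sent in the `Authorization` header using the `Bearer` scheme. The sketch below shows how a client might build that header and refuse to use a token that has passed its expiry time. The `AccessToken` class, the `auth_headers` function, and the token value are all hypothetical, and real APIs differ in how they report expiry.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp after which the token is no longer valid

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now >= self.expires_at


def auth_headers(token: AccessToken) -> dict:
    """Build the request header that carries a bearer token."""
    if token.is_expired():
        # In a real client, this is where a refresh token would be exchanged
        raise ValueError("access token has expired; request a new one")
    return {"Authorization": f"Bearer {token.value}"}


# Hypothetical token that is valid for one hour
token = AccessToken(value="example-access-token", expires_at=time.time() + 3600)
print(auth_headers(token))  # {'Authorization': 'Bearer example-access-token'}
```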

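Consider the example below, which makes an HTTP `GET` request to a free weather API, specifying latitude and longitude values in the URL's query string and passing an API key as a request header for authentication. Providers differ in the exact URL and in how they expect the key to be supplied, so check the documentation of the API you are using.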

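```python
import requests

# Send a GET request with the coordinates in the query string
# and the API key in a request header
response = requests.request(
    method="GET",
    url='https://api.openweathermap.org/data/weather?lat=44.34&lon=10.99',
    headers={'X-API-Key': 'e0cec6bdc76d35d2842baf84a3'}
)
print(response.status_code)
```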

## Key Takeaways

- **HTTP Basics:** HTTP is the protocol used for communication on the web, operating in a client-server model with request and response messages.

- **Resources and URIs:** Resources are the web pages, or parts of web pages, hosted by servers. URIs are used to identify and locate resources; URLs are the most common type of URI.

- **Request Components:** An HTTP request comprises a method, the resource URI, the protocol version, headers, and an optional body

- **Response Components:** An HTTP response comprises a status line (protocol and version, and a status code), headers, and a body

- **Session Management:** Cookies retain state information, allowing for persistent sessions despite HTTP being stateless

- **Programmatic HTTP Requests:** Tools like cURL and Python's Requests library allow for testing, debugging, and automating HTTP requests

- **HTTPS:** HTTPS adds security through encryption (SSL/TLS) and uses port 443 instead of port 80

- **HTTP APIs:** An API is a set of rules and protocols that allow different software applications and web services to communicate with each other. An HTTP API (often loosely called a RESTful API) is one that uses HTTP.

- **API Authentication:** Ensures only authorised clients can access API resources. Mechanisms include Basic Authentication, Bearer Token Authentication, OAuth 2.0, and JWTs.