# Application-Layer Protocols
## Overview
### What You'll Learn
In this section, you'll learn:
1. What application-layer protocols define
1. How HTTP, an application-layer protocol, works

### Prerequisites
Before starting this section, you should have an understanding of:
1. [Socket Programming](https://colab.research.google.com/github/HackBinghamton/ComputerNetworksWorkshop/blob/main/intro-sockets/introduction-to-socket-programming.ipynb)

### Introduction
Application-layer protocols make up the meat of what you'll need to know when working with sockets. When you're figuring out how you want to format messages in your program, or how you want to encrypt data over the wire, you're designing an application-layer protocol.

## What are Application-Layer Protocols?

Let's consider an application like Snapchat. When you send someone a message on Snapchat, the app has to handle several things:

1. Notifying the other user that recent messages have been read
1. Notifying the other user that you're typing or have the chat window open
1. Sending the chat message itself

As a developer, you need to come up with a protocol that lets you send each of these pieces of information without any ambiguity. This kind of protocol design all happens at the application layer.

When writing an application-layer protocol, you need to decide:

* What kinds of information do I need to send?
* In what format should I send it?
* Should there be a pattern of sending and receiving between clients and servers?

...and many other questions.
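
As a rough illustration of answering those questions, here is a minimal sketch of a made-up chat protocol. Everything in it is an assumption for the sake of example: the three message kinds, the JSON format, and the rule that each message is one line ending in a newline. It is not how Snapchat or any real app works.

```python
import json

# Hypothetical message kinds, mirroring the chat example above.
MESSAGE_KINDS = {"read_receipt", "typing", "chat"}


def encode_message(kind, payload):
    """Serialize one message as a single line of JSON ending in a newline."""
    if kind not in MESSAGE_KINDS:
        raise ValueError(f"unknown message kind: {kind!r}")
    # json.dumps escapes newlines inside strings, so b"\n" can only
    # ever appear as the terminator between messages.
    return json.dumps({"kind": kind, "payload": payload}).encode("utf-8") + b"\n"


def decode_messages(buffer):
    """Split received bytes into (complete messages, leftover partial bytes)."""
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines], rest
```

The newline terminator is what removes the ambiguity: a receiver that gets half a message from `recv()` can keep the leftover bytes and wait for the rest.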

## Hypertext Transfer Protocol (HTTP)

HTTP is the protocol that browsers and other clients use to talk to web servers.

The developers of HTTP built it upon a system of "requests" and "responses": the client sends a request, and the server answers each request with a corresponding response.

### Requests

The basic format of an HTTP request is:

```
<METHOD> <PATH> HTTP/1.1
Host: <DOMAIN>
<OTHER HEADERS if needed>
------------ necessary blank line ------------
<DATA, only if there is data to send>
```

* `METHOD` is an HTTP method like `GET`, `POST`, `PUT`, etc. (`GET` just asks for the contents of a page)
* `PATH` is a file path (`/index.html` is the default homepage for many webservers)
* `DOMAIN` is the domain name of the application (e.g. `www.amazon.com` if you went to Amazon)
* `OTHER HEADERS` could be things like cookies
* `DATA` is only needed when sending data to a website, as in the `POST` and `PUT` methods
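
To make the format concrete, here is a minimal sketch of a helper that assembles a request as bytes. The name `build_request` and its parameters are made up for this example. It assumes every line ends in `\r\n`, that an empty line separates the headers from any data, and that a request with data carries a `Content-Length` header.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble an HTTP/1.1 request as bytes, ready for socket.sendall()."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    headers = dict(headers or {})
    if body:
        # The server needs Content-Length to know where the data ends.
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For instance, `build_request("POST", "/submit", "example.com", body=b"a=1")` produces a request whose headers include `Content-Length: 3`, followed by the empty line and then `a=1`.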

#### Example

A GET request for Amazon's `robots.txt` file (which does exist) might look like this:

```
GET /robots.txt HTTP/1.1
Host: www.amazon.com
------ Blank line ------
```

Every line of an HTTP request ends with `\r\n`, and the blank line is just a `\r\n` on its own, so in Python this request ends with `"\r\n\r\n"`.

#### Exercise

Send an HTTP GET request for the `/robots.txt` file of any HTTP page of your choice (*not* HTTPS). Most websites have this file.

Remember that you'll need to create your socket, connect to the page (you can pass the domain name in place of the IP address, and HTTP servers listen on port 80), and then send the request in the correct format.

*Hint:* You have to send your request in one piece!

```python
import socket

# Your code here!
```

If all goes well, you'll get a response back (hopefully not a `400 Bad Request`, which means the server couldn't understand your request!).

This response, too, is standardized so that our browsers can easily interpret it!

### Responses

Responses come in the following format:

```
HTTP/1.1 <RESPONSE CODE> <RESPONSE NAME>
<HEADERS>
------------ necessary blank line ------------
<DATA>
```

* `RESPONSE CODE` is a status code like 200, 301, 400, 404, etc.
* `RESPONSE NAME` is the descriptive name of that status code (e.g. `Not Found` for 404)
* `HEADERS` carry extra information such as the server software, the date, cookies, etc. -- it all depends on the server
* `DATA` is the content you asked for, such as the HTML of a page
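
Pulling a response apart in Python could look something like the sketch below. `parse_response` is a hypothetical helper, and it assumes the headers and the data are separated by an empty line (`\r\n\r\n`). It also simplifies by keeping only the last value when a header appears more than once.

```python
def parse_response(raw):
    """Split raw response bytes into (status_code, reason, headers, body)."""
    # Everything before the first empty line is the status line and headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return int(code), "".join(reason), headers, body
```

The body is left as bytes because its encoding depends on the `Content-Type` header.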

#### Example

Here's an example response for requesting Amazon's `robots.txt` file -- it might not be what you expect:

```
HTTP/1.1 301 Moved Permanently
Server: CloudFront
Date: Mon, 26 Oct 2020 03:45:03 GMT
Content-Type: text/html
Content-Length: 183
Connection: keep-alive
Location: https://www.amazon.com/robots.txt
X-Cache: Redirect from cloudfront
Via: 1.1 a1882a601559755135741e91a9f86c28.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: EWR52-C4
X-Amz-Cf-Id: 6SgE4iurcmt6gyuOfppoKvQN_61-mEBC8IuQBlser_McshWyNvSlJg==

<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>CloudFront</center>
</body>
</html>
```

A 301 response tells the requester to retry at the URL given in the `Location` header. In this case, we can see that the server is redirecting us to the HTTP**S** version of the website.
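
If you wanted your program to notice redirects like this one, a sketch along these lines could work. `redirect_target` is a made-up helper, and it assumes you already have the status code as an integer and the headers in a dictionary with lowercased names. It only handles absolute URLs in `Location`; a relative one would come back with an empty scheme and no host.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(status_code, headers):
    """Return (scheme, host, path) for a redirect, or None if it isn't one."""
    location = headers.get("location")
    if status_code not in REDIRECT_CODES or not location:
        return None
    parts = urlsplit(location)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, path
```

A scheme of `https` in the result is the signal that a plain socket on port 80 won't be enough to follow the redirect.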

Sadly, speaking HTTPS over a raw socket is remarkably difficult to code by hand, unless you use Python's [`ssl` module](https://docs.python.org/3/library/ssl.html)!
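
For the curious, here is a rough sketch of what that might look like with the standard library. `fetch_https` is a hypothetical helper, and the port (443), the timeout, and the `Connection: close` header are choices made for this example. The HTTP request is the same text as before; the only change is that the socket is wrapped in TLS first.

```python
import socket
import ssl


def fetch_https(host, path="/", port=443):
    """Send one GET request over TLS and return the raw response bytes."""
    # The default context verifies the server's certificate and hostname.
    context = ssl.create_default_context()
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            # "Connection: close" asks the server to hang up when it is done,
            # so recv() returning b"" marks the end of the response.
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch_https("www.amazon.com", "/robots.txt")` would then return the status line, headers, and data in one bytes object, assuming the server is reachable.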

### Challenge

Now that you know:

1. How to set up a client socket
2. How to connect to a website over HTTP
3. How to send an HTTP request
4. The structure of an HTTP response

You know everything you need to be able to write a simple web browser!

Try writing a program that takes a user through the following loop:

1. Ask for a URL (e.g. `domainname.com/item.html`)
2. Request that resource from the specified web server
3. Parse the response and print out the content (without the headers!)

*Bonus Points:* Remove all the HTML tags!

*Extra Bonus Points:* Keep links from `<a href="link.com">` tags!
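
As a starting point for step 1, here is one possible way to split what the user types into the two pieces an HTTP request needs. `split_url` is a made-up helper name, and it assumes input shaped like `domainname.com/item.html`, optionally with a scheme such as `http://` in front.

```python
def split_url(url):
    """Split 'domainname.com/item.html' into ('domainname.com', '/item.html')."""
    # Drop an optional scheme such as "http://" if the user typed one.
    if "://" in url:
        url = url.split("://", 1)[1]
    # Everything up to the first "/" is the host; the rest is the path.
    host, _, path = url.partition("/")
    return host, "/" + path
```

The host goes into both `connect()` and the `Host` header, and the path goes into the request line.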

```python
import socket

# Your code here!
```
