# Denison DA210/CS181 Homework 5.a - Step 1

Before you turn this notebook in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells**.

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Setup

Note that for these exercises, we'll use `mysocket.py`, provided with the book, and available to you now in `modules/` in this repository.

In [None]:
import os
import os.path
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

import mysocket as sock
importlib.reload(sock)

---

## Part A: Identifying Resources with URLs and URIs

Uniform Resource Identifiers (URIs) and Uniform Resource Locators (URLs) define a standard notation for specifying the files, data, and resources of the internet.  Note that URI is the broader term, so all URLs are URIs.

Using an explicit protocol scheme, host location, and resource path, URLs can be used to uniquely identify a resource at a specific location on the internet.  These components are summarized in the following table:

Item | Description
:----|:--------------
_protocol_ | The application-layer protocol, which runs on top of TCP; we'll use `http` and `https`
_location_ | The server/host machine on the internet
_port_     | Identifies the program on the host that accepts connections; by default, web servers use port 80 for HTTP and port 443 for HTTPS
_resource-path_ | Identifies a particular resource within the host/port endpoint; could also include a query string

The general form of a URL is given by the following (shown with extra spaces for readability):

_protocol_ : // _location_ [ : _port_ ] _resource-path_
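As a rough illustration of this general form, the sketch below splits a made-up URL into the four components using Python's standard `urllib.parse.urlsplit`. The helper name `url_components`, the `DEFAULT_PORTS` table, and the `example.com` URLs are assumptions chosen for illustration; they are not part of `mysocket` or the textbook.

```python
from urllib.parse import urlsplit

# Default ports for the two protocols used in this assignment.
DEFAULT_PORTS = {"http": 80, "https": 443}

def url_components(url):
    """Hypothetical helper: split a URL into protocol, location, port, resource-path."""
    parts = urlsplit(url)
    # When the URL has no explicit ":port", fall back to the protocol's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # The resource path starts with "/" and may carry a query string.
    resource_path = parts.path
    if parts.query:
        resource_path += "?" + parts.query
    return {
        "protocol": parts.scheme,
        "location": parts.hostname,
        "port": port,
        "resource-path": resource_path,
    }

example = url_components("https://example.com:8443/docs/index.html?lang=en")
print(example)
print(url_components("http://example.com/about"))
```

Note that the `://` separator and the `:` before the port belong to neither the location nor the resource path, while the leading `/` is part of the resource path.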

**Q1:** Type the following URL in a web browser: http://datasystems.denison.edu:80/topnames.html.  What are the _protocol_, _location_, _port_, and _resource-path_ for this URL?

Hint: Be very careful about specifying where any forward slashes ( `/` ) belong!

YOUR ANSWER HERE

**Q2:** Now, use a search engine to search for "Denison University".  What are the _protocol_, _location_, _port_, and _resource-path_ for the resulting URL? 

YOUR ANSWER HERE

---

## Part B: HTTP Definition

Web browsers are simply programs that request data (often HTML of web pages) from web servers, and display them to the user.  HTTP exists to enable these requests.

As discussed in class, HTTP is an application protocol, and is therefore built on TCP and the sockets interface.

0. The web server program is in an "always ready" state, waiting with a listening TCP socket endpoint for connection requests on port 80 (for HTTP).  
1. A client (e.g., your web browser or this notebook) makes a TCP connection to the server endpoint, and bidirectional communication is established.  
2. The client constructs an _HTTP request_.  
3. The request is sent:  
3a. The request is sent over the TCP socket connection to the server.  
3b. The server receives the request and processes it, constructing an _HTTP response_.  
4. The response is sent:  
4a. The response is sent over the TCP socket connection back to the client.  
4b. The client receives the response and processes it.  
5. Both the client and server close the TCP socket connection.

Note that steps 2-4 can happen just once or many times, depending on the HTTP request parameters.
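To make the steps above concrete without depending on an outside server, here is a minimal sketch that runs both sides on the local machine using Python's standard `socket` and `threading` modules. The toy server `serve_once`, the function `demo_exchange`, and the response contents are all assumptions made up for illustration; a real web server is far more involved, and this is not how `mysocket` is necessarily implemented.

```python
import socket
import threading

def serve_once(listener):
    """Toy server: accept one connection, read one request, send one response."""
    conn, _addr = listener.accept()                # server side of step 1
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:             # step 3b: read up to the empty line
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        body = b"hello\n"
        head = ("HTTP/1.1 200 OK\r\n"
                "Content-Type: text/plain\r\n"
                "Content-Length: {}\r\n"
                "Connection: close\r\n"
                "\r\n").format(len(body))
        conn.sendall(head.encode("ascii") + body)  # step 4a
    # Leaving the with-block closes the server side of the connection (step 5).

def demo_exchange():
    # Step 0: the server listens; port 0 asks the OS for any free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()

    # Step 1: the client makes a TCP connection to the server endpoint.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    # Step 2: the client constructs an HTTP request.
    request = "GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    # Step 3: the request is sent over the connection.
    client.sendall(request.encode("ascii"))
    # Step 4: the client receives until the server closes its side.
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    # Step 5: the client closes its side as well.
    client.close()
    server.join(timeout=5)
    listener.close()
    return b"".join(chunks).decode("ascii")

demo_reply = demo_exchange()
print(demo_reply)
```

Because the request asks for `Connection: close`, the exchange happens exactly once and the end of the response is signaled by the server closing the connection.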

A module, `mysocket`, is included with our textbook, and imported above as `import mysocket as sock`.  It provides the following helper functions:

Function                                           | Description
---------------------------------------------------|-------------------------------------------------------------------
`makeConnection(host, port)`                       | Establish a TCP connection from the client machine to a server at the given machine `host` and listening at the given `port`. This returns the socket connection.  This corresponds to Step 1 of the client-side steps.
`sendString(conn, s)`                              | Given an established socket `conn`, take `s`, a string, and send it over the connection.  This corresponds to Step 3 of the client-side steps, where `s` would define all the characters making up a complete HTTP request.
`receiveTillClose(conn)`                            | This performs a socket `recv()` from the connection, consuming data until the server closes the connection.  This returns the complete HTTP response message. This corresponds to Step 4 of the client-side steps, and assumes that a connection close will define the end of the response message.

-----------------------------------------------------------------------------------------------------------------------

Let's now walk through the steps of communication:

**Step 1**

In [None]:
connection = sock.makeConnection("httpbin.org", 80)
assert connection is not None

**Step 2**

In [None]:
request_line = 'GET / HTTP/1.1\r\n'     # we've already seen this
host_line = 'Host: httpbin.org\r\n'     # required for HTTP 1.1
one_and_done = 'Connection: close\r\n'  # specifies whether to keep connection alive
empty_line = '\r\n'                     # we need this before the (optional) body

request_message = request_line + host_line + \
                  one_and_done + empty_line
                  
print(request_message)

**Step 3**

In [None]:
sock.sendString(connection, request_message)

**Step 4**

In [None]:
response = sock.receiveTillClose(connection)

**Step 5**

In [None]:
connection.close()

We can view the first 250 characters of the response (lines are separated by `'\r\n'`):

In [None]:
print(response[:250])

---

## Part C: Practice with HTTP Requests

**Q3:** Suppose we wish to retrieve (GET) a file via HTTP (so port 80) from `datasystems.denison.edu`.  The resource path of the file is `/data/ind0.json`.  We wish to use version 1.1 of HTTP and to request that the connection be closed after a single request/reply exchange.  We will need a header line to satisfy the HTTP 1.1 requirement of a valid `Host` header.  Write a sequence of code to compose a valid HTTP request as a Python string, and assign the result to `message`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print(message)
print("--------------------")

In [None]:
# Testing cell
assert isinstance(message, str)
assert message[:3] == "GET"
assert message[4:4+len("/data/ind0.json")] == "/data/ind0.json"
assert "Host: datasystems.denison.edu" in message
assert message.count('\r\n') == 4
assert message[-4:] == '\r\n\r\n'

**Q4:** Write a sequence of code to establish a connection to the host `datasystems.denison.edu` at port 80, to send the string `message` from the previous problem to the host, receive the reply from the host until the server closes the connection, assigning the reply to `reply`, and close the connection.  Note: if the request is not completely correct, a network connection can wait forever for a reply that will never come.  So if you have difficulty here, double check your answer to the previous problem.

In [None]:
# Step 1
# YOUR CODE HERE
raise NotImplementedError()

# Step 2
# YOUR CODE HERE
raise NotImplementedError()

# Step 3
# YOUR CODE HERE
raise NotImplementedError()

# Step 4
# YOUR CODE HERE
raise NotImplementedError()

# Step 5
# YOUR CODE HERE
raise NotImplementedError()

print(reply)

In [None]:
# Testing cell
assert isinstance(reply, str)
assert "200 OK" in reply
assert "application/json" in reply
assert reply.endswith("19485.4}}}")

**Q5:** Suppose we want to generalize the scenario from Q3, where the two things that can change are the *host location* and the *resource path*.  For example, we might want to change the host to `httpbin.org` and the resource path to `/`, or many other combinations.  Write a function
```
    buildRequest(location, resource)
```
that constructs and returns a Python string containing a valid HTTP GET request that incorporates the parameters `location` and `resource` into the request at the appropriate places, and includes the appropriate header lines (for the required `Host` and to request the server close the connection after the exchange).

Note: Your function should not actually _issue_ the request.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print(buildRequest("httpbin.org", "/get"))
print("---------------------")

In [None]:
# Testing cell
r1 = buildRequest("datasystems.denison.edu", "/data/ind0.json")
assert r1[:3] == "GET"
assert r1[4:4+len("/data/ind0.json")] == "/data/ind0.json"
assert "Host: datasystems.denison.edu" in r1
assert r1.count('\r\n') == 4
assert r1[-4:] == '\r\n\r\n'

r2 = buildRequest("httpbin.org", "/get")
assert r2[:3] == "GET"
assert r2[4:4+len("/get")] == "/get"
assert "Host: httpbin.org" in r2
assert r2.count('\r\n') == 4
assert r2[-4:] == '\r\n\r\n'

**Q6:** Write a function
```
    issueRequest(location, resource)
```
that first constructs a valid HTTP GET request for `resource` at host `location`, as a Python string (using your function from the previous question), and then performs the request-reply steps of making the connection, sending the string request, receiving a reply until the connection closes, and finally closing the client side of the connection.  The function should return the reply.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print(issueRequest("datasystems.denison.edu", "/basic.html"))

In [None]:
# Debugging cell #1
resp1 = issueRequest("datasystems.denison.edu", "/basic.html")
print(resp1)

In [None]:
# Debugging cell #2
resp2 = issueRequest("datasystems.denison.edu", "/data/ind0.json")
print(resp2)

In [None]:
# Debugging cell #3
resp3 = issueRequest("httpbin.org", "/get")
print(resp3)

In [None]:
# Testing cell
resp1 = issueRequest("datasystems.denison.edu", "/basic.html")
assert "200 OK" in resp1
assert "text/html" in resp1
assert resp1.endswith("</html>\n")

resp2 = issueRequest("datasystems.denison.edu", "/data/ind0.json")
assert "200 OK" in resp2
assert "application/json" in resp2
assert resp2.endswith("19485.4}}}")

resp3 = issueRequest("httpbin.org", "/get")
assert "200 OK" in resp3
assert "application/json" in resp3
assert resp3.endswith(""""url": "http://httpbin.org/get"\n}\n""")

---

## Part D: HTTP Response Messages

The next set of exercises is about parsing the reply that results from a request.  An HTTP reply can be partitioned into a status line, a set of headers, and a body.  The exercises ask for functions that, given a reply, parse it and return each of these pieces.
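One way to picture this partition is to assemble a small reply by hand from its pieces and look at it with `repr`, which makes every `\r\n` visible. The header values and body below are made up for illustration and are not output from a real server.

```python
# Hand-built pieces of a reply (illustrative values only).
status_line = "HTTP/1.1 200 OK\r\n"
headers = ("Content-Type: text/plain\r\n"
           "Content-Length: 6\r\n")
empty_line = "\r\n"        # separates the headers from the body
body = "hello\n"           # six characters, matching Content-Length

sample_reply = status_line + headers + empty_line + body

# repr shows the line terminators explicitly instead of rendering them.
print(repr(sample_reply))
print(sample_reply.count("\r\n"))
```

The status line and each header end in `\r\n`, and the first empty line marks where the headers stop and the body begins.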

**Q7:** Write a function
```
    parseStatus(reply)
```
that finds and returns a Python string consisting of only the status line of a reply.  The returned value should include the line-terminating `"\r\n"`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

reply = issueRequest("datasystems.denison.edu", "/basic.html")
print(repr(parseStatus(reply)))
reply = issueRequest("datasystems.denison.edu", "/foobar.txt")
print(repr(parseStatus(reply)))

In [None]:
# Testing cell
r1 = issueRequest("datasystems.denison.edu", "/basic.html")
s1 = parseStatus(r1)
assert s1 == "HTTP/1.1 200 OK\r\n"

r2 = issueRequest("datasystems.denison.edu", "/foobar.txt")
s2 = parseStatus(r2)
assert s2 == "HTTP/1.1 404 Not Found\r\n"

**Q8:** Write a function
```
    parseHeaders(reply)
```
that finds and returns a single Python string that starts with the first header in the reply and continues up through the last header in the reply, including the line-terminating `"\r\n"`, but *not* the empty line separating the headers from the body.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

reply = issueRequest("datasystems.denison.edu", "/basic.html")
print(repr(parseHeaders(reply)))
reply = issueRequest("datasystems.denison.edu", "/foobar.txt")
print(repr(parseHeaders(reply)))

In [None]:
# Testing cell
r1 = issueRequest("datasystems.denison.edu", "/basic.html")
h1 = parseHeaders(r1)
assert "Server: Apache" in h1
assert "Connection: close\r\n" in h1
assert "Content-Type: text/html" in h1

r2 = issueRequest("datasystems.denison.edu", "/foobar.txt")
h2 = parseHeaders(r2)
assert "Server: Apache" in h2
assert "Connection: close\r\n" in h2
assert "Content-Type: text/html" in h2

**Q9:** Write a function
```
    parseBody(reply)
```
that finds and returns a single Python string that starts with the beginning of the body (i.e. after the empty line of the reply) and continues to the end of the reply.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

reply = issueRequest("datasystems.denison.edu", "/basic.html")
print(parseBody(reply))
reply = issueRequest("datasystems.denison.edu", "/foobar.txt")
print(parseBody(reply))

In [None]:
# Testing cell
r1 = issueRequest("datasystems.denison.edu", "/basic.html")
b1 = parseBody(r1)
r2 = issueRequest("datasystems.denison.edu", "/foobar.txt")
b2 = parseBody(r2)
assert b1.startswith("<!DOCTYPE html>")
assert b1.endswith("</html>\n")
assert b2.startswith("<!DOCTYPE HTML")
assert b2.endswith("</body></html>\n")

---

## Part E

**Q10:** How much time (in minutes/hours) did you spend on this homework assignment?

YOUR ANSWER HERE

**Q11:** Who was your partner for this assignment?  If you worked alone, say so instead.

YOUR ANSWER HERE