# Basic HTTP Homework

In this homework notebook, we will use the bare-bones HTTP library, `myhttp`, to make requests to a few web servers and work with the responses.

Recall the three functions provided by the module:

-   `makeConnection(host, port)`: Establish a TCP connection from the client machine to a server
     at the given machine `host` and listening at the given `port`. Returns the socket connection.

-   `sendStringRequest(connection, s)`: Given an established socket `connection`, take `s`, a 
    string, and send it over the connection. If line endings in `s` are `'\r\n'`, the
    string is sent without modifications, but if line endings are `'\n'`,
    like in typical Python strings, these are replaced with `'\r\n'`.

-   `receiveStringResponse(connection)`: Receive and return a string HTTP response from established 
    socket `connection`.  To obtain a Python string, `'\r\n'` are translated
    to `'\n'` line endings.  Further, the `Content-Length` header line
    of the response is used to determine the length to retrieve as the
    body of the response.  This function should not be used for non-text body responses.
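
The line-ending handling described above can be sketched without touching the network. This is only an illustration of the described behavior, not the actual `myhttp` implementation; the helper names `to_wire` and `from_wire` are hypothetical.

```python
import re

def to_wire(s):
    """Convert bare '\\n' line endings to '\\r\\n', leaving existing '\\r\\n' alone."""
    # The negative lookbehind skips any '\n' that already follows a '\r'.
    return re.sub(r'(?<!\r)\n', '\r\n', s)

def from_wire(s):
    """Convert '\\r\\n' line endings back to Python-style '\\n'."""
    return s.replace('\r\n', '\n')

request = "GET /static/chapter6/foo HTTP/1.1\nHost: hadoop2.mathsci.denison.edu\n\n"
wire = to_wire(request)
print(repr(wire))
```

Applying `to_wire` a second time leaves the string unchanged, which mirrors the statement that strings already using `'\r\n'` are sent without modification.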

In [1]:
import myhttp
import re

**Q1:** Suppose we want to retrieve a text file resource from a web server on machine `hadoop2.mathsci.denison.edu` where the resource path is `/static/chapter6/foo`.  Write a function
```
get_foo()
```
that returns the full string HTTP response.  If you need to, look at our class example, as well as the databook (http://hadoop2.mathsci.denison.edu/databook/clientserver.html) for complete information on correct HTTP syntax.

In [2]:
### BEGIN SOLUTION
def get_foo():
    host = 'hadoop2.mathsci.denison.edu'
    request="""GET /static/chapter6/foo HTTP/1.1
Host: hadoop2.mathsci.denison.edu
\n"""
    connection = myhttp.makeConnection(host, port=80)
    myhttp.sendStringRequest(connection, request)
    result = myhttp.receiveStringResponse(connection)
    connection.close()
    return result
### END SOLUTION
result = get_foo()
print(result)

Error connecting to ('hadoop2.mathsci.denison.edu', 80)


AttributeError: 'NoneType' object has no attribute 'sendall'
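
The traceback above suggests that `makeConnection` returned `None` when the connection failed, so the later send raised `AttributeError`. Below is a minimal sketch of guarding against that, assuming this `None`-on-failure behavior (inferred from the output, not from any `myhttp` documentation). The function `fetch` and its callable parameters are hypothetical; the callables are passed in so the sketch runs without the network.

```python
def fetch(host, request, make_connection, send, receive, port=80):
    """Send `request` to `host`; return the response string, or None on failure."""
    connection = make_connection(host, port)
    if connection is None:
        # Assumed failure signal: no socket was returned.
        return None
    try:
        send(connection, request)
        return receive(connection)
    finally:
        # Close the socket even if sending or receiving raises.
        connection.close()

# Stand-in connector that always fails, mimicking an unreachable server.
print(fetch("unreachable.example", "GET / HTTP/1.1\n\n",
            lambda host, port: None, None, None))
```

With the real module, the call would pass `myhttp.makeConnection`, `myhttp.sendStringRequest`, and `myhttp.receiveStringResponse` as the three callables.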

In [None]:
result = get_foo()
assert result[:15] == "HTTP/1.1 200 OK"
assert "Content-Length: 201" in result
assert "/users/bressoud/datasystems/web/static/chapter6/foo" in result

**Q2:** Now write a function to retrieve a text file resource from a web server on machine `hadoop2.mathsci.denison.edu` where the resource path is `/static/chapter6/bar.txt`.  Write a function
```
get_bar()
```
that returns the full string HTTP response.

In [6]:
### BEGIN SOLUTION
def get_bar():
    host = 'hadoop2.mathsci.denison.edu'
    request="""GET /static/chapter6/bar.txt HTTP/1.1
Host: hadoop2.mathsci.denison.edu
\n"""
    connection = myhttp.makeConnection(host, port=80)
    myhttp.sendStringRequest(connection, request)
    result = myhttp.receiveStringResponse(connection)
    connection.close()
    return result
### END SOLUTION
result = get_bar()
print(result)

HTTP/1.1 200 OK
Server: nginx/1.10.3 (Ubuntu)
Date: Tue, 30 Oct 2018 18:11:20 GMT
Content-Type: text/plain
Content-Length: 208
Last-Modified: Sat, 27 Oct 2018 21:53:27 GMT
Connection: keep-alive
ETag: "5bd4de57-d0"
Expires: Thu, 29 Nov 2018 18:11:20 GMT
Cache-Control: max-age=2592000
Accept-Ranges: bytes

This is a file in the filesystem of the web server.
It is at path /users/bressoud/datasystems/web/static/chapter6/bar.txt
  in the server's filesystem,
but that maps to resource path /static/chapter6/bar.txt



In [7]:
result = get_bar()
assert result[:15] == "HTTP/1.1 200 OK"
assert "Content-Length: 208" in result
assert "/users/bressoud/datasystems/web/static/chapter6/bar.txt" in result

**Q3:** We should be able to generalize our function to retrieve HTTP responses from any valid resource path on `hadoop2`.  Write a function
```
get_hadoop2(resource_path)
```
that makes an HTTP GET request to hadoop2 for the resource specified by `resource_path`.

In [8]:
### BEGIN SOLUTION
def get_hadoop2(resource_path):
    host = 'hadoop2.mathsci.denison.edu'
    request="GET {} HTTP/1.1\nHost: hadoop2.mathsci.denison.edu\n\n".format(resource_path)
    connection = myhttp.makeConnection(host, port=80)
    myhttp.sendStringRequest(connection, request)
    result = myhttp.receiveStringResponse(connection)
    connection.close()
    return result
### END SOLUTION
result = get_hadoop2("/static/chapter6/foo")
print(result)

HTTP/1.1 200 OK
Server: nginx/1.10.3 (Ubuntu)
Date: Tue, 30 Oct 2018 18:11:23 GMT
Content-Type: application/octet-stream
Content-Length: 201
Last-Modified: Sat, 27 Oct 2018 21:51:15 GMT
Connection: keep-alive
ETag: "5bd4ddd3-c9"
Expires: Thu, 29 Nov 2018 18:11:23 GMT
Cache-Control: max-age=2592000
Accept-Ranges: bytes

This is a file in the filesystem of the web server.
It is at path /users/bressoud/datasystems/web/static/chapter6/foo 
  in the server's filesystem,
but that maps to resource path /static/chapter6/foo



In [9]:
result = get_hadoop2("/static/chapter6/foo")
assert result[:15] == "HTTP/1.1 200 OK"
assert "Content-Length: 201" in result
assert "/users/bressoud/datasystems/web/static/chapter6/foo" in result

**Q4:** We can generalize our function even further by specifying the server host as well as the resource path.  Write a function
```
get_http(host, resource_path)
```
that makes an HTTP GET request to host `host` for the resource specified by `resource_path`.  Make sure you include the given `host` in the `Host: ` header line of the HTTP request as well as using it for making the connection.

In [10]:
### BEGIN SOLUTION
def get_http(host, resource_path):
    request="GET {} HTTP/1.1\nHost: {}\n\n".format(resource_path, host)
    connection = myhttp.makeConnection(host, port=80)
    myhttp.sendStringRequest(connection, request)
    result = myhttp.receiveStringResponse(connection)
    connection.close()
    return result
### END SOLUTION
result = get_http("hadoop2.mathsci.denison.edu", "/static/chapter6/brics.csv")
print(result)

HTTP/1.1 200 OK
Server: nginx/1.10.3 (Ubuntu)
Date: Tue, 30 Oct 2018 18:11:24 GMT
Content-Type: application/octet-stream
Content-Length: 187
Last-Modified: Sat, 27 Oct 2018 21:58:28 GMT
Connection: keep-alive
ETag: "5bd4df84-bb"
Expires: Thu, 29 Nov 2018 18:11:24 GMT
Cache-Control: max-age=2592000
Accept-Ranges: bytes

,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98


In [11]:
result = get_http("hadoop2.mathsci.denison.edu", "/static/chapter6/brics.csv")
assert result[:15] == "HTTP/1.1 200 OK"
assert "Content-Length: 187" in result
assert ",country,capital,area,population" in result

Our basic HTTP requests yield HTTP responses, so we now need a way to process a result and extract its parts.  In particular, we want to be able to extract:
- the result status, which is the three digits after the HTTP version in the result line.
- the set of lines containing the headers
- the body of the result
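
The three parts listed above can be seen by splitting a response string by hand. The sketch below uses a made-up sample response (not output from any server) and `str.partition`. It is an illustration of the layout rather than a solution to the questions that follow.

```python
# A hand-written sample response with '\n' line endings.
sample = (
    "HTTP/1.1 200 OK\n"
    "Content-Type: text/plain\n"
    "Content-Length: 12\n"
    "\n"
    "hello\nworld\n"
)

# The first blank line ("\n\n") separates the head from the body.
head, _, body = sample.partition("\n\n")
# The first line of the head is the result line; the rest are header lines.
status_line, _, header_block = head.partition("\n")

print(status_line)
print(header_block.split("\n"))
print(repr(body))
```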

**Q5:** Write a function
```
extractStatus(result)
```
that finds the three-digit status code in the result line and returns it as an integer.

In [12]:
### BEGIN SOLUTION
import re
def extractStatus(result):
    pattern = r'HTTP/1\.1 (\d\d\d)'  # escape the dot so it matches only a literal '.'
    m = re.match(pattern, result)
    return int(m.group(1))
### END SOLUTION
result = get_foo()
extractStatus(result)

200

In [13]:
result1 = get_foo()
assert extractStatus(result1) == 200
result2 = get_hadoop2("/static/chapter6/baz")
print(result2[:22])
assert extractStatus(result2) == 404

HTTP/1.1 404 Not Found


**Q6:** Write a function
```
extractHeaders(result)
```
that finds all the header lines of the result (after the start line and before the blank line) and uses each `header: value` pair to build a dictionary.  The keys are the header names, and each value is the string after the `: ` separator up to, but not including, the newline that terminates the header line.

In [14]:
### BEGIN SOLUTION
import re
def extractHeaders(result):
    # Non-greedy, so the match stops at the first blank line even if the body has one too
    hdrspattern = r'\n(.*?\n)\n'
    oneheader = r'^([\w-]+): (.*)$'
    D = {}
    m = re.search(hdrspattern, result, flags=re.S)
    for line in m.group(1).split('\n'):
        #print(line)
        m2 = re.search(oneheader, line)
        if m2:
            D[m2.group(1)] = m2.group(2)
    return D
### END SOLUTION
result = get_foo()
extractHeaders(result)

{'Server': 'nginx/1.10.3 (Ubuntu)',
 'Date': 'Tue, 30 Oct 2018 18:11:28 GMT',
 'Content-Type': 'application/octet-stream',
 'Content-Length': '201',
 'Last-Modified': 'Sat, 27 Oct 2018 21:51:15 GMT',
 'Connection': 'keep-alive',
 'ETag': '"5bd4ddd3-c9"',
 'Expires': 'Thu, 29 Nov 2018 18:11:28 GMT',
 'Cache-Control': 'max-age=2592000',
 'Accept-Ranges': 'bytes'}
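
Two details matter when using such a dictionary: the values are strings, so a numeric header such as `Content-Length` needs converting, and HTTP header names are case-insensitive, so a server may spell them with different capitalization. The sketch below shows one way to handle both, using two entries copied from the output above. The helper `get_header` is hypothetical and not part of `myhttp`.

```python
# Two entries copied from the dictionary shown above.
headers = {
    'Content-Type': 'application/octet-stream',
    'Content-Length': '201',
}

def get_header(headers, name, default=None):
    """Look up a header value without regard to letter case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default

# The stored value is the string '201'; convert it before doing arithmetic.
length = int(get_header(headers, 'content-length'))
print(length)
```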

In [15]:
result1 = get_foo()
hdrs1 = extractHeaders(result1)
assert len(hdrs1) == 8
assert 'Content-Length' in hdrs1
result2 = get_hadoop2("/static/chapter6/baz")
hdrs2 = extractHeaders(result2)
assert len(hdrs2) == 5

AssertionError: 

**Q7:** Write a function
```
extractBody(result)
```
that returns the body of the result.

In [69]:
### BEGIN SOLUTION
import re
def extractBody(result):
    bodypattern = r'\n\n(.*)$'
    m = re.search(bodypattern, result, flags=re.S)
    return m.group(1)
### END SOLUTION
result = get_foo()
extractBody(result)

"This is a file in the filesystem of the web server.\nIt is at path /users/bressoud/datasystems/web/static/chapter6/foo \n  in the server's filesystem,\nbut that maps to resource path /static/chapter6/foo\n"

In [70]:
result1 = get_foo()
body1 = extractBody(result1)
assert len(body1.split('\n')) == 5
result2 = get_hadoop2("/static/chapter6/baz")
body2 = extractBody(result2)
assert body2 == '<html>\n<head><title>404 Not Found</title></head>\n<body bgcolor="white">\n<center><h1>404 Not Found</h1></center>\n<hr><center>nginx/1.10.3 (Ubuntu)</center>\n</body>\n</html>\n'