# Denison CS-181/DA-210 Homework

---

## HTTP and `requests` Intro Homework

In [1]:
import os
import sys
import json

def add_modules():
    """
    Starting at the current directory and proceeding up the file system
    tree, search for a directory named `modules`.  If found, and if not
    already there, add to the Python module search path.
    
    Params: None
    
    Return: None
    """
    directory = "."
    levels = 0
    while not os.path.isdir(os.path.join(directory, "modules")) and \
          levels < 5:
        directory = os.path.join(directory, "..")
        levels += 1
    module_path = os.path.abspath(os.path.join(directory, "modules"))
    if os.path.isdir(module_path):
        if module_path not in sys.path:
            sys.path.append(module_path)

add_modules()
import util
import mysocket as sock

## Programming Response Replies

The next set of exercises is about parsing the reply that results from a request. An HTTP reply can be partitioned into a status line, the set of headers, and the body. The exercises ask for functions that, given a reply, parse it and return each of these pieces.
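The snippet below is a minimal sketch of this three-way split. It uses `str.partition` on a short hand-written reply, not one fetched from a server. Unlike the Q1 and Q2 specifications, it drops the line-terminating `\r\n` sequences, so treat it as an illustration of where the boundaries fall rather than as a solution.

```python
# A hand-written sample reply (not fetched from any server).
sample_reply = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)

# The first CRLF ends the status line.
status_line, _, rest = sample_reply.partition("\r\n")
# The first empty line (CRLF CRLF) separates the headers from the body.
header_block, _, body_text = rest.partition("\r\n\r\n")

print(repr(status_line))   # 'HTTP/1.1 200 OK'
print(repr(header_block))  # 'Content-Type: text/plain\r\nContent-Length: 5'
print(repr(body_text))     # 'hello'
```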

The code below **uses `makeRequest()`**, which in turn should use **`buildRequest()`**. Both come from the last homework, so include their definitions as part of your solution to the first problem here.

**Q1:** Write a function

    parseStatus(reply)

that finds and returns a Python string consisting of only the status line of a reply.  The returned value should include the line-terminating `"\r\n"`.

In [25]:
def buildRequest(location, resource):
    '''
    This function builds the request string with
    a location and the resource it needs to get.
    
    Parameters: location: the host of the request
                resource: what the user is requesting
    
    Return: the request message string.
    '''
    return "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n".format(resource, location)

def makeRequest(location, resource):
    '''
    This function makes a request by establishing a
    connection and sending a request message with
    a location and resource through it. Finally it
    will return the reply of the request.
    
    Parameters: location: the host of the request
                resource: what the user is requesting
                
    Return: reply: the reply of the message request
    '''
    message = buildRequest(location, resource)
    connection = sock.makeConnection(location, 80)
    sock.sendString(connection, message)
    reply = sock.receiveTillClose(connection)
    connection.close()
    return reply

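def parseStatus(reply):
    '''
    This function returns the status line of a reply
    
    Parameters: reply: the reply returned by makeRequest
    
    Return: the status line, including its terminating CRLF
    '''
    # The status line is everything before the first CRLF; add the CRLF back.
    return reply.split('\r\n', 1)[0] + '\r\n'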

reply = makeRequest("datasystems.denison.edu", "/basic.html")
print(repr(parseStatus(reply)))
reply = makeRequest("datasystems.denison.edu", "/foobar.txt")
print(repr(parseStatus(reply)))

'HTTP/1.1 200 OK\r\n'
'HTTP/1.1 404 Not Found\r\n'


In [26]:
r1 = makeRequest("datasystems.denison.edu", "/basic.html")
s1 = parseStatus(r1)
assert s1 == "HTTP/1.1 200 OK\r\n"

r2 = makeRequest("datasystems.denison.edu", "/foobar.txt")
s2 = parseStatus(r2)
assert s2 == "HTTP/1.1 404 Not Found\r\n"

**Q2:** Write a function

    parseHeaders(reply)

that finds and returns a single Python string that starts with the first header in the reply and continues up through the last header in the reply, including the line-terminating `"\r\n"`, but *not* the empty line separating the headers from the body.
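The header block is plain text with one `Name: value` line per header. As an illustration, a hypothetical helper `headers_to_dict` could turn such a block into a Python dictionary. It splits each line at the first colon only, because values such as dates contain colons. This sketch is case-sensitive, whereas the `headers` attribute of a `requests` response is a case-insensitive dictionary.

```python
def headers_to_dict(header_text):
    """Hypothetical helper: turn CRLF-terminated 'Name: value' lines into a dict."""
    result = {}
    for line in header_text.split("\r\n"):
        if not line:
            continue  # skip the empty string left after the final CRLF
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        result[name.strip()] = value.strip()
    return result

sample = ("Date: Mon, 26 Apr 2021 15:00:52 GMT\r\n"
          "Content-Type: text/html; charset=UTF-8\r\n")
hdrs = headers_to_dict(sample)
print(hdrs["Date"])          # Mon, 26 Apr 2021 15:00:52 GMT
print(hdrs["Content-Type"])  # text/html; charset=UTF-8
```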

In [30]:
def parseHeaders(reply):
    '''
    This function returns the headers of a reply, excluding
    the status line and the empty line that ends the headers
    
    Parameters: reply: the reply returned by makeRequest
    
    Returns: header: the header lines, each terminated by CRLF
    '''
    lines = reply.split('\r\n')
    header = ''
    pos = 1  # skip the status line at position 0
    # The headers end at the first empty line.
    while pos < len(lines) and lines[pos] != '':
        header = header + lines[pos] + '\r\n'
        pos = pos + 1
    return header

reply = makeRequest("datasystems.denison.edu", "/basic.html")
print(repr(parseHeaders(reply)))
reply = makeRequest("datasystems.denison.edu", "/foobar.txt")
print(repr(parseHeaders(reply)))

'Date: Mon, 26 Apr 2021 15:00:52 GMT\r\nServer: Apache/2.4.6 (CentOS)\r\nAccept-Ranges: bytes\r\nContent-Length: 496\r\nConnection: close\r\nContent-Type: text/html; charset=UTF-8\r\n'
'Date: Mon, 26 Apr 2021 15:00:52 GMT\r\nServer: Apache/2.4.6 (CentOS)\r\nContent-Length: 296\r\nConnection: close\r\nContent-Type: text/html; charset=iso-8859-1\r\n'


In [31]:
r1 = makeRequest("datasystems.denison.edu", "/basic.html")
h1 = parseHeaders(r1)
assert "Server: Apache" in h1
assert "Connection: close\r\n" in h1
assert "Content-Type: text/html" in h1
r2 = makeRequest("datasystems.denison.edu", "/foobar.txt")
h2 = parseHeaders(r2)
assert "Server: Apache" in h2
assert "Connection: close\r\n" in h2
assert "Content-Type: text/html" in h2

**Q3:** Write a function

    parseBody(reply)

that finds and returns a single Python string that starts with the beginning of the body (i.e. after the empty line of the reply) and continues to the end of the reply.
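Splitting the reply on every `\r\n` and keeping the last piece only works when the body itself contains no `\r\n`. The sketch below uses a hand-written sample reply to show that approach failing, and shows that splitting once at the first empty line keeps the whole body.

```python
# Hand-written sample reply whose body itself uses CRLF line endings.
sample_reply = ("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"
                "line one\r\nline two\r\n")

# Keeping only the piece after the *last* CRLF loses the body entirely here.
print(repr(sample_reply.split("\r\n")[-1]))        # ''

# Splitting once, at the first empty line, keeps the whole body.
print(repr(sample_reply.partition("\r\n\r\n")[2]))  # 'line one\r\nline two\r\n'
```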

In [33]:
def parseBody(reply):
    '''
    This function returns the body of a reply
    
    Parameters: reply: the reply returned by makeRequest
    
    Return: the body of the reply
    '''
    # The body is everything after the first empty line (CRLF CRLF).
    return reply.partition('\r\n\r\n')[2]

reply = makeRequest("datasystems.denison.edu", "/basic.html")
print(parseBody(reply))
reply = makeRequest("datasystems.denison.edu", "/foobar.txt")
print(parseBody(reply))

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Data Systems Basic HTML Page</title>
  </head>
  <body>
    <h1>First Level Heading</h1>

    <p>Paragraph defined in <b>body</b>.

    <h2>Second Level Heading</h2>

    <a href="http://docs.python.org">Link</a> to Python documentation.
    </p>

    <ul>
      <li>Item 1
      <ol>
        <li>Item 1 nested</li>
        <li>Item 2 nested</li>
      </ol>
      </li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </body>
</html>

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /foobar.txt was not found on this server.</p>
<hr>
<address>Apache/2.4.6 (CentOS) Server at datasystems.denison.edu Port 80</address>
</body></html>



In [34]:
r1 = makeRequest("datasystems.denison.edu", "/basic.html")
b1 = parseBody(r1)
r2 = makeRequest("datasystems.denison.edu", "/foobar.txt")
b2 = parseBody(r2)
assert b1.startswith("<!DOCTYPE html>")
assert b1.endswith("</html>\n")
assert b2.startswith("<!DOCTYPE HTML")
assert b2.endswith("</body></html>\n")

### Utility Functions and Using the `requests` Module

In [35]:
import requests

**Q4** The `requests` module uses URLs as the first argument to its HTTP method functions, but we often start with the "piece parts" of the information contained in a URL.  Write a function

    buildURL(location, resource, protocol='http', query_string=None)

that returns a string URL built from the four component parts `protocol`, `location`, `resource`, and `query_string`. Your function should be flexible: if a user omits the leading `/` on the resource path, one is prepended. Note that `protocol` has a default value, so `buildURL` uses `http` when called with just two arguments. Likewise, `query_string` defaults to `None`, in which case the URL has no query string. If a query string is passed, the built URL should end with a question mark followed by that query string.

Python format strings are the right tool for the job here.
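For comparison only (the exercise asks for format strings), the standard library's `urllib.parse.urlunsplit` assembles a URL from a `(scheme, netloc, path, query, fragment)` tuple and adds the `?` only when the query is non-empty. `build_url_stdlib` is a hypothetical name used for this sketch. It can serve as a cross-check on the strings your `buildURL` produces.

```python
from urllib.parse import urlunsplit

def build_url_stdlib(location, resource, protocol="http", query_string=None):
    """Hypothetical alternative to buildURL using the standard library."""
    if not resource.startswith("/"):
        resource = "/" + resource  # prepend the missing leading slash
    # urlunsplit inserts '?' only when the query component is non-empty.
    return urlunsplit((protocol, location, resource, query_string or "", ""))

print(build_url_stdlib("httpbin.org", "get"))
print(build_url_stdlib("httpbin.org", "post", query_string="foo=1&bar=2"))
```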

In [54]:
# Solution cell

def buildURL(location, resource, protocol='http', query_string=None):
    '''
    This function builds a URL from its component parts
    
    Parameters: location: the host of the URL
                resource: the resource path of the URL
                protocol: the protocol, defaulting to http
                query_string: the query string of the URL, defaulting
                to None (no query string)
    Return: url: the URL that was built
    '''
    # Prepend a '/' if the caller omitted it from the resource path.
    if not resource.startswith('/'):
        resource = '/' + resource
    url = '{}://{}{}'.format(protocol, location, resource)
    # Append the query string, separated by '?', only when one is given.
    if query_string is not None:
        url = '{}?{}'.format(url, query_string)
    return url

print(buildURL('httpbin.org', 'get'))
print(buildURL("datasystems.denison.edu",
               "/data/ind0.json", protocol="https"))
print(buildURL('httpbin.org', 'post'))
print(buildURL('httpbin.org', 'post', query_string="foo=1&bar=2"))


http://httpbin.org/get
https://datasystems.denison.edu/data/ind0.json
http://httpbin.org/post
http://httpbin.org/post?foo=1&bar=2


In [39]:
assert True

**Q5** Write, in a global cell, a sequence of code that starts with:

    resource = "/data/ind0.json"
    location = "datasystems.denison.edu"

then builds an appropriate URL (using the function you just defined), uses `requests` to issue a GET request, and assigns the following variables based on the result:

- `stat`: has the integer status code,
- `headers`: has a dictionary of headers from the response, and
- `body`: has the *parsed* data from the JSON-formatted body

Be sure to test your solution and print the above variables.
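`response.content` holds the raw body as `bytes`, while *parsed* data means the Python structure that the JSON text describes. The sketch below parses a shortened copy of the `FRA` entry from `ind0.json` with `json.loads`, which accepts `bytes` in Python 3.6 and later; on a live response, `response.json()` performs a comparable parse.

```python
import json

# Shortened stand-in for the raw body bytes of /data/ind0.json.
raw = b'{"FRA": {"2007": {"pop": 64.02, "gdp": 2657.21}}}'
data = json.loads(raw)  # bytes in, nested dicts out

print(type(raw).__name__, type(data).__name__)  # bytes dict
print(data["FRA"]["2007"]["pop"])               # 64.02
```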

In [49]:
resource = "/data/ind0.json"
location = "datasystems.denison.edu"
url = buildURL(location, resource)
response = requests.get(url)
stat = response.status_code
headers = response.headers
body = response.json()  # parse the JSON-formatted body into Python data
print(stat)
print(headers)
print(body)

200
{'Date': 'Mon, 26 Apr 2021 15:40:24 GMT', 'Server': 'Apache/2.4.6 (CentOS)', 'Last-Modified': 'Wed, 16 Dec 2020 23:45:42 GMT', 'ETag': '"10d-5b69d7922d580"', 'Accept-Ranges': 'bytes', 'Content-Length': '269', 'Connection': 'close', 'Content-Type': 'application/json'}
{'FRA': {'2007': {'pop': 64.02, 'gdp': 2657.21}, '2017': {'pop': 66.87, 'gdp': 2586.29}}, 'GBR': {'2007': {'pop': 61.32, 'gdp': 3084.12}, '2017': {'pop': 66.06, 'gdp': 2637.87}}, 'USA': {'2007': {'pop': 301.23, 'gdp': 14451.9}, '2017': {'pop': 325.15, 'gdp': 19485.4}}}


In [50]:
assert True

**Q6** Suppose you often coded a similar set of steps to make a GET request, where often the body of the result was JSON, in which case you wanted the data parsed, but sometimes the data was *not* JSON, in which case you wanted the data as a string.  Write a function

     makeRequest(location, resource, protocol="http")

that makes a GET request to the given `location`, `resource`, and `protocol`.  If the request is *not* successful (i.e. not in the 200's), the function should check for this and return `None`.  If the request is successful, the function should *use the response headers* and determine whether or not the `Content-Type` header maps to `application/json`.  If it is, it should parse the result and return the data structure.  If it is not, it should return the string making up the body of the response.
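Two checks in this function are easy to get subtly wrong: "in the 200's" is a range rather than just `200`, and a `Content-Type` value may carry parameters, as in `text/html; charset=UTF-8` from the replies earlier in this homework. The sketch below implements both checks as hypothetical helpers `is_success` and `is_json_content_type`. They make no network calls, so they can be tried directly.

```python
def is_success(status_code):
    """True for status codes in the 200's."""
    return 200 <= status_code < 300

def is_json_content_type(content_type):
    """Hypothetical helper: True if a Content-Type value names JSON."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/json"

print(is_success(200), is_success(204), is_success(404))   # True True False
print(is_json_content_type("application/json"))             # True
print(is_json_content_type("application/json; charset=utf-8"))  # True
print(is_json_content_type("text/html; charset=UTF-8"))     # False
```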

For an extra credit point, extend this `makeRequest` to add an optional (default-valued) named parameter for a query string.
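A query string such as `foo=1&bar=2` can be written by hand, but `urllib.parse.urlencode` builds one from a dictionary and percent-encodes special characters (spaces become `+` by default). The parameter names below are made up for illustration. `requests.get` can also take a `params=` dictionary and encode it itself, which is an alternative to passing a prebuilt query string.

```python
from urllib.parse import urlencode

# Made-up parameters; dict insertion order is kept in the output.
params = {"foo": 1, "bar": "two words"}
query_string = urlencode(params)
print(query_string)  # foo=1&bar=two+words
```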

In [51]:
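def makeRequest(location, resource, protocol="http", query_string=None):
    '''
    This function makes a GET request for the URL built from its
    arguments and returns the parsed data if the body is JSON, the
    body as a string otherwise, or None if the request failed
    
    Parameters: location: the location of the URL
                resource: the resource of the URL
                protocol: protocol of the URL, defaulting to http
                query_string: query string of the URL, defaulting to None
                
    Return: parsed JSON data, the body string, or None
    '''
    url = buildURL(location, resource, protocol, query_string)
    try:
        response = requests.get(url)
    except requests.exceptions.RequestException:
        # Network-level failure (DNS error, refused connection, timeout, ...)
        return None
    # Only status codes in the 200's count as success.
    if not 200 <= response.status_code < 300:
        return None
    # Content-Type may carry parameters, e.g. "application/json; charset=utf-8".
    content_type = response.headers.get("Content-Type", "")
    if content_type.split(";")[0].strip().lower() == "application/json":
        return response.json()
    return response.text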

In [52]:
assert True

**Q7** You have probably had the experience before of trying to open a webpage, and having a redirect page pop up, telling you that the page has moved and asking if you want to be redirected. The same thing can happen when we write code to make requests. Write a function:


    getRedirectURL(location, resource)


that begins like your function `makeRequest` but **does *not* allow redirects** when invoking `get`. Look carefully at table 20.5 in the book for how you can do this.  This function will return a *url*. If the `get` results in a success status code (one in the 200's), you return the original url (obtained from `buildURL`, with `http` protocol). If you detect that `get` tried to redirect  (by looking for a 300, 301, or 302 status code), **search within the headers** to find the `"Location"` it tried to redirect to, and return that URL instead. If you get any other status code, return `None`.
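With `requests`, passing `allow_redirects=False` to `get` makes it return the 3xx response itself instead of following it, so its status code and `Location` header can be inspected. The decision rules from this question can be sketched as a hypothetical pure function `choose_url`, shown with a made-up URL, so the rules can be tested without contacting a server.

```python
def choose_url(original_url, status_code, headers):
    """Hypothetical helper mirroring the Q7 rules, without any network I/O."""
    if 200 <= status_code < 300:
        return original_url
    if status_code in (300, 301, 302):
        # A redirect response names its target in the Location header.
        return headers.get("Location")
    return None

original = "http://example.com/docs"
print(choose_url(original, 200, {}))
print(choose_url(original, 301, {"Location": "http://example.com/docs/"}))
print(choose_url(original, 404, {}))
```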

In [53]:
# Solution cell

def getRedirectURL(location, resource):
    '''
    This function returns the URL a request resolves to,
    without following redirects
    
    Parameters: location: location of the URL
                resource: resource of the URL
              
    Return: the original URL on success, the redirect target
            on a 300, 301, or 302, and None otherwise
    '''
    url = buildURL(location, resource)
    # Do not follow redirects, so the 3xx response itself comes back.
    response = requests.get(url, allow_redirects=False)
    if 200 <= response.status_code < 300:
        return url
    if response.status_code in (300, 301, 302):
        # The redirect target is given by the Location header.
        return response.headers.get("Location")
    return None

print(getRedirectURL("personal.denison.edu", '/~kretchmar'))
print(getRedirectURL("personal.denison.edu", '/~kretchmar/'))
print(getRedirectURL("personal.denison.edu", '/~whiteda/DenisonWebsiteInfo.pdf'))


In [None]:
assert True