# Denison DA210/CS181 SW Lab #13 - Step 1

Before you get your checkpoints, make sure everything runs as expected. To check this, **restart the kernel** and then **run all cells**.

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

In [None]:
import os
import os.path
import sys
import importlib
import json

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

import util
importlib.reload(util)

import mysocket as sock
importlib.reload(sock)

---

## Part A: URLs, revisited

Recall that the general form of a URL is given by the following (shown with extra spaces for readability):

_protocol_ : // _location_ [ : _port_ ] _resource-path_
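
As a quick illustration of that general form, the standard library's `urllib.parse.urlsplit` can take a URL apart into the same pieces. This is only a sketch: the URL below, including its explicit port `443`, is an assumed example chosen to show where each part goes.

```python
from urllib.parse import urlsplit

# Assumed example URL; the explicit port is included only to show where it goes
parts = urlsplit("https://datasystems.denison.edu:443/data/ind0.json")

print("protocol:     ", parts.scheme)
print("location:     ", parts.hostname)
print("port:         ", parts.port)
print("resource path:", parts.path)
```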

**Q1:** Write a function
```
    buildURL(location, resource, protocol='http')
```
that returns a string URL based on the three component parts of `protocol`, `location`, and `resource`.

Your function should be flexible: if a user omits the leading `/` on the resource path, one is prepended. Note that we specify a default value for `protocol`, so `http` is used whenever `buildURL` is called with just two arguments. Python format strings are the right tool for the job here.
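
If default parameter values and format strings are unfamiliar, the following unrelated sketch shows both ideas together. The function `buildGreeting` and its parameters are hypothetical, invented purely for illustration; it is not part of the lab.

```python
# Hypothetical helper, for illustration only
def buildGreeting(name, punctuation, salutation='Hello'):
    # Append the punctuation only if the caller did not already include it
    if not name.endswith(punctuation):
        name = name + punctuation
    # An f-string substitutes the values of the expressions in braces
    return f"{salutation}, {name}"

# Two arguments: the default salutation is used
print(buildGreeting("world", "!"))

# Three arguments: the default is overridden
print(buildGreeting("world!", "!", salutation="Goodbye"))
```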

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print(buildURL('httpbin.org', 'get'))
print(buildURL("datasystems.denison.edu",
               "/data/ind0.json", protocol="https"))
print(buildURL('httpbin.org', 'post'))

In [None]:
# Testing cell
assert buildURL('httpbin.org', 'get') == "http://httpbin.org/get"
assert buildURL('httpbin.org', '/get') == "http://httpbin.org/get"
assert buildURL("datasystems.denison.edu",
               "/data/ind0.json", protocol="https") == "https://datasystems.denison.edu/data/ind0.json"
assert buildURL('httpbin.org', 'post') == "http://httpbin.org/post"
assert buildURL('httpbin.org', '/get', 'https') == "https://httpbin.org/get"

> You've reached the first checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 1: Web browsers provide some shortcuts.  Which parts of the URL can we _not_ specify?  What are the defaults in that case?  (Hint: try leaving out parts of a URL typed into a browser and see if you still get to the same page.)

---

## Part B: `GET` with the Python `requests` module

The Python `requests` module provides a helpful implementation of the HTTP application protocol layer for Python programs.  To use this module, we first must import it:

In [None]:
import requests

This module provides different functions for each of the HTTP methods we may want.  Each function takes a URL string as input:

```python
    requests.get(url)     # read a resource; no body
    requests.head(url)    # like get, but just for metadata; no body

    requests.post(url)    # send form data / update; body is often in JSON
    requests.put(url)     # update a resource; body contains data
    
    requests.delete(url)  # delete a resource
```

For example, we can use `get()` to perform a `GET` request:

In [None]:
# Specify the URL as a string
url = "http://datasystems.denison.edu/basic.html"

# Perform the GET request
response = requests.get(url)
print("Response status:", response.status_code)

The status code indicates whether the request was successful.  In this case, a `200` status code indicates that a `GET` request succeeded.
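
The first digit of a status code identifies its general class. The sketch below uses a hypothetical helper, `describeStatus`, together with the standard library's `http.HTTPStatus`, to label a few common codes. It assumes the code is one that `HTTPStatus` recognizes.

```python
from http import HTTPStatus

# Hypothetical helper, for illustration only
def describeStatus(code):
    # The first digit of the code identifies its class
    classes = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    kind = classes.get(code // 100, "other")
    # HTTPStatus(code).phrase is the standard reason phrase for the code
    return f"{code} {HTTPStatus(code).phrase} ({kind})"

for code in [200, 301, 404]:
    print(describeStatus(code))
```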

We can also look at the URL associated with the request:

In [None]:
# Retrieve the request associated with this HTTP response
request = response.request

# Print the URL for the request
print("URL:", request.url)

Python has abstracted away the HTTP application layer for us, but the request message sent to the server would look like:

```
    GET /basic.html HTTP/1.1
    Host: datasystems.denison.edu
    User-Agent: python-requests/2.23.0
    Accept-Encoding: gzip, deflate
    Accept: */*
    Connection: keep-alive
```

In this case, there are several headers, including `Host`, `User-Agent`, and `Connection`, specified as key-value pairs.  These are provided by the Python `requests` module.
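
To make the structure of such a message concrete, here is a minimal sketch that assembles one from a method, a resource path, and a dictionary of headers. The helper `formatRequest` is hypothetical; it only builds a string and is not how `requests` works internally.

```python
# Hypothetical helper, for illustration only
def formatRequest(method, path, headers):
    # Request line: method, resource path, and protocol version
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF, and a blank line marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n"

message = formatRequest("GET", "/basic.html", {
    "Host": "datasystems.denison.edu",
    "Connection": "keep-alive",
})
print(message)
```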

If we want to specify our own headers, we can additionally pass a dictionary to `get()` using the `headers` named parameter:

In [None]:
# Specify the URL as a string
url = "http://httpbin.org/get"

# Build a dictionary for header key-value pairs
headerD = {
    "Accept": "application/json",
    "User-Agent": "datasystems-client"
}

# Issue the request, including the headers dictionary
response = requests.get(url, headers=headerD)
print("Response status:", response.status_code)

# Look at the dictionary actually used for the request
request = response.request
print("\nRequest headers:")
util.print_headers(request.headers)

Similar to providing custom headers, you can also provide query parameters using a dictionary:

In [None]:
# Specify the URL as a string
url = "http://httpbin.org/get"

# Build a dictionary for query-parameter key-value pairs
paramsD = {
    "user": "smith",
    "query": "movies tv"
}

# Issue the request, including the query parameters dictionary
response = requests.get(url, params=paramsD)
print("Response status:", response.status_code)

# Look at the actual resource path in the URL for the request
request = response.request
print("\nResource path:", request.path_url)
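
The resource path printed above should contain the query parameters after a `?`. As a rough sketch of that encoding, the standard library's `urllib.parse.urlencode` turns the same dictionary into a query string. Treat this as an approximation of what `requests` does on our behalf.

```python
from urllib.parse import urlencode

paramsD = {"user": "smith", "query": "movies tv"}

# Each pair becomes key=value, pairs are joined with '&',
# and the space in "movies tv" is encoded as '+'
query = urlencode(paramsD)
print("/get?" + query)
```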

In addition to the response attributes we've already looked at, several more are available on the response object:

* `status_code`: three-digit integer status code, from the status line of the response
* `content`: the raw bytes version of the response body
* `text`: the response body, if it is textual; decoded into a Python string using the response header info to determine the encoding used
* `headers`: field-name/field-value pairs for headers in the response; can be converted to a dictionary (and behaves as one)
* `url`: the complete URL of the response (after any redirects)
* `request`: an abstraction of the request for this response
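
The difference between raw bytes, decoded text, and parsed data can be sketched without the network. The bytes below are assumed sample data standing in for `content`; decoding them gives a string like `text`, and parsing that string gives a Python data structure.

```python
import json

# Assumed raw bytes, standing in for the content attribute of a response
content = b'{"city": "Granville", "caf\xc3\xa9s": 3}'

# Decoding with the right encoding gives a string, like the text attribute
text = content.decode("utf-8")

# Parsing the JSON text gives a Python data structure
data = json.loads(text)

print(type(content), type(text), type(data))
print(data["city"])
```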

---

## Part C: Try it yourself - `GET`

**Q2:** Write a sequence of code that starts with:
```
    resource = "/data/ind0.json"
    location = "datasystems.denison.edu"
```
and builds an appropriate URL, uses `requests` to issue a `GET` request, and assigns the following variables based on the result:

- `status`: has the integer status code,
- `headers`: has a dictionary of headers from the response, and
- `data`: has the *parsed* data from the JSON-formatted body as a dictionary.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print("Status:", status)
print(headers)
print(data)

In [None]:
# Testing cell
assert status == 200

assert len(headers) == 8
assert headers["Content-Length"] == "269"
assert "ETag" in headers

assert isinstance(data, dict)
assert len(data) == 3
assert "FRA" in data
assert data["GBR"]["2007"]["gdp"] == 3084.12

**Q3:** Given parallel lists `headerNameList` and `headerValueList`, you can build a dictionary that maps from header names to their associated values (given by the parallel structure).
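
As a reminder of how parallel lists work, here is a small sketch with assumed sample data unrelated to headers. Items at the same index belong together, and `zip` pairs them up.

```python
# Assumed sample data: parallel lists of country codes and names
codes = ["FRA", "GBR", "USA"]
names = ["France", "United Kingdom", "United States"]

# zip pairs up items at the same index; dict turns the pairs into a mapping
codeToName = dict(zip(codes, names))
print(codeToName["GBR"])
```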

Write a function
```
    makeRequestHeader(location, resource, headerNameList, headerValueList)
```
that builds a custom header dictionary and then passes it to `requests.get`.

Your function should call `buildURL` (with protocol `https`) to build the URL to pass to `requests.get`. Your function should return the response from the `requests.get` invocation.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

r = makeRequestHeader('httpbin.org', '/get/', ['Transfer-Encoding'],['compress'])
request = r.request
print(request.headers)
print()

r = makeRequestHeader('httpbin.org', '/get/',['Transfer-Encoding','Accept'],['compress','text/html'])
request = r.request
print(request.headers)
print()

r = makeRequestHeader('httpbin.org', '/get/',[],[])
request = r.request
print(request.headers)

In [None]:
# Testing cell

r = makeRequestHeader('httpbin.org', '/get/',['Transfer-Encoding'],['compress'])
request = r.request
assert request.headers['Transfer-Encoding'] == 'compress'
assert request.headers['Connection'] == 'keep-alive'

r = makeRequestHeader('httpbin.org','/get/',['Transfer-Encoding','Accept'],['compress','text/html'])
request = r.request
assert request.headers['Accept'] == 'text/html'
assert request.headers['Transfer-Encoding'] == 'compress'

**Q4:** Suppose you often write a similar set of steps to make a `GET` request. Usually the body of the response is JSON, and you want the data parsed; sometimes it is *not* JSON, and you want the body as a string.

Write a function
```
     makeRequest(location, resource, protocol="http")
```
that makes a `GET` request to the given `location`, `resource`, and `protocol`.

If the request is *not* successful (i.e., the status code is not in the 200s), the function should return `None`. If the request is successful, the function should *use the response headers* to determine whether the `Content-Type` header maps to `application/json`. If it does, the function should parse the body and return the resulting data structure. If it does not, the function should return the body of the response as a string.
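
One detail worth knowing: a `Content-Type` value may carry extra parameters after a `;`, such as a character set, so an exact string comparison can miss a match. The sketch below uses a hypothetical helper, `mediaType`, to isolate the media type; whether you need this for the servers in this lab is an assumption to check for yourself.

```python
# Hypothetical helper, for illustration only
def mediaType(contentType):
    # Keep only the part before any ';' parameters, and normalize case
    return contentType.split(";")[0].strip().lower()

print(mediaType("application/json; charset=utf-8"))
print(mediaType("text/html"))
```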

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell
assert makeRequest("httpbin.org", "/post") is None

result2 = makeRequest("www.denison.edu", "/academics/data-analytics")
assert result2 is not None
assert result2.startswith("<!DOCTYPE html>")

result3 = makeRequest("datasystems.denison.edu", "data/ind0.json")
assert result3 is not None
assert isinstance(result3, dict)
assert "FRA" in result3
assert result3["USA"]["2017"]["gdp"] == 19485.4

> You've reached the second checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 2: What do you think the `Content-Type` header line is used for by web browsers?

---

## Part D: Redirects

You have probably tried to open a webpage and had a redirect page pop up, telling you that the page has moved and asking if you want to be redirected. The same thing can happen when we write code to make requests.

If a status code is in the 300s, the HTTP request resulted in a redirect.  This occurs when the requested resource is no longer at the given resource path.  The new location/path is given in the `Location` header of the response.
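
A `Location` value can be a complete URL or a path relative to the original request. As a sketch, the standard library's `urllib.parse.urljoin` resolves either form against the original URL. The `Location` values below are assumed examples, not responses captured from a server.

```python
from urllib.parse import urljoin

original = "http://datasystems.denison.edu/data"

# An absolute Location replaces the original URL entirely
absolute = urljoin(original, "https://datasystems.denison.edu/data/")
print(absolute)

# A relative Location is resolved against the original URL
relative = urljoin(original, "/data/")
print(relative)
```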

For example, this can happen if a website redirects from HTTP to HTTPS:

In [None]:
# Specify the URL as a string
url = "http://www.denison.edu"

# Issue the request
response = requests.get(url, allow_redirects=False)
print("Response status:", response.status_code)

# Look at the URL we're being redirected to
new_url = response.headers["Location"]
print("\nNew URL:", new_url)

**Q5:** Write a function:
```
    getRedirectURL(location, resource)
```
that begins like your function `makeRequest` from Question 4, but does *not* allow redirects when invoking `requests.get`.  This function will return a *URL*.

If the call to `requests.get` results in a success status code (one in the 200s), you return the original URL (obtained from `buildURL`, with `http` protocol).  If you detect that `requests.get` tried to redirect  (by looking for a 300, 301, or 302 status code), **search within the headers** to find the `"Location"` it tried to redirect to, and return that URL instead. If you get any other status code, return `None`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print(getRedirectURL("denison.edu", '/'))                 # redirect to https
print(getRedirectURL("datasystems.denison.edu", '/data'))     # redirect to add '/'
print(getRedirectURL("datasystems.denison.edu", '/data/'))    # no change
print(getRedirectURL("datasystems.denison.edu", '/foo.html')) # 404 not found

In [None]:
# Testing cell

# http -> https
assert getRedirectURL("denison.edu", '/') != "http://denison.edu/"
assert getRedirectURL("denison.edu", '/') == "https://denison.edu/"

# end in /
assert getRedirectURL("datasystems.denison.edu", '/data') == \
    "http://datasystems.denison.edu/data/"
assert getRedirectURL("datasystems.denison.edu", '/data/') == \
    "http://datasystems.denison.edu/data/"

# None
assert getRedirectURL("datasystems.denison.edu", "/foo.html") is None

> You've reached the third (and final) checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 3: A single `GET` request may result in a chain of multiple redirects.  Is there a limit on the number of redirects that the Python `requests` module allows?  (Hint: You may want to consult the documentation: https://requests.readthedocs.io/en/latest/user/quickstart/#errors-and-exceptions.)

---

## Part E: `POST` Requests

The HTTP `POST` method is used to provide information from the client to the server via the request _body_.  For example:

In [None]:
# Specify the URL as a string
url = "http://httpbin.org/post"

# Build the query-parameter and header dictionaries
paramsD = {
    "user": "jones",
    "query": "TV?episodes"
}
headerD = {
    "Accept": "application/json"
}

# Build the body of the request as a key-value pair dictionary
body = {"a": 1, "b": 2}

# Issue the request with provided data for the body
response = requests.post(url, params=paramsD, headers=headerD, data=body)
print("Response status:", response.status_code)

# Inspect the resource-path of the URL
request = response.request
print("Request Path:", request.path_url)

# Inspect the body of the request
print("Request Body:", request.body)

Alternatively, we can provide the data for the `POST` request as JSON by passing a Python object through the `json` named parameter, which `requests` serializes for us:

In [None]:
# Specify the URL as a string
url = "http://httpbin.org/post"

# Build the query-parameter and header dictionaries
paramsD = {
    "user": "jones",
    "query": "TV"
}
headerD = {
    "Accept": "application/json"
}

# Build the body of the request as a JSON array
json_data = ["foo", "bar", {"a": 1, "b": 2}]

# Issue the request with provided JSON for the body
response = requests.post(url, params=paramsD, headers=headerD, json=json_data)
print("Response status:", response.status_code)

# Inspect the resource-path of the URL
request = response.request
print("Request Path:", request.path_url)

# Inspect the body of the request, decoded using UTF-8
print("Request Body:", request.body.decode("utf-8"))
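
The two request bodies printed above differ in how the data is encoded. The sketch below approximates both encodings with the standard library, so the contrast is visible without sending a request. It is an illustration of the formats, not the internals of `requests`.

```python
import json
from urllib.parse import urlencode

# Form encoding: key=value pairs joined with '&' (the data= case with a dictionary)
formBody = urlencode({"a": 1, "b": 2})
print(formBody)

# JSON encoding: the Python object serialized as JSON text (the json= case)
jsonBody = json.dumps(["foo", "bar", {"a": 1, "b": 2}])
print(jsonBody)
```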

**Q6:** Write a function
```
    postData(location, resource, dataToPost)
```
that uses your `buildURL` function to build a URL (using `https`), then uses the `requests` module to `POST` `dataToPost` to that URL (note: `dataToPost` will be the *body* of the message you send).

Your function should return the `response` returned by the method of `requests` that you invoke.  Note: the URL `https://httpbin.org/post` is set up to allow you to post there.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

response = postData('httpbin.org','/post',"Wow, what a cool string!")
print(response.status_code)
print(response.request.body)

In [None]:
# Testing cell
response = postData('httpbin.org', '/post', "CS181 is the best")
r = response.request
assert r.method == 'POST'
assert r.body == 'CS181 is the best'
assert r.url == 'https://httpbin.org/post'

---

## Part F

How much time (in minutes/hours) did you spend on this lab outside of class?

YOUR ANSWER HERE