Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

In [None]:
import os
import os.path
import io

import pandas as pd
import json
from lxml import etree

import requests

## Acquiring From the Network

In [None]:
protocol = 'https'
host = 'tcbressoud.github.io'
buildURL = lambda resource: "{}://{}/datasystems-bookweb/data/{}".format(protocol, host, resource)

### CSV

Notes:

- `requests` gives two views on the body of a response
    - `response.content`: the raw bytes version of the data
    - `response.text`: the translation of the raw bytes into a sequence of characters.  This uses the **assumed/inferred** encoding, which can be found in `response.encoding`.
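
The difference between the two views can be reproduced without a network request. The sketch below is only an illustration: `sample` is a hypothetical two-line CSV string (not the real `ind2016.csv` data), and plain `bytes`/`str` conversions stand in for `response.content` and `response.text`. It does not show how `requests` is implemented.

```python
# Hypothetical sample data standing in for a response body.
sample = "code,country\nCIV,Côte d'Ivoire\n"

# Analogue of response.content: the raw bytes as they travel over the network.
content = sample.encode("utf-8")

# Analogue of response.text: the same bytes decoded with a chosen encoding.
text = content.decode("utf-8")

print(type(content).__name__, len(content))  # length counts bytes
print(type(text).__name__, len(text))        # length counts characters
```

The two lengths differ because the non-ASCII character in the sample occupies more than one byte in UTF-8, which is why the choice of encoding matters when going from bytes to text.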

In [None]:
csvurl = buildURL("ind2016.csv")
response = requests.get(csvurl)
if response.status_code != 200:
    print("Error acquiring file")

In [None]:
response.encoding

If we look at `response.headers['Content-Type']`, we get `'text/csv; charset=utf-8'`:

In [None]:
response.headers['Content-Type']

In [None]:
print(response.text)

Now we have two (correct) ways of getting at the data, both as binary and as string text.

Our goal is to take the CSV and use it the same way we use file-based CSV data.

The **key** is a pair of object constructors available in the `io` module.  The basic purpose of both is the same: to take an in-memory structure and make it operate the same way that a file does.

- `StringIO`: takes a string buffer and returns an object that operates in the same way as a file object returned from an `open()` call.  Like a file object, this object has a notion of a *current location* that advances as we read (using `read()`, `readline()`, etc.) through the characters of the object.
- `BytesIO`: takes a bytes buffer and returns an object that operates in the same way as a file object returned from an `open()` call and opened in binary mode. Like a file object, this object has a notion of a *current location* that advances as we perform `read()` operations over the bytes of the object.
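
The notion of a *current location* can be seen in a small experiment. This is a minimal sketch using hypothetical sample data (the strings below are made up for illustration); only standard `io` calls are used.

```python
import io

# Hypothetical sample text; any multi-line string would do.
text_obj = io.StringIO("code,pop\nCAN,36.26\nUSA,323.13\n")

header = text_obj.readline()   # reads through the first newline
rest = text_obj.read()         # reads from the current location to the end
at_end = text_obj.read()       # nothing is left, so this is the empty string

text_obj.seek(0)               # rewind to the beginning, as with a file
header_again = text_obj.readline()

# BytesIO behaves the same way, but every read returns bytes.
bytes_obj = io.BytesIO(b"code,pop\nCAN,36.26\n")
first_four = bytes_obj.read(4)     # the first four bytes
next_one = bytes_obj.read(1)       # the byte that follows them
location = bytes_obj.tell()        # number of bytes consumed so far
```

Because the location only moves forward as we read, a file-like object that has been fully consumed must be rewound with `seek(0)` (or re-created) before it can be read again.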

In [None]:
fileLikeObj = io.StringIO(response.text)

headerList = fileLikeObj.readline().strip().split(',')
LoL = []
for line in fileLikeObj:
    rowlist = line.strip().split(',')
    LoL.append(rowlist)

df = pd.DataFrame(LoL, columns=headerList)
df = df.astype({'pop': float, 'gdp': float, 'life': float, 'cell': float})
df
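
Splitting each line on commas assumes that no field itself contains a comma. As a hedged alternative, the standard library `csv` module can read from the same kind of file-like object and handles quoted fields. The sample data below is hypothetical and is not taken from `ind2016.csv`.

```python
import csv
import io

# Hypothetical sample; the quoted country field contains a comma.
sample = 'code,country,pop\nKOR,"Korea, Rep.",51.25\nCAN,Canada,36.26\n'

file_like = io.StringIO(sample)
reader = csv.reader(file_like)

header_list = next(reader)        # the first row is the header
rows = [row for row in reader]    # remaining rows as lists of strings

# For comparison: naive splitting breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(',')

print(header_list)
print(rows[0])
print(naive)
```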
    

In `pandas`, the first argument of the `read_csv()` data frame constructor can be a file object or a file-like object.  So we can create the file-like object from the string version of the response and use that as the first argument, while keeping all the benefits of the parameter options that `read_csv()` provides.

In [None]:
fileLikeObj = io.StringIO(response.text)
df = pd.read_csv(fileLikeObj, index_col='code')
df

Now let us turn to a case where the encoding is UTF-16BE.

In [None]:
csvurl = buildURL("ind2016_16.csv")
#csvurl = buildURL("topnames_16.csv")
response = requests.get(csvurl)
if response.status_code != 200:
    print("Error acquiring file")

To a web server handling an HTTP request, there is little difference between one file and another, and the server does not know how this particular file was encoded.  So we would not expect the assumed encoding to be correct:

In [None]:
response.encoding

If we look at the decoded version through `response.text`, we see a nonsense string, precisely because the decoding was incorrect.

In [None]:
response.text[:20]

We set the encoding to the proper value, given our knowledge of how this particular resource was encoded, and we then see an appropriate `response.text`:

In [None]:
response.encoding = 'UTF-16BE'
print(response.text)
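
The effect of a wrong versus a right encoding can be illustrated on plain bytes. In this sketch the sample text is hypothetical, and the choice of UTF-8 as the incorrect guess is an assumption made for illustration; the actual assumed encoding is whatever `response.encoding` reported above.

```python
# Hypothetical sample standing in for the body of the UTF-16BE resource.
raw = "code,pop\nCAN,36.26\n".encode("utf-16-be")

# Decoding with the wrong encoding does not fail here, but it yields NUL
# characters interleaved with the real ones (assumption: UTF-8 was guessed).
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
right = raw.decode("utf-16-be")

print(repr(wrong[:8]))
print(repr(right[:8]))
```

Note that the bytes in `raw` are identical in both cases; only the interpretation of those bytes changes.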

If `response.encoding` is correct, then `response.text` will be a correct string containing the CSV data.  At this point, we can apply the *same technique*: we use the `response.text` string to create a file-like object, and can then do the same things we did in Chapter 2 and with pandas:

In [None]:
fileLikeObj = io.StringIO(response.text)
df = pd.read_csv(fileLikeObj, index_col='code')
df

The changes in `response.encoding` and the resultant difference in `response.text` did **not** change the underlying bytes data, available in `response.content`.  While it is more complex, particularly with non-standard encodings, to use the bytes data and direct file-type operations to construct a data frame, the pandas `read_csv()` can take its input from a file-like object containing bytes data and can perform the decoding itself.
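
For the "direct file-type operations" route, one possible approach using only the standard library is to wrap the bytes file-like object in `io.TextIOWrapper`, which decodes as it reads. This is a sketch under stated assumptions: the sample bytes are hypothetical stand-ins for `response.content`, and the row parsing simply mirrors the line-splitting approach used earlier.

```python
import io

# Hypothetical sample standing in for response.content of a UTF-16BE resource.
raw = "code,pop\nCAN,36.26\nUSA,323.13\n".encode("utf-16-be")

bytes_file = io.BytesIO(raw)

# TextIOWrapper decodes the underlying bytes on the fly, so reads yield str.
text_file = io.TextIOWrapper(bytes_file, encoding="utf-16-be")

header_list = text_file.readline().strip().split(',')
rows = [line.strip().split(',') for line in text_file]

print(header_list)
print(rows)
```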

To demonstrate this across our two different encodings, we GET both the UTF-8 encoded CSV file and the UTF-16BE encoded CSV file, and use different response objects for the two results:

In [None]:
csvurl1 = buildURL("ind2016.csv")
response1 = requests.get(csvurl1)
if response1.status_code != 200:
    print("Error acquiring file")
    
csvurl2 = buildURL("ind2016_16.csv")
response2 = requests.get(csvurl2)
if response2.status_code != 200:
    print("Error acquiring file")

When we are dealing with the underlying bytes data, and we want/need a file-like object, we use `io.BytesIO()` to construct the file-like object from the bytes in `response1.content` and `response2.content`.  We then pass the file-like objects to `read_csv()` and specify the proper encoding:

In [None]:
fileLikeObj1 = io.BytesIO(response1.content)
fileLikeObj2 = io.BytesIO(response2.content)
df1 = pd.read_csv(fileLikeObj1, encoding='UTF-8', index_col='code')
df2 = pd.read_csv(fileLikeObj2, encoding='UTF-16BE', index_col='code')

In [None]:
df1

In [None]:
df2

## JSON

For all of the following examples, we obtain files from the web server with JSON as the body data.  In `response1`, we have UTF-8 encoded data; in `response2`, we have UTF-16BE encoded data.

In [None]:
json_url1 = buildURL("ind0.json")
response1 = requests.get(json_url1)
if response1.status_code != 200:
    print("Error acquiring file")
    
json_url2 = buildURL("ind0_16.json")
response2 = requests.get(json_url2)
if response2.status_code != 200:
    print("Error acquiring file")

### JSON From String in Response

As in the examples above, when we want to use the `.text` (string) version of the response, we **have** to get the encoding right.  We do this for both `response1` and `response2`, at which point the character string version of the two responses is valid, and we can use a variety of techniques to go from a string into a JSON-based data structure.  Since the later steps are the same once we get the encoding right, we just run through examples using `response1.text`.

In [None]:
response1.encoding = 'UTF-8'
response1.text

In [None]:
response2.encoding = 'UTF-16BE'
response2.text

#### Option 1: Use `json.loads()`, which takes a string and returns the in-memory data structure:

In [None]:
ds1 = json.loads(response1.text)
ds1

#### Option 2: Create a file-like object, and then use `json.load()`

In [None]:
fileLikeObj1 = io.StringIO(response1.text)
ds1 = json.load(fileLikeObj1)
ds1

#### Option 3: Use the `.json()` method of a `requests` response object

In [None]:
ds1 = response1.json()
ds1

In [None]:
ds1 = json.loads(response1.content)
ds1

### JSON From Bytes Data in Response Body

Because its alternate encoding results in a different set of bytes for the same sequence of characters, we use the bytes data of `response2` in our examples demonstrating the conversion of bytes data into a JSON-derived data structure.

It turns out that the RFC standard for JSON has explicitly allowed all three of UTF-8, UTF-16, and UTF-32 as encodings for data formatted as JSON.  This means that the `json` module will accept the bytes data directly and determine the encoding itself, as if it had been given an already decoded string.

#### Option 1: Use `json.loads()`, which takes bytes data in UTF-8, UTF-16, or UTF-32 and returns the in-memory data structure:

In [None]:
ds2 = json.loads(response2.content)
ds2

#### Option 2: Create a bytes file-like object, and then use `json.load()`

In [None]:
fileLikeObj2 = io.BytesIO(response2.content)
ds2 = json.load(fileLikeObj2)
ds2
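
Both options can also be tried without a network request. The sketch below uses a hypothetical record (not the contents of `ind0.json` or `ind0_16.json`) encoded in two ways, and relies only on the standard `json` and `io` modules.

```python
import io
import json

# Hypothetical record, made up for illustration.
record = {"code": "CAN", "pop": 36.26}
as_text = json.dumps(record)

utf8_bytes = as_text.encode("utf-8")
utf16_bytes = as_text.encode("utf-16-be")

# json.loads() accepts bytes and works out the encoding itself.
from_utf8 = json.loads(utf8_bytes)
from_utf16 = json.loads(utf16_bytes)

# json.load() reads from a file-like object, here one that yields bytes.
from_file = json.load(io.BytesIO(utf16_bytes))

print(from_utf8 == from_utf16 == from_file == record)
```

The two byte strings differ in length and content, yet all three calls produce the same in-memory structure.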