Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE`/`raise NotImplementedError` or "YOUR ANSWER HERE", as well as your name and collaborators below:

# Programming and Data Acquisition using HTTP

In [None]:
import os.path

import requests
import json
import io
import pandas as pd

import importlib
import util

In [None]:
importlib.reload(util)
from util import *

A colleague has placed a file `mystery3.dat` on the web server `personal.denison.edu` at the resource path `/~bressoud/datasystems/data`.  You know the data is textual: a tab-separated collection in which each line consists of:

    male_name <tab> male_count <tab> female_name <tab> female_count
    
for the top 10 name applications of each sex to the US Social Security Administration for the year 2015.

Using this common setup, answer the following questions.

**Q1** Unfortunately, your colleague is using an old IBM-based platform, and they did not tell you the encoding of the data.  You know that the encoding is almost certainly one of:

- 'UTF-8'
- 'UTF-16BE'
- 'UTF-16LE'
- 'cp037'
- 'latin_1'

In the code cell that follows, and using variable names for constant strings and the `buildURL()` utility function, fetch the resource and determine the correct encoding of the file.  Use the `.content`, `.encoding`, and `.text` attributes.  As a matter of good practice, you should assert that the status is 200 as a simple means of checking that you were able to retrieve the resource.

By completion of the cell, you should have assigned the type of the content to `content_type` and the correct encoding to `content_enc`, and the variable `content_body` should reference the tab-separated lines of top baby names data.
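
The idea behind trying several encodings can be sketched offline. The example below is a minimal illustration, not the assignment data: `sample_line` is a made-up string that we encode ourselves, and `try_decodings` is a hypothetical helper name. With a `requests` response, the analogous step would be to set `.encoding` to a candidate before reading `.text`.

```python
# Made-up sample line (not the assignment data), encoded by us so the
# example runs without a network connection.
sample_line = "alpha\t1\tbeta\t2\n"
raw = sample_line.encode("UTF-16BE")

candidates = ["UTF-8", "UTF-16BE", "UTF-16LE", "cp037", "latin_1"]

def try_decodings(raw_bytes, encodings):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw_bytes.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

results = try_decodings(raw, candidates)

# Inspect each candidate by eye: only the right codec yields readable text.
for enc, text in results.items():
    print(enc, repr(text))
```

Note that a wrong codec does not always raise an error; it may silently produce unreadable text, so looking at the decoded result matters.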

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert content_enc is not None
assert content_type is not None
assert content_body is not None

**Q2** Using the technique shown in class, construct a file-like object from a **text string**, and use it to create a dictionary of lists representation of the tab-separated data.  Note that there is no header line in the data.  We start you off with the basic dictionary structure.

Use what we did in class, the documentation for the constructors to make file-like objects, and the chapter to figure out how to do this ... it would not be appropriate to ask the Slack channel for the incantations needed to do this.
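
As a reminder of the general pattern, here is a small sketch on made-up two-column data (the names `text`, `buffer`, and `columns` are illustrative, and the data is not from the assignment). It assumes `io.StringIO` is the file-like wrapper in question and that every line has exactly two tab-separated fields.

```python
import io

# Made-up tab-separated text with no header line.
text = "apple\t3\nbanana\t5\n"

# Wrap the string so it can be read like an open text file.
buffer = io.StringIO(text)

columns = {"item": [], "count": []}
for line in buffer:
    # Drop the trailing newline, then split the fields on the tab character.
    item, count = line.rstrip("\n").split("\t")
    columns["item"].append(item)
    columns["count"].append(int(count))

print(columns)
```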

In [None]:
DoL = {'malename': [], 'malecount': [], 'femalename': [], 'femalecount': []}
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert len(DoL['malename']) == 10
assert DoL['malename'][0] == 'Noah'
assert DoL['femalename'][9] == 'Harper'

In [None]:
assert True

**Q3** Finally, let's use `pandas` in our data acquisition.  Again create a file-like object, and pass it to `read_csv()`.  Name your resultant data frame `df`.  Make sure you have reasonable column names.

Use what we did in class, the documentation for the constructors to make file-like objects, pandas documentation for the `read_csv()` function, and the chapter to figure out how to do this ... it would not be appropriate to ask the Slack channel for the incantations needed to do this, nor to post your own code or screenshot.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
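
For orientation, the sketch below shows the `read_csv()` parameters that matter for headerless, tab-separated input, again on made-up data rather than the assignment file. The column names `item` and `count` are placeholders chosen for this example.

```python
import io
import pandas as pd

# Made-up tab-separated text with no header line.
text = "apple\t3\nbanana\t5\n"

frame = pd.read_csv(
    io.StringIO(text),        # file-like object wrapping the string
    sep="\t",                 # fields are separated by tabs, not commas
    header=None,              # the first line is data, not column names
    names=["item", "count"],  # supply the column names ourselves
)

print(frame)
```

Without `header=None`, pandas would treat the first data line as the column names and the frame would be one row short.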

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert len(df) == 10
assert isinstance(df.columns[0],str)
assert df.iloc[0,0] == 'Noah'
assert df.iloc[0,1] == 19635
assert df.iloc[9,2] == 'Harper'
assert df.iloc[9,3] == 10295

The COVID Tracking Project at `https://covidtracking.com` offers a basic API available in the resource subtree rooted at `/api`.

Visit https://covidtracking.com/api for a web-browser friendly information page about the API.

We are particularly interested in the JSON version of the current stats per state, available at resource path `'/api/v1/states/current.json'`, and the historical state data, available at resource path `'/api/v1/states/daily.json'`.

**Q4** Retrieve the current stats for all 50 states and create an in-memory data structure from the JSON-formatted result.  

As before, use variables for constant values and utilize the `buildURL()` utility function.

Assign the resultant data structure to `data`.
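
As a small offline illustration of what "in-memory data structure" means here, the sketch below parses a made-up JSON string (not real API output) with the standard `json` module. A `requests` response also provides a `.json()` method that performs the equivalent parse on the response body.

```python
import json

# Made-up JSON text shaped like a list of per-state records.
payload = '[{"state": "AA", "positive": 10}, {"state": "BB", "positive": 20}]'

# A JSON array becomes a Python list; each JSON object becomes a dict.
records = json.loads(payload)

print(type(records), type(records[0]))
print(records[0]["state"])
```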

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert data
assert data is not None
assert len(data) == 56
assert isinstance(data, list)
assert isinstance(data[0], dict)
assert data[0]['state'] == 'AK'

**Q5** Create a pandas dataframe of the result in which the state is the row index and the columns are `'positive'`, `'negative'`, `'hospitalized'`, and `'death'`.  Assign the result to `state_current`.  Hint: A List of Dictionaries is as simple as a Dictionary of Lists when creating a pandas DataFrame.
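
The hint can be illustrated on made-up records (the field names `region`, `a`, `b`, and `note` are placeholders for this sketch, not fields from the API): a list of dictionaries goes straight into the `DataFrame` constructor, after which one field can become the row index and a subset of columns can be selected.

```python
import pandas as pd

# Made-up list of dictionaries; 'note' is a column we do not want to keep.
records = [
    {"region": "AA", "a": 10, "b": 90, "note": "x"},
    {"region": "BB", "a": 20, "b": 80, "note": "y"},
]

# One row per dictionary, one column per key.
frame = pd.DataFrame(records)

# Use 'region' as the row index and keep only the columns of interest.
frame = frame.set_index("region")[["a", "b"]]

print(frame.loc["BB", "a"])
```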

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert len(state_current)
assert 'OH' in state_current.index
assert state_current.loc['OH', 'positive'] > 4000

**Q6** According to the documentation on the web site:

> If you want to filter the /api/states/daily you can add a query param like ?state=NY to only show cases in New York. Or /api/states/daily?state=NY&date=20200316 to show the result of a specific date. 

This means that our resource path includes both an endpoint for the resource, `/api/states/daily`, and, immediately following it (i.e. with no intervening spaces), a sequence of characters that begins with a `'?'` and is followed by *key*=*value* pairs, which are themselves separated by a `'&'` character.

These are known by the term **query parameters** and are part of the full definition of a resource path.

If we know that we want to specify filters for *both* a state and a specific date, we could use Python string formatting to build a resource path from a template string:

    "/api/states/daily?state={}&date={}"
    
Write a function:

    getCOVID(state, date)
    
that uses this technique to build a resource path and a URL, and then issues the HTTP request.  The result will be JSON formatted, so obtain the in-memory representation and return the resulting dictionary.  If the HTTP request returns something other than status code 200, return `None` from your function.
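
To see how a template with query parameters behaves, here is a sketch on a hypothetical endpoint (`/search`, with made-up keys `topic` and `year`). It formats the template and then uses the standard library's `urllib.parse` to split the result back into its endpoint and its *key*=*value* pairs, which is a convenient way to check that a path was built as intended.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical template: an endpoint followed by two query parameters.
template = "/search?topic={}&year={}"

# format() converts each argument to a string, so an int and a str
# holding the same digits produce the same resource path.
path_from_int = template.format("python", 2020)
path_from_str = template.format("python", "2020")

# Split the path into the endpoint and the query string, then parse the pairs.
parts = urlsplit(path_from_int)
params = parse_qs(parts.query)

print(parts.path)
print(params)
```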

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
getCOVID('OH', 20200407)

In [None]:
result1 = getCOVID('OH', 20200406)
assert isinstance(result1, dict)
result2 = getCOVID('OH', '20200406')
assert result2['positive'] == 4450