Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

In [None]:
import pandas as pd
from lxml import etree
import requests
import os.path
import io

datadir = "public_data"
filename = "ind0.xml"                 # Text file encoded as UTF-8
path = os.path.join(datadir, filename)

protocol = "http"
location = "personal.denison.edu"
resourcepath = "/~bressoud/datasystems/data/{}"

buildURL = lambda s: "{}://{}{}".format(protocol, location, resourcepath.format(s))

def print_tree(node, pretty_print=True, encoding='utf-8'):
    result = etree.tostring(node, pretty_print=pretty_print)
    if isinstance(result, bytes):
        result = result.decode(encoding)
    print(result)

## Building Tree From Existing XML

### Simple Local File

In [None]:
tree = etree.parse(path)
root = tree.getroot()

### Custom Parser Local File

In [None]:
myparser = etree.XMLParser(remove_blank_text=True)

tree = etree.parse(path, myparser)
root = tree.getroot()

### Network Request

In [None]:
response = requests.get(buildURL(filename))
assert response.status_code == 200

fileObj = io.BytesIO(response.content)
tree = etree.parse(fileObj, myparser)   # parse the downloaded bytes, not the local file
root = tree.getroot()
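
The network cell relies on the fact that `etree.parse()` accepts a file-like object as well as a path. As a minimal sketch of that idea without any network access, the example below wraps a hypothetical byte string (standing in for `response.content`) in `io.BytesIO` and parses it. The payload and the fallback import are assumptions made for illustration only.

```python
import io

try:
    from lxml import etree                      # the library used in this notebook
except ImportError:                             # assumption: fall back to the standard library
    import xml.etree.ElementTree as etree

# Hypothetical payload standing in for response.content
payload = b"<indicators><country code='CAN' name='Canada'/></indicators>"

file_obj = io.BytesIO(payload)                  # wrap the bytes in a file-like object
tree = etree.parse(file_obj)                    # parse() accepts a path or a file-like object
root = tree.getroot()

print(root.tag, len(root))
```

Because the bytes never touch the disk, the same pattern works for any in-memory XML content.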

## Basic Operations

As an aid for working with Element nodes, we summarize some of the fundamental operations:

Operation     |  Syntax Hint  |Brief Description
:-------------|:--------------|:-----------------------------------------
Get a Child   | `[index]`     |Access the node's child at index
Get tag       | `.tag`        |Obtain tag of node
Get text      | `.text`       |Obtain text of node up to child node or end tag
Access all attributes | `.attrib` | Obtain dictionary of all of node's xml attributes
Access one attribute | `.get()` | Fetch value for specified attribute, or `None` if not present
Find child node | `.find()` | Search for first child matching search specification (by tag)
Iterator child search | `.iterfind()` | Iterator for all children matching search specification (by tag)
Unconditional Child Iteration | *node* | A node itself can be used as an iterator to obtain all children in document order
Count children | `len(`*node*`)` | Find the number of children of a node
Iterator over descendants | `.iter()` | Iterator over the node itself and all of its descendants, in document order
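
To see the operations in the table side by side, here is a short sketch on a small, made-up document (it is not the contents of `ind0.xml`, and the tag and attribute names are hypothetical). The fallback import is an assumption so that the sketch also runs where `lxml` is unavailable.

```python
try:
    from lxml import etree
except ImportError:                             # assumption: standard-library fallback
    import xml.etree.ElementTree as etree

# Hypothetical document used only for illustration
xml_text = (
    "<library city='Granville'>"
    "<book id='b1'><title>Dune</title><pages>412</pages></book>"
    "<book id='b2'><title>Emma</title><pages>474</pages></book>"
    "<dvd id='d1'><title>Alien</title></dvd>"
    "</library>"
)
root = etree.fromstring(xml_text)

first = root[0]                                  # get a child by index
print(first.tag)                                 # tag of the node
print(first.find("title").text)                  # first matching child, then its text
print(dict(first.attrib))                        # all attributes as a dictionary
print(root.get("city"), root.get("zip"))         # one attribute; None when absent
print(len(root))                                 # number of children
print([b.get("id") for b in root.iterfind("book")])   # children matching a tag
print([child.tag for child in root])             # all children, in document order
print([n.tag for n in root.iter()])              # the node itself, then its descendants

pages = int(first.find("pages").text)            # .text is a string; convert before arithmetic
print(pages + 1)
```

Note that `.text` always yields a string (or `None`), so any numeric work needs an explicit conversion first.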


**Q** Print the full tree.  You can use the provided `print_tree()` function.  Try it with different arguments for the named parameters.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Get the index 2 child of the root, assign it to `node` and print the tree rooted at `node`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Get the index 1 node of the index 2 node of the root, assign it to `node`, and then find the `gdp`-tagged child of `node`, and assign it to `gdp_node`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Repeat the above, then convert the text of `gdp_node` to a number and assign it to `gdp_value`.  Assign to `value` a number 10% greater than `gdp_value`, and print this final value.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Iterate over the `country` nodes and check each for a case where the letter `'A'` appears in the node's `code` XML attribute.  If found, print the value of the `name` attribute.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Use nested loops to accumulate a list of **just** the `timedata` nodes for the year "2017".  

Hints:

- Initialize an empty list
- Outer loop will iterate over root's `country` nodes
- Inner loop will iterate over each country node's `timedata` nodes
- For each of these, check whether the `timedata` node's attribute holding the year is equal to the string `"2017"`.  If it is, append the node to the list.
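
The accumulation pattern in these hints can be sketched on a different, made-up document. Everything in the example (the `schools`, `school`, and `term` tags and the `season` attribute) is hypothetical and chosen only to show the shape of the nested loops.

```python
try:
    from lxml import etree
except ImportError:                             # assumption: standard-library fallback
    import xml.etree.ElementTree as etree

# Hypothetical document used only for illustration
xml_text = (
    "<schools>"
    "<school name='North'>"
    "<term season='fall'><enrolled>310</enrolled></term>"
    "<term season='spring'><enrolled>295</enrolled></term>"
    "</school>"
    "<school name='South'>"
    "<term season='fall'><enrolled>120</enrolled></term>"
    "<term season='spring'><enrolled>118</enrolled></term>"
    "</school>"
    "</schools>"
)
root = etree.fromstring(xml_text)

fall_terms = []                                  # start with an empty list
for school in root.iterfind("school"):           # outer loop: each school node
    for term in school.iterfind("term"):         # inner loop: that school's term nodes
        if term.get("season") == "fall":         # attribute values are strings
            fall_terms.append(term)              # accumulate the matching node itself

print(len(fall_terms))
print([t.find("enrolled").text for t in fall_terms])
```

The list holds the Element nodes themselves, so each one can still be printed or searched afterwards.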

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
for node in timelist:
    print_tree(node)

**Bonus Q** Write a function

    recursive_printtags(node)
    
that prints the tag of the given node, and then recurses to print the tags of the subtree rooted at each of its children.
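
As a reminder of how recursion over a tree is shaped, here is a sketch of a related but different task: counting the nodes in a subtree. The function name `count_nodes` and the tiny document are hypothetical.

```python
try:
    from lxml import etree
except ImportError:                             # assumption: standard-library fallback
    import xml.etree.ElementTree as etree

def count_nodes(node):
    """Return the number of Element nodes in the subtree rooted at node."""
    total = 1                                    # count the node itself
    for child in node:                           # a node iterates over its children
        total += count_nodes(child)              # recurse into each child's subtree
    return total

# Hypothetical document used only for illustration
root = etree.fromstring("<a><b><c/><d/></b><e/></a>")
print(count_nodes(root))
```

A node with no children never enters the loop, which serves as the base case of the recursion.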

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
recursive_printtags(root)

## From XML to Build a Data Frame

Often, we obtain XML-formatted data, but for manipulation, transformation, and analysis, we need to construct one or more tabular data frames.  This gives us a practical use for, and practice with, the procedural operations learned today.

In particular, we want to take an XML-based data set of our topnames information and construct a data frame with columns `year`, `sex`, `name`, and `count`.

**Q** Using one of the techniques at the beginning of this notebook, retrieve (from local file or from the network), the xml tree in the resource file `"topnames.xml"`, and assign to `root` the Element at the root of that tree.  Finish by printing the number of children of the root node.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** Using nested loops, with the outer loop iterating over the children of the topnames root, and the inner iterating over those children's children, print out, inside the inner loop, the value of the year (from the outer node's xml-attribute) and the value of the sex (from the inner node's xml-attribute).  A prefix of the resultant output:
```
1880 Female
1880 Male
1881 Female
1881 Male
1882 Female
```

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

**Q** We saw, from the book, that it is convenient in these cases to collect our row data in a list, and for each element in the list to be a **dictionary**, in which the **keys** are the names of the columns/fields, and the **values** contain the value, for that row, of the given field.  Without yet worrying about the `name` and `count` columns, let us build such a List of Dictionaries (LoD) for the year and sex combinations.  A prefix of the LoD would look like the following:

```
[
  {'year': 1880, 'sex': 'Female'}, 
  {'year': 1880, 'sex': 'Male'}, 
  {'year': 1881, 'sex': 'Female'}, 
  {'year': 1881, 'sex': 'Male'}, 
  {'year': 1882, 'sex': 'Female'}, 
  {'year': 1882, 'sex': 'Male'},
  ...
]
```
Augment your code from the last question to build the List of Dictionaries, using the typical accumulation pattern, starting with an empty list named `LoD` and replacing your `print()` with the creation and appending of the dictionary needed for each row.
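
The same accumulation pattern can be sketched on a made-up document in which each row combines an attribute of the outer node with an attribute of the inner node. The tags, attributes, and the list name `rows` are hypothetical. Note the explicit `int()` conversion: attribute values come back as strings.

```python
try:
    from lxml import etree
except ImportError:                             # assumption: standard-library fallback
    import xml.etree.ElementTree as etree

# Hypothetical document used only for illustration
xml_text = (
    "<building>"
    "<floor number='1'><room kind='lab'/><room kind='office'/></floor>"
    "<floor number='2'><room kind='lab'/></floor>"
    "</building>"
)
root = etree.fromstring(xml_text)

rows = []                                        # one dictionary per eventual table row
for floor in root:                               # outer loop: children of the root
    for room in floor:                           # inner loop: children of each child
        row = {
            "floor": int(floor.get("number")),   # attribute strings need conversion
            "kind": room.get("kind"),
        }
        rows.append(row)

print(rows)
```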

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
assert len(LoD) == 278
assert isinstance(LoD, list)
assert isinstance(LoD[0], dict)
assert LoD[0]['year'] == 1880
assert LoD[0]['sex'] == 'Female'

**Q** Finish your build of the LoD by including in each dictionary the value (based on the `.text` attribute) of the `name` child and the `count` child of the `sex` node from the inner loop.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
assert len(LoD) == 278
assert isinstance(LoD, list)
assert isinstance(LoD[0], dict)
assert LoD[0]['year'] == 1880
assert LoD[0]['sex'] == 'Female'
assert LoD[0]['name'] == 'Mary'
assert LoD[0]['count'] == 7065

**Q** For the final step, use pandas to construct a data frame from the list of dictionaries; set the index of this data frame to the independent variables of `year` and `sex`, and display the head() of the resultant data frame.
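
As a hedged sketch of this last step on made-up data, a list of dictionaries can be passed directly to the `pd.DataFrame` constructor, and `set_index()` with a list of column names produces a two-level index. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical list of dictionaries, one per row
rows = [
    {"floor": 1, "kind": "lab", "seats": 24},
    {"floor": 1, "kind": "office", "seats": 4},
    {"floor": 2, "kind": "lab", "seats": 30},
]

df = pd.DataFrame(rows)                          # keys become column names
df = df.set_index(["floor", "kind"])             # two columns become a two-level index
print(df.head())
```

After the index is set, only the dependent column (`seats` in this sketch) remains as a regular column.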

In [None]:
# YOUR CODE HERE
raise NotImplementedError()