# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [XML](#XML)
		* [expat](#expat)
		* [ElementTree](#ElementTree)
		* [SAX (Simple API for XML)](#SAX-%28Simple-API-for-XML%29)
		* [DOM (Document Object Model)](#DOM-%28Document-Object-Model%29)
	* [Exercise (representing and processing XML)](#Exercise-%28representing-and-processing-XML%29)


# Learning Objectives:

* Work with XML data using several APIs:
  * expat
  * ElementTree
  * SAX (Simple API for XML)
  * DOM (Document Object Model)

## XML

In [None]:
# Some XML files from the HDF5 descriptions info
metadata = "data/Granule_Metadata.xml"
collection = "data/GES_DISC_GPM_3GPROFF16SSMIS_DAY_V03_dif.xml"

### expat

More details at https://docs.python.org/3/library/pyexpat.html#module-xml.parsers.expat

In [None]:
import xml.parsers.expat as expat
indent = 0  # global variable quick-and-dirty

# 3 handler functions
def start_element(name, attrs):
    global indent
    print("  "*indent + 'Start element:', name, attrs)
    indent += 1
def end_element(name):
    global indent
    indent -= 1
    print("  "*indent + 'End element:', name)
def char_data(data):
    global indent
    print("  "*(indent-1) + 'Character data:', repr(data))

p = expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
# Open the file in binary mode and close it once parsing is done
with open(metadata, 'rb') as f:
    p.ParseFile(f)
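
The global `indent` above is quick-and-dirty. One possible alternative, sketched here under the assumption that you would rather keep the state on an object, is to use bound methods as handlers. The class name `IndentingPrinter` and the tiny inline document are hypothetical and only serve the illustration.

```python
import xml.parsers.expat as expat

class IndentingPrinter:
    """Hypothetical helper: keeps the nesting depth on the instance, not in a global."""
    def __init__(self):
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        line = "  " * self.depth + text
        self.lines.append(line)
        print(line)

    def start(self, name, attrs):
        self._emit(f"Start element: {name} {attrs}")
        self.depth += 1

    def end(self, name):
        self.depth -= 1
        self._emit(f"End element: {name}")

printer = IndentingPrinter()
parser = expat.ParserCreate()
parser.StartElementHandler = printer.start
parser.EndElementHandler = printer.end
# Parse a small in-memory document; True marks this as the final chunk
parser.Parse("<a><b x='1'/></a>", True)
```

Each parser gets its own `IndentingPrinter`, so several documents can be processed without resetting shared state.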

### ElementTree

More details at https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

In [None]:
import xml.etree.ElementTree as ET

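def print_element(elem, indent=0):
    print("  "*indent + "Start element:", elem.tag, elem.attrib)
    # elem.text is None when the element has no leading text
    if elem.text:
        print("  "*indent + "Character data:", repr(elem.text))
    for child in elem:
        print_element(child, indent+1)
    print("  "*indent + "End element:", elem.tag)
    # elem.tail is the text that follows this element's end tag,
    # so it belongs to the parent and is reported after "End element"
    if elem.tail:
        print("  "*max(indent-1, 0) + "Character data:", repr(elem.tail))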

tree = ET.parse(metadata)
root = tree.getroot()
print_element(root)

In [None]:
for shortname in root.iter("ShortName"):
    print(shortname.text)
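
Besides `iter`, ElementTree offers `find`, `findall`, and `findtext`, which accept a simple path expression. The following is a minimal sketch on an inline document; apart from `ShortName`, the element names (`Granule`, `Collection`, `Version`) are made up for the illustration and are not taken from the data files above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the real granule metadata
sample = """<Granule>
  <Collection><ShortName>DEMO_A</ShortName><Version>3</Version></Collection>
  <Collection><ShortName>DEMO_B</ShortName></Collection>
</Granule>"""
sample_root = ET.fromstring(sample)

# find() returns the first match for a path relative to the element
first = sample_root.find("Collection/ShortName")

# findall() returns every direct match; findtext() returns the text of the first match
collections = sample_root.findall("Collection")
names = [c.findtext("ShortName") for c in collections]

# findtext() takes a default for elements that are absent
versions = [c.findtext("Version", default="unknown") for c in collections]

print(first.text, names, versions)
```

Unlike `iter`, which walks the whole subtree, `find` and `findall` with a plain tag name only look at direct children unless the path says otherwise.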

### SAX (Simple API for XML)

More details at https://docs.python.org/3/library/xml.sax.html#module-xml.sax

In [None]:
import xml.sax as sax

# Similar event-driven (push) style to expat, slightly higher level:
# the parser calls your handler methods as it encounters each piece of the document.
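
As a rough sketch of what a SAX handler can look like, the following subclasses `xml.sax.ContentHandler` and mirrors the three expat handlers shown earlier. The class name `EventPrinter` and the inline sample document are assumptions made for the illustration; to process the real file you could pass `metadata` to `xml.sax.parse` instead.

```python
import xml.sax

class EventPrinter(xml.sax.ContentHandler):
    """Hypothetical handler: prints and records the events the parser reports."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.started = []   # element names, in document order
        self.chunks = []    # non-whitespace character data

    def startElement(self, name, attrs):
        print("  " * self.depth + "Start element:", name, dict(attrs.items()))
        self.started.append(name)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        print("  " * self.depth + "End element:", name)

    def characters(self, content):
        # The parser may deliver one text node in several chunks
        if content.strip():
            print("  " * (self.depth - 1) + "Character data:", repr(content))
            self.chunks.append(content)

# Hypothetical sample document, not the real granule metadata
sample = b"<Granule id='g1'><ShortName>DEMO</ShortName></Granule>"
handler = EventPrinter()
xml.sax.parseString(sample, handler)
```

Keeping the depth on the handler instance avoids the global variable used in the expat cell.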

### DOM (Document Object Model)

More details at https://docs.python.org/3/library/xml.dom.html#module-xml.dom

Use this only if you need compatibility in programming style with older code, or with code in other programming languages such as Java. For the Pythonic high-level approach, use `ElementTree`.

In [None]:
import xml.dom.minidom as DOM
dom = DOM.parse(metadata)
print(dom.childNodes)
root = dom.documentElement  # the root element, regardless of its position in dom.childNodes
print(root.tagName, root.attributes.items())
# ... etc ...
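
To give a flavor of the "etc." above, here is a minimal sketch of a few more `minidom` calls on an inline document. Apart from `ShortName`, the element and attribute names (`Granule`, `Size`, `unit`) are hypothetical and chosen only for the illustration.

```python
import xml.dom.minidom as DOM

# Hypothetical sample document, not the real granule metadata
sample = "<Granule><ShortName>DEMO</ShortName><Size unit='MB'>12</Size></Granule>"
sample_dom = DOM.parseString(sample)
sample_root = sample_dom.documentElement

# getElementsByTagName searches the whole subtree
for node in sample_dom.getElementsByTagName("ShortName"):
    # Text lives in a child Text node, not on the element itself
    print(node.firstChild.data)

size = sample_dom.getElementsByTagName("Size")[0]
print(size.getAttribute("unit"), size.firstChild.data)
```

Note how text has to be reached through `firstChild.data`; this extra layer of node objects is part of what makes the DOM style more verbose than `ElementTree`.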

## Exercise (representing and processing XML)

Remember the YAML file we looked at from a conda package?  See on your local system:

```
data/graphviz-meta.yaml
```

Many of the conda packages installed on your system have a similar file (called `meta.yaml`, in the package directory). For this exercise, imagine that Continuum Analytics were transported back in time to the early 2000s, and wanted to change the storage of all this package metadata into an XML format.

* Develop the XML dialect to be used to represent the data in this (and similar) YAML files.
  * You may define this dialect purely informally. If you have way too much time, feel free to write a formal definition of the dialect as a DTD (Document Type Definition), a W3C XML Schema, or an ISO RELAX NG schema.
* Write the content of the mentioned data file as XML in the dialect you developed.
* Read the XML you have written out using one of the Python XML parsing libraries discussed.
* Write a utility function `get_requirements(meta, type_='build')` that will pull out a list of requirements for a package (either `build`, `conflicts`, or `run`) from your parsed representation of the XML.
  * If you have time, write a couple of other utility functions that seem useful for working with your format.