# Chapter 6  
# Data Encoding and Processing

The main focus of this chapter is using Python to process data presented in different kinds of common encodings, such as CSV files, JSON, XML, and binary packed records.  
Unlike the chapter on data structures, this chapter is not focused on specific algorithms, but instead on the problem of getting data in and out of a program.

## 6.1 Reading and Writing CSV Data

If you want to read or write data encoded as a CSV file, you can use Python's `csv` library.  
We will use some stock market data from a CSV file for this example.

You can read the data as a sequence of lists:

In [117]:
import csv

with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Process row
        # ... and so forth
        pass

In the preceding code, `row` will be a list of strings.  
Thus, to access certain fields, you will need to use indexing, such as `row[0]` (Symbol) and `row[4]` (Change).  
Since such indexing can often be confusing, this is one place where you might want to consider the use of named tuples.

In [118]:
from collections import namedtuple
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headings = next(f_csv)
    Row = namedtuple('Row', headings)
    for r in f_csv:
        row = Row(*r)
        # Process row
        # ... and so forth
        pass

This would allow you to use the column headers such as `row.Symbol` and `row.Change` instead of indices.  
It should be noted that this only works if the column headers are valid Python identifiers.  
If not, you might have to massage the initial headings (e.g., replacing nonidentifier characters with underscores or similar).  
Another approach allows you to read the data as a sequence of dictionaries instead.

In [119]:
import csv

with open('stocks.csv') as f:
    f_csv = csv.DictReader(f)
    for row in f_csv:
        # Do something ...
        pass

In this version, you would access the elements of each row using the row headers.  
For example, `row['Symbol']` or `row['Change']`.  
To write CSV data, you also use the `csv` module, but you create a writer object.

In [120]:
headers = ['Symbol','Price','Date','Time','Change','Volume']
rows = [('AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800),
        ('AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500),
        ('AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000),]

In [121]:
# newline='' lets the csv module control line endings, as the csv docs recommend
with open('stocks.csv', 'w', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(headers)
    f_csv.writerows(rows)

If you have the data as a sequence of dictionaries, like so:

In [122]:
headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
        {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
        {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.46, 'Volume': 935000},]

In [123]:
with open('stocks.csv', 'w', newline='') as f:
    f_csv = csv.DictWriter(f, headers)
    f_csv.writeheader()
    f_csv.writerows(rows)
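A quick way to see what the writer produces, without touching the file system, is to write into an `io.StringIO` buffer. The sketch below uses a trimmed version of the rows above. It shows that the default (Excel) dialect ends each row with `'\r\n'`, which is why the `csv` documentation recommends opening real files with `newline=''`, and that reading the text back returns every field as a string.

```python
import csv
import io

headers = ['Symbol', 'Price', 'Volume']
rows = [{'Symbol': 'AA', 'Price': 39.48, 'Volume': 181800},
        {'Symbol': 'AIG', 'Price': 71.38, 'Volume': 195500}]

# Write into an in-memory text buffer instead of a file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=headers)
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()
print(repr(text))
# 'Symbol,Price,Volume\r\nAA,39.48,181800\r\nAIG,71.38,195500\r\n'

# Reading it back: the numbers come back as strings, not floats or ints
back = list(csv.DictReader(io.StringIO(text)))
print(back[0]['Price'], type(back[0]['Price']).__name__)   # 39.48 str
```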

### 6.1 Discussion

Using Python's `csv` module can save you quite a bit of time over parsing, splitting, and cleaning the data yourself.  
For comparison, here is the manual approach:

In [124]:
with open('stocks.csv') as f:
    for line in f:
        row = line.split(',')
        # Do something ...
        pass

The problem with this approach is that you’ll still need to deal with some nasty details.  
For example, if any of the fields are surrounded by quotes, you’ll have to strip the quotes.  
In addition, if a quoted field happens to contain a comma, the code will break by producing a row with the wrong size.  
By default, the `csv` library is programmed to understand CSV encoding rules used by Microsoft Excel.  
This is probably the most common variant, and will likely give you the best compatibility.  
However, if you consult the documentation for `csv`, you’ll see a few ways to tweak the encoding to different formats (e.g., changing the separator character).  
For example, if you want to read tab-delimited data instead, use this:

In [125]:
with open('stocks.csv') as f:
    f_tsv = csv.reader(f, delimiter='\t')
    for row in f_tsv:
        # Do something ...
        pass
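To make the quoted-comma problem described above concrete, here is a small sketch comparing `str.split()` with `csv.reader` on a single made-up record (the record is hypothetical, and `io.StringIO` simply stands in for a file):

```python
import csv
import io

# Hypothetical record: the second field is quoted because it contains a comma
line = '"AA","Example, Inc.",39.48\n'

# Naive splitting keeps the quotes and breaks the quoted field in two
naive = line.rstrip('\n').split(',')
print(naive)    # ['"AA"', '"Example', ' Inc."', '39.48']

# csv.reader applies the quoting rules and returns the three intended fields
proper = next(csv.reader(io.StringIO(line)))
print(proper)   # ['AA', 'Example, Inc.', '39.48']
```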

If you're reading CSV data and converting it into named tuples, use caution when validating column headers.  
For example, a CSV file could have a header line containing invalid identifier characters like this:

`Street Address,Num-Premises,Latitude,Longitude`  
`5412 N CLARK,10,41.980262,-87.668452`  

This will actually cause the creation of a `namedtuple` to fail with a `ValueError` exception.  
To work around this, you might have to scrub the headers first.  
For instance, you could perform a regex substitution on invalid identifier characters like this:

In [126]:
import re

with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = [ re.sub('[^a-zA-Z_]', '_', h) for h in next(f_csv) ]
    Row = namedtuple('Row', headers)
    for r in f_csv:
        row = Row(*r)
        # do something
        pass
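The header line shown earlier can be checked without a file. This sketch feeds those two lines through `io.StringIO`, shows that `namedtuple` rejects the raw headers with a `ValueError`, and then applies the same regular-expression scrub as above:

```python
import csv
import io
import re
from collections import namedtuple

# The header and data row from the example above, read from memory instead of a file
data = io.StringIO('Street Address,Num-Premises,Latitude,Longitude\n'
                   '5412 N CLARK,10,41.980262,-87.668452\n')
f_csv = csv.reader(data)
raw_headers = next(f_csv)

# Using the raw headers fails: 'Street Address' is not a valid identifier
try:
    namedtuple('Row', raw_headers)
except ValueError as e:
    print('namedtuple rejected the headers:', e)

# Scrub: every character other than a letter or underscore becomes '_'
headers = [re.sub('[^a-zA-Z_]', '_', h) for h in raw_headers]
Row = namedtuple('Row', headers)
row = Row(*next(f_csv))
print(row.Street_Address, row.Num_Premises)
```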

It's important to note that `csv` does not try to interpret the data or convert it to a type other than a string.  
The following example performs extra type conversions on CSV data:

In [127]:
col_types = [str, float, str, str, float, int]
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Apply conversions to the row items
        row = tuple(convert(value) for convert, value in zip(col_types, row))
        # And so forth ...
        pass

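You can also convert selected fields of dictionaries. (The `OrderedDict` output below was produced by an older Python version; starting with Python 3.8, `DictReader` returns regular `dict` objects.)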

In [128]:
print('Reading as dicts with type conversion')
field_types = [ ('Price', float),
                ('Change', float),
                ('Volume', int) ]

with open('stocks.csv') as f:
    for row in csv.DictReader(f):
        row.update((key, conversion(row[key])) for key, conversion in field_types)
        print(row)

Reading as dicts with type conversion
OrderedDict([('Symbol', 'AA'), ('Price', 39.48), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.18), ('Volume', 181800)])
OrderedDict([('Symbol', 'AIG'), ('Price', 71.38), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.15), ('Volume', 195500)])
OrderedDict([('Symbol', 'AXP'), ('Price', 62.58), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.46), ('Volume', 935000)])


In general, you’ll probably want to be a bit careful with such conversions, though.  
In the real world, it’s common for CSV files to have missing values, corrupted data, and other issues that would break type conversions.  
So, unless your data is guaranteed to be error free, that’s something you’ll need to consider (you might need to add suitable exception handling).  
Finally, if your goal in reading CSV data is to perform data analysis and statistics, you might want to look at the `pandas` package.  
`pandas` includes a convenient `pandas.read_csv()` function that will load CSV data into a `DataFrame` object.  
From there, you can generate various summary statistics, filter the data, and perform other kinds of high-level operations.
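As one possible way to handle the bad values mentioned above, the following sketch wraps each conversion in a `try`/`except ValueError` and substitutes `None` for anything that fails. The helper name `convert_row`, the `None` policy, and the corrupted second row are assumptions made for illustration; in practice you might prefer to log, skip, or reject such rows.

```python
import csv
import io

def convert_row(row, col_types):
    """Convert each field with its matching type.

    Hypothetical helper: fields that fail to convert become None.
    Note that zip() stops at the shorter of row and col_types.
    """
    out = []
    for convert, value in zip(col_types, row):
        try:
            out.append(convert(value))
        except ValueError:
            out.append(None)   # assumption: record bad values as missing
    return tuple(out)

col_types = [str, float, str, str, float, int]
data = io.StringIO(
    'Symbol,Price,Date,Time,Change,Volume\n'
    'AA,39.48,6/11/2007,9:36am,-0.18,181800\n'
    'AIG,N/A,6/11/2007,9:36am,-0.15,\n'   # made-up row: bad price, empty volume
)
f_csv = csv.reader(data)
headers = next(f_csv)
rows = [convert_row(row, col_types) for row in f_csv]
for row in rows:
    print(row)
```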

## 6.2 Reading and Writing JSON Data

### Problem  
You want to read or write data encoded as JavaScript Object Notation (JSON).

### Solution  
The `json` module provides an easy way to encode and decode data in JSON.  
The two main functions are `json.dumps()` and `json.loads()`, mirroring the interface used in other serialization libraries, such as `pickle`.  
Here is how you turn a Python data structure into JSON:

In [129]:
import json

data = {
    'name' : 'ACME',
    'shares' : 100,
    'price': 542.23
}

json_str = json.dumps(data)
json_str

'{"name": "ACME", "shares": 100, "price": 542.23}'

In [130]:
type(json_str)

str

Now we can turn the JSON-encoded string back into a Python data structure:

In [131]:
data = json.loads(json_str); data

{'name': 'ACME', 'shares': 100, 'price': 542.23}

In [132]:
type(data)

dict

If you are working with files instead of strings, you can also use `json.dump()` and `json.load()` to encode and decode JSON data.

In [133]:
# Write the data
with open('data.json', 'w') as f:
    json.dump(data, f)
    
# Read data back
with open('data.json', 'r') as f:
    data = json.load(f)
    
data

{'name': 'ACME', 'shares': 100, 'price': 542.23}

### Discussion  
JSON encoding supports the basic types `None`, `bool`, `int`, `float`, and `str`, as well as lists, tuples, and dictionaries containing those types.  
For dictionaries, keys are assumed to be strings (any non-string keys in a dictionary are converted to strings during encoding).  
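The original JSON specification (RFC 4627) required the top-level value to be an object or an array, so for maximum compatibility you should only encode Python lists and dictionaries at the top level; the current specification (RFC 8259) relaxes this rule.  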
Note that in web applications, it is also conventional for the top-level object to be a dictionary.  
The format of JSON encoding is almost identical to Python syntax except for a few minor changes.  
For instance, `True` is mapped to `true`, `False` is mapped to `false`, and `None` is mapped to `null`.
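The sketch below shows a round trip through JSON with several of these rules in play: the integer key becomes the string `"1"`, the tuple comes back as a list, and `True`/`None` become `true`/`null`. The dictionary itself is made up for illustration.

```python
import json

original = {1: 'one', 'pair': (1, 2), 'flag': True, 'missing': None}

encoded = json.dumps(original)
print(encoded)   # {"1": "one", "pair": [1, 2], "flag": true, "missing": null}

decoded = json.loads(encoded)
print(decoded)   # {'1': 'one', 'pair': [1, 2], 'flag': True, 'missing': None}
print(decoded == original)   # False: the key type and the tuple did not survive
```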

In [134]:
json.dumps(False)

'false'

In [135]:
d = {
    'a' : True,
    'b' : 'Hello',
    'c': None
}

json.dumps(d)

'{"a": true, "b": "Hello", "c": null}'

If you are trying to examine data you have decoded from JSON, it can often be hard to ascertain its structure simply by printing it out, especially if the data contains a deep level of nested structures or a lot of fields.  
To assist with this, consider using the `pprint()` function in the `pprint` module.  
This will sort the dictionary keys alphabetically and spread nested structures over several lines, which makes the output much easier to read.  
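Here is a minimal sketch of that, using a made-up nested document in place of real decoded JSON. A small `width` forces wrapping so the effect is visible; `pformat()` returns the same text as a string instead of printing it.

```python
import json
from pprint import pformat, pprint

# Made-up nested document standing in for real decoded JSON
data = json.loads('{"shares": 100, "name": "ACME", '
                  '"history": [{"price": 542.23, "date": "2019-02-14"}, '
                  '{"price": 540.1, "date": "2019-02-15"}]}')

print(data)                      # one long line, keys in document order
pprint(data, width=40)           # keys sorted, nested parts on separate lines
text = pformat(data, width=40)   # the same formatted output, as a string
```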

Normally, JSON decoding will create dicts or lists from the supplied data.  
If you want to create different kinds of objects, supply the `object_pairs_hook` or `object_hook` to `json.loads()`.  
Here is one way you can decode JSON data while preserving its order in an `OrderedDict`:

In [136]:
s = '{"name": "ACME", "shares": 50, "price": 490.1}'

from collections import OrderedDict

data = json.loads(s, object_pairs_hook=OrderedDict); data

OrderedDict([('name', 'ACME'), ('shares', 50), ('price', 490.1)])

You can also turn a JSON dictionary into a Python object:

In [137]:
class JSONObject:
    def __init__(self, d):
        self.__dict__ = d
        
        
data = json.loads(s, object_hook=JSONObject)
data.name, data.shares, data.price

('ACME', 50, 490.1)

In this last example, the dictionary created by decoding the JSON data is passed as a single argument to `__init__()`.  
From there, you can use it directly as the instance dictionary of the object.

There are a few options that can be useful for encoding JSON.  
If you would like the output to be nicely formatted, you can use the `indent` argument to `json.dumps()`.  
This causes the output to be pretty printed in a format similar to that with the `pprint()` function.  

In [138]:
with open('data.json', 'r') as f:
    data = json.load(f)
    
print(json.dumps(data))
print(json.dumps(data, indent=4))

{"name": "ACME", "shares": 100, "price": 542.23}
{
    "name": "ACME",
    "shares": 100,
    "price": 542.23
}


You can use the `sort_keys` argument to sort the keys alphabetically on output:

In [139]:
print(json.dumps(data, sort_keys=True))

{"name": "ACME", "price": 542.23, "shares": 100}


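Instances are not normally serializable as JSON.  
For example, with the `Point` class below, calling `json.dumps(p)` raises a `TypeError` because the encoder does not know how to convert a `Point` object: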

In [140]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
p = Point(2, 3)

If you want to serialize instances, you can supply a function that takes an instance as input and returns a dictionary that can be serialized.

In [141]:
def serialize_instance(obj):
    d = { '__classname__' : type(obj).__name__ }
    d.update(vars(obj))
    return d

If you want to get an instance back, you could do this:

In [142]:
# Dictionary mapping names to known classes
classes = { 'Point' : Point }

def unserialize_object(d):
    clsname = d.pop('__classname__', None)
    if clsname:
        cls = classes[clsname]
        obj = cls.__new__(cls)  # Creates an instance without calling the __init__() method
        for key, value in d.items():
            setattr(obj, key, value)
        # Return only after every attribute has been restored
        return obj
    else:
        return d

In [143]:
p = Point(2,3)
s = json.dumps(p, default=serialize_instance); s

'{"__classname__": "Point", "x": 2, "y": 3}'

In [144]:
a = json.loads(s, object_hook=unserialize_object); a

<__main__.Point at 0x10a557198>

In [145]:
a.x

2

The `json` module has a variety of other options for controlling the low-level interpretation of numbers, special values such as `NaN`, and more.  
[The JavaScript Object Notation (JSON) Data Interchange Format](https://tools.ietf.org/html/rfc8259)  
[`json` — JSON encoder and decoder](https://docs.python.org/3.7/library/json.html)
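As a small taste of those options, the sketch below uses two documented keyword arguments: `parse_float` on `json.loads()`, to decode numbers with a fraction as `decimal.Decimal` rather than `float`, and `allow_nan` on `json.dumps()`, to control whether `NaN` (which is not valid JSON) may be emitted. The input values are made up.

```python
import json
from decimal import Decimal

# parse_float is called with the text of every JSON number that has a fraction
data = json.loads('{"name": "ACME", "price": 542.23}', parse_float=Decimal)
print(repr(data['price']))    # Decimal('542.23')

# By default the encoder writes NaN, which many other JSON parsers reject
print(json.dumps({'value': float('nan')}))   # {"value": NaN}

# allow_nan=False turns NaN and infinities into an error instead
try:
    json.dumps({'value': float('nan')}, allow_nan=False)
except ValueError as exc:
    print('rejected:', exc)
```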

## 6.3 Parsing Simple XML Data

The `xml.etree.ElementTree` module can be used to extract data from simple XML documents.  
To illustrate, suppose you want to parse and make a summary of the RSS feed on [Planet Python](https://planetpython.org/).  
The following code will do that.

In [146]:
from urllib.request import urlopen
from xml.etree.ElementTree import parse

# Download the RSS feed and parse it:
u = urlopen('https://planet.python.org/rss20.xml')
doc = parse(u); doc

<xml.etree.ElementTree.ElementTree at 0x10a562240>

Now we can extract and output the tags that interest us:

In [147]:
for item in doc.iterfind('channel/item'):
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')
    print(title)
    print(date)
    print(link)
    print()

Vasudev Ram: pprint.isrecursive: Check if object requires recursive representation
Sat, 16 Feb 2019 01:47:54 +0000
http://jugad2.blogspot.com/2019/02/pprintisrecursive-check-if-object.html

PyCharm: PyCharm 2019.1 EAP 4
Fri, 15 Feb 2019 10:54:11 +0000
http://feedproxy.google.com/~r/Pycharm/~3/FEIM-6tduy8/

Continuum Analytics Blog: Intake released on Conda-Forge
Thu, 14 Feb 2019 21:26:28 +0000
https://www.anaconda.com/intake-released-on-conda-forge/

PyCon: Eighth Annual PyLadies Auction at PyCon 2019
Thu, 14 Feb 2019 18:43:02 +0000
https://pycon.blogspot.com/2019/01/eighth-annual-pyladies-auction-at-pycon.html

Made With Mu: A GPIOZero Theramin for Valentine’s Day
Thu, 14 Feb 2019 09:00:00 +0000
https://madewith.mu/mu/users/2019/02/14/gpiozero-theramin.html

Talk Python to Me: #199 Automate all the things with Python at Zapier
Thu, 14 Feb 2019 08:00:00 +0000
https://talkpython.fm/episodes/show/199/automate-all-the-things-with-python-at-zapier

Python Bytes: #117 Is this the end of Pyt

### Discussion

Working with data encoded as XML is commonplace in many applications.  
Not only is XML widely used as a format for exchanging data on the Internet, it is a common format for storing application data (e.g., word processing, music libraries, etc.).  
The discussion that follows assumes the reader is already familiar with XML basics.

In many cases, when XML is simply being used to store data, the document structure is compact and straightforward.  
The `xml.etree.ElementTree.parse()` function parses the entire XML document into a document object.  
From there, you use methods such as `find()`, `iterfind()`, and `findtext()` to search for specific XML elements.  
The arguments to these functions are a tag name or a path of tag names, such as `channel/item` or `title`.  
When specifying tags, you need to take the overall document structure into account.  
Each find operation takes place relative to a starting element.  
Likewise, the tag name that you supply to each operation is also relative to the start.  
In the example, the call to `doc.iterfind('channel/item')` looks for all "item" elements under a "channel" element. Here, `doc` represents the top of the document (the top-level "rss" element).  
The later calls to `item.findtext()` take place relative to the found "item" elements.  
Each element represented by the `ElementTree` module has a few essential attributes and methods that are useful when parsing.  
The `tag` attribute contains the name of the tag, the `text` attribute contains enclosed text, and the `get()` method can be used to extract attributes (if any).
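Because the Planet Python feed changes over time (and needs network access), here is an offline sketch of the same calls on a tiny made-up RSS document. It uses `fromstring()`, which returns the root `Element` directly rather than an `ElementTree`; the find methods and the `tag`, `text`, and `get()` members behave the same way.

```python
from xml.etree.ElementTree import fromstring

# Tiny made-up RSS document; the structure mimics a real feed
RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

root = fromstring(RSS)                 # the top-level <rss> Element

# Paths are relative to the element you call the method on
for item in root.iterfind('channel/item'):
    print(item.findtext('title'), item.findtext('link'))

e = root.find('channel/title')
print(e.tag, e.text)                   # title Example Feed
print(root.get('version'))             # 2.0 (attribute lookup)
print(root.get('encoding', 'n/a'))     # n/a (get() accepts a default)
print(root.findtext('channel/missing'))  # None when nothing matches
```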

In [148]:
doc

<xml.etree.ElementTree.ElementTree at 0x10a562240>

In [149]:
e = doc.find('channel/title'); e

<Element 'title' at 0x10a55d4f8>

In [150]:
e.tag

'title'

In [151]:
e.text

'Planet Python'

It should be noted that `xml.etree.ElementTree` is not the only option for XML parsing.  
For more advanced applications, you might consider `lxml`.  
It uses the same programming interface as `ElementTree`, so the example shown in this recipe works in the same manner.  
You simply need to change the first import to:  
`from lxml.etree import parse`.  
`lxml` provides the benefit of being fully compliant with XML standards.  
It is also extremely fast, and provides support for features such as validation, XSLT, and XPath.

## 6.4 Parsing Huge XML Files Incrementally 

### Problem

You need to extract data from a huge XML document while using as little memory as possible.

### Solution

Any time you are faced with the problem of incremental data processing, you should think of iterators and generators.  
Here is a simple function that can be used to incrementally process huge XML files using a very small memory footprint:  

In [152]:
from xml.etree.ElementTree import iterparse

def parse_and_remove(filename, path):
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element:
    next(doc)
    
    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass

To test the function, you now need to find a large XML file to work with.  
You can often find such files on government and open data websites.  
For example, you can download [Chicago’s pothole database](https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Pot-Holes-Reported/7as2-ds3y) as XML.  
At the time of this writing, the downloaded file consists of more than 100,000 rows of data, which are encoded like this:

You could write a script that ranks ZIP codes by the number of pothole reports:
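Since the sample rows are not reproduced here, the following is a minimal sketch of such a script under an assumed layout: each report is a `<row>` element nested inside an outer `<row>`, with the ZIP code in a `<zip>` child. That layout, the function name `rank_zipcodes`, and the three-record sample are all hypothetical. The sample is read from an `io.BytesIO` buffer, but `parse()` accepts a filename just as well.

```python
import io
from collections import Counter
from xml.etree.ElementTree import parse

# Hypothetical stand-in for the pothole file. The nesting assumed here
# (<response>, then <row>, then one <row> per report with a <zip> child)
# is a guess for illustration; check the real file's layout first.
SAMPLE = b"""<response>
  <row>
    <row><zip>60640</zip></row>
    <row><zip>60614</zip></row>
    <row><zip>60640</zip></row>
  </row>
</response>"""

def rank_zipcodes(source):
    """Count reports per ZIP code; source is a filename or binary file object."""
    potholes_by_zip = Counter()
    doc = parse(source)                      # builds the entire tree in memory
    for pothole in doc.iterfind('row/row'):  # path is relative to the root element
        potholes_by_zip[pothole.findtext('zip')] += 1
    return potholes_by_zip.most_common()

for zipcode, num in rank_zipcodes(io.BytesIO(SAMPLE)):
    print(zipcode, num)
```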

The only problem with this script is that it reads and parses the entire XML file into memory.  
On our machine, it takes about 450 MB of memory to run.  
Using this recipe’s code, the program changes only slightly:
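A sketch of the incremental version, assuming the same kind of layout (`<row>` elements nested in an outer `<row>`, each with a `<zip>` child) and using a tiny made-up sample. To keep the block self-contained it repeats `parse_and_remove()` from the solution; the only real change to the counting loop is that it now draws from `parse_and_remove(source, 'row/row')` instead of a fully parsed document. The name `rank_zipcodes_incremental` is hypothetical.

```python
import io
from collections import Counter
from xml.etree.ElementTree import iterparse

def parse_and_remove(filename, path):
    """Yield elements matching path, detaching each from its parent afterward."""
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    next(doc)                          # skip the root element's start event
    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                elem_stack[-2].remove(elem)   # drop the processed element
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass                   # the root's end event has nothing to pop

# Hypothetical sample with the assumed layout
SAMPLE = b"""<response>
  <row>
    <row><zip>60640</zip></row>
    <row><zip>60614</zip></row>
    <row><zip>60640</zip></row>
  </row>
</response>"""

def rank_zipcodes_incremental(source):
    """Count reports per ZIP code, processing one <row> report at a time."""
    potholes_by_zip = Counter()
    for pothole in parse_and_remove(source, 'row/row'):
        potholes_by_zip[pothole.findtext('zip')] += 1
    return potholes_by_zip.most_common()

for zipcode, num in rank_zipcodes_incremental(io.BytesIO(SAMPLE)):
    print(zipcode, num)
```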

This version of the program has a memory footprint of only 7 MB.

### Discussion

This recipe relies on two core features of the `ElementTree` module.  
First, the `iterparse()` function allows incremental processing of XML documents.  
To use it, you supply the filename along with an event list consisting of one or more of the following: `start`, `end`, `start-ns`, and `end-ns`.  
The iterator created by `iterparse()` produces tuples of the form `(event, elem)`, where `event` is one of the listed events and `elem` is the resulting XML element (for the namespace events, `elem` is instead a `(prefix, uri)` tuple or `None`).

`start` events are created when an element is first created but not yet populated with any other data (e.g., child elements).  
`end` events are created when an element is completed.  
Although not shown in this recipe, `start-ns` and `end-ns` events are used to handle XML namespace declarations.  
In this recipe, the start and end events are used to manage stacks of elements and tags.  
The stacks represent the current hierarchical structure of the document as it’s being parsed, and are also used to determine if an element matches the requested path given to the `parse_and_remove()` function.  
If a match is made, `yield` is used to emit it back to the caller.  
The following statement after the yield is the core feature of ElementTree that makes this recipe save memory:  

`elem_stack[-2].remove(elem)`

This statement causes the previously yielded element to be removed from its parent.  
Assuming that no references are left to it anywhere else, the element is destroyed and memory reclaimed.  
The end effect of the iterative parse and the removal of nodes is a highly efficient incremental sweep over the document.  
At no point is a complete document tree ever constructed.  
Yet, it is still possible to write code that processes the XML data in a straightforward manner.  
The primary downside to this recipe is its runtime performance.  
When tested, the version of code that reads the entire document into memory first runs approximately twice as fast as the version that processes it incrementally.  
However, it requires more than 60 times as much memory.  
So, if memory use is a greater concern, the incremental version is a big win.

## 6.5 Turning A Dictionary into XML

### Problem

You want to take the data in a Python dictionary and convert it to XML.

### Solution

Although the `xml.etree.ElementTree` library is commonly used for parsing, it can also be used to create XML documents.

In [153]:
from xml.etree.ElementTree import Element

def dict_to_xml(tag, d):
    """
    Turn a dict into XML
    """
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem

s = { 'name': 'GOOG', 'shares': 100, 'price':490.1 }
e = dict_to_xml('stock', s)
e

<Element 'stock' at 0x10a65ba98>

The result of this conversion is an `Element` instance.  
For I/O, it's easy to convert this instance to a byte string using the `tostring()` function in `xml.etree.ElementTree`.

In [154]:
from xml.etree.ElementTree import tostring 

tostring(e)

b'<stock><name>GOOG</name><shares>100</shares><price>490.1</price></stock>'

You can also attach attributes to an element using its `set()` method:

In [155]:
e.set('_id', '1234')
tostring(e)

b'<stock _id="1234"><name>GOOG</name><shares>100</shares><price>490.1</price></stock>'

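Since Python 3.7, regular dictionaries preserve insertion order, so the child elements appear in the order the keys were added. On older versions, if the order of the elements matters, you might use an `OrderedDict` instead of a normal dictionary, as described in Recipe 1.7.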

### Discussion

When creating XML, you might be inclined to just make strings instead:

In [156]:
def dict_to_xml_str(tag, d):
    """
    Turn a simple dict of key/value pairs into XML
    """
    parts = ['<{}>'.format(tag)]
    for key, val in d.items():
        parts.append('<{0}>{1}</{0}>'.format(key, val))
    parts.append('</{}>'.format(tag))
    return ''.join(parts)

However, if you build the XML by hand, things can quickly get messy.  
How do you deal with special characters?

In [157]:
d = { 'name' : '<spam>'}
# String creation:
dict_to_xml_str('item', d)

'<item><name><spam></name></item>'

In [158]:
# Proper XML creation:
e = dict_to_xml('item', d)
tostring(e)

b'<item><name>&lt;spam&gt;</name></item>'

Notice how in the latter example, the characters `<` and `>` got replaced with `&lt;` and `&gt;`.  
Just for reference, if you ever need to manually escape or unescape such characters, you can use the `escape()` and `unescape()` functions in `xml.sax.saxutils`.

In [159]:
from xml.sax.saxutils import escape, unescape

escape('<spam>')

'&lt;spam&gt;'

In [160]:
unescape(_)

'<spam>'

Aside from creating correct output, the other reason why it’s a good idea to create `Element` instances instead of strings is that they can be more easily combined together to make a larger document.  
The resulting `Element` instances can also be processed in various ways without ever having to worry about parsing the XML text.  
Essentially, you can do all of the processing of the data in a more high-level form and then output it as a string at the very end.
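To illustrate combining, the sketch below builds several `Element` instances with the `dict_to_xml()` function from the solution (repeated so the block runs on its own) and appends them under a new root. The `portfolio` tag and the records are made up.

```python
from xml.etree.ElementTree import Element, tostring

def dict_to_xml(tag, d):
    """Turn a dict into XML (same helper as in the solution)."""
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem

# Made-up records; each becomes a <stock> element under one root
records = [{'name': 'GOOG', 'shares': 100}, {'name': 'AA', 'shares': 50}]
portfolio = Element('portfolio')
for rec in records:
    portfolio.append(dict_to_xml('stock', rec))

# The combined tree can still be queried before it is turned into text
print([s.findtext('name') for s in portfolio.iterfind('stock')])   # ['GOOG', 'AA']
print(tostring(portfolio))
```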

## 6.6. Parsing, Modifying, and Rewriting XML

### Problem

You want to read an XML document, make changes to it, and then write it back out as XML.

### Solution

The `xml.etree.ElementTree` module makes it easy to perform such tasks.  
Essentially, you start out by parsing the document in the usual way.  
For example, suppose you have a document named `pred.xml` that looks like this:

We can use `ElementTree` to read it and make changes to the structure.

In [161]:
from xml.etree.ElementTree import parse, Element

doc = parse('pred.xml')
root = doc.getroot()
root

<Element 'stop' at 0x10a783908>

Let's make some changes to our XML file and see what happens:

In [162]:
# Remove a few elements
root.remove(root.find('sri'))
root.remove(root.find('cr'))
# Insert a new element after <nm>...</nm>
# list(root) returns the children (getchildren() was removed in Python 3.9)
list(root).index(root.find('nm'))

1

We can create a simple element that will be added to the file.

In [163]:
e = Element('spam')
e.text = 'This is a test'
root.insert(2, e)
# Write it to the file:
doc.write('newpred.xml', xml_declaration=True)

We have created a new XML file that looks like this:

### Discussion

Modifying the structure of an XML document is straightforward, but you must remember that all modifications are generally made to the parent element, treating it as if it were a list.  
For example, if you remove an element, it is removed from its immediate parent using that parent’s `remove()` method.  
If you insert or append new elements, you also use `insert()` and `append()` methods on the parent.  
Elements can also be manipulated using indexing and slicing operations, such as `element[i]` or `element[i:j]`.  
If you need to make new elements, use the `Element` class, as shown in this recipe’s solution.  
A further description is available in Recipe 6.5.
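Here is a self-contained sketch of these list-style operations. Since `pred.xml` is not reproduced here, it parses a tiny made-up document that only reuses the tag names from the solution (`stop`, `nm`, `sri`, `cr`), and it writes the result to an in-memory buffer rather than a file.

```python
import io
from xml.etree.ElementTree import Element, ElementTree, fromstring

# Made-up document reusing only the tag names from the solution
root = fromstring('<stop><id>1</id><nm>Main St</nm><sri>x</sri><cr>22</cr></stop>')

root.remove(root.find('sri'))        # removal always goes through the parent
del root[-1]                         # index-based deletion: drops <cr>
print([child.tag for child in root[:]])  # slicing returns a list: ['id', 'nm']

pos = list(root).index(root.find('nm'))
spam = Element('spam')
spam.text = 'This is a test'
root.insert(pos + 1, spam)           # insert right after <nm>

out = io.BytesIO()
ElementTree(root).write(out, encoding='utf-8', xml_declaration=True)
print(out.getvalue().decode('utf-8'))
```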

## 6.7. Parsing XML Documents with Namespaces

### Problem

You need to parse an XML document, but it uses XML namespaces.

### Solution

Look at how the following document uses namespaces:

If you parse this document and try to perform the usual queries, you'll find that it doesn't work so easily:

In [164]:
doc = parse('namespaces.xml')

Let's begin with some queries that actually work:

In [165]:
doc.findtext('author')

'David Beazley'

In [166]:
doc.find('content')

<Element 'content' at 0x10a7784f8>

Now let's try some queries that don't go so well:

In [167]:
# A query involving a namespace (returns None, so nothing is displayed):
doc.find('content/html')

Only a fully qualified query will work:

In [168]:
doc.find('content/{http://www.w3.org/1999/xhtml}html')

<Element '{http://www.w3.org/1999/xhtml}html' at 0x10a778548>

In [169]:
# This one doesn't work either (it also returns None):
doc.findtext('content/{http://www.w3.org/1999/xhtml}html/head/title')

In [170]:
# Fully qualified:
doc.findtext('content/{http://www.w3.org/1999/xhtml}html/'
             '{http://www.w3.org/1999/xhtml}head/{http://www.w3.org/1999/xhtml}title')

'Hello World'

One way that you can simplify things is to wrap namespace handling up into a utility class:

In [171]:
class XMLNamespaces:
    def __init__(self, **kwargs):
        self.namespaces = {}
        for name, uri in kwargs.items():
            self.register(name, uri)
    def register(self, name, uri):
        self.namespaces[name] = '{'+uri+'}'
    def __call__(self, path):
        return path.format_map(self.namespaces)

Now let's put our class to work making our lives easier:

In [172]:
ns = XMLNamespaces(html='http://www.w3.org/1999/xhtml')
doc.find(ns('content/{html}html'))

<Element '{http://www.w3.org/1999/xhtml}html' at 0x10a778548>

In [173]:
doc.findtext(ns('content/{html}html/{html}head/{html}title'))

'Hello World'
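Besides a helper class like this, `ElementTree` can expand prefixes for you: `find()`, `findall()`, `iterfind()`, and `findtext()` accept an optional `namespaces` mapping from prefix to URI, and a `prefix:tag` step in the path is resolved through it. The sketch below assumes a trimmed, made-up document shaped like `namespaces.xml`, with an XHTML default namespace declared on the `<html>` element.

```python
from xml.etree.ElementTree import fromstring

# Trimmed, made-up stand-in for namespaces.xml
DOC = """<top>
  <author>David Beazley</author>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Hello World</title></head>
    </html>
  </content>
</top>"""

root = fromstring(DOC)
ns = {'html': 'http://www.w3.org/1999/xhtml'}   # prefix -> URI

print(root.find('content/html'))                # None: the tag is really {uri}html
print(root.find('content/html:html', namespaces=ns))
print(root.findtext('content/html:html/html:head/html:title', namespaces=ns))  # Hello World
```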

### Discussion

Parsing XML documents that contain namespaces can be messy.  
The `XMLNamespaces` class is really just meant to clean it up slightly by allowing you to use the shortened namespace names in subsequent operations as opposed to fully qualified URIs.  
Unfortunately, there is no mechanism in the basic `ElementTree` parser to get further information about namespaces.  
However, you can get a bit more information about the scope of namespace processing if you’re willing to use the `iterparse()` function instead.

In [174]:
from xml.etree.ElementTree import iterparse

for evt, elem in iterparse('namespaces.xml'):
    print(evt, elem)

end <Element 'author' at 0x10a6687c8>
end <Element '{http://www.w3.org/1999/xhtml}title' at 0x10a773e08>
end <Element '{http://www.w3.org/1999/xhtml}head' at 0x10a65b908>
end <Element '{http://www.w3.org/1999/xhtml}h1' at 0x10a7733b8>
end <Element '{http://www.w3.org/1999/xhtml}body' at 0x10a773728>
end <Element '{http://www.w3.org/1999/xhtml}html' at 0x10a65b5e8>
end <Element 'content' at 0x10a65b958>
end <Element 'top' at 0x10925cb38>


In [176]:
# The top-most element:
elem

<Element 'top' at 0x10925cb38>
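The loop above only asked for the default `end` events, so namespace declarations never appear. `iterparse()` also accepts `'start-ns'` and `'end-ns'` in its event list: for `'start-ns'` the second item is a `(prefix, uri)` tuple (the prefix is `''` for a default namespace), and for `'end-ns'` it is `None`. The document below is a trimmed, made-up stand-in for `namespaces.xml`.

```python
import io
from xml.etree.ElementTree import iterparse

# Trimmed, made-up stand-in for namespaces.xml
DOC = b"""<top>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Hello World</title></head>
    </html>
  </content>
</top>"""

events = []
for evt, item in iterparse(io.BytesIO(DOC), ('end', 'start-ns', 'end-ns')):
    # 'end' delivers an Element; the namespace events deliver a tuple or None
    events.append((evt, item.tag if evt == 'end' else item))

for evt, detail in events:
    print(evt, detail)
```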

As a final note, if the text you are parsing makes use of namespaces in addition to other advanced XML features, you’re really better off using the `lxml` library instead of `ElementTree`.  
For instance, `lxml` provides better support for validating documents against a DTD, more complete XPath support, and other advanced XML features.  
This recipe is really just a simple fix to make parsing a little easier.

## 6.8. Interacting with a Relational Database