### The Python `xml` package has many tools for working with XML data, and its `xml.etree.ElementTree` module offers partial support for XPath. Other XML packages are more powerful, but also more complex.

In [11]:
import xml.etree.ElementTree as ET

Read the simple book example used in the notes and SWP.

In [12]:
library_data = ET.parse('library/library.xml')

What does its string representation look like?

In [3]:
root = library_data.getroot()
ET.tostring(root)

b'<library location="Bremen">\n\t<author name="Henry Wise">\n\t   <book title="Artificial Intelligence" />\n\t   <book title="Modern Web Services" />\n\t   <book title="Theory of Computation" />\n\t</author>\n\t<author name="William Smart">\n\t\t<book title="Artificial Intelligence" />\n\t</author>\n\t<author name="Cynthia Singleton">\n\t   <book title="The Semantic Web" />\n\t   <book title="Browser Technology Revised" />\n\t</author>\n</library>'
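The result above is a `bytes` object, which is why tabs and newlines show up as escape sequences. As a minimal sketch, assuming a small inline document (the `sample` string and `sample_root` name are made up for illustration and stand in for `library.xml`), passing `encoding='unicode'` asks `ET.tostring` for a `str` instead, which prints more readably.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for library/library.xml
sample = (
    '<library location="Bremen">'
    '<author name="Henry Wise"><book title="Artificial Intelligence" /></author>'
    '</library>'
)
sample_root = ET.fromstring(sample)

as_bytes = ET.tostring(sample_root)                      # default: bytes
as_text = ET.tostring(sample_root, encoding='unicode')   # str

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```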

We can easily access its parts.

In [4]:
print('ROOT:', root)
print('TAG:', root.tag)
print('ATTRIBUTES:', root.attrib)
print('ELEMENTS', [element for element in root])

ROOT: <Element 'library' at 0x106d299a8>
TAG: library
ATTRIBUTES: {'location': 'Bremen'}
ELEMENTS [<Element 'author' at 0x106e68ef8>, <Element 'author' at 0x106ec0778>, <Element 'author' at 0x106ec0818>]
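An element behaves like a list of its children, and `attrib` is an ordinary dictionary. The sketch below, which assumes a made-up inline sample rather than the real `library.xml`, shows `Element.get` as a way to read an attribute with a fallback when it is absent, and `len` to count direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for library/library.xml
sample = """<library location="Bremen">
  <author name="Henry Wise"><book title="Modern Web Services" /></author>
  <author name="William Smart"><book title="Artificial Intelligence" /></author>
</library>"""
sample_root = ET.fromstring(sample)

# get() returns None (or the given default) instead of raising KeyError
location = sample_root.get('location')
missing = sample_root.get('founded', 'unknown')  # 'founded' is not in the sample

# Iterating an element yields its direct children; len() counts them
child_summary = [(child.tag, child.get('name'), len(child)) for child in sample_root]

print(location, missing)
print(child_summary)
```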


Find author elements that are children of the root node.

In [5]:
root.findall("./author")

[<Element 'author' at 0x106e68ef8>,
 <Element 'author' at 0x106ec0778>,
 <Element 'author' at 0x106ec0818>]
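`findall` always returns a list, which is empty when nothing matches. When only the first match is needed, `find` returns a single element or `None`. A small sketch, again on a made-up inline sample (the `publisher` tag is deliberately absent):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for library/library.xml
sample = """<library location="Bremen">
  <author name="Henry Wise"><book title="Modern Web Services" /></author>
  <author name="William Smart"><book title="Artificial Intelligence" /></author>
</library>"""
sample_root = ET.fromstring(sample)

first_author = sample_root.find('./author')        # first match only
no_match = sample_root.find('./publisher')         # None: no such child
all_authors = sample_root.findall('./author')      # list of every match
no_matches = sample_root.findall('./publisher')    # empty list

print(first_author.get('name'), no_match, len(all_authors), no_matches)
```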

Find book elements anywhere in the tree under root.

In [6]:
root.findall(".//book")

[<Element 'book' at 0x106ec0688>,
 <Element 'book' at 0x106ec0638>,
 <Element 'book' at 0x106ec0728>,
 <Element 'book' at 0x106ec07c8>,
 <Element 'book' at 0x106ec0868>,
 <Element 'book' at 0x106ec0908>]
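For a simple search by tag name, `Element.iter` is an alternative to the `.//` path. The sketch below uses a made-up inline sample to show that both return the same elements in document order. One difference worth knowing: `iter` also considers the element it is called on, while `.//book` only looks at descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for library/library.xml
sample = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence" />
    <book title="Modern Web Services" />
  </author>
  <author name="William Smart"><book title="Artificial Intelligence" /></author>
</library>"""
sample_root = ET.fromstring(sample)

books_findall = sample_root.findall('.//book')   # XPath-style search
books_iter = list(sample_root.iter('book'))      # tree traversal filtered by tag

print(len(books_findall), len(books_iter))
print(books_findall == books_iter)
```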

Find attributes for all books.

In [7]:
[b.attrib for b in root.findall(".//book")]

[{'title': 'Artificial Intelligence'},
 {'title': 'Modern Web Services'},
 {'title': 'Theory of Computation'},
 {'title': 'Artificial Intelligence'},
 {'title': 'The Semantic Web'},
 {'title': 'Browser Technology Revised'}]

Find titles of all of the books in the library

In [8]:
[b.attrib['title'] for b in root.findall(".//book")]

['Artificial Intelligence',
 'Modern Web Services',
 'Theory of Computation',
 'Artificial Intelligence',
 'The Semantic Web',
 'Browser Technology Revised']
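Note that "Artificial Intelligence" appears twice, once per author. If distinct titles are wanted, one option is to de-duplicate while keeping document order; `collections.Counter` shows how often each title occurs. This is a sketch on a made-up inline sample, not the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical inline sample standing in for library/library.xml
sample = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence" />
    <book title="Modern Web Services" />
  </author>
  <author name="William Smart"><book title="Artificial Intelligence" /></author>
</library>"""
sample_root = ET.fromstring(sample)

titles = [b.get('title') for b in sample_root.findall('.//book')]

# dict.fromkeys keeps the first occurrence of each title, in order
unique_titles = list(dict.fromkeys(titles))
title_counts = Counter(titles)

print(unique_titles)
print(title_counts)
```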

Find the authors who have written a book with the title "Artificial Intelligence". The trailing `/..` steps from each matching book back up to its parent author.

In [9]:
root.findall("./author/book[@title='Artificial Intelligence']/..")

[<Element 'author' at 0x106e68ef8>, <Element 'author' at 0x106ec0778>]

Find the names of the authors who have written a book with the title "Artificial Intelligence".

In [10]:
[auth.attrib['name'] for auth in root.findall("./author/book[@title='Artificial Intelligence']/..")]

['Henry Wise', 'William Smart']
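The same question can be answered without the parent step by looping over authors and their books in plain Python. The sketch below, on a made-up inline sample, builds a dictionary from each title to the authors who wrote it (the name `authors_by_title` is just illustrative), which answers the query for every title at once.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for library/library.xml
sample = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence" />
    <book title="Modern Web Services" />
  </author>
  <author name="William Smart"><book title="Artificial Intelligence" /></author>
</library>"""
sample_root = ET.fromstring(sample)

# Map each book title to the list of author names, in document order
authors_by_title = {}
for author in sample_root.findall('./author'):
    for book in author.findall('./book'):
        authors_by_title.setdefault(book.get('title'), []).append(author.get('name'))

print(authors_by_title['Artificial Intelligence'])
```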