# Denison CS-181/DA-210 Homework

---

## XPath Exercises

> The intent of this homework is to use **XPath** to obtain results: the XPath string does the "heavy lifting" and is passed to `.xpath()` on the root Element of an XML tree.  
>
> **If you use procedural XML instead of XPath, you will receive at most half credit per question.**
>
> For some questions, you are asked to "wrap" your xpath query into a function, or to perform some post-query step, and the previous admonition does not apply.
>
> It is perfectly fine to use an online tool (like codebeautify) to help you develop and test your XPath string before incorporating it into its programmatic place here in this notebook.
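
To make the distinction between "procedural XML" and XPath concrete, here is a minimal sketch. It uses a small made-up document (not the course's `flights.xml`), so the tag names and values are illustrative assumptions only. Both approaches produce the same list, but in the second one the XPath string does the selecting.

```python
from lxml import etree

# Hypothetical miniature document, used only for illustration.
xml = b"""<doc>
  <Passenger><name>Ann</name><passportnumber>A1</passportnumber></Passenger>
  <Passenger><name>Bob</name><passportnumber>B2</passportnumber></Passenger>
</doc>"""
root = etree.fromstring(xml)

# Procedural style: Python loops and tag tests walk the tree by hand.
procedural = []
for child in root:
    if child.tag == "Passenger":
        for grandchild in child:
            if grandchild.tag == "passportnumber":
                procedural.append(grandchild.text)

# XPath style: a single path expression selects the same text nodes.
via_xpath = root.xpath("/doc/Passenger/passportnumber/text()")

print(procedural)
print(via_xpath)
```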

In [63]:
import os
import sys
import lxml

def add_modules():
    """
    Starting at the current directory and proceeding up the file system
    tree, search for a directory named `modules`.  If found, and if not
    already there, add to the Python module search path.
    
    Params: None
    
    Return: None
    """
    directory = "."
    levels = 0
    while not os.path.isdir(os.path.join(directory, "modules")) and \
          levels < 5:
        directory = os.path.join(directory, "..")
        levels += 1
    module_path = os.path.abspath(os.path.join(directory, "modules"))
    if os.path.isdir(module_path):
        if module_path not in sys.path:
            sys.path.append(module_path)

add_modules()
import util

datadir = util.resolve_dir("hierarchicaldata")

**Q1** Create the tree and obtain the root for `"flights.xml"` in the data directory.  Assign the root node to `froot`.  Perform any necessary imports.  By using Atom, examine this dataset and its hierarchy, noting the set of **different** tags under the root `doc`, and the structure of each.

Understanding the data will help in answering the subsequent questions.

In [64]:
# Solution cell
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)

froot = etree.parse(os.path.join(datadir, "flights.xml"), parser).getroot()


In [65]:
# Testing cell

assert isinstance(froot, lxml.etree._Element)
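
Besides opening the file in an editor, the set of different tags under the root can also be listed programmatically. The following is a small sketch on a hypothetical stand-in document; the tags shown are assumptions and may not match the real `flights.xml`.

```python
from lxml import etree

# Hypothetical stand-in for the real file; tag names are illustrative only.
root = etree.fromstring(
    b"<doc>"
    b"<Airport airId='AAA'/>"
    b"<Flight flightId='F1'/>"
    b"<Flight flightId='F2'/>"
    b"<Passenger/>"
    b"</doc>"
)

# /doc/* selects every element child of the root; a set keeps each tag once.
distinct_tags = sorted({child.tag for child in root.xpath("/doc/*")})
print(distinct_tags)
```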

**Q2** Create a Python list called `passports` containing the actual passport number (i.e., not the Element/node) for each of the passengers in the tree.

In [66]:
# Solution cell

xs_string = """/doc/Passenger/passportnumber/text()"""
passports = froot.xpath(xs_string)

passports

['123456', '123457', '000111', '000112', '000113', '000114']

In [67]:
# Testing cell

assert isinstance(passports, list)
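
The phrase "not the Element/node" is the key point of this question. As a sketch on a hypothetical one-passenger document, the same path with and without a trailing `text()` step returns two different kinds of result:

```python
from lxml import etree

# Hypothetical one-passenger document, used only for illustration.
root = etree.fromstring(
    b"<doc><Passenger><passportnumber>A1</passportnumber></Passenger></doc>"
)

# Without text(): a list of Element objects.
elements = root.xpath("/doc/Passenger/passportnumber")
# With text(): a list of string results.
strings = root.xpath("/doc/Passenger/passportnumber/text()")

print(type(elements[0]).__name__)
print(isinstance(strings[0], str))
print(elements[0].text == strings[0])
```

The strings returned by lxml are a subclass of `str`, so they can be used anywhere an ordinary string is expected.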

**Q3** Create the collection of the attribute `airId` for those airports with a tax less than 100.  Assign to Python variable `cheapairport`.

In [68]:
# Solution cell

xs_string = """/doc/Airport[tax < "100"]/@airId"""
cheapairport = froot.xpath(xs_string)

cheapairport

['NPL', 'SPL', 'PRG', 'BDP', 'FFT']

In [69]:
# Testing cell

assert True
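
A note on the comparison in the predicate: in XPath 1.0 the relational operators (`<`, `<=`, `>`, `>=`) convert their operands to numbers, so a quoted literal such as `"100"` is still compared numerically rather than alphabetically. The sketch below checks this on a hypothetical three-airport document (the ids and tax values are made up); a purely alphabetical comparison would have placed `"20"` after `"100"`.

```python
from lxml import etree

# Hypothetical airports; ids and tax values are illustrative only.
root = etree.fromstring(
    b"<doc>"
    b"<Airport airId='AAA'><tax>20</tax></Airport>"
    b"<Airport airId='BBB'><tax>150</tax></Airport>"
    b"<Airport airId='CCC'><tax>99.5</tax></Airport>"
    b"</doc>"
)

# Quoted and unquoted literals are both converted to numbers by "<".
quoted = root.xpath('/doc/Airport[tax < "100"]/@airId')
unquoted = root.xpath('/doc/Airport[tax < 100]/@airId')

print(quoted)
print(unquoted)
```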

**Q5** Create a list of destination Elements for the flights where the source is NPL (North Pole).  Assign to Python variable `fromNPL`.

In [70]:
# Solution cell

xs_string = """/doc/Flight[source = "NPL"]/destination"""
fromNPL = froot.xpath(xs_string)

fromNPL

[<Element destination at 0x7ff40f804180>,
 <Element destination at 0x7ff40f802d80>,
 <Element destination at 0x7ff40f8020c0>,
 <Element destination at 0x7ff40f809e00>,
 <Element destination at 0x7ff40f8093c0>,
 <Element destination at 0x7ff40f809f80>,
 <Element destination at 0x7ff40f809e40>,
 <Element destination at 0x7ff40f809d00>,
 <Element destination at 0x7ff40f809b00>,
 <Element destination at 0x7ff40f809fc0>]

In [71]:
# Testing cell

assert True

**Q6** Construct a list of the text contents of **all** of the children of the Flight whose flightId is LX125, assigning to `flightchildren`.

In [72]:
# Solution cell

xs_string = """/doc/Flight[@flightId = "LX125"]//text()"""
flightchildren = froot.xpath(xs_string)

flightchildren

['100', '2005-12-24', '10:00:00', '11:10:00', 'LHR', 'AMS']

In [73]:
# Testing cell

assert True
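
The `//text()` step selects text from every descendant, not only from direct children. The two coincide when the children hold no nested elements, which the output above suggests is the case here. A sketch on a hypothetical flight with one nested element shows where they would differ; the `crew` and `pilot` tags are invented for illustration.

```python
from lxml import etree

# Hypothetical flight with a nested <crew> element, for illustration only.
root = etree.fromstring(
    b"<doc><Flight flightId='X1'>"
    b"<source>AAA</source>"
    b"<crew><pilot>Pat</pilot></crew>"
    b"</Flight></doc>"
)

# /*/text(): text nodes that sit directly inside each child element.
children_only = root.xpath('/doc/Flight[@flightId = "X1"]/*/text()')
# //text(): text nodes at any depth below the Flight.
all_descendants = root.xpath('/doc/Flight[@flightId = "X1"]//text()')

print(children_only)
print(all_descendants)
```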

**Q7** Create the tree and obtain the root for `"bookstore2.xml"` in the data directory.  Assign the root node to `broot`. By using Atom, examine this dataset and its hierarchy, noting the structure of the items in the data set.

Understanding the data will help in answering the subsequent questions.

In [74]:
# Solution cell

broot = etree.parse(os.path.join(datadir, "bookstore2.xml"), parser).getroot()

In [75]:
# Testing cell

assert isinstance(broot, lxml.etree._Element)

**Q8** Create a list named `less` containing the ids of the books that cost less than `$6`. Note that `id` is an attribute.

In [76]:
# Solution cell
less = broot.xpath("""/catalog/book[price < "6.00"]/@id""")

less

['bk102', 'bk103', 'bk104', 'bk105', 'bk106', 'bk107', 'bk108']

In [77]:
assert True

**Q9** Create a list called `eva` containing the titles of the books written by Eva Corets. Your list `eva` should be a list of strings.

In [78]:
# Solution cell
eva = broot.xpath("""/catalog/book[author = "Corets, Eva"]/title/text()""")

eva

['Maeve Ascendant', "Oberon's Legacy", 'The Sundered Grail']

In [79]:
assert True

**Q10** Find the average book price for all books that are not fantasy in this file, assigning to variable `avgprice`. **Hints** First, use XPath to get a list of the price strings (text) based on a single XPath query.  Then use a list comprehension to build a list of `float` values converting the strings to real-valued numbers.  Finally, perform the average based on the values and length of the list.

In [80]:
# Solution cell
prices = broot.xpath("""/catalog/book[genre != "Fantasy"]/price/text()""")
# Convert the price strings to floats before averaging
prices = [float(p) for p in prices]
avgprice = sum(prices) / len(prices)

# Round to two decimal places for readability
avgprice = round(avgprice, 2)
avgprice

23.83

In [81]:
assert True
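
As an aside to the hinted approach, XPath 1.0 also provides the `sum()` and `count()` functions and the `div` operator, so the average can be computed inside the query itself. This is only a sketch on a hypothetical three-book catalog (genres and prices are made up), and it does not replace the list-comprehension method the question asks for.

```python
from lxml import etree

# Hypothetical catalog; genres and prices are illustrative only.
root = etree.fromstring(
    b"<catalog>"
    b"<book><genre>Computer</genre><price>10.00</price></book>"
    b"<book><genre>Fantasy</genre><price>5.95</price></book>"
    b"<book><genre>Romance</genre><price>20.00</price></book>"
    b"</catalog>"
)

price_path = '/catalog/book[genre != "Fantasy"]/price'

# Hinted approach: fetch the strings, convert to float, average in Python.
texts = root.xpath(price_path + "/text()")
avg_python = sum(float(t) for t in texts) / len(texts)

# Alternative: let XPath do the arithmetic; lxml returns a Python float.
avg_xpath = root.xpath(f"sum({price_path}) div count({price_path})")

print(avg_python, avg_xpath)
```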

**Q11** Create a list called `lessFantasy` containing the titles of the books whose price is under `$40` and whose genre is not fantasy.

In [82]:
# Solution cell
lessFantasy = broot.xpath("""/catalog/book[genre != "Fantasy" and price < "40.00"]/title/text()""")

lessFantasy

['Lover Birds',
 'Splish Splash',
 'Creepy Crawlies',
 'Paradox Lost',
 'Microsoft .NET: The Programming Bible',
 'MSXML3: A Comprehensive Guide']

In [83]:
assert True
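
Two conditions can be combined with `and` inside one predicate, or written as two predicates in a row. For conditions like these, which do not depend on position, the two forms select the same nodes. A sketch on a hypothetical catalog (titles, genres, and prices are made up):

```python
from lxml import etree

# Hypothetical catalog; all values are illustrative only.
root = etree.fromstring(
    b"<catalog>"
    b"<book><title>A</title><genre>Computer</genre><price>44.95</price></book>"
    b"<book><title>B</title><genre>Fantasy</genre><price>5.95</price></book>"
    b"<book><title>C</title><genre>Romance</genre><price>4.95</price></book>"
    b"</catalog>"
)

# One predicate joining both conditions with "and".
with_and = root.xpath(
    '/catalog/book[genre != "Fantasy" and price < 40]/title/text()'
)
# Two chained predicates, each filtering the result of the previous one.
chained = root.xpath(
    '/catalog/book[genre != "Fantasy"][price < 40]/title/text()'
)

print(with_and)
print(chained)
```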