# XML example and exercise
****
+ study examples of accessing nodes in XML tree structure  
+ work on exercise to be completed and submitted
****
+ reference: https://docs.python.org/2.7/library/xml.etree.elementtree.html
+ data source: http://www.dbis.informatik.uni-goettingen.de/Mondial
****

In [2]:
from xml.etree import ElementTree as ET

## XML example

+ for details about tree traversal and iterators, see https://docs.python.org/2.7/library/xml.etree.elementtree.html
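
As a quick orientation before the real data is loaded, the sketch below contrasts the three access patterns used in this notebook: `find`, `iterfind` and `iter`. The inline XML string and the names `SAMPLE_XML` and `sample_root` are made up for illustration, and the nesting of a city inside a `province` element is an assumption about how the Mondial files may be structured.

```python
from xml.etree import ElementTree as ET

# Hypothetical inline sample, not taken from the Mondial files.
SAMPLE_XML = """
<mondial>
  <country>
    <name>Albania</name>
    <city><name>Tirana</name></city>
    <province>
      <city><name>Shkodër</name></city>
    </province>
  </country>
  <country>
    <name>Andorra</name>
    <city><name>Andorra la Vella</name></city>
  </country>
</mondial>
"""

sample_root = ET.fromstring(SAMPLE_XML.strip())

# find() returns the first matching direct child, or None if there is none
first_country = sample_root.find('country')
first_name = first_country.find('name').text

# iterfind() with a plain tag name yields matching direct children only
country_names = [c.find('name').text for c in sample_root.iterfind('country')]
direct_cities = [c.find('name').text for c in first_country.iterfind('city')]

# iter() walks the whole subtree, so it also reaches the nested city
all_cities = [c.find('name').text for c in first_country.iter('city')]

print(first_name)
print(country_names)
print(direct_cities)
print(all_cities)
```

The difference between `iterfind('city')` and `iter('city')` matters whenever cities may sit deeper than one level below a country.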

In [3]:
document_tree = ET.parse( './data/mondial_database_less.xml' )
document_tree

<xml.etree.ElementTree.ElementTree at 0x104a3b470>

In [4]:
# print names of all countries
for child in document_tree.getroot():
    print (child.find('name').text)

Albania
Greece
Macedonia
Serbia
Montenegro
Kosovo
Andorra


In [5]:
# print names of all countries and their cities
for element in document_tree.iterfind('country'):
    print ('* ' + element.find('name').text + ':')
    capitals_string = ''
    # iter() replaces getiterator(), which is deprecated and was removed in Python 3.9
    for subelement in element.iter('city'):
        capitals_string += subelement.find('name').text + ', '
    print (capitals_string[:-2])

* Albania:
Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë
* Greece:
Komotini, Kavala, Athina, Peiraias, Peristeri, Acharnes, Patra, Kozani, Kerkyra, Ioannina, Thessaloniki, Iraklio, Chania, Ermoupoli, Rhodes, Tripoli, Lamia, Chalkida, Larissa, Volos, Mytilini, Karyes
* Macedonia:
Skopje, Kumanovo
* Serbia:
Beograd, Novi Sad, Niš
* Montenegro:
Podgorica
* Kosovo:
Prishtine
* Andorra:
Andorra la Vella
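
The same listing can also be built with a fresh list per country and `str.join`, which avoids both the trailing-separator slice and any state shared between iterations. This is only a sketch on a made-up inline sample; `SAMPLE` and `cities_by_country` are hypothetical names, not part of the notebook's data or of ElementTree.

```python
from xml.etree import ElementTree as ET

# Hypothetical inline sample with the same country/city/name nesting as above.
SAMPLE = ET.fromstring(
    "<mondial>"
    "<country><name>Macedonia</name>"
    "<city><name>Skopje</name></city><city><name>Kumanovo</name></city>"
    "</country>"
    "<country><name>Kosovo</name><city><name>Prishtine</name></city></country>"
    "</mondial>"
)

def cities_by_country(root):
    """Return a list of (country name, [city names]) pairs."""
    result = []
    for country in root.iterfind('country'):
        # a fresh list per country, so nothing carries over between iterations
        cities = [city.find('name').text for city in country.iter('city')]
        result.append((country.find('name').text, cities))
    return result

for country_name, cities in cities_by_country(SAMPLE):
    print('* ' + country_name + ': ' + ', '.join(cities))
```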


****
## XML exercise

Using the data in 'data/mondial_database.xml', the examples above, and referring to https://docs.python.org/2.7/library/xml.etree.elementtree.html, find

1. 10 countries with the lowest infant mortality rates
2. 10 cities with the largest population
3. 10 ethnic groups with the largest overall populations (sum of best/latest estimates over all countries)
4. name and country of a) longest river, b) largest lake and c) airport at highest elevation
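
Several of these questions depend on picking the "best/latest" estimate before ranking. The sketch below shows one possible way to do that for city populations. It assumes that an element can carry several `population` children, each with a `year` attribute, which should be checked against the actual file. The sample data and the helper names `latest_population` and `largest_cities` are hypothetical.

```python
from heapq import nlargest
from xml.etree import ElementTree as ET

# Made-up sample; the element and attribute names are assumptions about the schema.
SAMPLE = ET.fromstring(
    "<mondial><country><name>Exampleland</name>"
    "<city><name>Alpha</name>"
    "<population year='2001'>100</population>"
    "<population year='2011'>150</population></city>"
    "<city><name>Beta</name><population year='2011'>300</population></city>"
    "<city><name>Gamma</name></city>"
    "</country></mondial>"
)

def latest_population(element):
    """Return the most recent population figure as an int, or None if absent."""
    estimates = element.findall('population')
    if not estimates:
        return None
    # pick the estimate with the highest year; a missing year counts as 0
    latest = max(estimates, key=lambda p: int(p.get('year', 0)))
    return int(latest.text)

def largest_cities(root, n):
    """Return the n (city name, population) pairs with the largest population."""
    populations = []
    for city in root.iter('city'):
        population = latest_population(city)
        if population is not None:
            populations.append((city.find('name').text, population))
    return nlargest(n, populations, key=lambda pair: pair[1])

print(largest_cities(SAMPLE, 2))
```

Cities without any population element are skipped rather than treated as zero, mirroring how countries without an infant mortality value can be left out of a ranking.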

In [6]:
document = ET.parse( './data/mondial_database.xml' )
document

<xml.etree.ElementTree.ElementTree at 0x104a3e278>

In [7]:
root = document.getroot()
root

<Element 'mondial' at 0x1049279a8>

In [8]:
from heapq import nsmallest
from operator import itemgetter

# map country name -> infant mortality rate, skipping countries without a value
mort_dict = {}
for element in root.iterfind('country'):
    mortality_element = element.find('infant_mortality')
    if mortality_element is not None:
        mort_dict[element.find('name').text] = float(mortality_element.text)

# the 10 countries with the lowest infant mortality rates
for country, mortality in nsmallest(10, mort_dict.items(), key=itemgetter(1)):
    print (country, mortality)


Monaco 1.81
Japan 2.13
Bermuda 2.48
Norway 2.48
Singapore 2.53
Sweden 2.6
Czech Republic 2.63
Hong Kong 2.73
Macao 3.13
Iceland 3.15


In [11]:
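# print a running list of city names, country by country
# note: capitals_string is not reset in this cell, so it still holds the names
# collected by earlier cells and keeps growing with every country
for element in root.iterfind('country'):
    # iter() replaces getiterator(), which is deprecated and was removed in Python 3.9
    for subelement in element.iter('city'):
        capitals_string += subelement.find('name').text + ', '
    print (capitals_string[:-2])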

Andorra la Vella, Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë, Komotini, Kavala, Athina, Peiraias, Peristeri, Acharnes, Patra, Kozani, Kerkyra, Ioannina, Thessaloniki, Iraklio, Chania, Ermoupoli, Rhodes, Tripoli, Lamia, Chalkida, Larissa, Volos, Mytilini, Karyes, Skopje, Kumanovo, Beograd, Novi Sad, Niš, Podgorica, Prishtine, Andorra la Vella, Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë
Andorra la Vella, Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë, Komotini, Kavala, Athina, Peiraias, Peristeri, Acharnes, Patra, Kozani, Kerkyra, Ioannina, Thessaloniki, Iraklio, Chania, Ermoupoli, Rhodes, Tripoli, Lamia, Chalkida, Larissa, Volos, Mytilini, Karyes, Skopje, Kumanovo, Beograd, Novi Sad, Niš, Podgorica, Prishtine, Andorra la Vella, Tirana, Shkodër, Durrës, Vlorë, Elbasan, Korçë, Komotini, Kavala, Athina, Peiraias, Peristeri, Acharnes, Patra, Kozani, Kerkyra, Ioannina, Thessaloniki, Iraklio, Chania, Ermoupoli, Rhodes, Tripoli, Lamia, Chalkida, Larissa, Volos, Mytilini, Karyes
Andorra