# XML exercise

Using data from the [**mondial database**](https://drive.google.com/file/d/14lFT4nWHgwN36ij4XZh6OUuup-K9qLgR/view?usp=sharing), find the answers to the following questions:

1. The 10 countries with the lowest infant mortality rates
2. The 10 cities with the largest population
3. The name and country of a) the longest river, b) the largest lake, and c) the airport at the highest elevation

In [72]:
import xml.etree.ElementTree as ET
import pandas as pd
import numpy as np

In [73]:
# Parse the XML tree and extract the root element
tree = ET.parse('data/mondial.xml')
root = tree.getroot()

### 10 countries with the lowest infant mortality rates

In [74]:
print(root.tag)
print(root[0].tag)
print(len(root))

mondial
country
3403
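
The root has 3403 children, which is far more than the number of countries, so the top level presumably mixes several element types. A minimal sketch of how the tags could be tallied is shown here. The inline `sample_xml` document and the helper name `count_top_level_tags` are hypothetical stand-ins; in the notebook the helper would be called on `root` instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document standing in for data/mondial.xml
sample_xml = """
<mondial>
  <country car_code="AL"><name>Albania</name></country>
  <country car_code="GR"><name>Greece</name></country>
  <river id="river-Drin"><name>Drin</name></river>
</mondial>
"""

def count_top_level_tags(root):
    """Count how often each tag occurs among the direct children of root."""
    return Counter(child.tag for child in root)

sample_root = ET.fromstring(sample_xml)
tag_counts = count_top_level_tags(sample_root)
print(tag_counts.most_common())  # [('country', 2), ('river', 1)]
```

Knowing which tags exist at the top level helps decide which `findall` or `iter` call to use for each question.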


In [75]:
root[0].attrib

{'car_code': 'AL',
 'area': '28750',
 'capital': 'cty-Albania-Tirane',
 'memberships': 'org-BSEC org-CEI org-CD org-SELEC org-CE org-EAPC org-EBRD org-EITI org-FAO org-IPU org-IAEA org-IBRD org-ICC org-ICAO org-ICCt org-Interpol org-IDA org-IFRCS org-IFC org-IFAD org-ILO org-IMO org-IMF org-IOC org-IOM org-ISO org-OIF org-ITU org-ITUC org-IDB org-MIGA org-NATO org-OSCE org-OPCW org-OAS org-OIC org-PCA org-UN org-UNCTAD org-UNESCO org-UNIDO org-UPU org-WCO org-WFTU org-WHO org-WIPO org-WMO org-UNWTO org-WTO'}

In [76]:
country1 = root[0]
list(country1)

[<Element 'name' at 0x281369860>,
 <Element 'population' at 0x2813699a0>,
 <Element 'population' at 0x2810da900>,
 <Element 'population' at 0x2810daa40>,
 <Element 'population' at 0x2810daa90>,
 <Element 'population' at 0x2810da860>,
 <Element 'population' at 0x2813c04a0>,
 <Element 'population' at 0x2813c0ea0>,
 <Element 'population' at 0x2813c0630>,
 <Element 'population' at 0x2813c0270>,
 <Element 'population_growth' at 0x2813c04f0>,
 <Element 'infant_mortality' at 0x2813c05e0>,
 <Element 'gdp_total' at 0x2813a0860>,
 <Element 'gdp_agri' at 0x2813a0770>,
 <Element 'gdp_ind' at 0x28135eb30>,
 <Element 'gdp_serv' at 0x28135e040>,
 <Element 'inflation' at 0x28135ee50>,
 <Element 'unemployment' at 0x28135ebd0>,
 <Element 'indep_date' at 0x28135ef40>,
 <Element 'government' at 0x2813966d0>,
 <Element 'encompassed' at 0x281396040>,
 <Element 'ethnicgroup' at 0x2813969f0>,
 <Element 'ethnicgroup' at 0x281396900>,
 <Element 'religion' at 0x281396950>,
 <Element 'religion' at 0x281396770>,
 

In [77]:
mortality_dict = {'name': [],
                  'infant_mortality': []}

# Collect the name and infant mortality of every country that reports one
for country in root.findall('country'):
    infant_mortality = country.find('infant_mortality')
    if infant_mortality is not None:
        mortality_dict['name'].append(country.find('name').text)
        # .text is a string; convert with float() before sorting numerically
        mortality_dict['infant_mortality'].append(infant_mortality.text)

In [78]:
pd.DataFrame(mortality_dict).sort_values(by="infant_mortality", ascending=False).head(10)

|     | name | infant_mortality |
|----:|:-----|-----------------:|
| 197 | Central African Republic | 92.86 |
| 214 | Guinea-Bissau | 90.92 |
| 198 | Chad | 90.3 |
| 159 | Argentina | 9.96 |
| 66 | Thailand | 9.86 |
| 58 | Bahrain | 9.68 |
| 123 | Greenland | 9.42 |
| 188 | Botswana | 9.38 |
| 129 | Sint Maarten | 9.05 |
| 99 | Sri Lanka | 9.02 |
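
The ordering above (92.86, 90.92, 90.3, then 9.96) is what a descending sort on strings produces, since `.text` values are compared character by character. It therefore does not list the lowest rates. A minimal sketch of a numeric, ascending ranking follows. The `sample_xml` document, its country names, and the helper name `lowest_infant_mortality` are hypothetical, and the sketch uses only the standard library so it runs without the data file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document standing in for data/mondial.xml
sample_xml = """
<mondial>
  <country><name>A-land</name><infant_mortality>92.86</infant_mortality></country>
  <country><name>B-land</name><infant_mortality>9.96</infant_mortality></country>
  <country><name>C-land</name><infant_mortality>2.48</infant_mortality></country>
  <country><name>D-land</name></country>
</mondial>
"""

def lowest_infant_mortality(root, n=10):
    """Return the n (name, rate) pairs with the smallest numeric rate."""
    rows = []
    for country in root.findall('country'):
        rate = country.find('infant_mortality')
        # Skip countries without a reported value
        if rate is not None and rate.text:
            rows.append((country.find('name').text, float(rate.text)))
    # Sort numerically, smallest first
    return sorted(rows, key=lambda row: row[1])[:n]

sample_root = ET.fromstring(sample_xml)
lowest = lowest_infant_mortality(sample_root, n=2)
print(lowest)  # [('C-land', 2.48), ('B-land', 9.96)]
```

On the existing DataFrame, converting the column with `pd.to_numeric` and calling `sort_values(by="infant_mortality", ascending=True)` should have the same effect.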


### 10 cities with the largest population

In [89]:
population_dict = {'name': [],
                   'population': []}

for country in root.findall('country'):
    name = country.find('name').text
    # find() returns the first <population> child of the country element
    pop = country.find('population').text

    population_dict['name'].append(name)
    # Convert the population from string to integer before adding it to the dict
    population_dict['population'].append(int(pop))

pd.DataFrame(population_dict).sort_values(by='population', ascending=False).head(10)

|     | name | population |
|----:|:-----|-----------:|
| 55 | China | 543776080 |
| 67 | India | 238396327 |
| 120 | United States | 157813040 |
| 23 | Russia | 102798657 |
| 98 | Japan | 82199470 |
| 88 | Indonesia | 72592192 |
| 11 | Germany | 68230796 |
| 176 | Brazil | 53974725 |
| 53 | United Kingdom | 50616012 |
| 7 | France | 40502513 |
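
The table above ranks countries by their first `<population>` entry, while the question asks about cities. A minimal sketch of a city-level ranking is shown here. It assumes that `<city>` elements sit somewhere below each `<country>` (directly or inside a `<province>`) and that each `<population>` carries a `year` attribute; both assumptions should be checked against the actual file. The `sample_xml` document and the helper name `largest_cities` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document standing in for data/mondial.xml
sample_xml = """
<mondial>
  <country car_code="X">
    <name>X-land</name>
    <population year="1950">100</population>
    <city id="c1">
      <name>Alpha</name>
      <population year="2001">500</population>
      <population year="2011">700</population>
    </city>
    <province>
      <name>North</name>
      <city id="c2">
        <name>Beta</name>
        <population year="2011">900</population>
      </city>
      <city id="c3"><name>Gamma</name></city>
    </province>
  </country>
</mondial>
"""

def largest_cities(root, n=10):
    """Return the n (city, country, population) tuples with the largest
    most recent population figure."""
    rows = []
    for country in root.findall('country'):
        country_name = country.find('name').text
        # iter() also reaches cities nested inside provinces
        for city in country.iter('city'):
            pops = city.findall('population')
            if not pops:
                continue  # no population recorded for this city
            # Assumption: the 'year' attribute marks when the figure was measured
            latest = max(pops, key=lambda p: int(p.get('year', 0)))
            rows.append((city.find('name').text, country_name, int(latest.text)))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]

sample_root = ET.fromstring(sample_xml)
top_cities = largest_cities(sample_root, n=2)
print(top_cities)  # [('Beta', 'X-land', 900), ('Alpha', 'X-land', 700)]
```

Picking the entry with the highest `year` avoids comparing decades-old figures for one city with recent figures for another.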


### Name and country of:

1. Longest river
2. Largest lake
3. Airport at highest elevation

In [90]:
list(country)

[<Element 'name' at 0x283c2c2c0>,
 <Element 'localname' at 0x283c2c310>,
 <Element 'population' at 0x283c2c360>,
 <Element 'population' at 0x283c2c3b0>,
 <Element 'population' at 0x283c2c400>,
 <Element 'population' at 0x283c2c450>,
 <Element 'population' at 0x283c2c4a0>,
 <Element 'population' at 0x283c2c4f0>,
 <Element 'population' at 0x283c2c540>,
 <Element 'population' at 0x283c2c590>,
 <Element 'population' at 0x283c2c5e0>,
 <Element 'population_growth' at 0x283c2c630>,
 <Element 'infant_mortality' at 0x283c2c680>,
 <Element 'gdp_total' at 0x283c2c6d0>,
 <Element 'gdp_agri' at 0x283c2c720>,
 <Element 'gdp_ind' at 0x283c2c770>,
 <Element 'gdp_serv' at 0x283c2c7c0>,
 <Element 'inflation' at 0x283c2c810>,
 <Element 'unemployment' at 0x283c2c860>,
 <Element 'indep_date' at 0x283c2c8b0>,
 <Element 'government' at 0x283c2c900>,
 <Element 'encompassed' at 0x283c2c950>,
 <Element 'ethnicgroup' at 0x283c2c9a0>,
 <Element 'religion' at 0x283c2c9f0>,
 <Element 'religion' at 0x283c2ca40>,
 <E