# XML exercise

Using data from the [**mondial database**](https://drive.google.com/file/d/14lFT4nWHgwN36ij4XZh6OUuup-K9qLgR/view?usp=sharing), answer the following questions:

1. 10 countries with the lowest infant mortality rates
2. 10 cities with the largest population
3. name and country of a) the longest river, b) the largest lake and c) the airport at the highest elevation

In [1]:
import xml.etree.ElementTree as ET

In [3]:
tree = ET.parse('mondial.xml')

In [5]:
root = tree.getroot()
root

<Element 'mondial' at 0x0000024DE00EA5C0>

In [7]:
print(root.tag)
print(root.attrib)
print(len(root))


mondial
{}
3403
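
`len(root)` reports 3403 direct children but not what kinds of elements they are. A minimal sketch for tallying child tags with `collections.Counter` follows; the inline sample document and the `count_child_tags` helper are illustrative and not part of the notebook. The same helper could be called on `root` from `mondial.xml`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in document (made up for illustration); the real file is mondial.xml.
SAMPLE = """<mondial>
  <country><name>A</name></country>
  <country><name>B</name></country>
  <river><name>R</name></river>
</mondial>"""

def count_child_tags(element):
    """Return a Counter mapping each direct child's tag to how often it occurs."""
    return Counter(child.tag for child in element)

sample_root = ET.fromstring(SAMPLE)
tag_counts = count_child_tags(sample_root)

# List tags from most to least frequent.
print(tag_counts.most_common())
```

Knowing which tags sit directly under the root helps decide between `root.findall('country')` (direct children only) and `root.findall('.//city')` (any depth).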


1. 10 countries with the lowest infant mortality rates

In [1]:
import pandas as pd
import xml.etree.ElementTree as ET

# Load the XML data
tree = ET.parse('mondial.xml')
root = tree.getroot()

# Initialize lists to store data
countries = []
infant_mortality_rates = []

# Extract country names and infant mortality rates
for country in root.findall('country'):
    name = country.find('name').text
    infant_mortality = country.find('infant_mortality')
    if infant_mortality is not None:
        infant_mortality = float(infant_mortality.text)
        countries.append(name)
        infant_mortality_rates.append(infant_mortality)

# Create a DataFrame
df = pd.DataFrame({
    'country': countries,
    'infant_mortality_rate': infant_mortality_rates
})

# Find the 10 countries with the lowest infant mortality rates
lowest_infant_mortality = df.sort_values(by='infant_mortality_rate').head(10)

print("10 countries with the lowest infant mortality rates:")
print(lowest_infant_mortality)

10 countries with the lowest infant mortality rates:
            country  infant_mortality_rate
36           Monaco                   1.81
90            Japan                   2.13
109         Bermuda                   2.48
34           Norway                   2.48
98        Singapore                   2.53
35           Sweden                   2.60
8    Czech Republic                   2.63
6             Spain                   2.70
72        Hong Kong                   2.73
73            Macao                   3.13
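
The loop above silently skips any country that has no `<infant_mortality>` element, so the ranking only covers countries with data. A small sketch of how the skipped countries could be collected for inspection, using an inline sample rather than the real file; `split_by_infant_mortality` and the "Exampleland" record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one country with a rate (value taken from the output above)
# and one hypothetical country without an <infant_mortality> element.
SAMPLE = """<mondial>
  <country><name>Monaco</name><infant_mortality>1.81</infant_mortality></country>
  <country><name>Exampleland</name></country>
</mondial>"""

def split_by_infant_mortality(root):
    """Return (rates, missing): a name -> rate dict and a list of names without a rate."""
    rates, missing = {}, []
    for country in root.findall('country'):
        name = country.findtext('name')
        value = country.findtext('infant_mortality')
        # findtext returns None when the child is absent and '' when it is empty.
        if value is None or not value.strip():
            missing.append(name)
        else:
            rates[name] = float(value)
    return rates, missing

rates, missing = split_by_infant_mortality(ET.fromstring(SAMPLE))
print(rates)
print(missing)
```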


2. 10 cities with the largest population

In [2]:
# Initialize lists to store city data
cities = []

# Extract each city's name and its last-listed <population> entry
# (assumed here to be the most recent figure)
for city in root.findall('.//city'):
    name = city.find('name').text
    population = city.find('.//population[last()]')
    if population is not None:
        population = int(population.text)
        cities.append((name, population))

# Create a DataFrame
cities_df = pd.DataFrame(cities, columns=['city', 'population'])

# Find the 10 cities with the largest population
largest_cities = cities_df.sort_values(by='population', ascending=False).head(10)

print("\n10 cities with the largest population:")
print(largest_cities)


10 cities with the largest population:
           city  population
1251   Shanghai    22315474
1334    Karachi    14916456
2866      Lagos    13745000
712    Istanbul    13710512
1422     Mumbai    12442373
448      Moskva    11979529
1250    Beijing    11716620
2811   Kinshasa    11575000
2596  São Paulo    11152344
1313     Lahore    11126285
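
The `population[last()]` predicate picks the last `<population>` entry in document order, which is the most recent figure only if the entries are stored chronologically. A hedged alternative, assuming each `<population>` element carries a `year` attribute (an assumption about the schema, not verified here), is to select by the largest year explicitly. The sample city and the `latest_population` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical city record with entries deliberately out of chronological order.
# The `year` attribute is an assumption about how mondial.xml stores measurements.
SAMPLE_CITY = """<city>
  <name>Exampleville</name>
  <population year="2011">300</population>
  <population year="1991">100</population>
  <population year="2001">200</population>
</city>"""

def latest_population(city):
    """Return (year, population) for the entry with the greatest year, or None."""
    entries = [
        (int(p.get('year')), int(p.text))
        for p in city.findall('population')
        if p.get('year') is not None and p.text
    ]
    # Tuples compare by year first, so max() yields the most recent entry.
    return max(entries) if entries else None

city = ET.fromstring(SAMPLE_CITY)
print(latest_population(city))                  # selection by year
print(city.find('population[last()]').text)     # selection by document order
```

On this sample the two approaches disagree, which is the case the year-based selection guards against.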


3. Name and country of:
a) Longest river
b) Largest lake
c) Airport at highest elevation

In [3]:
# Return the text of a child element, or None if the child is missing
def get_element_attr(element, attr):
    value = element.find(attr)
    return value.text if value is not None else None

# Find the longest river
rivers = []
for river in root.findall('.//river'):
    name = get_element_attr(river, 'name')
    length = get_element_attr(river, 'length')
    if name and length:
        rivers.append((name, float(length)))

longest_river = max(rivers, key=lambda x: x[1])

# Find the largest lake
lakes = []
for lake in root.findall('.//lake'):
    name = get_element_attr(lake, 'name')
    area = get_element_attr(lake, 'area')
    if name and area:
        lakes.append((name, float(area)))

largest_lake = max(lakes, key=lambda x: x[1])

# Find the airport at highest elevation
airports = []
for airport in root.findall('.//airport'):
    name = get_element_attr(airport, 'name')
    elevation = get_element_attr(airport, 'elevation')
    if name and elevation:
        airports.append((name, float(elevation)))

highest_airport = max(airports, key=lambda x: x[1])

print("\nName and country of:")
print(f"a) Longest river: {longest_river[0]} ({longest_river[1]} km)")
print(f"b) Largest lake: {largest_lake[0]} ({largest_lake[1]} sq. km)")
print(f"c) Airport at highest elevation: {highest_airport[0]} ({highest_airport[1]} meters)")


Name and country of:
a) Longest river: Yangtze (6380.0 km)
b) Largest lake: Caspian Sea (386400.0 sq. km)
c) Airport at highest elevation: El Alto Intl (4063.0 meters)
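
The cell above reports names only, while the question also asks for the country. A minimal sketch of one way to add it, assuming that `river`, `lake` and `airport` elements carry a `country` attribute holding space-separated codes, and that these codes match a `car_code` attribute on the `country` elements. Both are assumptions about the schema; the sample data and the `largest_feature` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the assumed schema: features reference countries
# through a space-separated `country` attribute of car codes.
SAMPLE = """<mondial>
  <country car_code="AA"><name>Alphaland</name></country>
  <country car_code="BB"><name>Betaland</name></country>
  <river country="AA BB"><name>Short River</name><length>10</length></river>
  <river country="BB"><name>Long River</name><length>25.5</length></river>
</mondial>"""

def largest_feature(root, tag, measure):
    """Return (value, name, country_names) for the <tag> with the largest <measure>."""
    # Map car codes to country names once, up front.
    code_to_name = {c.get('car_code'): c.findtext('name')
                    for c in root.findall('country')}
    best = None
    for feature in root.findall(f'.//{tag}'):
        name = feature.findtext('name')
        value = feature.findtext(measure)
        if not name or not value:
            continue  # skip features without a name or a measurement
        codes = feature.get('country', '').split()
        # Fall back to the raw code when no matching country is found.
        countries = [code_to_name.get(code, code) for code in codes]
        candidate = (float(value), name, countries)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best

value, name, countries = largest_feature(ET.fromstring(SAMPLE), 'river', 'length')
print(f"{name} ({value} km): {', '.join(countries)}")
```

Under the same assumptions, the helper could be called as `largest_feature(root, 'lake', 'area')` and `largest_feature(root, 'airport', 'elevation')` on the real data.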
