# Exercises


## Assignment 01

Write a function ``assignment_06_01`` that reads the random numbers in the files with `csv` extension under ``data/random_numbers``, sums up all values and returns the result. Try to avoid reading the entire file in memory and avoid using a library like pandas or numpy.  


In [2]:
import glob
import os
import itertools

def assignment_06_01():
    # finds all csv files in data/random_numbers
    files = glob.glob(os.path.join("data", "random_numbers", "*.csv"))
    
    sum_of_values = 0

    # Iterate each .csv file from path ~/data/random_numbers/*.csv
    for file_path in files:
        # Open the file
        with open(file_path, 'r') as file:
            # Iterate each row in the file
            for row in file:
                # Split values from row, seperate by ' ' 
                values = row.strip().split(' ')

                # Use itertools.chain to flatten the values and convert each to an integer
                # Flattening transform nested structure ['1', '2', '3', ['4', '5', '6'], '7', '8', '9'] 
                # into flat structure ['1', '2', '3', '4', '5', '6', '7', '8', '9'] 
                sum_of_values += sum(map(int, itertools.chain(*values))) # *values unpacking the list of values into separate arguments for itertools.chain

    return sum_of_values


In [9]:
assignment_06_01() == 203455

36109

## Assignment 02

Write a function ``assignment_06_02`` that reads Wikipedia html pages and extracts the infobox key-value pairs as strings. The infobox is the blue table in the top right of wikipedia pages.


In [5]:
from bs4 import BeautifulSoup
import requests

beuth_url = "https://de.wikipedia.org/wiki/Beuth_Hochschule_f%C3%BCr_Technik_Berlin"


def assignment_06_02(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    infobox = {}

    # Select the table with class "infobox" in the HTML content
    table = soup.select_one("table.infobox")
    
    # Iterate each row in the infobox table
    for row in table.select("tr"):
        # Get the header key (th) within the current row
        key_element = row.select_one("th")

        # Get the data value (td) within the current row 
        value_element = row.select_one("td")

        # Check if both header and data (key-value) exisited
        if key_element and value_element:
            # Extract the content from header (key) and data (value)
            key = key_element.get_text(strip=True)
            value = value_element.get_text(strip=True)

            # Add the key-value pair to infobox (dictionary)
            infobox[key] = value
    
    return infobox

In [6]:
infobox = assignment_06_02(beuth_url)
assert infobox["Ort"] == "Berlin-Wedding"

## Assignment 03

Write a function ``assignment_06_03`` that reads the information about all Christmas markets in Berlin and returns the name of the district that has most registered Christmas markets.

In [7]:
import json
import requests
import pandas as pd


def assignment_06_03():
    christmas_market_url = (
        "https://www.berlin.de/sen/web/service/maerkte-feste/weihnachtsmaerkte/index.php/index/all.json?q="
    )
    data = json.loads(requests.get(christmas_market_url).content)
    
    # Create a DataFrame from extracting the list of district(s) from API
    df = pd.DataFrame(data['index'])

    # Group by bezirk (district) and count occurrences for each bezirk
    district_counts = df.groupby('bezirk').size().to_dict()
        
    # Find the 'district' key which has the most occurrences
    district_with_most_christmas_markets = max(district_counts, key=district_counts.get)

    return district_with_most_christmas_markets

In [8]:
assert assignment_06_03() == "Charlottenburg-Wilmersdorf"