## Exercise 3 - Data sources
- All files used in this exercise can be found under the Exercises/data_files directory

1 Use gamedata.json for this task. This file contains information about games sold through Steam. Parse out the following information from the data (Important: Do not combine these filters; apply each one separately!):
- The three highest Metacritic scores. Present results using the following format: *Title* has metacritic score of *Score* (for example)
- Games with a price discount of 90 % or more. Present results using the following format: *Title* | Discount: *Savings* (for example Metal Gear Solid V: Ground Zeroes | Discount: 90.090090)
- Games with a Metacritic score higher than their Steam score. Present results using the following format: *Title* has metacritic score of *MetacriticScore* and steam score of *SteamRatingPercent*

In [26]:
# To keep the results readable, each filter prints its own heading and is separated from the next by an empty line

import json

file_path = 'data_files/gamedata.json'

with open(file_path) as gamedata:
    games = json.load(gamedata)

print("Top 3 metacritic scores are:")

# Compare scores numerically in case the JSON stores them as strings
top3 = sorted(games, key=lambda x: int(x['metacriticScore']), reverse=True)
for game in top3[:3]:
    title = game['title']
    score = game['metacriticScore']
    print(f"{title} has metacritic score of {score}")

print(" ")

print("Discounts:")

# Sort on the numeric value of 'savings' and keep the list name distinct from the loop variables
by_discount = sorted(games, key=lambda x: float(x['savings']), reverse=True)
for game in by_discount:
    title = game['title']
    discount = float(game['savings'])
    original = float(game['normalPrice'])
    new = float(game['salePrice'])
    savings = original - new
    if discount >= 90:
        print(f"{title} | Savings: {savings}€ Discount:{discount}")
        
print(" ")
print("Reviews")

reviews = sorted(games, key=lambda x: x['title'])
for game in reviews:
    title = game['title']
    # Convert to int so the comparison is numeric rather than lexicographic
    metareview = int(game['metacriticScore'])
    steamreview = int(game['steamRatingPercent'])
    if metareview > steamreview:
        print(f"{title} has metacritic score of {metareview} and a steam score of {steamreview}")

Top 3 metacritic scores are:
Star Wars: Knights of the Old Republic has metacritic score of 93
Metal Gear Solid V: The Phantom Pain has metacritic score of 91
Bayonetta has metacritic score of 90
 
Discounts:
Airscape: The Fall of Gravity | Savings: 4.5€ Discount:90.180361
Making History: The Calm and the Storm | Savings: 4.5€ Discount:90.180361
Phantaruk | Savings: 4.5€ Discount:90.180361
Oozi Earth Adventure | Savings: 4.5€ Discount:90.180361
House of Caravan | Savings: 4.5€ Discount:90.180361
Avencast: Rise of the Mage | Savings: 9.0€ Discount:90.09009
Teslagrad | Savings: 9.0€ Discount:90.09009
Lucius | Savings: 9.0€ Discount:90.09009
The Way | Savings: 13.5€ Discount:90.06004
NEON STRUCT | Savings: 16.2€ Discount:90.050028
Metal Gear Solid V: Ground Zeroes | Savings: 18.0€ Discount:90.045023
White Wings  | Savings: 18.0€ Discount:90.045023
The Long Journey Home | Savings: 18.0€ Discount:90.045023
Shadow Tactics: Blades of the Shogun | Savings: 36.0€ Discount:90.022506
 
Reviews
Bi
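
The two score-based filters can also be sketched as small functions. This is a minimal illustration, not the exercise's required solution: it assumes the scores may be stored as strings (the sample records below are hypothetical and only reuse the field names from the code above), so every comparison converts to `int` first. `heapq.nlargest` picks the top entries without sorting the whole list.

```python
import heapq

# Hypothetical sample records; only the field names come from the exercise code.
# Scores are strings on purpose, to show why numeric conversion matters.
sample_games = [
    {"title": "Game A", "metacriticScore": "88", "steamRatingPercent": "92"},
    {"title": "Game B", "metacriticScore": "100", "steamRatingPercent": "95"},
    {"title": "Game C", "metacriticScore": "9", "steamRatingPercent": "70"},
    {"title": "Game D", "metacriticScore": "93", "steamRatingPercent": "90"},
]


def top_by_metacritic(games, n=3):
    """Return the n games with the highest numeric Metacritic score."""
    return heapq.nlargest(n, games, key=lambda g: int(g["metacriticScore"]))


def metacritic_beats_steam(games):
    """Return games whose Metacritic score is numerically above the Steam score."""
    return [
        g for g in games
        if int(g["metacriticScore"]) > int(g["steamRatingPercent"])
    ]


for game in top_by_metacritic(sample_games):
    print(f"{game['title']} has metacritic score of {game['metacriticScore']}")
```

Without the `int` conversion, the string `"9"` would compare as greater than `"70"`, so "Game C" would wrongly pass the third filter.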

2 Use earthquakes.csv for this task. This file contains information about earthquakes recorded between 1965 and 2016. The magnitude value describes how strong an earthquake is. Magnitudes can be categorized as presented in the table below (*Source: http://www.geo.mtu.edu/UPSeis/magnitude.html*).

| Magnitude       | Class | Effects |
|-----------------|-------|---------|
| 2.49 or less    | Minor | Usually not felt, but can be recorded by seismograph. |
| 2.50 to 5.49    | Light | Often felt, but only causes minor damage. |
| 5.50 to 6.09    | Moderate | Slight damage to buildings and other structures. |
| 6.10 to 6.99    | Strong | May cause a lot of damage in very populated areas. |
| 7.00 to 7.99    | Major | Major earthquake. Serious damage. |
| 8.00 or greater | Great | Great earthquake. Can totally destroy communities near the epicenter. |

Count how many earthquakes have occurred in each class.

<b style="color:red;">Notice:</b> The first range has been changed to 2.49 or less; the original source uses 2.5 or less.

In [20]:
import csv

file_path = 'data_files/earthquakes.csv'

with open(file_path) as quake:
    quakes = csv.reader(quake)
    skip_firstRow = next(quakes, None)
    
    magnitude_classes = [
        {"range": "2.49 or less", "class": "Minor", "effects": "Usually not felt, but can be recorded by seismograph."},
        {"range": "2.50 to 5.49", "class": "Light", "effects": "Often felt, but only causes minor damage."},
        {"range": "5.50 to 6.09", "class": "Moderate", "effects": "Slight damage to buildings and other structures."},
        {"range": "6.10 to 6.99", "class": "Strong", "effects": "May cause a lot of damage in very populated areas."},
        {"range": "7.00 to 7.99", "class": "Major", "effects": "Major earthquake. Serious damage."},
        {"range": "8.00 or greater", "class": "Great", "effects": "Great earthquake. Can totally destroy communities near the epicenter."}
    ]
    
    class_counts = {
        "Minor": 0,
        "Light": 0,
        "Moderate": 0,
        "Strong": 0,
        "Major": 0,
        "Great": 0
    }
    
    for row in quakes:
        # Column index 8 holds the magnitude
        magnitude = float(row[8])

        # Half-open ranges, so boundary values such as 5.49 or 6.09
        # land in their own class instead of falling through to "Great"
        if magnitude < 2.5:
            earthquake_class = "Minor"
        elif magnitude < 5.5:
            earthquake_class = "Light"
        elif magnitude < 6.1:
            earthquake_class = "Moderate"
        elif magnitude < 7.0:
            earthquake_class = "Strong"
        elif magnitude < 8.0:
            earthquake_class = "Major"
        else:
            earthquake_class = "Great"
        class_counts[earthquake_class] += 1


for magnitude_class in magnitude_classes:
    earthquake = magnitude_class['class']
    magnitude = magnitude_class['range']
    effects = magnitude_class['effects']
    count = class_counts[earthquake]
    print(f"| {magnitude} | {earthquake} | {effects} | {count} |")

| 2.49 or less | Minor | Usually not felt, but can be recorded by seismograph. | 0 |
| 2.50 to 5.49 | Light | Often felt, but only causes minor damage. | 0 |
| 5.50 to 6.09 | Moderate | Slight damage to buildings and other structures. | 17639 |
| 6.10 to 6.99 | Strong | May cause a lot of damage in very populated areas. | 5035 |
| 7.00 to 7.99 | Major | Major earthquake. Serious damage. | 698 |
| 8.00 or greater | Great | Great earthquake. Can totally destroy communities near the epicenter. | 40 |
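
The classification step can also be written as a lookup over the class boundaries. The sketch below is one possible alternative, not the exercise's prescribed method: it uses `bisect_right` from the standard library on the lower bounds taken from the table, which treats each class as a half-open interval. The names `LOWER_BOUNDS`, `CLASS_NAMES`, and `classify`, as well as the sample magnitudes, are illustrative assumptions.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of every class except the first, taken from the table above.
LOWER_BOUNDS = [2.5, 5.5, 6.1, 7.0, 8.0]
CLASS_NAMES = ["Minor", "Light", "Moderate", "Strong", "Major", "Great"]


def classify(magnitude):
    """Map a magnitude to its class using half-open intervals [low, high)."""
    return CLASS_NAMES[bisect_right(LOWER_BOUNDS, magnitude)]


# Hypothetical magnitudes chosen to sit on or next to the class boundaries.
sample = [2.49, 2.5, 5.49, 5.5, 6.09, 6.1, 6.99, 7.0, 7.99, 8.0]
counts = Counter(classify(m) for m in sample)

for name in CLASS_NAMES:
    print(f"{name}: {counts[name]}")
```

Because the boundaries live in one list, there is no gap between the upper limit of one class and the lower limit of the next.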


3 Use netflix_titles.xml for this task. This file contains information about Netflix movies and TV shows. **Important:** Movie durations are given in minutes, while TV show durations are given as a number of seasons! Parse out the following information from the data:
- Movies released in 2017
- The number of TV shows and the number of movies (present the two counts on separate lines)
- Movies with a length between 15 and 20 minutes (values 15 and 20 included)

In [163]:
import xml.etree.ElementTree as ET

xml_file_path = 'data_files/netflix_titles.xml'

# ET.parse is a function that reads the XML file and returns an ElementTree object
tree = ET.parse(xml_file_path)
# getroot returns the top-level element, from which the rest of the document can be navigated
root = tree.getroot()

movies_2017_count = 0
for row_element in root.findall('row'):
    show_type = row_element.find('type').text
    release_year = int(row_element.find('release_year').text)
    if release_year == 2017 and show_type == "Movie":
        movies_2017_count += 1
        
moviescount = 0
tvcount = 0

for row_element in root.findall('row'):
    show_type = row_element.find('type').text
    if show_type == "Movie":
        moviescount += 1
    elif show_type == "TV Show":
        tvcount += 1

movies_15_to_20_minutes_count = 0

for row_element in root.findall('row'):
    show_type = row_element.find('type').text
    duration_element = row_element.find('duration')
    if duration_element is not None:
        duration_text = duration_element.text
        duration_numeric = int(duration_text.split(' ')[0])
        if show_type == 'Movie' and 15 <= duration_numeric <= 20:
            movies_15_to_20_minutes_count += 1

            
print(f"Number of movies released in 2017: {movies_2017_count}")
print(f"Number of movies: {moviescount}")
print(f"Number of TV shows: {tvcount}")
print(f"Number of movies with a duration between 15 and 20 minutes: {movies_15_to_20_minutes_count}")

Number of movies released in 2017: 744
Number of movies: 5377
Number of TV shows: 2410
Number of movies with a duration between 15 and 20 minutes: 11
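
The three counts can also be collected in a single pass over the rows. The sketch below illustrates that idea on a small inline document; the sample XML is hypothetical and only mirrors the `row`, `type`, `release_year`, and `duration` elements used above, and the function name `summarize` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample shaped like the <row> elements read in the code above.
SAMPLE_XML = """
<root>
  <row><type>Movie</type><release_year>2017</release_year><duration>18 min</duration></row>
  <row><type>Movie</type><release_year>2016</release_year><duration>95 min</duration></row>
  <row><type>TV Show</type><release_year>2017</release_year><duration>2 Seasons</duration></row>
  <row><type>Movie</type><release_year>2017</release_year><duration>20 min</duration></row>
</root>
"""


def summarize(root):
    """Return (counts per type, movies released in 2017, movies of 15 to 20 minutes)."""
    type_counts = Counter()
    movies_2017 = 0
    short_movies = 0
    for row in root.findall("row"):
        show_type = row.findtext("type")
        type_counts[show_type] += 1
        if show_type != "Movie":
            continue  # the remaining checks apply to movies only
        if row.findtext("release_year") == "2017":
            movies_2017 += 1
        duration = row.findtext("duration")
        if duration:
            minutes = int(duration.split()[0])  # "18 min" -> 18
            if 15 <= minutes <= 20:
                short_movies += 1
    return type_counts, movies_2017, short_movies


type_counts, movies_2017, short_movies = summarize(ET.fromstring(SAMPLE_XML))
print(f"Number of movies released in 2017: {movies_2017}")
print(f"Number of movies: {type_counts['Movie']}")
print(f"Number of TV shows: {type_counts['TV Show']}")
print(f"Number of movies with a duration between 15 and 20 minutes: {short_movies}")
```

Skipping non-movie rows early also means the "2 Seasons" duration of a TV show is never interpreted as minutes.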


4 Use the following REST API for this task: https://tie.digitraffic.fi/api/weather/v1/stations/data. Calculate the average air temperature (ILMA) and the average humidity (ILMAN_KOSTEUS), each rounded to two decimals.

In [27]:
import requests

url = "https://tie.digitraffic.fi/api/weather/v1/stations/data"
# a timeout keeps the request from waiting indefinitely if the server does not respond
response = requests.get(url, timeout=10)

# status code 200 is a successful request
if response.status_code == 200:
    # convert the json information to be readable by python
    data = response.json()

    # empty lists for variables
    ilma_values = []
    ilman_kosteus_values = []

    # go through every station in the response
    for station in data.get("stations", []):

        # get sensor values from stations
        for sensor_value in station.get("sensorValues", []):
            sensor_name = sensor_value.get("name")
            value = sensor_value.get("value")

            # skip readings without a value so they cannot break the sum
            if value is None:
                continue

            # check sensor name
            if sensor_name == 'ILMA':
                ilma_values.append(value)
            elif sensor_name == 'ILMAN_KOSTEUS':
                ilman_kosteus_values.append(value)

    # ilma avg
    if ilma_values:
        ilma_average = sum(ilma_values) / len(ilma_values)
        print(f"Average ILMA: {ilma_average:.2f}")

    # ilmankosteus avg
    if ilman_kosteus_values:
        ilman_kosteus_average = sum(ilman_kosteus_values) / len(ilman_kosteus_values)
        print(f"Average ILMAN_KOSTEUS: {ilman_kosteus_average:.2f}")


Average ILMA: -12.82
Average ILMAN_KOSTEUS: 77.64
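
The averaging logic can be separated from the network call so that it can be checked without contacting the API. The sketch below is a minimal illustration under that assumption: the payload is hypothetical and only mirrors the `stations`, `sensorValues`, `name`, and `value` keys read in the code above, and `sensor_average` is a helper name invented for this example.

```python
from statistics import mean

# Hypothetical payload shaped like the response handled in the code above.
sample_payload = {
    "stations": [
        {"id": 1, "sensorValues": [
            {"name": "ILMA", "value": -10.0},
            {"name": "ILMAN_KOSTEUS", "value": 80.0},
        ]},
        {"id": 2, "sensorValues": [
            {"name": "ILMA", "value": -15.5},
            {"name": "ILMAN_KOSTEUS", "value": None},  # missing reading
            {"name": "OTHER_SENSOR", "value": -9.0},   # ignored by both averages
        ]},
    ]
}


def sensor_average(payload, sensor_name):
    """Average all non-missing values of one sensor, rounded to two decimals."""
    values = [
        sensor["value"]
        for station in payload.get("stations", [])
        for sensor in station.get("sensorValues", [])
        if sensor.get("name") == sensor_name and sensor.get("value") is not None
    ]
    return round(mean(values), 2) if values else None


print(f"Average ILMA: {sensor_average(sample_payload, 'ILMA'):.2f}")
print(f"Average ILMAN_KOSTEUS: {sensor_average(sample_payload, 'ILMAN_KOSTEUS'):.2f}")
```

With the live API, the same function could be called on the dictionary returned by `response.json()`.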
