# <span style="color:Blue">Assignment-2 of COSC5806: Data Analysis with Python</span>

# <span style="color:Blue">Due date: Friday, February 13, 2025, @11:59 PM</span>
# <span style="color:Blue">Cut-off date: Sunday, February 15, 2025, @11:59 PM</span>

## <span style="color:Purple">You may use core Python's built-in modules/packages/libraries and NumPy. No other libraries are allowed, including pandas, scikit-learn, matplotlib, and Seaborn. Please read the instructions carefully and do not hesitate to contact me if you have any questions.</span>

### <span style="color:Red">Examples and Resources for this assignment:</span>
<ul>
    <li><span style="color:Red">Chapters 3, 4, 5, 6, 7, 8, and 9 from <a href="https://docs.python.org/3/tutorial/index.html">The Python Tutorial</a></span></li>
    <li><span style="color:Red">Chapter 2 from <a href="https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html">Introduction to NumPy</a></span></li>
</ul>

### <span style="color:Green">Context</span>
This dataset compiles daily snapshots of publicly reported data on 2019 Novel Coronavirus (COVID-19) testing in Ontario. Data includes:

<ul>
    <li><span>date</span></li>
    <li><span>OH region</span></li>
    <li><span>current hospitalizations with COVID-19</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) due to COVID-related critical illness</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) testing positive for COVID</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) no longer testing positive for COVID</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) on ventilators due to COVID-related critical illness</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) on ventilators testing positive for COVID</span></li>
    <li><span>current patients in Intensive Care Units (ICUs) on ventilators no longer testing positive for COVID</span></li>
</ul>

The following <a href="https://data.ontario.ca/dataset/covid-19-cases-in-hospital-and-icu-by-ontario-health-region">link</a> might be useful for the description of the features.

# <span style="color:Green">P1: Load the dataset.</span>

In [11]:
#Codes of P1 here

# <span style="color:Green">P2: How many unique values exist in the 'oh_region' column? What are the unique values in the 'oh_region' column? How many records exist per 'oh_region'?</span>

In [12]:
#Codes of P2 here
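
One possible approach to this question is sketched below. It is not a full solution: `sample_regions` is a hypothetical stand-in for the values read from the 'oh_region' column, so the sketch runs without the CSV file.

```python
import numpy as np

# Hypothetical sample of the 'oh_region' column; the real values come from the CSV.
sample_regions = np.array(["TORONTO", "WEST", "TORONTO", "EAST", "WEST", "TORONTO"])

# np.unique returns the sorted unique values; return_counts=True also
# returns how many times each unique value occurs.
unique_regions, record_counts = np.unique(sample_regions, return_counts=True)

# Number of unique regions.
print("Number of unique regions:", len(unique_regions))

# Unique values and the number of records for each one.
for region, count in zip(unique_regions, record_counts):
    print(f"{region}: {count} record(s)")
```

Using `np.unique` keeps the counting in a single call; a plain dictionary of counters would work equally well with core Python only.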

# <span style="color:Green">P3: What is the total number of hospitalizations? What is the average number of hospitalizations per day?</span>

In [13]:
#Codes of P3 here

# <span style="color:Green">P4: What are the top 5 days with the highest number of hospitalizations?</span>

In [14]:
#Codes of P4 here
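
A minimal sketch of one way to rank days is shown below. It assumes that the dataset has one row per region per day, so values are summed across regions before ranking, and that the columns are named 'date' and 'hospitalizations'; both are assumptions, and the sample arrays are hypothetical.

```python
import numpy as np

# Hypothetical sample data: one entry per (date, region) row.
dates = np.array(["2022-01-10", "2022-01-10", "2022-01-11", "2022-01-12",
                  "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-16"])
hospitalizations = np.array([300, 250, 320, 100, 410, 90, 505, 200])

# Map every row to the index of its date among the sorted unique dates.
unique_dates, date_index = np.unique(dates, return_inverse=True)

# Sum the hospitalizations of all rows that share the same date.
daily_totals = np.bincount(date_index, weights=hospitalizations)

# Sort the daily totals in descending order and keep the first five positions.
top_positions = np.argsort(daily_totals)[::-1][:5]

# Pair each selected date with its total.
top_days = [(str(unique_dates[i]), int(daily_totals[i])) for i in top_positions]

for day, total in top_days:
    print(day, total)
```

If the real data contains tied totals, the order among the tied days is not guaranteed by this sketch.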

# <span style="color:Green">P5: Which month(s) had the highest number of hospitalizations?</span>

In [15]:
#Codes of P5 here

# <span style="color:Green">P6: Are there any seasonal (Winter, Spring, Summer, Fall) patterns in COVID-19 hospitalizations?</span>

In [16]:
#Codes of P6 here
seasons = {12: 'Winter', 1: 'Winter', 2: 'Winter',
           3: 'Spring', 4: 'Spring', 5: 'Spring',
           6: 'Summer', 7: 'Summer', 8: 'Summer',
           9: 'Fall', 10: 'Fall', 11: 'Fall'}
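
The mapping above could be applied as sketched below. This is an illustration under assumptions: dates are taken to be ISO strings (`YYYY-MM-DD`), the value column is assumed to be named 'hospitalizations', and `sample_rows` is hypothetical data used so the sketch runs on its own.

```python
import numpy as np

# Same month-to-season mapping as in the cell, repeated so the sketch is self-contained.
seasons = {12: 'Winter', 1: 'Winter', 2: 'Winter',
           3: 'Spring', 4: 'Spring', 5: 'Spring',
           6: 'Summer', 7: 'Summer', 8: 'Summer',
           9: 'Fall', 10: 'Fall', 11: 'Fall'}

# Hypothetical (date, hospitalizations) pairs; real values come from the CSV.
sample_rows = [("2021-01-15", 120), ("2021-01-16", 130), ("2021-04-10", 200),
               ("2021-07-04", 40), ("2021-10-20", 60), ("2021-12-25", 90)]

# Group the hospitalization values by season.
season_values = {}
for date_text, hospitalizations in sample_rows:
    # The month is the second field of an ISO date string.
    month = int(date_text.split("-")[1])
    season_values.setdefault(seasons[month], []).append(hospitalizations)

# Average hospitalizations per season.
season_average = {season: float(np.mean(values))
                  for season, values in season_values.items()}

for season, average in season_average.items():
    print(f"{season}: {average:.2f}")
```

Comparing per-season averages rather than totals avoids favouring a season that simply has more recorded days.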

# <span style="color:Green">P7: Which region(s) are the busiest based on ICU occupancy ('icu_current_covid')? Calculate both the average and the total.</span>

In [17]:
#Codes of P7 here

# Import csv so we can read the CSV file row by row.
import csv
# Import numpy so we can calculate sums and averages.
import numpy as np

# Store the file name of the dataset.
DATASET_PATH = "e760480e-1f95-4634-a923-98161cfb02fa.csv"

# Create a helper function to safely convert a value to int.
def safe_int(value, default=None):
    # Use try so the program does not stop on bad data.
    try:
        # Check if the value is missing or empty.
        if value is None or str(value).strip() == "":
            # Return the default value when data is missing.
            return default

        # Convert first to float, then to int.
        converted_value = int(float(value))
        return converted_value
    except (TypeError, ValueError, OverflowError):
        # Return the default value when conversion fails
        # (non-numeric text, unsupported type, or infinity).
        return default

# Create a dictionary where:
# key   = region name
# value = list of ICU occupancy values for that region
region_icu_values = {}

# Open the dataset file for reading.
with open(DATASET_PATH, newline="", encoding="utf-8-sig") as dataset:
    # Read each row as a dictionary using column names.
    reader = csv.DictReader(dataset)

    # Loop through all rows in the dataset.
    for row in reader:
        # Get the region name and remove extra spaces.
        region_name = row.get("oh_region", "").strip()
        # Get ICU occupancy and safely convert it to int.
        icu_current_covid = safe_int(row.get("icu_current_covid"), default=None)

        # Check if the region is missing.
        if region_name == "":
            # Skip this row when region is missing.
            continue

        # Check if the ICU value is missing.
        if icu_current_covid is None:
            # Skip this row when ICU occupancy is missing.
            continue

        # If this region is not in the dictionary yet, create an empty list.
        if region_name not in region_icu_values:
            region_icu_values[region_name] = []

        # Append the ICU value to the correct region list.
        region_icu_values[region_name].append(icu_current_covid)

# Create a dictionary for total ICU occupancy per region.
region_icu_total = {}
# Create a dictionary for average ICU occupancy per region.
region_icu_average = {}

# Loop through each region and its list of ICU values.
for region_name, values in region_icu_values.items():
    # Calculate and store the total ICU occupancy for this region.
    region_icu_total[region_name] = int(np.sum(values))

    # Check if the list has values before calculating the average.
    if len(values) > 0:
        # Calculate and store the average ICU occupancy.
        region_icu_average[region_name] = float(np.mean(values))
    else:
        # Use 0.0 if the list is empty.
        region_icu_average[region_name] = 0.0

# Check if there is data in the total dictionary.
if region_icu_total:
    # Find the highest total ICU occupancy among all regions.
    max_total = max(region_icu_total.values())
else:
    # Use 0 when there is no data.
    max_total = 0

# Create a list to store all regions tied for the highest total.
busiest_regions_by_total = []

# Loop through the total dictionary.
for region_name, total_value in region_icu_total.items():
    # Compare the current total with the highest total.
    if total_value == max_total:
        # Add the region to the result list if it matches the maximum.
        busiest_regions_by_total.append(region_name)

# Sort the list alphabetically.
busiest_regions_by_total.sort()

# Check if there is data in the average dictionary.
if region_icu_average:
    # Find the highest average ICU occupancy among all regions.
    max_average = max(region_icu_average.values())
else:
    # Use 0 when there is no data.
    max_average = 0

# Create a list to store all regions tied for the highest average.
busiest_regions_avg = []

# Loop through the average dictionary.
for region_name, average_value in region_icu_average.items():
    # Compare the current average with the highest average.
    if average_value == max_average:
        # Add the region to the result list if it matches the maximum.
        busiest_regions_avg.append(region_name)

# Sort the list alphabetically.
busiest_regions_avg.sort()

# Print final answer for P7.
print("\nBusiest region by ICU Current COVID")
# Print busiest region(s) based on total ICU occupancy.
print("Busiest by TOTAL:", busiest_regions_by_total, "| total =", max_total)
# Print busiest region(s) based on average ICU occupancy.
print("Busiest by AVERAGE:", busiest_regions_avg, "| average =", round(max_average, 2))




Busiest region by ICU Current COVID
Busiest by TOTAL: ['TORONTO'] | total = 65005
Busiest by AVERAGE: ['TORONTO'] | average = 39.21


# <span style="color:Green">P8: What is the average number of ICU patients on ventilators ('icu_current_covid_vented') per region?</span>

In [None]:
#Codes of P8 here

import csv
import numpy as np

# Store the dataset file name.
DATASET_PATH = "e760480e-1f95-4634-a923-98161cfb02fa.csv"

# Create a helper function to safely convert values to int.
def safe_int(value, default=None):
    try:
        # Check if the value is missing or empty.
        if value is None or str(value).strip() == "":
            # Return the default value when data is missing.
            return default

        # Convert first to float, then to int.
        converted_value = int(float(value))
        return converted_value
    except (TypeError, ValueError, OverflowError):
        # Return the default value when conversion fails
        # (non-numeric text, unsupported type, or infinity).
        return default

# Create a dictionary where:
# key   = region name
# value = list of ventilated ICU values for that region
region_vented_values = {}

# Open the CSV dataset.
with open(DATASET_PATH, newline="", encoding="utf-8-sig") as dataset:
    # Read each row as a dictionary.
    reader = csv.DictReader(dataset)

    # Loop over each row.
    for row in reader:
        # Read region name.
        region_name = row.get("oh_region", "").strip()
        # Read ventilated ICU value and safely convert it.
        icu_vented = safe_int(row.get("icu_current_covid_vented"), default=None)

        # Check if the region is missing.
        if region_name == "":
            # Skip this row when region is missing.
            continue

        # Check if the ventilated ICU value is missing.
        if icu_vented is None:
            # Skip this row when the value is missing.
            continue

        # Initialize the list for this region if needed.
        if region_name not in region_vented_values:
            region_vented_values[region_name] = []

        # Add this ventilated value to the region list.
        region_vented_values[region_name].append(icu_vented)

# Create a dictionary for average ventilated ICU per region.
region_vented_average = {}

# Calculate the average ventilated ICU value for each region.
for region_name, values in region_vented_values.items():
    # Check if the region has at least one value.
    if len(values) > 0:
        # Calculate the average using numpy.
        region_vented_average[region_name] = float(np.mean(values))
    else:
        # Use 0.0 when the list is empty.
        region_vented_average[region_name] = 0.0

# Sort regions from highest average to lowest average.
sorted_regions = sorted(region_vented_average.items(), key=lambda x: x[1], reverse=True)

# Print final answer for P8.
print("\nAverage ICU Current COVID Vented per Region")

# Print one line per region with 2 decimal places.
for region_name, avg_value in sorted_regions:
    print(f"{region_name}: {avg_value:.2f}")




Average ICU Current COVID Vented per Region
TORONTO: 25.52
WEST: 22.51
CENTRAL: 17.75
EAST: 11.02
NORTH EAST: 1.71
NORTH WEST: 0.86


# <span style="color:Green">P9: Calculate the average number of patients per region who were in the ICU but not on ventilators ('icu_current_covid' - 'icu_current_covid_vented').</span>

In [19]:
#Codes of P9 here
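
The subtraction described in the question can be sketched with NumPy arrays as follows. The rows in `sample_rows` are hypothetical; in the notebook they would be read from the 'oh_region', 'icu_current_covid', and 'icu_current_covid_vented' columns.

```python
import numpy as np

# Hypothetical rows: (oh_region, icu_current_covid, icu_current_covid_vented).
sample_rows = [("TORONTO", 40, 25), ("TORONTO", 38, 26),
               ("WEST", 30, 20), ("WEST", 34, 24),
               ("EAST", 15, 11)]

# Split the rows into one array per column.
regions = np.array([row[0] for row in sample_rows])
icu_current = np.array([row[1] for row in sample_rows])
icu_vented = np.array([row[2] for row in sample_rows])

# Element-wise difference: ICU patients who are not on ventilators.
icu_not_vented = icu_current - icu_vented

# Average the difference over the rows belonging to each region.
average_not_vented = {}
for region in np.unique(regions):
    region_mask = regions == region
    average_not_vented[str(region)] = float(np.mean(icu_not_vented[region_mask]))

for region, average in average_not_vented.items():
    print(f"{region}: {average:.2f}")
```

Rows with a missing value in either column would need to be skipped before the subtraction, for example with a helper such as the `safe_int` function used in the earlier cells.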

# <span style="color:Green">P10: What are the top 5 days with the highest number of ICU COVID ('icu_current_covid') cases? </span>

In [20]:
#Codes of P10 here

### <span style="color:Red">Please submit only your complete Jupyter notebook (.ipynb) file. Do not submit compressed files, entire projects, or any other types of files. Comment your program carefully so that it can be read and understood. If your program is not properly commented, you may lose marks. See **Marking scheme** for more details.</span>

### <span style="color:Red">Please note that submitted work will be considered your own work. By submitting, you confirm that you have not received any unauthorized assistance, including from Large Language Models (LLMs), in preparing for or completing this lab/assignment/examination. You also confirm that you understand a mark of 0 may be assigned for the entire work.</span>

### <span style="color:Red">**Marking scheme:** You will receive full credit for working code; otherwise, zero. No partial credit!</span>