# Sub-Task 1: Load and Inspect All Datasets

In this section, we will load all the given CSV datasets into Pandas DataFrames and perform an initial inspection of their contents.

**Objectives:**
- Understand each dataset's structure by checking the columns, data types, and number of rows and columns.
- Preview a few rows to get a sense of the data format and potential issues.
- Use these findings to guide later steps, such as handling missing values and duplicates and drafting a data cleaning plan.
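
As a rough illustration of the checks mentioned in the objectives, the sketch below counts duplicate rows and missing values per column. The `summarize_quality` helper and the toy DataFrame (including its column names) are hypothetical and only stand in for the real datasets.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Return row count, duplicate-row count, and missing values per column."""
    return {
        "rows": len(df),
        # duplicated() marks every repeat of an earlier identical row
        "duplicate_rows": int(df.duplicated().sum()),
        # isna().sum() counts missing entries column by column
        "missing_per_column": df.isna().sum().to_dict(),
    }


# Hypothetical toy data, not taken from any of the project files
toy = pd.DataFrame(
    {
        "Entity": ["A", "A", "B", "C"],
        "Year": [2021, 2021, 2021, 2022],
        "value": [1.0, 1.0, None, 3.0],
    }
)

summary = summarize_quality(toy)
print(summary)
```

The same helper could be called on each loaded DataFrame once the inspection step has run.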

In [2]:
# %%
# =============================================================================
# CodeGenesis Team
# Sub-Task 1: Load and Inspect All Datasets
#
# In this code block, we will:
# - Import necessary libraries.
# - Load the CSV files into Pandas DataFrames.
# - Inspect each dataset, displaying columns, data types, shape, and sample rows.
#
# Documentation:
# - The 'load_dataset' function helps us keep the code organized.
# - The 'inspect_dataset' function prints out essential information, providing
#   a quick overview of each dataset.
#
# This process sets the stage for the subsequent cleaning and analysis steps.
# =============================================================================

import pandas as pd
from IPython.display import display  # explicit import so display() also works outside a notebook

def load_dataset(path: str) -> pd.DataFrame:
    """
    Load a CSV file into a Pandas DataFrame.

    Parameters
    ----------
    path : str
        The file path of the CSV dataset.

    Returns
    -------
    pd.DataFrame
        DataFrame containing the loaded dataset.
    """
    return pd.read_csv(path)

# File paths (assuming the files are in 'data/raw', relative to the current working directory)
vacc_death_rate_path = 'data/raw/covid-vaccinations-vs-covid-death-rate.csv'
vaccine_doses_manufacturer_path = 'data/raw/covid-vaccine-doses-by-manufacturer.csv'
oecd_health_path = 'data/raw/OECD_health_expenditure.csv'
us_death_by_vacc_status_path = 'data/raw/united-states-rates-of-covid-19-deaths-by-vaccination-status.csv'

# Load the datasets into DataFrames
df_vacc_death_rate = load_dataset(vacc_death_rate_path)
df_vaccine_doses_manufacturer = load_dataset(vaccine_doses_manufacturer_path)
df_oecd_health = load_dataset(oecd_health_path)
df_us_death_vacc_status = load_dataset(us_death_by_vacc_status_path)

def inspect_dataset(df: pd.DataFrame, name: str) -> None:
    """
    Print basic information about the given DataFrame, including:
    - Shape (rows, columns)
    - First 5 rows
    - Data types and memory usage

    Parameters
    ----------
    df : pd.DataFrame
        The dataset to inspect.
    name : str
        A descriptive name for the dataset, used in printed output.
    """
    print(f"\n=== {name} ===")
    print("Shape:", df.shape)
    print("First 5 rows:")
    display(df.head())
    print("\nInfo:")
    df.info()

# Inspect each dataset
inspect_dataset(df_vacc_death_rate, "COVID Vaccinations vs COVID Death Rate")
inspect_dataset(df_vaccine_doses_manufacturer, "COVID Vaccine Doses by Manufacturer")
inspect_dataset(df_oecd_health, "OECD Health Expenditure")
inspect_dataset(df_us_death_vacc_status, "US Rates of COVID-19 Deaths by Vaccination Status")


FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/covid-vaccinations-vs-covid-death-rate.csv'
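
The `FileNotFoundError` above indicates that the relative path `data/raw/...` did not resolve from the directory the cell ran in. A minimal sketch of a pre-flight check, assuming the same four relative paths, is shown below; `find_missing_files` is a hypothetical helper, not part of the original code.

```python
from pathlib import Path


def find_missing_files(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


# Same relative paths as used in the loading cell
expected_paths = [
    "data/raw/covid-vaccinations-vs-covid-death-rate.csv",
    "data/raw/covid-vaccine-doses-by-manufacturer.csv",
    "data/raw/OECD_health_expenditure.csv",
    "data/raw/united-states-rates-of-covid-19-deaths-by-vaccination-status.csv",
]

missing = find_missing_files(expected_paths)
if missing:
    # Printing the working directory shows where the relative paths were resolved from
    print(f"Working directory: {Path.cwd()}")
    for p in missing:
        print(f"Missing: {p}")
else:
    print("All expected files found.")
```

If files are reported missing, either move the CSVs into `data/raw` under the printed working directory or adjust the paths before calling `load_dataset`.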