# Data loading and indicator selection

This notebook performs the initial data ingestion step for the GIGA school connectivity analysis.

The objectives of this notebook are to:
- load raw ITU ICT indicator datasets from CSV files;
- inspect their structure and metadata;
- select a subset of indicators relevant to school connectivity;
- apply basic filtering by country and year.

Apart from this selection and filtering, no data cleaning or transformation is performed at this stage.
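The selection and filtering objectives can be sketched as follows. This is a minimal illustration, not the notebook's own code. It assumes each table is in long format with hypothetical `country` and `year` columns. The indicator keys, country list and year range are placeholders, and the toy data are invented for the example, not ITU values.

```python
import pandas as pd

# Hypothetical settings: replace with the project's actual choices.
SELECTED_INDICATORS = ["hh_internet", "fixed_broadband"]
SELECTED_COUNTRIES = ["Kenya", "Rwanda"]
YEAR_MIN, YEAR_MAX = 2018, 2022


def filter_country_year(
    df: pd.DataFrame,
    countries: list[str],
    year_min: int,
    year_max: int,
    country_col: str = "country",  # assumed column name
    year_col: str = "year",  # assumed column name
) -> pd.DataFrame:
    """Keep rows for the given countries with year_min <= year <= year_max."""
    mask = df[country_col].isin(countries) & df[year_col].between(year_min, year_max)
    return df.loc[mask].reset_index(drop=True)


# Toy tables standing in for the loaded ITU DataFrames (invented values).
toy = pd.DataFrame(
    {
        "country": ["Kenya", "Kenya", "Rwanda", "Peru"],
        "year": [2017, 2020, 2021, 2020],
        "value": [10.0, 12.5, 8.0, 30.0],
    }
)
all_tables = {"hh_internet": toy, "fixed_broadband": toy.copy(), "uas_policy": toy.copy()}

# Step 1: keep only the selected indicators. Step 2: filter by country and year.
selected = {
    name: filter_country_year(all_tables[name], SELECTED_COUNTRIES, YEAR_MIN, YEAR_MAX)
    for name in SELECTED_INDICATORS
}
print(selected["hh_internet"])
```

`Series.between` keeps both year bounds inclusive by default, so a range of 2018 to 2022 includes both end years.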

# -----------------------------
# 02_data_loading_and_selection.ipynb
# -----------------------------

# Import libraries
import pandas as pd
from pathlib import Path

# Path to the folder with the raw CSV files (machine-specific; adjust to your local checkout)
DATA_DIR = Path(r"C:\Users\Den\PycharmProjects\giga-school-connectivity-analysis\data\raw")

# -----------------------------
# Access
# -----------------------------
hh_internet = pd.read_csv(DATA_DIR / "households_with_internet_access_at_home.csv")
fixed_broadband = pd.read_csv(DATA_DIR / "fixed_broadband_subscriptions.csv")
active_mobile_bb = pd.read_csv(DATA_DIR / "active_mobile_broadband_subscriptions.csv")
pop_coverage_mobile = pd.read_csv(DATA_DIR / "population_coverage_mobile_network.csv")

# -----------------------------
# Quality of Service (QoS)
# -----------------------------
avg_download_fixed = pd.read_csv(DATA_DIR / "average_download_throughput_fixed_broadband.csv")
packet_latency_fixed = pd.read_csv(DATA_DIR / "packet_latency_fixed_broadband.csv")
service_activation_fixed = pd.read_csv(DATA_DIR / "service_activation_time_fixed_broadband.csv")

# -----------------------------
# Backbone / Core infrastructure
# -----------------------------
intl_bandwidth_usage = pd.read_csv(DATA_DIR / "international_bandwidth_usage.csv")
lit_equipped_bandwidth = pd.read_csv(DATA_DIR / "lit_equipped_international_bandwidth_capacity.csv")

# -----------------------------
# Affordability / ICT prices
# -----------------------------
fixed_bb_5GB = pd.read_csv(DATA_DIR / "fixed_broadband_internet_5GB.csv")
data_only_bb_5GB = pd.read_csv(DATA_DIR / "data_only_mobile_broadband_5GB.csv")

# -----------------------------
# Governance / Universal access
# -----------------------------
uas_policy = pd.read_csv(DATA_DIR / "uas_policy.csv")
universal_service_financing = pd.read_csv(DATA_DIR / "universal_service_financing.csv")

# -----------------------------
# Quick check: print the shapes of the Access indicators
# -----------------------------
print("All indicators have been successfully loaded:")
print(f"Households with Internet access: {hh_internet.shape}")
print(f"Fixed broadband subscriptions: {fixed_broadband.shape}")
print(f"Active mobile broadband: {active_mobile_bb.shape}")
print(f"Population coverage mobile: {pop_coverage_mobile.shape}")

All indicators have been successfully loaded:
Households with Internet access: (3964, 13)
Fixed broadband subscriptions: (8844, 13)
Active mobile broadband: (6271, 13)
Population coverage mobile: (9609, 13)
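The quick check above reports only four of the thirteen tables. A more compact alternative might keep the file names in one mapping, verify that every file exists before reading, and summarize the structure of every table (rows, columns, missing cells). The dictionary keys reuse the variable names from the loading cell; the helpers `load_indicators` and `summarize` are hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Indicator name -> raw CSV file name (same files as in the loading cell).
INDICATOR_FILES = {
    "hh_internet": "households_with_internet_access_at_home.csv",
    "fixed_broadband": "fixed_broadband_subscriptions.csv",
    "active_mobile_bb": "active_mobile_broadband_subscriptions.csv",
    "pop_coverage_mobile": "population_coverage_mobile_network.csv",
    "avg_download_fixed": "average_download_throughput_fixed_broadband.csv",
    "packet_latency_fixed": "packet_latency_fixed_broadband.csv",
    "service_activation_fixed": "service_activation_time_fixed_broadband.csv",
    "intl_bandwidth_usage": "international_bandwidth_usage.csv",
    "lit_equipped_bandwidth": "lit_equipped_international_bandwidth_capacity.csv",
    "fixed_bb_5GB": "fixed_broadband_internet_5GB.csv",
    "data_only_bb_5GB": "data_only_mobile_broadband_5GB.csv",
    "uas_policy": "uas_policy.csv",
    "universal_service_financing": "universal_service_financing.csv",
}


def load_indicators(data_dir: Path, files: dict[str, str]) -> dict[str, pd.DataFrame]:
    """Read every indicator CSV, failing early if any file is missing."""
    missing = [fname for fname in files.values() if not (data_dir / fname).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files in {data_dir}: {missing}")
    return {name: pd.read_csv(data_dir / fname) for name, fname in files.items()}


def summarize(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """One row per indicator: number of rows, columns and missing cells."""
    return pd.DataFrame(
        {
            "rows": {name: df.shape[0] for name, df in tables.items()},
            "cols": {name: df.shape[1] for name, df in tables.items()},
            "missing_cells": {name: int(df.isna().sum().sum()) for name, df in tables.items()},
        }
    )


# Usage in the notebook (DATA_DIR as defined in the loading cell):
# tables = load_indicators(DATA_DIR, INDICATOR_FILES)
# print(summarize(tables))
```

Checking for missing files up front gives one error listing every absent file, instead of stopping at the first failing `read_csv` call.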


This notebook completed the initial data ingestion and selection step.

All datasets were:
- loaded from raw ITU CSV files;
- filtered to a common set of countries and years;
- saved as intermediate processed files for further cleaning and harmonization.
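The filtering and saving steps listed above can be sketched as follows: compute the (country, year) pairs present in every table, keep only those rows, and write one file per indicator. This is an illustration under assumptions, not the notebook's code. The `country` and `year` column names, the CSV output format and the output folder are hypothetical. The sketch also interprets "common" as (country, year) pairs shared by all tables; intersecting countries and years separately would be a looser alternative.

```python
from pathlib import Path

import pandas as pd


def common_country_years(
    tables: dict[str, pd.DataFrame], country_col: str = "country", year_col: str = "year"
) -> set:
    """Return the (country, year) pairs that appear in every table."""
    common = None
    for df in tables.values():
        pairs = set(zip(df[country_col], df[year_col]))
        common = pairs if common is None else common & pairs
    return common or set()


def restrict_and_save(
    tables: dict[str, pd.DataFrame],
    out_dir: Path,
    country_col: str = "country",  # assumed column name
    year_col: str = "year",  # assumed column name
) -> dict[str, pd.DataFrame]:
    """Keep only the common (country, year) rows and write one CSV per indicator."""
    keep = common_country_years(tables, country_col, year_col)
    out_dir.mkdir(parents=True, exist_ok=True)
    result = {}
    for name, df in tables.items():
        mask = [(c, y) in keep for c, y in zip(df[country_col], df[year_col])]
        subset = df.loc[mask].reset_index(drop=True)
        subset.to_csv(out_dir / f"{name}.csv", index=False)
        result[name] = subset
    return result


# Hypothetical usage with an assumed output folder:
# processed = restrict_and_save(selected_tables, Path("data/processed"))
```

A strict intersection can shrink the panel quickly when sparse indicators (for example the QoS tables) are included, so it may be worth inspecting the size of the common set before saving.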

The next notebook focuses on data cleaning, unit harmonization and consistency checks.
