Global Education Performance Tracker 🎓

📝 Problem Statement

Policymakers and global development organizations often struggle to answer a fundamental question: "Are financial investments in education actually translating into higher literacy outcomes?" Publicly available datasets from the World Bank contain the answers but are stored in "long-format" structures that are difficult for standard BI tools to visualize efficiently. This project builds a cloud-native ELT pipeline to centralize, transform, and visualize these indicators, allowing for a direct comparison between government expenditure and literacy rates across decades.

🏗️ Architecture

The project follows a modern Data Engineering lifecycle:

Infrastructure: Terraform (IaC) to manage Google Cloud Storage and BigQuery.
Ingestion: Python (Batch) extracting World Bank data to Parquet files and uploading to GCS.
Data Warehouse: Google BigQuery, utilizing Partitioning (by Year) and Clustering (by Country Code).
Transformation: dbt to pivot raw indicators into an analytical-ready "wide" format.
Orchestration: Scripted workflow utilizing the uv package manager.
Visualization: Streamlit dashboard with interactive Plotly components.

🚀 Getting Started

1. Prerequisites

Python 3.12+ (using uv for package management)
Google Cloud Platform account and a project ID.
Terraform installed.

2. Infrastructure Setup

cd terraform
terraform init
terraform apply

3. Data Ingestion

# Install dependencies
uv sync

# Run ingestion script
uv run ingest_data.py

4. Transformations (dbt)

cd dbt_edu
# Copy the example profile to your local dbt config
mkdir -p ~/.dbt
cp profiles.yml.example ~/.dbt/profiles.yml 
# (Note: Edit ~/.dbt/profiles.yml to add your Project ID)

uv run dbt deps
uv run dbt run

Note: Ensure your ~/.dbt/profiles.yml is configured for BigQuery with the us-central1 location.

5. Dashboard

uv run streamlit run app.py

📊 Data Warehouse Optimization

To maximize query performance and minimize costs:

Partitioning: The fact_education table is partitioned by year. This allows the Streamlit dashboard to fetch specific time-ranges without scanning the entire dataset.
Clustering: Data is clustered by country_code. This ensures that when a user selects a specific country in the dashboard, BigQuery only reads the relevant blocks of data.

🛠️ Project Evolution & Uniqueness

While this project follows the core architecture of the Data Engineering Zoomcamp, it features a unique implementation focused on Global Education Economics. The custom dbt logic to pivot disparate World Bank indicators into a synchronized analytical dataset was developed specifically for this use case to bridge the gap between raw data and actionable policy insights.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
dbt_edu		dbt_edu
terraform		terraform
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
app.py		app.py
ingest_data.py		ingest_data.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Global Education Performance Tracker 🎓

📝 Problem Statement

🏗️ Architecture

🚀 Getting Started

1. Prerequisites

2. Infrastructure Setup

3. Data Ingestion

4. Transformations (dbt)

5. Dashboard

📊 Data Warehouse Optimization

🛠️ Project Evolution & Uniqueness

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Global Education Performance Tracker 🎓

📝 Problem Statement

🏗️ Architecture

🚀 Getting Started

1. Prerequisites

2. Infrastructure Setup

3. Data Ingestion

4. Transformations (dbt)

5. Dashboard

📊 Data Warehouse Optimization

🛠️ Project Evolution & Uniqueness

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages