Clash Royale Data Engineering Project
- This project demonstrates an end-to-end data pipeline built with Python, PostgreSQL, and Prefect.
- The goal is to extract, clean, and load game data from a Clash Royale cards dataset into a PostgreSQL database.
1. Extract – Read data from a CSV file using pandas.
2. Transform – Clean column names, remove duplicates, and create new calculated columns such as efficiency and meta_score.
3. Load – Save the cleaned data into a PostgreSQL table named cards_etl.
4. Orchestrate – Automate the ETL pipeline using Prefect to ensure repeatable and reliable runs.
- Python 3
- Pandas
- SQLAlchemy and psycopg2
- PostgreSQL
- Prefect (for scheduling and monitoring)
Data-Engineering-Projects/
├─ data/
│ ├─ raw/
│ └─ clean/
├─ notebooks/
│ ├─ 01_clashroyale_cleaning.ipynb
│ ├─ 01b_sql_playground.ipynb
│ ├─ 02_clashroyale_to_postgres.ipynb
│ └─ 03_clashroyale_etl_pipeline.ipynb
├─ projects/
│ └─ clashroyale_etl/
│ ├─ extract.py
│ ├─ transform.py
│ ├─ load.py
│ └─ prefect_flow.py
└─ README.md
- Database: clashroyale
- Table: cards
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("postgresql+psycopg2://postgres:YOUR%21PASSWORD@localhost:5432/clashroyale")
pd.read_sql("SELECT COUNT(*) FROM cards_etl;", engine)
Run directly:
python projects\clashroyale_etl\etl_pipeline.py
Run with Prefect (local schedule):
python projects\clashroyale_etl\prefect_flow.py
This will start a Prefect flow that extracts, transforms, and loads the data every 30 minutes while the terminal remains open.
- The CSV file comes from the public Clash Royale cards dataset on Kaggle.
- Special characters in passwords (such as !) did not need to be URL-encoded (%21).
- The project demonstrates key data-engineering concepts: ETL design, automation, database integration, and basic validation.
Below is a screenshot showing a successful Prefect ETL run completing all stages.