Skip to content

ShivdevRajendran/Data-Engineering-Projects

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Clash Royale Data Engineering Project

- This project demonstrates an end-to-end data pipeline built with Python, PostgreSQL, and Prefect.
- The goal is to extract, clean, and load game data from a Clash Royale cards dataset into a PostgreSQL database.

Overview

1. Extract – Read data from a CSV file using pandas.
2. Transform – Clean column names, remove duplicates, and create new calculated columns such as efficiency and meta_score.
3. Load – Save the cleaned data into a PostgreSQL table named cards_etl.
4. Orchestrate – Automate the ETL pipeline using Prefect to ensure repeatable and reliable runs.

Tools Used

- Python 3
- Pandas
- SQLAlchemy and psycopg2
- PostgreSQL
- Prefect (for scheduling and monitoring)

Folder Structure

Data-Engineering-Projects/
├─ data/
│  ├─ raw/
│  └─ clean/
├─ notebooks/
│  ├─ 01_clashroyale_cleaning.ipynb
│  ├─ 01b_sql_playground.ipynb
│  ├─ 02_clashroyale_to_postgres.ipynb
│  └─ 03_clashroyale_etl_pipeline.ipynb
├─ projects/
│  └─ clashroyale_etl/
│     ├─ extract.py
│     ├─ transform.py
│     ├─ load.py
│     └─ prefect_flow.py
└─ README.md

Database

- Database: clashroyale
- Table: cards

Example connection:


from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("postgresql+psycopg2://postgres:YOUR%21PASSWORD@localhost:5432/clashroyale")
pd.read_sql("SELECT COUNT(*) FROM cards_etl;", engine)

Running the ETL

Run directly:
    python projects\clashroyale_etl\etl_pipeline.py

Run with Prefect (local schedule):
    python projects\clashroyale_etl\prefect_flow.py

This will start a Prefect flow that extracts, transforms, and loads the data every 30 minutes while the terminal remains open.

Notes

- The CSV file comes from the public Clash Royale cards dataset on Kaggle.
- Special characters in passwords (such as !) did not need to be URL-encoded (%21).
- The project demonstrates key data-engineering concepts: ETL design, automation, database integration, and basic validation.

Example Output

Below is a screenshot showing a successful Prefect ETL run completing all stages.

Prefect Flow Run Screenshot

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published