This repository contains a custom Extract, Load, Transform (ELT) project that uses Docker and PostgreSQL to demonstrate a simple ELT process for the IMDB Movie dataset from Kaggle.
Be aware that this repository has 4 branches:
- main: The Simplest One. Performs a manual ELT process using dummy data, containerized with Docker.
- dbt: The Stepping Stone. Builds on the ELT process by using dbt as a data modeling tool for better-organized, easier-to-manage transformations.
- cron: The Next Step. Adds a cron job that automates the ELT process by running the ELT script at a specified interval; in this case, every day at 3 a.m.
- The Final One. The complete, orchestrated IMDB Movie ELT using Airflow, with the dummy data replaced by the real dataset of 9000+ rows and 3+ tables. The dataset can be found in the datasets folder.
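For the cron branch, a daily 3 a.m. schedule corresponds to the standard five-field cron expression `0 3 * * *`. The sketch below is not taken from the repository; it only illustrates what that schedule means by computing the next time such a job would fire. The name `next_daily_run` is hypothetical.

```python
from datetime import datetime, timedelta

# Standard five-field cron expression for "every day at 03:00".
DAILY_3AM = "0 3 * * *"


def next_daily_run(now: datetime, hour: int = 3) -> datetime:
    """Return the next time a daily job scheduled at `hour`:00 would fire."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

Calling `next_daily_run(datetime.now())` gives the next scheduled run relative to the current local time.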
Install Docker, Python, and pip on your machine
Clone this repository
git clone https://github.com/raflyritonga/imdb-movie-elt.git
Navigate to the repository directory
cd imdb-movie-elt
Install dependencies
pip install -r requirements.txt
Run the project
docker compose up
Once the ELT process is done, you can access the PostgreSQL databases on ports 5433 and 5434.
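To confirm that both databases are reachable after the containers start, a quick check like the following can help. It is a minimal sketch using only the Python standard library. It assumes the ports are published on localhost, and the helper name `port_is_open` is hypothetical. It only verifies that something accepts TCP connections on each port, not that PostgreSQL credentials are valid.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports 5433 and 5434 come from this README; "localhost" is an assumption.
    for port in (5433, 5434):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"PostgreSQL port {port}: {status}")
```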