IMDB MOVIE ELT

Note: You're now on the main branch

This repository contains a custom Extract, Load, Transform (ELT) project that uses Docker and PostgreSQL to demonstrate a simple ELT process for the IMDB Movie dataset from Kaggle.
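As a rough illustration of the ELT pattern itself (load the raw data first, then transform it inside the destination database), here is a minimal sketch. It is not this repository's script: it uses Python's built-in `sqlite3` with in-memory databases as a stand-in for the PostgreSQL containers, and the table names, columns, and dummy rows are all hypothetical.

```python
import sqlite3

# Hypothetical dummy rows standing in for the source data.
DUMMY_MOVIES = [
    ("Movie A", 2001, "7.5"),
    ("Movie B", 2010, "8.1"),
    ("Movie C", 2010, "6.4"),
]


def extract(source):
    """Extract: read the rows from the source database unchanged."""
    return source.execute("SELECT title, year, rating FROM movies").fetchall()


def load(dest, rows):
    """Load: copy the raw rows into the destination before any cleanup."""
    dest.execute("CREATE TABLE raw_movies (title TEXT, year INTEGER, rating TEXT)")
    dest.executemany("INSERT INTO raw_movies VALUES (?, ?, ?)", rows)


def transform(dest):
    """Transform: reshape the data with SQL inside the destination database."""
    dest.execute(
        """
        CREATE TABLE movies_by_year AS
        SELECT year,
               COUNT(*) AS movie_count,
               AVG(CAST(rating AS REAL)) AS avg_rating
        FROM raw_movies
        GROUP BY year
        """
    )


def run_elt():
    # In-memory SQLite databases stand in for the source and destination.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating TEXT)")
    source.executemany("INSERT INTO movies VALUES (?, ?, ?)", DUMMY_MOVIES)

    dest = sqlite3.connect(":memory:")
    load(dest, extract(source))
    transform(dest)
    return dest.execute(
        "SELECT year, movie_count, avg_rating FROM movies_by_year ORDER BY year"
    ).fetchall()
```

Calling `run_elt()` returns one summary row per year. The point of the ordering is that the raw table lands in the destination first, so transformations can be rerun or changed later without extracting again.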

Be aware that this repository has 4 branches:

  • main:

    The Simplest One: Performs a manual ELT process using dummy data, containerized with Docker.

  • dbt:

    The Stepping Stone: Extends the ELT process with dbt as a data modelling tool for better management and organization.

  • cron:

    The Next Step: Adds a cron job that automates the ELT process by running the ELT script at a specified interval; in this case, every day at 3 a.m.

  • airflow:

    The Final One: The complete, orchestrated IMDB Movie ELT using Airflow, with the dummy data replaced by the real dataset of 9000+ rows and 3+ tables. The dataset can be found in the datasets folder.
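For the cron branch's schedule, "every day at 3 a.m." corresponds to the standard five-field cron expression `0 3 * * *`. The sketch below spells that out and computes when such a job would next fire. The script path in the sample crontab line is hypothetical and may not match the one used on that branch.

```python
from datetime import datetime, time, timedelta

# Standard cron fields: minute hour day-of-month month day-of-week.
DAILY_3AM = "0 3 * * *"

# Hypothetical crontab entry; the script path is an assumption.
SAMPLE_CRONTAB_LINE = f"{DAILY_3AM} python /app/elt_script.py"


def next_daily_run(now, hour=3):
    """Return the next time strictly after `now` at which a daily job fires."""
    candidate = datetime.combine(now.date(), time(hour=hour))
    if candidate <= now:
        # Today's slot has already passed (or is right now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_daily_run(datetime(2024, 1, 1, 2, 0))` falls on the same day at 03:00, while any time from 03:00 onward rolls over to the following day.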

Getting Started

Install Docker, Python, and pip on your machine

Clone this repository

  git clone https://github.com/raflyritonga/imdb-movie-elt.git

Navigate to repository directory

  cd imdb-movie-elt

Install dependencies

  pip install -r requirements.txt

Run project

  docker compose up

Once the ELT process is done, you can access the PostgreSQL databases on ports 5433 and 5434.
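Before pointing a database client at those ports, it can help to confirm that they are reachable. The following is a minimal sketch using only the Python standard library. It assumes the containers publish the ports on `localhost`, and it only checks that something accepts TCP connections there, not that PostgreSQL is ready to serve queries.

```python
import socket

# Ports named in this README.
POSTGRES_PORTS = (5433, 5434)


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(host="localhost"):
    """Return a mapping of each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in POSTGRES_PORTS}
```

Running `print(report())` after `docker compose up` should show both ports as `True` if the containers are up and the host assumption holds.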

Acknowledgements
