
alexanderjac/IMDB-Data-Engineering-Project


Objective:

The aim of this project is to analyse movie data from multiple sources and build ETL processes for it using Talend.

• Designed a data warehouse for the IMDb movies, people, and TV series data

• Developed and optimized 100+ ETL jobs, orchestrated by a master job that runs them in parallel, so the entire data warehouse can be loaded in one click

• Developed analytical dashboards in Power BI and Tableau Server that depict critical KPIs

• Performed data profiling, cleansing, and transformation using Talend and Alteryx

• Developed an efficient data integration workflow that loads a total of 11 GB into the data warehouse using bulk load techniques, with a total run time of 80 min, including 15 min for the staging schema, 45 min for the facts integration schema, and 10 min for the BI schema

• Built pipelines that move large-scale business data from multiple database sources and integrate it into slowly changing dimensions (SCDs) using Talend
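The data warehouse design bullet can be pictured as a star schema: dimension tables with surrogate keys surrounding a fact table. The sketch below is hypothetical; the table and column names (`dim_title`, `dim_year`, `fact_title_rating`) are assumptions loosely based on the IMDb `title.basics` and `title.ratings` files, and SQLite stands in for the SQL Server database used in the project.

```python
import sqlite3

# Hypothetical minimal star schema; SQLite stands in for SQL Server here.
STAR_SCHEMA_DDL = """
CREATE TABLE dim_title (
    title_key     INTEGER PRIMARY KEY,   -- surrogate key
    tconst        TEXT NOT NULL UNIQUE,  -- IMDb business key
    title_type    TEXT,
    primary_title TEXT
);
CREATE TABLE dim_year (
    year_key INTEGER PRIMARY KEY,        -- the release year itself, e.g. 1994
    decade   INTEGER NOT NULL
);
CREATE TABLE fact_title_rating (
    title_key      INTEGER NOT NULL REFERENCES dim_title (title_key),
    year_key       INTEGER REFERENCES dim_year (year_key),
    average_rating REAL,
    num_votes      INTEGER,
    PRIMARY KEY (title_key)
);
"""


def create_star_schema(conn: sqlite3.Connection) -> None:
    """Create the dimension and fact tables and enforce foreign keys."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(STAR_SCHEMA_DDL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_star_schema(conn)
    # Dimensions are loaded before facts so the foreign keys resolve.
    conn.execute("INSERT INTO dim_title VALUES (1, 'tt0000001', 'movie', 'Example Title')")
    conn.execute("INSERT INTO dim_year VALUES (1994, 1990)")
    conn.execute("INSERT INTO fact_title_rating VALUES (1, 1994, 8.0, 100)")
    query = """
        SELECT t.primary_title, y.decade, f.average_rating
        FROM fact_title_rating AS f
        JOIN dim_title AS t USING (title_key)
        JOIN dim_year AS y USING (year_key)
    """
    print(conn.execute(query).fetchall())  # [('Example Title', 1990, 8.0)]
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table is what lets dashboards slice KPIs by any dimension with simple joins.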
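The one-click master job described above can be approximated outside Talend as a runner that executes independent jobs in parallel within a stage and runs the stages in order (for example staging, then facts, then BI). This is a sketch under assumptions: the job names are hypothetical, and Python's `concurrent.futures` stands in for Talend's own parallel execution of child jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A stage maps hypothetical job names to callables. Jobs inside one stage are
# assumed independent and run in parallel; stages run strictly in order.
Stage = Dict[str, Callable[[], None]]


def run_master_job(stages: List[Stage], max_workers: int = 4) -> List[str]:
    """Run each stage's jobs concurrently; wait for a stage before starting the next.

    Returns completed job names in stage order. A failing job re-raises its
    exception, so later stages never start.
    """
    completed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            futures = {name: pool.submit(job) for name, job in stage.items()}
            for name, future in futures.items():
                future.result()  # blocks until the job finishes; re-raises its error
                completed.append(name)
    return completed


if __name__ == "__main__":
    stages = [
        {"stg_title_basics": lambda: None, "stg_name_basics": lambda: None},
        {"fact_title_ratings": lambda: None},
        {"bi_kpi_summary": lambda: None},
    ]
    print(run_master_job(stages))
    # ['stg_title_basics', 'stg_name_basics', 'fact_title_ratings', 'bi_kpi_summary']
```

Parallelism is applied only inside a stage because fact loads depend on staged data and BI tables depend on facts; the stage boundaries encode those dependencies.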
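For the profiling and cleansing bullet, a minimal sketch of the idea: normalize raw string values, then compute a per-column profile. It assumes IMDb-style input, where the TSV dumps use the literal `\N` for missing values; the helper names are hypothetical and do not correspond to Talend or Alteryx components.

```python
from typing import Any, Dict, List, Optional

IMDB_NULL = "\\N"  # IMDb's TSV files use the literal \N for missing values


def clean_value(raw: str) -> Optional[str]:
    """Trim whitespace and map empty strings or IMDb null markers to None."""
    value = raw.strip()
    return None if value in ("", IMDB_NULL) else value


def profile_column(values: List[Optional[str]]) -> Dict[str, Any]:
    """Basic profile: row count, nulls, distinct non-null values, min/max length."""
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
    }


if __name__ == "__main__":
    raw_years = [" 1994 ", "\\N", "1994", "", "2001"]
    cleaned = [clean_value(v) for v in raw_years]
    print(cleaned)                  # ['1994', None, '1994', None, '2001']
    print(profile_column(cleaned))  # {'rows': 5, 'nulls': 2, 'distinct': 2, 'min_len': 4, 'max_len': 4}
```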
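The bulk load bullet relies on inserting rows in large batches rather than one at a time. The sketch below shows that batching pattern with Python's `sqlite3` and `executemany` as a stand-in; it is not the SQL Server bulk load path used in the project, and the `stg_title` table and its columns are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, int]  # (tconst, primary_title, start_year), hypothetical layout


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of up to `size` rows without materializing the whole input."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 10_000) -> int:
    """Insert rows batch by batch, one transaction per batch; return rows loaded."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits the batch on success, rolls it back on error
            conn.executemany(
                "INSERT INTO stg_title (tconst, primary_title, start_year) VALUES (?, ?, ?)",
                chunk,
            )
        total += len(chunk)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_title (tconst TEXT, primary_title TEXT, start_year INTEGER)")
    rows = ((f"tt{i:07d}", f"Title {i}", 2000 + i % 20) for i in range(25_000))
    print(bulk_load(conn, rows))  # 25000
```

Committing per batch keeps each transaction bounded in size while still avoiding a commit per row, which is the main cost a bulk load technique removes.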
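Integrating data into slowly changing dimensions usually means SCD Type 2: when a tracked attribute changes, the current row is expired and a new version is appended, preserving history. The following is a minimal in-memory sketch of that merge logic, assuming Type 2 handling (the README does not say which SCD type was used); `DimRow` and `scd2_merge` are hypothetical names, not Talend's SCD component.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    business_key: str                # e.g. an IMDb nconst or tconst
    attributes: tuple                # tracked columns compared to detect change
    valid_from: date
    valid_to: Optional[date] = None  # None marks the open-ended current version
    is_current: bool = True


def scd2_merge(dim: List[DimRow], incoming: Dict[str, tuple], load_date: date) -> List[DimRow]:
    """Expire changed current rows and append new versions (SCD Type 2)."""
    current = {r.business_key: r for r in dim if r.is_current}
    result: List[DimRow] = []
    for row in dim:
        new_attrs = incoming.get(row.business_key)
        if row.is_current and new_attrs is not None and new_attrs != row.attributes:
            # Close out the old version as of the load date.
            result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)  # history rows and unchanged rows are kept as-is
    for key, attrs in incoming.items():
        old = current.get(key)
        if old is None or old.attributes != attrs:
            result.append(DimRow(key, attrs, valid_from=load_date))  # new or changed key
    return result


if __name__ == "__main__":
    dim = [DimRow("nm1", ("Actor",), date(2020, 1, 1))]
    for row in scd2_merge(dim, {"nm1": ("Producer",)}, date(2021, 1, 1)):
        print(row)
```

Keys missing from the incoming batch are left untouched here; whether a missing key should expire its row, and whether `valid_to` should be the load date or the day before, are design choices that vary between warehouses.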

ER Diagram

Technologies Used:

• ER/Studio

• SQL Server Developer Edition

• Microsoft SQL Server Management Studio

• Talend Real-Time Data Platform 7.1

• Alteryx

• Tableau Desktop

• Microsoft Power BI

Dataset Links:

https://datasets.imdbws.com/

https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab

https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab

https://grouplens.org/datasets/movielens/25m/

https://www.the-numbers.com/movies/franchises

https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary

https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
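The IMDb files at https://datasets.imdbws.com/ are gzipped, tab-separated text files with a header row and the literal `\N` for missing values (for example, `title.basics.tsv.gz` has columns such as `tconst`, `titleType`, `primaryTitle`, `startYear`, and `genres`). A minimal Python sketch for reading them before staging; the function names are hypothetical, and quoting is disabled on the assumption that quote characters in titles should be kept as literal text.

```python
import csv
import gzip
import io
from typing import Dict, Iterator, Optional, TextIO

IMDB_NULL = "\\N"  # literal backslash-N marks a missing value


def read_imdb_tsv(stream: TextIO) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per row from an IMDb TSV text stream, mapping \\N to None."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == IMDB_NULL else value) for key, value in row.items()}


def open_imdb_file(path: str) -> TextIO:
    """Open a gzipped dump such as title.basics.tsv.gz as UTF-8 text."""
    return gzip.open(path, mode="rt", encoding="utf-8", newline="")


if __name__ == "__main__":
    sample = (
        "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
        "tt0000001\tshort\tExample \"Quoted\" Title\t1894\tDocumentary,Short\n"
        "tt0000002\tshort\tAnother Title\t\\N\t\\N\n"
    )
    for row in read_imdb_tsv(io.StringIO(sample)):
        print(row)
```

With a downloaded file, `with open_imdb_file("title.basics.tsv.gz") as f: rows = read_imdb_tsv(f)` streams rows without decompressing the whole file to disk; multi-valued fields such as `genres` are comma-separated and can be split during transformation.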

References:

https://elearning.tableau.com/

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw

https://www.talend.com/

https://grouplens.org/datasets/movielens/
