Objective:
The aim of the project is to analyse movie data from multiple sources and perform ETL processes using Talend.
• Designed a Data Warehouse for the IMDB movies, people, and TV series database
• Developed and optimized 100+ ETL jobs, orchestrated by a parallelized master job that loads the entire data warehouse in one click
• Developed various analytical dashboards in Power BI and Tableau Server that depict critical KPIs
• Accomplished data profiling, cleansing, and transformation using Talend and Alteryx
• Developed an efficient data integration workflow that loads a total of 11 GB of data into the target data warehouse using bulk-load techniques, with a run time of 80 min (15 min staging schema + 45 min Facts Integration schema + 10 min BI schema)
• Pipelined large-scale business data from multiple database sources and integrated it into slowly changing dimensions (SCD) using Talend
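The SCD integration mentioned in the last bullet was built in Talend, and the project does not state which SCD type was used. As an illustration only, the following is a minimal Python sketch of Type 2 handling (close the old row, add a new current row). The names DimRow, apply_scd2, movie_id, and title, as well as the sample values, are hypothetical and are not taken from the project's jobs.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    movie_id: str                     # business key (assumed)
    title: str                        # tracked attribute (assumed)
    valid_from: date
    valid_to: Optional[date] = None   # None means the row is still current
    is_current: bool = True


def apply_scd2(dimension: List[DimRow],
               incoming: Dict[str, str],
               load_date: date) -> List[DimRow]:
    """Apply incoming {movie_id: title} values using SCD Type 2 rules."""
    result: List[DimRow] = []
    keys_with_current_row = set()

    for row in dimension:
        if row.is_current:
            keys_with_current_row.add(row.movie_id)
        new_title = incoming.get(row.movie_id)
        if row.is_current and new_title is not None and new_title != row.title:
            # Attribute changed: close the old version and open a new one.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(DimRow(row.movie_id, new_title, load_date))
        else:
            # Unchanged or historical rows are kept as they are.
            result.append(row)

    # Keys without a current row are inserted as new current rows.
    for movie_id, title in incoming.items():
        if movie_id not in keys_with_current_row:
            result.append(DimRow(movie_id, title, load_date))

    return result


# Usage with made-up sample data.
dim = [DimRow("m1", "Old Title", date(2020, 1, 1))]
dim = apply_scd2(dim, {"m1": "New Title", "m2": "Another Movie"}, date(2021, 1, 1))
for row in dim:
    print(row)
```

Keeping the closed row with its valid_from and valid_to dates is what lets a report reconstruct the dimension as it looked on any earlier load date. A Type 1 approach would overwrite the title in place and lose that history.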
Technologies Used:
• ER/Studio
• SQL Server Developer Edition
• Microsoft SQL Server Management Studio
• Talend Real-Time Data Platform 7.1
• Alteryx
• Tableau Desktop
• Microsoft Power BI
Dataset Links:
https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab
https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab
https://grouplens.org/datasets/movielens/25m/
https://www.the-numbers.com/movies/franchises
https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary
https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
References:
https://elearning.tableau.com/
https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw
