Created a data pipeline from movie datasets using Python, Pandas, Jupyter Notebook, and PostgreSQL. Within the scope of the Amazing Prime Hackathon, this project implements ETL (Extract, Transform, Load) to build an automated pipeline that takes in new data from Wikipedia, Kaggle metadata, and the MovieLens rating data, performs the appropriate transformations, and loads the data into an existing PostgreSQL database.
For this analysis, we used the following breakdown:
- write an ETL function to read three data files,
- extract and transform the Wikipedia data,
- extract and transform the Kaggle and rating data,
- load the data to a PostgreSQL Movie Database.
- Software: Python 3.7.10, Conda 4.10.1, Jupyter Notebook 6.3, PostgreSQL 13, pgAdmin 4
The function takes the Wikipedia JSON file and the Kaggle metadata and MovieLens ratings CSV files and creates three separate DataFrames.
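A minimal sketch of this extraction step is shown below; the file names (`wikipedia-movies.json`, `movies_metadata.csv`, `ratings.csv`) and the helper name `extract_data` are illustrative assumptions, not the exact names used in the notebook.

```python
import json
import pandas as pd

def extract_data(wiki_file, kaggle_file, ratings_file):
    """Read the three raw sources and return them as DataFrames."""
    # The Wikipedia scrape is a JSON list of movie records
    with open(wiki_file, mode='r') as f:
        wiki_movies_raw = json.load(f)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and MovieLens ratings are flat CSV files
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings

# Example usage (adjust the paths to your local copies of the data)
wiki_movies_df, kaggle_metadata, ratings = extract_data(
    'wikipedia-movies.json', 'movies_metadata.csv', 'ratings.csv')
```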
We filtered out the TV shows, consolidated the redundant data, removed the duplicates and formatted the Wikipedia data.
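A sketch of that Wikipedia cleanup, assuming the scraped records expose keys such as `imdb_link` and `No. of episodes` (the episode count is what flags an entry as a TV show); the exact keys depend on the scrape:

```python
# Treat entries with an episode count as TV shows and drop them; keep rows
# with an IMDb link so we can derive a merge key later
is_movie = (wiki_movies_df['No. of episodes'].isna()
            & wiki_movies_df['imdb_link'].notna())
wiki_movies_df = wiki_movies_df[is_movie].copy()

# Extract the IMDb ID from the link and use it to drop duplicate movies
wiki_movies_df['imdb_id'] = wiki_movies_df['imdb_link'].str.extract(r'(tt\d{7})')
wiki_movies_df = wiki_movies_df.drop_duplicates(subset='imdb_id')
```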
For the Kaggle metadata and MovieLens ratings data, we likewise consolidated the redundant data, removed the duplicates, and formatted and grouped the data.
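A sketch of that cleanup, assuming the standard Kaggle `movies_metadata.csv` columns (`adult`, `budget`, `id`, `release_date`) and MovieLens columns (`userId`, `movieId`, `rating`); adjust the column names if your files differ:

```python
# Drop corrupted/adult Kaggle rows, then convert columns to proper dtypes
kaggle_metadata = kaggle_metadata[kaggle_metadata['adult'] == 'False'].drop(columns='adult')
kaggle_metadata['budget'] = kaggle_metadata['budget'].astype(int)
kaggle_metadata['id'] = pd.to_numeric(kaggle_metadata['id'], errors='raise')
kaggle_metadata['release_date'] = pd.to_datetime(kaggle_metadata['release_date'])

# Group the MovieLens ratings into a count of each rating value per movie
rating_counts = (ratings.groupby(['movieId', 'rating'], as_index=False).count()
                 .rename(columns={'userId': 'count'})
                 .pivot(index='movieId', columns='rating', values='count'))
rating_counts.columns = ['rating_' + str(col) for col in rating_counts.columns]
```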
The Kaggle and rating data were then merged with the Wikipedia movies DataFrame.
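The merge could look like the following; this assumes both DataFrames carry an `imdb_id` column and that the Kaggle `id` column has been renamed to `kaggle_id` and aligns with the MovieLens `movieId`, which is an assumption about the consolidation step rather than a fixed detail of the project:

```python
# Merge Wikipedia and Kaggle data on the IMDb ID (suffixes mark the origin)
movies_df = pd.merge(wiki_movies_df, kaggle_metadata,
                     on='imdb_id', suffixes=['_wiki', '_kaggle'])

# Attach the grouped rating counts; movies without ratings get a count of 0
movies_with_ratings_df = movies_df.merge(rating_counts, left_on='kaggle_id',
                                         right_index=True, how='left')
movies_with_ratings_df[rating_counts.columns] = (
    movies_with_ratings_df[rating_counts.columns].fillna(0))
```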
The resulting ETL function collects and cleans movie data from the different sources (the Wikipedia JSON file and the Kaggle and ratings CSV files), transforms and merges the data, and loads it into two updatable PostgreSQL tables ready to be used by the hackathon participants for their analysis.
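A hedged sketch of the load step using SQLAlchemy; the connection string, credentials, and the table names `movies` and `ratings` are assumptions to be replaced with your own database settings:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection string is an assumption; substitute your own credentials/database
db_string = 'postgresql://postgres:password@localhost:5432/movie_data'
engine = create_engine(db_string)

# Replace the movies table on each run so the data stays up to date
movies_df.to_sql(name='movies', con=engine, if_exists='replace', index=False)

# The ratings file is large, so load it in chunks to keep memory use bounded
for chunk in pd.read_csv('ratings.csv', chunksize=1_000_000):
    chunk.to_sql(name='ratings', con=engine, if_exists='append', index=False)
```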