Perform the Extract, Transform, and Load (ETL) process to build a data pipeline for movie datasets using Python, Pandas, Jupyter Notebook, and PostgreSQL.
In this project, I created an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. I refactored the code from the module challenge into a single function that takes in the three files—Wikipedia data, Kaggle metadata, and MovieLens rating data—and performs the ETL process, adding the data to a PostgreSQL database.
Using your knowledge of Python, Pandas, the ETL process, and code refactoring, write a function that reads in the three data files and creates three separate DataFrames.
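A minimal sketch of this step, assuming the Wikipedia data ships as a JSON file and the Kaggle metadata and MovieLens ratings ship as CSVs; the function name and file-path parameters here are illustrative:

```python
import json
import pandas as pd

def extract_data(wiki_file, kaggle_file, ratings_file):
    """Read the three source files and return them as DataFrames."""
    # The Wikipedia data is a JSON list of per-movie dictionaries.
    with open(wiki_file, mode='r') as file:
        wiki_movies_raw = json.load(file)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and MovieLens ratings are flat CSV files.
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings
```

Calling it then unpacks into the three DataFrames, e.g. `wiki_movies_df, kaggle_metadata, ratings = extract_data('wikipedia-movies.json', 'movies_metadata.csv', 'ratings.csv')`.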
Using your knowledge of Python, Pandas, the ETL process, and code refactoring, extract and transform the Wikipedia data so it can be merged with the Kaggle metadata. While extracting the IMDb IDs with a regular expression and dropping duplicate rows, use a try-except block to catch errors.
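For illustration, the ID extraction might look like the sketch below. It continues from the previous sketch and assumes `wiki_movies_df` carries an `imdb_link` column, which is not shown in this README:

```python
try:
    # IMDb IDs follow the pattern 'tt' plus seven digits, e.g. tt0114709.
    wiki_movies_df['imdb_id'] = wiki_movies_df['imdb_link'].str.extract(
        r'(tt\d{7})', expand=False)
    # Each movie should appear only once, so drop rows sharing an IMDb ID.
    wiki_movies_df.drop_duplicates(subset='imdb_id', inplace=True)
except Exception as e:
    # Report the failure rather than continuing with a half-cleaned DataFrame.
    print(f'IMDb ID extraction failed: {e}')
```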
Using your knowledge of Python, Pandas, the ETL process, and code refactoring, extract and transform the Kaggle metadata and MovieLens rating data, then convert the transformed data into separate DataFrames. Then merge the Kaggle metadata DataFrame with the Wikipedia movies DataFrame to create the movies_df DataFrame. Finally, merge the MovieLens rating data DataFrame with movies_df to create movies_with_ratings_df.
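The two merges might look roughly like this; it assumes the Kaggle `id` column was renamed `kaggle_id` during cleaning, and the `rating_counts` pivot shown here is one common way to summarize the raw ratings before the join:

```python
# Merge the cleaned Wikipedia and Kaggle data on the shared IMDb ID;
# suffixes mark the columns that exist in both sources.
movies_df = pd.merge(wiki_movies_df, kaggle_metadata,
                     on='imdb_id', suffixes=['_wiki', '_kaggle'])

# Count how many times each movie received each rating value, then pivot
# so each rating value becomes its own column.
rating_counts = (ratings.groupby(['movieId', 'rating'], as_index=False).count()
                        .rename({'userId': 'count'}, axis=1)
                        .pivot(index='movieId', columns='rating', values='count'))
rating_counts.columns = ['rating_' + str(col) for col in rating_counts.columns]

# A left merge keeps movies that have no ratings; their new columns are NaN.
movies_with_ratings_df = pd.merge(movies_df, rating_counts,
                                  left_on='kaggle_id', right_index=True,
                                  how='left')
```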
Use your knowledge of Python, Pandas, the ETL process, code refactoring, and PostgreSQL to add the movies_df DataFrame and the MovieLens rating CSV data to a SQL database.
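A sketch of the load step using SQLAlchemy; the connection string, database name, and table names are placeholders, and it assumes a local PostgreSQL server with the psycopg2 driver installed:

```python
import time

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials; in practice the password comes from a config file.
db_string = 'postgresql://postgres:password@localhost:5432/movie_data'
engine = create_engine(db_string)

# Replace the movies table on each run so the pipeline stays rerunnable.
movies_df.to_sql(name='movies', con=engine, if_exists='replace')

# The ratings file is large, so stream it in chunks and print progress
# as each chunk lands in the database.
rows_imported = 0
start_time = time.time()
for chunk in pd.read_csv('ratings.csv', chunksize=1_000_000):
    chunk.to_sql(name='ratings', con=engine,
                 if_exists='replace' if rows_imported == 0 else 'append')
    rows_imported += len(chunk)
    print(f'{rows_imported:,} rows imported '
          f'... {time.time() - start_time:.0f}s elapsed')
```

Replacing the tables on the first write and appending on later chunks keeps the whole pipeline idempotent across reruns.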
The ETL function collects and cleans movie data from the different sources (a Wikipedia JSON file plus the Kaggle metadata and MovieLens ratings CSV files), transforms and merges it, and loads it into two updatable PostgreSQL tables ready for the hackathon participants to use in their analysis.