This project creates an automated pipeline that takes in new data from Wikipedia, Kaggle metadata, and the MovieLens rating data.
Challenge: Use ETL to clean the data files, parse the extracted data into the desired format, merge the parsed data sets, and load the merged set into a PostgreSQL database (via pgAdmin) for further use.
Create an automated pipeline that takes in new data from Wikipedia, Kaggle metadata, and MovieLens ratings. The data is transformed into an appropriate format and then loaded into a PostgreSQL database for further analysis.
- Deliverable 1: Write an ETL function to read three data files
- Deliverable 2: Extract and Transform the Wikipedia Data
- Deliverable 3: Extract and Transform the Kaggle Data
- Deliverable 4: Create the Movie Database
Data Sources:
- movies_metadata.csv
- ratings.csv
- wikipedia-movies.json
Software:
- Anaconda Version 3.7.3
- MacOS Catalina Version 10.15.7
- PgAdmin4
- Jupyter Notebook
- PostgreSQL 11.9
- Python 3.7.7
A function was created to take in the Kaggle metadata, the Wikipedia JSON, and the MovieLens ratings.csv. Each data source was then read into its own separate DataFrame.
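A minimal sketch of that extraction function, assuming the three file paths above (the argument names and sample data here are hypothetical):

```python
import json

import pandas as pd

def extract_movie_data(wiki_file, kaggle_file, ratings_file):
    """Read the three source files into three separate DataFrames."""
    # Wikipedia data arrives as a JSON list of movie records.
    with open(wiki_file, mode="r") as f:
        wiki_movies_raw = json.load(f)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # Kaggle metadata and MovieLens ratings are flat CSV files;
    # low_memory=False avoids mixed-dtype warnings on the large metadata file.
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings
```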
The Wikipedia JSON data was merged with the Kaggle metadata. The merged DataFrame was filtered to remove TV shows, and columns that consisted mostly of missing or undesirable data were cleaned up or dropped.
The Kaggle metadata and MovieLens rating data were transformed and converted into separate DataFrames. The Kaggle, Wikipedia, and MovieLens rating DataFrames were then merged to create one DataFrame with ratings for analysis.
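One common way to attach the ratings is to pivot rating counts per movie and left-merge them onto the movie table; the sketch below assumes a `kaggle_id` key matching MovieLens `movieId`, which is a simplification of the real ID mapping:

```python
import pandas as pd

# Hypothetical merged movies table (Wikipedia + Kaggle), keyed on kaggle_id.
movies_df = pd.DataFrame({"kaggle_id": [1, 2], "title": ["Movie A", "Movie B"]})

# Hypothetical MovieLens ratings.
ratings = pd.DataFrame({"movieId": [1, 1, 2], "rating": [5.0, 4.0, 3.0]})

# Count ratings per movie per rating value, then pivot so each
# rating value becomes its own count column.
rating_counts = (
    ratings.groupby(["movieId", "rating"], as_index=False).size()
           .pivot(index="movieId", columns="rating", values="size")
)
rating_counts.columns = [f"rating_{c}" for c in rating_counts.columns]

# Left-merge so movies without any ratings are kept, then fill missing counts with 0.
movies_with_ratings_df = movies_df.merge(
    rating_counts, left_on="kaggle_id", right_index=True, how="left")
movies_with_ratings_df[rating_counts.columns] = (
    movies_with_ratings_df[rating_counts.columns].fillna(0))
```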
The merged DataFrame containing Kaggle and Wikipedia data, combined with the MovieLens rating data, was added to a SQL database where custom queries can be performed for analysis.
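The load step can be sketched with pandas' `to_sql` and SQLAlchemy. The pipeline targets PostgreSQL (a connection string such as `postgresql://postgres:<password>@localhost:5432/movie_data` is assumed); an in-memory SQLite engine stands in here so the sketch runs anywhere:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine; the real pipeline would use a PostgreSQL connection string.
engine = create_engine("sqlite://")

# Hypothetical final merged table.
movies_with_ratings_df = pd.DataFrame(
    {"kaggle_id": [1], "title": ["Movie A"], "rating_count": [2]})

# Replace any existing table so the pipeline can be rerun end to end.
movies_with_ratings_df.to_sql("movies", engine, if_exists="replace", index=False)

# Example of a custom query against the loaded table.
result = pd.read_sql("SELECT title FROM movies WHERE kaggle_id = 1", engine)
```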
The Amazon Prime Video development team can now run the "Predict the Popular Pictures" hackathon as requested. With the new "Movie Data" database, the team can uncover low-budget releases and predict which ones are available at a bargain price.
David Supple