Skip to content

alvin-98/etl_pipeline_sparkifydb_postgres

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ETL Pipeline for a Music App Startup

The project uses python and sql to create an ETL pipeline for a music app startup. Sparkify collects data on their songs and user activity in their app into two local databases. Data on the song tracks and their artists are stored in song dataset and user logs are stored in log dataset. We create a postgres database with fact and dimension tables in a star schema. This is because we want a database that is optimized particularly for song play analytics. As a result, we can do fast aggregations and write simple queries to get insights on song plays.

How to Use?

  1. Download the repository to your local system.
  2. Install libraries mentioned in the requirements.txt file using command prompt. >pip install library_name
  3. Run the 'create_tables.py' file on command prompt using the below command to create the sparkify database and relevant tables. >python create_tables.py
  4. Run the 'etl.py' file on command prompt using the below command to run the ETL pipeline and load the data into respective tables in the database. >python etl.py

Files in Repository

  1. create_tables.py Creates the Sparkify database, drops any existing tables with the same names and then creates all relevant tables, according to defined schema, afresh.
  2. sql_queries.py Contains SQL queries for CREATE TABLE, INSERT, SELECT, and DROP as strings to be used by the create_tables.py file.
  3. etl.py Implements the ETL pipeline
  4. test.ipynb An interactive python notebook which can be used to test the functioning of the ETL pipeline.
  5. etl.ipynb An interactive python notebook which can be used to experiment with the ETL pipeline code before finalizing code for production.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages