Sparkify is a startup that provides music streaming services. Their song metadata and user activity logs reside as JSON files in Amazon S3, and the analytics team at Sparkify wants to understand which songs users are listening to.
This project builds an ETL pipeline that extracts the data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for song play analysis.
Fact Table
songplays
- songplay_id (primary key), start_time (foreign key), user_id (foreign key), level, song_id (foreign key), artist_id (foreign key), session_id, location, user_agent
Dimension Tables
users
- user_id (primary key), first_name, last_name, gender, level
songs
- song_id (primary key), title, artist_id (foreign key), year, duration
artists
- artist_id (primary key), name, location, latitude, longitude
time
- start_time (primary key), hour, day, week, month, year, weekday
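For reference, here is a minimal sketch of what the DDL for the fact table and one dimension table might look like. The column names come from the schema above, but the data types and constraints are assumptions, not the project's actual definitions:

```python
# Sketch of possible CREATE TABLE statements for the star schema.
# Column names follow the schema above; the types are assumptions.
# Note: in Redshift, PRIMARY KEY and REFERENCES are informational
# constraints used by the query planner, not enforced at load time.

songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id INT IDENTITY(0, 1) PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL REFERENCES time(start_time),
    user_id     INT       NOT NULL REFERENCES users(user_id),
    level       VARCHAR,
    song_id     VARCHAR   REFERENCES songs(song_id),
    artist_id   VARCHAR   REFERENCES artists(artist_id),
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""

user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    gender     VARCHAR,
    level      VARCHAR
);
"""
```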
How to run
- Create a Redshift cluster and update config.ini with your own credentials (a sketch of reading this file follows the steps below)
- Create the tables in your Redshift database by running the following command in your terminal (a sketch of this script's typical flow appears after these steps)
python create_tables.py
- Stage, transform, and load the data into your Redshift database by running the following command (a staging-and-insert sketch appears after these steps)
python etl.py
- Check whether the tables are loaded correctly by running the following command (a row-count check sketch appears after these steps)
python test_tables.py
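The section and key names below are assumptions about how config.ini might be laid out, not the project's actual schema. A minimal sketch of loading the credentials with Python's configparser:

```python
import configparser

# Hypothetical config.ini layout (section and key names are assumptions):
#
# [CLUSTER]
# host        = <redshift-cluster-endpoint>
# db_name     = <database-name>
# db_user     = <database-user>
# db_password = <database-password>
# db_port    = 5439

config = configparser.ConfigParser()
config.read("config.ini")

# Collect the connection parameters for later use with a database driver.
cluster = config["CLUSTER"]
print(dict(cluster))
```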
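create_tables.py itself is not shown here; the following is a rough sketch of the drop-and-recreate flow such a script typically follows, assuming psycopg2 as the driver and the config layout above. The query lists are truncated placeholders:

```python
import configparser
import psycopg2

# Placeholder query lists; the real script would hold one statement per table.
drop_table_queries = [
    "DROP TABLE IF EXISTS songplays;",
    "DROP TABLE IF EXISTS users;",
    # ... one DROP per remaining table
]
create_table_queries = [
    # ... one CREATE per table, e.g. the DDL sketched earlier
]

def main():
    config = configparser.ConfigParser()
    config.read("config.ini")
    cluster = config["CLUSTER"]

    # Connect to the Redshift cluster with the configured credentials.
    conn = psycopg2.connect(
        host=cluster["host"],
        dbname=cluster["db_name"],
        user=cluster["db_user"],
        password=cluster["db_password"],
        port=cluster["db_port"],
    )
    cur = conn.cursor()

    # Drop any existing tables, then create them fresh.
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
        conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```

Running the drops first makes the script idempotent: it can be rerun safely to reset the schema.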
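Staging data from S3 into Redshift is normally done with the COPY command, followed by INSERT ... SELECT statements that populate the star schema from the staging tables. Below is a sketch under those assumptions; the S3 paths, IAM role, region, staging table names, and staging column names (which would mirror the raw JSON fields) are all placeholders, not the project's actual values:

```python
import configparser
import psycopg2

# Bulk-load the raw JSON logs from S3 into a staging table.
# The bucket path, IAM role ARN, and region are placeholders.
staging_events_copy = """
COPY staging_events
FROM 's3://<bucket>/log_data'
IAM_ROLE '<your-iam-role-arn>'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""

# Populate the fact table from the staging tables: songplays are the
# log events whose page is 'NextSong', matched to songs by title/artist.
songplay_table_insert = """
INSERT INTO songplays (start_time, user_id, level, song_id,
                       artist_id, session_id, location, user_agent)
SELECT TIMESTAMP 'epoch' + e.ts / 1000 * INTERVAL '1 second',
       e.userId, e.level, s.song_id, s.artist_id,
       e.sessionId, e.location, e.userAgent
FROM staging_events e
JOIN staging_songs s
  ON e.song = s.title AND e.artist = s.artist_name
WHERE e.page = 'NextSong';
"""

def main():
    config = configparser.ConfigParser()
    config.read("config.ini")
    cluster = config["CLUSTER"]
    conn = psycopg2.connect(
        host=cluster["host"],
        dbname=cluster["db_name"],
        user=cluster["db_user"],
        password=cluster["db_password"],
        port=cluster["db_port"],
    )
    cur = conn.cursor()
    for query in [staging_events_copy, songplay_table_insert]:
        cur.execute(query)
        conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```

Staging first, then transforming inside Redshift, keeps the heavy lifting set-based and parallel rather than row-by-row on the client.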
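test_tables.py is only named here, so its contents are not shown; a minimal check in its spirit could count the rows in each analytics table (the table list comes from the schema above):

```python
import configparser
import psycopg2

TABLES = ["songplays", "users", "songs", "artists", "time"]

def main():
    config = configparser.ConfigParser()
    config.read("config.ini")
    cluster = config["CLUSTER"]
    conn = psycopg2.connect(
        host=cluster["host"],
        dbname=cluster["db_name"],
        user=cluster["db_user"],
        password=cluster["db_password"],
        port=cluster["db_port"],
    )
    cur = conn.cursor()
    # A table that loaded correctly should report a non-zero row count.
    for table in TABLES:
        cur.execute(f"SELECT COUNT(*) FROM {table};")
        print(f"{table}: {cur.fetchone()[0]} rows")
    conn.close()

if __name__ == "__main__":
    main()
```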
The code was written by me; the Sparkify data belongs to Udacity.