Project source
- Udacity
This end-to-end data modeling project entails designing and implementing a data warehouse solution for a music streaming company. The core responsibility is to model an analytical database for the data analytics team, which wants to understand user activity in the app to improve business decisions. The data currently available is operational: JSON logs of user activity and JSON metadata files on the songs in the app.
The project objectives include:
- The design of an optimized data ingestion pipeline architecture
- The design of the data warehouse ERD (Entity Relationship Diagram) for the data analytics team.
- The implementation of Data Extraction, Transformation and Loading (ETL).
The architecture diagram was created with https://app.diagrams.net/.
The architecture is purpose-built to achieve the project objectives:
- Data source: user log and song data are archived in file directories.
- Processing: the data is loaded into pandas for extraction and transformation.
- Ingestion: since the DWH is in PostgreSQL, SQL INSERT statements together with Python are used to load the data into the DWH, chosen for SQL's readability and simplicity across the data team (see the sketch after this list).
- Storage: data is stored in PostgreSQL for analytical purposes.
- Visualization: analytics dashboards for business decisions.
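As a rough illustration of the processing and ingestion steps, the sketch below loads one song metadata file with pandas and inserts a row into Postgres with psycopg2. The database name, connection string, file path, and the assumption that a `songs` table keyed on `song_id` already exists are illustrative, not the project's exact code.

```python
import pandas as pd
import psycopg2

# Connect to the warehouse (credentials and database name are assumptions).
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# Extract: each song file holds a single JSON record (path is illustrative).
df = pd.read_json("data/song_data/song_file.json", lines=True)

# Transform: keep only the columns needed for the songs dimension table.
song_data = df[["song_id", "title", "artist_id", "year", "duration"]].values[0].tolist()

# Load: a parameterized INSERT keeps the SQL readable and safe.
cur.execute(
    """
    INSERT INTO songs (song_id, title, artist_id, year, duration)
    VALUES (%s, %s, %s, %s, %s)
    ON CONFLICT (song_id) DO NOTHING;
    """,
    song_data,
)

conn.commit()
conn.close()
```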
The data model below shows a star schema optimized for queries on song play analysis, applying the three levels of normalization (1NF, 2NF, and 3NF) where appropriate, using the song and log datasets.
- songplays (fact table) — records in log data associated with song plays, i.e. records with page NextSong: songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent
- users (dimension) — users in the app: user_id, first_name, last_name, gender, level
- songs (dimension) — songs in the music database: song_id, title, artist_id, year, duration
- artists (dimension) — artists in the music database: artist_id, name, location, latitude, longitude
- time (dimension) — timestamps of records in songplays broken down into specific units: start_time, hour, day, week, month, year, weekday
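To make the schema concrete, the snippet below shows how the fact table and one dimension table might be declared as SQL strings in sql_queries.py. The column types and constraints are assumptions for illustration, not the project's exact definitions.

```python
# Hypothetical excerpt in the style of sql_queries.py; types and constraints are assumptions.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,   -- surrogate key for each play event
    start_time  TIMESTAMP NOT NULL,   -- references time.start_time
    user_id     INT NOT NULL,         -- references users.user_id
    level       VARCHAR,              -- free or paid tier at play time
    song_id     VARCHAR,              -- references songs.song_id
    artist_id   VARCHAR,              -- references artists.artist_id
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""

user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    gender     VARCHAR,
    level      VARCHAR
);
"""
```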
In addition to the data files, the project workspace includes five files:
- test.ipynb — displays the first few rows of each table to check the database.
- create_tables.py — drops the Postgres tables (if they exist) and recreates them; run at the start of the project.
- etl.ipynb — reads and processes a single file from song_data and log_data and loads the data into the tables; contains a detailed explanation of the ETL process for each table.
- etl.py — reads and processes all files from song_data and log_data and loads them into the tables (a simplified sketch of the log-processing step follows this list).
- sql_queries.py — contains all the SQL queries used to ingest data into the database.
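For context on the log side of the pipeline, here is a minimal sketch of how etl.py might process one log file; the file path is illustrative and the exact implementation in the project may differ.

```python
import pandas as pd

# Each log file contains one JSON record per line (path is illustrative).
df = pd.read_json("data/log_data/events.json", lines=True)

# Keep only song-play events.
df = df[df["page"] == "NextSong"]

# Convert the millisecond timestamp and derive the time-dimension columns.
t = pd.to_datetime(df["ts"], unit="ms")
time_df = pd.DataFrame({
    "start_time": t,
    "hour": t.dt.hour,
    "day": t.dt.day,
    "week": t.dt.isocalendar().week,
    "month": t.dt.month,
    "year": t.dt.year,
    "weekday": t.dt.weekday,
})

# Each row of time_df would then be inserted into the time table,
# and the remaining log fields into the users and songplays tables.
```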