Yuexi-Li/Data-Engineering

Data-Engineering

This repository contains the six passed projects from the Data Engineering Program.

Lesson: Data Modeling

Model user activity data for a music streaming app called Sparkify and optimize queries for understanding what songs users are listening to.
Project 1: Relational Model with Postgres

- Design the schema and define fact and dimension tables;
- Insert data into the tables.
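A minimal sketch of the fact/dimension idea, using Python's built-in `sqlite3` module as a stand-in for Postgres so it runs without a database server. The table and column names (`songplays`, `users`, `songs`) are hypothetical illustrations, not taken from the project code.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, one row per entity (hypothetical names).
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, duration REAL)")

# Fact table: one row per event, with foreign keys pointing at the dimensions.
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY,
        start_time  INTEGER,
        user_id     INTEGER REFERENCES users(user_id),
        song_id     TEXT REFERENCES songs(song_id)
    )
""")

# Insert sample rows into the dimensions and the fact table.
cur.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "Ana", "free"), (2, "Ben", "paid")])
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                [("S1", "Song A", 210.5), ("S2", "Song B", 180.0)])
cur.executemany("INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
                [(1000, 1, "S1"), (2000, 2, "S1"), (3000, 2, "S2")])
conn.commit()

# Typical analytics query: join the fact table to a dimension and aggregate.
plays_per_song = cur.execute("""
    SELECT s.title, COUNT(*) AS plays
    FROM songplays sp JOIN songs s ON sp.song_id = s.song_id
    GROUP BY s.title
    ORDER BY plays DESC, s.title
""").fetchall()
print(plays_per_song)  # [('Song A', 2), ('Song B', 1)]
```

Against a real Postgres database (for example through psycopg2), the SQL would look much the same, but the parameter placeholder is `%s` rather than `?`.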

Project 2: NoSQL data model with Apache Cassandra

- Model the data to help the data team answer queries about app usage;
- Set up Apache Cassandra tables designed to optimize writes of transactional data on user sessions.
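In Cassandra, a table is usually designed around one query, with the partition key taken from the query's filter columns. The sketch below only imitates that idea in plain Python (no Cassandra driver is involved): rows are grouped by a hypothetical partition key (`session_id`) and ordered by a hypothetical clustering column (`item_in_session`), so "which songs were played in session X" becomes a single lookup. The CQL in the comment is illustrative, not the project's actual schema.

```python
from collections import defaultdict

# Illustrative CQL for the table this sketch imitates (hypothetical names):
#   CREATE TABLE IF NOT EXISTS songs_by_session (
#       session_id int, item_in_session int, artist text, song text,
#       PRIMARY KEY (session_id, item_in_session));
# session_id is the partition key; item_in_session is the clustering column.

events = [
    {"session_id": 338, "item_in_session": 4, "artist": "Artist C", "song": "Song C"},
    {"session_id": 338, "item_in_session": 1, "artist": "Artist A", "song": "Song A"},
    {"session_id": 139, "item_in_session": 2, "artist": "Artist B", "song": "Song B"},
]

def build_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key and sort each partition by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[clustering_key])
    return dict(partitions)

songs_by_session = build_partitions(events, "session_id", "item_in_session")

# Query: SELECT artist, song FROM songs_by_session WHERE session_id = 338;
result = [(r["artist"], r["song"]) for r in songs_by_session[338]]
print(result)  # [('Artist A', 'Song A'), ('Artist C', 'Song C')]
```

A different question (for example, "all users who listened to a given song") would get its own table with a different partition key, which is why Cassandra models often duplicate data across tables.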

Lesson: Cloud Data Warehouse

Project 3: Data Warehouse (AWS)

- Build an ELT pipeline that extracts Sparkify’s data from S3, Amazon’s object storage service;
- Stage the data in Amazon Redshift and transform it into a set of fact and dimension tables so the Sparkify analytics team can continue finding insights into what songs their users are listening to.
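The "stage, then transform" pattern can be sketched with `sqlite3` standing in for Redshift: raw rows land in a staging table first, and a dimension table is then built with `INSERT ... SELECT` inside the database. The table names, columns, and the `'NextSong'` page filter are hypothetical assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Redshift (assumption)
cur = conn.cursor()

# 1) Staging table: raw event rows loaded as-is. In Redshift this step would
#    typically be a COPY from S3; here the rows are inserted directly.
cur.execute("CREATE TABLE staging_events "
            "(user_id INTEGER, first_name TEXT, level TEXT, page TEXT, song TEXT)")
cur.executemany("INSERT INTO staging_events VALUES (?, ?, ?, ?, ?)", [
    (1, "Ana", "free", "NextSong", "Song A"),
    (1, "Ana", "free", "NextSong", "Song B"),
    (2, "Ben", "paid", "Home", None),
    (2, "Ben", "paid", "NextSong", "Song A"),
])

# 2) Transform inside the database: build a dimension table with SQL,
#    keeping only song-play events and removing duplicate user rows.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("""
    INSERT INTO users (user_id, first_name, level)
    SELECT DISTINCT user_id, first_name, level
    FROM staging_events
    WHERE page = 'NextSong'
""")
conn.commit()

users = cur.execute("SELECT * FROM users ORDER BY user_id").fetchall()
print(users)  # [(1, 'Ana', 'free'), (2, 'Ben', 'paid')]
```

Keeping the raw data in staging tables lets the transformation be rerun in SQL without re-reading the source files.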

Lesson: Data Lakes with Spark

Project 4: Data Lake with Apache Spark

- Build an ETL pipeline for a data lake; the source data resides in S3 as a directory of JSON logs of user activity on the app and a directory of JSON metadata about the songs in the app;
- Load data from S3, process the data into analytics tables using Spark, and load them back into S3;
- Deploy this Spark process on a cluster using AWS.
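In the project these steps run on Spark against S3. The sketch below only mirrors the extract, transform, and partition steps in plain Python on an in-memory JSON-lines string. The field names (`userId`, `page`, `song`, `ts`) and the assumption that `ts` is epoch milliseconds are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical JSON-lines log records; 'ts' is assumed to be epoch milliseconds.
raw_log = """\
{"userId": "1", "page": "NextSong", "song": "Song A", "ts": 1541105830796}
{"userId": "2", "page": "Home", "song": null, "ts": 1541106106796}
{"userId": "2", "page": "NextSong", "song": "Song B", "ts": 1543700000000}
"""

def extract(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def transform(records):
    """Keep song plays only and derive year/month (UTC) from the timestamp."""
    rows = []
    for rec in records:
        if rec["page"] != "NextSong":
            continue
        start = datetime.fromtimestamp(rec["ts"] / 1000, tz=timezone.utc)
        rows.append({"user_id": rec["userId"], "song": rec["song"],
                     "year": start.year, "month": start.month})
    return rows

def partition(rows):
    """Group rows by (year, month), like a partitioned write to a data lake."""
    parts = defaultdict(list)
    for row in rows:
        parts[(row["year"], row["month"])].append(row)
    return dict(parts)

songplays = partition(transform(extract(raw_log)))
print(sorted(songplays))  # [(2018, 11), (2018, 12)]
```

In PySpark, the analogous steps would be `spark.read.json(...)`, a `filter` on the page column, and `df.write.partitionBy("year", "month").parquet(...)` to write the table back to S3.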

Lesson: Data Pipelines with Airflow

Use Apache Airflow, a workflow tool originally developed and open-sourced by Airbnb and now maintained by the Apache Software Foundation, to continue work on Sparkify’s data infrastructure.
Project 5: Data Pipeline with Airflow

- Create and automate a set of data pipelines;
- Configure and schedule data pipelines with Airflow, setting dependencies, triggers, and quality checks as one would in a production setting.
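Task dependencies and a data quality check can be sketched without Airflow, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to run tasks in dependency order. This is not Airflow's API; the task names and row counts are hypothetical placeholders.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical row counts recorded by each load step.
tables = {}

def stage_events():
    tables["staging_events"] = 4  # pretend 4 rows were staged

def load_songplays():
    tables["songplays"] = 3  # pretend 3 rows were loaded

def load_users():
    tables["users"] = 2  # pretend 2 rows were loaded

def run_quality_checks():
    """Fail the run if any loaded table is empty, like a data quality task."""
    empty = [name for name, count in tables.items() if count == 0]
    if empty:
        raise ValueError(f"Data quality check failed, empty tables: {empty}")

tasks = {
    "stage_events": stage_events,
    "load_songplays": load_songplays,
    "load_users": load_users,
    "run_quality_checks": run_quality_checks,
}

# Dependencies: each task maps to the set of tasks that must finish before it.
dependencies = {
    "load_songplays": {"stage_events"},
    "load_users": {"stage_events"},
    "run_quality_checks": {"load_songplays", "load_users"},
}

# Run every task once, in an order that respects the dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    tasks[name]()
print(order[0], order[-1])  # stage_events run_quality_checks
```

In an Airflow DAG, the same ordering would be declared between operators, for example with `stage >> [load_songplays, load_users] >> quality_checks`, and the scheduler would handle retries and triggers.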

Lesson: Capstone

Capstone Project

- Define the scope of the project and the data to be worked with;
- Gather data from four different sources, then transform, combine, and summarize it;
- Create a clean database for others to analyze.

About

Udacity Data Engineer Nano Program