Udacity Nanodegree
Postgres, Cassandra, AWS, RedShift, S3, EMR, Spark, Airflow, ETL, ELT, Data Modelling, Database Schema, Data Warehousing, Data Lakes, Data Engineering, Udacity
The data engineering field is expected to continue growing rapidly over the next several years, and there is huge demand for data engineers across industries. This Data Engineer Nanodegree program consists of content and curriculum supporting six (6) projects. The program is estimated to take five (5) months to complete at 10 hours per week.
Each project is reviewed by the Udacity reviewer network, which provides feedback. If the student does not pass a project, he will be asked to resubmit it until it passes.
The objective is to learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.
At the end of the program, the student will combine the newly acquired skills by completing a capstone project.
Educational Objectives:
- Create user-friendly relational and NoSQL data models
- Create scalable and efficient data warehouses
- Work efficiently with massive datasets
- Build and interact with a cloud-based data lake
- Automate and monitor data pipelines
- Develop proficiency in Spark, Airflow, and AWS tools
TO BE ATTACHED!
During this program, the student will complete four courses and five projects. Throughout the projects, he will play the part of a data engineer at a music streaming company. He will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Below is a course-by-course breakdown.
Associated notebooks for this course can be found here.
In this course, the student will learn to meet the diverse needs of data consumers by understanding the differences between data models and how to choose the appropriate data model for a given situation. He will also build fluency in PostgreSQL and Apache Cassandra.
Project 01 - Data Modeling with Postgres
In this project, the student will model user activity data for a music streaming app called Sparkify. He will create a relational database and an ETL pipeline designed to optimize queries for understanding what songs users are listening to. He will also define fact and dimension tables in PostgreSQL and insert data into them.
- Link for Project 01 - Link
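The fact and dimension layout described for Project 01 can be sketched roughly as follows. This is a minimal illustration and not the project's actual schema: the table names, column names, and sample events are all hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical log events; field names and values are illustrative only.
events = [
    {"user_id": 1, "first_name": "Ada", "song_id": "S1", "title": "Song A", "ts": 1541105830796},
    {"user_id": 2, "first_name": "Lin", "song_id": "S1", "title": "Song A", "ts": 1541106106796},
    {"user_id": 1, "first_name": "Ada", "song_id": "S2", "title": "Song B", "ts": 1541106132796},
]

# Assumption: an in-memory SQLite database stands in for a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two dimension tables (users, songs) and one fact table (songplays).
cur.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
    start_time  INTEGER NOT NULL,
    user_id     INTEGER NOT NULL REFERENCES users (user_id),
    song_id     TEXT    NOT NULL REFERENCES songs (song_id)
);
""")

# ETL step: each event feeds the dimensions once and the fact table every time.
# INSERT OR IGNORE is SQLite syntax; PostgreSQL would use ON CONFLICT DO NOTHING.
for e in events:
    cur.execute("INSERT OR IGNORE INTO users VALUES (?, ?)", (e["user_id"], e["first_name"]))
    cur.execute("INSERT OR IGNORE INTO songs VALUES (?, ?)", (e["song_id"], e["title"]))
    cur.execute(
        "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
        (e["ts"], e["user_id"], e["song_id"]),
    )
conn.commit()

# Analytical query: which songs are users listening to most?
play_counts = cur.execute("""
SELECT s.title, COUNT(*) AS plays
FROM songplays sp
JOIN songs s ON s.song_id = sp.song_id
GROUP BY s.title
ORDER BY plays DESC, s.title
""").fetchall()
user_count = cur.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(play_counts, user_count)
```

Keeping descriptive attributes in the dimension tables and only keys and measurements in the fact table is what lets the final query stay a single join and aggregate.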
Project 02 - Data Modeling with Apache Cassandra
In this project, the student will model user activity data for a music streaming app called Sparkify. He will create a database and an ETL pipeline in Apache Cassandra, modeling the data so that he can run specific queries provided by the analytics team at Sparkify.
- Link for Project 02 - Link
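Modeling data around specific queries, as described for Project 02, usually means creating one table per query, with the query's filter columns forming the primary key. The sketch below only builds the CQL text and does not connect to a cluster. The helper `create_table_cql`, the table name, the columns, and the example query are all hypothetical and are not taken from the project.

```python
def create_table_cql(table, columns, partition_key, clustering=()):
    """Build a CREATE TABLE statement for one query-specific table.

    Hypothetical helper for illustration; it is not part of any Cassandra driver.
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # The partition key is wrapped in its own parentheses; clustering columns follow it.
    key = "(" + ", ".join(partition_key) + ")"
    if clustering:
        key += ", " + ", ".join(clustering)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({key}))"


# Assumed analytics question: which song was played at a given position in a session?
columns = {
    "session_id": "int",
    "item_in_session": "int",
    "artist": "text",
    "song_title": "text",
}
ddl = create_table_cql(
    "songs_by_session",
    columns,
    partition_key=["session_id"],
    clustering=["item_in_session"],
)

# The WHERE clause filters only on primary key columns, which is what the table
# was designed for. The literal values are made up.
query = (
    "SELECT artist, song_title FROM songs_by_session "
    "WHERE session_id = 338 AND item_in_session = 4"
)
print(ddl)
print(query)
```

A second question with different filter columns would get its own table with a different primary key, even if that duplicates data.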
Distributed under the MIT License. See LICENSE for more information.
Djan Magno - djan.magno@gmail.com
Project Link - https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree
Leave a star on GitHub, give a clap on Medium, and share this guide if you found it helpful.