Repository containing the notebooks used on classes and projects done from the Udacity Data Engineer Nanodegree.

djanmagno/Udacity-Data-Engineer-Nanodegree

Udacity Data Engineering Nanodegree

Explore the repository»

Postgres, Cassandra, AWS, RedShift, S3, EMR, Spark, Airflow, ETL, ELT, Data Modelling, Database Schema, Data Warehousing, Data Lakes, Data Engineering, Udacity

About The Nanodegree

The data engineering field is expected to continue growing rapidly over the next several years, and there is huge demand for data engineers across industries. This Data Engineer Nanodegree program comprises content and curriculum to support six (6) projects. The program is estimated to take five (5) months to complete at 10 hours of work per week.

Each project is reviewed by the Udacity reviewer network, which provides feedback. A student who does not pass a project is asked to resubmit it until it passes.

The objective is to learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.

At the end of the program, the student will combine the newly acquired skills by completing a capstone project.

Educational Objectives:

  • Create user-friendly relational and NoSQL data models
  • Create scalable and efficient data warehouses
  • Work efficiently with massive datasets
  • Build and interact with a cloud-based data lake
  • Automate and monitor data pipelines
  • Develop proficiency in Spark, Airflow, and AWS tools

Certificate

TO BE ATTACHED!

Program Details

During this program, the student will complete four courses and five projects. Throughout the projects, they will play the part of a data engineer at a music streaming company. They will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Below you can find a course-by-course breakdown.

Associated notebooks for this course can be found here.

Course 1 – Data Modeling

In this course, the student will learn to meet the diverse needs of data consumers by understanding the differences between data models and how to choose the appropriate one for a given situation. They will also build fluency in PostgreSQL and Apache Cassandra.

Project 01 - Data Modeling with Postgres

In this project, the student will model user activity data for a music streaming app called Sparkify. They will create a relational database and an ETL pipeline designed to optimize queries for understanding what songs users are listening to. In PostgreSQL, they will also define fact and dimension tables and insert data into the newly created tables.

  • Link for Project 01 - Link
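
The fact and dimension layout described for Project 01 can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: SQLite from the Python standard library stands in for PostgreSQL so the example runs without a database server, and every table name, column name, and event record is a made-up assumption.

```python
import sqlite3

# Assumption: SQLite replaces PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables (hypothetical names and columns).
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT)")

# Fact table: one row per play event, pointing at the dimensions.
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
        start_time  TEXT NOT NULL,
        user_id     INTEGER NOT NULL REFERENCES users (user_id),
        song_id     TEXT NOT NULL REFERENCES songs (song_id)
    )
""")

# Made-up activity log records standing in for the source data.
events = [
    {"ts": "2018-11-01T10:00:00", "user_id": 1, "name": "Ana", "song_id": "S1", "title": "Song A"},
    {"ts": "2018-11-01T10:05:00", "user_id": 2, "name": "Bo", "song_id": "S1", "title": "Song A"},
    {"ts": "2018-11-01T10:09:00", "user_id": 1, "name": "Ana", "song_id": "S2", "title": "Song B"},
]


def load(cur, events):
    """Tiny ETL step: fill the dimensions first, then the fact table."""
    for e in events:
        # INSERT OR IGNORE skips users and songs that are already present.
        cur.execute("INSERT OR IGNORE INTO users VALUES (?, ?)", (e["user_id"], e["name"]))
        cur.execute("INSERT OR IGNORE INTO songs VALUES (?, ?)", (e["song_id"], e["title"]))
        cur.execute(
            "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
            (e["ts"], e["user_id"], e["song_id"]),
        )


def top_songs(cur):
    """Analytical query: play counts per song, most played first."""
    cur.execute("""
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
    """)
    return cur.fetchall()


load(cur, events)
conn.commit()
print(top_songs(cur))
```

Keeping play events in a narrow fact table and descriptive attributes in the dimensions means a question such as "what songs are users listening to" reduces to a single join and aggregation.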

Project 02 - Data Modeling with Apache Cassandra

In this project, the student will model user activity data for a music streaming app called Sparkify. They will create a database and an ETL pipeline in Apache Cassandra, modeling the data so that they can run specific queries provided by the analytics team at Sparkify.

  • Link for Project 02 - Link
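
Modeling data around the queries to be run, as described for Project 02, might look something like the sketch below. It is an assumption-laden illustration and not the project's code: the query, the table name `songs_by_session`, and all columns are hypothetical. The CQL statement is shown only as a string, and a small in-memory class imitates how a partition key and a clustering column would organize rows, so no Cassandra cluster or driver is needed.

```python
from collections import defaultdict

# Hypothetical query from the analytics team:
#   "Which songs were played in a given session, in play order?"
# A table designed for that query could partition by session_id and
# cluster by item_in_session. The statement is illustrative only and
# is never executed here.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id      int,
    item_in_session int,
    artist          text,
    song            text,
    PRIMARY KEY ((session_id), item_in_session)
)
"""


class SongsBySession:
    """In-memory stand-in for the table above (not a Cassandra client)."""

    def __init__(self):
        # partition key -> {clustering key -> remaining columns}
        self._partitions = defaultdict(dict)

    def insert(self, session_id, item_in_session, artist, song):
        # Writing the same primary key twice overwrites the row,
        # which mirrors Cassandra's upsert behavior.
        self._partitions[session_id][item_in_session] = (artist, song)

    def select(self, session_id):
        # Reads touch one partition and return rows ordered by the
        # clustering key.
        rows = self._partitions.get(session_id, {})
        return [(item, *rows[item]) for item in sorted(rows)]


table = SongsBySession()
table.insert(338, 2, "Artist B", "Second Song")
table.insert(338, 1, "Artist A", "First Song")
table.insert(400, 1, "Artist C", "Other Session Song")
print(table.select(338))
```

The design choice to notice is that the primary key is derived from the query's filter and sort order, so each query gets a table shaped for it instead of relying on joins.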

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Djan Magno - djan.magno@gmail.com

Project Link - https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree

Leave a star in GitHub, give a clap in Medium and share this guide if you found this helpful.
