This repo has all the resources you need to become an amazing data engineer!
Make sure to check out the projects section for more hands-on examples!
Make sure to check out the interviews section for more advice on how to pass data engineering interviews!
Great Books:
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications
- Designing Machine Learning Systems
- The Hundred Page Machine Learning Book
- Kimball - The Data Warehouse Toolkit
- Data Mesh
- Machine Learning System Design Interview
- Streaming Systems
- High Performance Spark
- Building Evolutionary Architectures, 2nd Edition
- Data Management at Scale, 2nd Edition
- Deciphering Data Architectures
- 97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts
- Data Governance: The Definitive Guide
- Trino: The Definitive Guide
- Delta Lake: The Definitive Guide
- Hadoop: The Definitive Guide
- Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications
- Data Engineering with dbt: A practical guide to building a dependable data platform with SQL
- Data Engineering with AWS
- Practical DataOps: Delivering Agile Data Science at Scale
- Data Engineering Design Patterns
- Snowflake Data Engineering
- Unlocking dbt
- Learning Spark, 2nd Edition
Communities:
- Seattle Data Guy Discord
- EcZachly Data Engineering Discord
- AdalFlow Discord (LLM Library)
- Chip Huyen MLOps Discord
- Data Engineer Things Community
- dbt Community
- r/dataengineering
- Microsoft Fabric Community
- r/MicrosoftFabric
- Data Talks Club Slack
- Data Engineering Wiki
Companies:
- Orchestration
- Data Lake / Cloud
- Data Warehouse
- Data Quality
- Education Companies
- Analytics / Visualization
- Data Integration
- Modern OLAP
- LLM Application Library
Data Engineering blogs of companies:
- Netflix
- Uber
- Databricks
- Airbnb
- Amazon AWS Blog
- Microsoft Data Architecture Blogs
- Microsoft Fabric Blog
- Oracle
- Meta
- Onehouse
Data Engineering Whitepapers:
- A Five-Layered Business Intelligence Architecture
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Big Data Quality: A Data Quality Profiling Model
- The Data Lakehouse: Data Warehousing and More
- Spark: Cluster Computing with Working Sets
- The Google File System
- Building a Universal Data Lakehouse
- XTable in Action: Seamless Interoperability in Data Lakes
- MapReduce: Simplified Data Processing on Large Clusters
Great YouTube Channels:
- 100k+ subscribers
- 10k+ subscribers
- 1k+ subscribers
Great Podcasts
- The Data Engineering Show
- Data Engineering Podcast
- DataTopics
- The Data Engineering Side Of Data
- DataWare
- The Data Coffee Break Podcast
- The Data Stack Show
- Intricity101 Data Sharks Podcast
- Drill to Detail with Mark Rittman
- Analytics Power Hour
- Catalog & Cocktails
- Datatalks
- Data Brew by Databricks
- The Data Cloud Podcast by Snowflake
- What's New in Data
- Open||Source||Data by DataStax
- Streaming Audio by Confluent
- The Data Scientist Show
- MLOps.community
- Monday Morning Data Chat
- The Data Chief
Newsletters:
- DataEngineer.io Newsletter
- Seattle Data Guy
- Joe Reis
- Data Engineering Weekly
- Data Engineering Central
- Dutch Engineer
- ByteByteGo
- Start Data Engineering
- Developing Dev
- High Growth Engineer
- Learn Analytics Engineering
- Marvelous MLOps
- Medium Data Engineering Newsletter
- Benn Stancil
- Metadata Weekly
- Technically
- Blef.fr Data News
- All Hands on Data
- Modern Data 101
- SELECT Insights
- Interesting Data Gigs
- Ju Data Engineering Weekly
- From An Engineer Sight
Glossaries:
- Data Engineering Vault
- Airbyte Data Glossary
- Data Engineering Wiki by Reddit
- Secoda Glossary
- Databricks Glossary
- Airtable Glossary
- Data Engineering Glossary by Dagster
LinkedIn
- 100k+ Followers
- 50k+ Followers
- 10k+ Followers
- 5k+ Followers
- 1k+ Followers
- Shruti Mantri
- Volker Janz
- [Benoit Pimpaud](https://www.linkedin.com/in/pimpaudben/)
Twitter / X
- Zach Wilson
- Seattle Data Guy
- Sumit Mittal
- Joseph Machado
- Alex Xu
- Eric Roby
- Andreas Kretz
- Marc Lamberti
- Dipankar Mazumdar
- Start Data Engineering
- Data Cyborg
- Simon Späti
- Marcos Ortiz
TikTok
Design Patterns
- Cumulative Table Design
- Microbatch Deduplication
- The Little Book of Pipelines
- Data Developer Platform
Courses / Academies
- DataExpert.io course (use code HANDBOOK10 for a discount!)
- LearnDataEngineering.com
- Technical Freelancer Academy (use code zwtech for a discount!)
- IBM Data Engineering for Everyone
- Qwiklabs
- DataCamp
- Udemy Courses from Shruti Mantri
- Rock the JVM: teaches Spark (in Scala), Flink, and others
- Data Engineering Zoomcamp by DataTalksClub
- Efficient Data Processing in Spark
- Scaler
Certification Courses
- Google Cloud Certified - Professional Data Engineer
- Databricks - Data Engineer Professional
- Azure Data Engineer Associate
- Microsoft Fabric Analytics Engineer Associate
- Exam DP-203: Data Engineering on Microsoft Azure
- AWS Certified Data Engineer - Associate
Conferences