Welcome to my personal data engineering portfolio! This repo contains my notes, homework, and projects as I work through the DataTalksClub Data Engineering Zoomcamp.
- Introduction to Google Cloud Platform (GCP)
- Working with Docker and Docker Compose
- Running PostgreSQL in Docker containers
- Managing infrastructure using Terraform
- Understanding Data Lakes and orchestration concepts
- Building pipelines with Kestra
- Exploring task scheduling and dependency management
- Reading and ingesting data from APIs
- Building scalable pipelines
- Implementing data normalization and incremental loading
- Overview of Google BigQuery
- Implementing table partitioning and clustering
- Learning BigQuery query optimization best practices
- Introduction to ML features in BigQuery
- Building models using dbt (data build tool)
- Testing, documenting, and deploying transformations
- Creating dashboards with Metabase
- Introduction to Apache Spark
- Working with DataFrames and Spark SQL
- Exploring how groupBy and join operations work under the hood
- Getting started with Kafka
- Using Kafka Streams and KSQL
- Managing data schemas with Avro
A final end-to-end project applying concepts learned throughout the course. Coming soon!