# Welcome to Databricks Data Engineer Associate Training

**Organized by:** Altcom  
**Duration:** 3 days (hands-on)  
**Target cert:** Databricks Certified Data Engineer Associate  

---

> *This training covers 100% of the exam domains with live demos, hands-on labs, and real-world scenarios.*

## 3-Day Agenda

| Day | Topics | Key Skills |
|-----|--------|------------|
| **Day 1** | Platform, ELT & Ingestion, Delta Fundamentals | Unity Catalog, read_files, CTAS, Views, Delta basics |
| **Day 2** | Delta Optimization, Streaming, Advanced Transforms | OPTIMIZE, VACUUM, Auto Loader, MERGE, CDF, Window |
| **Day 3** | Medallion & Lakeflow, Orchestration, Governance, Exam Prep | DLT pipelines, Jobs, GRANT/REVOKE, System Tables |

### Daily Structure
```
09:00 - 10:30  |  Module + Live Demo
10:30 - 10:45  |  Break
10:45 - 12:00  |  Module + Live Demo
12:00 - 13:00  |  Lunch
13:00 - 14:30  |  Module + Live Demo
14:30 - 14:45  |  Break
14:45 - 16:00  |  Hands-on Lab + Quiz
16:00 - 16:30  |  Q&A / Wrap-up
```

## Exam Domains & Coverage

| Domain | Weight | Training Day |
|--------|--------|-------------|
| Databricks Lakehouse Platform | 24% | Day 1 |
| ELT with Spark SQL and Python | 29% | Day 1-2 |
| Incremental Data Processing | 22% | Day 2 |
| Production Pipelines (DLT) | 11% | Day 3 |
| Data Governance | 14% | Day 3 |

> **Exam format:** 45 multiple-choice questions, 90 minutes, 70% to pass
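
To turn those percentages into rough question counts, here is a small sketch. It copies the weights and exam format from above and assumes, purely for illustration, that every question counts equally and that the pass mark applies to the raw number of correct answers. The names `approx_questions` and `min_correct` are made up for this example.

```python
import math

# Figures copied from the domain table and exam-format note above.
DOMAIN_WEIGHTS = {
    "Databricks Lakehouse Platform": 24,
    "ELT with Spark SQL and Python": 29,
    "Incremental Data Processing": 22,
    "Production Pipelines (DLT)": 11,
    "Data Governance": 14,
}
TOTAL_QUESTIONS = 45
PASS_PERCENT = 70


def approx_questions(weight_percent, total=TOTAL_QUESTIONS):
    """Approximate number of questions for a domain (assumes equal weighting)."""
    return total * weight_percent / 100


def min_correct(total=TOTAL_QUESTIONS, pass_percent=PASS_PERCENT):
    """Smallest whole number of correct answers that reaches the pass mark."""
    return math.ceil(total * pass_percent / 100)


for domain, weight in DOMAIN_WEIGHTS.items():
    print(f"{domain:<32} ~{approx_questions(weight):.1f} questions")

print(f"Correct answers needed (assumed raw count): {min_correct()}")
```

Treat the per-domain numbers as a study-planning guide only; the actual distribution on a given exam may differ.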

## Environment Setup

Before we start, run the setup notebook (`00_setup`) to create your personal catalog and schemas.

**Your environment:**
- **Catalog:** `retailhub_{your_username}` (your login name up to the `@`, with dots replaced by underscores)
- **Schemas:** `bronze`, `silver`, `gold`
- **Volume:** for raw data files

Run the cell below to verify:

```python
# Quick environment check: derive your catalog name and confirm it exists.
# `spark` is the SparkSession that Databricks notebooks provide automatically.
username = spark.sql("SELECT current_user()").first()[0].split("@")[0].replace(".", "_")
catalog = f"retailhub_{username}"

print(f"User:    {username}")
print(f"Catalog: {catalog}")
print("Schemas: bronze, silver, gold")

# Verify the catalog exists; USE CATALOG fails if it is missing or not accessible.
try:
    spark.sql(f"USE CATALOG {catalog}")
    print(f"\nCatalog '{catalog}' is ready!")
except Exception as e:
    print(f"\nRun 00_setup first: {e}")
```
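
The check above confirms the catalog only. If you also want to confirm the three schemas, a minimal sketch could look like the following. The helper names (`catalog_for_user`, `missing_schemas`, `check_schemas`) are hypothetical and not part of the training notebooks; `check_schemas` assumes that the schema name is the first column returned by `SHOW SCHEMAS`.

```python
EXPECTED_SCHEMAS = ("bronze", "silver", "gold")


def catalog_for_user(current_user):
    """Apply the same naming rule as the check above:
    keep the part before '@' and replace dots with underscores."""
    local_part = current_user.split("@")[0]
    return f"retailhub_{local_part.replace('.', '_')}"


def missing_schemas(existing, expected=EXPECTED_SCHEMAS):
    """Return the expected schema names that are not in `existing`
    (compared case-insensitively), in the order they are expected."""
    present = {name.lower() for name in existing}
    return [name for name in expected if name not in present]


def check_schemas(spark, catalog):
    """List the schemas in `catalog` and report which expected ones are missing.
    Assumption: the schema name is the first column of the SHOW SCHEMAS result."""
    rows = spark.sql(f"SHOW SCHEMAS IN `{catalog}`").collect()
    return missing_schemas([row[0] for row in rows])


# In a Databricks notebook, where `spark` and `catalog` already exist:
#   missing = check_schemas(spark, catalog)
#   print("All schemas present" if not missing else f"Missing: {missing}")
```

The pure-Python helpers are kept separate from the Spark call so you can try the naming rule and the comparison without a cluster attached.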

## Training Materials

| Resource | Location | When |
|----------|----------|------|
| Demo notebooks | `dayX/demo/` | Follow along with trainer |
| Lab notebooks | `dayX/lab/` | Hands-on exercises |
| Lab guides | `dayX/lab/lab_guide/` | Step-by-step instructions |
| Cheatsheets | `dayX/materials/` | Quick reference (keep open!) |
| Interactive quizzes | `dayX/materials/` | End-of-day knowledge check |
| Exam prep | `day3/demo/EXAM_PREP` | Day 3 wrap-up |

> **Tip:** Open the cheatsheet for the current day in a side tab -- it's a lifesaver during labs.

## Ground Rules

1. **Ask questions anytime** -- there are no dumb questions
2. **Labs are self-paced** -- finish early? Try the bonus challenges
3. **Mistakes are welcome** -- that's how we learn Delta recovery!
4. **Collaborate** -- help your neighbor, discuss approaches
5. **Have fun** -- data engineering is awesome

---

### Let's get started!

Open `day1/demo/01_platform_intro` when the trainer says go.