# M10: Exam Prep — Databricks Data Engineer Associate

| Exam Domain | Weight |
|---|---|
| Databricks Lakehouse Platform | 24% |
| ELT with Spark SQL and Python | 29% |
| Incremental Data Processing | 17% |
| Production Pipelines | 13% |
| Data Governance | 17% |

Final session — exam strategy, must-know syntax, common traps, and study resources.

---

## Exam Overview

| Detail | Value |
|--------|-------|
| **Format** | Multiple choice (single and multi-select) |
| **Questions** | 45 |
| **Time** | 90 minutes |
| **Passing score** | 70% (32/45 correct) |
| **Cost** | $200 USD |
| **Validity** | 2 years |
| **Retake policy** | 14-day wait after 1st attempt, 30 days after 2nd |
| **Registration** | [academy.databricks.com](https://academy.databricks.com) |
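
The passing threshold and the average time per question follow directly from the table. A minimal sketch of that arithmetic, assuming the 70% cut-off is applied to the raw count of 45 questions (the variable names are illustrative):

```python
import math

QUESTIONS = 45
TIME_MINUTES = 90
PASSING_FRACTION = 0.70

# 70% of 45 is 31.5, so the smallest whole number of correct answers is 32.
min_correct = math.ceil(PASSING_FRACTION * QUESTIONS)
max_wrong = QUESTIONS - min_correct
minutes_per_question = TIME_MINUTES / QUESTIONS

print(min_correct, max_wrong, minutes_per_question)  # 32 13 2.0
```

In other words, you can miss up to 13 questions and still pass.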

## Exam Domains & Weight

| Domain | Weight | Modules |
|--------|--------|---------|
| **Databricks Lakehouse Platform** | ~24% | M01 |
| **ELT with Spark SQL and Python** | ~29% | M02, M06 |
| **Incremental Data Processing** | ~17% | M05, M04 (CDF) |
| **Production Pipelines** | ~13% | M07, M08 |
| **Data Governance** | ~17% | M09 |

> **Exam Tip:** Heaviest domain = **ELT** (29%) — focus on SELECT, JOIN, GROUP BY, MERGE, window functions, array/JSON operations.
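
To turn the weights into a rough number of questions, multiply each weight by the 45 questions on the exam. A small sketch, assuming the approximate weights in the table above; the actual count per domain can differ from one exam form to another:

```python
QUESTIONS = 45

# Approximate domain weights (percent), copied from the table above.
weights = {
    "Databricks Lakehouse Platform": 24,
    "ELT with Spark SQL and Python": 29,
    "Incremental Data Processing": 17,
    "Production Pipelines": 13,
    "Data Governance": 17,
}

# Expected questions per domain, heaviest domain first.
expected = {domain: QUESTIONS * w / 100 for domain, w in weights.items()}
for domain, count in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: ~{count:.1f} questions")
```

Use the result to prioritise revision time, not as a guarantee of what appears on the day.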

## What You Learned (3-Day Map)

| Day | Module | Key Topics |
|-----|--------|------------|
| 1 | M01: Lakehouse Platform | Architecture, Unity Catalog, Compute, Volumes |
| 1 | M02: ELT Ingestion | CSV/JSON/Parquet, DataFrames, Transforms, Views |
| 1 | M03: Delta Fundamentals | CRUD, MERGE INTO, Time Travel, Schema Evolution |
| 2 | M04: Delta Optimization | OPTIMIZE, Z-ORDER, VACUUM, Liquid Clustering, CDF |
| 2 | M05: Incremental Processing | COPY INTO, Auto Loader, Structured Streaming |
| 2 | M06: Advanced Transforms | Window functions, CTEs, explode(), UDFs |
| 3 | M07: Medallion & Lakeflow | Bronze/Silver/Gold, STREAMING TABLE, Expectations |
| 3 | M08: Orchestration | Jobs, Triggers, CRON, taskValues, Repair Run |
| 3 | M09: Governance | GRANT/REVOKE, Row Filters, Column Masks, System Tables |

## Must-Know Syntax

### MERGE INTO
```sql
MERGE INTO target USING source ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```
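
The statement above is an upsert: rows whose `id` already exists in the target are overwritten, and all other source rows are appended. A plain-Python model of those semantics, with tables represented as dictionaries keyed by `id`. It only illustrates the matching logic, not how Delta Lake executes a merge, and `merge_upsert` is a hypothetical helper:

```python
def merge_upsert(target: dict, source: dict) -> dict:
    """Model MERGE ... WHEN MATCHED UPDATE SET * / WHEN NOT MATCHED INSERT *."""
    result = dict(target)  # copy so the input table is left untouched
    for key, row in source.items():
        # Matched keys are overwritten (UPDATE SET *); new keys are added (INSERT *).
        result[key] = row
    return result


target = {1: {"id": 1, "name": "old"}, 2: {"id": 2, "name": "keep"}}
source = {1: {"id": 1, "name": "new"}, 3: {"id": 3, "name": "added"}}

merged = merge_upsert(target, source)
print(sorted(merged))     # [1, 2, 3]
print(merged[1]["name"])  # new
```

One difference worth remembering: a real `MERGE` fails when several source rows match the same target row, so deduplicate the source first.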

### Auto Loader
```python
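# `spark` is the notebook's SparkSession; `checkpoint` and `path` are placeholder locations.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint)  # where the inferred schema is stored
    .load(path)
)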
```
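
The snippet above only defines the read side. A stream does nothing until it is written out with a checkpoint. A hedged sketch of the full round trip, wrapped in a function so the `spark` session is passed in explicitly. The function name, paths, and table name are placeholders, not part of the course material:

```python
def ingest_json_with_auto_loader(spark, source_path, schema_path,
                                 checkpoint_path, target_table):
    """Incrementally load new JSON files from source_path into a table."""
    stream = (
        spark.readStream.format("cloudFiles")  # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)  # inferred schema is tracked here
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # remembers which files were processed
        .trigger(availableNow=True)  # process everything available, then stop
        .toTable(target_table)
    )
```

In a Databricks notebook you would call it with the predefined `spark` object and your own locations. Because of the checkpoint, running it a second time picks up only files that arrived since the previous run.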

### Lakeflow Expectations
```sql
CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION DROP ROW
```
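
The `ON VIOLATION` clause decides what happens to rows that fail the check. A plain-Python model of the three behaviours (no clause, `DROP ROW`, `FAIL UPDATE`), assuming rows are dictionaries. `apply_expectation` and its `on_violation` argument are hypothetical names used only for illustration:

```python
def apply_expectation(rows, predicate, on_violation=None):
    """Model an expectation: None = warn only, 'DROP ROW', or 'FAIL UPDATE'."""
    failed = [r for r in rows if not predicate(r)]
    if on_violation == "FAIL UPDATE" and failed:
        # The whole update is aborted; nothing is written.
        raise ValueError(f"{len(failed)} row(s) violated the expectation")
    if on_violation == "DROP ROW":
        # Invalid rows are filtered out before the write.
        return [r for r in rows if predicate(r)]
    # No ON VIOLATION clause: violations are only counted; all rows are kept.
    return list(rows)


def valid_id(row):
    return row["id"] is not None


rows = [{"id": 1}, {"id": None}, {"id": 3}]

print(len(apply_expectation(rows, valid_id)))              # 3
print(len(apply_expectation(rows, valid_id, "DROP ROW")))  # 2
```

Passing `"FAIL UPDATE"` with the same rows raises an error instead of returning anything, which mirrors the pipeline update being aborted.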

### Change Data Feed
```sql
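-- Enable Change Data Feed on an existing table
ALTER TABLE t SET TBLPROPERTIES ('delta.enableChangeDataFeed' = true);
-- Read all recorded changes from table version 2 onward
SELECT * FROM table_changes('t', 2);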
```
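
The same change feed can be read from Python. A sketch assuming a Delta table that already has Change Data Feed enabled; `read_changes` is a hypothetical helper, and `spark` is passed in rather than assumed to exist:

```python
def read_changes(spark, table_name, starting_version):
    """Return the change feed of a Delta table from starting_version onward."""
    changes = (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )
    # Each row carries _change_type, _commit_version and _commit_timestamp.
    # Updates appear twice (update_preimage / update_postimage); drop the old image.
    return changes.filter("_change_type != 'update_preimage'")
```

Dropping `update_preimage` rows is one common choice when propagating changes downstream; keep them if you need the previous values.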

### GRANT
```sql
GRANT USE CATALOG ON CATALOG my_catalog TO `analysts`;
GRANT USE SCHEMA ON SCHEMA my_catalog.silver TO `analysts`;
GRANT SELECT ON SCHEMA my_catalog.silver TO `analysts`;
```
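
Reading a table in Unity Catalog requires privileges at each level of the hierarchy: `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table itself or inherited from a parent. A plain-Python model of that check; `can_select` and the `orders` table are invented for illustration, and real Unity Catalog also considers ownership and group membership:

```python
def can_select(grants, catalog, schema, table):
    """Model the privilege check for reading catalog.schema.table.

    grants is a set of (privilege, securable) pairs held by the principal.
    """
    schema_name = f"{catalog}.{schema}"
    table_name = f"{schema_name}.{table}"
    # Privileges granted on a parent object are inherited by its children.
    has_select = any(
        ("SELECT", securable) in grants
        for securable in (table_name, schema_name, catalog)
    )
    has_use_catalog = ("USE CATALOG", catalog) in grants
    has_use_schema = (
        ("USE SCHEMA", schema_name) in grants or ("USE SCHEMA", catalog) in grants
    )
    return has_select and has_use_catalog and has_use_schema


grants = {("USE CATALOG", "my_catalog"), ("SELECT", "my_catalog.silver")}
before = can_select(grants, "my_catalog", "silver", "orders")
grants.add(("USE SCHEMA", "my_catalog.silver"))
after = can_select(grants, "my_catalog", "silver", "orders")

print(before, after)  # False True
```

The first check fails because nothing is allowed until it is granted, and `SELECT` alone does not let a principal reach the schema.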

## Common Exam Traps

| Trap | Correct Answer |
|------|---------------|
| VACUUM default retention | **7 days** (168 hours) |
| `LIVE.table` vs `STREAM(LIVE.table)` | `LIVE.` = batch, `STREAM(LIVE.)` = streaming |
| Expectation without `ON VIOLATION` | **Warn only** — keeps all rows |
| `DROP ROW` vs `FAIL UPDATE` | DROP ROW silently removes; FAIL UPDATE aborts |
| Temp View scope | Current SparkSession only |
| Global Temp View | Query as `global_temp.view_name` |
| Auto Loader format | `cloudFiles` (not `autoLoader`!) |
| Unity Catalog default | **Deny by default** — must GRANT explicitly |
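| OPTIMIZE vs VACUUM | OPTIMIZE compacts small files into larger ones; VACUUM deletes data files that the table no longer references and that are older than the retention period |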
| Schema evolution | `.option("mergeSchema", "true")` on **write**, not read |
| Change Data Feed | `table_changes()`, NOT `table_change_feed()` |
| Streaming trigger | `availableNow=True` = process all available data, then stop; `processingTime` = run micro-batches at a fixed interval and keep running |
| Row Filter function | Must return **BOOLEAN** |
| Column Mask function | Must return **same type** as masked column |
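
The last two rows are easy to mix up. Sketched as plain Python functions (in Unity Catalog these would be SQL UDFs attached to the table; the `region` and `email` columns and the `is_admin` flag are invented for illustration), the row filter returns a boolean per row, while the column mask returns a value of the column's own type:

```python
def us_only_filter(region: str, is_admin: bool) -> bool:
    """Row filter: True keeps the row visible, False hides it."""
    return is_admin or region == "US"


def mask_email(email: str, is_admin: bool) -> str:
    """Column mask: returns a string because the masked column is a string."""
    return email if is_admin else "***@" + email.split("@")[-1]


print(us_only_filter("EU", is_admin=False))            # False
print(mask_email("ana@example.com", is_admin=False))   # ***@example.com
```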

## Study Strategy

**Before the exam:**
1. Review cheatsheets (in `materials/` folder) 2-3 times
2. Re-do all quizzes — aim for 100%
3. Practice in Databricks — MERGE, Time Travel, VACUUM, Auto Loader
4. Take the [Databricks practice exam](https://academy.databricks.com)

**During the exam:**
- 2 min/question — flag and skip difficult ones
- Eliminate obviously wrong answers first (usually 2 of 4)
- Read carefully: "which is TRUE" vs "which is FALSE"
- If two answers seem correct, pick the **more specific** one
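
The 2 min/question pace can be turned into checkpoints to glance at during the exam. A small sketch, assuming 45 questions in 90 minutes and a 10-minute reserve for flagged questions (the reserve is an assumption, not an exam rule):

```python
QUESTIONS = 45
TOTAL_MINUTES = 90
REVIEW_RESERVE = 10  # assumed buffer for revisiting flagged questions

# Minutes per question on the first pass through the exam.
pace = (TOTAL_MINUTES - REVIEW_RESERVE) / QUESTIONS

# Elapsed minutes you should see on the clock after every 15th question.
checkpoints = {q: round(q * pace) for q in range(15, QUESTIONS + 1, 15)}
print(checkpoints)  # {15: 27, 30: 53, 45: 80}
```

If you are behind a checkpoint, flag the current question and move on.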

## Post-Training Resources

| Resource | Link |
|----------|------|
| Databricks Academy | [academy.databricks.com](https://academy.databricks.com) |
| Practice Exam | [Databricks Certification](https://www.databricks.com/learn/certification/data-engineer-associate) |
| Official Documentation | [docs.databricks.com](https://docs.databricks.com) |
| Delta Lake Docs | [docs.delta.io](https://docs.delta.io) |
| Unity Catalog Guide | [UC Documentation](https://docs.databricks.com/en/data-governance/unity-catalog/index.html) |

**Your training materials:** Cheatsheets (Day 1-3), Quizzes (60+ questions), Lab & Demo notebooks (M01-M09).

## Final Notes

You've covered **100% of the exam topics** during this 3-day training. Schedule your exam within **2–4 weeks** while the material is fresh.

| Day | Modules | Key Topics |
|---|---|---|
| 1 | M01–M03 | Lakehouse, ELT Ingestion, Delta Fundamentals |
| 2 | M04–M06 | Delta Optimization, Incremental Processing, Advanced Transforms |
| 3 | M07–M10 | Medallion/Lakeflow, Orchestration, Governance, Exam Prep |

Good luck on your certification!

---

*Training delivered by Altcom*
