# LAB 03: Delta DML & Time Travel

**Duration:** ~40 min  
**Day:** 1  
**After module:** M03: Delta Lake Fundamentals  
**Difficulty:** Intermediate

---

## Scenario

> *"A batch of updated customer records has arrived. You need to merge them into the existing table, then simulate a disaster (an accidental DELETE) and recover the data using Delta Lake's Time Travel. Finally, you will see how VACUUM affects your ability to travel back in time."*

---

## Objectives

After completing this lab you will be able to:
- Use `MERGE INTO` for upsert operations (insert + update)
- Perform `UPDATE` and `DELETE` on Delta tables
- Inspect table history with `DESCRIBE HISTORY`
- Query previous versions with Time Travel (`VERSION AS OF`)
- Restore a table to a previous version with `RESTORE TABLE`
- Understand VACUUM and its impact on Time Travel

---

## Prerequisites

- LAB 02 completed (Bronze tables exist)
- The setup cell will recreate `bronze.customers` if needed

---

## Tasks Overview

Open **`LAB_03_code.ipynb`** and complete the `# TODO` cells.

| Task | Topic | What to do |
|------|-----------|-------------|
| **Task 1** | Examine the Update File | Load `customers_new.csv`, compare counts |
| **Task 2** | MERGE INTO (Upsert) | Match on `customer_id`, UPDATE matched, INSERT new |
| **Task 3** | UPDATE Records | `UPDATE ... SET state = 'TX' WHERE city = 'Austin'` |
| **Task 4** | Accidental DELETE | Simulate disaster — delete most rows |
| **Task 5** | DESCRIBE HISTORY | View all operations in table history |
| **Task 6** | Time Travel — Query Old Version | `SELECT * FROM table VERSION AS OF n` |
| **Task 7** | RESTORE the Table | `RESTORE TABLE ... TO VERSION AS OF n` |
| **Task 8** | VACUUM and Its Impact | Run VACUUM, observe Time Travel failure |

---

## Detailed Hints

### Task 1: Update File
- Read `customers_new.csv` using `spark.read` with CSV format
- Compare `df_updates.count()` with the row count of the base table (`bronze.customers`)
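
A minimal sketch of these two steps, not the notebook's reference solution. The function name, the `csv_path` parameter, and the `header`/`inferSchema` options are assumptions; adjust them to the actual file location and layout used in your workspace.

```python
def compare_counts(spark, csv_path, table="bronze.customers"):
    """Load the update file and compare its row count with the base table."""
    df_updates = (
        spark.read.format("csv")
        .option("header", "true")       # assumption: first line holds column names
        .option("inferSchema", "true")  # assumption: let Spark infer column types
        .load(csv_path)                 # csv_path is a hypothetical parameter
    )
    updates_count = df_updates.count()
    base_count = spark.table(table).count()
    print(f"update file: {updates_count} rows, {table}: {base_count} rows")
    return df_updates, updates_count, base_count
```

In a notebook you would call it with the predefined `spark` session and the path to `customers_new.csv`, then keep the returned DataFrame for Task 2.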

### Task 2: MERGE INTO
- Match condition: `target.customer_id = source.customer_id`
- `WHEN MATCHED THEN UPDATE SET *`
- `WHEN NOT MATCHED THEN INSERT *`
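
The three hints above combine into a single statement. The sketch below assumes the updates are available as a DataFrame and registers them under a temporary view; the view name `customer_updates` and the function name are hypothetical.

```python
def merge_customers(spark, df_updates, table="bronze.customers"):
    """Upsert df_updates into the target table, matching on customer_id."""
    # Expose the DataFrame to SQL; the view name is an illustrative choice.
    df_updates.createOrReplaceTempView("customer_updates")
    return spark.sql(f"""
        MERGE INTO {table} AS target
        USING customer_updates AS source
        ON target.customer_id = source.customer_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

`UPDATE SET *` and `INSERT *` copy columns by name, so this form expects the source columns to line up with the target table's columns.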

### Task 3: UPDATE
- `SET state = 'TX'` and `WHERE city = 'Austin'`
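
Put together, the update could look like the sketch below. The follow-up `SELECT` is an added sanity check, not part of the task, and the function name is hypothetical.

```python
def update_austin_state(spark, table="bronze.customers"):
    """Set state to TX for all Austin customers, then return a check query."""
    spark.sql(f"UPDATE {table} SET state = 'TX' WHERE city = 'Austin'")
    # Sanity check (an addition for illustration): Austin rows grouped by state.
    return spark.sql(
        f"SELECT state, COUNT(*) AS n FROM {table} "
        f"WHERE city = 'Austin' GROUP BY state"
    )
```

If the update worked, the returned DataFrame should contain a single row with `state = 'TX'`.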

### Task 5: DESCRIBE HISTORY
- Command: `DESCRIBE HISTORY catalog.schema.table`
- Look for operations: WRITE, MERGE, UPDATE, DELETE
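
A short sketch of how the history might be inspected from Python. The table name default is an assumption (replace it with your `catalog.schema.table`); the selected columns are a subset of what `DESCRIBE HISTORY` returns.

```python
def show_history(spark, table="bronze.customers"):
    """Return the table history, oldest version first, with the key columns."""
    history = spark.sql(f"DESCRIBE HISTORY {table}")
    return (
        history
        .select("version", "timestamp", "operation", "operationParameters")
        .orderBy("version")
    )
```

Calling `show_history(spark).show(truncate=False)` in the notebook prints one row per commit, which makes it easy to spot the DELETE.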

### Task 6: Time Travel
- In the history, find the version number just BEFORE the DELETE (the DELETE's version minus 1)
- Use `VERSION AS OF {version_number}` in SELECT
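
One way to avoid reading the version off the screen is to derive it from the history. This is a sketch under the assumption that the most recent DELETE is the accidental one; both function names are hypothetical.

```python
def version_before_delete(history_rows):
    """Given (version, operation) pairs, return the version just before the latest DELETE."""
    delete_versions = [version for version, op in history_rows if op == "DELETE"]
    if not delete_versions:
        raise ValueError("no DELETE operation found in the table history")
    # Delta versions are consecutive integers, so the prior state is version - 1.
    return max(delete_versions) - 1


def query_before_delete(spark, table="bronze.customers"):
    """Return (version, DataFrame) for the table state just before the DELETE."""
    rows = spark.sql(f"DESCRIBE HISTORY {table}").collect()
    version = version_before_delete((r["version"], r["operation"]) for r in rows)
    return version, spark.sql(f"SELECT * FROM {table} VERSION AS OF {version}")
```

Compare the row count of the returned DataFrame with the current table to confirm that the deleted rows are still visible in the old version.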

### Task 7: RESTORE
- `RESTORE TABLE ... TO VERSION AS OF {version_number}`
- Use the same version as Task 6
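
A minimal sketch of the restore step, assuming you already know the target version from Task 6. The function name is hypothetical, and the row count it returns is only a convenience check.

```python
def restore_table(spark, version, table="bronze.customers"):
    """Restore the table to an earlier version and return the new row count."""
    # int() guards against accidentally interpolating a non-numeric value.
    spark.sql(f"RESTORE TABLE {table} TO VERSION AS OF {int(version)}")
    return spark.table(table).count()
```

RESTORE does not erase history: it adds a new commit on top, so running `DESCRIBE HISTORY` afterwards shows a RESTORE operation as the latest version.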

### Task 8: VACUUM
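- `VACUUM table_name RETAIN 0 HOURS` (demo only, never in production!)
- Delta rejects a retention below the 7-day default unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`
- After VACUUM, querying an old version fails once its data files are no longer referenced by the current version, because VACUUM has deleted them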
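
The sketch below runs the zero-retention VACUUM and then probes an old version. It assumes your compute allows changing `spark.databricks.delta.retentionDurationCheck.enabled` (some environments may not); the function name and return messages are hypothetical.

```python
def vacuum_and_probe(spark, old_version, table="bronze.customers"):
    """Demo only: VACUUM with zero retention, then try to read an old version."""
    check = "spark.databricks.delta.retentionDurationCheck.enabled"
    spark.conf.set(check, "false")  # allow a retention below the 7-day default
    try:
        spark.sql(f"VACUUM {table} RETAIN 0 HOURS")
    finally:
        spark.conf.set(check, "true")  # put the safety check back to its default

    try:
        # count() forces Spark to actually read the old data files.
        rows = spark.sql(
            f"SELECT * FROM {table} VERSION AS OF {int(old_version)}"
        ).count()
        return f"version {old_version} is still readable ({rows} rows)"
    except Exception as exc:
        return f"version {old_version} is no longer readable: {type(exc).__name__}"
```

Which versions break depends on the files involved: a version whose files are all still referenced by the current table state (for example, the one you just restored) may remain readable, while others fail.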

---

## Summary

In this lab you:
- Used MERGE INTO for upsert (insert + update)
- Performed UPDATE and DELETE on Delta tables
- Inspected history with DESCRIBE HISTORY
- Queried previous versions with Time Travel
- Restored a table with RESTORE TABLE
- Ran VACUUM and observed its impact on Time Travel

> **Exam Tip:** Time Travel reads old versions through the Delta transaction log, so it needs both the log entries and the data files they reference. Data files of old versions are removed only by `VACUUM`, whose default retention threshold is **7 days**. After VACUUM, `DESCRIBE HISTORY` still shows the metadata, but querying a vacuumed version fails because its underlying Parquet files are gone.

> **What's next:** In LAB 04 you will optimize Delta tables with OPTIMIZE, Z-ORDER, VACUUM, and Liquid Clustering.