# LAB 07B Guide: Orchestration — Multi-task Jobs with Triggers

**Duration:** ~45 min | **Day:** 3 | **Difficulty:** Advanced  
**After module:** M08: Orchestration & Lakeflow Jobs

---

## Scenario

> *"RetailHub needs two automated pipelines: one that processes orders when new files arrive,  
> and another that builds customer reports whenever the orders pipeline updates its silver table.  
> Set up cross-job orchestration using File Arrival and Table Update triggers."*

## Objectives

After completing this lab you will be able to:
- Create multi-task Lakeflow Jobs with task dependencies (DAG)
- Configure **File Arrival** triggers on Unity Catalog Volumes
- Configure **Table Update** triggers for cross-job orchestration
- Use a validation task with event logging
- Pass parameters between Job tasks using widgets and dynamic value references such as `{{job.run_id}}`

## Task 1: Prepare Workspace

Import the medallion notebooks from `lab/materials/medallion/` into your Databricks workspace.

**Files to import:**
- `bronze_orders.ipynb`, `silver_orders_cleaned.ipynb`, `gold_daily_orders.ipynb`
- `bronze_customers.ipynb`, `silver_customers.ipynb`, `gold_customer_orders_summary.ipynb`
- `task_validate_pipeline.py` (from `lab/materials/orchestration/`)

**Target:** medallion notebooks → `/Workspace/Users/<email>/medallion/`; `task_validate_pipeline.py` → `/Workspace/Users/<email>/orchestration/`

## Task 2: Create Job A — Orders Pipeline

**Job name:** `LAB_Orders_Pipeline`

**Tasks:**
1. `bronze_orders` → `medallion/bronze_orders` (no dependencies)
2. `silver_orders` → `medallion/silver_orders_cleaned` (depends on: `bronze_orders`)
3. `gold_daily` → `medallion/gold_daily_orders` (depends on: `silver_orders`)

**Parameters:** Every task needs a `catalog` parameter. The silver and gold tasks also need schema parameters.
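
The task list and parameters above can also be expressed as job settings in the shape used by the Jobs API. The following is a minimal sketch, not the lab's reference solution: the `source_schema` and `target_schema` parameter names and the `bronze`/`silver`/`gold` values are assumptions, so match them to the widgets defined in your notebooks.

```python
import json

# Assumption: notebooks were imported to this folder in Task 1.
NOTEBOOK_ROOT = "/Workspace/Users/<email>/medallion"


def notebook_task(task_key, notebook, params, depends_on=()):
    """Build one notebook task entry with optional upstream dependencies."""
    task = {
        "task_key": task_key,
        "notebook_task": {
            "notebook_path": f"{NOTEBOOK_ROOT}/{notebook}",
            "base_parameters": dict(params),
        },
    }
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


def orders_pipeline_settings(catalog):
    """Return job settings for the linear bronze -> silver -> gold DAG."""
    # Hypothetical schema parameter names; adjust to your notebooks' widgets.
    silver = {"catalog": catalog, "source_schema": "bronze", "target_schema": "silver"}
    gold = {"catalog": catalog, "source_schema": "silver", "target_schema": "gold"}
    return {
        "name": "LAB_Orders_Pipeline",
        "tasks": [
            notebook_task("bronze_orders", "bronze_orders", {"catalog": catalog}),
            notebook_task("silver_orders", "silver_orders_cleaned", silver,
                          depends_on=["bronze_orders"]),
            notebook_task("gold_daily", "gold_daily_orders", gold,
                          depends_on=["silver_orders"]),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(orders_pipeline_settings("<catalog>"), indent=2))
```

Building the settings as plain dictionaries keeps the sketch independent of any client library; the printed JSON can be compared against the job's JSON view in the UI.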

**Trigger:** File Arrival
- URL: `/Volumes/<catalog>/default/landing_zone/trigger`
- Min time between triggers: 60s
- Wait after last change: 15s
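
For reference, the same trigger can be written as the `trigger` block of the job settings. This sketch assumes the Jobs API field names `file_arrival`, `min_time_between_triggers_seconds`, and `wait_after_last_change_seconds`, and passes through the lab's values unchanged; the service may enforce minimum values for these intervals, so check the current API reference before relying on them.

```python
def file_arrival_trigger(catalog, min_interval_s=60, settle_s=15):
    """Return a trigger block that watches the lab's landing-zone folder."""
    return {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            # Monitored Unity Catalog Volume path from the lab instructions.
            "url": f"/Volumes/{catalog}/default/landing_zone/trigger",
            # "Min time between triggers" in the UI.
            "min_time_between_triggers_seconds": min_interval_s,
            # "Wait after last change" in the UI.
            "wait_after_last_change_seconds": settle_s,
        },
    }


if __name__ == "__main__":
    print(file_arrival_trigger("<catalog>"))
```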

**Key points:**
- Use Serverless compute or Job cluster (not All-Purpose)
- DAG should show linear dependency: bronze → silver → gold
- Don't run the job manually yet; the signal file created in Task 3 triggers it
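
To sanity-check the dependency structure outside the UI, the DAG can be modelled as a mapping from each task to its upstream tasks. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive the execution order and to confirm the graph is a single linear chain; the helper names are illustrative only.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (Job A from this lab).
JOB_A_DEPENDENCIES = {
    "bronze_orders": set(),
    "silver_orders": {"bronze_orders"},
    "gold_daily": {"silver_orders"},
}


def execution_order(dependencies):
    """Return tasks in an order that respects every dependency."""
    # Raises graphlib.CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def is_linear(dependencies):
    """True when the DAG is one chain: one root, no fan-in, no fan-out."""
    order = execution_order(dependencies)
    roots = [task for task in order if not dependencies.get(task)]
    fan_in_ok = all(len(dependencies.get(task, ())) <= 1 for task in order)
    downstream_counts = {}
    for upstream in dependencies.values():
        for task in upstream:
            downstream_counts[task] = downstream_counts.get(task, 0) + 1
    fan_out_ok = all(count <= 1 for count in downstream_counts.values())
    return len(roots) == 1 and fan_in_ok and fan_out_ok


if __name__ == "__main__":
    print(" -> ".join(execution_order(JOB_A_DEPENDENCIES)))
    print("linear:", is_linear(JOB_A_DEPENDENCIES))
```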

## Task 3: Trigger Job A

Run the code cell to create a signal file in the monitored Volume path.

**Expected answer:**
```python
volume_path = f"/Volumes/{CATALOG}/default/landing_zone/trigger"
```
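
The lab notebook provides its own cell for creating the signal file. As an illustration of what such a cell might do, the sketch below writes a uniquely named file using standard Python file I/O, assuming the Volume is reachable as a regular path from the notebook's compute. The `create_signal_file` helper and the file naming scheme are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def create_signal_file(volume_path, prefix="signal"):
    """Write a small, uniquely named file so the trigger sees a new arrival."""
    directory = Path(volume_path)
    directory.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps each signal distinct from earlier ones.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = directory / f"{prefix}_{stamp}.txt"
    target.write_text(f"triggered at {stamp}\n", encoding="utf-8")
    return target


# In the lab notebook, after volume_path is defined:
# print(create_signal_file(volume_path))
```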

After file creation:
- Job A should trigger within ~60 seconds
- Check Workflows UI → `LAB_Orders_Pipeline` for run status
- Wait for all 3 tasks to complete
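
If you prefer to wait programmatically rather than watch the UI, a simple polling loop is enough. The sketch below is deliberately client-agnostic: `get_state` is a hypothetical callable that you supply to return the run's current life-cycle state (for example, by wrapping a Jobs API call), and the state names are the life-cycle values the Jobs API reports for runs.

```python
import time

# Life-cycle states after which a run will not change any further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_state, poll_seconds=10, timeout_seconds=900, sleep=time.sleep):
    """Poll get_state() until the run reaches a terminal life-cycle state."""
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still in state {state!r} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A `TERMINATED` life-cycle state only means the run has finished; check the run's result state separately to confirm that all three tasks succeeded.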

## Task 4: Create Job B — Customer Pipeline

**Job name:** `LAB_Customer_Pipeline`

**Tasks:**
1. `bronze_customers` → `medallion/bronze_customers`
2. `silver_customers` → `medallion/silver_customers` (depends on: `bronze_customers`)
3. `gold_summary` → `medallion/gold_customer_orders_summary` (depends on: `silver_customers`)
4. `validate` → `orchestration/task_validate_pipeline` (depends on: `gold_summary`)

**Trigger:** Table updated
- Table: `<catalog>.silver.silver_orders_cleaned`
- Condition: Any new rows (`inserted_count > 0`)
- Min time between triggers: 60s
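
As with Job A, the trigger can be sketched as a `trigger` block in the job settings. The field names `table_update`, `table_names`, and `condition`, and the value `ANY_UPDATED`, are assumptions based on the Jobs API trigger settings; verify them against the current API reference. The sketch does not attempt to express the row-level condition configured in the UI.

```python
def table_update_trigger(catalog, min_interval_s=60):
    """Return a trigger block that fires when the silver orders table changes."""
    return {
        "pause_status": "UNPAUSED",
        "table_update": {
            # Table written by Job A's silver_orders task.
            "table_names": [f"{catalog}.silver.silver_orders_cleaned"],
            # Assumed condition value: fire when any listed table is updated.
            "condition": "ANY_UPDATED",
            # "Min time between triggers" in the UI.
            "min_time_between_triggers_seconds": min_interval_s,
        },
    }


if __name__ == "__main__":
    print(table_update_trigger("<catalog>"))
```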

**Cross-job flow:**
```
File arrives → Job A: bronze→silver→gold_daily
                         ↓
              silver_orders_cleaned updated
                         ↓
              Job B triggers: bronze_cust→silver_cust→gold_summary→validate
```

**Note:** Job B validates ALL tables (from both jobs) via `task_validate_pipeline`.
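
The actual logic lives in `task_validate_pipeline.py`. To illustrate the idea, the sketch below checks that every expected table has rows and builds an event record. Only `event_type = 'PIPELINE_VALIDATION'` and the `SUCCESS` outcome come from this lab; the `status`, `run_id`, `empty_tables`, and `event_time` field names and the `FAILED` value are assumptions.

```python
from datetime import datetime, timezone

# The six tables produced by Job A and Job B.
EXPECTED_TABLES = [
    "bronze_orders",
    "bronze_customers",
    "silver_orders_cleaned",
    "silver_customers",
    "gold_daily_orders",
    "gold_customer_orders_summary",
]


def validate_pipeline(row_counts, run_id=None):
    """Build a validation event from a {table_name: row_count} mapping."""
    # A table that is missing from the mapping counts as empty.
    empty = [t for t in EXPECTED_TABLES if row_counts.get(t, 0) <= 0]
    return {
        "event_type": "PIPELINE_VALIDATION",
        "status": "SUCCESS" if not empty else "FAILED",  # assumed field and values
        "run_id": run_id,
        "empty_tables": empty,
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    counts = {table: 10 for table in EXPECTED_TABLES}
    print(validate_pipeline(counts, run_id="example-run"))
```

In the job, `run_id` would typically be supplied through a task parameter so that each log record can be traced back to the run that produced it.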

## Task 5-6: Verify Results & Event Log

After both jobs complete, uncomment and run the verification queries.

**Expected results:**
- All 6 tables have rows (bronze_orders, bronze_customers, silver_orders_cleaned, silver_customers, gold_daily_orders, gold_customer_orders_summary)
- `pipeline_event_log` has a SUCCESS record with `event_type = 'PIPELINE_VALIDATION'`
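
The notebook already contains the verification queries. As a sketch of an equivalent check, the helper below builds a single row-count query over all six tables. It assumes the tables live in schemas named `bronze`, `silver`, and `gold`; only the `silver` schema for `silver_orders_cleaned` is confirmed by this guide.

```python
# Assumed schema layout; adjust if your notebooks write elsewhere.
MEDALLION_TABLES = {
    "bronze": ["bronze_orders", "bronze_customers"],
    "silver": ["silver_orders_cleaned", "silver_customers"],
    "gold": ["gold_daily_orders", "gold_customer_orders_summary"],
}


def row_count_query(catalog):
    """Return one SQL statement that reports the row count of every table."""
    selects = [
        f"SELECT '{table}' AS table_name, COUNT(*) AS row_count "
        f"FROM {catalog}.{schema}.{table}"
        for schema, tables in MEDALLION_TABLES.items()
        for table in tables
    ]
    return "\nUNION ALL\n".join(selects)


# In a Databricks notebook:
# spark.sql(row_count_query(CATALOG)).show(truncate=False)
```

Every `row_count` in the result should be greater than zero.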

## Task 7: Cross-Job Pattern

**Answers:**
```
Signal file → File Arrival trigger → Job A runs → writes silver_orders_cleaned
                                                           ↓
                                                Table Update trigger → Job B runs → validates all tables
```

1. If Job A fails at `silver_orders` → `silver_orders_cleaned` is NOT updated → Job B does NOT trigger
2. Repair Runs re-execute only failed + downstream tasks, skipping already-successful ones → saves time/compute
3. Email alerts: Job settings → Notifications → Add email → On failure
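
Answer 2 can be made concrete with a small graph traversal: given the task dependencies and the set of failed tasks, a repair run needs the failed tasks plus everything downstream of them. This is an illustrative model of the selection rule, not Databricks' implementation.

```python
def tasks_to_repair(dependencies, failed):
    """Return failed tasks plus all of their downstream tasks, in DAG listing order."""
    # Invert the mapping: upstream task -> tasks that depend on it.
    downstream = {}
    for task, upstream_tasks in dependencies.items():
        for upstream in upstream_tasks:
            downstream.setdefault(upstream, set()).add(task)

    to_run, stack = set(), list(failed)
    while stack:
        task = stack.pop()
        if task in to_run:
            continue
        to_run.add(task)
        stack.extend(downstream.get(task, ()))
    return [task for task in dependencies if task in to_run]


if __name__ == "__main__":
    job_a = {
        "bronze_orders": set(),
        "silver_orders": {"bronze_orders"},
        "gold_daily": {"silver_orders"},
    }
    # bronze_orders already succeeded, so it is skipped.
    print(tasks_to_repair(job_a, {"silver_orders"}))
```

For Job A failing at `silver_orders`, the result is `silver_orders` and `gold_daily`, which matches answer 1: until the repair succeeds, `silver_orders_cleaned` is not updated and Job B stays idle.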

## Summary

| Topic | Key Concept |
|-------|-------------|
| Multi-task Job | DAG with notebook tasks + dependencies |
| File Arrival | Trigger on new file in UC Volume |
| Table Update | Trigger on DML in Delta table |
| Cross-job | Job A output → triggers Job B |
| Validation | Check all tables + log to event_log |
| Repair | Re-run only failed + downstream tasks |

---

> **Next:** LAB 08 - Unity Catalog Governance