## Declarative Pipelines in Databricks (Delta Live Tables)

In Databricks, declarative pipelines are implemented using **Delta Live Tables (DLT)**.

DLT lets you define pipeline steps as **tables** (using SQL or Python). Instead of manually orchestrating tasks like “run job A, then job B, then job C”, you **declare the desired tables and transformations**, and DLT manages execution details such as ordering, retries, and monitoring.

> You declare **what** you want the tables to look like — DLT manages **how** they run as a pipeline.

### Core idea: Tables define the pipeline graph (DAG)

In DLT, each table is a node in the pipeline.  
If table B reads table A, then **B depends on A**.

DLT automatically builds the dependency graph and runs steps in the correct order.
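
To make the dependency rule concrete, here is a small Python model of it. It is not DLT's planner: the table names are hypothetical, and Python's standard `graphlib` module stands in for whatever DLT does internally. Each table lists the tables it reads, and a topological sort groups them into stages where every table runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it reads from.
# This models "B reads A => B depends on A"; it is not DLT's own planner.
PIPELINE: dict[str, set[str]] = {
    "raw_orders": set(),                   # reads only the external source table
    "clean_orders": {"raw_orders"},
    "daily_revenue": {"clean_orders"},
    "customer_summary": {"clean_orders"},
}


def execution_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tables into stages; every table's inputs finish in an earlier stage."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the tables form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose inputs are all done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(PIPELINE), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Tables in the same stage do not depend on each other, so an engine is free to run them concurrently. In this model, `graphlib` also rejects a cycle (two tables reading each other) with `CycleError`.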

### Benefits of DLT (Declarative Pipelines)

#### 1) Built-in data quality checks (Expectations)
You can define rules like “this column should never be NULL.”  
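Depending on how you configure the expectation, DLT can:
- **record** the rule outcome as a metric and keep the record (monitoring),
- **drop bad records** before they reach the target table, or
- **fail the pipeline update** as soon as a record violates the rule.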

#### 2) Automatic dependency management
You don’t manually create a DAG of tasks.  
DLT derives the DAG from table dependencies and schedules execution accordingly.

#### 3) Incremental processing support (including CDC patterns)
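DLT can process only new or changed input instead of recomputing everything on each run; streaming tables, for example, read only the records that arrived since the last update.  
For CDC-style updates, you typically define:
- key columns that identify a row (e.g., `order_id` or `item_id`)
- a sequencing column that orders the changes (e.g., `updated_at`)

DLT then applies the changes in sequence order, even when they arrive out of order, and maintains the target either as the latest version of each row (SCD Type 1) or with full history (SCD Type 2). This is done with the `APPLY CHANGES` API: `APPLY CHANGES INTO` in SQL or `dlt.apply_changes()` in Python.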
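
The following pure-Python sketch shows what "apply changes by key and sequence" means. It is a simplified model, not DLT's CDC implementation: the `Change` record, the `status` payload, and the `start_at`/`end_at` column names are hypothetical, `updated_at` is an integer for brevity, and deletes and duplicate sequence values are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    order_id: int    # key column
    updated_at: int  # sequencing column (an int here for brevity)
    status: str      # payload column (hypothetical)


def apply_scd1(changes: list[Change]) -> dict[int, Change]:
    """SCD Type 1: keep only the highest-sequence version of each key."""
    latest: dict[int, Change] = {}
    for c in changes:  # arrival order does not matter
        current = latest.get(c.order_id)
        if current is None or c.updated_at > current.updated_at:
            latest[c.order_id] = c
    return latest


def apply_scd2(changes: list[Change]) -> list[dict]:
    """SCD Type 2: keep every version, each valid from start_at until end_at."""
    by_key: dict[int, list[Change]] = {}
    for c in changes:
        by_key.setdefault(c.order_id, []).append(c)
    history = []
    for key in sorted(by_key):
        versions = sorted(by_key[key], key=lambda c: c.updated_at)
        for i, v in enumerate(versions):
            # A version ends when the next one starts; the newest stays open (None).
            end_at: Optional[int] = versions[i + 1].updated_at if i + 1 < len(versions) else None
            history.append({"order_id": key, "status": v.status,
                            "start_at": v.updated_at, "end_at": end_at})
    return history


# Changes for order 1 arrive out of order (30 before 20).
CHANGES = [
    Change(1, 10, "created"),
    Change(2, 11, "created"),
    Change(1, 30, "shipped"),
    Change(1, 20, "paid"),
]

if __name__ == "__main__":
    print({k: v.status for k, v in apply_scd1(CHANGES).items()})
    for row in apply_scd2(CHANGES):
        print(row)
```

Because the sequencing column decides which version wins, the late-arriving `paid` change lands in the right place in the history instead of overwriting `shipped`.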

#### 4) Unified batch and streaming pipelines
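DLT supports batch and streaming processing in one framework: a single pipeline can mix streaming tables (incremental processing of newly arrived data) and materialized views (query results kept up to date on each update), and it can run in triggered or continuous mode. DLT manages the operational details, such as checkpoints, retries, and observability.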
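
A rough way to picture the difference between batch-style recomputation and incremental processing with a checkpoint is the toy model below. The `full_refresh` function, the `IncrementalReader` class, and its integer offset are hypothetical simplifications; real streaming checkpoints track source offsets and state, not a list index.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def full_refresh(source: list[T], transform: Callable[[T], U]) -> list[U]:
    """Batch style: recompute the whole result from all input on every run."""
    return [transform(r) for r in source]


class IncrementalReader:
    """Streaming style: remember how far the source has been read (the checkpoint)."""

    def __init__(self) -> None:
        self.checkpoint = 0  # number of source records already processed

    def process_new(self, source: list[T], transform: Callable[[T], U]) -> list[U]:
        new_records = source[self.checkpoint:]         # only records after the checkpoint
        results = [transform(r) for r in new_records]  # if this raises, the checkpoint is unchanged
        self.checkpoint += len(new_records)            # advance only after processing succeeds
        return results


if __name__ == "__main__":
    source = [1, 2, 3]  # append-only input, e.g. newly landed order amounts
    reader = IncrementalReader()
    print(reader.process_new(source, lambda x: x * 10))  # first run sees everything
    source.append(4)
    print(reader.process_new(source, lambda x: x * 10))  # second run sees only the new record
    print(full_refresh(source, lambda x: x * 10))        # batch recomputes all records
```

Advancing the checkpoint only after the transform succeeds means a failed run can be retried without skipping or double-counting records, which is the property a managed checkpoint provides.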

### Expectations (Data Quality Rules)

Expectations are data quality rules attached to a DLT dataset. Each one is a boolean SQL expression evaluated against every record.

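Common behaviors (named after the Python decorators `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail`; in SQL, the same choice is made with a `CONSTRAINT ... EXPECT (...)` clause and an optional `ON VIOLATION` action):
- **expect**: record violations in the pipeline's data quality metrics, but keep the records
- **expect_or_drop**: drop records that fail the rule (`ON VIOLATION DROP ROW`)
- **expect_or_fail**: stop the pipeline update when a record fails the rule (`ON VIOLATION FAIL UPDATE`)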

Example rules:
- `order_id IS NOT NULL`
- `amount >= 0`
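
The three behaviors can be modeled in plain Python to show how they differ on the same input. This is an illustrative sketch, not DLT code: the rule names, the `apply_expectations` helper, and the `ExpectationFailed` exception are hypothetical, and the two example rules are written as Python predicates rather than SQL.

```python
from typing import Callable


class ExpectationFailed(Exception):
    """Raised for a violated "fail" rule, standing in for a failed pipeline update."""


# (rule name, predicate that must hold, action: "warn", "drop", or "fail")
Rule = tuple[str, Callable[[dict], bool], str]


def apply_expectations(records: list[dict], rules: list[Rule]) -> tuple[list[dict], dict[str, int]]:
    """Return the records that survive plus a violation count per rule."""
    violations = {name: 0 for name, _, _ in rules}
    kept = []
    for rec in records:
        dropped = False
        for name, check, action in rules:
            if check(rec):
                continue
            violations[name] += 1  # in this model, every action counts the violation
            if action == "fail":
                raise ExpectationFailed(f"{name} violated by {rec}")
            if action == "drop":
                dropped = True     # keep evaluating so the other counts stay complete
        if not dropped:
            kept.append(rec)       # "warn" rules never remove records
    return kept, violations


RULES: list[Rule] = [
    ("valid_order_id", lambda r: r["order_id"] is not None, "drop"),  # order_id IS NOT NULL
    ("non_negative_amount", lambda r: r["amount"] >= 0, "warn"),      # amount >= 0
]

ORDERS = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": None, "amount": 5.0},  # fails valid_order_id, so it is dropped
    {"order_id": 3, "amount": -4.0},    # fails non_negative_amount, kept but counted
]

if __name__ == "__main__":
    kept, violations = apply_expectations(ORDERS, RULES)
    print(kept)
    print(violations)
```

Switching the `non_negative_amount` action to `"fail"` makes the call raise on the third record instead of returning, which mirrors how a fail-type expectation stops the update.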

### Understanding the Databricks UI (high-level setup)

#### Step 1: Create a Catalog
- Go to **Catalog Explorer** → **Create catalog**
- Name it (example): `dlt_plaidt`
- Configure access (e.g., grant privileges to required users/groups)

#### Step 2: Create a Schema for source data
- In the catalog, create a schema named `source`

#### Step 3: Create and populate a source table (SQL Editor)
- Open **SQL Editor**
- Set catalog = `dlt_plaidt`, schema = `source`
- Create a table (example: `orders`) and insert sample rows
- Optional: save the query as `source_orders`
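
If you want to try the sample data before opening Databricks, the sketch below builds a comparable `orders` table in an in-memory SQLite database using Python's standard `sqlite3` module. SQLite has no catalogs or schemas, so `dlt_plaidt.source` is dropped from the name, and the column names and rows are hypothetical. Two rows deliberately break the example expectation rules (`order_id IS NOT NULL`, `amount >= 0`) so that later data quality checks have something to catch. In the Databricks SQL Editor you would write similar `CREATE TABLE` and `INSERT INTO` statements, typically with types such as `INT`, `DOUBLE`, and `TIMESTAMP`.

```python
import sqlite3

# Local stand-in for dlt_plaidt.source.orders (SQLite has no catalogs or schemas).
CREATE_ORDERS = """
CREATE TABLE orders (
    order_id   INTEGER,
    item_id    INTEGER,
    amount     REAL,
    updated_at TEXT  -- ISO-8601 text here; Databricks would use TIMESTAMP
)
"""

# Hypothetical sample rows; the last two intentionally violate the example rules.
SAMPLE_ROWS = [
    (1, 101, 20.0, "2024-01-01 10:00:00"),
    (2, 102, 35.5, "2024-01-01 11:00:00"),
    (3, 103, -4.0, "2024-01-01 12:00:00"),     # amount >= 0 fails
    (None, 104, 12.0, "2024-01-01 13:00:00"),  # order_id IS NOT NULL fails
]


def build_orders(conn: sqlite3.Connection) -> None:
    """Create the orders table and insert the sample rows."""
    conn.execute(CREATE_ORDERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    conn.commit()


def count_rule_violations(conn: sqlite3.Connection) -> int:
    """Count rows that break at least one of the two example rules."""
    sql = "SELECT COUNT(*) FROM orders WHERE order_id IS NULL OR amount < 0"
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_orders(conn)
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "rows")
    print(count_rule_violations(conn), "rows violate a rule")
```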

#### Step 4: Organize workspace assets
- Create a workspace folder (example: `Declarative Pipelines`)
- Move your SQL queries/notebooks into that folder for clean organization

### Creating a Delta Live Tables (DLT) Pipeline

1. Go to **Jobs & Pipelines** → **Create** → **ETL pipeline**
2. Choose a starting point:
   - Sample code (SQL/Python) for learning
   - Empty pipeline for building from scratch
   - Add existing assets if you already have DLT source files ready

#### Pipeline assets (important concept)
DLT pipelines run code from a configured **source code directory**.
- Only files inside the declared **source code** folder are executed as part of the pipeline
- Keep notebooks/scripts for exploration or testing **outside** the source folder

Common pattern:
- `transformations/` → pipeline source code
- `explorations/` → EDA notebooks (not part of pipeline)
- `utilities/` → helper modules/functions
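
The rule that only the source folder is executed amounts to a path filter. The sketch below models it with Python's `pathlib`; the pipeline root path and file names are hypothetical, and this is not how Databricks resolves pipeline assets internally.

```python
from pathlib import PurePosixPath

# Hypothetical workspace layout for one pipeline.
PIPELINE_ROOT = PurePosixPath("/Workspace/Declarative Pipelines/orders_pipeline")
SOURCE_DIR = PIPELINE_ROOT / "transformations"

WORKSPACE_FILES = [
    PIPELINE_ROOT / "transformations" / "bronze_orders.py",
    PIPELINE_ROOT / "transformations" / "silver" / "clean_orders.sql",
    PIPELINE_ROOT / "explorations" / "orders_eda.py",
    PIPELINE_ROOT / "utilities" / "helpers.py",
]


def pipeline_sources(paths: list[PurePosixPath],
                     source_dir: PurePosixPath = SOURCE_DIR) -> list[str]:
    """Return the files the pipeline would execute: anything under source_dir."""
    return sorted(str(p) for p in paths if source_dir in p.parents)


if __name__ == "__main__":
    for path in pipeline_sources(WORKSPACE_FILES):
        print(path)
```

Files in `explorations/` and `utilities/` are skipped by this filter, which is why EDA notebooks kept outside the source folder never run as part of an update.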