# 📘 Delta Live Tables (DLT)

## 1. What is Delta Live Tables?
- DLT is a **declarative ETL framework** in Databricks for building **reliable data pipelines**.
- You define tables and views in SQL or Python, and Databricks **automatically manages execution, dependencies, retries, and monitoring**.
- Supports both **batch** and **streaming** pipelines.

---

## 2. Key Features
- ✅ **Declarative**: You define *what* you want (tables/views), not *how* to run it.
- ✅ **Auto Dependency Management**: DLT figures out the order of execution.
- ✅ **Data Quality**: Built-in **expectations** (data validation).
- ✅ **Auto Scaling**: DLT manages cluster resources for the pipeline.
- ✅ **Monitoring**: Built-in UI with lineage and metrics.
- ✅ **SCD Support**: Built-in support for Slowly Changing Dimensions (Type 1 and Type 2).
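
To make the dependency-management point concrete, here is a minimal sketch of how an execution order can be derived from the references between datasets. It is plain Python using the standard-library `graphlib` module, not DLT's actual implementation. The dataset names are borrowed from the examples in these notes, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each dataset maps to the set of datasets it reads from.
# (Names mirror the bronze/silver/gold examples in these notes.)
dependencies = {
    "customers_bronze": set(),
    "customers_silver": {"customers_bronze"},
    "customers_gold": {"customers_silver"},
}


def execution_order(deps):
    """Return dataset names so that every dataset comes after its sources."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each dataset only declares what it reads from, the order falls out of the graph, which is the idea behind "declare what, not how".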

---

## 3. Core Concepts
### 🔹 Live Table
- A managed Delta table maintained by DLT.
- Defined with:
```sql
-- Batch read of another dataset in the pipeline; recomputed on each update.
CREATE OR REFRESH LIVE TABLE customers_gold AS
SELECT city, COUNT(*) AS customer_count FROM LIVE.customers_silver GROUP BY city
```


### 🔹 Streaming Live Table
- A streaming Delta table maintained by DLT.
- New data is appended incrementally as it arrives.
- Defined with:
```sql
CREATE OR REFRESH STREAMING LIVE TABLE customers_bronze AS
SELECT * FROM cloud_files("/mnt/raw/customers", "csv", map("header", "true"))
```


### 🔹 Live View
- A logical view; its results are not materialized.
- Used for intermediate transformations before writing to tables.
- Defined with:
```sql
CREATE TEMPORARY LIVE VIEW customers_silver AS
SELECT DISTINCT * FROM LIVE.customers_bronze
```


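### 🔹 Expectations (Data Quality Rules)
- Validate data with expectations, declared as named constraints on a table or view.
- When a rule is broken, DLT can keep the row and record the violation in the pipeline metrics (the default), drop the row, or fail the update.
- Defined with:
```sql
CREATE OR REFRESH LIVE TABLE customers_silver (
  -- positive_amount is an illustrative constraint name
  CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
TBLPROPERTIES ("quality" = "silver")
AS
SELECT *
FROM LIVE.customers_bronze
```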
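
The three violation behaviours can be illustrated with a small, self-contained sketch. This is plain Python that imitates the semantics on a list of dicts; it is not the DLT API. `apply_expectation`, `ExpectationFailed`, and the sample rows are hypothetical. In a real pipeline the Python decorators `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` play these roles.

```python
class ExpectationFailed(Exception):
    """Raised when a 'fail' expectation meets a violating row."""


def apply_expectation(rows, predicate, action="warn"):
    """Apply one rule to rows and return (kept_rows, violation_count).

    action: "warn" keeps invalid rows and only counts them,
            "drop" removes invalid rows,
            "fail" raises on the first invalid row.
    """
    if action not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {action}")

    kept = []
    violations = 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        violations += 1
        if action == "fail":
            raise ExpectationFailed(f"row violates expectation: {row}")
        if action == "warn":
            kept.append(row)  # invalid row is retained, only counted
        # action == "drop": the row is skipped
    return kept, violations


def is_positive(row):
    """Same rule as the SQL example: amount > 0."""
    return row["amount"] > 0


# Hypothetical sample data: one valid row, one invalid row.
sample_rows = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": -5},
]

dropped_result = apply_expectation(sample_rows, is_positive, "drop")
warned_result = apply_expectation(sample_rows, is_positive, "warn")
```

With `"drop"` only the valid row survives. With `"warn"` both rows survive and the violation is counted. With `"fail"` processing stops at the first invalid row.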


---

## 4. Data Quality Pipeline Layers
- **Bronze** → Raw ingestion (from Auto Loader, Kafka, etc.)
- **Silver** → Cleaned, filtered, deduplicated data
- **Gold** → Aggregations, business KPIs

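---

## 5. SCD Type 2 (Slowly Changing Dimensions)
DLT provides the `APPLY CHANGES INTO` syntax for SCD Type 2. The target streaming table must be declared first, and the source is read as a stream:
```sql
CREATE OR REFRESH STREAMING LIVE TABLE customers_scd2;

APPLY CHANGES INTO LIVE.customers_scd2
FROM STREAM(LIVE.customers_silver)
KEYS (customer_id)
SEQUENCE BY update_timestamp
STORED AS SCD TYPE 2;
```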
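
To show what an SCD Type 2 history looks like, the sketch below rebuilds it in plain Python from a handful of change records. It is a simplified illustration, not DLT's algorithm: it only sorts by the sequencing column and ignores deletes and duplicate events. `build_scd2` and the sample data are hypothetical. The `__START_AT` and `__END_AT` column names follow the ones DLT adds to SCD Type 2 targets.

```python
from collections import defaultdict


def build_scd2(changes, key, sequence_by):
    """Turn change records into SCD Type 2 history rows.

    Each version of a key is valid from its own sequence value until the
    sequence value of the next version; the current version has
    __END_AT = None.
    """
    by_key = defaultdict(list)
    for change in changes:
        by_key[change[key]].append(change)

    history = []
    for key_value in sorted(by_key):
        versions = sorted(by_key[key_value], key=lambda c: c[sequence_by])
        for i, version in enumerate(versions):
            row = dict(version)
            row["__START_AT"] = version[sequence_by]
            has_next = i + 1 < len(versions)
            row["__END_AT"] = versions[i + 1][sequence_by] if has_next else None
            history.append(row)
    return history


# Hypothetical change feed: customer 1 moves city, customer 2 appears once.
changes = [
    {"customer_id": 1, "city": "Pune", "update_timestamp": 1},
    {"customer_id": 2, "city": "Goa", "update_timestamp": 2},
    {"customer_id": 1, "city": "Delhi", "update_timestamp": 3},
]

history = build_scd2(changes, key="customer_id", sequence_by="update_timestamp")
for row in history:
    print(row)
```

Customer 1 ends up with two rows: a closed one for the old city and an open one for the current city. This is what lets queries reconstruct the state at any point in time.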


---

## 6. Python API
Tables and views are defined with decorators:
```python
import dlt

# `spark` is provided by the Databricks runtime inside a DLT pipeline.

@dlt.table
def customers_bronze():
    # Incrementally ingest raw CSV files with Auto Loader.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/mnt/raw/customers")
    )

@dlt.view
def customers_silver():
    # Deduplicate on the business key.
    return dlt.read("customers_bronze").dropDuplicates(["customer_id"])

@dlt.table
def customers_gold():
    # Customer count per city.
    return dlt.read("customers_silver").groupBy("city").count()
```


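---

## 7. Deployment
DLT pipelines are deployed via the Pipelines UI or the Databricks REST API.

Options:
- **Continuous mode** → Keeps running and processes new data as it arrives (streaming).
- **Triggered mode** → Processes the data available when the update starts, then stops (batch).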
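
As a rough sketch of how the mode is chosen when deploying through the REST API, the snippet below builds a pipeline settings body in which the `continuous` flag selects continuous (`True`) or triggered (`False`) execution. The pipeline name and notebook path are made-up placeholders, `build_pipeline_settings` is a hypothetical helper, and only a few fields are shown. The body is printed here, not sent; check it against the Pipelines API reference before posting it to a workspace.

```python
import json


def build_pipeline_settings(name, notebook_path, continuous):
    """Build a minimal pipeline settings body (illustrative subset of fields)."""
    return {
        "name": name,
        "libraries": [{"notebook": {"path": notebook_path}}],
        # True -> continuous mode, False -> triggered mode
        "continuous": continuous,
    }


# Placeholder values, not taken from a real workspace.
settings = build_pipeline_settings(
    name="customers_pipeline",
    notebook_path="/Repos/demo/dlt_customers",
    continuous=False,
)

payload = json.dumps(settings, indent=2)
print(payload)
```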

---

## 8. Why DLT?
- Simplifies pipeline development → less boilerplate.
- Enforces data quality rules and tracks lineage automatically.
- Ideal for medallion architecture (bronze → silver → gold).
- Integrates with Auto Loader and Delta Lake.