In [0]:
SET datasets.path=dbfs:/mnt/demo-datasets/bookstore;


## 🔶 Bronze Layer Tables

The **Bronze layer** ingests the raw CDC feed, here a set of JSON files. Each record carries operational metadata that tracks the type and time of the change.

### Key Operational Columns:
- **row_status**: Specifies the operation type:
  - *Insert*: New data
  - *Update*: Modified records
  - *Delete*: Removed entries
- **row_time**: Timestamp of the change, used to sequence operations accurately.
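
As a rough illustration of these two columns, the sketch below models a few CDC events as plain Python dictionaries and orders them by `row_time`. The field values (`B01`, the titles, the timestamps) and the uppercase strings `INSERT` and `UPDATE` are assumptions made for the example; only `DELETE` appears literally in the pipeline code.

```python
from datetime import datetime

# Hypothetical CDC events; values are illustrative, not taken from the dataset.
events = [
    {"book_id": "B01", "title": "Intro to SQL, 2nd ed.", "row_status": "UPDATE",
     "row_time": datetime(2022, 1, 2, 9, 0)},
    {"book_id": "B01", "title": "Intro to SQL", "row_status": "INSERT",
     "row_time": datetime(2022, 1, 1, 9, 0)},
    {"book_id": "B01", "title": None, "row_status": "DELETE",
     "row_time": datetime(2022, 1, 3, 9, 0)},
]


def in_change_order(cdc_events):
    """Return the events sorted by row_time, the order in which they should be applied."""
    return sorted(cdc_events, key=lambda e: e["row_time"])


ordered_statuses = [e["row_status"] for e in in_change_order(events)]
print(ordered_statuses)
```

The events arrive out of order in the list, which is why a sequencing column is needed in addition to the operation type.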

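Auto Loader is commonly used to ingest CDC events incrementally from cloud storage. In DLT SQL it is invoked through the `cloud_files` function, which takes the source path and the file format.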


In [0]:
CREATE OR REFRESH STREAMING LIVE TABLE books_bronze
COMMENT "The raw books data, ingested from CDC feed"
AS SELECT * FROM cloud_files("${datasets.path}/books-cdc", "json");


## 🔷 Silver Layer Tables

The **Silver layer** applies the changes captured in the Bronze table using `APPLY CHANGES INTO`, which upserts and deletes rows in the target table according to the CDC metadata.

### Highlights:
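- The key column declared in `KEYS` (`book_id`) determines whether an incoming record updates an existing row or is inserted as a new one.
- `APPLY AS DELETE WHEN` states explicitly which events remove rows from the target; all other events are applied as upserts.
- The **row_time** column, named in `SEQUENCE BY`, ensures events are applied in the correct order, so late-arriving records do not overwrite newer ones.
- `COLUMNS * EXCEPT (...)` keeps the operational columns out of the target table.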

This layer transforms the raw event stream into a clean, accurate view of the current state of the data.
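
To make the upsert, delete, and ordering rules concrete, here is a minimal plain-Python model of the behavior described above. It is a sketch of the semantics only, not the DLT API: the function `apply_changes`, its parameters, and the sample events are hypothetical, and integer `row_time` values stand in for timestamps.

```python
def apply_changes(events, key, sequence_by, delete_status="DELETE",
                  except_columns=("row_status", "row_time")):
    """Plain-Python model of keyed change application (assumed semantics, not the DLT API)."""
    state = {}      # key -> current row, without the metadata columns
    last_seen = {}  # key -> highest sequence value applied so far (kept for deleted keys too)
    for event in events:  # events may arrive in any order
        k, seq = event[key], event[sequence_by]
        if k in last_seen and seq <= last_seen[k]:
            continue  # stale or duplicate event: a newer change has already been applied
        last_seen[k] = seq
        if event["row_status"] == delete_status:
            state.pop(k, None)  # delete: remove the row from the target
        else:
            # insert or update: upsert the row, dropping the metadata columns
            state[k] = {c: v for c, v in event.items() if c not in except_columns}
    return state


# Hypothetical events, deliberately out of order.
cdc_events = [
    {"book_id": "B01", "title": "Old title", "row_status": "INSERT", "row_time": 1},
    {"book_id": "B02", "title": "Another book", "row_status": "INSERT", "row_time": 1},
    {"book_id": "B01", "title": "New title", "row_status": "UPDATE", "row_time": 3},
    {"book_id": "B01", "title": "Stale title", "row_status": "UPDATE", "row_time": 2},
    {"book_id": "B02", "title": None, "row_status": "DELETE", "row_time": 2},
]

silver = apply_changes(cdc_events, key="book_id", sequence_by="row_time")
print(silver)
```

In this model the late `Stale title` update is ignored because a change with a higher `row_time` was already applied, and `B02` disappears from the result once its delete event is processed.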


In [0]:
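-- Declare the target table first; APPLY CHANGES INTO then populates it.
CREATE OR REFRESH STREAMING LIVE TABLE books_silver;

APPLY CHANGES INTO LIVE.books_silver
  FROM STREAM(LIVE.books_bronze)
  KEYS (book_id)                              -- update the row if book_id exists, insert otherwise
  APPLY AS DELETE WHEN row_status = "DELETE"  -- treat these events as deletes
  SEQUENCE BY row_time                        -- apply changes in event-time order
  COLUMNS * EXCEPT (row_status, row_time);    -- keep CDC metadata out of the target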

## 🟡 Gold Layer Tables

The **Gold layer** aggregates and curates data for analytics and reporting. Aggregations are applied over the Silver layer to derive metrics such as totals or trends by category or time.

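> Note: Because `APPLY CHANGES INTO` updates and deletes rows in `books_silver`, that table is no longer append-only and is **not suitable as a streaming source**. The Gold table is therefore defined as a regular `LIVE TABLE` that reads `LIVE.books_silver` without `STREAM()`.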
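
A small plain-Python sketch may help show why an aggregate over a table that receives updates and deletes is recomputed from the current state instead of being appended to. The helper `author_counts` and the sample rows are hypothetical.

```python
from collections import Counter


def author_counts(silver_rows):
    """Count books per author from a snapshot of the current Silver state."""
    return dict(Counter(row["author"] for row in silver_rows))


# Hypothetical snapshot of the Silver table.
before = [
    {"book_id": "B01", "author": "A. Author"},
    {"book_id": "B02", "author": "A. Author"},
    {"book_id": "B03", "author": "B. Writer"},
]
# The same table after a delete for B02 has been applied upstream.
after = [row for row in before if row["book_id"] != "B02"]

counts_before = author_counts(before)
counts_after = author_counts(after)
print(counts_before, counts_after)
```

The count for one author goes down between the two snapshots, a change that an append-only stream of new rows cannot express.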

In [0]:
CREATE LIVE TABLE author_counts_state
  COMMENT "Number of books per author"
AS SELECT author, count(*) AS books_count, current_timestamp() AS updated_time
  FROM LIVE.books_silver
  GROUP BY author;

## 🧩 DLT Views

Delta Live Tables (DLT) views are **temporary constructs** scoped to the pipeline and are not persisted to the metastore. They are designed to:
- Facilitate logic separation.
- Enforce data quality checks.
- Collect validation metrics or intermediate transformations.

DLT views cannot be accessed outside of the DLT pipeline but provide flexibility within multi-step data flows.


In [0]:
CREATE LIVE VIEW books_sales
  AS SELECT b.title, o.quantity
    FROM (
      SELECT *, explode(books) AS book 
      FROM LIVE.orders_cleaned) o
    INNER JOIN LIVE.books_silver b
    ON o.book.book_id = b.book_id;
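
The view above flattens each order's `books` array and joins the result to the current book state. A plain-Python sketch of that logic follows; the shape of `orders_cleaned` (an order-level `quantity` and a `books` array of structs with a `book_id`) is inferred from the query, and the sample values are hypothetical.

```python
def books_sales(orders, books_silver):
    """Model of the view: explode(books) per order, then inner join on book_id."""
    titles = {b["book_id"]: b["title"] for b in books_silver}
    rows = []
    for order in orders:
        for book in order["books"]:        # explode: one row per array element
            if book["book_id"] in titles:  # inner join: unmatched books are dropped
                rows.append({"title": titles[book["book_id"]],
                             "quantity": order["quantity"]})
    return rows


# Hypothetical inputs.
orders_cleaned = [
    {"order_id": "O1", "quantity": 3, "books": [{"book_id": "B01"}, {"book_id": "B09"}]},
    {"order_id": "O2", "quantity": 1, "books": [{"book_id": "B01"}]},
]
current_books = [{"book_id": "B01", "title": "Intro to SQL"}]

sales = books_sales(orders_cleaned, current_books)
print(sales)
```

Note that, as in the SQL, `quantity` is taken from the order row, so every exploded row of the same order repeats the order-level value, and books absent from `books_silver` (for example, deleted ones) produce no output row.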

## 🛠️ Pipeline Integration

The session concludes by integrating new logic into an existing DLT pipeline, demonstrating how:
- Multiple notebooks can be used collaboratively.
- CDC processing fits naturally within a **multi-hop** architecture.
- Pipelines remain modular, maintainable, and scalable across evolving datasets.

Delta Live Tables simplifies complex CDC workflows by combining declarative syntax, built-in quality enforcement, and managed orchestration of table dependencies.