# Lecture 32. Processing CDC Feed with DLT (Hands On) - Part 2

## New Notebook to Be Added to the Pipeline

### Bronze Layer Tables

We start by creating a **bronze table** to ingest the **books CDC feed**.  
We use **Auto Loader** to load the JSON files incrementally.


In [None]:
%sql
CREATE OR REFRESH STREAMING LIVE TABLE books_bronze
COMMENT "The raw books data, ingested from CDC feed"
AS SELECT * FROM cloud_files("${dataset.path}/books-cdc", "json")
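
As a rough mental model of what "incrementally" means here, the following plain-Python sketch processes only files it has not seen before. It is a conceptual illustration, not how Auto Loader is implemented; the function `ingest_new_files`, the `processed` set, and the file names are all hypothetical.

```python
def ingest_new_files(available_files, processed):
    """Return only files not seen in earlier runs, and remember them.

    Conceptual model only: `processed` stands in for whatever state the
    real ingestion service keeps between runs (an assumption).
    """
    new_files = sorted(f for f in available_files if f not in processed)
    processed.update(new_files)
    return new_files


processed = set()

# First run: both files are new (hypothetical file names).
first_run = ingest_new_files({"01.json", "02.json"}, processed)

# Second run: one more file has landed; only that file is picked up.
second_run = ingest_new_files({"01.json", "02.json", "03.json"}, processed)

print(first_run)
print(second_run)
```

Each run touches only newly arrived files, which is the behavior the streaming bronze table relies on.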

### Silver Layer Tables

Next, we create the **silver table**.  
This is our **target table**, into which the changes from the CDC feed will be applied.  
We start by declaring the table, since `APPLY CHANGES INTO` requires the target table to be declared in a separate statement.

Now, with the target table created, we can write the `APPLY CHANGES INTO` command.
  - In this command, we specify the table `books_silver` as the **target table** and the table `books_bronze` as the **streaming source** of our CDC feed.

  - Then, we identify the `book_id` as the **primary key**.  
    - If the key exists in the target table, the record will be **updated**.  
    - If not, it will be **inserted**.

  - Next, we specify that records where the `row_status` is **"DELETE"** should be **deleted** from the target table.

  - And we specify the `row_time` field for **ordering the operations**.

  - Lastly, we indicate that all **book fields** should be added to the target table, except the operational columns `row_status` and `row_time`.


In [None]:
%sql
CREATE OR REFRESH STREAMING LIVE TABLE books_silver;

APPLY CHANGES INTO LIVE.books_silver
  FROM STREAM(LIVE.books_bronze)
  KEYS (book_id)
  APPLY AS DELETE WHEN row_status = "DELETE"
  SEQUENCE BY row_time
  COLUMNS * EXCEPT (row_status, row_time)
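
To make the semantics of the statement above more concrete, here is a minimal plain-Python sketch that replays a small CDC feed into a dictionary. It is only a conceptual model, not the DLT implementation: the function `apply_changes`, the sample records, and the integer `row_time` values are hypothetical, and it ignores details such as late events arriving in a later batch.

```python
def apply_changes(feed, key="book_id", sequence_by="row_time",
                  delete_status="DELETE",
                  except_columns=("row_status", "row_time")):
    """Replay CDC records into a dict keyed by primary key (conceptual only)."""
    target = {}
    # SEQUENCE BY: apply events in order so the latest change per key wins.
    for record in sorted(feed, key=lambda r: r[sequence_by]):
        k = record[key]
        if record["row_status"] == delete_status:
            # APPLY AS DELETE WHEN: remove the row if it exists.
            target.pop(k, None)
        else:
            # KEYS: upsert, i.e. update if the key exists, insert otherwise.
            # COLUMNS * EXCEPT: drop the operational columns.
            target[k] = {c: v for c, v in record.items()
                         if c not in except_columns}
    return target


# Hypothetical feed, deliberately listed out of order.
feed = [
    {"book_id": "B01", "title": "SQL Basics, 2nd ed.", "author": "Ann",
     "row_status": "UPDATE", "row_time": 3},
    {"book_id": "B02", "title": "Streaming 101", "author": "Bob",
     "row_status": "DELETE", "row_time": 4},
    {"book_id": "B01", "title": "SQL Basics", "author": "Ann",
     "row_status": "INSERT", "row_time": 1},
    {"book_id": "B02", "title": "Streaming 101", "author": "Bob",
     "row_status": "INSERT", "row_time": 2},
]

books_silver = apply_changes(feed)
print(books_silver)
```

After the replay, only `B01` remains, with its updated title and without the `row_status` and `row_time` columns.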

### Gold Layer Table

At the **gold layer**, we define a simple **aggregate query** to create a live table from the data in our `books_silver` table.


In [None]:
%sql
CREATE LIVE TABLE author_counts_state
  COMMENT "Number of books per author"
AS SELECT author, count(*) as books_count, current_timestamp() updated_time
  FROM LIVE.books_silver
  GROUP BY author

Notice here that this is **not a streaming table**.  
Since records are updated and deleted in our `books_silver` table, it is no longer valid as a **streaming source** for this new table.  
Remember, streaming sources must be **append-only** tables.
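
The following plain-Python sketch illustrates why append-only logic breaks down once rows can change. It is a conceptual illustration with hypothetical data and names (`recompute_counts`, `silver_v1`, `silver_v2`), not a description of how DLT or Structured Streaming behaves internally: it compares a counter that only ever adds newly seen rows with a full recomputation over the current table.

```python
from collections import Counter


def recompute_counts(silver_rows):
    """Count books per author from the current state of the table."""
    return Counter(row["author"] for row in silver_rows.values())


# State of the (hypothetical) silver table after the first update.
silver_v1 = {"B01": {"author": "Ann"}, "B02": {"author": "Bob"}}

# An append-only consumer starts from the rows it has seen so far.
incremental = Counter(recompute_counts(silver_v1))

# Next update: B02 is deleted and B01's author changes from Ann to Cara.
silver_v2 = {"B01": {"author": "Cara"}}

# Treating the changed row as a plain append adds Cara,
# but never removes Ann or Bob.
incremental.update(["Cara"])

full = recompute_counts(silver_v2)

print(dict(incremental))  # stale counts still include Ann and Bob
print(dict(full))         # counts that match the current table
```

Recomputing the aggregate from the full table, as the non-streaming live table does, keeps the counts consistent with updates and deletes.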


### Defining a DLT View

Lastly, we see here how to define a **DLT view**.

In DLT pipelines, we can also define **views**. To define a view, simply replace the `TABLE` keyword with `VIEW`.

**DLT views** are **temporary views** scoped to the DLT pipeline they are a part of, so they are not persisted to the **metastore**.  
Views can still be used to enforce **data quality**.  
And **metrics for views** will be collected and reported as they would be for tables.

In [None]:
%sql
CREATE LIVE VIEW books_sales
  AS SELECT b.title, o.quantity
    FROM (
      SELECT *, explode(books) AS book 
      FROM LIVE.orders_cleaned) o
    INNER JOIN LIVE.books_silver b
    ON o.book.book_id = b.book_id;
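
The explode-then-join pattern used in this view can be sketched in plain Python as follows. The sample rows and their shapes are assumptions made for illustration (the real `orders_cleaned` and `books_silver` schemas are not shown in this notebook), and the function `books_sales` is a hypothetical stand-in for the view.

```python
# Hypothetical sample data; the real table schemas may differ.
orders_cleaned = [
    {"order_id": "O1", "quantity": 3,
     "books": [{"book_id": "B01"}, {"book_id": "B02"}]},
    {"order_id": "O2", "quantity": 1,
     "books": [{"book_id": "B03"}]},
]

books_silver = {
    "B01": {"title": "SQL Basics"},
    "B02": {"title": "Streaming 101"},
    # B03 is absent, so its order line is dropped by the inner join.
}


def books_sales(orders, books):
    """Explode each order's books array, then inner join on book_id."""
    rows = []
    for order in orders:
        for book in order["books"]:               # explode(books) AS book
            match = books.get(book["book_id"])    # join on book_id
            if match is not None:                 # inner join: keep matches only
                rows.append({"title": match["title"],
                             "quantity": order["quantity"]})
    return rows


sales = books_sales(orders_cleaned, books_silver)
print(sales)
```

As in the view, `quantity` comes from the order row, so it is repeated for every book exploded from the same order.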

### Joining and Referencing Tables Across Notebooks

The `books_sales` view above also shows how we can **join** and reference tables across notebooks.

It joins our `books_silver` table to the `orders_cleaned` table, which we created in another notebook in the last lecture.

Since **DLT** supports scheduling multiple notebooks as part of a single **DLT pipeline configuration**, code in any notebook can reference tables and views created in any other notebook.

Essentially, you can think of the scope of the schema referenced by the **LIVE** keyword as being at the DLT pipeline level, rather than the individual notebook.



## [Adding a New Notebook to the Existing DLT Pipeline](./Lecture-32__Processing-CDC-Feed-with-DLT-(Hands-On)-3.ipynb)
