# Lecture 31. Change Data Capture

In this lecture, we will talk about **change data capture (CDC)**.

- You will understand what **change data capture** is, and
- you will learn how CDC feeds can be processed in **Delta Live Tables**.


## What is Change Data Capture (CDC)

**Change Data Capture** or **CDC** refers to the process of identifying and capturing changes made to data in the data source, and then delivering those changes to the target.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Presentation-Images/Change Data Capture (CDC).jpg" alt="Change Data Capture (CDC)" style="width: 320px">
</div>

These changes can be:
  - new records in the source that need to be inserted into the target,
  - updated records in the source that need to be reflected in the target, and
  - deleted records in the source that must also be deleted in the target.


### CDC Feed 

Changes are logged at the source as events that contain both the record data and metadata. This metadata indicates whether the record was inserted, updated, or deleted, and includes a version number or timestamp that indicates the order in which the changes happened.

Here's an example of CDC events that need to be applied to our target table.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Presentation-Images/CDC Feed 1.jpg" alt="CDC feed example: change events to apply to the target table" style="width: 800px">
</div>

Notice here:

  - **France**, for example, has two records, so we need to apply only the most recent change.
  - **Canada** needs to be deleted, so the event does not need to carry all the data of the record.
  - Lastly, **USA** and **India** are new records that need to be inserted.
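To make the shape of such a feed concrete, here is a small sketch in Spark SQL. The view name `cdc_feed_sample`, the column names (`country_id`, `country`, `vaccination_rate`, `operation`, `sequence_num`), and all values are made up for illustration; they are not taken from the figure.

```sql
-- Hypothetical CDC feed: names and values are illustrative only.
CREATE OR REPLACE TEMPORARY VIEW cdc_feed_sample AS
SELECT * FROM VALUES
  (1, 'France', 0.70, 'UPDATE', 2),  -- older change for France
  (1, 'France', 0.72, 'UPDATE', 4),  -- newer change for France: this one should win
  (2, NULL,     NULL, 'DELETE', 3),  -- delete event (Canada): key and metadata are enough
  (3, 'USA',    0.65, 'INSERT', 1),  -- new record
  (4, 'India',  0.60, 'INSERT', 1)   -- new record
AS feed(country_id, country, vaccination_rate, operation, sequence_num);
```

Each row carries the record data plus two metadata columns: `operation` says what happened, and `sequence_num` says in which order.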

Here we see the changes applied to our target table. The record for **Canada** no longer appears, since it has been deleted.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Presentation-Images/CDC Feed 2.jpg" alt="Target table after applying the CDC feed" style="width: 800px">
</div>
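For comparison, applying one batch of such events by hand in plain Delta Lake SQL could look roughly like the sketch below. The table names `countries` and `cdc_feed` and all column names are hypothetical. The idea is to keep only the latest event per key and then update, delete, or insert accordingly.

```sql
-- Minimal single-batch sketch; table and column names are assumptions.
MERGE INTO countries AS t
USING (
  -- Keep only the most recent event for each key.
  SELECT country_id, country, vaccination_rate, operation
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY country_id
                              ORDER BY sequence_num DESC) AS rn
    FROM cdc_feed
  ) AS ranked
  WHERE rn = 1
) AS s
ON t.country_id = s.country_id
WHEN MATCHED AND s.operation = 'DELETE' THEN DELETE
WHEN MATCHED THEN
  UPDATE SET t.country = s.country,
             t.vaccination_rate = s.vaccination_rate
WHEN NOT MATCHED AND s.operation <> 'DELETE' THEN
  INSERT (country_id, country, vaccination_rate)
  VALUES (s.country_id, s.country, s.vaccination_rate);
```

This sketch only covers a single batch. It does not deal with events that arrive late in a subsequent batch, which would require tracking the last applied sequence value per key.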

Such a CDC feed could be received from the source as a data stream or simply in **JSON** files, for example.
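As a sketch of the JSON case, a Delta Live Tables pipeline could ingest the files incrementally into a streaming table. Using Auto Loader (`cloud_files`) here is an assumption about how the feed is ingested, and the table name and path are placeholders.

```sql
-- Hypothetical DLT table that ingests a JSON CDC feed incrementally.
-- The path is a placeholder, not a real location.
CREATE OR REFRESH STREAMING LIVE TABLE cdc_feed_table
COMMENT "Raw CDC events ingested from JSON files"
AS SELECT * FROM cloud_files("/path/to/cdc/json", "json");
```

This statement is only valid inside a Delta Live Tables pipeline, not in a regular interactive SQL session.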

**Delta Live Tables** supports CDC feed processing using the `Apply Changes Into` command.



## The `Apply Changes Into` Command

The command is pretty simple:

```sql
APPLY CHANGES INTO LIVE.target_table
FROM STREAM(LIVE.cdc_feed_table)
KEYS (key_field)
APPLY AS DELETE WHEN operation_field = "DELETE"
SEQUENCE BY sequence_field
COLUMNS *
```

- **Apply Changes Into**: The target table into which the changes are applied.
- **From**: The CDC feed table, specified as a streaming source.
- **Keys**: The primary key field(s) used to match source records to target records. If the key exists in the target table, the record is updated. If not, it is inserted.
- **Apply As Delete When**: The condition that marks an event as a delete. Here, records where the operation field is `"DELETE"` are deleted from the target.
- **Sequence By**: The sequence field that determines the order in which operations are applied.
- **Columns**: Lastly, the list of fields that should be added to the target table (`*` for all fields).

Note: The target **Delta Live Table** must already be created before the `Apply Changes Into` command is executed.
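Putting the pieces together, a minimal sketch of the full pattern might look as follows. The column names `country_id`, `operation`, and `sequence_num` are assumptions about the feed, chosen only for illustration.

```sql
-- The target must be declared before changes are applied to it.
CREATE OR REFRESH STREAMING LIVE TABLE target_table;

-- Apply the CDC events from the feed table to the target.
APPLY CHANGES INTO LIVE.target_table
FROM STREAM(LIVE.cdc_feed_table)
KEYS (country_id)                          -- hypothetical primary key
APPLY AS DELETE WHEN operation = "DELETE"  -- hypothetical operation column
SEQUENCE BY sequence_num                   -- hypothetical ordering column
COLUMNS * EXCEPT (operation, sequence_num);
```

Here the metadata columns are excluded from the target so that it holds only the record data.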



### Features of the `Apply Changes Into` Command

- It automatically orders late-arriving records using the user-provided sequencing key. 

  - This pattern ensures that if any records arrive out of order, downstream results can be properly re-computed to reflect the updates.

  - It also ensures that when records are deleted from a source table, these values are no longer reflected in tables later in the pipeline.

- The default behavior for insert and update operations is to upsert the CDC events into the target table. 
  
  That means it updates any rows in the target table that match the specified key, or inserts new records when a matching record does not exist in the target table.

- Optional handling for delete events can be specified with the `APPLY AS DELETE WHEN` condition.

- You can specify one or many fields as the primary key for a table.

- The `Except` keyword can be added to the `COLUMNS` clause to specify columns that should be left out of the target table.

- Lastly, you can choose whether to store records as a **Slowly Changing Dimension (SCD)** **Type 1** or **Type 2** table.

  `Apply Changes Into` defaults to creating a **Type 1** slowly changing dimension table, meaning that each unique key has at most one record and that updates overwrite the original information. **Type 2**, in contrast, keeps the history of changes for each key.
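Several of these options can be combined in one statement. The following sketch uses a composite key, excludes the metadata columns, and stores the result as SCD Type 2. The table name `target_history` and the column names are hypothetical.

```sql
CREATE OR REFRESH STREAMING LIVE TABLE target_history;

APPLY CHANGES INTO LIVE.target_history
FROM STREAM(LIVE.cdc_feed_table)
KEYS (country_id, region_id)                 -- composite key (hypothetical columns)
APPLY AS DELETE WHEN operation = "DELETE"    -- optional delete handling
SEQUENCE BY sequence_num                     -- orders late-arriving events
COLUMNS * EXCEPT (operation, sequence_num)   -- leave out the metadata columns
STORED AS SCD TYPE 2;                        -- keep history instead of overwriting
```

With `STORED AS SCD TYPE 2`, Delta Live Tables adds `__START_AT` and `__END_AT` columns to the target to record the validity range of each version. Omitting the clause gives the default Type 1 behavior.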



### Disadvantage of the `Apply Changes Into` Command

On the other hand, `Apply Changes Into` has one disadvantage.

Since data is being updated and deleted in the target table, the table no longer meets the append-only requirement for streaming sources. That means we can no longer use this table as a streaming source in the next layer of the pipeline.
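The table can still be read downstream as a complete (non-streaming) table. As a sketch, assuming hypothetical columns `country` and `vaccination_rate`, the next layer could be defined without the `STREAM()` wrapper:

```sql
-- Hypothetical downstream table: reads the CDC target as a complete table
-- (no STREAM() wrapper), so the append-only requirement does not apply.
CREATE OR REFRESH LIVE TABLE country_summary
AS SELECT country, vaccination_rate
FROM LIVE.target_table
WHERE vaccination_rate IS NOT NULL;
```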

---

**Great! That's all for this lecture.**

Let us now switch to the **Databricks** platform in order to see `Apply Changes Into` in action.
