
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# 6 - Change Data Capture with APPLY CHANGES INTO

In this demonstration, we continue building our pipeline by ingesting **customer** data. The customer data includes new customers, customers who have deleted their accounts, and customers who have updated their information (such as address or email). To handle these changes, we will implement change data capture (CDC) in the customer pipeline.

The customer pipeline flow is as follows:

- The bronze table uses **Auto Loader** to ingest JSON data from cloud object storage with SQL (`FROM STREAM`).
- An intermediate table enforces constraints before passing records to the silver layer.
- `APPLY CHANGES INTO` automatically processes the CDC data into the silver layer as a Type 2 [slowly changing dimension (SCD) table](https://en.wikipedia.org/wiki/Slowly_changing_dimension).
- A gold materialized view presents the current customers, reflecting new customers, dropped customers, and updated information.



### Learning Objectives

By the end of this lesson, students should be able to:
- Apply the `APPLY CHANGES INTO` operation in Lakeflow Declarative Pipelines to process change data capture (CDC) by integrating and updating incoming data from a source stream into an existing Delta table, ensuring data accuracy and consistency.
- Analyze Slowly Changing Dimensions (SCD Type 2) tables within Lakeflow Declarative Pipelines to effectively track historical changes in dimensional data, managing the state of records over time using appropriate keys, versioning, and timestamps.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

    - In the drop-down, select **More**.

    - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course. This setup will reset your volume to one JSON file in each directory.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically create and reference the information needed to run the course.

In [0]:
%run ./Includes/Classroom-Setup-6

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


Schema labuser11058730_1754017152.1_bronze_db already exists. No action taken.
Schema labuser11058730_1754017152.2_silver_db already exists. No action taken.
Schema labuser11058730_1754017152.3_gold_db already exists. No action taken.
----------------------------------------------------------------------------------------
Directory /Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers already exists. No action taken.
Directory /Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/orders already exists. No action taken.
Directory /Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/status already exists. No action taken.
----------------------------------------------------------------------------------------


Searching for files in /Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers/ volume to delete prior to creating files...
Deleting file: /Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers/00.json

Searc

Schemas are available, lab check passed: ['1_bronze_db', '2_silver_db', '3_gold_db'].


0,1
Your catalog name variable reference: DA.catalog_name:,
"Variable reference to your source files (Python - DA.paths.working_dir, SQL - DA.paths_working_dir):",


## B. Explore the Customer Data Source Files

1. Run the cell below to programmatically view the files in your `/Volumes/dbacademy/ops/lab-user-name/customers` volume. Confirm you only see one **00.json** file for customers.

In [0]:
%python
spark.sql(f'LIST "{DA.paths.working_dir}/customers"').display()

path,name,size,modification_time
/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers/00.json,00.json,193413,1754030168000


2. Run the query below to explore the customers **00.json** file located at `/Volumes/dbacademy/ops/lab-user-name/customers`. Note the following:

   a. The file contains **939 customers** (remember this number).

   b. It includes general customer information such as **email**, **name**, and **address**.

   c. The **timestamp** column specifies the logical order of customer events in the source data.

   d. The **operation** column indicates whether the entry is for a new customer, a deletion, or an update.
      - **NOTE:** Since this is the first JSON file, all rows will be considered new customers.


In [0]:
SELECT *
FROM read_files(
  DA.paths_working_dir || '/customers/00.json',
  format => "JSON"
)
ORDER BY operation;

address,city,customer_id,email,name,operation,state,timestamp,zip_code,_rescued_data
568 David Brook Apt. 524,Norwalk,23056,nelsonjoy@example.com,Brent Chavez,NEW,CA,1632417981,45049,
741 Wendy Plains Apt. 143,San Francisco,23057,perkinsdeborah@example.net,James Cruz,NEW,CA,1632421305,42872,
732 Trujillo Rue,Santa Monica,23058,jmccullough@example.net,Jennifer Christensen,NEW,CA,1632356384,89020,
954 Oconnell Union Apt. 988,New York,23060,ibuck@example.net,Shannon Cochran,NEW,NY,1632371835,35054,
07449 Michael Manor,Pawtucket,23061,anthony99@example.com,Michael Norris,NEW,RI,1632409638,93764,
73556 Rogers Glens,St. Charles,23062,vcoleman@example.net,Rhonda Thompson,NEW,MO,1632389458,62504,
672 Erica Lakes,Perry,23063,sarahgallagher@example.net,Gene Yang DDS,NEW,MI,1632413696,60175,
99513 Shari Views Apt. 667,Roswell,23064,kgonzalez@example.com,Valerie Clark,NEW,GA,1632356121,15232,
61040 Hernandez Lane,Gaithersburg,23065,dylantaylor@example.net,Tonya Cameron,NEW,MD,1632400610,58640,
4208 Antonio Mountains Apt. 386,Danville,23066,clin@example.org,Kelly Davis,NEW,KY,1632508637,59680,
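The **timestamp** values in the output above look like Unix epoch seconds. As a minimal sketch, assuming that interpretation and assuming UTC, the snippet below converts a few of them to readable datetimes and sorts them to recover the logical order of events. The helper name `epoch_to_utc` is hypothetical and is not part of the course code.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Interpret an integer as seconds since the Unix epoch, in UTC (an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# (customer_id, timestamp) pairs copied from the 00.json output above
events = [(23056, 1632417981), (23057, 1632421305), (23058, 1632356384)]

# Sorting by timestamp recovers the logical order of the customer events
ordered = sorted(events, key=lambda e: e[1])
for customer_id, ts in ordered:
    print(customer_id, epoch_to_utc(ts).isoformat())
```

Ordering by this column is what lets a CDC process decide which change to a customer is the most recent, regardless of the order in which rows arrive in a file.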


### Question: 
How can we ingest new raw data source files (JSON) with customer updates into our pipeline to update the **customers_silver** table when inserts, updates, or deletes occur, while also maintaining historical records?

## C. Change Data Capture with APPLY CHANGES INTO in Lakeflow Declarative Pipelines for SCD Type 2
**SCD - Slowly Changing Dimensions**

1. Run the cell below to create your starter Lakeflow Declarative Pipeline for this demonstration. The pipeline will set the following for you:
    - Your default catalog: `labuser`
    - Your configuration parameter: `source` = `/Volumes/dbacademy/ops/your-labuser-name`

    **NOTE:** If the pipeline already exists, an error will be returned. In that case, you'll need to delete the existing pipeline and rerun this cell.

    To delete the pipeline:

    a. Select **Jobs and Pipelines** from the far-left navigation bar.  

    b. Find the pipeline you want to delete.  

    c. Click the three-dot menu ![ellipsis icon](./Includes/images/ellipsis_icon.png).  

    d. Select **Delete**.

**NOTE:**  The `create_declarative_pipeline` function is a custom function built for this course to create the sample pipeline using the Databricks REST API. This avoids manually creating the pipeline and referencing the pipeline assets.

In [0]:
%python
create_declarative_pipeline(pipeline_name=f'6 - Change Data Capture with APPLY CHANGES INTO - {DA.catalog_name}', 
                            root_path_folder_name='6 - Change Data Capture with APPLY CHANGES INTO Project',
                            catalog_name = DA.catalog_name,
                            schema_name = 'default',
                            source_folder_names=['orders', 'status', 'customers'],
                            configuration = {'source':DA.paths.working_dir})

Creating the Lakeflow Declarative Pipeline '6 - Change Data Capture with APPLY CHANGES INTO - labuser11058730_1754017152'...
Root folder path: /Workspace/Users/labuser11058730_1754017152@vocareum.com/build-data-pipelines-with-lakeflow-declarative-pipelines-3.0.2/Build Data Pipelines with Lakeflow Declarative Pipelines/6 - Change Data Capture with APPLY CHANGES INTO Project
Source folder path(s): [{'glob': {'include': '/Workspace/Users/labuser11058730_1754017152@vocareum.com/build-data-pipelines-with-lakeflow-declarative-pipelines-3.0.2/Build Data Pipelines with Lakeflow Declarative Pipelines/6 - Change Data Capture with APPLY CHANGES INTO Project/orders/**'}}, {'glob': {'include': '/Workspace/Users/labuser11058730_1754017152@vocareum.com/build-data-pipelines-with-lakeflow-declarative-pipelines-3.0.2/Build Data Pipelines with Lakeflow Declarative Pipelines/6 - Change Data Capture with APPLY CHANGES INTO Project/status/**'}}, {'glob': {'include': '/Workspace/Users/labuser11058730_1754017


**NOTE**: If the pop-up Run in strict sandbox appears on the Pipeline page, click on **Run in strict sandbox** to proceed.


2. Complete the following steps to open the starter Lakeflow Declarative Pipeline project for this demonstration:

   a. Click the folder icon ![Folder](./Includes/images/folder_icon.png) in the left navigation panel.

   b. In the **Build Data Pipelines with Lakeflow Declarative Pipelines** folder, find the **6 - Change Data Capture with APPLY CHANGES INTO Project** folder. 
   
   c. Right-click and select **Open in a new tab**.

   d. In the new tab you should see four folders: **explorations**, **orders**, **status**, and **customers** (plus an extra **python_excluded** folder that contains the Python version).

      - **NOTE:** The **status** and **orders** pipelines are the same as we saw in the previous demo.

   e. Open the **customers** folder and select the **customers_pipeline** notebook.

#### IMPORTANT
   **NOTE:** If you open the **customers_pipeline** file and it does not open up the pipeline editor, that is because that folder is not associated with a pipeline. Please make sure to run the previous cell to associate the folder with the pipeline and try again.

   **WARNING:** If you get the following warning when opening the **customers_pipeline** file: 

   ```pipeline you are trying to access does not exist or is inaccessible. Please verify the pipeline ID, request access or detach this file from the pipeline.``` 

   Simply refresh the page and/or reselect the notebook.

3. Explore the **customers_pipeline** notebook code cells step by step and then follow the instructions.

    **NOTE:** The **status** and **orders** code is the same as in the previous demonstration, so you do not need to review it.

## D. Land New Data to Your Data Source Volume

1. Before landing more data into your cloud storage location, run the query below to view the **customers_silver_demo6** streaming table (the table with SCD Type 2). Notice the following:

   - The streaming table contains all **939 rows** from the **00.json** file, since they are all new customers being added to the table.

   - Scroll to the right of the table and note that `APPLY CHANGES INTO` added the following columns:

     - **__START_AT**:  
       - A timestamp representing when the current version of a record became active.  

     - **__END_AT**:  
       - A timestamp representing when this version of a record became inactive (because of a **DELETE** or an **UPDATE**).
       - Filtering for rows where **__END_AT** is `null` therefore returns all active customers. After the first run, all customer rows are active.

     - **NOTE:** These columns support Slowly Changing Dimension (SCD) Type 2. The special fields **__START_AT** and **__END_AT** are automatically managed by the `APPLY CHANGES` statement in Databricks to track the validity period of each version of a record.

In this initial ingestion of the **00.json** file, all records are active, so the **__END_AT** column contains only `null` values.

In [0]:
SELECT *
FROM 2_silver_db.customers_silver_demo6;

address,city,customer_id,email,name,state,zip_code,processing_time,source_file,timestamp_datetime,__START_AT,__END_AT
241 Dennis Springs,Springfield,22122,marie21@example.net,Cynthia Price,MA,69026,2025-08-01T06:39:11.387Z,00.json,2021-12-31T20:25:14Z,2021-12-31T20:25:14Z,
80972 Johnson Island,Denver,22141,michelle30@example.net,Stephen Parker,CO,76478,2025-08-01T06:39:11.387Z,00.json,2021-12-29T10:48:27Z,2021-12-29T10:48:27Z,
3772 Miller Junctions Apt. 383,McAllen,22144,phillipsolivia@example.com,Angela Frost,TX,57649,2025-08-01T06:39:11.387Z,00.json,2021-12-29T10:17:00Z,2021-12-29T10:17:00Z,
8554 Summer Plain Suite 213,Delano,22161,johnstonkatherine@example.net,Brittany Schneider,CA,74021,2025-08-01T06:39:11.387Z,00.json,2021-12-27T07:44:04Z,2021-12-27T07:44:04Z,
399 Jackson Villages,Santa Fe Springs,22163,ronaldfrazier@example.org,Jason Anderson,CA,38777,2025-08-01T06:39:11.387Z,00.json,2021-12-27T22:56:52Z,2021-12-27T22:56:52Z,
610 Thompson Valleys,Las Vegas,22169,fisherdebra@example.net,Steven Mullen,NV,74024,2025-08-01T06:39:11.387Z,00.json,2021-12-27T14:56:48Z,2021-12-27T14:56:48Z,
471 Mcdonald Corner Apt. 354,Winchester,22195,jason50@example.com,David West,VA,80381,2025-08-01T06:39:11.387Z,00.json,2021-12-24T08:49:56Z,2021-12-24T08:49:56Z,
845 Ralph Garden,Falfurrias,22196,sydney25@example.org,Daniel Berger,TX,15928,2025-08-01T06:39:11.387Z,00.json,2021-12-24T20:30:57Z,2021-12-24T20:30:57Z,
400 Bryant Mountain,New York,22198,christina16@example.net,David Lee,NY,41010,2025-08-01T06:39:11.387Z,00.json,2021-12-23T20:10:42Z,2021-12-23T20:10:42Z,
137 Nicholas Vista Apt. 083,Port Huron,22203,shepardseth@example.net,Shane Hensley,MI,41717,2025-08-01T06:39:11.387Z,00.json,2021-12-23T11:45:43Z,2021-12-23T11:45:43Z,
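To make the role of **__START_AT** and **__END_AT** concrete, here is a minimal pure-Python sketch of SCD Type 2 bookkeeping. It is a simplified illustration, not how `APPLY CHANGES INTO` is implemented: the `Version` class, the `apply_changes_scd2` function, and the small integer timestamps are all hypothetical, and the real statement handles cases (such as out-of-order or duplicate events) that this sketch ignores. The addresses are the ones used for customers *23225* and *23617* in this notebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    customer_id: int
    address: Optional[str]
    start_at: int                 # plays the role of __START_AT
    end_at: Optional[int] = None  # plays the role of __END_AT; None means active


def apply_changes_scd2(history, events):
    """Apply CDC events to a list of Version rows, in timestamp order."""
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        # Find the currently active version for this key, if there is one
        active = next(
            (v for v in history
             if v.customer_id == ev["customer_id"] and v.end_at is None),
            None,
        )
        if active is not None:
            # Both UPDATE and DELETE close the active version
            active.end_at = ev["timestamp"]
        if ev["operation"] != "DELETE":
            # NEW and UPDATE open a new active version
            history.append(Version(ev["customer_id"], ev["address"], ev["timestamp"]))
    return history


# Illustrative events; the timestamps 1..4 are made up for readability
events = [
    {"customer_id": 23225, "operation": "NEW", "address": "76814 Jacqueline Mountains Suite 815", "timestamp": 1},
    {"customer_id": 23617, "operation": "NEW", "address": "0727 Michael Locks", "timestamp": 2},
    {"customer_id": 23225, "operation": "UPDATE", "address": "512 John Stravenue Suite 239", "timestamp": 3},
    {"customer_id": 23617, "operation": "DELETE", "address": None, "timestamp": 4},
]

history = apply_changes_scd2([], events)
for version in history:
    print(version)
```

In this sketch an update leaves two rows for the customer (one closed, one active), while a delete leaves a single closed row and no active row.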


2. Query the **customers_silver_demo6** streaming table for the customer with **customer_id** *23225*. Notice that this customer has:

   - **Address**: `76814 Jacqueline Mountains Suite 815`  

   - **State**: `TX`  

   - **__END_AT**: Contains a `null` value, indicating this is the current information for that customer.

In [0]:
SELECT *
FROM 2_silver_db.customers_silver_demo6
WHERE customer_id = 23225;

address,city,customer_id,email,name,state,zip_code,processing_time,source_file,timestamp_datetime,__START_AT,__END_AT
76814 Jacqueline Mountains Suite 815,El Paso,23225,andrewcarter@example.org,Sandy Adams,TX,46521,2025-08-01T06:39:11.387Z,00.json,2021-10-12T10:52:25Z,2021-10-12T10:52:25Z,


3. Run the cell below to land a new JSON file to each volume (**customers**, **status** and **orders**) to simulate new files being added to your cloud storage locations.

In [0]:
%python
copy_file_for_multiple_sources(copy_n_files = 2, 
                               sleep_set = 1,
                               copy_from_source='/Volumes/dbacademy_retail/v01/retail-pipeline',
                               copy_to_target = DA.paths.working_dir)


----------------Loading files to user's volume: '/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/orders'----------------
File number 1 - 00.json is already in the source volume "/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/orders". Skipping file.

----------------Loading files to user's volume: '/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers'----------------
File number 1 - 00.json is already in the source volume "/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers". Skipping file.

----------------Loading files to user's volume: '/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/status'----------------
File number 1 - 00.json is already in the source volume "/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/status". Skipping file.

----------------Loading files to user's volume: '/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/orders'----------------
File number 1 

4. Run the cell below to programmatically view the files in your `/Volumes/dbacademy/ops/lab-user-name/customers` volume. Confirm your volume now contains the **00.json** and **01.json** files.

In [0]:
%python
spark.sql(f'LIST "{DA.paths.working_dir}/customers"').display()

path,name,size,modification_time
/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers/00.json,00.json,193413,1754030168000
/Volumes/dbacademy/ops/labuser11058730_1754017152@vocareum_com/customers/01.json,01.json,4709,1754030661000


5. Run the cell to explore the raw data in the **01.json** file prior to ingesting it in your pipeline. Notice the following:

   - This file contains **23** rows.

   - The **operation** column specifies **UPDATE**, **DELETE**, and **NEW** operations for customers.
      - **NOTE:** There are:
         - 12 customers with **UPDATE** values
         - 1 customer with a **DELETE** value
         - 10 new customers with a **NEW** value

   - In the results below, find the row with **customer_id** *23225* and note the following:

      - The original address for **Sandy Adams** (from the streaming table, file **00.json**) was: `76814 Jacqueline Mountains Suite 815`, `TX`

      - The updated address for **Sandy Adams** (from the file below) is: `512 John Stravenue Suite 239`, `TN`

   - In the results below, find the row with **customer_id** *23617* and note the following:
      - The **operation** for this customer is **DELETE**.
      - When the **operation** value is **DELETE**, all other columns except **customer_id** and **timestamp** are `null`.

In [0]:
SELECT *
FROM read_files(
  DA.paths_working_dir || '/customers/01.json',
  format => "JSON"
)
ORDER BY customer_id;

address,city,customer_id,email,name,operation,state,timestamp,zip_code,_rescued_data
618 Villarreal Stravenue Suite 601,Pittsburgh,22668,medinaryan@example.org,Steven Conway,UPDATE,PA,1641011454,28488.0,
451 Hunt Station,Johnson City,22760,ashley44@example.org,Michael Lewis,UPDATE,TN,1641057420,90476.0,
80805 Mcmillan Street,Maryville,22931,flowersjose@example.org,Teresa Mooney,UPDATE,TN,1641043886,77378.0,
512 John Stravenue Suite 239,Kingsport,23225,andrewcarter@example.org,Sandy Adams,UPDATE,TN,1641067642,94660.0,
510 Martin Gardens Apt. 723,Fullerton,23345,wnunez@example.net,Andrew Perez,UPDATE,CA,1641025081,30769.0,
077 Linda Corners,Detroit,23439,mstone@example.net,Lori Jordan,UPDATE,MI,1641066875,20864.0,
,,23617,,,DELETE,,1641054281,,
50434 Turner Land Suite 696,New York,23666,faulknershannon@example.net,John Rodgers,UPDATE,NY,1641045935,66088.0,
976 Lester Heights Suite 317,Austin,23768,conleycarly@example.org,Steven Campbell,UPDATE,TX,1641036961,8533.0,
80101 Adam Spur Apt. 971,Chicago,23789,marcus91@example.org,Robert Smith,UPDATE,IL,1641039413,81100.0,


6. Go back to your pipeline and run the pipeline to process a new JSON file for each data source.

## E. View the CDC SCD Type 2 on the Customers Table

1. View the data in the **customers_silver_demo6** streaming table with SCD Type 2 and observe the following:

   a. The table contains **961 rows** (**initial 939 customers** + **12 updates** to existing customers + **10 new customers**).

   b. Scroll to the right and locate the **__END_AT** column. Then scroll down to rows **82** and **83**. Notice there are two rows for customer **22668**, the original record and the updated record.

**NOTE:** For demonstration purposes, many of the metadata columns were retained in the silver streaming table.

In [0]:
SELECT customer_id, address, name, __START_AT, __END_AT
FROM 2_silver_db.customers_silver_demo6
ORDER BY customer_id, __END_AT;

customer_id,address,name,__START_AT,__END_AT
22122,241 Dennis Springs,Cynthia Price,2021-12-31T20:25:14Z,
22141,80972 Johnson Island,Stephen Parker,2021-12-29T10:48:27Z,
22144,3772 Miller Junctions Apt. 383,Angela Frost,2021-12-29T10:17:00Z,
22161,8554 Summer Plain Suite 213,Brittany Schneider,2021-12-27T07:44:04Z,
22163,399 Jackson Villages,Jason Anderson,2021-12-27T22:56:52Z,
22169,610 Thompson Valleys,Steven Mullen,2021-12-27T14:56:48Z,
22195,471 Mcdonald Corner Apt. 354,David West,2021-12-24T08:49:56Z,
22196,845 Ralph Garden,Daniel Berger,2021-12-24T20:30:57Z,
22198,400 Bryant Mountain,David Lee,2021-12-23T20:10:42Z,
22203,137 Nicholas Vista Apt. 083,Shane Hensley,2021-12-23T11:45:43Z,
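The row counts quoted in this section (961 total rows, 13 inactive rows, 948 active rows) follow from simple SCD Type 2 arithmetic. The sketch below works through it, assuming every update and delete targets a different, currently active customer. The function name `expected_scd2_counts` is hypothetical.

```python
def expected_scd2_counts(initial: int, updates: int, deletes: int, inserts: int) -> dict:
    """Expected row counts in an SCD Type 2 table after one batch of changes."""
    # Each update adds a new version row; each insert adds a first version row.
    # A delete only closes an existing row, so it adds nothing.
    total = initial + updates + inserts
    # Rows with a non-null __END_AT: the old version of each updated customer
    # plus the closed row of each deleted customer.
    inactive = updates + deletes
    # Rows with a null __END_AT are the current customers.
    active = total - inactive
    return {"total": total, "inactive": inactive, "active": active}


# Counts taken from this notebook: 939 initial customers, then 01.json with
# 12 updates, 1 delete, and 10 new customers
counts = expected_scd2_counts(initial=939, updates=12, deletes=1, inserts=10)
print(counts)
```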


2. Query the **2_silver_db.customers_silver_demo6** streaming table for all rows where **__END_AT** `IS NOT NULL` to view the customer rows that are now inactive.

Notice the following:
  - **13 rows** are returned (**12 UPDATES** + **1 DELETE**)
  - The **__END_AT** column indicates the date and time that the row was either updated or deleted.

In [0]:
SELECT customer_id, address, name, __START_AT, __END_AT
FROM 2_silver_db.customers_silver_demo6
WHERE __END_AT IS NOT NULL;

customer_id,address,name,__START_AT,__END_AT


3. Query the **2_silver_db.customers_silver_demo6** table for the **customer_id** *23225*. Notice the following:


    - There are **two records** for that customer in the table.
    - The original record from the **00.json** file now has a value in the **__END_AT** column, indicating that it is now inactive.
    - The new record from the **01.json** file is now the active row and contains a `null` value in the **__END_AT** column.

In [0]:
SELECT customer_id, address, name, state, source_file, __START_AT, __END_AT
FROM 2_silver_db.customers_silver_demo6
WHERE customer_id = 23225;

customer_id,address,name,state,source_file,__START_AT,__END_AT
23225,76814 Jacqueline Mountains Suite 815,Sandy Adams,TX,00.json,2021-10-12T10:52:25Z,


4. In the **01.json** file, **customer_id** *23617* was marked as deleted. Let's query the **customers_silver_demo6** table for that customer and view the results. Notice that when a customer is marked as deleted, the **__END_AT** column contains the value of when that customer was deleted and became inactive.


In [0]:
SELECT customer_id, address, name, __START_AT, __END_AT
FROM 2_silver_db.customers_silver_demo6
WHERE customer_id = 23617

customer_id,address,name,__START_AT,__END_AT
23617,0727 Michael Locks,Stephen Green,2021-11-22T17:06:38Z,


**BONUS:** The query below dynamically finds **ALL** deleted records in the silver table using a window function (after the second run that is one customer, the customer above). The query:
  - Finds the latest record for each customer.
  - Keeps only those latest records with a timestamp in the **__END_AT** column, which means the customer has been deleted.

In [0]:
-- Create a temporary view of the latest customer records for each customer
CREATE OR REPLACE TEMPORARY VIEW latest_customer_records AS
SELECT
  customer_id, 
  address, 
  name, 
  __START_AT, 
  __END_AT,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY __START_AT DESC) AS latest_record -- Get the latest record of each customer
FROM 2_silver_db.customers_silver_demo6
QUALIFY latest_record = 1;


-- Show all customers who were deleted (only 1 customer)
SELECT
  customer_id, 
  address, 
  name, 
  __START_AT, 
  __END_AT
FROM latest_customer_records
WHERE __END_AT IS NOT NULL;   -- Query for the latest value of each customer where it is not null. It will display all 'deleted' customers.

customer_id,address,name,__START_AT,__END_AT
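The same "latest record per customer" logic can be sketched in plain Python, which may help when reasoning about what the window function and `QUALIFY` clause do. This is an illustrative equivalent under simplified assumptions: the function name `deleted_customers`, the dictionary keys, and the integer timestamps are hypothetical stand-ins for the table's columns.

```python
def deleted_customers(rows):
    """Return the latest row of each customer whose latest row is closed."""
    latest = {}
    for row in rows:
        cid = row["customer_id"]
        # Keep the row with the greatest start_at per customer, which mirrors
        # ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY __START_AT DESC) = 1
        if cid not in latest or row["start_at"] > latest[cid]["start_at"]:
            latest[cid] = row
    # A closed latest row means no active version exists for that customer
    return [r for r in latest.values() if r["end_at"] is not None]


# Illustrative rows; the timestamps 1..4 are made up for readability
rows = [
    {"customer_id": 23225, "start_at": 1, "end_at": 3},     # superseded by an update
    {"customer_id": 23225, "start_at": 3, "end_at": None},  # current version
    {"customer_id": 23617, "start_at": 2, "end_at": 4},     # closed by a delete
]

deleted = deleted_customers(rows)
print([r["customer_id"] for r in deleted])
```

Customer 23225 has a closed row too, but it is not reported because its latest row is still active. Only a customer whose most recent version is closed counts as deleted.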


5. To view your organization's most up-to-date customer data, you can query the materialized view **3_gold_db.current_customers_gold_demo6**. Remember, the query to create the materialized view filters for all **__END_AT** values that are `null` (active rows).

    Run the cell and view the results. Notice the following:
   - The current list of customers contains **948 rows**:
     - **939** from the initial file (**00.json**)
     - **+10** new customers from the update file (**01.json**)
     - **-1** deleted customer from the update file (**01.json**)
   - The view also contains the updated records from the **01.json** file.

**Gold Customers Materialized View**
```
CREATE MATERIALIZED VIEW 3_gold_db.current_customers_gold_demo6
COMMENT "Current updated list of active customers"
AS 
SELECT 
  * EXCEPT (processing_time),
  current_timestamp() updated_at
FROM 2_silver_db.customers_silver_demo6
WHERE `__END_AT` IS NULL;  
```

**NOTE:** For demonstration purposes, many of the metadata columns were kept in the materialized view.

In [0]:
SELECT customer_id, address, name, __START_AT, __END_AT
FROM 3_gold_db.current_customers_gold_demo6;

customer_id,address,name,__START_AT,__END_AT
22122,241 Dennis Springs,Cynthia Price,2021-12-31T20:25:14Z,
22141,80972 Johnson Island,Stephen Parker,2021-12-29T10:48:27Z,
22144,3772 Miller Junctions Apt. 383,Angela Frost,2021-12-29T10:17:00Z,
22161,8554 Summer Plain Suite 213,Brittany Schneider,2021-12-27T07:44:04Z,
22163,399 Jackson Villages,Jason Anderson,2021-12-27T22:56:52Z,
22169,610 Thompson Valleys,Steven Mullen,2021-12-27T14:56:48Z,
22195,471 Mcdonald Corner Apt. 354,David West,2021-12-24T08:49:56Z,
22196,845 Ralph Garden,Daniel Berger,2021-12-24T20:30:57Z,
22198,400 Bryant Mountain,David Lee,2021-12-23T20:10:42Z,
22203,137 Nicholas Vista Apt. 083,Shane Hensley,2021-12-23T11:45:43Z,


## Additional Resources

- [What is change data capture (CDC)?](https://docs.databricks.com/aws/en/dlt/what-is-change-data-capture)

- [APPLY CHANGES INTO](https://docs.databricks.com/gcp/en/dlt-ref/dlt-sql-ref-apply-changes-into) documentation

- [The AUTO CDC APIs: Simplify change data capture with Lakeflow Declarative Pipelines](https://docs.databricks.com/aws/en/dlt/cdc) documentation

- [How to implement Slowly Changing Dimensions when you have duplicates - Part 1: What to look out for?](https://community.databricks.com/t5/technical-blog/how-to-implement-slowly-changing-dimensions-when-you-have/ba-p/40568)


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>