
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Using the Delta Live Tables UI - PART 1 - Orders

This demo explores the DLT UI. By the end of this lesson, you will be able to:

* Deploy a DLT pipeline
* Explore the resultant DAG
* Execute an update of the pipeline

This demonstration will focus on using SQL code with DLT. Python notebooks are available that replicate the SQL code.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course. It also runs the `USE` statements below to set your default catalog to **dbacademy** and your default schema to your unique schema name.


```sql
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.
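The `USE` statements above change how unqualified table names resolve. Here is a minimal plain-Python sketch of that rule, assuming the Unity Catalog three-level namespace `catalog.schema.table`. The function name and the placeholder schema `my_schema` are hypothetical and only illustrate the idea.

```python
def resolve_table_name(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a 1-, 2-, or 3-part table name into catalog.schema.table.

    Simplification: backtick-quoted identifiers containing dots are not handled.
    """
    parts = name.split(".")
    if len(parts) == 1:
        # Unqualified name: both catalog and schema come from USE CATALOG / USE SCHEMA.
        return f"{current_catalog}.{current_schema}.{parts[0]}"
    if len(parts) == 2:
        # schema.table: only the catalog comes from USE CATALOG.
        return f"{current_catalog}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:
        # Already fully qualified.
        return name
    raise ValueError(f"Not a valid table name: {name!r}")


# After: USE CATALOG dbacademy; USE SCHEMA dbacademy.my_schema;
print(resolve_table_name("orders_silver", "dbacademy", "my_schema"))
# prints dbacademy.my_schema.orders_silver
```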

```
%run ./Includes/Classroom-Setup-1
```

## B. Explore Available Raw Files

Complete the following steps to explore the available raw data files that will be used for the DLT pipeline:

1. Navigate to the available catalogs by selecting the catalog icon directly to the left of the notebook (do not select the **Catalog** text in the far left navigation bar).

2. Expand the **dbacademy** catalog.

3. Expand the **ops** schema.

4. Expand the **Volumes** within the **ops** schema.

5. Expand the volume that contains your **unique username**.

6. Expand the **stream-source** directory. Notice that the directory contains three subdirectories: **customers**, **orders**, and **status**.

7. Expand each subdirectory. Notice that each contains a JSON file (00.json) with raw data. We will create a DLT pipeline that will ingest the files within this volume to create tables and materialized views for our consumers.
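On Databricks compute, files in a Unity Catalog volume can also be read with ordinary file APIs through their `/Volumes/<catalog>/<schema>/<volume>/...` path. The sketch below assumes that path convention and the **stream-source** layout described above. It builds the source path and lists the JSON files in each subdirectory. `build_source_path` and `list_raw_files` are hypothetical helpers, and `<your-volume-name>` is a placeholder for the volume that contains your username.

```python
from pathlib import Path


def build_source_path(volume_name: str, catalog: str = "dbacademy", schema: str = "ops") -> Path:
    """Return /Volumes/<catalog>/<schema>/<volume>/stream-source."""
    return Path("/Volumes") / catalog / schema / volume_name / "stream-source"


def list_raw_files(source: Path) -> dict[str, list[str]]:
    """Map each subdirectory (e.g. customers, orders, status) to its sorted JSON file names."""
    return {
        sub.name: sorted(f.name for f in sub.glob("*.json"))
        for sub in sorted(source.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    # Placeholder volume name: replace it with the volume that contains your username.
    source = build_source_path("<your-volume-name>")
    print(source)
    if source.exists():
        print(list_raw_files(source))
```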

## C. Generate Pipeline Configuration
**Delta Live Tables (DLT) pipelines can be written in either SQL or Python**. In this course, we have written examples in both languages. In the code cell below, note that we start with the SQL example.

We are going to manually configure a pipeline using the DLT UI. Configuring this pipeline will require parameters unique to a given user. Run the cell to print out values you'll use to configure your pipeline in subsequent steps.

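```python
pipeline_language = "SQL"
# To configure the Python version of the pipeline, comment out the line above and uncomment the line below.
# pipeline_language = "Python"

DA.print_pipeline_config(pipeline_language)
```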

*Cell output (user-specific): a table titled **labuser10369179_1748065139: Example Pipeline** listing your **Pipeline Name**, **Notebook #1 Path**, **Notebook #2 Path**, **Notebook #3 Path**, **Default Catalog**, **Default Schema**, and **Source** values.*


## PART 1 - Add a Single Notebook to a DLT Pipeline

### PART 1.1 - Create and Configure a Pipeline

Complete the following steps to configure the pipeline.

1. Open the **Pipelines** UI:
    - Find **Pipelines** under the **Data Engineering** section in the left navigation bar.
    - Right-click **Pipelines** and select *Open Link in a New Tab*. Make sure to open it in a new tab so you can continue to follow along with this notebook.

2. Click **Create pipeline** in the upper-right corner and select **ETL Pipeline** to create a DLT pipeline.

3. Configure the pipeline as specified below. You'll need the values provided in the cell output above for this step.

| Setting | Instructions |
|--|--|
| Pipeline name | Enter the **Pipeline Name** provided above |
| Serverless | Choose **Serverless** |
| Product edition (not needed with Serverless) | Choose **Advanced**  |
| Pipeline mode | Choose **Triggered** |
| Paths| Use the navigator to select or enter the path for ONLY **Notebook #1** from the cell provided above |
| Storage options | Choose **Unity Catalog** (should already be selected by default)  |
| Default Catalog | Choose your **Default Catalog** provided above (**dbacademy**) |
| Default schema | Choose your **Default schema** provided above (your unique schema name) |
| Configuration | Click **Add Configuration** and input the **Key** and **Value** using the table below |
| Channel | Choose **Current** |

#### Configuration Details
**NOTE:** The **source** key references the path to your raw files which reside in your volume. The **source** variable will be used in your notebooks to dynamically reference the volume location: 
- Source Volume Path Example: */Volumes/dbacademy/ops/\<your-unique-user-name-from-cell-above>/stream-source*

| Key                 | Value                                      |
| ------------------- | ------------------------------------------ |
| **`source`** | Enter the **Source** provided in the previous code cell |



4. Click the **Create** button to create the DLT pipeline. Leave the Pipelines UI open.
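The settings entered in the UI can typically also be viewed as JSON in the pipeline settings. The sketch below assembles an equivalent settings document in Python. It assumes field names from the Databricks Pipelines REST API (`name`, `serverless`, `continuous`, `development`, `channel`, `catalog`, `schema`, `configuration`, `libraries`), and every angle-bracket value is a placeholder for a value printed by `DA.print_pipeline_config`. Compare it against the JSON view of your own pipeline rather than copying it verbatim.

```python
import json

# Placeholders: substitute the values printed by DA.print_pipeline_config above.
PIPELINE_NAME = "<Pipeline Name>"
NOTEBOOK_1_PATH = "<Notebook #1 Path>"
DEFAULT_SCHEMA = "<Default Schema>"
SOURCE = "<Source>"

pipeline_settings = {
    "name": PIPELINE_NAME,
    "serverless": True,          # Serverless compute, so no product edition is set
    "continuous": False,         # Pipeline mode: Triggered
    "development": True,         # Development mode, as used in PART 1.5
    "channel": "CURRENT",
    "catalog": "dbacademy",      # Default Catalog
    "schema": DEFAULT_SCHEMA,    # Default Schema
    "configuration": {"source": SOURCE},  # configuration keys are case-sensitive
    "libraries": [{"notebook": {"path": NOTEBOOK_1_PATH}}],  # ONLY Notebook #1 for now
}

print(json.dumps(pipeline_settings, indent=2))
```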

### PART 1.2 - Check Your Pipeline Configuration

![new_pipeline_editor](files/images/build-data-pipelines-with-delta-live-tables-2.1.3/new_pipeline_editor.png)

1. If necessary, open the Pipelines (DLT) UI in the Databricks workspace (**Pipelines** under **Data Engineering** in the left navigation bar; in some workspaces, **Workflows** -> **Pipelines**) and open your DLT pipeline.

2. Select **Settings** to access your pipeline configuration. 

3. Review the pipeline configuration settings to ensure they are correctly configured according to the provided instructions.

4. Once you've confirmed that the pipeline configuration is correct, click **Save** in the bottom-right corner.

5. Proceed to the next steps to validate and run the pipeline.

**NOTE:** If you accidentally enabled the new editor, you can disable it as follows:
  - Click your user icon at the top right.
  - Select **Settings**.
  - Select **Developer**.
  - Scroll to the bottom and turn off the following setting:

  ![turn_off_editor](files/images/build-data-pipelines-with-delta-live-tables-2.1.3/turn_off_editor.png)

Run the following cell to check whether the pipeline has been set up correctly for the demonstration. This is a custom method built specifically for this course. Fix any issues it reports.

```python
DA.validate_pipeline_config(pipeline_language, num_notebooks=1)
```

```
labuser10369179_1748065139: Example Pipeline
Pipeline validation complete. No errors found.
```
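`DA.validate_pipeline_config` is specific to this course and its implementation is not shown here. As a rough illustration of the kind of checks such a validator might perform, here is a hypothetical plain-Python function that inspects a settings dictionary shaped like the Pipelines API JSON. The checks mirror the instructions in PART 1.1; they are an assumption, not the course's actual logic.

```python
def check_pipeline_settings(settings: dict, expected_notebooks: int = 1) -> list[str]:
    """Return a list of problems found; an empty list means no errors."""
    problems = []
    if settings.get("continuous", False):
        problems.append("Pipeline mode should be Triggered, not Continuous.")
    if settings.get("channel", "").upper() != "CURRENT":
        problems.append("Channel should be Current.")
    if settings.get("catalog") != "dbacademy":
        problems.append("Default catalog should be dbacademy.")
    libraries = settings.get("libraries", [])
    if len(libraries) != expected_notebooks:
        problems.append(f"Expected {expected_notebooks} notebook(s), found {len(libraries)}.")
    # Configuration keys are case-sensitive, so "Source" does not count as "source".
    if "source" not in settings.get("configuration", {}):
        problems.append("Missing configuration key 'source'.")
    return problems


example = {
    "continuous": False,
    "channel": "CURRENT",
    "catalog": "dbacademy",
    "libraries": [{"notebook": {"path": "<Notebook #1 Path>"}}],
    "configuration": {"Source": "<Source>"},  # wrong case on purpose
}
for problem in check_pipeline_settings(example):
    print(problem)  # prints: Missing configuration key 'source'.
```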


#### NOTE - Additional Notes on Pipeline Configuration
Here are a few notes regarding the pipeline settings above:

- **Pipeline mode** - This specifies how the pipeline will be run. Choose the mode based on latency and cost requirements.
  - **`Triggered`** pipelines run once and then shut down until the next manual or scheduled update.
  - **`Continuous`** pipelines run continuously, ingesting new data as it arrives.
- **Notebook libraries** - Even though these documents are standard Databricks Notebooks, the SQL syntax is specialized to DLT table declarations. We will be exploring the syntax in the exercise that follows.
- **Storage location** - This optional field allows the user to specify a location to store logs, tables, and other information related to pipeline execution. If not specified, DLT will automatically generate a directory.
- **Catalog and Default schema** - These parameters are necessary to make data available outside the pipeline.
- **Configuration variables** - Key-value pairs that we add here will be passed to the notebooks used in the pipeline. We will look at the one variable we are using, **`source`**, in the next lesson. Please note that keys are case-sensitive.
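In DLT SQL source code, a configuration value such as **`source`** is referenced as `${source}`; Python pipeline notebooks read it with `spark.conf.get("source")`. The sketch below only mimics that substitution with Python's standard `string.Template`, which happens to use the same `${key}` syntax. It is not how DLT resolves parameters internally, but it shows why the key must match exactly. The volume path is a placeholder.

```python
from string import Template

# Configuration as entered in the pipeline settings (placeholder path).
configuration = {"source": "/Volumes/dbacademy/ops/<your-volume-name>/stream-source"}

# A path expression as it might appear in a DLT SQL notebook.
orders_path = Template("${source}/orders")
print(orders_path.substitute(configuration))
# prints /Volumes/dbacademy/ops/<your-volume-name>/stream-source/orders

# Keys are case-sensitive: "Source" does not satisfy a reference to ${source}.
try:
    orders_path.substitute({"Source": configuration["source"]})
except KeyError as missing:
    print(f"No configuration value for {missing}")
```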

### PART 1.3 - Full Refresh, Validate, Start
1. Click the dropdown immediately to the right of the **`Start`** button. There are two additional options (other than "Start").

  - **Full refresh all** - All live tables are updated to reflect the current state of their input data sources. For all streaming tables, Delta Live Tables attempts to clear all data from each table and then load all data from the streaming source.

      --**IMPORTANT NOTE**--  
      Because a full refresh clears all data from your current tables and uses the current state of data sources, it is possible for you to lose data if your data sources no longer contain the data you need. Be very careful when running full refreshes.

  - **Validate** - Builds a directed acyclic graph (DAG) and runs a syntax check but does not actually perform any data updates.
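To make the difference between a normal update and **Full refresh all** concrete for a streaming table, here is a toy model in plain Python. It assumes a streaming table remembers which source files it has already processed and, on each update, appends rows only from new files, while a full refresh clears both the table and that state and reloads whatever the source holds at that moment. The class and file names are hypothetical; real DLT tracks streaming progress with Structured Streaming checkpoints, not a Python set.

```python
class ToyStreamingTable:
    """Toy model of incremental ingestion versus a full refresh."""

    def __init__(self) -> None:
        self.rows: list[str] = []
        self.processed_files: set[str] = set()  # stands in for the streaming checkpoint

    def update(self, source: dict[str, list[str]]) -> None:
        """Normal triggered update: append rows only from files not seen before."""
        for name in sorted(source):
            if name not in self.processed_files:
                self.rows.extend(source[name])
                self.processed_files.add(name)

    def full_refresh(self, source: dict[str, list[str]]) -> None:
        """Clear all data and state, then reload from the source's current contents."""
        self.rows.clear()
        self.processed_files.clear()
        self.update(source)


source = {"00.json": ["order-1", "order-2"]}
table = ToyStreamingTable()
table.update(source)

source["01.json"] = ["order-3"]
table.update(source)       # only 01.json is read
print(len(table.rows))     # prints 3

del source["00.json"]      # raw file later removed from the source location
table.full_refresh(source)
print(table.rows)          # prints ['order-3']: order-1 and order-2 are lost
```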

### PART 1.4 - Validating Pipelines

#### 4a. Validate the Pipeline
1. Click the dropdown next to the **`Start`** button and click **`Validate`**. DLT builds a graph in the graph window and generates log entries at the bottom of the window. The pipeline should pass all checks and look similar to the image below.


![ValidateOneNotebookDLTPipeline](files/images/build-data-pipelines-with-delta-live-tables-2.1.3/ValidateOneNotebookDLTPipeline.png)
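**Validate** resolves the dependencies between the datasets declared in your notebooks into a DAG without processing any data. The sketch below illustrates only that idea with Python's standard-library `graphlib`. It orders a set of datasets so that each comes after its inputs, and it rejects a circular dependency. Apart from **orders_silver**, the dataset names are hypothetical and may not match the course notebook.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependencies: each dataset maps to the datasets it reads from.
dependencies = {
    "orders_bronze": set(),               # ingested from raw JSON files
    "orders_silver": {"orders_bronze"},
    "orders_by_date": {"orders_silver"},  # hypothetical downstream materialized view
}


def plan_update(graph: dict[str, set[str]]) -> list[str]:
    """Return an execution order, raising CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(graph).static_order())


print(plan_update(dependencies))
# prints ['orders_bronze', 'orders_silver', 'orders_by_date']

# A circular reference cannot form a DAG, so planning fails before any data is touched.
try:
    plan_update({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("Validation failed:", err.args[0])
```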

#### 4b. Introduce an Error
Let's introduce an error:

1. In the **`Pipeline details`** section (to the right of the DAG), click the **`Source code`** link. Our first source code notebook is opened in a new window. We will be talking about DLT source code in the next lesson. For now, continue through the next steps.
 
    - You may get a note that this notebook is associated with a pipeline. If you do, click the **"`x`"** to dismiss the dialog box.

2. Scroll to the first code cell in the notebook and remove the word **`CREATE`** from the SQL command. This will create a syntax error in this notebook.

    - Note that we do not need to save the notebook; Databricks notebooks save changes automatically.

3. Return to the pipeline definition and run **`Validate`** again by clicking the dropdown next to **`Start`** and clicking **`Validate`**.

4. The validation fails. Click the log entry marked in red to get more details about the error. We see that there was a syntax error. We can also view the stack trace by clicking the "+" button. 

5. Fix the error we introduced, and re-run **`Validate`**. Confirm there are no errors.

### PART 1.5 - Run a Pipeline

Now that we have the pipeline validated, let's run it.

1. We are running the pipeline in development mode. Development mode speeds up iterative development by reusing the cluster (rather than creating a new cluster for each run) and by disabling retries so that you can quickly identify and fix errors. Refer to the <a href="https://docs.databricks.com/aws/en/dlt/develop#overview-of-dlt-development-features" target="_blank">documentation</a> for more information on this feature.

2. Click **Start** to begin the pipeline run.

3. The pipeline will create the data in the **dbacademy** catalog within your unique schema.

4. While the DLT pipeline is running, let's examine Notebook **1 - Orders Pipeline** for the specified language and review the code.
  - [SQL Notebook 1 - Orders Pipeline]($./2A - SQL Pipelines/1 - Orders Pipeline)
  - [Python Notebook 1 - Orders Pipeline]($./2B - Python Pipelines/1 - Orders Pipeline)
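The "disabling retries" part of development mode can be pictured with a small wrapper. In this hypothetical sketch, `run_with_retries` re-runs a failing update up to `max_retries` extra times in a production-style run, but surfaces the first error when `development=True`. The retry count is an arbitrary assumption, not the value DLT uses, and the sketch does not model cluster reuse.

```python
from typing import Callable, Optional


def run_with_retries(update: Callable[[], str], development: bool, max_retries: int = 2) -> str:
    """Run an update, retrying after a failure only when not in development mode."""
    attempts = 1 if development else 1 + max_retries
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return update()
        except RuntimeError as err:
            last_error = err
            print(f"Attempt {attempt} failed: {err}")
    raise RuntimeError(f"Update failed after {attempts} attempt(s)") from last_error


def make_flaky_update(failures: int) -> Callable[[], str]:
    """Return a hypothetical update that fails `failures` times before succeeding."""
    state = {"calls": 0}

    def update() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient error")
        return "COMPLETED"

    return update


print(run_with_retries(make_flaky_update(1), development=False))  # retried, then COMPLETED
try:
    run_with_retries(make_flaky_update(1), development=True)      # fails on the first error
except RuntimeError as err:
    print(err)
```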


**NOTE:** We are using Serverless clusters in this course for DLT pipelines. However, if you use a classic compute cluster with the DLT policy, the initial run will take several minutes while a cluster is provisioned. Subsequent runs will be appreciably quicker.

### PART 1.6 - Explore the DAG

As the pipeline completes, the execution flow is graphed. Select a streaming table or materialized view to review its details.

Complete the following:

1. Select **orders_silver**. Notice the results reported in the **Data Quality** section. 

**NOTE:** With each triggered update, all newly arriving data will be processed through your pipeline. Metrics will always be reported for the current run.


![RunOneNotebookDLTPipeline](files/images/build-data-pipelines-with-delta-live-tables-2.1.3/RunOneNotebookDLTPipeline.png)
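The **Data Quality** metrics come from expectations declared on a dataset. In DLT SQL, an expectation is a `CONSTRAINT ... EXPECT (...)` clause, optionally followed by `ON VIOLATION DROP ROW` or `ON VIOLATION FAIL UPDATE`; without an action, failing rows are kept and only counted. The plain-Python sketch below mimics how one expectation splits the records of a single update into passing and failing counts under each action. The record fields and the rule are hypothetical, and this is not DLT's implementation.

```python
from typing import Callable


def apply_expectation(records: list[dict], predicate: Callable[[dict], bool], action: str = "warn"):
    """Return (output_records, metrics) for one expectation over one update.

    action: "warn" keeps failing rows, "drop" removes them, "fail" aborts the update.
    """
    passed = [r for r in records if predicate(r)]
    failed = [r for r in records if not predicate(r)]
    metrics = {"passed_records": len(passed), "failed_records": len(failed)}
    if action == "fail" and failed:
        raise ValueError(f"Expectation violated by {len(failed)} record(s); update fails")
    output = passed if action == "drop" else records
    return output, metrics


def has_order_id(record: dict) -> bool:
    """Hypothetical rule, similar in spirit to EXPECT (order_id IS NOT NULL)."""
    return record["order_id"] is not None


# Hypothetical batch of newly arrived records; metrics cover only this update.
batch = [{"order_id": 1}, {"order_id": None}, {"order_id": 3}]

rows, metrics = apply_expectation(batch, has_order_id, action="drop")
print(metrics)    # prints {'passed_records': 2, 'failed_records': 1}
print(len(rows))  # prints 2
```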

Leave your Databricks environment open. We will discuss how to implement Change Data Capture (CDC) in DLT and add another notebook to the pipeline.


&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>