
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Using the Delta Live Tables UI

This demo will explore the DLT UI. By the end of this lesson you will be able to: 

* Deploy a DLT pipeline
* Explore the resultant DAG
* Execute an update of the pipeline

## Classroom Setup

Run the following cell to configure your working environment for this course.

In [0]:
%run ../Includes/Classroom-Setup-04.1

Python interpreter will be restarted.


Resetting the learning environment...
...dropping the schema "dnchankov_bezt_dbacademy_delp_pipeline_demo"...(1 seconds)
...removing the working directory "dbfs:/mnt/dbacademy-users/dnchankov@abv.bg/data-engineer-learning-path/pipeline_demo"...(0 seconds)

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/data-engineer-learning-path/v01"

Validating the locally installed datasets...(2 seconds)
Creating & using the schema "dnchankov_bezt_dbacademy_delp_pipeline_demo"...(0 seconds)
Loading batch 1 of 31...1 seconds
Predefined tables in "dnchankov_bezt_dbacademy_delp_pipeline_demo":
  -none-

Predefined paths variables:
  DA.paths.working_dir:      dbfs:/mnt/dbacademy-users/dnchankov@abv.bg/data-engineer-learning-path/pipeline_demo
  DA.paths.user_db:          dbfs:/mnt/dbacademy-users/dnchankov@abv.bg/data-engineer-learning-path/pipeline_demo/database.db
  DA.paths.datasets:         dbfs:/mnt/dbacademy-datasets/data-engineer-learning-path/v01
  DA.paths.storage_locat

## Generate Pipeline Configuration
The configuration of your pipeline includes parameters unique to a given user.

You will need to specify which language to use by uncommenting the appropriate line.

Run the following cell to print out the values used to configure your pipeline in subsequent steps.

In [0]:
pipeline_language = "SQL"
# pipeline_language = "Python"

DA.print_pipeline_config(pipeline_language)

| Setting | Value |
| --- | --- |
| Pipeline Name: | |
| Source: | |
| Storage Location: | |
| Notebook #1 Path: | |
| Notebook #2 Path: | |
| Notebook #3 Path: | |


In [0]:
# Run this cell once you have reached this point in the lesson
# pipeline_language = "Python" 
DA.validate_pipeline_config(pipeline_language)

<img src="https://files.training.databricks.com/images/icon_hint_24.png"> **HINT:** You will want to refer back to the paths above for Notebook #2 and Notebook #3 in later lessons.

## Create and configure a pipeline

In this section you will create a pipeline using a single notebook provided with the courseware.

We'll explore the contents of this notebook in the following lesson.

We will later add Notebooks #2 & #3 to the pipeline but for now, let's focus on just Notebook #1:

1. Click the **Workflows** button in the sidebar.
1. Select the **Delta Live Tables** tab.
1. Click the **Create Pipeline** button.
1. In the field **Product edition**, select the value "**Advanced**".
1. In the field **Pipeline Name**, enter the value specified in the cell above.
1. In the field **Notebook Libraries**, copy the path for Notebook #1 specified in the cell above and paste it here.
    * Though this document is a standard Databricks Notebook, the syntax is specialized to DLT table declarations.
    * We will be exploring the syntax in the exercise that follows.
    * Notebooks #2 and #3 will be added in later lessons.
1. Configure the **Source**.
    1. Click the **Add configuration** button.
    1. In the field **Key**, enter the word "**source**".
    1. In the field **Value**, enter the **Source** value specified in the cell above.
1. In the field **Storage location**, enter the value specified in the cell above.
    * This optional field allows the user to specify a location to store logs, tables, and other information related to pipeline execution. If not specified, DLT will automatically generate a directory.
1. Set **Pipeline Mode** to **Triggered**.
    * This field specifies how the pipeline will be run.
    * **Triggered** pipelines run once and then shut down until the next manual or scheduled update.
    * **Continuous** pipelines run continuously, ingesting new data as it arrives.
    * Choose the mode based on latency and cost requirements.
1. Disable autoscaling by unchecking **Enable autoscaling**.
    * **Enable autoscaling**, **Min Workers** and **Max Workers** control the worker configuration for the underlying cluster processing the pipeline.
    * Notice the DBU estimate provided, similar to that provided when configuring interactive clusters.

Running in Local-Mode    
* We need to configure the pipeline to run on a local-mode cluster.
* In most deployments, we would adjust the number of workers to accommodate the scale of the pipeline.
* Because our datasets are very small and we are only prototyping, we can use a Local-Mode cluster, which reduces cloud costs by employing only a single VM.

Local-Mode Setup:    
1. In the field **Workers**, set the value to "**0**" (zero) so that the worker and driver use the same VM.
1. Click **Add configuration**, then set the key to **spark.master** and the corresponding value to **local[*]**.

<img src="https://files.training.databricks.com/images/icon_note_24.png"> **WARNING:** Setting workers to zero without also configuring **spark.master** will prevent your cluster from being created, because it will wait forever for a worker that is never created.

Final Steps
1. Click the **Create** button
1. Verify that the pipeline mode is set to "**Development**"
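
For reference, the choices made in the UI above correspond roughly to a pipeline settings document. The following is a minimal sketch, assuming the field names used by the DLT pipeline settings JSON (`name`, `edition`, `libraries`, `configuration`, `storage`, `continuous`, `clusters`, `development`); confirm them against the JSON view of your own pipeline. The helper `build_pipeline_settings` and the placeholder values are hypothetical and stand in for the values printed by `DA.print_pipeline_config`.

```python
import json


def build_pipeline_settings(name, notebook_path, source, storage):
    """Assemble a settings dict that mirrors the UI choices in this lesson.

    The helper itself is illustrative only; it is not part of the courseware.
    """
    return {
        "name": name,
        "edition": "ADVANCED",                      # Product edition: Advanced
        "libraries": [{"notebook": {"path": notebook_path}}],
        "configuration": {
            "source": source,                       # key/value added via "Add configuration"
            "spark.master": "local[*]",             # required for Local-Mode
        },
        "storage": storage,                         # optional storage location
        "continuous": False,                        # Pipeline Mode: Triggered
        "clusters": [{"label": "default", "num_workers": 0}],  # fixed size, no autoscaling
        "development": True,                        # Development mode
    }


# Placeholder values: substitute the ones printed for your user.
settings = build_pipeline_settings(
    name="<Pipeline Name>",
    notebook_path="<Notebook #1 Path>",
    source="<Source>",
    storage="<Storage Location>",
)
print(json.dumps(settings, indent=2))
```

Note that a fixed `num_workers` with no `autoscale` block is how this sketch expresses "autoscaling disabled", and that `spark.master` sits next to `source` because both were entered through **Add configuration**.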

## Run a pipeline

With a pipeline created, you will now run the pipeline.

1. Select **Development** to run the pipeline in development mode. Development mode speeds up iterative development by reusing the cluster (rather than creating a new cluster for each run) and by disabling retries, so that you can readily identify and fix errors. Refer to the <a href="https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#optimize-execution" target="_blank">documentation</a> for more information on this feature.
2. Click **Start**.

The initial run will take several minutes while a cluster is provisioned. Subsequent runs will be appreciably quicker.
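
Clicking **Start** is the path this lesson follows, but the same update can also be requested programmatically. The sketch below assumes the Pipelines REST endpoint `POST /api/2.0/pipelines/{pipeline_id}/updates` and only builds the request without sending it. The host, token, and pipeline ID are hypothetical placeholders, and `build_start_update_request` is an illustrative helper, not part of the courseware.

```python
import json
import urllib.request


def build_start_update_request(host, token, pipeline_id, full_refresh=False):
    """Build (but do not send) a request that starts a pipeline update."""
    url = f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates"
    body = json.dumps({"full_refresh": full_refresh}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical placeholder values: replace with your workspace details.
request = build_start_update_request(
    host="https://example-workspace.cloud.databricks.com",
    token="REDACTED",
    pipeline_id="0000-placeholder",
)
print(request.get_method(), request.full_url)

# To actually start the update, send the request:
# urllib.request.urlopen(request)
```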

## Exploring the DAG

As the pipeline completes, the execution flow is graphed. 

Select a table to review its details.

Select **orders_silver**. Notice the results reported in the **Data Quality** section. 

With each triggered update, all newly arriving data will be processed through your pipeline. Metrics will always be reported for the current run.

## Land another batch of data

Run the cell below to land more data in the source directory, then manually trigger a pipeline update.

In [0]:
DA.dlt_data_factory.load()

Loading batch 2 of 31...1 seconds
Out[14]: True

As we continue through the course, you can return to this notebook and use the method provided above to land new data.

Running this entire notebook again will delete the underlying data files for both the source data and your DLT Pipeline. 

If you get disconnected from your cluster or have some other event where you wish to land more data without deleting things, refer to the <a href="$./DE 4.99 - Land New Data" target="_blank">DE 4.99 - Land New Data</a> notebook.
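
To make the idea of landing one batch at a time concrete, here is a minimal sketch of how such a helper could behave. It is not the courseware's implementation of `DA.dlt_data_factory`; the class `BatchLander`, its directory layout, and the file names are all hypothetical, and it works on a local temporary directory rather than DBFS.

```python
import shutil
import tempfile
from pathlib import Path


class BatchLander:
    """Hypothetical stand-in that copies one staged file per call."""

    def __init__(self, staging_dir, landing_dir):
        self.staging_dir = Path(staging_dir)
        self.landing_dir = Path(landing_dir)
        # Sort so batches land in a stable, predictable order.
        self.batches = sorted(self.staging_dir.glob("*.json"))
        self.next_index = 0

    def load(self):
        """Copy the next staged file into the landing directory.

        Returns True if a file was landed, False once all batches are used.
        """
        if self.next_index >= len(self.batches):
            return False
        batch = self.batches[self.next_index]
        self.landing_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(batch, self.landing_dir / batch.name)
        self.next_index += 1
        print(f"Loading batch {self.next_index} of {len(self.batches)}")
        return True


# Demonstration with three empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"
    landing = Path(tmp) / "landing"
    staging.mkdir()
    for i in range(1, 4):
        (staging / f"{i:02d}.json").write_text("{}")

    lander = BatchLander(staging, landing)
    results = [lander.load() for _ in range(4)]  # fourth call finds nothing left
    landed = sorted(p.name for p in landing.iterdir())
```

Because each call adds only new files to the landing directory, a subsequent triggered update has only the newly arrived data to process.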

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>