
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# 1 - Creating a Job Using the Lakeflow Jobs UI

In this lesson, you will create and run a Lakeflow Job in Databricks using a notebook and a SQL query, and explore the Lakeflow Jobs UI.

The demo will include:

- Creating a new job with two tasks: one using a notebook and the other using a SQL query.
- Modifying task configurations.
- Exploring the Lakeflow Jobs UI to understand how to modify, monitor, and manage job runs.


## Learning Objectives
By the end of this lesson, you should be able to:
- Schedule a notebook task and a SQL task in a Lakeflow Job
- Run a job that has multiple tasks

## Data Overview 
All demos in this course use a retail dataset that contains three types of data: **customers, sales, and orders**.


## REQUIRED - SELECT CLASSIC COMPUTE (The cluster named 'labuser')

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:


1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course. The setup also runs the `USE` statements shown below, which set your default catalog to **dbacademy** and your default schema to your unique schema name.


```sql
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```
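Because of these `USE` statements, an unqualified table name such as `orders_bronze` resolves against your current catalog and schema, while a fully qualified name such as `dbacademy_retail.v01.sales_orders` is used as written. The Python sketch below models that three-level name resolution. The `qualify` helper is hypothetical, and it ignores backtick-quoted identifiers that contain dots.

```python
def qualify(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a table name to catalog.schema.table (hypothetical helper).

    Simplification: splits on '.', so backtick-quoted names containing dots
    are not handled.
    """
    parts = name.split(".")
    if len(parts) == 1:  # table -> use current catalog and schema
        return f"{current_catalog}.{current_schema}.{name}"
    if len(parts) == 2:  # schema.table -> use current catalog
        return f"{current_catalog}.{name}"
    if len(parts) == 3:  # catalog.schema.table -> already fully qualified
        return name
    raise ValueError(f"Unexpected table name: {name!r}")


if __name__ == "__main__":
    # 'labuser123' is a placeholder for your unique schema name.
    print(qualify("orders_bronze", "dbacademy", "labuser123"))
    print(qualify("dbacademy_retail.v01.sales_orders", "dbacademy", "labuser123"))
```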

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

**NOTE:** To use **Serverless** compute, select the latest environment version (a version greater than 1). Otherwise, the setup will not work correctly.

- [Select an environment version](https://docs.databricks.com/aws/en/compute/serverless/dependencies#-select-an-environment-version).

In [0]:
%run ./Includes/Classroom-Setup-1

## B. Explore Your Environment

### B1. Explore your Class Schema

Complete the following to explore your **dbacademy.labuser** schema:

1. In the left navigation bar, select the catalog icon:  ![Catalog Icon](./Includes/images/catalog_icon.png)

2. Locate the catalog called **dbacademy** and expand the catalog.

3. Expand your **labuser** schema. 

4. Notice that your schema does not contain any tables yet.

### B2. Explore your Source Catalogs

Complete the following to explore the **dbacademy_bank** and **dbacademy_retail** catalogs. You will ingest tables and files from these locations during the demos and labs.

#### dbacademy_bank Catalog

1. In the left navigation bar, select the catalog icon:  ![Catalog Icon](./Includes/images/catalog_icon.png)

2. Locate the catalog called **dbacademy_bank** and expand the catalog.

3. Expand the **v01** schema.

4. Notice that the schema contains a single volume named **banking** that holds a CSV file.

#### dbacademy_retail Catalog

1. In the left navigation bar, select the catalog icon:  ![Catalog Icon](./Includes/images/catalog_icon.png)

2. Locate the catalog called **dbacademy_retail** and expand the catalog.

3. Expand the **v01** schema.

4. Notice that the schema contains:
   - Multiple tables
   - Two volumes under **Volumes**: **retail-pipeline** and **source_files**

## C. Viewing Your Files
Complete the following steps to review the notebook you will use in this job. All task files are located in the **Task Files** folder, in the subfolder for the corresponding lesson number.

### C1. Viewing Notebook File
1. Navigate to (or click the link for) the notebook: [Task Files/Lesson 1 Files/1.1 - Creating orders table]($./Task Files/Lesson 1 Files/1.1 - Creating orders table).
   - Review the notebook and note that it reads data from **dbacademy_retail.v01.sales_orders** and creates a simple table named **orders_bronze** in your designated **dbacademy.labuser** schema.


## D. Create the Job

Complete the steps below to create a Lakeflow Job with two tasks:

- A notebook task  
- A SQL query task


### D1. Generate your Job Configuration

1. Run the cell below to print the values you will use to configure your job in the following steps. Make sure the job name extension and the list of files passed to the helper are correct.

    **NOTE:** The `DA.print_job_config` method is specific to the Databricks Academy course. It outputs the information you need to create the job.

In [0]:
DA.print_job_config(job_name_extension='Demo_01_Retail_Job', 
                    file_paths='/Task Files/Lesson 1 Files',
                    Files=[
                        '1.1 - Creating orders table'
                    ])
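`DA.print_job_config` is a course-specific helper whose implementation is not part of this lesson. As a rough illustration only, the hypothetical function below assumes the job name is the extension followed by your username (matching the example name **Demo_01_Retail_Job_labuser123** used later) and that each notebook path is the task folder joined with the file name. The `base_dir` and `username` values are placeholders, not the real course locations.

```python
import posixpath


def print_job_config(job_name_extension: str, username: str, base_dir: str,
                     file_paths: str, files: list[str]) -> tuple[str, list[str]]:
    """Hypothetical stand-in for DA.print_job_config; the real helper may differ."""
    # Assumed naming convention: <extension>_<username>
    job_name = f"{job_name_extension}_{username}"
    # Strip the leading '/' so posixpath.join does not discard base_dir.
    folder = posixpath.join(base_dir, file_paths.lstrip("/"))
    notebook_paths = [posixpath.join(folder, f) for f in files]

    print(f"Job name: {job_name}")
    for path in notebook_paths:
        print(f"Notebook path: {path}")
    return job_name, notebook_paths


if __name__ == "__main__":
    print_job_config(
        job_name_extension="Demo_01_Retail_Job",
        username="labuser123",                          # placeholder username
        base_dir="/Workspace/Users/labuser123/course",  # placeholder folder
        file_paths="/Task Files/Lesson 1 Files",
        files=["1.1 - Creating orders table"],
    )
```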

### D2. Create and Name the Job

Complete the following steps to create and name your job.

1. Right-click **Jobs & Pipelines** in the sidebar and select *Open Link in New Tab*.

2. In the new tab, confirm that you are on the **Jobs & Pipelines** page.

3. Click the **Create** button and select **Job** from the dropdown.

4. In the top-left corner of the screen, you’ll see a default job name based on the current date and time (for example, *New Job Jul 29, 2025, 11:46 AM*).

5. Ensure the **Lakeflow Jobs UI** button is **ON**.

6. Change the **Job Name** to the one provided in the previous cell (for example: **Demo_01_Retail_Job_labuser123**).

7. Leave the job open and proceed to the next steps.

**NOTE:** If you click a recommended task (like **Notebook**), you will be redirected to a different page than the one shown in the screenshot below.

![Lesson01_Jobs_UI.png](./Includes/images/Lesson01_Jobs_UI.png)

### D3. Create the Notebook Task

Complete the following steps to add a notebook task.

1. In the Lakeflow Jobs UI, you may see suggested task types, such as **Notebook** or **SQL file**.

2. Select the **Notebook** task type.

3. Configure the task using the settings below:

| Setting         | Instructions |
|-----------------|--------------|
| **Task name**   | Enter **ingesting_orders** |
| **Type**        | Select **Notebook** |
| **Source**      | Choose **Workspace** |
| **Path**        | Use the file navigator to locate and select **Notebook #1**:<br>**./Task Files/Lesson 1 Files/1.1 - Creating orders table** |
| **Compute**     | Select **Serverless** from the dropdown menu.<br>(We will use Serverless compute for all jobs in this course. You may specify a different cluster outside of this course, if needed.)<br><br>**NOTE:** If you select your all-purpose cluster, you may get a warning that it will be billed as all-purpose compute. Production jobs should always be scheduled against new job clusters appropriately sized for the workload, because job compute is billed at a much lower rate. |
| **Create task** | Click **Create task** |

4. Keep the Lakeflow Jobs UI open; you will add another task in the next step.

**NOTE:** For better performance, enable **Performance optimized** mode in the job details. Otherwise, the job might take 6 to 8 minutes to start running.


#### Notebook Task Setup

![Lesson01_Notebook_task.png](./Includes/images/Lesson01_Notebook_task.png)
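If you later open **View as code** and choose JSON, the notebook task configured above should appear as an entry in the job's `tasks` list. The sketch below builds that entry as a Python dict. The field names `task_key`, `notebook_task`, `notebook_path`, and `source` are assumed from the Jobs API, so compare them with what your workspace shows. No compute fields are set, on the assumption that a task without a cluster runs on serverless compute, and the notebook path is a placeholder.

```python
import json


def notebook_task(task_key: str, notebook_path: str) -> dict:
    """Build a notebook task entry in the assumed Jobs API JSON shape."""
    return {
        "task_key": task_key,
        "notebook_task": {
            "notebook_path": notebook_path,
            "source": "WORKSPACE",  # matches the "Workspace" source chosen above
        },
        # No cluster fields: assumed to run on serverless compute.
    }


if __name__ == "__main__":
    # Placeholder path; use the path printed by the configuration cell.
    task = notebook_task(
        "ingesting_orders",
        "/Workspace/Users/labuser123/course/Task Files/Lesson 1 Files/1.1 - Creating orders table",
    )
    print(json.dumps(task, indent=2))
```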





### D4. Create the SQL Query Task

Follow these steps to add a SQL query as a task:

1. In the Lakeflow Jobs UI, click **Add task**.

2. Select the **SQL query** task type.

3. Configure the task using the settings below:

| Setting           | Instructions |
|-------------------|--------------|
| **Task name**     | Enter **ingesting_sales** |
| **Type**          | Select **SQL** |
| **SQL task**      | Select **Query** |
| **SQL query**     | From the dropdown, choose the query:<br>**1.2 - Creating sales table - SQL Query** |
| **SQL warehouse** | Select your SQL warehouse from the dropdown menu |
| **Depends on**    | No task should be selected here.<br>(Unselect **ingesting_orders** if it is selected.) |
| **Create task**   | Click **Create task** |


#### SQL Task Setup

![Lesson01_task1_sql.png](./Includes/images/Lesson01_task1_sql.png)
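Because **Depends on** is left empty, the SQL task has no upstream task, so both tasks can start as soon as the job runs. The sketch below builds the SQL task entry in a JSON shape similar to the notebook task and lists the tasks that have no dependencies. The `sql_task`, `query_id`, `warehouse_id`, and `depends_on` field names are assumed from the Jobs API, and the IDs are placeholders; copy the real values from **View as code**.

```python
def sql_query_task(task_key: str, query_id: str, warehouse_id: str,
                   depends_on: tuple[str, ...] = ()) -> dict:
    """Build a SQL query task entry (assumed Jobs API JSON shape)."""
    task = {
        "task_key": task_key,
        "sql_task": {
            "query": {"query_id": query_id},
            "warehouse_id": warehouse_id,
        },
    }
    if depends_on:  # only add the field when there are upstream tasks
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


def root_tasks(tasks: list[dict]) -> list[str]:
    """Return keys of tasks with no upstream dependencies; these start first."""
    return [t["task_key"] for t in tasks if not t.get("depends_on")]


if __name__ == "__main__":
    tasks = [
        {"task_key": "ingesting_orders"},  # notebook task details omitted
        sql_query_task("ingesting_sales", "hypothetical-query-id",
                       "hypothetical-warehouse-id"),
    ]
    print(root_tasks(tasks))  # ['ingesting_orders', 'ingesting_sales']
```

Selecting **ingesting_orders** under **Depends on** would correspond to `depends_on=("ingesting_orders",)`, and then only the notebook task would start first.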

### D5. Explore and Modify the Job Details

1. Navigate to the Job Details page. In the right pane, you will find the following job-level details:

   - **Job Details:** Information such as the job ID, creator, and more.
   - **Schedules & Triggers:** View and configure scheduling options and triggers for the job.
   - **Job Parameters:** Declare parameters that apply to the entire job.


2. For better performance, turn on **Performance optimized** mode in the job details:

   - **Performance optimized:** Enables fast compute startup and improved execution speed.
   - **Standard:** Disabling performance optimization results in startup times similar to classic infrastructure and may reduce your costs.
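In the job's JSON definition, this choice appears to map to a job-level `performance_target` setting with the values `PERFORMANCE_OPTIMIZED` or `STANDARD`. Treat that field name as an assumption to confirm in **View as code**. The sketch below returns a copy of a job settings dict with the chosen mode set.

```python
def with_performance_target(job_settings: dict, optimized: bool = True) -> dict:
    """Return a copy of job_settings with the (assumed) performance_target field set."""
    updated = dict(job_settings)  # shallow copy; the input dict is not modified
    updated["performance_target"] = "PERFORMANCE_OPTIMIZED" if optimized else "STANDARD"
    return updated


if __name__ == "__main__":
    settings = {"name": "Demo_01_Retail_Job_labuser123"}
    print(with_performance_target(settings))
    print(with_performance_target(settings, optimized=False))
```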

## E. Run the Job

1. In the upper-right corner, find the kebab menu (three dots) next to the **Run now** button. You will see options such as **Edit as YAML**, **Clone job**, **View as code**, and **Delete job**.

2. Click **View as code** to see your job represented in three formats: YAML, Python (SDK and Databricks Asset Bundles), and JSON.

3. Return to the main job page and click the **Run now** button in the top right to start the job.

    **NOTE:** After starting the job, you can click the link to view the run in progress. In the next section, you will learn another way to view past and current job runs.
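**Run now** starts a run of the job identified by the job ID shown under **Job Details**. As a sketch of an equivalent REST call, the code below builds, but does not send, a `POST` request to the Jobs API `run-now` endpoint. The `/api/2.2/jobs/run-now` path and the `{"job_id": ...}` body are assumed from the Jobs API reference; the host, token, and job ID are placeholders.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (without sending) a request that triggers a job run."""
    url = f"{host.rstrip('/')}/api/2.2/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_run_now_request(
        "https://example.cloud.databricks.com/",  # placeholder workspace URL
        "hypothetical-token",                     # placeholder access token
        123,                                      # placeholder job ID
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it would require real values: urllib.request.urlopen(req)
```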

## F. Review the Job Run

1. On the Job Details page, click the **Runs** tab in the top-left corner of the screen (you should currently be on the **Tasks** tab).

2. In the Runs tab of your job, you can see detailed information about each run.
   At the top, there is a time-based bar chart where:

   - The X-axis represents each run.
   - The Y-axis shows the time taken by each task within that run.
3. The bars are color-coded by status:
   - **Green:** success
   - **Red:** failed
   - **Yellow:** waiting or retry
   - **Pink:** skipped
   - **Grey:** pending, canceled, or timed out


Below the chart, a table shows the same information in detail, with one row per run. Each row starts with the run's start time and includes fields such as the run ID, run status, and duration.

![Lesson01_view_runs.png](./Includes/images/Lesson01_view_runs.png)
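The color legend for the Runs chart can be restated as a simple lookup table. The sketch below only encodes that legend; the lowercase state labels are simplified names chosen for illustration, not the exact status values the Jobs API returns.

```python
# Legend from the Runs chart; labels are simplified, not exact API states.
RUN_COLORS = {
    "success": "green",
    "failed": "red",
    "waiting": "yellow",
    "retry": "yellow",
    "skipped": "pink",
    "pending": "grey",
    "canceled": "grey",
    "timeout": "grey",
}


def run_color(state: str) -> str:
    """Return the chart color for a run state label, or 'unknown' if unmapped."""
    return RUN_COLORS.get(state.strip().lower(), "unknown")


if __name__ == "__main__":
    for s in ["Success", "FAILED", "retry", "Skipped", "timeout"]:
        print(f"{s:>8} -> {run_color(s)}")
```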

4. Open the output details by clicking the timestamp under the **Start time** column:

   - If **the job is still running**, you will see the active state with a **Status** of **Pending** or **Running** in the right-side panel.

   - If **the job has completed**, you will see the full execution results with a **Status** of **Succeeded** or **Failed** in the right-side panel.
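Checking a run until it leaves the **Pending** or **Running** state is a polling loop. The sketch below assumes a hypothetical `get_status` callable that returns the run's current status string (for example, by wrapping a Jobs API call). It treats only **Succeeded** and **Failed** as final, which simplifies the real set of run outcomes.

```python
import time
from typing import Callable

FINAL_STATUSES = {"Succeeded", "Failed"}  # simplified; real runs have more outcomes


def wait_for_run(get_status: Callable[[], str], poll_seconds: float = 10.0,
                 timeout_seconds: float = 1800.0) -> str:
    """Poll get_status() until a final status is returned or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in FINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Scripted statuses stand in for a real run.
    scripted = iter(["Pending", "Running", "Running", "Succeeded"])
    print(wait_for_run(lambda: next(scripted), poll_seconds=0))  # Succeeded
```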

## G. View Your New Tables
1. In the left pane, select **Catalog**, then expand the **dbacademy** catalog.

2. Expand your unique schema name.

3. Notice that your schema now contains two tables: **sales_bronze** and **orders_bronze**.

## H. Query Your New Tables

In [0]:
%sql
-- Querying sales_bronze table
SELECT * 
FROM sales_bronze
LIMIT 50;

In [0]:
%sql
-- Querying orders_bronze table
SELECT * 
FROM orders_bronze
LIMIT 50;

## Additional Resources

- [Lakeflow Jobs Documentation](https://docs.databricks.com/aws/en/jobs/)

&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>