# Lecture 33. Jobs (Hands On)

In this demo, we are going to see how to orchestrate jobs with **Databricks**. Databricks allows you to schedule one or multiple tasks as part of a job.

We are going to create a multi-task job consisting of three tasks.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - All tasks.jpg" style="width: 1280px">
</div>

The first task executes a notebook that lands a new batch of data in our source directory.

The second task runs the **Delta Live Tables** pipeline we created in the last session to process this data through a series of tables.

The last task executes the notebook we created in the last session to show the pipeline results.


## Creating a Multi-Task Job

To create such a multi-task job, navigate to the **Workflows** tab on the sidebar.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs none.jpg" style="width: 1280px">
</div>

In the **Jobs** tab, click the **Create Job** button.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - New Job.jpg" style="width: 1280px">
</div>

At the top, let us set a name for our job. For example, `Bookstore Demo Job`.


### Task 1: `Land_New_Data`

Then, we can start configuring our first **Task** in this job.
  - Fill in a task name. Say **Land_New_Data**.
  - For **Type**, select **Notebook**.
  - The notebook could be located in your Databricks workspace or in a repository. In our case, the notebook is in the **Workspace**.
  - For **Path**, we are going to select the [**Land_New_Data**](./Lecture-33__Jobs-1_Land-New-Data.ipynb) notebook in the workspace.
    The notebook contains nothing but a call to the `land_new_data` function.
  - From the **Cluster** dropdown, let us select our **Demo Cluster**.
    For production jobs, however, we should use **job clusters** instead to save costs.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - Task 1.jpg" style="width: 1280px">
</div>
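The note about job clusters can be made concrete with a small sketch. It assumes the task and cluster fields of the Databricks Jobs API 2.1 (`existing_cluster_id`, `job_cluster_key`, `job_clusters`, `new_cluster`); the cluster key, the cluster ID, and the placeholder values in angle brackets are hypothetical and must be replaced with values from your own workspace.

```python
def use_job_cluster(task, cluster_key):
    """Return a copy of a task that runs on a job cluster
    instead of an existing all-purpose cluster."""
    updated = {k: v for k, v in task.items() if k != "existing_cluster_id"}
    updated["job_cluster_key"] = cluster_key
    return updated


# Hypothetical job cluster definition; the values are placeholders.
job_clusters = [
    {
        "job_cluster_key": "bookstore_job_cluster",
        "new_cluster": {
            "spark_version": "<runtime-version>",
            "node_type_id": "<node-type>",
            "num_workers": 1,
        },
    }
]

# The task as configured in the demo, attached to the all-purpose Demo Cluster.
demo_task = {
    "task_key": "Land_New_Data",
    "notebook_task": {"notebook_path": "<path-to>/Land_New_Data"},
    "existing_cluster_id": "<demo-cluster-id>",
}

production_task = use_job_cluster(demo_task, "bookstore_job_cluster")
print(production_task)
```

A job cluster is created for the run and terminated when the run finishes, which is where the cost saving comes from.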

Let us now click on **Create task**.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - Task 1 created.jpg" style="width: 1280px">
</div>

Now, we have a job with a single task.



### Task 2: `DLT`

Let us add another task for our **DLT pipeline** to be executed after the success of this first task.

To do so, click the blue circle button **+ Add task** to add a new task.

  - Enter **DLT** for the task name.
  - For **Type**, select **Delta Live Tables Pipeline**.
  - For **Pipeline**, select our demo pipeline `demo_bookstore` we created during our last session.
  - The **Depends On** field defaults to your previously defined task, **Land_New_Data** in our case, so leave this value as it is.

Now click the **Create task** button.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - Task 2 created.jpg" style="width: 1280px">
</div>

Now we see the two tasks and the dependencies between them.



### Task 3: `Pipeline_Results`

Lastly, let us add a third task for executing a notebook to show the pipeline results.

  - Enter **Pipeline_Results** for the task name.
  - For **Type**, select **Notebook**.
  - For **Path**, select the notebook [**Pipeline_Result**](./Lecture-33__Jobs-3_Pipeline-Result.ipynb) created in the last session.

    This notebook just shows the content of the pipeline storage location and queries our **gold table**.

  - From the **Cluster** dropdown, select the **Demo Cluster**.
  - The **Depends On** field again defaults to your previously defined task, which is the **DLT** task in our case, so leave this value as is.

Now click the **Create task** button.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - Task 3 created.jpg" style="width: 1280px">
</div>


We have now configured the three tasks for this job.
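For reference, the same three-task job could also be described as a settings payload instead of being clicked together in the UI. The following is a minimal sketch, assuming the task fields of the Databricks Jobs API 2.1 (`task_key`, `depends_on`, `notebook_task`, `pipeline_task`, `existing_cluster_id`). The cluster ID, pipeline ID, and notebook paths are placeholders, and `execution_order` is a hypothetical helper that only checks that the dependencies form the expected chain.

```python
import json

# Placeholders: replace with the identifiers from your own workspace.
EXISTING_CLUSTER_ID = "<demo-cluster-id>"
PIPELINE_ID = "<demo_bookstore-pipeline-id>"

job_settings = {
    "name": "Bookstore Demo Job",
    "tasks": [
        {
            "task_key": "Land_New_Data",
            "notebook_task": {"notebook_path": "<path-to>/Land_New_Data"},
            "existing_cluster_id": EXISTING_CLUSTER_ID,
        },
        {
            "task_key": "DLT",
            "depends_on": [{"task_key": "Land_New_Data"}],
            "pipeline_task": {"pipeline_id": PIPELINE_ID},
        },
        {
            "task_key": "Pipeline_Results",
            "depends_on": [{"task_key": "DLT"}],
            "notebook_task": {"notebook_path": "<path-to>/Pipeline_Result"},
            "existing_cluster_id": EXISTING_CLUSTER_ID,
        },
    ],
}


def execution_order(tasks):
    """Return task keys in an order that respects every depends_on edge."""
    remaining = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    ordered = []
    while remaining:
        # A task is ready once all of its upstream tasks have been ordered.
        ready = sorted(k for k, deps in remaining.items() if deps <= set(ordered))
        if not ready:
            raise ValueError("cyclic or unresolved task dependency")
        ordered.extend(ready)
        for key in ready:
            del remaining[key]
    return ordered


print(execution_order(job_settings["tasks"]))
print(json.dumps(job_settings, indent=2))
```

The DLT task needs no cluster setting here because the pipeline runs on its own cluster.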



## Job Configuration

### Scheduling the Job

On the right, we see the **Schedules & Triggers** section that allows us to schedule our job.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - All tasks.jpg" style="width: 1280px">
</div>

Click the **Add trigger** button to explore the scheduling options.
Here we can change the trigger type to **Scheduled** and configure the schedule for this job.

You can also edit the schedule directly in cron syntax.

For this demo, we will not set any schedule, so let us cancel this window.
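Although we skip scheduling in this demo, a schedule is easy to sketch. Databricks job schedules use Quartz cron syntax, which has six or seven space-separated fields (seconds, minutes, hours, day of month, month, day of week, and an optional year). The sketch below assumes the `schedule` fields of the Jobs API 2.1 (`quartz_cron_expression`, `timezone_id`, `pause_status`); `build_schedule` is a hypothetical helper, and its field-count check is only a rough sanity check, not a full cron validator.

```python
def build_schedule(quartz_cron, timezone_id="UTC", paused=False):
    """Build a job schedule block from a Quartz cron expression."""
    fields = quartz_cron.split()
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 Quartz cron fields, got {len(fields)}: {quartz_cron!r}"
        )
    return {
        "quartz_cron_expression": quartz_cron,
        "timezone_id": timezone_id,
        "pause_status": "PAUSED" if paused else "UNPAUSED",
    }


# Every day at 06:00:00 in the given time zone.
daily_schedule = build_schedule("0 0 6 * * ?", timezone_id="Europe/Paris")
print(daily_schedule)
```

Note that a classic five-field Unix cron expression such as `0 6 * * *` would be rejected by this check because it lacks the seconds field.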


### Job Notifications

You can also set **email notifications** so that you are alerted on the job's start, success, and failure.
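As a sketch, the three alert events map onto the `email_notifications` block of the Jobs API 2.1, assuming its `on_start`, `on_success`, and `on_failure` lists. The helper name and the email addresses below are made up for illustration.

```python
def build_email_notifications(on_start=(), on_success=(), on_failure=()):
    """Build an email_notifications block, omitting events with no recipients."""
    events = {
        "on_start": on_start,
        "on_success": on_success,
        "on_failure": on_failure,
    }
    return {event: list(addresses) for event, addresses in events.items() if addresses}


# Hypothetical recipients: only failures page the on-call address.
notifications = build_email_notifications(
    on_success=["data-team@example.com"],
    on_failure=["data-team@example.com", "on-call@example.com"],
)
print(notifications)
```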


### Managing Permissions

In the **Permissions** section, you can control who can run, manage, or view the job, whether a single user or a group of users.

This also allows changing the owner of the job to another user (but of course not to a group of users).
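These rules can be sketched as an access control list. The sketch assumes the job permission levels of the Databricks Permissions API (`CAN_VIEW`, `CAN_MANAGE_RUN`, `CAN_MANAGE`, `IS_OWNER`) and its `user_name` / `group_name` entry keys; `acl_entry` is a hypothetical helper, and the user and group names are invented.

```python
JOB_PERMISSION_LEVELS = {"CAN_VIEW", "CAN_MANAGE_RUN", "CAN_MANAGE", "IS_OWNER"}


def acl_entry(principal, level, is_group=False):
    """Build one access-control entry for a job.

    Ownership can only be given to a single user, never to a group.
    """
    if level not in JOB_PERMISSION_LEVELS:
        raise ValueError(f"unknown permission level: {level}")
    if is_group and level == "IS_OWNER":
        raise ValueError("a job owner must be a single user, not a group")
    key = "group_name" if is_group else "user_name"
    return {key: principal, "permission_level": level}


# Hypothetical principals for illustration.
access_control_list = [
    acl_entry("owner@example.com", "IS_OWNER"),
    acl_entry("data-engineers", "CAN_MANAGE_RUN", is_group=True),
    acl_entry("analysts", "CAN_VIEW", is_group=True),
]
print(access_control_list)
```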

That's all in terms of configuration.


## Running the Job

Let us click the **Run Now** button to start our job.

You can now see the runs of your job on the **Job runs** tab.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Job runs.jpg" style="width: 1280px">
</div>
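The **Run Now** button has a REST counterpart. The following sketch only builds the HTTP request and does not send it. It assumes the `POST /api/2.1/jobs/run-now` endpoint of the Jobs API with bearer-token authentication; the workspace URL, token, and job ID are placeholders.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id):
    """Build (but do not send) a request that triggers a run of an existing job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: use your own workspace URL, access token, and job ID.
request = build_run_now_request(
    host="https://example-workspace.cloud.databricks.com",
    token="<personal-access-token>",
    job_id=123,
)
print(request.get_method(), request.full_url)

# To actually trigger the run, the request would be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))  # expected to contain the new run's ID
```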

Let us click on the **Start Time** link to open the current run of this job.

The visualization for tasks will update in real-time to reflect which tasks are actively running.

<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - run succeeded - Graph.jpg" style="width: 1280px">
</div>

Our job has been successfully completed.

  - Let us click on the first task to show its results.

    <div  style="text-align: center; line-height: 0; padding-top: 9px;">
      <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - run succeeded - Land_New_Data run succeeded.jpg" style="width: 1280px">
    </div>

    Here we can see that we landed Parquet file number 7.

  - Back in the run output, let us click on the **DLT** task.

    <div  style="text-align: center; line-height: 0; padding-top: 9px;">
      <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - run succeeded - DLT run succeeded.jpg" style="width: 1280px">
    </div>

    **DLT pipelines** scheduled as tasks do not directly render the results in the runs UI.

    Instead, you can click on the link to be directed back to the DLT pipeline UI to see the results.

    Here's the result of running our **DLT pipeline** on the new data.
    
    <div  style="text-align: center; line-height: 0; padding-top: 9px;">
      <img src="../../assets/images/Screen-Captures/Workflows - Delta Live Tables - demo_bookstore (new notebook added) completed.jpg" style="width: 1280px">
    </div>

  - And finally, if we click on the **Pipeline_Results** task, we can see the results of all the cells in our notebook.

    <div  style="text-align: center; line-height: 0; padding-top: 9px;">
      <img src="../../assets/images/Screen-Captures/Workflows - Jobs - Bookstore Demo Job - run succeeded - Pipeline_Results run succeeded.jpg" style="width: 1280px">
    </div>

### Error Handling Scenario

Let us look at another scenario, where bad code in the notebook of the **Pipeline_Results** task causes our job to fail.

In the notebook, let us query, for example, a `daily_customer_books` table for the USA.
This table does not exist in our database.

Let us run our job again.

Indeed, the job has failed.

If we click on the **Pipeline_Results** task, we see the **Table Not Found** error.
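The same information is available programmatically from the run details. The sketch below assumes the shape of a Jobs API 2.1 run description, in which each entry of `tasks` carries a `state` with `life_cycle_state` and `result_state`. The `sample_run` dictionary is hand-written illustrative data, not real output, and `failed_task_keys` is a hypothetical helper.

```python
def failed_task_keys(run):
    """Return the keys of all tasks in a run whose result state is FAILED."""
    return [
        task["task_key"]
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") == "FAILED"
    ]


# Hand-written sample that mirrors the failed run in this demo.
sample_run = {
    "run_id": 1001,  # made-up run ID
    "tasks": [
        {
            "task_key": "Land_New_Data",
            "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
        },
        {
            "task_key": "DLT",
            "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
        },
        {
            "task_key": "Pipeline_Results",
            "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"},
        },
    ],
}

print(failed_task_keys(sample_run))  # ['Pipeline_Results']
```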

### Repairing the Job

Let us correct this error and see how we can repair our job.

Let us correct the table name, for example, to query the **France daily customer books** table, which we are sure exists.

Now, if we come back to our job and click on the failed run, we can see the **Repair Run** button.
This is a great option that allows us to rerun only the failed tasks.
Let us click this button.

Here, it shows the task to be rerun.
In our case, it is only the **Pipeline_Results** task.
Let us now click **Repair Run**.

Great.
Our job has been repaired.
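A repair can also be requested through the API. This is a minimal sketch of the request body, assuming the `POST /api/2.1/jobs/runs/repair` endpoint of the Jobs API with its `run_id` and `rerun_tasks` fields. The run ID is made up, and `build_repair_body` is a hypothetical helper that only builds the payload.

```python
def build_repair_body(run_id, task_keys):
    """Build the body of a repair request that reruns only the given tasks."""
    task_keys = list(task_keys)
    if not task_keys:
        raise ValueError("a repair needs at least one task to rerun")
    return {"run_id": run_id, "rerun_tasks": task_keys}


# Rerun only the task that failed; the successful tasks are left untouched.
repair_body = build_repair_body(run_id=1001, task_keys=["Pipeline_Results"])
print(repair_body)
```

Rerunning only the failed task means the data landing and the DLT pipeline are not executed a second time.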



## Conclusion

That's all for orchestrating jobs in Databricks.

Don't forget to terminate the pipeline cluster, and see you in the next video.