
## Lab: implementing a DLT pipeline

> This notebook is **not intended** to be executed interactively. Instead, deploy it as a DLT pipeline from the **Workflows** tab.


* Help: <a href="https://docs.databricks.com/en/delta-live-tables/tutorial-sql.html" target="_blank">DLT syntax documentation</a>.


<div  style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://raw.githubusercontent.com/derar-alhussein/Databricks-Certified-Data-Engineer-Associate/main/Labs/Includes/images/school_schema.png" alt="School Schema" style="width: 600px">
</div>

In [0]:
SET datasets.path=dbfs:/mnt/DE-Associate/datasets/school;

### Q1 - Declaring Bronze Tables

Declare a streaming live table, **`enrollments_bronze`**, that ingests JSON data incrementally using Auto Loader from the directory **"${datasets.path}/enrollments-json-raw"**

In [0]:
CREATE ____________________
AS SELECT * FROM cloud_files("${datasets.path}/enrollments-json-raw", "json",
                             map("cloudFiles.inferColumnTypes", "true"))

Declare a live table, **`students_bronze`**, that loads data directly from JSON files in the directory **"${datasets.path}/students-json"**

In [0]:
CREATE ____________________
AS SELECT * FROM json.`${datasets.path}/students-json`

### Q2 - Declaring Silver Table

Declare a streaming live table, **`enrollments_cleaned`**, that:

1. Enriches the **`enrollments_bronze`** data through an inner join with the **`students_bronze`** table on the common **`student_id`** field to obtain the student's country
1. Enforces quality control by applying a constraint that drops records with a null **`email`**
1. Has the following schema:

| Field | Type |
| --- | --- |
| **`enroll_id`** | **`STRING`** |
| **`total`** | **`DOUBLE`** |
| **`email`** | **`STRING`** |
| **`country`** | **`STRING`** |

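As a reminder of the general shape of a DLT expectation combined with a stream-static join, here is a minimal sketch. All table, column, and constraint names in it (`orders_bronze`, `customers_bronze`, `orders_cleaned`, `valid_amount`, and so on) are hypothetical and unrelated to the school dataset; treat it as a pattern to adapt, not as the solution to this question.

```sql
-- Hypothetical example: every name below is illustrative only.
CREATE OR REFRESH STREAMING LIVE TABLE orders_cleaned
  -- Rows for which the condition is false are dropped from the target table.
  (CONSTRAINT valid_amount EXPECT (amount IS NOT NULL) ON VIOLATION DROP ROW)
AS SELECT o.order_id, o.amount, c.region
  -- STREAM() reads the upstream streaming live table incrementally.
  FROM STREAM(LIVE.orders_bronze) o
  -- A non-streaming live table is referenced with the LIVE prefix only.
  INNER JOIN LIVE.customers_bronze c
    ON o.customer_id = c.customer_id
```

Omitting the `ON VIOLATION` clause keeps the invalid rows and only reports them in the pipeline's data quality metrics, while `ON VIOLATION FAIL UPDATE` stops the update at the first invalid row.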

In [0]:
CREATE OR REFRESH STREAMING LIVE TABLE enrollments_cleaned
  (CONSTRAINT ____________________ ON VIOLATION ____________________ )
AS SELECT enroll_id, total, email, profile:address:country as country
  FROM ____________________ n
  INNER ____________________ s
    ON n.student_id = s.student_id

### Q3 - Declaring Gold Table

Declare a live table, **`course_sales_per_country`**, against **`enrollments_cleaned`** that calculates the following per **`country`**:
* **`enrollments_count`**: the number of enrollments
* **`enrollments_amount`**: the sum of the **`total`** amounts of the enrollments

Add a comment to the table: "Course Sales Per Country"

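The pattern for an aggregated live table with a comment can be sketched as follows. The names used here (`orders_per_region`, `orders_cleaned`, `region`, `amount`) are hypothetical placeholders and not part of this lab's schema.

```sql
-- Hypothetical example: every name below is illustrative only.
CREATE OR REFRESH LIVE TABLE orders_per_region
  COMMENT "Orders per region"
AS SELECT region,
          count(order_id) AS orders_count,   -- number of rows per group
          sum(amount) AS orders_amount       -- total amount per group
  -- Reading without STREAM() recomputes the aggregate from the full upstream table.
  FROM LIVE.orders_cleaned
  GROUP BY region
```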
In [0]:
CREATE OR REFRESH LIVE TABLE course_sales_per_country
  COMMENT ____________________
AS SELECT ____________________


### Q4 - Deploying DLT pipeline

From the **Workflows** button on the sidebar, under the **Delta Live Tables** tab, click **Create Pipeline**

Configure the pipeline settings specified below:

| Setting | Instructions |
|--|--|
| Pipeline name | School DLT |
| Product edition | Choose **Advanced** |
| Pipeline mode | Choose **Triggered** |
| Source code | Use the navigator to select this current notebook (4.1L - Delta Live Tables) |
| Storage location | dbfs:/mnt/DE-Associate/dlt/school |
| Target schema | DE_Associate_School_DLT |
| Cluster policy | Leave it **None**|
| Cluster mode | Choose **Fixed size**|
| Workers | Enter **0**|
| Photon Acceleration | Leave it unchecked |
| Advanced Configuration | Click **Add Configuration** and enter:<br> - Key: **datasets.path** <br> - Value: **dbfs:/mnt/DE-Associate/datasets/school** |
| Channel | Choose **Current**|

Finally, click **Create**.

### Q5 - Run your Pipeline

Select **Development** mode and click **Start** to begin updating your pipeline's tables.