# Data Warehousing Comprehensive Lab

This lab will guide you through creating a complete pipeline in Databricks, leveraging Delta Lake, data ingestion techniques, transformations, dashboards, and Databricks Genie. The goal is to give you hands-on experience with the Databricks platform.

**Learning Objectives**

By the end of this lab, you will:
- Create Delta tables and explore Delta Lake features like Time Travel and Version History.
- Ingest data using the Upload UI.
- Clean and transform datasets into Bronze, Silver, and Gold layers.
- Visualize insights using Databricks Dashboards.
- Leverage Databricks Genie for data exploration and analysis.


## Task 1 - Creating Delta Tables and Exploring Delta Lake Features
In this task, you will learn how to create Delta tables and explore the advanced features of Delta Lake.


### 1.1 Create the `sales_table` Delta Table
Follow these steps to create a Delta table from a CSV file and explore its features:
1. Create the Delta table by reading data from the CSV file.
2. Verify the table creation by selecting a sample of the data.

```sql
-- Drop the table if it already exists for demonstration purposes
DROP TABLE IF EXISTS sales_table;

-- Create a Delta table using the CSV file
CREATE TABLE sales_table USING DELTA
AS
SELECT *
FROM read_files(
  '${DA.paths.datasets.retail}/source_files/sales.csv',
  format => 'csv',
  header => true,
  inferSchema => true
);

-- Select from the newly created table
SELECT * FROM sales_table;
```
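To confirm what the cell above produced, you can inspect the table metadata. These read-only queries are a suggested check, not part of the original lab: `DESCRIBE DETAIL` should report `delta` in its `format` column, and `DESCRIBE TABLE` lists the column types chosen by schema inference, including the `_rescued_data` column that `read_files` adds.

```sql
-- Confirm the table is stored in Delta format (see the "format" column)
DESCRIBE DETAIL sales_table;

-- List the column names and types produced by schema inference
DESCRIBE TABLE sales_table;
```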

### 1.2 Enable Column Mapping and Modify the Table
In this step, you will change the structure of the `sales_table` Delta table by enabling column mapping and then modifying its schema. Column mapping makes Delta Lake track each column through metadata that maps its logical name to a separate physical name in the underlying Parquet files, so columns can be renamed or dropped without rewriting the data. Follow these steps:

- **Step 1: Enable Column Mapping:**

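  Set the table properties to enable column mapping in `name` mode, which allows you to rename and drop columns. Column mapping requires Delta reader version 2 and writer version 5, so the same statement upgrades the table protocol; Delta clients that do not support these versions can no longer read or write the table.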

- **Step 2: Drop Unnecessary Columns:**

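  Remove the `_rescued_data` column. `read_files` adds this column during schema inference to capture values that do not fit the inferred schema, and it is not required for further analysis. Dropping a column requires column mapping, which is why Step 1 comes first.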

```sql
-- Enable column mapping on the Delta table
ALTER TABLE sales_table SET TBLPROPERTIES (
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5',
   'delta.columnMapping.mode' = 'name'
);

-- Drop the column after enabling column mapping
ALTER TABLE sales_table DROP COLUMNS (_rescued_data);
```
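To check that column mapping is enabled and the column is gone, you could run these optional, read-only statements. `SHOW TBLPROPERTIES` with a single key returns only that property, and `DESCRIBE TABLE` should no longer list `_rescued_data`. Neither statement writes a new table version, so they do not shift the version numbers used later for time travel.

```sql
-- Should return the value 'name'
SHOW TBLPROPERTIES sales_table ('delta.columnMapping.mode');

-- Show all properties currently set on the table
SHOW TBLPROPERTIES sales_table;

-- _rescued_data should no longer appear in the schema
DESCRIBE TABLE sales_table;
```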

- **Step 3: Add and Update a New Column:**

  Add a new column named `discount_code` to the table schema and populate it with values based on conditions. In this step:

    - Assign `Discount_20%` to rows where the `product_category` is `'Ramsung'`.
    - Assign `N/A` to all other rows.

```sql
-- Alter the table by adding a new column
ALTER TABLE sales_table ADD COLUMNS (discount_code STRING);

-- Update the newly added column with data
UPDATE sales_table
SET discount_code = CASE
  WHEN product_category = 'Ramsung' THEN 'Discount_20%'
  ELSE 'N/A'
END;
```
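A grouped count is a simple way to confirm that the `UPDATE` assigned the codes as intended: every `'Ramsung'` row should carry `Discount_20%` and every other category `N/A`. This is an optional, read-only check and does not create a new table version.

```sql
-- Count rows per category and discount code to verify the UPDATE
SELECT product_category,
       discount_code,
       COUNT(*) AS row_count
FROM sales_table
GROUP BY product_category, discount_code
ORDER BY product_category, discount_code;
```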

- **Step 4: View Table History:**
  
  Use the `DESCRIBE HISTORY` command to view the version history of the table.

```sql
-- Display the history of changes made to the sales_table
DESCRIBE HISTORY sales_table;
```
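The full history can get long. `DESCRIBE HISTORY` also accepts a `LIMIT` clause, so the variant below returns only the most recent commit. The `version`, `timestamp`, `operation`, and `operationParameters` columns are the ones to look at when choosing a version for time travel.

```sql
-- Return only the most recent operation on the table
DESCRIBE HISTORY sales_table LIMIT 1;
```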

### 1.3 Restore the Table Using Time Travel

Delta Lake's time travel feature allows you to access and restore previous versions of a Delta table. This is useful for scenarios such as data recovery, debugging, or auditing changes.

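In this subtask, you will restore `sales_table` to an earlier version using the `RESTORE TABLE` command. If each cell above ran exactly once and in order, version 3 should be the `ADD COLUMNS` commit, that is, the state just before the `UPDATE` populated `discount_code`. Confirm the version number in the `DESCRIBE HISTORY` output before restoring.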
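Before restoring, you can query an earlier snapshot directly with the `VERSION AS OF` clause; this only reads the old version and leaves the table unchanged. The sketch below assumes version 3 is the `ADD COLUMNS` commit (check `DESCRIBE HISTORY` for your table); if so, `discount_code` has not been populated yet and the second query should return 0.

```sql
-- Read a sample of the table as it was at version 3
SELECT * FROM sales_table VERSION AS OF 3 LIMIT 10;

-- If version 3 predates the UPDATE, no row has a discount_code yet
SELECT COUNT(*) AS populated_codes
FROM sales_table VERSION AS OF 3
WHERE discount_code IS NOT NULL;
```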

```sql
-- Restore sales_table to version 3 (the state before the discount_code UPDATE)
RESTORE TABLE sales_table TO VERSION AS OF 3;
```
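`RESTORE` does not delete the later versions; it writes a new commit whose contents match the target version. The following optional check should therefore show a `RESTORE` operation at the top of the history and, assuming version 3 predates the `UPDATE`, only `NULL` values in `discount_code`.

```sql
-- The latest history entry should be the RESTORE operation
DESCRIBE HISTORY sales_table LIMIT 1;

-- After restoring to a pre-UPDATE version, discount_code should be NULL everywhere
SELECT discount_code, COUNT(*) AS row_count
FROM sales_table
GROUP BY discount_code;
```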

## Task 2 - Data Ingestion Techniques
In this task, you will learn how to ingest data into Databricks using the UI. This includes downloading a dataset, uploading it to your schema, and creating a Delta table.

### 2.1 - Uploading Data and Creating a Delta Table using UI

1. Download the `customers.csv` data file by following [this link](/ajax-api/2.0/fs/files/Volumes/dbacademy_retail/v01/source_files/customers.csv). This will download the CSV file to your browser's download folder.
1. Using the [Catalog Explorer](/explore/data/dbacademy) user interface, create a table named *customers_ui* in your schema, using the file you just downloaded.
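If you prefer to stay in SQL, or the upload UI is unavailable, a similar table can be created directly from the source file with `read_files`, the same function used for `sales_table` in Task 1. This is an alternative sketch, not the lab's method: the volume path is assumed from the download link above, and the inferred schema may differ slightly from what the UI produces (for example, `read_files` adds a `_rescued_data` column).

```sql
-- Alternative to the UI upload: read the CSV straight from the volume.
-- Path assumed from the download link in step 1.
CREATE TABLE IF NOT EXISTS customers_ui USING DELTA AS
SELECT *
FROM read_files(
  '/Volumes/dbacademy_retail/v01/source_files/customers.csv',
  format => 'csv',
  header => true,
  inferSchema => true
);
```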

- **Step 1: Verify the Table Creation**

  After successfully creating the Delta table, you can verify its creation and view a sample of the data by following these steps:

Use the `SHOW TABLES` command to display all tables in the current schema and confirm that `customers_ui` exists.

```sql
-- Show all tables in the current schema
SHOW TABLES;
```
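In a schema with many tables, `SHOW TABLES` also accepts a pattern in which `*` matches any sequence of characters, so the following optional query narrows the list to the customer tables.

```sql
-- List only tables whose names start with "customers"
SHOW TABLES LIKE 'customers*';
```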

Use the `SELECT` statement to retrieve and display the first 10 records from the `customers_ui` table to ensure the data has been ingested correctly.


```sql
-- Display the first 10 records from the customers_ui table
SELECT * FROM customers_ui LIMIT 10;
```
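Beyond looking at ten rows, it can help to check the column types the upload chose and the total row count. In particular, the Bronze step later divides `valid_from` by `1e6`, which expects a numeric column; `DESCRIBE TABLE` lets you confirm how it was loaded.

```sql
-- Column names and inferred data types
DESCRIBE TABLE customers_ui;

-- Total number of ingested rows
SELECT COUNT(*) AS row_count FROM customers_ui;
```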

### 2.2 Create Table as Select (CTAS)

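In this step, you create the `customers_ui_bronze` Delta table from `customers_ui` with a CTAS statement. The query keeps every source column and adds three new ones: `first_touch_date` (the `valid_from` value, stored in microseconds since the Unix epoch, converted to a date), `updated` (the time the statement ran), and `source_file` (the name of the data file each row was read from).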

```sql
-- Drop the customers_ui_bronze table if it already exists
DROP TABLE IF EXISTS customers_ui_bronze;

-- Create a new Delta table
CREATE TABLE customers_ui_bronze USING DELTA AS
SELECT *,
  -- valid_from holds microseconds since the epoch: divide by 1e6 to get seconds, then cast
  CAST(CAST(valid_from / 1e6 AS TIMESTAMP) AS DATE) AS first_touch_date,
  CURRENT_TIMESTAMP() AS updated,
  -- Because customers_ui is a Delta table, this is the name of its underlying
  -- data file, not the original customers.csv
  _metadata.file_name AS source_file
FROM customers_ui;

-- Verify the data in the newly created table
SELECT * FROM customers_ui_bronze LIMIT 10;
```
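To sanity-check the date conversion, you can compare `first_touch_date` with `timestamp_micros`, a built-in function that interprets a `BIGINT` as microseconds since the Unix epoch. This sketch assumes `valid_from` was ingested as a `BIGINT`; the two derived dates should match on every row.

```sql
-- Compare the CAST-based conversion with timestamp_micros
SELECT valid_from,
       first_touch_date,
       CAST(timestamp_micros(valid_from) AS DATE) AS check_date
FROM customers_ui_bronze
LIMIT 10;
```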

## Task 3 - Data Transformation
In this task, you will transform the data in your Delta tables to create the Silver and Gold tables. These transformations will clean, enrich, and join the data to provide valuable insights for analytics and reporting.

### 3.1 Create the Silver Table

The Silver table represents a refined layer with cleaned and enriched data derived from the Bronze table. 

Follow these steps:
- Select the relevant customer columns from the `customers_ui_bronze` table, leaving out the rest.
- Create a new column, `loyalty_level`, that categorizes customers based on their loyalty segment.
- Save the results as the `customers_ui_silver` table.

```sql
-- Create or replace the Silver table
CREATE OR REPLACE TABLE customers_ui_silver AS
SELECT
  -- Select the relevant columns from the Bronze table
  customer_id,
  customer_name,
  state,
  city,
  units_purchased,
  loyalty_segment,
  -- Add a new column, loyalty_level, based on the loyalty_segment values
  CASE
    WHEN loyalty_segment = 1 THEN 'High'
    WHEN loyalty_segment = 2 THEN 'Medium'
    ELSE 'Low'
  END AS loyalty_level
FROM customers_ui_bronze;

-- Verify the Silver table
SELECT * FROM customers_ui_silver LIMIT 10;
```
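Because the `CASE` expression sends every value other than 1 and 2 (including `NULL`) to `'Low'`, a grouped count is a quick way to see how the segments were mapped. This is an optional, read-only check.

```sql
-- How many customers fall into each segment / level combination
SELECT loyalty_segment,
       loyalty_level,
       COUNT(*) AS customer_count
FROM customers_ui_silver
GROUP BY loyalty_segment, loyalty_level
ORDER BY loyalty_segment;
```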

### 3.2 Create the Gold Table

The Gold table represents a business insights layer, created by joining the Silver table with the `sales_table`.

Follow these steps:

- Join the `customers_ui_silver` table with the `sales_table` on the `customer_id` column.
- Select key metrics and dimensions required for analytics and save the result as the `customers_ui_gold` table.
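A plain `JOIN` is an inner join, so the Gold table keeps only customers that have at least one matching row in `sales_table`. If you want to know how many customers that leaves out, a `LEFT ANTI JOIN` returns the Silver rows with no match; this is an optional check, not part of the lab.

```sql
-- Customers in the Silver table with no sales in sales_table
SELECT COUNT(*) AS customers_without_sales
FROM customers_ui_silver c
LEFT ANTI JOIN sales_table s
  ON c.customer_id = s.customer_id;
```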

```sql
-- Create or replace the Gold table
CREATE OR REPLACE TABLE customers_ui_gold AS
-- Select key attributes from both tables to create a business insights layer
SELECT
  c.customer_id,
  c.customer_name,
  c.loyalty_level,
  s.product_category,
  s.product_name,
  s.total_price,
  s.order_date
FROM customers_ui_silver c
-- Join the customers_ui_silver table with the sales_table on the customer_id column
JOIN sales_table s
  ON c.customer_id = s.customer_id;

-- Verify the Gold table
SELECT * FROM customers_ui_gold LIMIT 10;
```
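With the Gold table in place, a simple aggregate shows the kind of business question it is meant to answer. The sketch below is illustrative rather than part of the lab; it totals sales value and order count per loyalty level.

```sql
-- Orders and total sales value per loyalty level
SELECT loyalty_level,
       COUNT(*) AS order_count,
       SUM(total_price) AS total_sales
FROM customers_ui_gold
GROUP BY loyalty_level
ORDER BY total_sales DESC;
```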

## Task 4 - Visualization with Dashboards
In this task, you will create a dashboard in Databricks to visualize insights derived from the Gold table. The task involves adding datasets, creating visualizations, and exploring the dashboard using Databricks Genie.

### 4.1: Create a New Dashboard
Follow these steps to create a new dashboard:
* Navigate to **Dashboards** in the side navigation panel.
* Select **Create dashboard**. 
* At the top of the resulting screen, click the dashboard name and change it to **Customer_Sales Dashboard**.

### 4.2: Adding Data to the Dashboard

To create visualizations, you need to associate datasets with the dashboard. Complete the following steps:

1. Navigate to the **Data** tab in the dashboard.
2. Use the **+ Select a table** button to add datasets. 
3. Search for and select the **`customers_ui_gold`** table from `{DA.catalog_name}.{DA.schema_name}` and click **Confirm**. The table will appear in your dataset list.

You can modify the SQL query associated with each dataset in the query editing panel to customize the data.
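For example, instead of selecting every column, a dataset query can pre-aggregate the data. The following is a hypothetical dataset that totals sales by product category; as in the Combo Chart step, replace `{DA.catalog_name}.{DA.schema_name}` with your own catalog and schema names.

```sql
-- Hypothetical dataset: total sales value per product category
SELECT product_category,
       SUM(total_price) AS total_sales
FROM {DA.catalog_name}.{DA.schema_name}.customers_ui_gold
GROUP BY product_category
ORDER BY total_sales DESC;
```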

### 4.3: Visualization - Combo Chart
Visualize the insights by creating a Combo Chart that displays total sales value and sales order counts over a three-month span.

**Steps to Create the Combo Chart:**

1. In the **Data** tab, select the **+ Create from SQL** option.
2. Enter and execute the following SQL query (replace `{DA.catalog_name}.{DA.schema_name}` with your actual **catalog name** and **schema name**):

    ```sql
    SELECT customer_name, 
           total_price AS Total_Sales, 
           date_format(order_date, "MM") AS Month, 
           product_category 
    FROM {DA.catalog_name}.{DA.schema_name}.customers_ui_gold 
    WHERE order_date >= to_date('2019-08-01')
    AND order_date <= to_date('2019-10-31');
    ```

3. Rename the query to **Three Month Sales** and save it.
4. Switch to the **Canvas** tab and click **Add a visualization** at the bottom.
5. Select the **Three Month Sales** dataset and choose the **Combo** chart as the visualization type.
6. Configure the chart settings:
    - **X axis:** Month
    - **Bar:** Total_Sales (Rename to **Total Sales Value**)
    - **Line:** COUNT(`*`) (Rename to **Count of Sales Orders**)

7. Enable **dual axis** from the Y-axis configuration menu.
8. Change the left Y-axis format to **Currency ($)**.

This visualization lets you compare the number of sales orders with the total sales value for each month.
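To check the numbers the chart displays, you can compute the same monthly aggregates in SQL: `SUM(total_price)` for the bars and `COUNT(*)` for the line. This sketch uses a half-open range ending before `2019-11-01`. With a `DATE` column it selects the same rows as the **Three Month Sales** query; if `order_date` is a `TIMESTAMP`, it also includes orders placed after midnight on October 31, which the `<= to_date('2019-10-31')` filter would exclude.

```sql
-- Monthly totals behind the combo chart (bars: total sales, line: order count)
SELECT date_format(order_date, 'MM') AS Month,
       SUM(total_price) AS Total_Sales,
       COUNT(*) AS Sales_Orders
FROM {DA.catalog_name}.{DA.schema_name}.customers_ui_gold
WHERE order_date >= DATE '2019-08-01'
  AND order_date <  DATE '2019-11-01'
GROUP BY date_format(order_date, 'MM')
ORDER BY Month;
```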

### 4.4: Creating a Genie Space from a Dashboard

Databricks Genie allows you to explore data directly from the dashboard in a conversational interface.

**Steps to Create a Genie Space:**

1. Open the **Customer_Sales Dashboard** you created.
2. Switch to the **Draft** view.
3. Click the kebab menu (three vertical dots) in the upper-right corner and select **Open Draft Genie space**.
4. In the chatbox, ask:

    `What tables are there and how are they connected? Give me a short summary.`

5. Review the response provided by Genie to understand the data relationships and structure.

---

By completing this task, you have successfully created a visual dashboard to analyze business insights and leveraged Genie for exploratory analysis.

## Conclusion
Congratulations on completing the **Data Warehousing Comprehensive Lab**! Throughout this lab, you gained hands-on experience with Databricks to build and analyze a complete data pipeline, leveraging the robust features of Delta Lake, Databricks Dashboards, and Databricks Genie.


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>