# Lab: Manipulate Delta Tables

Follow the blog post below to create and load a dimensional (star schema) model on Delta tables:

https://www.databricks.com/blog/2022/11/07/load-edw-dimensional-model-real-time-databricks-lakehouse.html

# Databricks Advanced Lab


![Databricks Image](https://www.databricks.com/sites/default/files/inline-images/db-388-blog-img-2.png?v=1667403642)


## Delta Lakehouse for EDW – Inserts, Updates, Time Travel, Liquid Clustering, Optimization, and Dimensional Modeling

## 🎯 Objective

In this lab, you will:

* Create and modify Delta tables using a **star schema** design
* Insert and update **fact and dimension** records
* Explore table **version history and time travel**
* Apply **Liquid Clustering** using surrogate keys
* Optimize Delta tables for **real-time querying**
* Simulate **accidental deletions** and learn recovery via **restore**

You will **implement each step yourself** by following the blog instructions and applying them in this lab.

---

## 📋 Instructions

---

## Part 1: Set Up Your Database and Schema

* Use an existing database (e.g., `tabular.dataexpert`).
* Create **dimension tables** such as `dim_product_firstname_lastname`, `dim_category_firstname_lastname`, and `dim_date_firstname_lastname`.
* Create a **fact table** named `fact_sales_firstname_lastname` with surrogate keys referencing the dimensions.


✅ Verify all tables are empty initially.
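
As a starting point, the setup can be sketched roughly as follows. This is a minimal sketch, not the blog's exact schema: it assumes `tabular` is a Unity Catalog catalog and `dataexpert` a schema inside it, and every column name (`product_code`, `sale_ts`, and so on) is a hypothetical choice for illustration. Surrogate keys use identity columns, except for the date dimension, which assumes a `yyyyMMdd` integer key.

```sql
-- Assumption: tabular is a catalog and dataexpert is a schema in it.
USE CATALOG tabular;
USE SCHEMA dataexpert;

CREATE TABLE IF NOT EXISTS dim_category_firstname_lastname (
  category_id   BIGINT GENERATED ALWAYS AS IDENTITY,  -- surrogate key
  category_name STRING NOT NULL
);

CREATE TABLE IF NOT EXISTS dim_product_firstname_lastname (
  product_id   BIGINT GENERATED ALWAYS AS IDENTITY,   -- surrogate key
  product_code STRING NOT NULL,                       -- hypothetical business key
  product_name STRING,
  category_id  BIGINT                                 -- references dim_category
);

CREATE TABLE IF NOT EXISTS dim_date_firstname_lastname (
  date_id        INT  NOT NULL,                       -- surrogate key, e.g. 20240105
  calendar_date  DATE NOT NULL,
  calendar_year  INT,
  calendar_month INT,
  calendar_day   INT
);

CREATE TABLE IF NOT EXISTS fact_sales_firstname_lastname (
  sale_id      BIGINT GENERATED ALWAYS AS IDENTITY,
  product_id   BIGINT NOT NULL,                       -- references dim_product
  date_id      INT    NOT NULL,                       -- references dim_date
  quantity     INT,
  sales_amount DECIMAL(12, 2),
  sale_ts      TIMESTAMP
);

-- Every row_count should be 0 right after creation.
SELECT 'dim_category' AS table_name, COUNT(*) AS row_count FROM dim_category_firstname_lastname
UNION ALL
SELECT 'dim_product', COUNT(*) FROM dim_product_firstname_lastname
UNION ALL
SELECT 'dim_date', COUNT(*) FROM dim_date_firstname_lastname
UNION ALL
SELECT 'fact_sales', COUNT(*) FROM fact_sales_firstname_lastname;
```

Replace `firstname_lastname` with your own name, and adjust the columns to match what you take from the blog.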

---

## Part 2: Insert Dimension Data

* Insert sample records into your dimension tables (`dim_product_firstname_lastname`, `dim_category_firstname_lastname`, `dim_date_firstname_lastname`).
* Ensure each dimension has a **surrogate key**.
* Populate meaningful attributes like product names, categories, and calendar dates.

✅ Query dimensions and verify that surrogate keys are unique and descriptive attributes exist.
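
One possible way to load the dimensions is sketched below. It assumes the hypothetical columns `category_name`, `product_code`, `product_name`, `calendar_date`, `calendar_year`, `calendar_month`, and `calendar_day`, with identity-based surrogate keys on the product and category dimensions; the sample values are made up. Identity columns are left out of the column list so that Delta generates them.

```sql
-- Categories: the surrogate key category_id is generated automatically.
INSERT INTO dim_category_firstname_lastname (category_name)
VALUES ('Electronics'), ('Books');

-- Products: look up the category surrogate key by category name.
INSERT INTO dim_product_firstname_lastname (product_code, product_name, category_id)
SELECT p.product_code, p.product_name, c.category_id
FROM VALUES
  ('P-100', 'Laptop', 'Electronics'),
  ('P-200', 'Novel',  'Books')
  AS p(product_code, product_name, category_name)
JOIN dim_category_firstname_lastname AS c
  ON c.category_name = p.category_name;

-- Dates: one row per day for January 2024, keyed as yyyyMMdd.
INSERT INTO dim_date_firstname_lastname
  (date_id, calendar_date, calendar_year, calendar_month, calendar_day)
SELECT CAST(date_format(d, 'yyyyMMdd') AS INT), d, year(d), month(d), day(d)
FROM (
  SELECT explode(sequence(DATE'2024-01-01', DATE'2024-01-31', INTERVAL 1 DAY)) AS d
);

-- Uniqueness check: this should return no rows.
SELECT product_id, COUNT(*) AS occurrences
FROM dim_product_firstname_lastname
GROUP BY product_id
HAVING COUNT(*) > 1;
```

The same uniqueness check can be repeated for `category_id` and `date_id`.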

---

## Part 3: Insert Fact Data

* Insert **multiple rows** into `fact_sales_firstname_lastname`, looking up surrogate keys by joining to the dimension tables.
* Include transactional metrics like quantity sold, sales amount, and timestamps.
* Ensure referential integrity with dimensions.

✅ Confirm that fact data references valid dimension keys.
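
The fact load can be sketched as an `INSERT ... SELECT` that resolves business keys to surrogate keys through joins. The columns (`product_code`, `calendar_date`, `quantity`, `sales_amount`, `sale_ts`) and the sample transactions are assumptions for illustration only.

```sql
-- Resolve surrogate keys by joining the raw rows to the dimensions.
INSERT INTO fact_sales_firstname_lastname
  (product_id, date_id, quantity, sales_amount, sale_ts)
SELECT p.product_id, d.date_id, s.quantity, s.sales_amount, s.sale_ts
FROM VALUES
  ('P-100', TIMESTAMP'2024-01-05 10:15:00', 1, 1200.00),
  ('P-200', TIMESTAMP'2024-01-05 11:30:00', 3,   45.00),
  ('P-200', TIMESTAMP'2024-01-06 09:00:00', 2,   30.00)
  AS s(product_code, sale_ts, quantity, sales_amount)
JOIN dim_product_firstname_lastname AS p
  ON p.product_code = s.product_code
JOIN dim_date_firstname_lastname AS d
  ON d.calendar_date = CAST(s.sale_ts AS DATE);

-- Referential integrity check: fact rows with no matching product.
-- This should return no rows.
SELECT f.*
FROM fact_sales_firstname_lastname AS f
LEFT ANTI JOIN dim_product_firstname_lastname AS p
  ON f.product_id = p.product_id;
```

Because the inner joins drop any row whose business key has no match, only rows with valid dimension keys reach the fact table. A second anti join against `dim_date_firstname_lastname` covers the date key.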

---

## Part 4: Apply Liquid Clustering

* Alter your fact and dimension tables to enable **Liquid Clustering** using `CLUSTER BY`.
* Cluster dimensions (e.g., `dim_product_firstname_lastname`) by their surrogate key.
* Cluster `fact_sales_firstname_lastname` by the **foreign keys** (e.g., `product_id`, `date_id`) and **timestamp**.

✅ After clustering, insert additional records and observe how Delta handles file layout dynamically.
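
A minimal sketch of enabling clustering on existing tables, assuming a Databricks Runtime that supports Liquid Clustering and a hypothetical timestamp column named `sale_ts` on the fact table:

```sql
-- Dimension: cluster by its surrogate key.
ALTER TABLE dim_product_firstname_lastname
  CLUSTER BY (product_id);

-- Fact: cluster by the foreign keys and the event timestamp.
ALTER TABLE fact_sales_firstname_lastname
  CLUSTER BY (product_id, date_id, sale_ts);

-- Inspect table metadata; on recent runtimes the output should list
-- the clustering columns alongside numFiles and sizeInBytes.
DESCRIBE DETAIL fact_sales_firstname_lastname;
```

Changing the clustering keys does not by itself rewrite data that is already in the table; existing files are typically reorganized when `OPTIMIZE` runs later.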

---

## Part 5: Table Optimization

* Run `OPTIMIZE` on both fact and dimension tables.
* Focus on the **fact table**, especially after multiple inserts or updates.
* Write a short explanation of how optimization reduces file fragmentation and improves query speed.

✅ Compare performance (e.g., number of files) before and after `OPTIMIZE`.
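
One way to make the before-and-after comparison concrete is to read `numFiles` from `DESCRIBE DETAIL` around the `OPTIMIZE` call, as in this sketch:

```sql
-- Before: note the numFiles and sizeInBytes values.
DESCRIBE DETAIL fact_sales_firstname_lastname;

-- Compact small files; the result includes metrics such as the
-- number of files added and removed.
OPTIMIZE fact_sales_firstname_lastname;

-- After: numFiles should be lower if there were small files to compact.
DESCRIBE DETAIL fact_sales_firstname_lastname;
```

Each small insert writes at least one new data file. `OPTIMIZE` rewrites those into fewer, larger files, so a query has fewer files to open and less file-level metadata to process. On a table with only a handful of rows the difference may be small.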

---

## Part 6: Table History and Time Travel

* Use `DESCRIBE HISTORY` to explore the version timeline of your `fact_sales_firstname_lastname` table.
* Use `VERSION AS OF` or `TIMESTAMP AS OF` to query earlier versions of the table.

✅ Record at least one query result from a previous version.
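
The time travel steps might look like the sketch below. The version number and timestamp are placeholders; take real values from your own `DESCRIBE HISTORY` output, since a timestamp outside the table's history raises an error.

```sql
-- List versions with their operation (CREATE TABLE, WRITE, OPTIMIZE, ...).
DESCRIBE HISTORY fact_sales_firstname_lastname;

-- Query by version number (1 is a placeholder).
SELECT COUNT(*) AS row_count
FROM fact_sales_firstname_lastname VERSION AS OF 1;

-- Query by timestamp (placeholder value; use one from your history).
SELECT *
FROM fact_sales_firstname_lastname TIMESTAMP AS OF '2024-01-05 12:00:00';
```

Comparing the row count of an earlier version with the current table is a simple result to record.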

---

## Part 7: Simulate Deletion and Restore

* Simulate an accidental `DELETE` on your fact table.
* Use Delta **table restore** (`RESTORE TABLE`) with a version number or timestamp to recover the data.

✅ Confirm that data is successfully restored and time travel still works.
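
A sketch of the delete-and-recover cycle follows. The version number `3` is a placeholder: use the last version listed before the `DELETE` in your own history.

```sql
-- Simulate the accident: remove every row from the fact table.
DELETE FROM fact_sales_firstname_lastname;

-- Find the version just before the DELETE.
DESCRIBE HISTORY fact_sales_firstname_lastname LIMIT 5;

-- Restore to that version (3 is a placeholder).
RESTORE TABLE fact_sales_firstname_lastname TO VERSION AS OF 3;

-- Confirm the rows are back.
SELECT COUNT(*) AS row_count FROM fact_sales_firstname_lastname;

-- Time travel still works: RESTORE is recorded as a new version,
-- and earlier versions (including the emptied one) remain queryable.
DESCRIBE HISTORY fact_sales_firstname_lastname LIMIT 5;
```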

---

## 🧠 Bonus Challenge (Optional)

* Insert additional transactions post-restore.
* Then roll back using `RESTORE TABLE <table> TO VERSION AS OF <version>` to undo the new changes.

✅ Ensure that only the original records remain after the rollback.
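
The rollback can be sketched like this, assuming the hypothetical fact columns `product_id`, `date_id`, `quantity`, `sales_amount`, and `sale_ts`. The key values and the version number `5` are placeholders; the target is the version created by the restore in Part 7.

```sql
-- New transactions after the restore (placeholder key values).
INSERT INTO fact_sales_firstname_lastname
  (product_id, date_id, quantity, sales_amount, sale_ts)
VALUES (1, 20240110, 5, 250.00, TIMESTAMP'2024-01-10 14:00:00');

-- Identify the version written by the Part 7 RESTORE.
DESCRIBE HISTORY fact_sales_firstname_lastname LIMIT 5;

-- Roll back to it (5 is a placeholder).
RESTORE TABLE fact_sales_firstname_lastname TO VERSION AS OF 5;

-- The row count should match the count from before the new insert.
SELECT COUNT(*) AS row_count FROM fact_sales_firstname_lastname;
```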

---

## 📦 What to Submit

* A zip archive of SQL files containing:
  - SQL statements for all table creation and inserts
  - `DESCRIBE HISTORY` outputs
  - Time travel query outputs
  - `OPTIMIZE` result comments
  - Evidence of successful restore and rollback
* A paragraph summarizing the benefits of:
  - Using **Liquid Clustering** on surrogate keys
  - Running regular **OPTIMIZE** for performance

---

# 🏆 Skills You Will Build

| Skill Area               | Description                                                  |
| :----------------------- | :----------------------------------------------------------- |
| Delta Table Management   | Create, insert into, and update Delta tables in a star schema |
| Dimensional Modeling     | Apply EDW principles using Lakehouse architecture            |
| Time Travel & Versioning | Query Delta tables by version or timestamp                   |
| Liquid Clustering        | Dynamically organize tables by keys for scalable performance |
| Delta Optimization       | Compact files and improve metadata for efficient queries     |
| Table Recovery           | Use `RESTORE` to recover from data loss or corruption        |

---

✅ **Important Reminder:**
Use **SQL syntax only** throughout this lab unless otherwise instructed. Refer to [this blog](https://www.databricks.com/blog/2022/11/07/load-edw-dimensional-model-real-time-databricks-lakehouse.html) for real-time EDW star schema guidance.
