# Databricks SQL
<br>

<div style="float: right; width: 100%;">
  <img 
    src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/dbsql/sql-etl-hls-patient/Databricks%20SQL%20Intro.png?raw=true" 
    width="100%"
  >
</div>

<br>

# Migrate a Healthcare Data Warehouse and Build a Star Schema with Databricks

## **🎯 Scenario**

A hospital is migrating its legacy data warehouse to the **Databricks Lakehouse Platform** to modernize analytics and reduce operational complexity.

Two personas lead the effort:

- 🏗️ **Data Architect**:
    - Design data models (star or snowflake schema), considering performance and reporting.
    - Map source data, defining types and transformations.
    - Collaborate with stakeholders to translate business needs (KPIs, reporting) into logical and physical models.
    - Establish data governance and quality rules.
    - Ensure the model scales as data volumes and reporting demands grow.
- 🔧 **Data Engineer**:
    - Build and maintain data pipelines to ingest, transform, and load data into the data warehouse.
    - Design and develop ETL/ELT processes for efficient data flow.
    - Monitor and troubleshoot data pipelines for performance and reliability.
    - Implement data quality checks and validation processes.
    - Manage and optimize data warehouse infrastructure.
    - Automate data-related tasks and workflows.
    - Collaborate with data architects and analysts to understand data requirements.
    - Deploy and manage data pipelines in production environments.

This demo covers **Step 1**: creating and populating the `patient_dim` dimension table. **Step 2** will build the full star schema and power BI reports.

# End-to-End Data Warehousing Solution
<br>

<div style="float: right; width: 100%;">

<img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/dbsql/sql-etl-hls-patient/Databricks%20SQL%20Marketecture.png?raw=true" style="float: right" width="100%">

</div>

# 🛠 What We’ll Build

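- Model the `patient_dim` table
- Ingest raw patient data
- Clean and standardize the data
- Populate the patient dimension as a Slowly Changing Dimension (SCD) Type 2 table
- Build idempotent pipelines that can be safely re-run (e.g., after a failure) without duplicating or corrupting data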

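To make the SCD Type 2 and idempotency goals concrete, here is a minimal sketch in Python of the core logic (the demo itself implements this in Databricks SQL). It assumes a hypothetical `patient_dim` shape with two tracked attributes (`name`, `city`) plus `effective_from` / `effective_to` / `is_current` bookkeeping columns; these names and the `apply_scd2` helper are illustrative, not the demo's actual schema. A changed patient gets its current row expired and a new current row inserted, while unchanged rows are left alone, which is what makes re-running the same batch safe.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical tracked attributes for illustration only: (name, city).
# The real patient_dim schema is defined in the demo notebooks.
Attrs = Tuple[str, str]


@dataclass(frozen=True)
class PatientDimRow:
    patient_id: str                 # natural (business) key from the source system
    name: str                       # tracked attribute (hypothetical)
    city: str                       # tracked attribute (hypothetical)
    effective_from: date            # first day this version is valid
    effective_to: Optional[date]    # None while the version is still open
    is_current: bool                # True for exactly one version per patient


def apply_scd2(dim: List[PatientDimRow], batch: Dict[str, Attrs],
               load_date: date) -> List[PatientDimRow]:
    """Apply one batch of source records using SCD Type 2 rules.

    Changed patients get their current version closed and a new version opened;
    unchanged patients are left alone, so re-running the same batch is a no-op.
    """
    # Historical (already expired) versions are never modified.
    result: List[PatientDimRow] = [r for r in dim if not r.is_current]
    current = {r.patient_id: r for r in dim if r.is_current}

    for pid, row in current.items():
        attrs = batch.get(pid)
        if attrs is None or attrs == (row.name, row.city):
            result.append(row)  # absent from batch or unchanged: keep current version
        else:
            # Attribute change: expire the old version and open a new one.
            result.append(replace(row, effective_to=load_date, is_current=False))
            result.append(PatientDimRow(pid, attrs[0], attrs[1], load_date, None, True))

    for pid, attrs in batch.items():
        if pid not in current:
            # Brand-new patient: insert its first version.
            result.append(PatientDimRow(pid, attrs[0], attrs[1], load_date, None, True))

    return result


# Usage: an initial load, then a batch with one change (P1 moves) and one new patient.
d0, d1 = date(2024, 1, 1), date(2024, 2, 1)
dim = apply_scd2([], {"P1": ("Ana", "Boston")}, d0)
batch = {"P1": ("Ana", "Chicago"), "P2": ("Ben", "Austin")}
dim = apply_scd2(dim, batch, d1)

# Re-running the same batch (e.g., after a failed job) leaves the table unchanged.
rerun = apply_scd2(dim, batch, d1)
```

After the second load, P1 has an expired Boston row and a current Chicago row, and P2 has a single current row. SQL engines commonly express the same expire-then-insert pattern with a `MERGE` statement; the walkthrough notebooks show how this demo does it.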

## 🔄 Workflow Overview

High-level flow:

`Raw → Clean → patient_dim → Unity Catalog → Ready for fact joins`

**Note:** This demo relies on several SQL-centric engine capabilities. See the examples in [SQL Centric Capabilities Examples]($./sql-centric-capabilities-examples).

## ✅ Outcome

- `patient_dim` is clean, queryable, and governed
- Analysts and BI users can join it with future fact tables (e.g., a Patient Visit fact table)
- Foundation for the full star schema is in place

# You're now ready! Let's get started with the demo

This example creates and populates an SCD Type 2 dimension using Databricks SQL.

Start with: [Patient Dimension ETL Introduction]($./01-patient-dimension-ETL-introduction)