## Data Architecture in Training

This training uses **Unity Catalog Volumes for data storage**:

- **Source**: Unity Catalog Volumes (`/Volumes/ecommerce_platform_<user>/default/kion_datasets`)
- **Variable**: `DATASET_BASE_PATH`
- **Purpose**: Demonstrates advanced Unity Catalog (UC) features such as Lakeflow and governance
- **Example**: `spark.read.csv("/Volumes/ecommerce_platform_trainer/default/kion_datasets/customers/customers.csv")`
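Every path above follows the Unity Catalog volume layout `/Volumes/<catalog>/<schema>/<volume>/<path>`. The following is a minimal plain-Python sketch for building and splitting such paths, so that dataset locations are derived from `DATASET_BASE_PATH` instead of being typed by hand. The helpers `volume_path` and `parse_volume_path` and the `VolumeLocation` type are hypothetical illustrations, not part of any Databricks API.

```python
from typing import NamedTuple


class VolumeLocation(NamedTuple):
    """Components of a /Volumes/<catalog>/<schema>/<volume>/<path> location (hypothetical type)."""
    catalog: str
    schema: str
    volume: str
    relative_path: str


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Join catalog, schema, volume and optional sub-path segments into a volume path."""
    segments = [catalog, schema, volume, *parts]
    for seg in segments:
        # Each segment must be a single, non-empty path component
        if not seg or "/" in seg:
            raise ValueError(f"invalid path segment: {seg!r}")
    return "/Volumes/" + "/".join(segments)


def parse_volume_path(path: str) -> VolumeLocation:
    """Split a volume path into catalog, schema, volume and the remaining file path."""
    prefix = "/Volumes/"
    if not path.startswith(prefix):
        # e.g. a legacy dbfs:/ path is rejected here
        raise ValueError(f"not a Unity Catalog volume path: {path!r}")
    pieces = path[len(prefix):].split("/", 3)
    if len(pieces) < 3 or not all(pieces[:3]):
        raise ValueError(f"incomplete volume path: {path!r}")
    relative = pieces[3] if len(pieces) == 4 else ""
    return VolumeLocation(pieces[0], pieces[1], pieces[2], relative)


# Usage: rebuild the example path from its parts
base = volume_path("ecommerce_platform_trainer", "default", "kion_datasets")
customers_csv = volume_path("ecommerce_platform_trainer", "default", "kion_datasets",
                            "customers", "customers.csv")
print(parse_volume_path(customers_csv))
```

Rejecting anything that does not start with `/Volumes/` is a cheap way to catch leftover DBFS paths when migrating older notebooks.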

> **Note (2025)**: We use Unity Catalog Volumes instead of DBFS for better governance, security, and lineage tracking.

---

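# === Catalog and Schema Configuration ===
import re

# Get current user (for production environment)
# raw_user = spark.sql("SELECT current_user()").first()[0]
raw_user = "trainer"  # For training environment

# Take the part before the first "_" or "@", lower-case it and replace characters
# that are not valid in an unquoted identifier (e.g. "." in "jane.doe@...")
user_slug = re.sub(r"[^a-z0-9_]", "_", re.split(r"[_@]", raw_user, maxsplit=1)[0].lower())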

# Option 1: Catalog isolation (one catalog per user, so queries can use short names such as FROM bronze.orders)
# Requires the CREATE CATALOG privilege
CATALOG = f"ecommerce_platform_{user_slug}"
BRONZE_SCHEMA = "bronze"
SILVER_SCHEMA = "silver"
GOLD_SCHEMA = "gold"
ISOLATION_MODE = "Catalog"

# Unity Catalog volume holding the source datasets (hard-coded to the trainer's catalog)
DATASET_BASE_PATH = "/Volumes/ecommerce_platform_trainer/default/kion_datasets"

print(f"Dataset base path: {DATASET_BASE_PATH}")

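# Create the per-user catalog if needed (requires CREATE CATALOG), then switch to it
spark.sql(f"CREATE CATALOG IF NOT EXISTS {CATALOG}")
spark.sql(f"USE CATALOG {CATALOG}")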

for s in [BRONZE_SCHEMA, SILVER_SCHEMA, GOLD_SCHEMA]:
    spark.sql(f'CREATE SCHEMA IF NOT EXISTS {CATALOG}.{s}')

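# Optionally: create the `default` schema and a volume for the datasets if they do not exist (for training purposes)
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.default")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {CATALOG}.default.kion_datasets")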

display(
    spark.sql(f"DESCRIBE CATALOG EXTENDED {CATALOG}")
)

print(f"✓ User slug: {user_slug}")
print(f"✓ Isolation mode: {ISOLATION_MODE}")
print(f"✓ Working catalog: {CATALOG}")
print(f"✓ Schemas: {BRONZE_SCHEMA}, {SILVER_SCHEMA}, {GOLD_SCHEMA}")
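The naming logic in this cell can be checked without a cluster. Below is a pure-Python sketch that splits the login at the first `_` or `@` as the cell does, then lower-cases the result and replaces any character other than a letter, digit or underscore, so that an e-mail login such as `jane.doe@example.com` still yields a valid unquoted catalog identifier. It also lists the fully qualified `<catalog>.<schema>` names that the `CREATE SCHEMA IF NOT EXISTS` loop targets. The functions `user_slug` and `qualified_schemas` and the constant `MEDALLION_SCHEMAS` are hypothetical names used only for illustration.

```python
import re

# Medallion schemas created by the configuration cell
MEDALLION_SCHEMAS = ("bronze", "silver", "gold")


def user_slug(raw_user: str) -> str:
    """Return the part before the first '_' or '@', made safe for an unquoted identifier."""
    head = re.split(r"[_@]", raw_user, maxsplit=1)[0]
    slug = re.sub(r"[^a-z0-9_]", "_", head.lower())
    if not slug:
        raise ValueError(f"cannot derive a slug from {raw_user!r}")
    return slug


def qualified_schemas(catalog: str, schemas=MEDALLION_SCHEMAS) -> list[str]:
    """Fully qualified <catalog>.<schema> names, as used in CREATE SCHEMA IF NOT EXISTS."""
    return [f"{catalog}.{s}" for s in schemas]


for login in ["trainer", "jane.doe@example.com", "j_smith@example.com"]:
    catalog = f"ecommerce_platform_{user_slug(login)}"
    print(login, "->", qualified_schemas(catalog))
```

Because the split happens at the first `_`, logins such as `j_smith` and `j_doe` both map to the slug `j` and would share one catalog; if that matters, slugging the whole local part of the e-mail address is a safer choice.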