# Training Environment Setup

## Purpose
This notebook **validates the training environment** and exports variables used across all modules and labs.

**What it does:**
1. Retrieves the username from the Databricks session
2. Finds the pre-created catalog (`retailhub_{username}`)
3. Validates the required schemas (bronze, silver, gold, default) and the `datasets` Volume
4. Exports variables: `CATALOG`, `BRONZE_SCHEMA`, `SILVER_SCHEMA`, `GOLD_SCHEMA`, `DATASET_PATH` (plus the lowercase aliases `catalog` and `volume_path`)

**Requirements:**
- The trainer must run `00_pre_config.ipynb` before the training
- The participant must be a member of the training group

> **Note:** Do not modify this notebook -- it is called via `%run` from every module and lab.

---

In [None]:
# =============================================================================
# CONFIGURATION -- training environment constants
# =============================================================================
import re

CATALOG_PREFIX = "retailhub"
BRONZE_SCHEMA = "bronze"
SILVER_SCHEMA = "silver"
GOLD_SCHEMA = "gold"
DEFAULT_SCHEMA = "default"
VOLUME_NAME = "datasets"

## Step 1: User Identification

In [None]:
# =============================================================================
# STEP 1: Get the current user from the Databricks session
# =============================================================================
raw_user = spark.sql("SELECT current_user()").first()[0]
print(f"User: {raw_user}")

# Create a safe catalog suffix (email -> slug)
if any(sub in raw_user.lower() for sub in ["trainer", "trener02", "krzysztof.burejza"]):
    user_slug = "trener"
else:
    user_slug = re.sub(r'[^a-zA-Z0-9]', '_', raw_user.split('@')[0]).lower()
    user_slug = re.sub(r'^[0-9_]+', '', re.sub(r'_+', '_', user_slug)).strip('_')

print(f"Slug: {user_slug}")
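
The slug rules from Step 1 can be restated as a standalone function, which makes them easy to check outside Databricks. This is a minimal sketch of the cell above, not a replacement for it: the function name `slugify_user`, the `trainer_markers` parameter, and the example addresses are hypothetical.

```python
import re

# Markers copied from the Step 1 cell; any match maps the user to the shared trainer slug.
TRAINER_MARKERS = ("trainer", "trener02", "krzysztof.burejza")


def slugify_user(raw_user: str, trainer_markers=TRAINER_MARKERS) -> str:
    """Turn a login (typically an email address) into a catalog-safe suffix.

    Hypothetical helper that mirrors the Step 1 logic.
    """
    lowered = raw_user.lower()
    if any(marker in lowered for marker in trainer_markers):
        return "trener"
    local_part = raw_user.split("@")[0]
    slug = re.sub(r"[^a-zA-Z0-9]", "_", local_part).lower()  # non-alphanumerics -> "_"
    slug = re.sub(r"_+", "_", slug)                          # collapse runs of "_"
    slug = re.sub(r"^[0-9_]+", "", slug)                     # drop leading digits/underscores
    return slug.strip("_")


# Example addresses are made up for illustration.
print(slugify_user("Jan.Kowalski@example.com"))    # jan_kowalski
print(slugify_user("123.user--name@example.com"))  # user_name
print(slugify_user("trainer01@example.com"))       # trener
```

Note that a login consisting only of digits and punctuation would yield an empty slug, so the catalog lookup in Step 2 would fail for such a user.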

## Step 2: Catalog Validation

In [None]:
# =============================================================================
# STEP 2: Find and validate the Unity Catalog
# =============================================================================
CATALOG = f"{CATALOG_PREFIX}_{user_slug}"

catalogs = [row[0] for row in spark.sql("SHOW CATALOGS").collect()]

if CATALOG in catalogs:
    print(f"[OK] Catalog found: {CATALOG}")
    spark.sql(f"USE CATALOG {CATALOG}")
else:
    print(f"[ERROR] Catalog '{CATALOG}' does not exist!")
    print(f"\nAvailable catalogs with prefix '{CATALOG_PREFIX}':")
    for c in catalogs:
        if c.startswith(CATALOG_PREFIX):
            print(f"  - {c}")
    print("\nContact the trainer -- they need to run 00_pre_config.ipynb")
    raise RuntimeError(f"Catalog '{CATALOG}' not found")
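
The lookup in Step 2 only needs a list of catalog names, so the decision logic can be exercised without a Spark session. The sketch below assumes a hypothetical helper `resolve_catalog` and made-up catalog names that stand in for the output of `SHOW CATALOGS`.

```python
from typing import Sequence


def resolve_catalog(expected: str, available: Sequence[str], prefix: str) -> str:
    """Return `expected` if it exists; otherwise raise and list same-prefix catalogs."""
    if expected in available:
        return expected
    candidates = sorted(c for c in available if c.startswith(prefix))
    raise LookupError(
        f"Catalog '{expected}' not found; catalogs with prefix '{prefix}': {candidates}"
    )


# Hypothetical names, standing in for the result of SHOW CATALOGS.
available = ["main", "retailhub_jan_kowalski", "retailhub_trener"]
print(resolve_catalog("retailhub_trener", available, "retailhub"))  # retailhub_trener
```

Listing the same-prefix candidates in the error message serves the same purpose as the printed list in the cell above: it helps spot a slug mismatch quickly.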

## Step 3: Schema Validation (Medallion)

In [None]:
# =============================================================================
# STEP 3: Verify Bronze/Silver/Gold and default schemas
# =============================================================================
schema_names = [row[0] for row in spark.sql(f"SHOW SCHEMAS IN {CATALOG}").collect()]
required_schemas = [BRONZE_SCHEMA, SILVER_SCHEMA, GOLD_SCHEMA, DEFAULT_SCHEMA]
missing = [s for s in required_schemas if s not in schema_names]

if missing:
    print(f"[ERROR] Missing schemas: {missing}")
    print("Contact the trainer.")
    raise RuntimeError(f"Missing schemas: {missing}")
else:
    print(f"[OK] All schemas present: {', '.join(required_schemas)}")
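
The schema check reduces to an ordered difference between the required names and the names the catalog reports. A minimal sketch, using a hypothetical helper `missing_schemas` and a made-up schema listing:

```python
def missing_schemas(required, present):
    """Return the required schema names absent from `present`, keeping their order."""
    present_set = set(present)
    return [name for name in required if name not in present_set]


required = ["bronze", "silver", "gold", "default"]
# Made-up listing in which the silver schema has not been created.
present = ["bronze", "default", "gold", "information_schema"]
print(missing_schemas(required, present))  # ['silver']
```

Extra schemas in the catalog, such as `information_schema`, do not affect the result because only the required names are tested.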

## Step 4: Volume Validation

In [None]:
# =============================================================================
# STEP 4: Verify Volume and file access
# =============================================================================
DATASET_PATH = f"/Volumes/{CATALOG}/{DEFAULT_SCHEMA}/{VOLUME_NAME}"

try:
    files = dbutils.fs.ls(DATASET_PATH)
    print(f"[OK] Volume accessible: {DATASET_PATH}")
    print("Contents:")
    for f in files:
        print(f"   {f.name}")
except Exception as e:
    print(f"[ERROR] Cannot access Volume: {DATASET_PATH}")
    print(f"   Error: {e}")
    raise
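
For comparison, the same check can be sketched with the standard library alone. This assumes the Volume path is reachable through ordinary Python file APIs on the cluster, which the notebook itself does not rely on. The helper name `list_dataset_files` and the file `customers.csv` are hypothetical, and a temporary directory stands in for `DATASET_PATH` so the sketch runs anywhere.

```python
import os
import tempfile


def list_dataset_files(path: str) -> list:
    """Return the sorted entry names under `path`, or raise a readable error."""
    try:
        return sorted(os.listdir(path))
    except OSError as exc:
        raise RuntimeError(f"Cannot access Volume: {path}") from exc


# Stand-in directory; in the training workspace the argument would be DATASET_PATH.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "customers.csv"), "w").close()
    print(list_dataset_files(demo_dir))  # ['customers.csv']
```

Chaining the original `OSError` with `from exc` keeps the underlying cause visible, much as the cell above prints the caught exception before re-raising it.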

## Step 5: Environment Summary

In [None]:
# =============================================================================
# STEP 5: Export variables -- available in all notebooks via %run
# =============================================================================

# Aliases for convenience
catalog = CATALOG
volume_path = DATASET_PATH

print("=" * 60)
print("TRAINING ENVIRONMENT READY")
print("=" * 60)
print()
print(f"  User:           {raw_user}")
print(f"  CATALOG:        {CATALOG}")
print(f"  BRONZE_SCHEMA:  {BRONZE_SCHEMA}")
print(f"  SILVER_SCHEMA:  {SILVER_SCHEMA}")
print(f"  GOLD_SCHEMA:    {GOLD_SCHEMA}")
print(f"  DATASET_PATH:   {DATASET_PATH}")
print()
print("=" * 60)
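
Once this notebook has been loaded via `%run`, a module or lab can combine the exported variables into three-level table names and Volume file paths. The sketch below shows one way to do that. The helper `table_name`, the table `orders`, the file `orders.csv`, and the demo values are hypothetical; in a real session `CATALOG`, `BRONZE_SCHEMA`, and `DATASET_PATH` would come from this notebook.

```python
def table_name(catalog: str, schema: str, table: str) -> str:
    """Build a Unity Catalog three-level name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Demo values standing in for the variables exported by this notebook.
demo_catalog = "retailhub_jan_kowalski"
demo_bronze_schema = "bronze"
demo_dataset_path = f"/Volumes/{demo_catalog}/default/datasets"

print(table_name(demo_catalog, demo_bronze_schema, "orders"))
# retailhub_jan_kowalski.bronze.orders
print(f"{demo_dataset_path}/orders.csv")
# /Volumes/retailhub_jan_kowalski/default/datasets/orders.csv
```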