# Idempotent Lakehouse Deployment

**Notebook Purpose**

This notebook orchestrates idempotent deployment and validation of core lakehouse layers (Ops, Bronze, Silver, Gold) inside the current Microsoft Fabric workspace. It retrieves workspace context at runtime, constructs a parallelized execution DAG, and calls a reusable utility notebook to get-or-create each lakehouse artifact with standardized configuration (schemas enabled, descriptions applied). The result is a consistent, automated foundation for downstream ingestion, transformation, and reporting pipelines.
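
The get-or-create behavior delegated to `util_get_lake` is what makes the deployment safe to re-run. That utility notebook is not shown here, so the following is only a minimal sketch of the pattern under stated assumptions: `InMemoryWorkspace` and `get_or_create` are hypothetical names, and the in-memory dictionary stands in for the real workspace rather than for any Fabric API.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryWorkspace:
    """Hypothetical stand-in for a workspace; not a Fabric API."""

    lakehouses: dict = field(default_factory=dict)

    def get(self, name: str) -> dict:
        # Raises KeyError when the artifact does not exist yet.
        return self.lakehouses[name]

    def create(self, name: str, description: str) -> dict:
        artifact = {
            "id": f"id-{len(self.lakehouses) + 1}",
            "displayName": name,
            "description": description,
        }
        self.lakehouses[name] = artifact
        return artifact


def get_or_create(ws: InMemoryWorkspace, name: str, description: str) -> dict:
    """Return the existing artifact, creating it only when the lookup fails."""
    try:
        return ws.get(name)
    except KeyError:
        return ws.create(name, description)


ws = InMemoryWorkspace()
first = get_or_create(ws, "Bronze", "Data Landing Zone / Raw Data")
second = get_or_create(ws, "Bronze", "Data Landing Zone / Raw Data")
print(first == second, len(ws.lakehouses))  # prints "True 1"
```

Calling the function twice returns the same artifact and leaves a single lakehouse in place, which is the property the deployment relies on when the notebook is executed repeatedly.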

In [None]:
# --- FETCH CONTEXT VARIABLES --- #
ctx = notebookutils.runtime.context # -> get context

# Extract the current workspace identifiers from the runtime context.
# These are used to ensure lakehouse artifacts are created/looked up in the correct workspace.
WORKSPACE_ID, WORKSPACE_NAME = (
    ctx.get("currentWorkspaceId"),
    ctx.get("currentWorkspaceName")
)

# Defensive check: context lookups can fail if the notebook is running in an unexpected
# execution environment (e.g., missing permissions, detached context, or non-standard runtime).
# Fail fast so downstream lakehouse deployment doesn't accidentally target an unknown workspace.
if not WORKSPACE_ID or not WORKSPACE_NAME:
    raise ValueError("Could not determine workspace from context")


# --- LAKE ARTIFACTS --- #
# Define the lakehouses you want to ensure exist (idempotent deploy / get-or-create pattern),
# along with human-readable descriptions to apply at creation time.
lakes = {
    "Ops": "Operations / Control Layer",
    "Bronze": "Data Landing Zone / Raw Data",
    "Silver": "Curated Base Data / Pre Business Logic",
    "Gold": "Reporting Layer"
}

# Build a list of "activities" (one activity per lakehouse).
# Each activity calls a child utility notebook ("util_get_lake") that is responsible for:
#   1) trying to retrieve the lakehouse artifact by name in the given workspace, and
#   2) creating it if it does not exist.
activities = [
    {
        "name": lh_name,
        "path": "util_get_lake",
        "timeoutPerCellInSeconds": 120,
        "args": {
            "name": lh_name,                  # Lakehouse name to get-or-create
            "ws_id": WORKSPACE_ID,            # Ensure the operation targets the current workspace
            "desc": desc,                     # Description used on creation
            "schemas": True,                  # Enable schemas on the lakehouse definition
        },
        "retry": 0,                           # No retries: a failed activity fails immediately
        "retryIntervalInSeconds": 10          # Only takes effect when retry > 0
    }
    for lh_name, desc in lakes.items()
]

# Define the Directed Acyclic Graph (DAG) configuration for runMultiple.
DAG = {
    "activities": activities,
    "concurrency": min(4, len(activities)),  # Lower this if your Fabric SKU limits parallel notebook runs
    "timeoutInSeconds": 240,                 # Total time allowed for the whole DAG
}

# --- EXECUTE LAKE DEPLOYMENT --- #
# Execute all activities according to the DAG definition.
results = notebookutils.notebook.runMultiple(
    DAG,
    {"displayDAGViaGraphviz": False}         # displayDAGViaGraphviz=False disables graph visualization output.
)

In [None]:
# --- PARSE DAG RESULTS --- #
import json

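def parse_runmultiple(results: dict) -> dict:
    """Return {activity name: decoded exit value}; raise if any activity failed.

    Expects a mapping of activity name to a dict with "exception" and "exitVal"
    keys. Each exitVal must be a JSON string, so util_get_lake is assumed to
    exit with a JSON payload.
    """
    # 1) Collect failures so every failed lakehouse is reported in a single error.
    failures = {
        name: info["exception"]
        for name, info in results.items()
        if info.get("exception") is not None
    }
    if failures:
        msg = "\n".join(f"- {name}: {err}" for name, err in failures.items())
        raise RuntimeError(f"One or more lakehouse deployments failed:\n{msg}")

    # 2) Decode each activity's exitVal from JSON.
    return {
        name: json.loads(info["exitVal"])
        for name, info in results.items()
    }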


artifacts = parse_runmultiple(results)
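
The parser above reads two keys per activity, `exception` and `exitVal`. The exact payload returned by `runMultiple` and by `util_get_lake` is not shown in this notebook, so the sample below is an assumption inferred from those keys, and `split_results` is a hypothetical variant that separates failures from successes instead of raising. It may help when testing the parsing logic outside Fabric.

```python
import json

# Hypothetical payload shaped like the mapping the parser reads;
# the field names inside exitVal are placeholders, not a documented schema.
sample_results = {
    "Bronze": {
        "exitVal": json.dumps({"id": "placeholder-id", "displayName": "Bronze"}),
        "exception": None,
    },
    "Gold": {"exitVal": None, "exception": "timeout"},
}


def split_results(results: dict) -> tuple[dict, dict]:
    """Separate failed activities from decoded successes without raising."""
    failures = {
        name: info["exception"]
        for name, info in results.items()
        if info.get("exception") is not None
    }
    decoded = {
        name: json.loads(info["exitVal"])
        for name, info in results.items()
        if name not in failures
    }
    return decoded, failures


decoded, failures = split_results(sample_results)
print(sorted(decoded), sorted(failures))
```

Keeping failures as data rather than raising immediately can be useful for logging, but the fail-fast behavior of `parse_runmultiple` is the safer default for a deployment step.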