#**CHAPTER 2.SUITABILITY GATES AND GATE REFUSALS**
---

##REFERENCE

https://chatgpt.com/share/6995de77-c208-8012-afb8-94e130ca535d

##0.CONTEXT

This notebook is a deliberately constrained exercise in professional boundary-setting for finance workflows. It is not “an AI that gives financial advice.” It is a state-driven routing system that decides, under explicit governance rules, whether a request is suitable for a general educational response, requires more information before continuing, must be refused with a redirect, or should be escalated for human review. The intent is pedagogical and architectural: to teach how to build agentic systems that behave like audited decision processes rather than improvisational chatbots. In practice, most failures in finance-facing AI do not come from “bad math.” They come from blurred scope, uncontrolled escalation, and outputs that drift into personalized recommendations without adequate suitability context. Notebook 2 operationalizes a hard truth that financial professionals already know: the first job of any decision system is to know when it is not allowed to decide.

The example is intentionally simple in surface form (a user asks for a “specific stock to buy today to double money quickly”), but the underlying objective is complex and professionally relevant. In real organizations, requests arrive messy, underspecified, and sometimes explicitly inappropriate. A client, a colleague, or an internal stakeholder asks for something that sounds like a decision (“what should we buy?”, “should I refinance?”, “can we structure this to avoid tax?”, “how do we hide this exposure?”). The temptation is to answer quickly. The professional requirement is to route correctly. Suitability is not optional; it is a control constraint. This notebook turns that constraint into a first-class routing mechanism, implemented as a graph with explicit states and auditable transitions. The “answer” is downstream; the “gate” is the lesson.

For financial professionals, this exercise maps directly to daily work. Wealth management and brokerage contexts require suitability and best-interest obligations; research and trading contexts require controls against market manipulation, insider trading, and misinterpretation of commentary as a recommendation; corporate finance contexts require controls around MNPI, confidentiality, and disclosure risk; treasury and risk contexts require escalation and documentation when decisions cross policy lines. Even when a professional is not “regulated” in a formal licensing sense, the organization is regulated through policies, supervisory procedures, and audit expectations. The core relevance is that modern finance is increasingly mediated by systems, and systems are accountable. If an LLM is used anywhere near a decision boundary, you need an architecture that can prove what it did, why it did it, and what it refused to do. This notebook is the smallest possible architecture that forces that proof.

The model in this notebook does four things, and it does them in a specific order. First, it performs **intake**: it converts the user’s free-form request into a structured profile (topic, request type, time horizon, risk tolerance, constraints, missing critical items). Second, it performs a **suitability gate**: it classifies the request into one of four decisions: **ALLOW_GENERAL**, **NEED_MORE_INFO**, **REFUSE**, or **HUMAN_REVIEW**. Third, it generates a response strictly consistent with that decision: either general educational guidance, targeted missing-info questions, a refusal with safe alternatives, or an ambiguity clarification message. Fourth, it terminates explicitly and exports artifacts that allow an auditor or instructor to reconstruct the run. This is not a “chat.” It is a state machine with a language model as an internal component.

Equally important are the system’s non-capabilities, because they are part of the safety boundary. This notebook cannot and will not produce personalized investment recommendations or specific trade instructions. It will not tell the user what to buy or sell, where to enter, where to place a stop, or how to “double money quickly.” It will not help with illegal or harmful requests (insider trading, manipulation, laundering, tax evasion). It does not claim to be a regulated advisor, and it does not pretend to know facts it has not been given. It also does not verify external market data, compute a backtest, or infer a user’s financial profile from tone or implication. In other words, it cannot do the seductive part. It can only do the professional part: classify, gate, and respond within scope. That is not a limitation of ambition; it is the point of the architecture.

A key lesson for practitioners is that “what the model can do” and “what the system is allowed to do” are different categories. A general-purpose LLM can produce a plausible-sounding recommendation in a single pass. A professional system must decide whether it should produce anything at all, and it must do so under rules that can be inspected. Notebook 2 therefore elevates suitability to the top of the graph, not as a disclaimer bolted onto the final message. The decision is made early, based on structured state, and it drives the route taken through the graph. This aligns with real operational controls: the suitability checklist comes before the recommendation, not after it.

The “how” is architectural, not rhetorical. The notebook uses **LangGraph** to build an explicit directed graph with an entry node, conditional edges, and an explicit END. Each node is wrapped in an **AgentNode** abstraction so the graph executes small, composable functions rather than a single monolithic prompt. The system state is a strict **TypedDict** with audit fields, inputs, extracted profile, suitability assessments, decision, response text, and a trace log. This design is central: a professional system is not “prompt-in, answer-out.” It is “state-in, transition, state-out.” The language model is used to produce structured JSON objects that populate the state. Routing decisions are made using the state, and the final output is a function of the decision, not a free-form continuation of the conversation.

The notebook highlights two governance mechanisms that matter in practice. The first is **hard branching**. Once the suitability gate decides **REFUSE**, the graph routes to a refusal node and terminates. Once it decides **NEED_MORE_INFO**, the graph routes to a targeted-question node and terminates. There is no open-ended continuation where the model can “accidentally” provide advice while asking questions. This is the architectural equivalent of a compliance hold: the system simply does not have a path that leads to a recommendation under missing suitability data. The second mechanism is **early termination** via an explicit END node. The system does not “keep chatting” after it asks for missing information. It stops. In professional workflows, stopping is a feature: it forces a new input, a new review step, or a supervised handoff.
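The hard-branching idea can be sketched without any framework. The following is a minimal illustration in plain Python, not the notebook's LangGraph code; the node names (`general_guidance`, `ask_missing_info`, `refuse_with_redirect`, `escalate`) are hypothetical placeholders chosen for this sketch.

```python
from typing import Dict, List

# Hypothetical node names; the notebook's real node identifiers may differ.
ROUTES: Dict[str, str] = {
    "ALLOW_GENERAL": "general_guidance",
    "NEED_MORE_INFO": "ask_missing_info",
    "REFUSE": "refuse_with_redirect",
    "HUMAN_REVIEW": "escalate",
}

def route(decision: str) -> str:
    """Return the single response node for a decision; unknown values fail loudly."""
    if decision not in ROUTES:
        raise ValueError(f"unroutable decision: {decision!r}")
    return ROUTES[decision]

def run_branch(decision: str) -> List[str]:
    """Walk one branch: gate, exactly one response node, then END."""
    return ["suitability_gate", route(decision), "END"]

refusal_path = run_branch("REFUSE")
```

Because each decision maps to exactly one node and every branch ends at `END`, there is no code path on which a question-asking branch could continue into advice.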

A second architectural layer is the **bounded consistency pass** inside the suitability node. In Notebook 2, this is intentionally small (two passes, strictly bounded) and deterministic in configuration. The purpose is not to chase perfect classification; it is to demonstrate a pattern used in high-stakes routing: if two independent suitability assessments disagree, the system escalates to **HUMAN_REVIEW** rather than guessing. This is the appropriate posture for professional systems. You do not “average” compliance. You escalate. The notebook makes the escalation explicit and traceable, and it keeps the loop bounded so the system remains fast and predictable in a classroom setting.
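A minimal sketch of the escalate-on-disagreement rule follows. It assumes that "agreement" means identical decision labels across passes; the notebook's own comparison may consider more fields. The function names `reconcile` and `bounded_consistency` are illustrative, not taken from the notebook.

```python
from typing import Callable, List

def reconcile(decisions: List[str]) -> str:
    """Return the shared decision, or HUMAN_REVIEW if any pass disagrees."""
    if not decisions:
        raise ValueError("at least one decision is required")
    first = decisions[0]
    return first if all(d == first for d in decisions) else "HUMAN_REVIEW"

def bounded_consistency(assess: Callable[[], str], passes: int = 2) -> str:
    """Call the assessor a fixed number of times, then reconcile. No open-ended loop."""
    if passes < 1:
        raise ValueError("passes must be >= 1")
    return reconcile([assess() for _ in range(passes)])

# Usage with canned assessments standing in for model calls.
agreeing = reconcile(["REFUSE", "REFUSE"])
conflicting = reconcile(["ALLOW_GENERAL", "NEED_MORE_INFO"])
```

Note that the rule never picks one of the conflicting labels: disagreement always resolves to escalation.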

An additional pragmatic constraint is provider overload and operational continuity. In production, external model APIs can return transient errors under load. Notebook 2 treats this as part of the engineering reality and enforces bounded retries. More importantly for teaching and for audit, the notebook can run in a deterministic simulator mode when the provider is overloaded. The simulator does not pretend to be the model; it produces structured outputs using explicit rules and marks them as simulated. This matters because professional reliability is not the same as model availability. The graph topology, state transitions, artifacts, and routing logic must remain testable even when the external service is degraded. The simulator is therefore not a “hack.” It is a controlled contingency that preserves the learning objective: the architecture still runs, the outputs remain inspectable, and the artifacts still capture what happened.

The learning artifact is the graph visualization. The diagram is not decoration; it is a compact representation of governance intent. A professional can look at the graph and immediately answer: Where is suitability evaluated? What decisions are possible? Which branches terminate? Where does escalation happen? If an auditor asks, “How do you prevent the model from giving trade recommendations?”, the answer is not a paragraph of policy text. It is the topology: there is no path from suitability failure to a recommendation node. This is why the notebook enforces a hardened Mermaid renderer and requires the diagram to match topology exactly. The diagram is the interface between engineering and supervision.
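The claim that "there is no path" to a recommendation can itself be checked mechanically. The sketch below runs a breadth-first search over a hand-written adjacency map; the node names and the forbidden node `recommend_trade` are assumptions made for illustration, not the notebook's actual topology export.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical topology mirroring intake -> gate -> one response node -> END.
EDGES: Dict[str, List[str]] = {
    "intake": ["suitability_gate"],
    "suitability_gate": [
        "general_guidance", "ask_missing_info", "refuse_with_redirect", "escalate",
    ],
    "general_guidance": ["END"],
    "ask_missing_info": ["END"],
    "refuse_with_redirect": ["END"],
    "escalate": ["END"],
    "END": [],
}

def reachable(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

FORBIDDEN = {"recommend_trade"}  # a node that must never be reachable
audit_ok = reachable(EDGES, "intake").isdisjoint(FORBIDDEN)
after_refusal = reachable(EDGES, "refuse_with_redirect")
```

A check like this could run as a unit test, so that a later edit adding an edge toward a recommendation node fails before it reaches a classroom or a reviewer.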

From a finance lens, the notebook teaches a transferable pattern: **policy-first orchestration**. You can replace the “personal finance triage” content with a research compliance gate, an MNPI gate, a suitability gate for product complexity, a leverage/collateral constraint gate, or a disclosure-risk gate for investor communications. The pattern remains the same: intake → gate → route → terminate → export artifacts. That is why this exercise belongs early in the series. Before you add tools, backtests, retrieval, committees, or event-driven workflows, you must master the simplest form of control: deciding whether the system is allowed to proceed.

Finally, the notebook’s value is that it makes the system accountable by construction. At the end of every run, it exports three artifacts. **run_manifest.json** captures configuration, environment fingerprint, and run identity. **graph_spec.json** captures topology, node list, conditional edges, and the exact Mermaid diagram used for visualization. **final_state.json** captures the full terminal state: extracted profile, suitability decisions, routing outcome, response text, and trace events. This is what “auditable AI” means in finance: not a promise of correctness, but a guarantee of reconstructability. A reviewer can read the artifacts and understand what happened without replaying the conversation or trusting the model’s memory. The architecture does not eliminate risk; it makes risk reviewable.
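As a rough sketch of the export step, the three artifacts can be written as sorted, indented JSON. The file names come from the text above; the payload fields and the helper name `export_artifacts` are placeholders for illustration, and the notebook's own export cell may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

def export_artifacts(
    out_dir: Path,
    manifest: Dict[str, Any],
    graph_spec: Dict[str, Any],
    final_state: Dict[str, Any],
) -> Dict[str, Path]:
    """Write the three audit artifacts and return their paths keyed by file name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    payloads = {
        "run_manifest.json": manifest,
        "graph_spec.json": graph_spec,
        "final_state.json": final_state,
    }
    paths: Dict[str, Path] = {}
    for name, payload in payloads.items():
        path = out_dir / name
        # sort_keys keeps the files diff-friendly across runs
        path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
        paths[name] = path
    return paths

# Usage with placeholder payloads, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = export_artifacts(
        Path(tmp),
        manifest={"run_id": "demo", "config_hash": "abc123"},
        graph_spec={"nodes": ["intake", "suitability_gate"], "edges": []},
        final_state={"final_decision": "REFUSE", "terminated": True, "trace": []},
    )
    names_on_disk = sorted(p.name for p in Path(tmp).iterdir())
    reloaded = json.loads(written["final_state.json"].read_text(encoding="utf-8"))
```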

In summary, Notebook 2 is an agentic boundary system: it demonstrates how to build a state-driven compliance gate that routes requests into safe, professional actions and terminates explicitly. The model is a component, not the system. The system is the graph: typed state, deterministic transitions, bounded loops, conditional routing, explicit END, and exported artifacts. For financial professionals, this is the difference between an impressive demo and a deployable mechanism: a system that knows when to stop.


##1.LIBRARIES AND ENVIRONMENT

**Cell 1 — Install + core imports (what it is doing and why it exists)**

This first cell does two quiet but professionally critical things: it establishes a controlled execution environment, and it makes explicit which libraries define the notebook’s “system boundary.” In finance, most operational failures in analytical notebooks come from environment drift: a package update changes a method signature, a dependency pulls a different transitive version, or a tool behaves differently across machines. By installing LangGraph and the Anthropic SDK explicitly at the start, you are signaling that the notebook is not a “loose script,” but a reproducible mechanism. Note that the `>=` specifiers set only lower bounds; for strict reproducibility across a semester or a team, pin exact versions. The `pip` line is the minimal scaffolding required to make the rest of the notebook executable in Colab without hidden setup steps. This matters in classrooms and in professional teams because it reduces “works on my machine” failure modes and makes the run path auditable.

After installation, the imports are intentionally conservative and explicit. You are not importing a pile of convenience packages; you are importing only what you need to implement a state-driven graph system and to export artifacts. The standard library modules (`json`, `os`, `sys`, `platform`, `hashlib`, `uuid`, `re`, and `datetime`) support audit logging, run identity, configuration hashing, and controlled parsing. The typing imports (`TypedDict`, `Literal`, etc.) force you to define the state schema as a contractual interface rather than letting it emerge implicitly. In regulated or supervised environments, “implicit state” is a governance risk because it becomes impossible to prove what data existed at which stage of a run.

Finally, the key conceptual imports are `StateGraph` and `END` from LangGraph. This is where the notebook declares its architectural premise: routing will be done by a graph, not by ad-hoc if/else logic spread through a script, and not by “let the model decide” prompt improvisation. `END` is also imported early to emphasize that termination is part of the design, not an afterthought. In Notebook 2, early termination is the mechanism: once the suitability decision is made, the system must stop after producing the appropriate response. Cell 1 therefore sets the stage: we are building an auditable, state-driven, graph-routed system, not a chatbot.


In [None]:
# CELL 1/10 — Install + core imports (Colab-ready)
!pip -q install "langgraph>=0.2.0" "anthropic>=0.34.0"

import json, os, sys, platform, hashlib, uuid, re
import datetime as _dt
from typing import TypedDict, NotRequired, Literal, Dict, Any, List, Optional, Callable, Tuple

from langgraph.graph import StateGraph, END



##2.CONFIGURATION

###2.1.OVERVIEW

**Cell 2 — Configuration + deterministic run identity + environment fingerprint (what it is doing and why it exists)**

Cell 2 is where you formalize governance as data. Rather than embedding behavior in scattered constants or hidden assumptions, you define a single configuration object that will be hashed, logged, and exported. This makes the notebook behave like a controlled experiment: every run has an identity, and that identity is linked to the parameters that shaped the system’s decisions. In finance, this is the difference between a demonstration and a reviewable process. Without a manifest, you cannot answer basic questions like: “Which model did we use?”, “What temperature?”, “How many consistency passes?”, “Was simulation allowed?”, or “Did we change anything since last week?”

The cell defines small utility functions that enforce a specific discipline. `utc_now_iso()` uses timezone-aware UTC timestamps, which avoids ambiguous local timestamps and aligns with professional logging norms. `sha256_hex()` is used to create content-addressable identifiers. This is an important governance pattern: rather than trusting a human-readable label, you compute a hash of the configuration and treat it as a stable fingerprint. The `env_fingerprint()` function captures minimal environment metadata so that you can tell, later, what runtime context produced the outputs. It is intentionally lightweight: you want classroom speed and stability, not a heavy environment audit.

The locked model name is then declared as a strict constant. This notebook series requires that the model be fixed for consistency across students and runs, so you treat it as a “technical lock,” not a suggestion. The config dictionary then encodes the system’s operational constraints: low-variance sampling settings (`temperature=0.0`, which reduces but does not strictly guarantee identical outputs from a hosted model), a bounded consistency loop (`suitability_consistency_passes`), and policy-level flags that reflect what the system is allowed to do. The key teaching point is that policies belong in configuration, not in narrative text. That makes them testable and exportable.

Finally, the cell creates `RUN_ID` and `CONFIG_HASH`. `RUN_ID` is a unique identifier for the run; `CONFIG_HASH` is a deterministic identifier for the run’s configuration. Together, they support traceability: you can correlate the final state and artifacts back to the exact settings used. Printing these values is not cosmetic. In a classroom, it helps students internalize that every run is an auditable object; in professional settings, it is the minimum viable discipline for reproducibility.
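The fingerprint property is easy to verify in isolation. This short sketch uses the same recipe as the cell below (SHA-256 over `json.dumps(..., sort_keys=True)`); the helper name `config_hash` and the toy configurations are illustrative only.

```python
import hashlib
import json
from typing import Any, Dict

def config_hash(config: Dict[str, Any]) -> str:
    """Hash a canonical JSON rendering so that key order does not matter."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content, different key order: identical fingerprint.
config_a = {"model": "demo-model", "temperature": 0.0}
config_b = {"temperature": 0.0, "model": "demo-model"}
# One changed value: different fingerprint.
config_c = {"model": "demo-model", "temperature": 0.2}

same_fingerprint = config_hash(config_a) == config_hash(config_b)
changed_fingerprint = config_hash(config_a) != config_hash(config_c)
```

`RUN_ID`, by contrast, is a random UUID: it identifies one execution, while the hash identifies the settings that any number of executions may share.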


###2.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 2/10 — Configuration + deterministic run identity + env fingerprint (UPDATED: simulator mode)

def utc_now_iso() -> str:
    return _dt.datetime.now(_dt.timezone.utc).isoformat()

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def env_fingerprint() -> Dict[str, Any]:
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "executable": sys.executable,
        "utc_now": utc_now_iso(),
    }

MODEL_NAME = "claude-haiku-4-5-20251001"  # STRICT: no substitution

CONFIG: Dict[str, Any] = {
    "model": MODEL_NAME,
    "max_tokens": 700,
    "temperature": 0.0,
    "suitability_consistency_passes": 2,   # bounded
    "provider": {
        "mode": "AUTO",                   # AUTO | LIVE | SIMULATE
        "overload_max_attempts": 5,       # bounded
        "overload_backoff_seconds": [0.6, 1.2, 2.4, 4.8, 4.8],  # deterministic
        "simulate_on_overload": True,     # <-- key: keep notebook runnable in class
    },
    "policy": {
        "allow_only_general_guidance": True,
        "never_claim_professional_relationship": True,
        "always_offer_redirect_on_refusal": True,
    },
}

RUN_ID = str(uuid.uuid4())
CONFIG_HASH = sha256_hex(json.dumps(CONFIG, sort_keys=True))

print("RUN_ID:", RUN_ID)
print("CONFIG_HASH:", CONFIG_HASH)
print("MODEL:", CONFIG["model"])
print("PROVIDER_MODE:", CONFIG["provider"]["mode"])


RUN_ID: c1a9677b-4e0c-4632-9daf-fe9f4196fd5d
CONFIG_HASH: 5e510bb56f053ba34481985c4e0a758a387e84aa8efe17bb57ca7a87061d3a0b
MODEL: claude-haiku-4-5-20251001
PROVIDER_MODE: AUTO


##3.VISUALIZATION

###3.1.OVERVIEW

**Cell 3 — Visualization Standard v1: hardened Mermaid renderer (what it is doing and why it exists)**

Cell 3 exists because the graph is not just implementation detail; it is the learning artifact. In agentic systems, topology is policy. If you cannot render the graph reliably in Colab, you lose the most important pedagogical object: the visible structure of routing, termination, and governance boundaries. This cell therefore provides a hardened way to render Mermaid diagrams using an ESM import pinned to a specific Mermaid version. Pinning matters: visualization libraries change behavior, and even small rendering differences can confuse students or break notebooks mid-semester.

The function `display_langgraph_mermaid` is designed with operational reliability in mind. It validates input (you must pass Mermaid graph code), it generates a deterministic container ID based on the hash of the code (so different diagrams get different containers, while re-rendering the same diagram reuses the same ID), and it sanitizes dangerous script termination sequences. These details reflect the mindset you want in financial engineering: treat UI rendering as part of system reliability, not as a casual extra. In institutional settings, even visualization is subject to review because it influences how stakeholders interpret the workflow.

The JavaScript portion imports Mermaid from a pinned CDN URL. It initializes Mermaid in strict security mode and runs rendering inside the dedicated container. The try/catch is not there to hide errors; it is there to fail visibly and informatively. If rendering fails, the user sees a clear message in the notebook output rather than a silent blank area. This aligns with the “no silent failures” coding standard.

Pedagogically, this cell teaches that transparency is engineered, not assumed. You do not “hope” the graph is understandable; you render it deterministically and require that the diagram match topology exactly. The diagram becomes a contract: any change to routing must appear in the visualization. This is essential for teaching modular architectures because students learn to reason about systems by inspecting their graphs. In professional environments, the same mechanism supports supervision and audit: reviewers can confirm that a refusal path truly terminates, or that missing-information paths cannot accidentally fall through to advice generation.
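One way to keep diagram and topology in lockstep is to generate the Mermaid text from the edge list rather than writing it by hand. The sketch below is an assumption about how that could be done, not the notebook's own generator; it handles only plain `A --> B` edges without labels, and the node names are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]

def edges_to_mermaid(edges: List[Edge]) -> str:
    """Render an edge list as Mermaid flowchart code, one edge per line."""
    lines = ["graph TD"]
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

def mermaid_edges(code: str) -> Set[Edge]:
    """Parse plain 'A --> B' lines back into (source, target) pairs."""
    found: Set[Edge] = set()
    for line in code.splitlines():
        if "-->" in line:
            src, dst = line.split("-->", 1)
            found.add((src.strip(), dst.strip()))
    return found

def diagram_matches(edges: List[Edge], code: str) -> bool:
    """True when the diagram contains exactly the edges of the topology."""
    return mermaid_edges(code) == set(edges)

TOPOLOGY: List[Edge] = [
    ("intake", "suitability_gate"),
    ("suitability_gate", "refuse_with_redirect"),
    ("refuse_with_redirect", "END"),
]
diagram = edges_to_mermaid(TOPOLOGY)
in_sync = diagram_matches(TOPOLOGY, diagram)
```

The resulting string starts with `graph`, so it would pass the input check in `display_langgraph_mermaid`.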


###3.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 3/10 — Visualization Standard v1: Hardened Colab Mermaid ESM renderer (pinned)
from IPython.display import HTML, display

MERMAID_VERSION = "10.6.1"

def display_langgraph_mermaid(mermaid_code: str, *, height_px: int = 420) -> None:
    """
    Hardened, self-contained Mermaid ESM renderer for Google Colab.
    - Pinned Mermaid version (default 10.6.1)
    - No external side effects, deterministic container ID
    """
    if not isinstance(mermaid_code, str) or "graph" not in mermaid_code:
        raise ValueError("Expected Mermaid graph code as a string (e.g., 'graph TD ...').")

    container_id = f"mermaid_{sha256_hex(mermaid_code)[:12]}"
    safe_code = mermaid_code.replace("</script>", "<\\/script>")

    html = f"""
<div id="{container_id}" style="border:1px solid rgba(0,0,0,0.15); border-radius:10px; padding:12px; overflow:auto; height:{height_px}px;">
  <div class="mermaid">
{safe_code}
  </div>
</div>

<script type="module">
  import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@{MERMAID_VERSION}/dist/mermaid.esm.min.mjs";
  try {{
    mermaid.initialize({{
      startOnLoad: false,  // rendering is triggered explicitly below
      securityLevel: "strict",
      theme: "default",
      flowchart: {{ useMaxWidth: true, htmlLabels: true }}
    }});
    // mermaid.run returns a Promise; await it so render errors reach the catch block.
    await mermaid.run({{ querySelector: "#{container_id} .mermaid" }});
  }} catch (e) {{
    const el = document.querySelector("#{container_id}");
    if (el) {{
      el.innerHTML = "<pre style='color:#b00020; white-space:pre-wrap;'>Mermaid render failed: " + String(e) + "</pre>";
    }}
  }}
</script>
"""
    display(HTML(html))

print("Mermaid pinned:", MERMAID_VERSION)


Mermaid pinned: 10.6.1


##4.STATE SCHEMA AND VALIDATORS

###4.1.OVERVIEW

**Cell 4 — Typed state schema + validators (what it is doing and why it exists)**

Cell 4 is where the notebook stops being “a script that calls a model” and becomes an agentic system with explicit state. The central object is the `FinanceSuitabilityState` TypedDict. This is not a typing exercise for its own sake; it is a governance boundary. In finance workflows, untyped, ad-hoc dictionaries are a source of hidden coupling and subtle bugs: a node expects a key that another node sometimes forgets to set, a value changes type, or a later refactor silently changes meaning. By defining state as a TypedDict, you force every node to agree on the shared interface.

The state is designed around auditable execution. It includes run metadata (`run_id`, `config_hash`, `model`, `started_utc`) so that any output can be tied to a specific run and configuration. It includes inputs (`user_message`, `jurisdiction`) so the run is reconstructable. It includes `profile`, which is the structured extraction from intake, and two suitability assessments (`suitability_primary`, `suitability_check`) to support the bounded consistency mechanism. It includes `final_decision`, `response_text`, and a `trace` list that logs key events. Finally, it includes a `terminated` flag, which teaches an important principle: termination is state, not a side effect.
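To make "termination is state" concrete, here is a small sketch of trace logging and termination as pure state updates. The trace entry fields (`utc`, `node`, `event`, `details`) and the helper names are assumptions for illustration; the notebook's trace format may differ.

```python
import datetime as dt
from typing import Any, Dict

def log_event(state: Dict[str, Any], node: str, event: str, **details: Any) -> Dict[str, Any]:
    """Return a new state with one trace entry appended; the input is not mutated."""
    entry = {
        "utc": dt.datetime.now(dt.timezone.utc).isoformat(),
        "node": node,
        "event": event,
        "details": details,
    }
    return {**state, "trace": [*state.get("trace", []), entry]}

def terminate(state: Dict[str, Any], node: str) -> Dict[str, Any]:
    """Record termination in the trace and set the explicit flag."""
    return {**log_event(state, node, "terminated"), "terminated": True}

# Usage: a gate decision followed by termination on the refusal branch.
state0 = {"run_id": "demo", "trace": [], "terminated": False}
state1 = log_event(state0, "suitability_gate", "decision", decision="REFUSE")
state2 = terminate(state1, "refuse_with_redirect")
```

Returning a new dictionary instead of mutating in place keeps every intermediate state available for inspection, which suits the "state-in, transition, state-out" framing.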

The `SuitabilityAssessment` TypedDict defines the precise structure expected from the suitability gate. It includes not only the decision but also the rationale, missing information, refusal reason, and risk flags. This is intentional: professional systems do not only decide; they explain in a structured way that is reviewable. Free-form rationales are not sufficient because they cannot be reliably parsed, compared, or audited.

The validator functions (`_require_keys` and `validate_suitability`) enforce “fail loudly” behavior. If the model output is malformed or missing keys, the system raises an error rather than continuing with corrupted state. Pedagogically, this reinforces a critical engineering norm: do not let an LLM’s output shape your system unless it matches a strict schema. In finance, schema enforcement is a basic control because downstream decisions can have real consequences. This cell teaches students to treat structure as a safety device, not as optional documentation.
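The following self-contained sketch shows the fail-loudly behavior on a few payloads. It is a slightly stricter variant written for illustration: the string checks on `rationale` and `refusal_reason` are an added assumption and are not part of the notebook's `validate_suitability`.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("decision", "rationale", "missing_info", "refusal_reason", "risk_flags")
ALLOWED_DECISIONS = ("ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW")

def validate_assessment(obj: Dict[str, Any], ctx: str) -> Dict[str, Any]:
    """Raise ValueError on the first schema violation; return obj unchanged if valid."""
    for key in REQUIRED_KEYS:
        if key not in obj:
            raise ValueError(f"{ctx}: missing key '{key}'")
    if obj["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"{ctx}: invalid decision '{obj['decision']}'")
    for key in ("missing_info", "risk_flags"):
        if not isinstance(obj[key], list):
            raise ValueError(f"{ctx}: '{key}' must be a list")
    for key in ("rationale", "refusal_reason"):  # extra check, assumed for this sketch
        if not isinstance(obj[key], str):
            raise ValueError(f"{ctx}: '{key}' must be a string")
    return obj

def try_validate(obj: Dict[str, Any]) -> str:
    """Return 'ok' or the error message, for demonstration purposes only."""
    try:
        validate_assessment(obj, "demo")
        return "ok"
    except ValueError as exc:
        return str(exc)

good = {"decision": "REFUSE", "rationale": "r", "missing_info": [],
        "refusal_reason": "x", "risk_flags": ["illegal_or_harmful"]}
missing_key = {k: v for k, v in good.items() if k != "risk_flags"}
bad_decision = {**good, "decision": "BUY"}
results = [try_validate(good), try_validate(missing_key), try_validate(bad_decision)]
```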


###4.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 4/10 — Explicit TypedDict state schema + minimal validators

Decision = Literal["ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW"]

class SuitabilityAssessment(TypedDict):
    decision: Decision
    rationale: str
    missing_info: List[str]
    refusal_reason: str
    risk_flags: List[str]

class FinanceSuitabilityState(TypedDict):
    # Required audit fields
    run_id: str
    config_hash: str
    model: str
    started_utc: str

    # Inputs
    user_message: str
    jurisdiction: str

    # Extracted profile (structured, not "chat memory")
    profile: Dict[str, Any]

    # Suitability
    suitability_primary: SuitabilityAssessment
    suitability_check: SuitabilityAssessment
    final_decision: Decision

    # Outputs
    response_text: str

    # Control + logs
    terminated: bool
    trace: List[Dict[str, Any]]

def _require_keys(d: Dict[str, Any], keys: List[str], ctx: str) -> None:
    for k in keys:
        if k not in d:
            raise ValueError(f"{ctx}: missing key '{k}'")

def validate_suitability(obj: Dict[str, Any], ctx: str) -> SuitabilityAssessment:
    _require_keys(obj, ["decision", "rationale", "missing_info", "refusal_reason", "risk_flags"], ctx)
    if obj["decision"] not in ("ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW"):
        raise ValueError(f"{ctx}: invalid decision '{obj['decision']}'")
    if not isinstance(obj["missing_info"], list) or not isinstance(obj["risk_flags"], list):
        raise ValueError(f"{ctx}: 'missing_info' and 'risk_flags' must be lists")
    return obj  # type: ignore

print("State schema ready.")


State schema ready.


##5.LLM CLIENT INITIALIZATION

###5.1.OVERVIEW

**Cell 5 — Anthropic client initialization + strict JSON helper (what it is doing and why it exists)**

Cell 5 is the controlled gateway between your deterministic system and a probabilistic external model service. The first design choice is the use of Colab secrets via `userdata.get("ANTHROPIC_API_KEY")` in all caps. This is a practical governance pattern: keys do not belong in notebooks, screenshots, or version control. Keeping secrets in the Colab secret store reduces accidental leakage and aligns with professional security hygiene. The cell also fails immediately if the key is missing. This matters because “half-running” notebooks can mislead students and produce partial artifacts that look valid but are not.

The Anthropic client is then instantiated with the API key retrieved from the secret store. The notebook treats the model as a dependency, not a collaborator. The helper `_extract_text` concatenates the text blocks of the SDK’s structured response into a single string and ignores any non-text blocks. This is deliberately narrow: you want to be robust to SDK response formats without doing complex parsing.

The function `call_claude_json` is the most important part of this cell. It enforces a strict contract: every model call must return a single JSON object. The system scans for a JSON object, parses it, and fails loudly if parsing fails. This protects the rest of the graph from non-deterministic formatting behaviors. In agentic architectures, the model is allowed to generate content, but it is not allowed to shape the system’s control flow unless it produces valid structured output.
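The parsing half of that contract can be sketched on its own, using the same greedy pattern as `_JSON_RE` in the cell below. This is an illustration of the scan-parse-fail-loudly idea under that assumption, not a copy of `call_claude_json`; the function name `parse_single_json_object` is hypothetical.

```python
import json
import re
from typing import Any, Dict

# Greedy match from the first "{" to the last "}" in the text.
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)

def parse_single_json_object(text: str) -> Dict[str, Any]:
    """Extract and parse one JSON object from model text, or raise ValueError."""
    match = _JSON_OBJECT.search(text)
    if match is None:
        raise ValueError("no JSON object found in model output")
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("model output is not a JSON object")
    return obj

# Surrounding chatter is tolerated; the object itself must parse.
parsed = parse_single_json_object('Here you go: {"decision": "REFUSE"} Done.')

# Two objects in one reply do not parse as one, so the call fails loudly.
try:
    parse_single_json_object('{"a": 1} and also {"b": 2}')
    two_objects_rejected = False
except ValueError:
    two_objects_rejected = True
```

Because the pattern is greedy, a reply containing two separate objects is rejected rather than silently truncated to the first one.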

Operationally, this helper also implements bounded retry behavior for overload conditions (HTTP 529). In real deployments, external services can temporarily fail. A professional system needs resilience, but it also needs bounded execution time. The retry loop is therefore fixed, deterministic, and capped. There is no infinite waiting and no silent fallback. If the provider remains overloaded after retries, the system either raises or, if configured, switches to a deterministic simulator path that is explicitly marked. The key pedagogical message is that reliability is engineered: you plan for overload, you bound your retries, and you log what happened.

Finally, this cell centralizes the model name, token limit, and temperature through `CONFIG`. This is governance by design: when the model behavior changes, the config changes, the config hash changes, and the artifacts capture that change. In finance, this is the minimum standard for explaining why two runs produced different outcomes.


###5.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 5/10 — Anthropic client init + strict JSON call helper (UPDATED: AUTO w/ simulator fallback)

from anthropic import Anthropic
from google.colab import userdata

API_KEY = userdata.get("ANTHROPIC_API_KEY")  # STRICT: ALL CAPS
if not API_KEY or not isinstance(API_KEY, str):
    raise RuntimeError("Missing Colab secret: userdata.get('ANTHROPIC_API_KEY') must be set.")

client = Anthropic(api_key=API_KEY)

_JSON_RE = re.compile(r"\{.*\}", re.DOTALL)

def _extract_text(msg) -> str:
    text = ""
    for block in msg.content:
        if getattr(block, "type", None) == "text":
            text += block.text
    return text

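def _is_overloaded(exc: Exception) -> bool:
    # Prefer the HTTP status code when the exception exposes one (529 = overloaded);
    # fall back to string matching for other exception types.
    if getattr(exc, "status_code", None) == 529:
        return True
    s = str(exc)
    return ("Error code: 529" in s) or ("overloaded" in s.lower())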

# ---------- Deterministic simulator (teaching-only; keeps topology executable) ----------

def _simulate_intake(user_message: str, jurisdiction: str) -> Dict[str, Any]:
    msg = user_message.lower()

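    # Topic classification (simple, deterministic).
    # NOTE: this is substring matching, so short tokens such as "irs" or "sat"
    # also match inside longer words (e.g., "first", "satisfy"). Acceptable for a
    # teaching simulator; use word-boundary matching for anything stricter.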
    if any(w in msg for w in ["stock", "ticker", "buy", "sell", "option", "crypto", "trade", "trading"]):
        topic = "trading"
    elif any(w in msg for w in ["budget", "spend", "expense", "saving"]):
        topic = "budgeting"
    elif any(w in msg for w in ["debt", "loan", "credit card", "interest"]):
        topic = "debt"
    elif any(w in msg for w in ["tax", "irs", "sat", "deduction", "evasion"]):
        topic = "tax"
    else:
        topic = "other"

    # Request type flags
    illegal = any(w in msg for w in ["insider", "manipulate", "pump", "dump", "launder", "tax evasion", "evade"])
    wants_specific_trade = any(w in msg for w in ["what should i buy", "what to buy", "which stock", "exact", "today"]) and topic in ["trading"]
    promises_fast_money = any(w in msg for w in ["double", "quickly", "guarantee", "sure thing", "risk-free"])

    if illegal:
        request_type = "illegal_or_harmful"
    elif wants_specific_trade:
        request_type = "trade_recommendation"
    elif "should i" in msg or "what should" in msg:
        request_type = "personalized_advice"
    else:
        request_type = "general_info"

    missing_critical = []
    # For any personalized/trading context, we demand suitability fields
    if request_type in ["personalized_advice", "trade_recommendation"]:
        missing_critical = ["goal", "time_horizon", "risk_tolerance", "constraints", "financial situation basics"]

    return {
        "jurisdiction_guess": jurisdiction or "",
        "topic": topic,
        "request_type": request_type,
        "time_horizon": "unknown",
        "risk_tolerance": "unknown",
        "constraints": [],
        "provided_numbers": {},
        "missing_critical": missing_critical,
        "simulated": True,
        "signals": {
            "illegal": illegal,
            "wants_specific_trade": wants_specific_trade,
            "promises_fast_money": promises_fast_money,
        }
    }

def _simulate_suitability(profile: Dict[str, Any]) -> Dict[str, Any]:
    signals = profile.get("signals", {})
    if signals.get("illegal", False):
        return {
            "decision": "REFUSE",
            "rationale": "Request appears to involve illegal or harmful activity.",
            "missing_info": [],
            "refusal_reason": "I can’t assist with illegal or harmful financial activity.",
            "risk_flags": ["illegal_or_harmful"],
            "simulated": True,
        }

    if profile.get("request_type") == "trade_recommendation":
        # Hard boundary: no specific trades; ask for suitability + redirect to general education
        return {
            "decision": "NEED_MORE_INFO",
            "rationale": "User requests a specific trade; we only provide general education and must gather suitability context before discussing options.",
            "missing_info": ["goal", "time horizon", "risk tolerance", "loss capacity", "liquidity needs", "experience level"],
            "refusal_reason": "",
            "risk_flags": ["specific_trade_request", "suitability_missing"],
            "simulated": True,
        }

    if profile.get("request_type") == "personalized_advice":
        return {
            "decision": "NEED_MORE_INFO",
            "rationale": "Personalized guidance requires key suitability details to avoid unsafe recommendations.",
            "missing_info": ["goal", "time horizon", "risk tolerance", "constraints"],
            "refusal_reason": "",
            "risk_flags": ["suitability_missing"],
            "simulated": True,
        }

    return {
        "decision": "ALLOW_GENERAL",
        "rationale": "General educational guidance is appropriate.",
        "missing_info": [],
        "refusal_reason": "",
        "risk_flags": [],
        "simulated": True,
    }

def _simulate_response(kind: str, state: Dict[str, Any]) -> Dict[str, Any]:
    # kind: "refusal" | "request_more_info" | "general_guidance" | "human_review"
    msg = state["user_message"]
    if kind == "refusal":
        return {"response_text": "I can’t help with that request because it may involve illegal or harmful activity. I can, however, explain legal, general concepts like diversification, risk, and long-term investing principles."}
    if kind == "request_more_info":
        missing = state["suitability_primary"].get("missing_info", [])[:6]
        q = "\n".join([f"{i+1}. {m}?" for i, m in enumerate(missing)]) if missing else "1. What is your goal and time horizon?\n2. What level of risk can you tolerate?"
        return {"response_text": f"To stay in general, safe territory, I need a bit more context:\n{q}\n\nOnce you answer, I can explain options and trade-offs (without telling you what to buy or sell)."}
    if kind == "general_guidance":
        return {"response_text": "General guidance: focus on your goal, time horizon, diversification, and how much loss you can tolerate. Avoid decisions based on “double quickly” claims; evaluate expected return vs risk, costs, and downside scenarios. I can explain how to compare broad approaches (cash buffer, diversified index exposure, risk limits) without giving specific buy/sell instructions."}
    # human_review
    return {"response_text": "I can’t proceed because the request is ambiguous. Are you looking for general education, or a specific trade recommendation? If general education, tell me your horizon and risk tolerance and I’ll explain options and trade-offs."}

# ---------- Unified call function with AUTO switching ----------

def call_claude_json(system: str, user: str) -> Dict[str, Any]:
    """
    AUTO mode:
      - Try LIVE with bounded retry on 529.
      - If still overloaded and simulate_on_overload=True, return deterministic simulated outputs.
    LIVE mode:
      - Only Anthropic; fail loudly if unavailable.
    SIMULATE mode:
      - Always return deterministic simulated outputs.
    """
    mode = CONFIG["provider"]["mode"]
    simulate_on_overload = bool(CONFIG["provider"]["simulate_on_overload"])
    max_attempts = int(CONFIG["provider"]["overload_max_attempts"])
    backoffs = list(CONFIG["provider"]["overload_backoff_seconds"])

    def live_call() -> Dict[str, Any]:
        last_err: Optional[Exception] = None
        for attempt in range(max_attempts):
            try:
                msg = client.messages.create(
                    model=CONFIG["model"],
                    max_tokens=CONFIG["max_tokens"],
                    temperature=CONFIG["temperature"],
                    system=system,
                    messages=[{"role": "user", "content": user}],
                )
                text = _extract_text(msg).strip()
                m = _JSON_RE.search(text)
                if not m:
                    raise ValueError(f"Model did not return JSON object.\nRAW:\n{text}")
                return json.loads(m.group(0))
            except Exception as e:
                last_err = e
                if _is_overloaded(e) and attempt < max_attempts - 1:
                    import time
                    time.sleep(backoffs[min(attempt, len(backoffs)-1)])
                    continue
                raise
        raise RuntimeError(f"Claude call failed after {max_attempts} attempts") from last_err

    def simulate_call() -> Dict[str, Any]:
        # Detect which "contract" is being requested by looking for required keys in prompt text.
        u = user
        if '"jurisdiction_guess"' in u and '"topic"' in u and '"request_type"' in u:
            # Intake contract
            # We don't have state in this function; user prompt includes state['user_message'] content.
            # Extract message between "Message:" and "Return JSON" deterministically.
            m = re.search(r"Message:\s*(.*?)\n\s*Return JSON", u, re.DOTALL)
            msg = m.group(1).strip() if m else ""
            # Jurisdiction is not included in the intake prompt, so it cannot be recovered here;
            j = ""  # leave it empty (the graph state still holds state['jurisdiction'])
            return _simulate_intake(msg, j)

        if '"decision"' in u and "ALLOW_GENERAL|NEED_MORE_INFO|REFUSE|HUMAN_REVIEW" in u:
            # Suitability contract
            m = re.search(r"profile_json:\s*(\{.*\})\s*\n\s*Return JSON", u, re.DOTALL)
            prof = json.loads(m.group(1)) if m else {}
            return _simulate_suitability(prof)

        if 'Return JSON: { "response_text"' in u:
            # Response contract (infer kind from node intent cues)
            if "Write a refusal" in u:
                kind = "refusal"
            elif "Ask for missing suitability information" in u:
                kind = "request_more_info"
            elif "Provide general educational guidance only" in u:
                kind = "general_guidance"
            else:
                kind = "human_review"
            # We need state to do richer responses; we approximate by extracting message + missing_info if present
            msg_m = re.search(r"User message:\s*(.*?)\n\s*(Refusal reason:|Ask for missing|Provide general|We have an internal mismatch)", u, re.DOTALL)
            msg = msg_m.group(1).strip() if msg_m else ""
            # Extract missing_info if present
            miss_m = re.search(r"Missing_info list:\s*(\[[\s\S]*?\])\s*\n\s*Return JSON", u, re.DOTALL)
            missing = json.loads(miss_m.group(1)) if miss_m else []
            # Build a minimal pseudo-state
            pseudo = {
                "user_message": msg,
                "suitability_primary": {"missing_info": missing},
            }
            return _simulate_response(kind, pseudo)

        # Unknown contract => deterministic failure
        raise ValueError("Simulator could not match prompt contract. Keep prompts stable.")

    if mode == "SIMULATE":
        return simulate_call()
    if mode == "LIVE":
        return live_call()

    # AUTO
    try:
        return live_call()
    except Exception as e:
        if _is_overloaded(e) and simulate_on_overload:
            return simulate_call()
        raise

print("Anthropic client initialized. MODEL locked:", CONFIG["model"])
print("Provider mode:", CONFIG["provider"]["mode"], "| simulate_on_overload:", CONFIG["provider"]["simulate_on_overload"])


Anthropic client initialized. MODEL locked: claude-haiku-4-5-20251001
Provider mode: AUTO | simulate_on_overload: True


##6.AGENT NODE ABSTRACTION

###6.1.OVERVIEW

**Cell 6 — AgentNode abstraction + intake and suitability nodes (what it is doing and why it exists)**

Cell 6 is where the notebook introduces the core agentic abstraction: nodes as small, composable state transformers. The `AgentNode` wrapper is intentionally simple: it binds a name to a function that accepts state and returns updated state. This is critical for teaching scalable architectures. If you let your system become “one giant prompt,” you cannot reason about it, test it, or extend it. By contrast, when each node is small and has a clear contract, the graph becomes modular and maintainable.
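One practical consequence of the node contract is that a node can be exercised without a model call. The following is a minimal sketch, not the notebook's implementation: `Node` is a hypothetical stand-in for `AgentNode`, `State` is a plain dict standing in for `FinanceSuitabilityState`, and `stub_intake` is an invented keyword stub used only to show the wiring.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]  # assumption: plain dict standing in for FinanceSuitabilityState


class Node:
    """Hypothetical stand-in for the notebook's AgentNode wrapper."""

    def __init__(self, name: str, fn: Callable[[State], State]):
        self.name = name
        self.fn = fn

    def __call__(self, state: State) -> State:
        return self.fn(state)


def stub_intake(state: State) -> State:
    # Invented stub: no model call; flags any message containing "buy" as a trade request.
    wants_trade = "buy" in state["user_message"].lower()
    state["profile"] = {
        "request_type": "trade_recommendation" if wants_trade else "general_info"
    }
    return state


node = Node("intake", stub_intake)
out = node({"user_message": "What should I buy?", "profile": {}})
print(node.name, out["profile"]["request_type"])
```

Because the wrapper only needs a callable, swapping the stub for a model-backed function does not change how the node is invoked or tested.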

The intake node (`intake_node_fn`) demonstrates a foundational pattern: convert messy natural language into structured state. It calls the model with a system prompt that forbids advice and requires JSON. The extracted profile includes topic classification, request type classification, and missing critical fields. Notice the intent: the intake stage is not judging suitability; it is preparing the evidence the suitability gate will use. This separation is important in professional systems because it reduces the risk that one prompt implicitly “does everything” and produces unreviewable outputs.
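The intake prompt lists the allowed values for each classification field, so a caller may want to check the parsed profile against that vocabulary before the gate reads it. The sketch below is an assumption layered on top of the notebook, which does not show such a step for the profile: `normalize_profile` and the fallback values are hypothetical, while the allowed values are copied from the intake prompt.

```python
from typing import Any, Dict, Set, Tuple

# Allowed values copied from the intake prompt; the fallbacks are an assumption.
ALLOWED_VALUES: Dict[str, Tuple[Set[str], str]] = {
    "topic": ({"budgeting", "debt", "investing", "trading", "tax",
               "retirement", "insurance", "other"}, "other"),
    "request_type": ({"general_info", "personalized_advice", "trade_recommendation",
                      "illegal_or_harmful", "unknown"}, "unknown"),
    "time_horizon": ({"short", "medium", "long", "unknown"}, "unknown"),
    "risk_tolerance": ({"low", "medium", "high", "unknown"}, "unknown"),
}


def normalize_profile(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the profile with out-of-vocabulary values replaced by fallbacks."""
    profile = dict(raw)
    for key, (allowed, fallback) in ALLOWED_VALUES.items():
        value = profile.get(key)
        if not isinstance(value, str) or value not in allowed:
            profile[key] = fallback
    # List-typed fields default to empty lists when missing or malformed.
    for key in ("constraints", "missing_critical"):
        if not isinstance(profile.get(key), list):
            profile[key] = []
    return profile


cleaned = normalize_profile({
    "topic": "stocks",                       # not in the vocabulary
    "request_type": "trade_recommendation",  # valid, kept as-is
    "time_horizon": ["short"],               # wrong type
})
print(cleaned)
```

Coercing to a conservative fallback keeps the schema stable; a stricter design could raise instead, which would surface prompt drift earlier.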

The node also appends a trace event. That trace is not logging for debugging; it is logging for audit. In financial workflows, you want to reconstruct the decision path after the fact. A trace entry with a timestamp, node name, and event description is the minimum viable audit trail. By logging a profile summary (topic, request type, horizon, risk tolerance), the notebook provides reviewers with a compact view of what the system thought it saw.
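To make the audit use of the trace concrete, here is a small sketch of building trace entries and reconstructing the node path from them. The helper names `make_trace_event` and `node_path` are hypothetical; the entry fields (`ts_utc`, `node`, `event`) follow the shape used in the notebook.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def make_trace_event(node: str, event: str, **fields: Any) -> Dict[str, Any]:
    """Build one trace entry; the core fields always win over extra fields."""
    return {
        **fields,
        "ts_utc": datetime.now(timezone.utc).isoformat(),
        "node": node,
        "event": event,
    }


def node_path(trace: List[Dict[str, Any]]) -> List[str]:
    """Collapse consecutive entries from the same node into a single path step."""
    path: List[str] = []
    for entry in trace:
        if not path or path[-1] != entry["node"]:
            path.append(entry["node"])
    return path


trace = [
    make_trace_event("__init__", "start"),
    make_trace_event("intake", "parsed_profile", profile_summary={"topic": "trading"}),
    make_trace_event("suitability_check", "decision_mismatch_escalate"),
    make_trace_event("suitability_check", "final_decision", final_decision="HUMAN_REVIEW"),
    make_trace_event("human_review", "terminated"),
]
print(node_path(trace))
```

A reviewer can compare the reconstructed path with the expected branch for a given input without reading any prompt text.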

The suitability node (`suitability_check_node_fn`) then applies explicit policy constraints. The prompt defines the allowable decisions and the meaning of each decision. Crucially, it encodes the system boundary: no personalized trades, refuse illegal requests, require missing suitability information when needed. The node runs a primary assessment and then a bounded consistency pass. The consistency pass is not a “loop until correct” hack; it is a controlled risk reduction step that is strictly bounded by configuration. The system then applies a hard rule: if the two assessments disagree, escalate to `HUMAN_REVIEW`. This teaches a professional posture: when uncertain at a policy boundary, escalate rather than guess.
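The escalation rule can be isolated as a small function and tested without any model call. This is a sketch under stated assumptions: `reconcile` is a hypothetical helper, and treating an unrecognized decision label as `HUMAN_REVIEW` is an assumption here (in the notebook, validation happens in `validate_suitability`, which is not reproduced).

```python
from typing import Any, Dict, List, Tuple

ALLOWED_DECISIONS = {"ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW"}


def reconcile(primary: Dict[str, Any], check: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (final_decision, merged_risk_flags) from two assessments."""
    flags = sorted(set(primary.get("risk_flags", [])) | set(check.get("risk_flags", [])))
    decisions = (primary.get("decision"), check.get("decision"))
    # Assumption: an unknown label is treated like a disagreement and escalated.
    if any(d not in ALLOWED_DECISIONS for d in decisions):
        return "HUMAN_REVIEW", flags
    # Hard rule from the text: disagreement escalates instead of picking a side.
    if decisions[0] != decisions[1]:
        return "HUMAN_REVIEW", flags
    return decisions[0], flags


agree = reconcile(
    {"decision": "NEED_MORE_INFO", "risk_flags": ["suitability_missing"]},
    {"decision": "NEED_MORE_INFO", "risk_flags": ["specific_trade_request"]},
)
disagree = reconcile(
    {"decision": "NEED_MORE_INFO", "risk_flags": ["suitability_missing"]},
    {"decision": "REFUSE", "risk_flags": ["specific_trade_request"]},
)
print(agree)
print(disagree)
```

Note that the rule never resolves a disagreement in favor of the more permissive answer, which is the point of the escalation posture described above.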

By the end of Cell 6, you have two reusable node patterns: an intake parser and a policy gate. These are the building blocks of real finance systems: structured intake, then deterministic routing based on the resulting state.


###6.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 6/10 — AgentNode abstraction + core nodes (intake, suitability_check) with bounded consistency pass

class AgentNode:
    """
    Small, testable node wrapper: name + fn(state) -> state transform.
    Node functions update the state dict in place and return it; CONFIG is read from module scope.
    """
    def __init__(self, name: str, fn: Callable[[FinanceSuitabilityState], FinanceSuitabilityState]):
        self.name = name
        self.fn = fn

    def __call__(self, state: FinanceSuitabilityState) -> FinanceSuitabilityState:
        return self.fn(state)

INTAKE_SYSTEM = (
    "You are a finance intake parser for an educational system. "
    "Return ONLY a JSON object. Do not give advice. Extract facts from the user's message."
)

def intake_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    user = f"""
Extract a structured profile from the message.

Message:
{state['user_message']}

Return JSON with:
{{
  "jurisdiction_guess": "<string or empty>",
  "topic": "<one of: budgeting|debt|investing|trading|tax|retirement|insurance|other>",
  "request_type": "<general_info|personalized_advice|trade_recommendation|illegal_or_harmful|unknown>",
  "time_horizon": "<short|medium|long|unknown>",
  "risk_tolerance": "<low|medium|high|unknown>",
  "constraints": ["..."],
  "provided_numbers": {{"...": "..."}},
  "missing_critical": ["..."]
}}
"""
    obj = call_claude_json(INTAKE_SYSTEM, user)
    state["profile"] = obj

    state["trace"].append({
        "ts_utc": utc_now_iso(),
        "node": "intake",
        "event": "parsed_profile",
        "profile_summary": {
            "topic": obj.get("topic"),
            "request_type": obj.get("request_type"),
            "time_horizon": obj.get("time_horizon"),
            "risk_tolerance": obj.get("risk_tolerance"),
        }
    })
    return state

SUITABILITY_SYSTEM = (
    "You are a suitability gate for finance education. "
    "Your job: decide if the assistant should provide ONLY general educational guidance, "
    "ask for missing info, or refuse and redirect. Return ONLY JSON."
)

def _suitability_prompt(state: FinanceSuitabilityState) -> str:
    return f"""
You must choose one decision:
- ALLOW_GENERAL: safe to provide general educational guidance (no individualized advice).
- NEED_MORE_INFO: user asks for personalized guidance but suitability info is missing; ask targeted questions; do not give recommendations.
- REFUSE: request is illegal, unsafe, or requires regulated professional handling beyond scope; refuse + redirect.
- HUMAN_REVIEW: ambiguous policy risk; escalate.

Constraints:
- Never output personalized investment instructions or specific trades.
- If the user requests illegal activity (insider trading, market manipulation, tax evasion, money laundering), REFUSE.
- If user requests specific buy/sell/price targets, treat as personalized; likely NEED_MORE_INFO or REFUSE depending on risk.
- If the user provides insufficient suitability data (goals, horizon, risk, constraints), choose NEED_MORE_INFO.
- Always keep rationale concise and operational.

Inputs:
jurisdiction: {state['jurisdiction']}
profile_json: {json.dumps(state['profile'], ensure_ascii=False)}

Return JSON:
{{
  "decision": "ALLOW_GENERAL|NEED_MORE_INFO|REFUSE|HUMAN_REVIEW",
  "rationale": "<1-3 sentences>",
  "missing_info": ["..."],
  "refusal_reason": "<empty if not REFUSE>",
  "risk_flags": ["..."]
}}
"""

def suitability_check_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    # Primary assessment
    primary_raw = call_claude_json(SUITABILITY_SYSTEM, _suitability_prompt(state))
    primary = validate_suitability(primary_raw, "suitability_primary")
    state["suitability_primary"] = primary

    # Bounded consistency pass (max 2) — reduces single-shot misroutes
    check = primary
    for i in range(max(1, int(CONFIG["suitability_consistency_passes"])) - 1):
        check_raw = call_claude_json(
            SUITABILITY_SYSTEM,
            _suitability_prompt(state) + f"\n\nConsistency check pass: {i+1}. Re-evaluate independently."
        )
        check = validate_suitability(check_raw, f"suitability_check_pass_{i+1}")

    state["suitability_check"] = check

    # Hard decision rule: if mismatch, escalate to HUMAN_REVIEW
    if primary["decision"] != check["decision"]:
        final_decision: Decision = "HUMAN_REVIEW"
        rationale = f"Primary='{primary['decision']}' vs Check='{check['decision']}'"
        state["trace"].append({
            "ts_utc": utc_now_iso(),
            "node": "suitability_check",
            "event": "decision_mismatch_escalate",
            "details": rationale,
        })
    else:
        final_decision = primary["decision"]

    state["final_decision"] = final_decision

    state["trace"].append({
        "ts_utc": utc_now_iso(),
        "node": "suitability_check",
        "event": "final_decision",
        "final_decision": final_decision,
        "risk_flags": sorted(list(set(primary.get("risk_flags", []) + check.get("risk_flags", [])))),
    })
    return state

intake_node = AgentNode("intake", intake_node_fn)
suitability_node = AgentNode("suitability_check", suitability_check_node_fn)

print("Core nodes ready:", intake_node.name, suitability_node.name)


Core nodes ready: intake suitability_check


##7.RESPONSE NODES

###7.1.OVERVIEW

**Cell 7 — Response nodes + routing function (what it is doing and why it exists)**

Cell 7 translates decisions into controlled outputs and makes routing explicit. In many “LLM assistant” examples, the model both decides and speaks in one step, which is risky because it can drift across boundaries. Here, the decision and the response are separated. The suitability node produces a decision. The response nodes produce text consistent with that decision. This separation is a governance mechanism: it limits what each part of the system is allowed to do.

The response system prompt is intentionally narrow: return JSON containing only `response_text`. This keeps the output contract stable and reduces accidental leakage of extra fields that might confuse downstream consumers. Each response node then has a specific job. The refusal node writes a refusal that is clear, brief, and includes safer alternatives. The missing-info node asks targeted questions and explicitly avoids recommendations. The general-guidance node provides educational content with a boundary disclaimer. The human-review node handles ambiguity and requests clarification. In all cases, the node sets `terminated = True` and appends a trace event. This reinforces the notebook’s architectural theme: early termination is part of the design.
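The narrow output contract can also be enforced on the consumer side. The sketch below is stricter than the notebook's response nodes, which read `obj["response_text"]` directly: `parse_response_contract` is a hypothetical helper, and rejecting extra keys is an assumption made to illustrate the "no extra keys" rule.

```python
import json


def parse_response_contract(raw_text: str) -> str:
    """Extract the first JSON object in raw_text and enforce the one-key contract."""
    start = raw_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value starting at `start` and ignores trailing text.
    obj, _end = json.JSONDecoder().raw_decode(raw_text, start)
    if not isinstance(obj, dict) or set(obj) != {"response_text"}:
        raise ValueError("response must be an object with exactly one key: response_text")
    text = obj["response_text"]
    if not isinstance(text, str) or not text.strip():
        raise ValueError("response_text must be a non-empty string")
    return text.strip()


ok = parse_response_contract('Sure: {"response_text": "I can explain diversification."} (end)')
print(ok)

try:
    parse_response_contract('{"response_text": "Buy X", "ticker": "X"}')
    rejected = False
except ValueError:
    rejected = True
print("extra key rejected:", rejected)
```

Failing loudly on an extra key keeps a stray field such as a ticker from reaching downstream consumers unnoticed.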

The routing function `route_after_suitability` is the critical bridge between state and topology. It reads `final_decision` and returns the name of the next node. This is an important pedagogical point: routing is driven by state, not by prompt-text heuristics scattered across code. The graph is the routing engine; the router function is the policy mapping from state to edges.
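Because the router is a pure mapping from state to a node name, it can be checked exhaustively. The following sketch restates the mapping as a lookup table; `ROUTES` and `route` are hypothetical names, and the default to `human_review` for unexpected values mirrors the fall-through in `route_after_suitability`.

```python
from typing import Any, Dict

# Decision label -> node name, mirroring the branches of route_after_suitability.
ROUTES: Dict[str, str] = {
    "REFUSE": "refusal",
    "NEED_MORE_INFO": "request_more_info",
    "ALLOW_GENERAL": "general_guidance",
    "HUMAN_REVIEW": "human_review",
}


def route(state: Dict[str, Any]) -> str:
    """Map final_decision to the next node; anything unexpected goes to human review."""
    return ROUTES.get(state.get("final_decision"), "human_review")


for decision in ["REFUSE", "NEED_MORE_INFO", "ALLOW_GENERAL", "HUMAN_REVIEW", "TYPO"]:
    print(decision, "->", route({"final_decision": decision}))
```

A table-driven check like this doubles as a regression test: adding a decision label without adding a route shows up as an unexpected `human_review` result.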

Cell 7 also highlights the “hard branching” requirement of Notebook 2. Each decision leads to a different branch that terminates. There is no “maybe continue the conversation and see what happens.” This matches professional practice: if the request is unsuitable, you stop; if information is missing, you stop and ask; if the request is illegal, you refuse and stop. The architecture enforces professional discipline even when the model might be tempted to be helpful.

By the end of this cell, your system has a clear contract: decisions are produced upstream; responses are produced downstream; the graph routes deterministically based on state.


###7.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 7/10 — Response nodes (refusal / ask / general) + router (LangGraph conditional routing only)

RESPONSE_SYSTEM = (
    "You are a finance education assistant. Output must be a single JSON object with key 'response_text' only. "
    "No extra keys. No markdown. Be concise, professional, and classroom-appropriate."
)

def refusal_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    s = state["suitability_primary"]
    user = f"""
User message:
{state['user_message']}

Refusal reason:
{s.get('refusal_reason','')}

Write a refusal that:
- clearly says you can't help with that request
- briefly says why (1 sentence)
- offers safer alternatives (education/general concepts) and suggests a qualified professional if appropriate
Return JSON: {{ "response_text": "<text>" }}
"""
    obj = call_claude_json(RESPONSE_SYSTEM, user)
    state["response_text"] = obj["response_text"]
    state["terminated"] = True
    state["trace"].append({"ts_utc": utc_now_iso(), "node": "refusal", "event": "terminated"})
    return state

def request_more_info_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    s = state["suitability_primary"]
    missing = s.get("missing_info", [])
    user = f"""
User message:
{state['user_message']}

Ask for missing suitability information only (no recommendations). Use a short numbered list (max 6).
Missing_info list:
{json.dumps(missing, ensure_ascii=False)}

Return JSON: {{ "response_text": "<text>" }}
"""
    obj = call_claude_json(RESPONSE_SYSTEM, user)
    state["response_text"] = obj["response_text"]
    state["terminated"] = True  # N2: early termination after asking
    state["trace"].append({"ts_utc": utc_now_iso(), "node": "request_more_info", "event": "terminated"})
    return state

def general_guidance_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    user = f"""
User message:
{state['user_message']}

Provide general educational guidance only:
- explain concepts, options, and trade-offs
- avoid individualized instructions, tickers, or "do X now"
- include a brief boundary disclaimer (1 sentence)
Return JSON: {{ "response_text": "<text>" }}
"""
    obj = call_claude_json(RESPONSE_SYSTEM, user)
    state["response_text"] = obj["response_text"]
    state["terminated"] = True
    state["trace"].append({"ts_utc": utc_now_iso(), "node": "general_guidance", "event": "terminated"})
    return state

def human_review_node_fn(state: FinanceSuitabilityState) -> FinanceSuitabilityState:
    user = f"""
User message:
{state['user_message']}

We have an internal mismatch in suitability decisions.
Write a short message that:
- states we cannot proceed due to ambiguity
- asks the user to clarify intent in 1-2 questions
- offers general topics we can cover safely
Return JSON: {{ "response_text": "<text>" }}
"""
    obj = call_claude_json(RESPONSE_SYSTEM, user)
    state["response_text"] = obj["response_text"]
    state["terminated"] = True
    state["trace"].append({"ts_utc": utc_now_iso(), "node": "human_review", "event": "terminated"})
    return state

refusal_node = AgentNode("refusal", refusal_node_fn)
request_more_info_node = AgentNode("request_more_info", request_more_info_node_fn)
general_guidance_node = AgentNode("general_guidance", general_guidance_node_fn)
human_review_node = AgentNode("human_review", human_review_node_fn)

def route_after_suitability(state: FinanceSuitabilityState) -> str:
    # LangGraph conditional routing target names
    d = state["final_decision"]
    if d == "REFUSE":
        return "refusal"
    if d == "NEED_MORE_INFO":
        return "request_more_info"
    if d == "ALLOW_GENERAL":
        return "general_guidance"
    return "human_review"

print("Routing ready.")


Routing ready.


##8.GRAPH CONSTRUCTION

###8.1.OVERVIEW

**Cell 8 — Build and compile the LangGraph + visualize topology (what it is doing and why it exists)**

Cell 8 is where the notebook becomes a graph-driven system in the literal sense. You construct a `StateGraph` parameterized by the TypedDict state. This ensures that every node in the graph is conceptually operating on the same shared state schema. You then add each node with a clear name. In agentic engineering, node naming is not cosmetic: names become part of your artifact trail, your debugging vocabulary, and your supervisory documentation. A reviewer should be able to read the graph and understand what each node does.

After setting the entry point to `intake`, you add a direct edge to `suitability_check`. This enforces the workflow order: parse first, decide second. The central action is the conditional edge added from `suitability_check`. Here you are using LangGraph’s native conditional routing rather than manual branching logic. This is a strict requirement of the project because it teaches a scalable pattern: routing logic is part of the graph specification, not an informal convention.

The mapping dictionary passed to `add_conditional_edges` is the explicit policy-level connection between decision labels and node names. This is where topology becomes governance. If you want to prove that “REFUSE always terminates,” you can point directly to the graph spec: the REFUSE branch routes to `refusal`, and `refusal` routes to `END`.

The explicit `END` edges are then added for each terminal node. This matters for two reasons. First, it makes termination explicit and inspectable. Second, it prevents accidental continuation of the workflow, which is a common failure mode in conversational systems. In Notebook 2, early termination is the architectural dimension being taught, so you treat `END` as a first-class concept.
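The claim that every branch terminates can be checked mechanically from an edge list. The sketch below does not use LangGraph; `EDGES` is a hand-written copy of the topology described in this cell, and `all_paths` and `END_NODE` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

END_NODE = "__end__"  # assumption: label for the terminal node in this sketch

# Hand-written copy of the topology: intake -> suitability_check -> four terminal branches.
EDGES: Dict[str, List[str]] = {
    "intake": ["suitability_check"],
    "suitability_check": ["refusal", "request_more_info", "general_guidance", "human_review"],
    "refusal": [END_NODE],
    "request_more_info": [END_NODE],
    "general_guidance": [END_NODE],
    "human_review": [END_NODE],
}


def all_paths(edges: Dict[str, List[str]], start: str, end: str = END_NODE) -> List[List[str]]:
    """Enumerate every path from start to end; raise on cycles or dead ends."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(path)
            return
        successors = edges.get(node, [])
        if not successors:
            raise ValueError(f"dead end at {node!r}: no route to {end!r}")
        for nxt in successors:
            if nxt in path:
                raise ValueError(f"cycle detected via {nxt!r}")
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


paths = all_paths(EDGES, "intake")
for p in paths:
    print(" -> ".join(p))
```

If someone later adds an edge from a response node back to `intake`, the check raises instead of silently allowing a loop.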

Finally, the graph is compiled and the Mermaid representation is produced and rendered. The key pedagogical message is that the diagram is not optional. A visible graph is how students learn to reason about agentic workflows, and it is how professionals review them. If your topology changes, your diagram changes, and the change becomes visible. That is a governance artifact, not a screenshot.


###8.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 8/10 — Build LangGraph topology + compile + visualize (diagram must match exactly)

graph = StateGraph(FinanceSuitabilityState)

graph.add_node("intake", intake_node)
graph.add_node("suitability_check", suitability_node)
graph.add_node("refusal", refusal_node)
graph.add_node("request_more_info", request_more_info_node)
graph.add_node("general_guidance", general_guidance_node)
graph.add_node("human_review", human_review_node)

graph.set_entry_point("intake")
graph.add_edge("intake", "suitability_check")

graph.add_conditional_edges(
    "suitability_check",
    route_after_suitability,
    {
        "refusal": "refusal",
        "request_more_info": "request_more_info",
        "general_guidance": "general_guidance",
        "human_review": "human_review",
    },
)

# Explicit END node edges (early termination paths)
graph.add_edge("refusal", END)
graph.add_edge("request_more_info", END)
graph.add_edge("general_guidance", END)
graph.add_edge("human_review", END)

app = graph.compile()

# Mermaid must match topology exactly
try:
    mermaid = app.get_graph().draw_mermaid()
except Exception:
    # Fallback if draw_mermaid not available in this LangGraph version
    mermaid = app.get_graph().to_mermaid()

print(mermaid)
display_langgraph_mermaid(mermaid, height_px=460)


---
config:
  flowchart:
    curve: linear
---
graph TD;
	__start__([<p>__start__</p>]):::first
	intake(intake)
	suitability_check(suitability_check)
	refusal(refusal)
	request_more_info(request_more_info)
	general_guidance(general_guidance)
	human_review(human_review)
	__end__([<p>__end__</p>]):::last
	__start__ --> intake;
	intake --> suitability_check;
	suitability_check -.-> general_guidance;
	suitability_check -.-> human_review;
	suitability_check -.-> refusal;
	suitability_check -.-> request_more_info;
	general_guidance --> __end__;
	human_review --> __end__;
	refusal --> __end__;
	request_more_info --> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc



##9.EXECUTION

###9.1.OVERVIEW

**Cell 9 — Execute one run + inspect outputs and trace (what it is doing and why it exists)**

Cell 9 is the demonstration harness: it instantiates an initial state, invokes the graph, and prints the results. The critical lesson is that the system run is not “a chat completion.” It is a state transition process. You start with a fully specified state object that includes run identifiers, the model name, timestamps, and placeholders for outputs. This is important: the system never assumes hidden memory. Everything that matters exists in state, and if it is not in state, it does not exist for the workflow.

The `USER_MESSAGE` and `JURISDICTION` variables are intentionally simple levers for classroom experimentation. Changing the user message should produce different routing decisions. For example, asking for a specific trade should trigger the missing-information path, while an illegal request should trigger refusal. This is how you test the correctness of routing: not by reading the prompt, but by observing which node path executes.

The state contains both `suitability_primary` and `suitability_check` initialized to a safe default. This avoids missing keys during execution and enforces a stable schema even before the model fills in values. The `trace` starts with an explicit `__init__` event. This teaches a professional logging pattern: trace logs should record the start of the run so that incomplete runs can be distinguished from missing artifacts.
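One way to keep the safe defaults consistent is to build them from a factory, so the two assessments never share the same list objects. This is a sketch only: `empty_assessment` and `make_initial_state` are hypothetical helpers, and the run identifiers (`run_id`, `config_hash`, `model`, `started_utc`) are omitted for brevity.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def empty_assessment() -> Dict[str, Any]:
    """Fresh safe-default assessment; a new dict and new lists on every call."""
    return {
        "decision": "HUMAN_REVIEW",
        "rationale": "",
        "missing_info": [],
        "refusal_reason": "",
        "risk_flags": [],
    }


def make_initial_state(user_message: str, jurisdiction: str) -> Dict[str, Any]:
    """Build a state with safe defaults (run identifiers omitted in this sketch)."""
    return {
        "user_message": user_message,
        "jurisdiction": jurisdiction,
        "profile": {},
        "suitability_primary": empty_assessment(),
        "suitability_check": empty_assessment(),
        "final_decision": "HUMAN_REVIEW",
        "response_text": "",
        "terminated": False,
        "trace": [{
            "ts_utc": datetime.now(timezone.utc).isoformat(),
            "node": "__init__",
            "event": "start",
        }],
    }


state = make_initial_state("What is diversification?", "Mexico")
state["suitability_primary"]["risk_flags"].append("example_flag")
print(state["suitability_check"]["risk_flags"])  # unaffected by the append above
```

Defaulting every decision field to `HUMAN_REVIEW` means an interrupted run never looks like an approval.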

If the provider is overloaded, the notebook can terminate cleanly or switch to deterministic simulation (depending on configuration). The important point is that the graph invocation is bounded and auditable even under failure. This is realistic engineering: finance systems must handle dependency outages without becoming unreviewable.
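The bounded-retry idea can be shown in isolation. The sketch below is not the notebook's `call_claude_json`: `OverloadedError` is a hypothetical stand-in for a provider overload error, `call_with_backoff` is an invented helper, and the sleep function is injectable so the behavior can be verified without waiting.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class OverloadedError(Exception):
    """Hypothetical stand-in for a provider 'overloaded' (529) error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int,
    backoffs: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_attempts times, sleeping between overload failures."""
    if max_attempts < 1 or not backoffs:
        raise ValueError("max_attempts must be >= 1 and backoffs must be non-empty")
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # bound reached: surface the failure to the caller
            # Reuse the last backoff value if there are more attempts than entries.
            sleep(backoffs[min(attempt, len(backoffs) - 1)])
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> dict:
    # Fails twice with an overload error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OverloadedError("Error code: 529")
    return {"decision": "NEED_MORE_INFO"}


waits: List[float] = []
result = call_with_backoff(flaky, max_attempts=3, backoffs=[1, 2, 4], sleep=waits.append)
print(result, waits)
```

Only the overload error is retried; any other exception propagates immediately, which keeps logic errors from being masked by the retry loop.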

After invocation, the cell prints `FINAL_DECISION`, the response text, and the last few trace entries. This output format is pedagogical: it teaches students to inspect the decision, then the response, then the trace. In professional settings, the same habit is essential. You do not only read the answer; you confirm the routing decision that produced it and the evidence recorded along the way. Cell 9 therefore trains the correct inspection workflow: decision → output → provenance.


###9.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 9/10 — Execute one run (classroom demo) + capture final state deterministically
#            (with a clean early-termination fallback if the provider is overloaded)

USER_MESSAGE = "I want a specific stock to buy today to double my money quickly. What should I buy?"
JURISDICTION = "Mexico"

initial_state: FinanceSuitabilityState = {
    "run_id": RUN_ID,
    "config_hash": CONFIG_HASH,
    "model": CONFIG["model"],
    "started_utc": utc_now_iso(),

    "user_message": USER_MESSAGE,
    "jurisdiction": JURISDICTION,

    "profile": {},

    "suitability_primary": {
        "decision": "HUMAN_REVIEW",
        "rationale": "",
        "missing_info": [],
        "refusal_reason": "",
        "risk_flags": [],
    },
    "suitability_check": {
        "decision": "HUMAN_REVIEW",
        "rationale": "",
        "missing_info": [],
        "refusal_reason": "",
        "risk_flags": [],
    },
    "final_decision": "HUMAN_REVIEW",

    "response_text": "",

    "terminated": False,
    "trace": [{"ts_utc": utc_now_iso(), "node": "__init__", "event": "start"}],
}

try:
    final_state = app.invoke(initial_state)

except Exception as e:
    # Deterministic, auditable fallback: terminate cleanly if provider overload persists.
    # This is NOT used for logic errors; those should still raise.
    msg_txt = str(e)

    # Reuse the Cell 5 helper so overload detection is defined in one place.
    if not _is_overloaded(e):
        raise

    final_state = dict(initial_state)
    final_state["final_decision"] = "HUMAN_REVIEW"
    final_state["response_text"] = (
        "I can’t run the suitability gate right now due to temporary model capacity limits. "
        "If you paste your question again, I can proceed. Meanwhile, I can cover general concepts: "
        "risk/return, diversification, time horizon, and how to evaluate a claim like “double quickly” safely."
    )
    final_state["terminated"] = True
    final_state["trace"] = list(initial_state["trace"]) + [{
        "ts_utc": utc_now_iso(),
        "node": "__fallback__",
        "event": "provider_overload_terminated",
        "error": msg_txt[:3000],
    }]

print("FINAL_DECISION:", final_state["final_decision"])
print("\n--- RESPONSE ---\n")
print(final_state["response_text"])
print("\n--- TRACE (last 3) ---")
for t in final_state["trace"][-3:]:
    print(t)


FINAL_DECISION: NEED_MORE_INFO

--- RESPONSE ---

I can't recommend specific stocks, but I can help you think through whether this strategy fits your situation. Please share:

1. How much capital are you planning to invest?
2. What's your investment experience level?
3. Do you have an emergency fund covering 3-6 months of expenses?
4. What are your current debts or financial obligations?
5. What's your actual risk tolerance—could you handle losing 50% of this investment?
6. What's your investment timeline and broader financial goals?

Note: Strategies seeking quick doubles typically involve high risk. Understanding your full financial picture is essential before pursuing any investment approach.

--- TRACE (last 3) ---
{'ts_utc': '2026-02-18T15:28:24.563974+00:00', 'node': 'intake', 'event': 'parsed_profile', 'profile_summary': {'topic': 'trading', 'request_type': 'trade_recommendation', 'time_horizon': 'short', 'risk_tolerance': 'high'}}
{'ts_utc': '2026-02-18T15:28:31.962681+00:00', 

##10.AUDIT BUNDLE

###10.1.OVERVIEW

**Cell 10 — Export artifacts (run_manifest, graph_spec, final_state) (what it is doing and why it exists)**

Cell 10 turns the run into an audit object. Without artifact export, the notebook produces an output that is visible only in the notebook UI and disappears when the session ends. That is unacceptable for professional workflows and insufficient for a governance-first curriculum. This cell exports three JSON artifacts that collectively capture configuration, topology, and outcome.

First, `build_graph_spec` produces a minimal but truthful representation of the graph. It includes the entry point, the node list, all edges (direct and conditional), and the Mermaid diagram that was rendered. This is crucial because the Mermaid diagram alone is not a machine-readable spec, and code alone is not a compact topology artifact. The graph spec is what a reviewer can store, version, compare, and sign off on. It makes routing explicit in a way that policy and risk functions can understand.
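Because the spec is plain data, a reviewer can also check it mechanically before signing off. The following is a minimal sketch, not part of the notebook: `validate_graph_spec` and `example_spec` are hypothetical names, and the checks assume only the dictionary shape that `build_graph_spec` returns (`nodes`, `edges`, `entry_point`, `end_node`, `routing.decisions`).

```python
from typing import Any, Dict, List, Set


def validate_graph_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a graph spec (empty list means none found)."""
    problems: List[str] = []
    nodes = set(spec["nodes"])
    end = spec["end_node"]

    if spec["entry_point"] not in nodes:
        problems.append(f"entry point {spec['entry_point']!r} is not a declared node")

    # Every edge must start at a declared node and end at a declared node or the end node.
    for edge in spec["edges"]:
        if edge["from"] not in nodes:
            problems.append(f"edge from undeclared node {edge['from']!r}")
        if edge["to"] not in nodes and edge["to"] != end:
            problems.append(f"edge to undeclared node {edge['to']!r}")

    # Every routing decision must label exactly one conditional edge.
    labels = [e["when"] for e in spec["edges"] if e["type"] == "conditional"]
    for decision in spec["routing"]["decisions"]:
        if labels.count(decision) != 1:
            problems.append(f"decision {decision!r} labels {labels.count(decision)} edges")

    # Every declared node must be able to reach the end node.
    successors: Dict[str, Set[str]] = {}
    for edge in spec["edges"]:
        successors.setdefault(edge["from"], set()).add(edge["to"])
    for node in sorted(nodes):
        seen: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(successors.get(current, ()))
        if end not in seen:
            problems.append(f"node {node!r} cannot reach {end!r}")

    return problems


# A reduced, illustrative spec in the same shape as the exported one.
example_spec = {
    "entry_point": "intake",
    "end_node": "END",
    "nodes": ["intake", "suitability_check", "refusal", "general_guidance"],
    "edges": [
        {"from": "intake", "to": "suitability_check", "type": "direct"},
        {"from": "suitability_check", "to": "refusal", "type": "conditional", "when": "REFUSE"},
        {"from": "suitability_check", "to": "general_guidance", "type": "conditional", "when": "ALLOW_GENERAL"},
        {"from": "refusal", "to": "END", "type": "direct"},
        {"from": "general_guidance", "to": "END", "type": "direct"},
    ],
    "routing": {"decisions": ["ALLOW_GENERAL", "REFUSE"]},
}

print(validate_graph_spec(example_spec))  # prints []
```

A check like this only confirms that the stored spec is internally consistent. It does not prove that the spec matches the compiled graph, which still has to be established by review.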

Second, `run_manifest` captures run metadata: run ID, start and finish timestamps, configuration, config hash, environment fingerprint, and the list of artifact filenames. The manifest answers the operational questions that auditors and instructors ask first: “What ran?”, “With what settings?”, “In what environment?”, and “What artifacts were produced?” The presence of `config_hash` is particularly important because it lets you detect configuration changes across runs without manually diffing settings. It fingerprints the configuration only, so code changes still need their own version control.
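To make the role of a configuration hash concrete, here is one common way to compute one: serialize the configuration to canonical JSON (sorted keys, fixed separators) and hash the bytes with SHA-256. This is an illustrative sketch under that assumption. The notebook's own `CONFIG_HASH` is computed in an earlier cell and may use a different recipe, and `stable_config_hash` and the sample configs below are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def stable_config_hash(config: Dict[str, Any]) -> str:
    """Hash a JSON-serializable config so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configs; the keys and values are placeholders.
config_a = {"model": "example-model", "suitability_consistency_passes": 2}
config_b = {"suitability_consistency_passes": 2, "model": "example-model"}
config_c = {"model": "example-model", "suitability_consistency_passes": 3}

# Same settings in a different key order hash identically.
print(stable_config_hash(config_a) == stable_config_hash(config_b))  # True
# Changing a single setting changes the hash.
print(stable_config_hash(config_a) == stable_config_hash(config_c))  # False
```

Comparing two manifests then reduces to comparing two short strings, which is easy to automate in a review script.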

Third, `final_state.json` exports the terminal state, including the extracted profile, suitability decisions, final routing choice, response text, termination flag, and trace events. This is the core evidence bundle. It allows a reviewer to reconstruct what the system believed, what it decided, and what it said, without rerunning the notebook or trusting ephemeral outputs.

The cell writes these files under fixed names in the working directory and prints their absolute paths, then prints the `terminated` flag as a simple sanity check that the run reached termination. Pedagogically, this reinforces the principle that an agentic workflow must be inspectable after the fact. In finance, the ability to review decisions is not optional; it is part of control. Cell 10 is therefore not “saving files.” It is implementing the minimum viable audit trail for an LLM-driven routing system.
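A reviewer who receives only the exported files can apply the same idea after the fact. The sketch below reloads a final-state file and checks a few invariants suggested by the state fields used in this notebook (`final_decision`, `terminated`, `response_text`, `trace`). The helper `check_final_state` and the sample state are hypothetical, and the invariant list is an assumption about what a reviewer might care about, not a rule the notebook enforces.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

ALLOWED_DECISIONS = {"ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW"}


def check_final_state(path: str) -> List[str]:
    """Reload an exported final state and return any violated invariants."""
    with open(path, "r", encoding="utf-8") as f:
        state: Dict[str, Any] = json.load(f)

    problems: List[str] = []
    if state.get("final_decision") not in ALLOWED_DECISIONS:
        problems.append("final_decision is not one of the routing decisions")
    if state.get("terminated") is not True:
        problems.append("run did not record termination")
    if not state.get("response_text"):
        problems.append("response_text is empty")
    if not state.get("trace"):
        problems.append("trace is empty")
    return problems


# Illustrative state written to a temporary file, so the sketch runs on its own.
sample_state = {
    "final_decision": "NEED_MORE_INFO",
    "response_text": "Please share your time horizon.",
    "terminated": True,
    "trace": [{"node": "__init__", "event": "start"}],
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "final_state.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample_state, f)
    print(check_final_state(sample_path))  # prints []
```

Returning a list of problems rather than raising on the first one keeps the check useful as a report: a reviewer sees every violated invariant in a single pass.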


###10.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 10/10 — Export required artifacts: run_manifest.json, graph_spec.json, final_state.json

def build_graph_spec(mermaid_code: str) -> Dict[str, Any]:
    # Minimal, topology-truthful spec (no inference, no hidden edges)
    return {
        "name": "N2_suitability_refusal_boundary",
        "entry_point": "intake",
        "end_node": "END",
        "nodes": ["intake", "suitability_check", "refusal", "request_more_info", "general_guidance", "human_review"],
        "edges": [
            {"from": "intake", "to": "suitability_check", "type": "direct"},
            {"from": "suitability_check", "to": "refusal", "type": "conditional", "when": "REFUSE"},
            {"from": "suitability_check", "to": "request_more_info", "type": "conditional", "when": "NEED_MORE_INFO"},
            {"from": "suitability_check", "to": "general_guidance", "type": "conditional", "when": "ALLOW_GENERAL"},
            {"from": "suitability_check", "to": "human_review", "type": "conditional", "when": "HUMAN_REVIEW"},
            {"from": "refusal", "to": "END", "type": "direct"},
            {"from": "request_more_info", "to": "END", "type": "direct"},
            {"from": "general_guidance", "to": "END", "type": "direct"},
            {"from": "human_review", "to": "END", "type": "direct"},
        ],
        "routing": {
            "function": "route_after_suitability",
            "decisions": ["ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW"],
            "bounded_consistency_passes": int(CONFIG["suitability_consistency_passes"]),
        },
        "mermaid": mermaid_code,
        "mermaid_version_pinned": MERMAID_VERSION,
    }

run_manifest = {
    "run_id": RUN_ID,
    "started_utc": initial_state["started_utc"],
    "finished_utc": utc_now_iso(),
    "config": CONFIG,
    "config_hash": CONFIG_HASH,
    "env_fingerprint": env_fingerprint(),
    "artifacts": ["run_manifest.json", "graph_spec.json", "final_state.json"],
}

graph_spec = build_graph_spec(mermaid)
final_state_export = final_state  # already JSON-serializable primitives

with open("run_manifest.json", "w", encoding="utf-8") as f:
    json.dump(run_manifest, f, ensure_ascii=False, indent=2)

with open("graph_spec.json", "w", encoding="utf-8") as f:
    json.dump(graph_spec, f, ensure_ascii=False, indent=2)

with open("final_state.json", "w", encoding="utf-8") as f:
    json.dump(final_state_export, f, ensure_ascii=False, indent=2)

print("Wrote artifacts:")
for p in ["run_manifest.json", "graph_spec.json", "final_state.json"]:
    print(" -", os.path.abspath(p))

print("\nSanity check: END termination =", final_state_export["terminated"])


Wrote artifacts:
 - /content/run_manifest.json
 - /content/graph_spec.json
 - /content/final_state.json

Sanity check: END termination = True


##11.CONCLUSION

This notebook closes with a simple but non-negotiable professional lesson: in finance, the most important output is often not an answer, but a **decision to stop**. The exercise is framed as “suitability and refusal,” yet what it really teaches is the discipline of building systems that treat boundaries as first-class mechanisms. In uncontrolled LLM usage, the model’s helpfulness becomes a liability: it can blur scope, infer missing facts, and slide into personalized recommendations even when the user has provided no suitability context. Notebook 2 reverses that dynamic. It does not try to be clever. It tries to be reviewable. The system is engineered so that the safest behavior is also the default behavior: extract structured context, apply a gate, route deterministically, terminate explicitly, and export proof.

For financial professionals, this is not theoretical. The failure mode you are defending against is common and expensive: a message that looks like “general education” to an engineer can look like a recommendation to a client, a regulator, or a court. The boundary between explanation and advice is not a stylistic nuance; it is a compliance surface. The notebook demonstrates how to operationalize that surface in a way that is legible to supervision. The graph makes visible what policy text often hides: which paths exist, which do not, and which outcomes are possible. If the system refuses, it cannot accidentally wander into a recommendation because the topology does not permit it. If suitability information is missing, the system asks targeted questions and stops. If the primary suitability decision and the consistency pass disagree, the system escalates rather than guessing. This is the posture of real finance organizations: ambiguity is not “handled,” it is routed to review.
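The "escalate on disagreement" rule is small enough to state as code. The following is a hedged sketch of the idea only: `reconcile_decisions` is a hypothetical helper, and the notebook's actual reconciliation logic lives in earlier cells and may differ in detail. Its inputs correspond to the `decision` fields of `suitability_primary` and `suitability_check`.

```python
ALLOWED_DECISIONS = ("ALLOW_GENERAL", "NEED_MORE_INFO", "REFUSE", "HUMAN_REVIEW")


def reconcile_decisions(primary: str, check: str) -> str:
    """Return the agreed decision, or escalate when the two passes do not agree."""
    # An unrecognized label is treated as ambiguity, never as permission.
    if primary not in ALLOWED_DECISIONS or check not in ALLOWED_DECISIONS:
        return "HUMAN_REVIEW"
    # Agreement passes through unchanged; any disagreement escalates.
    return primary if primary == check else "HUMAN_REVIEW"


print(reconcile_decisions("REFUSE", "REFUSE"))                # REFUSE
print(reconcile_decisions("ALLOW_GENERAL", "NEED_MORE_INFO"))  # HUMAN_REVIEW
print(reconcile_decisions("ALLOW_GENERAL", "BUY_NOW"))         # HUMAN_REVIEW
```

The design choice worth noticing is that there is no tie-breaking: the function never picks the "more permissive" or "more likely" label, so a misroute in either pass can only push the run toward review.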

Architecturally, the notebook’s most important contribution is the shift from conversational improvisation to **state-driven control**. The TypedDict schema is not decoration; it is the system’s constitution. It forces the workflow to declare what it knows, what it does not know, and what it is allowed to do next. The AgentNode abstraction then turns that constitution into modular machinery: each node is a small, testable state transformer rather than a sprawling prompt. Conditional routing belongs to LangGraph, not to scattered if/else statements or prompt heuristics. The END node is explicit, because termination is part of governance. Together, these design choices teach a transferable pattern: you can replace “personal finance suitability” with MNPI gating, trade surveillance triage, disclosure risk screening, product complexity checks, or internal policy approvals, and the architecture still holds. The policy changes, but the topology remains intelligible.

The bounded loops are equally instructive. They show how to incorporate robustness without creating runaway systems. The consistency pass is not an attempt to “outsmart” the model; it is a controlled method for reducing single-pass misroutes at a decision boundary. The retry mechanism is not “resilience theater”; it is a practical acknowledgment that external services can fail under load, and that a professional system must handle failure in a deterministic, auditable way. The optional simulator mode extends the same principle: the notebook remains runnable and teachable even when the provider is overloaded, and the artifacts explicitly label simulated outputs. This is exactly the kind of engineering maturity finance teams need: continuity without deception, and reliability without hiding uncertainty.
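A bounded retry of the kind described here can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's implementation: `ProviderOverloaded`, `call_with_bounded_retries`, and `flaky_call` are hypothetical names, and the exponential backoff schedule is an arbitrary choice. The point is that the loop has a hard attempt limit, records every attempt, and returns an explicit "gave up" outcome instead of looping or failing silently.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class ProviderOverloaded(Exception):
    """Stand-in for a provider capacity error (hypothetical)."""


def call_with_bounded_retries(
    call: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Try `call` at most `max_attempts` times; return (result or None, attempt trace)."""
    trace: List[Dict[str, Any]] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
            trace.append({"attempt": attempt, "event": "ok"})
            return result, trace
        except ProviderOverloaded as exc:
            # Only capacity errors are retried; any other exception propagates.
            trace.append({"attempt": attempt, "event": "overloaded", "error": str(exc)})
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))
    trace.append({"attempt": max_attempts, "event": "gave_up"})
    return None, trace


# Demo: a call that fails twice with a capacity error and then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ProviderOverloaded("Error code: 529")
    return "NEED_MORE_INFO"


# The sleep function is injected so the demo does not actually wait.
result, trace = call_with_bounded_retries(flaky_call, max_attempts=3, sleep=lambda s: None)
print(result)                        # NEED_MORE_INFO
print([t["event"] for t in trace])   # ['overloaded', 'overloaded', 'ok']
```

When the helper returns `None`, the caller can route to a safe terminal outcome such as `HUMAN_REVIEW` and append the attempt trace to the run's audit trail, which keeps the failure visible rather than hidden.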

Most importantly, Notebook 2 clarifies what “agentic” means in this series. It does not mean “multiple clever prompts.” It means a system whose behavior emerges from **topology + state + routing**, not from the model’s conversational momentum. The language model is used as a component that produces structured outputs under strict schema constraints. The system around it is what enforces professional discipline: clear separation of intake, gatekeeping, response generation, and termination; explicit artifact export; and traceable transitions. This is the difference between a demo assistant and an auditable workflow: the latter can be inspected, reasoned about, and governed.

If you take one operational takeaway forward to the rest of the series, it is this: before adding tools, retrieval, committees, or event loops, you must first learn to build a system that can say **no** correctly. A finance-grade agentic architecture is not judged by how persuasive its answers are. It is judged by whether it routes safely under missing information, whether it refuses when required, whether it escalates on ambiguity, and whether it can prove what it did after the fact. Notebook 2 establishes that foundation. Everything that follows—credit memo critique loops, backtest wrappers, liquidity regime machines, committee aggregation, and supervised multi-agent orchestration—will be stronger because the boundary discipline is already encoded in the graph.
