#**CHAPTER 3.CREDIT MEMO AGENT WITH CRITIQUE LOOP**
---

##0.CONTEXT

This notebook is a deliberately narrow, professional teaching case: it demonstrates how to structure a credit memo workflow as a **state-driven agentic system** rather than a one-shot “prompt → answer” interaction. The finance objective is straightforward and familiar to anyone who has sat near a credit committee: produce a memo that is **decision-support**, not storytelling. The engineering objective is equally strict: show how a memo becomes materially better when the system is forced to confront **evidence gaps** explicitly and repeatedly, using **deterministic routing** and a **bounded critique loop**. In other words, the notebook is not trying to be clever. It is trying to be reviewable, auditable, and reproducible — which is precisely what real credit processes require.

The example case is synthetic by design. That choice is not a gimmick; it is a governance decision. In a classroom, we want the mechanism — the architecture — to be the lesson, not the incidental complexity of confidential financial statements, personal data, or proprietary underwriting templates. Synthetic facts let us control what is “known,” what is “missing,” and what must remain “open,” so we can observe how the graph routes when information is incomplete. The case includes just enough detail to resemble a realistic mid-market request (borrower context, requested amount, tenor, purpose, limited financial snapshot, collateral hints, concentration hints), and then it intentionally withholds the items that credit professionals constantly chase: debt schedules, covenant headroom, AR aging, utilization KPIs, insurance, appraisals, contract terms. The point is not to simulate a full underwriting file. The point is to force a disciplined separation between **facts provided**, **assumptions**, and **open items** — and to do it in a system that can be inspected after the run.

In real life, the difference between a memo that “sounds plausible” and a memo that is **committee-ready** is almost always the handling of missing information. A junior analyst can draft a clean narrative, compute a few ratios, and paste in a risk list. What credit committees actually pay for is the ability to see what is not known, what is inferred, and what is still disputed — and to see that separation in a way that survives review. Most credit blowups are not caused by ignorance of finance fundamentals; they are caused by ungoverned leaps: assumptions masquerading as facts, missing diligence that no one escalated, and structural weaknesses that were never translated into covenants. The memo is the interface between analysis and decision. If the interface lies, the decision will be wrong even if the underlying modeling is sophisticated.

This is why evidence discipline matters as a first-class architectural concept. In credit, you are always underwriting a **distribution**: the borrower’s cash flow under operational variation, pricing variation, macro variation, customer behavior, and refinancing conditions. But a memo is not a Monte Carlo output. It is an argument with constraints. The constraints are practical: limited time, incomplete data, and multiple stakeholders with different incentives. A committee needs to know whether the memo is an honest representation of the underwriting state. That honesty is not a moral preference; it is an operational requirement. A bank’s ability to explain why it made a decision is part of its ability to defend that decision later — to auditors, regulators, internal review, litigation, or simply the next committee when the borrower comes back for a waiver.

The notebook’s workflow models the most common failure mode in generative drafting: the model’s tendency to fill gaps with fluent guesswork. In normal consumer contexts, that tendency is annoying. In credit, it is unacceptable. A credit memo cannot “average out” uncertainty by sounding confident. It must do the opposite: surface uncertainty explicitly, prioritize open items, and anchor recommendations to what is actually known. This is why the critique node exists as a structural component rather than an optional human habit. The critique node acts like a committee reviewer who asks: “Which claims are unsupported? Which diligence items are missing? What would block approval? What covenants follow from the real risk profile rather than generic templates?” That critique is not just commentary — it is a state update that changes the routing and the next draft.

The system therefore treats drafting as a controlled process, not a creative act. Drafting produces an initial memo; critique produces structured gaps and questions; a dedicated evidence-gaps node marks the state explicitly; revision integrates the critique without inventing facts; and finalization terminates either when quality is acceptable or when the bounded iteration limit is reached. That bounded loop is not a teaching convenience; it is a governance control. In real workflows, endless revision is not possible. You have timeboxes, committee deadlines, and escalation triggers. A bounded loop forces the system to decide: do we have enough to proceed, or do we stop and escalate? This notebook encodes that discipline into the graph itself.

Why an agentic architecture instead of a single prompt? Because a credit memo is not a single deliverable; it is a sequence of decisions under constraint. The “deliverable” is the memo, but the real product is the decision pathway: how the team moved from incomplete facts to a recommendation that can be defended. A one-shot prompt collapses that pathway into an opaque blob. It may produce a memo-like output, but it provides no structural guarantees about evidence handling, and it provides no durable artifacts that auditors or reviewers can inspect. A graph-based architecture, by contrast, makes the workflow explicit: nodes are responsibilities, edges are routing logic, and state is the shared contract. If something goes wrong, you can locate where it happened: drafting, critique, revision, or finalization.

This is the deeper pedagogical point: in professional finance, the problem is rarely “can we generate text.” The problem is “can we generate text in a way that is reviewable, reproducible, and controlled.” In other words, the problem is not language; it is governance. A LangGraph topology is a way to teach governance as code. Instead of relying on “be careful” instructions, you enforce carefulness through structure: explicit state fields for evidence gaps and assumptions, strict JSON returns for critique objects, deterministic routing based on state values, and explicit termination conditions. This mirrors how real credit organizations operate: policies, checklists, escalations, and stage gates exist because humans under time pressure will otherwise drift toward convenience.

The example also illustrates why “state drives routing, not text heuristics alone” is a meaningful principle. It is tempting to route based on keywords in the memo or the critique narrative. That is fragile and unreviewable. Instead, the critique node returns a typed structure: a list of evidence gaps, a list of questions to verify, a list of assumptions, and a binary quality decision. Routing then depends on these fields and on the bounded iteration counter. This produces a system whose behavior can be tested and reasoned about. You can create unit tests around routing decisions. You can audit runs by reading `final_state.json`. You can compare topologies across notebooks as the course evolves.
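A minimal sketch of what state-driven routing can look like. The function name `route_after_critique`, the `RoutingView` type, and the `quality_ok` field are assumptions made for illustration; the notebook's actual router and critique schema may differ. The node names are the ones this chapter uses.

```python
from typing import Any, Dict, List, TypedDict


class RoutingView(TypedDict):
    # Hypothetical subset of the workflow state, used only for this illustration.
    critique: Dict[str, Any]
    evidence_gaps: List[str]
    iter_count: int
    max_iters: int


def route_after_critique(state: RoutingView) -> str:
    """Return the name of the next node using only typed state fields."""
    # Hard stop: the iteration budget is spent, so finalize regardless of quality.
    if state["iter_count"] >= state["max_iters"]:
        return "FINALIZE"
    # The reviewer accepted the draft: no further revision is needed.
    if state["critique"].get("quality_ok") is True:
        return "FINALIZE"
    # Otherwise make the gaps an explicit milestone before revising.
    if state["evidence_gaps"]:
        return "EVIDENCE_GAPS"
    return "REVISE_MEMO"


# Routing is a pure function of state, so it can be tested without a model call.
base: RoutingView = {
    "critique": {"quality_ok": False},
    "evidence_gaps": ["AR aging"],
    "iter_count": 0,
    "max_iters": 2,
}
assert route_after_critique(base) == "EVIDENCE_GAPS"
assert route_after_critique({**base, "evidence_gaps": []}) == "REVISE_MEMO"
assert route_after_critique({**base, "critique": {"quality_ok": True}}) == "FINALIZE"
assert route_after_critique({**base, "iter_count": 2}) == "FINALIZE"
```

Because the router never inspects memo text, each branch can be exercised with a small hand-built state, which is what makes the routing behavior reviewable.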

In real underwriting, a similar separation appears as a set of structured artifacts: a diligence tracker, a covenant grid, an exceptions list, and a set of open items tied to owners and timelines. Those artifacts exist precisely because committees do not trust narrative alone. Narrative is necessary, but it is not sufficient. The memo is a narrative wrapper around a structured process. This notebook encodes that idea directly: it uses narrative outputs (the memo) but it also produces structured outputs (the critique JSON) and structured run artifacts (graph spec, manifest, final state). In practice, this is what enables accountability. When an approval goes wrong, the question is not “who wrote the memo.” The question is “what did we know, what did we assume, what did we miss, and why did the process allow that miss.”

The workflow also demonstrates an important real-world constraint: models will occasionally violate formatting requirements. In regulated or institutional settings, you cannot treat that as “close enough.” You must have **bounded repair mechanisms**. That is why the critique stage includes a strict JSON contract and a bounded retry that re-asks for JSON if the model returns prose. The retry is bounded because unlimited retries are operationally dangerous (cost, latency, unpredictability) and because governance requires explicit failure modes. If the model cannot comply after the bounded retry, the workflow should fail loudly and stop. That is not a weakness. That is a feature: it forces human intervention rather than silently degrading into unreviewable behavior.

Finally, this notebook is part of an arc. Earlier notebooks introduced conditional retry loops and suitability boundaries. Here, the architectural dimension added is the **structured critique loop**: a controlled self-correction mechanism that does not optimize for eloquence, but for evidence discipline. Later notebooks will add tools, regime machines, committees, routers, and supervisors. But the lesson here is foundational: before you build complex multi-agent systems, you must first learn how to make a simple two-role workflow behave like a professional process. Credit memo drafting is an ideal vehicle for that lesson because it is familiar, high-stakes, and unforgiving of hidden assumptions.

If you take only one principle from this introduction, it should be this: **a credit memo is a governed artifact**. The agentic architecture exists to preserve the governance properties that human credit processes evolved to enforce: separation of facts and assumptions, explicit evidence gaps, bounded revision cycles, and auditable traces. The model is not the decision-maker. The model is a drafting component inside a controlled system. The system — state, routing, bounded loops, and artifacts — is what turns generative output into something a real committee can use.


##1.LIBRARIES AND ENVIRONMENT

**CELL 1/10 — Install + core imports (Colab-ready, Chapter-2 style)**  
This cell establishes the notebook’s execution substrate and is intentionally treated as a **governance control**, not a convenience step. In agentic systems, “it runs on my machine” is not an acceptable standard; a workflow that depends on ambient Colab packages will eventually fail in front of students, reviewers, or production stakeholders. The explicit installs and uninstalls create a predictable environment for three interacting subsystems: **LangGraph** (workflow orchestration), **Anthropic SDK** (model transport), and **httpx/httpcore** (the HTTP layer underlying the SDK). Mismatched versions can break client initialization before the graph is even built; a typical symptom is an error about an unexpected `proxies` argument when the client is constructed. This is why we pin `httpx==0.27.2`: it aligns with SDK expectations in the Colab image.  

From a pedagogy perspective, this cell teaches a critical professional lesson: **agent reliability begins with dependency hygiene**. The warning messages from pip about other libraries are not the main concern; what matters is that the libraries we use in this notebook are coherent and that conflicting packages (like `langgraph-prebuilt`) are removed if they introduce incompatible constraints. You are not trying to “win” dependency resolution globally; you are building a controlled lab environment for a specific architecture.  

This cell also sets determinism controls (`random.seed`, `PYTHONHASHSEED`) because later nodes depend on stable routing and stable artifact generation. `random.seed` takes effect immediately; `PYTHONHASHSEED` is read when an interpreter starts, so assigning it here governs child processes rather than the already-running kernel. The notebook is designed to be rerun in a classroom and yield the same topology, the same routing conditions, and the same artifact structure. The version printouts are not cosmetic; they are a minimal audit hook. In real finance workflows, a run manifest without package versions is incomplete because model behavior and transport behavior can change with library versions. Here we practice the habit: print versions early, fail early if a required pin is missing, and move forward only if the substrate is correct. That is the difference between “a demo” and “a governed laboratory.”
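The scope of `PYTHONHASHSEED` can be checked directly, since Python reads the variable at interpreter startup. The sketch below is illustrative only: `child_string_hash` is a hypothetical helper that is not part of the notebook, and it assumes a working interpreter at `sys.executable`. It launches fresh interpreters with the variable set and compares the string hashes they report.

```python
import os
import subprocess
import sys


def child_string_hash(seed: str) -> str:
    """Start a fresh interpreter with PYTHONHASHSEED set and return hash('credit') as text."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('credit'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two fresh interpreters started with the same seed agree on string hashes.
first = child_string_hash("7")
second = child_string_hash("7")
print("stable across processes:", first == second)
```

In practice this matters only for code that depends on `hash()` values or on set iteration order; list- and dict-based state, as used in this notebook, is ordered regardless of the seed.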


In [1]:
# CELL 1/10 — Install + clean environment + core imports (MAX STABILITY)


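# --- CLEAN CONFLICTING PACKAGES ---
# Remove langgraph-prebuilt if present; the hard validation below rejects it.
!pip -q uninstall -y langgraph-prebuilt

# --- PINNED INSTALLS ---
# Note: anthropic is a minimum version, not an exact pin; the resolved version is printed below.
!pip -q install "httpx==0.27.2" "httpcore==1.0.5"
!pip -q install "langgraph==0.2.39" "langchain==0.3.14" "langchain-core==0.3.40" "anthropic>=0.34.0"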

# --- CORE IMPORTS ---
import json, os, sys, platform, hashlib, uuid, re, time, random
import datetime as _dt
from typing import TypedDict, Literal, Dict, Any, List, Optional, Callable

from langgraph.graph import StateGraph, END
from google.colab import userdata
from IPython.display import HTML, display

# --- DETERMINISM ---
random.seed(7)
# PYTHONHASHSEED is read at interpreter startup; setting it here only affects child processes.
os.environ["PYTHONHASHSEED"] = "7"

# --- VERSION CHECK ---
import importlib.metadata as md

def _ver(pkg: str) -> str:
    try:
        return md.version(pkg)
    except Exception:
        return "missing"

VERSIONS = {
    "httpx": _ver("httpx"),
    "httpcore": _ver("httpcore"),
    "langgraph": _ver("langgraph"),
    "langchain": _ver("langchain"),
    "langchain-core": _ver("langchain-core"),
    "anthropic": _ver("anthropic"),
    "langgraph-prebuilt": _ver("langgraph-prebuilt"),
}

print("VERSIONS:", VERSIONS)

# --- HARD VALIDATION ---
if not VERSIONS["httpx"].startswith("0.27."):
    raise RuntimeError(f"httpx pin failed. Expected 0.27.x, got {VERSIONS['httpx']}.")

if VERSIONS["langgraph-prebuilt"] != "missing":
    raise RuntimeError("langgraph-prebuilt is still installed; it conflicts with langchain-core pin.")

if VERSIONS["anthropic"] == "missing":
    raise RuntimeError("anthropic package missing after install.")

print("OK: Clean environment ready.")


VERSIONS: {'httpx': '0.27.2', 'httpcore': '1.0.5', 'langgraph': '0.2.39', 'langchain': '0.3.14', 'langchain-core': '0.3.40', 'anthropic': '0.81.0', 'langgraph-prebuilt': 'missing'}
OK: Clean environment ready.


##2.CONFIGURATION

###2.1.OVERVIEW

**CELL 2/10 — Configuration + deterministic run identity + fingerprints**  
This cell defines the **contract of the run**: what model is allowed, what limits apply, and how the run is identified and audited. The first hard rule is the model lock: `claude-haiku-4-5-20251001`. The purpose is not branding; it is reproducibility. When a credit memo changes because the model changed, you need to know that it changed. That is why the model name is carried into state and exported in `run_manifest.json`.  

Next, we define bounds and policies that will shape system behavior. `critique_max_iters` parameterizes the architectural feature introduced in this notebook: a bounded critique loop. The loop bound belongs in configuration because it is a governance decision. A committee process cannot iterate forever; it must either converge or escalate. The provider retry settings are also bounded and deterministic. Provider overload (HTTP 529) happens in real life; the correct posture is not “try forever,” but “try a small number of times with known backoff, then fail loudly.”  

The run identity (`RUN_ID`) and config hash (`CONFIG_HASH`) are the keys to auditability. `RUN_ID` is a fresh UUID that distinguishes one execution from another; `CONFIG_HASH` is deterministic, so two runs with identical configuration share the same hash. The config hash is especially important: it makes accidental drift visible. If you rerun with different limits or different loop bounds, the hash changes, and you can see that the run is not comparable. The environment fingerprint is intentionally minimal here, but it establishes the pattern: record the runtime context at the moment the run begins.  

Pedagogically, this cell teaches that professional systems are driven by **explicit configuration** rather than hidden defaults. Defaults are invisible dependencies. In finance, invisible dependencies become unreviewable risk. By making configuration a first-class object, you enable clear discussion: “Why two critique iterations?” “Why temperature 0?” “Why 900 tokens?” Those are not arbitrary choices; they are design parameters tied to governance goals.
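A short sketch of how a configuration hash exposes drift, mirroring the SHA-256-over-sorted-JSON pattern this cell uses. The `config_hash` helper and the trimmed-down dictionaries are illustrative assumptions, not the notebook's actual objects.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON form (sorted keys) of the configuration."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical, trimmed-down configs for illustration only.
baseline = {"model": "claude-haiku-4-5-20251001", "temperature": 0.0, "critique_max_iters": 2}
reordered = {"critique_max_iters": 2, "temperature": 0.0, "model": "claude-haiku-4-5-20251001"}
drifted = {**baseline, "critique_max_iters": 3}

assert config_hash(baseline) == config_hash(reordered)  # key order does not matter
assert config_hash(baseline) != config_hash(drifted)    # a changed bound changes the hash
```

Sorting keys before hashing is what makes the hash a property of the configuration's content rather than of how the dictionary happened to be written.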


###2.2.CODE AND IMPLEMENTATION

In [2]:
# CELL 2/10 — Config + deterministic run identity + fingerprints (executable)
MODEL_NAME = "claude-haiku-4-5-20251001"  # STRICT lock (no substitution)

def utc_now_iso() -> str:
    return _dt.datetime.now(_dt.timezone.utc).isoformat()

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

RUN_ID = str(uuid.uuid4())

CONFIG: Dict[str, Any] = {
    "project": "AA-FIN-LG-2026",
    "chapter": 3,
    "notebook": 3,
    "notebook_id": "AA-FIN-LG-2026_N3",
    "objective": "Credit memo drafting with evidence gaps — structured critique loop",
    "model": MODEL_NAME,
    "temperature": 0.0,
    "max_tokens": 900,
    "critique_max_iters": 2,  # bounded loop: at most 2 revisions (3 drafts total)
    "provider": {
        "mode": "LIVE",  # no simulation
        "overload_max_attempts": 5,  # bounded
        "overload_backoff_seconds": [0.6, 1.2, 2.4, 4.8, 4.8],
    },
    "viz": {"mermaid_version": "10.6.1"},
    "artifacts_dir": "artifacts_ch3_nb3",
}

CONFIG_HASH = sha256_hex(json.dumps(CONFIG, sort_keys=True))

ENV_FINGERPRINT = {
    "ts_utc": utc_now_iso(),
    "python": sys.version.split()[0],
    "platform": platform.platform(),
}

print("RUN_ID:", RUN_ID)
print("CONFIG_HASH:", CONFIG_HASH)
print("MODEL_LOCK:", CONFIG["model"])
print("ENV:", json.dumps(ENV_FINGERPRINT, indent=2))


RUN_ID: 0d511995-3460-4df7-b4c8-b9a64d3f48fb
CONFIG_HASH: eec7233d296e9d82938c5dc47079c17ac63c7621c6ae2261c47fddddd1ca3cc3
MODEL_LOCK: claude-haiku-4-5-20251001
ENV: {
  "ts_utc": "2026-02-18T18:00:04.788569+00:00",
  "python": "3.12.12",
  "platform": "Linux-6.6.105+-x86_64-with-glibc2.35"
}


##3.VISUALIZATION

###3.1.OVERVIEW

**CELL 3/10 — Visualization Standard v1: Mermaid renderer + display_langgraph_mermaid(app)**  
This cell exists because a graph-based workflow is only teachable if the topology is visible. The diagram is not decoration; it is the learning artifact that makes architecture concrete. Students and practitioners need to see the difference between a linear pipeline and a conditional loop. They need to visually inspect where critique feeds back into revision, where termination occurs, and how the system enforces bounded iteration.  

We use a hardened Mermaid ESM renderer because Colab is a browser runtime with evolving security constraints. The renderer is pinned to a known Mermaid version and uses strict mode. This matters because visualization failures are a common source of classroom friction: if the graph doesn’t render reliably, students cannot connect code to topology. The wrapper function `display_langgraph_mermaid` is also a governance constraint: it standardizes how we visualize across notebooks. In a series of ten notebooks, consistent visualization is part of the pedagogy.  

The switch to a white background is purely a presentation adjustment, but it illustrates a broader point: even “small” UI details can affect comprehension. A diagram that is hard to read is a diagram that fails as a teaching artifact. The renderer therefore sets background, border, overflow behavior, and theme in a controlled way.  

Architecturally, this cell reinforces the principle: **graphs are explicit process representations**. A system that cannot explain its own structure is harder to audit. In real organizations, a workflow that is “only in people’s heads” becomes fragile. Here we model the opposite: the workflow structure is inspectable, reproducible, and exportable. That is why we also export the Mermaid string in `graph_spec.json` later. The diagram is not just for humans today; it is for review tomorrow.


###3.2.CODE AND IMPLEMENTATION

In [3]:
# CELL 3/10 — Visualization Standard v1 (WHITE background) (executable)
MERMAID_VERSION = CONFIG["viz"]["mermaid_version"]

def display_langgraph_mermaid(compiled_app_or_graph) -> None:
    g = compiled_app_or_graph.get_graph() if hasattr(compiled_app_or_graph, "get_graph") else compiled_app_or_graph
    if not hasattr(g, "draw_mermaid"):
        raise ValueError("Pass a compiled LangGraph app or graph with draw_mermaid().")

    mermaid_code = g.draw_mermaid()
    box_id = f"m_{sha256_hex(mermaid_code)[:10]}_{int(time.time()*1000)}"

    html = f"""
<div id="{box_id}" style="
  background:#ffffff;
  border:1px solid rgba(0,0,0,0.12);
  border-radius:12px;
  padding:12px;
  overflow:auto;">
  <div class="mermaid" style="white-space:pre;">{mermaid_code}</div>
</div>

<script type="module">
  import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@{MERMAID_VERSION}/dist/mermaid.esm.min.mjs";
  try {{
    mermaid.initialize({{
      startOnLoad: true,
      securityLevel: "strict",
      theme: "default",                 // <-- LIGHT theme
      flowchart: {{ curve: "basis", useMaxWidth: true, htmlLabels: true }},
      themeVariables: {{
        background: "#ffffff"           // <-- ensure white canvas
      }}
    }});
    mermaid.run({{ querySelector: "#{box_id} .mermaid" }});
  }} catch (e) {{
    const el = document.querySelector("#{box_id}");
    if (el) el.innerHTML =
      "<pre style='color:#b00020;white-space:pre-wrap;'>Mermaid render failed: " + String(e) + "</pre>";
    console.error(e);
  }}
</script>
"""
    display(HTML(html))

print("OK: Mermaid renderer ready (WHITE background). Pinned:", MERMAID_VERSION)


OK: Mermaid renderer ready (WHITE background). Pinned: 10.6.1


##4.STATE SCHEMA

###4.1.OVERVIEW

**CELL 4/10 — TypedDict state schema + init + trace logger**  
This cell is the core “mechanism-first” discipline: it defines the system state explicitly, with a typed schema that names what matters. The state is not a chat transcript. It is not a blob of text. It is a structured object that drives routing and holds artifacts. In professional finance workflows, the most important step is often deciding what the state is: what inputs are allowed, what outputs are required, and what control variables are used to govern iteration.  

`CreditMemoState` separates: inputs (`case_facts`), working outputs (`memo_draft`, `critique`, `evidence_gaps`, `questions_to_verify`, `assumptions`), and control/audit fields (`iter_count`, `max_iters`, `status`, `run_id`, `config_hash`, `model`, `trace`). This separation is deliberate. It prevents the model from “smuggling” new facts into the memo without traceability. It also ensures that routing decisions can be based on state variables rather than narrative.  

The trace logger is an audit hook. Every node entry and exit is recorded with a timestamp and a payload. This is not overengineering; it is the minimal infrastructure you need to answer questions like: “How many revisions occurred?” “What was the status after critique?” “Did the router send us to gaps or revise?” In regulated or high-stakes environments, you are rarely judged only on the final output. You are judged on the pathway that produced it.  

Pedagogically, this cell teaches a powerful mindset: **state is the system**. If you cannot write the state schema clearly, you do not yet understand the workflow. Students who learn to design state schemas learn to design systems. This is why the course emphasizes TypedDict: it forces explicitness and supports disciplined reasoning about what the workflow can and cannot do.
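To make the audit questions above concrete, here is a minimal sketch of querying a trace after a run. The sample entries are hypothetical (timestamps are omitted, and the point at which `iter_count` increments is an assumption), and `count_exits` and `node_path` are illustrative helpers rather than notebook functions.

```python
from typing import Any, Dict, List

# Hypothetical trace entries in the shape the logger records: node, event, payload.
sample_trace: List[Dict[str, Any]] = [
    {"node": "DRAFT_MEMO", "event": "exit", "payload": {"status": "DRAFTED", "iter_count": 0}},
    {"node": "CRITIQUE_MEMO", "event": "exit", "payload": {"status": "CRITIQUED", "iter_count": 0}},
    {"node": "REVISE_MEMO", "event": "exit", "payload": {"status": "REVISED", "iter_count": 1}},
    {"node": "CRITIQUE_MEMO", "event": "exit", "payload": {"status": "CRITIQUED", "iter_count": 1}},
    {"node": "FINALIZE", "event": "exit", "payload": {"status": "FINALIZED", "iter_count": 1}},
]


def count_exits(trace: List[Dict[str, Any]], node: str) -> int:
    """Count how many times a node completed, based on its 'exit' events."""
    return sum(1 for e in trace if e["node"] == node and e["event"] == "exit")


def node_path(trace: List[Dict[str, Any]]) -> List[str]:
    """Reconstruct the order in which nodes completed."""
    return [e["node"] for e in trace if e["event"] == "exit"]


print("revisions:", count_exits(sample_trace, "REVISE_MEMO"))
print("path:", " -> ".join(node_path(sample_trace)))
```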


###4.2.CODE AND IMPLEMENTATION

In [4]:
# CELL 4/10 — TypedDict state schema + init + trace logger (executable)
CreditStatus = Literal["INIT", "DRAFTED", "CRITIQUED", "GAPS_IDENTIFIED", "REVISED", "FINALIZED", "STOPPED"]

class CreditMemoState(TypedDict):
    # Inputs
    case_facts: Dict[str, Any]

    # Working outputs
    memo_draft: str
    critique: Dict[str, Any]
    evidence_gaps: List[str]
    questions_to_verify: List[str]
    assumptions: List[str]

    # Control + audit
    iter_count: int
    max_iters: int
    status: CreditStatus
    run_id: str
    config_hash: str
    model: str
    started_utc: str
    trace: List[Dict[str, Any]]

def trace(state: CreditMemoState, node: str, event: str, payload: Dict[str, Any]) -> None:
    state["trace"].append({
        "ts_utc": utc_now_iso(),
        "node": node,
        "event": event,
        "payload": payload,
    })

def init_state(case_facts: Dict[str, Any]) -> CreditMemoState:
    return {
        "case_facts": case_facts,
        "memo_draft": "",
        "critique": {},
        "evidence_gaps": [],
        "questions_to_verify": [],
        "assumptions": [],
        "iter_count": 0,
        "max_iters": int(CONFIG["critique_max_iters"]),
        "status": "INIT",
        "run_id": RUN_ID,
        "config_hash": CONFIG_HASH,
        "model": CONFIG["model"],
        "started_utc": utc_now_iso(),
        "trace": [],
    }

print("OK: state schema ready.")


OK: state schema ready.


##5.LLM CLIENT WRAPPER

###5.1.OVERVIEW

**CELL 5/10 — Anthropic client init + strict JSON/text helpers (LIVE-only)**  
This cell establishes the controlled interface to the model. The primary lesson is that model calls are not “magic”; they are a dependency that must be governed. We enforce the API key source (`userdata.get("ANTHROPIC_API_KEY")`) to prevent accidental leakage and to standardize deployment across student runtimes. We create a single client and wrap it in helper functions that enforce bounded retries on overload and strict parsing behavior.  

There are two call modes: text and JSON. Drafting and revision use `call_claude_text` because the output is naturally narrative. Critique uses `call_claude_json_max_stability` because the output must become state. This distinction is crucial: structured objects should come from structured outputs. When you let a model produce free-form critique and then you parse it heuristically, you create an unreviewable system. Here we do the opposite: we demand JSON and fail loudly if the contract is violated.  

The bounded retry logic is part of professional reliability. Provider overload happens. The correct response is not to silently degrade or to loop forever. We retry a fixed number of times with known backoffs and then stop. This keeps classroom execution fast and predictable and keeps production-like behavior aligned with governance expectations.  

Pedagogically, this cell highlights the separation between **architecture** and **prompting**. The model is a component. The system must treat it as such. By wrapping calls, you centralize policy: token limits, temperature, retries, and parsing rules. This is how systems become maintainable. When later notebooks add tools or multiple agents, the call boundary remains stable, and the system remains auditable. In finance, maintainability is not optional; it is part of model risk management.
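The bounded-retry policy can be isolated and tested without touching the network. The sketch below is a simplified illustration, not the notebook's wrapper: `OverloadedError`, `call_with_bounded_retry`, and the injectable `sleep` parameter are assumptions introduced so the backoff schedule can be observed in a test.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class OverloadedError(Exception):
    """Hypothetical stand-in for a provider 'overloaded' (HTTP 529) failure."""


def call_with_bounded_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int,
    backoffs: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_attempts times, sleeping a fixed schedule between overload failures."""
    if max_attempts < 1 or not backoffs:
        raise ValueError("max_attempts must be >= 1 and backoffs must be non-empty")
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: fail loudly
            # Reuse the last backoff value if the schedule is shorter than the budget.
            sleep(backoffs[min(attempt, len(backoffs) - 1)])
    raise AssertionError("unreachable")


# Usage: a fake provider that is overloaded twice, then succeeds.
calls = {"n": 0}
slept: List[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise OverloadedError("529")
    return "ok"


result = call_with_bounded_retry(flaky, max_attempts=5, backoffs=[0.6, 1.2, 2.4], sleep=slept.append)
```

Only the overload error is retried; any other exception propagates immediately, which keeps genuine failures from being masked by the retry loop.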


###5.2.CODE AND IMPLEMENTATION

In [5]:
# CELL 5/10 — Anthropic client init + MAX-STABILITY strict JSON pipeline (bounded repair + fallback) (executable)
import json, time
from anthropic import Anthropic

API_KEY = userdata.get("ANTHROPIC_API_KEY")  # STRICT: ALL CAPS
if not API_KEY or not isinstance(API_KEY, str) or not API_KEY.strip():
    raise RuntimeError("Missing Colab secret ANTHROPIC_API_KEY (ALL CAPS).")

client = Anthropic(api_key=API_KEY.strip())

def _extract_text(msg) -> str:
    out = ""
    for block in msg.content:
        if getattr(block, "type", None) == "text":
            out += block.text
    return out.strip()

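def _is_overloaded(exc: Exception) -> bool:
    # Prefer the SDK's structured status code; fall back to message matching.
    if getattr(exc, "status_code", None) == 529:
        return True
    s = str(exc)
    return ("Error code: 529" in s) or ("overloaded" in s.lower())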

def call_claude_text(system: str, user: str, *, max_tokens: Optional[int] = None) -> str:
    max_attempts = int(CONFIG["provider"]["overload_max_attempts"])
    backoffs = list(CONFIG["provider"]["overload_backoff_seconds"])
    last_err: Optional[Exception] = None

    for attempt in range(max_attempts):
        try:
            msg = client.messages.create(
                model=CONFIG["model"],
                max_tokens=int(max_tokens if max_tokens is not None else CONFIG["max_tokens"]),
                temperature=float(CONFIG["temperature"]),
                system=system,
                messages=[{"role": "user", "content": user}],
            )
            return _extract_text(msg)
        except Exception as e:
            last_err = e
            if _is_overloaded(e) and attempt < max_attempts - 1:
                time.sleep(backoffs[min(attempt, len(backoffs) - 1)])
                continue
            raise
    raise RuntimeError(f"Claude call failed after {max_attempts} attempts.") from last_err

def _strip_code_fences(s: str) -> str:
    s = s.strip()
    if s.startswith("```"):
        lines = s.splitlines()
        if len(lines) >= 1 and lines[0].startswith("```"):
            lines = lines[1:]
        if len(lines) >= 1 and lines[-1].strip() == "```":
            lines = lines[:-1]
        s = "\n".join(lines).strip()
    return s

def _first_complete_json_object(s: str) -> Optional[str]:
    """
    Finds the first COMPLETE JSON object by brace-balancing.
    Returns None if no complete object is present (e.g., truncation).
    """
    start = s.find("{")
    if start == -1:
        return None
    depth = 0
    in_str = False
    esc = False
    for i in range(start, len(s)):
        c = s[i]
        if in_str:
            if esc:
                esc = False
            elif c == "\\":
                esc = True
            elif c == '"':
                in_str = False
            continue
        else:
            if c == '"':
                in_str = True
                continue
            if c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i+1]
    return None

def call_claude_json_max_stability(
    system: str,
    user: str,
    *,
    max_tokens: int,
    attempts: int = 2
) -> Dict[str, Any]:
    """
    Maximum stability JSON:
    - attempt 1: parse direct (strip fences + brace-balance)
    - attempt 2: strict repair re-ask (no prose, smaller lists)
    Hard bound: attempts=2 (exactly one repair try)
    """
    last_raw = None
    prompt = user

    for attempt in range(attempts):
        raw = call_claude_text(system, prompt, max_tokens=max_tokens)
        last_raw = raw
        raw2 = _strip_code_fences(raw)
        js = _first_complete_json_object(raw2)
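        if js is not None:
            try:
                return json.loads(js)
            except json.JSONDecodeError:
                pass  # balanced braces but invalid JSON: fall through to the repair re-ask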

        # prepare bounded repair prompt
        prompt = (
            user
            + "\n\nFORMAT REPAIR (MANDATORY): Return ONLY one complete JSON object."
              " No prose. No markdown. No code fences."
              " Keep arrays within the MAX limits and keep strings concise."
        )

    raise ValueError(f"Model did not return a COMPLETE JSON object.\nRAW:\n{(last_raw or '')[:4000]}")

print("OK: Anthropic client initialized + MAX-STABILITY JSON parser ready. MODEL locked:", CONFIG["model"])


OK: Anthropic client initialized + MAX-STABILITY JSON parser ready. MODEL locked: claude-haiku-4-5-20251001
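As a point of comparison with the brace-balancing helper, the standard library's `json.JSONDecoder.raw_decode` can also locate a JSON object embedded in prose while validating it in the same step. This is an alternative technique, not the notebook's method; `first_json_object` and the `quality_ok` field in the sample strings are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # This brace did not start a valid object; try the next one.
        pos = text.find("{", pos + 1)
    return None


wrapped = 'Here is the critique:\n{"quality_ok": false, "evidence_gaps": ["AR aging {Q3}"]}\nThanks.'
truncated = '{"quality_ok": false, "evidence_gaps": ["AR aging"'

print(first_json_object(wrapped))
print(first_json_object(truncated))
```

A truncated response yields `None` rather than a partial object, so the caller can still trigger a bounded repair re-ask or fail loudly.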


##6.AGENT NODE

###6.1.OVERVIEW

**CELL 6/10 — AgentNode abstraction + credit memo nodes + bounded JSON enforcement in critique**  
This cell introduces the architectural heart of Notebook 3: a structured critique loop implemented via explicit nodes. The `AgentNode` abstraction is required because it makes each node a testable unit: a named responsibility with a clear input state and output state. The wrapper also standardizes audit logging by tracing node entry and exit. This is not a stylistic choice; it is how you make a multi-node workflow inspectable.  

The nodes correspond to real underwriting roles. `DRAFT_MEMO` behaves like an analyst drafting a memo. `CRITIQUE_MEMO` behaves like a senior reviewer forcing evidence discipline. `EVIDENCE_GAPS` makes “missingness” explicit as a state milestone, which is important because evidence gaps are not merely a list; they are a gating condition. `REVISE_MEMO` integrates critique without inventing facts, and `FINALIZE` produces a controlled termination status.  

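The key technical point is the critique node’s strict JSON contract and bounded retry. Models sometimes violate formatting. In professional systems you must respond with bounded repair, not open-ended retries or best-effort guessing. The helper `call_claude_json_max_stability`, defined in the previous cell, strips code fences, extracts the first complete JSON object, and performs at most one re-ask that explicitly reinforces the format requirement. If the model still fails, the helper raises a `ValueError` and the workflow stops loudly. `n_critique` then validates the required keys, list types, item caps, and the `memo_quality` value before anything is written to state. This is exactly what “governance-first” means: correctness and reviewability outrank convenience.  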
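
The bounded repair pattern can be exercised without calling a model at all. The sketch below is an illustration under stated assumptions, not the notebook's implementation: `first_json_object` is a simplified brace scan (it ignores braces inside string values), `json_with_bounded_repair` is a hypothetical name, and `fake_model` is a scripted stand-in that returns a truncated reply first and a valid object second.

```python
import json
from typing import Any, Callable, Dict, List, Optional

def first_json_object(s: str) -> Optional[str]:
    # Simplified brace-balance scan (sketch: does not handle braces inside strings).
    start = s.find("{")
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    return None  # object never closed, e.g. truncated output

def json_with_bounded_repair(ask: Callable[[str], str], user: str, attempts: int = 2) -> Dict[str, Any]:
    prompt, last_raw = user, ""
    for _ in range(attempts):
        last_raw = ask(prompt)
        candidate = first_json_object(last_raw)
        if candidate is not None:
            try:
                return json.loads(candidate)
            except json.JSONDecodeError:
                pass  # balanced braces but invalid JSON: treat like a format failure
        prompt = user + "\n\nFORMAT REPAIR (MANDATORY): Return ONLY one complete JSON object."
    raise ValueError(f"No complete JSON object after {attempts} attempts: {last_raw[:200]}")

# Scripted fake model: first reply is truncated, second reply is valid.
replies = iter([
    'Sure! Here is the critique: {"memo_quality": "REVISE"',
    '{"memo_quality": "REVISE", "rationale": "Debt schedule missing."}',
])
prompts_seen: List[str] = []

def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)

critique = json_with_bounded_repair(fake_model, "Critique this memo.")
print(critique["memo_quality"], "after", len(prompts_seen), "calls")
```

The second prompt carries the repair instruction, and a model that never complies exhausts the bound and raises instead of looping.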

Pedagogically, this cell teaches how to design nodes that are “small but complete.” Each node does one job. Prompts are kept stable and minimal; structure comes from the graph, not from sprawling instructions. Students learn that agentic design is not about adding more prompts; it is about creating **reliable transformations of state**. In real credit work, this is the difference between a memo that reads well and a memo that can survive committee scrutiny.


###6.2.CODE AND IMPLEMENTATION

In [6]:
# CELL 6/10 — AgentNode abstraction + node implementations (MAX-STABILITY critique JSON: small payload) (executable)

class AgentNode:
    def __init__(self, name: str, fn: Callable[[CreditMemoState], CreditMemoState]):
        self.name = name
        self.fn = fn

    def __call__(self, state: CreditMemoState) -> CreditMemoState:
        trace(state, self.name, "enter", {"status": state["status"], "iter_count": state["iter_count"]})
        out = self.fn(state)
        trace(out, self.name, "exit", {"status": out["status"], "iter_count": out["iter_count"]})
        return out

SYS_DRAFT = "You draft committee-grade credit memos. You never fabricate facts."
SYS_CRIT  = "You are a strict credit committee critic. Return strict JSON only. You never allow invented facts."
SYS_REV   = "You revise a credit memo under evidence discipline. You never fabricate facts."

def _prompt_draft(case_facts: Dict[str, Any]) -> str:
    return f"""
Draft an institutional-grade CREDIT MEMO for internal credit committee review.

Hard requirements:
- Sections: Executive Summary; Business Overview; Transaction Request; Use of Proceeds; Financial Snapshot (facts only);
  Underwriting Logic; Key Risks; Mitigants; Proposed Structure & Covenants (recommendations allowed, label as such);
  Evidence Gaps / Open Items; Assumptions; Preliminary Recommendation.
- Do NOT fabricate facts. If missing, put in Evidence Gaps/Open Items and proceed with placeholders.

CASE FACTS (JSON):
{json.dumps(case_facts, indent=2, sort_keys=True)}
""".strip()

def _prompt_critique(memo: str) -> str:
    # MAX STABILITY: keep JSON small to avoid truncation
    return f"""
Return STRICT JSON ONLY (one object) with exactly these keys:
  evidence_gaps: array of strings (MAX 6 items, each <= 140 chars)
  questions_to_verify: array of strings (MAX 6 items, each <= 140 chars)
  assumptions: array of strings (MAX 6 items, each <= 140 chars)
  memo_quality: "PASS" or "REVISE"
  rationale: string (<= 200 chars)

Rules:
- Output must be a single COMPLETE JSON object. No prose, no markdown, no code fences.
- Do not exceed MAX item counts.
- Prioritize the most approval-blocking items first.
- Do not invent facts; if uncertain, phrase as an open item.

MEMO:
{memo}
""".strip()

def _prompt_revise(case_facts: Dict[str, Any], memo: str, critique: Dict[str, Any]) -> str:
    return f"""
Revise the credit memo to address the critique.

Rules:
- Do NOT invent facts.
- Convert unsupported claims into Assumptions/Open Items.
- Incorporate the critique's evidence gaps/questions explicitly (may be condensed).
- Keep sections intact and committee tone professional.

CASE FACTS (JSON):
{json.dumps(case_facts, indent=2, sort_keys=True)}

CRITIQUE (JSON):
{json.dumps(critique, indent=2, sort_keys=True)}

PRIOR MEMO:
{memo}
""".strip()

def n_draft(state: CreditMemoState) -> CreditMemoState:
    state["memo_draft"] = call_claude_text(SYS_DRAFT, _prompt_draft(state["case_facts"]), max_tokens=1100)
    state["status"] = "DRAFTED"
    return state

def n_critique(state: CreditMemoState) -> CreditMemoState:
    critique = call_claude_json_max_stability(
        SYS_CRIT,
        _prompt_critique(state["memo_draft"]),
        max_tokens=900,   # enough for small JSON, reduces truncation risk
        attempts=2        # bounded repair (one retry)
    )

    required = ["evidence_gaps", "questions_to_verify", "assumptions", "memo_quality", "rationale"]
    for k in required:
        if k not in critique:
            raise ValueError(f"Critique JSON missing key: {k}. Got: {list(critique.keys())}")

    if not isinstance(critique["evidence_gaps"], list) or not isinstance(critique["questions_to_verify"], list) or not isinstance(critique["assumptions"], list):
        raise ValueError("Critique JSON must have list types for evidence_gaps/questions_to_verify/assumptions.")

    if len(critique["evidence_gaps"]) > 6:
        raise ValueError("evidence_gaps exceeds MAX 6 items.")
    if len(critique["questions_to_verify"]) > 6:
        raise ValueError("questions_to_verify exceeds MAX 6 items.")
    if len(critique["assumptions"]) > 6:
        raise ValueError("assumptions exceeds MAX 6 items.")

    mq = str(critique["memo_quality"]).upper()
    if mq not in ("PASS", "REVISE"):
        raise ValueError(f"memo_quality must be PASS or REVISE. Got: {critique['memo_quality']}")
    critique["memo_quality"] = mq

    state["critique"] = critique
    state["evidence_gaps"] = list(critique["evidence_gaps"])
    state["questions_to_verify"] = list(critique["questions_to_verify"])
    state["assumptions"] = list(critique["assumptions"])
    state["status"] = "CRITIQUED"
    return state

def n_gaps(state: CreditMemoState) -> CreditMemoState:
    state["status"] = "GAPS_IDENTIFIED"
    return state

def n_revise(state: CreditMemoState) -> CreditMemoState:
    state["memo_draft"] = call_claude_text(SYS_REV, _prompt_revise(state["case_facts"], state["memo_draft"], state["critique"]), max_tokens=1300)
    state["iter_count"] += 1
    state["status"] = "REVISED"
    return state

def n_finalize(state: CreditMemoState) -> CreditMemoState:
    quality = str(state["critique"].get("memo_quality", "REVISE")).upper()
    if quality == "PASS":
        state["status"] = "FINALIZED"
    else:
        state["status"] = "STOPPED" if state["iter_count"] >= state["max_iters"] else "FINALIZED"
    return state

NODE_DRAFT = AgentNode("DRAFT_MEMO", n_draft)
NODE_CRIT  = AgentNode("CRITIQUE_MEMO", n_critique)
NODE_GAPS  = AgentNode("EVIDENCE_GAPS", n_gaps)
NODE_REV   = AgentNode("REVISE_MEMO", n_revise)
NODE_FIN   = AgentNode("FINALIZE", n_finalize)

print("OK: Agent nodes ready (MAX-STABILITY JSON critique).")


OK: Agent nodes ready (MAX-STABILITY JSON critique).


##7.GRAPH BUILD

###7.1.OVERVIEW

**CELL 7/10 — LangGraph topology: conditional routing + bounded loop + explicit END**  
This cell turns the conceptual workflow into an executable graph. The main learning objective is understanding how **routing rules encode professional process**. We set an entry point (`DRAFT_MEMO`), connect drafting to critique, and then add conditional edges from `CRITIQUE_MEMO` based on state. This is where “state drives routing” becomes real: the router reads `memo_quality`, checks whether evidence gaps exist, and checks whether the iteration bound has been reached. No fragile keyword heuristics are needed.  

The loop is explicit: `REVISE_MEMO → CRITIQUE_MEMO`. It is bounded because `iter_count < max_iters` is part of the routing decision. This is the notebook’s new architectural dimension: a structured critique loop that can converge or stop. When the loop cannot continue, routing forces `FINALIZE`, and the graph terminates at `END`. The explicit `END` is important: professional workflows must have known terminal states. “It just stops when it stops” is not acceptable for governed systems.  
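
The bound can be checked by simulating the routing in plain Python, with no LangGraph and no model. This sketch copies the router's decision order onto a plain `dict` and assumes the worst case: a critic that always returns `REVISE` with at least one evidence gap. The `simulate` function is hypothetical and exists only to show that the path terminates.

```python
from typing import Any, Dict, List

def route_after_critique(state: Dict[str, Any]) -> str:
    # Same decision order as the notebook's router, applied to a plain dict.
    quality = str(state["critique"].get("memo_quality", "REVISE")).upper()
    has_gaps = len(state["evidence_gaps"]) > 0
    can_iterate = state["iter_count"] < state["max_iters"]
    if quality == "PASS":
        return "finalize"
    if can_iterate and has_gaps:
        return "gaps"
    if can_iterate:
        return "revise"
    return "finalize"

def simulate(max_iters: int) -> List[str]:
    # Worst case: the critic never passes the memo and always reports a gap.
    state: Dict[str, Any] = {
        "critique": {"memo_quality": "REVISE"},
        "evidence_gaps": ["Detailed debt schedule"],
        "iter_count": 0,
        "max_iters": max_iters,
    }
    path = ["DRAFT_MEMO"]
    while True:
        path.append("CRITIQUE_MEMO")
        route = route_after_critique(state)
        if route == "finalize":
            path.append("FINALIZE")
            return path
        if route == "gaps":
            path.append("EVIDENCE_GAPS")
        path.append("REVISE_MEMO")
        state["iter_count"] += 1  # mirrors the increment in the revise node

path = simulate(max_iters=2)
print(" -> ".join(path))
```

With `max_iters=2` the critique node runs three times (once on the draft, once after each revision) and the third critique routes to `FINALIZE` because the bound is reached.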

Pedagogically, this cell clarifies what a graph buys you. In a normal script, loop logic is easy to hide inside code. In a graph, the loop is part of the topology and therefore part of the learning artifact. Students can point to the exact edge that creates iteration. They can inspect the routing function and see why the system chose a path. This creates the right mental model: agentic systems are not mystical; they are **control flows over state**.  

In real-life credit processes, the analog is a stage gate: draft → review → revise → re-review → decision. Encoding that as a graph is not “automation theater.” It is a way to standardize discipline, reduce process variance, and generate audit trails.


###7.2.CODE AND IMPLEMENTATION

In [7]:
# CELL 7/10 — LangGraph build: structured critique loop + conditional routing + explicit END (executable)

def route_after_critique(state: CreditMemoState) -> str:
    quality = str(state["critique"].get("memo_quality", "REVISE")).upper()
    has_gaps = len(state["evidence_gaps"]) > 0
    can_iterate = state["iter_count"] < state["max_iters"]

    if quality == "PASS":
        return "finalize"
    if can_iterate and has_gaps:
        return "gaps"
    if can_iterate:
        return "revise"
    return "finalize"

builder = StateGraph(CreditMemoState)
builder.add_node("DRAFT_MEMO", NODE_DRAFT)
builder.add_node("CRITIQUE_MEMO", NODE_CRIT)
builder.add_node("EVIDENCE_GAPS", NODE_GAPS)
builder.add_node("REVISE_MEMO", NODE_REV)
builder.add_node("FINALIZE", NODE_FIN)

builder.set_entry_point("DRAFT_MEMO")
builder.add_edge("DRAFT_MEMO", "CRITIQUE_MEMO")

builder.add_conditional_edges(
    "CRITIQUE_MEMO",
    route_after_critique,
    {"gaps": "EVIDENCE_GAPS", "revise": "REVISE_MEMO", "finalize": "FINALIZE"},
)

builder.add_edge("EVIDENCE_GAPS", "REVISE_MEMO")
builder.add_edge("REVISE_MEMO", "CRITIQUE_MEMO")
builder.add_edge("FINALIZE", END)

app = builder.compile()
print("OK: Graph compiled. Nodes:", list(app.get_graph().nodes.keys()))


OK: Graph compiled. Nodes: ['__start__', 'DRAFT_MEMO', 'CRITIQUE_MEMO', 'EVIDENCE_GAPS', 'REVISE_MEMO', 'FINALIZE', '__end__']


##8.GRAPH VISUALIZATION

###8.1.OVERVIEW

**CELL 8/10 — Graph visualization (mandatory topology artifact)**  
This cell renders the compiled graph and is mandatory because the diagram is part of the notebook’s deliverables. The visualization is the fastest way to validate that the topology matches the intended architecture: a critique loop with a conditional branch that can route through an evidence-gaps milestone before revision. In professional workflows, visual inspection of topology is a legitimate validation step, just like reviewing a system diagram in architecture reviews.  

This cell also reinforces a critical point about agent systems: the topology is a first-class object. You do not only care about outputs; you care about the process that generated them. If a student modifies the graph incorrectly, the diagram will reveal it immediately. That is why the diagram must “match topology exactly.” It is not enough to say “we have a loop.” The loop must be visible and correct.  
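
Visual inspection can be backed by a mechanical check. The sketch below compares an expected edge set against an actual one and reports the difference. The names `EXPECTED` and `topology_diff` are hypothetical, and the `__start__` and `__end__` edge endpoints are an assumption based on the node names the compiled graph prints. In the notebook, the actual set could plausibly be built from `app.get_graph().edges` using each edge's `source` and `target`; here it is hand-written so the example runs on its own.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Intended topology, including the conditional branches out of CRITIQUE_MEMO.
EXPECTED: Set[Edge] = {
    ("__start__", "DRAFT_MEMO"),
    ("DRAFT_MEMO", "CRITIQUE_MEMO"),
    ("CRITIQUE_MEMO", "EVIDENCE_GAPS"),
    ("CRITIQUE_MEMO", "REVISE_MEMO"),
    ("CRITIQUE_MEMO", "FINALIZE"),
    ("EVIDENCE_GAPS", "REVISE_MEMO"),
    ("REVISE_MEMO", "CRITIQUE_MEMO"),  # the loop edge
    ("FINALIZE", "__end__"),
}

def topology_diff(actual: Set[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (missing, unexpected) edges relative to EXPECTED."""
    return EXPECTED - actual, actual - EXPECTED

# Simulate a student edit that accidentally dropped the loop edge.
broken = EXPECTED - {("REVISE_MEMO", "CRITIQUE_MEMO")}
missing, unexpected = topology_diff(broken)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

A check like this complements the diagram: the picture shows the shape at a glance, and the diff names the exact edge that is absent.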

Pedagogically, the diagram helps students develop the habit of mapping from narrative to structure. The narrative is: “Draft, critique, identify gaps, revise, repeat, finalize.” The structure is: nodes and edges with specific routing logic. In teaching, the diagram becomes a shared reference for discussion: “Where does iteration happen?” “What determines termination?” “Why route through gaps?” This is also why we export the Mermaid code later: it allows offline review and comparison across runs.  

In short, Cell 8 is the “architecture proof.” In real settings, reviewers often ask for the workflow diagram before they trust outputs. This cell makes that practice normal.


###8.2.CODE AND IMPLEMENTATION

In [8]:
# CELL 8/10 — Mandatory graph visualization (must match topology exactly) (executable)
display_langgraph_mermaid(app)
print("OK: Mermaid rendered.")


OK: Mermaid rendered.


##9.SYNTHETIC CREDIT CASE

###9.1.OVERVIEW

**CELL 9/10 — Execute the workflow on a synthetic credit case**  
This cell is where the architecture is exercised. The synthetic case is constructed to resemble a real underwriting situation but with deliberate missingness. That missingness is essential: if the case were fully specified, the critique loop would have less to do, and the notebook would not demonstrate its core value. By providing only a partial financial snapshot and leaving key diligence items absent, we force the system to surface evidence gaps and questions to verify.  

The run proceeds from `INIT` state to drafting, then critique, and then conditional routing. Depending on what the critique returns, the system may route through the evidence-gaps node and into revision, then back into critique. The bounded loop ensures the run completes quickly and predictably in a classroom. This is important: learning agentic systems should not require long runtimes. The point is to see the mechanism in action, not to wait.  

Cell 9 prints a compact summary: final status, iteration count, top evidence gaps, top verification questions, assumptions, and a preview of the memo (first 1200 characters). This is intentionally not a full report. The full report lives in `final_state.json`. The printed view is for rapid feedback during instruction. In real life, this is like a deal team standup: you want a quick view of what’s blocking approval and what needs to be requested from the borrower.  

Pedagogically, the key observation students should make is that the system does not pretend to know what it cannot know. Instead, it produces a memo plus an explicit list of what remains missing. That is the professional win: the memo becomes decision-support rather than narrative. The architecture, not the model, is doing the heavy lifting by forcing uncertainty into explicit state.


###9.2.CODE AND IMPLEMENTATION

In [9]:
# CELL 9/10 — Run: synthetic credit case (executable)
case_facts = {
    "borrower": {"name": "Azul Logistics S.A.", "industry": "3PL / regional trucking", "ownership": "Founder-owned"},
    "request": {"type": "Term Loan", "amount_usd": 12000000, "tenor_years": 4, "purpose": "fleet expansion + working capital"},
    "financials": {"revenue_ttm_usd": 38000000, "ebitda_ttm_usd": 5200000, "leverage_notes": "Debt schedule incomplete."},
    "collateral": {"proposed": ["New equipment", "AR pledge (unconfirmed)"], "valuation_notes": "No appraisal provided."},
    "customers": {"top_customer_concentration": "High (exact % not provided)", "contracts": "Mix of spot and 1-year MSAs"},
    "risk_flags": ["Fuel price sensitivity", "Customer concentration", "Covenant headroom unknown"],
    "known_missing_items": ["Detailed debt schedule", "AR aging", "capex schedule", "insurance summary", "fleet utilization KPIs"],
}

state0 = init_state(case_facts)
final_state = app.invoke(state0)

print("FINAL STATUS:", final_state["status"])
print("ITER_COUNT:", final_state["iter_count"], "/", final_state["max_iters"])

print("\nEVIDENCE GAPS (<=6):")
for i, g in enumerate(final_state["evidence_gaps"], 1):
    print(f"{i}. {g}")

print("\nQUESTIONS TO VERIFY (<=6):")
for i, q in enumerate(final_state["questions_to_verify"], 1):
    print(f"{i}. {q}")

print("\nASSUMPTIONS (<=6):")
for i, a in enumerate(final_state["assumptions"], 1):
    print(f"{i}. {a}")

print("\nMEMO PREVIEW (first 1200 chars):\n")
print(final_state["memo_draft"][:1200])


FINAL STATUS: STOPPED
ITER_COUNT: 2 / 2

EVIDENCE GAPS (<=6):
1. Complete existing debt schedule, maturity profile, and covenants; pro forma leverage unverified at 2.3x
2. Detailed use of proceeds allocation between capex and working capital; capex plan with ROI assumptions missing
3. Full balance sheet (assets, liabilities, equity); PP&E valuation methodology and fleet appraisal absent
4. Operating cash flow, historical capex, DSO/DPO, and DSCR calculation; free cash flow cannot be assessed
5. Customer concentration analysis; revenue mix between contract MSAs and spot freight; concentration risk unquantified
6. Key person risk assessment; founder succession plan and management depth documentation not provided

QUESTIONS TO VERIFY (<=6):
1. What is the exact allocation of $12M between fleet capex and working capital, with detailed capex plan and ROI?
2. What is total existing debt, maturity schedule, and covenant package? Does pro forma 2.3x leverage hold?
3. What are top 5 customers b

##10.AUDIT BUNDLE

###10.1.OVERVIEW

**CELL 10/10 — Export artifacts: run_manifest.json, graph_spec.json, final_state.json**  
This cell operationalizes the governance requirement: every run must produce a minimal audit bundle. The artifacts are not optional because professional finance workflows are evaluated by their ability to be reviewed after the fact. A memo alone is not enough. You need to know which topology produced it, what configuration was used, which model was locked, and what the final state contained.  

`run_manifest.json` captures run identity, timestamps, model lock, config hash, environment fingerprint, and loop bounds. This is the record that makes the run comparable. If results differ, you can check whether the configuration differed. If a review happens later, you can reconstruct what happened. `graph_spec.json` captures the topology: nodes, edges, entry point, end, routing rules, and Mermaid. This is critical because the “system” is not just code; it is the graph structure. Without it, you cannot verify that the workflow executed the intended process. `final_state.json` captures the full terminal state, including memo, critique, gaps, assumptions, and trace. That trace is the audit trail of node execution.  
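
The configuration hash is what makes two runs comparable. The cell that computes `CONFIG_HASH` is not shown in this section, so the following is only a sketch of one common approach, assuming the hash is SHA-256 over a canonical JSON serialization with sorted keys. The function name `config_hash` and the example configuration values are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict

def config_hash(config: Dict[str, Any]) -> str:
    # Canonical form: sorted keys and fixed separators, so key order and
    # whitespace cannot change the digest.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

base = {"model": "example-model", "critique_max_iters": 2}
reordered = {"critique_max_iters": 2, "model": "example-model"}
changed = {"model": "example-model", "critique_max_iters": 3}

print("same config, different key order:", config_hash(base) == config_hash(reordered))
print("different loop bound:", config_hash(base) == config_hash(changed))
```

Two manifests with the same hash were produced under the same configuration; a different hash tells the reviewer to look at the configuration before comparing memos.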

Pedagogically, this cell teaches that professional AI systems are artifact-producing systems. A governed workflow must leave behind evidence of what it did. In regulated settings, this is the difference between acceptable tooling and a liability. Students should treat these artifacts as part of the deliverable: when you submit work, you submit outputs plus provenance. This course makes that the default habit.  

The final sanity checks in this cell are also a teaching point: fail loudly if required files are missing or if the system terminated in an unexpected status. Silent success is not success in governed work; success must be verifiable.
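
A reviewer can apply the same "fail loudly" rule after the run by verifying the bundle from disk. The sketch below assumes the three file names used in this notebook and the `{"run_id": ..., "state": {...}}` layout of `final_state.json`; the function `verify_audit_bundle` is hypothetical, and the demo writes minimal placeholder files to a temporary directory rather than reading real artifacts.

```python
import json
import os
import tempfile
from typing import Any, Dict

REQUIRED = ("run_manifest.json", "graph_spec.json", "final_state.json")
ALLOWED_STATUS = ("FINALIZED", "STOPPED")

def verify_audit_bundle(art_dir: str) -> Dict[str, Any]:
    loaded: Dict[str, Any] = {}
    for name in REQUIRED:
        path = os.path.join(art_dir, name)
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Missing required artifact: {path}")
        with open(path, "r", encoding="utf-8") as f:
            loaded[name] = json.load(f)  # raises if the file is not valid JSON
    status = loaded["final_state.json"]["state"]["status"]
    if status not in ALLOWED_STATUS:
        raise ValueError(f"Unexpected terminal status: {status}")
    run_ids = {doc.get("run_id") for doc in loaded.values()}
    if len(run_ids) != 1:
        raise ValueError(f"Artifacts come from different runs: {run_ids}")
    return loaded

with tempfile.TemporaryDirectory() as tmp:
    docs = {
        "run_manifest.json": {"run_id": "demo-run"},
        "graph_spec.json": {"run_id": "demo-run"},
        "final_state.json": {"run_id": "demo-run", "state": {"status": "STOPPED"}},
    }
    for name, doc in docs.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            json.dump(doc, f)
    bundle = verify_audit_bundle(tmp)
    print("verified:", sorted(bundle))

    # Delete one artifact and confirm the check fails instead of passing silently.
    os.remove(os.path.join(tmp, "graph_spec.json"))
    try:
        verify_audit_bundle(tmp)
        failure = None
    except FileNotFoundError as exc:
        failure = str(exc)
    print("after deleting one file:", failure)
```

Checking that all three files share one `run_id` guards against a bundle assembled from different runs, which is easy to do by accident when artifacts are overwritten in place.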


###10.2.CODE AND IMPLEMENTATION

In [10]:
# CELL 10/10 — Export required artifacts: run_manifest.json, graph_spec.json, final_state.json (executable)
import os
import importlib.metadata as md

ART_DIR = CONFIG["artifacts_dir"]
os.makedirs(ART_DIR, exist_ok=True)

def _pkgver(name: str) -> str:
    try:
        return md.version(name)
    except Exception:
        return "missing"

run_manifest = {
    "run_id": RUN_ID,
    "ts_utc": utc_now_iso(),
    "project": CONFIG["project"],
    "chapter": CONFIG["chapter"],
    "notebook": CONFIG["notebook"],
    "notebook_id": CONFIG["notebook_id"],
    "objective": CONFIG["objective"],
    "model_lock": CONFIG["model"],
    "config_hash_sha256": CONFIG_HASH,
    "env_fingerprint": ENV_FINGERPRINT,
    "packages": {
        "anthropic": _pkgver("anthropic"),
        "httpx": _pkgver("httpx"),
        "httpcore": _pkgver("httpcore"),
        "langgraph": _pkgver("langgraph"),
        "langchain": _pkgver("langchain"),
        "langchain-core": _pkgver("langchain-core"),
    },
    "loop_bounds": {
        "critique_max_iters": CONFIG["critique_max_iters"],
        "overload_max_attempts": CONFIG["provider"]["overload_max_attempts"],
    },
    "notes": {
        "strict_json": True,
        "json_parser": "brace_balance_first_object",
        "bounded_json_repair": True,
        "critique_payload_caps": {"gaps": 6, "questions": 6, "assumptions": 6},
        "explicit_END": True,
    },
}

g = app.get_graph()
graph_spec = {
    "run_id": RUN_ID,
    "ts_utc": utc_now_iso(),
    "entry_point": "DRAFT_MEMO",
    "end_node": "END",
    "nodes": list(g.nodes.keys()),
    "edges": [{"from": e.source, "to": e.target} for e in g.edges],
    "conditional_routing": {
        "from_node": "CRITIQUE_MEMO",
        "router_fn": "route_after_critique",
        "routes": {"gaps": "EVIDENCE_GAPS", "revise": "REVISE_MEMO", "finalize": "FINALIZE"},
    },
    "mermaid_version_pinned": CONFIG["viz"]["mermaid_version"],
    "mermaid": g.draw_mermaid(),
}

final_state_export = {"run_id": RUN_ID, "ts_utc": utc_now_iso(), "state": final_state}

paths = {
    "run_manifest.json": os.path.join(ART_DIR, "run_manifest.json"),
    "graph_spec.json": os.path.join(ART_DIR, "graph_spec.json"),
    "final_state.json": os.path.join(ART_DIR, "final_state.json"),
}

with open(paths["run_manifest.json"], "w", encoding="utf-8") as f:
    json.dump(run_manifest, f, indent=2, sort_keys=True)

with open(paths["graph_spec.json"], "w", encoding="utf-8") as f:
    json.dump(graph_spec, f, indent=2, sort_keys=True)

with open(paths["final_state.json"], "w", encoding="utf-8") as f:
    json.dump(final_state_export, f, indent=2, sort_keys=True)

print("WROTE:")
for k, p in paths.items():
    print("-", k, "->", p)

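# Fail loudly if an artifact is missing or the run ended in an unexpected status.
for k, p in paths.items():
    assert os.path.isfile(p), f"Missing required artifact: {k} -> {p}"
assert final_state["status"] in ("FINALIZED", "STOPPED"), f"Unexpected terminal status: {final_state['status']}"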
print("OK: required artifacts exported.")


WROTE:
- run_manifest.json -> artifacts_ch3_nb3/run_manifest.json
- graph_spec.json -> artifacts_ch3_nb3/graph_spec.json
- final_state.json -> artifacts_ch3_nb3/final_state.json
OK: required artifacts exported.


##11.CONCLUSION

This notebook closes with a simple but non-negotiable result: a credit memo workflow becomes materially more professional when it is treated as a **stateful control system** rather than a one-shot drafting prompt. The memo is not “better” because the language model is more persuasive. It is better because the architecture forces the workflow to respect the core constraint of real underwriting: **missing information is not a nuisance; it is the main object**. When evidence gaps are made explicit in state, routed deterministically through the graph, and preserved as artifacts, the system stops pretending that fluency equals truth. That is the central lesson.

The synthetic case was intentionally incomplete, and that incompleteness was not merely pedagogical. It mirrors the real operating condition of credit teams: you rarely have perfect data at first pass. You have fragments from the borrower, partial financials, unclear collateral, ambiguous customer concentration, and time pressure. In those conditions, the professional difference between acceptable and unacceptable output is the ability to separate **facts provided**, **assumptions**, and **open items** without drifting into invention. This notebook demonstrates that you can encode that separation as an architectural requirement, not a stylistic preference. The critique node does not “improve writing.” It identifies which claims must be downgraded, which diligence items block approval, and which questions must be asked before the memo can be treated as decision-support. That is exactly how a committee thinks.

The structured critique loop is therefore not a cosmetic feature; it is the mechanism that enforces underwriting integrity. The workflow’s bounded revision cycle models a real constraint: organizations do not have infinite time, and they cannot allow infinite self-editing. Bounded loops are governance tools. They define when the system is allowed to iterate and when it must stop and escalate. In practice, that escalation is a human decision: proceed with conditions, delay approval pending diligence, restructure the deal, or refuse. The notebook encodes the precondition for that decision: a transparent state that shows what is unknown and why it matters. When the system terminates as **FINALIZED** or **STOPPED**, that status is not decorative; it is a control signal that communicates readiness or the need for intervention.

A second practical conclusion is that strict output contracts are essential in professional agent systems. The critique node returning **strict JSON** is not an engineering flourish; it is the only way to make routing testable and auditable. Free-form critique text is unstructured and brittle. It cannot reliably drive conditional edges without hidden heuristics. By forcing the model into a typed structure (gaps, questions, assumptions, pass/revise), we make the system’s behavior inspectable. More importantly, we make failure modes explicit. When the model violates the JSON contract, the workflow does not “best-effort parse” and move on; it performs a bounded repair attempt and otherwise fails loudly. That is the correct posture for institutional work: silent degradation is worse than a visible stop.
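
As an illustration of what a typed contract makes checkable, the sketch below validates a critique object against the limits stated in the critique prompt (at most 6 items per list, 140 characters per item, 200 characters for the rationale). It goes slightly beyond the notebook's `n_critique`, which checks keys, list types, item counts, and `memo_quality` but not per-item types or lengths. The function `critique_violations` is hypothetical, and the sample critiques are made up for the demo.

```python
from typing import Any, Dict, List

LIST_KEYS = ("evidence_gaps", "questions_to_verify", "assumptions")
MAX_ITEMS, MAX_ITEM_CHARS, MAX_RATIONALE_CHARS = 6, 140, 200

def critique_violations(c: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations; an empty list means the critique conforms."""
    problems: List[str] = []
    expected = set(LIST_KEYS) | {"memo_quality", "rationale"}
    if set(c) != expected:
        problems.append(f"keys differ: missing={sorted(expected - set(c))}, extra={sorted(set(c) - expected)}")
    for k in LIST_KEYS:
        items = c.get(k)
        if not isinstance(items, list):
            problems.append(f"{k}: not a list")
            continue
        if len(items) > MAX_ITEMS:
            problems.append(f"{k}: {len(items)} items exceeds {MAX_ITEMS}")
        for i, item in enumerate(items):
            if not isinstance(item, str):
                problems.append(f"{k}[{i}]: not a string")
            elif len(item) > MAX_ITEM_CHARS:
                problems.append(f"{k}[{i}]: {len(item)} chars exceeds {MAX_ITEM_CHARS}")
    if str(c.get("memo_quality", "")).upper() not in ("PASS", "REVISE"):
        problems.append("memo_quality: must be PASS or REVISE")
    rationale = c.get("rationale")
    if not isinstance(rationale, str) or len(rationale) > MAX_RATIONALE_CHARS:
        problems.append(f"rationale: must be a string of at most {MAX_RATIONALE_CHARS} chars")
    return problems

good = {"evidence_gaps": ["Debt schedule missing"], "questions_to_verify": [], "assumptions": [],
        "memo_quality": "REVISE", "rationale": "Leverage cannot be verified."}
bad = {"evidence_gaps": ["x"] * 7, "questions_to_verify": ["q" * 141], "assumptions": [],
       "memo_quality": "MAYBE", "rationale": "ok"}

print(critique_violations(good))
for p in critique_violations(bad):
    print("-", p)
```

Returning every violation at once, rather than raising on the first, gives the audit trail a complete picture of how the model's output departed from the contract.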

The exported artifacts formalize what professionals often forget they need until something goes wrong. `run_manifest.json` ties the run to a model lock, configuration hash, environment fingerprint, and loop bounds. `graph_spec.json` makes the topology itself an audit object: nodes, edges, routing logic, and the exact Mermaid representation. `final_state.json` captures the full state at termination, including the memo, the critique structure, the open items, and the trace. Together, these artifacts convert an LLM workflow from an ephemeral chat into something closer to a reproducible process. That is the main reason to use LangGraph here: not because graphs are fashionable, but because they enable **explicit process representation**.

In real credit environments, this pattern scales naturally. The critique loop becomes a proxy for committee review. Evidence gaps become a diligence tracker. Questions to verify become task assignments and third-party requests. Assumptions become exception logs and sensitivity anchors. Proposed covenants become a structured term sheet draft rather than narrative suggestions. Once the architecture exists, you can extend it with additional nodes — document ingestion, ratio computation, covenant stress tests, pricing checks, portfolio limits — without losing the governance properties. The architecture is the scaffold that keeps complexity from turning into opacity.

Finally, this notebook clarifies the appropriate role of the model in finance workflows. The model is not an oracle. It is not an underwriter. It is a drafting component that must operate under constraint. The system is what provides discipline: typed state, deterministic routing, bounded loops, explicit end states, and auditable artifacts. If you build from that premise, you can use LLMs responsibly in credit contexts without pretending that they “know” what they cannot know. You can generate a memo quickly, but you can also generate the most important thing: a transparent map of what remains uncertain.

The conclusion, then, is architectural: **governance is not an add-on**. Governance is the workflow. A structured critique loop is one of the simplest ways to make that visible. It turns drafting into a controlled process, makes missing information a first-class object, and produces artifacts that survive scrutiny. This is the professional standard this course is teaching: mechanism first, state first, and accountability always.
