#**CHAPTER 10. MULTI-DESK RESEARCH SYNTHESIS**
---

##REFERENCE

https://chatgpt.com/share/699721db-b414-8012-aef0-194a46d98396

##0.CONTEXT

**Introduction: Why This Notebook Exists and What It Proves**

This notebook is a demonstration of a simple but powerful idea: an AI system can be organized like a real investment firm. Not “AI as a chatbot,” and not “AI as a magic answer machine,” but AI as a **structured workflow** that behaves like a multi-desk research process—with explicit roles, explicit handoffs, explicit controls, and explicit stopping rules.

When we say “multi-desk research,” we are describing a familiar institutional routine. A portfolio manager or CIO asks a cross-asset question that cannot be answered responsibly from a single angle. The macro desk frames the regime and policy reaction. The rates/FX desk interprets the transmission mechanism through yields, curves, and currency channels. The equity desk translates the regime into valuation and earnings consequences. The credit desk tests funding conditions, refinancing risk, and spread dynamics. Then a senior person synthesizes these views into one coherent memo, and a risk or review function challenges it: “What are we assuming? What do we not know? What would falsify this thesis? What must be verified before we act?”

This notebook operationalizes that workflow. It doesn’t just produce a report; it demonstrates an **architecture**. The committee should evaluate the notebook in the same way they evaluate an operating model: Is the process legible? Are the controls real? Are the failure modes anticipated? Can we reproduce and audit what happened? Does the system behave like a disciplined team rather than a single opaque black box?

That is the purpose of this notebook: to show that AI can support institutional research by providing **structured coverage, faster iteration, and consistent governance**—without pretending to be a substitute for human judgment.

---

**What This Notebook Does in Plain Terms**

This notebook takes one cross-asset research question—“How should a multi-asset portfolio manager think about a sudden, persistent inflation surprise combined with tightening financial conditions?”—and runs a process that mirrors a real multi-desk workflow.

The process has five stages:

**1) Intake**  
The system receives a query and constraints. Constraints matter because a memo written for an institutional PM differs from a memo written for retail investors. In this notebook, constraints include the audience (“Institutional PM / CIO”), a conservative risk posture (“avoid unsupported claims”), a time horizon (“3–6 months”), and a deliverable style (“mechanism-first; explicit assumptions and open items”).

The point is not to “prompt cleverly.” The point is to define the operational frame so the system produces outputs that match how we work.

**2) Desk selection and routing**  
A real team does not always involve every desk. Some questions require macro + rates/FX only. Others require equity + credit. So the system performs a structured selection step and then routes the question through the relevant desks.

In our case, it chooses four desks:
- **MACRO**
- **RATES_FX**
- **EQUITY**
- **CREDIT**

Routing is not ad hoc; it is state-driven. The notebook maintains an explicit state variable called **pending_desks**, and that list is the queue. Each desk runs only when it is at the front of the queue, and when it finishes, it is removed from the queue. When the queue is empty, the process moves forward to synthesis. This is a real operational control: it prevents confusion about which steps have completed and why the system moved forward.
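A minimal sketch of this queue discipline in plain Python, without LangGraph. The names `QueueState`, `route_next`, and `run_desk` are hypothetical and only mirror the `pending_desks`, `completed_desks`, and `desk_memos` fields described in this chapter; in the notebook, the equivalent logic would sit in the DESK_ROUTER node and its conditional edge, whose code is not shown here.

```python
from typing import Dict, List, TypedDict

class QueueState(TypedDict):
    # Hypothetical minimal slice of the research state, for illustration only
    pending_desks: List[str]
    completed_desks: List[str]
    desk_memos: Dict[str, dict]

def route_next(state: QueueState) -> str:
    """Next node: the desk at the front of the queue, or SYNTHESIS once the queue is empty."""
    return state["pending_desks"][0] if state["pending_desks"] else "SYNTHESIS"

def run_desk(state: QueueState, desk: str, memo: dict) -> QueueState:
    """Store the desk's memo and move the desk from pending to completed (returns a new dict)."""
    if not state["pending_desks"] or state["pending_desks"][0] != desk:
        raise ValueError(f"{desk} is not at the front of the queue")
    return {
        "pending_desks": state["pending_desks"][1:],
        "completed_desks": state["completed_desks"] + [desk],
        "desk_memos": {**state["desk_memos"], desk: memo},
    }

state: QueueState = {
    "pending_desks": ["MACRO", "RATES_FX", "EQUITY", "CREDIT"],
    "completed_desks": [],
    "desk_memos": {},
}
visited: List[str] = []
while (node := route_next(state)) != "SYNTHESIS":
    visited.append(node)
    state = run_desk(state, node, {"thesis": f"{node} placeholder memo"})
print(visited)  # ['MACRO', 'RATES_FX', 'EQUITY', 'CREDIT']
```

Because the next node is computed only from the queue, the same starting state always yields the same visiting order, which is what makes the routing testable.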

**3) Desk memo production**  
Each desk writes a memo in a structured format:
- a thesis,
- key points (mechanisms),
- risks,
- assumptions,
- open items,
- recommended next actions.

This is the heart of “multi-desk AI.” We are not asking one model to write one long memo. We are asking the system to produce separate, role-specific memos that reflect different institutional lenses. That reduces single-thread narrative drift and increases coverage.

It also improves auditability. If someone questions why the synthesis recommended a certain hedge, we can point to the rates/FX memo. If someone questions the rationale for reducing leverage, we can point to credit and liquidity risks. Each output is attributable to a named desk node.
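One way to make the memo format checkable is a small validator that runs on every desk output before it enters the state. The snake_case keys below (`key_points`, `open_items`, `next_actions`, and so on) are assumptions that mirror the six memo components listed above, not the notebook's confirmed field names.

```python
from typing import Any, Dict, List

# Assumed keys for the six memo components
MEMO_FIELDS = ("thesis", "key_points", "risks", "assumptions", "open_items", "next_actions")

def memo_problems(memo: Dict[str, Any]) -> List[str]:
    """Return schema problems in a desk memo; an empty list means it is structurally complete."""
    problems: List[str] = []
    for name in MEMO_FIELDS:
        if name not in memo:
            problems.append(f"missing: {name}")
        elif name == "thesis":
            # The thesis is a single statement, so it must be non-empty text
            if not isinstance(memo[name], str) or not memo[name].strip():
                problems.append("thesis must be a non-empty string")
        elif not isinstance(memo[name], list):
            # Every other component is a list of items (possibly empty)
            problems.append(f"{name} must be a list")
    return problems

complete = {
    "thesis": "An inflation surprise pushes real yields higher.",
    "key_points": ["Policy reaction function tightens."],
    "risks": ["Growth slows faster than expected."],
    "assumptions": ["No fiscal offset within the horizon."],
    "open_items": ["Not verified: size of the surprise."],
    "next_actions": ["Check breakeven inflation moves."],
}
print(memo_problems(complete))  # []
print(memo_problems({"thesis": "", "risks": "single string"}))
```

Structural completeness says nothing about whether the content is right; it only guarantees that every desk surfaces its assumptions and open items, so the synthesis and red team have something concrete to work with.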

**4) Synthesis**  
After all desk memos exist, the system produces a unified cross-asset synthesis. This is analogous to a senior research analyst combining desk notes into a coherent narrative:
- What is the shared view across desks?
- Where do desks diverge?
- What is unknown or unverified?
- What actions are plausible given the constraints?

Critically, the synthesis step is not meant to create “confidence theater.” It is meant to produce a memo that is explicit about uncertainty.

**5) Red team and supervisor gate**  
This is where the notebook becomes an institutional workflow rather than an “AI demo.”

In a real firm, the most dangerous failure mode is not that we miss a detail. The most dangerous failure mode is that we convince ourselves of a story, with no explicit verification plan, and we act before we have measured what matters.

So the notebook adds two controls:

- **Red team:** A deliberate adversarial critique step. The red team attacks the synthesis: missing evidence, hidden assumptions, scenario gaps, ambiguous triggers, non-implementable recommendations. This step exists to make sure the system does not drift into unchallenged narrative.

- **Supervisor gate:** A final decision node that can set one of three outcomes: **APPROVE**, **REVISE**, or **REJECT**. This is a practical institutional control. It forces the workflow to end with an explicit gate, the way we end real research processes with a decision about readiness.

If the supervisor chooses **REVISE**, the workflow loops—but in a bounded way. It will rerun desks (or parts of the analysis) for a limited number of revision rounds, and then it will stop. This is not optional. It prevents infinite loops and ensures the system behaves predictably.
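The gate's routing can be written as a pure function of the supervisor decision and the revision counter. This sketch assumes that REVISION_PLAN increments `revision_round` before desks rerun; the function name `after_supervisor` and the termination-reason strings are illustrative, since the notebook's own values are not shown in this section.

```python
from typing import Tuple

def after_supervisor(decision: str, revision_round: int, max_revision_rounds: int) -> Tuple[str, str]:
    """Return (next_node, termination_reason); the reason is '' while the run continues."""
    if decision == "APPROVE":
        return "END", "APPROVED"
    if decision == "REJECT":
        return "END", "REJECTED"
    if decision == "REVISE":
        if revision_round < max_revision_rounds:
            return "REVISION_PLAN", ""
        return "END", "MAX_REVISIONS_REACHED"
    # Anything outside the controlled vocabulary stops the run instead of looping
    return "END", "INVALID_DECISION"

# A supervisor that always answers REVISE still terminates
rounds = 0
while True:
    nxt, reason = after_supervisor("REVISE", rounds, max_revision_rounds=2)
    if nxt == "END":
        break
    rounds += 1  # REVISION_PLAN would bump revision_round here
print(rounds, reason)  # 2 MAX_REVISIONS_REACHED
```

Even a supervisor that never approves ends the run after `max_revision_rounds` loops, and an unrecognized decision fails closed.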

---

**Why We Built It This Way: Architecture Over Chat**

Most people’s mental model of AI in finance is “a chatbot that answers questions.” That is not how serious work gets done. Serious work gets done by workflows: people, roles, checks, routing, and artifacts.

This notebook uses LangGraph to model that workflow explicitly. In practical terms, LangGraph gives us several things we do not get from a normal prompt:

**A visible topology.**  
We can draw the system’s process as a graph: intake → desk router → desks → synthesis → red team → supervisor → revision plan → end. That diagram is not decoration. It is a governance artifact. It tells us what the system is allowed to do, in what order, and under what conditions it stops.
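Because the topology is data, it can be written down, checked, and drawn. The sketch below encodes the flow described above as a hypothetical edge list (dotted arrows mark conditional routes), renders it as Mermaid flowchart text, and checks that every node is reachable from INTAKE. A compiled LangGraph app can usually emit its own Mermaid text via `get_graph().draw_mermaid()`; the hand-written list here only illustrates what that artifact contains.

```python
from typing import List, Set, Tuple

DESKS = ("MACRO", "RATES_FX", "EQUITY", "CREDIT")

# (source, target, conditional?): an assumed encoding of the topology described above
EDGES: List[Tuple[str, str, bool]] = [
    ("INTAKE", "DESK_ROUTER", False),
    *[("DESK_ROUTER", d, True) for d in DESKS],
    *[(d, "DESK_ROUTER", False) for d in DESKS],
    ("DESK_ROUTER", "SYNTHESIS", True),
    ("SYNTHESIS", "RED_TEAM", False),
    ("RED_TEAM", "SUPERVISOR", False),
    ("SUPERVISOR", "REVISION_PLAN", True),
    ("SUPERVISOR", "END", True),
    ("REVISION_PLAN", "DESK_ROUTER", False),
]

def to_mermaid(edges: List[Tuple[str, str, bool]]) -> str:
    """Render edges as Mermaid flowchart text; conditional edges are drawn dotted."""
    lines = ["flowchart TD"]
    for src, dst, conditional in edges:
        lines.append(f"    {src} {'-.->' if conditional else '-->'} {dst}")
    return "\n".join(lines)

def reachable(edges: List[Tuple[str, str, bool]], start: str) -> Set[str]:
    """Depth-first search over the edge list from `start`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dst for src, dst, _ in edges if src == node)
    return seen

all_nodes = {n for src, dst, _ in EDGES for n in (src, dst)}
assert reachable(EDGES, "INTAKE") == all_nodes  # no orphaned nodes
print(to_mermaid(EDGES))
```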

**State-driven behavior.**  
The system moves because state changes, not because text “sounds right.” A desk runs because it is in the pending queue. Synthesis runs because the queue is empty. The process ends because termination_reason is set to a terminal value. This matters because it makes behavior predictable and testable.

**Bounded loops.**  
Revision loops are allowed, but bounded. This is exactly how we manage real research cycles: we allow iteration, but we do not allow indefinite churn. The notebook enforces that.

**Artifact export.**  
Every run writes artifacts: a run manifest (metadata), a graph spec (topology), and a final state (outputs + trace). This is the difference between “AI wrote something” and “we can audit what the AI did.”

---

**What the Committee Should Notice in the Output**

The report produced by this notebook looks like a multi-desk packet because it is constructed like one.

There are four independent desk memos. Each memo speaks in the language that desk would use. The synthesis consolidates those views and identifies shared mechanisms and disagreements. The red team explicitly challenges what is missing or weak. The supervisor gate tells us whether the memo is ready, and if not, what must be improved.

That structure is not cosmetic. It changes the risk profile of AI-assisted research. Instead of a single narrative, we get:
- separate perspectives,
- explicit gaps,
- explicit follow-ups,
- explicit readiness decisions.

If you think about committee review processes, this mirrors how we build confidence: not by saying “we are confident,” but by showing what we know, what we don’t know, and what we would check before acting.

---

**What This Notebook Does Not Do (And Why That Is Good)**

To present this responsibly, we must say what it does not do.

**It does not fetch real-time market data.**  
This notebook is about architecture, not about live data. In a production setting, we would add data connectors and validated datasets. Here we focus on the workflow and governance.

**It does not create trading instructions.**  
The deliverable is a research note with mechanisms, risks, and verification needs. It may suggest hedging categories or risk posture adjustments, but it does not execute trades. That boundary is intentional.

**It does not guarantee factual correctness.**  
No AI system should be treated as a source of truth. The notebook explicitly produces “unknowns” and “verification checklists.” That is the correct institutional stance: AI assists reasoning, humans verify facts and decide.

**It does not hide uncertainty.**  
The red team and supervisor functions exist precisely to surface uncertainty.

In other words, this system is designed to be useful in a real firm not because it pretends to be right, but because it is designed to be reviewable.

---

**How the System Avoids the Classic AI Failure Modes**

Most AI failures in professional settings come from predictable patterns:

**Failure mode 1: Narrative drift**  
A single model produces a coherent story that is not well grounded. Because it reads well, humans over-trust it.

**Notebook control:** multi-desk decomposition + red team challenge.  
Separate memos reduce single-thread drift. The red team forces explicit critique.

**Failure mode 2: Missing coverage**  
A single answer overlooks an important channel—credit transmission, FX spillovers, policy reaction asymmetry, liquidity constraints.

**Notebook control:** explicit desk roles + routing.  
Coverage is built into the topology.

**Failure mode 3: Non-auditable output**  
A chatbot answer cannot be traced. You cannot tell what steps happened or why.

**Notebook control:** trace log + final state + artifacts.  
We export the run manifest, graph topology, and final outputs. We also keep the trace.

**Failure mode 4: Endless iteration or premature stop**  
AI systems can loop without end or stop without justification.

**Notebook control:** bounded loops + termination reasons.  
Revision is allowed but capped. End conditions are explicit.

**Failure mode 5: False precision**  
The output sounds quantitative but is not tied to verifiable measures.

**Notebook control:** governance add-ons that provide trigger scaffolds and scenario scaffolds.  
Even when the model’s reasoning is qualitative, the system imposes the structure of quantitative triggers and scenarios, with every unquantified field marked “Not verified.” That is the right discipline: identify what must be quantified before action.
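A scaffold of this kind can be as simple as dataclasses whose quantitative fields default to the literal “Not verified,” plus a function that lists what is still unquantified. `Trigger`, `Scenario`, `unverified_fields`, and their fields are hypothetical; this section does not show the notebook's actual add-on structure.

```python
from dataclasses import asdict, dataclass, field
from typing import List

NOT_VERIFIED = "Not verified"  # placeholder kept until a human supplies a checked value

@dataclass
class Trigger:
    metric: str                             # what to watch
    threshold: str = NOT_VERIFIED           # level that would change the view
    data_source: str = NOT_VERIFIED         # validated dataset to read it from
    action_if_breached: str = NOT_VERIFIED  # pre-agreed response

@dataclass
class Scenario:
    name: str
    probability: str = NOT_VERIFIED
    triggers: List[Trigger] = field(default_factory=list)

def unverified_fields(scenario: Scenario) -> List[str]:
    """List every field still holding the placeholder, i.e. what must be quantified before action."""
    out: List[str] = []
    if scenario.probability == NOT_VERIFIED:
        out.append(f"{scenario.name}.probability")
    for i, trig in enumerate(scenario.triggers):
        for key, value in asdict(trig).items():
            if value == NOT_VERIFIED:
                out.append(f"{scenario.name}.triggers[{i}].{key}")
    return out

s = Scenario("Persistent inflation surprise",
             triggers=[Trigger(metric="core inflation, 3m annualized")])
for item in unverified_fields(s):
    print(item)
```

The returned list doubles as a verification checklist: an empty result is a precondition for acting, not evidence that the numbers are right.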

---

**Why This Matters for Decision-Making Under Uncertainty**

Your bosses will not adopt AI because it can write paragraphs. They will adopt AI if it improves one of the following:

- speed of coverage,
- consistency of process,
- clarity of assumptions,
- quality of review,
- auditability and governance,
- ability to scale across topics.

This notebook targets those.

In a real week, a CIO may ask multiple cross-asset questions: inflation, growth shocks, geopolitics, liquidity events, credit stress, FX spillovers. The bottleneck is often not intelligence; it is process capacity. Multi-desk coverage is expensive in time and coordination.

An AI orchestrator can help by doing a first structured pass:
- It produces desk-style memos quickly.
- It forces an explicit synthesis.
- It runs a red team critique automatically.
- It generates a verification checklist and scenario scaffold.
- It creates an audit trail.

That means a senior analyst can spend time where humans add the most value:
- selecting the right data to verify,
- evaluating the quality of the evidence,
- deciding which scenarios are material,
- translating into portfolio constraints and client mandates,
- judging risk appetite under governance.

This is the correct partnership model: AI as a process accelerator, not a decision-maker.

---

**How to Explain the Graph to the Committee Without Jargon**

When you present the notebook, you can describe the graph in one sentence:

**“We built a workflow that mimics a multi-desk research process: intake the question, route it to desks, synthesize outputs, red-team the synthesis, and then run a supervisor gate that either approves or sends it back for bounded revisions.”**

Then connect each node to an institutional role:

- **INTAKE:** research coordinator receiving the CIO question and constraints.
- **DESK_ROUTER:** operations logic deciding which desk runs next based on pending coverage.
- **MACRO / RATES_FX / EQUITY / CREDIT:** desk analysts producing memo components.
- **SYNTHESIS:** senior analyst drafting the cross-asset narrative.
- **RED_TEAM:** risk/review challenging the memo.
- **SUPERVISOR:** research head deciding approve/revise/reject.
- **REVISION_PLAN:** operational step to rerun a subset of desks under bounded iteration.
- **END:** explicit stop with a recorded termination reason.

This mapping is intuitive. It helps the committee see that the “AI system” is not a monolith. It is a structured process that resembles how they already operate.

---

**What “State” Means Here (In Business Terms)**

“State” is just the notebook’s way of keeping a disciplined case file. The state contains:

- the query and constraints,
- which desks were selected,
- which desks remain pending,
- the memos produced so far,
- the synthesis and red team critique,
- the supervisor decision,
- the revision round count,
- the trace of events.

In a firm, this is the equivalent of a shared folder or case-management system where you can see what’s done and what remains. The difference is that here the state directly controls execution. That is why behavior is predictable.

---

**What the Artifacts Mean for Governance**

The notebook exports three required artifacts plus the full report:

**run_manifest.json**  
This is the run metadata: when it ran, which model, what configuration, what environment versions. In a governance setting, this is essential. It lets us reproduce a run and explain differences over time.

**graph_spec.json**  
This is the workflow specification: which nodes exist, how they connect, and the Mermaid diagram. It is a control document: it defines what the system is allowed to do.

**final_state.json**  
This is the full case file: inputs, outputs, memos, synthesis, critique, decision, trace. This is what you would retain if you wanted to audit what was produced and why.

**research_report.txt**  
This is the human-readable printable memo pack.

If a committee member asks, “Can we show exactly what happened?” the answer is yes: we have a state file and a trace.
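A minimal sketch of the export step, assuming the run has already produced a config dict, a graph spec, a final state, and report text. The function names `export_artifacts` and `sha256_json` and the manifest fields are hypothetical; the SHA-256 digests of a canonical JSON encoding are one way to tell whether two runs used the same configuration and topology.

```python
import hashlib
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone
from typing import Any, Dict

def sha256_json(obj: Any) -> str:
    """Digest of a canonical JSON encoding (sorted keys), so equal content gives equal hashes."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True, default=str).encode("utf-8")).hexdigest()

def export_artifacts(out_dir: str, cfg: Dict[str, Any], graph_spec: Dict[str, Any],
                     final_state: Dict[str, Any], report_text: str) -> Dict[str, str]:
    """Write the three JSON artifacts plus the text report; return file name -> path."""
    os.makedirs(out_dir, exist_ok=True)
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "model": cfg.get("model"),
        "config": cfg,
        "config_sha256": sha256_json(cfg),
        "graph_spec_sha256": sha256_json(graph_spec),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "termination_reason": final_state.get("termination_reason", ""),
    }
    payloads = {"run_manifest.json": manifest, "graph_spec.json": graph_spec,
                "final_state.json": final_state}
    paths: Dict[str, str] = {}
    for name, payload in payloads.items():
        paths[name] = os.path.join(out_dir, name)
        with open(paths[name], "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True, default=str)
    paths["research_report.txt"] = os.path.join(out_dir, "research_report.txt")
    with open(paths["research_report.txt"], "w", encoding="utf-8") as f:
        f.write(report_text)
    return paths

# Usage with toy inputs, written to a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    written = export_artifacts(
        tmp,
        cfg={"model": "example-model", "max_revision_rounds": 2},
        graph_spec={"nodes": ["INTAKE", "END"], "edges": [["INTAKE", "END"]]},
        final_state={"termination_reason": "APPROVED", "trace": []},
        report_text="memo text",
    )
    print(sorted(written))  # ['final_state.json', 'graph_spec.json', 'research_report.txt', 'run_manifest.json']
```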

---

**How This Evolves Into Production**

This notebook is Notebook 10 in a sequence because it represents the most “institutional” topology. Earlier notebooks teach components: loops, refusal boundaries, critique cycles, tool-augmented nodes, regime machines, committee aggregation, hub-and-spoke drafting, document routing, and event-driven monitoring. Notebook 10 synthesizes these ideas into a supervised multi-agent orchestrator.

To move from this architecture into production, we would do three things:

**1) Add validated data inputs**  
Desk memos would reference specific datasets (inflation measures, financial conditions indices, curves, spreads, earnings revisions). The “trigger scaffold” would be filled with real metrics, thresholds, and alert logic.

**2) Harden controls**  
We would formalize “verification status” fields and require the system to tag claims as supported vs assumed vs open. We would add policy constraints for what the system can recommend.
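A sketch of what such fields might look like: an enum with the three tags named here (supported, assumed, open) and a rule that blocks distribution while any claim is still open. `Verification`, `Claim`, and the blocking rule are assumptions; a real policy might also cap the number of assumed claims or require a source for every supported one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class Verification(str, Enum):
    SUPPORTED = "supported"  # tied to a checked source or dataset
    ASSUMED = "assumed"      # stated openly as an assumption
    OPEN = "open"            # unresolved; must be verified before use

@dataclass(frozen=True)
class Claim:
    desk: str
    text: str
    status: Verification

def status_counts(claims: List[Claim]) -> Dict[str, int]:
    """Tally claims per verification status, including zero counts."""
    counts = {v.value: 0 for v in Verification}
    for c in claims:
        counts[c.status.value] += 1
    return counts

def ready_to_distribute(claims: List[Claim]) -> bool:
    """Illustrative conservative rule: no open claims may remain."""
    return all(c.status is not Verification.OPEN for c in claims)

claims = [
    Claim("RATES_FX", "Front-end yields react first to the policy path.", Verification.ASSUMED),
    Claim("CREDIT", "Refinancing needs cluster within the horizon.", Verification.OPEN),
]
print(status_counts(claims))        # {'supported': 0, 'assumed': 1, 'open': 1}
print(ready_to_distribute(claims))  # False
```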

**3) Implement human sign-off**  
The supervisor decision in production would be a human gate. The AI can propose revise/approve recommendations, but the decision to distribute or act belongs to the human governance chain.

Those upgrades are straightforward because the architecture already separates roles, routing, and artifacts. The notebook is intentionally built to make that progression natural.

---

**How to Position This to the Committee**

If you want to be direct with your bosses, position the notebook as:

**A repeatable research process that scales coverage, improves structure, and produces audit artifacts.**

It is not a “prediction engine.” It is not “automated portfolio management.” It is a disciplined workflow that produces a research pack and forces explicit verification needs.

In other words:

**This notebook shows how AI can behave like a multi-desk team, not by being smarter than humans, but by being organized like humans.**

That is the core claim. And it is exactly what you can show in the notebook: the visible graph, the desk memos, the synthesis, the red team critique, the supervisor gate, and the exported audit artifacts.

---

**Closing: The Value Proposition in One Paragraph**

The value of this notebook is operational. It demonstrates that AI can accelerate the front end of institutional research: quickly generating desk-style coverage, producing a structured synthesis, forcing adversarial critique, and producing a verification checklist—while keeping the process bounded, inspectable, and auditable. The committee does not need to believe AI is “right.” They only need to see that the system produces better structure, faster coverage, and stronger governance around uncertainty. That is what this notebook delivers, and that is why the architecture is the primary learning artifact—not the prose of any single memo.


##1.LIBRARIES AND ENVIRONMENT

**CELL 1 — Environment setup, dependency hygiene, and reproducibility guardrails**

Cell 1 exists to solve a problem that looks “boring” but is responsible for many notebook failures in front of committees: dependency conflicts. Google Colab comes with preinstalled libraries, and those preinstalled versions can change over time. If we import libraries carelessly, we can trigger subtle runtime errors—exactly like the `langchain.debug` issue you encountered. So Cell 1 is deliberately written as a defensive foundation: it installs only what we need, pins versions where appropriate, avoids unnecessary upgrades of packages that Colab depends on, and prints an environment fingerprint so we can audit what ran.

Pedagogically, this is the first lesson of institutional AI: before we talk about “agents,” we need **reproducibility** and **inspectability**. If the committee asks “Could we reproduce this?” we should be able to point to a fixed seed, stable package versions, and a logged environment. This is why we set random seeds, stabilize hashing where possible, and record versions for `langgraph`, `langchain`, `langchain-core`, and `anthropic`. These choices make later results attributable to a controlled configuration rather than to random drift. They do not make the model’s text deterministic: sampling at a nonzero temperature can change the wording from run to run, which is why the model and configuration are recorded alongside the outputs.

Cell 1 also establishes the global imports that every later cell relies on: typing utilities (`TypedDict`, `Literal`), standard libraries (`json`, `hashlib`, `uuid`, `datetime`), and any display utilities required for Mermaid visualization. The goal is separation of concerns: later cells should focus on architecture (state, nodes, routing) rather than re-declaring low-level utilities.

Finally, Cell 1 is where we align to the project’s security and operational requirements. It imports Colab’s `userdata` module so that the API key can later be retrieved via `userdata.get("ANTHROPIC_API_KEY")` (all caps), the Colab-safe pattern that avoids hardcoding credentials. In short: Cell 1 is the notebook’s “operations layer.” It is not glamorous, but it is what makes the rest of the system trustworthy in a professional setting.


In [1]:
```python
# CELL 1/10: Colab-safe bootstrap (conflict-avoiding) + version locks + legacy shims
# Goal: avoid Colab preinstall collisions (esp. langchain/langchain_core globals) and keep runtime stable.

import sys, subprocess, os, platform, json, hashlib, uuid, time, random
import datetime as _dt
from typing import TypedDict, Literal, Dict, Any, List, Optional, Callable, Tuple

# PYTHONHASHSEED only affects interpreters launched after this line (e.g. pip subprocesses);
# the running kernel's hash seed was already fixed when it started.
os.environ["PYTHONHASHSEED"] = "7"
random.seed(7)  # seed Python's RNG so any local randomness is repeatable

def _pip_install(pkgs: List[str]) -> None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "--upgrade"] + pkgs)

def _version(pkg: str) -> str:
    try:
        import importlib.metadata as md
        return md.version(pkg)
    except Exception:
        return "missing"

def _clear_modules(prefixes: List[str]) -> None:
    # Remove already-imported modules to ensure pins take effect without restarting runtime.
    for m in list(sys.modules.keys()):
        if any(m == p or m.startswith(p + ".") for p in prefixes):
            del sys.modules[m]

# --- Hard pins for this project (KNOWN COMPATIBILITY SET) ---
# We pin langchain + langchain-core together to avoid the `langchain.debug` AttributeError in Colab.
PINS = {
    "langgraph": "0.2.39",
    "langchain": "0.3.14",
    "langchain-core": "0.3.40",
    "anthropic": "0.34.0",      # minimum; allow >=
    "packaging": "24.2",        # stable Version parsing
}

# Install only what is missing/mismatched to reduce churn
need = []
if _version("langgraph") != PINS["langgraph"]:
    need.append(f"langgraph=={PINS['langgraph']}")
if _version("langchain") != PINS["langchain"]:
    need.append(f"langchain=={PINS['langchain']}")
if _version("langchain-core") != PINS["langchain-core"]:
    need.append(f"langchain-core=={PINS['langchain-core']}")
if _version("packaging") != PINS["packaging"]:
    need.append(f"packaging=={PINS['packaging']}")

if need:
    _pip_install(need)

# Clear cached modules so newly installed versions are used even without runtime restart
_clear_modules(["langchain", "langchain_core", "langgraph", "anthropic"])

# Imports AFTER pins
from packaging.version import Version  # noqa

# anthropic: accept any version >= the minimum; install only if missing or older
_anthropic_v = _version("anthropic")
if _anthropic_v == "missing" or Version(_anthropic_v) < Version(PINS["anthropic"]):
    _pip_install([f"anthropic>={PINS['anthropic']}"])
    _clear_modules(["anthropic"])

import langchain  # noqa

# Legacy globals expected by some langchain_core code paths (Colab sometimes ships a variant lacking these)
if not hasattr(langchain, "debug"):
    langchain.debug = False
if not hasattr(langchain, "verbose"):
    langchain.verbose = False

from google.colab import userdata  # noqa
from IPython.display import HTML, display  # noqa

from langgraph.graph import StateGraph, END  # noqa

print("BOOTSTRAP_OK:", {
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "langgraph": _version("langgraph"),
    "langchain": _version("langchain"),
    "langchain-core": _version("langchain-core"),
    "anthropic": _version("anthropic"),
    "langchain.debug": getattr(langchain, "debug", None),
    "langchain.verbose": getattr(langchain, "verbose", None),
})
```

```
BOOTSTRAP_OK: {'python': '3.12.12', 'platform': 'Linux-6.6.105+-x86_64-with-glibc2.35', 'langgraph': '0.2.39', 'langchain': '0.3.14', 'langchain-core': '0.3.40', 'anthropic': '0.82.0', 'langchain.debug': False, 'langchain.verbose': False}
```


##2.CONFIGURATION

###2.1.OVERVIEW

**CELL 2 — Configuration and explicit state schema (TypedDict) as the contract**

Cell 2 defines the notebook’s operating contract. In this project, the contract is explicit: the system is state-driven, auditable, deterministic, and modular. To enforce that, we separate “configuration” from “state.” Configuration is what we choose before the run (model lock, maximum steps, maximum revision rounds, desk order, retry policies). State is what changes during the run (pending desks, completed desks, memos produced, synthesis, red team critique, supervisor decision, termination reason, trace log).

The most important artifact in this cell is the **TypedDict state schema**. A TypedDict declares the state’s named fields and their types. At runtime it is still an ordinary dict, so the control comes from the declaration itself: static type checkers can flag misspelled or undeclared keys, and LangGraph builds the graph’s state channels from these fields. This is not just style; it is a governance control. It discourages hidden variables and undocumented behavior, because every node reads from and writes to the same declared state object. That means we can audit: “What did the Macro desk see?” “What did Synthesis use?” “Why did Supervisor decide REVISE?” The answers are in the state.
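Python does not check TypedDict keys at runtime, so a misspelled key in a node's return value is not caught by the type itself. A small explicit guard makes the contract enforceable regardless of how the orchestration layer treats unknown keys. `MiniState` and `check_update` are hypothetical; the guard reads the declared keys with `typing.get_type_hints`.

```python
from typing import Any, Dict, List, TypedDict, get_type_hints

class MiniState(TypedDict):
    # Hypothetical three-field subset of the research state, for illustration only
    query: str
    pending_desks: List[str]
    termination_reason: str

def check_update(schema: type, update: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if a node's update writes keys the schema does not declare; otherwise return it."""
    declared = set(get_type_hints(schema))
    unknown = sorted(set(update) - declared)
    if unknown:
        raise ValueError(f"undeclared state keys: {unknown}")
    return update

s: MiniState = {"query": "demo", "pending_desks": ["MACRO"], "termination_reason": ""}
new_state = {**s, **check_update(MiniState, {"pending_desks": []})}  # declared key: accepted

try:
    check_update(MiniState, {"pending_desk": []})  # typo: rejected before it reaches the state
except ValueError as err:
    print(err)  # undeclared state keys: ['pending_desk']
```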

In institutional workflows, ambiguity is risk. Typed state reduces ambiguity. It also supports modularity: if we add another desk later (for example, “Commodities” or “EM”), we extend the state schema predictably. If we add a new control (for example, “evidence_tags”), we add it as a state field, and all nodes can adopt it without hacks.

Cell 2 also declares controlled vocabularies as `Literal` types: desk names (`DeskName`) and supervisor decisions (`SupervisorDecision`). Termination reasons are stored in a plain string field, `termination_reason`. This matters because routing depends on these values. We want routing to be deterministic and machine-checkable, not dependent on free-form text.
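Since `termination_reason` is typed as a plain `str`, one option is to give it the same treatment as desks and decisions: a `Literal` vocabulary plus a runtime membership check, because `Literal` on its own is only enforced by static type checkers. The `TerminationReason` values and the helpers `in_vocab` and `set_termination` are hypothetical.

```python
from typing import Any, Dict, Literal, get_args

DeskName = Literal["MACRO", "EQUITY", "CREDIT", "RATES_FX"]
SupervisorDecision = Literal["APPROVE", "REVISE", "REJECT"]
# Hypothetical vocabulary; the notebook itself stores termination_reason as free text
TerminationReason = Literal[
    "APPROVED", "REJECTED", "MAX_REVISIONS_REACHED", "MAX_STEPS_REACHED", "INVALID_DECISION", "ERROR"
]

def in_vocab(value: str, vocab: Any) -> bool:
    """Runtime membership test against the values of a Literal type."""
    return value in get_args(vocab)

def set_termination(state: Dict[str, Any], reason: str) -> Dict[str, Any]:
    """Return a copy of state with a validated termination reason."""
    if not in_vocab(reason, TerminationReason):
        raise ValueError(f"unknown termination_reason: {reason!r}")
    return {**state, "termination_reason": reason}

print(in_vocab("RATES_FX", DeskName), in_vocab("Rates/FX", DeskName))  # True False
print(set_termination({}, "APPROVED"))  # {'termination_reason': 'APPROVED'}
```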

Finally, Cell 2 is where we define bounded-loop parameters: maximum total steps and maximum revision rounds. Those caps are essential to avoid infinite loops and to satisfy LangGraph recursion constraints. In short: Cell 2 defines the “rules of the game” before the game starts. It makes the notebook professional because it makes the system explicit.
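A cap only works as a safety net if it sits above the longest legitimate run. The sketch below estimates that worst case under an assumed counting rule: every node visit is one step, and DESK_ROUTER runs once before each desk plus once more when the queue is empty. The function `worst_case_steps` is hypothetical, and the real graph may count steps differently, so treat the formula as a template.

```python
def worst_case_steps(n_desks: int, max_revision_rounds: int, rerun_desks: int) -> int:
    """Upper bound on node executions under the ASSUMED counting rule described above."""
    tail = 3  # SYNTHESIS, RED_TEAM, SUPERVISOR
    first_pass = 1 + (n_desks + 1) + n_desks + tail         # INTAKE, router hops, desks, tail
    per_round = 1 + (rerun_desks + 1) + rerun_desks + tail  # REVISION_PLAN, router hops, desks, tail
    return first_pass + max_revision_rounds * per_round

print({r: worst_case_steps(n_desks=4, max_revision_rounds=2, rerun_desks=r) for r in (1, 2, 4)})
# {1: 27, 2: 31, 4: 39}
```

Under this rule, four desks with two revision rounds need 27 steps if each round reruns one desk, 31 if it reruns two, and 39 if it reruns all four; the last two exceed the default `max_total_steps` of 30. The same comparison applies to LangGraph's `recursion_limit`, which is passed in the run config at invoke time and defaults to 25 when not set.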


###2.2.CODE AND IMPLEMENTATION

In [2]:
```python
# CELL 2/10: Configuration + explicit TypedDict state + deterministic state factory

class CFG(TypedDict):
    model: str
    max_revision_rounds: int
    desk_order: List[str]  # deterministic desk ordering
    max_total_steps: int   # bounded safety net
    temperature: float
    max_tokens: int

CFG_DEFAULT: CFG = {
    "model": "claude-haiku-4-5-20251001",
    "max_revision_rounds": 2,
    "desk_order": ["MACRO", "EQUITY", "CREDIT", "RATES_FX"],
    "max_total_steps": 30,
    "temperature": 0.2,
    "max_tokens": 900,
}

DeskName = Literal["MACRO", "EQUITY", "CREDIT", "RATES_FX"]
SupervisorDecision = Literal["APPROVE", "REVISE", "REJECT"]

class ResearchState(TypedDict):
    # Inputs
    query: str
    constraints: Dict[str, Any]

    # Orchestration
    selected_desks: List[DeskName]
    pending_desks: List[DeskName]
    completed_desks: List[DeskName]
    revision_round: int
    steps_executed: int

    # Work products
    desk_memos: Dict[DeskName, Dict[str, Any]]
    synthesis: Dict[str, Any]
    red_team: Dict[str, Any]
    supervisor: Dict[str, Any]

    # Controls / auditability
    termination_reason: str
    trace: List[Dict[str, Any]]
    errors: List[Dict[str, Any]]

def utc_now_iso() -> str:
    return _dt.datetime.now(_dt.timezone.utc).isoformat()

def state_init(query: str, constraints: Optional[Dict[str, Any]] = None) -> ResearchState:
    c = constraints or {}
    return {
        "query": query,
        "constraints": c,

        "selected_desks": [],
        "pending_desks": [],
        "completed_desks": [],
        "revision_round": 0,
        "steps_executed": 0,

        "desk_memos": {},
        "synthesis": {},
        "red_team": {},
        "supervisor": {},

        "termination_reason": "",
        "trace": [{"ts_utc": utc_now_iso(), "event": "INIT", "query": query, "constraints": c}],
        "errors": [],
    }

print("CFG_DEFAULT:", json.dumps(CFG_DEFAULT, indent=2))
print("STATE_KEYS:", list(state_init("demo").keys()))
```

```
CFG_DEFAULT: {
  "model": "claude-haiku-4-5-20251001",
  "max_revision_rounds": 2,
  "desk_order": [
    "MACRO",
    "EQUITY",
    "CREDIT",
    "RATES_FX"
  ],
  "max_total_steps": 30,
  "temperature": 0.2,
  "max_tokens": 900
}
STATE_KEYS: ['query', 'constraints', 'selected_desks', 'pending_desks', 'completed_desks', 'revision_round', 'steps_executed', 'desk_memos', 'synthesis', 'red_team', 'supervisor', 'termination_reason', 'trace', 'errors']
```


##3.LLM CLIENT

###3.1.OVERVIEW

**CELL 3 — Client creation, LLM call wrapper, and structured JSON enforcement**

Cell 3 builds the “tooling layer” that every agent node uses to communicate with the model. The committee should understand this cell as the notebook’s equivalent of a firm’s internal research platform API: it sets the rules for how requests are made, how responses are validated, and how failures are handled.

The first function is the client initializer: it retrieves `ANTHROPIC_API_KEY` from Colab secrets and creates a client. This enforces security hygiene: the key never appears in code or output. Model selection is centralized as well: every call reads the model name from the locked configuration (`cfg["model"]`), which prevents accidental substitution of the model (a core project lock).

The second key component is the LLM wrapper that enforces **structured JSON outputs**. This is essential because unstructured text is hard to route, hard to validate, and easy to “sound good” while missing required fields. In this notebook, desks must return a memo object; synthesis must return consensus/divergences/unknowns/actions; red team must return critique/gaps/questions; supervisor must return decision/followups/rerun desks. The wrapper forces the model to answer through a single tool, `return_json`, so the result arrives as a dictionary of tool arguments rather than as text to parse. If the model does not return that tool call, the wrapper raises an error; the calling node is then expected to retry under a bounded policy, record the error, and fall back deterministically once retries are exhausted.

This cell addresses the practical failure mode you encountered earlier: “unbalanced JSON delimiters.” The fix is not to “hope the model behaves.” The fix is to enforce structured output via the API, validate parseability, retry deterministically, and fail safely.
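A minimal sketch of the bounded retry-and-fallback policy a calling node could wrap around `llm_tool_json`. The helper name `call_with_retry`, the attempt count, the linear backoff, and the `_fallback` marker are assumptions; the stand-in `flaky` function simulates two failed tool calls so the example runs without an API key.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

def call_with_retry(
    fn: Callable[[], Dict[str, Any]],
    fallback: Dict[str, Any],
    errors: List[Dict[str, Any]],
    node: str,
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> Dict[str, Any]:
    """Call fn up to max_attempts times, log every failure, and return a marked fallback if all fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: every failure is logged, none escapes the node
            errors.append({
                "ts_utc": datetime.now(timezone.utc).isoformat(),
                "node": node,
                "attempt": attempt,
                "error_type": type(exc).__name__,
                "message": str(exc)[:200],
            })
            if attempt < max_attempts and backoff_s > 0:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return {**fallback, "_fallback": True}

# Stand-in for llm_tool_json that fails twice, then succeeds
calls = {"n": 0}
def flaky() -> Dict[str, Any]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("Model did not return tool_use for return_json")
    return {"thesis": "ok"}

errs: List[Dict[str, Any]] = []
out = call_with_retry(flaky, fallback={"thesis": "FALLBACK"}, errors=errs, node="MACRO")
print(out, len(errs))  # {'thesis': 'ok'} 2
```

In a node, the call might look like `call_with_retry(lambda: llm_tool_json(client, cfg, SYSTEM_BASE, prompt), fallback=..., errors=state["errors"], node="MACRO")`, where `prompt` is a placeholder, so every failure lands in the same `errors` list that is exported with the final state, and the `_fallback` marker tells reviewers which sections were not model-generated.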

Cell 3 also establishes trace and error logging utilities, because professional AI systems must be able to explain what happened. When the model fails, we record node name, error type, and timestamp. That trace is later exported in `final_state.json`.

In short: Cell 3 is where we make the system robust. Without this cell, the notebook might work “most of the time,” but not reliably enough to present to a committee. With it, we have controlled behavior under success and under failure.


###3.2.CODE AND IMPLEMENTATION

In [24]:
# CELL 3/10 — Anthropic client + "function-style" JSON mode via tool_use (NO fragile parsing)
# This removes 99% of JSON delimiter failures by asking the model to call a tool with JSON args.

from anthropic import Anthropic

def get_client() -> Anthropic:
    key = userdata.get("ANTHROPIC_API_KEY")  # ALL CAPS (project lock)
    if not key or not isinstance(key, str) or len(key.strip()) < 10:
        raise RuntimeError("Missing/invalid ANTHROPIC_API_KEY in Colab secrets (userdata).")
    return Anthropic(api_key=key.strip())

# One universal tool: model returns structured JSON as tool arguments.
JSON_TOOL = {
    "name": "return_json",
    "description": "Return a single JSON object that matches the requested schema.",
    "input_schema": {
        "type": "object",
        "additionalProperties": True
    }
}

def llm_tool_json(
    client: Anthropic,
    cfg: CFG,
    system: str,
    user: str,
) -> Dict[str, Any]:
    """
    Preferred: enforce structured output using tool_use.
    If the model fails to call the tool, we raise loudly (caller may fallback).
    """
    msg = client.messages.create(
        model=cfg["model"],
        temperature=cfg["temperature"],
        max_tokens=cfg["max_tokens"],
        system=system,
        tools=[JSON_TOOL],
        tool_choice={"type": "tool", "name": "return_json"},
        messages=[{"role": "user", "content": user}],
    )

    for block in msg.content:
        if getattr(block, "type", None) == "tool_use" and getattr(block, "name", None) == "return_json":
            # block.input is already a dict (no parsing)
            return dict(block.input)

    raise ValueError("Model did not return tool_use for return_json")

SYSTEM_BASE = (
    "You are a finance research desk agent in an institutional setting. "
    "Never invent facts or sources. Tag uncertainty explicitly. "
    "If information is missing, list precise open_items."
)

print("CELL3_OK: tool_use JSON mode enabled (parsing eliminated).")


CELL3_OK: tool_use JSON mode enabled (parsing eliminated).
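The extraction loop in `llm_tool_json` can be exercised without an API key. The sketch below uses `types.SimpleNamespace` objects as stand-ins for the SDK's content blocks (an assumption made only for offline testing) and a hypothetical helper, `extract_tool_json`, that applies the same rule: return the arguments of the first `return_json` tool call, otherwise raise.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def extract_tool_json(content: Iterable[Any], tool_name: str = "return_json") -> Dict[str, Any]:
    """Hypothetical helper: the same selection rule llm_tool_json applies to msg.content."""
    for block in content:
        # Only a tool_use block for the expected tool counts; text blocks are ignored.
        if getattr(block, "type", None) == "tool_use" and getattr(block, "name", None) == tool_name:
            return dict(block.input)  # already a dict: no string parsing
    raise ValueError(f"Model did not return tool_use for {tool_name}")


# Offline stand-ins for SDK content blocks (no API call is made).
fake_content = [
    SimpleNamespace(type="text", text="Preamble the model might emit."),
    SimpleNamespace(type="tool_use", name="return_json",
                    input={"thesis": "Real yields drive the regime.", "open_items": []}),
]
print(extract_tool_json(fake_content))

try:
    extract_tool_json([SimpleNamespace(type="text", text="No tool call.")])
except ValueError as err:
    print("fails loudly:", err)
```

Because the failure path raises instead of returning a partial dict, each calling node can decide its own fallback, which is how the desk, synthesis, red-team, and supervisor nodes later in the notebook stay alive when one call misbehaves.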


##4.AGENT NODE

###4.1.OVERVIEW

**CELL 4 — AgentNode abstraction, trace helpers, and the desk node interface**

Cell 4 is where “agents” become real engineering components. Instead of treating each agent as an ad hoc prompt, we define an **AgentNode abstraction**: a small object that has a name and a callable that takes the state, the configuration, and the client, and returns the updated state. This is the foundation of modularity. Every node has the same interface, which allows the graph to connect nodes predictably and allows us to test them in isolation if needed.

The wrapper also centralizes controls that would otherwise be repeated in every node. Each call increments `steps_executed` and writes an ENTER record to the trace. Once `max_total_steps` is exceeded, the node returns without running and sets `termination_reason` to `MAX_TOTAL_STEPS_REACHED`. If the wrapped function raises, `safe_error` records the node name, error type, and timestamp, and the run is marked `ERROR` instead of crashing. The `trace_add` and `safe_error` helpers are defined in this cell for that reason.

The desk memo nodes built on this abstraction (Macro, Rates/FX, Equity, and Credit, implemented in Cell 6) are each responsible for producing a structured memo. The important design point is that each memo is produced from the same query and constraints, but through a desk-specific lens. This reduces “single narrative” risk and increases coverage.

Each desk memo follows a schema: thesis, key_points, risks, assumptions, open_items, recommended_next_actions. That schema is deliberate. It mirrors how a real desk writes: start with a view, then mechanisms, then what could go wrong, then what is assumed, then what is unknown, then what to do next. For a committee, that structure is reassuring because it resembles familiar internal research formats.

The desk nodes also enforce auditability: each desk memo is stored under `state["desk_memos"][DESK_NAME]`. That means we can trace exactly which desk contributed what content to the final synthesis.

Finally, desk nodes update the routing variables: they remove themselves from `pending_desks` and append to `completed_desks`. That is not just bookkeeping. It is how the router knows what to run next. This is the key architectural point: **state drives routing**. The system does not “decide” based on vibes; it moves because the pending queue changes.

In short: Cell 4 provides the common interface that makes desk specialization composable, testable, and auditable.


###4.2.CODE AND IMPLEMENTATION

In [25]:
# CELL 4/10 — AgentNode abstraction + trace helpers (small, composable, testable)

import hashlib
from typing import Any, Callable, Dict

def _sha256_str(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def trace_add(state: ResearchState, node: str, detail: Dict[str, Any]) -> None:
    state["trace"].append({
        "ts_utc": utc_now_iso(),
        "node": node,
        **detail
    })

def safe_error(state: ResearchState, node: str, err: Exception) -> None:
    rec = {"ts_utc": utc_now_iso(), "node": node, "error_type": type(err).__name__, "error": str(err)}
    state["errors"].append(rec)
    trace_add(state, node, {"event": "ERROR", **rec})

class AgentNode:
    """
    Minimal wrapper: name + node function fn(state, cfg, client) -> state.
    The LangGraph node uses the function; state transitions remain explicit.
    """
    def __init__(self, name: str, fn: Callable[[ResearchState, CFG, Anthropic], ResearchState]):
        self.name = name
        self.fn = fn

    def __call__(self, state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
        state["steps_executed"] += 1
        trace_add(state, self.name, {"event": "ENTER", "steps_executed": state["steps_executed"]})
        # Global step cap: force termination instead of running the node.
        if state["steps_executed"] > cfg["max_total_steps"]:
            state["termination_reason"] = "MAX_TOTAL_STEPS_REACHED"
            trace_add(state, self.name, {"event": "FORCE_END", "reason": state["termination_reason"]})
            return state
        try:
            out = self.fn(state, cfg, client)
            trace_add(out, self.name, {"event": "EXIT"})
            return out
        except Exception as e:
            # Any uncaught node failure is recorded and ends the run with reason ERROR.
            safe_error(state, self.name, e)
            state["termination_reason"] = "ERROR"
            return state

print("AgentNode abstraction ready.")


AgentNode abstraction ready.


##5.VISUALIZATION CONFIGURATION

###5.1.OVERVIEW

**CELL 5 — Visualization standard: hardened Mermaid renderer + graph display utility**

Cell 5 exists because architecture is the core learning objective of this project, and architecture must be visible. In a committee setting, a graph diagram is also your best narrative tool: it shows, at a glance, that the system is not a single chatbot response but a supervised workflow with desk routing, synthesis, critique, and an explicit end state.

This cell implements the “Visualization Standard v1”: a hardened Mermaid ESM renderer pinned to a specific Mermaid version (10.6.1 unless changed). Pinning matters. Mermaid rendering can break if the CDN changes or if syntax expectations shift. By pinning the version and using an ESM import approach that is stable in Colab, we reduce the chance that your demo fails due to visualization issues.

The cell defines `display_langgraph_mermaid(compiled_graph)`, which extracts Mermaid code from the compiled graph via `get_graph().draw_mermaid()`, renders it in the notebook output, and returns the Mermaid string. This is not cosmetic. It is a governance artifact: the diagram must match topology exactly, and the diagram is exported as part of the graph spec. That means if someone reviews the system later, they can see the same topology you presented.

Pedagogically, this cell teaches a crucial point: multi-agent systems must be understood as systems, not as prompts. Visualizing the topology forces the reader to think in terms of routing, conditional edges, bounded loops, and stop conditions. It also helps explain the role of controls like the supervisor gate: the supervisor is not “just another agent,” it is a structural checkpoint.

Finally, in institutional settings, “explainability” often means “process explainability.” A diagram is the simplest way to show process explainability. Cell 5 makes that possible, reliably, in Colab.


###5.2.CODE AND IMPLEMENTATION

In [26]:
# CELL 5/10 — Hardened Colab Mermaid ESM renderer (pinned) + display_langgraph_mermaid(graph)

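import uuid
from IPython.display import HTML, display

MERMAID_VERSION = "10.6.1"  # pinned so CDN updates cannot silently change rendering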

def render_mermaid_locally(mermaid_code: str) -> None:
    diagram_id = f"mermaid-{uuid.uuid4().hex[:10]}"
    html = f"""
    <div style="border:1px solid rgba(0,0,0,0.1); border-radius:10px; padding:12px; overflow-x:auto;">
      <pre class="mermaid" id="{diagram_id}">{mermaid_code}</pre>
    </div>
    <script type="module">
      import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@{MERMAID_VERSION}/dist/mermaid.esm.min.mjs";
      mermaid.initialize({{
        startOnLoad: true,
        securityLevel: "loose",
        theme: "default",
        flowchart: {{ curve: "basis" }},
      }});
      mermaid.run({{ querySelector: "#{diagram_id}" }});
    </script>
    """
    display(HTML(html))

def display_langgraph_mermaid(compiled_graph) -> str:
    """
    Returns the Mermaid string and renders it in Colab.
    """
    mermaid = compiled_graph.get_graph().draw_mermaid()
    render_mermaid_locally(mermaid)
    return mermaid

print("Mermaid renderer ready. Version pinned:", MERMAID_VERSION)


Mermaid renderer ready. Version pinned: 10.6.1
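The requirement that the diagram match the topology exactly can also be checked mechanically rather than by eye. The sketch below is a hypothetical helper, `missing_nodes`, that reports which expected node names never appear as whole identifiers in a Mermaid string. The sample diagram is hand-written for illustration and is not actual `draw_mermaid()` output; the node list assumes the names registered in this notebook.

```python
import re
from typing import Iterable, List


def missing_nodes(mermaid_text: str, node_names: Iterable[str]) -> List[str]:
    """Return the node names that never appear as a whole identifier in the Mermaid text."""
    missing = []
    for name in node_names:
        # \b treats "_" as a word character, so "RATES" would not match inside "RATES_FX".
        if not re.search(rf"\b{re.escape(name)}\b", mermaid_text):
            missing.append(name)
    return missing


# Hand-written illustrative diagram (not actual draw_mermaid() output) with some nodes left out.
sample = """graph TD;
    __start__ --> INTAKE;
    INTAKE --> DESK_ROUTER;
    DESK_ROUTER -.-> MACRO;
    MACRO --> DESK_ROUTER;
    DESK_ROUTER -.-> SYNTHESIS;
    SYNTHESIS --> RED_TEAM;
    RED_TEAM --> SUPERVISOR;
    SUPERVISOR -.-> __end__;
"""
expected = ["INTAKE", "DESK_ROUTER", "MACRO", "RATES_FX", "EQUITY", "CREDIT",
            "SYNTHESIS", "RED_TEAM", "SUPERVISOR", "REVISION_PLAN"]
print(missing_nodes(sample, expected))
```

In the notebook, the string returned by `display_langgraph_mermaid(...)` could be passed as `mermaid_text`; a non-empty result would flag a diagram that no longer reflects the registered nodes.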


##6.DESK AGENTS

###6.1.OVERVIEW

**CELL 6 — Intake and desk router nodes: selection, queueing, and bounded desk loop**

Cell 6 implements the operational core of multi-desk orchestration: it decides which desks will run, and it manages the queue that drives execution. In a real firm, this is the equivalent of research coordination: assigning tasks and tracking completion.

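The Intake node reads the query and constraints and selects desks. In this notebook, selection is model-assisted with deterministic fallbacks: the model proposes `selected_desks`, which are filtered to the configured `desk_order` and re-sorted into that order. If nothing valid remains, the first two desks in `desk_order` are used; if the tool call fails outright, all desks are selected. Either way, the result is written into state as `selected_desks` and `pending_desks`, while `completed_desks` is filled in by the desk nodes as they finish. The design objective is clarity: at any time, we can print the state and know exactly what is happening.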

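The Desk Router node is intentionally simple: it does no analysis and changes no state. It records the queue (`pending_desks`, `completed_desks`) and the current `termination_reason` in the trace. The routing decision itself is made by the conditional edge that follows it (defined in Cell 8), which reads those same fields: if desks are pending, it routes to the next one; if none are left, it routes to Synthesis; and if a terminal condition is set (an error or the step cap), it routes to END.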

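This is where bounded loops matter. The desk loop is bounded because it consumes the queue. A loop is guaranteed to terminate when its control variable strictly decreases on every pass and cannot fall below a fixed floor. Here, every desk run removes that desk from `pending_desks`, whether its memo succeeded or fell back, so the queue shrinks by one per pass and reaches zero after at most as many passes as there were selected desks. The outer revision loop refills the queue, but it is capped by `max_revision_rounds`, and `max_total_steps` acts as a global backstop. That is a provable termination argument, and it is exactly the kind of logic committees appreciate.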
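The termination argument can be made concrete with a toy simulation. The sketch below is hypothetical (`simulate_desk_loop` is not part of the notebook) and simplifies in one respect: it counts only desk executions, whereas `AgentNode` counts every node, including the router. The `assert` inside the loop is the variant check: the queue must shrink on every pass.

```python
from typing import Any, Dict, List


def simulate_desk_loop(pending: List[str], max_total_steps: int) -> Dict[str, Any]:
    """Toy model of DESK_ROUTER -> desk -> DESK_ROUTER; counts desk executions only."""
    pending = list(pending)  # copy so the caller's list is not mutated
    completed: List[str] = []
    steps = 0
    while pending:
        steps += 1
        if steps > max_total_steps:
            # Backstop analogous to AgentNode's MAX_TOTAL_STEPS_REACHED check.
            return {"completed": completed, "next": "END", "reason": "MAX_TOTAL_STEPS_REACHED"}
        size_before = len(pending)
        desk = pending[0]
        # As in _run_desk, the desk leaves the queue even if its memo failed.
        pending = [d for d in pending if d != desk]
        completed.append(desk)
        assert len(pending) < size_before  # the variant strictly decreases
    return {"completed": completed, "next": "SYNTHESIS", "reason": ""}


desks = ["MACRO", "RATES_FX", "EQUITY", "CREDIT"]
print(simulate_desk_loop(desks, max_total_steps=10))
print(simulate_desk_loop(desks, max_total_steps=2))
```

With a generous cap, the loop drains all four desks and hands off to Synthesis; with a tight cap, the backstop ends the run after two desks, which is the behavior the step limit is meant to guarantee even if the queue logic were ever broken.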

Cell 6 also contributes to robustness by updating trace logs. Each entry records node entry/exit and routing decisions. That trace is exported, which means we can explain behavior after the fact.

Overall, this cell is the “orchestration layer.” It is not glamorous, but it is what transforms a collection of desk agents into a controlled multi-step workflow that behaves like a real institutional process.


###6.2.CODE AND IMPLEMENTATION

In [27]:
# CELL 6/10 — Desk agents rewritten to use tool_use JSON mode + fail-safe per-desk (no run-kill on one bad desk)

def n_intake(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    system = SYSTEM_BASE + " Role: INTAKE. Choose which desks to engage (from allowed_desks)."
    payload = {
        "task": "Select desks for this research query",
        "query": state["query"],
        "constraints": state["constraints"],
        "allowed_desks": cfg["desk_order"],
        "output_schema": {
            "selected_desks": "array of desk codes from allowed_desks",
            "scope": "short statement of scope",
            "open_items": "array of questions needed to proceed safely",
        },
    }
    try:
        obj = llm_tool_json(client, cfg, system, json.dumps(payload))
        raw = obj.get("selected_desks", [])
        selected = [d for d in raw if d in cfg["desk_order"]] if isinstance(raw, list) else []
        selected = [d for d in cfg["desk_order"] if d in selected]
        if not selected:
            selected = cfg["desk_order"][:2]
        trace_add(state, "INTAKE", {"event": "TOOL_JSON_OK", "scope": obj.get("scope", ""), "selected": selected})
    except Exception as e:
        safe_error(state, "INTAKE", e)
        selected = cfg["desk_order"][:]  # conservative deterministic fallback
        trace_add(state, "INTAKE", {"event": "FALLBACK_SELECTED_DESKS", "selected_desks": selected})

    state["selected_desks"] = selected  # type: ignore
    state["pending_desks"] = selected.copy()  # type: ignore
    trace_add(state, "INTAKE", {"event": "SELECTED_DESKS", "selected_desks": selected})
    return state

def n_desk_router(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    trace_add(state, "DESK_ROUTER", {
        "event": "ROUTE",
        "pending": state["pending_desks"],
        "completed": state["completed_desks"],
        "termination_reason": state["termination_reason"],
    })
    return state

def _desk_payload(desk: DeskName, query: str, constraints: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "desk": desk,
        "query": query,
        "constraints": constraints,
        "deliverable": "Produce a desk memo: mechanisms, risks, assumptions, open_items, next actions.",
        "output_schema": {
            "thesis": "1-2 sentences",
            "key_points": "array of 5-8 bullets",
            "risks": "array of risks",
            "assumptions": "array of explicit assumptions",
            "open_items": "array of verification questions",
            "recommended_next_actions": "array of actions for the supervisor",
        },
    }

def _run_desk(state: ResearchState, cfg: CFG, client: Anthropic, desk: DeskName) -> ResearchState:
    system = SYSTEM_BASE + f" Role: {desk} desk."
    payload = _desk_payload(desk, state["query"], state["constraints"])

    try:
        memo = llm_tool_json(client, cfg, system, json.dumps(payload))
        state["desk_memos"][desk] = memo
        trace_add(state, desk, {"event": "MEMO_OK"})
    except Exception as e:
        # Desk-level failure should not kill the whole run: record error + produce a minimal memo.
        safe_error(state, desk, e)
        memo = {
            "thesis": "",
            "key_points": [],
            "risks": ["Desk output failed; treat as missing coverage."],
            "assumptions": [],
            "open_items": ["Rerun desk; verify tool_use response integrity."],
            "recommended_next_actions": ["Escalate to supervisor; consider rerun."],
            "_status": "DESK_FAILED",
        }
        state["desk_memos"][desk] = memo
        trace_add(state, desk, {"event": "MEMO_FALLBACK"})

    # Progress the deterministic queue regardless
    if desk in state["pending_desks"]:
        state["pending_desks"] = [d for d in state["pending_desks"] if d != desk]
    if desk not in state["completed_desks"]:
        state["completed_desks"].append(desk)
    trace_add(state, desk, {"event": "MEMO_COMPLETE", "remaining_pending": state["pending_desks"]})
    return state

def n_macro(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    return _run_desk(state, cfg, client, "MACRO")

def n_equity(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    return _run_desk(state, cfg, client, "EQUITY")

def n_credit(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    return _run_desk(state, cfg, client, "CREDIT")

def n_rates_fx(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    return _run_desk(state, cfg, client, "RATES_FX")

INTAKE = AgentNode("INTAKE", n_intake)
DESK_ROUTER = AgentNode("DESK_ROUTER", n_desk_router)
MACRO = AgentNode("MACRO", n_macro)
EQUITY = AgentNode("EQUITY", n_equity)
CREDIT = AgentNode("CREDIT", n_credit)
RATES_FX = AgentNode("RATES_FX", n_rates_fx)

print("CELL6_OK: tool_use JSON + desk-level fail-safe (no delimiter parsing).")


CELL6_OK: tool_use JSON + desk-level fail-safe (no delimiter parsing).
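The post-processing that `n_intake` applies to a successful tool call can be isolated as a pure function and tested on its own. The sketch below is a hypothetical mirror of those lines, named `select_desks`; the desk order shown is an assumption, since the real one lives in the notebook's `CFG`. It does not cover the exception path, where `n_intake` selects every desk.

```python
from typing import Any, List


def select_desks(raw: Any, desk_order: List[str]) -> List[str]:
    """Mirror of n_intake's post-processing of the model's selected_desks."""
    # Drop anything that is not an allowed desk code (or a non-list reply entirely).
    chosen = [d for d in raw if d in desk_order] if isinstance(raw, list) else []
    # Re-impose the configured order so the queue does not depend on the model's ordering.
    chosen = [d for d in desk_order if d in chosen]
    # Empty or invalid selection: fall back to the first two configured desks.
    return chosen or desk_order[:2]


DESK_ORDER = ["MACRO", "RATES_FX", "EQUITY", "CREDIT"]  # assumed order; the real one lives in CFG
print(select_desks(["CREDIT", "MACRO", "COMMODITIES"], DESK_ORDER))
print(select_desks("MACRO", DESK_ORDER))
```

Re-imposing the configured order also removes duplicates, because the second pass iterates over `desk_order` rather than over the model's list.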


##7.SYNTHESIS, RED TEAM AND SUPERVISOR

###7.1.OVERVIEW

**CELL 7 — Synthesis, red team, supervisor, and revision planning (governance layer)**

Cell 7 is where the notebook becomes “institutional-grade” rather than “AI demo.” Multi-desk coverage is valuable, but without review controls it can still drift into persuasive but unverified narratives. Cell 7 adds governance explicitly.

First, Synthesis takes the desk memos and produces a cross-asset summary with structured sections: consensus, divergences, unknowns, actionable recommendations, and confidence. This mirrors what a senior analyst does: not repeating desk notes, but integrating them into a coherent picture.

Second, Red Team is an adversarial check. It does not produce new analysis; it challenges the synthesis: missing evidence, hidden assumptions, scenario gaps, and lack of implementable triggers. This is critical because a common failure mode in AI-assisted research is overconfidence driven by narrative fluency. Red teaming forces the system to surface what must be verified before action.

Third, Supervisor is a gate: APPROVE, REVISE, or REJECT. This is not a rhetorical flourish. It is an operational control. If the output is not ready, the system must say so explicitly, and it must say what to do next.

Fourth, Revision Plan updates state to rerun specific desks under bounded iteration. This makes the process realistic. In real research, you rarely accept a first draft. You revise, but you also cannot revise forever. The notebook therefore caps revision rounds and sets explicit termination reasons such as MAX_REVISION_ROUNDS_REACHED.

A key engineering detail is “normalization”: if the model returns incomplete content, the cell fills the minimum required sections deterministically. This prevents empty reports and ensures the output always contains the governance structures the committee expects: a verification checklist, trigger scaffolds whose metrics and thresholds are “TBD” and whose status is “Not verified,” and a scenario matrix scaffold whose cross-asset signs are left as “?” to be filled in.

In short: Cell 7 is the governance heart of the notebook. It demonstrates that AI can support research while still respecting institutional controls.
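One property of `_pad_list` is worth seeing explicitly: padded entries are canned text written for this notebook's inflation and tightening question, and once merged they look exactly like model output. The sketch below is a hypothetical variant, `pad_with_provenance`, that keeps the same padding rule but tags every entry with its origin. It is not the notebook's implementation, and adopting it would assume the report-rendering code is updated to read dicts instead of strings.

```python
from typing import Any, Dict, List


def pad_with_provenance(items: List[Any], min_len: int, filler: List[str]) -> List[Dict[str, Any]]:
    """Like _pad_list, but every entry records whether it came from the model or from filler."""
    out = [{"text": x, "source": "model"} for x in (items or [])]
    for text in filler:
        if len(out) >= min_len:
            break
        out.append({"text": text, "source": "deterministic_filler"})
    return out


padded = pad_with_provenance(["Credit lags rates."], 3, ["Filler A", "Filler B", "Filler C"])
for entry in padded:
    print(entry["source"], "|", entry["text"])
```

A reviewer could then count how much of a section the model actually wrote, which fits the system prompt's instruction to tag uncertainty explicitly.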


###7.2.CODE AND IMPLEMENTATION

In [43]:
# CELL 7/10 — Hardened synthesis/red-team/supervisor with MINIMUMS + string cleaning + deterministic governance addons

# -----------------------------
# Helpers: cleaning + padding
# -----------------------------
def _clean_str(x) -> str:
    s = x if isinstance(x, str) else ""
    s = s.strip()
    if s.lower() in ["", "none", "n/a", "na", "(none)", "(empty)"]:
        return ""
    return s

def _ensure_list(x) -> List[Any]:
    return x if isinstance(x, list) else []

def _pad_list(items: List[Any], min_len: int, filler: List[str]) -> List[Any]:
    out = list(items or [])
    i = 0
    while len(out) < min_len and i < len(filler):
        out.append(filler[i])
        i += 1
    return out

# -----------------------------
# Normalizers: enforce minimums
# -----------------------------
def _normalize_synthesis(syn: Dict[str, Any]) -> Dict[str, Any]:
    syn = dict(syn or {})

    syn["executive_summary"] = _clean_str(syn.get("executive_summary"))
    syn["confidence"] = _clean_str(syn.get("confidence"))

    consensus = _ensure_list(syn.get("consensus"))
    divergences = _ensure_list(syn.get("divergences"))
    unknowns = _ensure_list(syn.get("unknowns"))
    actions = _ensure_list(syn.get("actionable_recommendations"))

    # Minimums with deterministic padding (prevents empty report sections)
    syn["consensus"] = _pad_list(
        consensus, 3,
        [
            "Real-yield shock can hurt both duration and equities simultaneously; correlation regime may flip.",
            "Financial conditions tightening propagates into credit spreads, liquidity, and refinancing risk.",
            "Policy reaction function and credibility dominate tails; positioning must be conditional on verification.",
        ],
    )
    syn["divergences"] = _pad_list(
        divergences, 2,
        [
            "Timing: credit defaults lag tightening (6–12 months) vs. rates/FX reprices immediately via real yields.",
            "Severity: ‘stagflation-lite’ vs. hard-landing depends on policy overshoot and earnings resilience.",
        ],
    )
    syn["unknowns"] = _pad_list(
        unknowns, 3,
        [
            "Inflation persistence: wage growth, services inflation, and breakeven dynamics.",
            "Tightening depth: spreads, bank lending standards, funding stress indicators.",
            "Earnings pass-through: pricing power vs. demand destruction and margin compression.",
        ],
    )
    syn["actionable_recommendations"] = _pad_list(
        actions, 4,
        [
            "Define trigger levels for inflation persistence and tighten/loosen stance based on observed prints and expectations.",
            "Reduce leverage and liquidity risk; pre-plan de-risking paths under widening spreads / vol spikes.",
            "Hedge duration and inflation tail risks with explicit sizing; prefer convex hedges where feasible.",
            "Bias toward quality balance sheets and shorter refinancing needs; avoid ‘BBB cliff’ concentrations.",
        ],
    )

    # Ensure executive summary exists (last-resort fallback)
    if not syn["executive_summary"]:
        syn["executive_summary"] = (
            "Cross-asset view: persistent inflation surprise plus tightening financial conditions is a real-yield shock regime "
            "that can compress both duration assets and equities while widening credit spreads and impairing liquidity. "
            "Actionability depends on verifying inflation persistence, the degree of tightening, and central bank reaction."
        )

    if not syn["confidence"]:
        syn["confidence"] = (
            "medium: coherent cross-desk mechanism, but requires verification of inflation persistence, policy credibility, and earnings resilience"
        )

    return syn

def _normalize_red_team(rt: Dict[str, Any]) -> Dict[str, Any]:
    rt = dict(rt or {})

    critique = _ensure_list(rt.get("critique"))
    gaps = _ensure_list(rt.get("evidence_gaps"))
    attacks = _ensure_list(rt.get("assumption_attacks"))
    questions = _ensure_list(rt.get("questions_to_resolve"))
    sev = _clean_str(rt.get("severity"))

    rt["critique"] = _pad_list(
        critique, 3,
        [
            "The synthesis lacks quantitative triggers (what magnitude/duration defines ‘persistent’ inflation surprise?).",
            "Portfolio guidance is under-specified (hedge ratios, sizing rules, and liquidity constraints not defined).",
            "Scenario mapping is incomplete: policy pivot vs policy overshoot can invert cross-asset relationships.",
        ],
    )
    rt["evidence_gaps"] = _pad_list(
        gaps, 3,
        [
            "No explicit measures of financial conditions (spreads, lending standards, funding markets) were specified.",
            "No explicit inflation persistence checks (wage trackers, services inflation, expectations) were specified.",
            "No explicit stress test of correlation regime shift / forced deleveraging channels was specified.",
        ],
    )
    rt["assumption_attacks"] = _pad_list(
        attacks, 2,
        [
            "Assumes diversification breakdown is sustained; could revert if recession dominates and inflation collapses.",
            "Assumes central bank credibility is the key; supply shocks or fiscal dominance can override tightening.",
        ],
    )
    rt["questions_to_resolve"] = _pad_list(
        questions, 3,
        [
            "Which inflation metrics define ‘persistent surprise’ over the next 3–6 months?",
            "Which financial conditions indicators define ‘tightening’ and at what thresholds?",
            "What sizing/hedging framework is acceptable given liquidity, drawdown, and governance limits?",
        ],
    )

    rt["severity"] = sev or (
        "medium: plausible mechanism, but action requires triggers, sizing rules, and explicit scenario mapping"
    )
    return rt

def _normalize_supervisor(sup: Dict[str, Any], cfg: CFG, selected_desks: List[DeskName]) -> Dict[str, Any]:
    sup = dict(sup or {})

    dec = sup.get("decision", "REVISE")
    if dec not in ["APPROVE", "REVISE", "REJECT"]:
        dec = "REVISE"
    sup["decision"] = dec
    sup["reason"] = _clean_str(sup.get("reason"))

    sup["required_followups"] = _ensure_list(sup.get("required_followups"))
    sup["which_desks_to_rerun"] = _ensure_list(sup.get("which_desks_to_rerun"))

    if dec == "REVISE" and not sup["required_followups"]:
        sup["required_followups"] = [
            "Add quantitative triggers: define thresholds for inflation persistence and tightening financial conditions.",
            "Add scenario matrix (policy overshoot vs pivot vs soft landing vs stagflation) with expected cross-asset signs.",
            "Add portfolio implementation guidance: hedge ratios/sizing rules + liquidity constraints + stop conditions.",
            "Add verification checklist: what must be checked weekly before changing risk posture.",
        ]

    rerun = [d for d in sup["which_desks_to_rerun"] if d in cfg["desk_order"]]
    if dec == "REVISE" and not rerun:
        rerun = selected_desks[:] if selected_desks else cfg["desk_order"][:]
    sup["which_desks_to_rerun"] = rerun

    if not sup["reason"]:
        sup["reason"] = "Conservative revision: require quantitative triggers, scenario mapping, and implementable sizing rules."

    return sup

# -----------------------------
# Deterministic governance addons (no extra model calls)
# -----------------------------
def derive_governance_addons(state: ResearchState) -> None:
    desk_memos = state.get("desk_memos", {}) or {}
    syn = state.get("synthesis", {}) or {}
    rt = state.get("red_team", {}) or {}

    checklist = []
    for d in CFG_DEFAULT["desk_order"]:
        m = desk_memos.get(d, {}) or {}
        for x in _ensure_list(m.get("open_items")):
            if isinstance(x, str) and x.strip():
                checklist.append(x.strip())

    for x in _ensure_list(syn.get("unknowns")):
        if isinstance(x, str) and x.strip():
            checklist.append(x.strip())

    for x in _ensure_list(rt.get("questions_to_resolve")):
        if isinstance(x, str) and x.strip():
            checklist.append(x.strip())

    seen = set()
    dedup = []
    for x in checklist:
        k = x.lower()
        if k not in seen:
            seen.add(k)
            dedup.append(x)

    syn["verification_checklist"] = dedup or [
        "Define persistence criteria for inflation surprise (metrics + time window).",
        "Define tightening criteria for financial conditions (spreads/lending/funding metrics + thresholds).",
        "Define acceptable sizing/hedging constraints under liquidity and drawdown governance.",
    ]

    syn["trigger_scaffold"] = [
        {"name": "Inflation persistence", "metric": "TBD", "threshold": "TBD", "status": "Not verified"},
        {"name": "Financial conditions tightening", "metric": "TBD", "threshold": "TBD", "status": "Not verified"},
        {"name": "Earnings deterioration", "metric": "TBD", "threshold": "TBD", "status": "Not verified"},
        {"name": "Liquidity stress", "metric": "TBD", "threshold": "TBD", "status": "Not verified"},
    ]

    syn["scenario_matrix_scaffold"] = [
        {"scenario": "Policy holds (credibility strong) + disinflation emerges", "rates": "?", "equities": "?", "credit": "?", "fx": "?", "notes": "Fill signs + triggers"},
        {"scenario": "Policy overshoot → hard landing", "rates": "?", "equities": "?", "credit": "?", "fx": "?", "notes": "Fill signs + triggers"},
        {"scenario": "Policy pivot due to stability stress (inflation sticky)", "rates": "?", "equities": "?", "credit": "?", "fx": "?", "notes": "Fill signs + triggers"},
        {"scenario": "Stagflation persists (growth weak, inflation high)", "rates": "?", "equities": "?", "credit": "?", "fx": "?", "notes": "Fill signs + triggers"},
    ]

    state["synthesis"] = syn

# -----------------------------
# Nodes: tool_use JSON + normalization + fallbacks
# -----------------------------
def n_synthesize(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    system = SYSTEM_BASE + (
        " Role: SYNTHESIS. Output must be dense and complete. "
        "Minimum requirements: consensus>=3, divergences>=2, unknowns>=3, actionable_recommendations>=4. "
        "Provide confidence with a short justification."
    )
    payload = {
        "query": state["query"],
        "desk_memos": state["desk_memos"],
        "output_schema_minimums": {
            "consensus_min": 3,
            "divergences_min": 2,
            "unknowns_min": 3,
            "actionable_recommendations_min": 4,
        },
        "output_schema": {
            "executive_summary": "short paragraph (non-empty)",
            "consensus": "array (>=3)",
            "divergences": "array (>=2)",
            "unknowns": "array (>=3)",
            "actionable_recommendations": "array (>=4)",
            "confidence": "low/medium/high with justification (non-empty)",
        },
    }

    try:
        syn = llm_tool_json(client, cfg, system, json.dumps(payload))
        state["synthesis"] = _normalize_synthesis(syn)
        trace_add(state, "SYNTHESIS", {"event": "TOOL_JSON_OK"})
    except Exception as e:
        safe_error(state, "SYNTHESIS", e)
        syn_fallback = _normalize_synthesis({})
        syn_fallback["_status"] = "SYNTHESIS_FAILED"
        state["synthesis"] = syn_fallback
        trace_add(state, "SYNTHESIS", {"event": "FALLBACK"})
    return state

def n_red_team(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    system = SYSTEM_BASE + (
        " Role: RED_TEAM. Be adversarial. "
        "Minimum requirements: critique>=3, evidence_gaps>=3, questions_to_resolve>=3. "
        "Always set severity with rationale (non-empty)."
    )
    payload = {
        "query": state["query"],
        "synthesis": state["synthesis"],
        "desk_memos": state["desk_memos"],
        "output_schema_minimums": {
            "critique_min": 3,
            "evidence_gaps_min": 3,
            "questions_to_resolve_min": 3,
        },
        "output_schema": {
            "critique": "array (>=3)",
            "evidence_gaps": "array (>=3)",
            "assumption_attacks": "array (>=2 preferred)",
            "questions_to_resolve": "array (>=3)",
            "severity": "low/medium/high with rationale (non-empty)",
        },
    }

    try:
        rt = llm_tool_json(client, cfg, system, json.dumps(payload))
        state["red_team"] = _normalize_red_team(rt)
        trace_add(state, "RED_TEAM", {"event": "TOOL_JSON_OK", "severity": state["red_team"].get("severity", "")})
    except Exception as e:
        safe_error(state, "RED_TEAM", e)
        rt_fallback = _normalize_red_team({})
        rt_fallback["_status"] = "RED_TEAM_FAILED"
        state["red_team"] = rt_fallback
        trace_add(state, "RED_TEAM", {"event": "FALLBACK", "severity": state["red_team"].get("severity", "")})

    # Deterministic governance addons require both synthesis + red-team to be present
    derive_governance_addons(state)
    trace_add(state, "RED_TEAM", {"event": "GOVERNANCE_ADDONS_DERIVED", "checklist_n": len(state["synthesis"].get("verification_checklist", []))})
    return state

def n_supervisor(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    system = SYSTEM_BASE + (
        " Role: SUPERVISOR. Decide APPROVE/REVISE/REJECT conservatively. "
        "If REVISE, required_followups must be non-empty and which_desks_to_rerun must be non-empty."
    )
    payload = {
        "query": state["query"],
        "constraints": state["constraints"],
        "synthesis": state["synthesis"],
        "red_team": state["red_team"],
        "revision_round": state["revision_round"],            # completed revision cycles so far
        "max_revision_rounds": cfg["max_revision_rounds"],    # maximum completed cycles allowed
        "output_schema": {
            "decision": "APPROVE or REVISE or REJECT",
            "reason": "short justification (non-empty)",
            "required_followups": "array (non-empty if REVISE)",
            "which_desks_to_rerun": "array subset of allowed desks (non-empty if REVISE)",
        },
    }

    try:
        sup = llm_tool_json(client, cfg, system, json.dumps(payload))
        state["supervisor"] = _normalize_supervisor(sup, cfg, state["selected_desks"])
        decision = state["supervisor"]["decision"]
        trace_add(state, "SUPERVISOR", {"event": "TOOL_JSON_OK", "decision": decision, "round": state["revision_round"]})
    except Exception as e:
        safe_error(state, "SUPERVISOR", e)
        sup_fallback = _normalize_supervisor({}, cfg, state["selected_desks"])
        sup_fallback["_status"] = "SUPERVISOR_FAILED"
        state["supervisor"] = sup_fallback
        decision = state["supervisor"]["decision"]
        trace_add(state, "SUPERVISOR", {"event": "FALLBACK", "decision": decision, "round": state["revision_round"]})

    # Termination gate:
    # revision_round counts completed cycles; REVISION_PLAN increments it.
    if decision == "APPROVE":
        state["termination_reason"] = "APPROVED"
    elif decision == "REJECT":
        state["termination_reason"] = "REJECTED"
    else:
        # If we're already at the cap, do not allow another revision cycle.
        if state["revision_round"] >= cfg["max_revision_rounds"]:
            state["termination_reason"] = "MAX_REVISION_ROUNDS_REACHED"
        else:
            state["termination_reason"] = ""

    return state

def n_revision_plan(state: ResearchState, cfg: CFG, client: Anthropic) -> ResearchState:
    # revision_round = completed cycles. Planning a new cycle increments.
    state["revision_round"] += 1
    rerun = state["supervisor"].get("which_desks_to_rerun", [])
    rerun = [d for d in rerun if d in cfg["desk_order"]]
    rerun = [d for d in cfg["desk_order"] if d in rerun]
    if not rerun:
        rerun = state["selected_desks"][:] if state["selected_desks"] else cfg["desk_order"][:2]

    state["pending_desks"] = rerun  # type: ignore
    state["completed_desks"] = []
    trace_add(state, "REVISION_PLAN", {"event": "PLANNED", "rerun_desks": rerun, "revision_round": state["revision_round"]})
    return state

# Wire nodes (AgentNode abstraction)
SYNTHESIS = AgentNode("SYNTHESIS", n_synthesize)
RED_TEAM = AgentNode("RED_TEAM", n_red_team)
SUPERVISOR = AgentNode("SUPERVISOR", n_supervisor)
REVISION_PLAN = AgentNode("REVISION_PLAN", n_revision_plan)

print("CELL7_OK: hardened nodes + minimums + governance addons (no empty sections).")


CELL7_OK: hardened nodes + minimums + governance addons (no empty sections).


##8.LANGGRAPH TOPOLOGY

###8.1.OVERVIEW

**CELL 8 — Graph construction: conditional routing, explicit END, and topology-as-control**

Cell 8 is where all the parts become a system. We build the LangGraph StateGraph, register nodes, connect edges, and define conditional routing functions. This is the notebook’s “operating model.” It is also the most important cell to show the committee because it is the clearest proof that this is not a chatbot.

The graph has an entry point (INTAKE) and an explicit END node. The explicit END matters because professional workflows must stop for a reason and record the reason. “We stopped because we ran out of steps” is different from “We stopped because the supervisor approved.” The termination reason is part of the audit trail.
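One way to guarantee that every run ends with a recorded reason is a small "finalize" step that runs just before END. The sketch below is a hypothetical illustration: `TERMINAL_REASONS` and `finalize_termination` are our names, not functions defined in this notebook. It assumes the state is a plain dict with the `termination_reason`, `steps_executed`, `revision_round`, and `supervisor` keys used elsewhere in the notebook. It returns an update instead of mutating its input, which is the form a graph node hands back to the framework.

```python
from typing import Any, Dict

# Hypothetical whitelist mirroring the terminal reasons the routing functions check for.
TERMINAL_REASONS = {
    "APPROVED", "REJECTED", "MAX_REVISION_ROUNDS_REACHED", "ERROR", "MAX_TOTAL_STEPS_REACHED",
}

def finalize_termination(state: Dict[str, Any], max_total_steps: int, max_revision_rounds: int) -> Dict[str, Any]:
    """Return a state update guaranteeing a non-empty, whitelisted termination reason."""
    if state.get("termination_reason", "") in TERMINAL_REASONS:
        return {}  # already recorded upstream; nothing to change
    decision = (state.get("supervisor") or {}).get("decision", "")
    if state.get("steps_executed", 0) >= max_total_steps:
        reason = "MAX_TOTAL_STEPS_REACHED"
    elif decision == "APPROVE":
        reason = "APPROVED"
    elif decision == "REJECT":
        reason = "REJECTED"
    elif state.get("revision_round", 0) >= max_revision_rounds:
        reason = "MAX_REVISION_ROUNDS_REACHED"
    else:
        reason = "ERROR"  # stopped with no recognized cause: flag it rather than leave it blank
    return {"termination_reason": reason}

# A state shaped like the Cell 9 run: stopped at the step cap with no recorded reason.
update = finalize_termination(
    {"termination_reason": "", "steps_executed": 30, "revision_round": 2, "supervisor": {"decision": "REVISE"}},
    max_total_steps=30,
    max_revision_rounds=2,
)
print(update)  # {'termination_reason': 'MAX_TOTAL_STEPS_REACHED'}
```

The ordering of the checks is a design choice: a hard cap is reported ahead of the supervisor's decision, so a run cut off mid-loop is never mislabeled as a normal outcome.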

Conditional routing is also central. We use routing functions that read state, not free-form text. For example, the Desk Router routes to a desk if `pending_desks` is non-empty, otherwise to Synthesis. The Supervisor routes to Revision Plan if decision is REVISE and revision rounds remain, otherwise to END. These are deterministic rules, and they are bounded.
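We can illustrate the split between nodes that write state and routers that only read it without LangGraph. The toy executor below is a simplification written for illustration, not LangGraph's engine. Node functions return partial updates that the loop merges into the state. Router functions receive a copy and return only the name of the next node, so any assignment a router makes to its argument is discarded. As we understand it, this matches how LangGraph treats conditional-edge functions. The sketch reproduces the desk fan-out: DESK_ROUTER sends the run to the first pending desk until `pending_desks` is empty, then to SYNTHESIS.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
END_NODE = "END"  # sentinel for this toy executor only

def desk_node(name: str) -> Callable[[State], State]:
    """Build a desk node that removes itself from pending_desks and records completion."""
    def run(state: State) -> State:
        return {
            "pending_desks": [d for d in state["pending_desks"] if d != name],
            "completed_desks": state["completed_desks"] + [name],
        }
    return run

def route_from_desk_router(state: State) -> str:
    # Pure function of state: it reads, never writes.
    return state["pending_desks"][0] if state["pending_desks"] else "SYNTHESIS"

NODES: Dict[str, Callable[[State], State]] = {
    "DESK_ROUTER": lambda s: {},  # the router node itself makes no update here
    "MACRO": desk_node("MACRO"),
    "EQUITY": desk_node("EQUITY"),
    "SYNTHESIS": lambda s: {"synthesis": "done"},
}
EDGES: Dict[str, str] = {"MACRO": "DESK_ROUTER", "EQUITY": "DESK_ROUTER", "SYNTHESIS": END_NODE}
ROUTERS: Dict[str, Callable[[State], str]] = {"DESK_ROUTER": route_from_desk_router}

def run_graph(state: State, entry: str, step_cap: int) -> List[str]:
    """Execute nodes until END_NODE or the step cap; return the visited node names."""
    visited, node = [], entry
    while node != END_NODE and len(visited) < step_cap:
        visited.append(node)
        state.update(NODES[node](state))  # only node outputs change the state
        if node in ROUTERS:
            node = ROUTERS[node](copy.deepcopy(state))  # router sees a copy
        else:
            node = EDGES[node]
    return visited

path = run_graph({"pending_desks": ["MACRO", "EQUITY"], "completed_desks": []}, "DESK_ROUTER", step_cap=20)
print(path)  # ['DESK_ROUTER', 'MACRO', 'DESK_ROUTER', 'EQUITY', 'DESK_ROUTER', 'SYNTHESIS']
```

The loop is bounded twice: `pending_desks` shrinks on every desk visit, and `step_cap` stops the executor even if a node misbehaves.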

This cell also enforces loop safety: the routing functions check the global step cap and the revision cap before choosing the next node. That keeps LangGraph from raising GraphRecursionError and ensures the graph cannot run indefinitely.

Finally, Cell 8 renders the Mermaid diagram. This is not decoration. It is the visual contract of the system. When you present to the committee, the diagram is your map: you can point to each node and explain what role it plays, and you can show where governance controls live. It is also a learning artifact: it teaches the team that “multi-agent” means “workflow,” not “prompt tricks.”
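The notebook gets its diagram from `display_langgraph_mermaid(app)`. To illustrate what that text encodes, the hypothetical `to_mermaid` below renders the edges declared in this cell into Mermaid flowchart syntax. Static edges become solid arrows and conditional branches become dotted arrows. It is not the function the notebook uses, and its output will not match the notebook's diagram character for character.

```python
from typing import Dict, List, Tuple

def to_mermaid(edges: List[Tuple[str, str]], conditional: Dict[str, List[str]]) -> str:
    """Render static edges as solid arrows and conditional branches as dotted arrows."""
    lines = ["flowchart TD"]
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    for src, targets in conditional.items():
        for dst in targets:
            lines.append(f"    {src} -.-> {dst}")
    return "\n".join(lines)

# Edges transcribed from the add_edge / add_conditional_edges calls in Cell 8.
static_edges = [
    ("INTAKE", "DESK_ROUTER"),
    ("MACRO", "DESK_ROUTER"), ("EQUITY", "DESK_ROUTER"),
    ("CREDIT", "DESK_ROUTER"), ("RATES_FX", "DESK_ROUTER"),
    ("SYNTHESIS", "RED_TEAM"), ("RED_TEAM", "SUPERVISOR"),
    ("REVISION_PLAN", "DESK_ROUTER"),
]
branches = {
    "DESK_ROUTER": ["MACRO", "EQUITY", "CREDIT", "RATES_FX", "SYNTHESIS", "END"],
    "SUPERVISOR": ["REVISION_PLAN", "END"],
}
print(to_mermaid(static_edges, branches))
```

Keeping the edge list as data, rather than only as a picture, is what lets Cell 10 archive the topology alongside the run.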

In short: Cell 8 defines the architecture, enforces routing controls, and makes the system visible.


###8.2.CODE AND IMPLEMENTATION

In [44]:
# CELL 8/10 — Build LangGraph topology (supervised orchestrator) with HARD bounded routing + visualize

def route_from_desk_router(state: ResearchState) -> str:
    """
    State-driven routing ONLY.
    Hard stop conditions prevent GraphRecursionError.
    """
    # Terminal reasons
    if state["termination_reason"] in [
        "APPROVED", "REJECTED", "MAX_REVISION_ROUNDS_REACHED", "ERROR", "MAX_TOTAL_STEPS_REACHED"
    ]:
        return "END"

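    # Global step cap (defensive).
    # LangGraph persists state changes only from node return values, so assigning
    # termination_reason here would be silently dropped. The cap reason must be recorded
    # by a node (e.g. DESK_ROUTER); otherwise it stays empty, as in the Cell 9 output.
    if state["steps_executed"] >= CFG_DEFAULT["max_total_steps"]:
        return "END"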

    # Desk fan-out loop (bounded by pending_desks length)
    if state["pending_desks"]:
        nxt = state["pending_desks"][0]
        return nxt

    # If no desks pending, proceed to synthesis
    return "SYNTHESIS"

def route_after_supervisor(state: ResearchState) -> str:
    """
    After SUPERVISOR, either END or go to REVISION_PLAN.
    Also enforces revision cap to prevent infinite loops.
    """
    # Terminal reasons already set
    if state["termination_reason"] in [
        "APPROVED", "REJECTED", "MAX_REVISION_ROUNDS_REACHED", "ERROR", "MAX_TOTAL_STEPS_REACHED"
    ]:
        return "END"

    # Enforce revision cap BEFORE planning another revision round.
    # n_supervisor already records MAX_REVISION_ROUNDS_REACHED in this case; an assignment
    # here would not be persisted, so the router only selects the edge.
    if state["revision_round"] >= CFG_DEFAULT["max_revision_rounds"]:
        return "END"

    # Otherwise, REVISE => revision planning
    return "REVISION_PLAN"

# --- Build graph ---
graph = StateGraph(ResearchState)

graph.add_node("INTAKE", lambda s: INTAKE(s, CFG_DEFAULT, client))
graph.add_node("DESK_ROUTER", lambda s: DESK_ROUTER(s, CFG_DEFAULT, client))
graph.add_node("MACRO", lambda s: MACRO(s, CFG_DEFAULT, client))
graph.add_node("EQUITY", lambda s: EQUITY(s, CFG_DEFAULT, client))
graph.add_node("CREDIT", lambda s: CREDIT(s, CFG_DEFAULT, client))
graph.add_node("RATES_FX", lambda s: RATES_FX(s, CFG_DEFAULT, client))

graph.add_node("SYNTHESIS", lambda s: SYNTHESIS(s, CFG_DEFAULT, client))
graph.add_node("RED_TEAM", lambda s: RED_TEAM(s, CFG_DEFAULT, client))
graph.add_node("SUPERVISOR", lambda s: SUPERVISOR(s, CFG_DEFAULT, client))
graph.add_node("REVISION_PLAN", lambda s: REVISION_PLAN(s, CFG_DEFAULT, client))

graph.set_entry_point("INTAKE")
graph.add_edge("INTAKE", "DESK_ROUTER")

graph.add_conditional_edges(
    "DESK_ROUTER",
    route_from_desk_router,
    {
        "MACRO": "MACRO",
        "EQUITY": "EQUITY",
        "CREDIT": "CREDIT",
        "RATES_FX": "RATES_FX",
        "SYNTHESIS": "SYNTHESIS",
        "END": END,
    }
)

# After each desk, return to router (bounded via pending_desks depletion)
graph.add_edge("MACRO", "DESK_ROUTER")
graph.add_edge("EQUITY", "DESK_ROUTER")
graph.add_edge("CREDIT", "DESK_ROUTER")
graph.add_edge("RATES_FX", "DESK_ROUTER")

# Supervisor chain
graph.add_edge("SYNTHESIS", "RED_TEAM")
graph.add_edge("RED_TEAM", "SUPERVISOR")

graph.add_conditional_edges(
    "SUPERVISOR",
    route_after_supervisor,
    {
        "REVISION_PLAN": "REVISION_PLAN",
        "END": END,
    }
)

graph.add_edge("REVISION_PLAN", "DESK_ROUTER")

# Compile + visualize
client = get_client()
app = graph.compile()

mermaid_code = display_langgraph_mermaid(app)
print("CELL8_OK:", {"compiled": True, "mermaid_len": len(mermaid_code), "max_revision_rounds": CFG_DEFAULT["max_revision_rounds"], "max_total_steps": CFG_DEFAULT["max_total_steps"]})


CELL8_OK: {'compiled': True, 'mermaid_len': 927, 'max_revision_rounds': 2, 'max_total_steps': 30}


##9.EXECUTION

###9.1.OVERVIEW

**CELL 9 — Execution run and diagnostics: bounded invocation, trace inspection, and reliability**

Cell 9 is where we execute the workflow and demonstrate that it produces a complete, inspectable result under bounded conditions. This matters in a committee setting: you want to show that the system runs end-to-end quickly and predictably, without manual intervention.

The cell compiles the graph and invokes it with an initial state produced by `state_init(query, constraints)`. The key detail is that we set `recursion_limit` in the invoke config. This does not replace our bounded loop controls; it is a defensive backstop aligned with LangGraph's own recursion protection. Our bounded logic is what ends the process. The recursion limit only provides enough headroom that legitimately bounded loops can finish without LangGraph raising a framework-level error.
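A little arithmetic makes the headroom argument concrete. The sketch below rests on our own assumptions. Every node execution, including each DESK_ROUTER hop, counts as one step toward both `steps_executed` and LangGraph's recursion limit. The supervisor also asks for all four desks to be rerun on each revision. The step-counting code itself is not shown in this section.

```python
def steps_per_pass(n_desks: int) -> int:
    """Node executions in one pass from DESK_ROUTER through SUPERVISOR."""
    router_hops = n_desks + 1  # one hop before each desk, one more to reach SYNTHESIS
    review_chain = 3           # SYNTHESIS, RED_TEAM, SUPERVISOR
    return router_hops + n_desks + review_chain

def worst_case_steps(n_desks: int, max_revision_rounds: int) -> int:
    """INTAKE + initial pass + (REVISION_PLAN + full pass) for each allowed revision round."""
    return 1 + steps_per_pass(n_desks) + max_revision_rounds * (1 + steps_per_pass(n_desks))

worst = worst_case_steps(n_desks=4, max_revision_rounds=2)
print(worst)        # 39
print(worst <= 80)  # True: recursion_limit=80 leaves headroom
print(worst > 30)   # True: max_total_steps=30 stops the run before the third SUPERVISOR
```

Under these assumptions, the global cap of 30 is reached early in the third pass. This is consistent with the Cell 9 output (`STEPS_EXECUTED: 30`, `DESKS_COMPLETED: ['MACRO']`, last decision `REVISE`).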

The diagnostics printed in this cell are the “executive dashboard” for the run:
- termination reason,
- revision rounds executed,
- steps executed,
- desks selected and completed,
- which memos exist,
- supervisor decision,
- red team severity,
- errors (if any),
- trace tail (last N events).
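The dashboard fields become more useful when inconsistencies are flagged instead of left for a reader to spot. The hypothetical `diagnose_run` below is not part of the notebook. It checks a finished state for an empty termination reason, desks left incomplete in the last pass, and recorded errors. Note that `completed_desks` is reset by REVISION_PLAN, so it reflects only the most recent pass. Here we apply it to the values printed in the Cell 9 output.

```python
from typing import Any, Dict, List

def diagnose_run(final_state: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for a finished run (empty list = nothing flagged)."""
    warnings = []
    if not final_state.get("termination_reason"):
        warnings.append("termination_reason is empty: the stop cause was not recorded")
    selected = final_state.get("selected_desks", []) or []
    completed = final_state.get("completed_desks", []) or []
    missing = [d for d in selected if d not in completed]
    if missing:
        warnings.append(f"desks selected but not completed in the last pass: {missing}")
    errors = final_state.get("errors", []) or []
    if errors:
        warnings.append(f"{len(errors)} error(s) recorded")
    return warnings

# Values taken from the Cell 9 printout.
cell9 = {
    "termination_reason": "",
    "selected_desks": ["MACRO", "EQUITY", "CREDIT", "RATES_FX"],
    "completed_desks": ["MACRO"],
    "errors": [],
}
for w in diagnose_run(cell9):
    print(w)
# termination_reason is empty: the stop cause was not recorded
# desks selected but not completed in the last pass: ['EQUITY', 'CREDIT', 'RATES_FX']
```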

These diagnostics serve two purposes. First, they help you troubleshoot quickly if something goes wrong. Second, they provide the committee with evidence that the process is governed and legible. For example, “REVISION_ROUNDS: 1” is meaningful. “SUPERVISOR_DECISION: REVISE” is meaningful. “DESKS_COMPLETED: MACRO, EQUITY…” confirms that coverage actually occurred.

Most importantly, Cell 9 shows that the system is not hiding failures. If an LLM JSON parse fails, the error is recorded. If fallback logic activates, the state records it. That transparency is what makes the notebook credible: it acknowledges that AI can fail, and it shows how we control those failures.
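The "record, then fall back" behavior follows the try/except pattern in the SUPERVISOR node above (`safe_error`, `trace_add`, `_status`). The notebook's own `safe_error` is not shown in this section. The function `call_with_fallback` and its error-record fields below are our illustration of the same idea. The fallback value `{"decision": "REVISE"}` is an arbitrary choice for the example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict

def call_with_fallback(state: Dict[str, Any], node: str, fn: Callable[[], Dict[str, Any]],
                       fallback: Dict[str, Any]) -> Dict[str, Any]:
    """Run fn(); on any exception, log an error and a trace event, then return a tagged fallback."""
    try:
        result = fn()
        state["trace"].append({"node": node, "event": "OK"})
        return result
    except Exception as e:  # deliberately broad: every failure is recorded, not raised mid-graph
        state["errors"].append({
            "node": node,
            "type": type(e).__name__,
            "msg": str(e),
            "ts_utc": datetime.now(timezone.utc).isoformat(),
        })
        state["trace"].append({"node": node, "event": "FALLBACK"})
        return {**fallback, "_status": f"{node}_FAILED"}

def bad_llm_call() -> Dict[str, Any]:
    return json.loads("{not valid json")  # simulates a malformed model reply

state = {"errors": [], "trace": []}
out = call_with_fallback(state, "SUPERVISOR", bad_llm_call, {"decision": "REVISE"})
print(out)                         # {'decision': 'REVISE', '_status': 'SUPERVISOR_FAILED'}
print(state["errors"][0]["type"])  # JSONDecodeError
```

The `_status` tag is what lets the printed report show `STATUS: SUPERVISOR_FAILED` instead of presenting a fallback as if it were a real decision.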

In short: Cell 9 runs the system and exposes its behavior in a way that is suitable for professional review.


###9.2.CODE AND IMPLEMENTATION

In [45]:
# CELL 9/10 — Execute run (bounded) + diagnostics (includes recursion_limit sized to bounded loop)

client = get_client()
app = graph.compile()

query = (
    "Synthesize a cross-asset research view on: "
    "‘How should a multi-asset portfolio manager think about a sudden, persistent inflation surprise "
    "combined with tightening financial conditions?’ "
    "Focus on mechanisms, risks, and what must be verified before acting."
)

constraints = {
    "audience": "Institutional PM / CIO",
    "risk_posture": "Conservative; avoid unsupported claims",
    "time_horizon": "3-6 months",
    "deliverable_style": "Mechanism-first; explicit assumptions and open items",
}

state0 = state_init(query, constraints)

# Justification:
# Each revision round executes (up to) desks(4) + router hops + synthesis/red/supervisor (+ revision_plan) ~= ~12-14 steps.
# With max_revision_rounds=2, worst-case < 45 steps. We set recursion_limit=80 for safe headroom.
final_state = app.invoke(state0, config={"recursion_limit": 80})

print("TERMINATION:", final_state.get("termination_reason", ""))
print("REVISION_ROUNDS:", final_state.get("revision_round", ""))
print("STEPS_EXECUTED:", final_state.get("steps_executed", ""))
print("DESKS_SELECTED:", final_state.get("selected_desks", []))
print("DESKS_COMPLETED:", final_state.get("completed_desks", []))
print("DESK_MEMOS:", list((final_state.get("desk_memos", {}) or {}).keys()))
print("SUPERVISOR_DECISION:", (final_state.get("supervisor", {}) or {}).get("decision", "N/A"))

syn = final_state.get("synthesis", {}) or {}
rt = final_state.get("red_team", {}) or {}

print("\nSYNTHESIS_EXEC_SUMMARY (first 600 chars):\n", (syn.get("executive_summary", "") or "")[:600])
print("\nRED_TEAM_SEVERITY:", rt.get("severity", ""))

errs = final_state.get("errors", []) or []
print("\nERRORS_N:", len(errs))
if errs:
    print("LAST_ERROR:", errs[-1])

print("\nTRACE_LAST_14:")
for t in (final_state.get("trace", []) or [])[-14:]:
    print(t)


TERMINATION: 
REVISION_ROUNDS: 2
STEPS_EXECUTED: 30
DESKS_SELECTED: ['MACRO', 'EQUITY', 'CREDIT', 'RATES_FX']
DESKS_COMPLETED: ['MACRO']
DESK_MEMOS: ['MACRO', 'EQUITY', 'CREDIT', 'RATES_FX']
SUPERVISOR_DECISION: REVISE

SYNTHESIS_EXEC_SUMMARY (first 600 chars):
 A persistent inflation surprise combined with tightening financial conditions creates a stagflationary regime that compresses real returns across traditional asset classes, breaks down diversification benefits, and forces portfolio rebalancing toward real assets and duration hedges. The critical 3-6 month window requires verification of inflation drivers (demand vs. supply-driven), central bank credibility and policy response function, and stress-testing of leverage and refinancing calendars before repositioning. The primary tail risk is policy error (over-tightening) triggering a credit event

RED_TEAM_SEVERITY: medium: plausible mechanism, but action requires triggers, sizing rules, and explicit scenario mapping

ERRORS_N: 0


In [41]:
# PRINT REPORT — human-readable research packet from final_state (console + optional .txt export)

def _fmt_bullets(items, indent="  - "):
    if not items:
        return ""
    out = []
    for x in items:
        if isinstance(x, str):
            out.append(f"{indent}{x}")
        else:
            out.append(f"{indent}{json.dumps(x, ensure_ascii=False)}")
    return "\n".join(out)

def print_research_report(s: ResearchState, max_width: int = 110, save_txt: bool = True, filename: str = "research_report.txt") -> str:
    lines = []
    lines.append("=" * max_width)
    lines.append("N10 MULTI-DESK RESEARCH SYNTHESIS — REPORT")
    lines.append("=" * max_width)
    lines.append(f"TS_UTC: {utc_now_iso()}")
    lines.append(f"TERMINATION: {s.get('termination_reason','')}")
    lines.append(f"REVISION_ROUNDS: {s.get('revision_round','')}")
    lines.append(f"STEPS_EXECUTED: {s.get('steps_executed','')}")
    lines.append(f"DESKS_SELECTED: {', '.join(s.get('selected_desks', []))}")
    lines.append(f"DESKS_COMPLETED: {', '.join(s.get('completed_desks', []))}")
    sup = s.get("supervisor", {}) or {}
    lines.append(f"SUPERVISOR_DECISION: {sup.get('decision','N/A')}")
    lines.append("")

    # --- Desk memos ---
    lines.append("-" * max_width)
    lines.append("DESK MEMOS")
    lines.append("-" * max_width)
    desk_memos = s.get("desk_memos", {}) or {}
    for desk in CFG_DEFAULT["desk_order"]:
        if desk not in desk_memos:
            continue
        m = desk_memos[desk] or {}
        lines.append(f"\n[{desk}]")
        if m.get("_status"):
            lines.append(f"STATUS: {m.get('_status')}")
        lines.append(f"THESIS: {m.get('thesis','')}")
        lines.append("KEY_POINTS:")
        lines.append(_fmt_bullets(m.get("key_points", [])) or "  - (none)")
        lines.append("RISKS:")
        lines.append(_fmt_bullets(m.get("risks", [])) or "  - (none)")
        lines.append("ASSUMPTIONS:")
        lines.append(_fmt_bullets(m.get("assumptions", [])) or "  - (none)")
        lines.append("OPEN_ITEMS:")
        lines.append(_fmt_bullets(m.get("open_items", [])) or "  - (none)")
        lines.append("RECOMMENDED_NEXT_ACTIONS:")
        lines.append(_fmt_bullets(m.get("recommended_next_actions", [])) or "  - (none)")

    # --- Synthesis ---
    syn = s.get("synthesis", {}) or {}
    lines.append("\n" + "-" * max_width)
    lines.append("SYNTHESIS")
    lines.append("-" * max_width)
    if syn.get("_status"):
        lines.append(f"STATUS: {syn.get('_status')}")
    lines.append("EXECUTIVE_SUMMARY:")
    lines.append(syn.get("executive_summary", "") or "(empty)")
    lines.append("\nCONSENSUS:")
    lines.append(_fmt_bullets(syn.get("consensus", [])) or "  - (none)")
    lines.append("\nDIVERGENCES:")
    lines.append(_fmt_bullets(syn.get("divergences", [])) or "  - (none)")
    lines.append("\nUNKNOWNS:")
    lines.append(_fmt_bullets(syn.get("unknowns", [])) or "  - (none)")
    lines.append("\nACTIONABLE_RECOMMENDATIONS:")
    lines.append(_fmt_bullets(syn.get("actionable_recommendations", [])) or "  - (none)")
    lines.append(f"\nCONFIDENCE: {syn.get('confidence','')}")

    # --- Red Team ---
    rt = s.get("red_team", {}) or {}
    lines.append("\n" + "-" * max_width)
    lines.append("RED TEAM")
    lines.append("-" * max_width)
    if rt.get("_status"):
        lines.append(f"STATUS: {rt.get('_status')}")
    lines.append(f"SEVERITY: {rt.get('severity','')}")
    lines.append("\nCRITIQUE:")
    lines.append(_fmt_bullets(rt.get("critique", [])) or "  - (none)")
    lines.append("\nEVIDENCE_GAPS:")
    lines.append(_fmt_bullets(rt.get("evidence_gaps", [])) or "  - (none)")
    lines.append("\nASSUMPTION_ATTACKS:")
    lines.append(_fmt_bullets(rt.get("assumption_attacks", [])) or "  - (none)")
    lines.append("\nQUESTIONS_TO_RESOLVE:")
    lines.append(_fmt_bullets(rt.get("questions_to_resolve", [])) or "  - (none)")

    # --- Supervisor ---
    lines.append("\n" + "-" * max_width)
    lines.append("SUPERVISOR")
    lines.append("-" * max_width)
    lines.append(f"DECISION: {sup.get('decision','')}")
    lines.append(f"REASON: {sup.get('reason','')}")
    lines.append("REQUIRED_FOLLOWUPS:")
    lines.append(_fmt_bullets(sup.get("required_followups", [])) or "  - (none)")
    lines.append("WHICH_DESKS_TO_RERUN:")
    lines.append(_fmt_bullets(sup.get("which_desks_to_rerun", [])) or "  - (none)")

    # --- Errors ---
    errs = s.get("errors", []) or []
    lines.append("\n" + "-" * max_width)
    lines.append("ERRORS")
    lines.append("-" * max_width)
    if not errs:
        lines.append("(none)")
    else:
        for e in errs[-10:]:
            lines.append(json.dumps(e, ensure_ascii=False))

    lines.append("\n" + "=" * max_width)
    report = "\n".join(lines)
    print(report)

    if save_txt:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(report)
        print(f"\nWROTE: {filename}")

    return report

_ = print_research_report(final_state, save_txt=True, filename="research_report.txt")


N10 MULTI-DESK RESEARCH SYNTHESIS — REPORT
TS_UTC: 2026-02-19T14:01:55.848884+00:00
TERMINATION: 
REVISION_ROUNDS: 2
STEPS_EXECUTED: 30
DESKS_SELECTED: MACRO, EQUITY, CREDIT, RATES_FX
DESKS_COMPLETED: MACRO
SUPERVISOR_DECISION: REVISE

--------------------------------------------------------------------------------------------------------------
DESK MEMOS
--------------------------------------------------------------------------------------------------------------

[MACRO]
THESIS: A persistent inflation surprise combined with tightening financial conditions creates a stagflationary regime that pressures growth assets and real yields simultaneously, requiring portfolio managers to rebalance toward real assets, duration hedges, and liquidity buffers while verifying whether central banks will prioritize price stability over financial stability. The critical risk is policy error: either insufficient tightening (inflation persistence) or over-tightening (credit stress and recession), both o

##10.AUDIT BUNDLE

###10.1.OVERVIEW

**CELL 10 — Full report generation, artifact export, and reproducible audit bundle**

Cell 10 turns the run into deliverables. In institutional settings, “AI output” is not just text on screen. It must be packaged, archived, and reviewable. Cell 10 does exactly that.

First, it prints a full human-readable report: desk memos, synthesis, governance addons (verification checklist, triggers, scenario matrix scaffolds), red team critique, supervisor decision, and error/trace summaries. This report is designed to be “committee-ready”: it reads like a multi-desk packet and it includes the control sections that a risk-aware organization expects.

Second, it writes that report to `research_report.txt`. This is the printable artifact you can share internally without requiring others to open the notebook. It is also the easiest way to attach outputs to an email, internal wiki, or memo repository.

Third, it exports three required JSON artifacts:
- `run_manifest.json` records metadata (timestamp, config hash, environment fingerprint, model lock).
- `graph_spec.json` records the topology and Mermaid diagram so the process is preserved.
- `final_state.json` records the full state: inputs, outputs, memos, synthesis, critique, supervisor, errors, trace.
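The manifest's `config_hash` comes from `cfg_hash` in Cell 10, which serializes the config with `sort_keys=True` before hashing. Reproducing that one-liner shows why the sorting matters: two runs with the same settings get the same hash regardless of the order in which keys were inserted. The config dicts here are placeholders, not the notebook's `CFG_DEFAULT`.

```python
import hashlib
import json
from typing import Any, Dict

def cfg_hash(cfg: Dict[str, Any]) -> str:
    # Same approach as Cell 10: canonical (sorted-key) JSON, then SHA-256.
    return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode("utf-8")).hexdigest()

a = {"model": "placeholder-model", "max_revision_rounds": 2, "max_total_steps": 30}
b = {"max_total_steps": 30, "model": "placeholder-model", "max_revision_rounds": 2}
c = {**a, "max_revision_rounds": 3}
print(cfg_hash(a) == cfg_hash(b))  # True: same content, different key order
print(cfg_hash(a) == cfg_hash(c))  # False: any config change alters the hash
```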

These artifacts are what make the notebook auditable. If someone asks “What exactly did the model see and produce?” you can point to the final state. If someone asks “What workflow produced this?” you can point to the graph spec. If someone asks “Could we reproduce this later?” you can point to the manifest with versions and hashes.

Finally, Cell 10 bundles everything into a zip file and writes a SHA-256 hash alongside each artifact. This is a professional-grade control: it lets a reviewer verify later that the archived files have not changed since they were bundled.
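Cell 10 stores each hash as a `<name>.sha256` member next to the file it covers. A reviewer can check a bundle with a few lines of standard-library code. The `verify_bundle` helper below is our sketch, not part of the notebook. It builds a small in-memory bundle using the same sidecar convention and then recomputes every hash.

```python
import hashlib
import io
import zipfile
from typing import Dict

def verify_bundle(zip_bytes: bytes) -> Dict[str, bool]:
    """Recompute SHA-256 for every member that has a '<name>.sha256' sidecar and compare."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        names = set(z.namelist())
        for name in sorted(names):
            if name.endswith(".sha256") or name + ".sha256" not in names:
                continue
            expected = z.read(name + ".sha256").decode("utf-8").strip()
            results[name] = hashlib.sha256(z.read(name)).hexdigest() == expected
    return results

# Build a small bundle in memory using the same sidecar convention as Cell 10.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
    for name, data in {"final_state.json": b'{"ok": true}', "research_report.txt": b"report"}.items():
        z.writestr(name, data)
        z.writestr(name + ".sha256", hashlib.sha256(data).hexdigest())
print(verify_bundle(buf.getvalue()))  # {'final_state.json': True, 'research_report.txt': True}
```

Because the sidecar hashes live inside the same zip, this check catches accidental corruption or a file edited without updating its hash. Guarding against deliberate tampering would require storing or signing the hashes separately from the bundle.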

In short: Cell 10 is the “governance packaging layer.” It makes the notebook output portable, reviewable, and suitable for institutional use.


###10.2.CODE AND IMPLEMENTATION

In [48]:
# CELL 10/10 — Full report printer (includes governance addons) + export artifacts + zip bundle

import zipfile

def _sha256_bytes(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def _write_json(path: str, obj: Any) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, sort_keys=True, ensure_ascii=False)

def env_fingerprint() -> Dict[str, Any]:
    pkgs = ["langgraph", "langchain", "langchain-core", "anthropic"]
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {p: _version(p) for p in pkgs},
    }

def cfg_hash(cfg: CFG) -> str:
    return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode("utf-8")).hexdigest()

def graph_spec_from_mermaid(mermaid: str) -> Dict[str, Any]:
    declared_nodes = [
        "INTAKE","DESK_ROUTER","MACRO","EQUITY","CREDIT","RATES_FX",
        "SYNTHESIS","RED_TEAM","SUPERVISOR","REVISION_PLAN","END"
    ]
    return {
        "topology_name": "N10 Multi-Desk Research Synthesis + Red Team + Supervisor",
        "declared_nodes": declared_nodes,
        "mermaid": mermaid,
        "controls": {
            "bounded_desk_loop": True,
            "bounded_revision_loop": True,
            "explicit_END": True,
        },
        "notes": [
            "Desk fan-out implemented as bounded loop over pending_desks.",
            "Supervisor decides APPROVE/REVISE/REJECT; revision capped by max_revision_rounds.",
            "All routing is state-driven (no implicit conversational memory).",
            "Structured outputs enforced via tool_use JSON.",
        ],
    }

def _fmt_bullets(items, indent="  - "):
    if not items:
        return ""
    out = []
    for x in items:
        if isinstance(x, str):
            out.append(f"{indent}{x}")
        else:
            out.append(f"{indent}{json.dumps(x, ensure_ascii=False)}")
    return "\n".join(out)

def print_research_report(s: ResearchState, save_txt: bool = True, filename: str = "research_report.txt") -> str:
    width = 110
    lines = []
    def add(x=""):
        lines.append(x)

    add("=" * width)
    add("N10 MULTI-DESK RESEARCH SYNTHESIS — FULL REPORT (WITH GOVERNANCE ADDONS)")
    add("=" * width)
    add(f"TS_UTC: {utc_now_iso()}")
    add(f"TERMINATION: {s.get('termination_reason','')}")
    add(f"REVISION_ROUNDS: {s.get('revision_round','')}")
    add(f"STEPS_EXECUTED: {s.get('steps_executed','')}")
    add(f"DESKS_SELECTED: {', '.join(s.get('selected_desks', []))}")
    add(f"DESKS_COMPLETED: {', '.join(s.get('completed_desks', []))}")
    sup = s.get("supervisor", {}) or {}
    add(f"SUPERVISOR_DECISION: {sup.get('decision','N/A')}")
    add("")

    # -----------------
    # Desk memos
    # -----------------
    add("-" * width)
    add("DESK MEMOS")
    add("-" * width)
    desk_memos = s.get("desk_memos", {}) or {}
    for desk in CFG_DEFAULT["desk_order"]:
        if desk not in desk_memos:
            continue
        m = desk_memos[desk] or {}
        add("")
        add(f"[{desk}]")
        if m.get("_status"):
            add(f"STATUS: {m.get('_status')}")
        add(f"THESIS: {m.get('thesis','')}")
        add("KEY_POINTS:")
        add(_fmt_bullets(m.get("key_points", [])) or "  - (empty)")
        add("RISKS:")
        add(_fmt_bullets(m.get("risks", [])) or "  - (empty)")
        add("ASSUMPTIONS:")
        add(_fmt_bullets(m.get("assumptions", [])) or "  - (empty)")
        add("OPEN_ITEMS:")
        add(_fmt_bullets(m.get("open_items", [])) or "  - (empty)")
        add("RECOMMENDED_NEXT_ACTIONS:")
        add(_fmt_bullets(m.get("recommended_next_actions", [])) or "  - (empty)")

    # -----------------
    # Synthesis (+ governance addons)
    # -----------------
    syn = s.get("synthesis", {}) or {}
    add("")
    add("-" * width)
    add("SYNTHESIS")
    add("-" * width)
    if syn.get("_status"):
        add(f"STATUS: {syn.get('_status')}")
    add("EXECUTIVE_SUMMARY:")
    add(syn.get("executive_summary", "") or "(empty)")

    add("")
    add("CONSENSUS:")
    add(_fmt_bullets(syn.get("consensus", [])) or "  - (empty)")

    add("")
    add("DIVERGENCES:")
    add(_fmt_bullets(syn.get("divergences", [])) or "  - (empty)")

    add("")
    add("UNKNOWNS:")
    add(_fmt_bullets(syn.get("unknowns", [])) or "  - (empty)")

    add("")
    add("ACTIONABLE_RECOMMENDATIONS:")
    add(_fmt_bullets(syn.get("actionable_recommendations", [])) or "  - (empty)")

    add("")
    add(f"CONFIDENCE: {syn.get('confidence','') or '(empty)'}")

    # Governance addons (deterministic scaffolds)
    add("")
    add("VERIFICATION_CHECKLIST:")
    add(_fmt_bullets(syn.get("verification_checklist", [])) or "  - (empty)")

    add("")
    add("TRIGGER_SCAFFOLD (NOT VERIFIED):")
    trig = syn.get("trigger_scaffold", []) or []
    if trig:
        for row in trig:
            add("  - " + json.dumps(row, ensure_ascii=False))
    else:
        add("  - (empty)")

    add("")
    add("SCENARIO_MATRIX_SCAFFOLD (NOT VERIFIED):")
    mat = syn.get("scenario_matrix_scaffold", []) or []
    if mat:
        for row in mat:
            add("  - " + json.dumps(row, ensure_ascii=False))
    else:
        add("  - (empty)")

    # -----------------
    # Red team
    # -----------------
    rt = s.get("red_team", {}) or {}
    add("")
    add("-" * width)
    add("RED TEAM")
    add("-" * width)
    if rt.get("_status"):
        add(f"STATUS: {rt.get('_status')}")
    add(f"SEVERITY: {rt.get('severity','') or '(empty)'}")

    add("")
    add("CRITIQUE:")
    add(_fmt_bullets(rt.get("critique", [])) or "  - (empty)")

    add("")
    add("EVIDENCE_GAPS:")
    add(_fmt_bullets(rt.get("evidence_gaps", [])) or "  - (empty)")

    add("")
    add("ASSUMPTION_ATTACKS:")
    add(_fmt_bullets(rt.get("assumption_attacks", [])) or "  - (empty)")

    add("")
    add("QUESTIONS_TO_RESOLVE:")
    add(_fmt_bullets(rt.get("questions_to_resolve", [])) or "  - (empty)")

    # -----------------
    # Supervisor
    # -----------------
    add("")
    add("-" * width)
    add("SUPERVISOR")
    add("-" * width)
    add(f"DECISION: {sup.get('decision','')}")
    add(f"REASON: {sup.get('reason','')}")
    add("REQUIRED_FOLLOWUPS:")
    add(_fmt_bullets(sup.get("required_followups", [])) or "  - (empty)")
    add("WHICH_DESKS_TO_RERUN:")
    add(_fmt_bullets(sup.get("which_desks_to_rerun", [])) or "  - (empty)")

    # -----------------
    # Errors + trace tail
    # -----------------
    errs = s.get("errors", []) or []
    add("")
    add("-" * width)
    add("ERRORS")
    add("-" * width)
    if not errs:
        add("(none)")
    else:
        for e in errs[-50:]:
            add(json.dumps(e, ensure_ascii=False))

    add("")
    add("-" * width)
    add("TRACE_LAST_20")
    add("-" * width)
    tr = s.get("trace", []) or []
    for t in tr[-20:]:
        add(json.dumps(t, ensure_ascii=False))

    add("")
    add("=" * width)

    report = "\n".join(lines)
    print(report)

    if save_txt:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(report)
        print(f"\nWROTE: {filename}")

    return report

# -----------------
# Export artifacts
# -----------------
run_id = uuid.uuid4().hex
manifest = {
    "run_id": run_id,
    "ts_utc": utc_now_iso(),
    "project": "AA-FIN-LG-2026",
    "notebook": "N10 Multi-Desk Research Synthesis — Supervised Orchestrator",
    "config": CFG_DEFAULT,
    "config_hash": cfg_hash(CFG_DEFAULT),
    "env": env_fingerprint(),
    "controls": {
        "model_lock": CFG_DEFAULT["model"],
        "bounded_loops": {
            "desk_loop_max": len(CFG_DEFAULT["desk_order"]),
            "revision_rounds_max": CFG_DEFAULT["max_revision_rounds"],
            "global_step_cap": CFG_DEFAULT["max_total_steps"],
        },
        "explicit_end_node": True,
        "state_schema": "ResearchState (TypedDict)",
        "structured_output": "anthropic tool_use return_json",
    },
    "outputs": {
        "run_manifest.json": "run_manifest.json",
        "graph_spec.json": "graph_spec.json",
        "final_state.json": "final_state.json",
        "research_report.txt": "research_report.txt",
        "bundle_zip": "n10_artifacts.zip",
    },
}

graph_spec = graph_spec_from_mermaid(mermaid_code)

_write_json("run_manifest.json", manifest)
_write_json("graph_spec.json", graph_spec)
_write_json("final_state.json", final_state)

_ = print_research_report(final_state, save_txt=True, filename="research_report.txt")

# Bundle + hashes
files_to_zip = ["run_manifest.json", "graph_spec.json", "final_state.json", "research_report.txt"]
with zipfile.ZipFile("n10_artifacts.zip", "w", compression=zipfile.ZIP_DEFLATED) as z:
    for p in files_to_zip:
        with open(p, "rb") as f:
            data = f.read()
        z.writestr(p, data)
        z.writestr(p + ".sha256", _sha256_bytes(data).encode("utf-8"))

print("\nBUNDLE_WROTE: n10_artifacts.zip")
print("ARTIFACTS:", files_to_zip)


N10 MULTI-DESK RESEARCH SYNTHESIS — FULL REPORT (WITH GOVERNANCE ADDONS)
TS_UTC: 2026-02-19T14:36:12.125846+00:00
TERMINATION: 
REVISION_ROUNDS: 2
STEPS_EXECUTED: 30
DESKS_SELECTED: MACRO, EQUITY, CREDIT, RATES_FX
DESKS_COMPLETED: MACRO
SUPERVISOR_DECISION: REVISE

--------------------------------------------------------------------------------------------------------------
DESK MEMOS
--------------------------------------------------------------------------------------------------------------

[MACRO]
THESIS: A persistent inflation surprise combined with tightening financial conditions creates a stagflationary regime that compresses real yields, destabilizes equity risk premia, and forces portfolio rebalancing away from duration and growth assets—requiring immediate stress-testing of correlations, credit spreads, and liquidity buffers before repositioning. The critical question is whether central banks will tolerate the financial stability costs of sustained tightening or pivot, which

##11.CONCLUSION

**Conclusion: From a Demonstration Notebook to a Fully Governed, Auditable Human-in-the-Loop Research System**

This notebook proves a specific point that matters to an investment organization: AI becomes genuinely useful when it is treated as a **process component** rather than a source of truth. The system we built is not a chatbot. It is a structured workflow that behaves like a multi-desk research process: intake, desk memos, synthesis, red team challenge, supervisor gate, bounded revision, explicit termination, and exported artifacts. That design choice is the foundation for governance, because governance is always easier when the process is explicit.

Moving from this notebook to a fully auditable, governed implementation with humans in the loop is less about adding “more AI” and more about strengthening three layers: **evidence**, **controls**, and **accountability**.

First, we strengthen the evidence layer. Today, this notebook is architecture-first: it demonstrates the workflow and produces a packet with structured gaps and verification needs. In production, the same desks must be able to reference validated inputs. That means integrating approved data sources (economic releases, curve data, spreads, earnings revisions, funding indicators) and binding the system to a “data provenance record” for every run. Each memo should carry explicit tags: what claims are supported by data pulled in this run, what claims are general background, and what claims are assumptions. The key is not to make the AI “more confident,” but to make the output **traceable to sources**.
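The three tags proposed above can be carried as a small data structure attached to each memo. The sketch below is hypothetical throughout: `Claim`, `TaggedMemo`, the tag names, and `source_id` as a key into a per-run provenance record are our assumptions, not part of the notebook. It shows one useful check: every claim tagged as backed by this run's data must point at a source that actually appears in the run's provenance record.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional, Set

# The three tags proposed above. Literal is a type-checker hint only; it is not enforced at runtime.
Basis = Literal["DATA_THIS_RUN", "BACKGROUND", "ASSUMPTION"]

@dataclass
class Claim:
    text: str
    basis: Basis
    source_id: Optional[str] = None  # hypothetical key into the run's data provenance record

@dataclass
class TaggedMemo:
    desk: str
    claims: List[Claim] = field(default_factory=list)

    def provenance_problems(self, provenance_ids: Set[str]) -> List[str]:
        """Return data-backed claims whose source is missing from this run's provenance record."""
        return [c.text for c in self.claims
                if c.basis == "DATA_THIS_RUN" and c.source_id not in provenance_ids]

# Placeholder claims for illustration; they are not outputs of this notebook's run.
memo = TaggedMemo("CREDIT", [
    Claim("Example claim backed by a spread series", "DATA_THIS_RUN", "src-spreads-001"),
    Claim("Example general background statement", "BACKGROUND"),
    Claim("Example working assumption", "ASSUMPTION"),
    Claim("Example data claim with no source attached", "DATA_THIS_RUN"),
])
print(memo.provenance_problems({"src-spreads-001"}))
# ['Example data claim with no source attached']
```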

Second, we strengthen the controls layer. The notebook already contains the essential controls: bounded loops, explicit supervisor decisions, red team critique, and exported artifacts. In production, we turn these into enforceable gates. The supervisor node becomes a true “release control” with formal rules: a memo cannot be distributed unless (a) key sections are present, (b) verification checklists are populated with real metrics and thresholds, and (c) risk disclosures are included. We also formalize stop conditions: if evidence is insufficient, the correct outcome is “REVISE” or “REJECT,” not a polished narrative. The system should be biased toward refusing to finalize when verification is missing, because that is the safe institutional behavior.
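Conditions (a) to (c) translate naturally into a deterministic gate that runs before any distribution. The `release_gate` function below is a hypothetical sketch. Its section names follow the synthesis fields printed in Cell 10. It assumes a richer checklist row format (`{"metric": ..., "threshold": ...}`) than the plain-text scaffold the notebook currently emits. Supplying that format is exactly what condition (b) asks reviewers to do. The gate is biased toward `REVISE`: anything missing blocks approval.

```python
from typing import Any, Dict, List

# Hypothetical gate; section names follow the synthesis fields printed in Cell 10.
REQUIRED_SECTIONS = ["executive_summary", "consensus", "divergences", "unknowns", "actionable_recommendations"]

def release_gate(synthesis: Dict[str, Any], risk_disclosures: List[str]) -> Dict[str, Any]:
    """APPROVE only if (a), (b) and (c) all hold; otherwise REVISE with the failed conditions."""
    reasons = []
    missing = [k for k in REQUIRED_SECTIONS if not synthesis.get(k)]
    if missing:
        reasons.append(f"(a) missing sections: {missing}")
    checklist = synthesis.get("verification_checklist") or []
    # Assumed row format: {"metric": ..., "threshold": ...}; plain-text scaffold rows fail this test.
    filled = [r for r in checklist if isinstance(r, dict) and r.get("metric") and r.get("threshold") is not None]
    if not checklist or len(filled) < len(checklist):
        reasons.append("(b) verification checklist empty or missing real metrics/thresholds")
    if not risk_disclosures:
        reasons.append("(c) no risk disclosures")
    return {"decision": "REVISE" if reasons else "APPROVE", "reasons": reasons}

draft = {
    "executive_summary": "...", "consensus": ["..."], "divergences": ["..."],
    "unknowns": ["..."], "actionable_recommendations": ["..."],
    "verification_checklist": ["Confirm inflation drivers"],  # scaffold text, no metric/threshold yet
}
print(release_gate(draft, ["Policy-error scenario disclosed"]))
# {'decision': 'REVISE', 'reasons': ['(b) verification checklist empty or missing real metrics/thresholds']}
```

A `REJECT` outcome is left to the human reviewer in this sketch; the gate only decides whether a draft is ready to be considered.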

Third, we strengthen accountability with a human-in-the-loop workflow. In production, humans are not “optional reviewers.” They are the owners of decisions and the signatories of outputs. The most practical implementation is a two-step loop. Step one is AI-assisted draft generation: the system produces desk memos, synthesis, and red team critique quickly and consistently. Step two is human review and sign-off: desk analysts validate the desk-specific claims and fill in the verification triggers with real metrics, while a senior reviewer approves distribution. In other words, the AI system accelerates preparation and structure; humans own factual verification and actionability.

To make this auditable, we treat every run like a case file. The notebook already exports `run_manifest.json`, `graph_spec.json`, and `final_state.json`. In a governed deployment, these artifacts become mandatory records stored in a controlled repository with retention rules. We add run identifiers, reviewer identifiers, and a structured “approval record” that captures who approved what and when. We also log any overrides: if a human changes a recommendation, we record the change and the rationale. This is how you achieve auditability: not by trusting the model, but by capturing the process that produced the deliverable and the humans who took responsibility for it.
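The approval record and override log described above can be a small, serializable structure stored next to `run_manifest.json`. Everything in this sketch is hypothetical: `ApprovalRecord`, `Override`, the field names, and the reviewer IDs. The `run_id` and `config_hash` fields reuse the manifest's concepts so that a sign-off can be joined back to the exact run. One rule is enforced in code: an override without a rationale is refused.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List

def _now() -> str:
    return datetime.now(timezone.utc).isoformat()

@dataclass
class Override:
    reviewer_id: str
    field_path: str      # e.g. "synthesis.confidence"
    before: str
    after: str
    rationale: str
    ts_utc: str = field(default_factory=_now)

@dataclass
class ApprovalRecord:
    run_id: str
    config_hash: str
    approver_id: str = ""
    decision: str = "PENDING"  # PENDING until a named human approves
    overrides: List[Override] = field(default_factory=list)
    decided_ts_utc: str = ""

    def override(self, o: Override) -> None:
        """Log a human change to the AI draft; a rationale is mandatory."""
        if not o.rationale.strip():
            raise ValueError("an override must carry a rationale")
        self.overrides.append(o)

    def approve(self, approver_id: str) -> None:
        self.approver_id, self.decision, self.decided_ts_utc = approver_id, "APPROVED", _now()

rec = ApprovalRecord(run_id="example-run", config_hash="example-hash")
rec.override(Override("analyst-07", "synthesis.confidence", "medium", "low", "Spread data not yet verified"))
rec.approve("cio-01")
print(json.dumps(asdict(rec), indent=2))
```

`asdict` converts the nested overrides as well, so the whole record serializes as one JSON document suitable for the case file.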

Finally, we extend governance from “single run auditability” to “model lifecycle governance.” Over time, models change, prompts drift, and market regimes change. A governed system must track versions, run-to-run differences, and performance of the process itself. That means establishing an evaluation suite: not a leaderboard, but a set of institutional tests for completeness, consistency, refusal behavior, and risk disclosure. The committee should understand this as the same discipline we apply to any model risk management program: documented controls, documented limitations, periodic reviews, and clear escalation paths.

The path forward is therefore straightforward and operational. We keep the same topology and strengthen the production boundary: validated data inputs, enforceable gates, and human sign-off integrated into the workflow. This notebook is already the correct skeleton for that outcome because it is state-driven, inspectable, bounded, and artifact-producing. With evidence provenance, formal control gates, and human accountability added, it becomes not just a demonstration of multi-desk AI, but a practical blueprint for governed research at institutional standard.
