#**CHAPTER 8. M&A DUE DILIGENCE**
---

##REFERENCE

https://chatgpt.com/share/6997001e-a02c-8012-ad47-158881bfeb80

##0.CONTEXT

**Introduction — Why this notebook exists, and what it proves**

If you strip away the buzzwords, an M&A due diligence process is a structured way to answer one question: **“What do we believe we are buying, what could go wrong, and what protections do we have?”** In real deals, that question is answered through a controlled workflow: data rooms, legal drafts, finance schedules, management calls, specialist memos, and a constant loop of “find evidence → interpret → identify gaps → escalate.” The practical difficulty is not that we lack documents. The practical difficulty is that we have too many documents, the documents are inconsistent, and time is limited. A committee does not want a long narrative; it wants **traceable conclusions**: what we know, where it came from, and what remains uncertain. That is exactly the behavior this notebook is designed to demonstrate: **AI can assist diligence when it is forced to behave like a governed workflow, not like a chatbot.**

This notebook is part of a broader program called **Agentic Architectures in Finance**. The point is not to “make an AI sound smart.” The point is to teach a reliable pattern: **state-driven, auditable decision systems** built with LangGraph. In this notebook (Notebook 8 / N8), the finance use case is **M&A diligence Q&A over documents**, and the architectural dimension we add is: **router + retrieval**. In simple terms, we create an AI workflow that (1) reads a question, (2) decides which diligence domain it belongs to, (3) retrieves the most relevant evidence from the document set, (4) produces an answer strictly grounded in that evidence, and (5) runs a bounded check for missing evidence and escalates gaps. That is the core structure of real diligence. The notebook is a deliberately clean prototype of that structure, designed to be fast enough for a classroom and auditable enough for a professional setting.

**What this notebook is (and is not)**

This notebook is **not** a replacement for diligence teams, specialist advisors, or investment committee judgment. It does not “verify” truth. It does not discover facts from the world. It does not run a full data room ingestion pipeline. Instead, it demonstrates a smaller, controlled objective:

- **Given a defined set of diligence documents** (here, a synthetic mini data room),
- **Given a diligence question** (for example, “What buyer protections exist in the SPA?”),
- The system can produce a **structured diligence response** that is:
  - **Evidence-grounded**: it cites the exact document chunks used.
  - **Domain-aware**: it routes the question to the relevant document categories.
  - **Gap-aware**: it explicitly lists open items when evidence is missing.
  - **Bounded**: it retries evidence collection at most a fixed number of times.
  - **Auditable**: it exports run artifacts and a graph topology.

Those five properties matter more than “good writing.” They are the difference between a chatbot and a workflow that can be discussed in front of a committee.

**The business problem: diligence is a search-and-control problem**

In real M&A work, diligence is not a single moment of analysis. It is a loop that repeats under time pressure:

1. **A question arrives** (from the deal lead, the IC, legal counsel, the lender, or the buyer).
2. A human decides **which domain owns it**:
   - legal (SPA/LOI terms, MAE, indemnities),
   - financial (quality of earnings, leverage, working capital),
   - tax (audits, NOLs, structuring),
   - IP (ownership, licenses, transfer restrictions),
   - HR (union risk, change of control, key man),
   - commercial (customer concentration, churn, pricing power).
3. The analyst finds **where the evidence is** (which documents, which sections).
4. The analyst forms a view and writes:
   - what the documents say,
   - what protections exist,
   - what liabilities remain,
   - what is missing,
   - what must be escalated.
5. The analyst repeats because new questions and new gaps appear.

Most delays, mistakes, and reputational risks come from failures in steps 2–4: wrong domain ownership, missing documents, untraceable claims, and unflagged gaps. An AI system is valuable only if it reduces those failures. But it can only do so if it is forced into the same discipline: routing, retrieval, evidence-only drafting, and explicit escalation.

That is why this notebook is designed the way it is. It is not an “AI writes diligence” demo; it is an “AI is made to follow diligence control logic” demo.

**The core idea: turn diligence into a state machine**

Committees usually hear about AI as if it were magic: you ask, it answers. That is exactly how errors happen. In this notebook we do something different: we treat the diligence process as a **state machine**. A state machine is a formal way of saying: at any moment, the system has a defined status (the “state”), and it moves to the next step based on rules (the “routing”). Financial institutions already control risk this way, through limits, approvals, gates, and escalation paths. We simply apply that idea to AI.

In practice, the state contains things like:

- the question we are trying to answer,
- the domain classification (legal, tax, financial, etc.),
- the evidence retrieved (document chunks with metadata),
- the answer produced,
- the citations used,
- the open items that remain,
- the iteration count (how many times we tried to fetch more evidence),
- the termination reason (why we stopped).

This is the key shift: **the AI is not “thinking freely.” It is moving through a governed workflow whose current status is visible at all times.**
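
As a rough illustration, the state described above could be typed as a Python `TypedDict`. This is a minimal sketch: every field name and the `new_state` helper are assumptions chosen to mirror the list, not necessarily the names the notebook uses.

```python
from typing import Any, Dict, List, TypedDict


class DiligenceState(TypedDict, total=False):
    # All field names are hypothetical; they mirror the list above.
    question: str                    # the diligence question being answered
    domain: str                      # e.g. "LEGAL", "TAX", "FINANCIAL", "GENERAL"
    evidence: List[Dict[str, Any]]   # retrieved chunks with metadata
    answer: str                      # the grounded answer text
    citations_used: List[str]        # chunk IDs the answer relied on
    open_items: List[str]            # gaps to escalate
    iteration: int                   # how many retrieval passes have run
    termination_reason: str          # why the workflow stopped


def new_state(question: str) -> DiligenceState:
    """Return a fresh state with counters and lists initialized."""
    return DiligenceState(
        question=question,
        evidence=[],
        citations_used=[],
        open_items=[],
        iteration=0,
    )


state = new_state("What buyer protections exist in the SPA?")
print(sorted(state.keys()))
```

Because every step reads and writes this one object, printing it at any point shows the full status of the run.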

**Why LangGraph: topology as a governance artifact**

LangGraph is used because it forces the workflow to be explicit. In normal code, it is easy to hide logic inside functions and end up with untraceable behavior. In LangGraph, you must declare:

- nodes (distinct steps),
- edges (allowed transitions),
- conditional routing (the if/then decisions),
- and an explicit END node.

This creates a graph you can show to a committee. The graph is not a marketing picture; it is a **contract**: these are the only steps the system is allowed to take.
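
To make the "contract" idea concrete, here is a plain-Python sketch of a declared topology. It deliberately does not use the LangGraph API; the node names, the `EDGES` table, and the `run` helper are all hypothetical and only illustrate the principle that undeclared transitions are rejected.

```python
from typing import Callable, Dict, List, Set

END = "END"  # sentinel for the explicit terminal step (hypothetical name)

# Allowed transitions: the only moves the workflow may make.
EDGES: Dict[str, Set[str]] = {
    "intake": {"router"},
    "router": {"retrieve"},
    "retrieve": {"answer"},
    "answer": {"gap_check"},
    "gap_check": {"retrieve", END},  # conditional: loop back or stop
}


def run(nodes: Dict[str, Callable[[dict], str]], state: dict,
        start: str = "intake", max_steps: int = 20) -> List[str]:
    """Execute nodes in order, rejecting any transition not declared in EDGES."""
    path, current = [start], start
    for _ in range(max_steps):
        nxt = nodes[current](state)  # each node updates state and names the next step
        if nxt not in EDGES[current]:
            raise ValueError(f"illegal transition {current} -> {nxt}")
        path.append(nxt)
        if nxt == END:
            return path
        current = nxt
    raise RuntimeError("step budget exhausted before reaching END")


def gap_check(state: dict) -> str:
    """Loop back once, then terminate (a stand-in for the real gap logic)."""
    state["iteration"] = state.get("iteration", 0) + 1
    return "retrieve" if state["iteration"] < 2 else END


NODES = {
    "intake": lambda s: "router",
    "router": lambda s: "retrieve",
    "retrieve": lambda s: "answer",
    "answer": lambda s: "gap_check",
    "gap_check": gap_check,
}

print(run(NODES, {}))
```

A node that tries to jump from `router` straight to `answer` raises an error instead of silently skipping retrieval, which is the behavior a reviewer would want to see guaranteed.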

This notebook includes a mandatory visualization step using a pinned Mermaid renderer so the topology is rendered consistently in Colab. That visualization is not cosmetic. It is the learning artifact and the governance artifact: it shows what the system can do, and what it cannot do.

**The notebook’s document set: a miniature data room**

To keep this notebook fast and deterministic, we use a small synthetic “data room” of representative diligence documents:

- LOI terms (price, exclusivity, conditions, leakage concept),
- SPA excerpts (closing conditions, MAE, indemnities, caps/baskets, non-compete, reps),
- Financial summary (revenue/EBITDA trend, net debt, working capital, concentration),
- Commercial memo (cyclicality, concentration, customer termination rights),
- HR snapshot (headcount, key executives, union/CBA renewal),
- IP register (patents, critical non-transferable license),
- Tax note (ongoing audit, exposure range, NOLs).

In the real world, these would be PDFs and spreadsheets; here they are text objects, because the point is not OCR. The point is the workflow: **router → retrieval → grounded answer → gap escalation.** Once that workflow is proven, you can replace the synthetic documents with real ingestion and indexing.

**Retrieval in this notebook: simple by design, auditable by design**

A critical part of diligence is retrieval: finding the relevant paragraphs and clauses. Many AI systems use vector databases and embeddings. That is valid in production, but it adds infrastructure and reduces transparency for teaching. This notebook uses a deliberately simple retrieval method: deterministic token overlap scoring on pre-chunked documents, with a modest document-type boost (e.g., SPA gets a boost for legal queries).

Why this matters for a committee: we can explain it without hand-waving. We can say:

- We split documents into chunks.
- We score each chunk against the question by overlap of keywords.
- We take the top K chunks and pass them to the LLM.
- We also keep metadata (doc type, title, chunk ID) for audit.

This is not “the best retrieval method.” It is the most explainable method for a controlled notebook demo. And it is already enough to show the larger point: **AI assistance depends on evidence selection.** The exact retrieval technology can evolve later.
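
The scoring steps listed above can be sketched in a few lines. This is an illustrative version under stated assumptions: the `CHUNKS` records, the chunk IDs, and the boost value of 0.5 are made up for the example, and this sketch skips any length normalization.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical chunk records: chunk ID, document type, and text.
CHUNKS: List[Dict[str, str]] = [
    {"chunk_id": "D2::C1", "doc_type": "SPA",
     "text": "Indemnities: general cap 10% of equity value; basket $2m tipping."},
    {"chunk_id": "D3::C1", "doc_type": "Financials",
     "text": "Net debt (Dec FY2025): $96m. Working capital (Dec FY2025): $44m."},
    {"chunk_id": "D1::C1", "doc_type": "LOI",
     "text": "Exclusivity: 45 days from signature."},
]


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[Dict[str, str]], k: int = 2,
             boost: Optional[Dict[str, float]] = None) -> List[Tuple[float, str]]:
    """Score chunks by keyword overlap plus an optional doc-type boost."""
    boost = boost or {}
    q = tokenize(question)
    scored = []
    for c in chunks:
        overlap = len(q & tokenize(c["text"]))
        if overlap == 0:
            continue  # no shared keywords: not evidence for this question
        scored.append((overlap + boost.get(c["doc_type"], 0.0), c["chunk_id"]))
    # Sort by score descending, then chunk ID, so ties resolve deterministically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


print(retrieve("What is the indemnity cap and basket?", CHUNKS, boost={"SPA": 0.5}))
```

Note that exact-token matching means "indemnity" does not match "indemnities"; the chunk is found through "cap" and "basket". That kind of limitation is easy to explain to a reviewer, which is the point of choosing this method.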

**The LLM role: structured synthesis, not free-form invention**

The model used is fixed by the program: **claude-haiku-4-5-20251001**. The important detail is not the brand; the important detail is how we constrain it. In this notebook we constrain the model in two ways:

1. **It does not see the whole world.** It sees only the retrieved evidence chunks and the question.
2. **It must return strict JSON** with three keys:
   - `answer`: what it can conclude from evidence,
   - `citations_used`: which chunk IDs it relied on,
   - `open_items`: what it cannot conclude and must escalate.

This is essential. In professional diligence, the danger is not that the AI makes small mistakes. The danger is that it produces confident language that sounds plausible but is not backed by evidence. The JSON format is an enforcement mechanism: it forces the model to declare its evidence and its uncertainty in a way we can inspect.

In other words, the LLM is being used for what it is good at: **summarizing and structuring information** across multiple evidence blocks, while we use the graph and the state machine to handle governance.
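
One way to enforce the three-key contract is to validate the model output before it enters the state. The sketch below is an assumption about how such a check could look; `parse_answer` and `REQUIRED_KEYS` are hypothetical names, and the sample string stands in for a real model response.

```python
import json
from typing import Any, Dict

# The three keys named above, with the types this sketch assumes for them.
REQUIRED_KEYS = {"answer": str, "citations_used": list, "open_items": list}


def parse_answer(raw: str) -> Dict[str, Any]:
    """Parse model output; raise ValueError unless it matches the contract."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value must be an object")
    if set(obj) != set(REQUIRED_KEYS):
        raise ValueError(f"expected keys {sorted(REQUIRED_KEYS)}, got {sorted(obj)}")
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(obj[key], typ):
            raise ValueError(f"{key!r} must be of type {typ.__name__}")
    return obj


raw = ('{"answer": "General cap is 10% of equity value.", '
       '"citations_used": ["D2::C1"], "open_items": []}')
print(parse_answer(raw))
```

Rejecting malformed output outright, rather than trying to repair it, keeps the failure visible in the trace.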

**The node-by-node story: what happens when you run this notebook**

When you run the notebook, the system follows a sequence of nodes. Each node has a single job. Each job updates the state. This is the flow:

**1) Intake node**
This node ensures the state has a question. In a real system, the question would come from the deal team or a UI. Here we supply an example question like:

“Summarize the key buyer protections and seller liabilities in the SPA, and flag any material diligence gaps we should escalate.”

The intake node also initializes loop counters and control flags so execution is deterministic.

**2) Router node**
This node asks the LLM to classify the question into one of the domains: LEGAL, FINANCIAL, TAX, IP, HR, COMMERCIAL, or GENERAL. The router returns strict JSON. If the response fails to parse, the system defaults to GENERAL (safe fallback). Then we set a retrieval restriction based on the domain. For example:

- LEGAL → SPA and LOI documents
- TAX → tax note and SPA
- IP → IP register and SPA
- FINANCIAL → financials and LOI

The key point: this is the first governance control. It prevents the system from “answering legal questions from random documents.” It is a formal version of what a diligence manager does when they say, “This is a legal point—go to the SPA.”
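
The fallback and the domain restriction can be sketched as two small functions. The `"domain"` JSON key and both function names are assumptions; the mapping contains only the four examples given above, and treating every other domain as unrestricted is also an assumption of this sketch.

```python
import json
from typing import Dict, List, Optional

DOMAINS = {"LEGAL", "FINANCIAL", "TAX", "IP", "HR", "COMMERCIAL", "GENERAL"}

# Only the four mappings named above; values follow the corpus doc_type labels.
DOMAIN_TO_DOC_TYPES: Dict[str, List[str]] = {
    "LEGAL": ["SPA", "LOI"],
    "TAX": ["Tax", "SPA"],
    "IP": ["IP", "SPA"],
    "FINANCIAL": ["Financials", "LOI"],
}


def parse_domain(raw: str) -> str:
    """Return the routed domain, defaulting to GENERAL on any parse problem."""
    try:
        domain = str(json.loads(raw)["domain"]).upper()  # "domain" key is assumed
    except (ValueError, KeyError, TypeError):
        return "GENERAL"
    return domain if domain in DOMAINS else "GENERAL"


def allowed_doc_types(domain: str) -> Optional[List[str]]:
    """None means no restriction (assumed for GENERAL and unmapped domains)."""
    return DOMAIN_TO_DOC_TYPES.get(domain)


print(parse_domain('{"domain": "legal"}'), allowed_doc_types("LEGAL"))
print(parse_domain("not json at all"), allowed_doc_types("GENERAL"))
```

The design choice worth noting is that an unknown or malformed label widens the search rather than stopping the run, so a routing failure costs precision but never silently drops a question.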

**3) Retrieval node**
This node retrieves the top K chunks from the documents based on the question and the domain restriction. It records which chunks were selected. In a real diligence workflow, this is the equivalent of pulling the relevant SPA clauses and financial schedule lines into your working memo.

**4) Answer node**
This node constructs an evidence pack:

- Each chunk is labeled with a chunk ID and document metadata.
- The LLM is instructed: “Answer using ONLY these evidence blocks.”
- The output must be strict JSON: answer, citations, open items.

The answer node then filters citations to ensure they match retrieved chunk IDs. That is another governance control: it prevents the model from citing nonexistent evidence.
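
The citation filter is simple enough to show in full. This is a sketch with a hypothetical function name and sample IDs; it keeps only citations that match a retrieved chunk and drops duplicates.

```python
from typing import Dict, List


def filter_citations(cited: List[str], retrieved: List[Dict[str, str]]) -> List[str]:
    """Keep only citations that match a retrieved chunk ID, preserving order."""
    valid = {c["chunk_id"] for c in retrieved}
    seen, kept = set(), []
    for cid in cited:
        if cid in valid and cid not in seen:
            seen.add(cid)
            kept.append(cid)
    return kept


retrieved = [{"chunk_id": "D2::C1"}, {"chunk_id": "D1::C1"}]
print(filter_citations(["D2::C1", "D9::C4", "D2::C1"], retrieved))  # ['D2::C1']
```

If the filter leaves the list empty, the answer counts as ungrounded, which is what the gap check looks for next.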

**5) Gap check node (bounded loop)**
This node checks if the answer has:
- zero citations (meaning: not grounded), or
- open items (meaning: evidence gaps remain).

If either is true, the system is allowed to do one more retrieval pass — but in a controlled way. Specifically, it broadens the scope to GENERAL (no restriction) and reruns retrieval and answering. This reflects a real behavior: if your first pass looked only at the SPA and you still have gaps, you broaden to other memos, schedules, or notes.

Crucially, the loop is bounded. This is important for both governance and compute. The maximum loop iterations are configured (in this notebook, set to 2). That means the system can never spin indefinitely. It will either stop with “EVIDENCE_OK” or stop with “EVIDENCE_GAPS_REMAIN.” A committee can understand and approve that behavior.
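
The bounded decision can be written as a pure function. This is a sketch, not the notebook's code: it assumes `iteration` counts completed retrieval passes, and `RETRY_GENERAL` is a hypothetical label for "broaden the scope and rerun".

```python
from typing import List

MAX_ITERATIONS = 2  # the bound described above


def gap_decision(citations: List[str], open_items: List[str], iteration: int) -> str:
    """Return 'EVIDENCE_OK', 'RETRY_GENERAL', or 'EVIDENCE_GAPS_REMAIN'."""
    has_gap = (len(citations) == 0) or (len(open_items) > 0)
    if not has_gap:
        return "EVIDENCE_OK"
    if iteration < MAX_ITERATIONS:
        return "RETRY_GENERAL"  # hypothetical label: broaden scope and rerun
    return "EVIDENCE_GAPS_REMAIN"


print(gap_decision(["D2::C1"], [], iteration=1))                    # EVIDENCE_OK
print(gap_decision(["D2::C1"], ["Schedule missing"], iteration=1))  # RETRY_GENERAL
print(gap_decision(["D2::C1"], ["Schedule missing"], iteration=2))  # EVIDENCE_GAPS_REMAIN
```

Under that counting assumption, a bound of 2 allows exactly one broadened retry after the first pass, and the function can never return a retry once the bound is reached.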

**6) END node**
The graph terminates explicitly. The final state includes:
- the answer,
- citations,
- open items,
- termination reason,
- and a trace log of node activity.

That trace is your audit trail: what decisions were made, and in what order.

**What the committee should take away: AI as an assistant with controls**

When presenting this to a committee, the most useful framing is:

- We are not automating judgment.
- We are automating the mechanical part of diligence: routing, searching, summarizing, and gap listing.
- We keep humans in control by making outputs traceable and bounded.

Specifically, this notebook demonstrates three “committee-grade” properties:

**1) Traceability**
Every answer is tied to citations that reference specific document chunks. We can always ask: “Where did that come from?” and get an ID that maps to the underlying text.

**2) Explicit uncertainty**
Open items are not hidden. The system is required to list them. That is critical in diligence, where missing a gap is often worse than being uncertain.

**3) Governance by topology**
The workflow is a graph. The system cannot jump steps. It must follow the defined topology, and we can review the topology as part of model risk governance.

**Why this matters operationally: speed, consistency, and safer coverage**

In a live deal, diligence questions come constantly, and the work is fragmented: emails, call notes, document versions, redlines. Analysts spend large amounts of time on two tasks:

- finding the relevant information,
- producing a consistent summary that leadership can digest.

This system targets those two tasks. The benefits, if implemented properly, are practical:

- **Speed**: faster first-pass answers to common questions.
- **Consistency**: standardized format (answer + citations + open items).
- **Coverage**: reduced risk of missing obvious clauses because retrieval is systematic.
- **Escalation discipline**: open items become a formal checklist.

However, the notebook is also honest about limitations. The AI can only answer from what it sees. If documents are missing, it cannot invent them (and it is instructed not to). If the question is ambiguous, it will reflect that in open items. That is the correct behavior.

**How to interpret “open items” in a diligence context**

A committee will often ask: “So what do we do with the gaps?” The answer is: open items map directly to real diligence actions:

- request missing schedules or exhibits,
- ask counsel to confirm a clause interpretation,
- ask management for clarification,
- run specialist review (tax, IP, labor),
- incorporate a protection (special indemnity, covenants, escrow),
- adjust valuation or deal terms.

In other words, open items are not failure; they are the diligence checklist. The system is doing what a good analyst does: **it tells you what you still don’t know.**

**Artifacts and audit bundle: why the notebook exports JSON files**

A committee also cares about repeatability. If we cannot reproduce outputs, we cannot govern them. That is why the notebook exports three required artifacts:

- `run_manifest.json`: what was run, with config hashes and versions.
- `graph_spec.json`: the topology and retrieval design, as a machine-readable spec.
- `final_state.json`: the final state, including answer, citations, gaps, and trace.

This is the beginning of an audit bundle. In a larger system, you would add the exact evidence snippets and document hashes. But even here, we already have the core: **a run can be inspected, compared, and explained.**

The notebook also includes a determinism check: rerun the same question with the same configuration and compare fingerprints (excluding timestamps). That is not a gimmick. It’s a control: it ensures the workflow is stable enough to be reviewed.
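
A fingerprint comparison of this kind could be sketched as follows. The key names in `VOLATILE_KEYS` and the two sample runs are assumptions for illustration; the idea is to hash a canonical JSON form of the state after dropping fields that legitimately differ between runs.

```python
import hashlib
import json
from typing import Any, Dict

VOLATILE_KEYS = {"run_id", "ts_utc"}  # hypothetical names for per-run fields


def fingerprint(state: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON of the state, ignoring volatile top-level keys."""
    stable = {k: v for k, v in state.items() if k not in VOLATILE_KEYS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = {"answer": "Cap is 10%.", "citations_used": ["D2::C1"],
         "ts_utc": "2026-02-19T12:07:11+00:00"}
run_b = {"citations_used": ["D2::C1"], "answer": "Cap is 10%.",
         "ts_utc": "2026-02-19T12:09:40+00:00"}
print(fingerprint(run_a) == fingerprint(run_b))  # True
```

Sorting keys before hashing means two runs with the same content but different key order still match, so a mismatch points to a real difference in the answer, citations, or gaps.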

**Connecting the notebook to real implementation: what changes, what stays the same**

In a production diligence environment, several pieces would be upgraded:

- Document ingestion (PDF parsing, spreadsheets, OCR if needed).
- Retrieval (embeddings, vector search, hybrid retrieval, permissions).
- Version control (document versions, redlines, superseded drafts).
- Access control (deal room permissions, confidentiality).
- Logging (PII redaction, secure storage, retention policies).
- Human workflow integration (task creation for open items, counsel handoffs).

But the most important part would remain unchanged: the graph topology and the state-driven approach. The architecture is portable:

- Router node maps questions to domains.
- Retrieval node selects evidence.
- Answer node synthesizes with citations.
- Gap check node enforces escalation.
- Bounded loops prevent runaway behavior.
- Artifacts support auditability.

So when we say “AI can generate diligence,” what we really mean is: **AI can execute a diligence assistant workflow that resembles how analysts work, while producing outputs that leadership can trust because they are traceable and constrained.**

**How to present this in simple terms**

If you need to summarize the notebook in one minute for leadership, you can say:

**“This notebook shows an AI diligence assistant that behaves like a controlled workflow. It first classifies the question (legal, tax, financial), then pulls the most relevant clauses and sections from the deal documents, then produces an answer that is required to cite exactly which clauses it relied on. If it cannot find enough evidence, it says so explicitly and lists the missing items as an escalation checklist. The entire workflow is a visible graph and produces an audit bundle, so we can review how it reached its conclusions.”**

That summary is accurate and aligned with the notebook.

**Why the committee should care: it reduces diligence friction while preserving control**

Committees rarely approve technology because it is clever. They approve it because it reduces operational friction without increasing risk. This notebook is a demonstration of that balance:

- We get speed and standardization.
- We keep evidence traceability.
- We keep bounded behavior.
- We keep explicit uncertainty.
- We keep an audit trail.

That is the correct argument for AI in diligence. Not “it’s smart,” but “it’s governed.”

**Closing perspective: the real goal is institutionalizing the diligence method**

The deeper point of this notebook is that diligence quality is not just about talent; it is about process. Great teams have repeatable habits:

- they route questions correctly,
- they cite evidence,
- they separate knowns from unknowns,
- they escalate gaps early,
- they document decisions.

This notebook codifies those habits into an executable graph. That is why it belongs in a program about agentic architectures in finance. It teaches a pattern that can be extended: add a redline comparison node, add a “risk register” node, add a human approval gate node, add a secure retrieval stack. But the foundation stays the same: **diligence as a governed state machine, with AI as a constrained synthesis engine.**

That is what you are showing the committee: not a chatbot, but a controlled diligence workflow that produces an answer, a citation trail, and a gap list — and does so in a way that can be audited, repeated, and improved.


##1.LIBRARIES AND ENVIRONMENT

**Cell 1 — Install and initialize the notebook in a Colab-safe way**

This first cell exists to solve a very practical problem: if the environment is unstable, the “AI diligence workflow” is not credible. In front of a committee, you want to say: **the notebook runs cleanly, it uses explicit dependencies, and it records what it actually ran**. That is why Cell 1 is not “just imports.” It is the foundation for reproducibility and auditability.

The key design choice is how we install libraries. Google Colab comes with many packages already installed, and pinning common networking libraries (like `httpx` or `httpcore`) can cause conflicts. So we follow a Colab-safe strategy: we install **only** the minimum project dependencies we control (`langgraph`, `langchain-core`, `anthropic`), and we do it with `--upgrade` and minimum-version bounds (`>=`) rather than hard-pinning a large dependency tree. This reduces the risk of breaking Colab’s internal environment while still ensuring we have compatible versions for LangGraph and the Anthropic client. The trade-off is that resolved versions can differ between runs, which is why the cell prints them. This is a governance decision: we prefer **stable execution** over micromanaging every transitive dependency.

Next, the cell imports the standard Python modules needed for the whole notebook: JSON handling, hashing, deterministic randomness, timestamps, and typing. These are not cosmetic. We use them later to create the run manifest, hash configuration, and keep the workflow deterministic enough for review. We also import LangGraph objects (`StateGraph`, `END`) because the entire notebook is built as an explicit graph, not ad-hoc function calls.

Then the cell establishes three pieces of run identity: a `RUN_ID` (unique UUID), a timezone-aware UTC timestamp from `_dt.datetime.now(_dt.timezone.utc)`, and an output folder under `/content`. Together, these create an “audit directory” that contains artifacts at the end. This is exactly what professional teams expect: you should be able to point to a folder and say “this run produced these outputs.”

Finally, the cell prints a version dictionary. This is crucial for committee-grade credibility: when something changes, we can see whether it is the model, the library versions, or the environment. The cell therefore sets the tone for the entire notebook: **governance-first, deterministic where possible, and always inspectable.**


In [1]:
# CELL 1/10 — Install + core imports (Colab-safe: avoid pinning common deps that can conflict)
# Goal: prevent collisions with Colab preinstalls (especially httpx/httpcore).
# Strategy:
#   - Do NOT pin httpx/httpcore.
#   - Only ensure the 3 project dependencies exist at compatible versions.
#   - Use "--upgrade --quiet" without hard pins (Colab-friendly).
#   - Print the resolved versions for auditability.

!pip -q install --upgrade "langgraph>=0.2.39" "langchain-core>=0.3.40" "anthropic>=0.34.0"

import os, sys, json, re, uuid, time, random, hashlib, platform
import datetime as _dt
from dataclasses import dataclass
from typing import TypedDict, Literal, Dict, Any, List, Optional, Callable, Tuple

from langgraph.graph import StateGraph, END
from google.colab import userdata
from IPython.display import HTML, display

import importlib.metadata as md

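random.seed(8)
# Note: setting PYTHONHASHSEED here only affects child processes; the running
# interpreter fixed its hash seed at startup. Kept to record the intended seed.
os.environ["PYTHONHASHSEED"] = "8"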

def _ver(pkg: str) -> str:
    try:
        return md.version(pkg)
    except Exception:
        return "missing"

RUN_ID = str(uuid.uuid4())
TS_UTC = _dt.datetime.now(_dt.timezone.utc).isoformat()

OUT_DIR = "/content/outputs_notebook_8"
os.makedirs(OUT_DIR, exist_ok=True)

VERSIONS = {
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "langgraph": _ver("langgraph"),
    "langchain-core": _ver("langchain-core"),
    "anthropic": _ver("anthropic"),
    # helpful diagnostics (may be preinstalled / transitive)
    "httpx": _ver("httpx"),
    "httpcore": _ver("httpcore"),
}

print("RUN_ID:", RUN_ID)
print("TS_UTC:", TS_UTC)
print("OUT_DIR:", OUT_DIR)
print("VERSIONS:", json.dumps(VERSIONS, indent=2))


RUN_ID: 3217f4d1-e460-4852-b885-36860a96e99e
TS_UTC: 2026-02-19T12:07:11.289953+00:00
OUT_DIR: /content/outputs_notebook_8
VERSIONS: {
  "python": "3.12.12",
  "platform": "Linux-6.6.105+-x86_64-with-glibc2.35",
  "langgraph": "1.0.8",
  "langchain-core": "1.2.13",
  "anthropic": "0.82.0",
  "httpx": "0.28.1",
  "httpcore": "1.0.9"
}


##2.VISUALIZATION STANDARDS

###2.1.OVERVIEW

**Cell 2 — Visualization Standard v1: why we render the graph and why it must be hardened**

Cell 2 implements the visualization standard. In this program, the graph is not optional; it is a learning artifact and a governance artifact. The committee is not being asked to trust a black box. The committee is being shown the exact workflow topology: which steps exist, in what order they run, and where conditional decisions occur. That is why we require a hardened Mermaid renderer and a dedicated `display_langgraph_mermaid(graph)` function.

The practical reason for “hardened” rendering is that Colab can be unreliable with inline JavaScript, and Mermaid versions can change behavior across releases. A diagram that renders one day and fails the next undermines confidence. So the notebook pins Mermaid to a known version (10.6.1 unless explicitly changed) and loads it as an ES module (ESM) from a CDN, which this program has adopted as its standard way to render Mermaid in Colab. The renderer also sets Mermaid’s `securityLevel` to `"strict"` to reduce the risk of injection through diagram content. Even though this is a classroom notebook, adopting security discipline is part of the point.

The renderer works as follows: it creates a unique HTML container ID, injects Mermaid code into a safe string, loads Mermaid as a module, initializes it with predictable settings, and then renders the SVG into the container. If rendering fails, it prints the error inside the notebook rather than failing silently. That “no silent failures” principle is crucial in professional workflows.

The second function, `display_langgraph_mermaid(compiled_graph)`, is the interface we standardize across notebooks. It asks LangGraph for its Mermaid representation (using `compiled_graph.get_graph().draw_mermaid()`), then calls the local renderer. This ensures the diagram matches the actual graph topology that will execute. In other words: **we are not drawing an illustrative diagram; we are rendering the diagram from the real program**.

When presenting to a committee, this cell lets you say something simple and reassuring: “Here is the workflow the AI is allowed to follow. It cannot invent steps. It cannot skip steps. This diagram is generated from the compiled graph.” That statement is the core reason this cell exists.


###2.2.CODE AND IMPLEMENTATION

In [2]:
# CELL 2/10 — Visualization Standard v1: hardened Mermaid ESM renderer + display_langgraph_mermaid(graph)

MERMAID_VERSION = "10.6.1"

def _safe_id(prefix: str = "mmd") -> str:
    return f"{prefix}-{uuid.uuid4().hex[:10]}"

def render_mermaid_locally(mermaid_code: str, *, height_px: int = 520) -> None:
    """
    Hardened Colab Mermaid renderer (ESM). Deterministic + no external state assumptions.
    Pinned Mermaid version (default 10.6.1).
    """
    diagram_id = _safe_id("mermaid")
    # Serialize the diagram as a JSON string literal (valid JavaScript), so backticks,
    # backslashes, or "${" in the Mermaid source cannot break out of the script.
    # Escaping "</" keeps a literal "</script>" from closing the tag early.
    payload = json.dumps(mermaid_code).replace("</", "<\\/")
    html = f"""
    <div style="border:1px solid rgba(0,0,0,0.15); border-radius:12px; padding:12px; overflow:auto;">
      <div id="{diagram_id}" style="min-height:{height_px}px;"></div>
    </div>

    <script type="module">
      import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@{MERMAID_VERSION}/dist/mermaid.esm.min.mjs";
      mermaid.initialize({{
        startOnLoad: false,
        securityLevel: "strict",
        theme: "default",
        flowchart: {{ curve: "basis" }},
      }});
      const code = {payload};
      const el = document.getElementById("{diagram_id}");
      try {{
        const {{ svg }} = await mermaid.render("{diagram_id}-svg", code);
        el.innerHTML = svg;
      }} catch (e) {{
        el.innerHTML = "<pre style='color:#b00020; white-space:pre-wrap;'>" + String(e) + "</pre>";
      }}
    </script>
    """
    display(HTML(html))

def display_langgraph_mermaid(compiled_graph: Any) -> None:
    """
    Required by standard. Extracts Mermaid from LangGraph, renders locally.
    """
    mmd = compiled_graph.get_graph().draw_mermaid()
    render_mermaid_locally(mmd)

print("Mermaid renderer ready. Mermaid pinned:", MERMAID_VERSION)


Mermaid renderer ready. Mermaid pinned: 10.6.1


##3.SYNTHETIC M&A DILIGENCE CORPUS

###3.1.OVERVIEW

**Cell 3 — The synthetic “data room”: documents, chunking, and retrieval that is explainable**

Cell 3 creates the miniature diligence environment: a set of synthetic documents that represent the kinds of materials you see in a real M&A data room. The purpose is not to simulate every detail of a deal. The purpose is to provide a controlled dataset so we can demonstrate the workflow logic: routing, retrieval, evidence-grounded answering, and gap identification. This is why we use synthetic text rather than PDFs: we are teaching architecture, not document parsing.

The document set is intentionally representative. It includes an LOI (high-level terms and conditions), SPA excerpts (closing conditions, MAE, indemnities), a financial summary (revenue/EBITDA, net debt, working capital, concentration), a commercial memo (market/customer risk), an HR snapshot (key man, union/CBA timing), an IP register (non-transferable license risk), and a tax note (audit exposure, NOL limitations). These are exactly the domains that diligence teams triage in real deals. By keeping them short, the notebook remains fast enough for a classroom and predictable enough for repeated runs.

Next, the cell defines a deterministic chunking process. In diligence, the unit of evidence is rarely “the whole document.” It is usually a clause, paragraph, or schedule line. Chunking approximates that. The code splits text into chunks under a max character limit, preferring newline boundaries so chunks align with logical sections. Each chunk is labeled with a stable `chunk_id` like `D2::C1`. This ID becomes the anchor for citations later.
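
A minimal sketch of such a chunker, under stated assumptions: the function name `chunk_text`, the default limit of 200 characters, the hard-split fallback for overlong lines, and numbering chunks from 1 are all illustrative choices rather than the notebook's confirmed behavior.

```python
from typing import Dict, List


def chunk_text(doc_id: str, text: str, max_chars: int = 200) -> List[Dict[str, str]]:
    """Greedily pack whole lines into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single line longer than the limit is hard-split as a fallback.
        while len(line) > max_chars:
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        current = line
    if current:
        chunks.append(current)
    # Stable IDs of the form "<doc_id>::C<n>" anchor later citations.
    return [{"chunk_id": f"{doc_id}::C{i}", "text": c}
            for i, c in enumerate(chunks, start=1)]


sample = "Non-compete: 24 months.\nGoverning law: New York.\nExclusivity: 45 days."
for chunk in chunk_text("D1", sample, max_chars=50):
    print(chunk["chunk_id"], repr(chunk["text"]))
```

Because the function has no randomness and depends only on its inputs, the same document always yields the same chunk IDs, which is what makes citations reproducible across runs.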

Then we define retrieval. In production, retrieval might use embeddings and vector search, but that adds infrastructure and reduces transparency. Here we use a simple token-overlap scoring: we tokenize the question, count overlaps with each chunk’s tokens, normalize by length, and apply a small document-type boost (e.g., SPA slightly boosted for legal relevance). This approach is deliberately simple so you can explain it clearly: “We pull the top K chunks with the highest keyword match.”

The key point for the committee is: retrieval is not magic. It is a documented, inspectable mechanism that selects evidence. And because chunk IDs and metadata are preserved, we can later show exactly what text supported the answer. This cell makes that possible.


###3.2.CODE AND IMPLEMENTATION

In [3]:
# CELL 3/10 — Synthetic M&A diligence corpus + deterministic chunking + retrieval (no vectors; auditable)

@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    doc_type: Literal["LOI", "SPA", "Financials", "HR", "Commercial", "IP", "Ops", "Tax", "Regulatory"]
    text: str

DOCS: List[Doc] = [
    Doc(
        doc_id="D1",
        title="Letter of Intent (LOI) — Key Terms",
        doc_type="LOI",
        text=(
            "Transaction: acquisition of 100% equity of TargetCo by BuyerCo.\n"
            "Indicative purchase price: $420m enterprise value, subject to net debt and working capital adjustments.\n"
            "Exclusivity: 45 days from signature.\n"
            "Conditions: satisfactory due diligence, financing, board approvals.\n"
            "Leakage: prohibited; permitted leakage limited to ordinary-course salaries and agreed items.\n"
            "Governing law: New York.\n"
        ),
    ),
    Doc(
        doc_id="D2",
        title="Share Purchase Agreement (SPA) — Draft Excerpts",
        doc_type="SPA",
        text=(
            "Closing conditions include: antitrust clearance, no material adverse effect (MAE), accuracy of reps and warranties.\n"
            "Indemnities: general cap 10% of equity value; basket $2m tipping; survival 18 months.\n"
            "Special indemnity: known tax audit for FY2024, capped at $25m, survival 4 years.\n"
            "Non-compete: 24 months, territory: North America.\n"
            "Representations include: title to shares, financial statements, compliance, IP ownership, employment matters.\n"
        ),
    ),
    Doc(
        doc_id="D3",
        title="Financial Statements Summary — FY2023–FY2025 (unaudited)",
        doc_type="Financials",
        text=(
            "Revenue: FY2023 $310m; FY2024 $355m; FY2025 $372m.\n"
            "EBITDA: FY2023 $52m; FY2024 $61m; FY2025 $58m.\n"
            "Net debt (Dec FY2025): $96m.\n"
            "Working capital (Dec FY2025): $44m.\n"
            "Customer concentration: top-3 customers represent 41% of FY2025 revenue.\n"
            "Margin pressure in FY2025 due to input cost volatility and pricing lags.\n"
        ),
    ),
    Doc(
        doc_id="D4",
        title="Commercial Diligence Memo — Market & Customers",
        doc_type="Commercial",
        text=(
            "TargetCo operates in specialty industrial components with recurring aftermarket demand.\n"
            "Key risks: cyclical OEM demand, customer concentration, and competitive pricing pressure.\n"
            "Opportunities: cross-selling into BuyerCo channels; pricing optimization; SKU rationalization.\n"
            "Major customers have annual renegotiation clauses; one top customer has termination for convenience with 90 days notice.\n"
        ),
    ),
    Doc(
        doc_id="D5",
        title="HR Snapshot — Headcount, Key Man, and Benefits",
        doc_type="HR",
        text=(
            "Headcount: 820 total; 120 engineering; 540 manufacturing; 160 sales/admin.\n"
            "Key executives: CEO (3-year term), CFO (at-will), CTO (2-year retention plan).\n"
            "Change-of-control: executive bonus 1.5x base + target; no broad-based severance.\n"
            "Union: one manufacturing site unionized; CBA renewal due in 9 months.\n"
        ),
    ),
    Doc(
        doc_id="D6",
        title="IP Register — Patents & Licensing",
        doc_type="IP",
        text=(
            "Patents: 14 granted; 3 pending; primary families expire 2032–2038.\n"
            "Critical software: CAD automation tool licensed from VendorX; license is non-transferable without consent.\n"
            "Open-source usage: limited; requires compliance review for copyleft exposure.\n"
        ),
    ),
    Doc(
        doc_id="D7",
        title="Tax Diligence Note — Audits & Structure",
        doc_type="Tax",
        text=(
            "Ongoing tax audit: FY2024 transfer pricing documentation under review.\n"
            "Potential exposure range (management estimate): $8m–$18m including penalties.\n"
            "TargetCo has NOLs of $22m; usage subject to change-of-control limitations.\n"
        ),
    ),
]

def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text.strip().lower())

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk_doc(doc: Doc, *, max_chars: int = 420) -> List[Dict[str, Any]]:
    t = doc.text.strip()
    parts = []
    start = 0
    while start < len(t):
        end = min(len(t), start + max_chars)
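        # Prefer to cut at the last newline inside the window so chunks end on a line boundary.
        cut = t.rfind("\n", start, end)
        # Fall back to a hard cut when there is no newline or it sits too close to `start`.
        if cut == -1 or cut <= start + 40:
            cut = end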
        chunk = t[start:cut].strip()
        if chunk:
            parts.append({
                "doc_id": doc.doc_id,
                "title": doc.title,
                "doc_type": doc.doc_type,
                "chunk": chunk,
                "chunk_id": f"{doc.doc_id}::C{len(parts)+1}",
            })
        start = cut
    return parts

CHUNKS: List[Dict[str, Any]] = []
for d in DOCS:
    CHUNKS.extend(chunk_doc(d))

# Deterministic, explainable scoring: weighted token overlap + doc-type boosts
DOC_TYPE_BOOST = {
    "SPA": 1.20,
    "LOI": 1.10,
    "Financials": 1.15,
    "Tax": 1.12,
    "IP": 1.08,
    "HR": 1.05,
    "Commercial": 1.06,
    "Ops": 1.00,
    "Regulatory": 1.00,
}

def retrieve(query: str, *, top_k: int = 5, restrict_types: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    q_tokens = tokenize(query)
    q_set = set(q_tokens)
    if not q_set:
        return []

    scored = []
    for c in CHUNKS:
        if restrict_types and c["doc_type"] not in restrict_types:
            continue
        c_tokens = tokenize(c["chunk"])
        overlap = sum(1 for tok in c_tokens if tok in q_set)
        # normalize by length to reduce bias toward longer chunks (floor of 12 tokens for very short chunks)
        denom = max(12, len(c_tokens))
        base = overlap / denom
        boost = DOC_TYPE_BOOST.get(c["doc_type"], 1.0)
        score = base * boost
        if score > 0:
            scored.append((score, c))
    scored.sort(key=lambda x: (-x[0], x[1]["chunk_id"]))
    return [dict(item[1], score=float(item[0])) for item in scored[:top_k]]

print("Docs:", len(DOCS), "Chunks:", len(CHUNKS))
print("Sample retrieval:", [r["chunk_id"] for r in retrieve("cap basket survival indemnity", top_k=3)])


Docs: 7 Chunks: 14
Sample retrieval: ['D2::C1']


##4.STATE SCHEMA

###4.1.OVERVIEW

**Cell 4 — The state schema, the AgentNode abstraction, and the strict model wrapper**

Cell 4 establishes the “governed system” foundation: a TypedDict state schema, a reusable node abstraction, and a strict wrapper around the LLM call. This is where the notebook stops being “some Python code” and becomes an auditable workflow.

First, the cell defines a `RunConfig`. This is important because every meaningful behavior of the system should be controlled by explicit configuration rather than hidden constants. The config includes the fixed model name (strictly `claude-haiku-4-5-20251001`), token limits, temperature (set to 0.0 for stability), loop bounds, and top-K retrieval. When you present to a committee, you want to be able to point to this config and say: “These are the operational settings. They are logged, hashed, and reproducible.”

Second, we define utility functions: stable JSON dumps and SHA256 hashing. These are used later to create run manifests and to fingerprint outputs for determinism checks. In finance workflows, you need traceability not just for outputs but for the conditions that produced them.
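
The fingerprinting idea can be sketched in a few lines. This is an illustration under assumptions: `DemoConfig` is a hypothetical stand-in for `RunConfig`, and `stable_dumps` uses compact separators, whereas the notebook's own helper uses indented output. The point is only that sorted keys make the serialized form, and therefore the hash, independent of key order.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any

@dataclass(frozen=True)
class DemoConfig:  # hypothetical stand-in for the notebook's RunConfig
    model: str = "example-model"
    temperature: float = 0.0
    max_loop_iters: int = 2
    top_k_retrieval: int = 6

def stable_dumps(obj: Any) -> str:
    # sort_keys makes the output independent of dict insertion order.
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, separators=(",", ":"))

def fingerprint(obj: Any) -> str:
    return hashlib.sha256(stable_dumps(obj).encode("utf-8")).hexdigest()

base = fingerprint(asdict(DemoConfig()))
same = fingerprint(asdict(DemoConfig()))
changed = fingerprint(asdict(DemoConfig(top_k_retrieval=8)))
order_independent = fingerprint({"b": 1, "a": 2}) == fingerprint({"a": 2, "b": 1})

print("same config, same hash:", base == same)
print("changed top_k, different hash:", base != changed)
print("key order ignored:", order_independent)
```

Any change to a setting produces a different fingerprint, which is what lets a reviewer tie an output to the exact configuration that produced it.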

Third, we implement `_anthropic_client()` using `userdata.get("ANTHROPIC_API_KEY")` (ALL CAPS). This is a governance detail: secrets are not hard-coded, and the notebook fails loudly if the key is missing. The `llm_call()` function then wraps the Anthropic SDK in a minimal, consistent interface that returns plain text. This wrapper also helps prevent “hidden behavior” because every call goes through one controlled function.

Fourth, we define the `DiligenceState` TypedDict. This is the heart of “state-driven routing.” The state includes the question, the routed domain, the retrieval restrictions, the retrieved chunks, the drafted answer, citations, open items, loop iteration counts, termination reason, and a trace log. This is what makes the system auditable: at any time you can inspect the state and see what the system knows and why it is doing what it is doing.

Finally, we define `AgentNode`. Every node in the graph will be an object with a name and a `__call__` method that takes the state and returns an updated state. This makes nodes modular, testable, and composable, and it enforces clean separation between steps. The committee-level takeaway is simple: **we are not letting the model “run the show.” The graph and the state do.**
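
One practical consequence of the callable-node pattern is that a node can be exercised on its own, without building a graph. The sketch below assumes a simplified `State` alias in place of `DiligenceState`, and `UppercaseQuestionNode` is a hypothetical toy node used only for illustration.

```python
from typing import Any, Dict

State = Dict[str, Any]  # simplified stand-in for DiligenceState

class AgentNode:
    def __init__(self, name: str):
        self.name = name

    def __call__(self, state: State) -> State:
        raise NotImplementedError

class UppercaseQuestionNode(AgentNode):  # hypothetical toy node
    def __init__(self) -> None:
        super().__init__("uppercase_question")

    def __call__(self, state: State) -> State:
        # Single responsibility: transform one field and log that the node ran.
        state["question"] = state.get("question", "").upper()
        state.setdefault("trace", []).append({"node": self.name})
        return state

# Call the node directly, exactly as a unit test would.
node = UppercaseQuestionNode()
out = node({"question": "what is the indemnity cap?"})
print(out["question"])
print(out["trace"])
```

Because each node is just a function of state, the same direct-call approach should work for checking the real nodes against hand-built states.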


###4.2.CODE AND IMPLEMENTATION

In [4]:
# CELL 4/10 — Typed state schema + AgentNode abstraction + strict Claude model wrapper (ANTHROPIC_API_KEY)

from anthropic import Anthropic

MODEL_NAME = "claude-haiku-4-5-20251001"

@dataclass(frozen=True)
class RunConfig:
    model: str = MODEL_NAME
    max_tokens: int = 700
    temperature: float = 0.0
    max_loop_iters: int = 2
    top_k_retrieval: int = 6

def _sha256(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def _json_dumps(obj: Any) -> str:
    return json.dumps(obj, ensure_ascii=False, indent=2, sort_keys=True)

def _anthropic_client() -> Anthropic:
    key = userdata.get("ANTHROPIC_API_KEY")  # ALL CAPS (required)
    if not key or not isinstance(key, str):
        raise RuntimeError("Missing Colab secret: userdata.get('ANTHROPIC_API_KEY') (ALL CAPS).")
    return Anthropic(api_key=key)

def llm_call(system: str, user: str, *, cfg: RunConfig) -> str:
    client = _anthropic_client()
    resp = client.messages.create(
        model=cfg.model,
        max_tokens=cfg.max_tokens,
        temperature=cfg.temperature,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    # Anthropic SDK returns a list of content blocks
    parts = []
    for block in resp.content:
        if getattr(block, "type", None) == "text":
            parts.append(block.text)
    return "\n".join(parts).strip()

class DiligenceState(TypedDict, total=False):
    # Inputs
    question: str
    # Routing
    domain: Literal["LEGAL", "FINANCIAL", "TAX", "IP", "HR", "COMMERCIAL", "GENERAL"]
    restrict_types: List[str]
    # Retrieval
    retrieved: List[Dict[str, Any]]  # chunks with metadata + score
    # Drafting
    answer: str
    citations: List[Dict[str, Any]]  # {chunk_id, doc_id, title, doc_type}
    open_items: List[str]
    # Control + trace
    loop_iter: int
    needs_more_evidence: bool
    termination_reason: str
    trace: List[Dict[str, Any]]

class AgentNode:
    name: str
    def __init__(self, name: str):
        self.name = name
    def __call__(self, state: DiligenceState) -> DiligenceState:
        raise NotImplementedError

CFG = RunConfig()
print("MODEL_NAME (strict):", CFG.model)
print("RunConfig:", CFG)


MODEL_NAME (strict): claude-haiku-4-5-20251001
RunConfig: RunConfig(model='claude-haiku-4-5-20251001', max_tokens=700, temperature=0.0, max_loop_iters=2, top_k_retrieval=6)


##5.NODES

###5.1.OVERVIEW

**Cell 5 — The core agent steps: Intake, Router, Retrieval, Answer, and GapCheck**

Cell 5 is where the diligence workflow becomes concrete. We define five nodes, each with a single responsibility, and each designed to update state deterministically and transparently. The principle is: small nodes, clear jobs, no hidden globals, and trace logging at every step.

The Intake node is operational hygiene. It guarantees a question exists and initializes control flags like `loop_iter` and `needs_more_evidence`. In professional workflows, you do not want fragile assumptions like “the user always provided a question.” Intake ensures the workflow can start reliably.

The Router node is the first “diligence realism” feature. It asks the model to classify the question into a diligence domain: LEGAL, FINANCIAL, TAX, IP, HR, COMMERCIAL, or GENERAL. The output is required to be strict JSON. If parsing fails, we fall back to GENERAL. That fallback is important: it prevents a brittle router from breaking the whole system, and it is safer than guessing. The router then sets `restrict_types` based on the domain (e.g., LEGAL focuses on SPA/LOI). This mimics how real teams assign ownership and narrow the search space.
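
The fallback rule can be isolated as a small pure function. This is a sketch of the idea rather than the notebook's exact code: `parse_domain` and `ALLOWED_DOMAINS` are illustrative names, and the domain list is the one given above.

```python
import json

ALLOWED_DOMAINS = {"LEGAL", "FINANCIAL", "TAX", "IP", "HR", "COMMERCIAL", "GENERAL"}

def parse_domain(raw: str) -> str:
    """Return a validated domain, or GENERAL when the output cannot be trusted."""
    try:
        obj = json.loads(raw)
        candidate = str(obj.get("domain", "")).strip().upper()
    except (ValueError, AttributeError):
        # ValueError: not valid JSON. AttributeError: valid JSON but not an object.
        return "GENERAL"
    return candidate if candidate in ALLOWED_DOMAINS else "GENERAL"

examples = [
    '{"domain": "tax", "rationale_short": "audit exposure"}',  # valid, lower case
    '{"domain": "ESG"}',                                        # unknown label
    "Sure! The domain is LEGAL.",                               # not JSON
    '["LEGAL"]',                                                # JSON, but not an object
]
for raw in examples:
    print(parse_domain(raw))
```

Only the first example routes to a specific domain; the other three fall back to GENERAL, which leaves retrieval unrestricted instead of guessing.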

The Retrieval node uses the deterministic retrieval function from Cell 3. It retrieves top-K chunks, optionally restricted by doc type, and stores them in state. It also records which chunks were selected in the trace. In a diligence setting, this is equivalent to assembling the evidence pack before you write.

The Answer node is the key governance control. It constructs an “evidence block” where each chunk is labeled and passed to the model. The model is instructed to answer using ONLY that evidence and return strict JSON with `answer`, `citations_used`, and `open_items`. Then we post-process the result: we filter citations so they can only reference chunk IDs we actually retrieved. This prevents hallucinated citations and forces accountability.
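
The run recorded in Cell 8 shows raw model output that begins with a Markdown code fence tagged `json`, followed by the "not valid JSON" fallback, which suggests the fence is what strict parsing rejected. The sketch below shows one possible hardening, offered as an assumption and not as the notebook's implementation: strip an optional fence before parsing, then apply the same citation filter. `strip_code_fence` and `parse_answer` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # built programmatically so this example stays fence-safe

def strip_code_fence(raw: str) -> str:
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (the fence plus an optional language tag).
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
        # Drop a trailing fence if present.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()

def parse_answer(raw: str, retrieved_ids: List[str]) -> Dict[str, Any]:
    try:
        obj = json.loads(strip_code_fence(raw))
        if not isinstance(obj, dict):
            raise ValueError("expected a JSON object")
    except ValueError:
        return {"answer": "", "citations_used": [],
                "open_items": ["Model output was not valid JSON."]}
    cited = obj.get("citations_used", [])
    cited = cited if isinstance(cited, list) else []
    open_items = obj.get("open_items", [])
    open_items = open_items if isinstance(open_items, list) else []
    allowed = set(retrieved_ids)
    return {
        "answer": str(obj.get("answer", "")).strip(),
        # Keep only citations that point at chunks we actually retrieved.
        "citations_used": [c for c in cited if isinstance(c, str) and c in allowed],
        "open_items": open_items,
    }

raw = (
    FENCE + "json\n"
    '{"answer": "Cap is 10% of equity value.", '
    '"citations_used": ["D2::C1", "D9::C9"], "open_items": []}\n'
    + FENCE
)
result = parse_answer(raw, retrieved_ids=["D2::C1", "D3::C1"])
print(result["citations_used"])  # the unretrieved D9::C9 is dropped
```

Whether to tolerate fences at all is a design choice: a stricter system might instead treat any fenced output as a failure and escalate, as the notebook's recorded run does.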

The GapCheck node implements the bounded evidence loop. If citations are empty or open items exist, the system may do one more pass with broadened retrieval scope. This mimics real diligence: first search in the primary domain, then broaden if gaps remain. But we keep it bounded: after the maximum iterations, the system stops with a termination reason. This is what makes the workflow safe, predictable, and committee-explainable.
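
The decision rule can be written as a pure function of four numbers, which makes the bound easy to test. This is a sketch under the rule stated above; `gap_decision` is an illustrative name, and the decision labels mirror the ones used in the notebook's trace.

```python
from typing import Tuple

def gap_decision(citations_n: int, open_items_n: int,
                 loop_iter: int, max_loop_iters: int) -> Tuple[str, str]:
    """Return (decision, termination_reason); the reason is empty on retry."""
    needs_more = citations_n == 0 or open_items_n > 0
    # Retry only while there is budget left for another pass.
    if needs_more and loop_iter < max_loop_iters - 1:
        return ("RETRY_BROADER", "")
    return ("STOP", "EVIDENCE_GAPS_REMAIN" if needs_more else "EVIDENCE_OK")

# With max_loop_iters=2 there is exactly one retry available.
print(gap_decision(citations_n=0, open_items_n=1, loop_iter=0, max_loop_iters=2))
print(gap_decision(citations_n=0, open_items_n=1, loop_iter=1, max_loop_iters=2))
print(gap_decision(citations_n=2, open_items_n=0, loop_iter=0, max_loop_iters=2))
```

The three calls cover the cases that matter: a first-pass gap triggers a retry, a second-pass gap stops with `EVIDENCE_GAPS_REMAIN`, and a clean first pass stops with `EVIDENCE_OK`.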


###5.2.CODE AND IMPLEMENTATION

In [5]:
# CELL 5/10 — Nodes: Intake → Router → Retrieval → Answer → GapCheck (bounded loop) → END

DOMAIN_TO_TYPES = {
    "LEGAL": ["SPA", "LOI", "Regulatory"],
    "FINANCIAL": ["Financials", "LOI"],
    "TAX": ["Tax", "SPA"],
    "IP": ["IP", "SPA"],
    "HR": ["HR", "SPA"],
    "COMMERCIAL": ["Commercial", "Financials"],
    "GENERAL": [],
}

def _append_trace(state: DiligenceState, node: str, event: Dict[str, Any]) -> None:
    t = state.get("trace", [])
    t.append({"ts_utc": _dt.datetime.now(_dt.timezone.utc).isoformat(), "node": node, **event})
    state["trace"] = t

class IntakeNode(AgentNode):
    def __init__(self):
        super().__init__("intake")
    def __call__(self, state: DiligenceState) -> DiligenceState:
        q = (state.get("question") or "").strip()
        if not q:
            state["question"] = "What are the key risks and protections in the draft SPA and LOI?"
        state["loop_iter"] = int(state.get("loop_iter", 0))
        state["needs_more_evidence"] = False
        state["termination_reason"] = ""
        _append_trace(state, self.name, {"question": state["question"]})
        return state

class RouterNode(AgentNode):
    def __init__(self, cfg: RunConfig):
        super().__init__("router")
        self.cfg = cfg

    def __call__(self, state: DiligenceState) -> DiligenceState:
        q = state["question"]

        system = (
            "You are a diligence router. Classify the question into one domain: "
            "LEGAL, FINANCIAL, TAX, IP, HR, COMMERCIAL, or GENERAL.\n"
            "Return strict JSON with keys: domain, rationale_short (<=20 words)."
        )
        user = f"Question:\n{q}\n\nReturn JSON only."
        raw = llm_call(system, user, cfg=self.cfg)

        domain = "GENERAL"
        try:
            obj = json.loads(raw)
            d = str(obj.get("domain", "")).strip().upper()
            if d in DOMAIN_TO_TYPES:
                domain = d
        except Exception:
            domain = "GENERAL"

        state["domain"] = domain  # type: ignore
        restrict = DOMAIN_TO_TYPES.get(domain, [])
        state["restrict_types"] = restrict
        _append_trace(state, self.name, {"raw": raw[:220], "domain": domain, "restrict_types": restrict})
        return state

class RetrievalNode(AgentNode):
    def __init__(self, cfg: RunConfig):
        super().__init__("retrieval")
        self.cfg = cfg

    def __call__(self, state: DiligenceState) -> DiligenceState:
        q = state["question"]
        restrict = state.get("restrict_types") or None
        hits = retrieve(q, top_k=self.cfg.top_k_retrieval, restrict_types=restrict)
        state["retrieved"] = hits

        _append_trace(state, self.name, {
            "num_hits": len(hits),
            "top_chunks": [h["chunk_id"] for h in hits[:4]],
        })
        return state

class AnswerNode(AgentNode):
    def __init__(self, cfg: RunConfig):
        super().__init__("answer")
        self.cfg = cfg

    def __call__(self, state: DiligenceState) -> DiligenceState:
        q = state["question"]
        hits = state.get("retrieved", [])
        domain = state.get("domain", "GENERAL")

        evidence_lines = []
        citations = []
        for h in hits:
            evidence_lines.append(
                f"[{h['chunk_id']}] ({h['doc_type']}) {h['title']}: {h['chunk']}"
            )
            citations.append({
                "chunk_id": h["chunk_id"],
                "doc_id": h["doc_id"],
                "title": h["title"],
                "doc_type": h["doc_type"],
            })

        system = (
            "You are an M&A diligence analyst. Answer using ONLY the provided evidence. "
            "If evidence is insufficient, list open items explicitly.\n"
            "Output must be STRICT JSON with keys:\n"
            "answer (string), citations_used (array of chunk_ids), open_items (array of strings).\n"
            "No extra keys. No markdown."
        )
        user = (
            f"Domain: {domain}\n"
            f"Question: {q}\n\n"
            f"EVIDENCE BLOCKS:\n" + "\n\n".join(evidence_lines) + "\n\n"
            "Return JSON only."
        )

        raw = llm_call(system, user, cfg=self.cfg)

        answer = ""
        used_ids: List[str] = []
        open_items: List[str] = []

        try:
            obj = json.loads(raw)
            answer = str(obj.get("answer", "")).strip()
            used_ids = list(obj.get("citations_used", [])) if isinstance(obj.get("citations_used", []), list) else []
            open_items = list(obj.get("open_items", [])) if isinstance(obj.get("open_items", []), list) else []
        except Exception:
            answer = "Unable to produce a structured answer. Escalate to human review."
            used_ids = []
            open_items = ["Model output was not valid JSON. Re-run or escalate."]

        # Keep only citations that were retrieved
        retrieved_ids = {c["chunk_id"] for c in citations}
        used_ids = [cid for cid in used_ids if cid in retrieved_ids]

        state["answer"] = answer
        state["citations"] = [c for c in citations if c["chunk_id"] in used_ids]
        state["open_items"] = [str(x).strip() for x in open_items if str(x).strip()]

        _append_trace(state, self.name, {
            "raw": raw[:220],
            "citations_used": used_ids,
            "open_items_n": len(state["open_items"]),
        })
        return state

class GapCheckNode(AgentNode):
    def __init__(self, cfg: RunConfig):
        super().__init__("gap_check")
        self.cfg = cfg

    def __call__(self, state: DiligenceState) -> DiligenceState:
        """
        Bounded control loop:
          - if open_items exist OR citations are empty => attempt one more retrieval pass (broaden scope)
          - else END
        """
        it = int(state.get("loop_iter", 0))
        open_items = state.get("open_items", [])
        cites = state.get("citations", [])

        needs = (len(cites) == 0) or (len(open_items) > 0)
        state["needs_more_evidence"] = bool(needs)

        # If we need more evidence and have remaining iterations, broaden retrieval scope
        if needs and it < (self.cfg.max_loop_iters - 1):
            state["loop_iter"] = it + 1
            # broaden to GENERAL (no restriction) for second pass
            state["domain"] = "GENERAL"  # type: ignore
            state["restrict_types"] = []
            _append_trace(state, self.name, {
                "decision": "RETRY_BROADER",
                "loop_iter": state["loop_iter"],
                "reason": {"citations_n": len(cites), "open_items_n": len(open_items)},
            })
        else:
            state["termination_reason"] = "EVIDENCE_OK" if not needs else "EVIDENCE_GAPS_REMAIN"
            _append_trace(state, self.name, {
                "decision": "STOP",
                "loop_iter": it,
                "termination_reason": state["termination_reason"],
                "reason": {"citations_n": len(cites), "open_items_n": len(open_items)},
            })
        return state

nodes = {
    "intake": IntakeNode(),
    "router": RouterNode(CFG),
    "retrieval": RetrievalNode(CFG),
    "answer": AnswerNode(CFG),
    "gap_check": GapCheckNode(CFG),
}
print("Nodes ready:", list(nodes.keys()))


Nodes ready: ['intake', 'router', 'retrieval', 'answer', 'gap_check']


##6.GRAPH

###6.1.OVERVIEW

**Cell 6 — Building the LangGraph topology: conditional routing and bounded loops**

Cell 6 is where we convert the concept into an explicit LangGraph. This matters because it removes ambiguity. Instead of “trust me, the code does X,” we show a topology: these nodes exist, these edges exist, and these are the only allowed transitions. That is a governance win and a teaching win.

We start by creating a `StateGraph` using the `DiligenceState` schema defined earlier. This binds the graph to a known state shape. Then we add the five nodes: intake, router, retrieval, answer, and gap_check. The entry point is intake. This ensures every run initializes state properly before routing decisions are made.

Next, we connect direct edges in the main path: intake → router → retrieval → answer → gap_check. That is the “normal diligence path”: categorize the question, pull evidence, draft an evidence-grounded answer, then check sufficiency.

The key design is the conditional edge leaving gap_check. In LangGraph, conditional routing is explicit: we provide a function that inspects the state and returns the next node name (or END). Here, the routing logic is intentionally conservative and deterministic. The gap_check node itself decides whether a retry is warranted and increments `loop_iter` only when it chooses to broaden retrieval scope. The conditional routing function then checks the trace for that decision and routes back to retrieval if and only if the retry decision occurred. Otherwise, it routes to END.

This is important for bounded loops. Many “agentic” demos accidentally create loops that can spin unpredictably, or they rely on text heuristics that are hard to test. Here the loop is bounded by config (`max_loop_iters`) and enforced structurally in the state machine. The system cannot “decide” to loop infinitely. It must follow the rule: at most N passes, with a documented reason each time.
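
The bound can be checked without LangGraph by simulating the worst case, where evidence is never sufficient. The sketch below is a plain-Python approximation of the loop, not the compiled graph: `END` is a local stand-in for the sentinel the notebook imports, and `gap_check_worst_case` and `count_passes` are hypothetical helpers.

```python
from typing import Any, Dict

END = "__end__"  # local stand-in; the notebook uses the END sentinel from LangGraph

def gap_check_worst_case(state: Dict[str, Any], max_loop_iters: int) -> Dict[str, Any]:
    # Worst case: gaps always remain, so the only brake is the iteration budget.
    if state["loop_iter"] < max_loop_iters - 1:
        state["loop_iter"] += 1
        state["trace"].append({"node": "gap_check", "decision": "RETRY_BROADER"})
    else:
        state["trace"].append({"node": "gap_check", "decision": "STOP"})
    return state

def route_after_gap_check(state: Dict[str, Any]) -> str:
    # Route on the decision recorded in the most recent trace entry.
    last = state["trace"][-1] if state["trace"] else {}
    if last.get("node") == "gap_check" and last.get("decision") == "RETRY_BROADER":
        return "retrieval"
    return END

def count_passes(max_loop_iters: int) -> int:
    state: Dict[str, Any] = {"loop_iter": 0, "trace": []}
    passes = 0
    next_node = "retrieval"
    while next_node == "retrieval":
        passes += 1  # one retrieval + answer pass
        state = gap_check_worst_case(state, max_loop_iters)
        next_node = route_after_gap_check(state)
    return passes

for n in (1, 2, 3):
    print(f"max_loop_iters={n} -> passes={count_passes(n)}")
```

Under these assumptions the number of retrieval passes equals `max_loop_iters` in the worst case, so the loop cannot run longer than the configuration allows.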

Once the topology is declared, we compile the graph. Compilation is a useful control point: it freezes the topology into an executable object and allows us to render the exact graph later. This cell gives you the committee-friendly statement: **“The workflow is a declared graph with explicit conditional routing and bounded loops; it is not an ad-hoc script.”**


###6.2.CODE AND IMPLEMENTATION

In [6]:
# CELL 6/10 — LangGraph topology: router + retrieval with bounded retry loop + explicit END

graph = StateGraph(DiligenceState)

graph.add_node("intake", nodes["intake"])
graph.add_node("router", nodes["router"])
graph.add_node("retrieval", nodes["retrieval"])
graph.add_node("answer", nodes["answer"])
graph.add_node("gap_check", nodes["gap_check"])

graph.set_entry_point("intake")

graph.add_edge("intake", "router")
graph.add_edge("router", "retrieval")
graph.add_edge("retrieval", "answer")
graph.add_edge("answer", "gap_check")

def route_after_gap_check(state: DiligenceState) -> str:
    it = int(state.get("loop_iter", 0))
    needs = bool(state.get("needs_more_evidence", False))
    # Retry path goes back to retrieval (with broadened scope already set in state)
    if needs and it < CFG.max_loop_iters:
        # gap_check records its decision in the trace. Route back to retrieval only when
        # the most recent trace entry is a gap_check RETRY_BROADER decision; a STOP
        # decision (or any other last entry) falls through to END.
        # The loop stays bounded because gap_check emits RETRY_BROADER only while
        # loop_iter < max_loop_iters - 1.
        last_trace = state.get("trace", [])[-1] if state.get("trace") else {}
        if last_trace.get("node") == "gap_check" and last_trace.get("decision") == "RETRY_BROADER":
            return "retrieval"
    return END

graph.add_conditional_edges("gap_check", route_after_gap_check, {"retrieval": "retrieval", END: END})

compiled = graph.compile()

print("Compiled LangGraph. Max loop iters:", CFG.max_loop_iters)


Compiled LangGraph. Max loop iters: 2


##7.VISUALIZATION

###7.1.OVERVIEW

**Cell 7 — Rendering the workflow: turning topology into a committee-ready diagram**

Cell 7 is short in code but central in meaning. It renders the LangGraph topology using the hardened Mermaid renderer created in Cell 2. In governance-first systems, a diagram is not decoration; it is evidence. It is a compact, readable representation of the system’s permissible behavior.

The reason we render after compilation is important: we want the diagram to reflect the exact executable graph, not a hand-drawn approximation. By calling `display_langgraph_mermaid(compiled)`, we ask LangGraph for its Mermaid representation and render it locally. This ensures the diagram matches the true topology: nodes, edges, and conditional transitions.

For a committee presentation, this diagram is the fastest way to communicate the architecture in simple terms. You can point to the diagram and explain the flow:

- Intake: ensure a valid question and initialize state.
- Router: classify the question domain.
- Retrieval: pull top evidence chunks.
- Answer: synthesize using evidence and produce citations + open items.
- Gap check: decide whether evidence is sufficient or whether to broaden and retry.
- END: terminate with a clear reason.

The diagram also makes bounded looping obvious: the only loop is from gap_check back to retrieval, and it is controlled by state and configuration. That is a strong governance signal: the system is not “free running,” it is executing a controlled process.

This cell therefore converts a technical implementation into a human artifact. Committees often struggle with AI because they cannot “see” the system. Here, they can. The diagram is concrete. It shows that we have turned diligence into an explicit procedure, with fixed stages and explicit exit conditions. It is also a didactic tool: once you understand this diagram, you understand the entire notebook. That is why visualization is mandatory in this project standard.


###7.2.CODE AND IMPLEMENTATION

In [7]:
# CELL 7/10 — Mandatory visualization: Mermaid diagram must match topology exactly

display_langgraph_mermaid(compiled)


##8.EXECUTION

###8.1.OVERVIEW

**Cell 8 — Running the diligence query and producing a readable result plus trace**

Cell 8 executes the full workflow on an example diligence question and prints results in a way that is meaningful for practitioners. This is where the committee sees the system “do diligence,” but the most important detail is how we package the output: not just a paragraph, but a structured result that mirrors a diligence memo.

We start by defining `format_answer(state)`. This function is deliberately simple and deterministic: it prints the final domain, the termination reason, the answer text, a list of citations, and the open items. This is the presentation layer. In real deployment, you might render to a memo template or push to a deal workflow system, but the idea remains: outputs should be readable and structured for review.

Next, we define a realistic question. The example asks for buyer protections, seller liabilities, and escalation gaps. That is exactly how diligence questions are phrased in practice: you want protections, exposures, and what still needs work.

We then create an initial state with the question, an empty trace, and loop_iter = 0. This is a key discipline: **we do not rely on hidden memory.** Everything the system needs is in the state.

When we call `compiled.invoke(initial_state)`, LangGraph executes the nodes in order, updating state at each step. The system routes, retrieves, answers, and gap-checks. If gaps remain and the bounded loop allows it, it broadens scope and retries once. This is the controlled diligence loop you want to demonstrate.

Finally, we print the formatted answer and the last few trace rows. The trace is extremely important in a committee context because it shows the system’s decision process without exposing private chain-of-thought. You can see which node ran, what it decided (e.g., domain classification, which chunks were retrieved), and why it stopped. The trace is your operational accountability: if a stakeholder asks “Why did it pick those documents?” you can point to the retrieval trace. If they ask “Why did it broaden scope?” you can point to the gap_check decision.
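
Answering those stakeholder questions amounts to filtering the trace by node. The sketch below uses three rows abbreviated from the run recorded in this section (timestamps and raw text omitted); `rows_for` and `explain_decisions` are hypothetical helper names.

```python
from typing import Any, Dict, List

# Rows abbreviated from the recorded run's trace.
trace: List[Dict[str, Any]] = [
    {"node": "retrieval", "num_hits": 6,
     "top_chunks": ["D3::C2", "D4::C1", "D5::C2", "D1::C1"]},
    {"node": "answer", "citations_used": [], "open_items_n": 1},
    {"node": "gap_check", "decision": "RETRY_BROADER", "loop_iter": 1,
     "reason": {"citations_n": 0, "open_items_n": 1}},
]

def rows_for(trace: List[Dict[str, Any]], node: str) -> List[Dict[str, Any]]:
    return [row for row in trace if row.get("node") == node]

def explain_decisions(trace: List[Dict[str, Any]]) -> List[str]:
    lines = []
    for row in rows_for(trace, "gap_check"):
        reason = row.get("reason", {})
        lines.append(
            f"{row['decision']} at loop_iter={row['loop_iter']} "
            f"(citations_n={reason.get('citations_n')}, "
            f"open_items_n={reason.get('open_items_n')})"
        )
    return lines

# "Why did it pick those documents?"
print(rows_for(trace, "retrieval")[0]["top_chunks"])
# "Why did it broaden scope?"
print(explain_decisions(trace))
```

The second print yields `RETRY_BROADER at loop_iter=1 (citations_n=0, open_items_n=1)`: the retry happened because the first pass produced no usable citations and one open item.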

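In short, Cell 8 is the demonstration cell: it runs the governed workflow and produces a committee-readable diligence output with evidence and a trace. Note that in the run recorded below, the Answer node could not parse the model output as JSON, so the output shows the fallback message, no citations, and `EVIDENCE_GAPS_REMAIN`; the trace is what makes that failure visible.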


###8.2.CODE AND IMPLEMENTATION

In [8]:
# CELL 8/10 — Execute: diligence Q&A (router → retrieval → answer; optional bounded retry) + readable output

def format_answer(state: DiligenceState) -> str:
    lines = []
    lines.append(f"DOMAIN: {state.get('domain','GENERAL')}")
    lines.append(f"TERMINATION: {state.get('termination_reason','')}")
    lines.append("")
    lines.append("ANSWER:")
    lines.append(state.get("answer","").strip() or "(empty)")
    lines.append("")
    lines.append("CITATIONS:")
    for c in state.get("citations", []):
        lines.append(f"- {c['chunk_id']} | {c['doc_type']} | {c['title']}")
    if not state.get("citations"):
        lines.append("- (none)")
    lines.append("")
    lines.append("OPEN ITEMS:")
    for oi in state.get("open_items", []):
        lines.append(f"- {oi}")
    if not state.get("open_items"):
        lines.append("- (none)")
    return "\n".join(lines)

# Example diligence question (edit as needed)
QUESTION = (
    "Summarize the key buyer protections and seller liabilities in the SPA, "
    "and flag any material diligence gaps we should escalate."
)

initial_state: DiligenceState = {"question": QUESTION, "trace": [], "loop_iter": 0}

final_state = compiled.invoke(initial_state)

print(format_answer(final_state))
print("\nTRACE (last 6):")
for row in final_state.get("trace", [])[-6:]:
    print(row)


DOMAIN: GENERAL
TERMINATION: EVIDENCE_GAPS_REMAIN

ANSWER:
Unable to produce a structured answer. Escalate to human review.

CITATIONS:
- (none)

OPEN ITEMS:
- Model output was not valid JSON. Re-run or escalate.

TRACE (last 6):
{'ts_utc': '2026-02-19T12:10:13.982607+00:00', 'node': 'retrieval', 'num_hits': 6, 'top_chunks': ['D3::C2', 'D4::C1', 'D5::C2', 'D1::C1']}
{'ts_utc': '2026-02-19T12:10:21.691984+00:00', 'node': 'answer', 'raw': '```json\n{\n  "answer": "KEY BUYER PROTECTIONS: (1) General indemnity cap of 10% of equity value ($42m) with $2m basket/tipping threshold and 18-month survival period provides limited recourse for breaches. (2) Special ind', 'citations_used': [], 'open_items_n': 1}
{'ts_utc': '2026-02-19T12:10:21.692515+00:00', 'node': 'gap_check', 'decision': 'RETRY_BROADER', 'loop_iter': 1, 'reason': {'citations_n': 0, 'open_items_n': 1}}
{'ts_utc': '2026-02-19T12:10:21.693149+00:00', 'node': 'retrieval', 'num_hits': 6, 'top_chunks': ['D3::C2', 'D4::C1', 'D5::C2', 'D

##9.ARTIFACTS

###9.1.OVERVIEW

**Cell 9 — Exporting required artifacts: run manifest, graph spec, and final state**

Cell 9 is where we enforce professional discipline: every run must produce artifacts that can be inspected later. This is not an academic nicety. In finance, if you cannot reproduce and audit a result, you cannot rely on it in decision-making. This cell therefore exports the three required JSON files: `run_manifest.json`, `graph_spec.json`, and `final_state.json`.

The `graph_spec.json` is a machine-readable description of the workflow topology. It lists nodes, edges, loop bounds, retrieval method, document types, and the visualization setup. Even if someone never opens the notebook, they can read this JSON and understand what the system is. Importantly, it is not a narrative; it is a specification. That makes it usable for governance reviews and for comparison across notebook versions.

The `run_manifest.json` is the run-level audit record. It includes the run ID, timestamp, objective, configuration, configuration hash, question hash, library versions, controls, and a summary outcome (termination reason, iterations, citation count, open item count). Hashes matter because they let you prove that a given output corresponds to a specific configuration. If someone changes the top-K retrieval or loop bounds and reruns, the manifest will reflect that change and the hash will differ.
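
The effect of the configuration hash can be shown in isolation. The notebook's `_sha256` and `_json_dumps` helpers are defined in an earlier cell that is not reproduced here, so `sha256_hex` and `canonical_json` below are stand-ins written for this sketch, and every configuration value is a placeholder.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_json(obj: Any) -> str:
    """Serialize with sorted keys and fixed separators so equal configs hash equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def config_hash(cfg: Dict[str, Any]) -> str:
    return sha256_hex(canonical_json(cfg))


# Placeholder values for illustration only.
base_cfg = {
    "model": "example-model",
    "max_tokens": 1024,
    "temperature": 0.0,
    "max_loop_iters": 2,
    "top_k_retrieval": 6,
}

# Same content with a different key order: the hash should not change.
reordered_cfg = dict(reversed(list(base_cfg.items())))

# One retrieval parameter changed: the hash should change.
changed_cfg = dict(base_cfg, top_k_retrieval=8)

same = config_hash(base_cfg) == config_hash(reordered_cfg)
differs = config_hash(base_cfg) != config_hash(changed_cfg)
```

Sorting keys before hashing is what makes the hash a property of the configuration's content rather than of dictionary insertion order.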

The `final_state.json` is the full end-of-run state. This is the richest artifact: it contains the question, the domain, retrieved evidence metadata, the answer, citations, open items, termination reason, and the trace. This is the closest analogue to a diligence workpaper: it contains both conclusions and the structured record of how those conclusions were produced.

A subtle but important point: we write artifacts to a dedicated output directory created in Cell 1. This prevents clutter and makes it easy to package or archive. In a real setting, you would send these outputs to a controlled storage location with access controls and retention rules, but the concept is the same.

When presenting to a committee, Cell 9 supports a simple message: **“This system does not just produce text. It produces auditable artifacts that document what happened, how it happened, and what configuration produced it.”**


###9.2.CODE AND IMPLEMENTATION

In [9]:
# CELL 9/10 — Required artifacts: run_manifest.json, graph_spec.json, final_state.json (auditable, deterministic)

def build_graph_spec() -> Dict[str, Any]:
    # Minimal, topology-faithful spec
    edges = [
        {"from": "intake", "to": "router", "type": "direct"},
        {"from": "router", "to": "retrieval", "type": "direct"},
        {"from": "retrieval", "to": "answer", "type": "direct"},
        {"from": "answer", "to": "gap_check", "type": "direct"},
        {"from": "gap_check", "to": "retrieval", "type": "conditional", "label": "RETRY_BROADER"},
        {"from": "gap_check", "to": "END", "type": "conditional", "label": "STOP"},
    ]
    return {
        "notebook": "AA-FIN-LG-2026 — N8 M&A diligence Q&A (router + retrieval)",
        "run_id": RUN_ID,
        "ts_utc": TS_UTC,
        "model": CFG.model,
        "nodes": list(nodes.keys()) + ["END"],
        "edges": edges,
        "loop_bound": CFG.max_loop_iters,
        "retrieval": {
            "method": "token-overlap",
            "top_k": CFG.top_k_retrieval,
            "doc_types": sorted({d.doc_type for d in DOCS}),
            "num_docs": len(DOCS),
            "num_chunks": len(CHUNKS),
        },
        "visualization": {"mermaid_version": MERMAID_VERSION, "renderer": "colab_esm_local"},
    }

def build_run_manifest(final_state: DiligenceState) -> Dict[str, Any]:
    cfg_obj = {
        "model": CFG.model,
        "max_tokens": CFG.max_tokens,
        "temperature": CFG.temperature,
        "max_loop_iters": CFG.max_loop_iters,
        "top_k_retrieval": CFG.top_k_retrieval,
    }
    cfg_hash = _sha256(_json_dumps(cfg_obj))
    q_hash = _sha256((final_state.get("question","") or "").strip())
    return {
        "run_id": RUN_ID,
        "ts_utc": TS_UTC,
        "objective": "M&A diligence Q&A over documents with router + retrieval and bounded evidence loop",
        "config": cfg_obj,
        "config_sha256": cfg_hash,
        "question_sha256": q_hash,
        "versions": VERSIONS,
        "artifacts": {
            "run_manifest_json": "run_manifest.json",
            "graph_spec_json": "graph_spec.json",
            "final_state_json": "final_state.json",
        },
        "controls": {
            "deterministic_seed": 8,
            "state_driven_routing": True,
            "bounded_loop": True,
            "explicit_end_node": True,
            "evidence_only_policy": True,
            "no_hidden_memory": True,
        },
        "outcome": {
            "termination_reason": final_state.get("termination_reason",""),
            "loop_iters_executed": int(final_state.get("loop_iter", 0)) + 1,
            "citations_n": len(final_state.get("citations", [])),
            "open_items_n": len(final_state.get("open_items", [])),
        },
    }

graph_spec = build_graph_spec()
run_manifest = build_run_manifest(final_state)

final_state_path = os.path.join(OUT_DIR, "final_state.json")
graph_spec_path = os.path.join(OUT_DIR, "graph_spec.json")
run_manifest_path = os.path.join(OUT_DIR, "run_manifest.json")

with open(final_state_path, "w", encoding="utf-8") as f:
    f.write(_json_dumps(final_state))

with open(graph_spec_path, "w", encoding="utf-8") as f:
    f.write(_json_dumps(graph_spec))

with open(run_manifest_path, "w", encoding="utf-8") as f:
    f.write(_json_dumps(run_manifest))

print("Wrote artifacts:")
print("-", run_manifest_path)
print("-", graph_spec_path)
print("-", final_state_path)


Wrote artifacts:
- /content/outputs_notebook_8/run_manifest.json
- /content/outputs_notebook_8/graph_spec.json
- /content/outputs_notebook_8/final_state.json


##10.AUDIT BUNDLE

###10.1.OVERVIEW

**Cell 10 — Determinism and stability check: why it matters and what it tells us**

Cell 10 performs a determinism check by rerunning the workflow with the same question and configuration and comparing stable fingerprints of the final states. This is not about claiming perfect determinism in all conditions. It is about demonstrating that the workflow is stable enough to be reviewed and that changes are detectable.

In practice, language models can introduce variability. We reduce that variability by setting temperature to 0.0 and by using a bounded, state-driven graph. But we still need a control that tells us: “Did the system behave consistently?” The fingerprinting function provides that control. It hashes the final state after removing timestamps from the trace (timestamps would naturally differ). By excluding timestamps, we focus the check on decisions and outputs: domain classification, retrieved chunks, citations, open items, and answer structure.

If the fingerprints match, we have evidence that the workflow is behaving consistently under repeated runs in the same environment. That is valuable for a committee because it increases confidence in the system as a tool for standardized first-pass diligence. If the fingerprints do not match, as in the run recorded in this notebook (`MATCH: False`), that is also useful: it tells us that stronger controls are needed (for example, stricter router parsing, deterministic retrieval ordering, or tighter answer constraints). Note that the fingerprint covers the whole trace, including the raw model text logged by the answer node, so even small wording differences between runs change the hash.
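
A mismatch is more actionable when it can be localized. The sketch below compares two final states key by key after dropping `ts_utc` from trace rows, mirroring what the fingerprint does, and reports the first trace row that differs. The function names and the two sample states are hypothetical; only the `trace` and `ts_utc` field names come from the notebook.

```python
import json
from typing import Any, Dict, List, Optional


def _canon(obj: Any) -> str:
    # Stable serialization so comparisons ignore dictionary key order.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, default=str)


def without_timestamps(state: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the state with `ts_utc` removed from every trace row."""
    out = dict(state)
    out["trace"] = [
        {k: v for k, v in row.items() if k != "ts_utc"}
        for row in state.get("trace", [])
    ]
    return out


def diverging_keys(a: Dict[str, Any], b: Dict[str, Any]) -> List[str]:
    """Top-level keys whose values differ between two runs."""
    a2, b2 = without_timestamps(a), without_timestamps(b)
    keys = set(a2) | set(b2)
    return sorted(
        k for k in keys
        if (k in a2) != (k in b2) or _canon(a2.get(k)) != _canon(b2.get(k))
    )


def first_trace_divergence(a: Dict[str, Any], b: Dict[str, Any]) -> Optional[int]:
    """Index of the first differing trace row, or None if the traces agree."""
    ta, tb = without_timestamps(a)["trace"], without_timestamps(b)["trace"]
    for i, (row_a, row_b) in enumerate(zip(ta, tb)):
        if _canon(row_a) != _canon(row_b):
            return i
    return None if len(ta) == len(tb) else min(len(ta), len(tb))


# Hypothetical states: identical decisions, different raw model wording.
run_a = {
    "question": "Q",
    "citations": ["D3::C2"],
    "trace": [
        {"ts_utc": "t1", "node": "router", "domain": "LEGAL"},
        {"ts_utc": "t2", "node": "answer", "raw": "text A"},
    ],
}
run_b = {
    "question": "Q",
    "citations": ["D3::C2"],
    "trace": [
        {"ts_utc": "t3", "node": "router", "domain": "LEGAL"},
        {"ts_utc": "t4", "node": "answer", "raw": "text B"},
    ],
}

changed = diverging_keys(run_a, run_b)
first_row = first_trace_divergence(run_a, run_b)
```

In this example the only difference is the raw text logged by the answer node, which suggests one design option: fingerprint decisions (domain, retrieved chunks, citations, termination reason) separately from free text, so that wording drift and routing drift are reported as different signals.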

This cell also prints artifact file sizes. That is a simple sanity check: the expected files exist and are non-empty. In production, you might validate schemas, sign artifacts, or store them in an immutable bucket, but in a notebook demo the size check is a lightweight confirmation.
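
As a step between a size check and full schema validation, one could confirm that each artifact parses as a JSON object and carries a minimum set of top-level keys. The key lists below are assumptions drawn from the fields this chapter describes, not an official schema, and `missing_keys` and `check_artifacts` are hypothetical helper names.

```python
import json
import os
from typing import Dict, List

# Assumed minimum key sets per artifact; adjust to the real schema.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "run_manifest.json": [
        "run_id", "ts_utc", "config", "config_sha256", "question_sha256", "outcome",
    ],
    "graph_spec.json": ["nodes", "edges", "loop_bound", "retrieval"],
    "final_state.json": ["question", "trace", "termination_reason"],
}


def missing_keys(path: str, required: List[str]) -> List[str]:
    """Return the required top-level keys absent from the JSON object at `path`.

    Raises ValueError if the file is not valid JSON or is not a JSON object,
    and OSError if the file cannot be read.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return [key for key in required if key not in data]


def check_artifacts(out_dir: str) -> Dict[str, List[str]]:
    """Map each expected artifact file name to its list of missing keys."""
    return {
        name: missing_keys(os.path.join(out_dir, name), keys)
        for name, keys in REQUIRED_KEYS.items()
    }
```

Calling `check_artifacts(OUT_DIR)` after Cell 9 would return a dictionary of empty lists when every artifact has the expected keys.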

Conceptually, Cell 10 reinforces the governance-first mindset: we do not just “run AI and hope.” We test the behavior, we measure consistency, and we create signals that can be monitored. Over time, this becomes part of model risk management: you can track drift in outputs, compare runs across versions, and detect changes in routing or retrieval behavior.

For your committee, Cell 10 gives you a concrete statement: **“We built this as a controlled system, and we can test and report its stability and reproducibility characteristics, at least within this bounded notebook setting.”**


###10.2.CODE AND IMPLEMENTATION

In [10]:
# CELL 10/10 — Determinism check (same input, same config): compare final_state hashes + show artifact sizes

def state_fingerprint(s: DiligenceState) -> str:
    # Exclude timestamps to make determinism test meaningful
    s2 = dict(s)
    trace = []
    for row in s2.get("trace", []):
        row2 = dict(row)
        row2.pop("ts_utc", None)
        trace.append(row2)
    s2["trace"] = trace
    return _sha256(_json_dumps(s2))

# Run again with the same question
state2: DiligenceState = {"question": QUESTION, "trace": [], "loop_iter": 0}
final_state2 = compiled.invoke(state2)

fp1 = state_fingerprint(final_state)
fp2 = state_fingerprint(final_state2)

print("Determinism fingerprints:")
print("fp1:", fp1)
print("fp2:", fp2)
print("MATCH:", fp1 == fp2)

def _filesize(path: str) -> int:
    # Return the file size in bytes, or -1 if the file is missing or unreadable.
    try:
        return os.path.getsize(path)
    except OSError:
        return -1

print("\nArtifact sizes (bytes):")
for p in [run_manifest_path, graph_spec_path, final_state_path]:
    print(os.path.basename(p), _filesize(p))

print("\nDone. Notebook 8 outputs are in:", OUT_DIR)


Determinism fingerprints:
fp1: 80e6a57b726f2918345894b2986e2abae798e49c34010fc3c82ceb70704566bd
fp2: 8372c3fd25f225adf3b2de5311164d0c44cd00c6af03918946198ef3b6c2a440
MATCH: False

Artifact sizes (bytes):
run_manifest.json 1272
graph_spec.json 1262
final_state.json 5384

Done. Notebook 8 outputs are in: /content/outputs_notebook_8


##11.CONCLUSION

**Conclusion — What we built, what it proves, and how to improve it**

This notebook demonstrates a specific, committee-relevant claim: **AI can support M&A due diligence when it is embedded inside a governed workflow.** The model here is not “a chatbot that answers questions.” It is a **state-driven diligence machine** with explicit steps: route the question to a diligence domain, retrieve the most relevant evidence from the document set, draft an answer constrained to that evidence, and then run a bounded gap check that either stops with “evidence OK” or escalates remaining uncertainties as open items. The core output is not just the prose answer; it is the combination of **answer + citations + open items + trace + exported artifacts**. That combination is what makes it usable in a professional environment, because it turns AI output into something that can be reviewed, challenged, and refined like any other diligence work product.

The most important contribution of Notebook 8 is the architectural dimension added: **router + retrieval**. This is the bridge between “language model” and “deal work.” In real diligence, analysts do not start by drafting; they start by asking, “Where in the data room does this live?” The router node formalizes that first decision, and the retrieval node formalizes the second decision: “Which specific clauses or sections are relevant?” Only then do we allow synthesis. The result is a model that behaves more like an analyst team: it does not pretend to know everything; it assembles an evidence pack and then produces a structured memo with explicit gaps.

There are clear, concrete ways to improve this model while preserving its governance-first design. **First**, upgrade retrieval. The current token-overlap scoring is intentionally simple and auditable, but production diligence benefits from hybrid retrieval: embeddings + keyword filters + metadata constraints (document version, confidentiality tier, deal phase). This can be added without changing the graph topology: the RetrievalNode becomes a pluggable component with a logged scoring explanation. **Second**, add document governance. In real deals, documents have versions, redlines, and superseded drafts. A “Document Registry” node can enforce “latest version” selection, detect conflicts across versions, and attach document hashes for audit. **Third**, add a “Risk Register” node that converts answers and open items into a structured risk log: risk statement, severity, likelihood proxy, owner, mitigation (e.g., escrow, special indemnity, covenant), and linkage to SPA clauses. **Fourth**, add human gates. For committee-grade readiness, introduce an explicit approval node: “Analyst review required” or “Counsel review required” depending on domain and confidence, with a hard END if approval is not granted. **Fifth**, extend the bounded loop from “broaden retrieval” into “targeted follow-ups.” Instead of a generic second pass, the model can generate precise evidence requests: which exhibit, which schedule, which management question—then re-run once those items are provided. That transforms open items into a controlled action workflow.
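
To make the first improvement concrete, the sketch below shows what a pluggable retriever with a logged scoring explanation might look like. It is not the notebook's retrieval code, which lives in an earlier cell. The `Hit` and `Retriever` types, the tokenizer, the tie-break rule, and the sample chunk texts are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float
    explanation: Dict[str, object]  # logged so a reviewer can see why the chunk ranked


class Retriever(Protocol):
    """Any retriever with this method could be swapped in without touching the graph."""

    def retrieve(self, query: str, k: int) -> List[Hit]: ...


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TokenOverlapRetriever:
    """Scores each chunk by the number of distinct tokens it shares with the query."""

    def __init__(self, chunks: Dict[str, str]) -> None:
        self._chunks = chunks

    def retrieve(self, query: str, k: int) -> List[Hit]:
        query_tokens = _tokens(query)
        hits = []
        for chunk_id, text in self._chunks.items():
            shared = sorted(query_tokens & _tokens(text))
            hits.append(
                Hit(
                    chunk_id=chunk_id,
                    score=float(len(shared)),
                    explanation={"method": "token-overlap", "shared_tokens": shared},
                )
            )
        # Break score ties by chunk ID so repeated runs return the same order.
        hits.sort(key=lambda h: (-h.score, h.chunk_id))
        return hits[:k]


# Illustrative chunk texts, not the notebook's document set.
sample_chunks = {
    "D3::C2": "Indemnity cap is 10% of equity value",
    "D4::C1": "Working capital adjustment schedule",
    "D1::C1": "The indemnity basket is a tipping threshold",
}
retriever: Retriever = TokenOverlapRetriever(sample_chunks)
top_hits = retriever.retrieve("indemnity cap and basket", k=2)
```

An embedding-based or hybrid retriever could implement the same `retrieve` method and populate `explanation` with its own scoring details, leaving the graph topology unchanged.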

Compared with previous notebooks in this series, the logic and functionality here shift materially. Notebook 1 (personal finance triage) introduced a conditional retry loop for missing information, but the loop was about eliciting inputs. Notebook 2 (suitability boundary) introduced hard branching and early termination—primarily a safety gate for what the system is allowed to do. Notebook 3 (credit memo) introduced a critique loop where a draft is improved through structured feedback. Notebook 4 (trading hypothesis + backtest wrapper) added tool augmentation—using the graph to wrap analysis steps around external computation. Notebook 5 introduced a stateful regime machine for execution tactics, and Notebook 6 added a parallel committee for portfolio decisions, with aggregation. Notebook 7 introduced hub-and-spoke generation for pitchbook sections.

Notebook 8 is different because the “hard part” is not generating content; it is **finding and proving.** The new capability is the explicit separation of responsibilities: **routing decides what to look at, retrieval decides what evidence is relevant, and drafting is constrained to what was found.** The loop is not a creative refinement loop; it is an evidence sufficiency loop. This is the first notebook where the system’s primary output is best understood as a diligence artifact: **a traceable answer bound to a data-room evidence pack** plus an escalation list. In that sense, Notebook 8 is the clearest step toward institutional adoption: it mirrors how real diligence is controlled—domain ownership, evidence discipline, and escalation—rather than how a general AI chat behaves.

The broader message for the committee is straightforward: this architecture is not trying to replace professional diligence. It is trying to **standardize the first-pass work**—routing, evidence retrieval, structured summarization, and gap surfacing—so senior reviewers spend their time on judgment, negotiation leverage, and risk trade-offs, not on manual searching and reformatting. With improved retrieval, document version governance, risk register generation, and human approval gates, the same topology can scale from a teaching prototype into a controlled diligence assistant that is faster, more consistent, and more auditable than an ungoverned “ask the model” workflow.
