#**AI LAW, CHAPTER 5. ORGANIZATIONS**

---

##0.REFERENCE

https://chatgpt.com/share/695fb8a4-501c-8012-b08b-906083964e03

##1.CONTEXT

This notebook is not “a chatbot in a browser.” It is a small, auditable workflow that treats generative AI the way a responsible legal team must treat it: as a fallible drafting and reasoning aid operating inside a controlled process, with a record of what happened, what was assumed, what is missing, what is risky, and what must be verified before anyone relies on the output. If you have only used chatbots in apps or on the internet, you are used to a simple interaction: you type a question, you get an answer, and you move on. That is fine for casual uses—summaries, brainstorming, or learning. It is not fine for legal work where confidentiality, privilege, accuracy, supervision, and professional responsibility are not “nice to have” features but baseline obligations. Chapter 5 matters because it shows the transition from “individual lawyer using AI” to “an organization using AI,” and that transition is where risk and governance either become real—or become expensive.

Here is what we are doing, precisely. We are simulating a “mini-firm” workflow: a structured pipeline that takes a matter from first contact to a controlled set of deliverables, with checkpoints and an audit trail. The pipeline has stages that mirror real legal operations: intake, conflicts/engagement checks, scope definition, workplan, drafting, quality assurance and red-teaming, sign-off preparation, and audit packaging. At each stage, the system produces outputs in a strict, structured format—JSON—so that we can reliably capture what the AI produced and what a supervising lawyer should review. We are not asking the AI to “be right.” We are forcing the AI to be explicit about what it knows, what it assumes, what it does not know, and what must be verified. That is the core difference between “chat” and “legal use.”

The model we call (Claude) is powerful, but it is still a probabilistic system. It can sound confident and still be wrong. It can follow instructions and still drift. It can produce a beautiful paragraph that includes a detail that was never provided. In ordinary chatbot usage, those limitations are annoying; in legal work, they can be harmful. They can mislead a client, undermine a filing, create a false record, or cause a lawyer to miss a critical fact. That is why this notebook begins with a safety envelope: you do not paste confidential client information; you redact by default; you label everything “Not verified”; and you treat outputs as drafts requiring human review. These are not “compliance decorations.” They are practical controls that reduce the chance that AI becomes the weak link in your professional obligations.

This notebook also differs from typical chatbot use because it is designed to be replayable and auditable. In a standard chat session, you have a conversation history, but it is not a structured operational record. In legal settings—especially organizations—what matters is not only the final text but how it was produced. What inputs were used? What instructions were given? Which model version was called? What risks were flagged? What steps were taken to check and correct the draft? This notebook generates governance artifacts automatically: a run manifest that records the model and parameters, a prompts log that records redacted inputs and output hashes, a risk log that aggregates flagged risks, and a deliverables folder that stores stage-by-stage outputs. At the end, it bundles everything into a single zip file you can retain as an “audit bundle.” The point is not surveillance; the point is accountability—so that a supervising attorney can review the work, understand what happened, and decide whether and how to rely on it.

A critical technical aspect of this notebook, and one of the main lessons from earlier chapters, is reliability in structured output. Large language models are trained to be conversational, which means they often add polite explanations (“Here is your JSON”) or wrap outputs in formatting. That breaks automated parsing and, in a workflow system, a broken parse is not a small nuisance—it stops the pipeline. In Chapter 5 we therefore enforce structured output using a technique that “steers” the model to complete a JSON object rather than write free-form text. In plain terms: instead of hoping the model behaves, we shape the interaction so it is much more likely to behave. This is an important organizational lesson. At Level 5, you do not rely on best intentions; you design controls that make failure less likely and make failures visible when they occur.

The mini-firm simulation also introduces the concept of “separation of concerns.” In real practice, intake is not drafting; conflicts checks are not strategy; QA is not the same as client communication. A single person may do multiple roles, but the roles remain distinct because each role has different risks and different required checks. This notebook encodes that separation. Intake produces a checklist and open questions. Conflicts produces questions and boundaries. Scope clarifies what is in and out. Workplan lays out steps and approval gates. Draft produces a first-pass work product appropriate to the domain. QA and red-team stress test the draft for missing facts, overconfidence, tone, and injection-style manipulation. Sign-off does not “rewrite the memo”; it creates a package for the lawyer to review and verify. Audit organizes the artifacts so the matter can be replayed and defended. That is the organizational maturity move: legal work is not only “writing.” It is process.

You will also see why we insist on “Not verified.” In casual chatbot usage, you might accept a plausible explanation as “good enough.” In legal work, plausible is dangerous. This notebook forces outputs to include “questions to verify” and prohibits invented authorities. If a rule, case, or statute is needed, the correct behavior is not to fabricate it; it is to identify that verification is required and ask for the source. This is how you operationalize candor and accuracy in AI-assisted workflows: you build the uncertainty into the artifact rather than leaving it in someone’s head.

Finally, Chapter 5 matters because it is the bridge from personal experimentation to institutional readiness. Many lawyers can use a chatbot to draft an email. That is Level 1. Some can use AI to structure analysis. That is Level 2. Some can build multi-step workflows. That is Level 3. Some can create reusable assets. That is Level 4. But an organization needs something more: it needs repeatability, boundaries, logging, and supervisory control. In other words, it needs a system that behaves like a firm behaves. Chapter 5 is the blueprint for that: a mini-firm simulation you can run, inspect, and improve. If you understand this notebook, you understand the difference between “AI as a helpful tool” and “AI as a governed operational capability”—and that difference is where legal organizations will either gain safe leverage or accumulate hidden risk.

If you take only one thing from this notebook, take this: in legal work, the value of AI is not just the words it produces. The value is a controlled workflow that makes drafts faster while making risks clearer, verification easier, and accountability non-negotiable. That is what we are building here.


##2.LIBRARIES AND ENVIRONMENT

In [10]:
# Cell 2 (Code)
# Goal: Install/imports + create run directory (timezone-aware UTC; no utcnow() deprecation warning)
# Output: prints run directory path

!pip -q install anthropic

import os, json, re, hashlib, platform, textwrap, traceback, subprocess
from datetime import datetime, timezone
from pathlib import Path

RUN_ID = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
RUN_DIR = Path(f"/content/ai_law_ch5_runs/run_{RUN_ID}")
DELIVER_DIR = RUN_DIR / "deliverables"
DELIVER_DIR.mkdir(parents=True, exist_ok=True)

print("Run directory:", str(RUN_DIR))


Run directory: /content/ai_law_ch5_runs/run_20260108T140237Z


##3.API SETUP AND CLIENT INITIALIZATION

###3.1.OVERVIEW

**API Key Setup and Client Initialization**

This section establishes the connection between your Google Colab notebook and the Anthropic API service. Think of it as setting up a phone line before making a call - you need the right credentials and connection details to communicate with Claude.

**What Happens in This Section**

First, the notebook retrieves your Anthropic API key from Google Colab's secure storage system called "Secrets". This is similar to retrieving a password from a password manager rather than writing it directly in your code. The key acts as your authorization credential, proving you have permission to use the Claude API service.

Next, the system stores this key in an environment variable. Environment variables are temporary storage locations that programs can access during their execution. This makes the key available to other parts of the notebook without repeatedly typing it.

Then, the code creates a "client" object using the Anthropic library. The client is your communication interface - it handles all the technical details of sending requests to Claude and receiving responses. Without this client, your notebook cannot interact with the AI model.

Finally, the section specifies which Claude model to use. In this notebook, we use Claude Haiku version four point five. Different models have different capabilities, speeds, and costs. Haiku is designed for efficiency while maintaining high quality output, making it suitable for production legal workflows where you need reliable performance.

**Why This Matters for Legal Practice**

For lawyers using AI tools, proper API initialization is a governance requirement, not just a technical step. The approach here demonstrates several best practices. First, keeping API keys in secure storage rather than hardcoding them prevents accidental exposure if you share the notebook. Second, explicitly declaring which model version you use creates an audit trail - six months later, you can verify exactly which AI system generated a particular output. Third, the error handling ensures you receive clear feedback if something goes wrong during setup, rather than mysterious failures later in the workflow.

**What You See When Running**

When this section executes successfully, you will see three confirmation messages indicating the API key loaded correctly, displaying the model name, and confirming the client initialized. If there is a problem, you will see an error message directing you to add your API key to Colab Secrets using the key icon in the sidebar. This immediate feedback helps you catch configuration issues before attempting to generate any legal assets.

**Connection to the Overall Workflow**

Everything that follows in this notebook depends on this initialization. Without a properly configured client and model specification, the asset generation pipeline cannot function. This section is the foundation that enables all subsequent governance-tracked AI interactions.

###3.2.CODE AND IMPLEMENTATION

In [11]:
# Cell 3 (Code)
# Goal: API key setup + Anthropic client initialization (explicit)
# Output: prints key loaded yes/no and model name

import anthropic
from google.colab import userdata

ANTHROPIC_API_KEY = userdata.get("ANTHROPIC_API_KEY")
if not ANTHROPIC_API_KEY:
    raise RuntimeError(
        "Missing ANTHROPIC_API_KEY. In Colab: Right panel → Secrets → add ANTHROPIC_API_KEY, then re-run Cell 3."
    )

os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

MODEL = "claude-haiku-4-5-20251001"  # REQUIRED by user
DEFAULT_TEMPERATURE = 0.1
RETRY_TEMPERATURE = 0.0
MIN_MAX_TOKENS = 1800

print("API key loaded:", "yes" if bool(ANTHROPIC_API_KEY) else "no")
print("Model:", MODEL)


API key loaded: yes
Model: claude-haiku-4-5-20251001


##4.GOVERNANCE UTILITIES

###4.1.OVERVIEW

**Cell 4: Governance Utilities and Run Artifacts (Why This Matters)**

**What Cell 4 does**  
Cell 4 creates the “paper trail” for the entire notebook run. It sets up a few small utilities that let the notebook write structured files and append records safely. Then it immediately generates the core governance artifacts: the run manifest, the prompt log file, the risk log file, and a dependency snapshot. From this point on, every important step in the pipeline can leave a trace that a supervising lawyer can review later.

**Why legal workflows need this, but casual chatbot use does not**  
When people use a chatbot in an app, they typically focus only on the final text. In legal work, that is not enough. If a draft influences client advice, a filing, or a negotiation, you need to be able to answer basic questions later: What model was used? What settings were used? What was asked? What risks were flagged? What did the system assume? A governed workflow does not rely on memory or screenshots. It produces a structured audit record by design.

**The run manifest: what it records and why it matters**  
The run manifest is a small JSON file that captures the “identity” of the run: when it happened, which chapter/pipeline is being executed, which model was used, and what key parameters were set. Think of it as the cover sheet for the entire run. If you rerun the notebook next week, the manifest helps you confirm whether you are truly reproducing the same conditions.

**The prompts log: why it exists and what it avoids**  
The prompts log stores one record per model call, but it is designed to be safe. The notebook logs only redacted content plus hashes, so you can verify integrity without storing sensitive client material. This log supports accountability: you can see what the system asked the model to do at each stage and confirm that the workflow followed the intended steps.

**The risk log: how it supports supervision**  
The risk log aggregates the risk flags produced across stages and matters. Instead of burying risks inside long drafts, the risk log makes them easy to review. This is essential for organizational use: it allows a supervisor to quickly spot high-severity issues like missing facts, confidentiality concerns, or hallucination risk.

**What you should see after Cell 4 runs**  
Cell 4 prints the file paths of the artifacts it created. This is your confirmation that the notebook is now operating as a controlled pipeline with a persistent audit trail, not as an ephemeral chat session.


###4.2.CODE AND IMPLEMENTATION

In [12]:
# Cell 4 (Code)
# Goal: Governance utilities + write run_manifest + init logs (+ pip_freeze.txt)
# Output: prints artifact file paths

def now_iso():
    return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"

def sha256_text(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8", errors="ignore")).hexdigest()

def write_json(path: Path, obj) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(obj, indent=2, ensure_ascii=False))

def append_jsonl(path: Path, record: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

MANIFEST_PATH = RUN_DIR / "run_manifest.json"
PROMPTS_LOG_PATH = RUN_DIR / "prompts_log.jsonl"
RISK_LOG_PATH = RUN_DIR / "risk_log.json"
PIP_FREEZE_PATH = RUN_DIR / "pip_freeze.txt"

# pip freeze for reproducibility
try:
    freeze_txt = subprocess.check_output(["pip", "freeze"], text=True)
except Exception:
    freeze_txt = "pip freeze failed"
PIP_FREEZE_PATH.write_text(freeze_txt)

run_manifest = {
    "run_id": RUN_ID,
    "timestamp_utc": now_iso(),
    "chapter": "Chapter 5 — Level 5 (Organizations)",
    "model": MODEL,
    "params": {
        "temperature_default": DEFAULT_TEMPERATURE,
        "temperature_retry": RETRY_TEMPERATURE,
        "max_tokens_min": MIN_MAX_TOKENS
    },
    "environment": {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    },
    "purpose": "Mini-firm simulation: intake → checks → workflow → QA → sign-off → audit (governance-first)."
}

write_json(MANIFEST_PATH, run_manifest)

# initialize logs
if not PROMPTS_LOG_PATH.exists():
    PROMPTS_LOG_PATH.write_text("", encoding="utf-8")
if not RISK_LOG_PATH.exists():
    write_json(RISK_LOG_PATH, {"run_id": RUN_ID, "timestamp_utc": now_iso(), "risks": []})

print("Created:")
print(" -", str(MANIFEST_PATH))
print(" -", str(PROMPTS_LOG_PATH))
print(" -", str(RISK_LOG_PATH))
print(" -", str(PIP_FREEZE_PATH))


Created:
 - /content/ai_law_ch5_runs/run_20260108T140237Z/run_manifest.json
 - /content/ai_law_ch5_runs/run_20260108T140237Z/prompts_log.jsonl
 - /content/ai_law_ch5_runs/run_20260108T140237Z/risk_log.json
 - /content/ai_law_ch5_runs/run_20260108T140237Z/pip_freeze.txt


  return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"


##5.REDACTION AND MINIMUM NECESSARYN INTAKE HELPERS

###5.1.OVERVIEW

**Redaction and Minimum-Necessary Intake Utilities**

This section implements privacy protection mechanisms that prevent sensitive client information from being inadvertently exposed during AI interactions. Think of it as establishing attorney-client privilege safeguards before handling confidential materials - you create protective barriers first, then work within those boundaries.

**Understanding the Redaction Function**

The redaction function scans text for common patterns of personally identifiable information and replaces them with placeholder labels. Specifically, it searches for email addresses, telephone numbers in United States formats, Social Security numbers, and street addresses. When it finds these patterns, it substitutes them with labels like EMAIL REDACTED or PHONE REDACTED. The function also tracks what types of information it removed, returning both the cleaned text and a list of redacted categories.

**How Pattern Matching Works**

The redaction uses what programmers call regular expressions, which are sophisticated search patterns that can identify text structures. For email addresses, the pattern looks for words, followed by the at symbol, followed by more words, a dot, and a domain extension. For phone numbers, it searches for three digits, possibly a separator like a dash or dot, three more digits, another separator, and four final digits. For Social Security numbers, it expects exactly the format three digits dash two digits dash four digits. For street addresses, it looks for a number followed by capitalized words ending with street types like Street, Avenue, Road, or Boulevard.

These patterns catch standard formatting reliably. If someone writes five five five dash one two three dash four five six seven, the pattern recognizes it as a phone number. If someone writes john dot smith at example dot com, the pattern recognizes it as an email. This automation means you don't have to manually review every piece of text hunting for identifiers to remove.

**Critical Limitations You Must Understand**

The notebook explicitly and repeatedly warns that redaction is imperfect and operates on a best-effort basis. Pattern-based redaction cannot catch everything. Names standing alone without accompanying identifiers will pass through undetected. If your text says "Maria Torres called yesterday," the redaction function has no way to know that Maria Torres is a real client rather than a hypothetical example. Case-specific facts that could identify individuals but don't match standard patterns will also pass through. A unique medical condition, an unusual business transaction, or a distinctive family situation might identify someone even without standard identifiers present.

Information in unusual formats escapes pattern matching. If someone writes their phone number as five five five one two three four five six seven with no separators, the pattern looking for digit groups with separators will miss it. If an email uses an unusual format or includes special characters, the standard email pattern might not match. If an address is abbreviated or formatted non-standardly, the address pattern might not catch it.

**Why This Approach Despite Limitations**

Given these limitations, why use pattern-based redaction at all? Because it provides a meaningful safety net that catches the most common identifiers in their most common formats. Most people write phone numbers with separators. Most people format Social Security numbers with dashes. Most people write email addresses in standard format. Catching these common cases prevents the majority of accidental exposures.

The alternative would be no automated protection at all, relying entirely on human vigilance to avoid entering sensitive information. Human attention wavers, especially during repetitive tasks or long work sessions. Automated redaction provides consistent protection that never gets tired or distracted. It's not perfect, but it's substantially better than nothing.

**The Demonstration**

The section runs a live demonstration using deliberately fake data containing all four protected identifier types. You see the original text with an email address, phone number, street address, and Social Security number clearly visible. Then you see the redacted version with each identifier replaced by its corresponding placeholder. Finally you see a summary listing which categories were removed: emails, phone numbers, Social Security numbers, and addresses.

This concrete example serves multiple purposes. First, it proves the redaction function actually works as described. Second, it shows you exactly what the placeholders look like so you can recognize them in later outputs. Third, it demonstrates the reporting mechanism that tells you what was removed. Fourth, it reinforces through visual example what the function can and cannot do.

**Minimum-Necessary Fields Function**

The section also includes a utility for filtering data dictionaries to keep only required fields. This implements the principle of data minimization, sending only the minimum information necessary to accomplish the task. If you have a data structure with twenty fields describing a client, but only five of those fields are relevant to generating a particular asset, this function strips away the unnecessary fifteen fields.

Why does this matter? Because every piece of information sent to an external API represents potential exposure risk. Even with strong contractual protections and technical safeguards, minimizing what you share reduces risk. If the five fields you need don't include the client's full name, home address, or financial details, those sensitive items never leave your control at all.

**Integration Into the Pipeline**

This redaction infrastructure doesn't exist in isolation. It gets invoked repeatedly throughout the entire pipeline. Before sending case facts to the API, the generate asset function applies redaction. Before logging prompts to the prompts log file, the logging function applies redaction. Before writing responses to disk, the system applies redaction. This layered approach creates multiple opportunities to catch and remove sensitive information before it could be exposed.

**Warning Messages**

The section concludes with a prominent warning message displayed every time it runs: "WARNING: Redaction is imperfect. Do NOT paste sensitive client data." This repeated warning serves a crucial function. It prevents complacency. Every time you run the notebook, you see this reminder that the technical safeguard has limits. This keeps the limitation front of mind rather than letting you forget about it after the first explanation.

**Professional Responsibility Context**

For lawyers, this section addresses a fundamental ethical tension. You need factual context for AI tools to generate useful outputs. Generic requests produce generic results. Specific requests with concrete details produce practical results. But providing those specific details risks exposing confidential client information, violating your duty to protect client confidences and potentially waiving attorney-client privilege.

The redaction approach here represents one strategy for managing this tension. Provide enough specificity for usefulness, but automatically scrub obvious identifiers before transmission. However, the warnings emphasize that technology alone cannot ensure confidentiality. Sound professional judgment about what information to include remains essential. Some matters are too sensitive for any AI involvement regardless of technical safeguards. Some facts are too uniquely identifying even without standard identifiers. Lawyers must exercise judgment rather than blindly trusting automated protection.

**Building Toward Asset Generation**

With this privacy protection layer in place, the notebook can proceed to actual asset generation with reduced confidentiality risk. The next sections will show API calls and structured outputs. Throughout all of that, you can trust that redaction occurred before any external transmission. This foundational privacy layer makes the subsequent workflow ethically defensible rather than reckless.

###5.2.CODE AND IMPLEMENTATION

In [5]:
# Cell 5 (Code)
# Goal: Redaction + minimum-necessary intake helper (never write unredacted to disk)
# Output: demo before/after with fake data

EMAIL_RE = re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE_RE = re.compile(r"(\+?\d{1,2}\s*)?(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# very rough address heuristic
ADDR_RE = re.compile(r"\b\d{1,6}\s+[A-Za-z0-9.\- ]+\s+(Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Lane|Ln|Drive|Dr)\b", re.IGNORECASE)

def redact(text: str):
    removed = {"emails": [], "phones": [], "ssns": [], "addresses": []}
    t = text

    removed["emails"] = EMAIL_RE.findall(t)
    t = EMAIL_RE.sub("[REDACTED_EMAIL]", t)

    removed["phones"] = PHONE_RE.findall(t)
    t = PHONE_RE.sub("[REDACTED_PHONE]", t)

    removed["ssns"] = SSN_RE.findall(t)
    t = SSN_RE.sub("[REDACTED_SSN]", t)

    removed["addresses"] = ADDR_RE.findall(t)
    t = ADDR_RE.sub("[REDACTED_ADDRESS]", t)

    # normalize removed fields (phones regex returns tuples)
    removed["phones"] = ["".join(p).strip() for p in removed["phones"] if "".join(p).strip()]
    return t, removed

def minimum_necessary_facts(raw_text: str):
    redacted_text, removed = redact(raw_text)
    # Simple heuristic: split into short bullets
    bullets = [b.strip("- ").strip() for b in re.split(r"[\n•]+", redacted_text) if b.strip()]
    bullets = bullets[:10]
    return {"redacted_text": redacted_text, "facts_bullets": bullets, "removed_fields": removed}

# Demo with fake data (safe)
demo = "Client John Doe, email john@acme.com, phone (212) 555-1212, SSN 123-45-6789, address 10 Main Street."
mn = minimum_necessary_facts(demo)
print("BEFORE:\n", demo)
print("\nAFTER (REDACTED):\n", mn["redacted_text"])
print("\nREMOVED FIELDS:\n", json.dumps(mn["removed_fields"], indent=2))


BEFORE:
 Client John Doe, email john@acme.com, phone (212) 555-1212, SSN 123-45-6789, address 10 Main Street.

AFTER (REDACTED):
 Client John Doe, email [REDACTED_EMAIL], phone [REDACTED_PHONE], SSN [REDACTED_SSN], address [REDACTED_ADDRESS].

REMOVED FIELDS:
 {
  "emails": [
    "john@acme.com"
  ],
  "phones": [
    "(212)"
  ],
  "ssns": [
    "123-45-6789"
  ],
  "addresses": [
    "Street"
  ]
}


##6.CLAUDE WRAPPER

###6.1.OVERVIEW

**Cell 6: The Reliable “JSON-Only” Claude Wrapper (Why This Matters Most)**

**What Cell 6 does**  
Cell 6 builds the single most important piece of the notebook: the function that calls Claude and reliably returns a structured result the rest of the pipeline can use. Every later stage—intake, conflicts, scope, drafting, QA, sign-off, and audit—depends on this wrapper. If the wrapper is unreliable, the entire pipeline collapses because downstream steps cannot safely parse or trust the model’s output format.

**Why we had problems in earlier chapters**  
In earlier notebooks, Claude often responded like a helpful human assistant: it added “Here is the JSON,” inserted explanations, or wrapped the JSON in formatting. Even when we asked for strict JSON, the model’s conversational instincts sometimes won. That behavior is harmless in a chat window, but it is damaging in a workflow system because automated processing needs predictable structure.

**The key fix: the prefill technique**  
Cell 6 uses a “prefill” technique to force structure. Instead of asking Claude to start from scratch, we begin the assistant’s response with a single opening curly brace. This gently pushes the model into “completion mode,” so it continues the JSON object rather than drifting into conversation. The notebook then reconstructs the full JSON and parses it. In practice, this dramatically increases first-attempt success and reduces the need for retries.

**Defense-in-depth: retries and extraction fallbacks**  
Even with prefill, the wrapper includes backup protections. If parsing fails, it retries the call with stricter instructions and lower randomness. If that still fails, it uses a few extraction strategies that try to recover a JSON object from text. The goal is not to be clever; the goal is to keep the pipeline stable. A stable pipeline is safer than a brittle one, because brittle systems encourage users to bypass controls when things break.

**Schema validation: preventing “almost-correct” outputs**  
Cell 6 also checks that the response includes exactly the required fields and nothing extra. This matters because “almost correct” JSON can hide missing sections, omitted risks, or a missing verification list. Schema validation ensures every stage output contains the same supervisory structure: facts, assumptions, open questions, controls applied, stage output, handoff, risks, and verification tasks.

**Logging and risk capture: making supervision possible**  
Cell 6 writes a record to the prompts log for each call and updates the risk log with any flagged risks. This is how apparent “drafting” becomes an organizational process. The logs are not optional extras; they are what allow a reviewing lawyer to see what the model did, what could be wrong, and what must be verified.

**The smoke test: proving the wrapper works before the pipeline runs**  
At the end of Cell 6, the notebook runs a small smoke test that confirms the wrapper can produce valid, schema-compliant JSON. This is a practical safeguard: it catches format failures early, before you run multiple matters through multiple stages.

**Why Cell 6 is essential for legal use**  
Legal workflows require repeatability, structure, and reviewability. Cell 6 is the engineering step that turns a conversational model into a controlled component inside a supervised pipeline. Without it, you do not have a workflow—you have a chat session dressed up as a notebook.


###6.2.CODE AND IMPLEMENTATION

In [6]:
# Cell 6 (Code) — CRITICAL RELIABILITY
# Goal: Prefill-enforced JSON wrapper (fixes Claude non-JSON behavior) + extraction fallback + schema validation + retries + smoke test
# Output: PASS/FAIL diagnostics

# ---- STRICT SCHEMA (Chapter 5 “Organization pipeline stage output”) ----
REQUIRED_KEYS = [
    "task",
    "stage",
    "matter_id",
    "facts_provided",
    "assumptions",
    "open_questions",
    "controls_applied",
    "stage_output",
    "handoff",
    "risks",
    "verification_status",
    "questions_to_verify"
]

def schema_is_valid(obj: dict):
    if not isinstance(obj, dict):
        return False, "Parsed object is not a dict"
    keys = list(obj.keys())
    missing = [k for k in REQUIRED_KEYS if k not in obj]
    extra = [k for k in keys if k not in REQUIRED_KEYS]
    if missing:
        return False, f"Missing keys: {missing}"
    if extra:
        return False, f"Extra keys not allowed: {extra}"
    # minimal type checks
    if obj.get("verification_status") != "Not verified":
        return False, "verification_status must be 'Not verified'"
    return True, "ok"

def extract_json_from_text(text: str):
    # Strategy 1: as-is
    try:
        return json.loads(text)
    except Exception:
        pass

    # Strategy 3: strip fenced blocks first (common failure mode)
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, flags=re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except Exception:
            pass

    # Strategy 2: first { to last }
    i = text.find("{")
    j = text.rfind("}")
    if i != -1 and j != -1 and j > i:
        candidate = text[i:j+1]
        try:
            return json.loads(candidate)
        except Exception:
            pass

    # Strategy 4: bracket balancing scan
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for idx in range(start, len(text)):
        ch = text[idx]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                candidate = text[start:idx+1]
                try:
                    return json.loads(candidate)
                except Exception:
                    return None
    return None

SYSTEM_PROMPT = textwrap.dedent(f"""
You are a legal workflow assistant for US lawyers.

CRITICAL: Output ONLY a JSON object. No prose. No markdown. No code fences.
Your response must start with {{ and end with }}. Nothing before, nothing after.

You MUST follow this exact JSON schema (no extra keys):
{{
  "task": "...",
  "stage": "intake|conflicts|scope|workplan|draft|qa|signoff|audit",
  "matter_id": "...",
  "facts_provided": ["..."],
  "assumptions": ["..."],
  "open_questions": ["..."],
  "controls_applied": ["..."],
  "stage_output": {{
    "type": "checklist|memo|email|plan|package",
    "summary": "...",
    "items": ["..."]
  }},
  "handoff": {{
    "next_stage": "...",
    "needs_human_approval": true,
    "approval_question": "...",
    "stop_if": ["..."]
  }},
  "risks": [
    {{"type":"confidentiality|privilege|hallucination|missing_facts|unauthorized_practice|overconfidence|prompt_injection|tone|other",
      "severity":"low|medium|high",
      "note":"..."}}
  ],
  "verification_status": "Not verified",
  "questions_to_verify": ["..."]
}}

Rules:
- verification_status must always be "Not verified"
- Never invent legal authorities or citations
- Put missing facts in open_questions and verification tasks in questions_to_verify
- stage_output.summary must be concise (<= 120 words)
""").strip()

def call_claude_json_prefill(*, task: str, stage: str, matter_id: str, facts: list, user_instruction: str,
                            max_tokens: int = MIN_MAX_TOKENS):
    """
    PREFILL TECHNIQUE:
    We add an assistant message content "{" so Claude continues the JSON object rather than chatting.
    We then prepend "{" back before parsing.
    """
    # Minimal user prompt (do NOT over-instruct; system enforces schema)
    redacted_instruction, _ = redact(user_instruction)
    payload = (
        f"Task: {task}\n"
        f"Stage: {stage}\n"
        f"Matter ID: {matter_id}\n"
        f"Facts:\n- " + "\n- ".join([str(x) for x in facts]) + "\n"
        f"Instruction: {redacted_instruction}\n"
        f"Return ONLY JSON."
    )

    attempts = [
        {"temperature": DEFAULT_TEMPERATURE, "prefix": ""},
        {"temperature": RETRY_TEMPERATURE, "prefix": "OUTPUT ONLY JSON. NO TEXT. "},
        {"temperature": RETRY_TEMPERATURE, "prefix": "OUTPUT ONLY JSON. NO TEXT. "}
    ]

    last_error = None
    for attempt_idx, a in enumerate(attempts, start=1):
        t = a["temperature"]
        prefix = a["prefix"]

        messages = [
            {"role": "user", "content": prefix + payload},
            {"role": "assistant", "content": "{"}  # PREFILL
        ]

        try:
            resp = client.messages.create(
                model=MODEL,
                max_tokens=max_tokens,
                temperature=t,
                system=SYSTEM_PROMPT,
                messages=messages,
            )

            # Anthropic SDK: resp.content is a list of content blocks; join text blocks
            text_out = ""
            for block in resp.content:
                if getattr(block, "type", None) == "text":
                    text_out += block.text

            # Reconstruct full JSON
            reconstructed = "{" + (text_out or "").strip()

            parsed = None
            try:
                parsed = json.loads(reconstructed)
            except Exception:
                parsed = extract_json_from_text(reconstructed) or extract_json_from_text(text_out or "")

            if parsed is None:
                raise ValueError("JSON parse failed (even after fallback extraction).")

            ok, msg = schema_is_valid(parsed)
            if not ok:
                raise ValueError(f"Schema validation failed: {msg}")

            # Log prompt/response (redacted only)
            append_jsonl(PROMPTS_LOG_PATH, {
                "ts_utc": now_iso(),
                "attempt": attempt_idx,
                "temperature": t,
                "task": task,
                "stage": stage,
                "matter_id": matter_id,
                "prompt_redacted": payload,
                "prompt_hash": sha256_text(payload),
                "response_hash": sha256_text(json.dumps(parsed, ensure_ascii=False)),
            })

            # Update risk log
            risk_log = json.loads(RISK_LOG_PATH.read_text(encoding="utf-8"))
            for r in parsed.get("risks", []):
                risk_log["risks"].append({
                    "ts_utc": now_iso(),
                    "matter_id": matter_id,
                    "stage": stage,
                    **r
                })
            write_json(RISK_LOG_PATH, risk_log)

            return parsed

        except Exception as e:
            last_error = f"Attempt {attempt_idx} failed: {repr(e)}"
            # log failure minimally (redacted)
            append_jsonl(PROMPTS_LOG_PATH, {
                "ts_utc": now_iso(),
                "attempt": attempt_idx,
                "temperature": t,
                "task": task,
                "stage": stage,
                "matter_id": matter_id,
                "prompt_redacted": payload,
                "prompt_hash": sha256_text(payload),
                "error": last_error
            })

    # Error fallback (MUST still match schema)
    fallback = {
        "task": task,
        "stage": stage,
        "matter_id": matter_id,
        "facts_provided": facts,
        "assumptions": [],
        "open_questions": ["System error: JSON generation/parsing failed. Re-run with simpler facts or smaller content request."],
        "controls_applied": ["prefill_json_enforcement", "retry_logic", "schema_validation"],
        "stage_output": {
            "type": "memo",
            "summary": "Error fallback generated. This stage did not complete successfully.",
            "items": [
                "This is a draft only. Not legal advice. Human lawyer review required.",
                f"Error: {last_error}"
            ]
        },
        "handoff": {
            "next_stage": "audit",
            "needs_human_approval": True,
            "approval_question": "Do you want to retry this stage with fewer facts and a shorter instruction?",
            "stop_if": ["Output remains invalid JSON after retry."]
        },
        "risks": [
            {"type": "other", "severity": "high", "note": f"JSON_PARSE_ERROR: {last_error}"}
        ],
        "verification_status": "Not verified",
        "questions_to_verify": []
    }

    # add to risk log
    risk_log = json.loads(RISK_LOG_PATH.read_text(encoding="utf-8"))
    for r in fallback["risks"]:
        risk_log["risks"].append({"ts_utc": now_iso(), "matter_id": matter_id, "stage": stage, **r})
    write_json(RISK_LOG_PATH, risk_log)

    return fallback

# ---- Smoke test (fast, minimal) ----
print("=" * 70)
print("SMOKE TEST (Cell 6): Prefill JSON enforcement")
print("=" * 70)

smoke = call_claude_json_prefill(
    task="smoke_test",
    stage="intake",
    matter_id="SMOKE-001",
    facts=["Client sent a contract yesterday for review.", "No confidential details provided."],
    user_instruction="Create an intake checklist and 3 open questions. Keep stage_output.summary short."
)

ok, msg = schema_is_valid(smoke)
print("Smoke test schema valid:", ok, "|", msg)
print("Stage:", smoke.get("stage"), "| Matter:", smoke.get("matter_id"))
print("Summary preview:", (smoke.get("stage_output", {}).get("summary", "")[:120] + "..."))
print("=" * 70)


SMOKE TEST (Cell 6): Prefill JSON enforcement
Smoke test schema valid: True | ok
Stage: intake | Matter: SMOKE-001
Summary preview: Initial intake for contract review. Client has submitted document but scope and details remain undefined. Proceed to con...


  return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"


##7.CASE BUILDERS

###7.1.OVERVIEW

**Cell 7: Defining the “Mini-Firm” Pipeline and the Matters We Will Run**

**What Cell 7 does**  
Cell 7 is where the notebook becomes a true organizational simulation. Instead of treating AI as a single-step drafting tool, we define a full workflow that resembles how a small firm or legal team actually operates. Cell 7 creates two essential things: the pipeline stages we will follow, and the set of matters (mini-cases) that will run through those stages.

**The pipeline stages: the structure of the mini-firm**  
Cell 7 lays out the stages in a clear sequence. Each stage represents a different responsibility that, in real practice, might be handled by different people or checked by different controls. The stages are organized so that the system does not jump to conclusions too early. We begin by gathering information, then we check boundaries and risk, then we plan, then we draft, then we test quality, then we prepare for lawyer sign-off, and finally we create the audit bundle. This sequence is intentional: it prevents “draft-first thinking,” which is one of the main ways AI becomes risky in legal work.

**The matters: why we use recurring mini-cases**  
Cell 7 also defines the matters the pipeline will process. We keep the same four domains across the entire book—Criminal, Regulatory/Administrative, International, and Teaching/Academia—so you can see how capabilities evolve across levels. At Level 5, the important shift is that each matter is no longer a single prompt. Each matter becomes a structured file of facts that the pipeline can process repeatedly and consistently.

**Why the facts are concrete and bounded**  
In earlier chapters, vague facts and open-ended prompts made it too easy for the model to “fill in blanks” and produce confident-sounding content that was not grounded. In Cell 7, each matter includes concrete details, but still leaves some information missing. This is deliberate. We want the model to surface missing facts in the “open questions” field rather than quietly inventing them. This is one of the most important governance behaviors for legal AI.

**Why the stage instructions are minimal**  
Cell 7 sets short, stage-specific instructions instead of long, complicated directions. The strict structure is enforced by the system prompt and schema in Cell 6, not by long user prompts. This design avoids triggering the model’s “explanation mode” and makes it more likely to produce clean, machine-readable outputs.

**What you should see after Cell 7 runs**  
Cell 7 prints the pipeline stages and confirms the matters are loaded with their IDs and domains. This is your checkpoint that the notebook has a defined workflow and defined inputs. From here, Cell 8 can run the entire mini-firm simulation in a predictable, repeatable way.


###7.2.CODE AND IMPLEMENTATION

In [7]:
# Cell 7 (Code)
# Goal: Define the Chapter 5 mini-firm pipeline + 4 recurring matters (minimal prompts)
# Output: prints matter IDs and pipeline stages

PIPELINE_STAGES = ["intake", "conflicts", "scope", "workplan", "draft", "qa", "signoff", "audit"]

def build_matter(case_id: str, domain: str, facts: list):
    return {
        "matter_id": case_id,
        "domain": domain,
        "facts": facts
    }

# Four recurring matters (sanitized, concrete, bounded)
MATTERS = [
    build_matter(
        "CRIM-001",
        "Criminal",
        [
            "Defendant: Alex R. (redacted), first-time felony charge alleged (details unknown).",
            "Court date in 10 days; counsel must prepare for bail/conditions request.",
            "Employment: full-time (3 years) and stable housing; family in jurisdiction.",
            "Prior record: unknown; immigration status: unknown.",
            "Client’s main concern: avoid detention; keep job; comply with conditions."
        ]
    ),
    build_matter(
        "REG-001",
        "Regulatory/Administrative",
        [
            "Client: First Community Bank ($2B assets; 15 branches).",
            "Proposed rule summary provided internally (not quoted): new reporting + controls within 120 days.",
            "Implementation cost estimate: $450k–$700k first year; 2 FTE ongoing.",
            "Vendor dependency: core banking vendor lead time 6–9 months.",
            "Goal: prepare comment strategy + verification list (no legal citations)."
        ]
    ),
    build_matter(
        "INTL-001",
        "International",
        [
            "Client: US services company contracting with vendor in another country (country not specified).",
            "Contract value: $1.2M/year; payment net-30; deliverables monthly.",
            "Prior dispute with vendor over delays (informal emails only).",
            "Client prefers faster dispute resolution; cost-sensitive; wants predictable enforcement.",
            "Need options: governing law + forum/arbitration (must be 'Not verified')."
        ]
    ),
    build_matter(
        "TEACH-001",
        "Teaching/Academia",
        [
            "Course: Contracts (upper-level), 60 students, mixed assessments (memo + exam).",
            "Instructor wants AI allowed for brainstorming but not for final graded submissions.",
            "Existing honor code exists (not quoted).",
            "Concern: fairness, disclosure, enforceability, and clear edge-case handling.",
            "Goal: produce policy + FAQ + enforcement checklist."
        ]
    ),
]

# Minimal per-stage instruction patterns (keep short to avoid “explanatory mode”)
STAGE_INSTRUCTIONS = {
    "intake":   "Create a structured intake checklist and 5 open questions. No citations.",
    "conflicts":"Create a conflicts/engagement-risk checklist + 5 questions to verify conflicts and scope boundaries.",
    "scope":    "Draft a scope statement (bullet items) + out-of-scope list + assumptions; keep it practical.",
    "workplan": "Create a 7-step workplan with human approval gates and audit artifacts to save at each step.",
    "draft":    "Produce a draft work product appropriate to the domain (<= 350 words) with disclaimer included.",
    "qa":       "Run QA + red-team: list weakest links, missing facts, and prompt-injection risks; propose fixes.",
    "signoff":  "Create a sign-off package: what the lawyer must review, what must be verified, and client-safe messaging notes.",
    "audit":    "Create an audit bundle index: list all artifacts produced, hashes to compute, and replay instructions."
}

print("Pipeline stages:", PIPELINE_STAGES)
print("\nMatters loaded:")
for m in MATTERS:
    print("-", m["matter_id"], "|", m["domain"])


Pipeline stages: ['intake', 'conflicts', 'scope', 'workplan', 'draft', 'qa', 'signoff', 'audit']

Matters loaded:
- CRIM-001 | Criminal
- REG-001 | Regulatory/Administrative
- INTL-001 | International
- TEACH-001 | Teaching/Academia


##8.EXECUTION

###8.1.OVERVIEW

**Cell 8: Executing the Full Mini-Firm Workflow and Saving Stage-by-Stage Deliverables**

**What Cell 8 does**  
Cell 8 is where the notebook actually “runs the firm.” It takes each matter defined in Cell 7 and pushes it through every stage of the pipeline in order. For each matter, it calls the Claude wrapper from Cell 6 at each stage, receives a structured JSON result, and saves that result as a durable deliverable. The output is not a single document; it is a complete sequence of artifacts that shows how the final work product was produced and reviewed.

**How the notebook runs matters through stages**  
Cell 8 loops over matters one by one. For each matter, it runs stage-by-stage: intake, conflicts, scope, workplan, drafting, QA/red-team, sign-off preparation, and audit packaging. At each stage, it uses the same pattern: it sends the matter’s facts plus a short stage instruction, and it requires the model to return a strictly structured response. This consistency is the point. The goal is a repeatable operational workflow, not an ad hoc conversation.

**What gets saved at every stage and why it matters**  
For each stage, Cell 8 saves two files: a structured JSON file and a human-readable text file. The JSON file preserves the supervisory structure: facts, assumptions, open questions, controls applied, stage output, handoff instructions, risks, and verification tasks. The text file is designed for quick review by a lawyer who wants to read the output without opening JSON. Saving both formats ensures the work is both auditable and practical.

**Progress indicators and transparency**  
Cell 8 prints clear progress messages such as which matter is running and which stage is currently being processed. This is not cosmetic. In organizational workflows, visibility matters. If something breaks, you want to know exactly where it broke, and you want to see which outputs were already saved.

**Human-in-the-loop gates are simulated here**  
Cell 8 also demonstrates approval gates. In real legal operations, certain transitions require human confirmation, such as approving the plan before drafting or approving the sign-off package before client-facing use. In the notebook, these approvals are simulated with simple on/off settings. The important idea is not the mechanism; it is the discipline: the pipeline is designed to stop or pause when human review is required.

**Error handling: why the pipeline keeps going**  
Cell 8 is built to be resilient. If a stage fails for a particular matter—for example, due to a temporary API issue or an unexpected parsing problem—the notebook does not collapse. Instead, it catches the error, records it, generates a structured fallback deliverable, and continues. This is essential in real organizations. A brittle system encourages people to bypass controls when something goes wrong. A resilient system preserves governance even under failure.

**Statistics and final summary**  
At the end of the run, Cell 8 prints a summary table showing each matter’s status, how many stages completed, and the highest risk severity encountered. It also reports totals, such as how many model calls were made and how many stage runs succeeded or failed. This summary is the supervisor’s dashboard in miniature: a quick way to understand what happened and where attention is needed.

**What you should have after Cell 8 completes**  
You should end Cell 8 with a deliverables folder containing subfolders for each matter, and inside each subfolder a complete trail of stage outputs. This is the defining feature of Chapter 5: instead of a single draft, you have an organized, reviewable, replayable record of an AI-assisted legal workflow.


###8.2.CODE AND IMPLEMENTATION

In [8]:
# Cell 8 (Code)
# Goal: Run the full Chapter 5 “mini-firm” pipeline across 4 matters with robust error handling
# Output: progress indicators + summary table + paths to deliverables

def save_stage_outputs(matter_id: str, stage: str, obj: dict):
    case_dir = DELIVER_DIR / matter_id
    case_dir.mkdir(parents=True, exist_ok=True)

    json_path = case_dir / f"{stage}_output.json"
    txt_path = case_dir / f"{stage}_output.txt"

    write_json(json_path, obj)

    # Human-readable rendering (no external links)
    lines = []
    lines.append("This is a draft only. Not legal advice. Human lawyer review required.\n")
    lines.append(f"Matter: {matter_id} | Stage: {stage}\n")
    lines.append("FACTS PROVIDED:\n- " + "\n- ".join(obj.get("facts_provided", [])) + "\n")
    lines.append("ASSUMPTIONS:\n- " + "\n- ".join(obj.get("assumptions", [])) + "\n")
    lines.append("OPEN QUESTIONS:\n- " + "\n- ".join(obj.get("open_questions", [])) + "\n")
    lines.append("CONTROLS APPLIED:\n- " + "\n- ".join(obj.get("controls_applied", [])) + "\n")

    so = obj.get("stage_output", {})
    lines.append(f"STAGE OUTPUT TYPE: {so.get('type','')}\n")
    lines.append(f"SUMMARY:\n{so.get('summary','')}\n")
    items = so.get("items", [])
    if items:
        lines.append("ITEMS:\n- " + "\n- ".join(items) + "\n")

    ho = obj.get("handoff", {})
    lines.append("HANDOFF:\n")
    lines.append(f"- next_stage: {ho.get('next_stage','')}\n")
    lines.append(f"- needs_human_approval: {ho.get('needs_human_approval', True)}\n")
    lines.append(f"- approval_question: {ho.get('approval_question','')}\n")
    stop_if = ho.get("stop_if", [])
    if stop_if:
        lines.append("STOP IF:\n- " + "\n- ".join(stop_if) + "\n")

    risks = obj.get("risks", [])
    if risks:
        lines.append("RISKS:\n" + "\n".join([f"- ({r.get('severity')}) {r.get('type')}: {r.get('note')}" for r in risks]) + "\n")

    lines.append("VERIFICATION STATUS: Not verified\n")
    qv = obj.get("questions_to_verify", [])
    if qv:
        lines.append("QUESTIONS TO VERIFY:\n- " + "\n- ".join(qv) + "\n")

    txt_path.write_text("".join(lines), encoding="utf-8")
    return str(json_path), str(txt_path)

# Human-in-the-loop gates (simulate approvals)
APPROVE_PLAN = True
APPROVE_FINAL = True

stats = {
    "total_matters": len(MATTERS),
    "matters_success": 0,
    "matters_failed": 0,
    "total_calls": 0,
    "stage_success": 0,
    "stage_failed": 0
}

summary_rows = []

for mi, matter in enumerate(MATTERS, start=1):
    matter_id = matter["matter_id"]
    domain = matter["domain"]
    facts = matter["facts"]

    print(f"\n[Case {mi}/{len(MATTERS)}] Processing matter: {matter_id} ({domain})")
    case_ok = True
    stages_completed = 0
    highest_sev = "low"

    for si, stage in enumerate(PIPELINE_STAGES, start=1):
        print(f"  [Stage {si}/{len(PIPELINE_STAGES)}] {stage} ...")

        # Simulated gates
        if stage == "workplan" and not APPROVE_PLAN:
            print("    ❌ Skipped (plan not approved).")
            case_ok = False
            break
        if stage == "signoff" and not APPROVE_FINAL:
            print("    ❌ Skipped (final not approved).")
            case_ok = False
            break

        try:
            instruction = STAGE_INSTRUCTIONS[stage]

            result = call_claude_json_prefill(
                task=f"Chapter5_{domain}_{stage}",
                stage=stage,
                matter_id=matter_id,
                facts=facts,
                user_instruction=instruction,
                max_tokens=max(MIN_MAX_TOKENS, 1800)
            )
            stats["total_calls"] += 1

            # add a few controls_applied locally (defense-in-depth)
            result["controls_applied"] = list(set(result.get("controls_applied", []) + [
                "redaction_default",
                "no_invented_authority",
                "verification_status_not_verified",
                "prefill_json_enforcement",
                "schema_validation",
                "stage_based_gates"
            ]))

            jp, tp = save_stage_outputs(matter_id, stage, result)
            print("    ✅ Saved:", jp)
            stages_completed += 1
            stats["stage_success"] += 1

            # compute highest severity
            sev_rank = {"low": 1, "medium": 2, "high": 3}
            for r in result.get("risks", []):
                if sev_rank.get(r.get("severity","low"), 1) > sev_rank.get(highest_sev, 1):
                    highest_sev = r.get("severity","low")

        except Exception as e:
            stats["stage_failed"] += 1
            case_ok = False
            print("    ❌ Failed:", repr(e))
            traceback.print_exc()

            # create a valid error deliverable (schema-conformant)
            err_obj = {
                "task": f"Chapter5_{domain}_{stage}",
                "stage": stage,
                "matter_id": matter_id,
                "facts_provided": facts,
                "assumptions": [],
                "open_questions": ["Stage failed due to a runtime error. Review error details and retry."],
                "controls_applied": ["error_handling", "schema_conformant_fallback"],
                "stage_output": {
                    "type": "memo",
                    "summary": "Error fallback. Stage did not complete.",
                    "items": [
                        "This is a draft only. Not legal advice. Human lawyer review required.",
                        f"Error: {repr(e)}"
                    ]
                },
                "handoff": {
                    "next_stage": "audit",
                    "needs_human_approval": True,
                    "approval_question": "Do you want to retry this stage with fewer facts and a shorter instruction?",
                    "stop_if": ["Repeated errors or invalid JSON persists."]
                },
                "risks": [
                    {"type": "other", "severity": "high", "note": f"RUNTIME_ERROR: {repr(e)}"}
                ],
                "verification_status": "Not verified",
                "questions_to_verify": []
            }
            save_stage_outputs(matter_id, stage, err_obj)
            highest_sev = "high"
            # proceed to next stage (continue processing despite failure)
            continue

    if case_ok:
        stats["matters_success"] += 1
        status = "✅"
    else:
        stats["matters_failed"] += 1
        status = "❌"

    summary_rows.append((matter_id, domain, status, stages_completed, highest_sev))

# Print summary table
print("\n" + "=" * 70)
print("CHAPTER 5 PIPELINE SUMMARY")
print("=" * 70)
print(f"Matters: {stats['total_matters']} | Success: {stats['matters_success']} | Failed: {stats['matters_failed']}")
print(f"Stages success: {stats['stage_success']} | Stages failed: {stats['stage_failed']} | API calls: {stats['total_calls']}")
print("-" * 70)
print(f"{'Matter':<10} {'Domain':<22} {'Status':<6} {'Stages':<7} {'TopRisk':<7}")
print("-" * 70)
for r in summary_rows:
    print(f"{r[0]:<10} {r[1]:<22} {r[2]:<6} {str(r[3]):<7} {r[4]:<7}")
print("-" * 70)
print("Deliverables root:", str(DELIVER_DIR))
print("=" * 70)



[Case 1/4] Processing matter: CRIM-001 (Criminal)
  [Stage 1/8] intake ...


  return datetime.utcnow().replace(microsecond=0).isoformat() + "Z"


    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/intake_output.json
  [Stage 2/8] conflicts ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/conflicts_output.json
  [Stage 3/8] scope ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/scope_output.json
  [Stage 4/8] workplan ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/workplan_output.json
  [Stage 5/8] draft ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/draft_output.json
  [Stage 6/8] qa ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/qa_output.json
  [Stage 7/8] signoff ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/signoff_output.json
  [Stage 8/8] audit ...
    ✅ Saved: /content/ai_law_ch5_runs/run_20260108T131732Z/deliverables/CRIM-001/audit_output.json

[Case 2/4] Processing matt

##9.USER'S EXERCISES

###9.1.OVERVIEW

**Cell 9: Your Matter Exercise (Safe Intake, Redaction, and a Shortened Organizational Pipeline)**

**What Cell 9 does**  
Cell 9 lets you run the same Chapter 5 approach on your own scenario, but in a controlled and safer way. Instead of processing one of the pre-defined matters from Cell 7, you provide a short, sanitized description of a matter. The notebook then applies the same governance principles: it redacts sensitive patterns, converts the text into minimum-necessary facts, and runs a shortened version of the mini-firm workflow to generate structured outputs and saved deliverables.

**Why this exercise is intentionally “sanitized” and “shortened”**  
This notebook is a teaching tool, not a secure matter management system. In real practice, you would use approved systems, internal policies, and client consent standards before using any external model. Cell 9 therefore emphasizes a strict rule: do not paste confidential or privileged material. It also uses a shortened pipeline because the goal is learning the workflow pattern rather than recreating an entire production environment.

**The intake and redaction step: your first safeguard**  
Cell 9 begins by asking you for a scenario. Before the model sees anything, the notebook applies redaction and shows you a summary of what was removed. This makes the privacy control visible. The point is not that redaction is perfect; it is that legal users must adopt a “minimum necessary” mindset and confirm what they are sending to a model.

**Choosing a domain: matching the workflow to the matter type**  
Next, Cell 9 asks you to choose a domain such as criminal, regulatory, international, or teaching. This choice matters because it shapes what the drafting stage produces and what the QA stage focuses on. Even in a simplified exercise, organizational workflows must be sensitive to context. A one-size-fits-all prompt is a common cause of AI misuse in legal settings.

**Running a shortened but governed pipeline**  
Cell 9 then runs several key stages that represent the organizational core: intake, scope, drafting, QA/red-team, sign-off preparation, and audit packaging. Even though it is shortened, it still preserves the Chapter 5 logic: do not draft first, do not skip QA, and do not treat outputs as final. Each stage produces structured outputs that include open questions, risks, and verification tasks.

**What gets saved and why that is the real lesson**  
Just like the main run in Cell 8, Cell 9 saves outputs to a dedicated folder for your matter. You get both a JSON version and a human-readable text version per stage. This teaches the core habit of Level 5: AI outputs are not “messages.” They are workflow artifacts that must be reviewable, traceable, and easy to supervise.

**What you should see after Cell 9 runs**  
You should see a confirmation of where your deliverables were saved and a clear record of the redaction summary. The most important outcome is not the draft itself. The most important outcome is that your matter went through a controlled sequence with explicit uncertainty, explicit risk flags, and a sign-off-oriented output that a lawyer can responsibly review.


###9.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 9 (Code)
# Goal: User exercise (sanitized intake) + run a shortened pipeline (still includes QA + signoff + audit)
# Output: redaction summary + saved paths

print("USER EXERCISE (Do NOT paste confidential/privileged information.)")
raw = input("\nPaste a SANITIZED scenario (no names, no confidential facts). Press Enter when done:\n> ").strip()

mn = minimum_necessary_facts(raw)
print("\nREDACTION SUMMARY (best-effort; imperfect):")
print(json.dumps(mn["removed_fields"], indent=2))

# Choose domain workflow
domain_choice = input("\nChoose domain: criminal | regulatory | international | teaching\n> ").strip().lower()
domain_map = {
    "criminal": "Criminal",
    "regulatory": "Regulatory/Administrative",
    "international": "International",
    "teaching": "Teaching/Academia"
}
domain = domain_map.get(domain_choice, "Regulatory/Administrative")

user_matter_id = f"USER-{domain_choice[:4].upper()}-{RUN_ID[-6:]}"
facts = mn["facts_bullets"] if mn["facts_bullets"] else ["User provided minimal sanitized scenario; details missing."]

SHORT_PIPELINE = ["intake", "scope", "draft", "qa", "signoff", "audit"]

print(f"\nRunning shortened pipeline for: {user_matter_id} ({domain})")
for stage in SHORT_PIPELINE:
    try:
        instruction = STAGE_INSTRUCTIONS.get(stage, "Create a concise stage output. Return ONLY JSON.")
        result = call_claude_json_prefill(
            task=f"Chapter5_USER_{domain}_{stage}",
            stage=stage,
            matter_id=user_matter_id,
            facts=facts,
            user_instruction=instruction,
            max_tokens=max(MIN_MAX_TOKENS, 1800)
        )
        result["controls_applied"] = list(set(result.get("controls_applied", []) + [
            "user_exercise",
            "redaction_default",
            "prefill_json_enforcement",
            "schema_validation"
        ]))
        jp, tp = save_stage_outputs(user_matter_id, stage, result)
        print(f"  ✅ {stage}: saved {jp}")
    except Exception as e:
        print(f"  ❌ {stage} failed:", repr(e))
        traceback.print_exc()

print("\nUser deliverables saved under:", str(DELIVER_DIR / user_matter_id))


##10.AUDIT ARTIFACTS

###10.1.OVERVIEW

**Cell 10: Audit Readme and Final Bundle (Turning a Run into a Defensible Record)**

**What Cell 10 does**  
Cell 10 is the closing step that turns everything you did in the notebook into a single, organized package. It creates a short audit readme that explains what the run produced and how to review it, and then it bundles the entire run directory into one compressed file. This is the moment where the notebook stops being “a series of outputs” and becomes an auditable record of a controlled workflow.

**Why this step is essential for Level 5 (Organizations)**  
At an organizational level, the output is not just the final draft. The output is the chain of artifacts that supports supervision, quality control, and accountability. Without an audit bundle, you have a fragile situation: useful text exists, but you cannot easily show how it was created, what was assumed, what risks were flagged, or what needs verification. Cell 10 exists to prevent that. It is the institutional habit: preserve the run, make it reviewable, and make it reproducible.

**The audit readme: what it provides**  
The audit readme is a plain-language map of the run. It identifies the run ID, model name, and the governance artifacts that were created. It explains what each artifact is for, and it provides a short review checklist that a supervising lawyer could follow. This matters because organizational review should not depend on the original notebook author remembering what happened. The audit readme makes the run self-explanatory.

**What gets bundled and why the bundle is valuable**  
Cell 10 includes the run manifest, prompts log, risk log, dependency snapshot, and the full deliverables folder. In practical terms, this means you can hand the bundle to someone else—another lawyer, a supervisor, or an internal reviewer—and they can see what the system did without rerunning anything. The bundle also supports “replay” thinking: if someone wants to reproduce the run, the manifest and dependency snapshot provide the starting point.

**How this differs from saving a chat transcript**  
Saving a chat transcript captures conversation, but it does not capture workflow structure, stage outputs, risk aggregation, or reproducibility context. The bundle produced by Cell 10 is more like a matter file: it is organized, stage-based, and designed for review. That is the key organizational difference. The notebook is not trying to be a better chat app. It is trying to behave like a controlled legal process.

**What you should see after Cell 10 runs**  
You should see confirmation that the audit readme was created, a list of the top-level files in the run directory, and the path to the final compressed bundle. That bundle is the final product of Chapter 5: a complete mini-firm run packaged in a way that supports supervision, accountability, and safe iteration.


###10.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 10 (Code)
# Goal: AUDIT_README + zip bundle + print file list + zip path
# Output: zip path + included artifacts checklist

AUDIT_README_PATH = RUN_DIR / "AUDIT_README.txt"

audit_text = f"""
Chapter 5 — Level 5 (Organizations): Audit Readme
Run ID: {RUN_ID}
Timestamp (UTC): {now_iso()}
Model: {MODEL}

DISCLAIMER
- This is a draft-only workflow demonstration. Not legal advice.
- Human lawyer review required for any reliance-bearing use.
- verification_status is always "Not verified" by design.

ARTIFACTS
1) run_manifest.json
   - What: run configuration (model, params, environment, purpose)
   - Why: reproducibility + governance record

2) prompts_log.jsonl
   - What: one record per Claude call (REDACTED prompt + hashes)
   - Why: audit trail without storing sensitive content

3) risk_log.json
   - What: consolidated list of risk flags per matter per stage
   - Why: supervisory review + escalation

4) pip_freeze.txt
   - What: dependency snapshot
   - Why: reproducibility

5) deliverables/<matter_id>/
   - Each stage produces:
     - <stage>_output.json (structured)
     - <stage>_output.txt  (human-readable)
   - Why: traceable workflow outputs by stage

REPLAY INSTRUCTIONS
- Re-run Cells 2–10 in order.
- Ensure ANTHROPIC_API_KEY is set in Colab Secrets.
- Compare hashes in prompts_log.jsonl if needed.

REVIEW CHECKLIST (MINIMUM)
- Confirm no sensitive data was pasted.
- Review open_questions + questions_to_verify before any reliance.
- Confirm risks are understood and mitigated.
- Lawyer sign-off required before client-facing use.
""".strip()

AUDIT_README_PATH.write_text(audit_text, encoding="utf-8")

# Zip the run directory
zip_path = Path(f"{RUN_DIR}.zip")
try:
    subprocess.check_call(["bash", "-lc", f"cd {RUN_DIR.parent} && zip -qr {zip_path.name} {RUN_DIR.name}"])
except Exception as e:
    raise RuntimeError(f"Zip failed: {repr(e)}")

# Print file list (top-level)
print("Created AUDIT_README:", str(AUDIT_README_PATH))
print("\nTop-level files in run directory:")
for p in sorted(RUN_DIR.glob("*")):
    print(" -", p.name)

print("\nZip bundle:", str(zip_path))
print("\nIncluded checklist:")
print(" - run_manifest.json")
print(" - prompts_log.jsonl")
print(" - risk_log.json")
print(" - pip_freeze.txt")
print(" - deliverables/ (per matter per stage JSON + TXT)")
print(" - AUDIT_README.txt")


##11.CONCLUSIONS

**Conclusion: The Chapter 5 Mini-Firm Pipeline, Step by Step**

This notebook is a practical demonstration of what it means to use generative AI at **Level 5 (Organizations)**: not as a conversational chatbot that “answers questions,” but as a component inside a **governed legal workflow**. The central lesson is simple and non-negotiable: once AI becomes part of an organization’s work product pipeline, the relevant question is no longer “Did the model draft something useful?” The relevant question becomes “Can we explain, reproduce, supervise, and defend how this draft was produced—without compromising confidentiality, privilege, or accuracy?” Chapter 5 answers that question by building a mini-firm simulation that mirrors how real legal work should flow, stage by stage, with explicit controls and an audit trail.

**Step 1: Intake (structured capture of what we know and what we do not know)**  
The pipeline begins with intake because legal work is only as good as the facts you start with. In this stage, the notebook converts a user’s scenario into **sanitized, minimum-necessary facts** and then forces the system to generate an intake output that is not a “solution,” but a **checklist and a set of open questions**. That difference matters. A chatbot tries to be helpful by “finishing the task.” A governed pipeline must first define the task boundary and expose missing inputs. Intake produces a structured artifact that a lawyer can review quickly: the facts provided, assumptions being made, and a list of information needed before any reliance.

**Step 2: Conflicts and engagement risk checks (organizational safeguards before work begins)**  
Next, the notebook simulates the early institutional gatekeeping that many lawyers skip when using consumer chat tools. Conflicts checks, engagement boundaries, and risk triggers (such as sensitive parties or unclear representation) are not optional in organizational practice. This stage creates a conflicts/engagement checklist and highlights what must be verified internally before substantive drafting proceeds. The output is designed to support supervision: it does not “decide” representation; it enumerates what the firm must confirm.

**Step 3: Scope definition (what is in, what is out, and what is explicitly not being done)**  
Scope is where legal risk often hides, especially with AI. Without scope discipline, an AI draft can drift into advice outside the agreed task, or use assumptions as if they were facts. This stage outputs a scope statement in plain terms: deliverables, constraints, and an explicit out-of-scope list. It also forces the separation between “facts provided” and “assumptions,” which is one of the most important professional responsibility controls in AI-assisted practice. A scope artifact is not busywork; it is a guardrail.

**Step 4: Workplan (a supervised sequence with human gates and artifact expectations)**  
The workplan stage converts the matter into a plan with **human approval points** and a list of artifacts to preserve. This is the organizational shift: rather than asking the AI to jump straight into a draft, we require a structured sequence—what happens next, what decisions require human confirmation, and what to store to maintain a defensible record. The workplan stage is where the “mini-firm” becomes visible: it defines roles, steps, and stop conditions, and it makes supervision operational instead of aspirational.

**Step 5: Drafting (a controlled work product draft, never a final answer)**  
Only after intake, checks, scope, and plan do we draft. Even here, the notebook’s goal is not to produce a final legal conclusion; it is to produce a **draft work product** suitable for lawyer review. The drafting stage is deliberately constrained: it includes disclaimers, avoids invented authority, and remains “Not verified.” The key is that drafting is no longer a standalone event. It is one stage inside a system that already knows what is missing and what must be verified.

**Step 6: QA and red-team (stress testing before anyone relies on the output)**  
Quality assurance is where Level 5 differs sharply from casual chatbot use. This stage forces the system to identify weakest links, missing facts, tone risks, and prompt-injection style vulnerabilities. Importantly, the red-team pass is not an abstract lecture; it is an output artifact that lists concrete failure modes and suggested fixes. In organizational use, QA is not a luxury. It is how you prevent plausible-sounding drafts from becoming unreviewed reliance.

**Step 7: Sign-off package (what a supervising lawyer must review and verify)**  
In a real firm, the last mile is not “make the draft prettier.” The last mile is preparing a sign-off package that makes review efficient and responsible. This stage produces a structured review set: what must be checked, what must be verified, what can be safely communicated to a client, and what must remain tentative. This stage reinforces the central rule: the lawyer owns the final output, and the notebook is designed to make that ownership manageable rather than overwhelming.

**Step 8: Audit bundle (reproducibility, traceability, and organizational memory)**  
Finally, the pipeline generates an audit index and bundles the run. This is the organizational maturity move that most chatbot usage lacks. The notebook records a manifest of the environment and model, a prompts log (redacted), a risk log, and staged deliverables. The point is not to create paperwork for its own sake; it is to enable accountability. If a question arises later—what was asked, what was assumed, what was produced, and what risks were identified—this pipeline can answer it.

**Why the pipeline works (and why it must be different for legal use)**  
A consumer chatbot optimizes for conversational helpfulness. This pipeline optimizes for supervised legal operations: minimal sensitive input, structured uncertainty, explicit verification tasks, and retained artifacts. That is the core of Chapter 5: a mini-firm that transforms AI from a “clever text generator” into a governed workflow component. In practice, this is how legal organizations can use AI safely: not by trusting the model more, but by designing the process so trust is never required.
