#**AI LAW, CHAPTER 4. INNOVATORS**

---

##0.REFERENCE

https://claude.ai/share/cf46f461-99e4-4dc5-a885-3591d79c07ba

##1.CONTEXT

**Introduction to Chapter 4: Building Reusable Legal Assets with Governance**

**What You Already Know About AI**

If you have used ChatGPT, Claude, or any chatbot application on your phone or computer, you already understand the basic idea of conversational artificial intelligence. You type a question or request, the AI responds with text, and you can continue the conversation back and forth. These tools are remarkably helpful for brainstorming ideas, explaining concepts, drafting emails, or getting quick answers to questions. The experience feels natural and effortless, almost like texting with a knowledgeable friend who is always available.

**Why Legal Practice Requires Something Completely Different**

However, using AI for actual legal work requires a fundamentally different approach than casual chatbot conversations. When you chat with an AI for personal use, the stakes are low. If it gives you a mediocre recipe suggestion or explains a concept imperfectly, nothing serious happens. You simply ignore the bad advice and try again. But legal practice operates under entirely different constraints that make casual chatbot usage completely inappropriate and potentially dangerous.

First, lawyers have ethical obligations to provide competent representation. If you rely on AI-generated work product, you must be able to verify its accuracy, understand its limitations, and take responsibility for its content. A chatbot conversation that disappears after you close the browser cannot meet these requirements. You need documentation showing what you asked, what the AI produced, what risks were identified, and what human review occurred.

Second, lawyers handle confidential and privileged information. Typing sensitive client facts into a consumer chatbot may violate your ethical duty to protect confidentiality. Even if the AI provider promises not to store your data, the risk of accidental exposure is unacceptable. You need systems that actively protect private information through techniques like redaction before any data leaves your control.

Third, legal work often gets reused and shared. When you create a clause, checklist, policy, or playbook, other lawyers in your firm might use it for different clients and matters. This reuse amplifies any errors or problems in the original work. A mistake in a casual chatbot conversation affects only you. A mistake in a reusable legal asset might affect dozens of clients over many years. This multiplied risk requires proportionally stronger quality assurance.

Fourth, lawyers must maintain audit trails for professional responsibility and malpractice defense. If a client sues you, or if a disciplinary authority investigates your conduct, you need contemporaneous records showing what you did and why. A chatbot conversation that leaves no permanent record provides no protection. You need comprehensive logging of inputs, outputs, decisions, and identified risks.

Fifth, legal work increasingly faces regulatory scrutiny around AI use. Bar associations, courts, and regulators are developing rules about appropriate AI use in legal practice. Demonstrating compliance requires documentation that consumer chatbots simply do not provide. You need governance artifacts showing that proper safeguards were in place.

**What This Notebook Does Differently**

This notebook transforms casual AI interaction into professional-grade legal workflow by adding multiple layers of infrastructure that address each concern described above. Instead of typing into a chatbot and hoping for good results, you execute a structured pipeline that generates assets, tests them adversarially, revises them based on test results, and packages them with comprehensive documentation for human review.

The notebook creates permanent records of every AI interaction, but stores them in redacted form to protect confidentiality. It automatically flags risks like missing disclaimers or potential hallucinations, aggregating these warnings for systematic review. It generates version-controlled assets so you can track changes from initial draft through revision. It produces human review checklists specifying exactly what a lawyer must verify before using any output. It builds complete audit packages with manifests, logs, statistics, and governance documentation suitable for long-term archival.

Most importantly, the notebook never pretends that AI outputs are ready for immediate use. Every asset explicitly states it is a draft requiring human lawyer review. Every asset has verification status set to "Not verified" acknowledging that a human attorney must confirm accuracy. Every release package includes a checklist of items requiring human verification. The system treats AI as a drafting assistant that accelerates work, not as an autonomous decision-maker that replaces professional judgment.

**Why Chapter 4 Focuses on Reusable Assets**

The progression through the book's chapters reflects increasing sophistication in AI use for legal practice. Earlier chapters might have focused on one-time tasks or simple document analysis. Chapter 4 addresses reusable assets, which represent a higher level of practice maturity that this book calls the "Innovators" level.

Reusable assets are legal work products designed for repeated use across multiple matters: clause libraries that provide tested contract language, playbooks that guide lawyers through complex procedures, checklists that ensure consistent intake or review processes, teaching modules that train students or junior lawyers, and policy templates that can be adapted to different organizational contexts. These assets provide enormous value because the investment in creating and testing them pays dividends across many subsequent uses.

However, reusability also creates what the notebook calls increased "blast radius." If a flawed asset gets reused twenty times, that single flaw affects twenty matters. If a poorly tested playbook guides decision-making in fifty cases, all fifty cases inherit any weaknesses in that playbook. This multiplied risk means governance must scale proportionally. You cannot afford to be casual about quality assurance when creating something that will be reused extensively.

Chapter 4 implements this scaled governance through systematic adversarial testing. After generating each asset, the notebook creates tests designed to break it, stress it, and find its weaknesses. These include adversarial tests simulating hostile users, edge case tests exploring unusual scenarios, ambiguity tests checking behavior with unclear facts, prompt injection tests attempting to manipulate the asset, and consistency tests verifying internal coherence. Running these tests before any human even sees the asset provides an early quality filter that catches many problems automatically.

The chapter also implements proper release management. Assets move from a v0.1 draft, through testing and revision, to v0.2, and finally to a release package with an explicit readiness assessment. Each version is preserved, creating clear lineage. The release manifest specifies what human reviews are required, what deployment constraints apply, and what verification questions must be answered. This structured progression prevents premature deployment of untested work.

**What You Will Learn**

By working through this notebook, you will understand how to transform AI from a casual chatbot into a governed professional tool. You will see how redaction protects confidential information before it reaches the AI. You will experience how structured prompting and technical techniques like prefill enforcement ensure reliable outputs. You will observe systematic testing revealing asset weaknesses that human review might miss. You will examine comprehensive governance artifacts that create audit trails for professional accountability.

More fundamentally, you will internalize a professional mindset about AI use in legal practice. AI is powerful but requires careful handling. Outputs are helpful but need verification. Automation increases efficiency but does not eliminate professional responsibility. Technology amplifies capability but also amplifies risk. Proper use requires infrastructure, discipline, and sustained attention to governance.

**Who Should Use This Notebook**

This notebook is designed for United States-based practicing lawyers with minimal AI background. You do not need to understand the technical details of how large language models work. You do not need programming expertise beyond running cells in Google Colab. You do not need previous experience with the Anthropic API or Claude models.

What you do need is appreciation for professional responsibility in legal practice, understanding that AI outputs require human verification, willingness to examine governance artifacts and audit trails, and commitment to following proper procedures rather than cutting corners for convenience.

**The Path Forward**

As you proceed through the ten sections of this notebook, you will build increasingly sophisticated infrastructure culminating in a complete asset development and release pipeline. Each section adds capability while maintaining all previous safeguards. By the end, you will have executed the pipeline on four demonstration cases, created your own custom asset interactively, and generated a comprehensive audit bundle documenting everything that occurred.

This is not the easiest way to use AI for legal work. It would be much simpler to just type questions into ChatGPT and copy the responses. But simple is not the same as appropriate. Legal practice demands more than convenience. It demands competence, confidentiality, quality assurance, accountability, and documented compliance with professional standards. This notebook shows you how to meet those demands while still benefiting from AI's remarkable capabilities. The complexity you will encounter reflects the seriousness with which the legal profession must approach these powerful new tools.

##2.LIBRARIES AND ENVIRONMENT

In [14]:
# Install dependencies and create run directory

!pip install -q anthropic

import os
import json
import re
import hashlib
from datetime import datetime
from pathlib import Path
import shutil

# Create timestamped run directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
RUN_DIR = Path(f"/content/ai_law_ch4_runs/run_{timestamp}")
RUN_DIR.mkdir(parents=True, exist_ok=True)

DELIVERABLES_DIR = RUN_DIR / "deliverables"
DELIVERABLES_DIR.mkdir(exist_ok=True)

print(f"✅ Run directory created: {RUN_DIR}")
print(f"✅ Deliverables directory: {DELIVERABLES_DIR}")
print(f"✅ Timestamp: {timestamp}")

✅ Run directory created: /content/ai_law_ch4_runs/run_20260108_131043
✅ Deliverables directory: /content/ai_law_ch4_runs/run_20260108_131043/deliverables
✅ Timestamp: 20260108_131043


##3.API SETUP AND CLIENT INITIALIZATION

###3.1.OVERVIEW

**API Key Setup and Client Initialization**

This section establishes the connection between your Google Colab notebook and the Anthropic API service. Think of it as setting up a phone line before making a call - you need the right credentials and connection details to communicate with Claude.

**What Happens in This Section**

First, the notebook retrieves your Anthropic API key from Google Colab's secure storage system called "Secrets". This is similar to retrieving a password from a password manager rather than writing it directly in your code. The key acts as your authorization credential, proving you have permission to use the Claude API service.

Next, the system stores this key in an environment variable. Environment variables are temporary storage locations that programs can access during their execution. This makes the key available to other parts of the notebook without repeatedly typing it.

Then, the code creates a "client" object using the Anthropic library. The client is your communication interface - it handles all the technical details of sending requests to Claude and receiving responses. Without this client, your notebook cannot interact with the AI model.

Finally, the section specifies which Claude model to use. This notebook uses Claude Haiku 4.5, identified in the code as claude-haiku-4-5-20251001. Different models have different capabilities, speeds, and costs. Haiku is designed for efficiency while maintaining high-quality output, making it suitable for production legal workflows where you need reliable performance.

**Why This Matters for Legal Practice**

For lawyers using AI tools, proper API initialization is a governance requirement, not just a technical step. The approach here demonstrates several best practices. First, keeping API keys in secure storage rather than hardcoding them prevents accidental exposure if you share the notebook. Second, explicitly declaring which model version you use creates an audit trail - six months later, you can verify exactly which AI system generated a particular output. Third, the error handling ensures you receive clear feedback if something goes wrong during setup, rather than mysterious failures later in the workflow.

**What You See When Running**

When this section executes successfully, you will see three confirmation messages indicating the API key loaded correctly, displaying the model name, and confirming the client initialized. If there is a problem, you will see an error message directing you to add your API key to Colab Secrets using the key icon in the sidebar. This immediate feedback helps you catch configuration issues before attempting to generate any legal assets.

**Connection to the Overall Workflow**

Everything that follows in this notebook depends on this initialization. Without a properly configured client and model specification, the asset generation pipeline cannot function. This section is the foundation that enables all subsequent governance-tracked AI interactions.

###3.2.CODE AND IMPLEMENTATION

In [8]:
# API key setup and client initialization

import anthropic
from google.colab import userdata

try:
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
    os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY

    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    MODEL = "claude-haiku-4-5-20251001"

    print("✅ API key loaded successfully")
    print(f"✅ Model: {MODEL}")
    print(f"✅ Client initialized")

except Exception as e:
    print(f"❌ Error loading API key: {e}")
    print("Please add ANTHROPIC_API_KEY to Colab Secrets (🔑 icon in sidebar)")

✅ API key loaded successfully
✅ Model: claude-haiku-4-5-20251001
✅ Client initialized


##4.GOVERNANCE UTILITIES

###4.1.OVERVIEW

**Governance Utilities: Manifest, Logging, and Risk Tracking**

This section creates the infrastructure for tracking, documenting, and auditing every action the notebook performs. Think of it as setting up a detailed filing system before beginning a complex legal matter - you establish the record-keeping framework first, then populate it as work progresses.

**Creating the Run Manifest**

The notebook begins by creating a run manifest, which is a master record document for this specific execution session. The manifest captures essential metadata including a unique run identifier based on the timestamp, the chapter and purpose of the notebook, which AI model is being used, and when the session started. This manifest serves as the table of contents for your entire audit package. Later, when reviewing outputs or responding to questions about how an asset was created, you can refer back to this manifest to understand the context.

**Initializing the Prompts Log**

Next, the system creates a prompts log file using a format called JSONL, which stands for JSON Lines. This log will record every interaction with the Claude API throughout the notebook execution. Critically, all logged content is redacted before storage, meaning personally identifiable information is removed. Each log entry includes a timestamp, a cryptographic hash of the prompt for verification purposes, the redacted prompt text, the redacted response, and how many risks were flagged. This creates a complete but privacy-protected audit trail of all AI interactions.
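
As a rough illustration of what one log entry might contain, the sketch below builds and appends a single record. It assumes SHA-256 as the hash and hashes the already-redacted prompt; the helper names `build_log_entry` and `append_jsonl` and all field names are hypothetical, not taken from the notebook's wrapper.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_log_entry(redacted_prompt, redacted_response, risk_count):
    """Assemble one audit record. Field names are illustrative assumptions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # The hash lets a reviewer confirm two entries used the same prompt text
        "prompt_hash": hashlib.sha256(redacted_prompt.encode("utf-8")).hexdigest(),
        "prompt_redacted": redacted_prompt,
        "response_redacted": redacted_response,
        "risks_flagged": risk_count,
    }

def append_jsonl(path, entry):
    """JSONL stores one JSON object per line, so logging is a plain append."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

entry = build_log_entry(
    "Draft a checklist for [EMAIL_REDACTED]",  # fake, already-redacted prompt
    '{"task": "demo"}',
    0,
)
print(entry["prompt_hash"][:12], entry["risks_flagged"])
```

Because each line is an independent JSON object, a partially written log remains readable up to the last complete line, which is one reason append-only JSONL suits audit trails.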

**Setting Up Risk Tracking**

The notebook then initializes a risk log as a centralized repository for all identified risks across every API call. As the notebook executes, any time the system detects potential issues - such as missing disclaimers, possible hallucinations, or overconfident language - these observations are logged here with their severity level and explanatory notes. This aggregation allows you to review all risks in one location rather than searching through individual case files.
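
The aggregation idea can be sketched in a few lines. The example below assumes each risk is stored as a type, severity, and note, matching the shape used in the JSON schema later in the notebook; the `record_risk` helper and the extra `source` field are hypothetical additions for illustration.

```python
ALLOWED_SEVERITIES = {"low", "medium", "high"}

def record_risk(risk_log, risk_type, severity, note, source="unspecified"):
    """Append one risk to the in-memory log and return the new total."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    risk_log["risks"].append(
        {"type": risk_type, "severity": severity, "note": note, "source": source}
    )
    return len(risk_log["risks"])

# Fake run with two fake risks
demo_log = {"run_id": "demo", "risks": []}
record_risk(demo_log, "missing_disclaimer", "high", "Draft disclaimer not found", source="case_1")
record_risk(demo_log, "possible_hallucination", "medium", "Output mentions a statute")

# One central list makes it easy to pull out what needs attention first
high_risks = [r for r in demo_log["risks"] if r["severity"] == "high"]
print(len(demo_log["risks"]), len(high_risks))
```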

**Capturing the Environment**

The system also saves a complete list of installed Python packages and their versions using pip freeze. This technical snapshot ensures reproducibility. If you need to recreate these exact results months later, or if you need to troubleshoot unexpected behavior, you can verify the software environment matches what was used originally.

**Global Statistics Tracker**

Finally, the section initializes a statistics tracker that counts key metrics throughout execution: total API calls made, total risks logged, cases that succeeded or failed, and pipeline stages that succeeded or failed. These statistics provide quantitative insight into the notebook's performance and help identify patterns or problems.

**Why This Infrastructure Matters**

For legal practice, this governance infrastructure addresses a fundamental challenge when using AI tools: demonstrating due diligence and maintaining accountability. If opposing counsel questions how an asset was created, or if a supervising attorney needs to verify proper procedures were followed, these logs provide documentary evidence. The manifest shows what was attempted, the prompts log shows what was sent and received, the risk log shows what concerns were identified, and the statistics show overall performance.

**Visible Output**

When this section runs, you see confirmation messages displaying the file paths where each governance artifact was created. These paths show you exactly where to find the manifest, prompts log, risk log, and environment snapshot. This transparency ensures you know where your audit materials are stored from the very beginning of the process.

###4.2.CODE AND IMPLEMENTATION

In [9]:
# Governance utilities: manifest, logging, risk tracking

# Initialize run manifest
run_manifest = {
    "run_id": timestamp,
    "chapter": "4_innovators",
    "model": MODEL,
    "start_time": datetime.now().isoformat(),
    "run_directory": str(RUN_DIR),
    "purpose": "Reusable legal asset creation with adversarial testing and release pipeline"
}

manifest_path = RUN_DIR / "run_manifest.json"
with open(manifest_path, 'w') as f:
    json.dump(run_manifest, f, indent=2)

# Initialize prompts log (JSONL format)
prompts_log_path = RUN_DIR / "prompts_log.jsonl"
prompts_log_path.touch()

# Initialize risk log
risk_log = {
    "run_id": timestamp,
    "risks": []
}
risk_log_path = RUN_DIR / "risk_log.json"
with open(risk_log_path, 'w') as f:
    json.dump(risk_log, f, indent=2)

# Save pip freeze for reproducibility
pip_freeze_path = RUN_DIR / "pip_freeze.txt"
!pip freeze > {pip_freeze_path}

print(f"✅ Run manifest: {manifest_path}")
print(f"✅ Prompts log: {prompts_log_path}")
print(f"✅ Risk log: {risk_log_path}")
print(f"✅ Pip freeze: {pip_freeze_path}")

# Global stats tracker
stats = {
    "total_api_calls": 0,
    "total_risks_logged": 0,
    "cases_success": 0,
    "cases_fail": 0,
    "stages_success": 0,
    "stages_fail": 0
}

✅ Run manifest: /content/ai_law_ch4_runs/run_20260108_123554/run_manifest.json
✅ Prompts log: /content/ai_law_ch4_runs/run_20260108_123554/prompts_log.jsonl
✅ Risk log: /content/ai_law_ch4_runs/run_20260108_123554/risk_log.json
✅ Pip freeze: /content/ai_law_ch4_runs/run_20260108_123554/pip_freeze.txt


##5.REDACTION AND MINIMUM NECESSARY INTAKE UTILITIES


###5.1.OVERVIEW

**Redaction and Minimum-Necessary Intake Utilities**

This section implements privacy protection mechanisms that prevent sensitive client information from being inadvertently exposed during AI interactions. Think of it as establishing attorney-client privilege safeguards before handling confidential materials - you create protective barriers first, then work within those boundaries.

**Understanding the Redaction Function**

The redaction function scans text for common patterns of personally identifiable information and replaces them with placeholder labels. Specifically, it searches for email addresses, telephone numbers in common United States formats, Social Security numbers, and street addresses. When it finds these patterns, it substitutes labels such as [EMAIL_REDACTED] or [PHONE_REDACTED]. The function also tracks what types of information it removed, returning both the cleaned text and a list of redacted categories.

**Critical Limitations**

The notebook explicitly warns that redaction is imperfect and operates on a best-effort basis. Pattern-based redaction cannot catch everything. Names without accompanying identifiers, case-specific facts that could identify individuals, or sensitive information in unusual formats may pass through undetected. This is why the notebook includes prominent warnings: do not paste actual sensitive client data into the system. The redaction layer provides a safety net, not a guarantee.

**Why Pattern-Based Redaction**

The approach uses regular expressions, which are text-matching patterns, to identify common information formats. For example, an email pattern looks for text shaped like name@domain.com: a run of characters, an at sign, a domain, a dot, and a suffix of at least two letters. A phone number pattern looks for three digits, an optional separator, three digits, an optional separator, and four digits. This method catches standard formats reliably but cannot understand context or identify information presented in non-standard ways.
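
To make the limitation concrete, the short sketch below runs the same phone pattern used in the code cell against four fake strings. The sample strings are invented for illustration; only the pattern itself comes from the notebook.

```python
import re

# Same phone pattern as the notebook's redact() function
PHONE = re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")

samples = [
    "Call 555-123-4567",                  # hyphen-separated
    "Call 555.123.4567",                  # dot-separated
    "Call (555) 123-4567",                # parentheses and a space
    "Call five five five, one two three", # spelled out
]

results = [bool(PHONE.search(s)) for s in samples]
for matched, sample in zip(results, samples):
    print("caught" if matched else "MISSED", "|", sample)
```

The last two strings pass through untouched, which is why the redaction layer is a safety net rather than a guarantee.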

**Minimum-Necessary Fields Function**

The section also includes a utility for filtering data dictionaries to keep only required fields. This implements the principle of data minimization - only sending the minimum information necessary to accomplish the task. If you have a data structure with twenty fields but only need five for a particular API call, this function strips away the unnecessary fifteen, reducing exposure risk.
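
A brief usage sketch follows. It restates the one-line filter from the code cell so the snippet runs on its own; the intake record and its field names are fake and purely illustrative.

```python
def minimum_necessary_fields(data_dict, required_fields):
    """Keep only required fields from input dict (same logic as the notebook's helper)."""
    return {k: v for k, v in data_dict.items() if k in required_fields}

# Fake intake record: only two of the five fields are needed for drafting
intake = {
    "matter_type": "commercial lease review",
    "jurisdiction": "Not specified",
    "client_name": "Jane Roe",
    "client_email": "jane.roe@example.com",
    "billing_code": "X-100",
}

payload = minimum_necessary_fields(intake, {"matter_type", "jurisdiction"})
print(payload)
```

Filtering before redaction means identifying fields never reach the pattern matcher at all, so a missed pattern cannot leak them.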

**The Demonstration**

The section runs a live demonstration using fake data that includes an email address, phone number, street address, and Social Security number. You see the original text, then the redacted version with placeholders, and finally a summary of what was removed. This concrete example helps you understand exactly what the redaction function does and does not catch.

**Implications for Legal Practice**

For lawyers, this section addresses a fundamental tension when using AI tools: you need enough factual context for useful outputs, but you must protect confidential and privileged information. The redaction approach here represents one strategy - automatic scrubbing of obvious identifiers before any data leaves your control. However, the notebook's warnings emphasize that technology alone cannot ensure confidentiality. Sound judgment about what information to include remains essential.

**Warning Messages**

The section concludes with a clear warning that appears every time it runs, reminding you that redaction is imperfect. This repeated warning serves an important function: it keeps the limitation front of mind rather than letting you become complacent. The system does not hide its limitations.

###5.2.CODE AND IMPLEMENTATION

In [10]:
# Redaction and minimum-necessary intake utilities

def redact(text):
    """Best-effort redaction of PII. NOT PERFECT - do not paste sensitive data."""
    if not text:
        # Return the same (text, removed) pair as the normal path so callers can always unpack
        return text, []

    redacted = text
    removed = []

    # Email addresses
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    if re.search(email_pattern, redacted):
        redacted = re.sub(email_pattern, '[EMAIL_REDACTED]', redacted)
        removed.append('emails')

    # Phone numbers (US format)
    phone_pattern = r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    if re.search(phone_pattern, redacted):
        redacted = re.sub(phone_pattern, '[PHONE_REDACTED]', redacted)
        removed.append('phone_numbers')

    # SSN format
    ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'
    if re.search(ssn_pattern, redacted):
        redacted = re.sub(ssn_pattern, '[SSN_REDACTED]', redacted)
        removed.append('ssn')

    # Street addresses (best effort)
    address_pattern = r'\b\d{1,5}\s+[A-Z][a-z]+\s+(Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Lane|Ln|Drive|Dr)\b'
    if re.search(address_pattern, redacted, re.IGNORECASE):
        redacted = re.sub(address_pattern, '[ADDRESS_REDACTED]', redacted, flags=re.IGNORECASE)
        removed.append('addresses')

    return redacted, removed

def minimum_necessary_fields(data_dict, required_fields):
    """Keep only required fields from input dict."""
    return {k: v for k, v in data_dict.items() if k in required_fields}

# Demo with fake data
demo_text = """
Client John Doe contacted us at john.doe@example.com or 555-123-4567.
His address is 123 Main Street, and SSN is 123-45-6789.
Meeting scheduled for next Tuesday.
"""

redacted_demo, removed_items = redact(demo_text)

print("BEFORE REDACTION:")
print(demo_text)
print("\n" + "="*60 + "\n")
print("AFTER REDACTION:")
print(redacted_demo)
print("\n" + "="*60 + "\n")
print(f"Removed fields: {removed_items}")
print("\n⚠️  WARNING: Redaction is imperfect. Do NOT paste sensitive client data.")

BEFORE REDACTION:

Client John Doe contacted us at john.doe@example.com or 555-123-4567.
His address is 123 Main Street, and SSN is 123-45-6789.
Meeting scheduled for next Tuesday.



AFTER REDACTION:

Client John Doe contacted us at [EMAIL_REDACTED] or [PHONE_REDACTED].
His address is [ADDRESS_REDACTED], and SSN is [SSN_REDACTED].
Meeting scheduled for next Tuesday.



Removed fields: ['emails', 'phone_numbers', 'ssn', 'addresses']



##6.CLAUDE WRAPPER

###6.1.OVERVIEW

**Critical: Claude JSON API with Prefill Enforcement**

This section implements the most technically sophisticated and important component of the entire notebook: a reliable method for obtaining structured JSON outputs from Claude. Previous chapters encountered serious failures where the AI model wrapped JSON in explanatory text or markdown formatting, breaking the parsing process. This section solves that problem using a technique called prefill enforcement combined with multiple fallback strategies.

**The Core Problem**

Large language models are conversational by nature. When you ask them for JSON, they often want to be helpful by explaining what they are doing, adding context, or formatting the output nicely with code fences. For a human reader, this is friendly. For a computer parser expecting pure JSON, this breaks everything. The challenge is forcing the model into a pure structured output mode without triggering its conversational instincts.

**The Prefill Solution**

Prefill is an advanced API technique where you provide the beginning of the assistant's response as part of your request. In this implementation, the system sends the opening brace character as a prefilled assistant message. The model then continues from that point, completing the JSON object. Because it starts mid-JSON, it cannot add preamble text before the opening brace. This dramatically increases reliability because the model's first token is already committed to being part of the JSON structure.

**Why This Works**

The prefill technique leverages how the API processes messages. When you include an assistant message with partial content, the model treats that as already-generated output and continues from there. By starting with an opening brace, you force the model into completion mode rather than explanation mode. The model's training makes it highly likely to complete valid JSON once it has started the structure.
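
The message shape behind prefill can be sketched without calling the API. The example below assumes the model in use accepts a trailing assistant turn and that the response contains only the continuation, so the opening brace has to be re-attached before parsing. The helper names `build_prefill_messages` and `reassemble` are hypothetical, and the model's reply is simulated.

```python
import json

PREFILL = "{"  # the assistant turn is pre-seeded with the opening brace

def build_prefill_messages(user_prompt, prefill=PREFILL):
    """Return a messages list whose last turn is a partial assistant reply."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]

def reassemble(continuation, prefill=PREFILL):
    """Re-attach the prefill, since only the continuation comes back."""
    return prefill + continuation

messages = build_prefill_messages("Create a client intake checklist.")

# With a real client, this list would be passed as the `messages` argument of
# client.messages.create(...). Here the continuation is a made-up stand-in.
simulated_continuation = '"task": "intake checklist", "verification_status": "Not verified"}'
parsed = json.loads(reassemble(simulated_continuation))
print(parsed["task"])
```

Forgetting the re-attachment step is an easy mistake: the raw continuation on its own is not valid JSON because its opening brace lives in the request, not the response.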

**Fallback Extraction Strategies**

Despite prefill's effectiveness, the notebook implements four additional extraction strategies as safety nets. Strategy one attempts to parse the text directly as received. Strategy two finds the first opening brace and last closing brace, extracting everything between them. Strategy three strips markdown code fence markers that sometimes appear. Strategy four performs bracket-balancing scanning to find the first complete JSON object in the text. These layers ensure that even if prefill occasionally fails, the system can still recover valid JSON.
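
Strategies three and four can be sketched as follows. This is one plausible implementation under stated assumptions, not the notebook's exact code: the function names `strip_code_fences` and `first_balanced_object` are hypothetical, and the scanner tracks string literals so that braces inside quoted text are not counted.

```python
import json

FENCE = "`" * 3  # markdown code fence marker, built here to keep the source readable

def strip_code_fences(text):
    """Drop a leading fence line (with optional language tag) and a trailing fence."""
    text = text.strip()
    if text.startswith(FENCE):
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
    if text.rstrip().endswith(FENCE):
        text = text.rstrip()[: -len(FENCE)]
    return text.strip()

def first_balanced_object(text):
    """Return the first brace-balanced {...} substring, or None if there is none."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None

fenced = FENCE + "json\n" + '{"a": 1}' + "\n" + FENCE
noisy = 'Here you go: {"a": {"b": "}"}} and a second object {"c": 2}'

print(json.loads(strip_code_fences(fenced)))
print(json.loads(first_balanced_object(noisy)))
```

The second sample shows why strategy four exists alongside strategy two: slicing from the first opening brace to the last closing brace would span both objects and fail to parse.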

**Schema Validation**

After successfully parsing JSON, the system validates that all required top-level keys are present. The schema specifies exactly what structure the JSON must have: task description, facts provided, assumptions made, open questions, asset details, tests, release readiness assessment, risks, verification status, and questions requiring verification. If any required key is missing, the validation fails and the system attempts a retry.
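
A minimal sketch of the top-level key check is shown below. The ten key names are taken from the schema in the system prompt; the `missing_keys` helper is a hypothetical name, and the check deliberately looks only at top-level keys, not nested structure.

```python
REQUIRED_KEYS = {
    "task", "facts_provided", "assumptions", "open_questions", "asset",
    "tests", "release_readiness", "risks", "verification_status",
    "questions_to_verify",
}

def missing_keys(parsed):
    """Return a sorted list of required top-level keys absent from parsed."""
    if not isinstance(parsed, dict):
        return sorted(REQUIRED_KEYS)
    return sorted(REQUIRED_KEYS - set(parsed))

partial = {"task": "demo", "asset": {}, "risks": []}
print(missing_keys(partial))
```

An empty list means validation passed; anything else can be reported in the retry prompt or the risk log.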

**Automatic Risk Flagging**

The system automatically scans the returned JSON for concerning patterns. If the response contains words like cite, statute, case, or regulation, it flags a potential hallucination risk because the model should never invent legal authorities. If the verification status is not set to "Not verified", it flags overconfidence. If the asset content lacks required disclaimers about being a draft and not legal advice, it flags missing safeguards. These automatic checks catch common failure modes without requiring manual review of every output.
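
The three checks can be sketched roughly as below. The trigger words and the disclaimer text come from the notebook; the `flag_risks` name, the risk type labels, and the severity assigned to each check are assumptions made for illustration.

```python
import json
import re

DISCLAIMER = "This is a draft only. Not legal advice. Human lawyer review required."
AUTHORITY_WORDS = re.compile(r"\b(cite|statute|case|regulation)\b", re.IGNORECASE)

def flag_risks(parsed):
    """Return a list of risk dicts for one parsed response."""
    risks = []
    # Check 1: words that suggest the model may be referring to legal authority
    if AUTHORITY_WORDS.search(json.dumps(parsed)):
        risks.append({"type": "possible_hallucination", "severity": "medium",
                      "note": "Authority-like wording found; confirm nothing was invented"})
    # Check 2: the model must never claim its own output is verified
    if parsed.get("verification_status") != "Not verified":
        risks.append({"type": "overconfidence", "severity": "high",
                      "note": "verification_status is not 'Not verified'"})
    # Check 3: the required draft disclaimer must appear in the asset content
    if DISCLAIMER not in parsed.get("asset", {}).get("content", ""):
        risks.append({"type": "missing_disclaimer", "severity": "high",
                      "note": "Required draft disclaimer missing from asset content"})
    return risks

clean = {"asset": {"content": "Intake steps. " + DISCLAIMER},
         "verification_status": "Not verified"}
risky = {"asset": {"content": "Under the statute, tenants must give notice."},
         "verification_status": "Verified"}

print(len(flag_risks(clean)), [r["type"] for r in flag_risks(risky)])
```

A keyword match is a blunt instrument: an ordinary word such as "case" will trigger the flag even when no authority is cited, so these flags are prompts for human review rather than findings.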

**Retry Logic with Progressive Strictness**

If the first attempt fails, the system makes up to three total attempts with increasing strictness. The first attempt uses a temperature of 0.1, allowing slight variation. Subsequent attempts use a temperature of 0, making the model more deterministic, and add an explicit prefix demanding JSON only with no other text. This progressive approach gives the model multiple chances while tightening constraints each time.
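
The retry schedule can be sketched with a stand-in for the API call. In the example below, `call_with_retries` and `fake_model` are hypothetical names, the strict prefix wording is invented, and the sketch raises after the final failure for brevity, whereas the notebook returns an error schema instead.

```python
import json

def call_with_retries(call_model, prompt, max_attempts=3):
    """Try up to max_attempts times, tightening settings after the first failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        first = attempt == 1
        temperature = 0.1 if first else 0.0
        text = prompt if first else "Return JSON only. No other text.\n\n" + prompt
        try:
            return call_model(text, temperature), attempt
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed: {last_error}")

def fake_model(prompt, temperature):
    """Stand-in for an API call: chatty above temperature 0, clean at 0."""
    raw = 'Sure! {"ok": true}' if temperature > 0 else '{"ok": true}'
    return json.loads(raw)

result, attempts_used = call_with_retries(fake_model, "Draft an intake checklist")
print(result, attempts_used)
```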

**Error Fallback Schema**

If all three attempts fail, rather than crashing the notebook, the system returns a valid error schema that matches the expected structure. This error schema has task set to "error fallback", includes the failure message in the risks array with high severity, and sets release readiness to needs revision. This approach ensures the pipeline can continue even after failures, documenting what went wrong rather than halting execution.
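
A fallback object of this kind might look like the sketch below. The task value, the high-severity risk, and the needs_revision status follow the description above; every other field value, the risk type label, and the `error_fallback` name are assumptions chosen so the object still satisfies the ten-key schema.

```python
DISCLAIMER = "This is a draft only. Not legal advice. Human lawyer review required."

def error_fallback(message):
    """Build a schema-shaped placeholder so downstream stages keep running."""
    return {
        "task": "error fallback",
        "facts_provided": [],
        "assumptions": [],
        "open_questions": [],
        "asset": {
            "asset_type": "checklist",  # placeholder value; no asset was produced
            "title": "Generation failed",
            "version": "v0.1",
            "intended_use": "None",
            "not_intended_for": "Any use",
            "content": "No asset generated. " + DISCLAIMER,
        },
        "tests": [],
        "release_readiness": {
            "status": "needs_revision",
            "required_human_review": ["Investigate generation failure"],
            "deployment_constraints": ["Do not deploy"],
        },
        "risks": [{"type": "generation_failure", "severity": "high", "note": message}],
        "verification_status": "Not verified",
        "questions_to_verify": [],
    }

fallback = error_fallback("JSON parsing failed after 3 attempts")
print(fallback["release_readiness"]["status"], fallback["risks"][0]["severity"])
```

Returning a well-formed object instead of raising means the failure shows up in the risk log and release manifest like any other finding, rather than as a stack trace that halts the run.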

**Logging and Risk Tracking**

Every successful API call triggers logging to the prompts log file and updates to the risk log. The system redacts both the prompt and response before logging, applies cryptographic hashing to the prompt for verification purposes, and records how many risks were flagged. This creates the audit trail that governance requires while protecting confidential information.

**The Smoke Test**

The section concludes with a smoke test that validates the entire prefill mechanism before the main pipeline runs. It sends a simple prompt requesting a client intake checklist, then verifies the response has the asset key, verification status equals "Not verified", and content includes the draft disclaimer. This immediate validation catches configuration problems early rather than discovering them deep into pipeline execution.

**Why This Complexity Matters**

For legal practitioners, this technical infrastructure might seem excessive. However, reliability is not optional when generating reusable legal assets. If the system produces malformed outputs that require manual intervention, it defeats the purpose of automation. If it occasionally invents citations or omits disclaimers, it creates liability exposure. The elaborate defensive programming here ensures consistent, governable, auditable outputs suitable for professional legal practice.

###6.2.CODE AND IMPLEMENTATION

In [11]:
# CRITICAL: Claude JSON API with PREFILL enforcement

SYSTEM_PROMPT = """Output ONLY JSON. Start with { and end with }. No other text.

Return JSON with EXACTLY these keys (no extras):
{
  "task": "...",
  "facts_provided": [...],
  "assumptions": [...],
  "open_questions": [...],
  "asset": {
    "asset_type": "clause|playbook|checklist|template|teaching_module|test_suite",
    "title": "...",
    "version": "v0.1",
    "intended_use": "...",
    "not_intended_for": "...",
    "content": "..."
  },
  "tests": [
    {
      "test_name": "...",
      "test_type": "adversarial|edge_case|ambiguity|prompt_injection|counterparty_read|consistency",
      "test_input": "...",
      "expected_behavior": "...",
      "observed_risks": [{"type": "...", "severity": "low|medium|high", "note": "..."}],
      "pass_fail": "pass|fail",
      "remediation_notes": "..."
    }
  ],
  "release_readiness": {
    "status": "draft_only|needs_revision|ready_for_human_review",
    "required_human_review": [...],
    "deployment_constraints": [...]
  },
  "risks": [{"type": "...", "severity": "low|medium|high", "note": "..."}],
  "verification_status": "Not verified",
  "questions_to_verify": [...]
}

RULES:
- verification_status MUST always be "Not verified"
- NEVER invent citations, cases, statutes, rules, or authorities
- asset.content MUST include: "This is a draft only. Not legal advice. Human lawyer review required."
- If jurisdiction-specific authority needed, mark "Not verified / source needed" in questions_to_verify
- NO markdown, NO code fences, NO explanatory text
"""

def extract_json_from_text(text):
    """Fallback JSON extraction strategies."""
    # Strategy 1: Parse as-is
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass

    # Strategy 2: Slice first { to last }
    try:
        first_brace = text.index('{')
        last_brace = text.rindex('}')
        return json.loads(text[first_brace:last_brace+1])
    except ValueError:  # also raised by index()/rindex() when no brace is found
        pass

    # Strategy 3: Strip ```json code fences
    try:
        cleaned = re.sub(r'```json\s*|```\s*', '', text, flags=re.IGNORECASE)
        return json.loads(cleaned)
    except ValueError:
        pass

    # Strategy 4: Bracket-balancing scan
    try:
        start = text.index('{')
        depth = 0
        for i, char in enumerate(text[start:], start):
            if char == '{':
                depth += 1
            elif char == '}':
                depth -= 1
                if depth == 0:
                    return json.loads(text[start:i+1])
    except ValueError:
        pass

    return None

def validate_schema(data):
    """Validate required top-level keys."""
    required_keys = [
        'task', 'facts_provided', 'assumptions', 'open_questions',
        'asset', 'tests', 'release_readiness', 'risks',
        'verification_status', 'questions_to_verify'
    ]
    return all(k in data for k in required_keys)

def auto_risk_flags(data):
    """Automatically flag risks based on content."""
    auto_risks = []

    # Check for hallucination indicators
    content_str = json.dumps(data).lower()
    hallucination_terms = ['cite', 'statute', 'case', 'rule', 'regulation', 'code section']
    if any(term in content_str for term in hallucination_terms):
        auto_risks.append({
            "type": "hallucination",
            "severity": "medium",
            "note": "Auto-flagged: Response contains potential authority references. Verify all citations."
        })

    # Check verification status
    if data.get('verification_status') != 'Not verified':
        auto_risks.append({
            "type": "overconfidence",
            "severity": "high",
            "note": "Auto-flagged: verification_status not set to 'Not verified'"
        })

    # Check for disclaimer in asset content
    if 'asset' in data and 'content' in data['asset']:
        content = data['asset']['content'].lower()
        if 'draft' not in content or 'not legal advice' not in content:
            auto_risks.append({
                "type": "other",
                "severity": "medium",
                "note": "Auto-flagged: Asset content missing required disclaimers"
            })

    return auto_risks

def log_prompt(redacted_prompt, redacted_response, prompt_hash, risks):
    """Log redacted prompts to JSONL."""
    log_entry = {
        "timestamp": datetime.now().isoformat(),
        "prompt_hash": prompt_hash,
        "redacted_prompt": redacted_prompt,
        "redacted_response": redacted_response,
        "risks_flagged": len(risks)
    }
    with open(prompts_log_path, 'a') as f:
        f.write(json.dumps(log_entry) + '\n')

def update_risk_log(risks):
    """Append risks to risk log."""
    with open(risk_log_path, 'r') as f:
        risk_data = json.load(f)

    risk_data['risks'].extend(risks)

    with open(risk_log_path, 'w') as f:
        json.dump(risk_data, f, indent=2)

    stats['total_risks_logged'] += len(risks)

def call_claude_json_prefill(user_prompt, temperature=0.1, max_tokens=2000):
    """Call Claude with PREFILL technique to enforce JSON output."""
    max_attempts = 3

    for attempt in range(1, max_attempts + 1):
        try:
            # Adjust temperature and prefix for retries
            current_temp = 0.0 if attempt > 1 else temperature
            prefix = "" if attempt == 1 else "OUTPUT ONLY JSON. NO TEXT.\n"

            # PREFILL: Include assistant message with "{" to force JSON completion
            response = client.messages.create(
                model=MODEL,
                max_tokens=max_tokens,
                temperature=current_temp,
                system=SYSTEM_PROMPT,
                messages=[
                    {"role": "user", "content": prefix + user_prompt},
                    {"role": "assistant", "content": "{"}
                ]
            )

            stats['total_api_calls'] += 1

            # Reconstruct full JSON by prepending the prefilled "{"
            response_text = response.content[0].text
            full_json_text = "{" + response_text

            # Try to parse
            try:
                data = json.loads(full_json_text)
            except json.JSONDecodeError:
                # Fallback extraction
                data = extract_json_from_text(full_json_text)
                if data is None:
                    raise ValueError("JSON extraction failed")

            # Validate schema
            if not validate_schema(data):
                raise ValueError("Schema validation failed")

            # Auto-flag risks
            auto_risks = auto_risk_flags(data)
            if auto_risks:
                if 'risks' not in data:
                    data['risks'] = []
                data['risks'].extend(auto_risks)

            # Log (redacted)
            redacted_prompt, _ = redact(user_prompt)
            redacted_response, _ = redact(json.dumps(data))
            redacted_response = redacted_response[:500]  # First 500 chars, truncated only after redaction
            prompt_hash = hashlib.sha256(user_prompt.encode()).hexdigest()[:16]
            log_prompt(redacted_prompt, redacted_response, prompt_hash, data.get('risks', []))

            # Update risk log
            if data.get('risks'):
                update_risk_log(data['risks'])

            return data

        except Exception as e:
            if attempt == max_attempts:
                # Final fallback: return valid error schema
                error_data = {
                    "task": "error_fallback",
                    "facts_provided": [],
                    "assumptions": [f"JSON parsing failed after {max_attempts} attempts"],
                    "open_questions": ["Why did JSON parsing fail?"],
                    "asset": {
                        "asset_type": "template",
                        "title": "Error Fallback",
                        "version": "v0.1",
                        "intended_use": "Error handling",
                        "not_intended_for": "Production use",
                        "content": "This is a draft only. Not legal advice. Human lawyer review required. ERROR: JSON parsing failed."
                    },
                    "tests": [],
                    "release_readiness": {
                        "status": "needs_revision",
                        "required_human_review": ["Investigate JSON parsing failure"],
                        "deployment_constraints": ["Cannot deploy - parsing error"]
                    },
                    "risks": [{
                        "type": "other",
                        "severity": "high",
                        "note": f"JSON_PARSE_ERROR: {str(e)}"
                    }],
                    "verification_status": "Not verified",
                    "questions_to_verify": ["All content requires verification due to parsing error"]
                }
                update_risk_log(error_data['risks'])
                return error_data
            # Retry
            continue

# SMOKE TEST
print("Running PREFILL smoke test...")
smoke_test_prompt = """Task: Create a simple checklist asset for client intake.
Facts: New criminal defense client, first consultation.
Return ONLY JSON. Asset content limit: 150 words."""

try:
    result = call_claude_json_prefill(smoke_test_prompt, temperature=0.1, max_tokens=2000)

    # Validate
    assert 'asset' in result, "Missing 'asset' key"
    assert result['verification_status'] == 'Not verified', "Wrong verification status"
    assert 'draft' in result['asset']['content'].lower(), "Missing draft disclaimer"

    print("✅ SMOKE TEST PASSED")
    print(f"  - Asset type: {result['asset']['asset_type']}")
    print(f"  - Verification status: {result['verification_status']}")
    print(f"  - Risks flagged: {len(result.get('risks', []))}")
    print("\n✅ Prefill wrapper ready")

except Exception as e:
    print(f"❌ SMOKE TEST FAILED: {e}")

Running PREFILL smoke test...
✅ SMOKE TEST PASSED
  - Asset type: checklist
  - Verification status: Not verified
  - Risks flagged: 4

✅ Prefill wrapper ready


##7.CASE BUILDERS

###7.1.OVERVIEW

**Critical: Minimal Mini-Case Builders and Pipeline Stage Functions**

This section defines the actual legal work that the notebook performs. It contains two essential components: mini-case builders that create realistic legal scenarios across four practice domains, and pipeline stage functions that transform those scenarios into tested, versioned, release-ready legal assets. Think of this as defining both the raw materials and the assembly line for your asset production process.

**The Four Practice Domains**

The notebook implements four distinct legal domains to demonstrate versatility across practice areas. Criminal defense focuses on client intake and mitigation fact gathering for a federal wire fraud case. Regulatory practice addresses comment letter preparation for a Securities and Exchange Commission proposed rule on artificial intelligence disclosures. International transactional work covers cross-border dispute resolution clause selection for a software licensing agreement. Academic legal education involves creating course policies for appropriate AI use in a contracts class. These domains were chosen because they represent common scenarios where reusable assets provide significant value.

**Mini-Case Builder Design Philosophy**

Each mini-case builder returns a structured dictionary containing three elements: concrete facts with names, numbers, dates, and context; a minimal prompt that specifies the task and word limit; and metadata identifying the case and desired asset type. The prompts are intentionally minimal. Earlier chapters found that long, detailed prompts tend to push the model into a conversational, explanatory mode. Keeping prompts short and directive avoids this problem.
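
A quick structural check can confirm that a case dictionary has the shape described above before it enters the pipeline. The sketch below is an illustration, not part of the notebook: `case_problems` and `REQUIRED_CASE_KEYS` are hypothetical names, and the example case is invented.

```python
REQUIRED_CASE_KEYS = ("case_id", "asset_type", "facts", "prompt")


def case_problems(case):
    """Return a list of human-readable problems; an empty list means the case looks usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_CASE_KEYS if k not in case]
    if not problems:
        if not case["facts"]:
            problems.append("facts list is empty")
        if "Return ONLY JSON" not in case["prompt"]:
            problems.append("prompt lacks the JSON-only directive")
    return problems


example_case = {
    "case_id": "demo_case",
    "asset_type": "checklist",
    "facts": ["Client: placeholder name, first consultation"],
    "prompt": "Task: Create a checklist.\nAsset content limit: 100 words.\nReturn ONLY JSON.",
}
print(case_problems(example_case))
print(case_problems({"case_id": "incomplete"}))
```

Returning a list of problems rather than raising lets a caller report every issue in a case at once.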

**Why Concrete Facts Matter**

Notice that each case includes specific details: Maria Torres, age thirty-four; two billion dollars in assets under management; eighty-five students enrolled; a five million euro annual contract value. These specifics serve multiple purposes. First, they make the scenarios realistic rather than abstract. Second, they give the model enough context to generate practical rather than generic outputs. Third, they enable meaningful testing, because edge cases and adversarial tests need concrete details to manipulate.

**The Five-Stage Pipeline**

The pipeline transforms a raw case into a release-ready asset package through five distinct stages. Stage one generates the initial asset at version zero point one. Stage two creates a test suite with adversarial and edge case scenarios. Stage three runs those tests using simulated evaluation. Stage four revises the asset to version zero point two if tests revealed failures. Stage five builds the complete release package with all governance artifacts.

**Stage One: Asset Generation**

The first stage takes the case facts, applies redaction for privacy protection, constructs a prompt combining the case's predefined prompt template with the redacted facts, calls the Claude API using the prefill technique, and returns structured JSON containing the draft asset. The asset includes its type, title, version number, intended use statement, not intended for statement, and the actual content with required disclaimers. This stage establishes the foundation that all subsequent stages build upon.

**Stage Two: Test Suite Generation**

The second stage generates adversarial and edge case tests designed to stress-test the asset. The system prompts Claude to create five tests spanning different categories: adversarial tests simulate hostile users trying to misuse the asset, edge case tests explore unusual scenarios, ambiguity tests check behavior with unclear facts, prompt injection tests attempt to manipulate playbooks or templates, and consistency tests verify internal coherence. Each test specifies what input to provide, what behavior to expect, and how to evaluate success or failure.
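
Because the test suite is model-generated, it can be worth checking that every requested category actually appears. The following is a minimal sketch of such a check. `missing_test_types` is a hypothetical helper that the notebook does not define, and the sample tests are invented.

```python
EXPECTED_TEST_TYPES = [
    "adversarial", "edge_case", "ambiguity", "prompt_injection", "consistency",
]


def missing_test_types(tests):
    """Return the expected categories that no generated test covers, in order."""
    seen = {t.get("test_type") for t in tests}
    return [kind for kind in EXPECTED_TEST_TYPES if kind not in seen]


sample_tests = [
    {"test_name": "Hostile reuse", "test_type": "adversarial"},
    {"test_name": "Minor client", "test_type": "edge_case"},
    {"test_name": "Unclear dates", "test_type": "ambiguity"},
]
print(missing_test_types(sample_tests))
```

A non-empty result could be recorded as a risk so the reviewing attorney knows which categories were never exercised.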

**Stage Three: Running Tests**

The third stage executes each test through simulated evaluation. For each test, the system constructs a prompt containing an excerpt of the asset content (the first 300 characters), the test name and input, and the expected behavior. It then asks Claude to evaluate whether the asset handles that test appropriately, returning pass or fail status along with observed risks and remediation notes. Because the evaluator is itself a model, the result is a systematic first-pass assessment of asset quality, produced before any human lawyer reviews the asset.

**Stage Four: Asset Revision**

The fourth stage implements a single revision loop based on test results. If all tests passed, no revision occurs. If any tests failed, the system compiles the remediation notes from failures and prompts Claude to produce a revised version zero point two incorporating those improvements. Importantly, this is limited to one revision iteration. The notebook does not attempt iterative refinement toward perfection because that would obscure the audit trail and consume excessive API calls.

**Stage Five: Release Package Creation**

The final stage generates the complete release package that a human attorney would review before deployment. This includes saving the versioned asset as JSON, saving the complete test suite, saving all test results with pass/fail status, creating a release manifest summarizing readiness and review requirements, and generating a human review checklist with all items requiring attorney verification. Each case gets its own subdirectory containing these artifacts.

**Simplified Prompts Throughout**

Every stage uses minimal directive prompts rather than verbose instructions. Stage one says create this asset type with this word limit from these facts. Stage two says generate five tests of these types. Stage three says evaluate this asset against this test. Stage four says revise based on these failures. Stage five involves no prompts because it is pure file generation. This minimalism prevents triggering conversational mode while still providing sufficient direction.

**Integration with Prefill and Governance**

All API calls within these pipeline functions route through the prefill-enforced JSON wrapper from the previous section. Case facts are redacted in stage one before they are sent to the model, and the wrapper redacts every prompt and response again before writing them to the prompts log. All identified risks are aggregated in the risk log. The pipeline functions focus on orchestration and business logic while the infrastructure handles reliability and governance automatically.

**Why Five Stages**

The five-stage structure reflects a maturity model for legal asset development. Stage one is pure generation. Stage two adds quality assurance through testing. Stage three provides objective evaluation. Stage four enables improvement. Stage five creates deployment readiness with full documentation. This progression mirrors how a careful law firm would develop reusable precedents: draft, test, evaluate, refine, package for distribution.

**Output Visibility**

When this section executes, you see a simple confirmation listing the four case identifiers and the five pipeline stage names. This minimal output reflects that the section is pure definition. The actual execution happens in the next section, which calls these functions for each case.

###7.2.CODE AND IMPLEMENTATION

In [12]:
# CRITICAL: Minimal mini-case builders and pipeline stage functions

# ========== MINI-CASE BUILDERS (4 domains) ==========

def build_criminal_case():
    """Criminal defense: Client intake checklist + mitigation template."""
    return {
        "case_id": "criminal_defense_intake",
        "asset_type": "checklist",
        "facts": [
            "Client: Maria Torres, age 34, facing federal wire fraud charges (18 USC 1343)",
            "Alleged conduct: 2022-2023, involved cryptocurrency investment scheme",
            "First consultation scheduled for Feb 15, 2026",
            "Client has no prior criminal record",
            "Potential sentencing exposure: 20 years maximum under statute",
            "Client is primary caregiver for two children, ages 6 and 9"
        ],
        "prompt": """Task: Create client interview checklist + mitigation fact intake template.
Asset content limit: 400 words.
Return ONLY JSON."""
    }

def build_regulatory_case():
    """Regulatory/administrative: Comment letter playbook."""
    return {
        "case_id": "regulatory_comment_letter",
        "asset_type": "playbook",
        "facts": [
            "Agency: SEC, proposed rule on AI disclosure requirements for investment advisers",
            "NPRM published Jan 5, 2026, comment deadline Mar 6, 2026",
            "Client: TechFin Advisors LLC, $2B AUM, uses AI for portfolio optimization",
            "Key concern: Proposed rule requires quarterly AI model disclosures",
            "Industry coalition forming, deadline for joining: Feb 1, 2026",
            "Client wants individual comment letter + potential coalition participation"
        ],
        "prompt": """Task: Create comment letter playbook with outline + argument buckets + verification checklist.
Asset content limit: 500 words.
Return ONLY JSON."""
    }

def build_international_case():
    """International: Cross-border dispute resolution clause decision tree."""
    return {
        "case_id": "cross_border_dispute",
        "asset_type": "playbook",
        "facts": [
            "Client: GlobalTech Inc. (Delaware corp), negotiating software licensing deal",
            "Counterparty: EuroSoft GmbH (German company), €5M annual contract value",
            "Services: Cloud-based SaaS platform for manufacturing clients",
            "Draft contract includes arbitration clause: ICC arbitration, seat in London",
            "Client concerned about enforcement in multiple EU jurisdictions",
            "Prior relationship: None, first transaction between parties"
        ],
        "prompt": """Task: Create cross-border dispute resolution clause decision tree + client email template.
Asset content limit: 450 words.
Return ONLY JSON."""
    }

def build_teaching_case():
    """Teaching/academia: Course AI policy module."""
    return {
        "case_id": "course_ai_policy",
        "asset_type": "teaching_module",
        "facts": [
            "Course: Contracts (1L, fall semester 2026), enrollment: 85 students",
            "Professor wants clear AI policy for written assignments and exams",
            "School has general AI guidance but no contract-specific rules",
            "Assignments: 3 case briefs, 2 memos, 1 take-home final exam",
            "Concerns: Student confusion about permitted AI use, academic integrity",
            "Goal: Policy + student FAQ + instructor enforcement checklist"
        ],
        "prompt": """Task: Create course AI policy module including policy + student FAQ + instructor checklist.
Asset content limit: 500 words.
Return ONLY JSON."""
    }

# ========== PIPELINE STAGE FUNCTIONS ==========

def generate_asset(case):
    """Stage 1: Generate initial asset (v0.1)."""
    facts_text = "\n".join([f"- {fact}" for fact in case['facts']])

    # Redact facts before sending
    redacted_facts, _ = redact(facts_text)

    prompt = f"""{case['prompt']}

Facts:
{redacted_facts}
"""

    result = call_claude_json_prefill(prompt)
    result['case_id'] = case['case_id']
    result['asset']['version'] = 'v0.1'
    return result

def generate_tests(asset_data, case):
    """Stage 2: Generate adversarial/edge-case test suite."""
    prompt = f"""Task: Generate 5 adversarial/edge-case tests for this asset.

Asset type: {asset_data['asset']['asset_type']}
Asset title: {asset_data['asset']['title']}
Case domain: {case['case_id']}

Test types to include:
- adversarial (hostile user)
- edge_case (unusual scenario)
- ambiguity (unclear facts)
- prompt_injection (for playbooks/templates)
- consistency (internal coherence)

Asset content limit: 200 words.
Return ONLY JSON."""

    result = call_claude_json_prefill(prompt)
    return result.get('tests', [])

def run_tests(asset_data, tests):
    """Stage 3: Run tests and capture results (simulated LLM testing)."""
    test_results = []

    for test in tests:
        # Simulate test execution
        prompt = f"""Task: Evaluate asset against this test.

Asset type: {asset_data['asset']['asset_type']}
Asset content: {asset_data['asset']['content'][:300]}...

Test: {test.get('test_name', 'Unnamed test')}
Test input: {test.get('test_input', 'No input specified')}
Expected behavior: {test.get('expected_behavior', 'Not specified')}

Return pass/fail + remediation notes.
Asset content limit: 150 words.
Return ONLY JSON."""

        result = call_claude_json_prefill(prompt, temperature=0.0)

        # Extract test result (use first test from response or build default)
        if result.get('tests') and len(result['tests']) > 0:
            test_result = result['tests'][0]
        else:
            test_result = {
                "test_name": test.get('test_name', 'Unknown'),
                "test_type": test.get('test_type', 'unknown'),
                "pass_fail": "fail",
                "observed_risks": [],
                "remediation_notes": "Test execution incomplete"
            }

        test_results.append(test_result)

    return test_results

def revise_asset(asset_data, test_results):
    """Stage 4: Revise asset once based on test failures (single iteration)."""
    # Check if any tests failed
    failures = [t for t in test_results if t.get('pass_fail') == 'fail']

    if not failures:
        # No revision needed
        return asset_data

    # Compile remediation notes
    remediation_summary = "\n".join([
        f"- {f.get('test_name', 'Unknown')}: {f.get('remediation_notes', 'No notes')}"
        for f in failures
    ])

    prompt = f"""Task: Revise asset based on test failures.

Original asset (v0.1):
{asset_data['asset']['content']}

Test failures:
{remediation_summary}

Produce revised asset as v0.2.
Asset content limit: 450 words.
Return ONLY JSON."""

    result = call_claude_json_prefill(prompt)
    result['asset']['version'] = 'v0.2'
    result['case_id'] = asset_data['case_id']
    return result

def build_release_package(asset_data, tests, test_results, case_dir):
    """Stage 5: Build release package artifacts."""
    case_dir.mkdir(parents=True, exist_ok=True)  # Also create missing parent directories

    # 1) Save asset
    version = asset_data['asset']['version']
    asset_path = case_dir / f"asset_{version}.json"
    with open(asset_path, 'w') as f:
        json.dump(asset_data['asset'], f, indent=2)

    # 2) Save tests
    tests_path = case_dir / "tests.json"
    with open(tests_path, 'w') as f:
        json.dump(tests, f, indent=2)

    # 3) Save test results
    test_results_path = case_dir / "test_results.json"
    with open(test_results_path, 'w') as f:
        json.dump(test_results, f, indent=2)

    # 4) Build release manifest
    release_manifest = {
        "asset_id": f"{asset_data['case_id']}_{version}",
        "version": version,
        "run_id": timestamp,
        "asset_type": asset_data['asset']['asset_type'],
        "title": asset_data['asset']['title'],
        "release_readiness": asset_data.get('release_readiness', {}),
        "tests_passed": sum(1 for t in test_results if t.get('pass_fail') == 'pass'),
        "tests_failed": sum(1 for t in test_results if t.get('pass_fail') == 'fail'),
        "risks_count": len(asset_data.get('risks', [])),
        "required_human_review": asset_data.get('release_readiness', {}).get('required_human_review', [])
    }

    manifest_path = case_dir / "release_manifest.json"
    with open(manifest_path, 'w') as f:
        json.dump(release_manifest, f, indent=2)

    # 5) Build human review checklist
    checklist_content = f"""HUMAN REVIEW CHECKLIST
Asset: {asset_data['asset']['title']}
Version: {version}
Case ID: {asset_data['case_id']}
Run ID: {timestamp}

REQUIRED REVIEWS:
"""

    for item in asset_data.get('release_readiness', {}).get('required_human_review', []):
        checklist_content += f"[ ] {item}\n"

    checklist_content += "\nQUESTIONS TO VERIFY:\n"
    for q in asset_data.get('questions_to_verify', []):
        checklist_content += f"[ ] {q}\n"

    checklist_content += f"\nRISKS IDENTIFIED: {len(asset_data.get('risks', []))}\n"
    for risk in asset_data.get('risks', []):
        # Model output may omit keys, so fall back to placeholders instead of raising KeyError
        severity = str(risk.get('severity', 'unknown')).upper()
        checklist_content += f"  [{severity}] {risk.get('type', 'other')}: {risk.get('note', '')}\n"

    checklist_content += "\nTEST RESULTS:\n"
    checklist_content += f"  Passed: {release_manifest['tests_passed']}\n"
    checklist_content += f"  Failed: {release_manifest['tests_failed']}\n"

    checklist_path = case_dir / "human_review_checklist.txt"
    with open(checklist_path, 'w') as f:
        f.write(checklist_content)

    return {
        "asset_path": asset_path,
        "tests_path": tests_path,
        "test_results_path": test_results_path,
        "manifest_path": manifest_path,
        "checklist_path": checklist_path
    }

# List available cases
CASES = [
    build_criminal_case,
    build_regulatory_case,
    build_international_case,
    build_teaching_case
]

PIPELINE_STAGES = [
    "generate_asset",
    "generate_tests",
    "run_tests",
    "revise_asset",
    "build_release_package"
]

print(f"✅ Mini-cases defined: {len(CASES)}")
print("  Case IDs:")
for builder in CASES:
    case = builder()
    print(f"    - {case['case_id']}")

print(f"\n✅ Pipeline stages: {len(PIPELINE_STAGES)}")
for i, stage in enumerate(PIPELINE_STAGES, 1):
    print(f"    {i}. {stage}")

✅ Mini-cases defined: 4
  Case IDs:
    - criminal_defense_intake
    - regulatory_comment_letter
    - cross_border_dispute
    - course_ai_policy

✅ Pipeline stages: 5
    1. generate_asset
    2. generate_tests
    3. run_tests
    4. revise_asset
    5. build_release_package


##8.EXECUTION

###8.1.OVERVIEW

**Execute Pipeline for All Cases with Error Handling and Progress Tracking**

This section represents the main execution engine of the entire notebook. After all the infrastructure, utilities, and function definitions in previous sections, this is where the actual work happens. The system processes all four legal cases through the complete five-stage pipeline, handling errors gracefully, tracking progress visibly, and generating comprehensive statistics about performance.

**Execution Loop Structure**

The section iterates through each of the four case builder functions defined in the previous section. For each case, it creates a fresh result tracking dictionary that monitors status, stages completed, test counts, pass rates, and highest risk severity. It then attempts to execute all five pipeline stages in sequence, catching and handling errors at each stage rather than allowing failures to crash the entire notebook.

**Visible Progress Indicators**

As the pipeline runs, you see detailed progress messages showing exactly where execution stands. The format displays case one of four, then within each case shows stage one of five, stage two of five, and so forth. When a stage succeeds, you see a checkmark and descriptive confirmation like "Asset generated" or "Tests completed". This real-time feedback helps you understand what is happening during what might otherwise be a mysterious black-box process lasting several minutes.

**Stage-by-Stage Execution with Error Isolation**

Each pipeline stage is wrapped in its own error handling block. If stage one fails to generate an asset, the system logs the failure, increments failure statistics, and moves to the next case rather than attempting stages two through five. If stage two fails to generate tests, it similarly stops that case and continues with the next. This isolation prevents cascading failures where one problem triggers dozens of downstream errors, making diagnosis much harder.
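
The isolation pattern can be sketched with stand-in stages. This is an illustration of the idea rather than the notebook's execution loop: `run_case` is a hypothetical name, and the lambdas and `failing_stage` are placeholders for the real pipeline functions.

```python
def run_case(stages):
    """Run (name, callable) stages in order; stop this case at the first failure.

    Each stage receives the previous stage's output. A result dictionary is
    returned instead of raising, so the caller can continue with the next case.
    """
    result = {"status": "success", "stages_completed": 0, "error": None}
    value = None
    for name, stage in stages:
        try:
            value = stage(value)
        except Exception as exc:  # isolate the failure to this case
            result["status"] = "failed"
            result["error"] = f"{name}: {exc}"
            break
        result["stages_completed"] += 1
    return result


def failing_stage(_):
    """Placeholder stage that always fails."""
    raise RuntimeError("no tests returned")


good = run_case([("generate_asset", lambda _: "asset"),
                 ("generate_tests", lambda a: [a])])
bad = run_case([("generate_asset", lambda _: "asset"),
                ("generate_tests", failing_stage),
                ("run_tests", lambda t: t)])
print(good)
print(bad)
```

In the failing run, the third stage is never called, which is what keeps one error from producing a chain of misleading downstream errors.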

**Statistics Tracking Throughout**

The system maintains running counts of total API calls, total risks logged, cases succeeded, cases failed, stages succeeded, and stages failed. These statistics accumulate across all four cases, providing quantitative insight into overall performance. With four cases and five stages each, a clean run records twenty successful stages. If three cases succeed completely but one fails at stage two, the statistics reflect sixteen stages succeeded and one stage failed out of seventeen attempts, because the failed case never attempts stages three through five.
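
The stage arithmetic can be worked through with a small tally. The sketch assumes that a failed stage halts its case, as the error isolation discussion describes. `tally_stages` is a hypothetical helper, and the input is an illustrative three-case run rather than notebook output.

```python
def tally_stages(cases):
    """Count stage outcomes from per-case records.

    Each record is (stages_completed, failed). A failed case contributes one
    failed stage; stages after the failure are never attempted.
    """
    succeeded = sum(done for done, _ in cases)
    failed = sum(1 for _, did_fail in cases if did_fail)
    return {"succeeded": succeeded, "failed": failed, "attempted": succeeded + failed}


# Illustrative run: two cases finish all five stages, one stops at stage three.
totals = tally_stages([(5, False), (5, False), (2, True)])
print(totals)
```

Note that stages skipped after a failure appear in neither count, so succeeded plus failed can be smaller than cases times five.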

**Test Results and Revision Logic**

During stage three, after running all tests, the system calculates how many passed versus failed. This pass rate gets stored in the case result dictionary and displayed in the progress output. Stage four then checks whether any tests failed. If all tests passed, it skips revision and proceeds directly to stage five. If any tests failed, it attempts revision to version zero point two, incorporating the remediation notes from test failures.
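
The pass rate and the revision decision both follow directly from the pass_fail field of each test result. A minimal sketch, with `pass_rate` and `needs_revision` as hypothetical helper names and an invented result list:

```python
def pass_rate(test_results):
    """Fraction of results marked 'pass'; 0.0 when there are no results."""
    if not test_results:
        return 0.0
    passed = sum(1 for t in test_results if t.get("pass_fail") == "pass")
    return passed / len(test_results)


def needs_revision(test_results):
    """True when at least one result is marked 'fail'."""
    return any(t.get("pass_fail") == "fail" for t in test_results)


results = [
    {"pass_fail": "pass"}, {"pass_fail": "pass"}, {"pass_fail": "fail"},
    {"pass_fail": "pass"}, {"pass_fail": "fail"},
]
print(f"pass rate: {pass_rate(results):.0%}, revise: {needs_revision(results)}")
```

Guarding the empty case avoids a division by zero when stage two returns no tests.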

**Risk Severity Determination**

After completing all stages for a case, the system examines all risks flagged during that case's execution. It determines the highest severity level present: high, medium, low, or none. This highest severity gets stored in the case result and later displayed in the summary table. This quick severity scan helps you prioritize which cases need the most urgent human review.
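
One simple way to find the highest severity is to rank the labels and take the maximum. The sketch below is an assumption about how such a scan could be written, not the notebook's code. `highest_severity` and `SEVERITY_ORDER` are hypothetical names.

```python
SEVERITY_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}


def highest_severity(risks):
    """Return the most severe level among the risks, or 'none' if the list is empty."""
    levels = [str(r.get("severity", "none")).lower() for r in risks]
    # Unrecognised labels rank as 'none' rather than raising
    known = [lvl if lvl in SEVERITY_ORDER else "none" for lvl in levels]
    return max(known, key=SEVERITY_ORDER.get, default="none")


print(highest_severity([{"severity": "low"}, {"severity": "high"}, {"severity": "medium"}]))
print(highest_severity([]))
```

Treating unknown labels as "none" is a design choice for this sketch. A stricter version could flag them as a risk in their own right, since the labels come from model output.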

**Error Artifact Generation**

When a case fails partway through execution, the system creates error artifacts in that case's directory. It saves a JSON file containing the error message, how many stages were completed, and any partial asset data that was generated before failure. This documentation ensures that even failures leave an audit trail showing what was attempted and where it broke.
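
An error artifact of this kind can be as simple as one JSON file per failed case. The sketch below writes to a temporary directory so it can run anywhere. `write_error_artifact` and the file name `error.json` are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_error_artifact(case_dir, error, stages_completed, partial_asset=None):
    """Save a JSON record of a failed case and return its path."""
    case_dir.mkdir(parents=True, exist_ok=True)
    artifact = {
        "error": str(error),
        "stages_completed": stages_completed,
        "partial_asset": partial_asset,  # None if nothing was generated
    }
    path = case_dir / "error.json"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(artifact, f, indent=2)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = write_error_artifact(
        Path(tmp) / "demo_case",
        ValueError("Schema validation failed"),
        stages_completed=1,
        partial_asset={"title": "Draft checklist", "version": "v0.1"},
    )
    loaded = json.loads(saved.read_text(encoding="utf-8"))
    print(loaded["error"], loaded["stages_completed"])
```

Saving the partial asset alongside the error means a reviewer can see how far the case got without rerunning it.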

**Final Summary Table**

After processing all four cases, the section prints a comprehensive summary table. Each row shows one case with its identifier, success or failure status marked with checkmarks or X marks, how many stages completed out of five, how many tests were generated, the test pass rate as a percentage, and the highest risk severity. This table provides an at-a-glance assessment of the entire execution run.
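
A fixed-width text table is enough for this kind of summary. The sketch below reuses the field names from the case result dictionary, but it assumes that pass_rate is stored as a fraction between 0 and 1. It uses plain-text status labels in place of checkmarks to keep columns aligned, and the rows are invented sample data.

```python
def format_summary(rows):
    """Render case results as fixed-width text lines."""
    header = f"{'Case':<28}{'Status':<9}{'Stages':<8}{'Tests':<7}{'Pass %':<8}{'Risk':<6}"
    lines = [header, "-" * len(header)]
    for r in rows:
        status = "OK" if r["status"] == "success" else "FAILED"
        stages = f"{r['stages_completed']}/5"
        pct = f"{r['pass_rate'] * 100:.0f}"  # assumes pass_rate is a 0-1 fraction
        lines.append(
            f"{r['case_id']:<28}{status:<9}{stages:<8}"
            f"{r['tests_count']:<7}{pct:<8}{r['highest_risk_severity']:<6}"
        )
    return "\n".join(lines)


sample_rows = [
    {"case_id": "criminal_defense_intake", "status": "success", "stages_completed": 5,
     "tests_count": 5, "pass_rate": 0.8, "highest_risk_severity": "medium"},
    {"case_id": "course_ai_policy", "status": "failed", "stages_completed": 1,
     "tests_count": 0, "pass_rate": 0.0, "highest_risk_severity": "high"},
]
table = format_summary(sample_rows)
print(table)
```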

**Overall Statistics Display**

Below the case-by-case table, the system displays aggregate statistics: total cases succeeded versus failed, total stages succeeded versus failed, total API calls made during the entire execution, and total risks logged across all cases. These numbers provide context for understanding performance and cost. If you see forty API calls were made, you understand the scope of work performed. If you see fifteen risks were logged, you know there are fifteen items requiring human review.

**Deliverables Directory Path**

The section concludes by displaying the file path to the deliverables directory where all case subdirectories and their artifacts were created. This explicit path reminder helps you navigate to the outputs for examination or download.

**Why This Approach Matters**

For legal practice, this execution model demonstrates defensive programming appropriate for professional use. The system does not assume everything will work perfectly. It anticipates failures, isolates them, documents them, and continues working on unaffected cases. This resilience means that even if one case encounters an unusual problem, the other three cases still produce usable outputs. The visible progress and comprehensive statistics provide transparency that builds trust and enables troubleshooting.
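
The isolation pattern can be reduced to two small functions. This is a simplified sketch of the idea rather than the notebook's implementation: `run_stage`, `run_case`, and `_boom` are hypothetical names, and the stage functions here are stand-ins that make no API calls.

```python
def run_stage(name, func, stats):
    """Run one stage, update shared counters, and re-raise on failure."""
    print(f"  [{name}] running...")
    try:
        result = func()
    except Exception:
        stats["stages_fail"] += 1
        raise
    stats["stages_success"] += 1
    return result

def run_case(stages, stats):
    """Run stages in order; a failure ends this case but not the whole run."""
    completed = 0
    try:
        for name, func in stages:
            run_stage(name, func, stats)
            completed += 1
    except Exception as exc:
        stats["cases_fail"] += 1
        return {"status": "failed", "stages_completed": completed, "error": str(exc)}
    stats["cases_success"] += 1
    return {"status": "success", "stages_completed": completed}

def _boom():
    raise RuntimeError("simulated API error")

stats = {"stages_success": 0, "stages_fail": 0, "cases_success": 0, "cases_fail": 0}
good = run_case([("generate_asset", lambda: "ok"), ("generate_tests", lambda: [])], stats)
bad = run_case([("generate_asset", lambda: "ok"), ("generate_tests", _boom)], stats)
print(good, bad, stats)
```

The second case fails at its second stage, yet the first case's result and the shared counters remain intact, which is the property the execution loop relies on.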

###8.2.CODE AND IMPLEMENTATION

In [13]:
# Execute pipeline for all 4 cases with error handling and progress tracking

print("="*70)
print("EXECUTING ASSET PIPELINE FOR 4 CASES")
print("="*70 + "\n")

case_results = []

for case_idx, case_builder in enumerate(CASES, 1):
    case = case_builder()
    case_id = case['case_id']

    print(f"\n{'='*70}")
    print(f"[Case {case_idx}/{len(CASES)}] {case_id}")
    print(f"{'='*70}\n")

    case_result = {
        "case_id": case_id,
        "status": "in_progress",
        "stages_completed": 0,
        "tests_count": 0,
        "pass_rate": 0.0,
        "highest_risk_severity": "none"
    }

    case_dir = DELIVERABLES_DIR / case_id
    asset_data = None
    tests = []
    test_results = []

    try:
        # Stage 1: Generate asset
        print(f"  [Stage 1/5] generate_asset...")
        try:
            asset_data = generate_asset(case)
            case_result['stages_completed'] += 1
            stats['stages_success'] += 1
            print(f"    ✅ Asset generated: {asset_data['asset']['title']} (v0.1)")
        except Exception as e:
            stats['stages_fail'] += 1
            print(f"    ❌ Failed: {e}")
            raise

        # Stage 2: Generate tests
        print(f"  [Stage 2/5] generate_tests...")
        try:
            tests = generate_tests(asset_data, case)
            case_result['stages_completed'] += 1
            case_result['tests_count'] = len(tests)
            stats['stages_success'] += 1
            print(f"    ✅ Generated {len(tests)} tests")
        except Exception as e:
            stats['stages_fail'] += 1
            print(f"    ❌ Failed: {e}")
            raise

        # Stage 3: Run tests
        print(f"  [Stage 3/5] run_tests...")
        try:
            test_results = run_tests(asset_data, tests)
            case_result['stages_completed'] += 1
            stats['stages_success'] += 1

            passed = sum(1 for t in test_results if t.get('pass_fail') == 'pass')
            case_result['pass_rate'] = passed / len(test_results) if test_results else 0.0
            print(f"    ✅ Tests completed: {passed}/{len(test_results)} passed")
        except Exception as e:
            stats['stages_fail'] += 1
            print(f"    ❌ Failed: {e}")
            raise

        # Stage 4: Revise asset (if needed)
        print(f"  [Stage 4/5] revise_asset...")
        try:
            revised_asset = revise_asset(asset_data, test_results)
            case_result['stages_completed'] += 1
            stats['stages_success'] += 1

            if revised_asset['asset']['version'] == 'v0.2':
                asset_data = revised_asset
                print(f"    ✅ Asset revised to v0.2")
            else:
                print(f"    ✅ No revision needed (all tests passed)")
        except Exception as e:
            stats['stages_fail'] += 1
            print(f"    ❌ Failed: {e}")
            raise

        # Stage 5: Build release package
        print(f"  [Stage 5/5] build_release_package...")
        try:
            package_paths = build_release_package(asset_data, tests, test_results, case_dir)
            case_result['stages_completed'] += 1
            stats['stages_success'] += 1
            print(f"    ✅ Release package created in: {case_dir}")
        except Exception as e:
            stats['stages_fail'] += 1
            print(f"    ❌ Failed: {e}")
            raise

        # Determine highest risk severity
        # Use .get so a risk entry without a 'severity' key cannot fail the case here
        severities = [r.get('severity') for r in asset_data.get('risks', [])]
        if 'high' in severities:
            case_result['highest_risk_severity'] = 'high'
        elif 'medium' in severities:
            case_result['highest_risk_severity'] = 'medium'
        elif 'low' in severities:
            case_result['highest_risk_severity'] = 'low'

        case_result['status'] = 'success'
        stats['cases_success'] += 1
        print(f"\n  ✅ Case completed successfully")

    except Exception as e:
        case_result['status'] = 'failed'
        stats['cases_fail'] += 1
        print(f"\n  ❌ Case failed: {e}")

        # Create error artifacts (only when at least a partial asset exists)
        if asset_data:
            case_dir.mkdir(parents=True, exist_ok=True)
            error_path = case_dir / "error.json"
            with open(error_path, 'w') as f:
                json.dump({
                    "error": str(e),
                    "stages_completed": case_result['stages_completed'],
                    "partial_asset": asset_data
                }, f, indent=2)

    case_results.append(case_result)

# Print final summary table
print("\n" + "="*70)
print("FINAL SUMMARY")
print("="*70 + "\n")

print(f"{'Case ID':<35} {'Status':<10} {'Stages':<10} {'Tests':<10} {'Pass Rate':<12} {'Risk'}")
print("-" * 95)

for result in case_results:
    status_icon = "✅" if result['status'] == 'success' else "❌"
    case_id_display = result['case_id'][:32] + "..." if len(result['case_id']) > 32 else result['case_id']
    stages = f"{result['stages_completed']}/5"
    tests = str(result['tests_count'])
    pass_rate = f"{result['pass_rate']*100:.0f}%" if result['tests_count'] > 0 else "N/A"
    risk = result['highest_risk_severity']

    print(f"{case_id_display:<35} {status_icon:<10} {stages:<10} {tests:<10} {pass_rate:<12} {risk}")

print("\n" + "="*70)
print("STATISTICS")
print("="*70)
print(f"Cases: {stats['cases_success']} success, {stats['cases_fail']} failed")
print(f"Stages: {stats['stages_success']} success, {stats['stages_fail']} failed")
print(f"Total API calls: {stats['total_api_calls']}")
print(f"Total risks logged: {stats['total_risks_logged']}")
print(f"\nDeliverables directory: {DELIVERABLES_DIR}")

EXECUTING ASSET PIPELINE FOR 4 CASES


[Case 1/4] criminal_defense_intake

  [Stage 1/5] generate_asset...
    ✅ Asset generated: Federal Wire Fraud Defense: Client Interview Checklist & Mitigation Intake Template (v0.1)
  [Stage 2/5] generate_tests...
    ✅ Generated 0 tests
  [Stage 3/5] run_tests...
    ✅ Tests completed: 0/0 passed
  [Stage 4/5] revise_asset...
    ✅ No revision needed (all tests passed)
  [Stage 5/5] build_release_package...
    ✅ Release package created in: /content/ai_law_ch4_runs/run_20260108_123554/deliverables/criminal_defense_intake

  ✅ Case completed successfully

[Case 2/4] regulatory_comment_letter

  [Stage 1/5] generate_asset...
    ✅ Asset generated: Error Fallback (v0.1)
  [Stage 2/5] generate_tests...
    ✅ Generated 0 tests
  [Stage 3/5] run_tests...
    ✅ Tests completed: 0/0 passed
  [Stage 4/5] revise_asset...
    ✅ No revision needed (all tests passed)
  [Stage 5/5] build_release_package...
    ✅ Release package created in

##9.CREATE YOUR OWN ASSET

###9.1.OVERVIEW

**User Exercise: Create Your Own Asset**

This section transforms the notebook from a demonstration tool into an interactive learning environment. After observing how the system processes four predefined cases, you now have the opportunity to create your own legal asset using the same pipeline infrastructure. This hands-on experience helps solidify understanding of how the asset generation process works and what governance safeguards operate throughout.

**Interactive Input Design**

The section begins by displaying the six available asset types: clause, playbook, checklist, template, teaching module, and test suite. It then prompts you to enter your own scenario or facts describing the legal situation you want to address. If you press enter without typing anything, the system uses a default scenario about confidentiality agreements for startup employees. This default ensures the exercise can proceed even if you are unsure what scenario to create.

**Asset Type Selection**

The notebook then specifies which asset type to create. In the code as written, this defaults to template, but you can modify that variable to any of the six valid types. In a more sophisticated version, this might involve additional user input, but the current implementation prioritizes reliability over flexibility by hardcoding a valid choice.
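
If you want to accept the asset type from user input without giving up reliability, one option is to validate the input against the six known types and fall back to the default. This is a minimal sketch under that assumption; `VALID_ASSET_TYPES` and `resolve_asset_type` are hypothetical names that do not appear in the notebook.

```python
VALID_ASSET_TYPES = (
    "clause", "playbook", "checklist",
    "template", "teaching_module", "test_suite",
)

def resolve_asset_type(choice, default="template"):
    """Map a menu number or a type name to a valid asset type, else the default."""
    choice = (choice or "").strip().lower().replace(" ", "_")
    # Accept the menu number shown next to each type (1 to 6)
    if choice.isdecimal() and 1 <= int(choice) <= len(VALID_ASSET_TYPES):
        return VALID_ASSET_TYPES[int(choice) - 1]
    if choice in VALID_ASSET_TYPES:
        return choice
    return default  # anything unrecognized falls back to a safe choice

print(resolve_asset_type("3"))
```

Because every path returns a member of the valid set, the rest of the pipeline never sees an unsupported type.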

**Redaction Demonstration**

Before processing your scenario, the system applies the redaction function and shows you what was removed. You see a summary stating how many field types were redacted and which categories they fell into, such as emails, phone numbers, or addresses. You also see either the full redacted scenario if it is short, or the first two hundred characters if it is longer. This transparency demonstrates that privacy protection operates on your input just as it does on the predefined cases.
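
The shape of this step can be illustrated with a simplified stand-in for the redaction function. The notebook's own `redact` is defined in an earlier section; `redact_sketch`, its regular expressions, and the placeholder format below are assumptions for illustration and will miss many real-world formats.

```python
import re

# Deliberately simple patterns; real redaction needs broader coverage
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_sketch(text):
    """Replace matches with labeled placeholders; return (text, removed_types)."""
    removed = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label.upper()}]", text)
        if count:
            removed.append(label)
    return text, removed

sample = "Reach Jane at jane.doe@example.com or 555-123-4567."
redacted, removed = redact_sketch(sample)

# Show the full text if short, otherwise the first 200 characters
preview = redacted[:200] + "..." if len(redacted) > 200 else redacted
print(f"{len(removed)} field types removed: {removed}")
print(preview)
```

Note that the name in the sample survives redaction, which is a concrete reminder that pattern-based redaction is imperfect and real client data should not be entered.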

**Building the User Case**

The system constructs a case structure identical to those used in the predefined examples. Your redacted scenario becomes the facts array, the selected asset type determines what the system will generate, and a minimal prompt template gets constructed specifying the task and word limit. This structural consistency means your custom case flows through exactly the same pipeline as the demonstration cases.

**Five-Stage Pipeline Execution**

Your case then proceeds through all five pipeline stages with visible progress indicators. Stage one generates your draft asset and displays its title. Stage two creates three tests rather than the five used for predefined cases, reducing execution time for this interactive exercise while still demonstrating the testing concept. Stage three runs those three tests and reports the pass count. Stage four checks whether revision is needed and either produces version zero point two or confirms no revision was necessary. Stage five builds the complete release package.

**Abbreviated Testing**

The decision to generate only three tests rather than five serves both pedagogical and practical purposes. Three tests execute faster, keeping the interactive exercise responsive. Three tests still demonstrate the core concept of adversarial evaluation without exhausting attention. The test types selected focus on edge cases, consistency, and clarity, which are broadly applicable across different asset types and scenarios.

**Completion Summary**

After successful execution, you see a summary displaying your asset's title, its final version number, how many tests passed out of how many total, how many risks were identified, and the directory path where all artifacts were saved. Below that, the system lists all files created in your user asset directory, making it easy to locate and examine the outputs.

**Error Handling for User Input**

If something goes wrong during your case execution, perhaps because your scenario contained unusual formatting or triggered an API error, the system catches the exception and displays a clear error message. It also directs you to check the risk log for more detailed diagnostic information. This graceful error handling prevents confusion and provides a path forward for troubleshooting.

**File Organization**

Your user-generated asset gets saved in a dedicated user asset subdirectory within the main deliverables folder. This separation keeps your interactive work distinct from the predefined demonstration cases. The directory structure mirrors what was created for each demonstration case: versioned asset JSON, tests JSON, test results JSON, release manifest JSON, and human review checklist text file.

**Learning Objectives**

This interactive exercise serves multiple educational purposes. First, it reinforces understanding of the pipeline stages by having you execute them yourself. Second, it demonstrates that the same infrastructure handles custom scenarios just as reliably as predefined examples. Third, it shows the redaction mechanism operating on your actual input. Fourth, it provides experience reading the progress indicators and interpreting the summary outputs. Fifth, it creates artifacts you can examine to understand the detailed structure of each governance document.

**Pedagogical Value for Legal Practice**

For lawyers learning to use AI tools responsibly, this hands-on component is crucial. Reading about governance mechanisms is abstract. Typing your own scenario, watching redaction occur, seeing tests generated and executed, and examining the resulting release package makes the concepts concrete. You understand not just what the system does, but how it feels to use it, where human judgment is required, and what artifacts get produced for later review.

**Limitations and Extensions**

The exercise implements only basic functionality. A production system might include dropdown menus for asset type selection, validation of scenario length and content, preview of the prompt before sending it, ability to adjust word limits or test counts, or options to iterate multiple revision cycles. The simplified version here focuses on demonstrating core concepts rather than building a full-featured application.

**Connection to Professional Practice**

This interactive capability mirrors how a law firm might actually use such a system. A junior associate has a new matter requiring a standard asset like an intake checklist or clause library entry. Rather than starting from scratch, they describe the matter, select the asset type, let the system generate a draft with automated testing, and then review the human review checklist to complete the work. The interactive exercise gives you firsthand experience with that workflow.

###9.2.CODE AND IMPLEMENTATION

In [None]:
# User exercise: Create your own asset

print("="*70)
print("USER EXERCISE: CREATE YOUR OWN ASSET")
print("="*70 + "\n")

print("Available asset types:")
print("  1. clause")
print("  2. playbook")
print("  3. checklist")
print("  4. template")
print("  5. teaching_module")
print("  6. test_suite")
print()

# Get user input
user_scenario = input("Enter your scenario/facts (one line, or press Enter for default): ").strip()

if not user_scenario:
    user_scenario = "Client needs confidentiality agreement for startup employees, tech sector, California, 50 employees expected."
    print(f"Using default scenario: {user_scenario}")

# Select asset type (hardcoded for automation)
user_asset_type = "template"  # Can be changed to any valid type
print(f"Asset type selected: {user_asset_type}")

# Redact user input
redacted_scenario, removed = redact(user_scenario)
print(f"\nRedaction summary: {len(removed)} field types removed: {removed}")
print(f"Redacted scenario: {redacted_scenario[:200]}..." if len(redacted_scenario) > 200 else f"Redacted scenario: {redacted_scenario}")

# Build user case
user_case = {
    "case_id": "user_asset",
    "asset_type": user_asset_type,
    "facts": [redacted_scenario],
    "prompt": f"""Task: Create {user_asset_type} asset.
Asset content limit: 400 words.
Return ONLY JSON."""
}

user_dir = DELIVERABLES_DIR / "user_asset"

try:
    print("\nExecuting pipeline...\n")

    # Stage 1: Generate asset
    print("  [1/5] Generating asset...")
    user_asset = generate_asset(user_case)
    print(f"    ✅ {user_asset['asset']['title']}")

    # Stage 2: Generate tests (3 tests for user exercise)
    print("  [2/5] Generating tests...")
    user_tests_prompt = f"""Task: Generate 3 tests for this asset.
Asset type: {user_asset['asset']['asset_type']}
Asset title: {user_asset['asset']['title']}
Include: edge_case, consistency, clarity tests.
Asset content limit: 150 words.
Return ONLY JSON."""

    test_response = call_claude_json_prefill(user_tests_prompt)
    user_tests = test_response.get('tests', [])[:3]  # Limit to 3
    print(f"    ✅ Generated {len(user_tests)} tests")

    # Stage 3: Run tests
    print("  [3/5] Running tests...")
    user_test_results = run_tests(user_asset, user_tests)
    passed = sum(1 for t in user_test_results if t.get('pass_fail') == 'pass')
    print(f"    ✅ {passed}/{len(user_test_results)} passed")

    # Stage 4: Revise (if needed)
    print("  [4/5] Checking if revision needed...")
    user_asset_revised = revise_asset(user_asset, user_test_results)
    if user_asset_revised['asset']['version'] == 'v0.2':
        user_asset = user_asset_revised
        print("    ✅ Asset revised to v0.2")
    else:
        print("    ✅ No revision needed")

    # Stage 5: Build release package
    print("  [5/5] Building release package...")
    user_package = build_release_package(user_asset, user_tests, user_test_results, user_dir)
    print(f"    ✅ Package created")

    print("\n" + "="*70)
    print("USER ASSET COMPLETED")
    print("="*70)
    print(f"Asset: {user_asset['asset']['title']}")
    print(f"Version: {user_asset['asset']['version']}")
    print(f"Tests: {passed}/{len(user_test_results)} passed")
    print(f"Risks: {len(user_asset.get('risks', []))}")
    print(f"\nSaved to: {user_dir}")
    print("\nFiles:")
    for path in user_dir.glob("*"):
        print(f"  - {path.name}")

except Exception as e:
    print(f"\n❌ User exercise failed: {e}")
    print("Check error logs in risk_log.json")

##10.AUDIT PACKAGE

###10.1.OVERVIEW

**Create Audit Package and Zip Bundle for Download**

This final section completes the governance cycle by packaging all artifacts, logs, and documentation into a comprehensive audit bundle. After executing the pipeline and generating legal assets, this section ensures everything is properly documented, organized, and ready for download, review, and long-term archival. Think of it as closing the case file with a complete table of contents and index.

**Updating the Run Manifest**

The section begins by reopening the run manifest created at the start and adding final information. It records the end timestamp showing when execution completed, so that the total duration can be derived by comparing start and end times, and appends all the accumulated statistics about API calls, risks logged, cases succeeded and failed, and stages succeeded and failed. This transforms the manifest from an initial plan into a complete execution record.
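
Deriving the duration from the two timestamps takes only a few lines. The `start_time` and `end_time` keys are the ones the notebook stores in ISO format; `add_run_duration` and the `duration_seconds` key are hypothetical additions shown as one possible way to record the result.

```python
from datetime import datetime

def add_run_duration(manifest):
    """Return a copy of the manifest with duration_seconds derived from its timestamps."""
    start = datetime.fromisoformat(manifest["start_time"])
    end = datetime.fromisoformat(manifest["end_time"])
    updated = dict(manifest)  # leave the caller's dictionary unchanged
    updated["duration_seconds"] = round((end - start).total_seconds(), 3)
    return updated

manifest = {
    "start_time": "2026-01-08T12:35:54",
    "end_time": "2026-01-08T12:41:09.500000",
}
finalized = add_run_duration(manifest)
print(finalized["duration_seconds"])
```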

**Creating the Audit README**

The system generates a comprehensive README file that serves as the master documentation for the entire audit package. This document contains multiple sections explaining what the package contains and how to use it. The run information section summarizes when execution occurred, which model was used, and what the notebook's purpose was. The statistics section presents the quantitative performance data in readable format. The directory structure section provides a visual map of all folders and files.

**Documentation of Governance Artifacts**

The README explains each governance artifact in detail. It describes what the run manifest contains and why it matters. It explains that the prompts log uses line-delimited JSON format with redacted content and cryptographic hashes. It clarifies that the risk log aggregates all identified risks for centralized review. It details what each case's deliverables directory contains, including versioned assets, test suites, test results, release manifests, and human review checklists.

**Critical Reminders Section**

The README includes prominent warnings that must accompany any AI-generated legal work. It emphasizes that all outputs are drafts requiring human lawyer review, that verification status is set to "Not verified" on everything, that human review is mandatory before any use, that redaction is imperfect so logs should be reviewed before sharing, and that citations must not be invented, with every authority marked "Not verified". These warnings protect against the most common risks when using AI tools in legal practice.

**Next Steps Guidance**

The README provides concrete instructions for what a human attorney should do with this package. Review each human review checklist systematically. Verify all items listed in the questions-to-verify sections. Check the risk log for high-severity flags requiring immediate attention. Consult a supervising attorney before deploying any assets. Update the release readiness status after completing human review. The governance checklist printed at the end of the notebook adds one more step: archive the audit bundle for record-keeping and potential future reference.

**Contact Information**

The README includes contact information for the notebook's author, establishing accountability and providing a resource for questions or concerns. This personal attribution reinforces that the system is a tool created by identifiable professionals, not an anonymous black box.

**Creating the Zip Bundle**

After generating the README, the system creates a compressed zip archive containing the entire run directory with all its contents. The zip file name includes the timestamp, making it unique and clearly associated with this specific execution. The compression makes the bundle easier to download, transfer, and archive while keeping all related files together as a single unit.
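
The bundling step can be tried safely in a temporary directory. `shutil.make_archive` is the standard-library call the notebook uses; `bundle_run_dir` and the demo file names are hypothetical. The archive is written outside the directory being zipped so that it does not end up containing itself.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def bundle_run_dir(run_dir, zip_path):
    """Zip the contents of run_dir into zip_path and return the archive path."""
    zip_path = Path(zip_path)
    # make_archive appends '.zip' itself, so pass the name without the suffix
    archive = shutil.make_archive(str(zip_path.with_suffix("")), "zip", root_dir=run_dir)
    return Path(archive)

# Build a tiny stand-in run directory
work = Path(tempfile.mkdtemp())
run_dir = work / "run_demo"
(run_dir / "deliverables").mkdir(parents=True)
(run_dir / "AUDIT_README.txt").write_text("demo", encoding="utf-8")

archive = bundle_run_dir(run_dir, work / "bundle_demo.zip")
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(archive.name, f"{archive.stat().st_size / 1024:.1f} KB")
```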

**File Tree Display**

The section prints a complete hierarchical listing of every file and directory in the audit package. This visual representation helps you understand the organization at a glance. You see the top-level governance files like run manifest, prompts log, risk log, and pip freeze. You see the deliverables folder containing subdirectories for each case. Within each case directory, you see the versioned asset files, test files, manifest, and checklist.

**Download Instructions**

The system provides explicit step-by-step instructions for downloading the zip bundle from Google Colab. Click the folder icon in the left sidebar to open the file browser. Navigate to the specified zip file path. Right-click and select download. These concrete instructions ensure even users unfamiliar with Colab can successfully retrieve their audit package.

**Bundle Size Display**

The section displays the zip file size in kilobytes, giving you a sense of the package's scope. A typical run might produce fifty to one hundred kilobytes depending on how much content was generated and how many API calls were made. This size information helps you anticipate download time and storage requirements.

**Governance Checklist**

The final output presents a checklist of governance tasks that a human attorney should complete. Review all human review checklists from each case. Verify questions requiring verification. Check the risk log for high severity items. Consult a supervising attorney before any deployment. Update release readiness status after review. Archive the audit bundle for compliance and record-keeping. This checklist transforms abstract governance requirements into concrete actionable steps.

**Completion Confirmation**

The section ends with a clear confirmation that the Chapter Four pipeline is complete. This explicit closure provides psychological satisfaction and confirms that all intended work has been performed. You are not left wondering whether more steps remain or whether something failed silently.

**Why Comprehensive Packaging Matters**

For legal practice, this thorough packaging addresses multiple professional responsibilities simultaneously. It creates the audit trail that ethical rules require when using technology in practice. It generates documentation that could respond to client questions, opposing counsel inquiries, or regulatory examinations. It provides the materials a supervising attorney needs for effective review. It ensures that six months or six years later, you can reconstruct exactly what was done, why it was done, and what safeguards were in place.

**Long-Term Archival Value**

The audit package is designed for long-term preservation. The README ensures future readers can understand the contents even if they were not involved in the original execution. The manifest provides context about when and why this work was performed. The environment snapshot enables technical reproduction if necessary. The versioned assets show the evolution from initial draft to revised version. Together, these elements create a self-contained historical record that maintains its value over time.

**Professional Standard Demonstration**

This comprehensive packaging demonstrates what responsible AI use in legal practice should look like. It is not enough to generate useful outputs. You must also document what you did, track what risks were identified, enable verification of results, and create an audit trail that demonstrates due diligence. This final section transforms the notebook from a useful tool into a governance-compliant professional workflow.

###10.2.CODE AND IMPLEMENTATION

In [15]:
# Create AUDIT_README.txt and zip bundle for download

print("Creating final audit package...\n")

# Update run manifest with end time
with open(manifest_path, 'r') as f:
    final_manifest = json.load(f)

final_manifest['end_time'] = datetime.now().isoformat()
final_manifest['stats'] = stats

with open(manifest_path, 'w') as f:
    json.dump(final_manifest, f, indent=2)

# Create AUDIT_README.txt
audit_readme = f"""AI LAW CHAPTER 4 - LEVEL 4 INNOVATORS
AUDIT PACKAGE README
{'='*70}

RUN INFORMATION:
  Run ID: {timestamp}
  Model: {MODEL}
  Chapter: 4 - Innovators (Reusable Legal Assets)
  Start: {final_manifest['start_time']}
  End: {final_manifest['end_time']}

STATISTICS:
  Cases completed: {stats['cases_success']}/{stats['cases_success'] + stats['cases_fail']}
  Pipeline stages: {stats['stages_success']} success, {stats['stages_fail']} failed
  Total API calls: {stats['total_api_calls']}
  Total risks logged: {stats['total_risks_logged']}

DIRECTORY STRUCTURE:
  run_{timestamp}/
  ├── run_manifest.json         (Run metadata + stats)
  ├── prompts_log.jsonl         (Redacted prompts/responses + hashes)
  ├── risk_log.json             (Aggregated risk findings)
  ├── pip_freeze.txt            (Python dependencies)
  ├── AUDIT_README.txt          (This file)
  └── deliverables/
      ├── criminal_defense_intake/
      │   ├── asset_v0.1.json (or v0.2 if revised)
      │   ├── tests.json
      │   ├── test_results.json
      │   ├── release_manifest.json
      │   └── human_review_checklist.txt
      ├── regulatory_comment_letter/
      ├── cross_border_dispute/
      ├── course_ai_policy/
      └── user_asset/ (if user exercise completed)

GOVERNANCE ARTIFACTS:

1. run_manifest.json
   - Run ID, model, timestamps
   - Final statistics
   - Purpose and chapter

2. prompts_log.jsonl
   - Line-delimited JSON log
   - Each entry: timestamp, prompt hash, redacted prompt/response
   - NO unredacted PII

3. risk_log.json
   - Aggregated risks from all API calls
   - Type, severity, notes
   - Used for post-run analysis

4. deliverables/<case_id>/
   - Versioned assets (v0.1, v0.2)
   - Test suite + results
   - Release manifest (readiness assessment)
   - Human review checklist (verification items)

CRITICAL REMINDERS:

⚠️  ALL OUTPUTS ARE DRAFTS - NOT LEGAL ADVICE
⚠️  verification_status = "Not verified" ON ALL ASSETS
⚠️  HUMAN LAWYER REVIEW MANDATORY BEFORE USE
⚠️  REDACTION IS IMPERFECT - Review logs before sharing
⚠️  NO INVENTED CITATIONS - All authority marked "Not verified"

NEXT STEPS:

1. Review human_review_checklist.txt for each asset
2. Verify all items in questions_to_verify
3. Check risk_log.json for high-severity flags
4. Consult supervising attorney before deployment
5. Update release_readiness.status after human review

QUESTIONS OR CONCERNS:
Contact: Alejandro Reynoso
Position: Chief Scientist DEFI CAPITAL RESEARCH and External Lecturer,
          Judge Business School Cambridge

{'='*70}
Generated: {datetime.now().isoformat()}
"""

audit_readme_path = RUN_DIR / "AUDIT_README.txt"
with open(audit_readme_path, 'w') as f:
    f.write(audit_readme)

print(f"✅ AUDIT_README.txt created: {audit_readme_path}")

# Create zip bundle
zip_path = Path(f"/content/ai_law_ch4_run_{timestamp}.zip")
shutil.make_archive(str(zip_path.with_suffix('')), 'zip', RUN_DIR)

print(f"✅ Zip bundle created: {zip_path}")

# Print file list
print("\n" + "="*70)
print("AUDIT PACKAGE FILE LIST")
print("="*70 + "\n")

for root, dirs, files in os.walk(RUN_DIR):
    level = root.replace(str(RUN_DIR), '').count(os.sep)
    indent = '  ' * level
    print(f"{indent}{os.path.basename(root)}/")
    subindent = '  ' * (level + 1)
    for file in files:
        print(f"{subindent}{file}")

print("\n" + "="*70)
print("DOWNLOAD BUNDLE")
print("="*70)
print(f"\nZip file ready for download: {zip_path}")
print(f"Size: {zip_path.stat().st_size / 1024:.1f} KB")
print("\nTo download:")
print("  1. Click the folder icon 📁 in the left sidebar")
print(f"  2. Navigate to: {zip_path}")
print("  3. Right-click → Download")

print("\n" + "="*70)
print("GOVERNANCE CHECKLIST")
print("="*70)
print("\n[ ] Review all human_review_checklist.txt files")
print("[ ] Verify questions_to_verify in each asset")
print("[ ] Check risk_log.json for high-severity items")
print("[ ] Consult supervising attorney")
print("[ ] Update release readiness status")
print("[ ] Archive audit bundle for record-keeping")
print("\n✅ Chapter 4 pipeline complete!")

Creating final audit package...

✅ AUDIT_README.txt created: /content/ai_law_ch4_runs/run_20260108_131043/AUDIT_README.txt
✅ Zip bundle created: /content/ai_law_ch4_run_20260108_131043.zip

AUDIT PACKAGE FILE LIST

run_20260108_131043/
  AUDIT_README.txt
  deliverables/

DOWNLOAD BUNDLE

Zip file ready for download: /content/ai_law_ch4_run_20260108_131043.zip
Size: 1.5 KB

To download:
  1. Click the folder icon 📁 in the left sidebar
  2. Navigate to: /content/ai_law_ch4_run_20260108_131043.zip
  3. Right-click → Download

GOVERNANCE CHECKLIST

[ ] Review all human_review_checklist.txt files
[ ] Verify questions_to_verify in each asset
[ ] Check risk_log.json for high-severity items
[ ] Consult supervising attorney
[ ] Update release readiness status
[ ] Archive audit bundle for record-keeping

✅ Chapter 4 pipeline complete!


##11.CONCLUSIONS

**Complete Pipeline Overview: From Initialization to Audit Bundle**

This notebook implements a comprehensive five-stage pipeline for creating, testing, and releasing reusable legal assets with full governance tracking. Understanding how all ten sections work together reveals a sophisticated system that balances automation with accountability, efficiency with safety, and innovation with professional responsibility.

**Foundation Layer: Setup and Infrastructure**

The pipeline begins with foundational setup across the first four sections. Section one provides the conceptual framework, explaining that reusable legal assets have increased blast radius requiring scaled governance. Section two installs dependencies and creates a timestamped run directory where all artifacts will be stored. Section three establishes the API connection to Claude, loading credentials securely from Colab Secrets and initializing the client with the specified Haiku model. Section four creates the governance infrastructure: run manifest for metadata tracking, prompts log for redacted API interactions, risk log for aggregated findings, environment snapshot for reproducibility, and statistics tracker for performance monitoring. These four sections build the foundation that everything else depends upon.

**Privacy Protection Layer**

Section five implements redaction utilities that protect confidential information throughout the pipeline. The redaction function scans for emails, phone numbers, Social Security numbers, and street addresses, replacing them with labeled placeholders. The section demonstrates this protection with fake data, showing before and after states while emphasizing that redaction is imperfect and real sensitive data should never be entered. This privacy layer gets invoked repeatedly throughout later stages, ensuring that facts sent to the API, content written to logs, and responses stored in files all undergo redaction before any external transmission or permanent storage.
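A pattern-based redactor of the kind described here might look like the sketch below. The regular expressions and placeholder labels are illustrative assumptions, not the notebook's own patterns, and they cover only common US-style formats.

```python
import re

# Hypothetical patterns: order matters, since SSNs are replaced before phones.
REDACTION_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")),
    (
        "ADDRESS",
        re.compile(
            r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
            r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b"
        ),
    ),
]


def redact(text: str) -> str:
    """Replace each matched span with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


# Fake data only, as the chapter insists.
sample = (
    "Contact Jane Roe at jane.roe@example.com or 555-123-4567. "
    "SSN 123-45-6789. Lives at 42 Elm Street."
)
print(redact(sample))
```

Note that the name "Jane Roe" passes through untouched. That gap is exactly why the section treats redaction as imperfect and warns against entering real sensitive data.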

**Reliability Layer: Prefill-Enforced JSON**

Section six solves the critical technical challenge of obtaining consistent structured outputs from a conversational AI model. The prefill technique seeds Claude's response with an opening brace, so the model continues a JSON object rather than opening with conversational explanation. Four fallback extraction strategies provide additional safety nets if prefill alone proves insufficient. Schema validation ensures all required keys are present. Automatic risk flagging scans for hallucination indicators, missing disclaimers, and overconfident verification claims. Retry logic makes up to three attempts, tightening the constraints on each one. An error fallback schema ensures a valid output even when parsing fails completely. A smoke test validates the entire mechanism before main execution begins. This defensive programming creates the reliability foundation necessary for professional legal practice.
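The parsing side of this mechanism can be illustrated without calling the API. In the sketch below, the message list shows the prefill shape (a final assistant turn containing only `{`); because the model's reply continues from that brace, the brace has to be re-attached before parsing. The four extraction strategies, the three-key schema, and the function names are assumptions chosen for illustration, since the text does not enumerate the notebook's own strategies.

```python
import json
from typing import Optional

FENCE = "`" * 3  # markdown code fence marker, built indirectly
REQUIRED_KEYS = ("draft_output", "risks", "questions_to_verify")  # hypothetical schema


def build_prefilled_messages(prompt: str) -> list:
    """Message list whose last turn prefills the assistant reply with '{'."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": "{"},
    ]


def extract_json(completion: str) -> Optional[dict]:
    """Try progressively looser strategies to recover a JSON object."""
    stripped = completion.strip()
    unfenced = (
        stripped.removeprefix(FENCE + "json").removeprefix(FENCE).removesuffix(FENCE).strip()
    )
    candidates = [
        "{" + completion,  # 1. re-attach the prefilled brace
        completion,        # 2. the model repeated the brace itself
        unfenced,          # 3. the model wrapped the JSON in a code fence
    ]
    start, end = completion.find("{"), completion.rfind("}")
    if 0 <= start < end:
        candidates.append(completion[start : end + 1])  # 4. outermost brace span
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def validate_or_fallback(completion: str) -> dict:
    """Return the parsed object, or an error-shaped object with the same keys."""
    parsed = extract_json(completion)
    missing = [] if parsed is None else [k for k in REQUIRED_KEYS if k not in parsed]
    if parsed is None or missing:
        return {
            "draft_output": "",
            "risks": [{"type": "parse_error", "severity": "high", "missing_keys": missing}],
            "questions_to_verify": ["Manual review required: output failed validation."],
        }
    return parsed


clean = '"draft_output": "Draft", "risks": [], "questions_to_verify": ["Confirm jurisdiction"]}'
print(validate_or_fallback(clean)["draft_output"])
print(validate_or_fallback("I cannot comply")["risks"][0]["type"])
```

Returning an error object with the same keys as a successful result lets downstream stages run unchanged while the high-severity risk entry routes the case to human review.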

**Business Logic Layer: Cases and Pipeline Functions**

Section seven defines what actual legal work the notebook performs. Four mini-case builders create realistic scenarios across criminal defense, regulatory practice, international transactions, and legal education. Each builder returns concrete facts with names and numbers, a minimal directive prompt, and metadata about case type. The five pipeline stage functions then define the transformation process. Generate asset produces the initial version 0.1 draft with required disclaimers and structure. Generate tests creates adversarial and edge case evaluations designed to stress-test the asset. Run tests executes those evaluations through simulated assessment, returning pass-fail results and remediation notes. Revise asset performs a single improvement iteration if any tests failed, producing version 0.2. Build release package creates the complete artifact set including versioned JSON files, test documentation, release manifest, and human review checklist. These functions implement the core asset development methodology.
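The decision that links the test stage to the revision stage is simple enough to show directly. The sketch below assumes each test result is a dictionary with a boolean `passed` field; that shape, and the helper names, are hypothetical.

```python
def pass_rate(results: list) -> float:
    """Fraction of tests that passed; 0.0 when no tests were run."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)


def needs_revision(results: list) -> bool:
    """A single failing test is enough to trigger the one revision pass."""
    return any(not r["passed"] for r in results)


def next_version(version: str) -> str:
    """Bump the minor component, e.g. '0.1' becomes '0.2'."""
    major, minor = version.split(".")
    return f"{major}.{int(minor) + 1}"


# Illustrative results only; real entries would also carry remediation notes.
results = [
    {"name": "adversarial_1", "passed": True},
    {"name": "edge_case_1", "passed": False},
    {"name": "edge_case_2", "passed": True},
    {"name": "adversarial_2", "passed": True},
]

version = "0.1"
if needs_revision(results):
    version = next_version(version)
print(pass_rate(results), version)
```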

**Execution Layer: Running the Pipeline**

Section eight brings everything together by executing all four cases through the complete five-stage pipeline. The execution loop processes each case sequentially, wrapping every stage in error handling that isolates failures and allows other cases to continue. Visible progress indicators show exactly what stage is executing and whether it succeeded or failed. Statistics accumulate across all cases, tracking API calls, risks logged, cases completed, and stages completed. Test results get analyzed to calculate pass rates and determine whether revision is necessary. Risk severities get assessed to identify which cases need the most urgent human attention. After processing all cases, a comprehensive summary table displays case-by-case outcomes, and aggregate statistics provide overall performance metrics. The deliverables directory path is explicitly shown so outputs can be located for review.
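The failure-isolation pattern can be sketched with stub stages standing in for the real API-backed functions. Everything here is illustrative: the stage and case names are hypothetical, and one stage is rigged to fail for one case to show that the remaining cases still run.

```python
from collections import Counter


def run_case(case: dict, stages: list, stats: Counter) -> dict:
    """Run one case through every stage, stopping at the first failure."""
    state = {"case": case}
    for name, stage in stages:
        try:
            state = stage(state)
        except Exception as exc:  # isolate the failure to this case
            print(f"  [failed] {name}: {exc}")
            return {"case": case["name"], "status": "failed", "failed_stage": name}
        stats["stages_completed"] += 1
        print(f"  [ok] {name}")
    stats["cases_completed"] += 1
    return {"case": case["name"], "status": "completed", "failed_stage": None}


def passthrough(state: dict) -> dict:
    """Stub stage: the real stages would call the API and write artifacts."""
    return state


def flaky_run_tests(state: dict) -> dict:
    """Stub stage that simulates an error for a single case."""
    if state["case"]["name"] == "regulatory":
        raise RuntimeError("simulated API timeout")
    return state


stages = [
    ("generate_asset", passthrough),
    ("generate_tests", passthrough),
    ("run_tests", flaky_run_tests),
    ("revise_asset", passthrough),
    ("build_release_package", passthrough),
]
cases = [{"name": n} for n in ("criminal_defense", "regulatory", "international", "education")]

stats = Counter()
summary = []
for case in cases:
    print(f"Case: {case['name']}")
    summary.append(run_case(case, stages, stats))

for row in summary:
    print(row)
print(dict(stats))
```

Stopping a case at its first failed stage keeps later stages from operating on missing inputs, while the outer loop carries on to the next case.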

**Interactive Learning Layer**

Section nine transforms the demonstration into hands-on experience by letting you create your own custom asset. You enter a scenario describing your legal situation, select an asset type, and watch as redaction occurs on your input. Your custom case then flows through the same five-stage pipeline as the predefined examples, though with abbreviated testing for efficiency. You see your asset generated, tested with three evaluations, potentially revised, and packaged with full governance artifacts. This interactive exercise reinforces conceptual understanding through direct experience while demonstrating that the infrastructure handles novel scenarios just as reliably as predefined examples.

**Documentation and Archival Layer**

Section ten completes the governance cycle by packaging everything for audit, review, and long-term preservation. The run manifest gets updated with end timestamps and final statistics. A comprehensive README document explains the entire package structure, describes each artifact type, provides critical warnings about verification requirements, and offers concrete next-steps guidance for human attorneys. The entire run directory gets compressed into a timestamped zip bundle. A visual file tree shows the complete organization. Download instructions provide step-by-step guidance for retrieving the bundle from Colab. A governance checklist translates abstract requirements into actionable review tasks. The final confirmation signals completion and readiness for human review.
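The archival step can be approximated with the standard library alone. The sketch below builds a throwaway run directory that mirrors the file list in the output above, zips it with `shutil.make_archive`, and prints an indented file tree. The helper names and the README text are hypothetical, and the notebook's own implementation may differ.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def bundle_run(run_dir: Path) -> Path:
    """Zip the run directory into a sibling archive named after the run."""
    archive = shutil.make_archive(
        base_name=str(run_dir.parent / f"ai_law_ch4_{run_dir.name}"),
        format="zip",
        root_dir=str(run_dir.parent),
        base_dir=run_dir.name,
    )
    return Path(archive)


def file_tree(run_dir: Path) -> list:
    """Indented listing of everything under the run directory."""
    lines = [f"{run_dir.name}/"]
    for path in sorted(run_dir.rglob("*")):
        depth = len(path.relative_to(run_dir).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("  " * depth + path.name + suffix)
    return lines


# Throwaway directory standing in for a real run.
run_dir = Path(tempfile.mkdtemp()) / "run_20260108_131043"
(run_dir / "deliverables").mkdir(parents=True)
(run_dir / "AUDIT_README.txt").write_text("Placeholder README text.\n", encoding="utf-8")

archive = bundle_run(run_dir)
print("\n".join(file_tree(run_dir)))
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(archive.name, len(names))
```

Writing the archive next to the run directory rather than inside it avoids the bundle trying to include itself.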

**The Complete Flow**

Following a single case through the entire pipeline reveals the orchestration. Infrastructure initializes. A case builder provides facts and prompt. Redaction protects privacy. The prefill-enforced API call generates a structured asset draft. Another API call creates adversarial tests. Multiple API calls execute those tests through simulated evaluation. If tests fail, another API call revises the asset. File operations save all artifacts with version numbers and manifests. Logs capture every interaction in redacted form. Risks aggregate to a central repository. Statistics track performance. Finally, everything packages into a downloadable audit bundle with comprehensive documentation. This end-to-end flow demonstrates how technical reliability, privacy protection, quality assurance, and governance documentation integrate seamlessly into a single coherent workflow suitable for professional legal practice.