#**AI CONSULTING CHAPTER 2:REASONERS**

---

##0.REFERENCE

https://chatgpt.com/share/696e5534-ffb8-8012-bc9a-3643c82c81a2

##1.CONTEXT

**Introduction: How to Use This Google Colab Notebook (Chapter 2, Level 2: Reasoners)**

This notebook is a practical companion to Chapter 2 of the AI-Assisted Consulting maturity ladder. It is designed for MBA and Master of Finance audiences who want to use generative AI as a disciplined reasoning aid, without falling into the common trap of treating fluent output as truth. The central idea of Level 2 is simple but demanding: the model is allowed to help you structure thinking, but it is not allowed to make decisions for you. Your goal is not to "get answers." Your goal is to produce reasoning structures that are inspectable, challengeable, and governable, with a clear trail of what was assumed, what is unknown, and what must be verified.

Level 2 focuses on four core reasoning artifacts. First, an issue tree or problem decomposition: a structured hypothesis about what drives the problem. Second, alternatives: multiple plausible ways to frame or address the situation. Third, trade-offs: explicit tensions among objectives that cannot all be optimized at once. Fourth, an assumption register: a transparent inventory of what must be true for the reasoning to hold. When these are done well, they make your thinking easier to review, easier to improve, and harder to fake. When they are done poorly, they create a dangerous illusion of rigor.

That illusion is the primary risk this notebook is built to fight. Structured output feels professional. It looks like what consultants deliver. It can sound decisive even when the underlying inputs are thin. In finance and strategy settings, that is a perfect recipe for decision laundering: burying judgment inside impressive formatting and pretending the structure proves the conclusion. This notebook treats that as a governance failure. It forces an explicit label on every output: **verification_status = "Not verified"**. That label is not decoration. It is a rule. It means the model's job ends at structure, and your job begins with validation.

You will see this governance posture reflected throughout the notebook. Prompts are written in neutral language. They forbid recommendations and rankings. They require open questions. They demand symmetry in trade-offs so that options are not covertly "sold" as superior. They also prioritize minimum-necessary input. In professional practice, confidentiality is not an afterthought. Even in a simulated environment, good habits matter. The notebook includes a basic redaction step and logs prompts in a redacted way to reduce the chance that sensitive details are stored or shared unintentionally. This does not eliminate confidentiality risk, but it teaches you to reduce exposure and to treat input discipline as part of professional competence.

The notebook is organized as a complete run that produces an auditable set of artifacts. Think of it like a mini engagement file. A run manifest records what model and settings were used and captures a simple environment fingerprint. A prompts log records that prompts occurred without storing their full content. A risk log captures issues that should make you slow down, such as missing assumptions, shallow decomposition, or overly narrow framing. A verification register transforms open questions into a checklist that a human must complete before any downstream use. An approvals log creates placeholders for the human sign-off that must exist if this were a real client setting. The key point is that Level 2 is not just about producing structures. It is about producing structures with evidence of responsible process.

You will also work through multiple mini-cases. These are intentionally incomplete. Missing details are not a bug; they are the point. In consulting and corporate finance, the earliest phases of work often happen with partial information. The correct response is not to invent facts. The correct response is to map what the unknowns are, how they affect the structure, and what would change the direction of thinking. This notebook trains that skill by forcing the model to expose gaps rather than fill them. Your role as the analyst is to decide what to validate first and what questions matter most.

As you use this notebook, keep two disciplines in mind. First, separate facts from assumptions. Facts are what you were explicitly given. Assumptions are what you are temporarily treating as true to build a structure. You should be able to point to every assumption and say why it matters, how it could be tested, and what would happen if it turns out to be wrong. Second, keep neutrality. Neutrality does not mean indecision; it means intellectual honesty. Alternatives should be framed as real possibilities, not straw men. Trade-offs should be articulated in balanced language, not in a way that subtly pushes a preferred answer.
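The first discipline can be made concrete by recording each assumption as a small structured entry. The sketch below is illustrative only: the class name and the fields `why_it_matters`, `how_to_test`, and `impact_if_wrong` are hypothetical and are not taken from the notebook's own schema, and the sample entry is invented for demonstration.

```python
from dataclasses import dataclass, asdict


@dataclass
class AssumptionEntry:
    """One row of a hypothetical assumption register."""
    statement: str        # what is temporarily treated as true
    why_it_matters: str   # which part of the structure depends on it
    how_to_test: str      # evidence or analysis that would confirm or refute it
    impact_if_wrong: str  # what changes in the reasoning if it fails
    verification_status: str = "Not verified"


def unverified(entries):
    """Return the entries that still need human verification."""
    return [e for e in entries if e.verification_status != "Verified"]


# Invented example entry, for illustration only
register = [
    AssumptionEntry(
        statement="Demand in the target segment is stable over the planning horizon",
        why_it_matters="The revenue branch of the issue tree depends on it",
        how_to_test="Compare against client-provided historical volumes",
        impact_if_wrong="The revenue branch must be re-decomposed",
    )
]

pending = unverified(register)
```

Every entry starts as "Not verified", so the register doubles as a to-do list for the human reviewer.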

Finally, remember what this notebook is not. It is not a substitute for professional judgment. It is not a research engine. It does not verify facts. It does not produce final recommendations. It is a disciplined workshop for producing high-quality reasoning scaffolds and the audit trail around them. If you want to use AI responsibly in strategy or finance, this is the habit you need: structure first, verify second, decide last, and keep records at every step.

If you treat this notebook as a thinking partner that forces transparency, it will make you sharper. If you treat it as an authority, it will make you dangerously overconfident. The difference is not the model. The difference is governance. This notebook is built to teach that governance as a practical skill.


##2.LIBRARIES AND ENVIRONMENT

In [None]:
# Cell 2
# Type: Code
# Goal: Install dependencies, import libraries, create run directory structure
# Output: Print run directory paths confirming setup

# Cell 2: Install + Imports + Run Directory

import json
import os
import re
import hashlib
import platform
import textwrap
from datetime import datetime, timezone
from pathlib import Path
import subprocess
import sys
import uuid

# Install Anthropic SDK into the interpreter that runs this notebook
print("Installing Anthropic SDK...")
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "anthropic"], check=True)
import anthropic

print("✓ Libraries imported")

# Create unique run directory (timezone-aware UTC; datetime.utcnow() is deprecated)
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
short_id = str(uuid.uuid4())[:8]
run_name = f"run_{timestamp}_{short_id}"
base_dir = Path(f"/content/ai_consulting_ch2_runs/{run_name}")
deliverables_dir = base_dir / "deliverables"

base_dir.mkdir(parents=True, exist_ok=True)
deliverables_dir.mkdir(exist_ok=True)

print("\n✓ Run directory created:")
print(f"  Base: {base_dir}")
print(f"  Deliverables: {deliverables_dir}")

Installing Anthropic SDK...
✓ Libraries imported

✓ Run directory created:
  Base: /content/ai_consulting_ch2_runs/run_20260119_140749_1fa07eef
  Deliverables: /content/ai_consulting_ch2_runs/run_20260119_140749_1fa07eef/deliverables




##3.API AND CLIENT INITIALIZATION

###3.1.OVERVIEW

**Cell 3: Connecting to Claude's API**

This cell establishes the connection between your notebook and Anthropic's Claude AI service. Think of it as plugging in the power cord before you can use any electrical device.

When you work with AI models like Claude, they don't run on your computer. Instead, they run on Anthropic's servers in the cloud. To access them, you need two things: an API key (which is like a password that proves you're authorized to use the service) and a client (which is the software that manages the communication).

**What happens in this cell:**

First, the notebook retrieves your API key from Google Colab's secure storage system. You should have already added this key to Colab's "Secrets" section (the key icon in the sidebar). This is much safer than typing your API key directly into the code where others might see it.

Second, it creates what we call a "client" - a connection manager that will handle all your requests to Claude. Every time you want Claude to analyze something or generate reasoning structures, this client will send your request to Anthropic's servers and bring back the response.

Third, it specifies exactly which version of Claude you're using. In this case, the notebook pins Claude Sonnet 4.5 by its dated model identifier (claude-sonnet-4-5-20250929), so every run targets the same model version. Different Claude models have different strengths; Sonnet strikes a balance between speed and sophisticated reasoning.

**Why this matters for consulting:**

In professional consulting, you need to know exactly which tools you're using and be able to audit your work. This cell creates a clear record of which AI model you used, when you connected to it, and that you had proper authorization. This transparency is essential for governance and for explaining your methodology to clients or stakeholders.

If this cell fails, it means your API key isn't configured correctly, and none of the subsequent AI-powered cells will work.
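If you also want to run the notebook outside Colab, one option is to fall back to an environment variable when Colab secrets are unavailable. This is a minimal sketch, not part of the original cell: `load_api_key` and `mask` are hypothetical helper names, and the fallback behavior is an assumption about how you might want to work locally.

```python
import os


def load_api_key(name="ANTHROPIC_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
    except Exception:
        key = None  # not in Colab, or the secret is missing or access was denied
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key


def mask(key, visible=4):
    """Mask a key for display, keeping only the last few characters."""
    return "*" * max(len(key) - visible, 0) + key[-visible:]
```

Masking before printing keeps the full key out of saved notebook outputs.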

###3.2.CODE AND IMPLEMENTATION

In [None]:

# Cell 3
# Type: Code
# Goal: Initialize Anthropic client with API key from Colab secrets
# Output: Print API key status and model configuration

# Cell 3: API Key + Client Initialization

from google.colab import userdata

try:
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
    os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    MODEL = "claude-sonnet-4-5-20250929"

    print("✓ Anthropic client initialized")
    print(f"  Model: {MODEL}")
    print(f"  API Key: {'*' * 20}{ANTHROPIC_API_KEY[-4:]}")

except Exception:
    print("✗ FAILED: Could not retrieve API key")
    print("  → Go to Colab Secrets (🔑 icon) and add 'ANTHROPIC_API_KEY'")
    raise  # re-raise with the original traceback


✓ Anthropic client initialized
  Model: claude-sonnet-4-5-20250929
  API Key: ********************RgAA


##4.GOVERNANCE ARTIFACTS

###4.1.OVERVIEW



This cell creates the foundation for professional-grade governance and accountability. Think of it as setting up a comprehensive filing system before starting a major consulting project - every document, decision, and assumption needs to be tracked and traceable.

**What happens in this cell:**

The cell builds a complete audit infrastructure by creating several tracking systems. First, it builds on the unique run directory created in Cell 2, which is named with a timestamp and a short identification code. Every time you run this notebook, you get a fresh directory so previous work is never overwritten. This is like creating a new project folder for each client engagement.

Second, it creates a series of specialized log files. The run manifest captures exactly what configuration you used - which AI model, what parameters, when the analysis was conducted, and what computing environment you were using. This is your project metadata, ensuring you can reproduce results later or explain your methodology to auditors.

Third, it initializes several governance logs that will track different aspects of your work: a risk log for flagging potential issues, a verification register for tracking what needs to be validated, a change log for documenting any modifications, and an approvals log for recording human sign-offs. Each serves a specific governance purpose.

Fourth, it creates a prompts log that records which questions were sent to the AI - but critically, it only stores cryptographic hashes of the prompts, not the actual content. This protects client confidentiality while still maintaining an audit trail.
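The hash-only logging idea can be sketched in isolation as follows. This is a simplified illustration, not the notebook's own helper: `log_prompt_hash`, the temporary log file, the sample prompt, and the model name "example-model" are all placeholders introduced here.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_prompt_hash(log_path, prompt, model):
    """Append a record containing only the SHA-256 digest of the prompt."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Placeholder prompt and model name, for illustration only
log_file = Path(tempfile.mkdtemp()) / "prompts_log.jsonl"
secret_prompt = "Confidential: margin pressure in the northern region"
entry = log_prompt_hash(log_file, secret_prompt, model="example-model")
logged_text = log_file.read_text(encoding="utf-8")
```

A digest lets you later confirm that a prompt you still hold matches a logged call, but it cannot be reversed into the prompt text. Short or guessable prompts could still be matched by hashing candidates, so the log reduces exposure rather than eliminating it.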

**Why this matters for consulting:**

In management consulting, traceability and auditability are non-negotiable. Clients pay substantial fees and expect to understand exactly how conclusions were reached. Regulators and compliance teams need clear documentation. This cell ensures that every AI interaction, every assumption, and every risk is logged from the start. If someone asks three months later "how did you arrive at this recommendation," you'll have a complete paper trail to reconstruct your reasoning process and demonstrate professional rigor.
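Reproducibility depends on the manifest recording the parameters that were actually used. One way to reduce drift is to derive the configuration hash from the same dictionary that is written to the manifest. The sketch below assumes a hypothetical `config_hash` helper and placeholder values; it is not the notebook's `stable_config_hash`.

```python
import hashlib
import json


def config_hash(config, length=16):
    """Hash a configuration dict in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Placeholder configurations, for illustration only
run_config = {"model": "example-model", "temperature": 0.2, "max_tokens": 4128}
same_config_reordered = {"max_tokens": 4128, "temperature": 0.2, "model": "example-model"}
changed_config = {**run_config, "temperature": 0.1}

hash_a = config_hash(run_config)
hash_b = config_hash(same_config_reordered)
hash_c = config_hash(changed_config)
```

Sorting keys before hashing means two runs with identical settings produce the same hash regardless of dictionary order, while any changed parameter produces a different one.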

###4.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 4
# Type: Code
# Goal: Implement governance artifact helpers and initialize all logs
# Output: Print artifact paths and configuration hash

# Cell 4: Governance Artifacts + Environment Fingerprint

from datetime import datetime, timezone

def now_iso():
    """Return current UTC timestamp in ISO format"""
    return datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')

def sha256_text(text):
    """Return SHA-256 hash of text"""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

def write_json(filepath, data):
    """Write JSON to file with indentation"""
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=2)

def read_json(filepath):
    """Read JSON from file"""
    with open(filepath, 'r') as f:
        return json.load(f)

def append_jsonl(filepath, record):
    """Append a JSON record to JSONL file"""
    with open(filepath, 'a') as f:
        f.write(json.dumps(record) + '\n')

def get_env_fingerprint():
    """Capture environment metadata"""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "timestamp": now_iso(),
        "working_directory": str(base_dir)
    }

def stable_config_hash():
    """Generate stable hash of configuration"""
    config = f"{MODEL}|temp=0.2|max_tokens=4128|level=2"
    return sha256_text(config)[:16]

# Initialize run_manifest.json
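manifest = {
    "run_id": run_name,
    "chapter": "2",
    "level": "Reasoners",
    "model": MODEL,
    # Recorded parameters. call_claude() in Cell 6 passes temperature=0.1 and
    # max_tokens=2500, so keep these values in sync with the actual API call.
    "parameters": {
        "temperature": 0.2,
        "max_tokens": 4128
    },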
    "notebook_purpose": "Structured reasoning for consulting (issue trees, alternatives, trade-offs)",
    "author": "Alejandro Reynoso, Chief Scientist DEFI CAPITAL RESEARCH; External Lecturer, Judge Business School Cambridge",
    "created_at": now_iso(),
    "environment": get_env_fingerprint(),
    "config_hash": stable_config_hash()
}
write_json(base_dir / "run_manifest.json", manifest)

# Initialize empty governance logs
write_json(base_dir / "risk_log.json", {"risks": []})
write_json(base_dir / "verification_register.json", {"verifications": []})
write_json(base_dir / "change_log.json", {"changes": []})
write_json(base_dir / "approvals_log.json", {"approvals": []})

# Create empty prompts_log.jsonl (will store hashes only)
(base_dir / "prompts_log.jsonl").touch()

print("✓ Governance artifacts initialized:")
print("  run_manifest.json")
print("  prompts_log.jsonl (redacted)")
print("  risk_log.json")
print("  verification_register.json")
print("  change_log.json")
print("  approvals_log.json")
print(f"\n  Config hash: {manifest['config_hash']}")

✓ Governance artifacts initialized:
  run_manifest.json
  prompts_log.jsonl (redacted)
  risk_log.json
  verification_register.json
  change_log.json
  approvals_log.json

  Config hash: 4fed116412a87188


##5.CONFIDENTIALITY REDACTIONS

###5.1.OVERVIEW



This cell builds critical safeguards to prevent accidental disclosure of sensitive information. In consulting, you regularly handle confidential client data - financial figures, strategic plans, employee information, competitive intelligence. This cell ensures that when you use AI assistance, you're not inadvertently exposing what should remain private.

**What happens in this cell:**

The cell creates three protection mechanisms. First is a redaction function that automatically identifies and removes potentially sensitive information from any text before it gets sent to the AI. It scans for email addresses, phone numbers, dollar amounts, and company names, replacing them with generic placeholders. Think of it like a black marker that obscures sensitive details while keeping the strategic substance intact.

The redaction function has two modes: standard and aggressive. Standard mode removes obvious identifiers like contact information. Aggressive mode goes further, removing financial figures and company names. For highly confidential work, you'd use aggressive mode; for general strategy questions, standard might suffice.

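Second is a principle called "minimum necessary extraction" - taking only the essential context needed for reasoning, stripping away narrative details, specific dates, and identifying information. The implementation in this notebook is a deliberately crude heuristic: it drops lines of 20 characters or fewer and keeps at most the first ten lines that remain. If you paste in a fifty-page confidential memo, only those first lines are passed on, so check that they actually contain the core strategic question.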

Third, the cell establishes "reasoning guardrails" - instructions that remind the AI system to structure thinking rather than make recommendations, to expose assumptions explicitly, and to use neutral language. These guardrails help ensure the AI stays in a supporting role rather than appearing to make strategic decisions.

**Why this matters for consulting:**

Consultants are bound by strict confidentiality agreements. A single breach - accidentally including a client name in an AI prompt, leaking financial data, or exposing competitive intelligence - can destroy client relationships, trigger legal action, and end careers. This cell operationalizes confidentiality as code, making protection automatic rather than relying on human vigilance alone.

###5.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 5
# Type: Code
# Goal: Implement confidentiality redaction and reasoning guardrails
# Output: Demonstrate redaction on sample input

# Cell 5: Confidentiality + Reasoning Guardrails

def redact(text, aggressive=False):
    """
    Redact potentially confidential information from text.
    Returns (redacted_text, redaction_summary)
    """
    redacted = text
    redactions = []

    # Email addresses (TLD class is [A-Za-z]; a "|" inside [...] would match a literal pipe)
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    emails = re.findall(email_pattern, redacted)
    if emails:
        redacted = re.sub(email_pattern, '[EMAIL_REDACTED]', redacted)
        redactions.append(f"Emails: {len(emails)}")

    # Phone numbers (basic patterns)
    phone_pattern = r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'
    phones = re.findall(phone_pattern, redacted)
    if phones:
        redacted = re.sub(phone_pattern, '[PHONE_REDACTED]', redacted)
        redactions.append(f"Phones: {len(phones)}")

    # Dollar amounts (if aggressive)
    if aggressive:
        dollar_pattern = r'\$[\d,]+(?:\.\d{2})?[MmBbKk]?'
        dollars = re.findall(dollar_pattern, redacted)
        if dollars:
            redacted = re.sub(dollar_pattern, '[AMOUNT_REDACTED]', redacted)
            redactions.append(f"Amounts: {len(dollars)}")

    # Company names (simple heuristic: capitalized multi-word before "Inc", "Corp", "LLC")
    company_pattern = r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\s+(?:Inc\.|Corp\.|LLC|Ltd\.)'
    companies = re.findall(company_pattern, redacted)
    if companies and aggressive:
        redacted = re.sub(company_pattern, '[COMPANY_REDACTED]', redacted)
        redactions.append(f"Companies: {len(companies)}")

    summary = "; ".join(redactions) if redactions else "No redactions"
    return redacted, summary

def build_minimum_necessary(raw_text):
    """
    Extract only minimum necessary context for reasoning.
    Remove narrative details, names, dates.
    """
    # Simple heuristic: keep problem structure, remove specifics
    lines = raw_text.split('\n')
    filtered = [line for line in lines if len(line.strip()) > 20]
    return '\n'.join(filtered[:10])  # Cap at 10 lines

def reasoning_guardrails():
    """
    Return system instructions for Level 2 reasoning discipline.
    """
    return """
You are assisting a management consultant. Your role is to STRUCTURE reasoning, NOT provide recommendations.

STRICT REQUIREMENTS:
1. Do NOT recommend, rank, or score options
2. Do NOT use words like "best", "optimal", "should choose"
3. ALWAYS expose assumptions explicitly
4. ALWAYS list what information would change the conclusion
5. ALWAYS use neutral language ("Alternative A vs B" not "A is better")
6. ALWAYS include verification_status = "Not verified"

STRUCTURE TRADE-OFFS SYMMETRICALLY:
- Every pro must have a corresponding consideration
- Frame as tensions, not as weighted scores

EXPOSE WEAKEST ASSUMPTIONS:
- What facts are missing?
- What could invalidate this structure?
- What would the opposite view argue?
"""

# Demonstrate redaction
sample_input = """
We're evaluating market entry for Acme Corp. into Southeast Asia.
Contact: john.smith@acmecorp.com or 555-123-4567.
Revenue target: $150M by 2027.
"""

redacted_output, summary = redact(sample_input, aggressive=True)

print("✓ Redaction system ready\n")
print("BEFORE REDACTION:")
print(sample_input)
print("\nAFTER REDACTION:")
print(redacted_output)
print(f"\nRedaction summary: {summary}")


✓ Redaction system ready

BEFORE REDACTION:

We're evaluating market entry for Acme Corp. into Southeast Asia.
Contact: john.smith@acmecorp.com or 555-123-4567.
Revenue target: $150M by 2027.


AFTER REDACTION:

We're evaluating market entry for [COMPANY_REDACTED] into Southeast Asia.
Contact: [EMAIL_REDACTED] or [PHONE_REDACTED].
Revenue target: [AMOUNT_REDACTED] by 2027.


Redaction summary: Emails: 1; Phones: 1; Amounts: 1; Companies: 1
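Pattern-based redaction only catches the formats it was written for, so it is worth probing the patterns with inputs they might miss. The sketch below reuses the phone pattern from `redact()` above; the sample strings are invented for illustration.

```python
import re

# Same phone pattern as the redact() function above
PHONE_PATTERN = r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'

# Invented sample inputs in three common formats
samples = {
    "dashed": "Call 555-123-4567 today",
    "parenthesized": "Call (555) 123-4567 today",
    "international": "Call +44 20 7946 0958 today",
}

# Map each sample name to whether the pattern finds anything to redact
matched = {name: bool(re.search(PHONE_PATTERN, text)) for name, text in samples.items()}
print(matched)
```

Only the dashed format is matched; the parenthesized and international numbers pass through unredacted. Treat the redaction step as a first filter and review prompts by eye before sending anything sensitive.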


##6.LLM WRAPPER

###6.1.OVERVIEW

####6.1.1.GENERAL DESCRIPTION

**Cell 6: Building the AI Communication Pipeline**

This cell constructs the actual machinery that communicates with Claude and ensures you receive structured, usable output. Think of it as building a quality control assembly line - raw AI responses come in one end, and validated, structured reasoning comes out the other.

**What happens in this cell:**

The cell creates several interconnected functions that work together as a pipeline. First is the JSON extraction function, which takes whatever Claude responds with and finds the actual structured data within it. Sometimes AI models add explanatory text or formatting around their core response. This function strips away the packaging to get to the content, using multiple fallback strategies if the first approach fails.

Second is the validation function, which acts like a quality inspector. It checks that the AI response contains all required components - the task description, facts provided, assumptions made, open questions, risks, the reasoning structure itself, and verification status. It also enforces Level 2 constraints, scanning for forbidden language like "best option" or "you should choose" that would turn structures into recommendations. If anything is missing or improper, it flags the issue.

Third is the auto-risk detection function, which analyzes the reasoning structure for warning signs. If there are fewer than two alternatives, it flags potential narrow framing. If there are no assumptions listed, it raises an alarm about false completeness. If the issue tree appears shallow, it notes potential lack of depth. These are structural quality checks.

Fourth is the main calling function that orchestrates everything. It sends your question to Claude with strict instructions about output format, receives the response, runs it through extraction and validation, logs everything for governance, and implements retry logic if something fails. It makes multiple attempts with increasingly strict instructions if needed.
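A depth check of the kind described for shallow issue trees could be sketched as follows. The notebook does not fix the shape of the issue tree, so this assumes a nested dictionary with a `children` list; the function names, the `shallow_decomposition` risk type, and the `min_depth=3` threshold are all hypothetical.

```python
def tree_depth(node, children_key="children"):
    """Depth of a nested-dict tree; a node with no children has depth 1."""
    children = node.get(children_key, []) if isinstance(node, dict) else []
    if not children:
        return 1
    return 1 + max(tree_depth(child, children_key) for child in children)


def shallow_tree_risk(tree, min_depth=3):
    """Return a risk record if the tree is shallower than min_depth, else None."""
    depth = tree_depth(tree)
    if depth < min_depth:
        return {
            "type": "shallow_decomposition",
            "severity": "medium",
            "note": f"Issue tree depth {depth} is below {min_depth}",
        }
    return None


# Invented example tree with three levels on the cost branch
example_tree = {
    "question": "Why is margin declining?",
    "children": [
        {"question": "Revenue drivers", "children": []},
        {"question": "Cost drivers", "children": [{"question": "Input prices"}]},
    ],
}
```

Depth is only a proxy for quality: a deep tree can still be poorly decomposed, so a flag like this prompts review rather than replacing it.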

**Why this matters for consulting:**

Consultants cannot simply accept AI output at face value. You need systematic quality control to ensure reasoning structures are complete, properly formatted, and free from recommendation language that could constitute decision laundering. This cell automates those quality checks, making rigor consistent and reproducible rather than dependent on individual judgment.
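One way a scan for recommendation language could work is a whole-word search over the serialized response. This is a sketch under stated assumptions: `find_forbidden_language` and `FORBIDDEN_PHRASES` are hypothetical names, and the phrase list is taken from the guardrail prompt in Cell 5.

```python
import json
import re

# Phrases taken from the guardrail prompt; extend as needed
FORBIDDEN_PHRASES = ["best", "optimal", "should choose"]


def find_forbidden_language(data, phrases=FORBIDDEN_PHRASES):
    """Return forbidden phrases found as whole words in the serialized JSON (keys and values)."""
    text = json.dumps(data).lower()
    hits = []
    for phrase in phrases:
        if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text):
            hits.append(phrase)
    return hits
```

Whole-word matching avoids flagging words such as "bestseller", but a keyword scan cannot detect a recommendation phrased in other words, so a human read-through is still needed.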

####6.1.2.A DETAILED DESCRIPTION OF THE ROLE OF THIS CELL



**Step 1: The Input Arrives**

When you call the function, you provide a user prompt - a consulting problem statement like "Should we enter the Southeast Asian market?" This is your business question. The prompt might include context like current revenue, known constraints, and what's unclear. This raw input represents what a consultant would normally receive from a client.

**Step 2: Input Preparation and Security**

Before sending anything to Claude, the function creates a cryptographic hash of your prompt - a unique fingerprint - and logs only this hash to the prompts log file. The actual prompt content is never written to disk, protecting confidentiality. The function then retrieves the system prompt from the reasoning guardrails, which we'll discuss next.

**Step 3: The System Prompt - Setting the Rules**

The system prompt is critical instruction text that tells Claude its role and constraints. It says: "You are assisting a management consultant. Your role is to structure reasoning, not provide recommendations." It explicitly forbids words like "best," "optimal," and "should choose." It demands that all assumptions be listed explicitly, that trade-offs be framed symmetrically showing both pros and cons, and that the output include a verification status of "Not verified."

The system prompt also includes ultra-strict JSON formatting instructions with visual formatting and checklists. It tells Claude to start responses with an opening brace, end with a closing brace, avoid trailing commas, use double quotes, and verify all brackets match. This is like giving Claude a template and saying "follow this exactly."

**Step 4: The API Call**

The function sends both the system prompt and user prompt to Anthropic's servers using specific parameters: the model name (Claude Sonnet 4.5), a maximum token limit (how long the response can be), and a temperature setting of 0.1 (very low, meaning more consistent and predictable outputs rather than creative variation). This combination optimizes for reliable structured output.
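The request described in this step can be assembled as plain keyword arguments before any network call is made, which makes the parameters easy to inspect and log. In this sketch `build_request` is a hypothetical helper and the two prompt strings are placeholders; the defaults mirror the values used in Cell 6.

```python
def build_request(model, system_prompt, user_prompt, max_tokens=2500, temperature=0.1):
    """Assemble keyword arguments for client.messages.create(**kwargs)."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_prompt}],
    }


# Placeholder prompts, for illustration only
request = build_request(
    model="claude-sonnet-4-5-20250929",
    system_prompt="Structure reasoning; do not recommend.",
    user_prompt="Decompose the market-entry question into an issue tree.",
)
# With an initialized client: response = client.messages.create(**request)
```

Keeping the request as a dictionary also gives you a single object from which to record the parameters in the run manifest.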

**Step 5: Claude's Processing**

Inside Anthropic's systems, Claude processes your request according to the system instructions.

- It analyzes your consulting problem,
- identifies what facts you've provided versus what assumptions would be needed,
- generates an issue tree decomposition,
- frames multiple alternative approaches without ranking them,
- maps trade-offs showing tensions between options,
- and compiles lists of questions that would need verification.

Critically, it structures this thinking without concluding which option is superior.

**Step 6: Output Generation**

Claude generates a JSON object containing eight required sections:
- the task it understood,
- facts provided in the prompt,
- assumptions it's making,
- open questions remaining,
- risks it identified,
- a draft output structure containing the issue tree, alternatives, trade-offs, and assumption register,
- verification status set to "Not verified,"
- and questions requiring verification before any decision could be made.
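The eight sections correspond to the keys checked by `validate_reasoning_json` in the code below. An empty skeleton with those keys might look like this; the key names come from the validator, while the value types (lists and strings) are assumptions made for illustration.

```python
REQUIRED_KEYS = {
    "task", "facts_provided", "assumptions", "open_questions",
    "risks", "draft_output", "verification_status", "questions_to_verify",
}
DRAFT_KEYS = {
    "issue_tree_or_structure", "alternatives_or_options",
    "tradeoffs_or_tensions", "assumption_register",
}

# Value types (lists, strings) are assumptions for illustration
skeleton = {
    "task": "",
    "facts_provided": [],
    "assumptions": [],
    "open_questions": [],
    "risks": [],
    "draft_output": {key: [] for key in DRAFT_KEYS},
    "verification_status": "Not verified",
    "questions_to_verify": [],
}

# Both sets are empty when every required key is present
missing_top = REQUIRED_KEYS - set(skeleton)
missing_draft = DRAFT_KEYS - set(skeleton["draft_output"])
```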

**Step 7: Output Delivery and Validation**

The response text arrives back at your notebook. The extraction function locates the JSON object, stripping any markdown formatting. The validation function checks all required keys are present and structured correctly. The auto-risk detection function analyzes the content for quality issues. If validation fails, the function can retry with corrected instructions. Once validated, the JSON object is returned to you - structured reasoning ready for human review and completion.
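The retry behavior in this step can be sketched independently of the API. Here `call_with_retries` is a hypothetical helper, and `fake_generate` and `fake_validate` are stand-ins for the model call and the validator; the notebook's own `call_claude` may differ in detail.

```python
def call_with_retries(generate, validate, max_retries=2):
    """
    Call generate(attempt) until validate(result) passes or attempts run out.
    generate: callable taking the attempt index and returning parsed output
    validate: callable returning (is_valid, issues)
    """
    last_issues = []
    for attempt in range(max_retries + 1):
        try:
            result = generate(attempt)
        except ValueError as exc:  # e.g. JSON could not be extracted
            last_issues = [str(exc)]
            continue
        is_valid, last_issues = validate(result)
        if is_valid:
            return result
    raise RuntimeError(f"No valid output after {max_retries + 1} attempts: {last_issues}")


# Stand-in for the model: the first attempt omits the status, the second is valid
def fake_generate(attempt):
    return {} if attempt == 0 else {"verification_status": "Not verified"}


def fake_validate(data):
    if data.get("verification_status") == "Not verified":
        return True, []
    return False, ["verification_status must be 'Not verified'"]


result = call_with_retries(fake_generate, fake_validate)
```

Raising after the final attempt, rather than returning partial output, keeps invalid structures from flowing into downstream artifacts unnoticed.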

###6.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 6
# Type: Code
# Goal: Implement strict JSON wrapper with anti-false-rigor validation
# Output: Smoke test showing ACTUAL API call and valid reasoning JSON structure

# Cell 6: LLM Wrapper (Strict JSON + Anti-False-Rigor Checks)

import re
import json

def fix_json_string(json_str):
    """
    Aggressively fix common JSON syntax errors.
    """
    # Fix 1: Remove trailing commas before } or ]
    # This is the most common issue
    fixed = re.sub(r',(\s*[}\]])', r'\1', json_str)

    # Fix 2: Remove multiple trailing commas
    fixed = re.sub(r',+(\s*[}\]])', r'\1', fixed)

    # Fix 3: Fix missing commas between array elements (less common but possible)
    # This is risky so we skip it

    # Fix 4: Ensure proper escaping of quotes inside strings
    # This is complex, skip for now

    return fixed

def extract_json_robust(text):
    """Extract and repair JSON from response"""

    # Remove markdown code blocks
    text = re.sub(r'```json\s*', '', text)
    text = re.sub(r'```\s*', '', text)
    text = text.strip()

    # Find the main JSON object
    start = text.find('{')
    end = text.rfind('}')

    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")

    json_str = text[start:end+1]

    # Try direct parse first
    try:
        return json.loads(json_str)
    except json.JSONDecodeError as e:
        print(f"  → Initial parse failed: {str(e)[:100]}")
        print("  → Attempting automatic repairs...")

        # Apply fixes
        fixed_json = fix_json_string(json_str)

        # Try again
        try:
            result = json.loads(fixed_json)
            print("  → ✓ JSON repaired successfully")
            return result
        except json.JSONDecodeError as e2:
            # Save for debugging
            debug_file = base_dir / "debug_malformed_json.txt"
            with open(debug_file, 'w') as f:
                f.write("ORIGINAL RESPONSE:\n")
                f.write("="*70 + "\n")
                f.write(text[:2000])  # First 2000 chars
                f.write("\n\n" + "="*70 + "\n")
                f.write("EXTRACTED JSON:\n")
                f.write("="*70 + "\n")
                f.write(json_str[:2000])  # First 2000 chars
                f.write("\n\n" + "="*70 + "\n")
                f.write("AFTER FIXES:\n")
                f.write("="*70 + "\n")
                f.write(fixed_json[:2000])
                f.write("\n\n" + "="*70 + "\n")
                f.write(f"PARSE ERROR: {e2}\n")
                f.write(f"Error at position: {e2.pos if hasattr(e2, 'pos') else 'unknown'}\n")

            print(f"  → Saved debug info to: {debug_file}")
            raise ValueError(f"JSON still invalid after repairs: {e2}") from e2

def validate_reasoning_json(data):
    """Validate structure and enforce Level 2 constraints"""
    issues = []

    required_keys = {"task", "facts_provided", "assumptions", "open_questions",
                     "risks", "draft_output", "verification_status", "questions_to_verify"}

    missing = required_keys - set(data.keys())
    if missing:
        issues.append(f"Missing keys: {missing}")

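    draft = data.get("draft_output")
    if isinstance(draft, dict):
        draft_keys = {"issue_tree_or_structure", "alternatives_or_options",
                      "tradeoffs_or_tensions", "assumption_register"}
        missing_draft = draft_keys - set(draft.keys())
        if missing_draft:
            issues.append(f"Missing draft_output keys: {missing_draft}")
    elif "draft_output" in data:
        # Present but not an object: report it instead of raising AttributeError
        issues.append("draft_output must be a JSON object")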

    if data.get("verification_status") != "Not verified":
        issues.append("verification_status must be 'Not verified'")

    return len(issues) == 0, issues

def auto_detect_risks(data):
    """Detect structural risks"""
    auto_risks = []

    alts = data.get("draft_output", {}).get("alternatives_or_options", [])
    if len(alts) < 2:
        auto_risks.append({
            "type": "scope_creep",
            "severity": "medium",
            "note": "Fewer than 2 alternatives"
        })

    if len(data.get("assumptions", [])) == 0:
        auto_risks.append({
            "type": "missing_facts",
            "severity": "high",
            "note": "No assumptions registered"
        })

    return auto_risks

def call_claude(user_prompt, system_prompt=None, max_retries=2):
    """Call Claude with strict JSON enforcement and aggressive retry logic"""
    if system_prompt is None:
        system_prompt = reasoning_guardrails()

    # ULTRA-STRICT system prompt
    enhanced_system = system_prompt + """

═══════════════════════════════════════════════════════════════════
CRITICAL JSON OUTPUT REQUIREMENTS - READ CAREFULLY
═══════════════════════════════════════════════════════════════════

YOU MUST RETURN **ONLY** A VALID JSON OBJECT. NOTHING ELSE.

RULES:
1. Start your response with the character {
2. End your response with the character }
3. NO text before the {
4. NO text after the }
5. NO markdown code blocks (no ``` at all)
6. NO explanatory text
7. NO trailing commas (check every , before ] or })
8. ALL strings use double quotes "like this"
9. ALL object keys use double quotes
10. Verify all brackets match: { }, [ ]

MOST COMMON ERROR: Trailing comma before } or ]
WRONG: {"key": "value",}
RIGHT: {"key": "value"}

WRONG: ["item1", "item2",]
RIGHT: ["item1", "item2"]

Before responding, mentally verify:
- Count opening { and closing }
- Count opening [ and closing ]
- Check no commas before } or ]
- Verify all strings have closing quotes

═══════════════════════════════════════════════════════════════════
"""

    # Log hash
    prompt_hash = sha256_text(user_prompt)
    append_jsonl(base_dir / "prompts_log.jsonl", {
        "timestamp": now_iso(),
        "prompt_hash": prompt_hash,
        "model": MODEL
    })

    original_prompt = user_prompt

    for attempt in range(max_retries + 1):
        try:
            print(f"  → API call (attempt {attempt+1}/{max_retries+1})...")

            response = client.messages.create(
                model=MODEL,
                max_tokens=2500,
                temperature=0.1,  # Lower temperature for more consistent JSON
                system=enhanced_system,
                messages=[{"role": "user", "content": user_prompt}]
            )

            response_text = response.content[0].text
            print(f"  → Received {len(response_text)} chars")

            # Extract and repair JSON
            data = extract_json_robust(response_text)
            print("  → JSON parsed ✓")

            # Validate structure
            is_valid, issues = validate_reasoning_json(data)
            if not is_valid:
                if attempt < max_retries:
                    print(f"  → Validation failed: {issues}")
                    print("  → Retrying...")
                    user_prompt = f"""YOUR PREVIOUS RESPONSE HAD VALIDATION ISSUES: {issues}

FIX THESE ISSUES AND RETURN VALID JSON.

Remember:
- Include ALL required keys
- verification_status MUST be "Not verified"
- NO trailing commas
- Check all brackets close

ORIGINAL REQUEST:
{original_prompt}
"""
                    continue
                else:
                    raise ValueError(f"Validation failed after all retries: {issues}")

            print("  → Validation passed ✓")

            # Auto-detect risks
            auto_risks = auto_detect_risks(data)
            if auto_risks:
                data["risks"].extend(auto_risks)
                risks = read_json(base_dir / "risk_log.json")
                for risk in auto_risks:
                    risk["timestamp"] = now_iso()
                    risk["auto_detected"] = True
                    risks["risks"].append(risk)
                write_json(base_dir / "risk_log.json", risks)
                print(f"  → Auto-risks: {len(auto_risks)}")

            return data

        except ValueError as e:
            error_msg = str(e)
            if "JSON" in error_msg and attempt < max_retries:
                print(f"  → JSON error: {error_msg[:100]}")
                print("  → Retrying with even stricter instructions...")

                user_prompt = f"""YOUR PREVIOUS JSON WAS INVALID!

ERROR: {error_msg[:200]}

YOU MUST FIX THIS. Return ONLY valid JSON.

CHECKLIST BEFORE RESPONDING:
□ Response starts with {{
□ Response ends with }}
□ NO trailing commas anywhere
□ All brackets paired correctly
□ All strings in double quotes
□ NO markdown blocks

ORIGINAL REQUEST:
{original_prompt}
"""
                continue
            else:
                # Bare raise preserves the original traceback
                raise

        except Exception as e:
            if attempt < max_retries:
                print(f"  → Unexpected error: {str(e)[:100]}")
                continue
            else:
                raise

# =============================================================================
# SMOKE TEST
# =============================================================================

print("="*70)
print("SMOKE TEST: Ultra-Strict JSON Pipeline")
print("="*70)

test_prompt = """
Return ONLY this valid JSON object (verify syntax before responding):

{
  "task": "Evaluate warehouse automation investment",
  "facts_provided": ["Retail company", "Considering automation"],
  "assumptions": ["Labor costs rising", "Technology proven"],
  "open_questions": ["ROI timeline?", "Current throughput?"],
  "risks": [
    {"type": "missing_facts", "severity": "high", "note": "Financial details unknown"}
  ],
  "draft_output": {
    "issue_tree_or_structure": "Decision -> Cost Analysis -> Benefits -> Risks",
    "alternatives_or_options": ["Full automation", "Partial", "Status quo"],
    "tradeoffs_or_tensions": ["High cost vs savings", "Speed vs flexibility"],
    "assumption_register": ["Labor costs rise", "Tech reliable"]
  },
  "verification_status": "Not verified",
  "questions_to_verify": ["Labor cost?", "Failure rate?"]
}

CRITICAL: NO trailing commas. Verify all brackets close.
"""

try:
    result = call_claude(test_prompt)

    print("\n" + "="*70)
    print("✓ SMOKE TEST PASSED")
    print("="*70)
    print(f"Task: {result['task']}")
    print(f"Assumptions: {len(result['assumptions'])}")
    print(f"Alternatives: {len(result['draft_output']['alternatives_or_options'])}")
    print("\n✓ Pipeline ready for mini-cases!")
    print("="*70)

except Exception as e:
    print("\n" + "="*70)
    print("✗ SMOKE TEST FAILED")
    print("="*70)
    print(f"Error: {e}")
    print("\nCheck debug_malformed_json.txt for details")
    raise

SMOKE TEST: Ultra-Strict JSON Pipeline
  → API call (attempt 1/3)...
  → Received 790 chars
  → JSON parsed ✓
  → Validation passed ✓

✓ SMOKE TEST PASSED
Task: Evaluate warehouse automation investment
Assumptions: 2
Alternatives: 3

✓ Pipeline ready for mini-cases!


##7.REASONING ARTIFACT BUILDERS

###7.1.OVERVIEW



This cell defines builder functions that turn raw AI reasoning output into consulting deliverables with governance documentation attached. Think of it as a document production line that takes structured reasoning and packages it for different stakeholders: technical staff need JSON files, executives need readable summaries, and auditors need verification checklists.

**What happens in this cell:**

The cell defines five builder functions, each serving a distinct governance purpose. The first creates issue tree stubs: placeholder records stating that a reasoning tree was generated by AI and requires human validation. This prevents anyone from mistaking AI-generated structures for validated analysis.

The second builder collects assumptions from two places in the AI output (the top-level assumptions list and the assumption_register inside draft_output) and converts them into a formal assumption register. Each assumption gets a unique identifier, a source tag indicating where it came from, a validation status initialized to "unvalidated", and empty fields for validation method, validator name, and validation date. This creates a checklist forcing explicit validation of every assumption before the reasoning can be used. Nothing is assumed to be true just because the AI stated it.
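
A minimal sketch of how a reviewer might later complete one entry of that register. The helper name `mark_assumption_validated`, the sample register, and the reviewer details are hypothetical; only the field names come from the builder defined in this cell.

```python
from datetime import datetime, timezone


def mark_assumption_validated(register, assumption_id, method, validator):
    """Return a copy of the register with one assumption marked as validated.

    Hypothetical helper: the notebook only creates the register; how it is
    completed afterwards is an assumption of this sketch.
    """
    updated, found = [], False
    for entry in register["assumptions"]:
        entry = dict(entry)  # shallow copy so the caller's register is untouched
        if entry["id"] == assumption_id:
            entry.update(
                validation_status="validated",
                validation_method=method,
                validated_by=validator,
                validated_at=datetime.now(timezone.utc).isoformat(),
            )
            found = True
        updated.append(entry)
    if not found:
        raise KeyError(f"Unknown assumption id: {assumption_id}")
    return {**register, "assumptions": updated}


# Sample register in the shape produced by build_assumption_register_stub()
sample_register = {
    "total_assumptions": 2,
    "assumptions": [
        {"id": "A001", "assumption": "Labor costs rising", "source": "top_level",
         "validation_status": "unvalidated", "validation_method": None,
         "validated_by": None, "validated_at": None},
        {"id": "A002", "assumption": "Tech reliable", "source": "draft_output",
         "validation_status": "unvalidated", "validation_method": None,
         "validated_by": None, "validated_at": None},
    ],
}

reviewed = mark_assumption_validated(
    sample_register, "A001", method="Payroll data review", validator="J. Analyst"
)
still_open = [a["id"] for a in reviewed["assumptions"]
              if a["validation_status"] != "validated"]
print(still_open)  # ['A002']
```

Returning a copy rather than editing in place keeps the original stub available as an audit baseline.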

The third builder constructs a verification register from the questions Claude identified, drawing on both the questions_to_verify and the open_questions lists. Each question becomes a verification item with its own identifier, a source tag, and space to document how it was verified, who verified it, when, what the outcome was, and supporting notes. This transforms vague uncertainty into concrete action items with accountability.
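
As an illustration, the open items in such a register can be listed with a few lines of code. This is a sketch under assumptions: `pending_verifications` is a hypothetical helper, and treating an empty `verification_outcome` as "still open" is one possible convention, not something the notebook defines.

```python
def pending_verifications(register):
    """Return the IDs of items with no recorded verification outcome."""
    return [item["id"] for item in register["verifications"]
            if item.get("verification_outcome") is None]


# Sample register in the shape produced by build_verification_register_stub()
sample_verifications = {
    "total_verifications": 3,
    "verifications": [
        {"id": "V001", "question": "Labor cost?", "source": "questions_to_verify",
         "verification_outcome": "confirmed", "verified_by": "J. Analyst"},
        {"id": "V002", "question": "Failure rate?", "source": "questions_to_verify",
         "verification_outcome": None, "verified_by": None},
        {"id": "V003", "question": "ROI timeline?", "source": "open_questions",
         "verification_outcome": None, "verified_by": None},
    ],
}

open_ids = pending_verifications(sample_verifications)
total = sample_verifications["total_verifications"]
print(f"{len(open_ids)} of {total} items open: {open_ids}")
# 2 of 3 items open: ['V002', 'V003']
```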

The fourth builder creates approval records: formal placeholders for human sign-off. Each record lists the conditions that must be met before approval can be granted: all assumptions must be validated, all verifications must be completed, the risk assessment must be reviewed, and the reasoning structure must be inspected for false rigor. This sets up a governance workflow in which AI outputs are not to be used until humans have completed their due diligence.
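
The approval record lists its conditions as text; checking them is left to the reviewer. The sketch below shows one way the first two conditions could be checked mechanically, while the last two stay as explicit human confirmations. The functions `approval_blockers` and `try_approve`, and the sample data, are hypothetical and not part of the notebook.

```python
from datetime import datetime, timezone


def approval_blockers(assumption_register, verification_register,
                      risks_reviewed, rigor_inspected):
    """Return the list of unmet approval conditions (empty means none)."""
    blockers = []
    if any(a.get("validation_status") != "validated"
           for a in assumption_register["assumptions"]):
        blockers.append("All assumptions validated")
    if any(v.get("verification_outcome") is None
           for v in verification_register["verifications"]):
        blockers.append("All verifications completed")
    # The last two conditions are human judgments, passed in as booleans.
    if not risks_reviewed:
        blockers.append("Risk assessment reviewed")
    if not rigor_inspected:
        blockers.append("Reasoning structure inspected for false rigor")
    return blockers


def try_approve(record, reviewer_name, blockers):
    """Approve only when no condition is outstanding; otherwise stay pending."""
    if blockers:
        return {**record, "approval_notes": "Unmet: " + "; ".join(blockers)}
    return {**record,
            "approval_status": "approved",
            "reviewer_name": reviewer_name,
            "approved_at": datetime.now(timezone.utc).isoformat()}


record = {"case": "market_entry", "approval_status": "pending",
          "reviewer_name": None, "approved_at": None, "approval_notes": None}
assumption_register = {"assumptions": [{"id": "A001", "validation_status": "validated"}]}
verification_register = {"verifications": [{"id": "V001", "verification_outcome": None}]}

blockers = approval_blockers(assumption_register, verification_register,
                             risks_reviewed=True, rigor_inspected=True)
result = try_approve(record, "R. Reviewer", blockers)
print(result["approval_status"], blockers)
# pending ['All verifications completed']
```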

The fifth and most comprehensive function orchestrates everything. It takes AI output and generates four deliverable files in one call: the raw reasoning as JSON for technical analysis, the assumption register, the verification register, and a human-readable text summary formatted for executive review, with clear sections, governance warnings, and next-step instructions. It also appends a pending approval record to the approvals log and records the case's assumption and verification counts in the run-level verification register.

**Why this matters for consulting:**

Professional consulting demands documentation that serves multiple audiences and audit requirements. Technical teams need machine-readable data. Clients need readable summaries. Compliance teams need verification trails. Risk managers need approval documentation. This cell automates the production of all these artifacts consistently, ensuring nothing falls through governance cracks and every reasoning structure comes with its full accountability package.

###7.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 7
# Type: Code
# Goal: Implement reasoning artifact builders (issue trees, assumptions, verifications)
# Output: Print confirmation of loaded builders

# Cell 7: Level 2 Reasoning Builders

def build_issue_tree_stub(case_name):
    """
    Generate stub structure for issue tree visualization.
    This is a template; actual content comes from LLM.
    """
    return {
        "case": case_name,
        "tree_type": "issue_tree",
        "created_at": now_iso(),
        "note": "Structure generated by AI; requires human validation"
    }

def build_assumption_register_stub(output_json):
    """
    Extract assumptions into a structured register.
    """
    assumptions = output_json.get("assumptions", [])
    assumption_register = output_json.get("draft_output", {}).get("assumption_register", [])

    # Combine both sources
    all_assumptions = []

    for idx, assumption in enumerate(assumptions, 1):
        all_assumptions.append({
            "id": f"A{idx:03d}",
            "assumption": assumption,
            "source": "top_level",
            "validation_status": "unvalidated",
            "validation_method": None,
            "validated_by": None,
            "validated_at": None
        })

    for idx, assumption in enumerate(assumption_register, len(assumptions) + 1):
        if isinstance(assumption, dict):
            assumption_text = assumption.get("assumption", str(assumption))
        else:
            assumption_text = str(assumption)

        all_assumptions.append({
            "id": f"A{idx:03d}",
            "assumption": assumption_text,
            "source": "draft_output",
            "validation_status": "unvalidated",
            "validation_method": None,
            "validated_by": None,
            "validated_at": None
        })

    return {
        "created_at": now_iso(),
        "total_assumptions": len(all_assumptions),
        "assumptions": all_assumptions,
        "note": "All assumptions require independent validation before use"
    }

def build_verification_register_stub(output_json):
    """
    Build verification checklist from questions_to_verify.
    """
    questions = output_json.get("questions_to_verify", [])
    open_questions = output_json.get("open_questions", [])

    verifications = []

    for idx, question in enumerate(questions, 1):
        verifications.append({
            "id": f"V{idx:03d}",
            "question": question,
            "source": "questions_to_verify",
            "verification_method": None,
            "verified_by": None,
            "verified_at": None,
            "verification_outcome": None,
            "notes": None
        })

    for idx, question in enumerate(open_questions, len(questions) + 1):
        verifications.append({
            "id": f"V{idx:03d}",
            "question": question,
            "source": "open_questions",
            "verification_method": None,
            "verified_by": None,
            "verified_at": None,
            "verification_outcome": None,
            "notes": None
        })

    return {
        "created_at": now_iso(),
        "total_verifications": len(verifications),
        "verifications": verifications,
        "note": "Verification register must be completed before acting on reasoning"
    }

def build_approval_record(case_name, reviewer_role="Strategy Lead"):
    """
    Create approval placeholder for human review.
    """
    return {
        "case": case_name,
        "created_at": now_iso(),
        "approval_status": "pending",
        "reviewer_role": reviewer_role,
        "reviewer_name": None,
        "approved_at": None,
        "approval_notes": None,
        "conditions": [
            "All assumptions validated",
            "All verifications completed",
            "Risk assessment reviewed",
            "Reasoning structure inspected for false rigor"
        ]
    }

def save_case_deliverables(case_name, output_json):
    """
    Save all deliverables for a case in one go.
    Returns dict with all file paths.
    """
    case_prefix = deliverables_dir / case_name

    # 1. Save raw reasoning JSON
    reasoning_path = f"{case_prefix}_reasoning.json"
    write_json(reasoning_path, output_json)

    # 2. Build and save assumption register
    assumptions = build_assumption_register_stub(output_json)
    assumptions_path = f"{case_prefix}_assumptions.json"
    write_json(assumptions_path, assumptions)

    # 3. Build and save verification register
    verifications = build_verification_register_stub(output_json)
    verification_path = f"{case_prefix}_verification.json"
    write_json(verification_path, verifications)

    # 4. Create human-readable summary
    readable = f"""
{'='*70}
{case_name.upper().replace('_', ' ')}
{'='*70}

TASK:
{output_json['task']}

FACTS PROVIDED ({len(output_json['facts_provided'])}):
{chr(10).join('  • ' + str(f) for f in output_json['facts_provided'])}

ASSUMPTIONS ({len(output_json['assumptions'])}):
{chr(10).join('  • ' + str(a) for a in output_json['assumptions'])}

OPEN QUESTIONS ({len(output_json['open_questions'])}):
{chr(10).join('  • ' + str(q) for q in output_json['open_questions'])}

RISKS IDENTIFIED ({len(output_json['risks'])}):
{chr(10).join('  • [' + str(r['severity']).upper() + '] ' + str(r['type']) + ': ' + str(r['note']) for r in output_json['risks'])}

{'='*70}
DRAFT REASONING STRUCTURE
{'='*70}

Issue Tree / Structure:
{output_json['draft_output']['issue_tree_or_structure']}

Alternatives / Options ({len(output_json['draft_output']['alternatives_or_options'])}):
{chr(10).join('  ' + str(i+1) + '. ' + str(a) for i, a in enumerate(output_json['draft_output']['alternatives_or_options']))}

Trade-offs / Tensions ({len(output_json['draft_output']['tradeoffs_or_tensions'])}):
{chr(10).join('  • ' + str(t) for t in output_json['draft_output']['tradeoffs_or_tensions'])}

Assumption Register (from draft_output):
{chr(10).join('  • ' + str(a) for a in output_json['draft_output']['assumption_register'])}

{'='*70}
VERIFICATION STATUS: {output_json['verification_status']}
{'='*70}

Questions to Verify ({len(output_json['questions_to_verify'])}):
{chr(10).join('  • ' + str(q) for q in output_json['questions_to_verify'])}

{'='*70}
GOVERNANCE NOTICE
{'='*70}
This is AI-generated reasoning structure, NOT a recommendation.

Required before use:
  ✗ Validate all {assumptions['total_assumptions']} assumptions
  ✗ Complete all {len(verifications['verifications'])} verifications
  ✗ Review all {len(output_json['risks'])} risks
  ✗ Add human judgment and context
  ✗ Obtain stakeholder approval

The consultant owns the final judgment and decision.
{'='*70}
"""

    readable_path = f"{case_prefix}_human_readable.txt"
    with open(readable_path, 'w', encoding='utf-8') as f:
        f.write(readable)

    # 5. Create and log approval placeholder
    approval = build_approval_record(case_name)
    approvals = read_json(base_dir / "approvals_log.json")
    approvals["approvals"].append(approval)
    write_json(base_dir / "approvals_log.json", approvals)

    # 6. Update verification register log
    verif_reg = read_json(base_dir / "verification_register.json")
    verif_reg["verifications"].append({
        "case": case_name,
        "timestamp": now_iso(),
        "verification_count": verifications["total_verifications"],
        "assumption_count": assumptions["total_assumptions"]
    })
    write_json(base_dir / "verification_register.json", verif_reg)

    return {
        "reasoning": reasoning_path,
        "assumptions": assumptions_path,
        "verification": verification_path,
        "readable": readable_path,
        "assumption_count": assumptions["total_assumptions"],
        "verification_count": verifications["total_verifications"]
    }

print("="*70)
print("LEVEL 2 REASONING BUILDERS")
print("="*70)
print("\n✓ Individual builders:")
print("  • build_issue_tree_stub()")
print("  • build_assumption_register_stub()")
print("  • build_verification_register_stub()")
print("  • build_approval_record()")
print("\n✓ Integrated builder:")
print("  • save_case_deliverables() - saves all artifacts at once")
print("\n✓ Ready for mini-cases in Cell 8")
print("="*70)

LEVEL 2 REASONING BUILDERS

✓ Individual builders:
  • build_issue_tree_stub()
  • build_assumption_register_stub()
  • build_verification_register_stub()
  • build_approval_record()

✓ Integrated builder:
  • save_case_deliverables() - saves all artifacts at once

✓ Ready for mini-cases in Cell 8


##8.RUN 4 MINI-CASES

###8.1.OVERVIEW

**Cell 8: Running the Four Reasoning Demonstrations**

This cell executes the core demonstration of Level 2 reasoning across four realistic consulting scenarios. Think of it as running controlled experiments to show how AI can structure thinking for common strategic problems while maintaining strict governance discipline. Each demonstration follows identical quality and accountability protocols.

**What happens in this cell:**

The cell defines four mini-cases representing typical consulting engagements: market entry decisions, cost transformation programs, capital allocation dilemmas, and operating model redesigns. Each case is deliberately underspecified: competitor data is missing, customer segments are undefined, strategic priorities are unclear. The gaps force the AI to flag assumptions and unknowns explicitly rather than fabricate facts.

For each case, the cell constructs a highly structured prompt. The prompt presents the business problem, explicitly states what information is not available, demands specific reasoning outputs in strict JSON format, and includes multiple reminders about forbidden recommendation language and required neutral framing. The prompts are designed to stress-test the system's ability to maintain Level 2 discipline under realistic consulting complexity.

The execution loop processes each case sequentially. For every case, it calls Claude through the validated pipeline from Cell 6 (one call, plus retries if the JSON or validation checks fail), receives the structured reasoning output, immediately saves all four governance deliverables using the integrated builder from Cell 7, analyzes the output to determine the highest risk severity level, and compiles summary statistics. If a case fails, whether from JSON errors, validation issues, or API problems, the error is logged to the risk register and execution continues with the next case rather than stopping completely.
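
The "highest risk severity" step can be expressed with an explicit ordering. This is a sketch, not the notebook's implementation: `highest_severity` and `SEVERITY_RANK` are hypothetical names, and it assumes the only meaningful labels are low, medium, and high, with anything else ignored.

```python
# Assumed ordering of the severity labels used in this notebook
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def highest_severity(risks, default="low"):
    """Return the most severe recognized level in a list of risk dicts."""
    levels = []
    for risk in risks:
        level = str(risk.get("severity", "")).strip().lower()
        if level in SEVERITY_RANK:  # unrecognized labels are skipped
            levels.append(level)
    return max(levels, key=SEVERITY_RANK.get, default=default)


sample_risks = [
    {"type": "missing_facts", "severity": "medium", "note": "No benchmarks"},
    {"type": "traceability", "severity": "High", "note": "Mixed-case label"},
    {"type": "other", "severity": "n/a", "note": "Unrecognized label"},
]

print(highest_severity(sample_risks))  # high
print(highest_severity([]))            # low
```

Normalizing case before ranking matters because a model may return "High" where the code expects "high".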

Throughout execution, the cell prints progress indicators showing which case is running, what is happening at each step, and whether each case succeeded or failed. This makes it clear that the multi-minute run is still working rather than frozen.

After all four cases complete, the cell generates a summary table displaying case names, success or failure status, counts of assumptions flagged, counts of verifications needed, total risks identified, and highest risk severity for each case. This table provides an at-a-glance governance dashboard.

Finally, the cell prints the locations of all saved deliverables and provides explicit next-step instructions: review the human-readable summaries, complete verification checklists, validate assumptions independently, and remember these are structures not recommendations.

**Why this matters for consulting:**

These demonstrations show the pipeline running across diverse consulting scenarios while maintaining governance rigor. They create reference examples of what good Level 2 output looks like, with assumptions, verifications, and risks properly flagged. For training purposes, consultants can examine these cases to understand how to structure their own problems and what quality standards to expect from AI-assisted reasoning.

###8.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 8
# Type: Code
# Goal: Run 4 mini-case reasoning demonstrations with full governance
# Output: Summary table of cases with assumption counts and risk levels

# Cell 8: Run 4 Mini-Case Reasoning Demos

print("="*70)
print("MINI-CASE DEMONSTRATIONS")
print("="*70)
print("\nRunning 4 structured reasoning demos...")
print("Each case will:")
print("  1. Call Claude API once (with retries if needed)")
print("  2. Generate reasoning structure")
print("  3. Save all governance artifacts")
print("  4. Log risks and verifications")
print("\n⏱️  Expected time: 2-4 minutes total")
print("="*70)

# Shared JSON template to reduce errors
JSON_TEMPLATE = """{
  "task": "...",
  "facts_provided": ["...", "..."],
  "assumptions": ["...", "..."],
  "open_questions": ["...", "..."],
  "risks": [
    {"type": "missing_facts", "severity": "high", "note": "..."}
  ],
  "draft_output": {
    "issue_tree_or_structure": "...",
    "alternatives_or_options": ["...", "...", "..."],
    "tradeoffs_or_tensions": ["...", "..."],
    "assumption_register": ["...", "..."]
  },
  "verification_status": "Not verified",
  "questions_to_verify": ["...", "..."]
}"""

mini_cases = [
    {
        "name": "market_entry",
        "title": "Market Entry Decision",
        "prompt": f"""
STRICT JSON OUTPUT REQUIRED - Copy this structure exactly:

{JSON_TEMPLATE}

Fill it with content for this problem:

PROBLEM: Mid-sized consumer goods company considering Southeast Asian market entry.
- Current revenue: $500M (North America)
- Board approved exploration
- Countries NOT identified
- Competition UNKNOWN
- Distribution NOT assessed

REQUIREMENTS:
- Issue tree for market entry
- 3+ entry mode alternatives
- Trade-offs (pros AND cons)
- Assumption register
- Neutral language (no "best", "optimal", "should")
- verification_status MUST be "Not verified"

CRITICAL:
- NO trailing commas
- Check all brackets close properly
- Use double quotes only
- Start with {{ end with }}
"""
    },
    {
        "name": "cost_transformation",
        "title": "Cost Transformation Program",
        "prompt": f"""
STRICT JSON OUTPUT REQUIRED - Copy this structure exactly:

{JSON_TEMPLATE}

Fill it with content for this problem:

PROBLEM: Manufacturing company needs 15% cost reduction over 2 years.
- Current cost base: $2B annually
- Unionized workforce
- Legacy facilities
- Cost drivers NOT detailed
- Benchmarks UNAVAILABLE

REQUIREMENTS:
- Cost driver tree
- 4+ reduction levers
- Trade-offs (short vs long term, employee impact, risk)
- Assumption register
- Neutral language
- verification_status MUST be "Not verified"

CRITICAL:
- NO trailing commas
- Check all brackets
- Double quotes only
- Pure JSON
"""
    },
    {
        "name": "capital_allocation",
        "title": "Capital Allocation Dilemma",
        "prompt": f"""
STRICT JSON OUTPUT REQUIRED - Copy this structure exactly:

{JSON_TEMPLATE}

Fill it with content for this problem:

PROBLEM: Industrial company has $300M to invest across 3 business units.
- Competing proposals
- CEO wants "objective framework"
- Proposal details NOT provided
- Strategy UNCLEAR
- Risk appetite UNDEFINED

REQUIREMENTS:
- Allocation approach alternatives
- 4+ evaluation criteria
- Uncertainty map
- Assumption register
- Neutral language
- verification_status MUST be "Not verified"

CRITICAL:
- NO trailing commas
- All brackets closed
- Double quotes only
- Pure JSON
"""
    },
    {
        "name": "operating_model",
        "title": "Operating Model Redesign",
        "prompt": f"""
STRICT JSON OUTPUT REQUIRED - Copy this structure exactly:

{JSON_TEMPLATE}

Fill it with content for this problem:

PROBLEM: Financial services firm reorganizing from product silos.
- Desired state UNCLEAR
- Want "scale" AND "customer focus"
- Org chart NOT provided
- Segments UNDEFINED
- Tech constraints UNCLEAR

REQUIREMENTS:
- Operating model design dimensions
- 3+ archetype variants
- Tension map
- Assumption register
- Neutral language
- verification_status MUST be "Not verified"

CRITICAL:
- NO trailing commas
- All brackets closed
- Double quotes only
- Pure JSON
"""
    }
]

results_summary = []

for idx, case in enumerate(mini_cases, 1):
    print(f"\n{'='*70}")
    print(f"CASE {idx}/4: {case['title']}")
    print(f"{'='*70}")

    try:
        # Call Claude with retries
        output = call_claude(case["prompt"])

        # Save all deliverables
        paths = save_case_deliverables(case["name"], output)

        # Determine highest risk (normalize case so "High" and "high" both match)
        risk_levels = [str(r.get("severity", "")).lower() for r in output["risks"]]
        highest_risk = "high" if "high" in risk_levels else ("medium" if "medium" in risk_levels else "low")

        # Add to summary
        results_summary.append({
            "case": case["title"],
            "status": "✓",
            "assumptions": paths["assumption_count"],
            "verifications": paths["verification_count"],
            "risks": len(output["risks"]),
            "highest_risk": highest_risk
        })

        print("✓ SUCCESS")
        print(f"  Assumptions: {paths['assumption_count']}")
        print(f"  Verifications: {paths['verification_count']}")
        print(f"  Risks: {len(output['risks'])} (highest: {highest_risk})")

    except Exception as e:
        error_msg = str(e)[:150]
        print(f"✗ FAILED: {error_msg}")

        # Log the failure
        risk_record = {
            "type": "traceability",
            "severity": "high",
            "note": f"Case {case['name']} failed: {error_msg}",
            "timestamp": now_iso()
        }
        risks = read_json(base_dir / "risk_log.json")
        risks["risks"].append(risk_record)
        write_json(base_dir / "risk_log.json", risks)

        results_summary.append({
            "case": case["title"],
            "status": "✗",
            "assumptions": "ERROR",
            "verifications": "ERROR",
            "risks": "ERROR",
            "highest_risk": "ERROR"
        })

# Print summary table
print(f"\n{'='*70}")
print("MINI-CASES SUMMARY")
print(f"{'='*70}\n")

print(f"{'Case':<32} {'Status':<8} {'Assume':<8} {'Verify':<8} {'Risks':<8} {'High Risk':<10}")
print(f"{'-'*32} {'-'*8} {'-'*8} {'-'*8} {'-'*8} {'-'*10}")

for result in results_summary:
    print(f"{result['case']:<32} {result['status']:<8} {str(result['assumptions']):<8} {str(result['verifications']):<8} {str(result['risks']):<8} {result['highest_risk']:<10}")

# Count successes
success_count = sum(1 for r in results_summary if r["status"] == "✓")

print(f"\n{'='*70}")
print(f"RESULTS: {success_count}/4 cases completed successfully")
print(f"{'='*70}")

if success_count > 0:
    print(f"\n✓ Deliverables saved to: {deliverables_dir}")
    print("\nFor each successful case:")
    print("  • <case>_reasoning.json       - Full structured output")
    print("  • <case>_assumptions.json     - Assumption register")
    print("  • <case>_verification.json    - Verification checklist")
    print("  • <case>_human_readable.txt   - Human review summary")

if success_count < 4:
    print(f"\n⚠️  {4-success_count} case(s) failed - check debug_malformed_json.txt for details")

print(f"\n{'='*70}")
print("NEXT STEPS")
print(f"{'='*70}")
print("1. Review human_readable.txt files")
print("2. Complete verification checklists")
print("3. Validate all assumptions independently")
print("4. Remember: These are STRUCTURES, not RECOMMENDATIONS")
print(f"{'='*70}")

MINI-CASE DEMONSTRATIONS

Running 4 structured reasoning demos...
Each case will:
  1. Call Claude API once (with retries if needed)
  2. Generate reasoning structure
  3. Save all governance artifacts
  4. Log risks and verifications

⏱️  Expected time: 2-4 minutes total

CASE 1/4: Market Entry Decision
  → API call (attempt 1/3)...
  → Received 10843 chars
  → JSON parsed ✓
  → Validation passed ✓
✓ SUCCESS
  Assumptions: 15
  Verifications: 24
  Risks: 5 (highest: high)

CASE 2/4: Cost Transformation Program
  → API call (attempt 1/3)...
  → Received 10776 chars
  → Initial parse failed: Expecting ',' delimiter: line 93 column 4 (char 9886)
  → Attempting automatic repairs...
  → Saved debug info to: /content/ai_consulting_ch2_runs/run_20260119_140749_1fa07eef/debug_malformed_json.txt
  → JSON error: JSON still invalid after repairs: Expecting ',' delimiter: line 93 column 4 (char 9886)
  → Retrying with even stricter instructions...
  → API call (a

##9.USER'S EXAMPLE

###9.1.OVERVIEW

####9.1.1.GENERAL DESCRIPTION



In this cell, we switch from “watching a demo” to “you doing the work.” Up to now, the notebook has shown you what Level 2 looks like when it produces structured reasoning: issue trees, alternatives, trade-offs, and assumptions, all labeled as **Not verified**. Cell 9 turns that into an exercise you can run on any business problem you’re thinking about, while keeping the governance posture intact.

First, the cell asks you to type a short problem statement. This is intentional: in real consulting, you rarely get a perfect brief. The exercise is designed to work with incomplete information, but it forces you to acknowledge what’s missing rather than quietly filling gaps with confident-sounding guesses.

Second, the cell applies automatic redaction. The goal is to reduce confidentiality risk before anything is sent to a model. It does not “solve privacy,” but it helps you practice a professional habit: share the minimum necessary information for the task. The cell then tells you what it removed, so you can judge whether the remaining text is still meaningful.
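
The notebook's actual redaction rules are not shown in this section. As a rough sketch of the idea, assuming redaction is pattern-based, the following replaces email addresses and phone-like numbers with placeholders and reports how many of each were removed. The names `REDACTION_PATTERNS` and `redact`, and both regular expressions, are illustrative assumptions.

```python
import re

# Hypothetical patterns for illustration; the notebook's own redaction
# rules may differ. Both are crude: the phone pattern, for example,
# would also match dates written as 2024-01-15.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace matches with placeholders and count what was removed."""
    removed = {}
    for label, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[{label}_REDACTED]", text)
        if count:
            removed[label] = count
    return text, removed


raw = "Contact jane.doe@example.com or +1 415 555 0100 about the Q3 plan"
clean, report = redact(raw)
print(clean)   # Contact [EMAIL_REDACTED] or [PHONE_REDACTED] about the Q3 plan
print(report)  # {'EMAIL': 1, 'PHONE': 1}
```

Reporting counts by category, rather than echoing the removed values, lets the user judge what was stripped without reprinting the sensitive text.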

Third, the cell converts your input into a tightly framed request: generate a reasoning structure only. It explicitly forbids recommendations, fabricated facts, and benchmark invention. It also requires that assumptions and open questions be listed. This matters because Level 2 can look persuasive even when it is wrong. The structure is there to make your thinking inspectable, not to make the model “right.”
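
The ban on recommendations is stated in the prompt; whether the notebook also checks the returned output for it is not shown here. As an assumption, a simple post-check could scan every string in the returned JSON for recommendation language. The word list and the names `FORBIDDEN`, `iter_strings`, and `find_recommendation_language` are hypothetical.

```python
import re

# Assumed word list, based on the "no best / optimal / should" rule in the prompts
FORBIDDEN = re.compile(
    r"\b(best|optimal|should|recommend(?:s|ed|ation|ations)?)\b",
    re.IGNORECASE,
)


def iter_strings(node):
    """Yield every string value found in a nested dict/list structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def find_recommendation_language(data):
    """Return (matched_word, containing_text) pairs for flagged strings."""
    hits = []
    for text in iter_strings(data):
        for match in FORBIDDEN.finditer(text):
            hits.append((match.group(0).lower(), text))
    return hits


sample_output = {
    "task": "Structure the entry decision",
    "draft_output": {
        "alternatives_or_options": ["Partner model", "The best option is acquisition"],
        "tradeoffs_or_tensions": ["Speed vs control"],
    },
}

print(find_recommendation_language(sample_output))
# [('best', 'The best option is acquisition')]
```

A keyword scan like this only catches obvious phrasing; a hit is a prompt for human review, and a clean scan does not prove the output is neutral.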

Finally, the cell saves outputs as files: the structured reasoning, an assumption register, a verification checklist, and a human-readable summary. This is crucial for governance: you get artifacts you can review, challenge, and sign off. The learning outcome is simple: you practice producing “consulting-grade structure” without accidentally outsourcing judgment.


####9.1.2.EXAMPLES OF USER REQUESTS

**Example 1: Market entry / growth**
I need a Level 2 structured reasoning map (not a recommendation).

Problem statement:
Our mid-sized B2B software company is considering expanding into Brazil and/or Mexico within the next 12–18 months. We currently sell in the U.S. and Canada. Leadership is debating whether to enter via a local sales team, a distributor/partner model, or an acquisition. We have limited internal international experience and we are unsure about regulatory, tax, and pricing constraints.

Constraints / context:
- We have a fixed expansion budget and cannot pursue more than one entry mode initially.
- Time-to-revenue matters, but we cannot risk major compliance failures.
- We need to understand what information would change the decision.

Please produce:
- An issue tree that decomposes the decision
- At least 3 alternative entry approaches
- Symmetric trade-offs for each approach (pros and cons, tensions)
- An assumption register (what must be true, what is unknown)
- Open questions and a verification checklist

Important:
Do not rank options. Do not recommend. Do not invent benchmarks or facts. Mark verification_status as "Not verified" and list questions_to_verify.


**Example 2: Cost transformation / operating efficiency**
I need a Level 2 structured reasoning map (not a recommendation).

Problem statement:
A consumer products manufacturer needs to reduce costs by roughly 10–15% over the next 24 months due to margin pressure and retailer pricing pushback. The company has multiple plants, some older equipment, and a mix of permanent and contract labor. Leadership is debating whether the main lever should be procurement savings, workforce reductions, automation, footprint consolidation, or SKU rationalization. There is concern about disrupting service levels and damaging long-term capabilities.

Constraints / context:
- We cannot assume layoffs are feasible without considering labor relations and execution risk.
- Management wants an “objective” structure, but we need to surface assumptions and unknowns.
- We need to identify which uncertainties drive the decision.

Please produce:
- A cost-driver tree / decomposition of the cost base
- At least 4 distinct cost reduction levers
- Trade-offs (short-term savings vs long-term capability, service risk, execution complexity)
- An assumption register and open questions
- A verification checklist describing what we would need to validate

Important:
No recommendations, no scoring, no invented numbers or benchmarks. Use neutral language and set verification_status = "Not verified".


**Example 3: Capital allocation / portfolio decision**
I need a Level 2 structured reasoning map (not a recommendation).

Problem statement:
A diversified company has a fixed investment pool for the next planning cycle and must allocate it across three competing initiatives: (1) expand a high-growth but volatile business line, (2) modernize core operations to improve reliability and reduce risk, and (3) pursue an acquisition to enter an adjacent market. The CEO is asking for a structured framework that makes trade-offs explicit, including risk appetite, strategic fit, and timing.

Constraints / context:
- Financial details are incomplete and will arrive later.
- The leadership team is divided and wants a transparent structure rather than a “black box” answer.
- We need to clarify what evidence would change the direction.

Please produce:
- A decision structure / issue tree framing how to compare initiatives
- At least 3 alternative allocation approaches (e.g., concentrate vs diversify, staged vs all-in, strategic vs financial lens)
- Trade-offs and tensions (risk, timing, capability, optionality, integration complexit


###9.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 9
# Type: Code
# Goal: Interactive exercise for user to structure their own case
# Output: Paths to saved deliverables and redaction summary

# Cell 9: User Exercise: Structure Your Own Case

print("="*70)
print("USER EXERCISE: Structure Your Own Consulting Case")
print("="*70)
print("\nThis exercise lets you test the Level 2 reasoning system on your own problem.")
print("\nThe system will:")
print("  1. Automatically redact confidential information")
print("  2. Generate reasoning structure (NOT recommendations)")
print("  3. Save all governance artifacts")
print("  4. Flag assumptions and verification needs")
print("\n" + "="*70)
print("PROVIDE YOUR PROBLEM STATEMENT")
print("="*70)
print("\nExample: 'Our company is deciding whether to outsource IT operations")
print("         to reduce costs while maintaining service quality.'\n")
print("⚠️  Do NOT include:")
print("  - Client names or company-specific details")
print("  - Confidential financial data")
print("  - Personal information")
print("\nRedaction will remove emails, phone numbers, and amounts automatically.")
print("="*70 + "\n")

# Get user input
user_problem = input("Enter your problem statement (or press Enter for example):\n> ")

# Use default if empty or too short
if not user_problem or len(user_problem.strip()) < 20:
    print("\n→ Using default example problem...")
    user_problem = "A technology company is evaluating whether to build or buy a new CRM platform. Budget constraints and time-to-market are key considerations, but the technical capabilities of the team are uncertain."

print(f"\n✓ Input received ({len(user_problem)} characters)")

# Redact sensitive information
redacted_problem, redaction_summary = redact(user_problem, aggressive=True)

print(f"✓ Redaction applied: {redaction_summary}")

if redaction_summary != "No redactions":
    print("\n⚠️  REDACTED VERSION:")
    print("-" * 70)
    print(redacted_problem)
    print("-" * 70)

# Build the prompt
user_case_prompt = f"""
Return ONLY valid JSON (verify syntax before responding).

PROBLEM STATEMENT (pre-redacted for confidentiality):
{redacted_problem}

YOUR TASK:
Generate reasoning structure with:
1. Issue tree or problem decomposition
2. At least 3 alternative approaches
3. Trade-offs (symmetric: pros AND cons for each)
4. Assumption register (what MUST be validated)

CRITICAL REQUIREMENTS:
- Flag areas where information is clearly incomplete
- Do NOT fabricate facts or benchmarks
- Mark verification_status = "Not verified"
- List specific questions requiring answers
- Use neutral language (no "best", "optimal", "should")

Return this exact JSON structure:
{{
  "task": "...",
  "facts_provided": ["...", "..."],
  "assumptions": ["...", "..."],
  "open_questions": ["...", "..."],
  "risks": [
    {{"type": "...", "severity": "...", "note": "..."}}
  ],
  "draft_output": {{
    "issue_tree_or_structure": "...",
    "alternatives_or_options": ["...", "...", "..."],
    "tradeoffs_or_tensions": ["...", "..."],
    "assumption_register": ["...", "..."]
  }},
  "verification_status": "Not verified",
  "questions_to_verify": ["...", "..."]
}}

NO trailing commas. All brackets closed. Double quotes only.
"""

print("\n" + "="*70)
print("GENERATING REASONING STRUCTURE")
print("="*70)
print("⏱️  This may take 20-40 seconds...")

try:
    # Call Claude
    output = call_claude(user_case_prompt)

    print("\n✓ Reasoning structure generated")

    # Save all deliverables
    paths = save_case_deliverables("user_case", output)

    # Display summary
    print("\n" + "="*70)
    print("RESULTS")
    print("="*70)
    print(f"\n✓ Case successfully structured!")
    print(f"\nTask identified:")
    print(f"  {output['task']}")
    print(f"\nKey metrics:")
    print(f"  • Facts provided: {len(output['facts_provided'])}")
    print(f"  • Assumptions flagged: {paths['assumption_count']}")
    print(f"  • Open questions: {len(output['open_questions'])}")
    print(f"  • Verifications needed: {paths['verification_count']}")
    print(f"  • Risks identified: {len(output['risks'])}")
    print(f"  • Alternatives generated: {len(output['draft_output']['alternatives_or_options'])}")

    # Show alternatives
    print(f"\nAlternatives identified:")
    for i, alt in enumerate(output['draft_output']['alternatives_or_options'], 1):
        print(f"  {i}. {alt}")

    # Show highest-severity risks (case-insensitive match; tolerate missing keys)
    high_risks = [r for r in output['risks'] if str(r.get('severity', '')).lower() == 'high']
    if high_risks:
        print(f"\n⚠️  High-severity risks detected ({len(high_risks)}):")
        for risk in high_risks[:3]:  # Show first 3
            print(f"  • {risk.get('type', 'unknown')}: {risk.get('note', '')}")

    # Verification status reminder
    print(f"\n{'='*70}")
    print("VERIFICATION STATUS")
    print(f"{'='*70}")
    print(f"Status: {output['verification_status']}")
    print(f"\n⚠️  CRITICAL: This is a REASONING STRUCTURE, not a recommendation.")
    print(f"All {paths['assumption_count']} assumptions must be validated independently.")
    print(f"All {paths['verification_count']} verification questions must be answered.")

    # Files saved
    print(f"\n{'='*70}")
    print("DELIVERABLES SAVED")
    print(f"{'='*70}")
    print(f"\nAll files saved to: {deliverables_dir}/user_case_*")
    print(f"\nFiles created:")
    print(f"  1. user_case_reasoning.json")
    print(f"     → Full structured output with all reasoning")
    print(f"\n  2. user_case_assumptions.json")
    print(f"     → Assumption register with validation tracking")
    print(f"     → Contains {paths['assumption_count']} assumptions to validate")
    print(f"\n  3. user_case_verification.json")
    print(f"     → Verification checklist")
    print(f"     → Contains {paths['verification_count']} questions to answer")
    print(f"\n  4. user_case_human_readable.txt")
    print(f"     → Human-friendly summary for review")
    print(f"     → START HERE for reviewing the output")

    # Next steps
    print(f"\n{'='*70}")
    print("NEXT STEPS")
    print(f"{'='*70}")
    print(f"\n1. READ: user_case_human_readable.txt")
    print(f"   → Review the reasoning structure")
    print(f"   → Check if alternatives make sense")
    print(f"   → Verify trade-offs are balanced")
    print(f"\n2. VALIDATE: user_case_assumptions.json")
    print(f"   → Independently verify each assumption")
    print(f"   → Mark validation_status for each")
    print(f"   → Document validation_method used")
    print(f"\n3. COMPLETE: user_case_verification.json")
    print(f"   → Answer all verification questions")
    print(f"   → Document sources and evidence")
    print(f"   → Record verification_outcome")
    print(f"\n4. ADD JUDGMENT:")
    print(f"   → This structure is NOT a decision")
    print(f"   → You must add context, priorities, constraints")
    print(f"   → You own the final recommendation")
    print(f"\n5. SEEK APPROVAL:")
    print(f"   → Review with domain expert")
    print(f"   → Get stakeholder sign-off")
    print(f"   → Document in approvals_log.json")

    # Redaction reminder
    if redaction_summary != "No redactions":
        print(f"\n{'='*70}")
        print("CONFIDENTIALITY NOTICE")
        print(f"{'='*70}")
        print(f"\nRedactions applied: {redaction_summary}")
        print(f"Original input was sanitized before sending to API.")
        print(f"Review all outputs before sharing with clients.")

    print(f"\n{'='*70}")
    print("USER EXERCISE COMPLETE")
    print(f"{'='*70}")

except Exception as e:
    print("\n" + "="*70)
    print("✗ USER EXERCISE FAILED")
    print("="*70)
    print(f"\nError: {str(e)}")
    print(f"\nYour input was:")
    print(f"  {user_problem[:200]}...")
    print(f"\nRedacted version:")
    print(f"  {redacted_problem[:200]}...")
    print(f"\nTroubleshooting:")
    print(f"  1. Try a simpler problem statement")
    print(f"  2. Check debug_malformed_json.txt if JSON error")
    print(f"  3. Review risk_log.json for details")
    print(f"\nThe error has been logged to risk_log.json")

    # Log the failure
    risk_record = {
        "type": "traceability",
        "severity": "high",
        "note": f"User exercise failed: {str(e)[:200]}",
        "timestamp": now_iso(),
        "user_input_length": len(user_problem),
        "redaction_summary": redaction_summary
    }
    risks = read_json(base_dir / "risk_log.json")
    risks["risks"].append(risk_record)
    write_json(base_dir / "risk_log.json", risks)

USER EXERCISE: Structure Your Own Consulting Case

This exercise lets you test the Level 2 reasoning system on your own problem.

The system will:
  1. Automatically redact confidential information
  2. Generate reasoning structure (NOT recommendations)
  3. Save all governance artifacts
  4. Flag assumptions and verification needs

PROVIDE YOUR PROBLEM STATEMENT

Example: 'Our company is deciding whether to outsource IT operations
         to reduce costs while maintaining service quality.'

⚠️  Do NOT include:
  - Client names or company-specific details
  - Confidential financial data
  - Personal information

Redaction will remove emails, phone numbers, and amounts automatically.

Enter your problem statement (or press Enter for example):
> 

→ Using default example problem...

✓ Input received (198 characters)
✓ Redaction applied: No redactions

GENERATING REASONING STRUCTURE
⏱️  This may take 20-40 seconds...
  → API call (attempt 1/3)...
  → Received 10867 char

##10.AUDIT BUNDLE

###10.1.OVERVIEW

**Cell 10: Audit Bundle, Read-Me, and Final Packaging**

This final cell closes the loop. Up to this point, the notebook has generated reasoning structures, assumptions, risks, and verification checklists. Cell 10’s purpose is to turn all of that into something that can actually survive professional scrutiny. In other words, this is where “AI output” becomes an **audit-ready evidence package** rather than a loose collection of files.

First, the cell creates a comprehensive audit read-me document. This file is written for a human reviewer, not for a model. It explains, in plain language, what Level 2 is and what it is not. It draws a hard line between **structure** and **decision**, reminding the reader that issue trees and alternatives are hypotheses, not conclusions. This document matters because six months from now, or in a different team, someone needs to understand what these files represent without relying on memory or informal explanation.

Second, the cell inventories the entire run directory. Every artifact created earlier is listed with its relative path and file size. This reinforces an important professional habit: you should always be able to answer the question, “What exactly was produced, and where is it stored?” In regulated or high-stakes environments, undocumented outputs effectively do not exist.

Third, the cell bundles everything into a single zip file. This is not about convenience; it is about **integrity**. A single bundle that is kept unchanged after creation is easier to archive, transfer, review, or attach to a project record. It also reduces the risk of selective sharing, where only the “nice-looking” outputs are forwarded while assumptions or risk logs are quietly omitted.
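A zip file is not tamper-evident on its own. One common complement, sketched below, is to record a SHA-256 digest of the archive at creation time so a later reviewer can confirm the bundle has not changed. This is an optional addition and not something the notebook does; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large archives are not loaded into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is only useful if it is stored somewhere the archive's recipient cannot silently edit, for example in the project record alongside the approval entry.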

Finally, the cell prints a governance checklist and next steps. This reinforces the core teaching message of the chapter: Level 2 is incomplete by design. The checklist forces you to confirm that risks were logged, assumptions were captured, verifications remain open, and approvals are still pending. The notebook ends by explicitly stating what must happen next, and what must not happen next, before any reasoning is used in real decision-making.
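Part of that checklist can be checked mechanically. As a minimal sketch, assuming the governance file names listed in the audit read-me's directory structure, the hypothetical helper below reports which expected files are absent from a run directory. It confirms presence only; whether the contents are complete remains a human judgment.

```python
from pathlib import Path

# File names as listed in the audit read-me's directory structure.
EXPECTED_FILES = [
    "run_manifest.json",
    "prompts_log.jsonl",
    "risk_log.json",
    "verification_register.json",
    "change_log.json",
    "approvals_log.json",
]

def missing_governance_files(run_dir):
    """Return the expected governance files that are not present in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
```

An empty list means every expected file exists, which is a precondition for review and not a substitute for it.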

Pedagogically, this cell teaches discipline. Technically, it teaches traceability. Professionally, it teaches accountability. If you remember only one thing from Cell 10, it should be this: **if you cannot bundle, explain, and audit the outputs, you are not using AI safely in a consulting or finance context.**


###10.2.CODE AND IMPLEMENTATION

In [None]:
# Cell 10
# Type: Code
# Goal: Create AUDIT_README, bundle all artifacts, and create zip file
# Output: Zip file path and audit checklist

# Cell 10: Bundle + AUDIT_README + Zip

print("="*70)
print("CREATING AUDIT BUNDLE")
print("="*70)
print("\nFinalizing governance artifacts and creating archive...")

# =============================================================================
# CREATE COMPREHENSIVE AUDIT_README.txt
# =============================================================================

audit_readme = """
================================================================================
AUDIT README: AI-ASSISTED CONSULTING (CHAPTER 2 - LEVEL 2: REASONERS)
================================================================================

Author: Alejandro Reynoso
        Chief Scientist, DEFI CAPITAL RESEARCH
        External Lecturer, Judge Business School Cambridge

Model:  claude-sonnet-4-5-20250929 (Anthropic)
Date:   """ + now_iso() + """

================================================================================
PURPOSE OF THIS ARCHIVE
================================================================================

This directory contains a COMPLETE AUDIT TRAIL for AI-assisted structured
reasoning in management consulting contexts.

CRITICAL PRINCIPLE: STRUCTURE ≠ TRUTH

Level 2 systems generate explicit reasoning structures (issue trees,
alternatives, trade-offs, assumption registers) that make thinking INSPECTABLE,
NOT CORRECT.

An issue tree is a HYPOTHESIS about problem decomposition, not a FACT.
Alternatives are FRAMINGS to explore, not RANKINGS.
Trade-offs expose TENSIONS, not OPTIMAL PATHS.
Assumptions are EXPLICIT GAPS requiring validation.

================================================================================
WHAT IS LEVEL 2?
================================================================================

Level 2 ("Reasoners") sits between:
- Level 1: Simple Q&A and content generation
- Level 3: Multi-step agents with autonomous chaining

Level 2 Characteristics:
✓ Generates explicit reasoning structures
✓ Makes assumptions visible and trackable
✓ Requires human validation at every step
✓ One model call at a time (no autonomous chains)
✓ Structures thinking, does NOT make decisions

Level 2 Boundaries:
✗ NO recommendations or rankings
✗ NO "best option" or "you should choose X"
✗ NO autonomous web browsing or tool chaining
✗ NO fabricated facts or unverified benchmarks

================================================================================
DIRECTORY STRUCTURE
================================================================================

run_manifest.json
  - Run metadata, model configuration, environment fingerprint
  - Configuration hash for reproducibility
  - Timestamp and author attribution

prompts_log.jsonl (REDACTED)
  - Each prompt logged as SHA-256 hash only
  - Protects confidentiality while maintaining traceability
  - Timestamps and character counts recorded

risk_log.json
  - All risks detected (auto-detected and manual)
  - Risk types: confidentiality, hallucination, missing_facts, traceability,
    false_rigor, decision_laundering, scope_creep
  - Severity levels: low | medium | high
  - Each risk timestamped and categorized

verification_register.json
  - Master verification tracking across all cases
  - Links to per-case verification checklists
  - Tracks completion status

change_log.json
  - Documents any modifications to artifacts
  - Maintains audit trail of edits

approvals_log.json
  - Human approval placeholders for each case
  - Approval conditions and sign-off requirements
  - Reviewer roles and timestamps

deliverables/
  - Per-case outputs organized by case name
  - Four files per case:
    * <case>_reasoning.json       - Full structured output from model
    * <case>_assumptions.json     - Assumption register with validation status
    * <case>_verification.json    - Verification checklist and questions
    * <case>_human_readable.txt   - Summary formatted for human review

debug_malformed_json.txt (if present)
  - Diagnostic output when JSON parsing fails
  - Shows original response, extracted JSON, and error details

================================================================================
HOW TO REVIEW THESE ARTIFACTS SAFELY
================================================================================

STEP 1: START WITH RISK_LOG.JSON
──────────────────────────────────────────────────────────────────────────────
Before examining any reasoning output:
  □ Open risk_log.json
  □ Review all flagged risks
  □ Pay special attention to severity="high" items
  □ Look for "false_rigor" and "missing_facts" risks
  □ Understand what could go wrong before proceeding

STEP 2: READ RUN_MANIFEST.JSON
──────────────────────────────────────────────────────────────────────────────
Understand the context:
  □ Which model was used?
  □ What parameters (temperature, max_tokens)?
  □ When was this run created?
  □ What was the notebook's purpose?

STEP 3: REVIEW VERIFICATION_REGISTER.JSON
──────────────────────────────────────────────────────────────────────────────
Check verification requirements:
  □ Each case should have verification_status = "Not verified"
  □ Count total verifications needed
  □ No reasoning should be used until verifications completed
  □ Plan your validation approach

STEP 4: INSPECT HUMAN_READABLE.TXT FILES
──────────────────────────────────────────────────────────────────────────────
For each case in deliverables/:
  □ Read <case>_human_readable.txt first
  □ Check for neutral language (no "best", "optimal", "recommended")
  □ Verify trade-offs are symmetric (pros AND cons listed)
  □ Confirm alternatives are framings, not rankings
  □ Look for verification_status = "Not verified"

STEP 5: REVIEW ASSUMPTION REGISTERS
──────────────────────────────────────────────────────────────────────────────
For each <case>_assumptions.json:
  □ Every assumption must be validated independently
  □ Do NOT accept assumptions as facts
  □ Ask: "What would invalidate this assumption?"
  □ Document validation_method and validation_outcome
  □ Mark validation_status for each assumption

STEP 6: COMPLETE VERIFICATION CHECKLISTS
──────────────────────────────────────────────────────────────────────────────
For each <case>_verification.json:
  □ Answer each verification question
  □ Document sources and evidence
  □ Record verification_outcome
  □ Note any discrepancies or contradictions

STEP 7: CHECK FOR DECISION LAUNDERING
──────────────────────────────────────────────────────────────────────────────
Critical red flags:
  ✗ "The AI recommended..." (AI doesn't recommend)
  ✗ Presenting AI output as complete analysis
  ✗ Skipping validation because output "looks good"
  ✗ Using structure as proof rather than hypothesis

Green flags:
  ✓ "AI structured three alternatives for us to evaluate..."
  ✓ "After validating assumptions, we determined..."
  ✓ Clear attribution of human judgment
  ✓ Transparent about what was verified vs assumed

================================================================================
DIFFERENCE BETWEEN STRUCTURE AND DECISION
================================================================================

WHAT AI PROVIDES (Structure):
  • Issue tree decomposition
  • Alternative framings
  • Trade-off mapping
  • Assumption lists
  • Open questions
  • Verification checklists

WHAT HUMAN PROVIDES (Decision):
  • Validation of assumptions
  • Choice among alternatives
  • Resolution of trade-offs
  • Stakeholder input
  • Context and constraints
  • Final recommendation
  • Accountability

═══════════════════════════════════════════════════════════════════════════
⚠️  CRITICAL: Never present AI-generated structures as complete analysis.
    Always complete verification and approval steps.
    The consultant owns the judgment, not the AI.
═══════════════════════════════════════════════════════════════════════════

================================================================================
COMMON PITFALLS TO AVOID
================================================================================

PITFALL 1: Treating Completeness as Correctness
──────────────────────────────────────────────────────────────────────────────
✗ Just because an issue tree looks comprehensive doesn't mean it's right
✓ Every branch is a hypothesis requiring validation

PITFALL 2: Skipping Assumption Validation
──────────────────────────────────────────────────────────────────────────────
✗ "These assumptions seem reasonable, let's proceed"
✓ Every assumption must be independently verified with evidence

PITFALL 3: Presenting Options as Recommendations
──────────────────────────────────────────────────────────────────────────────
✗ "Here are three approaches" → client hears "you should do approach A"
✓ "Here are three framings we need to evaluate against criteria X, Y, Z"

PITFALL 4: Using AI Output Verbatim
──────────────────────────────────────────────────────────────────────────────
✗ Copy-paste AI reasoning into client deliverable
✓ Synthesize, validate, and add independent human judgment

PITFALL 5: Decision Laundering
──────────────────────────────────────────────────────────────────────────────
✗ "The AI said we should do X" (hiding behind AI)
✓ "After structuring alternatives with AI and validating assumptions,
    we recommend X because..."

PITFALL 6: Ignoring Risk Log
──────────────────────────────────────────────────────────────────────────────
✗ Proceeding without reviewing flagged risks
✓ Address all high-severity risks before using output

PITFALL 7: False Rigor
──────────────────────────────────────────────────────────────────────────────
✗ Impressive-looking structure = rigorous analysis
✓ Rigor comes from validation, not structure

================================================================================
GOVERNANCE CHECKLIST BEFORE USING ANY OUTPUT
================================================================================

Before presenting or acting on any AI-generated reasoning:

VALIDATION CHECKLIST:
□ All assumptions validated independently
□ All verification questions answered
□ Sources and evidence documented
□ Conflicting information reconciled
□ Missing data explicitly flagged

QUALITY CHECKLIST:
□ Risk assessment reviewed
□ Reasoning structure inspected for false rigor
□ Language is neutral (no rankings or recommendations)
□ Trade-offs are symmetric (pros AND cons)
□ Alternatives are genuine framings, not disguised rankings

GOVERNANCE CHECKLIST:
□ Human consultant has added independent judgment
□ Confidentiality redactions confirmed
□ Approval recorded by appropriate reviewer
□ Stakeholder input incorporated
□ Final recommendation is human-owned

COMMUNICATION CHECKLIST:
□ Clear that structure came from AI
□ Clear that validation was done by humans
□ Clear that decision is human judgment
□ Transparent about limitations and uncertainties

================================================================================
WHEN TO ESCALATE OR SEEK HELP
================================================================================

Escalate if any output:
  ✗ Makes recommendations rather than structuring options
  ✗ Presents unverified facts as authoritative
  ✗ Ranks or scores alternatives with implied preference
  ✗ Lacks assumption transparency
  ✗ Contains fabricated data or citations

Contact governance lead if:
  • High-severity risks in risk_log.json
  • Verification questions cannot be answered with available data
  • Assumptions cannot be validated
  • Stakeholders want to use output without validation
  • Pressure to skip approval steps

================================================================================
QUESTIONS OR CONCERNS?
================================================================================

If you have questions about:
  • How to validate specific assumptions
  • Whether output is appropriate for use
  • Risk assessment and mitigation
  • Governance and approval process

Refer to your organization's AI governance policies or contact the
technical governance lead.

For questions about this specific notebook implementation, contact:
Alejandro Reynoso (alejandro.reynoso@example.com)

================================================================================
TECHNICAL DETAILS
================================================================================

Model: claude-sonnet-4-5-20250929
Parameters:
  - temperature: 0.2 (smoke test) / 0.1 (production)
  - max_tokens: 2500
  - system prompt: Enhanced with Level 2 reasoning guardrails

JSON Validation:
  - Strict schema enforcement
  - Automatic repair for common syntax errors
  - Retry logic for malformed responses
  - Debug output saved when parsing fails

Auto-Risk Detection:
  - Scope creep: < 2 alternatives flagged
  - Missing facts: No assumptions flagged
  - False rigor: Shallow tree depth flagged

================================================================================
LICENSE AND ATTRIBUTION
================================================================================

This notebook and framework: © 2025 Alejandro Reynoso
Model: Claude Sonnet 4.5 by Anthropic

When citing this work:
  Reynoso, A. (2025). AI-Assisted Consulting Framework: Chapter 2 -
  Level 2 Reasoners. Judge Business School, University of Cambridge.

================================================================================
VERSION HISTORY
================================================================================

v1.0 - Initial release (""" + now_iso() + """)
  - 4 mini-case demonstrations
  - Full governance artifact generation
  - Strict JSON validation with repair
  - User exercise capability

================================================================================
END OF AUDIT README
================================================================================
"""

# Write AUDIT_README.txt as UTF-8 explicitly, since the text contains non-ASCII characters
audit_readme_path = base_dir / "AUDIT_README.txt"
with open(audit_readme_path, 'w', encoding='utf-8') as f:
    f.write(audit_readme)

print("✓ Created AUDIT_README.txt")

# =============================================================================
# INVENTORY ALL FILES
# =============================================================================

print("\n" + "="*70)
print("AUDIT BUNDLE CONTENTS")
print("="*70 + "\n")

all_files = []
total_size = 0

for item in sorted(base_dir.rglob('*')):
    if item.is_file():
        rel_path = item.relative_to(base_dir)
        size = item.stat().st_size
        total_size += size
        all_files.append((str(rel_path), size))

# Print file listing with sizes
print(f"{'File Path':<55} {'Size (KB)':<12}")
print("-" * 70)

for filepath, size in all_files:
    size_kb = size / 1024
    # Highlight key governance files
    marker = "📋" if any(x in filepath for x in ['manifest', 'risk', 'verification', 'approval']) else "  "
    print(f"{marker} {filepath:<53} {size_kb:>10.1f}")

print("-" * 70)
print(f"{'TOTAL FILES: ' + str(len(all_files)):<55} {total_size/1024:>10.1f}")
print()

# =============================================================================
# CREATE ZIP ARCHIVE
# =============================================================================

print("="*70)
print("CREATING ZIP ARCHIVE")
print("="*70)

import shutil

zip_name = f"{run_name}_audit_bundle"
zip_base_path = Path(f"/content/{zip_name}")

print(f"\nArchiving: {base_dir}")
print(f"Output: {zip_base_path}.zip")
print("\n⏱️  This may take a few seconds...")

# Create the zip
shutil.make_archive(str(zip_base_path), 'zip', base_dir)

final_zip_path = f"{zip_base_path}.zip"
zip_size_mb = Path(final_zip_path).stat().st_size / (1024 * 1024)

print(f"\n✓ Archive created successfully")
print(f"  Path: {final_zip_path}")
print(f"  Size: {zip_size_mb:.2f} MB")
print(f"  Files: {len(all_files)}")

# =============================================================================
# FINAL GOVERNANCE CHECKLIST
# =============================================================================

print("\n" + "="*70)
print("FINAL GOVERNANCE CHECKLIST")
print("="*70 + "\n")

# Count items in governance logs
manifest = read_json(base_dir / "run_manifest.json")
risks = read_json(base_dir / "risk_log.json")
verifications = read_json(base_dir / "verification_register.json")
approvals = read_json(base_dir / "approvals_log.json")

# Count prompts
prompt_count = 0
if (base_dir / "prompts_log.jsonl").exists():
    with open(base_dir / "prompts_log.jsonl", 'r') as f:
        prompt_count = sum(1 for _ in f)

# Count deliverables
reasoning_files = list(deliverables_dir.glob("*_reasoning.json"))
case_count = len(reasoning_files)

checklist_items = [
    ("Run manifest generated", True, manifest.get("run_id") == run_name),
    ("Prompts logged (redacted)", True, prompt_count > 0),
    ("Risk log populated", True, len(risks.get("risks", [])) > 0),
    # Check the register file on disk rather than hardcoding a pass
    ("Verification register initialized", True, (base_dir / "verification_register.json").exists()),
    ("Change log created", True, (base_dir / "change_log.json").exists()),
    ("Approvals log with placeholders", True, len(approvals.get("approvals", [])) > 0),
    ("Cases processed", True, case_count >= 4),
    # Each case writes 4 files; require at least one case so this cannot pass vacuously
    ("Deliverables saved", True, case_count > 0 and case_count * 4 <= len(all_files)),
    ("AUDIT_README.txt included", True, audit_readme_path.exists()),
    ("Zip bundle created", True, Path(final_zip_path).exists()),
]

for item_name, required, status in checklist_items:
    symbol = "✓" if status else ("✗" if required else "⚠")
    status_text = "PASS" if status else ("FAIL" if required else "WARN")
    print(f"  {symbol} {item_name:<40} [{status_text}]")

# Calculate success rate
passed = sum(1 for _, _, status in checklist_items if status)
total = len(checklist_items)

print(f"\n{'='*70}")
print(f"CHECKLIST RESULT: {passed}/{total} items passed")
print(f"{'='*70}")

# =============================================================================
# SUMMARY STATISTICS
# =============================================================================

print("\n" + "="*70)
print("RUN SUMMARY STATISTICS")
print("="*70)

print(f"\nRun ID: {run_name}")
print(f"Config Hash: {manifest['config_hash']}")
print(f"Model: {manifest['model']}")
print(f"Created: {manifest['created_at']}")

print(f"\nGovernance:")
print(f"  • Prompts logged: {prompt_count}")
print(f"  • Risks flagged: {len(risks.get('risks', []))}")
print(f"  • Verifications needed: {len(verifications.get('verifications', []))}")
print(f"  • Approvals pending: {len(approvals.get('approvals', []))}")

print(f"\nDeliverables:")
print(f"  • Cases processed: {case_count}")
print(f"  • Files created: {len(all_files)}")
print(f"  • Total size: {total_size/1024:.1f} KB")

# Count assumptions and verifications across all cases
total_assumptions = 0
total_verification_questions = 0

for reasoning_file in reasoning_files:
    case_name = reasoning_file.stem.replace('_reasoning', '')
    assumptions_file = deliverables_dir / f"{case_name}_assumptions.json"
    verification_file = deliverables_dir / f"{case_name}_verification.json"

    if assumptions_file.exists():
        assumptions_data = read_json(assumptions_file)
        total_assumptions += assumptions_data.get("total_assumptions", 0)

    if verification_file.exists():
        verification_data = read_json(verification_file)
        total_verification_questions += verification_data.get("total_verifications", 0)

print(f"\nValidation Required:")
print(f"  • Total assumptions to validate: {total_assumptions}")
print(f"  • Total verification questions: {total_verification_questions}")

# =============================================================================
# NEXT STEPS AND DOWNLOAD INSTRUCTIONS
# =============================================================================

print("\n" + "="*70)
print("NEXT STEPS")
print("="*70)

print(f"""
1. DOWNLOAD THE ZIP FILE
   → Click the folder icon (Files) in the left sidebar
   → Navigate to: {final_zip_path}
   → Right-click → Download
   → Extract locally for review

2. READ THE AUDIT_README.txt
   → Located in the root of the extracted folder
   → Contains complete review instructions
   → Explains governance requirements

3. REVIEW EACH CASE
   → Start with <case>_human_readable.txt files
   → Check assumptions in <case>_assumptions.json
   → Complete verifications in <case>_verification.json

4. VALIDATE BEFORE USE
   □ Verify all assumptions independently
   □ Answer all verification questions
   □ Review risk_log.json for high-severity items
   □ Add human judgment and context
   □ Obtain stakeholder approval

5. REMEMBER THE PRINCIPLE
   ⚠️  AI structures reasoning; humans make decisions
   ⚠️  Never use AI output without validation
   ⚠️  The consultant owns the final judgment

""")

print("="*70)
print("AUDIT BUNDLE COMPLETE")
print("="*70)

print(f"""
✓ All governance artifacts generated
✓ All deliverables saved
✓ Archive created and ready for download

Download: {final_zip_path}
Size: {zip_size_mb:.2f} MB
Files: {len(all_files)}

This notebook has completed successfully.
""")

print("="*70)

CREATING AUDIT BUNDLE

Finalizing governance artifacts and creating archive...
✓ Created AUDIT_README.txt

AUDIT BUNDLE CONTENTS

File Path                                               Size (KB)   
----------------------------------------------------------------------
   AUDIT_README.txt                                            15.7
📋 approvals_log.json                                           1.8
   change_log.json                                              0.0
   debug_malformed_json.txt                                     6.4
   deliverables/capital_allocation_assumptions.json             6.5
   deliverables/capital_allocation_human_readable.txt           8.9
   deliverables/capital_allocation_reasoning.json               8.4
📋 deliverables/capital_allocation_verification.json            6.0
   deliverables/market_entry_assumptions.json                   5.9
   deliverables/market_entry_human_readable.txt                11.1
   deliverables/market_entry_reasoning.json 

##11.CONCLUSIONS

**Conclusion: What You Should Take Away from This Notebook (Chapter 2, Level 2: Reasoners)**

This notebook is not designed to impress you with what a model can produce. It is designed to change your behavior as a consultant or finance professional. If you complete it properly, you should leave with a clear distinction between two very different things: structured reasoning and validated conclusions. Level 2 is powerful precisely because it can generate clean, coherent structures fast. But that same coherence is the hazard. It is easy to confuse a well-formed issue tree with a correct diagnosis, or a polished trade-off map with a sound decision. The notebook exists to train you out of that mistake.

The first and most important takeaway is that Level 2 produces **inspectable thinking**, not correct answers. Inspectability is a governance concept. It means a third party can look at the output and understand how the problem was framed, what alternatives were considered, what tensions were identified, and what assumptions were required. In professional work, that is often more valuable than speed. An inspectable structure can be challenged, improved, and documented. A vague “answer” cannot. This notebook forces structure so that your reasoning becomes easier to review and harder to hide behind. That is why it bans recommendations and rankings. If the model were allowed to “pick,” you would quickly drift into delegating judgment. Level 2 is not about outsourcing judgment. It is about making judgment visible.

The second takeaway is that **false rigor is the signature risk** of this maturity level. At Level 1, the danger is often obvious: hallucinated facts, confident claims, sloppy drafting. At Level 2, the danger is more subtle. The model can produce something that looks like professional consulting work, with layered bullets, clear categories, and confident language. That appearance can seduce you into skipping the hard parts: checking the premises, testing the sensitivities, and confronting what you do not know. In other words, Level 2 can help you launder a decision you already want to make. The notebook’s discipline is meant to interrupt that. It forces explicit assumptions, explicit unknowns, and a permanent “Not verified” label to prevent you from mistaking structure for truth.
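
The audit README names three automatic checks aimed at this risk: fewer than two alternatives, no assumptions, and a shallow issue tree. A minimal sketch of how such checks might look is below. The field names (`issue_tree`, `children`, `alternatives`, `assumptions`), the function names, and the depth threshold are assumptions made for illustration; they are not taken from the notebook's actual schema.

```python
from typing import Any


def tree_depth(node: dict[str, Any]) -> int:
    """Count the levels in a nested issue tree; a node without children has depth 1."""
    children = node.get("children", [])
    if not children:
        return 1
    return 1 + max(tree_depth(child) for child in children)


def detect_structural_risks(output: dict[str, Any], min_depth: int = 2) -> list[str]:
    """Flag outputs that look structured but are thin (thresholds are illustrative)."""
    flags = []
    if len(output.get("alternatives", [])) < 2:
        flags.append("scope_creep: fewer than 2 alternatives")
    if not output.get("assumptions"):
        flags.append("missing_facts: no assumptions listed")
    if tree_depth(output.get("issue_tree", {})) < min_depth:
        flags.append("false_rigor: shallow issue tree")
    return flags


# Hypothetical model output: one alternative, no assumptions, a root-only tree
thin_output = {
    "issue_tree": {"label": "Margin decline", "children": []},
    "alternatives": ["Reprice"],
    "assumptions": [],
}

for flag in detect_structural_risks(thin_output):
    print(flag)
```

Checks like these catch only the crudest forms of false rigor. A deep tree with many assumptions can still be wrong, so the flags complement human review and do not replace it.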

The third takeaway is that **assumptions are not an embarrassing flaw; they are the real work product.** In strategy and finance, the decision is rarely determined by the visible facts alone. The decision depends on the assumptions that bridge gaps in knowledge: how the market will react, how costs will scale, how competitors will respond, what regulatory constraints will bite, what execution capacity exists, what capital is truly scarce, and what risks the organization can tolerate. This notebook treats assumptions as first-class objects. It gathers them, labels them, and turns them into a register that can be reviewed and validated. If you are using Level 2 correctly, you should feel an increasing respect for the assumption register. It is the honest representation of what you do not yet know. It is also the most defensible basis for deciding what to verify first.

The fourth takeaway is that **verification is a workflow, not a moral aspiration.** Many teams say they will verify later, and then “later” disappears under deadlines. This notebook makes verification concrete by turning open questions into a checklist and by saving those questions as artifacts. The intent is to make it difficult to pretend that verification happened when it did not. In a real organization, that checklist becomes a management tool: who owns each verification, what method will be used, when it must be completed, and what evidence will be stored. Even in an educational setting, the practice matters. The habit you are building is not “trust the model less.” The habit is “design the process so trust is unnecessary.”
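
One way to make the checklist behave like a management tool is to give each verification question the four fields named above (owner, method, deadline, evidence) and to refuse a status change until they are filled in. The sketch below is illustrative only: the `VerificationItem` class, its field names, and the `overdue` helper are hypothetical and do not describe the notebook's `verification_register.json` format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VerificationItem:
    question: str
    owner: Optional[str] = None          # who is accountable for the check
    method: Optional[str] = None         # how the check will be performed
    due: Optional[date] = None           # when it must be completed
    evidence_path: Optional[str] = None  # where the supporting evidence is stored
    status: str = "Not verified"

    def mark_verified(self) -> None:
        """Allow the status change only when owner, method, and evidence are recorded."""
        missing = [
            name for name in ("owner", "method", "evidence_path")
            if getattr(self, name) is None
        ]
        if missing:
            raise ValueError(f"Cannot mark verified; missing: {', '.join(missing)}")
        self.status = "Verified"


def overdue(items: list[VerificationItem], today: date) -> list[VerificationItem]:
    """Return unverified items whose deadline has passed."""
    return [
        item for item in items
        if item.status != "Verified" and item.due is not None and item.due < today
    ]


# Hypothetical register with one open question
register = [
    VerificationItem(
        question="Does the distributor contract allow exclusivity?",
        owner="Legal",
        method="Contract review",
        due=date(2025, 3, 1),
    )
]
print([item.question for item in overdue(register, today=date(2025, 3, 15))])
```

The design choice is that "Verified" cannot be set by assertion. Without a recorded owner, method, and evidence location, the item stays "Not verified", which matches the default label this notebook puts on every output.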

The fifth takeaway is that Level 2 teaches a professional form of humility: **neutrality and reversibility.** A strong reasoning structure should not collapse if someone argues the opposite conclusion. In fact, you should be able to defend competing options with comparable rigor because the structure is not a verdict. It is a map. Neutrality does not mean you never decide. It means you delay decision until you have clarified the decision criteria, surfaced the tensions, and identified which unknowns truly drive the outcome. Reversibility is a practical test of whether you are reasoning or rationalizing. If you cannot articulate the strongest version of the alternative, you are probably not doing analysis. You are doing advocacy.

The sixth takeaway is about confidentiality and professional boundaries. Even though this notebook is not a production system, it trains the right posture: provide the minimum necessary input, redact what you can, and treat prompts and outputs as records that may be reviewed. In real consulting or corporate settings, the “input discipline” is part of your professional duty. You do not get to outsource that duty to a tool. The notebook reinforces this by logging prompts in a redacted way and by emphasizing that governance includes what you choose to share, not only what you receive back.

The seventh takeaway is that **process is part of the deliverable.** This is the deeper shift that Level 2 introduces. Traditionally, analysts focus on the final deck or memo. This notebook trains you to treat the run manifest, risk log, assumption register, verification checklist, and approval placeholders as essential components of responsible work. These artifacts make your reasoning reconstructable. They allow someone else to understand what happened, why it happened, and where the vulnerabilities are. That is the foundation of defensibility. In real firms, this is how you prevent inconsistent quality, reduce rework, and avoid the repeated reinvention of analysis under pressure.
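
Reconstructability depends on being able to tell whether two runs used the same settings. The run summary prints a `config_hash` from the manifest, but how the notebook computes it is not shown in this section. The sketch below is therefore an assumption: it hashes a canonical JSON serialization of the configuration with SHA-256. The `config_hash` and `build_manifest` functions and the example values are hypothetical; only the four manifest keys come from the summary code.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a config deterministically: sorted keys and fixed separators, then SHA-256."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(run_id: str, model: str, config: dict, created_at: str) -> dict:
    """Assemble a manifest with the four fields the run summary reads."""
    return {
        "run_id": run_id,
        "model": model,
        "created_at": created_at,
        "config_hash": config_hash(config),
    }


# Hypothetical configuration values for illustration
config = {"temperature": 0.1, "max_tokens": 2500}
manifest_example = build_manifest(
    run_id="example_run",
    model="claude-sonnet-4-5-20250929",
    config=config,
    created_at="2025-01-01T00:00:00Z",
)
print(manifest_example["config_hash"][:12])
```

Because the keys are sorted before hashing, the same settings give the same hash regardless of the order in which they were written, and any change to a parameter gives a different one. A reviewer can then confirm that two bundles came from identical configurations without reading either config in full.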

If you want a practical way to judge whether you used this notebook well, ask yourself these questions. Did the outputs make it easier to identify what matters most, or did they merely make the situation sound organized? Did you end with a clearer sense of which assumptions drive the decision, or did you end with a longer list of content? Did you produce verification questions that would genuinely change your thinking if answered, or did you produce generic “to-dos” that nobody will complete? Did you maintain neutral language, or did the output subtly steer toward one option? And most importantly, could you hand the artifact bundle to a skeptical reviewer and have them understand your reasoning and your gaps without needing you in the room?

The final lesson is that Level 2 is a stepping stone, not a destination. It gives you a disciplined way to structure problems, but it does not give you the operational system to execute multi-step workflows with separation of duties, stage gates, and immutable logs across multiple actors. That is what the next level is about. If you mastered this notebook, you are ready to move from “structured reasoning” to “structured workflow.” You will carry forward the same principle: as capability increases, governance must increase in parallel. The model’s output will get more useful, and the risk of scaling error will rise with it. Your job is to scale control at the same pace.

Leave this notebook with one rule burned into your professional instincts: **structure is not truth, and confidence is not verification.** If you keep that rule, Level 2 will make you faster and sharper without making you reckless. If you forget it, Level 2 will make you persuasive while you are wrong. The notebook is designed so that the first outcome is easier than the second, but it still requires you to own the discipline. The model can produce the structure. Only you can produce defensibility.
