#**AI FINANCIAL ADVISOR CHAPTER 3: AGENTS**

---

##0.REFERENCE

https://claude.ai/share/3ae229b5-6333-41a7-a66d-c85618e02ae3

##1.CONTEXT

**Understanding Structured AI Reasoning for Financial Advisors: A New Paradigm Beyond Simple Chatbots**

When most people think about interacting with artificial intelligence, they imagine typing questions into a chat window and receiving conversational answers. This traditional chatbot interaction works well for general inquiries, creative writing, or casual research. You ask a question, the AI responds with text, and the conversation flows naturally without any particular structure or documentation. However, this informal approach presents serious challenges in regulated industries like financial services, where every recommendation must be documented, every assumption must be traceable, and every decision must withstand regulatory scrutiny.

This notebook represents a fundamentally different approach to working with AI in professional advisory contexts. Instead of casual conversation, it implements what we call structured reasoning with comprehensive governance controls. The difference is profound and addresses the core challenge facing financial advisors who want to leverage AI capabilities while remaining compliant with regulations like Regulation Best Interest, fiduciary standards, and recordkeeping requirements.

In a traditional chatbot interaction, you might ask something like "What should my client do with their concentrated stock position?" and receive a narrative response discussing various options. The problem is that this response disappears unless you manually copy it somewhere. There's no automatic record of what assumptions the AI made, no documentation of what alternatives were considered, no log of the exact question asked, and no systematic way to verify that the AI didn't cross boundaries by making recommendations that only a qualified human advisor should make. If a regulator later questions your advice, you have no defensible trail showing how you used AI in your process.

This notebook solves these problems through four fundamental innovations that transform AI from an uncontrolled conversational tool into a governed reasoning assistant.

**First, the notebook enforces strict boundaries through what we call Level Two reasoning.** Traditional chatbots will happily tell you what to recommend, which securities to buy, or whether something is suitable for a client. This notebook's architecture prevents the AI from crossing those lines. It's programmed to separate facts from assumptions, identify alternatives without recommending any particular one, surface questions that need human judgment, and detect gaps in information. The AI acts as a reasoning scaffold that organizes thinking rather than a decision-maker that replaces professional judgment. Every prompt sent to the AI explicitly reinforces these boundaries, and automated risk detection scans responses for language that would indicate the AI overstepped its role.

**Second, the notebook creates comprehensive audit trails that make every interaction traceable and defensible.** When you use a traditional chatbot, the conversation happens and then it's gone unless you manually save it. This notebook automatically logs every prompt sent to the AI and every response received, with both redacted to protect confidentiality. Each log entry includes cryptographic hashes that create an immutable chain, meaning any tampering would be immediately detectable. The system also generates a run manifest that documents exactly which AI model was used, what parameters controlled its behavior, and what governance rules were in effect. If you need to demonstrate to a compliance officer or regulator that you used AI appropriately, you can provide the complete bundle showing exactly what happened, when it happened, and under what controls.

**Third, the notebook implements systematic risk detection that identifies potential problems in real time.** As the AI generates responses, automated scanners check for recommendation language like "you should" or "I recommend," invented authority like fabricated SEC rules or FINRA requirements, missing disclaimers that should appear in every output, insufficient alternatives when multiple options should be presented, and gaps in critical information that would make any analysis incomplete. Each detected risk gets logged with severity ratings, creating a risk register that supervisors can review. This is fundamentally different from hoping you'll notice problems yourself in a casual chat conversation.

**Fourth, the notebook produces structured deliverables rather than free-form text.** Instead of getting paragraphs of narrative that you need to interpret and extract value from, the AI returns information in standardized JSON format with specific fields for facts, assumptions, alternatives, open questions, analysis, and risks. This structure ensures consistency across cases, makes information easy to find and review, enables automated quality checks, and creates artifacts that can be directly incorporated into supervision files. The structured format also means you can build workflows where one advisor's reasoning artifacts become inputs for supervisor review or peer consultation.

The practical benefits for financial advisory practices are substantial. Imagine an advisor preparing for a client meeting about retirement income planning. In the traditional chatbot approach, the advisor might have several informal conversations with AI, getting various suggestions and ideas, but ending up with nothing documented and no clear separation between the AI's input and the advisor's own professional judgment. With this structured reasoning system, the advisor inputs sanitized client facts, receives back a reasoning map that clearly separates what's known from what's assumed from what's unknown, gets a comparison of alternative approaches without any recommendations, sees questions surfaced about information gaps, and obtains all of this in documented JSON files with full audit trails showing the AI stayed within appropriate boundaries.

For compliance officers and supervisors, the benefits are equally compelling. Traditional chatbot usage is nearly impossible to supervise effectively because there's no systematic way to know what advisors asked, what responses they received, or how they used those responses. This notebook produces a complete bundle for every run including the governance manifest showing what rules were in effect, immutable logs of all AI interactions, risk registers flagging potential issues, and structured outputs for each case that can be reviewed against standardized criteria. The supervisor can verify that facts were separated from assumptions, that multiple alternatives were identified, that no recommendations were made, and that all regulatory references were marked as unverified.

The importance of this approach extends beyond individual compliance. In regulated industries, the question is not whether professionals will use AI tools, but whether they'll use them in ways that create liability or in ways that enhance quality while maintaining defensibility. Traditional chatbot usage creates hidden risks because it happens in the shadows without documentation, encourages boundary violations because the AI naturally wants to be helpful by making recommendations, provides no systematic quality control, and leaves no trail for supervision or regulatory examination.

Structured reasoning with governance controls brings AI usage into the light. It creates transparency through comprehensive logging, enforces appropriate boundaries through architecture rather than hoping users will self-regulate, enables supervision through standardized outputs and risk registers, and produces defensible artifacts that demonstrate responsible use. This transforms AI from a compliance risk into a compliance-positive tool that actually strengthens your documentation and supervision processes.

The notebook's approach recognizes a fundamental truth about AI in professional services: the technology is powerful but must be channeled appropriately. Just as financial advisors use sophisticated analytical tools but remain responsible for recommendations, this system lets advisors leverage AI's reasoning capabilities while maintaining clear human accountability. The AI structures information, identifies considerations, and surfaces questions, but the qualified human advisor still makes all judgments about suitability, best interest, and appropriate courses of action.

For practices considering AI adoption, this notebook demonstrates that the choice is not between using AI or avoiding it, but between using AI recklessly or using it responsibly. The structured governance approach shown here provides a template for bringing powerful AI capabilities into regulated advisory work without creating the documentation gaps, boundary violations, or supervision challenges that would come from treating AI as just another chatbot to have casual conversations with. This is how professional services can harness transformative technology while honoring the regulatory frameworks that protect investors and maintain market integrity.

##2.LIBRARIES AND ENVIRONMENT

In [1]:
# Cell 2
# Goal: Install dependencies, imports, and create run directory structure
# Output: Confirmation messages showing setup completion

!pip install -q anthropic

import anthropic
import os
import json
import hashlib
import uuid
from datetime import datetime
from typing import Dict, List, Any, Optional
from pathlib import Path
import zipfile

# Create run directory structure
RUN_ID = f"run_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
RUN_DIR = Path(f"/content/{RUN_ID}")
DELIVERABLES_DIR = RUN_DIR / "deliverables"
LOGS_DIR = RUN_DIR / "logs"

RUN_DIR.mkdir(exist_ok=True)
DELIVERABLES_DIR.mkdir(exist_ok=True)
LOGS_DIR.mkdir(exist_ok=True)

print(f"‚úì Dependencies installed")
print(f"‚úì Run directory created: {RUN_DIR}")
print(f"‚úì Run ID: {RUN_ID}")
print(f"‚úì Deliverables: {DELIVERABLES_DIR}")
print(f"‚úì Logs: {LOGS_DIR}")

[?25l   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m0.0/390.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[91m‚ï∏[0m [32m389.1/390.3 kB[0m [31m13.5 MB/s[0m eta [36m0:00:01[0m[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m390.3/390.3 kB[0m [31m8.9 MB/s[0m eta [36m0:00:00[0m
[?25h‚úì Dependencies installed
‚úì Run directory created: /content/run_20260115_155535_110b2989
‚úì Run ID: run_20260115_155535_110b2989
‚úì Deliverables: /content/run_20260115_155535_110b2989/deliverables
‚úì Logs: /content/run_20260115_155535_110b2989/logs


##3.CLAUDE API AND CLIENT INITIALIZATION

###3.1.OVERVIEW



When you run Cell 3, the notebook attempts to connect to the Anthropic AI service using your API key. Think of this like logging into a service - you need credentials to access it.

The cell first looks for your API key in Colab's secure storage area called Secrets. This is similar to how password managers store your passwords safely. If the key is found, the notebook creates a connection object called "client" that will be used throughout the notebook to communicate with the Claude AI model.

You'll see a success message confirming the API client is ready, along with details about which AI model will be used. The model specified is claude-sonnet-4-5-20250929, which is a specific version of Claude designed for complex reasoning tasks. The configuration also shows that responses will be limited to 4096 tokens (roughly 3000-3500 words) and the temperature is set to 0.2, meaning responses will be focused and consistent rather than creative or varied.

If the API key is not found, you'll see an error message with step-by-step instructions. The instructions guide you to add your Anthropic API key to Colab's Secrets manager. This involves clicking the key icon in the left sidebar, creating a new secret named ANTHROPIC_API_KEY, pasting your actual API key as the value, and enabling notebook access. This security approach ensures your API key is never visible in the notebook code itself, protecting it from accidental exposure.

The error handling is designed to be educational - it doesn't just fail silently but instead teaches you exactly what needs to be configured. This is important because without a valid API key, none of the AI agents in the notebook can function. The connection established here becomes the foundation for all subsequent AI-powered operations in the workflow.

Once successful, this cell essentially opens the communication channel between your notebook and Anthropic's AI service, allowing the financial advisory workflow agents to process scenarios, generate drafts, and create governance artifacts throughout the remaining cells.

###3.2.CODE AND IMPLEMENTATION

In [21]:
# Cell 3
# Goal: Initialize Anthropic API client with key from Colab secrets
# Output: Confirmation of successful API client initialization

from google.colab import userdata

try:
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
    os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    print("‚úì Anthropic API client initialized successfully")
    print("‚úì Model: claude-sonnet-4-5-20250929")
    print("‚úì Max tokens: 4096 (increased for complete JSON responses)")
    print("‚úì Temperature: 0.2")
except Exception as e:
    print(f"‚ùå Error: {e}")
    print("\n‚ö†Ô∏è Setup required:")
    print("1. Click the üîë key icon in the left sidebar (Secrets)")
    print("2. Add a new secret named: ANTHROPIC_API_KEY")
    print("3. Paste your Anthropic API key as the value")
    print("4. Enable 'Notebook access' toggle")
    print("5. Re-run this cell")
    raise

‚úì Anthropic API client initialized successfully
‚úì Model: claude-sonnet-4-5-20250929
‚úì Max tokens: 4096 (increased for complete JSON responses)
‚úì Temperature: 0.2


##4.MANIFEST AND LOGGING INFRASTRUCTURE

###4.1.OVERVIEW



Cell 4 creates the foundational record-keeping infrastructure for the entire workflow. Think of this as setting up a new filing cabinet system before starting any work - everything that happens later will be organized according to the structure created here.

The cell first generates a unique identifier for this notebook run, combining the current date, time, and a random code. This run identifier acts like a case number in a law firm - it allows you to distinguish this particular execution from any other time you run the notebook. Every file, log entry, and artifact created during this session will be tagged with this identifier.

Next, the cell creates a manifest file, which is essentially a detailed label describing this entire package. The manifest records what AI model was used, what settings were applied, when the run started, and who authored the notebook. It also includes a configuration hash, which is like a fingerprint of all the settings used. This hash allows anyone reviewing the work later to verify that the configuration hasn't been changed since the run completed.

The cell then initializes two critical logging systems. The first is the prompts log, which will record every interaction with the AI in an immutable chain. Immutable means once something is written, it cannot be changed or deleted - similar to how blockchain records work. This log starts with a genesis entry, like the first block in a blockchain, which begins the hash chain. Each subsequent log entry will mathematically link to the previous one, creating a tamper-evident audit trail.

The second log is the risk register, starting as an empty list ready to capture any issues detected during workflow execution. This might include risks like missing information, potential conflicts of interest, or workflow integrity problems.

When this cell completes successfully, you'll see confirmation messages showing where each file was created, along with the configuration hash. This setup ensures complete traceability - a supervisor or auditor can later verify exactly what happened, when it happened, and under what configuration, which is essential for regulatory compliance in financial advisory contexts.

###4.2.CODE AND IMPLEMENTATION

In [22]:
# Cell 4
# Goal: Create manifest and initialize immutable logging infrastructure
# Output: Manifest file created with run metadata and hash chain initialized

# Generate manifest
ENV_FINGERPRINT = {
    "python_version": "3.10+",
    "colab": True,
    "model": "claude-sonnet-4-5-20250929",
    "temperature": 0.2,
    "max_tokens": 1200
}

CONFIG_HASH = hashlib.sha256(
    json.dumps(ENV_FINGERPRINT, sort_keys=True).encode()
).hexdigest()[:16]

MANIFEST = {
    "run_id": RUN_ID,
    "timestamp": datetime.now().isoformat(),
    "author": "Alejandro Reynoso, Chief Scientist DEFI CAPITAL RESEARCH",
    "chapter": "Chapter 3 - Level 3 (Agents)",
    "model": "claude-sonnet-4-5-20250929",
    "temperature": 0.2,
    "max_tokens": 1200,
    "config_hash": CONFIG_HASH,
    "env_fingerprint": ENV_FINGERPRINT,
    "scope": "Agentic advisory workflows with human-in-the-loop checkpoints",
    "disclaimer": "NOT INVESTMENT, TAX, OR LEGAL ADVICE. Draft assistance only. Qualified advisor review required."
}

manifest_path = RUN_DIR / "run_manifest.json"
with open(manifest_path, "w") as f:
    json.dump(MANIFEST, f, indent=2)

# Initialize immutable log with genesis entry
prompts_log_path = LOGS_DIR / "prompts_log.jsonl"
genesis_entry = {
    "step_id": "genesis",
    "timestamp": datetime.now().isoformat(),
    "agent_name": "Logger",
    "prompt_hash": "0" * 64,
    "response_hash": "0" * 64,
    "prev_hash": "0" * 64,
    "redacted_prompt": "GENESIS BLOCK",
    "redacted_response": "Log initialized"
}
with open(prompts_log_path, "w") as f:
    f.write(json.dumps(genesis_entry) + "\n")

# Initialize risk log
risk_log_path = LOGS_DIR / "risk_log.json"
with open(risk_log_path, "w") as f:
    json.dump({"risks": []}, f, indent=2)

print(f"‚úì Manifest created: {manifest_path}")
print(f"‚úì Immutable log initialized: {prompts_log_path}")
print(f"‚úì Risk log initialized: {risk_log_path}")
print(f"\nConfig Hash: {CONFIG_HASH}")

‚úì Manifest created: /content/run_20260115_155535_110b2989/run_manifest.json
‚úì Immutable log initialized: /content/run_20260115_155535_110b2989/logs/prompts_log.jsonl
‚úì Risk log initialized: /content/run_20260115_155535_110b2989/logs/risk_log.json

Config Hash: 65e627a75fe8e195


##5.PRIVACY PROTECTION

###5.1.OVERVIEW



Cell 5 creates privacy protection tools that automatically remove sensitive personal information from any data that gets logged or stored. This is like having a smart redaction system that protects client confidentiality while still maintaining useful records for supervision.

The cell defines a utility class called ConfidentialityUtils with two main functions. The first function, redact_prompt, scans through any text looking for patterns that might be personally identifiable information. It searches for Social Security Numbers in various formats, email addresses, phone numbers, and account numbers. When it finds these patterns, it replaces them with placeholder text like [SSN-REDACTED] or [EMAIL-REDACTED].

Think of this function as an automatic highlighter that blacks out sensitive information before the text gets written to any log file. This ensures that even if someone gains access to the audit logs, they won't see actual client personal data. The patterns used are regular expressions - essentially search formulas that can identify things like "three digits, dash, two digits, dash, four digits" which matches SSN format.

The second function, sanitize_case_data, takes any client scenario information and adds a clear warning label stating the data should be synthetic only. This serves as a constant reminder that real client information should never be pasted into the notebook.

When you run this cell, you'll see a test demonstration showing how the redaction works. The test creates a sample text containing fake personal information - a Social Security Number, email address, phone number, and account number. The output shows the original text, then shows the same text after redaction, with all sensitive patterns replaced by generic placeholders. This visual proof helps you understand that the protection system is working correctly.

These confidentiality utilities will be called automatically throughout the notebook whenever data needs to be logged. This means you don't have to remember to redact things manually - the system does it for you, reducing the risk of accidentally logging sensitive information. This is a critical governance control for using AI in financial advisory workflows where client privacy is paramount.

###5.2.CODE AND IMPLEMENTATION

In [23]:
# Cell 5
# Goal: Implement confidentiality utilities for PII redaction
# Output: Test redaction examples showing utility functions work correctly

import re

class ConfidentialityUtils:
    """Minimum-necessary redaction utilities"""

    @staticmethod
    def redact_prompt(text: str) -> str:
        """Redact PII patterns from prompts for logging"""
        # SSN patterns
        text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN-REDACTED]', text)
        text = re.sub(r'\b\d{9}\b', '[SSN-REDACTED]', text)

        # Email
        text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL-REDACTED]', text)

        # Phone
        text = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE-REDACTED]', text)

        # Account numbers (8+ digits)
        text = re.sub(r'\b\d{8,}\b', '[ACCOUNT-REDACTED]', text)

        # Dollar amounts (keep for context but flag)
        # Not redacted but marked for review

        return text

    @staticmethod
    def sanitize_case_data(case_dict: Dict) -> Dict:
        """Ensure case data is synthetic/sanitized"""
        sanitized = case_dict.copy()
        sanitized['_sanitization_note'] = "Synthetic data only. Do not use real client PII."
        return sanitized

# Test redaction
test_text = "Client SSN is 123-45-6789, email john.doe@example.com, phone 555-123-4567, account 12345678"
redacted = ConfidentialityUtils.redact_prompt(test_text)

print("‚úì Confidentiality utilities loaded")
print(f"\nRedaction test:")
print(f"Original: {test_text}")
print(f"Redacted: {redacted}")

‚úì Confidentiality utilities loaded

Redaction test:
Original: Client SSN is 123-45-6789, email john.doe@example.com, phone 555-123-4567, account 12345678
Redacted: Client SSN is [SSN-REDACTED], email [EMAIL-REDACTED], phone [PHONE-REDACTED], account [ACCOUNT-REDACTED]


##6.LLM WRAPPER

###6.1.OVERVIEW



Cell 6 builds the intelligent wrapper that manages all communication with the Claude AI model. Think of this as creating a quality control inspector that stands between your workflow and the AI, ensuring every response meets strict standards before being accepted.

The wrapper enforces a specific structure for all AI responses. Every agent must return information in exactly the same JSON format with ten required fields: task description, facts provided, assumptions made, alternatives considered, open questions, analysis notes, risks identified, draft output, verification status, and items needing verification. This standardization ensures consistency across all workflow steps and makes supervision much easier.

When any agent calls the AI, this wrapper does several important things automatically. First, it enhances the instructions sent to the AI, explicitly requiring JSON-only responses with strict length limits to prevent truncation problems. The wrapper tells the AI to keep lists short (five to eight items maximum) and text fields brief (800 characters for drafts, 400 for analysis). These limits ensure responses complete without being cut off mid-sentence.

The wrapper includes sophisticated error handling with retry logic. If the AI returns malformed JSON or the response gets truncated, the wrapper automatically tries again up to two times. It also strips out any markdown formatting that might have accidentally been included, cleaning the response before attempting to parse it as JSON.

After successfully parsing the JSON, the wrapper validates that all required fields are present and runs security checks. It scans the text looking for dangerous patterns like implied investment recommendations, unauthorized suitability determinations, or invented regulatory citations. When these patterns are detected, the wrapper automatically logs them to the risk register so supervisors can review them later.

The output confirms the wrapper is ready with multiple protection layers active. You'll see messages indicating the token limit, retry logic, length enforcement, truncation detection, and risk pattern detection are all operational. This comprehensive quality control system ensures the agentic workflow maintains high standards throughout execution while protecting against common AI output problems.

###6.2.CODE AND IMPLEMENTATION

In [24]:
# Cell 6
# Goal: Implement strict JSON LLM wrapper with validation and risk detection
# Output: LLM wrapper class ready for agent calls

class StrictJSONLLMWrapper:
    """Enforces structured output format with risk detection"""

    REQUIRED_KEYS = [
        "task", "facts_provided", "assumptions", "alternatives",
        "open_questions", "analysis", "risks", "draft_output",
        "verification_status", "questions_to_verify"
    ]

    RISK_TYPES = [
        "confidentiality", "hallucination", "missing_facts", "suitability",
        "regbi", "conflicts", "liquidity", "prompt_injection", "overreach", "other"
    ]

    JSON_SCHEMA = """{
  "task": "brief description",
  "facts_provided": ["fact1", "fact2"],
  "assumptions": ["assumption1", "assumption2"],
  "alternatives": ["alternative1"],
  "open_questions": ["question1", "question2"],
  "analysis": "brief workflow notes",
  "risks": [{"type": "risk_type", "severity": "low|medium|high", "note": "brief note"}],
  "draft_output": "MUST start with disclaimer, then brief content",
  "verification_status": "Not verified",
  "questions_to_verify": ["item1"]
}"""

    def __init__(self, client: anthropic.Anthropic, logger):
        self.client = client
        self.logger = logger

    def call(self, agent_name: str, system_prompt: str, user_prompt: str, step_id: str) -> Dict:
        """Make LLM call with strict JSON validation"""

        # Enhance system prompt with explicit JSON requirements
        enhanced_system = f"""{system_prompt}

CRITICAL JSON REQUIREMENTS:
1. Respond with ONLY valid JSON - no preamble, no explanation
2. No markdown (no ```json or ```)
3. Keep responses CONCISE - token limit is strict
4. For lists: max 5-8 items each to avoid truncation
5. For draft_output: max 800 characters
6. For analysis: max 400 characters
7. Use this structure:
{self.JSON_SCHEMA}

REQUIRED disclaimer start for draft_output:
"NOT INVESTMENT, TAX, OR LEGAL ADVICE. Draft planning and communication assistance only. Qualified advisor review required."
"""

        # Redact prompts for logging
        redacted_system = ConfidentialityUtils.redact_prompt(system_prompt[:500])
        redacted_user = ConfidentialityUtils.redact_prompt(user_prompt[:500])

        max_retries = 2
        for attempt in range(max_retries):
            try:
                response = self.client.messages.create(
                    model="claude-sonnet-4-5-20250929",
                    max_tokens=4096,  # INCREASED from 2500
                    temperature=0.2,
                    system=enhanced_system,
                    messages=[{"role": "user", "content": user_prompt}]
                )

                response_text = response.content[0].text.strip()

                # Remove markdown formatting
                if response_text.startswith("```json"):
                    response_text = response_text.replace("```json", "").replace("```", "").strip()
                elif response_text.startswith("```"):
                    response_text = response_text.replace("```", "").strip()

                # Check for truncation indicators
                if response_text.endswith('"') and response_text.count('{') == response_text.count('}'):
                    # Likely complete
                    pass
                elif not response_text.endswith('}'):
                    print(f"‚ö†Ô∏è Response may be truncated (attempt {attempt + 1}/{max_retries})")
                    if attempt < max_retries - 1:
                        continue  # Retry

                # Parse JSON
                try:
                    parsed = json.loads(response_text)
                except json.JSONDecodeError as e:
                    print(f"\n‚ö†Ô∏è JSON PARSE ERROR in {agent_name} (attempt {attempt + 1}/{max_retries})")
                    print(f"Error: {str(e)}")
                    print(f"Response length: {len(response_text)} chars")
                    print(f"Response preview: {response_text[:500]}...")
                    print(f"Response end: ...{response_text[-200:]}")

                    if attempt < max_retries - 1:
                        print("  Retrying with more concise prompt...")
                        continue

                    self.logger.log_risk({
                        "type": "non_json_response",
                        "severity": "high",
                        "note": f"Agent {agent_name} returned non-JSON after {max_retries} attempts: {str(e)}",
                        "step_id": step_id
                    })
                    raise ValueError(f"Non-JSON response from {agent_name}: {str(e)}")

                # Validate structure
                missing_keys = [k for k in self.REQUIRED_KEYS if k not in parsed]
                if missing_keys:
                    self.logger.log_risk({
                        "type": "workflow_integrity_gap",
                        "severity": "high",
                        "note": f"Missing required keys: {missing_keys}",
                        "step_id": step_id
                    })
                    raise ValueError(f"Missing keys: {missing_keys}")

                # Ensure arrays are not empty
                for key in ["facts_provided", "assumptions", "alternatives", "open_questions", "questions_to_verify"]:
                    if not parsed.get(key):
                        parsed[key] = ["None identified"]

                if not parsed.get("risks"):
                    parsed["risks"] = []

                # Detect risk patterns
                self._detect_risks(parsed, agent_name, step_id)

                # Log to immutable chain
                self.logger.log_prompt_response(
                    step_id=step_id,
                    agent_name=agent_name,
                    redacted_prompt=f"{redacted_system[:200]}...{redacted_user[:200]}",
                    redacted_response=ConfidentialityUtils.redact_prompt(response_text[:500])
                )

                return parsed

            except anthropic.APIError as e:
                print(f"\n‚ö†Ô∏è API ERROR in {agent_name}: {str(e)}")
                if attempt < max_retries - 1:
                    print("  Retrying...")
                    continue
                self.logger.log_risk({
                    "type": "other",
                    "severity": "high",
                    "note": f"API call failed after {max_retries} attempts: {str(e)}",
                    "step_id": step_id
                })
                raise

        raise ValueError(f"Failed to get valid JSON from {agent_name} after {max_retries} attempts")

    def _detect_risks(self, parsed: Dict, agent_name: str, step_id: str):
        """Detect risk patterns in response"""
        draft = parsed.get("draft_output", "").lower()
        analysis = parsed.get("analysis", "").lower()

        # Check for invented authority
        authority_patterns = [
            r'sec rule \d+', r'finra rule \d+', r'irc section \d+',
            r'erisa section \d+', r'according to sec', r'finra requires'
        ]
        for pattern in authority_patterns:
            if re.search(pattern, draft) or re.search(pattern, analysis):
                if parsed.get("verification_status") != "Not verified":
                    self.logger.log_risk({
                        "type": "invented_authority_detected",
                        "severity": "high",
                        "note": f"Authority pattern without 'Not verified' status in {agent_name}",
                        "step_id": step_id
                    })

        # Check for implied recommendations
        recommendation_patterns = [
            r'should invest', r'recommended allocation', r'buy\s+\w+\s+stock',
            r'sell\s+\w+', r'we recommend', r'best choice is'
        ]
        for pattern in recommendation_patterns:
            if re.search(pattern, draft):
                self.logger.log_risk({
                    "type": "implied_recommendation_detected",
                    "severity": "high",
                    "note": f"Recommendation language detected in {agent_name}",
                    "step_id": step_id
                })

        # Check for suitability determination language
        suitability_patterns = [r'is suitable', r'meets suitability', r'suitable for client']
        for pattern in suitability_patterns:
            if re.search(pattern, draft):
                self.logger.log_risk({
                    "type": "implied_suitability_determination_detected",
                    "severity": "high",
                    "note": f"Suitability determination language in {agent_name}",
                    "step_id": step_id
                })

print("‚úì Strict JSON LLM wrapper loaded")
print("‚úì Max tokens: 4096 with retry logic")
print("‚úì Concise response enforcement (lists max 5-8 items)")
print("‚úì Truncation detection and retry")
print("‚úì Risk detection patterns active")

‚úì Strict JSON LLM wrapper loaded
‚úì Max tokens: 4096 with retry logic
‚úì Concise response enforcement (lists max 5-8 items)
‚úì Truncation detection and retry
‚úì Risk detection patterns active


##7.MINI CASE BUILDERS

###7.1.OVERVIEW



Cell 7 creates all the specialized agents and infrastructure needed to run the multi-step advisory workflow. Think of this as assembling a team of focused assistants, each with a specific job, along with the shared systems they'll use to coordinate their work.

The cell first builds the Logger class, which maintains the tamper-evident audit trail. Every time an agent interacts with the AI, the logger records it with cryptographic hashing. Each new log entry includes a hash of the previous entry, creating a mathematical chain. If anyone tried to alter a past entry, the chain would break, immediately revealing the tampering. This is the same principle used in blockchain technology.

Next comes the SharedState class, which acts as the workflow's memory and coordination center. It maintains four critical registers: assumptions made during analysis, open items requiring follow-up, claims needing external verification, and a history of all checkpoint approvals. The state also includes the checkpoint mechanism, which can pause the workflow requiring human review before proceeding. Additionally, it has logic to check for unresolved hinge facts - critical assumptions that must be validated before downstream work continues.

The cell then creates six specialized agent classes. The IntakeAgent structures raw client scenarios into organized facts, assumptions, and questions. The IPSDraftAgent generates Investment Policy Statement shells focusing on process and governance without specifying investments. The DisclosureAgent creates checklists of topics advisors should disclose to clients. The SuitabilityReasoningAgent drafts structured questions advisors must answer when evaluating strategies, explicitly avoiding any suitability conclusions. The QCReviewerAgent examines all previous work looking for gaps, inconsistencies, or missing information. Finally, the RiskAssessorAgent evaluates the overall workflow for integrity problems.

Each agent is programmed with explicit instructions to keep responses concise and structured. The prompts specify maximum item counts and character limits to prevent the truncation problems encountered earlier. For example, agents are told to list only five to eight key facts, keep analysis to 300-400 characters, and limit draft outputs to 800 characters.

The confirmation message lists all loaded components, emphasizing that every agent enforces strict JSON-only responses with concise formatting to ensure reliability.

###7.2.CODE AND IMPLEMENTATION

In [25]:
# Cell 7
# Goal: Implement agent classes with CONCISE prompts to avoid truncation
# Output: All agent classes ready

class Logger:
    """Immutable audit trail with hash chaining"""

    def __init__(self, logs_dir: Path):
        self.logs_dir = logs_dir
        self.prompts_log_path = logs_dir / "prompts_log.jsonl"
        self.risk_log_path = logs_dir / "risk_log.json"
        self.last_hash = "0" * 64

        if self.prompts_log_path.exists():
            with open(self.prompts_log_path, "r") as f:
                lines = f.readlines()
                if lines:
                    last_entry = json.loads(lines[-1])
                    self.last_hash = last_entry.get("response_hash", "0" * 64)

    def log_prompt_response(self, step_id: str, agent_name: str,
                           redacted_prompt: str, redacted_response: str):
        """Append to immutable log with hash chaining"""
        prompt_hash = hashlib.sha256(redacted_prompt.encode()).hexdigest()
        response_hash = hashlib.sha256(redacted_response.encode()).hexdigest()

        entry = {
            "step_id": step_id,
            "timestamp": datetime.now().isoformat(),
            "agent_name": agent_name,
            "prompt_hash": prompt_hash,
            "response_hash": response_hash,
            "prev_hash": self.last_hash,
            "redacted_prompt": redacted_prompt,
            "redacted_response": redacted_response
        }

        with open(self.prompts_log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

        self.last_hash = response_hash

    def log_risk(self, risk_entry: Dict):
        """Append to risk log"""
        risk_entry["timestamp"] = datetime.now().isoformat()

        with open(self.risk_log_path, "r") as f:
            risk_log = json.load(f)

        risk_log["risks"].append(risk_entry)

        with open(self.risk_log_path, "w") as f:
            json.dump(risk_log, f, indent=2)


class SharedState:
    """Workflow state with registers and checkpoint tracking"""

    def __init__(self, run_id: str):
        self.run_id = run_id
        self.cases = {}
        self.assumption_register = {}
        self.open_items_register = {}
        self.not_verified_register = {}
        self.checkpoint_history = []

    def add_case(self, case_id: str, case_data: Dict):
        self.cases[case_id] = {
            "data": ConfidentialityUtils.sanitize_case_data(case_data),
            "artifacts": {},
            "status": "initialized"
        }

    def add_assumption(self, case_id: str, assumption: str, is_hinge_fact: bool = False):
        if case_id not in self.assumption_register:
            self.assumption_register[case_id] = []
        self.assumption_register[case_id].append({
            "assumption": assumption,
            "is_hinge_fact": is_hinge_fact,
            "resolved": False
        })

    def add_open_item(self, case_id: str, item: str):
        if case_id not in self.open_items_register:
            self.open_items_register[case_id] = []
        self.open_items_register[case_id].append(item)

    def add_not_verified(self, case_id: str, item: str):
        if case_id not in self.not_verified_register:
            self.not_verified_register[case_id] = []
        self.not_verified_register[case_id].append(item)

    def checkpoint(self, checkpoint_name: str, case_id: str, auto_approve: bool = False) -> bool:
        """Human approval gate"""
        checkpoint_entry = {
            "checkpoint_name": checkpoint_name,
            "case_id": case_id,
            "timestamp": datetime.now().isoformat(),
            "auto_approve": auto_approve,
            "approved": auto_approve
        }
        self.checkpoint_history.append(checkpoint_entry)

        if not auto_approve:
            print(f"\nüõë CHECKPOINT: {checkpoint_name} (Case: {case_id})")

        return checkpoint_entry["approved"]

    def check_hinge_facts(self, case_id: str, logger: Logger) -> bool:
        """Block if hinge facts unresolved"""
        assumptions = self.assumption_register.get(case_id, [])
        unresolved_hinge = [a for a in assumptions if a["is_hinge_fact"] and not a["resolved"]]

        if unresolved_hinge:
            logger.log_risk({
                "type": "hinge_fact_unresolved_block",
                "severity": "high",
                "note": f"Unresolved hinge facts: {len(unresolved_hinge)} items",
                "case_id": case_id
            })
            return False
        return True


class IntakeAgent:
    """Structures client scenarios"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, raw_scenario: str) -> Dict:
        system = """Intake agent: extract facts, identify assumptions, flag unknowns.
Return valid JSON only. Keep lists to 5-8 items max."""

        user = f"""Extract key info from this scenario. Be CONCISE.

{raw_scenario}

Return JSON with:
- task: Brief description
- facts_provided: 5-8 KEY facts only (age, assets, goals)
- assumptions: 3-5 key assumptions
- alternatives: 2-3 alternatives
- open_questions: 3-5 critical questions
- analysis: 2-3 sentences on intake (max 400 chars)
- risks: 1-3 risks
- draft_output: Brief summary starting with: "NOT INVESTMENT, TAX, OR LEGAL ADVICE. Draft planning and communication assistance only. Qualified advisor review required." Then 3-4 sentences. Max 800 chars total.
- verification_status: "Not verified"
- questions_to_verify: 2-4 items

JSON only. No other text."""

        return self.llm.call("IntakeAgent", system, user, f"{case_id}_intake")


class IPSDraftAgent:
    """Generates IPS shells"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, intake_result: Dict) -> Dict:
        system = """IPS shell drafter. NO allocations/targets. Return valid JSON only."""

        facts_str = "; ".join(intake_result.get("facts_provided", [])[:5])

        user = f"""Draft IPS shell. Be CONCISE.

Key facts: {facts_str}

Return JSON with:
- task: "Draft IPS shell"
- facts_provided: 3-5 key facts
- assumptions: 2-3 assumptions
- alternatives: 2-3 IPS approaches
- open_questions: 2-4 questions
- analysis: 2 sentences (max 300 chars)
- risks: 1-2 risks
- draft_output: Start with disclaimer, then IPS sections (Purpose, Roles, Process, Review). Max 800 chars.
- verification_status: "Not verified"
- questions_to_verify: 2-3 items

JSON only."""

        return self.llm.call("IPSDraftAgent", system, user, f"{case_id}_ips")


class DisclosureAgent:
    """Builds disclosure checklists"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, intake_result: Dict) -> Dict:
        system = """Disclosure checklist creator. Return valid JSON only."""

        facts_str = "; ".join(intake_result.get("facts_provided", [])[:5])

        user = f"""Create disclosure checklist. Be CONCISE.

Facts: {facts_str}

Return JSON with:
- task: "Disclosure checklist"
- facts_provided: 3-4 key facts
- assumptions: 2-3 assumptions
- alternatives: 2 alternative approaches
- open_questions: 2-3 questions
- analysis: 2 sentences (max 300 chars)
- risks: 1-3 disclosure risks
- draft_output: Start with disclaimer, then checklist (compensation, conflicts, risks, limitations). Max 800 chars.
- verification_status: "Not verified"
- questions_to_verify: 2-3 items

JSON only."""

        return self.llm.call("DisclosureAgent", system, user, f"{case_id}_disclosure")


class SuitabilityReasoningAgent:
    """Drafts suitability scaffolds"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, intake_result: Dict) -> Dict:
        system = """Suitability reasoning scaffold. NEVER conclude suitable/unsuitable. JSON only."""

        facts_str = "; ".join(intake_result.get("facts_provided", [])[:5])

        user = f"""Draft suitability scaffold. Be CONCISE.

Facts: {facts_str}

Return JSON with:
- task: "Suitability reasoning scaffold"
- facts_provided: 3-4 key profile facts
- assumptions: 2-3 assumptions
- alternatives: 2-3 strategies to compare
- open_questions: 3-5 suitability questions
- analysis: 2 sentences (max 300 chars)
- risks: 1-3 suitability risks
- draft_output: Start with disclaimer, then questions advisor must answer (client info needed? alternatives? risks? conflicts? docs?). Max 800 chars.
- verification_status: "Not verified"
- questions_to_verify: 2-3 Reg BI items

JSON only."""

        return self.llm.call("SuitabilityReasoningAgent", system, user, f"{case_id}_suitability")


class QCReviewerAgent:
    """Flags workflow gaps"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, all_artifacts: Dict) -> Dict:
        system = """QC reviewer. Flag gaps/inconsistencies. JSON only."""

        artifact_count = len(all_artifacts)

        user = f"""QC review {artifact_count} artifacts. Be CONCISE.

Return JSON with:
- task: "QC review"
- facts_provided: 2-3 key findings
- assumptions: 2 QC assumptions
- alternatives: 1-2 QC approaches
- open_questions: 2-3 follow-up items
- analysis: 2 sentences (max 300 chars)
- risks: 2-4 gaps/inconsistencies
- draft_output: Start with disclaimer, then QC notes (missing info, inconsistencies, follow-up needed). Max 800 chars.
- verification_status: "Not verified"
- questions_to_verify: 2-3 items

JSON only."""

        return self.llm.call("QCReviewerAgent", system, user, f"{case_id}_qc")


class RiskAssessorAgent:
    """Evaluates workflow risks"""

    def __init__(self, llm: StrictJSONLLMWrapper):
        self.llm = llm

    def process(self, case_id: str, state: SharedState) -> Dict:
        system = """Workflow risk assessor. JSON only."""

        assumptions_count = len(state.assumption_register.get(case_id, []))
        open_items_count = len(state.open_items_register.get(case_id, []))

        user = f"""Assess workflow risks. Be CONCISE.

Assumptions: {assumptions_count}, Open items: {open_items_count}

Return JSON with:
- task: "Workflow risk assessment"
- facts_provided: 2-3 workflow state facts
- assumptions: 2 assumptions
- alternatives: 1-2 mitigation approaches
- open_questions: 2-3 escalation items
- analysis: 2 sentences (max 300 chars)
- risks: 2-4 workflow integrity risks
- draft_output: Start with disclaimer, then risk summary. Max 800 chars.
- verification_status: "Not verified"
- questions_to_verify: 2-3 items

JSON only."""

        return self.llm.call("RiskAssessorAgent", system, user, f"{case_id}_risk_assessment")


print("‚úì Agent classes loaded with CONCISE prompts")
print("‚úì All responses limited to prevent truncation")
print("‚úì Lists: 2-8 items max")
print("‚úì draft_output: 800 chars max")
print("‚úì analysis: 300-400 chars max")

‚úì Agent classes loaded with CONCISE prompts
‚úì All responses limited to prevent truncation
‚úì Lists: 2-8 items max
‚úì draft_output: 800 chars max
‚úì analysis: 300-400 chars max


##8.THE ORCHESTRATOR AGENT

###8.1.OVERVIEW


Cell 8 builds the OrchestratorAgent, which acts as the conductor coordinating all other agents through a structured multi-step workflow. Think of this as the project manager who ensures work happens in the right sequence, with proper quality checks at each stage.

The orchestrator maintains references to all the specialized agents created in Cell 7, along with access to the shared state and logging systems. When asked to execute a workflow for a specific case, it follows a rigorous six-step process with built-in checkpoints.

The workflow begins with Step 1 where the IntakeAgent analyzes the raw client scenario. The orchestrator captures the intake results, then systematically registers all assumptions, open questions, and items needing verification into the shared state. It also identifies which assumptions are "hinge facts" - critical pieces of information that must be validated before proceeding. After intake completes, Checkpoint 1 requires review before continuing. The orchestrator then verifies all hinge facts are resolved; if not, it blocks further progress and stops the workflow.

Assuming clearance, Step 2 generates an IPS shell using the IPSDraftAgent, Step 3 creates disclosure checklists with the DisclosureAgent, and Step 4 builds suitability reasoning scaffolds with the SuitabilityReasoningAgent. After these drafting steps, Checkpoint 2 pauses for review of all draft materials.

Step 5 applies the QCReviewerAgent to examine all artifacts for gaps and inconsistencies. Step 6 uses the RiskAssessorAgent to evaluate overall workflow integrity. After risk assessment, a Final Checkpoint requires approval before the package is considered ready for delivery.

Throughout execution, the orchestrator prints status updates showing progress through each step. It reports how many facts, assumptions, and questions were identified, and how many risks were flagged. When checkpoints are reached, clear messages indicate human review is required.

The finalize function saves all artifacts, registers, and checkpoint history to organized folders within the deliverables directory. Each case gets its own subfolder containing three JSON files: one with all agent outputs, one with the three registers, and one with checkpoint history.

The confirmation shows the orchestrator is ready with the complete six-step workflow sequence and three checkpoint gates operational.

###8.2.CODE AND IMPLEMENTATION

In [26]:
# Cell 8
# Goal: Implement orchestrator state machine for multi-step workflows
# Output: Orchestrator ready to execute case workflows

class OrchestratorAgent:
    """Multi-step workflow state machine with checkpoint enforcement"""

    def __init__(self, state: SharedState, logger: Logger, llm: StrictJSONLLMWrapper):
        self.state = state
        self.logger = logger
        self.llm = llm

        # Initialize agent instances
        self.intake_agent = IntakeAgent(llm)
        self.ips_agent = IPSDraftAgent(llm)
        self.disclosure_agent = DisclosureAgent(llm)
        self.suitability_agent = SuitabilityReasoningAgent(llm)
        self.qc_agent = QCReviewerAgent(llm)
        self.risk_agent = RiskAssessorAgent(llm)

    def execute_workflow(self, case_id: str, scenario: str) -> Dict:
        """Execute full agentic workflow with checkpoints"""

        print(f"\n{'='*60}")
        print(f"CASE: {case_id}")
        print(f"{'='*60}")

        # Initialize case
        self.state.add_case(case_id, {"scenario": scenario})

        # STEP 1: Intake
        print(f"\nüîÑ Step 1: Intake & Structuring")
        intake_result = self.intake_agent.process(case_id, scenario)
        self.state.cases[case_id]["artifacts"]["intake"] = intake_result

        # Register assumptions and open items
        for assumption in intake_result.get("assumptions", []):
            is_hinge = "critical" in assumption.lower() or "must" in assumption.lower()
            self.state.add_assumption(case_id, assumption, is_hinge)

        for item in intake_result.get("open_questions", []):
            self.state.add_open_item(case_id, item)

        for item in intake_result.get("questions_to_verify", []):
            self.state.add_not_verified(case_id, item)

        print(f"‚úì Intake complete")
        print(f"  Facts: {len(intake_result.get('facts_provided', []))}")
        print(f"  Assumptions: {len(intake_result.get('assumptions', []))}")
        print(f"  Open questions: {len(intake_result.get('open_questions', []))}")

        # CHECKPOINT 1: Review intake
        if not self.state.checkpoint(f"Intake Review - {case_id}", case_id, auto_approve=True):
            print("‚ùå Checkpoint not approved. Workflow stopped.")
            return self._finalize_case(case_id, "stopped_at_checkpoint_1")

        # Check hinge facts
        if not self.state.check_hinge_facts(case_id, self.logger):
            print("‚ö†Ô∏è Hinge facts unresolved. Blocking downstream drafting.")
            return self._finalize_case(case_id, "blocked_hinge_facts")

        # STEP 2: IPS Draft
        print(f"\nüîÑ Step 2: IPS Shell Drafting")
        ips_result = self.ips_agent.process(case_id, intake_result)
        self.state.cases[case_id]["artifacts"]["ips"] = ips_result
        print(f"‚úì IPS shell drafted")

        # STEP 3: Disclosure Checklist
        print(f"\nüîÑ Step 3: Disclosure Checklist")
        disclosure_result = self.disclosure_agent.process(case_id, intake_result)
        self.state.cases[case_id]["artifacts"]["disclosure"] = disclosure_result
        print(f"‚úì Disclosure checklist created")

        # STEP 4: Suitability Reasoning
        print(f"\nüîÑ Step 4: Suitability Reasoning Scaffold")
        suitability_result = self.suitability_agent.process(case_id, intake_result)
        self.state.cases[case_id]["artifacts"]["suitability"] = suitability_result
        print(f"‚úì Reasoning scaffold drafted")

        # CHECKPOINT 2: Review drafts
        if not self.state.checkpoint(f"Draft Review - {case_id}", case_id, auto_approve=True):
            print("‚ùå Checkpoint not approved. Workflow stopped.")
            return self._finalize_case(case_id, "stopped_at_checkpoint_2")

        # STEP 5: QC Review
        print(f"\nüîÑ Step 5: QC Review")
        qc_result = self.qc_agent.process(case_id, self.state.cases[case_id]["artifacts"])
        self.state.cases[case_id]["artifacts"]["qc"] = qc_result
        print(f"‚úì QC review complete")
        print(f"  Risks flagged: {len(qc_result.get('risks', []))}")

        # STEP 6: Risk Assessment
        print(f"\nüîÑ Step 6: Workflow Risk Assessment")
        risk_result = self.risk_agent.process(case_id, self.state)
        self.state.cases[case_id]["artifacts"]["risk_assessment"] = risk_result
        print(f"‚úì Risk assessment complete")

        # FINAL CHECKPOINT: Approve for delivery
        if not self.state.checkpoint(f"Final Approval - {case_id}", case_id, auto_approve=True):
            print("‚ùå Checkpoint not approved. Deliverables not finalized.")
            return self._finalize_case(case_id, "stopped_at_final_checkpoint")

        # Finalize
        return self._finalize_case(case_id, "completed")

    def _finalize_case(self, case_id: str, status: str) -> Dict:
        """Finalize case artifacts and save to deliverables folder"""
        self.state.cases[case_id]["status"] = status

        case_deliverables_dir = DELIVERABLES_DIR / case_id
        case_deliverables_dir.mkdir(exist_ok=True)

        # Save artifacts
        artifacts_path = case_deliverables_dir / "artifacts.json"
        with open(artifacts_path, "w") as f:
            json.dump(self.state.cases[case_id]["artifacts"], f, indent=2)

        # Save registers
        registers_path = case_deliverables_dir / "registers.json"
        registers = {
            "assumptions": self.state.assumption_register.get(case_id, []),
            "open_items": self.state.open_items_register.get(case_id, []),
            "not_verified": self.state.not_verified_register.get(case_id, [])
        }
        with open(registers_path, "w") as f:
            json.dump(registers, f, indent=2)

        # Save checkpoint history
        checkpoints_path = case_deliverables_dir / "checkpoints.json"
        case_checkpoints = [c for c in self.state.checkpoint_history if c["case_id"] == case_id]
        with open(checkpoints_path, "w") as f:
            json.dump({"checkpoints": case_checkpoints}, f, indent=2)

        print(f"\n‚úÖ Case finalized: {status}")
        print(f"üìÅ Deliverables saved: {case_deliverables_dir}")

        return {
            "case_id": case_id,
            "status": status,
            "artifacts_path": str(artifacts_path),
            "registers_path": str(registers_path),
            "checkpoints_path": str(checkpoints_path)
        }

print("‚úì Orchestrator state machine loaded")
print("‚úì Workflow steps: Intake ‚Üí IPS ‚Üí Disclosure ‚Üí Suitability ‚Üí QC ‚Üí Risk")
print("‚úì Checkpoints: Post-intake, Post-drafts, Final approval")

‚úì Orchestrator state machine loaded
‚úì Workflow steps: Intake ‚Üí IPS ‚Üí Disclosure ‚Üí Suitability ‚Üí QC ‚Üí Risk
‚úì Checkpoints: Post-intake, Post-drafts, Final approval


##9.EXECUTION

###9.1.OVERVIEW



Cell 9 executes the complete agentic workflow for four different financial advisory scenarios, demonstrating the system in action. This is where you'll see all the infrastructure from previous cells working together to process real-world advisory situations.

The cell first initializes all the components: the logger for audit trails, the shared state for workflow coordination, the LLM wrapper for AI communication, and the orchestrator to manage the process. Then it defines four synthetic client scenarios representing common advisory situations.

Case 1 involves retirement distribution planning for a 64-year-old with 1.5 million dollars in assets, exploring questions about distribution strategies, tax efficiency, Social Security timing, and sequence-of-returns risk management. Case 2 addresses concentrated stock position diversification for a 45-year-old tech executive with 2 million dollars in employer stock, examining tax-efficient diversification approaches. Case 3 explores alternative investments for a high-net-worth couple interested in private equity and private credit, focusing on liquidity considerations. Case 4 presents a practice management training scenario where a junior advisor conducts their first client review, including how to handle out-of-scope requests.

For each case, the orchestrator executes the full six-step workflow with checkpoints. You'll see detailed console output showing progress through each stage: intake structuring, IPS drafting, disclosure checklist creation, suitability reasoning, QC review, and risk assessment. The output includes statistics like how many facts were extracted, assumptions identified, and risks flagged.

As the workflow runs, you might see warnings if any issues arise, such as JSON parsing problems or detected risk patterns. The system automatically logs these to the risk register. If a case encounters an unrecoverable error, it's marked as failed and execution continues with the next case.

At completion, a summary table shows the final status of all four cases - which completed successfully, which failed, and why. This execution phase typically takes several minutes as each case involves multiple AI calls and quality checks. The result is a complete set of governance artifacts for each scenario, ready for advisor review and inclusion in supervision files.

###9.2.CODE AND IMPLEMENTATION

In [27]:
# Cell 9
# Goal: Execute 4 mini-case demonstrations
# Output: Complete workflow execution for all 4 cases with deliverables

# Initialize infrastructure
logger = Logger(LOGS_DIR)
state = SharedState(RUN_ID)
llm_wrapper = StrictJSONLLMWrapper(client, logger)
orchestrator = OrchestratorAgent(state, logger, llm_wrapper)

# Define 4 mini-cases
CASES = {
    "case_01_retirement": """
Client Profile (SYNTHETIC DATA ONLY):
- Age 64, plans to retire at 65
- Current portfolio: $1.2M in 401(k), $300K in taxable account
- Desired income: $80K/year
- Pension: $24K/year starting at 65
- Social Security: Considering delaying to age 70
- Risk tolerance: Moderate, concerned about sequence-of-returns risk
- Health: Good, family history of longevity
- Goals: Maintain lifestyle, travel, leave legacy to grandchildren

Questions:
- What distribution strategy should be considered?
- How to structure accounts for tax efficiency?
- When to begin Social Security?
- How to manage sequence risk in early retirement?
""",

    "case_02_concentrated_stock": """
Client Profile (SYNTHETIC DATA ONLY):
- Age 45, tech executive
- $2M concentrated position in employer stock (60% of net worth)
- Unvested RSUs: $800K over next 3 years
- Other assets: $500K in diversified accounts
- Tax basis in stock: $300K (long-term gains)
- Goals: Diversify without triggering large tax hit, maintain upside exposure
- Risk tolerance: Aggressive, but concerned about concentration
- Time horizon: 20+ years to retirement

Questions:
- How to structure diversification strategy?
- Tax-efficient vehicles to consider?
- Hedging strategies if appropriate?
- How to balance diversification goals with tax impact?
""",

    "case_03_alternatives": """
Client Profile (SYNTHETIC DATA ONLY):
- High-net-worth couple, ages 52 and 50
- Liquid portfolio: $8M
- Interested in alternatives: private equity, private credit, real assets
- Current allocation: 70% equity, 25% fixed income, 5% cash
- Risk tolerance: Moderate-aggressive
- Liquidity needs: $150K/year for 10 years, then retirement income
- Goals: Diversification, inflation protection, higher returns

Questions:
- What proportion of alternatives is appropriate given liquidity needs?
- Lock-up periods and liquidity considerations?
- Due diligence requirements for alternative investments?
- How to structure and monitor illiquid positions?
- Fee structures and transparency concerns?
""",

    "case_04_practice_mgmt": """
Practice Management Training Scenario (SYNTHETIC DATA ONLY):

New advisor training case:
- Junior advisor conducting first full client review
- Client is a 58-year-old small business owner
- Portfolio review shows drift from IPS targets
- Client asking about crypto allocation (read online article)
- Client also mentions friend's concentrated stock success
- Compliance note: Firm does not currently custody crypto

Training objectives:
- How to structure client meeting agenda?
- How to document recommendations vs. client requests?
- What compliance checkpoints are needed?
- How to address rebalancing recommendations?
- How to handle out-of-scope requests (crypto)?
- What supervision is required before implementation?
"""
}

# Execute all cases
results = {}
for case_id, scenario in CASES.items():
    try:
        result = orchestrator.execute_workflow(case_id, scenario)
        results[case_id] = result
    except Exception as e:
        print(f"\n‚ùå Error in {case_id}: {str(e)}")
        logger.log_risk({
            "type": "other",
            "severity": "high",
            "note": f"Workflow execution failed for {case_id}: {str(e)}",
            "case_id": case_id
        })
        results[case_id] = {"status": "failed", "error": str(e)}

print(f"\n{'='*60}")
print(f"ALL CASES COMPLETE")
print(f"{'='*60}")
for case_id, result in results.items():
    status_icon = "‚úÖ" if result.get("status") == "completed" else "‚ö†Ô∏è"
    print(f"{status_icon} {case_id}: {result.get('status', 'unknown')}")


CASE: case_01_retirement

üîÑ Step 1: Intake & Structuring
‚úì Intake complete
  Facts: 7
  Assumptions: 4
  Open questions: 5

üîÑ Step 2: IPS Shell Drafting
‚úì IPS shell drafted

üîÑ Step 3: Disclosure Checklist
‚úì Disclosure checklist created

üîÑ Step 4: Suitability Reasoning Scaffold
‚úì Reasoning scaffold drafted

üîÑ Step 5: QC Review
‚úì QC review complete
  Risks flagged: 4

üîÑ Step 6: Workflow Risk Assessment
‚úì Risk assessment complete

‚úÖ Case finalized: completed
üìÅ Deliverables saved: /content/run_20260115_155535_110b2989/deliverables/case_01_retirement

CASE: case_02_concentrated_stock

üîÑ Step 1: Intake & Structuring
‚úì Intake complete
  Facts: 5
  Assumptions: 5
  Open questions: 5

üîÑ Step 2: IPS Shell Drafting
‚úì IPS shell drafted

üîÑ Step 3: Disclosure Checklist
‚úì Disclosure checklist created

üîÑ Step 4: Suitability Reasoning Scaffold
‚úì Reasoning scaffold drafted

üîÑ Step 5: QC Review
‚úì QC review complete
  Risks flagged: 4

üîÑ Step

##10.ARTIFACT BUNDLE

###10.1.OVERVIEW

Cell 10 Output Explanation:

Cell 10 packages everything generated during the workflow into a comprehensive, downloadable supervision file. Think of this as creating a complete case binder that a compliance officer or auditor could review to understand exactly what happened during the AI-assisted advisory process.

The cell first generates an extensive README document that serves as the cover memo and user guide for the entire package. This README is approximately 5000 words and includes detailed sections explaining what the package contains, what the AI agents did and did not do, governance principles applied, supervision checklists, usage instructions, and technical details about how the system works. The README emphasizes repeatedly that the outputs are not investment advice and require qualified advisor review.

Next, the cell creates a package summary JSON file that provides programmatic access to key information. This machine-readable summary includes the run identifier, timestamp, complete manifest, status of each case (completed or failed), file paths for all artifacts, and any error messages. This allows automated tools to process the package without parsing the human-readable README.

The cell then gathers statistics from the execution: total number of risks logged, how many were high-severity, total audit log entries, and case completion rates. These metrics provide quick insight into workflow quality and any problems that occurred.

All files are then bundled into a single ZIP archive. The compression process recursively walks through the entire run directory, adding every file while preserving the folder structure. The resulting ZIP contains the manifest, README, package summary, immutable logs, risk register, and separate subfolders for each case with their artifacts, registers, and checkpoint histories.

The output displays a comprehensive final summary showing execution statistics, package contents inventory, governance artifacts included, demonstrations completed, and reminders about the advisory review requirement. You'll see exact numbers for completed versus failed cases, total log entries, and risks detected.

Finally, the cell provides clear download instructions with two options: manually downloading from Colab's file browser, or running a code snippet to trigger automatic download. The closing messages emphasize next steps for advisors reviewing the package and reinforce the educational value of the governance-first approach demonstrated throughout the notebook.

###10.2.CODE AND IMPLEMENTATION

In [28]:
# Cell 10
# Goal: Create README and bundle all artifacts into downloadable zip
# Output: Zip file ready for download with complete supervision package

# Create comprehensive README
readme_content = f"""# Chapter 3 - Level 3 Agentic Advisory Workflow Run
## Run ID: {RUN_ID}
## Generated: {datetime.now().isoformat()}

---

### ‚ö†Ô∏è DISCLAIMER
NOT INVESTMENT, TAX, OR LEGAL ADVICE.
This package contains draft planning and communication assistance only.
Qualified advisor review and supervision required.

---

### GOVERNANCE SUMMARY

**Model:** claude-sonnet-4-5-20250929 (Anthropic)
**Config Hash:** {MANIFEST['config_hash']}
**Temperature:** 0.2
**Max Tokens:** 4096

---

### CONTENTS

üìÅ **Root Files:**
- `run_manifest.json` - Run metadata, model config, environment fingerprint
- `README.md` - This file

üìÅ **logs/**
- `prompts_log.jsonl` - Immutable hash-chained audit trail (each entry links to previous)
- `risk_log.json` - Risk register with all flagged items across workflow

üìÅ **deliverables/** (one subfolder per case)
Each case folder contains:
- `artifacts.json` - All agent outputs (intake, IPS, disclosure, suitability, QC, risk assessment)
- `registers.json` - Assumption register, open items register, not-verified register
- `checkpoints.json` - Human approval gate history with timestamps

---

### CASE EXECUTION SUMMARY

"""

# Add case results
for cid, result in results.items():
    status = result.get('status', 'unknown')
    status_icon = "‚úÖ" if status == "completed" else "‚ö†Ô∏è" if status == "failed" else "üîÑ"
    case_name = cid.replace('case_', '').replace('_', ' ').title()
    readme_content += f"**{status_icon} {case_name}**\n"
    readme_content += f"- Status: {status}\n"
    if status == "completed":
        readme_content += f"- Artifacts: {result.get('artifacts_path', 'N/A')}\n"
        readme_content += f"- Registers: {result.get('registers_path', 'N/A')}\n"
        readme_content += f"- Checkpoints: {result.get('checkpoints_path', 'N/A')}\n"
    elif status == "failed":
        readme_content += f"- Error: {result.get('error', 'Unknown error')}\n"
    readme_content += "\n"

readme_content += """---

### GOVERNANCE ARTIFACTS DETAIL

‚úÖ **Traceability**
- `run_manifest.json`: Complete run configuration and environment snapshot
- `prompts_log.jsonl`: Hash-chained immutable log prevents tampering
  - Each entry contains: step_id, agent_name, prompt_hash, response_hash, prev_hash
  - Chain integrity can be verified by checking prev_hash linkage

‚úÖ **Risk Register**
- `risk_log.json`: All risks detected during workflow execution
- Risk types tracked:
  - confidentiality (PII handling)
  - hallucination (invented facts/authority)
  - missing_facts (incomplete information)
  - suitability (suitability determination language)
  - regbi (Reg BI compliance language)
  - conflicts (conflict of interest gaps)
  - liquidity (liquidity mismatch)
  - prompt_injection (security issues)
  - overreach (scope boundary violations)
  - workflow_integrity_gap (missing checkpoints, structural issues)
  - hinge_fact_unresolved_block (critical facts blocking workflow)

‚úÖ **Assumption Registers**
- Per-case tracking of all assumptions made during workflow
- Hinge facts flagged (critical assumptions that block downstream steps if unresolved)
- Resolution status tracked for supervision review

‚úÖ **Checkpoint History**
- All human-in-the-loop approval gates logged with timestamps
- Auto-approval status recorded (in production, would require manual approval)
- Checkpoint names indicate workflow stage

‚úÖ **Immutable Audit Trail**
- Hash-chained log prevents post-hoc modification
- Redacted prompts/responses protect confidentiality while maintaining auditability
- Suitable for supervision files and regulatory examination

---

### LEVEL 3 ARCHITECTURE OVERVIEW

**Agentic Workflow Pattern:**
This notebook demonstrates a multi-agent orchestration pattern where:
1. A central Orchestrator manages workflow state and progression
2. Specialized agents handle discrete workflow steps
3. Explicit human checkpoints gate workflow advancement
4. Shared state tracks assumptions, open items, and verification needs
5. Immutable logs ensure auditability

**Agents:**
- **OrchestratorAgent** - Workflow state machine, enforces checkpoint gates
- **IntakeAgent** - Structures client scenarios, extracts facts/assumptions
- **IPSDraftAgent** - Generates Investment Policy Statement shells (process only, no allocations)
- **DisclosureAgent** - Creates disclosure checklists (conflicts, fees, risks, limitations)
- **SuitabilityReasoningAgent** - Drafts reasoning scaffolds (questions, NOT conclusions)
- **QCReviewerAgent** - Reviews artifacts for gaps, inconsistencies, missing items
- **RiskAssessorAgent** - Evaluates workflow-level risks and integrity
- **Logger** - Maintains immutable hash-chained audit trail

**Shared State Components:**
- Assumption register (with hinge fact flags)
- Open items register (unresolved questions)
- Not-verified register (items requiring external verification)
- Checkpoint history (human approval gates)

**Checkpoint Mechanism:**
Three standard checkpoints per workflow:
1. Post-intake review (verify facts, assumptions, hinge facts)
2. Post-drafts review (verify IPS, disclosures, suitability scaffolds)
3. Final approval gate (authorize deliverable package)

**Hinge Fact Enforcement:**
- Critical assumptions flagged as "hinge facts"
- Unresolved hinge facts BLOCK downstream drafting steps
- Ensures workflow doesn't proceed on unvalidated critical assumptions
- Logged in risk register if blocking occurs

---

### SUPERVISION FILE CHECKLIST

**For each case, advisor must review:**

‚òê **1. Intake Artifacts**
- Are all stated facts accurate and complete?
- Are assumptions reasonable and documented?
- Are hinge facts identified and resolved?
- Are open questions addressed or escalated?

‚òê **2. Draft Outputs** (IPS, Disclosures, Suitability)
- Do drafts align with firm standards and templates?
- Are all material risks disclosed?
- Is reasoning scaffold complete without making determinations?
- Does content avoid recommendations and allocations?

‚òê **3. QC Notes**
- Are identified gaps addressed?
- Are inconsistencies resolved?
- Are missing items documented in follow-up plan?

‚òê **4. Risk Register**
- Are all flagged risks reviewed?
- Are high-severity risks mitigated or documented?
- Is workflow integrity maintained (no missing checkpoints)?

‚òê **5. Checkpoint History**
- Were all approval gates completed?
- Is approval documentation adequate for supervision file?
- Are checkpoint timestamps consistent with workflow progression?

‚òê **6. Registers Review**
- Assumption register: Are all assumptions validated or flagged for follow-up?
- Open items register: Are critical items addressed before implementation?
- Not-verified register: Are regulatory/authority claims verified externally?

---

### WHAT AGENTS DID (SCOPE)

‚úÖ **Structured intake workflows**
- Extracted facts from scenarios
- Identified assumptions and alternatives
- Flagged open questions and unknowns

‚úÖ **Drafted IPS shells**
- Created process/governance frameworks
- Avoided allocations and specific targets
- Focused on roles, responsibilities, review schedules

‚úÖ **Generated disclosure checklists**
- Identified disclosure topics (conflicts, compensation, risks, limitations)
- Created advisor review checklists
- Avoided compliance assertions

‚úÖ **Created suitability reasoning scaffolds**
- Structured questions advisor must answer
- Identified alternatives for comparison
- Flagged risks and conflicts to address

‚úÖ **Flagged risks and gaps**
- QC review identified missing information
- Risk assessment evaluated workflow integrity
- Both flagged items for advisor follow-up

‚úÖ **Maintained audit trail**
- Immutable hash-chained logs
- Risk register with all detections
- Checkpoint history for supervision

---

### WHAT AGENTS DID NOT DO (BOUNDARIES)

‚ùå **Did NOT recommend specific investments**
- No securities, tickers, funds, or products named
- No allocations or portfolio weightings specified
- No "buy" or "sell" guidance provided

‚ùå **Did NOT determine suitability or best interest**
- Only drafted reasoning scaffolds and questions
- Did not conclude "suitable" or "unsuitable"
- Did not assert Reg BI or fiduciary compliance

‚ùå **Did NOT assert compliance**
- All regulatory references marked "Not verified"
- No claims of meeting regulatory standards
- No compliance determinations made

‚ùå **Did NOT execute transactions**
- No trades, transfers, or portfolio actions
- No account changes or implementations

‚ùå **Did NOT replace advisor judgment**
- All outputs require qualified advisor review
- Checkpoints enforce human-in-the-loop
- Supervision and final approval required

---

### USAGE NOTES FOR ADVISORS

**This package is designed for:**
- Supervision files documenting AI-assisted workflow
- Compliance review of agentic process controls
- Audit trail for regulatory examination
- Training on governance-first AI implementation

**How to use deliverables:**
1. Review README (this file) for package overview
2. Check run_manifest.json for configuration details
3. Review risk_log.json for any high-severity items
4. For each case:
   - Read artifacts.json to review all agent outputs
   - Check registers.json for assumptions/open items needing attention
   - Verify checkpoints.json shows proper approval gates
5. Use supervision checklist above for systematic review
6. Document advisor review, decisions, and follow-up in firm systems

**Best practices:**
- Never use AI outputs without qualified advisor review
- Verify all regulatory/authority claims independently
- Resolve all hinge facts before implementation
- Document deviations from AI suggestions with rationale
- Maintain human decision-making primacy
- Keep AI assistance documented for supervision files

---

### GOVERNANCE-FIRST PRINCIPLE

**Capability ‚Üë ‚áí Risk ‚Üë ‚áí Controls ‚Üë**

As AI capabilities increase (Level 1 ‚Üí Level 2 ‚Üí Level 3 ‚Üí beyond):
- Risk exposure increases (scope, autonomy, impact)
- Controls must increase proportionally (logging, checkpoints, boundaries)
- Supervision requirements intensify (review, approval, documentation)

Level 3 (Agents) introduces:
- Multi-step workflows (more surface area for errors)
- Agent orchestration (coordination complexity)
- Longer execution chains (compounding risk)

Required controls for Level 3:
- Explicit human checkpoints (workflow gates)
- Immutable audit trails (tamper-evident logs)
- Hinge fact enforcement (blocking logic)
- Structured output validation (schema enforcement)
- Risk detection patterns (real-time monitoring)
- Scope boundaries (hard limits on agent authority)

---

### CONTACT & ATTRIBUTION

**Author:** Alejandro Reynoso
**Title:** Chief Scientist, DEFI CAPITAL RESEARCH
**Affiliation:** External Lecturer, Judge Business School Cambridge

**Model:** claude-sonnet-4-5-20250929 (Anthropic)
**Chapter:** Chapter 3 - Level 3 (Agents): Multi-Step Advisory Workflows
**Notebook Version:** 1.0
**Date:** {datetime.now().strftime('%B %d, %Y')}

---

### TECHNICAL NOTES

**Token Management:**
- Max tokens: 4096 per agent call
- Response length limits enforced in prompts
- Retry logic handles truncation issues

**JSON Validation:**
- Strict schema enforcement on all LLM outputs
- Required keys validated pre-acceptance
- Non-JSON responses trigger retries and risk logging

**Hash Chain Integrity:**
- prompts_log.jsonl uses SHA-256 hash chaining
- Each entry's prev_hash links to prior entry's response_hash
- Genesis block initializes chain with null hashes
- Chain can be verified programmatically for tampering detection

**Confidentiality:**
- All prompts/responses redacted before logging
- PII patterns (SSN, email, phone, account numbers) removed
- Case data sanitized with warnings against real client data

---

### VERSION HISTORY

**v1.0** - Initial release
- 10-cell notebook structure
- 4 mini-case demonstrations
- Full governance artifact generation
- Immutable logging and risk register
- Checkpoint enforcement
- Hinge fact blocking logic

---

**END OF README**

For questions or feedback on this governance-first agentic workflow framework, please contact the author.
"""

# Write README
readme_path = RUN_DIR / "README.md"
with open(readme_path, "w") as f:
    f.write(readme_content)

print(f"‚úì README created: {readme_path}")
print(f"  Length: {len(readme_content)} characters")

# Create package summary JSON for programmatic access
package_summary = {
    "run_id": RUN_ID,
    "timestamp": datetime.now().isoformat(),
    "manifest": MANIFEST,
    "cases": {
        case_id: {
            "status": result.get("status", "unknown"),
            "artifacts_path": result.get("artifacts_path"),
            "registers_path": result.get("registers_path"),
            "checkpoints_path": result.get("checkpoints_path"),
            "error": result.get("error")
        }
        for case_id, result in results.items()
    },
    "files": {
        "readme": str(readme_path),
        "manifest": str(RUN_DIR / "run_manifest.json"),
        "prompts_log": str(LOGS_DIR / "prompts_log.jsonl"),
        "risk_log": str(LOGS_DIR / "risk_log.json")
    }
}

summary_path = RUN_DIR / "package_summary.json"
with open(summary_path, "w") as f:
    json.dump(package_summary, f, indent=2)

print(f"‚úì Package summary created: {summary_path}")

# Count total risks logged
with open(LOGS_DIR / "risk_log.json", "r") as f:
    risk_log = json.load(f)
    total_risks = len(risk_log.get("risks", []))
    high_severity_risks = len([r for r in risk_log.get("risks", []) if r.get("severity") == "high"])

print(f"‚úì Risk register: {total_risks} total risks, {high_severity_risks} high-severity")

# Count log entries
with open(LOGS_DIR / "prompts_log.jsonl", "r") as f:
    log_lines = f.readlines()
    log_entries = len(log_lines)

print(f"‚úì Immutable log: {log_entries} entries (hash-chained)")

# Create zip bundle
zip_path = Path(f"/content/{RUN_ID}.zip")
print(f"\nüì¶ Creating zip bundle...")

with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
    for file_path in RUN_DIR.rglob("*"):
        if file_path.is_file():
            arcname = file_path.relative_to(RUN_DIR.parent)
            zipf.write(file_path, arcname)

zip_size_mb = zip_path.stat().st_size / (1024 * 1024)
print(f"‚úì Zip bundle created: {zip_path}")
print(f"‚úì Bundle size: {zip_size_mb:.2f} MB")

# Generate final statistics
completed_cases = sum(1 for r in results.values() if r.get("status") == "completed")
failed_cases = sum(1 for r in results.values() if r.get("status") == "failed")
total_cases = len(results)

# Display final summary
print(f"\n{'='*70}")
print(f"CHAPTER 3 - LEVEL 3 AGENTIC WORKFLOWS COMPLETE")
print(f"{'='*70}")
print(f"\nüìä EXECUTION SUMMARY:")
print(f"   Run ID: {RUN_ID}")
print(f"   Total Cases: {total_cases}")
print(f"   ‚úÖ Completed: {completed_cases}")
print(f"   ‚ö†Ô∏è  Failed: {failed_cases}")
print(f"   üìù Log Entries: {log_entries}")
print(f"   üö® Risks Logged: {total_risks} ({high_severity_risks} high-severity)")

print(f"\nüìÅ DELIVERABLES PACKAGE:")
print(f"   üì¶ Zip file: {zip_path.name}")
print(f"   üíæ Size: {zip_size_mb:.2f} MB")
print(f"   üìÑ Contents:")
print(f"      - run_manifest.json (configuration)")
print(f"      - README.md (comprehensive documentation)")
print(f"      - package_summary.json (programmatic summary)")
print(f"      - logs/prompts_log.jsonl (immutable audit trail)")
print(f"      - logs/risk_log.json (risk register)")
print(f"      - deliverables/{total_cases} case folders (artifacts + registers + checkpoints)")

print(f"\n‚úÖ GOVERNANCE ARTIFACTS INCLUDED:")
print(f"   ‚úì Traceability: run_manifest.json + immutable hash-chained log")
print(f"   ‚úì Risk Register: All workflow risks documented")
print(f"   ‚úì Assumption Registers: Per-case tracking with hinge fact flags")
print(f"   ‚úì Checkpoint History: Human-in-the-loop approval gates logged")
print(f"   ‚úì Audit Trail: Tamper-evident, suitable for supervision files")

print(f"\nüéØ LEVEL 3 DEMONSTRATIONS:")
print(f"   ‚úì Multi-step agentic workflow orchestration")
print(f"   ‚úì Explicit human checkpoints at workflow gates")
print(f"   ‚úì Fact vs assumption separation with hinge fact blocking")
print(f"   ‚úì Suitability/Reg BI reasoning scaffolds (not determinations)")
print(f"   ‚úì Immutable logs and versioned deliverables")
print(f"   ‚úì Scope boundaries enforced (no recommendations/allocations)")

print(f"\n‚ö†Ô∏è  REMINDER:")
print(f"   NOT INVESTMENT, TAX, OR LEGAL ADVICE")
print(f"   All outputs require qualified advisor review and supervision")
print(f"   This package demonstrates governance-first agentic architecture")

print(f"\n{'='*70}")
print(f"READY FOR DOWNLOAD")
print(f"{'='*70}")

# Provide download instructions
print(f"\nüì• TO DOWNLOAD THE SUPERVISION PACKAGE:")
print(f"\n   Option 1 - From Files Panel:")
print(f"   1. Click the üìÅ folder icon in left sidebar")
print(f"   2. Navigate to: {zip_path.name}")
print(f"   3. Click ‚ãÆ menu ‚Üí Download")

print(f"\n   Option 2 - Run this code:")
print(f"   ```python")
print(f"   from google.colab import files")
print(f"   files.download('{zip_path}')")
print(f"   ```")

print(f"\nüí° NEXT STEPS:")
print(f"   1. Download the zip file")
print(f"   2. Extract and review README.md")
print(f"   3. Use supervision checklist for systematic review")
print(f"   4. Examine risk_log.json for high-severity items")
print(f"   5. Review each case's artifacts, registers, and checkpoints")

print(f"\nüéì EDUCATIONAL VALUE:")
print(f"   This notebook demonstrates:")
print(f"   - Governance-first principle: Capability ‚Üë ‚áí Risk ‚Üë ‚áí Controls ‚Üë")
print(f"   - Production-ready agentic architecture for financial advisory")
print(f"   - Appropriate scope boundaries for Level 3 AI workflows")
print(f"   - Audit trail and documentation for regulatory supervision")

print(f"\n{'='*70}")
print(f"Chapter 3 notebook execution complete. Package ready for supervision file.")
print(f"{'='*70}\n")

# Auto-download helper (commented out by default - user can uncomment to auto-download)
# Uncomment the lines below to automatically trigger download
# from google.colab import files
# files.download(str(zip_path))
# print(f"‚¨áÔ∏è  Download started automatically")

‚úì README created: /content/run_20260115_155535_110b2989/README.md
  Length: 12401 characters
‚úì Package summary created: /content/run_20260115_155535_110b2989/package_summary.json
‚úì Risk register: 2 total risks, 2 high-severity
‚úì Immutable log: 22 entries (hash-chained)

üì¶ Creating zip bundle...
‚úì Zip bundle created: /content/run_20260115_155535_110b2989.zip
‚úì Bundle size: 0.03 MB

CHAPTER 3 - LEVEL 3 AGENTIC WORKFLOWS COMPLETE

üìä EXECUTION SUMMARY:
   Run ID: run_20260115_155535_110b2989
   Total Cases: 4
   ‚úÖ Completed: 3
   ‚ö†Ô∏è  Failed: 1
   üìù Log Entries: 22
   üö® Risks Logged: 2 (2 high-severity)

üìÅ DELIVERABLES PACKAGE:
   üì¶ Zip file: run_20260115_155535_110b2989.zip
   üíæ Size: 0.03 MB
   üìÑ Contents:
      - run_manifest.json (configuration)
      - README.md (comprehensive documentation)
      - package_summary.json (programmatic summary)
      - logs/prompts_log.jsonl (immutable audit trail)
      - logs/risk_log.json (risk register)
     

##11.CONCLUSIONS



**Overview of the Advisory Workflow Pipeline**

This notebook demonstrates a sophisticated end-to-end pipeline that transforms unstructured client scenarios into governed, audit-ready advisory workflow artifacts. The pipeline represents a Level 3 agentic system where multiple AI agents collaborate through structured data exchanges, human checkpoints, and immutable logging to assist financial advisors while maintaining strict compliance boundaries. Understanding this pipeline reveals how modern AI can augment professional advisory work without replacing human judgment or violating regulatory requirements.

**Stage 1: User Input - The Starting Point**

The pipeline begins when a user provides an unstructured client scenario as plain text. This input might be a paragraph or several paragraphs describing a client situation, such as retirement planning needs, concentrated stock positions, alternative investment interests, or practice management challenges. The scenario typically includes facts like client age, asset values, goals, risk tolerance, and specific questions the advisor needs to address.

This raw input is intentionally unstructured because that mirrors real-world advisory practice. Advisors receive information through conversations, emails, notes from meetings, or intake forms that vary in format and completeness. The AI system must be able to process this natural, unorganized information and transform it into something actionable.

In Cell 9, four such scenarios are defined as simple Python strings. For example, Case 1 presents a retirement scenario with biographical details, asset information, income needs, pension details, Social Security considerations, risk concerns, health status, and planning questions all mixed together in narrative form. This messy, real-world format is what the pipeline must handle.

**Stage 2: Intake Agent - Initial Structuring**

The first transformation occurs when the OrchestratorAgent hands the raw scenario to the IntakeAgent. The orchestrator calls the intake agent's process method, passing the case identifier and the raw scenario text. The IntakeAgent then constructs a prompt for the Claude AI model.

This prompt has two critical components. The system prompt defines the agent's role, boundaries, and output requirements. It states that the agent extracts facts, identifies assumptions, flags unknowns, never recommends investments, and must respond with valid JSON only. The user prompt contains the actual scenario text along with explicit instructions about what to extract and how to structure the response.

Crucially, the prompt specifies the exact JSON schema required. The agent must return ten specific fields: task description, facts_provided list, assumptions list, alternatives list, open_questions list, analysis text, risks array, draft_output text, verification_status field, and questions_to_verify list. The prompt also specifies length constraints for each field to prevent token limit problems.

**Stage 3: LLM Wrapper - Quality Control Gateway**

Before the prompt reaches the Claude AI model, it passes through the StrictJSONLLMWrapper, which acts as a quality control gateway. The wrapper enhances the system prompt by injecting additional requirements about JSON formatting, length limits, and the mandatory disclaimer that must appear in all outputs.

The wrapper then makes the API call to Anthropic's service using the configured model (claude-sonnet-4-5-20250929) with specified parameters: 4096 maximum tokens, 0.2 temperature for consistency, and the enhanced system and user prompts. The AI model processes these instructions and generates a response.

When the response returns, the wrapper performs multiple validation steps before accepting it. First, it strips any markdown formatting that might have been included. Second, it attempts to parse the response as JSON. If parsing fails, the wrapper checks whether truncation occurred and can retry with more concise prompts. Third, it validates that all ten required fields are present in the parsed JSON. Fourth, it ensures array fields contain at least placeholder values if empty. Fifth, it runs pattern detection looking for problematic content like implied recommendations, suitability determinations, or unverified regulatory citations.

If any validation fails, the wrapper logs the failure to the risk register and either retries or raises an error. If all validations pass, the wrapper logs the redacted prompt and response to the immutable audit trail using hash chaining, then returns the validated JSON structure to the calling agent.

**Stage 4: JSON Processing - Data Registration**

The IntakeAgent receives the validated JSON structure from the wrapper and returns it to the OrchestratorAgent. The orchestrator now processes this structured data by registering it into the SharedState system.

The orchestrator iterates through the assumptions list, examining each assumption's text for keywords like "critical" or "must" that indicate it might be a hinge fact. Hinge facts are assumptions so important that if they remain unresolved, the workflow should not proceed to drafting recommendations. Each assumption gets registered with its hinge fact status.

Similarly, the orchestrator extracts all items from the open_questions list and registers them as open items requiring follow-up. Items from the questions_to_verify list get registered into the not-verified register, flagging claims that need external validation.

The complete intake artifact, with all ten JSON fields, gets stored in the case's artifacts collection within the shared state. This creates a permanent record of the intake step that later agents can reference.

**Stage 5: Checkpoint Enforcement - Human Review Gates**

After intake completes and data registration finishes, the orchestrator reaches Checkpoint 1. The checkpoint mechanism logs this moment with a timestamp, checkpoint name, case identifier, and approval status. In this demonstration, checkpoints auto-approve for smooth execution, but in production environments, these would pause the workflow requiring actual human review before proceeding.

The orchestrator then calls the check_hinge_facts method on the shared state. This method examines the assumption register looking for any assumptions marked as hinge facts that remain unresolved. If unresolved hinge facts exist, the method logs a high-severity risk to the risk register and returns false, which causes the orchestrator to stop the workflow and finalize the case with a "blocked by hinge facts" status.

If all hinge facts are resolved or no hinge facts exist, the workflow continues to the drafting agents.

**Stage 6: Sequential Agent Processing - Building the Package**

The orchestrator now executes agents sequentially, with each agent building on previous agents' work. The IPSDraftAgent receives the intake results as input. Rather than passing the raw scenario again, it receives the structured JSON from intake containing extracted facts and identified assumptions. The IPS agent uses these structured inputs to generate an Investment Policy Statement shell.

The agent constructs its own prompt that references facts from the intake JSON. This prompt explicitly instructs the AI to create IPS sections (Purpose, Roles, Process, Review Schedule) without specifying allocations or targets. The prompt again requires the ten-field JSON structure with length limits.

The wrapper processes this prompt through the same quality control pipeline: enhancement, API call, validation, pattern detection, logging, and return. The validated IPS JSON gets stored in the case artifacts.

The DisclosureAgent follows the same pattern, receiving intake results and generating a disclosure checklist JSON. The SuitabilityReasoningAgent similarly processes intake data to create a structured reasoning scaffold with questions the advisor must answer.

After these three drafting agents complete, Checkpoint 2 pauses for review of all draft materials before proceeding to quality control.

**Stage 7: Quality Control and Risk Assessment**

The QCReviewerAgent receives all accumulated artifacts as input. Rather than the raw scenario or intake results, it examines the complete collection: intake JSON, IPS JSON, disclosure JSON, and suitability JSON. The QC agent looks for gaps where information is missing, inconsistencies where different agents made conflicting assumptions, and risks that were not adequately addressed in earlier steps.

The QC review outputs its findings in the same ten-field JSON structure. The risks array typically contains multiple entries identifying specific problems like "missing retirement expense breakdown" or "concentration risk not addressed in disclosures." The draft_output field provides a summary of findings for the advisor to review.

The RiskAssessorAgent then examines the entire workflow state, not just the artifacts. It receives the SharedState object giving it access to the assumption register, open items register, not-verified register, and checkpoint history. This agent evaluates whether the workflow itself was executed properly: Were all checkpoints completed? Are there too many unresolved assumptions? Did any agent skip required verification steps?

The risk assessment JSON identifies workflow integrity issues, which are distinct from content issues. A workflow integrity problem might be "multiple regulatory claims in disclosures but none flagged for verification" or "hinge fact identified but resolution not documented."

After risk assessment completes, the Final Checkpoint requires approval before the package can be marked as ready for delivery.

**Stage 8: Artifact Finalization - Creating Supervision Files**

When the workflow completes (either successfully or by early termination at a checkpoint), the orchestrator's finalize method executes. This creates a subfolder within the deliverables directory named after the case identifier. Inside this folder, three JSON files are written.

The artifacts.json file contains the complete output from all six agents: intake, IPS, disclosure, suitability, QC, and risk assessment. Each agent's ten-field JSON structure is preserved exactly as validated. This file provides the substantive content of the workflow - the actual drafts and analysis.

The registers.json file contains three arrays extracted from the shared state: all assumptions with their hinge fact flags and resolution status, all open items identified throughout the workflow, and all not-verified claims that need external validation. This file helps supervisors understand what remains uncertain or unvalidated.

The checkpoints.json file contains the complete history of checkpoint events: when each checkpoint occurred, whether it was auto-approved or manually approved, and which case it applied to. This provides audit evidence that human review gates were implemented.

**Stage 9: Immutable Logging - Creating the Audit Trail**

Parallel to the main workflow, the Logger class maintains two separate log files throughout execution. Every time an agent calls the LLM wrapper, the wrapper automatically calls the logger's log_prompt_response method after successful validation.

This method creates a log entry containing the step identifier, timestamp, agent name, redacted prompt text, redacted response text, a hash of the prompt, a hash of the response, and critically, the hash of the previous log entry. This last field creates the hash chain that makes the log immutable.

Each log entry is appended to the prompts_log.jsonl file as a single line of JSON. The jsonl format (JSON Lines) allows the file to grow without being loaded entirely into memory. The hash chain means that if anyone tried to modify an earlier entry, all subsequent entries' prev_hash values would no longer match, immediately revealing the tampering.

Separately, whenever any component detects a risk (the wrapper detecting problematic patterns, the orchestrator finding unresolved hinge facts, or agents explicitly identifying risks in their outputs), an entry gets appended to the risk_log.json file. This file accumulates all risks from all cases in a single register, making it easy to review all problems in one place.

**Stage 10: Package Assembly - Creating the Deliverable**

After all cases complete execution, Cell 10 assembles the complete supervision package. The README generator creates comprehensive documentation by combining the run manifest, case results, governance explanations, supervision checklists, and technical details into a single 5000-word markdown document.

The package_summary.json generator creates a machine-readable summary by extracting key information from the run manifest and case results into a structured JSON object. This allows automated tools to process packages without parsing markdown.

The ZIP bundler recursively walks through the entire run directory and compresses all files while preserving folder structure. The resulting archive contains everything: manifest, README, summary, immutable logs, risk register, and individual case folders with their artifacts, registers, and checkpoint histories.

**The Complete Pipeline in Summary**

The pipeline transforms unstructured text into governed advisory artifacts through ten distinct stages. Raw scenarios become structured intake JSON through AI processing with quality controls. Structured intake feeds sequential specialized agents, each producing standardized JSON outputs. Human checkpoints gate progression between stages. Quality control agents examine accumulated artifacts for problems. All interactions get logged immutably with hash chaining. All risks get accumulated in a central register. All assumptions and open items get tracked in registers. All artifacts get saved in organized case folders. Everything gets packaged into a supervision-ready archive.

This architecture demonstrates how generative AI can augment professional advisory workflows while maintaining the governance controls necessary for regulated industries. The strict JSON structuring at every stage ensures consistency, auditability, and scope enforcement. The multi-agent pattern allows specialization while maintaining coordination. The checkpoint mechanism preserves human primacy in decision-making. The immutable logging provides regulatory-grade audit trails. Together, these elements create a production-viable system that increases advisor productivity without compromising compliance or professional responsibility.