# Entity Extraction Troubleshooting Notebook

## Purpose
This notebook troubleshoots entity extraction failures. It covers:
- Finding the breaking point where extraction fails (starting from 1000 characters)
- Testing with vLLM as a Python package (not HTTP service)
- Complete parameter control for optimization
- Model comparison (Granite 3.3 2B vs Qwen 2.5 72B)

## Workflow
1. Setup and Dependencies
2. vLLM Configuration
3. Document Chunking (BEFORE extraction)
4. Prompt Template Management
5. Dynamic Breaking Point Detection
6. Entity Extraction Strategies (AFTER chunking)
7. Performance Analysis
8. Automated Testing

## 1. Setup and Dependencies

In [1]:
# Install all required packages
import subprocess
import sys

def install_package(package):
    """Install a package using pip"""
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

# Core packages
packages = [
    'requests',           # API calls for pattern loading
    'torch',              # PyTorch for vLLM
    'transformers',       # Model loading
    'tiktoken',           # Token counting
    'pandas',             # Data analysis
    'numpy',              # Numerical operations
    'matplotlib',         # Plotting
    'seaborn',           # Advanced plotting
    'jinja2',            # Template rendering
    'pyyaml',            # YAML parsing
    'rich',              # Rich console output
    'psutil',            # System monitoring
    'nvidia-ml-py3',     # GPU monitoring with NVML
    'gpustat',           # GPU monitoring
    'tqdm',              # Progress bars
]

print("Installing required packages...")
for package in packages:
    try:
        install_package(package)
        print(f"✓ {package.split('==')[0]}")
    except Exception as e:
        print(f"✗ {package}: {e}")

print("\nPackage installation complete!")

Installing required packages...
✓ requests
✓ torch
✓ transformers
✓ tiktoken
✓ pandas
✓ numpy
✓ matplotlib
✓ seaborn
✓ jinja2
✓ pyyaml
✓ rich
✓ psutil
✓ nvidia-ml-py3
✓ gpustat
✓ tqdm

Package installation complete!
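
The install loop above calls pip for every package on every run. As a minimal sketch, assuming only the standard library's `importlib.metadata`, the list can first be filtered down to distributions that are not installed yet. `missing_packages` is a hypothetical helper, not part of the notebook modules, and it does not check whether an installed version satisfies a `==` pin.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(requirements):
    """Return the requirement strings whose distribution is not installed.

    Hypothetical helper: handles plain names and 'name==version' pins only,
    and ignores whether the installed version matches the pin.
    """
    missing = []
    for requirement in requirements:
        name = requirement.split("==")[0].strip()
        try:
            version(name)  # raises PackageNotFoundError when absent
        except PackageNotFoundError:
            missing.append(requirement)
    return missing
```

With this in place, the loop could iterate over `missing_packages(packages)` instead of `packages`, which keeps repeated runs of the cell fast.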


In [2]:
# Import all required modules
import os
import sys
import json
import time
import traceback
import requests
import re
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
from dataclasses import dataclass
from enum import Enum
import warnings
warnings.filterwarnings('ignore')

# Data processing
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

# System monitoring
import psutil
import torch

# Progress tracking
from tqdm import tqdm

# Rich output for formatted console display
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.progress import track, Progress, SpinnerColumn, BarColumn, TextColumn
from rich.syntax import Syntax
from rich import print as rprint

console = Console()

# Add notebook modules to path
notebook_modules_path = '/srv/luris/be/entity-extraction-service/notebook_modules'
if notebook_modules_path not in sys.path:
    sys.path.insert(0, notebook_modules_path)

print("✓ All modules imported successfully")
print(f"✓ Python: {sys.version}")
print(f"✓ PyTorch: {torch.__version__}")
print(f"✓ CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

✓ All modules imported successfully
✓ Python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
✓ PyTorch: 2.7.1+cu126
✓ CUDA Available: True
✓ GPU: NVIDIA A40


In [3]:
# Import custom notebook modules
try:
    from vllm_controller import VLLMController
    from template_manager import TemplateManager
    from chunking_engine import ChunkingEngine, ChunkingStrategy
    from performance_analyzer import PerformanceAnalyzer
    from extraction_tester import ExtractionTester
    print("✓ All notebook modules imported successfully")
except ImportError as e:
    print(f"Error importing notebook modules: {e}")
    print("Make sure the notebook_modules directory exists with all required files")

✓ All notebook modules imported successfully


## 2. vLLM Configuration and Model Control

In [5]:
# vLLM Configuration Classes
@dataclass
class VLLMConfig:
    """Configuration for vLLM models"""
    model_name: str
    gpu_memory_utilization: float = 0.9
    max_model_len: Optional[int] = None
    max_num_batched_tokens: int = 8192
    max_num_seqs: int = 256
    enable_chunked_prefill: bool = True
    enable_prefix_caching: bool = False
    tensor_parallel_size: int = 1
    dtype: str = "auto"
    enforce_eager: bool = False
    trust_remote_code: bool = True
    
# Model Configurations
GRANITE_CONFIG = VLLMConfig(
    model_name="ibm-granite/granite-3.1-2b-instruct",
    gpu_memory_utilization=0.85,
    max_model_len=128000,  # Full 128K context
    max_num_batched_tokens=16384,
    enable_chunked_prefill=True,
    dtype="bfloat16"
)

QWEN_CONFIG = VLLMConfig(
    model_name="Qwen/Qwen2.5-72B-Instruct",
    gpu_memory_utilization=0.98,  # Max utilization for 72B
    max_model_len=32768,  # Reduced for memory
    max_num_batched_tokens=8192,
    tensor_parallel_size=2,  # Use both GPUs
    enable_chunked_prefill=True,
    dtype="bfloat16"
)

# Configuration presets for testing
CONFIGS = {
    "minimal": {
        "gpu_memory_utilization": 0.7,
        "max_num_batched_tokens": 512,
        "max_num_seqs": 1,
    },
    "balanced": {
        "gpu_memory_utilization": 0.85,
        "max_num_batched_tokens": 4096,
        "max_num_seqs": 16,
    },
    "aggressive": {
        "gpu_memory_utilization": 0.95,
        "max_num_batched_tokens": 16384,
        "max_num_seqs": 256,
    },
    "long_context": {
        "gpu_memory_utilization": 0.98,
        "max_model_len": 128000,
        "max_num_seqs": 4,
        "enable_prefix_caching": True,
    }
}

print("✓ vLLM configurations defined")
print(f"Available presets: {list(CONFIGS.keys())}")

✓ vLLM configurations defined
Available presets: ['minimal', 'balanced', 'aggressive', 'long_context']
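
The presets in `CONFIGS` are plain dictionaries of overrides, while the model configurations are dataclass instances. One way to combine them, sketched here under the assumption that a preset should override fields of an existing config, is `dataclasses.replace`. `EngineConfig` is a trimmed stand-in for `VLLMConfig` so the example runs on its own, and `apply_preset` is a hypothetical helper.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class EngineConfig:
    """Trimmed stand-in for the notebook's VLLMConfig (illustrative only)."""
    model_name: str
    gpu_memory_utilization: float = 0.9
    max_model_len: Optional[int] = None
    max_num_batched_tokens: int = 8192
    max_num_seqs: int = 256
    enable_prefix_caching: bool = False


def apply_preset(base, preset):
    """Return a copy of `base` with the preset's values applied.

    Unknown keys raise ValueError so a typo in a preset is not silently ignored.
    The original `base` object is left unchanged.
    """
    known = {f.name for f in fields(base)}
    unknown = set(preset) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return replace(base, **preset)
```

In the notebook's terms this would be used as `apply_preset(GRANITE_CONFIG, CONFIGS["minimal"])`, producing a new config per test run without mutating the shared base configuration.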


In [6]:
# Initialize vLLM Controller
vllm_controller = VLLMController()

# Display available configurations
table = Table(title="vLLM Configuration Options")
table.add_column("Parameter", style="cyan")
table.add_column("Range/Options", style="green")
table.add_column("Description", style="yellow")

table.add_row("gpu_memory_utilization", "0.7 - 0.98", "GPU memory allocation")
table.add_row("max_model_len", "512 - 128000", "Maximum sequence length")
table.add_row("max_num_batched_tokens", "512 - 16384", "Max tokens per batch")
table.add_row("max_num_seqs", "1 - 256", "Max concurrent sequences")
table.add_row("enable_chunked_prefill", "True/False", "Enable chunked prefill")
table.add_row("enable_prefix_caching", "True/False", "Enable KV cache reuse")
table.add_row("tensor_parallel_size", "1 - 2", "GPU parallelism")
table.add_row("dtype", "auto/float16/bfloat16", "Model precision")

console.print(table)

## 3. Document Chunking Engine (Before Extraction)

In [8]:
# Initialize Chunking Engine
chunking_engine = ChunkingEngine()

# Chunking configuration
CHUNKING_CONFIG = {
    "min_chunk_size": 100,
    "max_chunk_size": 100000,
    "overlap_percentage": 10,  # 10% overlap between chunks
    "preserve_boundaries": True,
    "boundary_types": ["word", "sentence", "paragraph"]
}

# Test chunk sizes for breaking point detection
TEST_CHUNK_SIZES = [
    100, 250, 500, 750,
    1000, 1500, 2000, 2500, 3000, 3500,
    4000, 4500, 5000, 5500, 6000, 6500,
    7000, 7500, 8000, 8500, 9000, 9500,
    10000, 15000, 20000, 25000, 30000,
    40000, 50000, 75000, 100000
]

print(f"✓ Chunking engine initialized")
print(f"✓ Test sizes: {len(TEST_CHUNK_SIZES)} configurations")
print(f"✓ Range: {min(TEST_CHUNK_SIZES)} - {max(TEST_CHUNK_SIZES)} characters")

✓ Chunking engine initialized
✓ Test sizes: 31 configurations
✓ Range: 100 - 100000 characters
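
`CHUNKING_CONFIG` specifies a 10% overlap between chunks, but how the overlap is applied lives inside `ChunkingEngine`, which is not shown here. The following is a minimal sketch of one possible interpretation, assuming the overlap is a percentage of `chunk_size` and that cuts should prefer the last space in the window. `split_with_overlap` is a hypothetical function, not the engine's actual implementation.

```python
def split_with_overlap(text, chunk_size, overlap_percentage=10):
    """Split `text` into chunks of at most `chunk_size` characters.

    Each chunk starts `overlap` characters before the previous chunk ended,
    where overlap = chunk_size * overlap_percentage // 100. Chunk ends fall
    on a space where possible; chunk starts may fall mid-word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")

    overlap = chunk_size * overlap_percentage // 100
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            last_space = text.rfind(" ", start, end)
            # Cut at the space only if the next chunk would still move forward
            if last_space > start + overlap:
                end = last_space
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks
```

Because every chunk after the first repeats the last `overlap` characters of its predecessor, an entity that straddles a cut has a chance of appearing whole in at least one chunk, at the cost of possible duplicates that need merging later.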


In [None]:
# Chunking strategies
def create_precise_chunk(text: str, size: int) -> str:
    """Create a chunk of exact character size"""
    if len(text) <= size:
        return text
    return text[:size]

def create_word_boundary_chunk(text: str, size: int) -> str:
    """Create a chunk respecting word boundaries"""
    if len(text) <= size:
        return text
    
    chunk = text[:size]
    # Find last complete word
    last_space = chunk.rfind(' ')
    if last_space > 0:
        return chunk[:last_space]
    return chunk

def create_sentence_boundary_chunk(text: str, size: int) -> str:
    """Create a chunk respecting sentence boundaries"""
    if len(text) <= size:
        return text
    
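    chunk = text[:size]
    # Cut after the last sentence terminator that is followed by whitespace,
    # keeping the original punctuation (., ! or ?) and spacing intact
    sentence_ends = [m.end() for m in re.finditer(r'[.!?](?=\s)', chunk)]
    if sentence_ends:
        return chunk[:sentence_ends[-1]]
    return chunk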

# Test chunking
sample_text = "This is a sample text. " * 100
test_chunk = create_word_boundary_chunk(sample_text, 500)
print(f"✓ Sample chunk created: {len(test_chunk)} characters")
print(f"✓ Chunking functions ready")

✓ Sample chunk created: 499 characters
✓ Chunking functions ready
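
`CHUNKING_CONFIG["boundary_types"]` also lists `"paragraph"`, for which no helper is defined above. A minimal sketch in the same style as the word and sentence helpers, assuming paragraphs are separated by a blank line (`"\n\n"`); `create_paragraph_boundary_chunk` is a hypothetical addition.

```python
def create_paragraph_boundary_chunk(text: str, size: int) -> str:
    """Create a chunk that ends at the last paragraph break within `size` characters.

    Assumes paragraphs are separated by a blank line. Falls back to a hard
    cut at `size` when the window contains no paragraph break.
    """
    if len(text) <= size:
        return text

    chunk = text[:size]
    last_break = chunk.rfind("\n\n")
    if last_break > 0:
        return chunk[:last_break]
    return chunk
```

For documents with very long paragraphs the fallback path will trigger often, so in practice it may make sense to try paragraph, then sentence, then word boundaries in that order.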


## 4. API Pattern Loading and Prompt Configuration

In [11]:
# =============================================================================
# CONFIGURABLE PROMPT TEMPLATE VARIABLES
# =============================================================================

# Main configuration for extraction
EXTRACTION_MODE = "ai_enhanced"  # Options: "regex", "ai_enhanced", "hybrid"
EXTRACTION_STRATEGY = "multipass"  # Options: "multipass", "ai_enhanced", "unified"
EXTRACTION_PROFILE = "default"  # Options: "default", "fast", "citations", "entities", etc.
EXTRACTION_PASS = "actors"  # Options: "actors", "citations", "concepts"

# Template variables that prompts expect
TEMPLATE_VARIABLES = {
    "chunk_content": "",  # The text to process (will be filled dynamically)
    "text": "",  # Alternative name for chunk content
    "text_chunk": "",  # Another alternative name
    "entity_types": ["CASE_CITATION", "PARTY", "ATTORNEY", "COURT", "JUDGE"],
    "target_entity_types": ["CASE_CITATION", "PARTY", "ATTORNEY", "COURT", "JUDGE"],
    "confidence_threshold": 0.7,
    "pass_number": 1,
    "previous_results": None,
    "max_tokens": 2000,
    "temperature": 0.1
}

# Custom prompt template (optional - set to None to use API templates)
CUSTOM_PROMPT_TEMPLATE = None

# API endpoint configuration
API_BASE_URL = "http://10.10.0.87:8007/api/v1"
ENTITY_EXTRACTION_PORT = 8007

console.print(Panel.fit(
    f"[bold cyan]Extraction Configuration[/bold cyan]\n"
    f"Mode: {EXTRACTION_MODE}\n"
    f"Strategy: {EXTRACTION_STRATEGY}\n"
    f"Profile: {EXTRACTION_PROFILE}\n"
    f"Pass: {EXTRACTION_PASS}\n"
    f"API URL: {API_BASE_URL}",
    title="Configuration"
))
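
The option lists in the comments above are not enforced anywhere in this cell, so a typo such as `"multi_pass"` would only surface later. A minimal sketch of an up-front check, assuming the option lists in the comments are complete; `EXTRACTION_PROFILE` is left out because its list ends with "etc.". `ALLOWED` and `check_settings` are hypothetical names.

```python
# Allowed values copied from the comments in the configuration cell
ALLOWED = {
    "EXTRACTION_MODE": {"regex", "ai_enhanced", "hybrid"},
    "EXTRACTION_STRATEGY": {"multipass", "ai_enhanced", "unified"},
    "EXTRACTION_PASS": {"actors", "citations", "concepts"},
}


def check_settings(settings):
    """Return a list of human-readable problems; empty when all values are allowed."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = settings.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    return problems
```

A call such as `check_settings({"EXTRACTION_MODE": EXTRACTION_MODE, "EXTRACTION_STRATEGY": EXTRACTION_STRATEGY, "EXTRACTION_PASS": EXTRACTION_PASS})` would return an empty list for the values configured above.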

In [12]:
# Load patterns from Entity Extraction Service API
def load_patterns_from_api():
    """Load patterns from /api/v1/patterns/detailed endpoint"""
    try:
        console.print("[bold]Loading patterns from API...[/bold]")
        
        with console.status("[green]Fetching patterns from API...") as status:
            response = requests.get(f"{API_BASE_URL}/patterns/detailed", timeout=30)  # fail fast if the service is unreachable
            response.raise_for_status()
            patterns_data = response.json()
        
        console.print(f"✓ Loaded {patterns_data.get('total_patterns', 0)} patterns")
        
        # Display patterns in Rich table
        if patterns_data.get('patterns_by_category'):
            table = Table(title="Available Pattern Categories", show_lines=True)
            table.add_column("Category", style="cyan", width=30)
            table.add_column("Pattern Count", style="green", justify="right")
            table.add_column("Example Pattern", style="yellow", width=50)
            
            for category, patterns in patterns_data['patterns_by_category'].items():
                if patterns:
                    pattern_count = len(patterns)
                    # Get first pattern as example
                    example = patterns[0]
                    pattern_preview = example.get('pattern', '')[:50] + "..." if len(example.get('pattern', '')) > 50 else example.get('pattern', '')
                    table.add_row(category, str(pattern_count), pattern_preview)
            
            console.print(table)
        
        # Show statistics
        if patterns_data.get('statistics'):
            stats = patterns_data['statistics']
            console.print(Panel.fit(
                f"[bold]Pattern Statistics[/bold]\n"
                f"Total Patterns: {stats.get('total_patterns', 0)}\n"
                f"From YAML Files: {stats.get('patterns_from_yaml', 0)}\n"
                f"Entity Types: {len(patterns_data.get('entity_types', []))}\n"
                f"Categories: {len(patterns_data.get('patterns_by_category', {}))}",
                title="Summary"
            ))
        
        return patterns_data
    
    except requests.exceptions.RequestException as e:
        console.print(f"[red]Error loading patterns from API: {e}[/red]")
        console.print("[yellow]Make sure the Entity Extraction Service is running on port 8007[/yellow]")
        return None

# Load patterns
patterns_data = load_patterns_from_api()
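
`load_patterns_from_api()` returns `None` when the request fails, so any later cell that reads `patterns_data` needs to handle that case. The sketch below separates the summary logic from the Rich display and tolerates a missing response. It assumes the response shape used above (`patterns_by_category` mapping each category to a list of dicts with a `pattern` key); `summarize_patterns` is a hypothetical helper.

```python
def summarize_patterns(patterns_data, preview_len=50):
    """Return (category, count, preview) rows, largest category first.

    Returns an empty list when `patterns_data` is None or has no categories.
    """
    if not patterns_data:
        return []

    rows = []
    for category, patterns in patterns_data.get("patterns_by_category", {}).items():
        if not patterns:
            continue
        first = patterns[0].get("pattern", "")
        # Truncate long regexes so they fit in a table column
        preview = first[:preview_len] + "..." if len(first) > preview_len else first
        rows.append((category, len(patterns), preview))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Keeping this as a pure function makes it easy to test against a hand-written dictionary without the service running.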

In [None]:
# Load extraction profiles from API
def load_extraction_profiles():
    """Load extraction profiles from /api/v1/extraction-profiles endpoint"""
    try:
        console.print("[bold]Loading extraction profiles from API...[/bold]")
        
        response = requests.get(f"{API_BASE_URL}/extraction-profiles", timeout=30)  # fail fast if the service is unreachable
        response.raise_for_status()
        profiles_data = response.json()
        
        # Display profiles in Rich table
        table = Table(title="Available Extraction Profiles", show_lines=True)
        table.add_column("Profile", style="cyan", width=20)
        table.add_column("Description", style="green", width=40)
        table.add_column("Enabled Passes", style="yellow", width=20)
        table.add_column("Parallel", style="magenta", justify="center")
        
        for profile_name, profile_info in profiles_data.get('profiles', {}).items():
            passes = str(profile_info.get('enabled_passes', []))
            parallel = "✓" if profile_info.get('parallel_execution', False) else "✗"
            table.add_row(
                profile_name,
                profile_info.get('description', ''),
                passes,
                parallel
            )
        
        console.print(table)
        
        # Show pass descriptions
        if profiles_data.get('pass_descriptions'):
            console.print("\n[bold]Pass Descriptions:[/bold]")
            for pass_num, desc in profiles_data['pass_descriptions'].items():
                console.print(f"  Pass {pass_num}: {desc}")
        
        return profiles_data
    
    except requests.exceptions.RequestException as e:
        console.print(f"[red]Error loading extraction profiles: {e}[/red]")
        return None

# Load extraction profiles
profiles_data = load_extraction_profiles()

In [None]:
# Template variable discovery and prompt management
def discover_template_variables(prompt_template):
    """Discover what variables a prompt template expects"""
    # Find all {variable_name} and {{variable_name}} patterns
    single_brace = re.findall(r'\{(\w+)\}', prompt_template)
    double_brace = re.findall(r'\{\{(\w+)\}\}', prompt_template)
    
    all_vars = list(set(single_brace + double_brace))
    
    console.print(Panel.fit(
        f"[bold]Template expects these variables:[/bold]\n" + 
        "\n".join([f"  • {var}" for var in all_vars]),
        title="Template Variables"
    ))
    
    return all_vars

def get_prompt_template(use_custom=False):
    """Get the prompt template to use based on configuration"""
    if use_custom and CUSTOM_PROMPT_TEMPLATE:
        console.print("[yellow]Using custom prompt template[/yellow]")
        return CUSTOM_PROMPT_TEMPLATE
    
    # Try to get template from loaded data
    if template_manager and hasattr(template_manager, 'get_template'):
        template = template_manager.get_template(strategy=EXTRACTION_STRATEGY, pass_number=TEMPLATE_VARIABLES.get('pass_number'))
        if template:
            console.print(f"[green]Using template from service: {EXTRACTION_STRATEGY}[/green]")
            return template
    
    # Fallback to minimal template
    console.print("[yellow]Using fallback minimal template[/yellow]")
    return """Extract legal entities from the text.
Return as JSON: {{"entities": [{{"type": "...", "text": "...", "confidence": 0.0}}]}}

Text: {text}

JSON:"""

def render_prompt(template: str, text: str, variables: Dict = None) -> str:
    """Render a prompt template with text and variables"""
    # Merge template variables with provided text
    render_vars = TEMPLATE_VARIABLES.copy()
    render_vars.update({
        'text': text,
        'text_chunk': text,
        'chunk_content': text
    })
    
    if variables:
        render_vars.update(variables)
    
    # Try Jinja2 rendering first
    try:
        from jinja2 import Template
        jinja_template = Template(template)
        return jinja_template.render(**render_vars)
    except Exception:
        # Fall back to simple format
        try:
            return template.format(**render_vars)
        except KeyError as e:
            console.print(f"[red]Missing template variable: {e}[/red]")
            # Try with just text
            return template.format(text=text)

# Get and analyze a template
selected_template = get_prompt_template()
console.print("\n[bold]Selected Template Preview:[/bold]")
console.print(Panel(selected_template[:500] + "..." if len(selected_template) > 500 else selected_template))

# Discover variables
template_vars = discover_template_variables(selected_template)

# Test rendering
test_text = "Smith v. Jones, 123 F.3d 456 (9th Cir. 2020)"
test_prompt = render_prompt(selected_template, test_text)
console.print(f"\n✓ Prompt rendering works")
console.print(f"✓ Test prompt length: {len(test_prompt)} characters")
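
The fallback template uses `str.format` syntax, where `{{` and `}}` are escaped literal braces, whereas Jinja2 treats `{{ name }}` as a variable. A regex cannot tell these apart. For templates known to be in `str.format` style, the standard library's `string.Formatter().parse` reports the real replacement fields. This is a minimal sketch; `format_fields` is a hypothetical helper and it does not apply to Jinja2 templates.

```python
from string import Formatter


def format_fields(template):
    """Return the replacement-field names of a str.format template.

    Names are listed once, in order of first appearance. Escaped braces
    ('{{' and '}}') are treated as literal text. Malformed templates
    (for example a lone '{') raise ValueError from Formatter.parse.
    """
    seen = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if not field_name:
            continue  # literal text, escaped braces, or auto-numbered '{}'
        # Reduce 'a.b' or 'a[0]' to the top-level name 'a'
        root = field_name.split(".")[0].split("[")[0]
        if root not in seen:
            seen.append(root)
    return seen
```

Comparing the result against the keys of `TEMPLATE_VARIABLES` before rendering would surface a missing variable earlier than the `KeyError` raised inside `render_prompt`.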

## 5. Dynamic Breaking Point Detection

In [None]:
class BreakingPointDetector:
    """Detect the character count where extraction fails with detailed progress tracking"""
    
    def __init__(self, vllm_controller, template_manager):
        self.vllm = vllm_controller
        self.templates = template_manager
        self.results = []
        self.selected_template = get_prompt_template()
        
    def test_character_count(self, text: str, char_count: int) -> Dict:
        """Test extraction at specific character count with detailed tracking"""
        
        # Show progress for this chunk
        console.print(f"\n[cyan]Testing {char_count} characters...[/cyan]")
        
        # Create chunk of exact size
        with console.status(f"[green]Creating chunk of {char_count} chars...") as status:
            chunk = create_word_boundary_chunk(text, char_count)
            actual_size = len(chunk)
            console.log(f"✓ Chunk created: {actual_size} chars")
        
        # Render prompt
        with console.status("[green]Rendering prompt template...") as status:
            prompt = render_prompt(self.selected_template, chunk)
            prompt_tokens = len(prompt.split())  # Whitespace word count, only a rough proxy for the model's token count
            console.log(f"✓ Prompt rendered: ~{prompt_tokens} tokens")
        
        # Track performance
        start_time = time.time()
        success = False
        error_msg = None
        entities_found = 0
        
        # Show extraction progress
        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            console=console
        ) as progress:
            
            task = progress.add_task("[cyan]Extracting entities...", total=100)
            
            try:
                # Update progress
                progress.update(task, advance=20, description="[cyan]Calling vLLM...")
                
                # Generate with vLLM
                response = self.vllm.generate(
                    prompt=prompt,
                    max_tokens=TEMPLATE_VARIABLES['max_tokens'],
                    temperature=TEMPLATE_VARIABLES['temperature']
                )
                
                progress.update(task, advance=60, description="[cyan]Parsing response...")
                
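                # Parse response
                if response:
                    try:
                        result = json.loads(response)
                        entities_found = len(result.get('entities', []))
                        success = entities_found > 0
                        if not success:
                            error_msg = "No entities extracted"
                        progress.update(task, advance=20, description="[green]Complete!")
                    except (json.JSONDecodeError, TypeError, AttributeError):
                        success = False
                        error_msg = "JSON parsing failed"
                        progress.update(task, advance=20, description="[red]Parse failed")
                else:
                    error_msg = "Empty response from vLLM"
                    progress.update(task, advance=20, description="[red]Empty response")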
            except Exception as e:
                success = False
                error_msg = str(e)
                progress.update(task, advance=80, description=f"[red]Error: {error_msg[:30]}")
        
        elapsed = time.time() - start_time
        
        # Display result summary (the throughput line is added only when elapsed time is measurable)
        throughput_line = f"\nThroughput: {actual_size/elapsed:.0f} chars/s" if elapsed > 0 else ""
        result_panel = Panel.fit(
            f"[bold]Chunk Test Result[/bold]\n"
            f"Size: {actual_size} chars\n"
            f"Success: {'✓' if success else '✗'}\n"
            f"Entities: {entities_found}\n"
            f"Time: {elapsed:.2f}s"
            f"{throughput_line}",
            title=f"{'[green]Success' if success else '[red]Failed'}",
            border_style="green" if success else "red"
        )
        console.print(result_panel)
        
        return {
            "char_count": char_count,
            "actual_size": actual_size,
            "success": success,
            "entities_found": entities_found,
            "time_seconds": elapsed,
            "error": error_msg
        }
    
    def find_breaking_point(self, document: str, start: int = 1000, increment: int = 500, max_size: int = 100000) -> int:
        """Find the breaking point with detailed progress tracking"""
        
        console.print(Panel.fit(
            f"[bold cyan]Finding Breaking Point[/bold cyan]\n"
            f"Start: {start:,} chars\n"
            f"Increment: {increment} chars\n"
            f"Max: {max_size:,} chars",
            title="Breaking Point Detection"
        ))
        
        current_size = start
        last_success = 0
        first_failure = None
        
        with Progress(console=console) as progress:
            task = progress.add_task(
                "[cyan]Testing chunk sizes...", 
                total=min(max_size - start, len(document) - start)
            )
            
            while current_size <= max_size and current_size <= len(document):
                result = self.test_character_count(document, current_size)
                self.results.append(result)
                
                if result['success']:
                    last_success = current_size
                    console.print(f"[green]✓ {current_size:,} chars: Success ({result['entities_found']} entities in {result['time_seconds']:.2f}s)[/green]")
                else:
                    if first_failure is None:
                        first_failure = current_size
                    console.print(f"[red]✗ {current_size:,} chars: Failed - {result['error']}[/red]")
                    break  # Found breaking point
                
                progress.update(task, advance=increment)
                current_size += increment
        
        # Binary search for exact breaking point
        if first_failure and last_success:
            console.print(Panel.fit(
                f"[bold]Narrowing Breaking Point[/bold]\n"
                f"Between: {last_success:,} - {first_failure:,} chars",
                title="Binary Search"
            ))
            breaking_point = self.binary_search(document, last_success, first_failure)
            return breaking_point
        
        return first_failure or last_success
    
    def binary_search(self, document: str, low: int, high: int) -> int:
        """Binary search for exact breaking point with progress"""
        iterations = 0
        
        with Progress(console=console) as progress:
            task = progress.add_task("[cyan]Binary search...", total=100)
            
            while high - low > 10:
                mid = (low + high) // 2
                iterations += 1
                progress.update(task, advance=100/10)  # Estimate 10 iterations max
                
                result = self.test_character_count(document, mid)
                self.results.append(result)
                
                if result['success']:
                    low = mid
                    console.print(f"  [green]✓ {mid:,} chars: Success[/green]")
                else:
                    high = mid
                    console.print(f"  [red]✗ {mid:,} chars: Failed[/red]")
            
            progress.update(task, completed=100)
        
        console.print(Panel.fit(
            f"[bold green]Breaking Point Found![/bold green]\n"
            f"Last successful size: {low:,} chars\n"
            f"Binary search iterations: {iterations}",
            title="Result"
        ))
        
        return low  # Last successful size
    
    def get_results_df(self) -> pd.DataFrame:
        """Get results as DataFrame"""
        return pd.DataFrame(self.results)

# Initialize detector
detector = BreakingPointDetector(vllm_controller, template_manager)
console.print("[green]✓ Breaking point detector initialized with progress tracking[/green]")
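
The search in `BreakingPointDetector` has two phases: a linear probe in fixed increments until the first failure, then a binary search between the last success and that failure. The sketch below isolates this logic from vLLM and Rich so it can be checked against a fake extractor. It assumes success is monotonic (every size below the breaking point succeeds, every size above it fails), which real model behaviour may not guarantee. `find_last_success` and its `succeeds` callback are hypothetical names, and unlike the class above it returns 0 when the very first probe fails.

```python
def find_last_success(succeeds, start=1000, increment=500, max_size=100_000, tolerance=10):
    """Return the largest probed size for which `succeeds(size)` was True.

    Phase 1 steps linearly from `start` until the first failure or `max_size`.
    Phase 2 bisects between the last success and the first failure until the
    gap is at most `tolerance`. Returns 0 if `start` itself fails.
    """
    last_success = 0
    first_failure = None

    size = start
    while size <= max_size:
        if succeeds(size):
            last_success = size
            size += increment
        else:
            first_failure = size
            break

    # Nothing to narrow: either no failure was seen or nothing succeeded
    if first_failure is None or last_success == 0:
        return last_success

    low, high = last_success, first_failure
    while high - low > tolerance:
        mid = (low + high) // 2
        if succeeds(mid):
            low = mid
        else:
            high = mid
    return low
```

With the default increment of 500 and tolerance of 10, the bisection phase needs at most six probes, so most of the cost sits in the linear phase. If extraction success turns out to be non-monotonic, the result is only a lower bound on where failures begin, not a guaranteed safe size.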

## 6. Entity Extraction Strategies
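
Both strategies in this section repeat the same steps to recover JSON from model output: look for a fenced `json` block, otherwise take the span from the first `{` to the last `}`, otherwise parse the whole response. A minimal sketch of that logic as a single function follows. `parse_entities` is a hypothetical helper, not part of the notebook modules, and it deliberately returns an empty list instead of raising.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker


def parse_entities(response):
    """Best-effort extraction of the 'entities' list from model output.

    Returns an empty list when the response is not a string, contains no
    parseable JSON object, or has no list under the 'entities' key.
    """
    if not isinstance(response, str):
        return []

    text = response
    if FENCE + "json" in text:
        # Take the body of the first fenced json block
        text = text.split(FENCE + "json", 1)[1].split(FENCE, 1)[0]
    elif "{" in text and "}" in text:
        # Take the span from the first '{' to the last '}'
        text = text[text.index("{"): text.rindex("}") + 1]

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []

    if not isinstance(data, dict):
        return []
    entities = data.get("entities", [])
    return entities if isinstance(entities, list) else []
```

Using one helper in every strategy, and in the breaking point detector, would make a "failure" mean the same thing everywhere, which matters when comparing results across strategies.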

In [None]:
class ExtractionStrategy:
    """Base class for extraction strategies with Rich output"""

    def __init__(self, vllm_controller):
        self.vllm = vllm_controller
        self.selected_template = get_prompt_template()

    def extract(self, text: str) -> Dict:
        raise NotImplementedError

class UnifiedExtraction(ExtractionStrategy):
    """Single pass extraction for all entities with detailed tracking"""

    def extract(self, text: str) -> Dict:
        console.print("\n[bold cyan]Unified Extraction Strategy[/bold cyan]")

        # Show extraction details
        console.print(Panel.fit(
            f"[bold]Extraction Details[/bold]\n"
            f"Text size: {len(text):,} chars\n"
            f"Strategy: Single unified pass\n"
            f"Entity types: All\n"
            f"Max tokens: {TEMPLATE_VARIABLES['max_tokens']}",
            title="Unified Strategy"
        ))

        # Render prompt with template
        with console.status("[green]Rendering prompt template...") as status:
            prompt = render_prompt(self.selected_template, text)
            prompt_size = len(prompt)
            console.log(f"✓ Prompt rendered: {prompt_size:,} chars")

        # Extract with progress tracking
        start_time = time.time()

        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            console=console
        ) as progress:
            task = progress.add_task("[cyan]Extracting entities...", total=100)

            progress.update(task, advance=30, description="[cyan]Calling vLLM...")
            response = self.vllm.generate(
                prompt,
                max_tokens=TEMPLATE_VARIABLES['max_tokens'],
                temperature=TEMPLATE_VARIABLES['temperature']
            )

            progress.update(task, advance=50, description="[cyan]Parsing response...")
            try:
                # Try to parse JSON response
                if '```json' in response:
                    json_str = response.split('```json')[1].split('```')[0]
                elif '{' in response:
                    start_idx = response.index('{')
                    end_idx = response.rindex('}') + 1
                    json_str = response[start_idx:end_idx]
                else:
                    json_str = response

                entities = json.loads(json_str)
                progress.update(task, advance=20, description="[green]Complete!")
            except Exception as e:
                entities = {"entities": []}
                progress.update(task, advance=20, description="[yellow]Parse warning")
                console.print(f"[yellow]Warning: JSON parse issue: {e}[/yellow]")

        elapsed = time.time() - start_time
        entity_count = len(entities.get("entities", []))

        # Display results summary
        result_table = Table(title="Unified Extraction Results")
        result_table.add_column("Metric", style="cyan")
        result_table.add_column("Value", style="green")

        result_table.add_row("Entities Found", str(entity_count))
        result_table.add_row("Processing Time", f"{elapsed:.2f}s")
        result_table.add_row("Throughput", f"{len(text)/elapsed:.0f} chars/s")
        result_table.add_row("Passes", "1")

        console.print(result_table)

        return {
            "strategy": "unified",
            "entities": entities.get("entities", []),
            "time": elapsed,
            "passes": 1,
            "entity_count": entity_count
        }

class MultipassExtraction(ExtractionStrategy):
    """Multiple specialized passes for different entity groups with Rich tracking"""

    PASSES = [
        {"name": "cases_parties", "types": ["CASE_CITATION", "PARTY", "ATTORNEY"]},
        {"name": "courts_judges", "types": ["COURT", "JUDGE"]},
        {"name": "citations", "types": ["USC_CITATION", "CFR_CITATION", "STATE_STATUTE"]},
        {"name": "dates", "types": ["DATE", "DEADLINE", "FILING_DATE"]},
        {"name": "monetary", "types": ["MONETARY_AMOUNT", "DAMAGES", "SETTLEMENT"]},
        {"name": "organizations", "types": ["LAW_FIRM", "GOVERNMENT_AGENCY"]},
        {"name": "misc", "types": ["ADDRESS", "PHONE", "EMAIL"]}
    ]

    def extract(self, text: str) -> Dict:
        console.print("\n[bold cyan]Multipass Extraction Strategy[/bold cyan]")

        # Show strategy overview
        console.print(Panel.fit(
            f"[bold]Multipass Details[/bold]\n"
            f"Text size: {len(text):,} chars\n"
            f"Total passes: {len(self.PASSES)}\n"
            f"Pass strategy: Specialized entity groups\n"
            f"Template: {EXTRACTION_STRATEGY}",
            title="Multipass Strategy"
        ))

        all_entities = []
        total_time = 0
        pass_results = []

        # Create progress bar for all passes
        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            console=console
        ) as progress:

            main_task = progress.add_task(
                "[cyan]Running multipass extraction...",
                total=len(self.PASSES)
            )

            for i, pass_config in enumerate(self.PASSES, 1):
                progress.update(
                    main_task,
                    description=f"[cyan]Pass {i}/{len(self.PASSES)}: {pass_config['name']}..."
                )

                # Update template variables for this pass
                pass_variables = TEMPLATE_VARIABLES.copy()
                pass_variables['pass_number'] = i
                pass_variables['entity_types'] = pass_config['types']
                pass_variables['target_entity_types'] = pass_config['types']

                # Try to get pass-specific template or use configured template
                if template_manager and hasattr(template_manager, 'get_template'):
                    pass_template = template_manager.get_template("multipass", i)
                    if not pass_template:
                        pass_template = self.selected_template
                else:
                    pass_template = self.selected_template

                # Render prompt for this pass
                prompt = render_prompt(pass_template, text, pass_variables)

                # Extract entities for this pass
                start_time = time.time()
                response = self.vllm.generate(
                    prompt,
                    max_tokens=1000,
                    temperature=TEMPLATE_VARIABLES['temperature']
                )
                elapsed = time.time() - start_time
                total_time += elapsed

                # Parse response
                pass_entities = []
                try:
                    if '```json' in response:
                        json_str = response.split('```json')[1].split('```')[0]
                    elif '{' in response:
                        start_idx = response.index('{')
                        end_idx = response.rindex('}') + 1
                        json_str = response[start_idx:end_idx]
                    else:
                        json_str = response

                    result = json.loads(json_str)
                    pass_entities = result.get("entities", [])
                    all_entities.extend(pass_entities)
                except Exception as e:
                    console.print(f"[yellow]Pass {i} parse warning: {e}[/yellow]")

                # Track pass results
                pass_results.append({
                    "pass": i,
                    "name": pass_config['name'],
                    "entities_found": len(pass_entities),
                    "time": elapsed,
                    "types": pass_config['types']
                })

                progress.advance(main_task)

        # Display detailed pass results
        pass_table = Table(title="Multipass Extraction Results by Pass")
        pass_table.add_column("Pass", style="cyan", justify="center")
        pass_table.add_column("Name", style="green")
        pass_table.add_column("Entity Types", style="yellow")
        pass_table.add_column("Found", style="magenta", justify="right")
        pass_table.add_column("Time (s)", style="blue", justify="right")

        for result in pass_results:
            types_str = ", ".join(result['types'][:2]) + ("..." if len(result['types']) > 2 else "")
            pass_table.add_row(
                str(result['pass']),
                result['name'],
                types_str,
                str(result['entities_found']),
                f"{result['time']:.2f}"
            )

        console.print(pass_table)

        # Summary statistics
        total_entities = len(all_entities)
        # Guard against division by zero when no measurable time elapsed
        throughput = f"{len(text)/total_time:.0f} chars/s" if total_time > 0 else "N/A"
        console.print(Panel.fit(
            f"[bold green]Multipass Complete![/bold green]\n"
            f"Total entities: {total_entities}\n"
            f"Total time: {total_time:.2f}s\n"
            f"Average per pass: {total_time/len(self.PASSES):.2f}s\n"
            f"Throughput: {throughput}",
            title="Summary"
        ))

        return {
            "strategy": "multipass",
            "entities": all_entities,
            "time": total_time,
            "passes": len(self.PASSES),
            "entity_count": total_entities,
            "pass_results": pass_results
        }

class IndividualExtraction(ExtractionStrategy):
    """Separate pass for each individual entity type with detailed tracking"""

    ENTITY_TYPES = [
        "CASE_CITATION", "PARTY", "ATTORNEY", "COURT", "JUDGE",
        "USC_CITATION", "CFR_CITATION", "STATE_STATUTE",
        "DATE", "DEADLINE", "MONETARY_AMOUNT",
        "LAW_FIRM", "GOVERNMENT_AGENCY"
    ]

    def extract(self, text: str) -> Dict:
        console.print("\n[bold cyan]Individual Entity Extraction Strategy[/bold cyan]")

        # Show strategy overview
        console.print(Panel.fit(
            f"[bold]Individual Extraction Details[/bold]\n"
            f"Text size: {len(text):,} chars\n"
            f"Entity types: {len(self.ENTITY_TYPES)}\n"
            f"Strategy: One pass per entity type\n"
            f"Max tokens per pass: 500",
            title="Individual Strategy"
        ))

        all_entities = []
        total_time = 0
        type_results = []

        # Create progress tracking for all entity types
        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            console=console
        ) as progress:

            main_task = progress.add_task(
                "[cyan]Extracting individual entity types...",
                total=len(self.ENTITY_TYPES)
            )

            for entity_type in self.ENTITY_TYPES:
                progress.update(
                    main_task,
                    description=f"[cyan]Extracting: {entity_type}..."
                )

                # Update template variables for this entity type
                type_variables = TEMPLATE_VARIABLES.copy()
                type_variables['entity_types'] = [entity_type]
                type_variables['target_entity_types'] = [entity_type]

                # Create focused prompt for single entity type
                if '{{' in self.selected_template or '{%' in self.selected_template:
                    # Use template if it has variables
                    prompt = render_prompt(self.selected_template, text, type_variables)
                else:
                    # Use simple focused prompt
                    prompt = f"""Extract only {entity_type} entities.
Return as JSON: {{"entities": [{{"type": "{entity_type}", "text": "..."}}]}}

Text: {text}

JSON:"""

                # Extract entities
                start_time = time.time()
                response = self.vllm.generate(
                    prompt,
                    max_tokens=500,
                    temperature=TEMPLATE_VARIABLES['temperature']
                )
                elapsed = time.time() - start_time
                total_time += elapsed

                # Parse response
                type_entities = []
                try:
                    if '```json' in response:
                        json_str = response.split('```json')[1].split('```')[0]
                    elif '{' in response:
                        start_idx = response.index('{')
                        end_idx = response.rindex('}') + 1
                        json_str = response[start_idx:end_idx]
                    else:
                        json_str = response

                    result = json.loads(json_str)
                    type_entities = result.get("entities", [])
                    all_entities.extend(type_entities)
                except Exception as e:
                    # Report parse failures so a type with zero results can be diagnosed
                    console.print(f"[yellow]{entity_type} parse warning: {e}[/yellow]")

                # Track results for this type
                type_results.append({
                    "type": entity_type,
                    "found": len(type_entities),
                    "time": elapsed
                })

                progress.advance(main_task)

        # Display results by entity type
        type_table = Table(title="Individual Entity Type Results")
        type_table.add_column("Entity Type", style="cyan", width=20)
        type_table.add_column("Found", style="green", justify="right")
        type_table.add_column("Time (s)", style="yellow", justify="right")
        type_table.add_column("Rate", style="magenta", justify="right")

        for result in type_results:
            rate = f"{result['found']/result['time']:.1f}/s" if result['time'] > 0 else "N/A"
            type_table.add_row(
                result['type'],
                str(result['found']),
                f"{result['time']:.2f}",
                rate
            )

        console.print(type_table)

        # Summary
        total_entities = len(all_entities)
        successful_types = sum(1 for r in type_results if r['found'] > 0)

        console.print(Panel.fit(
            f"[bold green]Individual Extraction Complete![/bold green]\n"
            f"Total entities: {total_entities}\n"
            f"Successful types: {successful_types}/{len(self.ENTITY_TYPES)}\n"
            f"Total time: {total_time:.2f}s\n"
            f"Average per type: {total_time/len(self.ENTITY_TYPES):.2f}s",
            title="Summary"
        ))

        return {
            "strategy": "individual",
            "entities": all_entities,
            "time": total_time,
            "passes": len(self.ENTITY_TYPES),
            "entity_count": total_entities,
            "type_results": type_results
        }

console.print("[green]✓ Extraction strategies defined with Rich console output[/green]")
console.print("  - [cyan]Unified[/cyan]: Single pass for all entities with progress tracking")
console.print(f"  - [cyan]Multipass[/cyan]: {len(MultipassExtraction.PASSES)} specialized passes with detailed per-pass results")
console.print("  - [cyan]Individual[/cyan]: Separate pass per entity type with type-by-type tracking")
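
Both `MultipassExtraction` and `IndividualExtraction` repeat the same response-parsing steps. A minimal sketch of a shared helper is shown below. The name `parse_entities_response` is hypothetical (it is not defined elsewhere in this notebook), and the sketch assumes the model returns a JSON object with an `entities` list, as the prompts above request.

```python
import json
from typing import Dict, List


def parse_entities_response(response: str) -> List[Dict]:
    """Extract the "entities" list from a model response (hypothetical helper)."""
    fence = "`" * 3          # Markdown code fence
    marker = fence + "json"  # opening fence of a JSON block

    if marker in response:
        # Prefer an explicitly fenced JSON block
        json_str = response.split(marker)[1].split(fence)[0]
    elif '{' in response and '}' in response:
        # Fall back to the outermost brace-delimited span
        json_str = response[response.index('{'):response.rindex('}') + 1]
    else:
        json_str = response

    try:
        result = json.loads(json_str)
    except json.JSONDecodeError:
        return []  # Unparseable output is treated as "no entities"

    if not isinstance(result, dict):
        return []
    entities = result.get("entities", [])
    return entities if isinstance(entities, list) else []
```

With a helper like this, the parse step in each strategy could reduce to a single call such as `pass_entities = parse_entities_response(response)`, at the cost of no longer surfacing the specific parse error.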

## 7. Performance Analysis and Visualization

In [22]:
def plot_breaking_point_analysis(results_df: pd.DataFrame):
    """Plot breaking point analysis"""
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Success rate by character count
    ax = axes[0, 0]
    ax.plot(results_df['char_count'], results_df['success'].astype(int), 'o-')
    ax.set_xlabel('Character Count')
    ax.set_ylabel('Success (1) / Failure (0)')
    ax.set_title('Extraction Success by Character Count')
    ax.grid(True, alpha=0.3)
    
    # Processing time by character count
    ax = axes[0, 1]
    ax.plot(results_df['char_count'], results_df['time_seconds'], 'o-', color='orange')
    ax.set_xlabel('Character Count')
    ax.set_ylabel('Processing Time (seconds)')
    ax.set_title('Processing Time vs Character Count')
    ax.grid(True, alpha=0.3)
    
    # Entities found by character count
    ax = axes[1, 0]
    successful = results_df[results_df['success'].astype(bool)]
    if not successful.empty:
        ax.plot(successful['char_count'], successful['entities_found'], 'o-', color='green')
    ax.set_xlabel('Character Count')
    ax.set_ylabel('Number of Entities')
    ax.set_title('Entities Found vs Character Count')
    ax.grid(True, alpha=0.3)
    
    # Characters per second (throughput)
    ax = axes[1, 1]
    if not successful.empty:
        throughput = successful['char_count'] / successful['time_seconds']
        ax.plot(successful['char_count'], throughput, 'o-', color='purple')
    ax.set_xlabel('Character Count')
    ax.set_ylabel('Characters per Second')
    ax.set_title('Processing Throughput')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    return fig

def plot_strategy_comparison(strategy_results: List[Dict]):
    """Compare different extraction strategies"""
    df = pd.DataFrame(strategy_results)
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Entities extracted
    ax = axes[0]
    strategies = df['strategy'].unique()
    entity_counts = [df[df['strategy'] == s]['entity_count'].mean() for s in strategies]
    ax.bar(strategies, entity_counts)
    ax.set_ylabel('Average Entities Extracted')
    ax.set_title('Extraction Effectiveness')
    
    # Processing time
    ax = axes[1]
    times = [df[df['strategy'] == s]['time'].mean() for s in strategies]
    ax.bar(strategies, times, color='orange')
    ax.set_ylabel('Average Time (seconds)')
    ax.set_title('Processing Time')
    
    # Efficiency (entities per second)
    ax = axes[2]
    efficiency = [e/t if t > 0 else 0 for e, t in zip(entity_counts, times)]
    ax.bar(strategies, efficiency, color='green')
    ax.set_ylabel('Entities per Second')
    ax.set_title('Extraction Efficiency')
    
    plt.tight_layout()
    return fig

print("✓ Visualization functions ready")

✓ Visualization functions ready
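
The first panel plots success against character count, and the breaking point can be read from it as the largest size that succeeded before the first failure. A minimal sketch of computing that value from result records follows. It assumes each record carries `char_count` and `success` keys, as the plotting code expects; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def last_size_before_first_failure(results: List[Dict]) -> Optional[int]:
    """Return the largest char_count that succeeded before the first failure.

    Returns None when there are no records or the smallest size already failed.
    """
    last_ok = None
    # Walk the records in increasing size order and stop at the first failure
    for record in sorted(results, key=lambda r: r["char_count"]):
        if not record["success"]:
            break
        last_ok = record["char_count"]
    return last_ok
```

Later successes after a failure are deliberately ignored, since an intermittent pass above the first failure is not a size one would want to rely on.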


## 8. Automated Testing Suite

In [None]:
class AutomatedTester:
    """Automated testing for entity extraction with detailed processing inspection"""
    
    def __init__(self, vllm_controller):
        self.vllm = vllm_controller
        self.detector = BreakingPointDetector(vllm_controller, template_manager)
        self.results = []
        
    def load_test_document(self, path: str = "/srv/luris/be/tests/docs/Rahimi.pdf") -> str:
        """Return the test document text (currently built-in sample text; `path` is displayed but not read)"""
        console.print(f"[bold]Loading document: {path}[/bold]")
        
        # For now, use sample text
        # In production, would load actual document
        sample = """
        In the case of Smith v. Jones, 123 F.3d 456 (9th Cir. 2020), the United States Court of Appeals 
        for the Ninth Circuit held that the defendant, ABC Corporation, represented by attorney John Doe 
        of the law firm Doe & Associates, violated 42 U.S.C. § 1983. The case was presided over by 
        Judge Jane Smith. The plaintiff, Robert Smith, sought damages of $1,000,000 for violations that 
        occurred on January 15, 2019. The court's decision was issued on March 20, 2020, following oral 
        arguments held on February 10, 2020. The defendant filed a motion to dismiss on November 1, 2019, 
        which was denied by the district court. This decision affirmed the lower court's ruling in 
        Smith v. ABC Corp., 456 F. Supp. 3d 789 (N.D. Cal. 2019).
        """ * 50  # Repeat to create longer document
        
        # Display document info
        doc_table = Table(title="Document Information")
        doc_table.add_column("Property", style="cyan")
        doc_table.add_column("Value", style="green")
        
        doc_table.add_row("Path", f"{path} (not read; sample text used)")
        doc_table.add_row("Length", f"{len(sample):,} characters")
        doc_table.add_row("Estimated Tokens", f"~{len(sample)//4:,}")
        doc_table.add_row("Lines", str(sample.count('\n') + 1))
        doc_table.add_row("Sample Entities", "CASE_CITATION, PARTY, COURT, etc.")
        
        console.print(doc_table)
        
        return sample
    
    def inspect_extraction_process(self, text: str, chunk_size: int = 4000):
        """Detailed inspection of the extraction process"""
        console.print(Panel.fit(
            "[bold cyan]Extraction Process Inspector[/bold cyan]\n"
            "This will show detailed step-by-step processing",
            title="Inspector"
        ))
        
        # Step 1: Chunking
        console.print("\n[bold]Step 1: Document Chunking[/bold]")
        chunk = create_word_boundary_chunk(text, chunk_size)
        
        chunk_info = Table(title="Chunk Information")
        chunk_info.add_column("Metric", style="cyan")
        chunk_info.add_column("Value", style="green")
        
        chunk_info.add_row("Requested Size", f"{chunk_size:,} chars")
        chunk_info.add_row("Actual Size", f"{len(chunk):,} chars")
        chunk_info.add_row("Word Count", f"{len(chunk.split()):,}")
        # Counting periods overestimates sentences in legal text (e.g. "F.3d", "U.S.C.")
        chunk_info.add_row("Approx. Sentence Count", f"{chunk.count('.'):,}")
        
        console.print(chunk_info)
        
        # Step 2: Template Selection
        console.print("\n[bold]Step 2: Template Selection & Configuration[/bold]")
        template = get_prompt_template()
        template_vars = discover_template_variables(template)
        
        template_table = Table(title="Template Configuration")
        template_table.add_column("Setting", style="cyan")
        template_table.add_column("Value", style="green")
        
        template_table.add_row("Strategy", EXTRACTION_STRATEGY)
        template_table.add_row("Profile", EXTRACTION_PROFILE)
        template_table.add_row("Template Variables", str(len(template_vars)))
        template_table.add_row("Max Tokens", str(TEMPLATE_VARIABLES['max_tokens']))
        template_table.add_row("Temperature", str(TEMPLATE_VARIABLES['temperature']))
        
        console.print(template_table)
        
        # Step 3: Prompt Rendering
        console.print("\n[bold]Step 3: Prompt Rendering[/bold]")
        with console.status("[green]Rendering prompt...") as status:
            prompt = render_prompt(template, chunk)
            prompt_lines = prompt.split('\n')
            
        prompt_info = Table(title="Rendered Prompt Analysis")
        prompt_info.add_column("Metric", style="cyan")
        prompt_info.add_column("Value", style="green")
        
        prompt_info.add_row("Total Length", f"{len(prompt):,} chars")
        prompt_info.add_row("Line Count", str(len(prompt_lines)))
        prompt_info.add_row("Estimated Tokens", f"~{len(prompt)//4:,}")
        prompt_info.add_row("Contains Examples", "Yes" if "example" in prompt.lower() else "No")
        prompt_info.add_row("JSON Format", "Yes" if "json" in prompt.lower() else "No")
        
        console.print(prompt_info)
        
        # Show prompt preview
        console.print("\n[bold]Prompt Preview (first 500 chars):[/bold]")
        console.print(Panel(
            prompt[:500] + "..." if len(prompt) > 500 else prompt,
            title="Prompt",
            border_style="dim"
        ))
        
        # Step 4: Model Inference
        console.print("\n[bold]Step 4: Model Inference[/bold]")
        
        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            console=console
        ) as progress:
            
            task = progress.add_task("[cyan]Running inference...", total=100)
            
            progress.update(task, advance=20, description="[cyan]Sending to vLLM...")
            start_time = time.time()
            
            # Simulate extraction (in production, would call actual vLLM)
            progress.update(task, advance=40, description="[cyan]Processing tokens...")
            time.sleep(0.5)  # Simulate processing
            
            progress.update(task, advance=30, description="[cyan]Generating response...")
            elapsed = time.time() - start_time
            
            progress.update(task, advance=10, description="[green]Complete!")
        
        # Step 5: Response Analysis
        console.print("\n[bold]Step 5: Response Analysis[/bold]")
        
        # Simulate response
        mock_response = {
            "entities": [
                {"type": "CASE_CITATION", "text": "Smith v. Jones, 123 F.3d 456 (9th Cir. 2020)"},
                {"type": "PARTY", "text": "ABC Corporation"},
                {"type": "ATTORNEY", "text": "John Doe"},
                {"type": "COURT", "text": "Ninth Circuit"}
            ]
        }
        
        response_table = Table(title="Response Metrics")
        response_table.add_column("Metric", style="cyan")
        response_table.add_column("Value", style="green")
        
        response_table.add_row("Processing Time", f"{elapsed:.2f}s")
        response_table.add_row("Entities Found", str(len(mock_response['entities'])))
        response_table.add_row("Unique Types", str(len(set(e['type'] for e in mock_response['entities']))))
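        # Use the actual chunk length (not the requested size) and guard against zero elapsed time
        throughput = f"{len(chunk)/elapsed:.0f} chars/s" if elapsed > 0 else "N/A"
        response_table.add_row("Throughput", throughput)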
        
        console.print(response_table)
        
        # Entity breakdown
        entity_types = {}
        for entity in mock_response['entities']:
            entity_type = entity['type']
            entity_types[entity_type] = entity_types.get(entity_type, 0) + 1
        
        entity_table = Table(title="Entities by Type")
        entity_table.add_column("Type", style="cyan")
        entity_table.add_column("Count", style="green", justify="right")
        entity_table.add_column("Example", style="yellow")
        
        for entity_type, count in entity_types.items():
            example = next((e['text'] for e in mock_response['entities'] if e['type'] == entity_type), "")
            if len(example) > 40:
                example = example[:37] + "..."
            entity_table.add_row(entity_type, str(count), example)
        
        console.print(entity_table)
        
        return {
            "chunk_size": len(chunk),
            "prompt_size": len(prompt),
            "processing_time": elapsed,
            "entities_found": len(mock_response['entities']),
            "entity_types": entity_types
        }
    
    def run_comprehensive_test(self, document: str, model_config: VLLMConfig):
        """Run comprehensive test suite with detailed inspection"""
        console.print(Panel.fit(
            f"[bold cyan]Comprehensive Test Suite[/bold cyan]\n"
            f"Model: {model_config.model_name}\n"
            f"Document: {len(document):,} characters",
            title="Test Suite"
        ))
        
        # Test 1: Process Inspection
        console.print("\n[bold yellow]═══ Test 1: Process Inspection ═══[/bold yellow]")
        inspection_result = self.inspect_extraction_process(document, 4000)
        
        # Test 2: Find breaking point
        console.print("\n[bold yellow]═══ Test 2: Breaking Point Detection ═══[/bold yellow]")
        breaking_point = self.detector.find_breaking_point(
            document, 
            start=1000,
            increment=500,
            max_size=50000
        )
        
        console.print(Panel.fit(
            f"[bold green]Breaking Point Found![/bold green]\n"
            f"Maximum successful size: {breaking_point:,} characters",
            title="Result"
        ))
        
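        # Test 3: Compare strategies at safe size
        # Clamp to at least 100 chars so a very low breaking point cannot yield a zero or negative size
        safe_size = max(min(breaking_point - 100, 4000), 100)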
        console.print(f"\n[bold yellow]═══ Test 3: Strategy Comparison ({safe_size:,} chars) ═══[/bold yellow]")
        
        strategies = [
            UnifiedExtraction(self.vllm),
            MultipassExtraction(self.vllm),
            IndividualExtraction(self.vllm)
        ]
        
        strategy_results = []
        for strategy in strategies:
            chunk = create_word_boundary_chunk(document, safe_size)
            console.print(f"\n[bold]Testing {strategy.__class__.__name__}...[/bold]")
            result = strategy.extract(chunk)
            result['char_count'] = len(chunk)
            strategy_results.append(result)
        
        # Compare strategies
        comparison_table = Table(title="Strategy Comparison")
        comparison_table.add_column("Strategy", style="cyan")
        comparison_table.add_column("Entities", style="green", justify="right")
        comparison_table.add_column("Time (s)", style="yellow", justify="right")
        comparison_table.add_column("Passes", style="magenta", justify="right")
        comparison_table.add_column("Efficiency", style="blue", justify="right")
        
        for result in strategy_results:
            efficiency = f"{result['entity_count']/result['time']:.1f} ent/s" if result['time'] > 0 else "N/A"
            comparison_table.add_row(
                result['strategy'].title(),
                str(result['entity_count']),
                f"{result['time']:.2f}",
                str(result['passes']),
                efficiency
            )
        
        console.print(comparison_table)
        
        # Test 4: Performance at different sizes
        console.print(f"\n[bold yellow]═══ Test 4: Performance Scaling ═══[/bold yellow]")
        # De-duplicate (safe_size may equal 4000) and drop non-positive sizes
        test_sizes = sorted(s for s in {1000, 2000, 3000, 4000, safe_size} if s > 0)
        perf_results = []
        
        perf_table = Table(title="Performance at Different Sizes")
        perf_table.add_column("Size", style="cyan", justify="right")
        perf_table.add_column("Status", style="green", justify="center")
        perf_table.add_column("Time (s)", style="yellow", justify="right")
        perf_table.add_column("Entities", style="magenta", justify="right")
        perf_table.add_column("Rate", style="blue", justify="right")
        
        for size in test_sizes:
            if size <= breaking_point:
                result = self.detector.test_character_count(document, size)
                perf_results.append(result)
                
                status = "✓" if result['success'] else "✗"
                rate = f"{size/result['time_seconds']:.0f} c/s" if result['time_seconds'] > 0 else "N/A"
                
                perf_table.add_row(
                    f"{size:,}",
                    status,
                    f"{result['time_seconds']:.2f}",
                    str(result['entities_found']),
                    rate
                )
        
        console.print(perf_table)
        
        return {
            "breaking_point": breaking_point,
            "strategy_results": strategy_results,
            "performance_results": perf_results,
            "model": model_config.model_name,
            "inspection_result": inspection_result
        }
    
    def generate_report(self, test_results: Dict):
        """Generate comprehensive test report with Rich formatting"""
        
        console.print("\n[bold cyan]═══ Final Test Report ═══[/bold cyan]")
        
        # Model Configuration
        console.print(Panel.fit(
            f"[bold]Model Configuration[/bold]\n"
            f"Model: {test_results['model']}\n"
            f"Breaking Point: {test_results['breaking_point']:,} characters",
            title="Configuration"
        ))
        
        # Strategy Comparison
        console.print("\n[bold]Strategy Performance Summary:[/bold]")
        for result in test_results['strategy_results']:
            console.print(f"  • {result['strategy'].title()}: {result['entity_count']} entities in {result['time']:.2f}s ({result['passes']} passes)")
        
        # Performance Analysis
        if test_results['performance_results']:
            successful = [r for r in test_results['performance_results'] if r['success']]
            if successful:
                avg_time = sum(r['time_seconds'] for r in successful) / len(successful)
                avg_entities = sum(r['entities_found'] for r in successful) / len(successful)
                
                console.print(Panel.fit(
                    f"[bold]Performance Statistics[/bold]\n"
                    f"Tests Run: {len(test_results['performance_results'])}\n"
                    f"Successful: {len(successful)}\n"
                    f"Average Time: {avg_time:.2f}s\n"
                    f"Average Entities: {avg_entities:.1f}",
                    title="Statistics"
                ))
        
        # Recommendations
        console.print("\n[bold]Recommendations:[/bold]")
        console.print(f"  • Optimal chunk size: {max(test_results['breaking_point'] - 500, 0):,} characters")
        
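        # Find best strategy (lowest time per entity), ignoring strategies that found nothing
        productive = [r for r in test_results['strategy_results'] if r['entity_count'] > 0]
        if productive:
            best_strategy = min(productive, key=lambda x: x['time'] / x['entity_count'])
            console.print(f"  • Best strategy: {best_strategy['strategy'].title()}")
        else:
            console.print("  • Best strategy: N/A (no strategy extracted any entities)")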
        console.print(f"  • Use chunking for documents > {test_results['breaking_point']:,} characters")
        
        return "Report generated successfully"

# Initialize tester
tester = AutomatedTester(vllm_controller)
console.print("[green]✓ Automated tester initialized with detailed inspection[/green]")
console.print("[green]✓ Ready to run comprehensive tests with full process visibility[/green]")
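
The multipass and individual strategies concatenate results from several passes, so the same entity can be counted more than once and inflate `entity_count` in the strategy comparison. A minimal sketch of de-duplicating by type and normalized text before comparing counts is shown below. The helper name is hypothetical, and the sketch assumes each entity is a dict with `type` and `text` keys, matching the JSON shape requested in the prompts.

```python
from typing import Dict, List


def dedupe_entities(entities: List[Dict]) -> List[Dict]:
    """Drop repeated entities, keeping the first occurrence of each (type, text) pair."""
    seen = set()
    unique = []
    for entity in entities:
        if not isinstance(entity, dict):
            continue  # Skip malformed items the model may have emitted
        # Normalize whitespace and case so trivial variations match
        text = " ".join(str(entity.get("text", "")).split()).lower()
        key = (entity.get("type"), text)
        if key not in seen:
            seen.add(key)
            unique.append(entity)
    return unique
```

Applying this to `result['entities']` before computing counts would make the per-strategy numbers more comparable. Whether case-insensitive matching is appropriate for legal citations is a judgment call left to the reader.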

## 9. Main Testing Interface

In [25]:
# Main testing parameters
TEST_PARAMS = {
    "start_size": 1000,
    "increment": 500,
    "max_size": 50000,
    "model": "granite",  # or "qwen"
    "config_preset": "balanced",
    "test_document": None,  # Will load default
}

# Display current configuration
table = Table(title="Current Test Configuration")
table.add_column("Parameter", style="cyan")
table.add_column("Value", style="green")

for key, value in TEST_PARAMS.items():
    table.add_row(key, str(value))

console.print(table)
print("✓ Test parameters configured")
print("✓ Ready to start testing")

✓ Test parameters configured
✓ Ready to start testing
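
The `start_size`, `increment`, and `max_size` parameters describe a linear sweep of chunk sizes. A minimal sketch of the sizes such a sweep would visit follows, assuming the sweep includes `max_size` itself. The function name is hypothetical, and the detector's actual stepping logic is defined earlier in the notebook and may differ.

```python
from typing import List


def sweep_sizes(start_size: int, increment: int, max_size: int) -> List[int]:
    """List the chunk sizes a linear sweep would test, from start_size up to max_size."""
    if start_size <= 0 or increment <= 0:
        raise ValueError("start_size and increment must be positive")
    if max_size < start_size:
        raise ValueError("max_size must be at least start_size")
    return list(range(start_size, max_size + 1, increment))


sizes = sweep_sizes(1000, 500, 50000)
print(f"{len(sizes)} sizes, from {sizes[0]:,} to {sizes[-1]:,} chars")
```

Knowing the number of steps up front gives a rough upper bound on how many model calls a full sweep may issue before it stops at the breaking point.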


In [None]:
# Function to run complete test with detailed inspection
def run_breaking_point_test(
    start_size: int = 1000,
    increment: int = 500,
    max_size: int = 50000,
    model: str = "granite"
):
    """Run complete breaking point test with detailed inspection"""
    
    # Load test document
    console.print("[bold cyan]═══ Loading Test Document ═══[/bold cyan]")
    document = tester.load_test_document()
    
    # Select model configuration
    if model == "granite":
        model_config = GRANITE_CONFIG
    else:
        model_config = QWEN_CONFIG
    
    # Display test configuration
    config_table = Table(title="Test Configuration")
    config_table.add_column("Parameter", style="cyan")
    config_table.add_column("Value", style="green")
    
    config_table.add_row("Model", model_config.model_name)
    config_table.add_row("Start Size", f"{start_size:,} chars")
    config_table.add_row("Increment", f"{increment:,} chars")
    config_table.add_row("Max Size", f"{max_size:,} chars")
    config_table.add_row("Extraction Strategy", EXTRACTION_STRATEGY)
    config_table.add_row("Extraction Profile", EXTRACTION_PROFILE)
    
    console.print(config_table)
    
    # Load model (in production)
    console.print(f"\n[bold]Loading {model} model...[/bold]")
    console.print("[yellow]Note: In production, this would load the actual vLLM model[/yellow]")
    console.print("[yellow]For testing, using mock responses[/yellow]")
    
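    # Run comprehensive test with inspection
    # Note: run_comprehensive_test() uses its own hard-coded sweep (start=1000,
    # increment=500, max_size=50000); start_size, increment and max_size above
    # are displayed but not forwarded.
    test_results = tester.run_comprehensive_test(document, model_config)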
    
    # Generate and display report
    report = tester.generate_report(test_results)
    
    # Plot results if available (the test run records its results on tester.detector)
    if tester.detector.results:
        results_df = tester.detector.get_results_df()
        console.print("\n[bold]Generating Performance Plots...[/bold]")
        fig = plot_breaking_point_analysis(results_df)
        plt.show()
    
    return test_results

# Function to run just the inspection
def run_process_inspection(text_size: int = 4000):
    """Run detailed process inspection for a specific text size"""
    
    console.print(Panel.fit(
        "[bold cyan]Process Inspection Demo[/bold cyan]\n"
        f"This will show detailed extraction process for {text_size:,} chars",
        title="Demo"
    ))
    
    # Load document
    document = tester.load_test_document()
    
    # Run inspection
    inspection_result = tester.inspect_extraction_process(document, text_size)
    
    console.print(Panel.fit(
        "[bold green]Inspection Complete![/bold green]\n"
        f"Chunk size: {inspection_result['chunk_size']:,} chars\n"
        f"Prompt size: {inspection_result['prompt_size']:,} chars\n"
        f"Processing time: {inspection_result['processing_time']:.2f}s\n"
        f"Entities found: {inspection_result['entities_found']}",
        title="Results"
    ))
    
    return inspection_result

console.print("[green]✓ Test runner functions ready[/green]")
console.print("\n[bold]Available test functions:[/bold]")
console.print("  • [cyan]run_breaking_point_test()[/cyan] - Run complete test suite with breaking point detection")
console.print("  • [cyan]run_process_inspection(4000)[/cyan] - Inspect extraction process for specific size")
console.print("  • [cyan]quick_test()[/cyan] - Run quick test with small document (defined in Section 10)")

## 10. Quick Test Example

In [27]:
# Quick test with small document
def quick_test():
    """Run a quick test with small document"""
    test_doc = """Smith v. Jones, 123 F.3d 456 (9th Cir. 2020). 
    The defendant ABC Corporation was represented by John Doe.""" * 20
    
    console.print("[bold cyan]Running Quick Test[/bold cyan]")
    console.print(f"Document size: {len(test_doc)} characters")
    
    # Test extraction
    chunk_sizes = [100, 500, 1000, 1500]
    
    for size in chunk_sizes:
        if size <= len(test_doc):
            chunk = create_word_boundary_chunk(test_doc, size)
            console.print(f"Testing {size} chars: chunk has {len(chunk)} chars")
            # In production, would actually extract entities here
    
    console.print("✓ Quick test complete")

# Run quick test
quick_test()

In [None]:
# Example: Run Process Inspection Demo
console.print("[bold magenta]═══ Running Process Inspection Demo ═══[/bold magenta]")

# This demonstrates the detailed process inspection
# It shows every step of the extraction process with Rich console output
inspection_result = run_process_inspection(2000)

console.print("\n[bold green]Demo Complete![/bold green]")
console.print("[yellow]To run full breaking point test, execute: run_breaking_point_test()[/yellow]")

## Summary and Next Steps

This notebook provides:
1. **Complete vLLM control** using Python API (not HTTP service)
2. **Dynamic breaking point detection** starting from 1000 characters
3. **Chunking before extraction** for proper document processing
4. **Multiple extraction strategies** for comparison
5. **Comprehensive testing and analysis** tools

### To use this notebook:
1. Ensure vLLM is installed and models are available
2. Load your test document
3. Run `run_breaking_point_test()` to find the breaking point
4. Analyze results and optimize parameters
5. Test different strategies and configurations

### Key findings will include:
- Exact character count where extraction fails
- Optimal extraction strategy for different document sizes
- Performance metrics and bottlenecks
- Best vLLM configuration for your use case