# 🔬 Aavishkar.ai Expert System Notebook

<img src="https://github.com/astitvac/AI4Science/raw/main/assets/AA_Main_Banner.jpg" alt="Aavishkar.ai Banner" width="600"/>

### <span style="color:#6C5CE7;">AI for Science</span>
<small><i>Democratizing advanced AI capabilities for scientific research</i></small>

---

## 👋 Welcome to Aavishkar.ai Expert Systems!

<small>This notebook is part of the <b>Aavishkar.ai AI4Science</b> initiative, which develops LLM-based expert systems to enhance scientific research workflows. Our expert systems formalize scientific cognitive processes using Large Language Models, structured knowledge representations, and interactive interfaces.</small>

### 🧠 About This Expert System

<small>This notebook implements one of the five scientific cognitive archetypes developed by Aavishkar.ai:</small>

<small>
1. 📚 <b>Literature Synthesist</b>: Identifies patterns, contradictions, and knowledge gaps across research corpora<br>
2. 🧪 <b>Experimental Architect</b>: Translates abstract hypotheses into methodologically sound experimental designs<br>
3. 📊 <b>Analytical Navigator</b>: Constructs adaptive analytical pathways through complex datasets<br>
4. 📝 <b>Research Documentarian</b>: Structures and articulates scientific findings and methodologies<br>
5. 🔄 <b>Interdisciplinary Translator</b>: Establishes conceptual bridges between disparate knowledge domains
</small>

### 👥 Who Can Use This?

<small>
Aavishkar.ai tools are designed for all practitioners of hypothesis-driven science:<br>
• 🎓 Academic researchers and students<br>
• 🏢 Commercial/industrial researchers<br>
• 🏛️ Government scientists<br>
• 🔭 Citizen scientists<br>
• 🧩 Independent researchers
</small>

<small>No matter your technical background or institutional affiliation, this notebook provides accessible AI capabilities for rigorous scientific work.</small>

---

### ⚙️ Setup Instructions

<small>

**Google Colab**
* Click on "Runtime" in the menu
* Select "Run all" to install dependencies and initialize the system
* Ensure you have your API keys ready for the LLM provider

**Local Environment**
* Ensure you have Python 3.8+ installed
* Install dependencies by running the installation cell below
* Set up your API keys as instructed in the initialization section

**Prerequisites**
* Python 3.8+
* API key for OpenAI or Google Vertex AI
* Basic familiarity with Jupyter notebooks

</small>

---

### 📜 License

<small>This project is licensed under the <b>MIT License</b></small>

<small>
<details>
<summary>View License Text</summary>
MIT License<br><br>
Copyright (c) 2023-2024 Aavishkar.ai<br><br>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:<br><br>
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
</details>
</small>

<small>

### 🔗 Connect with Aavishkar.ai
* 📦 **GitHub**: [github.com/astitvac/AI4Science](https://github.com/astitvac/AI4Science)
* 🌐 **Website**: [aavishkar.ai](https://aavishkar.ai)
* 💬 **Community**: [Discord](https://discord.gg/aavishkar)
* 🤝 **Contribute**: [Contribution Guidelines](https://github.com/astitvac/AI4Science/tree/main/Contributing)

</small>

## ⚙️ Installation
<small>

This cell installs all required dependencies for this expert system notebook. The installation process uses uv for faster package management when available, with automatic fallback to standard pip.

**Key components being installed**
* LLM frameworks: LangChain and provider-specific libraries
* Data modeling: Pydantic
* UI: Gradio
* Core utilities: Data processing and visualization libraries

**Troubleshooting tips**
* If you encounter errors, try running the cell again
* For persistent issues, check your Python version (3.8+ required)
* In Colab, restart the runtime if packages aren't recognized after installation

**Note**: Initial installation may take 1-2 minutes to complete. A confirmation message will appear when successful.

</small>
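
The uv-then-pip fallback described above can be sketched as a small helper. This is an illustrative sketch rather than the notebook's installation cell: `pick_installer` and `build_install_command` are hypothetical names, and the `--system` flag is included on the assumption that no virtual environment is active, which is typical in Colab.

```python
import shutil
import sys
from typing import List, Optional


def pick_installer(uv_path: Optional[str] = None) -> List[str]:
    """Return the base install command, preferring uv when it is available."""
    # Allow the caller to inject a path (useful for testing); otherwise search PATH.
    uv = uv_path if uv_path is not None else shutil.which("uv")
    if uv:
        # Assumption: no virtual environment is active, so target the system interpreter.
        return [uv, "pip", "install", "--system"]
    # Fallback: the pip module that belongs to the running interpreter.
    return [sys.executable, "-m", "pip", "install"]


def build_install_command(packages: str, uv_path: Optional[str] = None) -> List[str]:
    """Build an argument list for one package category (space-separated names)."""
    return pick_installer(uv_path) + ["-q"] + packages.split()


print(build_install_command("numpy pandas"))
```

Building the command as a list and passing it to `subprocess.run` without `shell=True` avoids shell quoting issues when package names contain extras such as `package[extra]`.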

In [None]:
# Installation
import sys, os, subprocess, time
from IPython.display import HTML, display

# All required packages (no version constraints for better future-proofing)
ALL_PACKAGES = {
    "core": "langchain pydantic python-dotenv langgraph",
    "providers": "langchain-openai langchain-google-vertexai langchain_experimental",
    "ui": "gradio",
    "data": "numpy pandas matplotlib plotly",
    "documents": "pypdf PyPDF2 pillow",
    "vectors": "chromadb sentence-transformers"
}

# Environment detection
IN_COLAB = 'google.colab' in sys.modules

def show(msg, type="info"):
    """Message Formatting"""
    colors = {"info": "#3a7bd5", "success": "#00c853", "warning": "#f57c00", "error": "#d50000"}
    icons = {"info": "ℹ️", "success": "✅", "warning": "⚠️", "error": "❌"}
    display(HTML(f"<div style='color:white; background:{colors[type]}; padding:5px; margin:2px 0; border-radius:3px'>{icons[type]} {msg}</div>"))

def install_packages():
    """Install all packages"""
    start = time.time()
    show("Starting installation...", "info")
    
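    # Try to use uv for faster installation
    try:
        subprocess.run("pip install -q uv", shell=True, check=True, timeout=30)
        installer = "uv pip"
    except Exception:
        # uv could not be installed (or timed out); use standard pip instead
        installer = "pip"
    
    # Install each category
    success_count = 0
    total_categories = len(ALL_PACKAGES)
    
    for category, packages in ALL_PACKAGES.items():
        try:
            # Install entire category at once for speed
            cmd = f"{installer} install -q {packages}"
            result = subprocess.run(cmd, shell=True, capture_output=True, timeout=120)
            
            # uv can be installed yet still fail (for example, when no virtual
            # environment is active), so retry the category with standard pip
            if result.returncode != 0 and installer != "pip":
                cmd = f"pip install -q {packages}"
                result = subprocess.run(cmd, shell=True, capture_output=True, timeout=120)
            
            if result.returncode == 0:
                success_count += 1
        except Exception:
            pass  # Failure is reflected in the final success rate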
    
    # Simple verification of core packages
    try:
        import langchain
        import pydantic
        import gradio
        verification = "with verification"
    except ImportError:
        verification = "with partial verification failures"
    
    # Single completion message with success rate
    elapsed = time.time() - start
    success_rate = int((success_count / total_categories) * 100)
    show(f"Installation completed in {elapsed:.1f}s ({success_rate}% success) {verification}", 
         "success" if success_rate > 80 else "warning")
    
    return success_rate > 80

# Run installation
install_packages()

## 🔧 Initialization

<small>This section configures the LLM provider, API keys, and core components needed for this expert system. The implementation follows a modular architecture that supports multiple AI providers and environments.</small>

## Purpose

<small>The initialization process:
1. **Sets up environment variables** including API keys
2. **Configures the LLM provider** with appropriate models and settings
3. **Initializes specialized capabilities** when needed (e.g., vision, embedding)
4. **Validates the environment** to ensure all requirements are met
</small>

## Configuration Options

<small>
You can customize the initialization by adjusting these parameters:

| Parameter | Description | Default |
|-----------|-------------|---------|
| **Provider** | AI service to use (OpenAI, Google, etc.) | OpenAI |
| **Model** | Specific model name | Depends on provider |
| **Temperature** | Sampling randomness (0.0-1.0); lower values give more consistent output | 0.7 |
| **Features** | Additional capabilities to enable | None |

**💡 Tip**: For reproducible results, use lower temperature values (0.0-0.3).
</small>
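
As a rough illustration of the parameters in the table, the sketch below groups them into one validated settings object. `LLMSettings` is a hypothetical name that does not appear in this notebook, and the `gpt-4o` default is simply the value preselected in the initialization form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container mirroring the configuration table."""
    provider: str = "OpenAI"
    model: str = "gpt-4o"  # Assumption: the form's preselected model
    temperature: float = 0.7
    features: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject temperatures outside the documented 0.0-1.0 range.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError(f"temperature must be within [0.0, 1.0], got {self.temperature}")


# A low temperature, as suggested in the tip, for more reproducible output.
reproducible = LLMSettings(temperature=0.2)
print(reproducible)
```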

## Provider Support

<small>
This notebook supports these LLM providers:

- **OpenAI**: GPT-4, GPT-3.5-Turbo
- **Google**: Gemini Pro, PaLM
- **Anthropic**: Claude (optional)
- **Local**: Ollama with various models (optional)

**Note**: Different providers may have varying capabilities and pricing structures.
</small>

## Setup Instructions

<small>
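**For Google Colab:**
1. Store your OpenAI API key in a Colab secret named `openai_api_key`, or paste it into the `api_key` form field
2. Select your model and embedding model from the form dropdowns
3. Run the initialization cell

**For Local Environment:**
1. Create a `.env` file with your API keys
2. Set the `model` and `embedding_model` variables at the top of the initialization cell
3. Run the initialization cell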

**API Key Variables:**
- OpenAI: `OPENAI_API_KEY`
- Google: `GOOGLE_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`
</small>
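
The key lookup can be pictured as a simple precedence chain: an explicit form value wins, then the environment variable (which is also where values from a `.env` file end up once loaded). The sketch below is a simplified, standard-library-only illustration; `resolve_api_key` is a hypothetical helper, and it leaves out the Colab secrets step.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    form_value: str = "",
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "OPENAI_API_KEY",
) -> Optional[str]:
    """Return the first non-empty key: form field first, then environment."""
    env = os.environ if env is None else env
    for candidate in (form_value, env.get(var_name, "")):
        candidate = candidate.strip()
        if candidate:
            return candidate
    # Nothing configured: the caller decides how to warn the user.
    return None


# Placeholder values only; never hard-code a real key in a notebook.
print(resolve_api_key("", env={"OPENAI_API_KEY": "key-from-env"}))
```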

## Troubleshooting

<small>
Common issues:
- **Authentication errors**: Check your API key is correctly set
- **Model unavailability**: Ensure you have access to the specified model
- **Import errors**: Run the installation cell first
- **Memory issues**: Select a smaller model or reduce context length

The initialization cell includes diagnostics that will help identify any configuration problems.
</small>
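
For the import and version issues listed above, a quick manual check can narrow down the cause before rerunning the cells. This is a minimal sketch under stated assumptions: `check_environment` is a hypothetical helper, not the diagnostics built into the initialization cell, and the default module list is taken from the installation cell's verification step.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_environment(
    modules: Iterable[str] = ("langchain", "pydantic", "gradio"),
    min_python: Tuple[int, int] = (3, 8),
) -> Dict[str, bool]:
    """Report whether the Python version and each top-level module are usable."""
    report = {"python_version_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```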


In [None]:
# 🔧 LLM Setup
import os, sys
from IPython.display import Markdown, display
from typing import Dict, Any, Tuple, Optional

# Colab form fields for configuration
# @title LLM Configuration
api_key = "" # @param {type:"string"}
model = "gpt-4o" # @param ["gpt-4o", "gpt-4-turbo", "gpt-4", "gpt-3.5-turbo"]
embedding_model = "text-embedding-3-small" # @param ["text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"]
temperature = 0.7 # @param {type:"slider", min:0, max:1, step:0.1}
debug = False # @param {type:"boolean"}

# Environment detection
IN_COLAB = 'google.colab' in sys.modules

def show(msg, type="info"):
    """Display styled message"""
    if type == "debug" and not debug:
        return
    colors = {"success": "#00C853", "info": "#2196F3", "warning": "#FF9800", "error": "#F44336", "debug": "#9C27B0"}
    icons = {"success": "✅", "info": "ℹ️", "warning": "⚠️", "error": "❌", "debug": "🔍"}
    display(Markdown(f"<div style='padding:8px;border-radius:4px;background:{colors[type]};color:white'>{icons[type]} {msg}</div>"))

def get_api_key() -> Optional[str]:
    """Get API key from various possible sources"""
    # Check form input first
    key = api_key
    
    # Try Colab secret if empty and in Colab
    if not key and IN_COLAB:
        try:
            from google.colab import userdata
            key = userdata.get('openai_api_key')
            if key:
                show("API key loaded from Colab secret", "success")
        except Exception as e:
            show(f"Error accessing Colab secrets: {e}", "debug")
    
    # Try environment variable
    if not key:
        key = os.environ.get("OPENAI_API_KEY", "")
        if key:
            show("API key loaded from environment variable", "debug")
    
    # Try .env file
    if not key:
        try:
            from dotenv import load_dotenv
            load_dotenv()
            key = os.environ.get("OPENAI_API_KEY", "")
            if key:
                show("API key loaded from .env file", "debug")
        except ImportError:
            pass  # python-dotenv is optional; skip .env loading if it is missing
    
    # Final check and request if needed
    if not key:
        if IN_COLAB:
            show("""
            No API key found. Either:
            1. Add it in the form field above
            2. Set a Colab secret named 'openai_api_key'
            """, "warning")
        else:
            show("No API key found. Add it in the form field or set OPENAI_API_KEY environment variable", "warning")
        return None
        
    return key

# === PROVIDER-SPECIFIC: OPENAI ===
def initialize_models(api_key: str) -> Tuple[Optional[Any], Optional[Any]]:
    """Initialize OpenAI models with the provided API key"""
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    
    # Set environment variable for consistency
    os.environ["OPENAI_API_KEY"] = api_key
    
    try:
        llm = ChatOpenAI(
            model_name=model,
            temperature=temperature,
            openai_api_key=api_key
        )
        
        embeddings = OpenAIEmbeddings(
            model=embedding_model,
            openai_api_key=api_key
        )
        
        show(f"OpenAI initialized with {model} and {embedding_model}", "success")
        return llm, embeddings
        
    except Exception as e:
        show(f"Error initializing OpenAI: {e}", "error")
        return None, None
# === END PROVIDER-SPECIFIC ===

def initialize_llm() -> Tuple[Optional[Any], Optional[Any]]:
    """Main function to set up and initialize LLM"""
    show("Initializing LLM...", "info")
    
    # Get API key
    key = get_api_key()
    if not key:
        return None, None
    
    # Initialize models
    llm, embeddings = initialize_models(key)
    
    if llm and embeddings:
        show("Initialization complete! LLM and embeddings ready to use.", "success")
    
    return llm, embeddings

# Run initialization
llm, embeddings = initialize_llm()

In [2]:
# 🛠️ Core Utilities
"""
Core utilities for Aavishkar.ai expert systems.
Includes logging, caching, error handling, and JSON parsing.
"""

import os, json, time, hashlib, functools
from typing import Dict, Any, Optional, Callable, Union
from IPython.display import Markdown, display

# === GLOBAL SETTINGS ===
DEBUG_MODE = False
CACHE_ENABLED = True
CACHE_DIR = "./cache"
os.makedirs(CACHE_DIR, exist_ok=True)

# === DISPLAY & ERROR HANDLING ===
def show(msg: str, level: str = "info") -> None:
    """Display formatted message with appropriate styling.
    
    Args:
        msg: Message to display
        level: Message level (success, info, warning, error, debug)
    """
    colors = {"success": "#00C853", "info": "#2196F3", "warning": "#FF9800", "error": "#F44336", "debug": "#9C27B0"}
    icons = {"success": "✅", "info": "ℹ️", "warning": "⚠️", "error": "❌", "debug": "🔍"}
    
    if level == "debug" and not DEBUG_MODE:
        return
        
    color = colors.get(level, colors["info"])
    icon = icons.get(level, icons["info"])
    display(Markdown(f"<div style='padding:6px;border-radius:4px;background:{color};color:white'>{icon} {msg}</div>"))

def retry(max_attempts: int = 3, delay: float = 1.0) -> Callable:
    """Decorator for retrying functions with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts:
                        raise
                    wait = delay * (2 ** (attempt - 1))
                    show(f"Attempt {attempt} failed: {str(e)}. Retrying in {wait:.1f}s...", "warning")
                    time.sleep(wait)
        return wrapper
    return decorator

# === CACHE SYSTEM ===
def cache_key(**kwargs) -> str:
    """Generate a cache key from input parameters."""
    # default=str keeps key generation from failing on non-JSON-serializable values
    serialized = json.dumps({k: v for k, v in kwargs.items() if v is not None}, sort_keys=True, default=str)
    return hashlib.md5(serialized.encode()).hexdigest()

def get_cache(key: str) -> Optional[Any]:
    """Get item from cache if available and not expired."""
    if not CACHE_ENABLED:
        return None
        
    path = os.path.join(CACHE_DIR, f"{key}.json")
    if not os.path.exists(path):
        return None
        
    try:
        with open(path, 'r') as f:
            data = json.load(f)
            
        # Check if expired (default: 1 day)
        if time.time() - data.get("timestamp", 0) > 86400:
            return None
            
        return data.get("value")
    except Exception:
        return None  # Treat unreadable or malformed cache files as a miss

def set_cache(key, value):
    """Store value in cache with current timestamp."""
    if not CACHE_ENABLED:
        return
        
    path = os.path.join(CACHE_DIR, f"{key}.json")
    try:
        # Handle Pydantic models by converting to dictionaries
        def serialize_pydantic(obj):
            if hasattr(obj, 'model_dump'):  # Pydantic v2 models use model_dump
                return obj.model_dump()
            elif hasattr(obj, 'dict'):      # Older Pydantic models use dict()
                return obj.dict()
            elif isinstance(obj, list):
                return [serialize_pydantic(item) for item in obj]
            elif isinstance(obj, dict):
                return {k: serialize_pydantic(v) for k, v in obj.items()}
            return obj
            
        serialized_value = serialize_pydantic(value)
        
        with open(path, 'w') as f:
            json.dump({"timestamp": time.time(), "value": serialized_value}, f)
            
    except Exception as e:
        show(f"Cache write error: {str(e)}", "debug")

def clear_cache(older_than: Optional[int] = None) -> int:
    """Clear cache entries, optionally only those older than specified seconds.
    
    Returns:
        Number of entries cleared
    """
    if not os.path.exists(CACHE_DIR):
        return 0
        
    count = 0
    for filename in os.listdir(CACHE_DIR):
        if not filename.endswith('.json'):
            continue
            
        path = os.path.join(CACHE_DIR, filename)
        
        if older_than is not None:
            try:
                with open(path, 'r') as f:
                    data = json.load(f)
                if time.time() - data.get("timestamp", 0) <= older_than:
                    continue
            except Exception:
                pass  # Unreadable or malformed entries are removed below
        
        try:
            os.remove(path)
            count += 1
        except OSError:
            pass
            
    return count

# === LLM & JSON HELPERS ===
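def call_llm_with_cache(llm, prompt: str, **kwargs) -> Any:
    """Call LLM with caching to avoid redundant API calls.
    
    Note: a cache hit returns the JSON-serialized form written by set_cache
    (for example, a dict for Pydantic-based message objects), not the
    original response object.
    """
    key = cache_key(prompt=prompt, **kwargs) if CACHE_ENABLED else None
    
    if key is not None:
        cached = get_cache(key)
        if cached is not None:
            show("Using cached response", "debug")
            return cached
    
    response = llm.invoke(prompt, **kwargs)
    
    if key is not None:
        set_cache(key, response)
    
    return response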

def parse_json_safely(text: str, default: Any = None) -> Any:
    """Extract and parse JSON from text with multiple fallback strategies."""
    import re
    
    # Try direct parsing first
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        pass
    
    # Try to extract JSON blocks
    try:
        # Try code blocks with JSON
        if "```json" in text:
            json_block = text.split("```json")[1].split("```")[0].strip()
            return json.loads(json_block)
            
        # Try any code blocks
        if "```" in text:
            code_block = text.split("```")[1].strip()
            if code_block.startswith(("{", "[")):
                return json.loads(code_block)
        
        # Scan for the first position where a valid JSON object or array begins.
        # raw_decode handles nested structures, which a non-greedy regex cannot.
        decoder = json.JSONDecoder()
        for match in re.finditer(r'[\[{]', text):
            try:
                value, _ = decoder.raw_decode(text, match.start())
                return value
            except ValueError:
                continue
    except (TypeError, ValueError, IndexError):
        pass
    
    return default
    
    return default

# Initialization
show("Core utilities initialized", "debug")
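
The waiting times produced by the `retry` decorator follow `delay * 2 ** (attempt - 1)`. The short sketch below reproduces that schedule so the total worst-case wait can be estimated before choosing `max_attempts`; `backoff_schedule` is a hypothetical helper added for illustration only.

```python
from typing import List


def backoff_schedule(max_attempts: int = 3, delay: float = 1.0) -> List[float]:
    """Waits (in seconds) between attempts, matching delay * 2 ** (attempt - 1)."""
    # No wait follows the final attempt, because the exception is re-raised instead.
    return [delay * (2 ** (attempt - 1)) for attempt in range(1, max_attempts)]


print(backoff_schedule())        # [1.0, 2.0]
print(backoff_schedule(5, 0.5))  # [0.5, 1.0, 2.0, 4.0]
print(sum(backoff_schedule()))   # 3.0 seconds of waiting in the worst case
```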

## 🧪 Experimental Design Optimizer

## Purpose
Analyzes experimental protocols from scientific papers to identify methodological improvements, ensuring greater rigor, reproducibility, and statistical validity.

## Core Functions

* **Protocol Extraction**
  * Identifies variables, controls, and experimental procedures
  * Extracts sample sizes and statistical approaches
  * Captures methodological details and measurement techniques

* **Literature Search**
  * Finds related protocols in scientific literature
  * Identifies methodological standards and best practices
  * Discovers field-specific guidelines and requirements

* **Weakness Analysis**
  * Evaluates statistical rigor and power
  * Identifies potential sources of bias
  * Assesses reproducibility and reporting completeness

* **Protocol Optimization**
  * Generates specific improvements with justifications
  * Enhances controls and randomization procedures
  * Aligns with field-specific standards and practices

* **Validation**
  * Verifies optimized protocol against methodological standards
  * Provides confidence ratings for recommendations
  * Highlights areas requiring human expert review
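
To make the "statistical rigor and power" check more concrete, the sketch below shows one calculation a weakness analysis might compare against a reported sample size. It is an illustration, not the system's method: it uses the standard normal approximation for a two-sided, two-group comparison of means, and `sample_size_per_group` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sample comparison of means.

    Uses n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2, where d is the
    standardized effect size (Cohen's d). An exact t-test calculation
    gives slightly larger values.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(0.8))  # 25
```

A protocol reporting far fewer participants per group than this estimate for its expected effect size would be a candidate for an "underpowered" flag, subject to expert review.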

## Input

* PDF upload, plain text, or structured method description
* Best results with papers containing detailed methods sections
* Works with protocols from various scientific domains

## Usage

1. Run setup cells (installation and initialization)
2. Upload or input a scientific paper's methods section
3. Extract the experimental protocol
4. Review and initiate optimization process
5. Explore improvements and supporting literature

**Note**: The system assists researchers in designing more robust experiments but should be used alongside human expertise.

---

*Implementation of the Experimental Architect cognitive archetype from Aavishkar.ai*


## Data Models

## Purpose

<small>
This section defines the structured data representations that power our Experimental Architect system. These Pydantic models perform two essential functions:

1. **Represent Knowledge**: Define how experimental protocols, variables, procedures, and analysis methods are structured
2. **Control System Behavior**: Configure how the system processes and analyzes content
</small>

## How It Works

<small>
Our system uses Pydantic models to ensure data validation and clear structure. Think of these models as "smart containers" that:

- Validate data to prevent errors
- Provide helpful error messages when something is wrong
- Include documentation for each field
- Support extensibility for specialized needs
</small>
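
A small example may help show what "smart container" means in practice. The model below is hypothetical and exists only for this illustration (it is not one of the notebook's models); it assumes Pydantic is installed, as it is after the installation cell.

```python
from pydantic import BaseModel, Field, ValidationError


class Measurement(BaseModel):
    """Hypothetical model used only for this illustration."""
    name: str = Field(min_length=2, description="Name of the measured quantity")
    confidence: float = Field(default=0.5, ge=0.0, le=1.0, description="Confidence score between 0 and 1")


# Valid data passes through unchanged.
ok = Measurement(name="pH", confidence=0.9)

# Out-of-range data is rejected with a structured error.
try:
    Measurement(name="pH", confidence=1.7)
except ValidationError as err:
    # Each error entry names the offending field and the violated constraint.
    failed_fields = [e["loc"][0] for e in err.errors()]
    print(failed_fields)  # ['confidence']
```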

## Core Models

<small>
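Our implementation uses these key models:

| Model | Purpose |
|-------|---------|
| **DocumentSource** | Input document source with type, content, and metadata |
| **AgentStep** | A single step in the agent's reasoning process |
| **SearchResult** | Structured result from a search tool (ArXiv, web search) |
| **AgentState** | Complete agent state, including steps, search results, and protocols |
| **VariableDefinition** | Definition of an experimental variable |
| **ProcedureStep** | A step in the experimental procedure |
| **AnalysisMethod** | Statistical or analytical method used in the experiment |
| **ExperimentalProtocol** | Comprehensive representation of an experimental protocol |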

We've simplified the configuration into a single model (`LitSynthConfig`) to make customization easier. You can adjust parameters by modifying the `config` variable in the code cell.
</small>

## Customization

<small>
To customize the system behavior, simply modify the config variable:

```python
# Example: Increase sensitivity to detect more concepts
config.extraction_confidence = 0.6
config.max_concepts = 40

# Example: Focus only on high-importance concepts
config.min_concept_importance = "high"
```

This approach allows you to tune the system's behavior without changing the core models or implementation.
</small>


In [None]:
# 📋 Data Models
"""
This section defines the data structures used throughout the Experimental Architect system.
These models represent experimental protocols, process agent reasoning, and track tool usage.

CUSTOMIZATION TIPS:
1. Update field descriptions to match your domain terminology
2. Add domain-specific protocol elements for specialized experiments
3. Extend the protocol validation for field-specific requirements
4. Modify confidence metrics for different protocol components

Each model includes validators to ensure data integrity and clear documentation.
"""

from pydantic import BaseModel, Field, field_validator
from typing import List, Dict, Optional, Literal, Any, Union, Set
from datetime import datetime
from IPython.display import Markdown, display
import uuid

def show_info(message):
    """Display styled info message."""
    display(Markdown(f"<div style='padding:5px;border-radius:3px;background:#2196F3;color:white'>ℹ️ {message}</div>"))


# ===== INPUT MODELS =====
# Models for document input and configuration

class DocumentSource(BaseModel):
    """Input document source with type and content.
    
    This model represents an input document to be analyzed.
    It supports multiple source types and includes metadata.
    
    Attributes:
        source_type: The type of document source (pdf, text, etc.)
        content: The document content or file path
        metadata: Additional information about the document
    """
    source_type: Literal["pdf", "text", "url", "doi", "arxiv"]
    content: str
    metadata: Dict[str, Any] = Field(default_factory=dict)


# ===== AGENT STATE MODELS =====
# Models for tracking agent reasoning, actions, and state

class AgentStep(BaseModel):
    """Represents a single step in the agent's reasoning process.
    
    This model captures the agent's thought process, actions, and observations
    to enable transparent reasoning and effective debugging.
    
    Attributes:
        id: Unique identifier for this step
        timestamp: When this step occurred
        step_type: Type of step (thought, action, observation, final_answer)
        content: The actual content of the step
        tool_name: Name of the tool used (if action)
        tool_input: Input provided to the tool (if action)
        tokens_used: Estimated tokens used in this step
    """
    id: str = Field(default_factory=lambda: str(uuid.uuid4())[:8])
    timestamp: datetime = Field(default_factory=datetime.now)
    step_type: Literal["thought", "action", "observation", "final_answer"]
    content: str
    tool_name: Optional[str] = None
    tool_input: Optional[str] = None
    tokens_used: int = 0
    
    @field_validator('step_type')
    @classmethod
    def validate_step_type(cls, v):
        """Ensure step type is valid."""
        valid_types = ["thought", "action", "observation", "final_answer"]
        if v not in valid_types:
            raise ValueError(f"Step type must be one of: {', '.join(valid_types)}")
        return v

class SearchResult(BaseModel):
    """Structured representation of a search tool result.
    
    This model represents results from search tools (ArXiv, web search),
    with metadata to track relevance and source information.
    
    Attributes:
        query: Original search query
        source: Source of the result (arxiv, web, etc.)
        title: Title of the result
        content: Main content of the result
        url: Source URL if available
        date: Publication date if available
        authors: List of authors if available
        relevance_score: Estimated relevance to the query (0-1)
        keywords: Extracted keywords from the result
    """
    query: str
    source: Literal["arxiv", "web", "pubmed", "other"]
    title: str
    content: str
    url: Optional[str] = None
    date: Optional[str] = None
    authors: List[str] = Field(default_factory=list)
    relevance_score: float = Field(default=0.5, ge=0.0, le=1.0)
    keywords: List[str] = Field(default_factory=list)
    
    def truncated_content(self, max_chars: int = 500) -> str:
        """Return a truncated version of the content for display."""
        if len(self.content) <= max_chars:
            return self.content
        return self.content[:max_chars] + "..."

class AgentState(BaseModel):
    """Complete agent state tracking structure.
    
    This model maintains the full state of the agent during execution,
    including reasoning steps, search results, and any extracted protocols.
    It provides methods for updating state and accessing history.
    
    Attributes:
        steps: List of agent reasoning steps
        search_results: List of search results from tools
        extracted_protocol: The extracted protocol being analyzed
        optimized_protocol: The optimized protocol (final output)
        start_time: When the agent began execution
        finished: Whether the agent has completed its task
        final_answer: The agent's final response
        facts: Set of established facts (for consistency)
    """
    steps: List[AgentStep] = Field(default_factory=list)
    search_results: List[SearchResult] = Field(default_factory=list)
    extracted_protocol: Optional["ExperimentalProtocol"] = None
    optimized_protocol: Optional["OptimizedProtocolOutput"] = None
    start_time: datetime = Field(default_factory=datetime.now)
    finished: bool = False
    final_answer: str = ""
    facts: Set[str] = Field(default_factory=set)
    
    def add_thought(self, thought: str) -> None:
        """Add a thought step to the agent state."""
        self.steps.append(AgentStep(
            step_type="thought",
            content=thought
        ))
    
    def add_action(self, tool_name: str, tool_input: str) -> None:
        """Add an action step to the agent state."""
        self.steps.append(AgentStep(
            step_type="action",
            content=f"{tool_name}: {tool_input}",
            tool_name=tool_name,
            tool_input=tool_input
        ))
    
    def add_observation(self, observation: str) -> None:
        """Add an observation step to the agent state."""
        self.steps.append(AgentStep(
            step_type="observation",
            content=observation
        ))
    
    def add_search_result(self, result: SearchResult) -> None:
        """Add a search result to the agent state."""
        self.search_results.append(result)
    
    def add_final_answer(self, answer: str) -> None:
        """Add the final answer and mark as finished."""
        self.steps.append(AgentStep(
            step_type="final_answer",
            content=answer
        ))
        self.final_answer = answer
        self.finished = True
    
    def get_recent_steps(self, n: int = 5) -> List[AgentStep]:
        """Get the most recent n steps."""
        return self.steps[-n:] if self.steps else []
    
    def get_formatted_history(self) -> str:
        """Get a formatted string of the agent's reasoning history."""
        history = ""
        for step in self.steps:
            if step.step_type == "thought":
                history += f"\nThought: {step.content}\n"
            elif step.step_type == "action":
                history += f"Action: {step.tool_name}\nAction Input: {step.tool_input}\n"
            elif step.step_type == "observation":
                history += f"Observation: {step.content}\n"
            elif step.step_type == "final_answer":
                history += f"Final Answer: {step.content}\n"
        return history
    
    def add_fact(self, fact: str) -> None:
        """Add a verified fact to the agent's knowledge."""
        self.facts.add(fact)
    
    def execution_time(self) -> float:
        """Calculate execution time in seconds."""
        return (datetime.now() - self.start_time).total_seconds()


# ===== PROTOCOL REPRESENTATION MODELS =====
# Models for representing experimental protocols and their elements

class VariableDefinition(BaseModel):
    """Definition of an experimental variable.
    
    This model represents a variable in an experimental protocol,
    including its type, description, and measurement details.
    
    Attributes:
        name: Name of the variable
        var_type: Type of variable (independent, dependent, control, covariate)
        description: Detailed description of the variable
        measurement_method: How the variable is measured
        units: Units of measurement if applicable
        levels: Different levels/values for the variable (e.g., dosages)
        operational_definition: Precise operational definition of the variable
    """
    name: str
    var_type: Literal["independent", "dependent", "control", "covariate"]
    description: str
    measurement_method: Optional[str] = None
    units: Optional[str] = None
    levels: List[str] = Field(default_factory=list)
    operational_definition: Optional[str] = None
    
    @field_validator('name')
    @classmethod
    def validate_name(cls, v):
        """Ensure variable name is valid."""
        if len(v.strip()) < 2:
            raise ValueError("Variable name must be at least 2 characters")
        return v.strip()

class ProcedureStep(BaseModel):
    """A step in the experimental procedure.
    
    This model represents a single step in an experimental protocol,
    including what to do, timing, and any relevant materials.
    
    Attributes:
        step_number: Sequence number of this step
        description: Detailed description of the step
        duration: Time required for this step
        materials: Materials needed for this step
        equipment: Equipment needed for this step
        critical: Whether this step is critical to protocol integrity
        notes: Additional notes or cautions
    """
    step_number: int
    description: str
    duration: Optional[str] = None
    materials: List[str] = Field(default_factory=list)
    equipment: List[str] = Field(default_factory=list)
    critical: bool = False
    notes: Optional[str] = None

class AnalysisMethod(BaseModel):
    """Statistical or analytical method used in the experiment.
    
    This model represents a statistical or data analysis approach
    used to analyze experimental results.
    
    Attributes:
        name: Name of the analysis method
        description: Detailed description of the method
        software: Software used for analysis
        parameters: Key parameters for the analysis
        assumptions: Assumptions that must be met
        justification: Justification for using this method
    """
    name: str
    description: str
    software: Optional[str] = None
    parameters: Dict[str, Any] = Field(default_factory=dict)
    assumptions: List[str] = Field(default_factory=list)
    justification: Optional[str] = None

class ExperimentalProtocol(BaseModel):
    """Comprehensive representation of an experimental protocol.
    
    This model represents a complete experimental protocol
    extracted from scientific literature or created through optimization.
    
    Attributes:
        protocol_id: Unique identifier for this protocol
        title: Title or name of the protocol
        study_design: Overall study design (e.g., RCT, cohort study)
        variables: Variables included in the experiment
        sample_size: Total sample size
        sample_size_justification: Justification for the sample size
        inclusion_criteria: Criteria for including subjects/samples
        exclusion_criteria: Criteria for excluding subjects/samples
        randomization: Randomization method
        blinding: Blinding procedure
        controls: Control conditions or groups
        procedures: Ordered list of procedure steps
        analysis_methods: Statistical/analytical methods
        limitations: Known limitations of the protocol
        source: Source of this protocol (extraction, optimization)
        confidence_score: Overall confidence in protocol completeness (0-1)
    """
    protocol_id: str = Field(default_factory=lambda: str(uuid.uuid4())[:8])
    title: Optional[str] = None
    study_design: str
    variables: Dict[str, List[VariableDefinition]] = Field(default_factory=dict)
    sample_size: Optional[int] = None
    sample_size_justification: Optional[str] = None
    inclusion_criteria: List[str] = Field(default_factory=list)
    exclusion_criteria: List[str] = Field(default_factory=list)
    randomization: Optional[str] = None
    blinding: Optional[str] = None
    controls: List[str] = Field(default_factory=list)
    procedures: List[ProcedureStep] = Field(default_factory=list)
    analysis_methods: List[AnalysisMethod] = Field(default_factory=list)
    limitations: List[str] = Field(default_factory=list)
    source: Literal["extracted", "optimized", "manual"] = "extracted"
    confidence_score: float = Field(default=0.7, ge=0.0, le=1.0)
    
    def add_variable(self, variable: VariableDefinition) -> None:
        """Add a variable to the protocol."""
        var_type = variable.var_type
        if var_type not in self.variables:
            self.variables[var_type] = []
        self.variables[var_type].append(variable)
    
    def add_procedure_step(self, step: ProcedureStep) -> None:
        """Add a procedure step to the protocol."""
        self.procedures.append(step)
        # Sort procedures by step number
        self.procedures.sort(key=lambda x: x.step_number)
    
    def add_analysis_method(self, method: AnalysisMethod) -> None:
        """Add an analysis method to the protocol."""
        self.analysis_methods.append(method)
    
    def get_summary(self) -> str:
        """Generate a brief summary of the protocol."""
        summary = f"Study design: {self.study_design}\n"
        
        if self.sample_size:
            summary += f"Sample size: {self.sample_size}\n"
        
        # Variables
        var_counts = {k: len(v) for k, v in self.variables.items()}
        if var_counts:
            summary += "Variables: " + ", ".join([f"{count} {var_type}" for var_type, count in var_counts.items()]) + "\n"
        
        # Procedures
        if self.procedures:
            summary += f"Procedure steps: {len(self.procedures)}\n"
        
        # Analysis
        if self.analysis_methods:
            summary += f"Analysis methods: {len(self.analysis_methods)}\n"
        
        return summary
    
    def get_completeness_score(self) -> float:
        """Calculate a protocol completeness score based on required elements."""
        required_elements = [
            bool(self.study_design),
            bool(self.variables.get("independent")),
            bool(self.variables.get("dependent")),
            bool(self.sample_size),
            bool(self.procedures),
            bool(self.analysis_methods)
        ]
        
        return sum(required_elements) / len(required_elements)

class ProtocolImprovement(BaseModel):
    """Represents a specific improvement to an experimental protocol.
    
    This model captures a single improvement suggestion, including
    the original element, the improved version, and justification.
    
    Attributes:
        component: Protocol component being improved
        original_text: Original protocol text/element
        improved_text: Improved protocol text/element
        justification: Justification for the improvement
        evidence: Supporting evidence from literature
        confidence: Confidence in this improvement (0-1)
        source_url: URL to supporting literature if available
        improvement_type: Category of improvement
    """
    component: str
    original_text: str
    improved_text: str
    justification: str
    evidence: Optional[str] = None
    confidence: float = Field(default=0.7, ge=0.0, le=1.0)
    source_url: Optional[str] = None
    improvement_type: Literal[
        "statistical_power", "bias_reduction", "reproducibility", 
        "reporting", "measurement", "analysis", "other"
    ] = "other"

class ProtocolComparison(BaseModel):
    """Side-by-side comparison of original and optimized protocols.
    
    This model provides a structured comparison between the original
    and optimized versions of an experimental protocol, with improvements.
    
    Attributes:
        original_protocol: The original experimental protocol
        optimized_protocol: The optimized experimental protocol
        improvements: List of specific improvements made
        overall_quality_score: Quality score of optimized protocol (0-1)
        remaining_issues: Issues that could not be fully resolved
    """
    original_protocol: ExperimentalProtocol
    optimized_protocol: ExperimentalProtocol
    improvements: List[ProtocolImprovement] = Field(default_factory=list)
    overall_quality_score: float = Field(default=0.0, ge=0.0, le=1.0)
    remaining_issues: List[str] = Field(default_factory=list)
    
    def add_improvement(self, improvement: ProtocolImprovement) -> None:
        """Add an improvement to the comparison."""
        self.improvements.append(improvement)
    
    def calculate_quality_improvement(self) -> float:
        """Calculate the improvement in quality score."""
        # Simple calculation based on completeness scores
        original_score = self.original_protocol.get_completeness_score()
        optimized_score = self.optimized_protocol.get_completeness_score()
        
        return optimized_score - original_score

class OptimizedProtocolOutput(BaseModel):
    """Final output from the protocol optimization process.
    
    This model represents the complete results of protocol optimization,
    including the improved protocol, justifications, and metadata.
    
    Attributes:
        protocol_id: Unique identifier for this optimization
        original_protocol: The original protocol that was optimized
        optimized_protocol: The improved protocol
        improvements: List of specific improvements made
        literature_references: References to supporting literature
        optimization_method: Method used for optimization
        timestamp: When the optimization was performed
        optimization_time: Time taken for optimization (seconds)
        version: Version of the optimizer used
    """
    protocol_id: str = Field(default_factory=lambda: str(uuid.uuid4())[:8])
    original_protocol: ExperimentalProtocol
    optimized_protocol: ExperimentalProtocol
    improvements: List[ProtocolImprovement] = Field(default_factory=list)
    literature_references: List[Dict[str, Optional[str]]] = Field(default_factory=list)
    optimization_method: str = "agent-search"
    timestamp: datetime = Field(default_factory=datetime.now)
    optimization_time: float = 0.0
    version: str = "1.0"
    
    def get_summary(self) -> str:
        """Generate a summary of the optimization results."""
        improvement_types = {}
        for imp in self.improvements:
            imp_type = imp.improvement_type
            improvement_types[imp_type] = improvement_types.get(imp_type, 0) + 1
        
        summary = "Protocol Optimization Summary:\n"
        summary += f"- Original protocol: {self.original_protocol.title or 'Unnamed'}\n"
        summary += f"- Optimization time: {self.optimization_time:.1f} seconds\n"
        summary += f"- Total improvements: {len(self.improvements)}\n"
        
        if improvement_types:
            summary += "- Improvement types:\n"
            for imp_type, count in improvement_types.items():
                summary += f"  • {imp_type.replace('_', ' ').title()}: {count}\n"
        
        return summary
    
    def add_literature_reference(self, title: str, authors: str, url: Optional[str] = None, 
                               year: Optional[str] = None, journal: Optional[str] = None) -> None:
        """Add a literature reference supporting the optimization."""
        self.literature_references.append({
            "title": title,
            "authors": authors,
            "url": url,
            "year": year,
            "journal": journal
        })


# ===== CONFIGURATION MODEL =====
# Controls system behavior and processing parameters

class ExpArchConfig(BaseModel):
    """Configuration for the Experimental Architect system.
    
    This model controls how the system processes and analyzes documents.
    Adjust these parameters to customize the analysis for your needs.
    
    Customization Tips:
    - Adjust confidence thresholds based on LLM quality
    - Set your scientific domain for more targeted analysis
    - Configure tool search depth and timeout settings
    - Change agent parameters for different reasoning patterns
    
    Attributes:
        text_chunk_size: Characters per text chunk for processing
        text_chunk_overlap: Overlap between chunks to maintain context
        extraction_confidence: Minimum confidence for protocol extraction
        scientific_domain: Optional domain specialization
        agent_max_iterations: Maximum number of iterations for the agent
        agent_temperature: Temperature setting for agent reasoning
        search_results_count: Maximum number of search results to use
        search_timeout: Timeout for search operations (seconds)
        optimization_mode: Processing mode (quick, balanced, thorough)
    """
    # Text Processing Parameters
    text_chunk_size: int = Field(
        default=2000, ge=500, le=8000, 
        description="Characters per text chunk"
    )
    text_chunk_overlap: int = Field(
        default=200, ge=50, le=1000,
        description="Overlap between chunks"
    )
    
    # Extraction Parameters
    extraction_confidence: float = Field(
        default=0.7, ge=0.0, le=1.0,
        description="Minimum extraction confidence"
    )
    
    # Agent Parameters
    agent_max_iterations: int = Field(
        default=10, ge=3, le=30,
        description="Maximum agent iterations"
    )
    agent_temperature: float = Field(
        default=0.2, ge=0.0, le=1.0,
        description="Temperature for agent reasoning"
    )
    
    # Search Parameters
    search_results_count: int = Field(
        default=3, ge=1, le=10,
        description="Maximum search results to process"
    )
    search_timeout: float = Field(
        default=30.0, ge=5.0, le=120.0,
        description="Search timeout in seconds"
    )
    
    # Customization
    scientific_domain: Optional[str] = Field(
        default=None,
        description="Scientific domain for specialized analysis"
    )
    
    # Processing Mode
    optimization_mode: Literal["quick", "balanced", "thorough"] = Field(
        default="balanced",
        description="Processing depth and thoroughness"
    )
    
    @field_validator('text_chunk_overlap')
    @classmethod
    def validate_overlap(cls, v, info):
        """Ensure overlap is less than chunk size."""
        if 'text_chunk_size' in info.data and v >= info.data['text_chunk_size']:
            raise ValueError("text_chunk_overlap must be less than text_chunk_size")
        return v
    
    def get_mode_settings(self) -> Dict[str, Any]:
        """Get specific settings based on the selected mode."""
        mode_settings = {
            "quick": {
                "search_results_count": 2,
                "agent_max_iterations": 5,
                "search_timeout": 20.0,
            },
            "balanced": {
                "search_results_count": 3,
                "agent_max_iterations": 10,
                "search_timeout": 30.0,
            },
            "thorough": {
                "search_results_count": 5,
                "agent_max_iterations": 15,
                "search_timeout": 45.0,
            }
        }
        
        return mode_settings.get(self.optimization_mode, mode_settings["balanced"])
    
    def apply_mode_settings(self) -> None:
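        """Apply settings based on the selected optimization mode.
        
        Note: this overwrites any custom values previously set for
        search_results_count, agent_max_iterations, and search_timeout.
        """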
        settings = self.get_mode_settings()
        for key, value in settings.items():
            setattr(self, key, value)


# Initialize default configuration
config = ExpArchConfig()

# Forward references
AgentState.model_rebuild()

# Show configuration message
show_info("Data models for Experimental Architect configured successfully")
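
The completeness check in `ExperimentalProtocol.get_completeness_score` is the fraction of six required elements that are present. The sketch below reproduces that arithmetic with a plain dictionary standing in for the model, so it runs without Pydantic. The names `completeness_score` and `draft` are illustrative assumptions, not part of the notebook.

```python
def completeness_score(protocol: dict) -> float:
    """Fraction of the six required protocol elements that are non-empty."""
    variables = protocol.get("variables", {})
    required = [
        bool(protocol.get("study_design")),
        bool(variables.get("independent")),
        bool(variables.get("dependent")),
        bool(protocol.get("sample_size")),
        bool(protocol.get("procedures")),
        bool(protocol.get("analysis_methods")),
    ]
    return sum(required) / len(required)


# Hypothetical draft: design, one independent variable, and a sample size,
# but no dependent variables, procedures, or analysis methods yet.
draft = {
    "study_design": "RCT",
    "variables": {"independent": [{"name": "dose"}]},
    "sample_size": 120,
}

score = completeness_score(draft)
print(f"{score:.2f}")  # prints 0.50 (3 of 6 elements present)
```

Because every element carries equal weight, a protocol with no analysis methods scores the same as one with no sample size; treat the number as a rough checklist rather than a quality measure.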

## Core Functions

<small>
This section contains the heart of our Experimental Architect system - the functions that analyze documents, extract key information, and generate insights.

## What's Included

1. **Prompt Library**: The instructions we give to the AI model
2. **Function Definitions**: The code that processes documents and manages the analysis

## How to Customize

You can easily modify the system's behavior by:

- **Changing prompts**: Edit the instructions to focus on specific types of information
- **Adjusting parameters**: Fine-tune the analysis by modifying the `config` settings

No coding knowledge is required - simply edit the text of prompts in the first code cell below.
</small>
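
When editing prompts, note that fill-in slots use single braces (such as `{text}`), while literal JSON braces are written doubled (`{{` and `}}`). The sketch below shows why both must survive editing. It assumes the templates are filled with Python-style formatting, as the doubled braces suggest, and `template` is a hypothetical miniature prompt rather than one from the library.

```python
# Hypothetical miniature prompt written in the same style as the library entries.
template = """
Extract the study design from the text below.

TEXT:
{text}

Return JSON like:
{{"study_design": "RCT"}}
"""

# Single-brace slots are replaced; doubled braces collapse to literal braces.
filled = template.format(text="Participants were randomly assigned to two arms.")
print(filled)
```

If a doubled brace is accidentally reduced to a single one, Python-style formatting will treat the JSON key as a missing placeholder and raise an error instead of producing the prompt.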


In [None]:
# 📖 Prompt Library
"""
This section contains all prompts used by the Experimental Architect system.
Each prompt guides the AI to perform specific tasks in the protocol analysis workflow.
You can customize these prompts to focus on particular aspects of experimental design.

CUSTOMIZATION TIPS:
1. Keep the output format instructions intact (especially for JSON responses)
2. Add field-specific guidance for your research domain
3. Emphasize particular aspects of protocol design important to your work
4. Include examples from your discipline to improve extraction quality
"""

PROMPTS = {
    # === PROTOCOL EXTRACTION ===
    # Purpose: Extract structured experimental protocol from methods text
    # Output: ExperimentalProtocol object
    "protocol_extraction": """
    You are an experimental design specialist. Extract a structured experimental protocol 
    from the following methods section text.
    
    METHODS TEXT:
    {text}
    
    INSTRUCTIONS:
    1. Identify the core experimental design (e.g., RCT, case-control, cohort study)
    2. Extract all variables (independent, dependent, control, covariate)
    3. Identify sample size and any randomization procedures
    4. Extract procedures as sequential steps
    5. Identify statistical analysis methods
    6. Note any limitations or controls mentioned
    
    Return a structured protocol as JSON with this structure:
    ```json
    {{
      "protocol_id": "auto-generated",
      "study_design": "Study design type (e.g., RCT, cohort study)",
      "variables": {{
        "independent": [
          {{"name": "Variable name", "description": "Description", "measurement_method": "How measured", "levels": ["level1", "level2"], "var_type": "independent"}}
        ],
        "dependent": [
          {{"name": "Variable name", "description": "Description", "measurement_method": "How measured", "units": "Units if applicable", "var_type": "dependent"}}
        ],
        "control": [
          {{"name": "Variable name", "description": "Description", "measurement_method": "How measured", "var_type": "control"}}
        ],
        "covariate": [
          {{"name": "Variable name", "description": "Description", "measurement_method": "How measured", "var_type": "covariate"}}
        ]
      }},
      "sample_size": 120,
      "sample_size_justification": "Justification if mentioned",
      "inclusion_criteria": ["criterion 1", "criterion 2"],
      "exclusion_criteria": ["criterion 1", "criterion 2"],
      "randomization": "Randomization method if applicable",
      "blinding": "Blinding procedure if applicable",
      "controls": ["Control 1", "Control 2"],
      "procedures": [
        {{"step_number": 1, "description": "Step description", "duration": "Time if applicable"}},
        {{"step_number": 2, "description": "Step description", "duration": "Time if applicable"}}
      ],
      "analysis_methods": [
        {{"name": "Analysis method", "description": "Description", "software": "Software used", "parameters": {{}}}}
      ],
      "limitations": ["Limitation 1", "Limitation 2"]
    }}
    ```
    
    Pay special attention to:
    - IMPORTANT: Ensure all variables have the "var_type" field explicitly specified 
    - Statistical methods (power analysis, sample size calculation)
    - Randomization and blinding procedures
    - Control conditions and baseline measurements
    - Subject selection criteria and recruitment
    
    If information is missing, omit those fields rather than guessing.
    """,
    
    # === SEARCH AGENT PROMPT ===
    # Purpose: Guide ReAct agent to search for protocol improvements
    # Output: Agent reasoning and search results
    "search_agent": """
    You are an experimental methodology expert helping to improve a scientific protocol.
    Analyze this protocol and use the tools to find relevant methodological guidelines,
    similar protocols, and best practices.
    
    PROTOCOL SUMMARY:
    {protocol}
    
    STUDY DESIGN: {study_design}
    
    KEY VARIABLES: {variables}
    
    ANALYSIS METHODS: {analysis_methods}
    
    Your task is to search for:
    1. Methodological standards for this type of study
    2. Common pitfalls and how to avoid them
    3. Statistical best practices for this design
    4. Improvements to randomization, blinding, and controls
    5. Validation techniques for measurements
    
    Use the tools to conduct targeted searches. Be specific in your search queries.
    Focus on finding concrete, actionable improvements to the protocol.
    
    Available tools: {tool_names}
    
    {agent_scratchpad}
    """,
    
    # === PROTOCOL WEAKNESSES ANALYSIS ===
    # Purpose: Identify methodological weaknesses in the protocol
    # Output: Structured list of weaknesses with improvement opportunities
    "protocol_weaknesses": """
    You are a methodological reviewer for scientific protocols. Analyze this experimental
    protocol and identify methodological weaknesses or opportunities for improvement.
    
    EXPERIMENTAL PROTOCOL:
    {protocol}
    
    RELEVANT METHODOLOGICAL GUIDELINES:
    {search_context}
    
    INSTRUCTIONS:
    1. Compare the protocol against best practices for {study_design} studies
    2. Identify missing elements or unclear specifications
    3. Evaluate statistical rigor and potential sources of bias
    4. Assess controls, randomization, and blinding procedures
    5. Consider reproducibility and reporting completeness
    
    Focus on these common areas for improvement:
    - Sample size justification and power analysis
    - Randomization and allocation procedures
    - Blinding of participants, investigators, and analysts
    - Selection of appropriate controls
    - Validation of measurement methods
    - Statistical analysis approach and assumptions
    - Reporting standards and transparency
    
    Provide specific, actionable suggestions for each weakness identified.
    """,
    
    # === PARSE WEAKNESSES ===
    # Purpose: Parse weakness analysis into structured format
    # Output: JSON list of weaknesses
    "parse_weaknesses": """
    Convert the following protocol weakness analysis into a structured JSON format:
    
    {weaknesses_text}
    
    Return ONLY a JSON array with this structure:
    ```json
    [
      {{
        "component": "Component name (e.g., 'Sample Size', 'Randomization', 'Statistical Analysis')",
        "description": "Description of the weakness",
        "impact": "Potential impact on results or validity",
        "improvement_type": "One of: statistical_power, bias_reduction, reproducibility, reporting, measurement, analysis, other",
        "suggestion": "Specific suggestion for improvement"
      }}
    ]
    ```
    
    Focus on extracting clear, specific weaknesses with actionable suggestions.
    """,
    
    # === PROTOCOL OPTIMIZATION ===
    # Purpose: Generate optimized protocol with improvements
    # Output: OptimizedProtocolOutput object
    "protocol_optimization": """
    You are an experimental design optimization expert. Create an improved version of this 
    protocol based on the identified weaknesses and methodological standards.
    
    ORIGINAL PROTOCOL:
    {protocol}
    
    IDENTIFIED WEAKNESSES:
    {weaknesses}
    
    METHODOLOGICAL GUIDELINES:
    {search_context}
    
    INSTRUCTIONS:
    1. Create an optimized version of the original protocol
    2. Make specific improvements to address each weakness
    3. Provide clear justification for each improvement
    4. Ensure all changes are evidence-based
    5. Maintain the core research question and design approach
    
    Return a comprehensive optimization result in this JSON structure:
    ```json
    {{
      "original_protocol": <keep unchanged from input>,
      "optimized_protocol": {{
        "protocol_id": "opt-<original_id>",
        "study_design": "Original or improved study design",
        "variables": {{
          "independent": [],
          "dependent": [],
          "control": [],
          "covariate": []
        }},
        "sample_size": 0,
        "sample_size_justification": "Improved justification",
        "inclusion_criteria": [],
        "exclusion_criteria": [],
        "randomization": "Improved randomization method",
        "blinding": "Improved blinding procedure",
        "controls": [],
        "procedures": [],
        "analysis_methods": [],
        "limitations": [],
        "source": "optimized",
        "confidence_score": 0.9
      }},
      "improvements": [
        {{
          "component": "Component name",
          "original_text": "Original protocol element",
          "improved_text": "Improved protocol element",
          "justification": "Evidence-based justification for change",
          "evidence": "Supporting evidence from literature",
          "improvement_type": "One of: statistical_power, bias_reduction, reproducibility, reporting, measurement, analysis, other"
        }}
      ],
      "literature_references": [
        {{
          "title": "Title of reference",
          "authors": "Authors of reference",
          "url": "URL if available",
          "year": "Year if available",
          "journal": "Journal if available"
        }}
      ]
    }}
    ```
    
    For each improvement, provide a strong, evidence-based justification referring to methodological standards 
    or research best practices in {study_design} studies.
    """,
    
    # === PROTOCOL VALIDATION ===
    # Purpose: Validate optimized protocol against standards
    # Output: Validation assessment
    "protocol_validation": """
    You are a protocol validation expert. Assess the quality and completeness of this
    optimized experimental protocol against methodological standards.
    
    OPTIMIZED PROTOCOL:
    {protocol}
    
    KEY IMPROVEMENTS:
    {improvements}
    
    INSTRUCTIONS:
    1. Evaluate adherence to reporting guidelines for {study_design} studies
    2. Assess statistical rigor and appropriateness
    3. Verify completeness of all required protocol elements
    4. Check for remaining methodological issues
    5. Identify areas still needing human expert review
    
    Your assessment should cover these validation domains:
    - Design validity (internal and external)
    - Statistical validity
    - Measurement validity
    - Procedural completeness
    - Reproducibility
    - Transparency and reporting
    
    Provide specific recommendations for any remaining issues.
    """,
    
    # === PARSE VALIDATION ===
    # Purpose: Parse validation text to structured format
    # Output: JSON validation result
    "parse_validation": """
    Convert the following protocol validation assessment into a structured JSON format:
    
    {validation_text}
    
    Return ONLY a JSON object with this structure:
    ```json
    {{
      "overall_score": 0.85,
      "validation_domains": [
        {{
          "domain": "Domain name (e.g., 'Design Validity', 'Statistical Validity')",
          "score": 0.9,
          "strengths": ["Strength 1", "Strength 2"],
          "weaknesses": ["Weakness 1", "Weakness 2"]
        }}
      ],
      "remaining_issues": [
        {{
          "issue": "Description of remaining issue",
          "severity": "high|medium|low",
          "recommendation": "Recommendation to address"
        }}
      ],
      "expert_review_needed": ["Area 1", "Area 2"],
      "overall_assessment": "Overall textual assessment"
    }}
    ```
    
    Ensure scores are between 0.0 and 1.0, with higher scores indicating better quality.
    """
}

# You can customize these prompts to better suit your specific research domain
# Add field-specific terminology, criteria, or examples to improve protocol extraction
# For medical research, emphasize regulatory compliance and patient safety
# For psychology studies, focus on ethical considerations and measurement validity
# For biological research, emphasize replication and control conditions

show("Prompt library initialized with 7 specialized prompts for experimental protocol optimization", "success")
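
Several of the prompts above ask the model to return JSON inside a fenced `json` block. The sketch below shows one way to pull that payload out of a reply before validating it. `extract_json` is a hypothetical helper, not a function defined in this notebook, and real replies may need more defensive handling.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example contains no literal fence


def extract_json(reply: str):
    """Parse the JSON payload from a model reply (hypothetical helper).

    Looks for a fenced block first and falls back to parsing the whole reply.
    """
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())


# Simulated reply in the shape requested by the "parse_weaknesses" prompt.
reply = (
    "Here is the result:\n"
    + FENCE
    + 'json\n[{"component": "Sample Size", "improvement_type": "statistical_power"}]\n'
    + FENCE
)

weaknesses = extract_json(reply)
print(weaknesses[0]["component"])
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers would typically wrap this in a `try`/`except` and retry or fall back.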

In [None]:
# 📊 Core Functions
"""
This section contains all the core functionality for the Experimental Architect system.
Each function is designed to be modular, well-documented, and easy to customize.
"""

import json, re, os, time, hashlib, uuid
from typing import List, Dict, Any, Optional, Tuple, Union, Callable
from datetime import datetime

from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableLambda
from langchain.tools import DuckDuckGoSearchRun, ArxivQueryRun, Tool
from langchain.agents import AgentExecutor, create_react_agent

# ===== DOCUMENT PROCESSING =====
# These functions handle loading documents and text processing

def load_document(source_type: str, content: str) -> Dict[str, Any]:
    """Load document from various sources (text or PDF)
    
    Args:
        source_type: Type of document ("text" or "pdf")
        content: Document content or file path
        
    Returns:
        Dictionary with extracted text and metadata
    """
    if source_type == "text":
        return {"text": content, "metadata": {"source_type": "text", "length": len(content)}}
    elif source_type == "pdf":
        try:
            from pypdf import PdfReader
            reader = PdfReader(content)
            # Guard against pages with no extractable text (e.g., scanned images)
            text = "\n\n".join((page.extract_text() or "") for page in reader.pages)
            return {"text": text, "metadata": {"source_type": "pdf", "filename": os.path.basename(content), "pages": len(reader.pages)}}
        except Exception as e:
            return {"text": "", "metadata": {"error": str(e)}}
    else:
        return {"text": "", "metadata": {"error": "Unsupported source type"}}

def chunk_text(text: str, config: ExpArchConfig) -> List[str]:
    """Split text into semantically coherent chunks optimized for scientific papers
    
    Args:
        text: The text to chunk
        config: Configuration providing text_chunk_size and text_chunk_overlap
        
    Returns:
        List of text chunks optimized for scientific content
    """
    if not text: 
        return []
    
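    # Safety bounds: chunk size is clamped to 1000-4000 characters and overlap
    # to at most a quarter of the chunk size, regardless of the config values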
    chunk_size = min(max(config.text_chunk_size, 1000), 4000)
    overlap = min(config.text_chunk_overlap, chunk_size // 4)
    
    # Scientific paper section headers (common patterns in papers)
    section_headers = [
        r'\n+\s*ABSTRACT\s*\n+',
        r'\n+\s*INTRODUCTION\s*\n+', 
        r'\n+\s*METHODS?\s*\n+',
        r'\n+\s*RESULTS\s*\n+',
        r'\n+\s*DISCUSSION\s*\n+',
        r'\n+\s*CONCLUSION\s*\n+',
        r'\n+\s*REFERENCES\s*\n+'
    ]
    
    # First try to split by major sections
    section_splits = [0]
    for pattern in section_headers:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            section_splits.append(match.start())
    section_splits.append(len(text))
    section_splits = sorted(set(section_splits))
    
    # Generate base chunks from sections
    base_chunks = []
    if len(section_splits) > 2:  # More than just start and end
        for i in range(len(section_splits) - 1):
            start, end = section_splits[i], section_splits[i+1]
            if end - start > 100:  # Skip tiny sections (100 characters or fewer are dropped)
                base_chunks.append(text[start:end].strip())
    else:
        base_chunks = [text]  # No clear sections, use whole text
    
    # Further split any chunks that exceed the max size
    final_chunks = []
    for chunk in base_chunks:
        if len(chunk) <= chunk_size:
            final_chunks.append(chunk)
        else:
            # Split large chunks by paragraphs or sentences
            start = 0
            while start < len(chunk):
                end = min(start + chunk_size, len(chunk))
                
                if end < len(chunk):
                    # Try paragraph boundaries
                    para_end = chunk.rfind("\n\n", start + (chunk_size // 2), end)
                    if para_end > start + 200:
                        end = para_end + 2
                    else:
                        # Fall back to sentence boundaries
                        for sep in [". ", ".\n", "? ", "! "]:
                            sent_end = chunk.rfind(sep, start + (chunk_size // 2), end)
                            if sent_end > start + 200:
                                end = sent_end + len(sep)
                                break
                
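                piece = chunk[start:end].strip()
                if piece:
                    final_chunks.append(piece)
                if end >= len(chunk):
                    break  # Last piece emitted; avoid a redundant overlap-only tail chunk
                start = max(end - overlap, start + (chunk_size // 2))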
    
    return final_chunks

# ===== TOOL IMPLEMENTATIONS =====
# Tools for the ReAct agent to search and retrieve information

def get_web_search_tool() -> Tool:
    """Create web search tool for finding methodological guidelines and best practices."""
    search = DuckDuckGoSearchRun()
    
    def web_search(query: str) -> str:
        """Search the web for methodological guidelines and best practices.
        
        This tool searches the web for information about experimental design best practices,
        methodological standards, and scientific guidelines.
        
        Args:
            query: Search query specifically about experimental methodology
            
        Returns:
            Formatted search results with titles and snippets
        """
        try:
            results = search.run(f"experimental methodology {query}")
            
            # Format and truncate results
            if len(results) > 1000:
                results = results[:1000] + "...[truncated for readability]"
                
            return results
        except Exception as e:
            return f"Error performing web search: {str(e)}"
    
    return Tool(
        name="web_search",
        func=web_search,
        description="Search the web for experimental methodology guidelines and best practices"
    )

def get_arxiv_search_tool() -> Tool:
    """Create arXiv search tool for finding academic papers with similar protocols."""
    arxiv = ArxivQueryRun()
    
    def arxiv_search(query: str) -> str:
        """Search academic papers on arXiv for experimental protocols and methods.
        
        This tool searches academic literature for similar experimental protocols,
        methodological approaches, and experimental designs.
        
        Args:
            query: Search query about experimental protocols or methods
            
        Returns:
            Formatted search results with titles, authors, and abstracts
        """
        try:
            # Add methodology-specific terms to improve results
            enhanced_query = f"methodology experimental_design protocol {query}"
            results = arxiv.run(enhanced_query)
            
            # Format and truncate results
            if len(results) > 1500:
                results = results[:1500] + "...[truncated for readability]"
                
            return results
        except Exception as e:
            return f"Error searching arXiv: {str(e)}"
    
    return Tool(
        name="arxiv_search",
        func=arxiv_search,
        description="Search academic papers on arXiv for similar experimental protocols and methods"
    )

def process_search_result(tool_name: str, query: str, result: str) -> SearchResult:
    """Process and structure a search tool result.
    
    Args:
        tool_name: Name of the tool used ("web_search" or "arxiv_search")
        query: Original search query
        result: Raw search result text
        
    Returns:
        Structured SearchResult object
    """
    source_type = "arxiv" if tool_name == "arxiv_search" else "web"
    
    # Extract title, URL, and other metadata using regex patterns
    title = ""
    url = ""
    date = ""
    authors = []
    
    if source_type == "arxiv":
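        # Extract arXiv paper details. ArxivQueryRun typically returns blocks of
        # "Published: ...", "Title: ...", "Authors: ...", "Summary: ..." lines.
        date_match = re.search(r'Published:\s*(\S+)', result)
        if date_match:
            date = date_match.group(1)
            
        title_match = re.search(r'Title:(.*?)(?:Authors:|$)', result, re.DOTALL)
        if title_match:
            title = title_match.group(1).strip()
            
        # Stop at the next field so the abstract is not parsed as author names
        author_match = re.search(r'Authors:(.*?)(?:Summary:|Submitted:|Published:|$)', result, re.DOTALL)
        if author_match:
            author_text = author_match.group(1).strip()
            authors = [a.strip() for a in author_text.split(',') if a.strip()]
            
        url_match = re.search(r'https?://arxiv\.org/\S+', result)
        if url_match:
            url = url_match.group(0)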
            
    else:
        # Extract web search details
        # Handle typical DuckDuckGo format
        title_match = re.search(r'^(.*?)\n', result)
        if title_match:
            title = title_match.group(1).strip()
            
        url_match = re.search(r'https?://\S+', result)
        if url_match:
            url = url_match.group(0)
    
    # Create SearchResult with extracted information
    return SearchResult(
        query=query,
        source=source_type,
        title=title or "Untitled result",
        content=result,
        url=url,
        date=date,
        authors=authors,
        relevance_score=0.7,  # Default relevance score
        keywords=[]  # Could be extracted with additional processing
    )

# ===== LLM INTEGRATION =====
# These functions manage LLM interactions with enhanced robustness

def create_chain(prompt_name: str, output_model: Optional[Any] = None) -> Any:
    """Create an LCEL chain for a specific LLM task
    
    Args:
        prompt_name: Name of the prompt from PROMPTS dictionary
        output_model: Optional Pydantic model for structured output
        
    Returns:
        LCEL chain configured for the specified task
    """
    # Guard against missing prompts
    if prompt_name not in PROMPTS:
        show(f"Prompt '{prompt_name}' not found in prompt library", "error")
        return None
        
    template = ChatPromptTemplate.from_template(PROMPTS[prompt_name])
    
    def parse_output(response):
        """Parse LLM response with multi-strategy approach"""
        content = response.content if hasattr(response, 'content') else response
        
        # For text output (no model), return directly
        if not output_model:
            return content
        
        def build_model(data):
            """Instantiate output_model from parsed JSON (a dict, or a list of dicts)."""
            if isinstance(data, list):
                if not data:
                    raise ValueError("Empty JSON list")
                # One model instance cannot hold several objects, so use the first
                data = data[0]
            return output_model(**data)
        
        # For structured output, try multiple parsing strategies:
        
        # 1. Code block extraction (```json blocks first, then any other code block)
        if isinstance(content, str) and "```" in content:
            try:
                marker = "```json" if "```json" in content else "```"
                code_block = content.split(marker)[1].split("```")[0].strip()
                if code_block:
                    return build_model(json.loads(code_block))
            except Exception as e:
                show(f"Code block parsing error: {str(e)}", "debug")
        
        # 2. Direct JSON parsing
        try:
            return build_model(json.loads(content))
        except Exception:
            pass
            
        # 3. Fallback regex extraction: first "{" to last "}" so nested objects stay intact
        try:
            match = re.search(r'\{.*\}', content, re.DOTALL)
            if match:
                return build_model(json.loads(match.group(0)))
        except Exception as e:
            show(f"All parsing methods failed: {str(e)}", "debug")
        
        # Fallback: try to create an empty model instance with error info
        try:
            return output_model(error=f"Failed to parse response: {str(content)[:100]}...")
        except Exception:
            show("Could not create fallback model", "warning")
            return None
    
    # Create and return the chain
    if "llm" in globals():
        return template | llm | RunnableLambda(parse_output)
    else:
        show("Warning: No LLM configured. Using placeholder output.", "warning")
        return RunnableLambda(lambda _: "LLM missing: output unavailable")

def cached_run(chain, inputs: Dict, key_prefix: str = "") -> Any:
    """Run chain with caching to minimize API calls
    
    Args:
        chain: LCEL chain to run
        inputs: Input parameters 
        key_prefix: Cache key prefix for identification
        
    Returns:
        Chain output (cached or fresh)
    """
    if not chain: 
        return None
    
    import hashlib  # Local import: only needed to build collision-resistant cache keys
    
    # Use the caching functions from Core Utilities
    cache_enabled = 'CACHE_ENABLED' in globals() and CACHE_ENABLED
    key = None  # Stays None if caching is disabled or key generation fails
    
    if cache_enabled:
        # Prepare inputs for caching - limit text size for reasonable cache keys
        cache_inputs = {}
        for k, v in inputs.items():
            if k == 'text' and isinstance(v, str) and len(v) > 500:
                # First 500 chars plus a digest of the full text, so documents that
                # share an opening passage do not map to the same cache entry
                digest = hashlib.sha256(v.encode("utf-8")).hexdigest()
                cache_inputs[k] = f"{v[:500]}|{digest}"
            else:
                cache_inputs[k] = v
                
        # Add the prefix to differentiate between similar calls
        cache_inputs['_function'] = key_prefix
        
        # Try to get cached result
        try:
            key = cache_key(**cache_inputs)
            cached_result = get_cache(key)
            
            if cached_result is not None:
                show(f"Using cached result for {key_prefix}", "debug")
                return cached_result
        except Exception as e:
            show(f"Cache access error: {str(e)}", "debug")
    
    # Run chain if not in cache or caching disabled
    try:
        result = chain.invoke(inputs)
        
        # Try to cache the result (skipped when no cache key could be built)
        if cache_enabled and key is not None:
            try:
                set_cache(key, result)
            except Exception as e:
                show(f"Cache storage error: {str(e)}", "debug")
                
        return result
    except Exception as e:
        error_msg = str(e)
        show(f"Error in {key_prefix}: {error_msg}", "error")
        return None

# ===== AGENT IMPLEMENTATION =====
# Create and manage the ReAct agent for protocol optimization

def create_search_agent(config: ExpArchConfig = None) -> AgentExecutor:
    """Create a ReAct agent for searching and analyzing protocols."""
    config = config or ExpArchConfig()
    
    # Initialize tools
    tools = [
        get_web_search_tool(),
        get_arxiv_search_tool()
    ]
    
    # Extract tool names for the prompt
    tool_names = ", ".join([tool.name for tool in tools])
    
    # Build the agent prompt; create_react_agent fills in its {tools} and {tool_names} variables
    agent_prompt = ChatPromptTemplate.from_template(PROMPTS["search_agent"])
    
    # Ensure we have LLM initialized
    if "llm" not in globals() or not llm:
        show("LLM not initialized, agent creation failed", "error")
        return None
    
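    try:
        # Create the agent; the tool descriptions and names are injected into the prompt here
        agent = create_react_agent(
            llm=llm,
            tools=tools,
            prompt=agent_prompt
        )
        
        # Best-effort: remember the tool names on the agent for later use.
        # Some runnable types reject ad-hoc attributes, so this must not abort agent creation.
        try:
            agent._tool_names = tool_names
        except Exception:
            pass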
        
        # Configure the agent executor
        return AgentExecutor(
            agent=agent,
            tools=tools,
            verbose=False,
            handle_parsing_errors=True,
            max_iterations=config.agent_max_iterations,
            max_execution_time=config.search_timeout
        )
    except Exception as e:
        show(f"Agent creation failed: {str(e)}", "error")
        return None

def run_agent_with_state(agent_executor: AgentExecutor, 
                        prompt_inputs: Dict[str, Any], 
                        agent_state: AgentState) -> Tuple[Any, AgentState]:
    """Run ReAct agent while tracking state for visualization.
    
    Args:
        agent_executor: The configured agent executor
        prompt_inputs: Inputs for the agent prompt
        agent_state: Current agent state to update
        
    Returns:
        Tuple of (result, updated_agent_state)
    """
    if not agent_executor:
        agent_state.add_thought("Agent not properly initialized")
        return None, agent_state
    
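    # Create callback handler for state tracking.
    # LangChain's callback manager expects BaseCallbackHandler instances
    # (it reads attributes such as ignore_llm and raise_error from each handler).
    from langchain_core.callbacks import BaseCallbackHandler
    
    class StateTrackingCallbackHandler(BaseCallbackHandler):
        def __init__(self, state: AgentState):
            super().__init__()
            self.state = state
            self.current_step = None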
            
        def on_llm_start(self, *args, **kwargs):
            pass
            
        def on_llm_end(self, response, *args, **kwargs):
            # Extract thought from response
            thought = response.generations[0][0].text
            self.state.add_thought(thought)
            
        def on_tool_start(self, serialized, input_str, **kwargs):
            # Track tool usage (serialized may be missing or lack a name)
            tool_name = (serialized or {}).get("name", "unknown_tool")
            self.state.add_action(tool_name, input_str)
            self.current_step = {"tool": tool_name, "input": input_str}
            
        def on_tool_end(self, output, **kwargs):
            # Process tool output
            if self.current_step:
                tool_name = self.current_step["tool"]
                query = self.current_step["input"]
                # Normalize to text: the downstream regex parsing expects a string
                output = output if isinstance(output, str) else str(output)
                self.state.add_observation(output)
                
                # Process search result if from search tools
                if tool_name in ["web_search", "arxiv_search"]:
                    result = process_search_result(tool_name, query, output)
                    self.state.add_search_result(result)
                
                self.current_step = None
    
    # Create callback handler
    callbacks = [StateTrackingCallbackHandler(agent_state)]
    
    try:
        # Execute agent
        start_time = time.time()
        result = agent_executor.invoke(prompt_inputs, callbacks=callbacks)
        execution_time = time.time() - start_time
        
        # Update agent state with result
        output = result.get("output", "No output generated")
        agent_state.add_final_answer(output)
        
        # Add execution metadata
        agent_state.add_detail("execution_time", execution_time)
        
        return result, agent_state
    except Exception as e:
        error_msg = f"Agent execution failed: {str(e)}"
        agent_state.add_thought(error_msg)
        show(error_msg, "error")
        return None, agent_state

# ===== CORE PROTOCOL FUNCTIONS =====
# These functions implement the main protocol processing capabilities

def extract_protocol_chain(methods_text: str, config: ExpArchConfig = None) -> ExperimentalProtocol:
    """Extract experimental protocol from methods section text.
    
    Args:
        methods_text: Methods section text to analyze
        config: Configuration parameters
        
    Returns:
        Extracted ExperimentalProtocol with variables, steps, and analysis methods
    """
    if not methods_text or len(methods_text.strip()) < 50:
        show("Methods text too short for protocol extraction", "warning")
        return None
    
    config = config or ExpArchConfig()
    show(f"Extracting protocol from methods text ({len(methods_text)} chars)...", "info")
    
    try:
        # Create and run extraction chain
        chain = create_chain("protocol_extraction", ExperimentalProtocol)
        protocol = cached_run(chain, {
            "text": methods_text, 
            "config": config.model_dump()
        }, "protocol_extraction")
        
        if not protocol:
            show("Protocol extraction failed", "warning")
            return None
        
        # Validate extracted protocol
        completeness = protocol.get_completeness_score()
        protocol.confidence_score = completeness
        
        show(f"Extracted protocol with {completeness:.1%} completeness score", 
            "success" if completeness > 0.6 else "warning")
        
        return protocol
        
    except Exception as e:
        show(f"Error in protocol extraction: {str(e)}", "error")
        return None

def search_related_protocols_agent(protocol: ExperimentalProtocol, 
                                 agent_state: AgentState,
                                 config: ExpArchConfig = None) -> Tuple[Dict[str, Any], AgentState]:
    """Search for related protocols and methodological guidelines."""
    if not protocol:
        show("No protocol provided for search", "warning")
        return {}, agent_state
    
    config = config or ExpArchConfig()
    
    # Create protocol summary for agent
    protocol_summary = protocol.get_summary()
    
    # Create agent
    agent_executor = create_search_agent(config)
    
    if not agent_executor:
        agent_state.add_thought("Failed to create search agent")
        return {}, agent_state
    
    # Get tool names from the agent if available
    tool_names = getattr(agent_executor.agent, "_tool_names", "web_search, arxiv_search")
    
    # Create prompt inputs with tool_names included
    prompt_inputs = {
        "protocol": protocol_summary,
        "study_design": protocol.study_design,
        "variables": ", ".join([v.name for vlist in protocol.variables.values() for v in vlist][:5]),
        "analysis_methods": ", ".join([m.name for m in protocol.analysis_methods][:3]),
        "tool_names": tool_names,  # Add tool names to inputs
        "agent_scratchpad": ""  # Initialize empty scratchpad
    }
    
    show("Searching for related protocols and methodological guidelines...", "info")
    
    # Run agent with state tracking
    result, updated_state = run_agent_with_state(agent_executor, prompt_inputs, agent_state)
    
    if not result:
        show("Protocol search failed", "warning")
        return {}, updated_state
    
    show(f"Search complete: found {len(updated_state.search_results)} relevant sources", "success")
    
    return result, updated_state

def analyze_protocol_weaknesses_chain(protocol: ExperimentalProtocol, 
                                     search_results: List[SearchResult],
                                     config: ExpArchConfig = None) -> List[Dict[str, Any]]:
    """Analyze protocol weaknesses based on methodological standards."""
    if not protocol:
        show("No protocol provided for weakness analysis", "warning")
        return []
    
    config = config or ExpArchConfig()
    
    # Format protocol for prompt
    protocol_json = protocol.model_dump_json()
    
    # Format search results for prompt - provide fallback for empty results
    search_context = ""
    if search_results:
        for i, result in enumerate(search_results[:5], 1):
            search_context += f"{i}. {result.title} (from {result.source}):\n"
            search_context += f"{result.truncated_content(300)}\n\n"
    else:
        search_context = "No search results available. Analyzing protocol based on general methodological standards."
    
    # Create and run weakness analysis chain
    chain = create_chain("protocol_weaknesses", None)  # Returns text
    weaknesses_text = cached_run(chain, {
        "protocol": protocol_json,
        "search_context": search_context,
        "study_design": protocol.study_design
    }, "protocol_weaknesses")
    
    if not weaknesses_text:
        show("Protocol weakness analysis failed, using fallback analysis", "warning")
        # Provide a generic fallback weakness when the analysis chain returns nothing
        return [{
            "component": "General Methodology",
            "description": "Unable to perform detailed analysis because the weakness analysis chain returned no output",
            "impact": "Reduced ability to identify specific methodological improvements",
            "improvement_type": "other",
            "suggestion": "Consider reviewing protocol with domain experts"
        }]
    
    # Parse weaknesses text into structured format using LLM
    parse_chain = create_chain("parse_weaknesses", None)  # Returns JSON
    weaknesses_json = cached_run(parse_chain, {
        "weaknesses_text": weaknesses_text
    }, "parse_weaknesses")
    
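    try:
        # Parse JSON string to Python object if needed
        if isinstance(weaknesses_json, str):
            # Strip a surrounding Markdown code fence, which LLMs often add around JSON
            cleaned = re.sub(r'^```(?:json)?\s*|\s*```$', '', weaknesses_json.strip())
            weaknesses = json.loads(cleaned)
        else:
            weaknesses = weaknesses_json
            
        if weaknesses is None:
            raise ValueError("Weakness parsing chain returned no output")
            
        if not isinstance(weaknesses, list):
            weaknesses = [weaknesses]
            
        show(f"Identified {len(weaknesses)} protocol weaknesses", "info")
        return weaknesses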
    except Exception as e:
        show(f"Error parsing weaknesses: {str(e)}. Using fallback analysis.", "error")
        # Provide fallback if JSON parsing fails
        return [{
            "component": "JSON Parsing",
            "description": f"Error parsing weakness analysis: {str(e)}",
            "impact": "Unable to properly structure identified weaknesses",
            "improvement_type": "other",
            "suggestion": "Review raw weakness analysis and manually extract key points"
        }]

def generate_optimized_protocol_chain(original_protocol: ExperimentalProtocol,
                                    weaknesses: List[Dict[str, Any]],
                                    search_results: List[SearchResult],
                                    config: ExpArchConfig = None) -> OptimizedProtocolOutput:
    """Generate optimized protocol with improvements.
    
    Args:
        original_protocol: Original experimental protocol
        weaknesses: Identified weaknesses
        search_results: Search results for methodological standards
        config: Configuration parameters
        
    Returns:
        OptimizedProtocolOutput with improvements and justifications
    """
    if not original_protocol:
        show("No protocol provided for optimization", "warning")
        return None
    
    config = config or ExpArchConfig()
    
    # Format inputs for prompt
    protocol_json = original_protocol.model_dump_json()
    
    weaknesses_text = ""
    for i, weakness in enumerate(weaknesses[:5], 1):
        weaknesses_text += f"{i}. {weakness.get('component', 'Unknown component')}: "
        weaknesses_text += f"{weakness.get('description', 'No description')}\n"
    
    # Format search results
    search_context = ""
    for i, result in enumerate(search_results[:5], 1):
        search_context += f"{i}. {result.title} (from {result.source}):\n"
        search_context += f"{result.truncated_content(300)}\n\n"
    
    # Create and run optimization chain
    chain = create_chain("protocol_optimization", OptimizedProtocolOutput)
    
    show("Generating optimized protocol...", "info")
    start_time = time.time()
    
    optimized_result = cached_run(chain, {
        "protocol": protocol_json,
        "weaknesses": weaknesses_text,
        "search_context": search_context,
        "study_design": original_protocol.study_design
    }, "protocol_optimization")
    
    execution_time = time.time() - start_time
    
    if not optimized_result:
        show("Protocol optimization failed", "warning")
        return None
    
    # Set execution timing
    if hasattr(optimized_result, 'optimization_time'):
        optimized_result.optimization_time = execution_time
    
    # Make sure original protocol is correctly linked
    if hasattr(optimized_result, 'original_protocol') and not optimized_result.original_protocol:
        optimized_result.original_protocol = original_protocol
    
    show(f"Protocol optimization complete with {len(optimized_result.improvements)} improvements", 
         "success")
    
    return optimized_result

def validation_summary_chain(optimized_output: OptimizedProtocolOutput,
                          config: ExpArchConfig = None) -> Dict[str, Any]:
    """Generate validation summary for the optimized protocol.
    
    Args:
        optimized_output: The optimized protocol output
        config: Configuration parameters
        
    Returns:
        Validation summary with scores and recommendations
    """
    if not optimized_output or not hasattr(optimized_output, 'optimized_protocol'):
        show("No optimized protocol provided for validation", "warning")
        return {"error": "Missing optimized protocol"}
    
    config = config or ExpArchConfig()
    
    # Format optimized protocol for prompt
    protocol_json = optimized_output.optimized_protocol.model_dump_json()
    
    # Format improvements
    improvements_text = ""
    for i, imp in enumerate(optimized_output.improvements[:5], 1):
        improvements_text += f"{i}. {imp.component}: {imp.justification[:100]}...\n"
    
    # Create and run validation chain
    chain = create_chain("protocol_validation", None)  # Returns text
    
    validation_text = cached_run(chain, {
        "protocol": protocol_json,
        "improvements": improvements_text,
        "study_design": optimized_output.optimized_protocol.study_design
    }, "protocol_validation")
    
    if not validation_text:
        show("Protocol validation failed", "warning")
        return {"error": "Validation failed"}
    
    # Parse validation text to structured format
    try:
        validation_chain = create_chain("parse_validation", None)  # Returns JSON
        validation_json = cached_run(validation_chain, {
            "validation_text": validation_text
        }, "parse_validation")
        
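        # Parse JSON if needed
        if isinstance(validation_json, str):
            # Strip a surrounding Markdown code fence, which LLMs often add around JSON
            cleaned = re.sub(r'^```(?:json)?\s*|\s*```$', '', validation_json.strip())
            validation = json.loads(cleaned)
        else:
            validation = validation_json
            
        if not isinstance(validation, dict):
            raise ValueError("Validation parsing chain did not return a JSON object")
            
        show("Protocol validation complete", "success")
        return validation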
    except Exception as e:
        show(f"Error parsing validation: {str(e)}", "error")
        return {"error": str(e), "raw_validation": validation_text}

# ===== ORCHESTRATION FUNCTION =====
# Coordinates the entire workflow

def experimental_design_orchestrator(methods_text: str, 
                                  config: ExpArchConfig = None) -> Tuple[OptimizedProtocolOutput, AgentState]:
    """End-to-end orchestration of experimental protocol optimization.
    
    Args:
        methods_text: Methods section text to analyze
        config: Configuration parameters
        
    Returns:
        Tuple of (optimized_protocol_output, agent_state)
    """
    config = config or ExpArchConfig()
    
    # Apply settings based on optimization mode
    config.apply_mode_settings()
    
    # Initialize agent state
    agent_state = AgentState()
    
    show(f"Starting experimental design optimization in {config.optimization_mode} mode...", "info")
    overall_start_time = time.time()
    
    # Step 1: Extract protocol
    show("Step 1/5: Extracting experimental protocol...", "info")
    protocol = extract_protocol_chain(methods_text, config)
    
    if not protocol:
        show("Protocol extraction failed, aborting optimization", "error")
        agent_state.add_thought("Protocol extraction failed")
        return None, agent_state
    
    agent_state.extracted_protocol = protocol
    
    # Step 2: Search for related protocols and guidelines
    show("Step 2/5: Searching for related protocols and guidelines...", "info")
    search_result, agent_state = search_related_protocols_agent(protocol, agent_state, config)
    
    if not search_result:
        show("Protocol search completed with no results, continuing with limited context", "warning")
    
    # Step 3: Analyze protocol weaknesses
    show("Step 3/5: Analyzing protocol weaknesses...", "info")
    weaknesses = analyze_protocol_weaknesses_chain(protocol, agent_state.search_results, config)
    
    if not weaknesses:
        show("Protocol weakness analysis failed, continuing with limited improvements", "warning")
        weaknesses = []  # Use empty list to continue
    
    # Step 4: Generate optimized protocol
    show("Step 4/5: Generating optimized protocol...", "info")
    optimized_output = generate_optimized_protocol_chain(protocol, weaknesses, 
                                                       agent_state.search_results, config)
    
    if not optimized_output:
        show("Protocol optimization failed, aborting", "error")
        return None, agent_state
    
    agent_state.optimized_protocol = optimized_output
    
    # Step 5: Validate optimized protocol
    show("Step 5/5: Validating optimized protocol...", "info")
    validation = validation_summary_chain(optimized_output, config)
    
    if validation and "error" not in validation:
        # Add validation results to output
        optimized_output.validation = validation
    
    overall_execution_time = time.time() - overall_start_time
    show(f"Optimization complete in {overall_execution_time:.1f} seconds", "success")
    
    return optimized_output, agent_state

# Initialization complete
show("Core functions initialized - ready for protocol optimization", "success")
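
The chunking routine in the cell above caps each chunk at `chunk_size` characters and backs up by `overlap` characters before starting the next one. The following is a minimal standalone sketch of that sliding-window idea only, without the section-header or sentence-boundary handling. `sliding_chunks` is a hypothetical helper name used for illustration, not a function defined in this notebook.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Keep the overlap strictly smaller than the window so every step moves forward
    overlap = max(0, min(overlap, chunk_size - 1))
    step = chunk_size - overlap

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # Final window reached; stop instead of emitting an overlap-only tail
        start += step
    return chunks


pieces = sliding_chunks("abcdefghij", chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Each window repeats the last character of the previous one, which is the same trade-off the notebook makes at a larger scale: some duplicated text in exchange for context that is not cut off at a chunk boundary.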

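The `parse_output` helper inside `create_chain` tries several strategies in turn to recover JSON from an LLM reply: a fenced code block, the raw reply, and finally a brace-delimited span. The sketch below isolates that fallback order, assuming the reply is a plain string. `extract_json` and `FENCE` are hypothetical names for illustration; the notebook's own helper additionally builds a Pydantic model from the parsed data.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example readable


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON value recoverable from an LLM reply, or None."""
    candidates = []

    # 1. Contents of a fenced block, with or without a "json" language tag
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # 2. The whole reply as-is
    candidates.append(text.strip())

    # 3. Everything from the first "{" to the last "}" (greedy, so nesting survives)
    braces = re.search(r"\{.*\}", text, re.DOTALL)
    if braces:
        candidates.append(braces.group(0))

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None


reply = "Here you go:\n" + FENCE + 'json\n{"a": {"b": 1}}\n' + FENCE
print(extract_json(reply))  # {'a': {'b': 1}}
```

The greedy brace match in step 3 is a deliberate choice for this sketch: a non-greedy match would stop at the first closing brace and cut nested objects in half.
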
## Launch UI

In [None]:
# System Initialization
from pathlib import Path

try:
    # Initialize the Literature Synthesis System
    class LitSynthSystem:
        """Main system class that coordinates all components of the Literature Synthesis system."""
        
        def __init__(self, config=None):
            """Initialize the system with configuration and components."""
            # Use existing config or create new one
            self.config = config or globals().get('config', LitSynthConfig())
            
            # No cache setup needed - already handled in Core Utilities
        
        def analyze_document(self, source_type, content):
            """Main entry point to analyze a document."""
            return analyze_document(source_type, content, self.config)
        
        def extract_concepts_from_text(self, text):
            """Extract concepts from text directly."""
            return extract_concepts(text, self.config)
        
        def identify_relationships_from_concepts(self, text, concepts):
            """Identify relationships between concepts."""
            return identify_relationships(text, concepts, self.config)
        
        def identify_gaps_from_concepts_relationships(self, text, concepts, relationships):
            """Identify research gaps from concepts and relationships."""
            return identify_research_gaps(text, concepts, relationships)
        
        def generate_synthesis_from_components(self, text, concepts, relationships, gaps):
            """Generate synthesis from all components."""
            return generate_synthesis(text, concepts, relationships, gaps)

    # Initialize the system (connects previously defined components)
    litsynth = LitSynthSystem()
    
    # Confirm successful initialization - using show_info correctly
    show_info("Literature Synthesis System initialized successfully")
    
except Exception as e:
    # Use show from Core Utilities for error (with level parameter)
    error_msg = f"System failed to initialize: {str(e)}"
    show(f"{error_msg}\nPlease make sure you run the cells in order: Installation-Initialization-Data Models-Core Functions", "error")

In [None]:
# UI Structure
import gradio as gr
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

def create_ui():
    """Create optimized Literature Synthesis UI."""
    
    with gr.Blocks() as app:
        gr.Markdown("# Literature Synthesis Expert System")
        
        with gr.Tabs() as tabs:
            # === INPUT TAB ===
            with gr.Tab("Document Input"):
                with gr.Row():
                    with gr.Column(scale=3):
                        # PDF Upload (only option)
                        pdf_input = gr.File(
                            label="Upload Scientific PDF", 
                            file_types=[".pdf"],
                            file_count="single"
                        )
                        
                        with gr.Row():
                            analysis_mode = gr.Radio(
                                choices=["quick", "balanced", "thorough"],
                                label="Analysis Mode",
                                value="quick",
                                info="Quick: 30-40s, Balanced: 1-2min, Thorough: 3-5min"
                            )
                            analyze_btn = gr.Button("Analyze Document", variant="primary")
                    
                    with gr.Column(scale=2):
                        status_box = gr.Textbox(label="Status", interactive=False)
                        progress_bar = gr.Slider(
                            minimum=0, maximum=100, value=0, 
                            label="Processing Progress",
                            interactive=False
                        )
            
            # === CONCEPTS TAB ===
            with gr.Tab("Concepts & Relationships"):
                with gr.Row():
                    # Add document metadata box at the top
                    doc_info = gr.Markdown("*Upload and analyze a document to see results*")
                
                with gr.Row():
                    with gr.Column():
                        concepts_filter = gr.Radio(
                            choices=["all", "high", "medium", "low"],
                            label="Filter by Importance",
                            value="all"
                        )
                        concepts_table = gr.DataFrame(
                            headers=["Concept", "Definition", "Importance", "Confidence"]
                        )
                    
                    with gr.Column():
                        relationships_table = gr.DataFrame(
                            headers=["Source", "Relationship", "Target", "Evidence", "Confidence"]
                        )
            
            # === VISUALIZATION TAB ===
            with gr.Tab("Visualization"):
                with gr.Row():
                    with gr.Column():
                        with gr.Row():
                            min_confidence = gr.Slider(
                                minimum=0.0, maximum=1.0, value=0.5, step=0.1,
                                label="Minimum Confidence"
                            )
                            layout_type = gr.Dropdown(
                                choices=["Force-directed", "Circular", "Spectral", "Spring"],
                                label="Layout Type",
                                value="Force-directed"
                            )
                            refresh_viz_btn = gr.Button("Refresh")
                        
                        network_plot = gr.Plot(label="Concept Network")
                            
            # === SYNTHESIS TAB ===
            with gr.Tab("Research Synthesis"):
                with gr.Row():
                    with gr.Column(scale=3):
                        synthesis_output = gr.Markdown()
                    
                    with gr.Column(scale=2):
                        gr.Markdown("### Research Gaps")
                        gaps_table = gr.DataFrame(
                            headers=["Description", "Related Concepts", "Importance"]
                        )
                        
                        with gr.Row():
                            export_format = gr.Dropdown(
                                choices=["Markdown", "Text", "JSON"],
                                label="Export Format",
                                value="Markdown"
                            )
                            export_btn = gr.Button("Export")
                            
            # === SETTINGS TAB ===
            with gr.Tab("Settings"):
                with gr.Row():
                    with gr.Column():
                        # Analysis mode
                        gr.Markdown("#### Analysis Settings")
                        settings_analysis_mode = gr.Radio(
                            choices=["quick", "balanced", "thorough"],
                            label="Analysis Mode",
                            value="quick",
                            info="Affects document sampling and processing depth"
                        )
                        
                        # Text processing settings
                        gr.Markdown("#### Text Processing")
                        chunk_size = gr.Slider(
                            minimum=500, maximum=8000, value=2000, step=500,
                            label="Chunk Size (chars)",
                            info="Larger chunks capture more context but process slower"
                        )
                        chunk_overlap = gr.Slider(
                            minimum=50, maximum=1000, value=200, step=50,
                            label="Chunk Overlap"
                        )
                        
                        # Concept settings
                        gr.Markdown("#### Concepts")
                        min_importance = gr.Dropdown(
                            choices=["low", "medium", "high"],
                            label="Min Importance",
                            value="medium"
                        )
                        max_concepts = gr.Slider(
                            minimum=5, maximum=100, value=25, step=5,
                            label="Max Concepts"
                        )
                        
                        # Relationship settings
                        gr.Markdown("#### Relationships")
                        relationship_confidence = gr.Slider(
                            minimum=0.0, maximum=1.0, value=0.6, step=0.1,
                            label="Min Confidence"
                        )
                        max_relationships = gr.Slider(
                            minimum=10, maximum=200, value=50, step=10,
                            label="Max Relationships"
                        )
                        
                        apply_settings_btn = gr.Button("Apply Settings", variant="primary")
                        settings_status = gr.Textbox(label="Settings Status", interactive=False)
        
        # === State Variables ===
        results_state = gr.State(None)
    
    return app

# Create the UI
ui = create_ui()

# Display success message in notebook
show_info("UI structure defined successfully")
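
The Analysis Mode radio defined above is mapped to a sampling percentage in the launch cell (quick 10%, balanced 25%, thorough 50%), and the document is then reduced to a beginning, middle, and end slice. The sketch below isolates that sampling step so it can be tried without the UI. The function name `sample_begin_middle_end` and its parameter names are illustrative assumptions, not part of the notebook; the 4,000-character floor and 3,000-character segment cap are taken from the launch cell.

```python
def sample_begin_middle_end(text, sample_percent=10, min_chars=4000, max_segment=3000):
    """Return (sampled_text, coverage_percent) built from start, middle and end slices.

    Hypothetical helper mirroring the sampling done inside process_pdf_document.
    """
    if len(text) <= min_chars:
        return text, 100

    # The target grows with the sampling percentage but never drops below min_chars.
    target_size = min(len(text), max(min_chars, len(text) * sample_percent // 100))
    # Each of the three slices is capped at max_segment characters.
    segment = min(target_size // 3, max_segment)

    mid = len(text) // 2
    parts = [
        text[:segment],
        text[mid - segment // 2 : mid + segment // 2],
        text[-segment:],
    ]
    sampled = "\n\n[...]\n\n".join(parts)
    coverage = min(100, round(len(sampled) / len(text) * 100))
    return sampled, coverage


MODE_PERCENT = {"quick": 10, "balanced": 25, "thorough": 50}

document = "x" * 100_000
for mode, percent in MODE_PERCENT.items():
    sampled, coverage = sample_begin_middle_end(document, percent)
    print(mode, len(sampled), coverage)
```

Because each slice is capped at 3,000 characters, the three modes produce the same sample once a document is long enough for the cap to apply in quick mode (about 90,000 characters). Raising `max_segment` would be one way to let "thorough" cover more text, at the cost of longer processing.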

In [None]:
# UI Launch
import tempfile
import os
import time
import hashlib  # used below to derive a short document ID
import json
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
from pathlib import Path

def launch_litsynth_ui():
    """Launch the Literature Synthesis Expert System UI."""
    
    # Document Processing Functions
    def smart_sample_document(text, sample_percentage=15, min_chars=4000, max_chars=15000):
        """Sample document to reduce processing time while keeping key sections."""
        if not text or len(text) <= min_chars:
            return (text, 100) if text else ("", 0)
            
        target_size = max(min_chars, min(max_chars, int(len(text) * sample_percentage / 100)))
        
        # Extract document sections
        import re
        # Patterns carry no inline (?i) flag: they are joined with '|' below, and
        # Python 3.11+ rejects global flags that are not at the start of a pattern.
        section_patterns = {
            'abstract': r'abstract\s*\n',
            'introduction': r'(introduction|background)\s*\n',
            'methods': r'(methods|methodology|materials\s+and\s+methods)\s*\n',
            'results': r'results\s*\n',
            'discussion': r'discussion\s*\n',
            'conclusion': r'(conclusion|conclusions|summary)\s*\n'
        }
        any_heading = re.compile('|'.join(section_patterns.values()), re.IGNORECASE)
        
        sections = {}
        for name, pattern in section_patterns.items():
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                start = match.end()
                # The section runs until the next recognised heading, or to the end of the text
                next_heading = any_heading.search(text, start)
                end = next_heading.start() if next_heading else len(text)
                sections[name] = (start, end)
        
        # Prioritize sections or use beginning-middle-end approach
        sampled_text = ""
        if sections:
            priority_order = ['abstract', 'introduction', 'conclusion', 'discussion', 'results', 'methods']
            chars_remaining = target_size
            
            for section in priority_order:
                if section in sections and chars_remaining > 0:
                    start, end = sections[section]
                    section_text = text[start:end]
                    chars_to_take = min(len(section_text), chars_remaining)
                    sampled_text += section_text[:chars_to_take] + "\n\n"
                    chars_remaining -= chars_to_take
        
        if len(sampled_text) < min_chars or not sections:
            sampled_text = ""
            part_size = target_size // 3
            
            sampled_text += text[:part_size] + "\n\n"
            if len(text) > part_size * 3:
                middle_start = len(text) // 2 - part_size // 2
                sampled_text += "[...]\n\n" + text[middle_start:middle_start + part_size] + "\n\n"
            if len(text) > part_size * 2:
                # Take the closing slice of the document (a slice, not a single character)
                sampled_text += "[...]\n\n" + text[max(len(text) - part_size, part_size * 2):]
        
        coverage = min(100, round(len(sampled_text) / len(text) * 100))
        return sampled_text, coverage
    
    def process_pdf_document(pdf_file, analysis_mode="quick"):
        """Process PDF with staged concept extraction and relationship mapping."""
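        if not pdf_file:
            # This function is a generator, so messages must be yielded;
            # a returned value never reaches the UI.
            yield "Please upload a PDF file.", None, 0
            return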
        
        try:
            yield "Loading PDF document...", None, 0
            start_time = time.time()
            
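            # Copy the upload to a temp file. Depending on the Gradio version and the
            # File component's `type`, the handler receives a path string, a file-like
            # object with a .name attribute, or raw bytes.
            temp_path = Path(tempfile.mkdtemp()) / "uploaded.pdf"
            if isinstance(pdf_file, (bytes, bytearray)):
                temp_path.write_bytes(pdf_file)
            else:
                src_path = pdf_file if isinstance(pdf_file, (str, os.PathLike)) else pdf_file.name
                temp_path.write_bytes(Path(src_path).read_bytes())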
            
            yield "Extracting text...", None, 10
            doc_data = load_document("pdf", str(temp_path))
            text = doc_data.get("text", "")
            
            if not text:
                yield "Failed to extract text from PDF.", None, 0
                return
            
            # Sample text based on analysis mode
            if len(text) <= 4000:
                sampled_text, coverage = text, 100
            else:
                sample_percent = {"quick": 10, "balanced": 25, "thorough": 50}.get(analysis_mode, 10)
                target_size = min(len(text), max(4000, int(len(text) * sample_percent / 100)))
                
                segment_size = min(target_size // 3, 3000)
                start_text = text[:segment_size]
                mid_point = len(text) // 2
                mid_text = text[mid_point - segment_size//2:mid_point + segment_size//2]
                end_text = text[max(0, len(text) - segment_size):]
                
                sampled_text = start_text + "\n\n[...]\n\n" + mid_text + "\n\n[...]\n\n" + end_text
                coverage = round((len(sampled_text) / len(text)) * 100)
            
            yield f"Processing document ({len(text)} characters, {coverage}% sample)...", None, 20
            doc_id = hashlib.md5(text[:1000].encode()).hexdigest()[:10]
            
            # Multi-stage analysis
            concepts, relationships, gaps = [], [], []
            synthesis = "No synthesis generated."
            
            # 1. Extract concepts
            try:
                yield "Extracting concepts...", None, 30
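                # Prefer settings applied through the Settings tab (stored on litsynth.config),
                # then the global config, then the defaults
                current_config = getattr(globals().get('litsynth'), 'config', None) or globals().get('config') or LitSynthConfig()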
                concepts = extract_concepts(sampled_text, current_config)
                yield f"Found {len(concepts)} concepts", None, 50
            except Exception as e:
                print(f"ERROR in concept extraction: {str(e)}")
                yield f"Error extracting concepts: {str(e)}", None, 50
            
            if concepts:
                # 2. Identify relationships
                try:
                    yield "Identifying relationships...", None, 60
                    relationships = identify_relationships(sampled_text, concepts, current_config)
                    yield f"Found {len(relationships)} relationships", None, 70
                except Exception as e:
                    print(f"ERROR in relationship identification: {e}")
                    yield f"Error identifying relationships ({e}), continuing...", None, 70
                
                # 3. Identify research gaps
                try:
                    yield "Identifying research gaps...", None, 80
                    gaps = identify_research_gaps(sampled_text, concepts, relationships)
                    yield f"Found {len(gaps)} research gaps", None, 90
                except Exception as e:
                    print(f"ERROR in research gap identification: {e}")
                    yield f"Error identifying research gaps ({e}), continuing...", None, 90
                
                # 4. Generate synthesis
                try:
                    yield "Generating synthesis...", None, 95
                    synthesis = generate_synthesis(sampled_text, concepts, relationships, gaps)
                except Exception as e:
                    synthesis = "Synthesis generation failed. Please check the extracted concepts and relationships."
            else:
                yield "No concepts found, skipping further analysis...", None, 95
            
            # Package results
            results = LiteratureSynthesisOutput(
                document_id=doc_id,
                document_metadata={
                    "original_length": len(text),
                    "processed_length": len(sampled_text),
                    "coverage_percentage": coverage,
                    "processing_time": round(time.time() - start_time, 2),
                    "analysis_mode": analysis_mode
                },
                concepts=concepts or [],
                relationships=relationships or [],
                research_gaps=gaps or [],
                synthesis_text=synthesis or "No synthesis available."
            )
            
            processing_time = round(time.time() - start_time, 2)
            status_msg = (f"Analysis complete in {processing_time}s: {len(results.concepts)} concepts, "
                        f"{len(results.relationships)} relationships, {len(results.research_gaps)} research gaps "
                        f"({coverage}% of document processed)")
            
            yield status_msg, results, 100
            
        except Exception as e:
            # Yield rather than return so the error message reaches the status box
            yield f"Error analyzing document: {str(e)}", None, 0
    
    # Visualization Function
    def create_network_visualization(concepts, relationships, min_confidence=0.5):
        """Create network visualization with improved readability."""
        if not concepts or len(concepts) < 2:
            fig, ax = plt.subplots(figsize=(8, 6))
            ax.text(0.5, 0.5, "Not enough concepts to create visualization", 
                   ha='center', va='center', fontsize=12)
            ax.axis('off')
            return fig
        
        # Create directed graph
        G = nx.DiGraph()
        
        # Add nodes and edges
        for concept in concepts:
            G.add_node(concept.name, importance=concept.importance, definition=concept.definition)
        
        edge_count = 0
        for rel in relationships:
            if rel.confidence >= min_confidence and rel.source in G.nodes and rel.target in G.nodes:
                G.add_edge(rel.source, rel.target, 
                          relationship=rel.relationship_type,
                          evidence=rel.evidence,
                          confidence=rel.confidence)
                edge_count += 1
        
        # Enhanced visualization styling
        plt.rcParams.update({'font.size': 12})
        fig, ax = plt.subplots(figsize=(12, 10))
        
        importance_colors = {"high": "#e41a1c", "medium": "#377eb8", "low": "#4daf4a"}
        node_colors = [importance_colors.get(G.nodes[node]["importance"], "#999999") for node in G.nodes]
        
        centrality = nx.degree_centrality(G)
        node_sizes = [3000 * (centrality[node] + 0.1) for node in G.nodes]
        
        pos = nx.spring_layout(G, k=0.4, seed=42)
        
        # Draw network with improved visibility
        nx.draw_networkx_nodes(G, pos, node_color=node_colors, node_size=node_sizes, 
                              alpha=0.85, edgecolors='white', linewidths=1.5)
        nx.draw_networkx_edges(G, pos, edge_color='#555555', width=2.0, alpha=0.7, 
                              arrows=True, arrowsize=20, node_size=node_sizes)
        
        # Improved label rendering
        labels_pos = {node: (pos[node][0], pos[node][1] + 0.02) for node in G.nodes}
        nx.draw_networkx_labels(G, labels_pos, font_size=12, font_weight='bold', 
                               bbox=dict(facecolor='white', alpha=0.7, edgecolor='none', 
                                        boxstyle='round,pad=0.3'))
        
        # Add legend
        legend_elements = [plt.Line2D([0], [0], marker='o', color='w', 
                                     markerfacecolor=color, markersize=12, 
                                     label=f"{importance.capitalize()} Importance") 
                          for importance, color in importance_colors.items()]
        ax.legend(handles=legend_elements, loc='upper right', fontsize=11, 
                 frameon=True, facecolor='white', edgecolor='#cccccc')
        
        ax.axis('off')
        plt.title(f"Concept Relationship Network ({edge_count} connections at {min_confidence:.1f}+ confidence)",
                 fontsize=14, fontweight='bold', pad=20)
        
        return fig
    
    # Display Helper Functions
    def update_doc_info(results):
        """Format document metadata for display."""
        if not results:
            return "*No document analyzed yet*"
            
        metadata = results.document_metadata
        return (f"### Document Analysis Details\n"
                f"**Coverage**: {metadata.get('coverage_percentage', 'Unknown')}% of document processed\n"
                f"**Processing Time**: {metadata.get('processing_time', 'Unknown')}s\n"
                f"**Analysis Mode**: {metadata.get('analysis_mode', 'Unknown')}\n"
                f"**Concepts**: {len(results.concepts)}, "
                f"**Relationships**: {len(results.relationships)}, "
                f"**Research Gaps**: {len(results.research_gaps)}")
    
    def update_concepts_display(results, filter_type="all"):
        """Format concepts data for display with optional filtering."""
        if not results or not hasattr(results, 'concepts') or not results.concepts:
            return None
        
        filtered_concepts = results.concepts if filter_type == "all" else [c for c in results.concepts if c.importance == filter_type]
        
        return pd.DataFrame({
            "Concept": [c.name for c in filtered_concepts],
            "Definition": [c.definition for c in filtered_concepts],
            "Importance": [c.importance.capitalize() for c in filtered_concepts],
            "Confidence": [f"{c.confidence:.2f}" for c in filtered_concepts]
        })
    
    def update_relationships_display(results):
        """Format relationships data for display."""
        if not results or not hasattr(results, 'relationships') or not results.relationships:
            return None
        
        return pd.DataFrame({
            "Source": [r.source for r in results.relationships],
            "Relationship": [r.relationship_type for r in results.relationships],
            "Target": [r.target for r in results.relationships],
            "Evidence": [r.evidence or "N/A" for r in results.relationships],
            "Confidence": [f"{r.confidence:.2f}" for r in results.relationships]
        })
    
    def update_gaps_display(results):
        """Format research gaps data for display."""
        if not results or not hasattr(results, 'research_gaps') or not results.research_gaps:
            return None
        
        return pd.DataFrame({
            "Description": [g.description for g in results.research_gaps],
            "Related Concepts": [", ".join(g.related_concepts) for g in results.research_gaps],
            "Importance": [g.importance.capitalize() for g in results.research_gaps]
        })
    
    def update_visualization(results, min_confidence):
        """Update network visualization based on minimum confidence."""
        if not results or not hasattr(results, 'concepts') or len(results.concepts) < 2:
            fig, ax = plt.subplots(figsize=(8, 6))
            ax.text(0.5, 0.5, "Not enough concepts to create visualization", 
                   ha='center', va='center', fontsize=12)
            ax.axis('off')
            return fig
        
        return create_network_visualization(results.concepts, results.relationships, min_confidence)
    
    def update_config(analysis_mode, chunk_size, chunk_overlap, min_importance, 
                     max_concepts, relationship_confidence, max_relationships):
        """Update system configuration."""
        try:
            new_config = LitSynthConfig(
                text_chunk_size=chunk_size,
                text_chunk_overlap=chunk_overlap,
                min_concept_importance=min_importance,
                max_concepts=max_concepts,
                relationship_confidence=relationship_confidence,
                max_relationships=max_relationships
            )
            
            litsynth.config = new_config
            settings_summary = f"Settings updated: {analysis_mode} mode, {chunk_size} chunk size, {min_importance} min importance"
            return new_config, settings_summary
        except Exception as e:
            return None, f"Error updating configuration: {str(e)}"
    
    # Create UI
    with gr.Blocks(css="""
        /* Table cell wrapping */
        table td {
            white-space: normal !important;
            word-wrap: break-word !important;
            max-width: 300px !important;
        }
        
        /* Clean styling */
        .section-header {
            font-weight: bold;
            margin-top: 10px;
            margin-bottom: 5px;
            border-bottom: 1px solid rgba(128, 128, 128, 0.3);
            padding-bottom: 3px;
        }
        
        .info-text {
            font-style: italic;
            margin-bottom: 10px;
        }
    """) as app:
        gr.Markdown("# Literature Synthesis Expert System")
        
        # State variables
        results_state = gr.State(None)
        
        with gr.Tabs() as tabs:
            # INPUT TAB
            with gr.Tab("Document Input"):
                with gr.Row():
                    with gr.Column(scale=3):
                        pdf_input = gr.File(
                            label="Upload Scientific PDF", 
                            file_types=[".pdf"],
                            file_count="single"
                        )
                        
                        with gr.Row():
                            analysis_mode = gr.Radio(
                                choices=["quick", "balanced", "thorough"],
                                label="Analysis Mode",
                                value="quick",
                                info="Quick: 30-40s, Balanced: 1-2min, Thorough: 3-5min"
                            )
                            analyze_btn = gr.Button("Analyze Document", variant="primary")
                    
                    with gr.Column(scale=2):
                        status_box = gr.Textbox(
                            label="Status", 
                            interactive=False,
                            value="Ready to analyze. Please upload a PDF document."
                        )
                        progress_bar = gr.Slider(
                            minimum=0, maximum=100, value=0, 
                            label="Processing Progress",
                            interactive=False
                        )
            
            # CONCEPTS TAB
            with gr.Tab("Concepts & Relationships"):
                with gr.Row():
                    doc_info = gr.Markdown("*Upload and analyze a document to see results*")
                
                gr.Markdown("### Understanding the Concepts Table")
                gr.Markdown("This table shows key concepts extracted from the document. Use the filter to focus on specific importance levels.")
                
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("#### Key Concepts", elem_classes=["section-header"])
                        concepts_filter = gr.Radio(
                            choices=["all", "high", "medium", "low"],
                            label="Filter by Importance",
                            value="all"
                        )
                        concepts_table = gr.DataFrame(
                            headers=["Concept", "Definition", "Importance", "Confidence"]
                        )
                    
                    with gr.Column():
                        gr.Markdown("#### Relationships Between Concepts", elem_classes=["section-header"])
                        gr.Markdown("This table shows how concepts connect to each other.", elem_classes=["info-text"])
                        relationships_table = gr.DataFrame(
                            headers=["Source", "Relationship", "Target", "Evidence", "Confidence"]
                        )
            
            # VISUALIZATION TAB
            with gr.Tab("Visualization"):
                gr.Markdown("### Network Visualization Guide")
                gr.Markdown("This visualization shows how concepts relate to each other:")
                gr.Markdown("- **Red nodes**: High importance concepts")
                gr.Markdown("- **Blue nodes**: Medium importance concepts")
                gr.Markdown("- **Green nodes**: Low importance concepts")
                gr.Markdown("- **Node size**: Larger nodes have more connections")
                
                with gr.Row():
                    with gr.Column():
                        with gr.Row():
                            min_confidence = gr.Slider(
                                minimum=0.0, maximum=1.0, value=0.5, step=0.1,
                                label="Minimum Confidence",
                                info="Only show relationships with confidence above this threshold"
                            )
                            refresh_viz_btn = gr.Button("Refresh Visualization")
                        
                        network_plot = gr.Plot(label="Concept Network")
                            
            # SYNTHESIS TAB
            with gr.Tab("Research Synthesis"):
                with gr.Row():
                    with gr.Column(scale=3):
                        gr.Markdown("#### Overview & Key Findings", elem_classes=["section-header"])
                        synthesis_output = gr.Markdown()
                    
                    with gr.Column(scale=2):
                        gr.Markdown("#### Research Gaps", elem_classes=["section-header"])
                        gr.Markdown("These are potential areas for future research that weren't fully addressed.", elem_classes=["info-text"])
                        gaps_table = gr.DataFrame(
                            headers=["Description", "Related Concepts", "Importance"]
                        )
                            
            # SETTINGS TAB
            with gr.Tab("Settings"):
                gr.Markdown("### Settings Guide")
                gr.Markdown("These settings control how document analysis works. For most users, the default settings work well.", elem_classes=["info-text"])
                
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("#### Analysis Settings", elem_classes=["section-header"])
                        settings_analysis_mode = gr.Radio(
                            choices=["quick", "balanced", "thorough"],
                            label="Analysis Mode",
                            value="quick",
                            info="Quick (30-40s): Basic overview | Balanced (1-2min): Standard analysis | Thorough (3-5min): Deep analysis"
                        )
                        
                        gr.Markdown("#### Text Processing", elem_classes=["section-header"])
                        chunk_size = gr.Slider(
                            minimum=500, maximum=8000, value=2000, step=500,
                            label="Chunk Size (chars)",
                            info="Larger = Better context but slower | Smaller = Faster but less context"
                        )
                        chunk_overlap = gr.Slider(
                            minimum=50, maximum=1000, value=200, step=50,
                            label="Chunk Overlap",
                            info="Higher overlap maintains context between chunks"
                        )
                        
                        gr.Markdown("#### Concepts Settings", elem_classes=["section-header"])
                        min_importance = gr.Dropdown(
                            choices=["low", "medium", "high"],
                            label="Min Importance",
                            value="medium",
                            info="Only include concepts above this importance level"
                        )
                        max_concepts = gr.Slider(
                            minimum=5, maximum=100, value=25, step=5,
                            label="Max Concepts",
                            info="Maximum number of concepts to extract"
                        )
                        
                        gr.Markdown("#### Relationships Settings", elem_classes=["section-header"])
                        relationship_confidence = gr.Slider(
                            minimum=0.0, maximum=1.0, value=0.6, step=0.1,
                            label="Min Confidence",
                            info="Only include relationships with confidence above this threshold"
                        )
                        max_relationships = gr.Slider(
                            minimum=10, maximum=200, value=50, step=10,
                            label="Max Relationships",
                            info="Maximum number of relationships to identify"
                        )
                        
                        apply_settings_btn = gr.Button("Apply Settings", variant="primary")
                        settings_status = gr.Textbox(label="Settings Status", interactive=False)
        
        # Event Handlers
        analyze_btn.click(
            fn=process_pdf_document,
            inputs=[pdf_input, analysis_mode],
            outputs=[status_box, results_state, progress_bar]
        ).then(
            fn=update_doc_info,
            inputs=[results_state],
            outputs=[doc_info]
        ).then(
            fn=update_concepts_display,
            inputs=[results_state, concepts_filter],
            outputs=[concepts_table]
        ).then(
            fn=update_relationships_display,
            inputs=[results_state],
            outputs=[relationships_table]
        ).then(
            fn=update_gaps_display,
            inputs=[results_state],
            outputs=[gaps_table]
        ).then(
            fn=lambda x: x.synthesis_text if x and hasattr(x, 'synthesis_text') else "No synthesis available.",
            inputs=[results_state],
            outputs=[synthesis_output]
        ).then(
            fn=update_visualization,
            inputs=[results_state, min_confidence],
            outputs=[network_plot]
        )
        
        # Filter concepts by importance
        concepts_filter.change(
            fn=update_concepts_display,
            inputs=[results_state, concepts_filter],
            outputs=[concepts_table]
        )
        
        # Refresh visualization
        refresh_viz_btn.click(
            fn=update_visualization,
            inputs=[results_state, min_confidence],
            outputs=[network_plot]
        )
        
        # Update system settings
        apply_settings_btn.click(
            fn=update_config,
            inputs=[
                settings_analysis_mode, chunk_size, chunk_overlap, 
                min_importance, max_concepts, relationship_confidence, max_relationships
            ],
            outputs=[results_state, settings_status]
        )
    
    # Launch the app
    app.launch(inline=True, share=True, inbrowser=True, debug=True)

# Launch with error handling
try:
    launch_litsynth_ui()
    show("UI launched successfully", "success")
except Exception as e:
    show("UI launch failed: " + str(e), "error")
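
The Min Importance, Max Concepts, Min Confidence, and Max Relationships controls above describe thresholds and caps. A minimal sketch of how such settings might be applied to extracted items follows. The helper names `filter_concepts` and `filter_relationships`, the dictionary keys, and the inclusive treatment of both thresholds are assumptions for illustration, not the notebook's actual `update_config` logic.

```python
from typing import Dict, List

# Ordering for the three dropdown choices (assumed: low < medium < high)
IMPORTANCE_RANK = {"low": 0, "medium": 1, "high": 2}


def _rank(concept: Dict) -> int:
    """Return the numeric rank of a concept; missing or unknown values count as low."""
    return IMPORTANCE_RANK.get(concept.get("importance", "low"), 0)


def filter_concepts(concepts: List[Dict], min_importance: str = "medium",
                    max_concepts: int = 25) -> List[Dict]:
    """Keep concepts at or above min_importance, most important first, capped."""
    threshold = IMPORTANCE_RANK[min_importance]
    kept = [c for c in concepts if _rank(c) >= threshold]
    # sorted() is stable, so ties keep their original order
    return sorted(kept, key=_rank, reverse=True)[:max_concepts]


def filter_relationships(relationships: List[Dict], min_confidence: float = 0.6,
                         max_relationships: int = 50) -> List[Dict]:
    """Keep relationships at or above min_confidence, most confident first, capped."""
    kept = [r for r in relationships if r.get("confidence", 0.0) >= min_confidence]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)[:max_relationships]


# Hypothetical sample data
concepts = [
    {"name": "session length", "importance": "low"},
    {"name": "mindfulness", "importance": "high"},
    {"name": "wait-list control", "importance": "medium"},
]
relationships = [
    {"source": "mindfulness", "target": "stress", "confidence": 0.9},
    {"source": "mindfulness", "target": "sleep", "confidence": 0.4},
    {"source": "stress", "target": "depression", "confidence": 0.7},
]

print([c["name"] for c in filter_concepts(concepts)])
print([r["confidence"] for r in filter_relationships(relationships)])
```

Sorting before truncating means the cap drops the least important items first, rather than whichever happened to be extracted last.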

## [Temp] Diagnostics

In [None]:
# Quick Protocol Extraction Diagnostic

# 1. Simple test data
test_text = "We conducted a randomized controlled trial with 60 participants. Participants were randomly assigned to mindfulness intervention or wait-list control. Intervention: 8 weekly 90-minute sessions. Measured at baseline and post-intervention using BDI-II and PSS. Analysis: repeated measures ANOVA."

# 2. Check basic LLM functionality
print("== BASIC LLM TEST ==")
try:
    test_response = llm.invoke("Say 'TEST OK'")
    print(f"LLM response: {test_response.content if hasattr(test_response, 'content') else test_response}")
except Exception as e:
    print(f"LLM ERROR: {str(e)}")

# 3. Check prompt formatting
print("\n== PROMPT TEST ==")
try:
    protocol_prompt = PROMPTS["protocol_extraction"].format(text=test_text)
    print(f"Prompt formatting OK - Length: {len(protocol_prompt)} chars")
except Exception as e:
    print(f"PROMPT ERROR: {str(e)}")
    print("First 200 chars of prompt template:")
    print(PROMPTS["protocol_extraction"][:200])

# 4. Test raw LLM response to protocol prompt
print("\n== RAW LLM RESPONSE ==")
try:
    raw_response = llm.invoke(protocol_prompt)
    print(f"LLM responded - Response type: {type(raw_response)}")
    raw_content = raw_response.content if hasattr(raw_response, 'content') else str(raw_response)
    print(f"First 500 chars:\n{raw_content[:500]}...")
except Exception as e:
    print(f"LLM RESPONSE ERROR: {str(e)}")

# 5. Test parsing function directly
print("\n== PARSING TEST ==")
try:
    import json  # try_parse below relies on json.loads
    from pydantic import ValidationError
    # Get the raw parsing function logic from create_chain
    def try_parse(content):
        # Try direct JSON parsing first
        try:
            if "```json" in content:
                json_block = content.split("```json")[1].split("```")[0].strip()
                data = json.loads(json_block)
                return "JSON block extracted and parsed", data
            # Fall back to the outermost brace-delimited span; a non-greedy
            # pattern would cut nested objects off at the first closing brace
            import re
            match = re.search(r'\{.*\}', content, re.DOTALL)
            if match:
                data = json.loads(match.group(0))
                return "Regex extraction successful", data
            return "No JSON found", None
        except Exception as e:
            return f"Parsing error: {str(e)}", None
    
    parse_result, data = try_parse(raw_content)
    print(f"Parse result: {parse_result}")
    if data:
        print(f"Found data with keys: {list(data.keys())}")
        # Try constructing model
        try:
            protocol = ExperimentalProtocol(**data)
            print(f"Model creation SUCCESS: {protocol.protocol_id}")
        except ValidationError as ve:
            print(f"Validation errors: {ve}")
        except Exception as e:
            print(f"Model creation failed: {str(e)}")
except Exception as e:
    print(f"PARSING TEST ERROR: {str(e)}")

print("\n== DIAGNOSTIC COMPLETE ==")
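
The parsing test above pulls JSON out of free-form LLM text with string splitting and a regex. One alternative, sketched below under stated assumptions, is to let the standard library's `json.JSONDecoder.raw_decode` find the first valid object, which handles nested braces and trailing prose. The helper name `extract_first_json_object` and the sample string are hypothetical; this is not the logic used in `create_chain`.

```python
import json
from typing import Any, Optional


def extract_first_json_object(content: str) -> Optional[Any]:
    """Return the first decodable JSON object in content, or None if there is none."""
    decoder = json.JSONDecoder()
    start = content.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it
            obj, _end = decoder.raw_decode(content, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            start = content.find("{", start + 1)
    return None


# Hypothetical LLM reply with a fenced JSON block and surrounding prose
sample = (
    "Here is the protocol:\n```json\n"
    '{"study_design": "RCT", "variables": {"independent": ["mindfulness"]}}'
    "\n```\nLet me know if you need more detail."
)

parsed = extract_first_json_object(sample)
print(parsed["variables"])
```

Because the decoder does the brace matching, the same function works whether or not the model wraps its answer in a fenced block.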

## Test

In [None]:
# 🧪 Testing & Validation
"""
This condensed testing framework validates the Experimental Architect system.
It tests data models, core functions, and UI rendering in a single workflow.

Run this cell to verify that all components are working as expected before 
implementing the full UI.
"""

import gradio as gr
import time
import json
import re
from typing import Dict, Any, List, Optional

# === METRICS TRACKING ===
class ExpArchMetrics:
    """Tracks performance metrics for ExpArch function calls"""
    
    def __init__(self):
        self.reset()
    
    def reset(self):
        """Reset all metrics"""
        self.steps = []
        self.start_time = time.time()
        self.total_tokens = 0
        self.total_time = 0
    
    def record_step(self, name: str, duration: float, tokens: int = 0, error: Optional[str] = None):
        """Record a processing step with metrics"""
        self.steps.append({
            "name": name,
            "duration": duration,
            "tokens": tokens,
            "timestamp": time.time() - self.start_time,
            "error": error
        })
        self.total_tokens += tokens
        self.total_time += duration
    
    def get_summary(self) -> Dict[str, Any]:
        """Get summary statistics"""
        if not self.steps:
            return {"total_time": 0, "total_tokens": 0, "steps": 0, "steps_data": [], "est_cost": "$0.00"}
        
        # Calculate estimated cost (rough approximation)
        est_cost = self.total_tokens * 0.00001  # $0.01 per 1K tokens
        
        return {
            "total_time": self.total_time,
            "total_tokens": self.total_tokens,
            "steps": len(self.steps),
            "steps_data": self.steps,
            "est_cost": f"${est_cost:.4f}"
        }
    
    def format_summary(self) -> str:
        """Format metrics as plain text for display"""
        summary = self.get_summary()
        
        text = "ExpArch System Test Results\n\n"
        text += f"Total Time: {summary['total_time']:.2f}s\n"
        text += f"Total Tokens: {summary['total_tokens']:,}\n"
        text += f"Steps: {summary['steps']}\n"
        text += f"Est. Cost: {summary['est_cost']}\n\n"
        
        text += "Step Details:\n"
        for step in self.steps:
            status = f"Error: {step['error']}" if step["error"] else "Success"
            text += f"- {step['name']}: {step['duration']:.2f}s, {step['tokens']:,} tokens, {status}\n"
        
        return text

# Initialize metrics
metrics = ExpArchMetrics()

# === TEST WORKFLOW ===
def test_exparch_workflow(methods_text: str, config: Optional[ExpArchConfig] = None) -> Dict[str, Any]:
    """Run a complete ExpArch workflow test with metrics tracking.
    
    Args:
        methods_text: Method section text to analyze
        config: Optional configuration override
        
    Returns:
        Dictionary with test results and metrics
    """
    # Reset metrics first so an early return never reports a previous run's data
    metrics.reset()
    
    if not methods_text or len(methods_text.strip()) < 100:
        return {
            "error": "Method text too short for meaningful testing. Please provide a longer methods section.",
            "metrics": metrics.get_summary()
        }
    
    # Use default config if none provided
    config = config or ExpArchConfig(
        optimization_mode="quick",  # Use quick mode for testing
        agent_max_iterations=3      # Limit iterations for faster testing
    )
    
    results = {}
    
    # Test 1: Protocol Extraction
    try:
        start_time = time.time()
        protocol = extract_protocol_chain(methods_text, config)
        duration = time.time() - start_time
        
        if protocol:
            results["protocol"] = protocol
            # Estimate tokens from input length (roughly 4 characters per token)
            tokens = len(methods_text) // 4
            metrics.record_step("Protocol Extraction", duration, tokens)
        else:
            metrics.record_step("Protocol Extraction", duration, 0, "Failed to extract protocol")
            return {
                "error": "Protocol extraction failed. Please check the methods text and try again.",
                "metrics": metrics.get_summary()
            }
    except Exception as e:
        metrics.record_step("Protocol Extraction", time.time() - start_time, 0, str(e))
        return {
            "error": f"Protocol extraction error: {str(e)}",
            "metrics": metrics.get_summary()
        }
    
    # Test 2: Agent State Initialization and Search
    try:
        start_time = time.time()
        agent_state = AgentState()
        agent_state.extracted_protocol = protocol
        
        # Simplified search to avoid long waits during testing
        search_result, agent_state = search_related_protocols_agent(
            protocol, 
            agent_state, 
            config
        )
        duration = time.time() - start_time
        
        results["agent_state"] = agent_state
        # Estimate tokens from the total length of the retrieved search results
        search_tokens = sum(len(res.content) // 4 for res in agent_state.search_results)
        metrics.record_step("Protocol Search", duration, search_tokens)
    except Exception as e:
        metrics.record_step("Protocol Search", time.time() - start_time, 0, str(e))
        results["search_error"] = str(e)
        # Continue even if search fails
    
    # Test 3: Weakness Analysis
    try:
        start_time = time.time()
        weaknesses = analyze_protocol_weaknesses_chain(
            protocol, 
            agent_state.search_results if "agent_state" in results else [],
            config
        )
        duration = time.time() - start_time
        
        results["weaknesses"] = weaknesses
        # Estimate tokens based on results size
        weakness_tokens = len(str(weaknesses)) // 2
        metrics.record_step("Weakness Analysis", duration, weakness_tokens)
    except Exception as e:
        metrics.record_step("Weakness Analysis", time.time() - start_time, 0, str(e))
        results["weakness_error"] = str(e)
        # Continue with empty weaknesses if analysis fails
        results["weaknesses"] = []
    
    # Test 4: Protocol Optimization
    try:
        start_time = time.time()
        optimized_output = generate_optimized_protocol_chain(
            protocol,
            results.get("weaknesses", []),
            agent_state.search_results if "agent_state" in results else [],
            config
        )
        duration = time.time() - start_time
        
        if optimized_output:
            results["optimized_output"] = optimized_output
            # Estimate tokens based on output complexity
            optimization_tokens = len(str(optimized_output.model_dump())) // 2
            metrics.record_step("Protocol Optimization", duration, optimization_tokens)
        else:
            metrics.record_step("Protocol Optimization", duration, 0, "Failed to generate optimized protocol")
    except Exception as e:
        metrics.record_step("Protocol Optimization", time.time() - start_time, 0, str(e))
        results["optimization_error"] = str(e)
    
    # Test 5: Validation (if optimization succeeded)
    if "optimized_output" in results:
        try:
            start_time = time.time()
            validation = validation_summary_chain(
                results["optimized_output"],
                config
            )
            duration = time.time() - start_time
            
            results["validation"] = validation
            # Estimate tokens for validation
            validation_tokens = len(str(validation)) // 2
            metrics.record_step("Protocol Validation", duration, validation_tokens)
        except Exception as e:
            metrics.record_step("Protocol Validation", time.time() - start_time, 0, str(e))
            results["validation_error"] = str(e)
    
    # Add metrics to results
    results["metrics"] = metrics.get_summary()
    
    # Add test status
    if "optimized_output" in results:
        results["status"] = "success"
    elif "protocol" in results:
        results["status"] = "partial"
    else:
        results["status"] = "failed"
    
    return results

# === SIMPLIFIED VISUALIZATION HELPERS ===

def format_protocol_text(protocol):
    """Format protocol as plain text for display"""
    if not protocol:
        return "No protocol available"
    
    text = f"Study Design: {protocol.study_design}\n"
    text += f"Sample Size: {protocol.sample_size or 'Not specified'}\n\n"
    
    # Add variables by type
    text += "Variables:\n"
    for var_type, variables in protocol.variables.items():
        if variables:
            text += f"- {var_type.capitalize()} Variables:\n"
            for var in variables:
                text += f"  * {var.name}: {var.description}\n"
    
    # Add analysis methods
    if protocol.analysis_methods:
        text += "\nAnalysis Methods:\n"
        for method in protocol.analysis_methods:
            text += f"- {method.name}: {method.description}\n"
    
    # Add procedures summary
    if protocol.procedures:
        text += f"\nProcedures ({len(protocol.procedures)} steps):\n"
        text += "First 3 steps:\n"
        for i, step in enumerate(protocol.procedures[:3], 1):
            text += f"{i}. {step.description}\n"
        if len(protocol.procedures) > 3:
            text += f"... {len(protocol.procedures) - 3} more steps ...\n"
    
    return text

def format_improvements_text(optimized_output):
    """Format improvements as plain text for display"""
    if not optimized_output:
        return "No optimization results available"
    
    text = "Protocol Improvements:\n\n"
    
    # Add improvements
    if optimized_output.improvements:
        for i, imp in enumerate(optimized_output.improvements, 1):
            text += f"Improvement {i}: {imp.component}\n"
            text += f"Original: {imp.original_text[:80]}...\n"
            text += f"Improved: {imp.improved_text[:80]}...\n"
            text += f"Justification: {imp.justification[:150]}...\n\n"
    else:
        text += "No specific improvements identified.\n"
    
    return text

def format_agent_reasoning_text(agent_state):
    """Format agent reasoning as plain text for display"""
    if not agent_state or not agent_state.steps:
        return "No agent reasoning available"
    
    text = "Agent Reasoning Process:\n\n"
    
    # Format steps
    for step in agent_state.steps:
        if step.step_type == "thought":
            text += f"THOUGHT:\n{step.content}\n\n"
        elif step.step_type == "action":
            text += f"ACTION: {step.tool_name}\nINPUT: {step.tool_input}\n\n"
        elif step.step_type == "observation":
            observation = step.content[:300] + ('...' if len(step.content) > 300 else '')
            text += f"OBSERVATION:\n{observation}\n\n"
        elif step.step_type == "final_answer":
            text += f"FINAL ANSWER:\n{step.content}\n\n"
    
    return text

# === TESTING INTERFACE ===

def create_test_interface():
    """Create Gradio interface for testing the ExpArch system"""
    
    # Sample methods text for quick testing
    sample_methods = """
    We conducted a randomized controlled trial with 60 participants (30 experimental, 30 control).
    Participants were randomly assigned to either the mindfulness meditation intervention or a wait-list control.
    The intervention group received 8 weekly 90-minute sessions of mindfulness training.
    Outcomes were measured at baseline and post-intervention (8 weeks) using the Beck Depression Inventory (BDI-II)
    and the Perceived Stress Scale (PSS). Analysis was performed using repeated measures ANOVA.
    """
    
    def run_exparch_test(methods_text, optimization_mode):
        """Run ExpArch pipeline test with the provided methods text"""
        if not methods_text or len(methods_text.strip()) < 100:
            return "Methods text too short. Please provide a complete methods section.", "", "", "", ""
        
        # Configure test settings
        config = ExpArchConfig(
            optimization_mode=optimization_mode,
            agent_max_iterations=5 if optimization_mode == "quick" else 8
        )
        
        # Run test workflow
        results = test_exparch_workflow(methods_text, config)
        
        if "error" in results:
            return results["error"], "", "", metrics.format_summary(), ""
        
        # Prepare text outputs
        protocol_text = format_protocol_text(results.get("protocol"))
        agent_text = format_agent_reasoning_text(results.get("agent_state"))
        
        if "optimized_output" in results:
            improvements_text = format_improvements_text(results["optimized_output"])
        else:
            improvements_text = "Optimization did not complete successfully."
        
        metrics_text = metrics.format_summary()
        
        # Return all text outputs
        status = f"Test completed with status: {results['status'].upper()}"
        return status, protocol_text, improvements_text, metrics_text, agent_text
    
    def process_pdf(pdf_file):
        """Extract text from PDF for testing"""
        if not pdf_file:
            return "Please upload a PDF file."
        
        try:
            # Extract text from PDF
            doc_data = load_document("pdf", pdf_file.name)
            text = doc_data.get("text", "")
            
            if not text:
                return "Failed to extract text from PDF."
            
            # Focus on methods section if possible
            methods_section = ""
            
            # Try to find methods section using common headings
            patterns = [
                r"(?i)methods\s*\n(.*?)(?:\n\s*(?:results|discussion)|\Z)",
                r"(?i)methodology\s*\n(.*?)(?:\n\s*(?:results|discussion)|\Z)",
                r"(?i)experimental design\s*\n(.*?)(?:\n\s*(?:results|discussion)|\Z)"
            ]
            
            for pattern in patterns:
                match = re.search(pattern, text, re.DOTALL)
                if match:
                    methods_section = match.group(1).strip()
                    break
            
            if methods_section and len(methods_section) > 200:
                return methods_section
            else:
                # No clear methods section, return first 2000 chars
                return text[:2000]
                
        except Exception as e:
            return f"Error processing PDF: {str(e)}"
    
    # Create the interface
    with gr.Blocks() as demo:
        gr.Markdown("# 🧪 ExpArch System Test")
        gr.Markdown("Test the Experimental Architect system with a methods section from a scientific paper.")
        
        with gr.Row():
            with gr.Column():
                gr.Markdown("### Input Options")
                pdf_input = gr.File(label="Upload PDF (Optional)", file_types=[".pdf"])
                extract_btn = gr.Button("Extract Methods")
                
                methods_text = gr.Textbox(
                    value=sample_methods, 
                    label="Methods Section Text",
                    lines=8, 
                    placeholder="Paste methods section or use PDF extraction"
                )
                
                optimization_mode = gr.Radio(
                    choices=["quick", "balanced", "thorough"],
                    value="quick",
                    label="Optimization Mode",
                    info="Quick is fastest but less comprehensive"
                )
                
                test_btn = gr.Button("Run System Test", variant="primary")
                status_box = gr.Textbox(label="Status")
            
        with gr.Tabs() as tabs:
            with gr.TabItem("Protocol"):
                protocol_display = gr.Textbox(label="Extracted Protocol", lines=15)
            
            with gr.TabItem("Improvements"):
                improvements_display = gr.Textbox(label="Protocol Improvements", lines=15)
            
            with gr.TabItem("Agent Reasoning"):
                agent_display = gr.Textbox(label="Agent Reasoning Process", lines=15)
                
            with gr.TabItem("Metrics"):
                metrics_display = gr.Textbox(label="System Metrics", lines=15)
        
        # Set up event handlers
        pdf_input.change(
            fn=lambda: "",  # Clear methods text when a PDF is uploaded
            inputs=[],
            outputs=[methods_text]
        )
        
        extract_btn.click(
            fn=process_pdf,
            inputs=[pdf_input],
            outputs=[methods_text]
        )
        
        test_btn.click(
            fn=run_exparch_test,
            inputs=[methods_text, optimization_mode],
            outputs=[status_box, protocol_display, improvements_display, metrics_display, agent_display]
        )
    
    return demo

# Create and launch test interface
test_interface = create_test_interface()
test_interface.launch(inline=True, share=False)

# Show confirmation
show("ExpArch testing framework initialized - Use the interface to test the system", "success")
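
Each test stage in `test_exparch_workflow` repeats the same pattern: start a timer, run the step, then record either success or the error. A context manager is one way to factor that out. The sketch below is illustrative only: `StepLog` is a minimal stand-in for `ExpArchMetrics`, and `timed_step` is a hypothetical helper that does not exist in the notebook.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


class StepLog:
    """Minimal stand-in for ExpArchMetrics (illustration only)."""

    def __init__(self) -> None:
        self.steps: List[Dict[str, Any]] = []

    def record_step(self, name: str, duration: float, tokens: int = 0,
                    error: Optional[str] = None) -> None:
        self.steps.append({"name": name, "duration": duration,
                           "tokens": tokens, "error": error})


@contextmanager
def timed_step(log: StepLog, name: str) -> Iterator[Dict[str, int]]:
    """Time the enclosed block and record it; exceptions are logged and suppressed."""
    info = {"tokens": 0}  # the block can set info["tokens"] before it exits
    start = time.perf_counter()
    try:
        yield info
    except Exception as exc:
        log.record_step(name, time.perf_counter() - start, 0, str(exc))
    else:
        log.record_step(name, time.perf_counter() - start, info["tokens"])


log = StepLog()

with timed_step(log, "Protocol Extraction") as info:
    text = "x" * 400                  # placeholder for real methods text
    info["tokens"] = len(text) // 4   # same rough 4-chars-per-token estimate

with timed_step(log, "Protocol Search"):
    raise RuntimeError("search backend unavailable")  # simulated failure

for step in log.steps:
    print(step["name"], step["tokens"], step["error"])
```

Suppressing the exception matches the "continue even if search fails" behavior of the later stages. A stage that must abort the run, like protocol extraction, would need to check the recorded error afterwards or re-raise.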