## **Claude-Only CERT Multi-Agent Coordination Demo**
---

This framework provides systematic measurement of coordination patterns in current LLM systems. While these systems manipulate discrete tokens based on statistical correlations rather than learning continuous representations of the physical world, this infrastructure will be essential when we develop architectures based on learned world models.

**What this measures:**
Coordination behaviors between sophisticated pattern-matching systems.

**What this enables:**
Deployment scaffolding for current technology + experimental apparatus for studying genuine coordination when it emerges from proper architectures.

### **Setup Instructions**
#### 1. API Configuration
```python
# Set your Anthropic API key
api_key = "sk-ant-api03-your-key-here"
```
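A key pasted into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch of an alternative, the key can be read from an environment variable instead. `load_api_key` is a hypothetical helper introduced here for illustration; it is not part of CERT or the Anthropic SDK.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration only (not part of CERT).
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the variable set, `api_key = load_api_key()` can stand in for the hardcoded assignment.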
#### 2. Agent Configuration (2-10 agents)
Each agent requires four parameters:

>Agent ID: Unique identifier (agent_1, summarizer, etc.)

>Model: Choose from available Claude models

>Role: Agent's specialized function (Document Analyst, Critical Reviewer)

>Task: Specific instructions for this agent's analysis approach
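The rules above (2-10 agents, four parameters each, unique IDs) can be checked before any API call is made. The sketch below assumes the dictionary keys `id`, `model`, `role`, and `task` used by the example configuration later in this notebook. `validate_agent_configs` is a hypothetical helper, not part of CERT.

```python
from typing import Dict, List

# Keys every agent configuration is expected to carry
REQUIRED_KEYS = ("id", "model", "role", "task")


def validate_agent_configs(configs: List[Dict[str, str]]) -> None:
    """Raise ValueError if the configs break the rules described above."""
    if not 2 <= len(configs) <= 10:
        raise ValueError(f"Expected 2-10 agents, got {len(configs)}")
    seen_ids = set()
    for index, config in enumerate(configs):
        # A key counts as missing when it is absent or empty
        missing = [key for key in REQUIRED_KEYS if not config.get(key)]
        if missing:
            raise ValueError(f"Agent #{index + 1} is missing: {', '.join(missing)}")
        if config["id"] in seen_ids:
            raise ValueError(f"Duplicate agent ID: {config['id']}")
        seen_ids.add(config["id"])
```

Failing early on a malformed configuration avoids spending API calls on a run that cannot be interpreted.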

Available Models:
```
claude-opus-4-20250514 - #Most powerful, complex reasoning
claude-sonnet-4-20250514 - #Balanced performance (recommended)
claude-3-5-haiku-20241022 - #Fastest response times
claude-3-haiku-20240307 - #Legacy model for comparison
```
#### 3. Coordination Configuration
Global Task: Overall objective for the multi-agent system

Coordination Pattern:
>Sequential: Agents process in order, building on previous outputs

>Parallel: All agents analyze simultaneously, independent perspectives
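The difference between the two patterns can be sketched without any API calls. The agents below are hypothetical stand-ins (plain coroutines that tag their input) rather than Claude models. The point is only the data flow: chained context in the sequential case, a shared task in the parallel case.

```python
import asyncio
from typing import Awaitable, Callable, List

# A stand-in for a model call: takes a context string, returns a response
AgentFn = Callable[[str], Awaitable[str]]


def make_echo_agent(name: str) -> AgentFn:
    """Build a hypothetical agent that tags whatever context it receives."""
    async def agent(context: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{name}({context})"
    return agent


async def run_sequential(agents: List[AgentFn], task: str) -> List[str]:
    """Each agent receives the previous agent's output as its context."""
    outputs: List[str] = []
    context = task
    for agent in agents:
        context = await agent(context)
        outputs.append(context)
    return outputs


async def run_parallel(agents: List[AgentFn], task: str) -> List[str]:
    """Every agent receives the same task; results keep the agents' order."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))
```

With agents `a` and `b` and task `t`, the sequential run yields `a(t)` then `b(a(t))`, while the parallel run yields `a(t)` and `b(t)`.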

#### 4. Document Upload
Upload PDF documents for analysis. The system extracts text content and uses it as context for agent coordination.

### **Execution Process**

#### Phase 1: Individual Analysis

>**Behavioral Consistency Score ($C$; printed as β in the code output)**


>How reliably an agent produces similar responses to identical tasks.

>for $C$ in the range 0.9-1.0: Highly reliable

>for $C$ in the range 0.7-0.9: Moderately consistent

>for $C$<0.7: Unreliable for deployment
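The code later in this notebook scores consistency as the mean pairwise Jaccard similarity over lowercase whitespace tokens. A minimal standalone sketch of that idea follows. The function names are illustrative, and treating a score of exactly 0.9 as "highly reliable" is an assumption, since the bands above share that boundary.

```python
from itertools import combinations
from statistics import mean
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase whitespace-token sets of a and b."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def consistency(responses: List[str]) -> float:
    """Mean pairwise Jaccard similarity; 1.0 when fewer than two responses."""
    if len(responses) < 2:
        return 1.0
    return mean(jaccard(a, b) for a, b in combinations(responses, 2))


def reliability_label(score: float) -> str:
    """Map a score to the bands listed above (0.9 assumed 'highly reliable')."""
    if score >= 0.9:
        return "Highly reliable"
    if score >= 0.7:
        return "Moderately consistent"
    return "Unreliable for deployment"
```

Token overlap is a coarse proxy: two responses that paraphrase each other with different wording score low even when they agree in substance.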


#### Phase 2: Multi-Agent Coordination
Agents coordinate according to the selected pattern while the system tracks:

>Conversation flow: Complete interaction sequence

>Response quality: Success/failure rates

>Timing patterns: Response latencies and bottlenecks
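As a rough sketch of how the tracked quantities can be aggregated, the function below groups interaction records by agent and reports a success rate and a mean latency. The record shape (keys `agent`, `success`, `response_time`) is an assumption made for this example, and `summarize_log` is a hypothetical helper, not part of CERT.

```python
from statistics import mean
from typing import Dict, List, Union

# Assumed record shape: {"agent": str, "success": bool, "response_time": float}
Record = Dict[str, Union[str, float, bool]]


def summarize_log(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Group records by agent and report success rate and mean latency."""
    per_agent: Dict[str, List[Record]] = {}
    for record in records:
        per_agent.setdefault(str(record["agent"]), []).append(record)

    summary: Dict[str, Dict[str, float]] = {}
    for agent, rows in per_agent.items():
        summary[agent] = {
            "success_rate": sum(1 for r in rows if r["success"]) / len(rows),
            "mean_latency_s": mean(float(r["response_time"]) for r in rows),
        }
    return summary
```

An agent whose mean latency is far above the others is a likely bottleneck in the sequential pattern, where each step waits for the previous one.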

#### Phase 3: Coordination Effect Measurement
>**Coordination Effect ($\gamma$)**

>Performance change when agents work together vs. alone.

<center>$\gamma = \frac{\textrm{Coordinated Performance}}{\textrm{Individual Performance}}$</center>

> $\gamma$ > 1.0: Agents help each other

> $\gamma$ = 1.0: No coordination benefit

> $\gamma$ < 1.0: Agents interfere with each other
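A small worked sketch of the ratio, assuming (as the notebook's own `calculate_coordination_effect` does) that individual performance is the mean over agents and that a zero baseline is reported as 1.0. The `tolerance` band in `interpret` is a hypothetical addition for this example, since a floating-point ratio rarely equals 1.0 exactly.

```python
from statistics import mean
from typing import List


def coordination_effect(individual: List[float], coordinated: float) -> float:
    """Ratio of coordinated performance to mean individual performance."""
    baseline = mean(individual)
    if baseline == 0:
        return 1.0  # ratio undefined; reported as "no effect"
    return coordinated / baseline


def interpret(gamma: float, tolerance: float = 0.05) -> str:
    """Label gamma using the three cases above (tolerance is an assumption)."""
    if gamma > 1.0 + tolerance:
        return "Agents help each other"
    if gamma < 1.0 - tolerance:
        return "Agents interfere with each other"
    return "No coordination benefit"
```

For example, individual scores of 0.6 and 0.8 give a baseline of 0.7, so a coordinated score of 0.84 corresponds to γ = 0.84 / 0.7 = 1.2.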

### **Interactive Visualization**
Conversation Timeline: Tracking of agent interactions with step-by-step conversation flow.

Performance Dashboard: Four-panel analysis showing:

> Agent consistency scores over time

> Response time patterns by agent

> Coordination effects across experiments

> Success rates and error analysis




## Installs and Imports

In [None]:
##Install required packages and clone the CERT repository##
#!pip install anthropic transformers torch python-dotenv pycryptodome PyPDF2
#!pip install -q watermark
## Clone CERT repository##
#!git clone https://github.com/Javihaus/cert-coordination-observability.git
#!cd cert-coordination-observability && pip install -e .

In [None]:
%load_ext watermark
%watermark

**Test environment**

Python implementation: CPython <br>
Python version       : 3.11.13<br>
IPython version      : 7.34.0<br>

Compiler    : GCC 11.4.0<br>
OS          : Linux<br>
Release     : 6.1.123+<br>
Machine     : x86_64<br>
Processor   : x86_64<br>
CPU cores   : 2<br>
Architecture: 64bit<br>

In [None]:
import os
import asyncio
import time
import json
import numpy as np
import pandas as pd
from datetime import datetime
from typing import Dict, List, Any, Optional
from dataclasses import dataclass
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import anthropic
import PyPDF2
import io
from google.colab import files
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets


In [None]:
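# Prefer the environment variable; the placeholder is only a fallback
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "xxxxxxxxxxx")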

## Code

In [None]:
# CERT Core Measurement Components
@dataclass
class AgentInteraction:
    timestamp: datetime
    agent_id: str
    model: str
    role: str
    task: str
    prompt: str
    response: str
    response_time: float
    success: bool
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None

@dataclass
class CoordinationStep:
    step_number: int
    agent_id: str
    input_context: str
    output: str
    reasoning: str
    timestamp: datetime

class CERTMeasurement:
    """Core measurement logic for behavioral consistency and coordination effects"""

    @staticmethod
    def calculate_behavioral_consistency(responses: List[str]) -> float:
        """Calculate consistency score (β) from multiple responses"""
        if len(responses) < 2:
            return 1.0

        # Simple token-based consistency measurement
        token_sets = [set(response.lower().split()) for response in responses]

        similarities = []
        for i in range(len(token_sets)):
            for j in range(i + 1, len(token_sets)):
                intersection = len(token_sets[i].intersection(token_sets[j]))
                union = len(token_sets[i].union(token_sets[j]))
                similarity = intersection / union if union > 0 else 0
                similarities.append(similarity)

        # Cast so the return value matches the annotated float type
        return float(np.mean(similarities)) if similarities else 0.0

    @staticmethod
    def calculate_coordination_effect(individual_performances: List[float],
                                    coordinated_performance: float) -> float:
        """Calculate coordination effect (γ)"""
        expected_performance = np.mean(individual_performances)
        if expected_performance == 0:
            return 1.0
        return coordinated_performance / expected_performance

class ClaudeAgent:
    """Individual Claude agent with specific role and model"""

    def __init__(self, agent_id: str, model: str, role: str, task_description: str, api_key: str):
        self.agent_id = agent_id
        self.model = model
        self.role = role
        self.task_description = task_description
        self.client = anthropic.Anthropic(api_key=api_key)
        self.interaction_history = []
        self.performance_metrics = {
            "consistency_scores": [],
            "response_times": [],
            "error_count": 0,
            "success_count": 0
        }

    async def generate_response(self, prompt: str, context: str = "",
                              max_tokens: int = 500) -> AgentInteraction:
        """Generate response and track performance metrics"""
        start_time = time.time()

        try:
            full_prompt = f"""Role: {self.role}
Task: {self.task_description}

Context: {context}

Request: {prompt}

Respond according to your role and task. Be concise but thorough."""

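            # The SDK client is synchronous; run the call in a worker thread so
            # it does not block the event loop (otherwise asyncio.gather in
            # parallel mode would execute the requests one after another)
            response = await asyncio.to_thread(
                self.client.messages.create,
                model=self.model,
                max_tokens=max_tokens,
                messages=[{"role": "user", "content": full_prompt}]
            )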

            response_time = time.time() - start_time
            response_text = response.content[0].text

            interaction = AgentInteraction(
                timestamp=datetime.now(),
                agent_id=self.agent_id,
                model=self.model,
                role=self.role,
                task=self.task_description,
                prompt=prompt,
                response=response_text,
                response_time=response_time,
                success=True,
                metadata={"context": context, "max_tokens": max_tokens}
            )

            self.interaction_history.append(interaction)
            self.performance_metrics["success_count"] += 1
            self.performance_metrics["response_times"].append(response_time)

            return interaction

        except Exception as e:
            response_time = time.time() - start_time

            interaction = AgentInteraction(
                timestamp=datetime.now(),
                agent_id=self.agent_id,
                model=self.model,
                role=self.role,
                task=self.task_description,
                prompt=prompt,
                response="",
                response_time=response_time,
                success=False,
                error=str(e)
            )

            self.interaction_history.append(interaction)
            self.performance_metrics["error_count"] += 1

            return interaction

    async def measure_consistency(self, prompt: str, trials: int = 3) -> float:
        """Measure behavioral consistency across multiple trials"""
        responses = []

        for _ in range(trials):
            interaction = await self.generate_response(prompt)
            if interaction.success:
                responses.append(interaction.response)
            await asyncio.sleep(0.5)  # Rate limiting

        if len(responses) >= 2:
            consistency = CERTMeasurement.calculate_behavioral_consistency(responses)
            self.performance_metrics["consistency_scores"].append(consistency)
            return consistency

        return 0.0

class CoordinationOrchestrator:
    """Manages multi-agent coordination and conversation tracking"""

    def __init__(self, agents: List[ClaudeAgent]):
        self.agents = agents
        self.coordination_history = []
        self.conversation_log = []
        self.coordination_effects = []

    async def run_sequential_coordination(self, initial_prompt: str,
                                        document_content: str = "") -> List[CoordinationStep]:
        """Run sequential coordination between agents"""
        coordination_steps = []
        current_context = f"Document: {document_content}\n\nInitial Task: {initial_prompt}"

        for i, agent in enumerate(self.agents):
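            # Send only the task as the request; current_context is already
            # passed separately, so repeating it here would duplicate the document
            step_prompt = f"Step {i+1}: {initial_prompt}"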

            interaction = await agent.generate_response(step_prompt, current_context)

            step = CoordinationStep(
                step_number=i + 1,
                agent_id=agent.agent_id,
                input_context=current_context,
                output=interaction.response if interaction.success else f"ERROR: {interaction.error}",
                reasoning=f"Agent {agent.agent_id} ({agent.role}) processing step {i+1}",
                timestamp=interaction.timestamp
            )

            coordination_steps.append(step)
            self.conversation_log.append({
                "step": i + 1,
                "agent": agent.agent_id,
                "role": agent.role,
                "input": current_context[:200] + "..." if len(current_context) > 200 else current_context,
                "output": step.output,
                "timestamp": step.timestamp,
                "success": interaction.success
            })

            # Update context for next agent
            if interaction.success:
                current_context = f"Previous analysis: {interaction.response}\n\nContinue the analysis:"

            await asyncio.sleep(1)  # Rate limiting between agents

        self.coordination_history.extend(coordination_steps)
        return coordination_steps

    async def run_parallel_coordination(self, task_prompt: str,
                                      document_content: str = "") -> List[CoordinationStep]:
        """Run parallel coordination where all agents work simultaneously"""
        coordination_steps = []
        context = f"Document: {document_content}\n\nTask: {task_prompt}"

        # All agents process the same prompt simultaneously
        tasks = []
        for i, agent in enumerate(self.agents):
            tasks.append(agent.generate_response(task_prompt, context))

        interactions = await asyncio.gather(*tasks)

        for i, (agent, interaction) in enumerate(zip(self.agents, interactions)):
            step = CoordinationStep(
                step_number=i + 1,
                agent_id=agent.agent_id,
                input_context=context,
                output=interaction.response if interaction.success else f"ERROR: {interaction.error}",
                reasoning=f"Agent {agent.agent_id} ({agent.role}) parallel processing",
                timestamp=interaction.timestamp
            )

            coordination_steps.append(step)
            self.conversation_log.append({
                "step": i + 1,
                "agent": agent.agent_id,
                "role": agent.role,
                "input": context[:200] + "..." if len(context) > 200 else context,
                "output": step.output,
                "timestamp": step.timestamp,
                "success": interaction.success
            })

        self.coordination_history.extend(coordination_steps)
        return coordination_steps

    def measure_coordination_effect(self, coordination_steps: List[CoordinationStep]) -> float:
        """Measure overall coordination effect"""
        # Get individual baseline performances
        individual_performances = []
        for agent in self.agents:
            if agent.performance_metrics["consistency_scores"]:
                individual_performances.append(np.mean(agent.performance_metrics["consistency_scores"]))
            else:
                individual_performances.append(0.5)  # Default baseline

        # Proxy for coordinated performance: the fraction of steps that completed
        # without an API error (failed steps are stored with an "ERROR: " prefix)
        successful_steps = [step for step in coordination_steps if not step.output.startswith("ERROR: ")]
        coordinated_performance = len(successful_steps) / len(coordination_steps) if coordination_steps else 0

        gamma = CERTMeasurement.calculate_coordination_effect(individual_performances, coordinated_performance)

        self.coordination_effects.append({
            "timestamp": datetime.now(),
            "individual_performances": individual_performances,
            "coordinated_performance": coordinated_performance,
            "coordination_effect": gamma,
            "successful_steps": len(successful_steps),
            "total_steps": len(coordination_steps)
        })

        return gamma

class CERTVisualizer:
    """Creates interactive visualizations of coordination behavior"""

    @staticmethod
    def create_conversation_timeline(conversation_log: List[Dict]) -> go.Figure:
        """Create interactive timeline of agent conversations"""
        df = pd.DataFrame(conversation_log)

        if df.empty:
            return go.Figure().add_annotation(text="No conversation data available")

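        # Each interaction is a single instant, so plot points rather than
        # px.timeline bars (identical start and end times give zero-width bars)
        fig = px.scatter(
            df,
            x="timestamp",
            y="agent",
            color="role",
            title="Agent Conversation Timeline",
            hover_data=["step", "success"]
        )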

        # Add conversation content as annotations
        for i, row in df.iterrows():
            fig.add_annotation(
                x=row["timestamp"],
                y=row["agent"],
                text=f"Step {row['step']}: {row['output'][:50]}...",
                showarrow=True,
                arrowhead=2
            )

        fig.update_layout(height=400)
        return fig

    @staticmethod
    def create_performance_dashboard(agents: List[ClaudeAgent],
                                   coordination_effects: List[Dict]) -> go.Figure:
        """Create comprehensive performance dashboard"""

        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=(
                "Agent Consistency Scores (β)",
                "Response Times",
                "Coordination Effects (γ)",
                "Success Rates"
            ),
            specs=[
                [{"type": "bar"}, {"type": "scatter"}],
                [{"type": "scatter"}, {"type": "bar"}]
            ]
        )

        # Consistency scores
        agent_names = [agent.agent_id for agent in agents]
        consistency_scores = []

        for agent in agents:
            if agent.performance_metrics["consistency_scores"]:
                consistency_scores.append(np.mean(agent.performance_metrics["consistency_scores"]))
            else:
                consistency_scores.append(0.0)

        fig.add_trace(
            go.Bar(
                x=agent_names,
                y=consistency_scores,
                name="Consistency (β)",
                marker=dict(color=consistency_scores, colorscale="Viridis")
            ),
            row=1, col=1
        )

        # Response times
        for agent in agents:
            if agent.performance_metrics["response_times"]:
                fig.add_trace(
                    go.Scatter(
                        x=list(range(len(agent.performance_metrics["response_times"]))),
                        y=agent.performance_metrics["response_times"],
                        mode="lines+markers",
                        name=f"{agent.agent_id} Response Time"
                    ),
                    row=1, col=2
                )

        # Coordination effects
        if coordination_effects:
            gammas = [ce["coordination_effect"] for ce in coordination_effects]
            timestamps = [ce["timestamp"] for ce in coordination_effects]

            fig.add_trace(
                go.Scatter(
                    x=timestamps,
                    y=gammas,
                    mode="lines+markers",
                    name="Coordination Effect (γ)",
                    line=dict(color="red")
                ),
                row=2, col=1
            )

        # Success rates
        success_rates = []
        for agent in agents:
            total = agent.performance_metrics["success_count"] + agent.performance_metrics["error_count"]
            if total > 0:
                success_rates.append(agent.performance_metrics["success_count"] / total)
            else:
                success_rates.append(0.0)

        fig.add_trace(
            go.Bar(
                x=agent_names,
                y=success_rates,
                name="Success Rate",
                marker=dict(color=success_rates, colorscale="RdYlGn")
            ),
            row=2, col=2
        )

        fig.update_layout(height=800, title_text="CERT Multi-Agent Performance Dashboard")
        return fig

class PDFProcessor:
    """Handles PDF upload and text extraction"""

    @staticmethod
    def upload_and_extract() -> Dict[str, str]:
        """Upload PDF and extract text content"""
        print("📄 Upload PDF documents for analysis...")
        uploaded = files.upload()

        documents = {}
        for filename, content in uploaded.items():
            if filename.lower().endswith('.pdf'):
                try:
                    pdf_reader = PyPDF2.PdfReader(io.BytesIO(content))
                    text = ""
                    for page in pdf_reader.pages:
                        # extract_text() can return None or "" for image-only pages
                        text += (page.extract_text() or "") + "\n"

                    documents[filename] = text
                    print(f"✅ Extracted {len(text)} characters from {filename}")

                except Exception as e:
                    print(f"❌ Error processing {filename}: {str(e)}")
            else:
                print(f"⚠️ Skipping {filename} (not a PDF)")

        return documents

# Available Claude Models
CLAUDE_MODELS = {
    "claude-opus-4-20250514": "Claude Opus 4 (Latest, Most Powerful)",
    "claude-sonnet-4-20250514": "Claude Sonnet 4 (Latest, Balanced)",
    "claude-3-5-haiku-20241022": "Claude 3.5 Haiku (Fast)",
    "claude-3-7-sonnet-20250219": "Claude 3.7 Sonnet",
    "claude-3-5-sonnet-20241022": "Claude 3.5 Sonnet",
    "claude-3-5-sonnet-20240620": "Claude 3.5 Sonnet (June)",
    "claude-3-haiku-20240307": "Claude 3 Haiku (Legacy)"
}

# Configuration Interface
def create_agent_config_interface(max_agents: int = 10):
    """Create interactive interface for configuring agents"""

    def create_agent_widgets(agent_num: int):
        agent_id = widgets.Text(
            value=f"agent_{agent_num}",
            description=f"Agent {agent_num} ID:",
            style={'description_width': 'initial'}
        )

        model = widgets.Dropdown(
            options=list(CLAUDE_MODELS.keys()),
            value="claude-sonnet-4-20250514",
            description=f"Model:",
            style={'description_width': 'initial'}
        )

        role = widgets.Text(
            value=f"Analyst {agent_num}",
            description=f"Role:",
            style={'description_width': 'initial'}
        )

        task = widgets.Textarea(
            value=f"Analyze documents and provide insights based on your specialized perspective",
            description=f"Task:",
            style={'description_width': 'initial'}
        )

        return {
            "id": agent_id,
            "model": model,
            "role": role,
            "task": task
        }

    # Number of agents selector
    num_agents = widgets.IntSlider(
        value=3,
        min=2,
        max=max_agents,
        description="Number of Agents:",
        style={'description_width': 'initial'}
    )

    # Global task configuration
    global_task = widgets.Textarea(
        value="Analyze the uploaded PDF document and provide comprehensive insights",
        description="Global Task:",
        style={'description_width': 'initial'}
    )

    coordination_pattern = widgets.Dropdown(
        options=["sequential", "parallel"],
        value="sequential",
        description="Coordination Pattern:",
        style={'description_width': 'initial'}
    )

    return {
        "num_agents": num_agents,
        "global_task": global_task,
        "coordination_pattern": coordination_pattern,
        "create_agent_widgets": create_agent_widgets
    }

In [None]:
async def run_claude_cert_demo(
    api_key: str,
    agent_configs: List[Dict[str, str]],
    global_task: str,
    coordination_pattern: str = "sequential",
    consistency_trials: int = 3
):
    """
    Run the complete Claude-only CERT coordination demonstration

    Args:
        api_key: Anthropic API key
        agent_configs: List of agent configurations with id, model, role, task
        global_task: Overall coordination task
        coordination_pattern: "sequential" or "parallel"
        consistency_trials: Number of trials for consistency measurement
    """

    print("🎯 Claude-Only CERT Multi-Agent Coordination Analysis")
    print("=" * 60)
    print(f"Agents: {len(agent_configs)}")
    print(f"Pattern: {coordination_pattern}")
    print(f"Task: {global_task}")
    print()

    # Create agents
    agents = []
    for config in agent_configs:
        agent = ClaudeAgent(
            agent_id=config["id"],
            model=config["model"],
            role=config["role"],
            task_description=config["task"],
            api_key=api_key
        )
        agents.append(agent)
        # Fall back to the raw model ID if it is not listed in CLAUDE_MODELS
        print(f"✅ Created {config['id']} using {CLAUDE_MODELS.get(config['model'], config['model'])}")

    # Upload and process documents
    documents = PDFProcessor.upload_and_extract()

    if not documents:
        print("❌ No documents uploaded")
        return None

    # Take first document for analysis
    doc_name, doc_content = list(documents.items())[0]
    print(f"\n📄 Analyzing: {doc_name}")

    # Phase 1: Individual agent consistency measurement
    print("\n🔍 Phase 1: Individual Agent Analysis")
    for agent in agents:
        print(f"  Testing {agent.agent_id}...")
        consistency = await agent.measure_consistency(global_task, consistency_trials)
        print(f"    Consistency (β): {consistency:.3f}")

    # Phase 2: Coordination measurement
    print(f"\n🤝 Phase 2: Multi-Agent Coordination ({coordination_pattern})")

    orchestrator = CoordinationOrchestrator(agents)

    if coordination_pattern == "sequential":
        coordination_steps = await orchestrator.run_sequential_coordination(global_task, doc_content)
    else:
        coordination_steps = await orchestrator.run_parallel_coordination(global_task, doc_content)

    # Calculate coordination effect
    gamma = orchestrator.measure_coordination_effect(coordination_steps)
    print(f"Coordination Effect (γ): {gamma:.3f}")

    # Phase 3: Generate visualizations
    print("\n📊 Phase 3: Generating Interactive Visualizations")

    # Conversation timeline
    timeline_fig = CERTVisualizer.create_conversation_timeline(orchestrator.conversation_log)

    # Performance dashboard
    dashboard_fig = CERTVisualizer.create_performance_dashboard(agents, orchestrator.coordination_effects)

    # Display results
    display(HTML("<h2>🎯 CERT Claude-Only Coordination Results</h2>"))
    display(HTML(f"<p><b>Document:</b> {doc_name}</p>"))
    display(HTML(f"<p><b>Agents:</b> {len(agents)} | <b>Pattern:</b> {coordination_pattern}</p>"))

    print("\n📈 Agent Conversation Timeline:")
    display(timeline_fig)

    print("\n📊 Performance Dashboard:")
    display(dashboard_fig)

    # Summary statistics
    print("\n📋 Summary Statistics:")
    print("=" * 30)

    for agent in agents:
        if agent.performance_metrics["consistency_scores"]:
            avg_consistency = np.mean(agent.performance_metrics["consistency_scores"])
            avg_response_time = np.mean(agent.performance_metrics["response_times"])
            success_rate = agent.performance_metrics["success_count"] / (
                agent.performance_metrics["success_count"] + agent.performance_metrics["error_count"]
            )

            print(f"{agent.agent_id}:")
            print(f"  • Model: {CLAUDE_MODELS.get(agent.model, agent.model)}")
            print(f"  • Consistency (β): {avg_consistency:.3f}")
            print(f"  • Avg Response Time: {avg_response_time:.2f}s")
            print(f"  • Success Rate: {success_rate:.2%}")

    print(f"\nOverall Coordination Effect (γ): {gamma:.3f}")

    if gamma > 1.0:
        print("✅ Positive coordination detected - agents enhance each other's performance")
    elif gamma < 1.0:
        print("⚠️ Negative coordination detected - agents may be interfering with each other")
    else:
        print("➡️ Neutral coordination - no significant coordination effect")

    # Return complete results
    return {
        "agents": agents,
        "orchestrator": orchestrator,
        "coordination_steps": coordination_steps,
        "coordination_effect": gamma,
        "conversation_log": orchestrator.conversation_log,
        "documents": documents
    }

## Run the Demo

### **Example Configuration**
```
agent_configs = [
    {
        "id": "primary_analyst",
        "model": "claude-sonnet-4-20250514",
        "role": "Primary Document Analyst",
        "task": "Extract key themes and main arguments from documents"
    },
    {
        "id": "critical_reviewer",
        "model": "claude-3-5-haiku-20241022",
        "role": "Critical Reviewer",
        "task": "Identify gaps, contradictions, and areas needing clarification"
    },
    {
        "id": "synthesizer",
        "model": "claude-opus-4-20250514",
        "role": "Synthesis Specialist",
        "task": "Integrate multiple perspectives into coherent conclusions"
    }
]

global_task = "Analyze the uploaded document for strategic insights and actionable recommendations"

# run_claude_cert_demo is a coroutine, so it must be awaited
results = await run_claude_cert_demo(
    api_key=ANTHROPIC_API_KEY,  # your Anthropic API key
    agent_configs=agent_configs,
    global_task=global_task,
    coordination_pattern="sequential",  # or "parallel": all agents analyze simultaneously, independent perspectives
    consistency_trials=3
)
```
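Because `run_claude_cert_demo` is declared with `async def`, calling it without `await` only creates a coroutine object and runs nothing. The sketch below shows the two usual ways to drive such a function, using `demo_stub`, a hypothetical stand-in, so that it runs without an API key. Note that the demo itself also relies on `google.colab` for the upload step, so the script variant would apply only once that step is replaced.

```python
import asyncio
from typing import Any, Dict


async def demo_stub(task: str) -> Dict[str, Any]:
    """Hypothetical stand-in for an async entry point such as run_claude_cert_demo."""
    await asyncio.sleep(0)  # placeholder for the real API calls
    return {"task": task, "coordination_effect": 1.0}


def run_from_script(task: str) -> Dict[str, Any]:
    """Drive the coroutine to completion when no event loop is running."""
    return asyncio.run(demo_stub(task))
```

In a notebook cell, where an event loop is already running, the equivalent is `results = await demo_stub("...")`.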



In [None]:
agent_configs = [
    {
        "id": "primary_analyst",
        "model": "claude-sonnet-4-20250514",
        "role": "Primary Document Analyst",
        "task": "Extract key themes and main arguments from documents"
    },
    {
        "id": "critical_reviewer",
        "model": "claude-3-5-haiku-20241022",
        "role": "Critical Reviewer",
        "task": "Identify gaps, contradictions, and areas needing clarification"
    },
    {
        "id": "synthesizer",
        "model": "claude-opus-4-20250514",
        "role": "Synthesis Specialist",
        "task": "Integrate multiple perspectives into coherent conclusions"
    }
]

global_task = "Analyze the uploaded document for strategic insights and actionable recommendations"

# run_claude_cert_demo is a coroutine, so it must be awaited
results = await run_claude_cert_demo(
    api_key=ANTHROPIC_API_KEY,
    agent_configs=agent_configs,
    global_task=global_task,
    coordination_pattern="sequential",  # or "parallel": all agents analyze simultaneously, independent perspectives
    consistency_trials=3
)