# Real Multi-Agent Pipeline: Research → Writer → Editor

This notebook demonstrates a **real production multi-agent pipeline** using LangChain with CERT SDK instrumentation.

## Pipeline Architecture

```
User Query → Researcher Agent → Writer Agent → Editor Agent → Final Output
```

**Agents:**
1. **Researcher**: Gathers information and key points
2. **Writer**: Creates initial content draft
3. **Editor**: Refines and polishes the final output

**CERT Metrics Measured:**
- Individual agent quality scores
- Coordination effect (γ) - how much each agent improves on the previous agent's output
- Pipeline health score
- Execution timing and observability

**Estimated time:** 3-5 minutes

## Setup and Installation

In [None]:
# Install required packages
# !pip install cert-sdk langchain langgraph langchain-openai

In [None]:
import cert
from cert.integrations.langchain import CERTLangChain

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

## Enter API Key

In [None]:
from getpass import getpass

api_key = getpass("Enter your OpenAI API key: ")
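
For non-interactive runs (scheduled jobs, CI), prompting with `getpass` is not practical. The sketch below checks an environment variable first and only prompts when it is unset. The helper name `resolve_api_key` and its `prompt_fn` parameter are illustrative and not part of the notebook or of any SDK.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, or prompt for it if unset.

    `resolve_api_key` and `prompt_fn` are hypothetical names used for
    illustration only.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt (hidden input with getpass).
    return prompt_fn("Enter your OpenAI API key: ").strip()
```

In the notebook this would replace the direct call, for example `api_key = resolve_api_key()`.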

## Initialize CERT Provider and Integration

In [None]:
# Create CERT provider for baseline comparison
cert_provider = cert.create_provider(
    api_key=api_key,
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1024,
)

# Get validated baseline
baseline = cert.ModelRegistry.get_model("gpt-4o")

print(f"✓ Using {baseline.model_id}")
print(f"  Baseline: C={baseline.consistency:.3f}, μ={baseline.mean_performance:.3f}")

In [None]:
# Initialize CERT LangChain integration
cert_integration = CERTLangChain(
    provider=cert_provider,
    baseline=baseline,
    verbose=True,  # Print execution details
)

print("✓ CERT LangChain integration initialized")

## Define the Three Agents

We'll create three specialized agents with distinct roles.

In [None]:
# Initialize LangChain LLM
llm = ChatOpenAI(
    api_key=api_key,
    model="gpt-4o",
    temperature=0.7,
)

print("✓ LangChain LLM initialized")

In [None]:
# Agent 1: Researcher
researcher_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research expert. Analyze the user's question and provide key research points, facts, and insights. Be thorough and factual."),
    ("human", "{input}"),
])

class ResearcherAgent:
    def __init__(self, llm, prompt):
        self.llm = llm
        self.prompt = prompt
    
    def invoke(self, input_data):
        messages = input_data.get("messages", [])
        user_input = messages[-1] if messages else input_data.get("input", "")
        
        if isinstance(user_input, dict):
            user_input = user_input.get("content", str(user_input))
        elif hasattr(user_input, "content"):
            user_input = user_input.content
        else:
            user_input = str(user_input)
        
        formatted = self.prompt.format_messages(input=user_input)
        response = self.llm.invoke(formatted)
        
        return {
            "messages": [
                {"role": "user", "content": user_input},
                {"role": "assistant", "content": response.content}
            ]
        }

researcher = ResearcherAgent(llm, researcher_prompt)
print("✓ Researcher agent created")
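
The input handling inside `ResearcherAgent.invoke` accepts three shapes for the last message: a dict with a `content` key, an object with a `.content` attribute, or any other value that is converted with `str`. A minimal sketch of that logic as standalone functions is shown below. The function names are hypothetical, and `SimpleNamespace` only stands in for a message object.

```python
from types import SimpleNamespace


def extract_content(message):
    """Return the text of a message given as a dict, an object, or a plain value."""
    if isinstance(message, dict):
        return message.get("content", str(message))
    if hasattr(message, "content"):
        return message.content
    return str(message)


def last_message_content(input_data):
    """Text of the last message, falling back to the "input" key if there are none."""
    messages = input_data.get("messages", [])
    latest = messages[-1] if messages else input_data.get("input", "")
    return extract_content(latest)


# The same text is recovered from each supported input shape.
as_dict = last_message_content({"messages": [{"role": "user", "content": "hello"}]})
as_object = last_message_content({"messages": [SimpleNamespace(content="hello")]})
as_input = last_message_content({"input": "hello"})
```

The Writer and Editor agents below call `.get` on the last message directly, so they only accept dict messages. A shared helper along these lines would let all three agents accept the same shapes.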

In [None]:
# Agent 2: Writer
writer_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional writer. Take the research points provided and create a well-structured, engaging article. Focus on clarity and flow."),
    ("human", "Research points: {input}"),
])

class WriterAgent:
    def __init__(self, llm, prompt):
        self.llm = llm
        self.prompt = prompt
    
    def invoke(self, input_data):
        messages = input_data.get("messages", [])
        research_content = messages[-1].get("content", "") if messages else ""
        
        formatted = self.prompt.format_messages(input=research_content)
        response = self.llm.invoke(formatted)
        
        return {
            "messages": [
                {"role": "user", "content": research_content},
                {"role": "assistant", "content": response.content}
            ]
        }

writer = WriterAgent(llm, writer_prompt)
print("✓ Writer agent created")

In [None]:
# Agent 3: Editor
editor_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert editor. Review the draft article and improve it by fixing grammar, enhancing clarity, and ensuring professional quality. Keep the core content but polish it."),
    ("human", "Draft to edit: {input}"),
])

class EditorAgent:
    def __init__(self, llm, prompt):
        self.llm = llm
        self.prompt = prompt
    
    def invoke(self, input_data):
        messages = input_data.get("messages", [])
        draft_content = messages[-1].get("content", "") if messages else ""
        
        formatted = self.prompt.format_messages(input=draft_content)
        response = self.llm.invoke(formatted)
        
        return {
            "messages": [
                {"role": "user", "content": draft_content},
                {"role": "assistant", "content": response.content}
            ]
        }

editor = EditorAgent(llm, editor_prompt)
print("✓ Editor agent created")

## Create Instrumented Pipeline

Now we'll wrap each agent with CERT instrumentation and create the pipeline.

In [None]:
# Wrap agents with CERT instrumentation
instrumented_researcher = cert_integration.wrap_agent(
    agent=researcher,
    agent_id="researcher",
    agent_name="Research Agent",
    calculate_quality=True,
)

instrumented_writer = cert_integration.wrap_agent(
    agent=writer,
    agent_id="writer",
    agent_name="Writer Agent",
    calculate_quality=True,
)

instrumented_editor = cert_integration.wrap_agent(
    agent=editor,
    agent_id="editor",
    agent_name="Editor Agent",
    calculate_quality=True,
)

print("✓ All agents instrumented with CERT metrics")

In [None]:
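# Create the pipeline using CERT's helper.
# Note: the helper is given the original agents and their IDs, not the
# instrumented_* wrappers created in the previous cell.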
pipeline = cert_integration.create_multi_agent_pipeline([
    {"agent": researcher, "agent_id": "researcher", "agent_name": "Research Agent"},
    {"agent": writer, "agent_id": "writer", "agent_name": "Writer Agent"},
    {"agent": editor, "agent_id": "editor", "agent_name": "Editor Agent"},
])

print("✓ Multi-agent pipeline created")
print("\n  Pipeline: Research → Write → Edit → Final Output")
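
To make the data flow concrete, here is a rough sketch of what a sequential pipeline over agents with this `invoke` contract might do: pass each agent's output dict to the next agent and time each step. This is not CERT's implementation. `EchoAgent` and `run_sequential` are hypothetical stand-ins that run without an API key.

```python
import time


class EchoAgent:
    """Stand-in agent with the same invoke() contract as the agents above."""

    def __init__(self, tag):
        self.tag = tag

    def invoke(self, input_data):
        previous = input_data["messages"][-1]["content"]
        return {
            "messages": [
                {"role": "user", "content": previous},
                {"role": "assistant", "content": f"{previous} -> {self.tag}"},
            ]
        }


def run_sequential(agents, input_data):
    """Feed each agent's output dict to the next and record per-agent durations."""
    timings = {}
    state = input_data
    for agent_id, agent in agents:
        start = time.perf_counter()
        state = agent.invoke(state)
        timings[agent_id] = time.perf_counter() - start
    return state, timings


stages = [
    ("researcher", EchoAgent("research")),
    ("writer", EchoAgent("draft")),
    ("editor", EchoAgent("edit")),
]
result_sketch, timings_sketch = run_sequential(
    stages, {"messages": [{"role": "user", "content": "query"}]}
)
print(result_sketch["messages"][-1]["content"])  # query -> research -> draft -> edit
```

Each agent reads only the last message of the previous result, which is why every agent in this notebook returns its own output as the final entry in `messages`.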

## Run the Pipeline

Let's test the pipeline with a real query.

In [None]:
# Define the user query
user_query = "Explain the key factors in building successful multi-agent AI systems for production."

print(f"User Query: {user_query}")
print("\n" + "="*70)
print("Executing Pipeline...")
print("="*70)

In [None]:
# Run the pipeline
result = pipeline({"messages": [{"role": "user", "content": user_query}]})

print("\n" + "="*70)
print("Pipeline Execution Complete")
print("="*70)

## View Results

In [None]:
# Display final output
final_output = result["messages"][-1]["content"]

print("\n" + "="*70)
print("FINAL OUTPUT")
print("="*70)
print(final_output)
print("="*70)

## View CERT Metrics

Now let's see the CERT metrics collected during execution.

In [None]:
# Print comprehensive metrics
cert_integration.print_metrics()

## Interpret the Results

### Quality Scores
- Each agent's output is scored for semantic relevance, linguistic coherence, and content density
- Higher scores (closer to 1.0) indicate better quality

### Coordination Effect (γ)
- **γ > 1.0**: Agents are coordinating well - each agent improves on the previous
- **γ = 1.0**: No coordination benefit
- **γ < 1.0**: Agents may be interfering with each other

### Pipeline Health
- **H > 0.8**: Production ready - deploy with confidence
- **0.6 ≤ H ≤ 0.8**: Acceptable - deploy with monitoring
- **H < 0.6**: Needs investigation before production
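
The γ and H bands above can be turned into small helper functions for use in scripts or dashboards. This is a sketch only: the function names are hypothetical, the tolerance used to treat γ as equal to 1.0 is an assumption, and so is assigning the boundary values 0.6 and 0.8 to the middle health band.

```python
import math


def interpret_gamma(gamma, tol=1e-9):
    """Map a coordination effect value to the bands described above."""
    # Floats are rarely exactly 1.0, so compare with a small tolerance (assumed).
    if math.isclose(gamma, 1.0, abs_tol=tol):
        return "no coordination benefit"
    return "coordinating well" if gamma > 1.0 else "possible interference"


def interpret_health(health):
    """Map a pipeline health score to the deployment guidance described above."""
    if health > 0.8:
        return "production ready"
    if health >= 0.6:  # boundary values fall in the middle band (assumed)
        return "acceptable with monitoring"
    return "needs investigation"
```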

### Execution Timing
- Shows duration for each agent
- Helps identify bottlenecks

## Get Metrics Programmatically

In [None]:
# Access metrics as dictionary
metrics_dict = cert_integration.get_metrics_summary()

print("Metrics Summary:")
for key, value in metrics_dict.items():
    print(f"  {key}: {value}")

## Try Different Queries

Test the pipeline with different types of queries to see how coordination effects vary.

In [None]:
# Example queries to try
example_queries = [
    "What are the challenges in deploying LLM agents to production?",
    "Explain how to measure AI agent performance and reliability.",
    "Compare different multi-agent frameworks for enterprise applications.",
]

print("Try these queries:")
for i, query in enumerate(example_queries, 1):
    print(f"  {i}. {query}")
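
The cell above only prints the queries. One way to actually run them is a small loop that calls the pipeline once per query and collects the final message. The sketch assumes the pipeline is a callable that takes and returns the same `{"messages": [...]}` dict used in "Run the Pipeline". `run_queries` and `fake_pipeline` are hypothetical names, and `fake_pipeline` exists only so the example runs without an API key.

```python
def run_queries(pipeline_fn, queries):
    """Run each query through the pipeline and collect the final message text."""
    outputs = {}
    for query in queries:
        result = pipeline_fn({"messages": [{"role": "user", "content": query}]})
        outputs[query] = result["messages"][-1]["content"]
    return outputs


def fake_pipeline(input_data):
    """Placeholder pipeline: upper-cases the query instead of calling an LLM."""
    text = input_data["messages"][-1]["content"]
    return {"messages": [{"role": "assistant", "content": text.upper()}]}


demo_outputs = run_queries(fake_pipeline, ["first query", "second query"])
```

In the notebook, `run_queries(pipeline, example_queries)` would issue one set of LLM calls per query. Whether CERT accumulates or resets its metrics between runs is not covered here, so check the metrics after each run if you need per-query values.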

## Production Deployment

### Key Takeaways:

1. **Use CERT metrics** to validate your pipeline before production
2. **Monitor coordination effect** - if γ drops, investigate agent interactions
3. **Track health score** - set alerts if it falls below your threshold
4. **Measure consistently** - run CERT measurements regularly to detect drift
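
Takeaways 2 and 3 can be sketched as a threshold check over a metrics dictionary such as the one returned by `get_metrics_summary()`. The key names `"health"` and `"gamma"` are hypothetical, since the actual keys in CERT's summary are not shown in this notebook. Inspect the printed summary and pass the real key names via `health_key` and `gamma_key`.

```python
def check_thresholds(metrics, min_health=0.6, min_gamma=1.0,
                     health_key="health", gamma_key="gamma"):
    """Return a list of alert strings; the default key names are hypothetical."""
    alerts = []
    health = metrics.get(health_key)
    gamma = metrics.get(gamma_key)
    if health is not None and health < min_health:
        alerts.append(f"health {health:.2f} below {min_health:.2f}")
    if gamma is not None and gamma < min_gamma:
        alerts.append(f"gamma {gamma:.2f} below {min_gamma:.2f}")
    return alerts


# Example with made-up values, not real measurements.
example_alerts = check_thresholds({"health": 0.55, "gamma": 1.1})
```

Running a check like this on a schedule and logging the returned alerts is one simple way to notice drift between measurements.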

### Next Steps:

- Try the CrewAI integration: `examples/crewai_pipeline.ipynb`
- Learn about custom baselines: `examples/advanced_usage.ipynb`
- Explore the full API: see `README.md` for documentation