# Real Multi-Agent Pipeline: Research → Writer → Editor

This notebook runs a **three-agent pipeline** against a live model, built with LangChain and instrumented with the CERT SDK.

## Pipeline Architecture

```
User Query → Researcher Agent → Writer Agent → Editor Agent → Final Output
```

**Agents:**
1. **Researcher**: Gathers information and key points
2. **Writer**: Creates initial content draft
3. **Editor**: Refines and polishes the final output

**CERT Metrics Measured:**
- Individual agent quality scores
- Context propagation effect (γ): performance changes from accumulated context
- Pipeline health score
- Execution timing and observability

**Estimated time:** 3-5 minutes

## Setup and Installation

In [None]:
# Install required packages
# Option 1: From PyPI (when available)
# !pip install cert-sdk langchain langchain-openai

# Option 2: Directly from GitHub repository (development version)
# !pip install git+https://github.com/Javihaus/CERT.git langchain langchain-openai

In [None]:
# ChatPromptTemplate is defined in langchain_core; langchain.prompts is a legacy import path
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

import cert
from cert.integrations.langchain import CERTLangChain

## Enter API Key

In [None]:
from getpass import getpass

api_key = getpass("Enter your OpenAI API key: ")
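
When the notebook is run repeatedly or non-interactively, prompting every time is inconvenient. A minimal sketch of an alternative: read the key from the `OPENAI_API_KEY` environment variable and fall back to the prompt only if it is missing. `resolve_api_key` is a hypothetical helper written for this example, not part of CERT or LangChain.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the API key from the environment, prompting only if it is absent.

    `env` and `prompt` are injectable so the function can be exercised
    without touching the real environment or blocking on user input.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fall back to an interactive, non-echoing prompt
        key = prompt("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("An OpenAI API key is required to run the pipeline.")
    return key
```

With this helper, the cell above could become `api_key = resolve_api_key()`.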

## Initialize CERT Provider and Integration

In [None]:
# Create CERT provider for baseline comparison
cert_provider = cert.create_provider(
    api_key=api_key,
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1024,
)

# Get validated baseline
baseline = cert.ModelRegistry.get_model("gpt-4o")

print(f"✓ Using {baseline.model_id}")
print(f"  Baseline: C={baseline.consistency:.3f}, μ={baseline.mean_performance:.3f}")

In [None]:
# Initialize CERT LangChain integration
cert_integration = CERTLangChain(
    provider=cert_provider,
    baseline=baseline,
    verbose=True,  # Print execution details
)

print("✓ CERT LangChain integration initialized")

## Define the Three Agents

We'll create three specialized agents with **LCEL (LangChain Expression Language)**. Each agent is a prompt template piped into the model with the `|` operator.

In [None]:
# Initialize LangChain LLM
llm = ChatOpenAI(
    api_key=api_key,
    model="gpt-4o",
    temperature=0.7,
)

print("✓ LangChain LLM initialized")

In [None]:
# Agent 1: Researcher (using LCEL: prompt | llm)
researcher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a research expert. Analyze the user's question and provide key research points, facts, and insights. Be thorough and factual.",
        ),
        ("human", "{input}"),
    ]
)

researcher = researcher_prompt | llm
print("✓ Researcher agent created (LCEL chain: prompt | llm)")

In [None]:
# Agent 2: Writer (using LCEL: prompt | llm)
writer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a professional writer. Take the research points provided and create a well-structured, engaging article. Focus on clarity and flow.",
        ),
        ("human", "Research points:\n{input}"),
    ]
)

writer = writer_prompt | llm
print("✓ Writer agent created (LCEL chain: prompt | llm)")

In [None]:
# Agent 3: Editor (using LCEL: prompt | llm)
editor_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert editor. Review the draft article and improve it by fixing grammar, enhancing clarity, and ensuring professional quality. Keep the core content but polish it.",
        ),
        ("human", "Draft to edit:\n{input}"),
    ]
)

editor = editor_prompt | llm
print("✓ Editor agent created (LCEL chain: prompt | llm)")

## Create Instrumented Agents

Wrap each LCEL chain with CERT instrumentation. CERT supports LCEL chains natively.

In [None]:
# Wrap the LCEL chains with CERT instrumentation
instrumented_researcher = cert_integration.wrap_agent(
    agent=researcher,
    agent_id="researcher",
    agent_name="Research Agent",
    calculate_quality=True,
)

instrumented_writer = cert_integration.wrap_agent(
    agent=writer,
    agent_id="writer",
    agent_name="Writer Agent",
    calculate_quality=True,
)

instrumented_editor = cert_integration.wrap_agent(
    agent=editor,
    agent_id="editor",
    agent_name="Editor Agent",
    calculate_quality=True,
)

print("✓ All agents wrapped with CERT instrumentation")

## Run the Pipeline

Execute the agents sequentially, with each agent processing the output of the previous one.

In [None]:
# Define the user query
user_query = (
    "Explain the key factors in building successful multi-model LLM systems for production."
)

print(f"User Query: {user_query}")
print("\n" + "=" * 70)
print("Executing Pipeline...")
print("=" * 70)

In [None]:
# Reset metrics
cert_integration.reset_metrics()

# Execute pipeline: each agent processes the output of the previous one
research_output = instrumented_researcher.invoke({"input": user_query})
draft_output = instrumented_writer.invoke({"input": research_output.content})
final_output = instrumented_editor.invoke({"input": draft_output.content})

# Calculate CERT metrics
cert_integration.calculate_coordination_effect()
cert_integration.calculate_pipeline_health()

print("\n" + "=" * 70)
print("Pipeline Execution Complete")
print("=" * 70)

## View Results

In [None]:
# Display final output
print("\n" + "=" * 70)
print("FINAL OUTPUT")
print("=" * 70)
print(final_output.content)
print("=" * 70)

## View CERT Metrics

Now let's see the CERT metrics collected during execution.

In [None]:
# Print comprehensive metrics
cert_integration.print_metrics()

## Interpret the Results

### Quality Scores
- Each agent's output is scored for semantic relevance, linguistic coherence, and content density
- Higher scores (closer to 1.0) indicate better quality

### Context Propagation Effect (γ)
**What it measures:**
- Performance changes when models process accumulated context
- How attention mechanisms handle extended context in sequential processing

**What it does NOT measure:**
- ❌ Agent coordination, collaboration, or planning
- ❌ Intelligence or reasoning capabilities
- ❌ WHY context helps (black box measurement)

**Interpretation:**
- **γ > 1.0**: Sequential context accumulation improves performance
- **γ = 1.0**: Accumulated context neither helps nor hurts performance
- **γ < 1.0**: Context accumulation degrades performance
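
The interpretation rules above can be sketched as a small classifier. This is illustrative only: `interpret_gamma` and its `tolerance` parameter are hypothetical, not part of the CERT SDK, and the tolerance band around 1.0 is an assumption added so that small run-to-run noise is not read as a real effect.

```python
import math


def interpret_gamma(gamma: float, tolerance: float = 0.02) -> str:
    """Map a context propagation effect value to a coarse label.

    Values within `tolerance` of 1.0 are treated as neutral (an assumption,
    not a CERT rule).
    """
    if not math.isfinite(gamma):
        raise ValueError(f"gamma must be a finite number, got {gamma!r}")
    if abs(gamma - 1.0) <= tolerance:
        return "neutral"
    return "improves" if gamma > 1.0 else "degrades"


for value in (1.15, 1.01, 0.90):
    print(f"gamma={value:.2f}: {interpret_gamma(value)}")
```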

### Pipeline Health
- **H > 0.8**: Production ready - deploy with confidence
- **0.6 ≤ H ≤ 0.8**: Acceptable - deploy with monitoring
- **H < 0.6**: Needs investigation before production
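
A deployment gate based on these bands might look like the sketch below. `classify_health` is a hypothetical helper, not a CERT function. Scores of exactly 0.6 and 0.8 are placed in the middle band here, which is an assumption of this example.

```python
def classify_health(h: float) -> str:
    """Map a pipeline health score to one of the three bands listed above."""
    if h > 0.8:
        return "production ready"
    if h >= 0.6:
        # Assumption: the boundary values 0.6 and 0.8 fall in the middle band
        return "acceptable with monitoring"
    return "needs investigation"


for score in (0.85, 0.70, 0.45):
    print(f"H={score:.2f}: {classify_health(score)}")
```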

### Execution Timing
- Shows duration for each agent
- Helps identify bottlenecks
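
To show what per-agent timing involves, here is a minimal sketch that times each stage of a sequential pipeline with `time.perf_counter` and reports the slowest one. It is not how CERT records timing internally. `time_stages` and `slowest_stage` are hypothetical helpers, and the stages are stand-in functions rather than real agents.

```python
import time


def time_stages(stages, initial_input):
    """Run (name, callable) stages in order and record each stage's duration."""
    durations = {}
    value = initial_input
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)  # each stage consumes the previous stage's output
        durations[name] = time.perf_counter() - start
    return value, durations


def slowest_stage(durations):
    """Return the name of the stage with the largest recorded duration."""
    return max(durations, key=durations.get)


def slow_writer(text):
    time.sleep(0.02)  # simulate a slower stage
    return text + " -> draft"


demo_stages = [
    ("researcher", lambda q: q + " -> notes"),
    ("writer", slow_writer),
    ("editor", lambda d: d + " -> final"),
]
result, timings = time_stages(demo_stages, "query")
print(result)
print("Bottleneck:", slowest_stage(timings))
```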

## Get Metrics Programmatically

In [None]:
# Access metrics as dictionary
metrics_dict = cert_integration.get_metrics_summary()

print("Metrics Summary:")
for key, value in metrics_dict.items():
    print(f"  {key}: {value}")
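
To compare runs over time, the summary dictionary can be serialized and stored. The sketch below assumes only that `get_metrics_summary()` returns a `dict`. The keys in `sample_metrics` are invented for illustration and may not match the real output, and `metrics_to_json` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def metrics_to_json(metrics: dict, run_label: str) -> str:
    """Wrap a metrics dictionary with a label and a UTC timestamp, as JSON text."""
    record = {
        "run_label": run_label,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    # default=str keeps serialization from failing on values json cannot encode
    return json.dumps(record, indent=2, default=str)


# Invented sample values; substitute metrics_dict from the cell above
sample_metrics = {"pipeline_health": 0.82, "gamma": 1.07}
print(metrics_to_json(sample_metrics, run_label="baseline-run"))
```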

## Try Different Queries

Test the pipeline with different types of queries to see how context propagation effects vary.

In [None]:
# Example queries to try
example_queries = [
    "What are the challenges in deploying LLM agents to production?",
    "Explain how to measure AI agent performance and reliability.",
    "Compare different multi-model frameworks for enterprise applications.",
]

print("Try these queries:")
for i, query in enumerate(example_queries, 1):
    print(f"  {i}. {query}")
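
Running several queries is easier with the sequential handoff factored into a function. The sketch below assumes only what the pipeline cell above relies on: each agent has an `invoke({"input": ...})` method whose result exposes `.content`. `run_pipeline` and `StubAgent` are hypothetical names, and the stub stands in for a real agent so the example runs without an API key.

```python
from types import SimpleNamespace


def run_pipeline(query, agents):
    """Pass the query through each agent in order and return the final text."""
    payload = query
    for agent in agents:
        payload = agent.invoke({"input": payload}).content
    return payload


class StubAgent:
    """Stand-in agent that wraps its input in a tag instead of calling a model."""

    def __init__(self, tag):
        self.tag = tag

    def invoke(self, inputs):
        return SimpleNamespace(content=f"{self.tag}({inputs['input']})")


stub_agents = [StubAgent("research"), StubAgent("write"), StubAgent("edit")]
for query in ["first query", "second query"]:
    print(run_pipeline(query, stub_agents))
```

In this notebook, the list would instead be `[instrumented_researcher, instrumented_writer, instrumented_editor]`, with `cert_integration.reset_metrics()` called before each query so the metrics from different runs do not mix.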

## Production Deployment

### Key Takeaways:

1. **Use CERT metrics** to validate your pipeline before production
2. **Monitor context propagation effect (γ)** - if γ falls below 1.0, accumulated context is degrading output, so inspect what each stage passes downstream
3. **Track health score** - set alerts if it falls below your threshold
4. **Measure consistently** - run CERT measurements regularly to detect drift
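
Takeaways 3 and 4 can be combined into a simple rolling check. This is a sketch under stated assumptions: `HealthMonitor` is a hypothetical class, not part of CERT, the health score is fed in manually after each run, and the default threshold of 0.6 and window of 5 are illustrative choices.

```python
from collections import deque
from statistics import mean


class HealthMonitor:
    """Track recent health scores and flag when their rolling mean drops too low."""

    def __init__(self, threshold=0.6, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def record(self, score):
        """Store a score and return True if the rolling mean is below threshold."""
        self.scores.append(score)
        return mean(self.scores) < self.threshold


monitor = HealthMonitor(threshold=0.6, window=3)
for score in (0.9, 0.5, 0.3):
    if monitor.record(score):
        print(f"ALERT: rolling health below {monitor.threshold} after score {score}")
```

Averaging over a window rather than alerting on a single score trades slower detection for fewer false alarms from one noisy run.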

### What CERT Measures:

✅ **Engineering Characterization:**
- Statistical behavior of sequential LLM processing
- Performance changes from context accumulation
- Attention mechanism effects (black box measurement)
- Operational metrics for deployment decisions

❌ **What CERT Does NOT Measure:**
- Agent coordination or collaboration
- Intelligence or reasoning capabilities
- WHY models improve with context

This is **engineering instrumentation**, not coordination science.

### Next Steps:

- Try the CrewAI integration: `examples/crewai_pipeline.ipynb`
- Learn about custom baselines: `examples/advanced_usage.py`
- Explore the full API: see `README.md`