# 📘 Agentic Architectures 6: Planner → Executor → Verifier (PEV)

In this notebook, we explore the **Planner → Executor → Verifier (PEV)** architecture, a pattern that introduces a critical layer of robustness and self-correction into agentic systems. This architecture is inspired by rigorous software engineering and quality assurance processes, where work is not considered 'done' until it has been verified.

While a standard Planning agent brings structure and predictability, it operates on a crucial assumption: that its tools will work perfectly and return valid data every time. In the real world, APIs fail, searches return no results, and data can be malformed. The PEV pattern addresses this by adding a dedicated **Verifier** agent that acts as a quality assurance check after every action, enabling the system to detect failures and dynamically recover.

To demonstrate its value, we will first build a standard **Planner-Executor agent** and show how it fails when a tool returns an error. Then, we will build a full **PEV agent** to show how the Verifier catches the error, triggers a re-planning loop, and ultimately guides the system to a successful outcome.

### Definition
The **Planner → Executor → Verifier (PEV)** architecture is a three-stage workflow that explicitly separates the acts of planning, executing, and verifying. It ensures that the output of each step is validated before the agent proceeds, creating a robust, self-correcting loop.

### High-level Workflow

1.  **Plan:** A 'Planner' agent decomposes a high-level goal into a sequence of concrete, executable steps.
2.  **Execute:** An 'Executor' agent takes the *next* step from the plan and calls the appropriate tool.
3.  **Verify:** A 'Verifier' agent examines the output from the Executor. It checks for correctness, relevance, and potential errors. It then produces a judgment: did the step succeed or fail?
4.  **Route & Iterate:** Based on the Verifier's judgment, a router decides the next move:
    *   If the step **succeeded** and the plan is not complete, loop back to the Executor for the next step.
    *   If the step **failed**, loop back to the Planner to create a *new* plan, often providing the failure context so the new plan can be smarter.
    *   If the step **succeeded** and the plan is complete, proceed to a final synthesis step.
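
The four stages above can be sketched as a plain Python loop, independent of any agent framework. This is a minimal illustration, not the notebook's implementation: `run_pev`, the `demo_*` stubs, and the `max_replans` parameter are hypothetical names, and the planner, executor, and verifier are passed in as ordinary callables.

```python
from typing import Callable, List, Tuple


def run_pev(
    goal: str,
    planner: Callable[[str, List[str]], List[str]],
    executor: Callable[[str], str],
    verifier: Callable[[str], bool],
    max_replans: int = 3,
) -> Tuple[List[str], bool]:
    """Run plan -> execute -> verify until a plan completes or replans run out."""
    verified: List[str] = []   # outputs that passed verification
    failures: List[str] = []   # failure context handed back to the planner
    for _ in range(max_replans + 1):
        plan = planner(goal, failures)
        failed = False
        for step in plan:
            output = executor(step)
            if verifier(output):
                verified.append(output)
            else:
                failures.append(f"{step} -> {output}")
                failed = True
                break  # abandon the rest of this plan and re-plan
        if not failed:
            return verified, True  # plan complete: ready for synthesis
    return verified, False


# Stub components, used only to exercise the loop.
def demo_planner(goal: str, failures: List[str]) -> List[str]:
    # The first plan uses a query the stub executor rejects; the retry rephrases it.
    return ["employee count"] if not failures else ["number of employees"]


def demo_executor(step: str) -> str:
    return "Error: unavailable" if "employee count" in step else f"data for {step}"


def demo_verifier(output: str) -> bool:
    return not output.startswith("Error")


results, ok = run_pev("demo goal", demo_planner, demo_executor, demo_verifier)
```

Capping the number of replans keeps a persistently failing tool from turning the loop into an endless retry.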

### When to Use / Applications
*   **Safety-Critical Applications (Finance, Healthcare):** When the cost of an error is high, PEV provides essential guardrails to prevent the agent from acting on bad data.
*   **Systems with Unreliable Tools:** When dealing with external APIs that might be flaky or return inconsistent data, the Verifier can catch failures gracefully.
*   **High-Precision Tasks (Legal, Scientific):** For tasks requiring a high degree of factual accuracy, the Verifier ensures each piece of retrieved information is valid before it's used in downstream reasoning.

### Strengths & Weaknesses
*   **Strengths:**
    *   **Robustness & Reliability:** Its core strength is the ability to detect and recover from errors.
    *   **Modularity:** The separation of concerns makes the system easier to debug and maintain.
*   **Weaknesses:**
    *   **Increased Latency & Cost:** Verifying after every action adds an extra LLM call per step, which makes PEV slower and more expensive than the architectures we've covered so far.
    *   **Verifier Complexity:** Designing an effective Verifier can be challenging. It needs to be smart enough to distinguish between minor issues and critical failures.
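
One way to reduce both weaknesses is to screen tool output with a cheap rule-based check and reserve the LLM Verifier for ambiguous cases. The sketch below is an assumption rather than part of the notebook: `cheap_precheck` is a hypothetical helper, and the "starts with `error`" convention is only a guess at what a tool's failure messages look like.

```python
import re
from typing import Optional

# Hypothetical convention: failure messages begin with the word "error".
# Adjust this to the formats your tools actually emit.
_ERROR_PREFIX = re.compile(r"^\s*error\b", re.IGNORECASE)


def cheap_precheck(tool_output: str) -> Optional[bool]:
    """Return False for obvious failures, or None when an LLM verifier should decide."""
    if not tool_output.strip():
        return False  # an empty result is never usable
    if _ERROR_PREFIX.match(tool_output):
        return False  # explicit error message from the tool
    return None  # no obvious problem: defer to the LLM verifier
```

A verifier node could call this first and invoke the LLM only when the result is `None`, which skips an LLM call for every obvious failure.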

## Phase 0: Foundation & Setup

We will begin by installing our libraries and configuring our API keys for Nebius, LangSmith, and our tools.

### Step 0.1: Installing Core Libraries

**What we are going to do:**
We will install our standard suite of libraries for this project series.

In [1]:
# !pip install -q -U langchain-nebius langchain-openai langchain langgraph rich python-dotenv langchain-tavily

### Step 0.2: Importing Libraries and Setting Up Keys

**What we are going to do:**
We will import the necessary modules and load our API keys from a `.env` file.

**Action Required:** Create a `.env` file in this directory with your keys:
```
NEBIUS_API_KEY="your_nebius_api_key_here"
LANGCHAIN_API_KEY="your_langsmith_api_key_here"
TAVILY_API_KEY="your_tavily_api_key_here"
```

In [1]:
import os
import re
from typing import List, Annotated, TypedDict, Optional
from dotenv import load_dotenv
import json

# LangChain components
from langchain_nebius import ChatNebius
from langchain_tavily import TavilySearch
from langchain_core.messages import BaseMessage, ToolMessage
from pydantic import BaseModel, Field

# LangGraph components
from langgraph.graph import StateGraph, END

# For pretty printing
from rich.console import Console
from rich.markdown import Markdown

# OpenAI chat model (imported alongside the Nebius one)
from langchain_openai import ChatOpenAI

# --- API Key and Tracing Setup ---
load_dotenv()

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Agentic Architecture - PEV (Nebius)"

for key in ["NEBIUS_API_KEY", "LANGCHAIN_API_KEY", "TAVILY_API_KEY"]:
    if not os.environ.get(key):
        print(f"{key} not found. Please create a .env file and set it.")

print("Environment variables loaded and tracing is set up.")

Environment variables loaded and tracing is set up.
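
The check above only prints a message, so the notebook keeps running with missing keys and fails later inside a tool or model call. A stricter variant might look like the sketch below. `missing_keys` and `require_keys` are hypothetical helpers, and `OPENAI_API_KEY` is listed on the assumption that the `ChatOpenAI` model imported above is the one actually used.

```python
import os
from typing import Iterable, List, Mapping

# OPENAI_API_KEY is an assumption: it is only needed if ChatOpenAI is the active model.
REQUIRED_KEYS = ["NEBIUS_API_KEY", "LANGCHAIN_API_KEY", "TAVILY_API_KEY", "OPENAI_API_KEY"]


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def require_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise immediately, naming every missing key, instead of failing mid-run."""
    missing = missing_keys(required, env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Calling `require_keys(REQUIRED_KEYS)` right after `load_dotenv()` would surface a configuration problem before any graph is built.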


## Phase 1: The Baseline - A Planner-Executor Agent

To understand the need for a Verifier, we must first build an agent without one. This agent will create a plan and follow it blindly, demonstrating the potential for failure when a tool call goes wrong.

### Step 1.1: Building the Planner-Executor Agent

**What we are going to do:**
We will construct a simple Planner-Executor graph, similar to the one from the previous notebook. To simulate a real-world failure, we'll create a special 'flaky' tool. This tool will intentionally return an error message for a specific query, which our basic agent will have no way of handling.

In [2]:
console = Console()
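# NOTE: `llm` is assigned twice, so the second assignment wins and every node below runs on gpt-4o-mini.
# ChatOpenAI reads OPENAI_API_KEY, which is not among the keys checked in the setup cell.
# Keep only the assignment for the provider you intend to use.
llm = ChatNebius(model="meta-llama/Meta-Llama-3.1-8B-Instruct", temperature=0)
llm = ChatOpenAI(model='gpt-4o-mini')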

# Define a 'flaky' tool that will fail for a specific query
def flaky_web_search(query: str) -> str:
    """Performs a web search, but is designed to fail for a specific query."""
    console.print(f"--- TOOL: Searching for '{query}'... ---")
    if "employee count" in query.lower():
        console.print("--- TOOL: [bold red]Simulating API failure![/bold red] ---")
        return "Error: Could not retrieve data. The API endpoint is currently unavailable."
    else:
        result = TavilySearch(max_results=2).invoke(query)
        # 🔑 Ensure result is always a string
        if isinstance(result, (dict, list)):
            return json.dumps(result, indent=2)
        return str(result)

# Define the state for the basic P-E agent
class BasicPEState(TypedDict):
    user_request: str
    plan: Optional[List[str]]
    intermediate_steps: List[str]
    final_answer: Optional[str]

class Plan(BaseModel):
    steps: List[str] = Field(description="A list of tool calls to execute.")

def basic_planner_node(state: BasicPEState):
    console.print("--- (Basic) PLANNER: Creating plan... ---")
    planner_llm = llm.with_structured_output(Plan)

    prompt = f"""
    You are a planning agent. 
    Your job is to decompose the user's request into a list of clear tool queries.

    - Only return JSON that matches this schema: {{ "steps": [ "query1", "query2", ... ] }}
    - Do NOT return any prose or explanation.
    - Always use the 'flaky_web_search' tool for queries.

    User's request: "{state['user_request']}"
    """
    plan = planner_llm.invoke(prompt)
    return {"plan": plan.steps}

def basic_executor_node(state: BasicPEState):
    console.print("--- (Basic) EXECUTOR: Running next step... ---")
    next_step = state["plan"][0]
    result = flaky_web_search(next_step)
    return {"plan": state["plan"][1:], "intermediate_steps": state["intermediate_steps"] + [result]}

def basic_synthesizer_node(state: BasicPEState):
    console.print("--- (Basic) SYNTHESIZER: Generating final answer... ---")
    context = "\n".join(state["intermediate_steps"])
    prompt = f"Synthesize an answer for '{state['user_request']}' using this data:\n{context}"
    answer = llm.invoke(prompt).content
    return {"final_answer": answer}

# Build the graph
pe_graph_builder = StateGraph(BasicPEState)
pe_graph_builder.add_node("plan", basic_planner_node)
pe_graph_builder.add_node("execute", basic_executor_node)
pe_graph_builder.add_node("synthesize", basic_synthesizer_node)

pe_graph_builder.set_entry_point("plan")
pe_graph_builder.add_conditional_edges("plan", lambda s: "execute" if s["plan"] else "synthesize")
pe_graph_builder.add_conditional_edges("execute", lambda s: "execute" if s["plan"] else "synthesize")
pe_graph_builder.add_edge("synthesize", END)

basic_pe_app = pe_graph_builder.compile()
print("Basic Planner-Executor agent compiled successfully.")

Basic Planner-Executor agent compiled successfully.


### Step 1.2: Testing the Basic Agent on the 'Flaky' Problem

**What we are going to do:**
We will now give the basic agent a task that requires it to call our `flaky_web_search` tool with the specific query we know will fail. This will demonstrate its inability to handle the error.

In [3]:
flaky_query = "What was Apple's R&D spend in their last fiscal year, and what was their total employee count? Calculate the R&D spend per employee."

console.print(f"[bold yellow]Testing BASIC P-E agent on a flaky query:[/bold yellow]\n'{flaky_query}'\n")

initial_pe_input = {"user_request": flaky_query, "intermediate_steps": []}
final_pe_output = basic_pe_app.invoke(initial_pe_input)

console.print("\n--- [bold red]Final Output from Basic P-E Agent[/bold red] ---")
console.print(Markdown(final_pe_output['final_answer']))

**Discussion of the Output:**
Failure, as expected. The execution trace shows the agent created a plan, likely `["Apple R&D spend last fiscal year", "Apple total employee count"]`. It successfully executed the first step. However, on the second step, our `flaky_web_search` tool returned an error message string.

The crucial failure is in the final step. The **Synthesizer**, having no way to know the second step failed, received the error message as if it were valid data. Its final answer is therefore nonsensical, likely stating something like "I cannot perform the calculation because one of the inputs is an error message." It blindly followed the plan to completion, resulting in a useless output. This demonstrates the critical need for a verification step.
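
Part of the problem is that `flaky_web_search` reports failure in-band, as an ordinary string that looks like any other result. A complementary defence is to make failure explicit in the tool's return type. The sketch below assumes a hypothetical `ToolResult` dataclass and `call_tool_safely` wrapper, and treats any output beginning with "error" as a failure, which matches this notebook's simulated message but is not a general rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolResult:
    ok: bool
    data: Optional[str] = None
    error: Optional[str] = None


def call_tool_safely(tool: Callable[[str], str], query: str) -> ToolResult:
    """Wrap a string-returning tool so failures are flagged rather than inlined."""
    try:
        output = tool(query)
    except Exception as exc:  # network errors, timeouts, and similar
        return ToolResult(ok=False, error=str(exc))
    # Assumed convention: error messages start with the word "error".
    if output.strip().lower().startswith("error"):
        return ToolResult(ok=False, error=output)
    return ToolResult(ok=True, data=output)
```

With a wrapper like this, a synthesizer could at least filter on `ok` before building its context, even without a separate Verifier.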

## Phase 2: The Advanced Approach - A Planner-Executor-Verifier Agent

Now we'll build the full PEV agent. We will add a dedicated **Verifier** node and create a more sophisticated routing logic that enables the agent to recover from the tool failure.

### Step 2.1: Defining the Verifier and the PEV Graph

**What we are going to do:**
1.  Define a `VerificationResult` Pydantic model for the Verifier's structured output.
2.  Create the `verifier_node`, which will analyze the output of the Executor.
3.  Create a new, more complex `router` that can handle the Verifier's feedback and trigger a re-planning loop.

In [4]:
class VerificationResult(BaseModel):
    """Schema for the Verifier's output."""
    is_successful: bool = Field(description="True if the tool execution was successful and the data is valid.")
    reasoning: str = Field(description="Reasoning for the verification decision.")

class PEVState(TypedDict):
    user_request: str
    plan: Optional[List[str]]
    last_tool_result: Optional[str]
    intermediate_steps: List[str]
    final_answer: Optional[str]
    retries: int  # count how many times we've replanned

from langchain_core.exceptions import OutputParserException

class Plan(BaseModel):
    steps: List[str] = Field(
        description="List of queries (max 5).",
        max_length=5  # Pydantic v2 name for the deprecated `max_items`
    )

def pev_planner_node(state: PEVState):
    retries = state.get("retries", 0)
    if retries > 3:  # stop after 3 replans
        console.print("--- (PEV) PLANNER: Retry limit reached. Stopping. ---")
        return {
            "plan": [],
            "final_answer": "Error: Unable to complete task after multiple retries."
        }

    console.print(f"--- (PEV) PLANNER: Creating/revising plan (retry {retries})... ---")

    planner_llm = llm.with_structured_output(Plan, strict=True)  # ✅ strict schema

    past_context = "\n".join(state["intermediate_steps"])
    base_prompt = f"""
    You are a planning agent. 
    Create a plan to answer: '{state['user_request']}'. 
    Use the 'flaky_web_search' tool.

    Rules:
    - Return ONLY valid JSON in this exact format: {{ "steps": ["query1", "query2"] }}
    - Maximum 5 steps.
    - Do NOT repeat failed queries or endless variations.
    - Do NOT output explanations, only JSON.

    Previous attempts and results:
    {past_context}
    """

    # ✅ retry wrapper for bad JSON
    for attempt in range(2):
        try:
            plan = planner_llm.invoke(base_prompt)
            return {"plan": plan.steps, "retries": retries + 1}
        except OutputParserException as e:
            console.print(f"[red]Planner parsing failed (attempt {attempt+1}): {e}[/red]")
            base_prompt = f"Return ONLY valid JSON with {{'steps': ['...']}}. {base_prompt}"

    # ultimate fallback to avoid crashing
    return {"plan": ["Apple R&D spend last fiscal year"], "retries": retries + 1}



def pev_executor_node(state: PEVState):
    if not state.get("plan"):  # ✅ guard against empty plan
        console.print("--- (PEV) EXECUTOR: No steps left, skipping execution. ---")
        return {}
    
    console.print("--- (PEV) EXECUTOR: Running next step... ---")
    next_step = state["plan"][0]
    result = flaky_web_search(next_step)
    return {"plan": state["plan"][1:], "last_tool_result": result}

def verifier_node(state: PEVState):
    console.print("--- VERIFIER: Checking last tool result... ---")
    verifier_llm = llm.with_structured_output(VerificationResult)
    prompt = f"Verify if the following tool output is a successful result or an error message. The task was '{state['user_request']}'.\n\nTool Output: '{state['last_tool_result']}'"
    verification = verifier_llm.invoke(prompt)
    console.print(f"--- VERIFIER: Judgment is '{'Success' if verification.is_successful else 'Failure'}' ---")
    if verification.is_successful:
        # If successful, add the valid result to our list of good steps
        return {"intermediate_steps": state["intermediate_steps"] + [state['last_tool_result']]}
    else:
        # If failed, add the failure reason and trigger re-planning by clearing the plan
        return {"plan": [], "intermediate_steps": state["intermediate_steps"] + [f"Verification Failed: {state['last_tool_result']}"]}

pev_synthesizer_node = basic_synthesizer_node # We can reuse the same synthesizer

def pev_router(state: PEVState):
    # ✅ If we already have a final answer (e.g. retry limit reached), stop
    if state.get("final_answer"):
        console.print("--- ROUTER: Final answer available. Moving to synthesizer. ---")
        return "synthesize"

    if not state["plan"]:
        # Check if plan is empty because of verification failure
        if state["intermediate_steps"] and "Verification Failed" in state["intermediate_steps"][-1]:
            console.print("--- ROUTER: Verification failed. Re-planning... ---")
            return "plan"
        else:
            console.print("--- ROUTER: Plan complete. Moving to synthesizer. ---")
            return "synthesize"
    else:
        console.print("--- ROUTER: Plan has more steps. Continuing execution. ---")
        return "execute"


# Build the PEV graph
pev_graph_builder = StateGraph(PEVState)
pev_graph_builder.add_node("plan", pev_planner_node)
pev_graph_builder.add_node("execute", pev_executor_node)
pev_graph_builder.add_node("verify", verifier_node)
pev_graph_builder.add_node("synthesize", pev_synthesizer_node)

pev_graph_builder.set_entry_point("plan")
pev_graph_builder.add_edge("plan", "execute")
pev_graph_builder.add_edge("execute", "verify")
pev_graph_builder.add_conditional_edges("verify", pev_router)
pev_graph_builder.add_edge("synthesize", END)

pev_agent_app = pev_graph_builder.compile()
print("Planner-Executor-Verifier (PEV) agent compiled successfully.")

Planner-Executor-Verifier (PEV) agent compiled successfully.
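
Two details of the graph above are worth a second look. The router infers failure by searching for the text "Verification Failed" in the last intermediate step, and the unconditional `plan` → `execute` edge means that a planner returning an empty plan (for example at the retry limit) still passes through the executor and verifier before reaching the synthesizer, which then overwrites the planner's `final_answer`. The sketch below shows one possible alternative. The `verification_failed` state field, both router names, and `MAX_REPLANS` are hypothetical and would need to be added to `PEVState` and set by the verifier node.

```python
from typing import Any, Dict

MAX_REPLANS = 3  # assumed limit, mirroring the planner's `retries > 3` check


def route_after_plan(state: Dict[str, Any]) -> str:
    """Skip execution when the planner produced no steps (e.g. retry limit hit)."""
    return "execute" if state.get("plan") else "synthesize"


def route_after_verify(state: Dict[str, Any]) -> str:
    """Route on an explicit flag instead of matching text in intermediate_steps."""
    if state.get("verification_failed"):
        # Re-plan while budget remains; otherwise give up and synthesize.
        return "plan" if state.get("retries", 0) <= MAX_REPLANS else "synthesize"
    return "execute" if state.get("plan") else "synthesize"
```

These could be wired in with `add_conditional_edges("plan", route_after_plan)` in place of the fixed edge, and `add_conditional_edges("verify", route_after_verify)` in place of `pev_router`. An explicit flag also avoids misrouting when a legitimate search result happens to contain the phrase being matched.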


## Phase 3: Head-to-Head Comparison

Now for the crucial test. We will run the robust PEV agent on the same flaky task and observe how it successfully navigates the tool failure.

In [5]:
console.print(f"[bold green]Testing PEV agent on the same flaky query:[/bold green]\n'{flaky_query}'\n")

initial_pev_input = {"user_request": flaky_query, "intermediate_steps": [], "retries": 0}

final_pev_output = pev_agent_app.invoke(initial_pev_input)

console.print("\n--- [bold green]Final Output from PEV Agent[/bold green] ---")
console.print(Markdown(final_pev_output['final_answer']))

**Discussion of the Output:**
Success! The execution trace tells a story of resilience:
1.  **Plan 1:** The agent initially creates a plan similar to the basic agent's.
2.  **Execute & Fail:** It executes the first step successfully but fails on the second (employee count), receiving an error message.
3.  **Verify & Catch:** The `Verifier` node receives the error message, and its LLM correctly judges that this is a failed step (`is_successful: False`). It adds this failure information to the state.
4.  **Router & Re-plan:** The `Router` sees the verification failure and sends the execution back to the `Planner`.
5.  **Plan 2:** The `Planner`, now aware of the previous failure, creates a *new, smarter plan*. It might try a different search query, like "Apple number of employees worldwide", to circumvent the API failure.
6.  **Execute & Succeed:** It executes the new plan, which now succeeds.
7.  **Verify & Pass:** The Verifier confirms the new data is valid.
8.  **Synthesize:** The Synthesizer receives the verified results, together with the logged `Verification Failed` entry from the first attempt, and produces the final answer.

This clearly demonstrates how the PEV architecture's self-correction loop allows it to overcome obstacles that would completely derail a simpler agent.
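
Note that the verifier node stores only the tool output in `intermediate_steps`, so on a re-plan the Planner sees the error text but not the query that produced it. Recording the query alongside each result would make the "do not repeat failed queries" rule easier to follow. The sketch below is a hypothetical extension, not the notebook's code: `Attempt`, `failed_queries`, and `format_history` are illustrative names.

```python
from typing import List, TypedDict


class Attempt(TypedDict):
    query: str
    ok: bool
    output: str


def failed_queries(attempts: List[Attempt]) -> List[str]:
    """List the queries whose results did not pass verification."""
    return [a["query"] for a in attempts if not a["ok"]]


def format_history(attempts: List[Attempt], max_chars: int = 200) -> str:
    """Render attempts as planner context, truncating long tool outputs."""
    lines = []
    for a in attempts:
        status = "OK" if a["ok"] else "FAILED"
        lines.append(f"[{status}] {a['query']}: {a['output'][:max_chars]}")
    return "\n".join(lines)
```

The string returned by `format_history` could replace `past_context` in the planner prompt, so each failure is tied to the exact query that caused it.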

## Phase 4: Quantitative Evaluation

Finally, we'll use an LLM-as-a-Judge to score both agents on their robustness and ability to handle errors.

In [6]:
class RobustnessEvaluation(BaseModel):
    """Schema for evaluating an agent's robustness and error handling."""
    task_completion_score: int = Field(description="Score 1-10 on whether the agent successfully completed the task, ignoring data errors.")
    error_handling_score: int = Field(description="Score 1-10 on the agent's ability to detect and recover from errors.")
    justification: str = Field(description="A brief justification for the scores.")

judge_llm = llm.with_structured_output(RobustnessEvaluation)

def evaluate_agent_robustness(query: str, final_state: dict):
    context = "\n".join(final_state.get("intermediate_steps", []))
    final_answer = final_state.get("final_answer", "")
    trace = f"Context:\n{context}\n\nFinal Answer:\n{final_answer}"
        
    prompt = f"""You are an expert judge of AI agents. A tool used by the agent was designed to fail on a specific query. Evaluate the agent's ability to handle this failure.
    
    **User's Task:** {query}
    **Full Agent Trace:**\n```\n{trace}\n```
    """
    return judge_llm.invoke(prompt)

console.print("--- Evaluating Basic P-E Agent's Robustness ---")
pe_agent_evaluation = evaluate_agent_robustness(flaky_query, final_pe_output)
console.print(pe_agent_evaluation.model_dump())

console.print("\n--- Evaluating PEV Agent's Robustness ---")
pev_agent_evaluation = evaluate_agent_robustness(flaky_query, final_pev_output)
console.print(pev_agent_evaluation.model_dump())

**Discussion of the Output:**
The judge's scores provide a stark contrast. The **Basic P-E Agent** will receive a very low `error_handling_score` because it failed to recognize the tool error and produced a nonsensical final answer. In contrast, the **PEV Agent** will receive a near-perfect `error_handling_score`. The judge's justification will praise its ability to detect the failure, trigger a re-planning loop, and ultimately recover to provide a correct answer.

This evaluation puts a number on the value of the PEV architecture, although a single LLM-judged run is an illustration rather than a proof. The point is not just getting the right answer when things go well; it is not getting the wrong answer when things go wrong.
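
Because both the agents and the judge are LLM-driven, scores can vary from run to run. A more convincing comparison would repeat each evaluation several times and report the spread. The helper below is a minimal sketch under that assumption: `summarize_scores` is a hypothetical name, and it expects a list of dictionaries such as those returned by `model_dump()` on each evaluation.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize_scores(runs: List[Dict[str, int]], key: str) -> Dict[str, float]:
    """Summarize one score field across repeated judge evaluations."""
    values = [run[key] for run in runs]
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation of the runs
    }
```

Collecting `evaluate_agent_robustness(...).model_dump()` from several runs of each agent and passing the lists to `summarize_scores(runs, "error_handling_score")` would show whether the gap between the two agents holds up across runs.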

## Conclusion

In this notebook, we have implemented the **Planner → Executor → Verifier** architecture and demonstrated its superior robustness compared to a simple Planner-Executor model. By introducing a dedicated Verifier node, we have given our agent a critical 'immune system' that can detect and recover from failures that would otherwise be fatal to the task.

This pattern is more resource-intensive, but for applications where reliability and accuracy are paramount, the trade-off is usually worth making. The PEV architecture represents a significant step towards building dependable AI agents that can operate safely and effectively in the unpredictable, real-world environment of external tools and APIs.