# Tutorial 23: Reflexion Pattern

In this tutorial, you'll learn how to implement the **Reflexion pattern** for iterative answer improvement with episodic memory. Unlike simple reflection that improves a single output, Reflexion learns from multiple attempts across iterations.

**What you'll learn:**
- Difference between Reflection and Reflexion patterns
- Creating structured self-critique with missing/superfluous analysis
- Using episodic memory to learn from past attempts
- Executing search queries to gather missing information
- Revising answers based on reflection and search results
- Building the complete Reflexion loop

By the end, you'll have a working Reflexion agent that improves answers through iterative learning and external search.

## Prerequisites

- Understanding of basic LangGraph patterns
- Ollama running with a capable model (llama3.1:8b or larger recommended)
- (Optional) Tavily API key for search functionality

## Why Reflexion?

Traditional agents answer questions once and move on. Even with reflection, they improve a single answer iteratively. **Reflexion** goes further by:

1. **Learning across attempts**: Maintains episodic memory of all previous tries
2. **Self-critique**: Identifies what's missing and what's superfluous
3. **External search**: Actively seeks information to fill knowledge gaps
4. **Iterative refinement**: Each attempt learns from all previous failures

**Comparison:**

| Pattern | Memory | Search | Use Case |
|---------|--------|--------|----------|
| **Reflection** | Single output | No | Improve writing quality |
| **Reflexion** | All attempts | Yes | Research questions, knowledge gaps |

```
┌───────────────────────────────────────────────────────┐
│                    Reflexion Loop                     │
│                                                       │
│  ┌──────────┐    ┌────────────┐    ┌─────────────┐    │
│  │  Draft   │───►│  Execute   │───►│   Revise    │    │
│  │ Answer + │    │  Search    │    │  Based on   │    │
│  │ Reflect  │    │  Queries   │    │  Results    │    │
│  └────┬─────┘    └────────────┘    └──────┬──────┘    │
│       │                                   │           │
│       │          Episodic Memory          │           │
│       └───────────────────────────────────┘           │
│              (Loop until max iterations)              │
└───────────────────────────────────────────────────────┘
```
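
The loop in the diagram can be sketched in plain Python, without LangGraph. This is a minimal illustration rather than the tutorial's implementation: `draft_fn`, `search_fn`, and `revise_fn` are hypothetical callables standing in for the LLM and the search tool, and the stubs exist only so the sketch runs without a model.

```python
from typing import Callable


def reflexion_loop(
    task: str,
    draft_fn: Callable[[str, list[dict]], dict],
    search_fn: Callable[[str], str],
    revise_fn: Callable[[str, dict, list[str]], str],
    max_iterations: int = 2,
) -> list[dict]:
    """Run draft -> search -> revise, keeping every episode in memory."""
    memory: list[dict] = []  # episodic memory: one record per iteration
    for iteration in range(1, max_iterations + 1):
        # The drafter sees all earlier episodes, not just the latest one.
        draft = draft_fn(task, memory)
        results = [search_fn(q) for q in draft["search_queries"]]
        revised = revise_fn(task, draft, results)
        memory.append({
            "iteration": iteration,
            "draft": draft["answer"],
            "critique": draft["critique"],
            "revised": revised,
        })
    return memory


# Stub callables (hypothetical) so the sketch runs without a model.
def stub_draft(task: str, memory: list[dict]) -> dict:
    return {
        "answer": f"draft {len(memory) + 1} for {task}",
        "critique": "missing: sources",
        "search_queries": [task],
    }


def stub_search(query: str) -> str:
    return f"results for {query}"


def stub_revise(task: str, draft: dict, results: list[str]) -> str:
    return f"{draft['answer']} (revised with {len(results)} result sets)"


episodes = reflexion_loop("quantum computing", stub_draft, stub_search, stub_revise)
print(len(episodes), episodes[-1]["revised"])
```

The important detail is that `memory` is passed back into the drafter on every pass, which is what separates Reflexion from a single-answer reflection loop.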

## Step 1: Setup and Verify Environment

In [None]:
# Verify our setup
from langgraph_ollama_local import LocalAgentConfig

config = LocalAgentConfig()
print(f"Ollama URL: {config.ollama.base_url}")
print(f"Model: {config.ollama.model}")
print("Setup verified!")
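
If you want to confirm that the Ollama server itself is reachable before running the later cells, a check along these lines may help. It is a sketch under assumptions: it relies on Ollama's `/api/tags` endpoint (which lists local models) and on the default address `http://localhost:11434`. `ollama_is_up` is a hypothetical helper, and the `opener` parameter exists only to make the function easy to test.

```python
import json
import urllib.request
from typing import Callable


def ollama_is_up(
    base_url: str = "http://localhost:11434",
    timeout: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True if GET {base_url}/api/tags answers with a JSON model list."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with opener(url, timeout=timeout) as response:
            payload = json.loads(response.read())
    except (OSError, ValueError):
        # Connection errors and timeouts are OSError; bad JSON is ValueError.
        return False
    return isinstance(payload, dict) and "models" in payload
```

In the notebook you could call `ollama_is_up(config.ollama.base_url)` and stop early if it returns `False`.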

## Step 2: Define the Reflexion State

The key difference from Reflection: we maintain **episodic memory** of ALL attempts and reflections.

In [None]:
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
import operator


class ReflexionState(TypedDict):
    """State for Reflexion pattern with episodic memory."""
    
    # Message history (for tool execution)
    messages: Annotated[list, add_messages]
    
    # The task/question to solve
    task: str
    
    # ALL attempts - uses operator.add to accumulate
    attempts: Annotated[list[dict], operator.add]
    
    # Current attempt being worked on
    current_attempt: str
    
    # ALL reflections - uses operator.add to accumulate
    reflections: Annotated[list[str], operator.add]
    
    # Latest reflection
    current_reflection: str
    
    # Iteration tracking
    iteration: int
    max_iterations: int
    
    # Success flag
    success_achieved: bool


print("Reflexion state defined!")
print("\nKey fields:")
print("- attempts: Accumulates ALL attempts (episodic memory)")
print("- reflections: Accumulates ALL self-critiques")
print("- iteration: Tracks current iteration number")
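
To see what the `operator.add` annotations do, here is a rough plain-Python imitation of how a reducer merges a node's partial update into the state. `apply_update` and `REDUCERS` are hypothetical names used for illustration only; LangGraph's real merge logic lives inside the library and handles more cases.

```python
import operator

# Fields with a reducer accumulate; all other fields are overwritten.
REDUCERS = {"attempts": operator.add, "reflections": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with `update` merged in, reducer-aware."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list
        else:
            merged[key] = value  # plain overwrite
    return merged


state = {"attempts": [], "reflections": [], "current_attempt": "", "iteration": 0}
state = apply_update(
    state, {"attempts": [{"num": 1}], "current_attempt": "a1", "iteration": 1}
)
state = apply_update(state, {"attempts": [{"num": 2}], "current_attempt": "a2"})

print(state["attempts"])
print(state["current_attempt"])
```

This is why each node returns a one-element list for `attempts`: returning the full list would append the earlier entries a second time.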

## Step 3: Define Structured Output Models

We use Pydantic models for reliable self-critique and answer generation.

In [None]:
from pydantic import BaseModel, Field


class Reflection(BaseModel):
    """Structured self-critique of an answer."""
    
    missing: str = Field(
        description="What critical information is missing or incomplete?"
    )
    superfluous: str = Field(
        description="What information is unnecessary or irrelevant?"
    )


class AnswerQuestion(BaseModel):
    """Structured answer with self-critique and search queries."""
    
    answer: str = Field(
        description="Your answer to the question (~250 words)"
    )
    reflection: Reflection = Field(
        description="Self-critique identifying gaps and excess"
    )
    search_queries: list[str] = Field(
        description="1-3 search queries to fill knowledge gaps",
        max_length=3,
    )


class ReviseAnswer(AnswerQuestion):
    """Revised answer with citations."""
    
    references: list[str] = Field(
        description="Sources and citations used"
    )


print("Structured output models defined!")
print("\nModels:")
print("- Reflection: missing + superfluous analysis")
print("- AnswerQuestion: answer + reflection + search_queries")
print("- ReviseAnswer: extends AnswerQuestion with references")
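
With structured output, the model is asked to return data shaped like `AnswerQuestion`. The sketch below shows what such a payload might look like and checks it with plain-Python rules that mirror the field definitions above. It is a stand-in for illustration: Pydantic performs the real validation in the tutorial, and `check_answer_payload` and the sample payload are hypothetical.

```python
import json

# A hypothetical model response shaped like AnswerQuestion.
raw = json.dumps({
    "answer": "Quantum computers are applied to simulation and optimization.",
    "reflection": {"missing": "Concrete examples", "superfluous": "None"},
    "search_queries": ["quantum computing applications", "quantum chemistry simulation"],
})


def check_answer_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is valid."""
    problems = []
    if not isinstance(payload.get("answer"), str):
        problems.append("answer must be a string")
    reflection = payload.get("reflection")
    if not isinstance(reflection, dict):
        problems.append("reflection must be an object")
    else:
        for field in ("missing", "superfluous"):
            if not isinstance(reflection.get(field), str):
                problems.append(f"reflection.{field} must be a string")
    queries = payload.get("search_queries")
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        problems.append("search_queries must be a list of strings")
    elif len(queries) > 3:
        problems.append("search_queries allows at most 3 items")
    return problems


print(check_answer_payload(json.loads(raw)))
print(check_answer_payload({"answer": "x", "search_queries": ["a", "b", "c", "d"]}))
```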

## Step 4: Create the Initial Responder Node

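This node drafts an answer and critiques it in the same call. On the first pass it sees only the question; on later passes the loop returns here, and the prompt also includes earlier attempts and their reflections from episodic memory.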

In [None]:
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model=config.ollama.model,
    base_url=config.ollama.base_url,
    temperature=0.7,
)


def create_initial_responder(llm):
    """Create initial responder that drafts answer with self-critique."""
    
    # Try structured output
    try:
        structured_llm = llm.with_structured_output(AnswerQuestion)
        use_structured = True
    except (AttributeError, NotImplementedError):
        structured_llm = llm
        use_structured = False
    
    def responder(state: ReflexionState) -> dict:
        """Generate initial answer with reflection."""
        task = state["task"]
        iteration = state.get("iteration", 0)
        previous_attempts = state.get("attempts", [])
        previous_reflections = state.get("reflections", [])
        
        # Build prompt with episodic memory
        prompt = f"""Answer the following question thoughtfully.\n\nQuestion: {task}"""
        
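        # Include previous attempts if available. The revisor also appends to
        # `attempts`, while `reflections` holds one critique per draft, so pair
        # each critique with its draft (the entries that carry search queries).
        previous_drafts = [a for a in previous_attempts if "search_queries" in a]
        if previous_drafts and previous_reflections:
            prompt += """\n\nPrevious attempts and what was wrong:"""
            for i, (attempt, reflection) in enumerate(zip(previous_drafts, previous_reflections), 1):
                prompt += f"""\n\nAttempt {i}:
Answer: {attempt.get('answer', 'N/A')[:150]}...
Issues: {reflection[:150]}..."""
            prompt += """\n\nLearn from these mistakes. Avoid repeating them."""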
        
        prompt += """\n\nAfter answering:
1. Critique your answer - what's missing? What's unnecessary?
2. Suggest 1-3 search queries to fill knowledge gaps."""
        
        if use_structured:
            try:
                response = structured_llm.invoke([HumanMessage(content=prompt)])
                answer = response.answer
                reflection_obj = response.reflection
                search_queries = response.search_queries
            except Exception:
                response = llm.invoke([HumanMessage(content=prompt)])
                answer = response.content
                reflection_obj = Reflection(
                    missing="Unable to determine",
                    superfluous="Unable to determine"
                )
                search_queries = [task]
        else:
            response = llm.invoke([HumanMessage(content=prompt)])
            answer = response.content
            reflection_obj = Reflection(
                missing="Manual review needed",
                superfluous="Manual review needed"
            )
            search_queries = [task]
        
        reflection_text = f"Missing: {reflection_obj.missing}\nSuperfluous: {reflection_obj.superfluous}"
        
        return {
            "current_attempt": answer,
            "current_reflection": reflection_text,
            "attempts": [{
                "num": iteration + 1,
                "answer": answer,
                "search_queries": search_queries,
            }],
            "iteration": iteration + 1,
        }
    
    return responder


print("Initial responder node creator defined!")
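
When structured output fails, the responder above falls back to a placeholder reflection. One possible refinement is to recover the critique from the free-text reply with a small parser. The sketch assumes the model echoes `Missing:` and `Superfluous:` labels at the start of a line. `parse_reflection_text` is a hypothetical helper, not part of the tutorial's module.

```python
import re


def parse_reflection_text(text: str, default: str = "Unable to determine") -> dict:
    """Extract 'Missing:' and 'Superfluous:' lines from a free-text reply."""
    found = {"missing": default, "superfluous": default}
    for label in found:
        # Matches e.g. "Missing: ..." or "**Missing:** ..." up to end of line.
        match = re.search(
            rf"^\W*{label}\W*?[:\-][ \t*]*(.+)$",
            text,
            flags=re.IGNORECASE | re.MULTILINE,
        )
        if match:
            found[label] = match.group(1).strip()
    return found


reply = "Answer text...\nMissing: recent benchmarks\nSuperfluous: history of physics"
print(parse_reflection_text(reply))
```

If a label is absent, the default placeholder is kept, so the result can be passed to `Reflection(**...)` in either case.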

## Step 5: Create a Mock Search Tool

For demonstration, we'll use a mock search tool. In production, swap in a real one such as `TavilySearchResults` or `DuckDuckGoSearchRun` from `langchain_community.tools`.

In [None]:
from langchain_core.tools import BaseTool
from typing import Type
from pydantic import BaseModel as PydanticBaseModel, Field as PydanticField


class SearchInput(PydanticBaseModel):
    """Input for the search tool."""
    query: str = PydanticField(description="The search query")


class MockSearchTool(BaseTool):
    """Mock search tool for demonstration."""
    
    name: str = "search"
    description: str = "Search for information on the web"
    args_schema: Type[PydanticBaseModel] = SearchInput
    
    def _run(self, query: str) -> str:
        """Mock search that returns generic results."""
        return f"""Search results for: {query}

Result 1: Recent developments in {query} show significant progress.
Result 2: Experts in the field emphasize the importance of understanding {query}.
Result 3: Latest research indicates new applications of {query} in various domains."""
    
    async def _arun(self, query: str) -> str:
        """Async version."""
        return self._run(query)


search_tool = MockSearchTool()
print("Mock search tool created!")
print("\nNote: In production, use:")
print("  from langchain_community.tools.tavily_search import TavilySearchResults")
print("  search_tool = TavilySearchResults(max_results=3)")
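
One way to switch between the mock and a real search tool without editing the notebook is to choose at runtime based on whether an API key is configured. The sketch below assumes the key is supplied through the `TAVILY_API_KEY` environment variable. `make_search_tool` and its factory arguments are hypothetical names.

```python
import os
from typing import Callable, Optional


def make_search_tool(
    mock_factory: Callable[[], object],
    real_factory: Callable[[], object],
    env: Optional[dict] = None,
    key_name: str = "TAVILY_API_KEY",
) -> object:
    """Build the real tool only when an API key is configured."""
    env = os.environ if env is None else env
    if env.get(key_name):
        return real_factory()
    return mock_factory()


# Placeholder factories; in the notebook these could be `MockSearchTool`
# and `lambda: TavilySearchResults(max_results=3)`.
tool = make_search_tool(lambda: "mock", lambda: "tavily", env={})
print(tool)
```

Passing factories instead of instances means the real tool, and its import, is only touched when a key is present.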

## Step 6: Create the Tool Executor Node

This node executes the search queries generated during reflection.

In [None]:
from langchain_core.messages import ToolMessage


def create_tool_executor(search_tool):
    """Create tool executor that runs search queries."""
    
    def executor(state: ReflexionState) -> dict:
        """Execute search queries from last attempt."""
        attempts = state.get("attempts", [])
        
        if not attempts:
            return {"messages": [ToolMessage(
                content="No attempts available",
                tool_call_id="search"
            )]}
        
        last_attempt = attempts[-1]
        queries = last_attempt.get("search_queries", [])
        
        if not queries:
            return {"messages": [ToolMessage(
                content="No search queries generated",
                tool_call_id="search"
            )]}
        
        # Execute each query
        all_results = []
        for query in queries:
            try:
                result = search_tool.invoke(query)
                all_results.append(f"Query: {query}\nResults: {result}\n")
            except Exception as e:
                all_results.append(f"Query: {query}\nError: {str(e)}\n")
        
        combined_results = "\n---\n".join(all_results)
        
        return {
            "messages": [ToolMessage(
                content=combined_results,
                tool_call_id="search"
            )],
        }
    
    return executor


print("Tool executor node creator defined!")
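
The executor above runs whatever queries the model produced. A small cleanup step before the loop can avoid duplicate or empty searches. This is a sketch of one possible approach; `normalize_queries` is a hypothetical helper, and the default limit of 3 mirrors the `max_length=3` set on `search_queries`.

```python
def normalize_queries(queries: list[str], limit: int = 3) -> list[str]:
    """Strip, drop empties, de-duplicate case-insensitively, and cap the count."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for query in queries:
        if len(cleaned) >= limit:
            break
        text = " ".join(query.split())  # collapse runs of whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


print(normalize_queries(["  CRISPR  limits ", "crispr limits", "", "off-target effects"]))
```

In the executor this would wrap the line that reads `search_queries`, for example `queries = normalize_queries(last_attempt.get("search_queries", []))`.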

## Step 7: Create the Revisor Node

This node revises the answer using search results and reflection.

In [None]:
def create_revisor(llm):
    """Create revisor that improves answer using search results."""
    
    # Try structured output
    try:
        structured_llm = llm.with_structured_output(ReviseAnswer)
        use_structured = True
    except (AttributeError, NotImplementedError):
        structured_llm = llm
        use_structured = False
    
    def revisor(state: ReflexionState) -> dict:
        """Revise answer using search results and reflection."""
        task = state["task"]
        current_reflection = state.get("current_reflection", "")
        messages = state.get("messages", [])
        
        # Get search results
        search_results = "No search results available"
        if messages:
            for msg in reversed(messages):
                if isinstance(msg, ToolMessage):
                    search_results = msg.content
                    break
        
        prompt = f"""Revise your previous answer using new information.

Question: {task}

Previous reflection:
{current_reflection}

Search results:
{search_results}

Instructions:
1. Incorporate relevant information from search results
2. Address gaps identified in reflection
3. Remove unnecessary information
4. Include citations/references
5. Keep answer around 250 words

Provide revised answer with references."""
        
        if use_structured:
            try:
                response = structured_llm.invoke([HumanMessage(content=prompt)])
                revised_answer = response.answer
                new_reflection = response.reflection
                references = response.references
            except Exception:
                response = llm.invoke([HumanMessage(content=prompt)])
                revised_answer = response.content
                new_reflection = Reflection(
                    missing="Unable to determine",
                    superfluous="Unable to determine"
                )
                references = ["Search results incorporated"]
        else:
            response = llm.invoke([HumanMessage(content=prompt)])
            revised_answer = response.content
            new_reflection = Reflection(
                missing="Manual review needed",
                superfluous="Manual review needed"
            )
            references = ["See search results above"]
        
        reflection_text = f"Missing: {new_reflection.missing}\nSuperfluous: {new_reflection.superfluous}"
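        # The draft node already incremented `iteration`, so the revision
        # reuses that number rather than colliding with the next draft's.
        iteration = state.get("iteration", 0)
        
        return {
            "current_attempt": revised_answer,
            "attempts": [{
                "num": iteration,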
                "answer": revised_answer,
                "references": references,
            }],
            "reflections": [current_reflection],
            "current_reflection": reflection_text,
        }
    
    return revisor


print("Revisor node creator defined!")
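
With a real search tool, the combined results can become long enough to crowd the revision prompt. The sketch below caps each query's block before it is interpolated. It assumes the `"\n---\n"` separator used by the tool executor and that this separator does not occur inside the results themselves. `trim_search_results` is a hypothetical helper.

```python
def trim_search_results(
    combined: str, per_query_chars: int = 400, sep: str = "\n---\n"
) -> str:
    """Cap each query's result block so the revision prompt stays bounded."""
    trimmed = []
    for block in combined.split(sep):
        if len(block) > per_query_chars:
            block = block[:per_query_chars].rstrip() + " [truncated]"
        trimmed.append(block)
    return sep.join(trimmed)


combined = "Query: a\nResults: " + "x" * 1000 + "\n---\n" + "Query: b\nResults: short"
print(len(trim_search_results(combined, per_query_chars=50)))
```

Truncating per block rather than over the whole string keeps at least the start of every query's results.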

## Step 8: Build the Reflexion Graph

Now we assemble all pieces into the Reflexion loop.

In [None]:
from langgraph.graph import StateGraph, START, END


def create_reflexion_graph(llm, search_tool):
    """Build Reflexion graph for iterative improvement."""
    
    workflow = StateGraph(ReflexionState)
    
    # Add nodes
    workflow.add_node("draft", create_initial_responder(llm))
    workflow.add_node("execute_tools", create_tool_executor(search_tool))
    workflow.add_node("revise", create_revisor(llm))
    
    # Entry: draft initial answer
    workflow.add_edge(START, "draft")
    
    # Draft -> Execute tools
    workflow.add_edge("draft", "execute_tools")
    
    # Execute tools -> Revise
    workflow.add_edge("execute_tools", "revise")
    
    # Conditional: continue or end
    def should_continue(state: ReflexionState) -> str:
        """Determine whether to continue iterating."""
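        iteration = state.get("iteration", 0)
        max_iterations = state.get("max_iterations", 3)
        # No node in this tutorial sets success_achieved, so in practice
        # the loop stops when iteration reaches max_iterations.
        success = state.get("success_achieved", False)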
        
        if success or iteration >= max_iterations:
            return END
        
        return "draft"
    
    workflow.add_conditional_edges(
        "revise",
        should_continue,
        {
            "draft": "draft",
            END: END,
        }
    )
    
    return workflow.compile()


# Build the graph
reflexion_graph = create_reflexion_graph(llm, search_tool)

print("Reflexion graph compiled!")
print("\nGraph flow:")
print("  START -> draft")
print("  draft -> execute_tools")
print("  execute_tools -> revise")
print("  revise -> [draft | END]")
print("\nLoop continues until max_iterations or success")
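
In the graph above, `should_continue` reads `success_achieved`, but none of the tutorial's nodes set it, so the loop always runs to `max_iterations`. One possible way to stop early is a small heuristic over the reflection text. The sketch assumes the `Missing: ...` / `Superfluous: ...` format produced by the nodes. `reflection_is_satisfied` is a hypothetical helper, and the list of "empty" markers is an arbitrary choice.

```python
def reflection_is_satisfied(reflection_text: str) -> bool:
    """Heuristic: treat the critique as resolved when 'Missing' is empty-ish."""
    empty_markers = {"", "none", "nothing", "n/a"}
    for line in reflection_text.splitlines():
        label, _, value = line.partition(":")
        if label.strip().lower() == "missing":
            return value.strip().strip(".").lower() in empty_markers
    return False  # no 'Missing:' line found, so keep iterating


print(reflection_is_satisfied("Missing: None\nSuperfluous: long introduction"))
print(reflection_is_satisfied("Missing: recent data\nSuperfluous: None"))
```

The revisor could then add `"success_achieved": reflection_is_satisfied(reflection_text)` to the dict it returns. A string heuristic like this is fragile, so an LLM-based or test-based evaluator may be a better fit for real tasks.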

## Step 9: Run a Reflexion Task

Let's test our Reflexion agent on a research question.

In [None]:
def run_reflexion_task(graph, task: str, max_iterations: int = 3):
    """Run a Reflexion task."""
    
    print("="*60)
    print("REFLEXION TASK")
    print("="*60)
    print(f"Question: {task}")
    print(f"Max iterations: {max_iterations}")
    
    initial_state: ReflexionState = {
        "messages": [],
        "task": task,
        "attempts": [],
        "current_attempt": "",
        "reflections": [],
        "current_reflection": "",
        "iteration": 0,
        "max_iterations": max_iterations,
        "success_achieved": False,
    }
    
    result = graph.invoke(initial_state)
    
    # Display all attempts
    print("\n" + "="*60)
    print("ALL ATTEMPTS (Episodic Memory)")
    print("="*60)
    for attempt in result["attempts"]:
        print(f"\n--- Attempt {attempt['num']} ---")
        print(f"Answer: {attempt['answer'][:300]}...")
        if 'search_queries' in attempt:
            print(f"Search queries: {attempt['search_queries']}")
        if 'references' in attempt:
            print(f"References: {attempt['references']}")
    
    # Display all reflections
    print("\n" + "="*60)
    print("ALL REFLECTIONS")
    print("="*60)
    for i, reflection in enumerate(result["reflections"], 1):
        print(f"\nReflection {i}:")
        print(reflection)
    
    # Display final answer
    print("\n" + "="*60)
    print("FINAL ANSWER")
    print("="*60)
    print(result["current_attempt"])
    
    print(f"\nTotal iterations: {result['iteration']}")
    print(f"Total attempts: {len(result['attempts'])}")
    print(f"Total reflections: {len(result['reflections'])}")
    
    return result


# Run a task
result = run_reflexion_task(
    reflexion_graph,
    task="What are the key applications of quantum computing?",
    max_iterations=2
)

## Step 10: Observe Episodic Memory

Let's examine how Reflexion accumulates knowledge across attempts.

In [None]:
# Examine the episodic memory structure
print("Episodic Memory Structure:")
print("\nAttempts:")
for i, attempt in enumerate(result["attempts"], 1):
    print(f"\nAttempt {i}:")
    print(f"  - Number: {attempt['num']}")
    print(f"  - Has answer: {len(attempt.get('answer', '')) > 0}")
    print(f"  - Has search queries: {'search_queries' in attempt}")
    print(f"  - Has references: {'references' in attempt}")

print("\nReflections:")
for i, reflection in enumerate(result["reflections"], 1):
    print(f"\nReflection {i}:")
    lines = reflection.split('\n')
    for line in lines:
        print(f"  {line}")
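
Beyond inspecting the structure, it can help to quantify how much each attempt changed relative to the one before it. The sketch below uses the standard library's `difflib.SequenceMatcher` on the `answer` field. `attempt_similarity` is a hypothetical helper and the demo data is made up for illustration.

```python
from difflib import SequenceMatcher


def attempt_similarity(attempts: list[dict]) -> list[float]:
    """Similarity ratio (0..1) between each attempt's answer and the next."""
    answers = [a.get("answer", "") for a in attempts]
    return [
        round(SequenceMatcher(None, prev, curr).ratio(), 3)
        for prev, curr in zip(answers, answers[1:])
    ]


demo = [
    {"num": 1, "answer": "Quantum computing helps with simulation."},
    {"num": 2, "answer": "Quantum computing helps with simulation and optimization."},
    {"num": 3, "answer": "Quantum computing helps with simulation and optimization."},
]
print(attempt_similarity(demo))
```

Ratios close to 1.0 across consecutive attempts suggest the loop has stopped changing the answer, which may be a sign that further iterations add little. In the notebook you could call `attempt_similarity(result["attempts"])`.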

## Step 11: Try Different Questions

Test Reflexion with various types of questions.

In [None]:
# Question requiring technical depth
result2 = run_reflexion_task(
    reflexion_graph,
    task="Explain how CRISPR gene editing works and its current limitations.",
    max_iterations=2
)

In [None]:
# Question requiring recent information
result3 = run_reflexion_task(
    reflexion_graph,
    task="What are the latest developments in renewable energy storage?",
    max_iterations=2
)

## Step 12: Using the Built-in Module

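The `langgraph_ollama_local.patterns.reflexion` module provides ready-to-use versions of these functions. Importing them rebinds the names `create_reflexion_graph` and `run_reflexion_task`, replacing the versions defined earlier in this notebook.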

In [None]:
from langgraph_ollama_local.patterns.reflexion import (
    create_reflexion_graph,
    run_reflexion_task,
)

# Use the module's implementation
module_graph = create_reflexion_graph(llm, search_tool)

result = run_reflexion_task(
    module_graph,
    task="What are the ethical implications of artificial general intelligence?",
    max_iterations=3
)

print("\nFinal Answer:")
print(result["current_attempt"])
print(f"\nTotal attempts: {len(result['attempts'])}")
print(f"Total reflections: {len(result['reflections'])}")

## Step 13: Compare Reflection vs Reflexion

Let's highlight the key differences in a side-by-side comparison.

In [None]:
print("="*60)
print("REFLECTION vs REFLEXION")
print("="*60)

comparison = [
    ("Memory", "Single output", "All attempts (episodic)"),
    ("Search", "No external search", "Active search for info"),
    ("Learning", "Improve one answer", "Learn from failures"),
    ("Iterations", "Generate-critique-revise", "Draft-search-revise"),
    ("Use Case", "Writing quality", "Research questions"),
]

print(f"\n{'Aspect':<15} {'Reflection':<25} {'Reflexion':<25}")
print("-" * 65)
for aspect, reflection, reflexion in comparison:
    print(f"{aspect:<15} {reflection:<25} {reflexion:<25}")

print("\nKey Insight:")
print("Reflexion's episodic memory allows it to avoid repeating")
print("the same mistakes across attempts, making it ideal for")
print("complex research questions requiring external information.")

## Complete Code

Here's a condensed version in one cell for reference. It declares the state and models, then relies on the built-in module for the graph and the runner.

In [None]:
# === Complete Reflexion Implementation ===

from typing import Annotated, Type
from typing_extensions import TypedDict
import operator

from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import BaseTool
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field

from langgraph_ollama_local import LocalAgentConfig


# === State ===
class ReflexionState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    attempts: Annotated[list[dict], operator.add]
    current_attempt: str
    reflections: Annotated[list[str], operator.add]
    current_reflection: str
    iteration: int
    max_iterations: int
    success_achieved: bool


# === Models ===
class Reflection(BaseModel):
    missing: str
    superfluous: str


class AnswerQuestion(BaseModel):
    answer: str
    reflection: Reflection
    search_queries: list[str]


# === Quick Example ===
def quick_reflexion_example():
    config = LocalAgentConfig()
    llm = ChatOllama(model=config.ollama.model, base_url=config.ollama.base_url)
    
    # Mock search tool
    class MockSearch(BaseTool):
        name: str = "search"
        description: str = "Search tool"
        
        def _run(self, query: str) -> str:
            return f"Mock results for: {query}"
    
    from langgraph_ollama_local.patterns.reflexion import (
        create_reflexion_graph,
        run_reflexion_task,
    )
    
    graph = create_reflexion_graph(llm, MockSearch())
    result = run_reflexion_task(
        graph,
        task="Explain quantum computing",
        max_iterations=2
    )
    
    return result


if __name__ == "__main__":
    result = quick_reflexion_example()
    print(f"Total attempts: {len(result['attempts'])}")
    print(f"Final answer length: {len(result['current_attempt'])} chars")

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Episodic Memory** | Stores ALL attempts to learn from failures |
| **Self-Reflection** | Identifies missing and superfluous information |
| **Search Integration** | Actively queries external sources |
| **Iterative Learning** | Each attempt improves on previous ones |
| **Structured Output** | Pydantic models for reliable critique |
| **operator.add** | Accumulates attempts and reflections |
| **Tool Messages** | Store search results in message history |

## Best Practices

1. **Set reasonable iteration limits**: 2-3 iterations are usually sufficient
2. **Use quality search tools**: `TavilySearchResults` or `DuckDuckGoSearchRun` in production
3. **Monitor episodic memory**: Check that previous attempts inform new ones
4. **Prefer structured output**: It makes reflection parsing reliable
5. **Limit search queries**: 1-3 queries per iteration keeps the revision prompt manageable
6. **Track references**: Citations improve answer credibility
7. **Compare attempts**: Review how answers improve across iterations

## What's Next

Congratulations! You've implemented the Reflexion pattern. You now understand:
- How episodic memory enables learning from failures
- Structured self-critique with missing/superfluous analysis
- Integration of external search for knowledge gaps
- Iterative refinement with accumulated reflections

Continue exploring:
- Tutorial 24: LATS (Tree search for agents)
- Tutorial 25: ReWOO (Decoupled planning)
- Combine Reflexion with evaluation patterns
- Use real search APIs for production applications