# Tutorial 19: Map-Reduce Agents

In this tutorial, you'll build a **map-reduce pattern** for parallel agent execution with result aggregation.

**What you'll learn:**
- How to split complex tasks into parallel subtasks
- Creating worker agents that process subtasks independently
- Implementing fan-out/fan-in patterns for parallel execution
- Aggregating results from multiple parallel workers
- When to use map-reduce vs sequential processing

By the end, you'll have a working map-reduce system that processes tasks in parallel and synthesizes results.

## Prerequisites

- Completed Tutorials 14-16 (Multi-Agent Patterns)
- Understanding of parallel execution concepts
- Ollama running with a capable model (llama3.1:8b or larger recommended)

## Why Map-Reduce?

Map-reduce is ideal for tasks that can be parallelized:

1. **Scalability**: Process large workloads by distributing across workers
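2. **Speed**: Independent subtasks run concurrently, so wall-clock time approaches that of the slowest worker rather than the sum of all workers (provided the model backend can serve requests concurrently)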
3. **Independence**: Subtasks don't depend on each other
4. **Aggregation**: Combine results into a coherent final output

**Common use cases:**
- Document analysis (each worker analyzes a section)
- Data processing (each worker processes a chunk)
- Multi-perspective analysis (each worker takes a different angle)
- Large-scale summarization

```
                     ┌─────────────────┐
                     │     Mapper      │
                     │  (Split Task)   │
                     └────────┬────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │  Worker 1   │     │  Worker 2   │     │  Worker 3   │
   │  (Parallel) │     │  (Parallel) │     │  (Parallel) │
   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │    Reducer      │
                     │  (Aggregate)    │
                     └─────────────────┘
```
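
The fan-out/fan-in shape in the diagram can be tried without any LLM. Below is a minimal sketch using Python's standard `concurrent.futures`. The helpers `split_text`, `count_words`, and `merge_counts` are hypothetical stand-ins for the mapper, worker, and reducer, and word counting is only a placeholder workload.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_text(text: str, num_workers: int) -> list[str]:
    """Mapper stand-in: split the text into roughly equal chunks of words."""
    words = text.split()
    size = max(1, -(-len(words) // num_workers))  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def count_words(chunk: str) -> Counter:
    """Worker stand-in: process one chunk with no knowledge of the others."""
    return Counter(chunk.lower().split())


def merge_counts(partials: list[Counter]) -> Counter:
    """Reducer stand-in: combine the partial results into one."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def map_reduce_word_count(text: str, num_workers: int = 3) -> Counter:
    chunks = split_text(text, num_workers)              # map
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_words, chunks))  # fan-out
    return merge_counts(partials)                       # fan-in and reduce


counts = map_reduce_word_count("a b a c b a d e f", num_workers=3)
print(counts["a"], counts["b"])
```

The rest of this tutorial follows the same three roles, with LLM calls in place of these helpers and a LangGraph graph in place of the thread pool.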

## Step 1: Setup and Verify Environment

In [None]:
# Verify our setup
from langgraph_ollama_local import LocalAgentConfig

config = LocalAgentConfig()
print(f"Ollama URL: {config.ollama.base_url}")
print(f"Model: {config.ollama.model}")
print("Setup verified!")

## Step 2: Define the Map-Reduce State

Our state tracks:
- The main task
- Subtasks created by the mapper
- Results from all workers
- The final aggregated result

In [None]:
from typing import Annotated
from typing_extensions import TypedDict
import operator


class MapReduceState(TypedDict):
    """State schema for map-reduce pattern."""
    
    # Main task to process
    task: str
    
    # List of subtasks (one per worker)
    subtasks: list[str]
    
    # Results from workers - uses operator.add to accumulate
    worker_results: Annotated[list[dict], operator.add]
    
    # Final aggregated result
    final_result: str


print("State defined!")
print("\nKey fields:")
print("- subtasks: Created by mapper, one per worker")
print("- worker_results: Accumulated outputs using operator.add")
print("- final_result: Synthesized by reducer")
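
The `Annotated[list[dict], operator.add]` annotation tells LangGraph to combine each node's returned `worker_results` with the existing value instead of overwriting it. Here is a plain-Python sketch of that merge rule. `apply_update` and `REDUCERS` are hypothetical names that only mimic the behavior; this is not LangGraph's actual implementation.

```python
import operator

# Field name -> reducer, mirroring the Annotated field in MapReduceState.
REDUCERS = {"worker_results": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state.

    Fields with a reducer are combined with the existing value;
    all other fields are overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"task": "demo", "subtasks": ["a", "b"], "worker_results": [], "final_result": ""}
state = apply_update(state, {"worker_results": [{"worker_id": 0, "output": "A"}]})
state = apply_update(state, {"worker_results": [{"worker_id": 1, "output": "B"}]})
state = apply_update(state, {"final_result": "A + B"})

print([r["worker_id"] for r in state["worker_results"]])
```

This is why each worker returns a one-element list: lists concatenate, so no worker overwrites another's result.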

## Step 3: Create the Mapper Node

The mapper splits the main task into independent subtasks that can be processed in parallel.

In [None]:
from pydantic import BaseModel, Field
from langchain_core.messages import SystemMessage, HumanMessage


class MapperOutput(BaseModel):
    """Structured output from mapper."""
    
    subtasks: list[str] = Field(
        description="List of independent subtasks, one per worker"
    )
    reasoning: str = Field(
        description="Brief explanation of how the task was split"
    )


MAPPER_PROMPT = """You are a mapper agent that splits tasks into parallel subtasks.

Your job:
1. Analyze the main task
2. Break it into {num_workers} independent subtasks
3. Ensure subtasks don't depend on each other
4. Balance workload across subtasks

Guidelines:
- Create subtasks that can be completed independently
- Make each subtask clear and specific
- Aim for equal complexity across subtasks"""


def create_mapper_node(llm, num_workers=3):
    """Create a mapper that splits tasks into subtasks."""
    
    structured_llm = llm.with_structured_output(MapperOutput)
    
    def mapper(state: MapReduceState) -> dict:
        messages = [
            SystemMessage(content=MAPPER_PROMPT.format(num_workers=num_workers)),
            HumanMessage(content=f"""Task: {state['task']}

Break this into {num_workers} independent subtasks for parallel processing.""")
        ]
        
        output = structured_llm.invoke(messages)
        
        # Cap at num_workers subtasks; if the model returns fewer, workers use a fallback
        subtasks = output.subtasks[:num_workers]
        
        print(f"\n[Mapper] Created {len(subtasks)} subtasks:")
        for i, subtask in enumerate(subtasks):
            print(f"  {i+1}. {subtask[:80]}...")
        print(f"\n[Mapper] Reasoning: {output.reasoning}")
        
        return {"subtasks": subtasks}
    
    return mapper


print("Mapper node creator defined!")
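
Slicing with `[:num_workers]` caps the list but does not pad it when the model returns fewer subtasks than requested. One possible way to guarantee an exact count is sketched below. `normalize_subtasks` is a hypothetical helper, not part of the tutorial's code. It reuses the fallback wording from the worker in Step 4.

```python
def normalize_subtasks(subtasks: list[str], task: str, num_workers: int) -> list[str]:
    """Return exactly num_workers non-empty subtasks."""
    # Drop blank entries, then cap at num_workers.
    cleaned = [s.strip() for s in subtasks if s and s.strip()][:num_workers]
    # Pad with a generic fallback if the model returned too few.
    for i in range(len(cleaned), num_workers):
        cleaned.append(f"Process part {i + 1} of: {task}")
    return cleaned


print(normalize_subtasks(["Summarize costs", "  "], "Review the report", 3))
```

Inside the mapper, this could replace the slice, for example `subtasks = normalize_subtasks(output.subtasks, state["task"], num_workers)`.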

## Step 4: Create Worker Nodes

Workers process subtasks independently and in parallel. Each worker is prompted with the main task and its own subtask only, so it never sees what the other workers were asked or produced.

In [None]:
WORKER_PROMPT = """You are a worker agent processing a subtask.

Your responsibilities:
- Complete your assigned subtask thoroughly
- Work independently without knowledge of other workers
- Provide clear, structured output
- Focus only on your specific subtask

{custom_instructions}"""


def create_worker_node(llm, worker_id, worker_prompt=""):
    """Create a worker that processes a subtask."""
    
    system_prompt = WORKER_PROMPT.format(
        custom_instructions=worker_prompt or "Do your best work."
    )
    
    def worker(state: MapReduceState) -> dict:
        subtasks = state.get("subtasks", [])
        
        # Get this worker's subtask
        if worker_id < len(subtasks):
            subtask = subtasks[worker_id]
        else:
            subtask = f"Process part {worker_id + 1} of: {state['task']}"
        
        print(f"\n[Worker {worker_id}] Processing: {subtask[:60]}...")
        
        messages = [
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"""Main task: {state['task']}

Your subtask: {subtask}

Complete your subtask and provide results.""")
        ]
        
        response = llm.invoke(messages)
        
        print(f"[Worker {worker_id}] Completed")
        
        return {
            "worker_results": [{
                "worker_id": worker_id,
                "subtask": subtask,
                "output": response.content,
            }]
        }
    
    return worker


print("Worker node creator defined!")
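
A worker only needs an object with an `invoke` method whose return value has a `content` attribute, so its logic can be exercised without Ollama running. The sketch below is a simplified stand-in for the tutorial's worker. `FakeLLM`, `FakeResponse`, `pick_subtask`, and `make_worker` are hypothetical names, and plain strings replace the LangChain message objects.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    content: str


class FakeLLM:
    """Stub that echoes the last message instead of calling a model."""

    def invoke(self, messages: list[str]) -> FakeResponse:
        return FakeResponse(content=f"echo: {messages[-1]}")


def pick_subtask(state: dict, worker_id: int) -> str:
    """Same selection rule as the tutorial's worker: own slot, else a fallback."""
    subtasks = state.get("subtasks", [])
    if worker_id < len(subtasks):
        return subtasks[worker_id]
    return f"Process part {worker_id + 1} of: {state['task']}"


def make_worker(llm, worker_id: int):
    """Simplified worker: plain strings stand in for LangChain messages."""

    def worker(state: dict) -> dict:
        subtask = pick_subtask(state, worker_id)
        response = llm.invoke(["system prompt", f"Your subtask: {subtask}"])
        return {"worker_results": [
            {"worker_id": worker_id, "subtask": subtask, "output": response.content}
        ]}

    return worker


state = {"task": "Review the report", "subtasks": ["Summarize section 1"]}
first = make_worker(FakeLLM(), 0)(state)
second = make_worker(FakeLLM(), 1)(state)  # no subtask at index 1, so the fallback is used
print(first["worker_results"][0]["output"])
print(second["worker_results"][0]["subtask"])
```

A stub like this makes it quick to check the index and fallback logic before spending time on real model calls.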

## Step 5: Create the Reducer Node

The reducer aggregates all worker outputs into a coherent final result.

In [None]:
class ReducerOutput(BaseModel):
    """Structured output from reducer."""
    
    final_result: str = Field(
        description="Synthesized result combining all worker outputs"
    )
    summary: str = Field(
        description="Brief summary of key findings"
    )


REDUCER_PROMPT = """You are a reducer agent that synthesizes results from multiple workers.

Your job:
1. Review all worker outputs
2. Identify common themes and patterns
3. Resolve any conflicts or inconsistencies
4. Synthesize a coherent final result

Guidelines:
- Include all important points from workers
- Remove redundancy while preserving key information
- Organize output logically
- Provide comprehensive synthesis, not just concatenation"""


def create_reducer_node(llm):
    """Create a reducer that aggregates worker results."""
    
    structured_llm = llm.with_structured_output(ReducerOutput)
    
    def reducer(state: MapReduceState) -> dict:
        worker_results = state.get("worker_results", [])
        
        print(f"\n[Reducer] Aggregating {len(worker_results)} worker results...")
        
        if not worker_results:
            return {"final_result": "No results to aggregate."}
        
        # Build context from all workers
        worker_sections = []
        for result in worker_results:
            worker_id = result.get("worker_id", "?")
            subtask = result.get("subtask", "")
            output = result.get("output", "")
            
            worker_sections.append(f"""### Worker {worker_id}
**Subtask**: {subtask}
**Output**:
{output}""")
        
        workers_context = "\n\n".join(worker_sections)
        
        messages = [
            SystemMessage(content=REDUCER_PROMPT),
            HumanMessage(content=f"""Original task: {state['task']}

Worker results:
{workers_context}

Synthesize these results into a comprehensive final output.""")
        ]
        
        output = structured_llm.invoke(messages)
        
        print("[Reducer] Synthesis complete")
        print(f"[Reducer] Summary: {output.summary}")
        
        return {"final_result": output.final_result}
    
    return reducer


print("Reducer node creator defined!")
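
The reducer builds its context in whatever order the results appear in `worker_results`. If you would rather not depend on that order, the results can be sorted by `worker_id` first. This is a minimal sketch of the same formatting step; `format_worker_sections` is a hypothetical helper, and it assumes every `worker_id` is an integer.

```python
def format_worker_sections(worker_results: list[dict]) -> str:
    """Render worker results as markdown sections, ordered by worker_id."""
    ordered = sorted(worker_results, key=lambda r: r.get("worker_id", 0))
    sections = [
        f"### Worker {r.get('worker_id', '?')}\n"
        f"**Subtask**: {r.get('subtask', '')}\n"
        f"**Output**:\n{r.get('output', '')}"
        for r in ordered
    ]
    return "\n\n".join(sections)


sample = [
    {"worker_id": 1, "subtask": "Costs", "output": "Costs are falling."},
    {"worker_id": 0, "subtask": "Adoption", "output": "Adoption is rising."},
]
print(format_worker_sections(sample))
```

A stable order keeps the reducer's prompt identical across runs, which makes outputs easier to compare when `temperature=0`.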

## Step 6: Build the Map-Reduce Graph

Now we assemble the complete graph with parallel worker execution.

In [None]:
from langgraph.graph import StateGraph, START, END
from langchain_ollama import ChatOllama

# Create LLM
llm = ChatOllama(
    model=config.ollama.model,
    base_url=config.ollama.base_url,
    temperature=0,
)

# Configuration
NUM_WORKERS = 3

# Build the graph
workflow = StateGraph(MapReduceState)

# Add mapper
workflow.add_node("mapper", create_mapper_node(llm, NUM_WORKERS))

# Add workers
for i in range(NUM_WORKERS):
    workflow.add_node(f"worker_{i}", create_worker_node(llm, i))

# Add reducer
workflow.add_node("reducer", create_reducer_node(llm))

# Build graph structure
workflow.add_edge(START, "mapper")

# Mapper fans out to all workers (parallel execution)
for i in range(NUM_WORKERS):
    workflow.add_edge("mapper", f"worker_{i}")

# All workers converge to reducer (fan-in)
for i in range(NUM_WORKERS):
    workflow.add_edge(f"worker_{i}", "reducer")

# Reducer ends the graph
workflow.add_edge("reducer", END)

# Compile
graph = workflow.compile()

print("Map-reduce graph compiled!")
print(f"\nGraph structure (with {NUM_WORKERS} workers):")
print("  START -> mapper")
worker_names = " | ".join(f"worker_{i}" for i in range(NUM_WORKERS))
print(f"  mapper -> [{worker_names}] (parallel)")
print("  [all workers] -> reducer (fan-in)")
print("  reducer -> END")
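
To see how the edge list grows with the number of workers, the wiring above can be expressed as plain data. This is a sketch only: `map_reduce_edges` is a hypothetical helper, and the strings `"START"` and `"END"` are placeholders for LangGraph's `START` and `END` constants.

```python
def map_reduce_edges(num_workers: int) -> list[tuple[str, str]]:
    """List the (source, target) edges of a map-reduce graph."""
    workers = [f"worker_{i}" for i in range(num_workers)]
    edges = [("START", "mapper")]
    edges += [("mapper", w) for w in workers]   # fan-out
    edges += [(w, "reducer") for w in workers]  # fan-in
    edges.append(("reducer", "END"))
    return edges


for source, target in map_reduce_edges(3):
    print(f"{source} -> {target}")
```

Each extra worker adds exactly two edges, one out of the mapper and one into the reducer, so the graph has `2 * num_workers + 2` edges in total.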

## Step 7: Run the Map-Reduce System

Let's test with a document analysis task, a natural fit for map-reduce.

In [None]:
def run_map_reduce(task: str):
    """Run a task through the map-reduce system."""
    
    print("="*70)
    print(f"Task: {task}")
    print("="*70)
    
    initial_state = {
        "task": task,
        "subtasks": [],
        "worker_results": [],
        "final_result": "",
    }
    
    result = graph.invoke(initial_state)
    
    print("\n" + "="*70)
    print("FINAL RESULT")
    print("="*70)
    print(result["final_result"])
    print("\n" + "="*70)
    print(f"Processed {len(result['worker_results'])} subtasks in parallel")
    
    return result


# Test with document analysis
result = run_map_reduce(
    """Analyze the key themes and implications of remote work trends:
    
Remote work has become increasingly prevalent, driven by technological 
advancements and changing workforce expectations. This shift impacts 
company culture, productivity, work-life balance, and urban development. 
Organizations must adapt their management practices, communication tools, 
and employee engagement strategies."""
)

## Step 8: Try Different Scenarios

Map-reduce excels at different types of parallel analysis.

In [None]:
# Multi-perspective analysis
result2 = run_map_reduce(
    """Analyze the impact of artificial intelligence on society from 
    multiple perspectives: economic, ethical, and technological."""
)

In [None]:
# Data processing task
result3 = run_map_reduce(
    """Process and summarize the following customer feedback themes:
    1) Product quality and features
    2) Customer service experience
    3) Pricing and value proposition"""
)

## Step 9: Customize Worker Behavior

You can give each worker specialized instructions.

In [None]:
# Build custom graph with specialized workers
custom_workflow = StateGraph(MapReduceState)

# Specialized worker prompts
worker_prompts = [
    "Focus on technical aspects and implementation details.",
    "Focus on business impact and strategic implications.",
    "Focus on user experience and human factors.",
]

custom_workflow.add_node("mapper", create_mapper_node(llm, NUM_WORKERS))

for i in range(NUM_WORKERS):
    custom_workflow.add_node(
        f"worker_{i}", 
        create_worker_node(llm, i, worker_prompts[i])
    )

custom_workflow.add_node("reducer", create_reducer_node(llm))

# Build structure
custom_workflow.add_edge(START, "mapper")
for i in range(NUM_WORKERS):
    custom_workflow.add_edge("mapper", f"worker_{i}")
    custom_workflow.add_edge(f"worker_{i}", "reducer")
custom_workflow.add_edge("reducer", END)

custom_graph = custom_workflow.compile()

print("Custom graph compiled with specialized workers!")
print("\nWorker specializations:")
for i, prompt in enumerate(worker_prompts):
    print(f"  Worker {i}: {prompt}")
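
Indexing `worker_prompts[i]` directly raises an `IndexError` if `NUM_WORKERS` is later raised above the number of prompts. A small lookup with a default avoids that. `prompt_for_worker` is a hypothetical helper; it relies on `create_worker_node` treating an empty prompt as "use the generic instructions", as defined in Step 4.

```python
def prompt_for_worker(prompts: list[str], worker_id: int, default: str = "") -> str:
    """Return the specialized prompt for a worker, or a default if none is defined."""
    if 0 <= worker_id < len(prompts):
        return prompts[worker_id]
    return default


prompts = [
    "Focus on technical aspects and implementation details.",
    "Focus on business impact and strategic implications.",
]
for worker_id in range(3):
    print(worker_id, repr(prompt_for_worker(prompts, worker_id)))
```

In the loop above, this could be used as `create_worker_node(llm, i, prompt_for_worker(worker_prompts, i))`.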

In [None]:
# Test custom graph
def run_custom_map_reduce(task: str):
    initial_state = {
        "task": task,
        "subtasks": [],
        "worker_results": [],
        "final_result": "",
    }
    return custom_graph.invoke(initial_state)

result4 = run_custom_map_reduce(
    "Analyze the adoption of cloud computing in enterprise environments"
)

print("\nFinal Result:")
print(result4["final_result"])

## Step 10: Using the Built-in Module

The `langgraph_ollama_local.patterns` module provides a ready-to-use implementation:

In [None]:
from langgraph_ollama_local.patterns import (
    create_map_reduce_graph,
    run_map_reduce_task,
)

# Create the graph using the module
module_graph = create_map_reduce_graph(
    llm,
    num_workers=3,
    worker_prompt="Provide thorough analysis with specific examples."
)

# Run a task
module_result = run_map_reduce_task(
    module_graph,
    "Analyze the environmental, economic, and social impacts of renewable energy adoption"
)

print("Final Result:")
print(module_result["final_result"])

## Complete Code

Here's the complete implementation in one cell for reference:

In [None]:
# === Complete Map-Reduce Implementation ===

from typing import Annotated
from typing_extensions import TypedDict
import operator

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, START, END
from pydantic import BaseModel, Field

from langgraph_ollama_local import LocalAgentConfig


# === State ===
class MapReduceState(TypedDict):
    task: str
    subtasks: list[str]
    worker_results: Annotated[list[dict], operator.add]
    final_result: str


# === Structured Outputs ===
class MapperOutput(BaseModel):
    subtasks: list[str] = Field(description="List of subtasks")
    reasoning: str = Field(description="Reasoning for split")


class ReducerOutput(BaseModel):
    final_result: str = Field(description="Synthesized result")
    summary: str = Field(description="Brief summary")


# === Node Creators ===
def create_mapper(llm, num_workers):
    structured_llm = llm.with_structured_output(MapperOutput)
    
    def mapper(state):
        output = structured_llm.invoke([
            SystemMessage(content=f"Split task into {num_workers} subtasks"),
            HumanMessage(content=f"Task: {state['task']}")
        ])
        return {"subtasks": output.subtasks[:num_workers]}
    return mapper


def create_worker(llm, worker_id):
    def worker(state):
        subtask = state["subtasks"][worker_id] if worker_id < len(state["subtasks"]) else "Process task"
        response = llm.invoke([
            SystemMessage(content="Process your assigned subtask"),
            HumanMessage(content=f"Subtask: {subtask}")
        ])
        return {"worker_results": [{"worker_id": worker_id, "output": response.content}]}
    return worker


def create_reducer(llm):
    structured_llm = llm.with_structured_output(ReducerOutput)
    
    def reducer(state):
        results = "\n\n".join([f"Worker {r['worker_id']}: {r['output']}" for r in state["worker_results"]])
        output = structured_llm.invoke([
            SystemMessage(content="Synthesize worker results"),
            HumanMessage(content=f"Results:\n{results}")
        ])
        return {"final_result": output.final_result}
    return reducer


# === Build Graph ===
def build_map_reduce_graph(num_workers=3):
    config = LocalAgentConfig()
    llm = ChatOllama(model=config.ollama.model, base_url=config.ollama.base_url, temperature=0)
    
    g = StateGraph(MapReduceState)
    g.add_node("mapper", create_mapper(llm, num_workers))
    for i in range(num_workers):
        g.add_node(f"worker_{i}", create_worker(llm, i))
    g.add_node("reducer", create_reducer(llm))
    
    g.add_edge(START, "mapper")
    for i in range(num_workers):
        g.add_edge("mapper", f"worker_{i}")
        g.add_edge(f"worker_{i}", "reducer")
    g.add_edge("reducer", END)
    
    return g.compile()


# === Use ===
if __name__ == "__main__":
    graph = build_map_reduce_graph(num_workers=3)
    result = graph.invoke({
        "task": "Analyze the impact of AI on education",
        "subtasks": [],
        "worker_results": [],
        "final_result": "",
    })
    print(result["final_result"])

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Map Phase** | Mapper splits task into independent subtasks |
| **Fan-Out** | Task distributed to multiple workers in parallel |
| **Worker Independence** | Each worker processes its subtask without coordination |
| **Parallel Execution** | Workers run concurrently in the same graph step, reducing wall-clock time |
| **Fan-In** | All worker results collected at reducer |
| **Reduce Phase** | Reducer synthesizes all outputs into final result |
| **Scalability** | Add more workers to handle larger workloads |
| **Structured Output** | Pydantic models give mapper and reducer outputs a validated schema |

## When to Use Map-Reduce vs Other Patterns

| Pattern | Best For | Parallelization | Coordination |
|---------|----------|-----------------|-------------|
| **Map-Reduce** | Independent subtasks, large-scale processing | High (parallel workers) | Low (no inter-worker communication) |
| **Supervisor** | Sequential tasks needing different skills | Low (sequential) | High (supervisor coordinates) |
| **Hierarchical** | Nested team structures, complex organizations | Medium (team-level) | High (multiple supervisors) |
| **Subgraphs** | Reusable components, modular systems | Depends on subgraph | Depends on subgraph |

**Choose map-reduce when:**
- Task can be split into independent parts
- Parallel processing improves speed
- Workers don't need to coordinate
- Final aggregation is straightforward

**Avoid map-reduce when:**
- Subtasks depend on each other
- Sequential processing is required
- Coordination between workers is needed
- Task doesn't decompose naturally
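
A rough estimate can help decide whether parallelism is worth it. The sketch below compares idealized wall-clock times. It assumes the backend serves worker requests fully concurrently, which a single local Ollama instance may not do, and the numbers are illustrative, not measurements. `estimate_wall_time` is a hypothetical helper.

```python
def estimate_wall_time(
    mapper_s: float, worker_s: list[float], reducer_s: float
) -> tuple[float, float]:
    """Return (sequential, parallel) wall-clock estimates in seconds."""
    # Sequential: every stage runs one after another.
    sequential = mapper_s + sum(worker_s) + reducer_s
    # Parallel: workers overlap, so only the slowest one counts.
    parallel = mapper_s + max(worker_s, default=0.0) + reducer_s
    return sequential, parallel


# Illustrative numbers only.
sequential, parallel = estimate_wall_time(2.0, [10.0, 12.0, 8.0], 5.0)
print(f"sequential={sequential:.0f}s parallel={parallel:.0f}s "
      f"speedup={sequential / parallel:.2f}x")
```

The mapper and reducer always run sequentially, so they cap the achievable speedup: adding workers helps only while worker time dominates the total.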

## What's Next

You've completed the multi-agent patterns series! You now know:
- **Tutorial 14**: Supervisor pattern for coordinating specialists
- **Tutorial 15**: Hierarchical teams for nested organizations
- **Tutorial 16**: Subgraphs for reusable components
- **Tutorial 19**: Map-reduce for parallel execution

**Explore more:**
- Combine patterns (e.g., hierarchical teams with map-reduce workers)
- Add tools to workers for enhanced capabilities
- Implement dynamic worker scaling based on task complexity
- Add error handling and retry logic for robust systems
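
As a starting point for the last item, here is a minimal retry wrapper with exponential backoff, using only the standard library. `with_retries` and its parameters are hypothetical, and the demo uses a simulated failure rather than a real model call.

```python
import functools
import time
from typing import Callable


def with_retries(fn: Callable, max_attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> Callable:
    """Wrap fn so that failures are retried with exponential backoff."""

    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the last error
                sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, then 1s, ...

    return wrapped


# Demo: a node function that fails twice, then succeeds.
calls = {"count": 0}
delays: list[float] = []


def flaky_worker(state: dict) -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated model timeout")
    return {"worker_results": [{"worker_id": 0, "output": "ok"}]}


# Record the delays instead of actually sleeping, to keep the demo fast.
safe_worker = with_retries(flaky_worker, sleep=delays.append)
outcome = safe_worker({"task": "demo"})
print(calls["count"], delays)
```

A worker function returned by `create_worker_node` could be wrapped in the same way before it is added to the graph. Catching every `Exception` is a simplification; in practice you would likely retry only on transient errors such as timeouts.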