<a href="https://colab.research.google.com/github/EnkrateiaLucca/oreilly_live_training_getting_started_with_langchain/blob/main/notebooks/7.0-human-in-the-loop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Human-in-the-Loop Patterns with LangChain and LangGraph

This notebook demonstrates how to implement Human-in-the-Loop (HITL) patterns in LangChain 1.0+ and LangGraph 1.0+.

## Table of Contents
1. Introduction to HITL
2. Setup and Installation
3. Basic Interrupt Pattern with LangGraph
4. Practical Example: File Operations Agent
5. Testing the File Agent
6. Checkpointer Requirements
7. Production Patterns

## 1. Introduction to Human-in-the-Loop

### Why HITL Matters

Human-in-the-Loop (HITL) is a critical pattern when building production LLM applications. It provides:

- **Safety**: Prevent unintended actions by requiring human approval
- **Compliance**: Meet regulatory requirements for human oversight
- **Trust**: Build user confidence by keeping humans in control
- **Quality**: Enable review and correction of agent decisions

### Common Use Cases

HITL is essential for:

- **Destructive Operations**: Deleting files, databases, or resources
- **Financial Transactions**: Making purchases, transfers, or payments
- **Sensitive Actions**: Sending emails, posting content, or modifying data
- **High-Stakes Decisions**: Medical, legal, or business-critical operations

### The Interrupt/Resume Pattern

LangGraph implements HITL through an **interrupt/resume pattern**:

1. Agent executes until it reaches a point requiring human input
2. Agent **interrupts** execution and returns control
3. Human reviews the request and makes a decision
4. Agent **resumes** execution with the human's decision

This pattern requires a **checkpointer** to save the agent's state during the interrupt.
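
The four steps above can be modelled without any framework. The sketch below is a toy, not LangGraph's implementation: `CHECKPOINTS`, `run`, and `resume` are hypothetical names, and the "checkpointer" is just a dict keyed by thread ID.

```python
# Toy model of interrupt/resume; illustrative only, not LangGraph internals.
CHECKPOINTS: dict[str, dict] = {}  # hypothetical in-memory "checkpointer"


def run(thread_id: str, request: str) -> dict:
    """Steps 1-2: run until human input is needed, save state, return control."""
    CHECKPOINTS[thread_id] = {"request": request, "status": "waiting"}
    return {"interrupt": f"Do you approve: {request!r}?"}


def resume(thread_id: str, decision: str) -> dict:
    """Steps 3-4: restore the saved state and continue with the decision."""
    state = CHECKPOINTS.pop(thread_id)  # KeyError if nothing was checkpointed
    outcome = "executed" if decision == "approve" else "cancelled"
    return {"request": state["request"], "outcome": outcome}


paused = run("thread-1", "delete report.txt")
print(paused["interrupt"])
final = resume("thread-1", "approve")
print(final["outcome"])
```

Calling `resume` for a thread that was never checkpointed fails, which is the toy equivalent of resuming a graph that has no saved state.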

## 2. Setup and Installation

In [None]:
# LangChain 1.0+ and LangGraph 1.0+ Setup
# Version specifiers are quoted so the shell does not treat ">" as a redirect
# %pip install -qU "langchain>=1.0.0"
# %pip install -qU "langchain-core>=1.0.0"
# %pip install -qU langchain-openai
# %pip install -qU "langgraph>=1.0.0"

In [None]:
import os
import getpass

# Set API Keys

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [None]:
# Import required modules
from typing import Annotated, Literal
from typing_extensions import TypedDict

from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt

## 3. Basic Interrupt Pattern with LangGraph

The simplest HITL pattern uses LangGraph's `interrupt()` function to pause execution and request human input.

### Key Concepts

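- **`interrupt(value)`**: Pauses graph execution inside a node and surfaces `value` to the caller. On resume, the node runs again from its first line and `interrupt()` returns the human's input, so code placed before the call should be safe to repeat.
- **`Command(resume=value)`**: Passed to `invoke()` in place of new input to resume the paused thread; `value` becomes the return value of `interrupt()`.
- **`__interrupt__`**: Key in the `invoke()` output that is present when the run paused at an interrupt; it holds the interrupt payloads.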

Let's build a simple example:

In [None]:
# Define a simple state
class SimpleState(TypedDict):
    messages: Annotated[list, add_messages]
    user_approval: str  # Will hold "approve" or "reject"

# Node that requests approval
def request_approval(state: SimpleState):
    """This node interrupts to request human approval."""
    last_message = state["messages"][-1]
    
    # Ask for approval and interrupt
    approval = interrupt(
        {
            "question": "Do you approve this action?",
            "action": last_message.content
        }
    )
    
    return {"user_approval": approval}

# Node that processes the approval
def process_approval(state: SimpleState):
    """This node processes the user's decision."""
    approval = state.get("user_approval", "reject")
    
    if approval == "approve":
        response = AIMessage(content="Action approved and executed!")
    else:
        response = AIMessage(content="Action rejected and cancelled.")
    
    return {"messages": [response]}

In [None]:
# Build the graph
workflow = StateGraph(SimpleState)

# Add nodes
workflow.add_node("request_approval", request_approval)
workflow.add_node("process_approval", process_approval)

# Add edges
workflow.add_edge(START, "request_approval")
workflow.add_edge("request_approval", "process_approval")
workflow.add_edge("process_approval", END)

# IMPORTANT: Must use a checkpointer for HITL!
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)

In [None]:
# Visualize the graph
from IPython.display import Image, display

try:
    # display() is required: a bare expression inside try/except is not rendered
    display(Image(app.get_graph().draw_mermaid_png()))
except Exception:
    print("Graph visualization not available")

In [None]:
# Test the interrupt pattern
config = {"configurable": {"thread_id": "simple-hitl-1"}}

# Initial invocation - will interrupt
result = app.invoke(
    {"messages": [HumanMessage(content="Delete all user data")]},
    config=config
)

print("\n=== First Invocation (Interrupt) ===")
print(f"Result: {result}")
print(f"\nInterrupted: {'__interrupt__' in result}")
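
To show the reviewer what is being asked, the payload passed to `interrupt()` has to be pulled out of the result. The helper below is a minimal sketch: `interrupt_payloads` is a hypothetical name, and it assumes each entry under `__interrupt__` exposes the payload on a `.value` attribute, as LangGraph's `Interrupt` objects do. A stand-in result is used so the snippet runs on its own.

```python
from types import SimpleNamespace


def interrupt_payloads(result: dict) -> list:
    """Return the payload of each pending interrupt, or [] if the run finished."""
    return [item.value for item in result.get("__interrupt__", [])]


# Stand-in for a real invoke() result so the helper can be run in isolation.
fake_result = {
    "messages": [],
    "__interrupt__": [
        SimpleNamespace(value={"question": "Do you approve this action?",
                               "action": "Delete all user data"})
    ],
}

payloads = interrupt_payloads(fake_result)
print(payloads[0]["question"])
```

With the real graph, `interrupt_payloads(result)` would be called on the value returned by `app.invoke(...)`.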

In [None]:
# Resume with approval
result = app.invoke(
    Command(resume="approve"),  # Human decision: approve
    config=config
)

print("\n=== Second Invocation (Resume with Approval) ===")
print(f"Final message: {result['messages'][-1].content}")

In [None]:
# Try again with rejection
config2 = {"configurable": {"thread_id": "simple-hitl-2"}}

# Initial invocation
result = app.invoke(
    {"messages": [HumanMessage(content="Delete all user data")]},
    config=config2
)

# Resume with rejection
result = app.invoke(
    Command(resume="reject"),  # Human decision: reject
    config=config2
)

print("\n=== Rejection Path ===")
print(f"Final message: {result['messages'][-1].content}")

## 4. Practical Example: File Operations Agent

Now let's build a more realistic example: an agent that can perform file operations but requires approval before destructive actions.

### Features

- **Read files**: No approval needed (safe operation)
- **Write files**: No approval needed in this demo (overwriting an existing file is not reversible, so a stricter policy may gate writes too)
- **Delete files**: Requires approval (destructive)

This demonstrates how to selectively apply HITL to specific operations.
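
One way to express this selectivity is a small policy lookup keyed by tool name. This is a sketch under assumptions: `REQUIRES_APPROVAL` and `needs_approval` are hypothetical names, and the tool-call dicts mirror the `name`/`args`/`id` shape that LangChain attaches to model responses.

```python
# Hypothetical policy: tool names that must be approved by a human.
REQUIRES_APPROVAL = {"delete_file"}


def needs_approval(tool_call: dict) -> bool:
    """True if this tool call should pause for human review."""
    return tool_call["name"] in REQUIRES_APPROVAL


calls = [
    {"name": "read_file", "args": {"file_path": "notes.txt"}, "id": "call_1"},
    {"name": "delete_file", "args": {"file_path": "notes.txt"}, "id": "call_2"},
]
flagged = [call["id"] for call in calls if needs_approval(call)]
print(flagged)
```

Keeping the policy in one set means a new destructive tool can be gated by adding its name, without touching the graph logic.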

In [None]:
# Define file operation tools
import os
from pathlib import Path

@tool
def read_file(file_path: str) -> str:
    """Read the contents of a file. Safe operation that doesn't require approval."""
    try:
        with open(file_path, 'r') as f:
            content = f.read()
        return f"Successfully read file '{file_path}'. Content:\n{content[:200]}..."  # Truncate for demo
    except Exception as e:
        return f"Error reading file: {str(e)}"

@tool
def write_file(file_path: str, content: str) -> str:
    """Write content to a file. Creates the file if it doesn't exist."""
    try:
        with open(file_path, 'w') as f:
            f.write(content)
        return f"Successfully wrote to file '{file_path}'"
    except Exception as e:
        return f"Error writing file: {str(e)}"

@tool
def delete_file(file_path: str) -> str:
    """Delete a file. DESTRUCTIVE operation that requires approval."""
    # This is just a marker - the actual deletion will happen after approval
    return f"DELETE_PENDING:{file_path}"

tools = [read_file, write_file, delete_file]

In [None]:
# Define the state for our file agent
class FileAgentState(TypedDict):
    messages: Annotated[list, add_messages]
    pending_approval: dict | None  # Stores the action awaiting approval

# Initialize the model with tools
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
model_with_tools = model.bind_tools(tools)

In [None]:
# Node: Call the model
def call_model(state: FileAgentState):
    """Call the LLM to decide what action to take."""
    messages = state["messages"]
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}

# Node: Check if tool call needs approval
def check_for_approval(state: FileAgentState):
    """Check if the last tool call requires human approval."""
    last_message = state["messages"][-1]
    
    # Check if there are tool calls
    if not hasattr(last_message, "tool_calls") or not last_message.tool_calls:
        return {"pending_approval": None}
    
    # Check if any tool call is a delete operation
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "delete_file":
            # This is a destructive operation - request approval
            file_path = tool_call["args"]["file_path"]
            
            # Interrupt and ask for approval
            approval = interrupt(
                {
                    "type": "approval_required",
                    "action": "delete_file",
                    "file_path": file_path,
                    "message": f"The agent wants to delete the file: {file_path}. Do you approve?"
                }
            )
            
            return {
                "pending_approval": {
                    "tool_call": tool_call,
                    "approval": approval
                }
            }
    
    return {"pending_approval": None}

# Node: Execute tools
def execute_tools(state: FileAgentState):
    """Execute tool calls, respecting approval decisions."""
    last_message = state["messages"][-1]
    pending_approval = state.get("pending_approval")
    
    tool_messages = []
    
    # Process each tool call
    for tool_call in last_message.tool_calls:
        tool_name = tool_call["name"]
        
        # Check if this tool call needs approval
        if pending_approval and tool_call["id"] == pending_approval["tool_call"]["id"]:
            approval = pending_approval["approval"]
            
            if approval == "approve":
                # Execute the delete
                file_path = tool_call["args"]["file_path"]
                try:
                    os.remove(file_path)
                    result = f"Successfully deleted file: {file_path}"
                except Exception as e:
                    result = f"Error deleting file: {str(e)}"
            else:
                result = f"Delete operation rejected by user. File '{tool_call['args']['file_path']}' was NOT deleted."
            
            tool_messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
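        elif tool_name == "delete_file":
            # Only the first delete in a batch is sent for approval above, so any
            # other delete call is refused rather than run without a decision
            tool_messages.append(
                ToolMessage(
                    content="Delete skipped: no approval was recorded for this call.",
                    tool_call_id=tool_call["id"],
                )
            )
        else:
            # Execute normal tool calls
            selected_tool = {t.name: t for t in tools}[tool_name]
            tool_result = selected_tool.invoke(tool_call["args"])
            tool_messages.append(
                ToolMessage(content=tool_result, tool_call_id=tool_call["id"])
            )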
    
    return {"messages": tool_messages, "pending_approval": None}

In [None]:
# Router: Decide next step
def should_continue(state: FileAgentState) -> Literal["check_approval", "end"]:
    """Determine if we should continue to tool execution or end."""
    last_message = state["messages"][-1]
    
    # If the last message has tool calls, route to approval check
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "check_approval"
    
    # Otherwise, end
    return "end"

# Router: After approval check
def after_approval_check(state: FileAgentState) -> Literal["execute_tools"]:
    """Route to tool execution after approval check."""
    # Always proceed to execute tools
    return "execute_tools"

In [None]:
# Build the file agent graph
workflow = StateGraph(FileAgentState)

# Add nodes
workflow.add_node("agent", call_model)
workflow.add_node("check_approval", check_for_approval)
workflow.add_node("execute_tools", execute_tools)

# Add edges
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"check_approval": "check_approval", "end": END})
workflow.add_conditional_edges("check_approval", after_approval_check)
workflow.add_edge("execute_tools", "agent")

# Compile with checkpointer
checkpointer = MemorySaver()
file_agent = workflow.compile(checkpointer=checkpointer)

In [None]:
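# Visualize the file agent graph
from IPython.display import Image, display

try:
    # display() is required: a bare expression inside try/except is not rendered
    display(Image(file_agent.get_graph().draw_mermaid_png()))
except Exception:
    print("Graph visualization not available")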

## 5. Testing the File Agent

Let's test the file agent with different scenarios.

In [None]:
# Scenario 1: Safe operations (no approval needed)
import tempfile

# Create a test file
test_dir = tempfile.mkdtemp()
test_file = os.path.join(test_dir, "test.txt")

with open(test_file, 'w') as f:
    f.write("This is a test file for the HITL demo.")

print(f"Created test file: {test_file}")

In [None]:
# Test reading a file (no approval needed)
config = {"configurable": {"thread_id": "file-agent-1"}}

result = file_agent.invoke(
    {"messages": [HumanMessage(content=f"Read the file at {test_file}")]},
    config=config
)

print("\n=== Safe Operation: Read File ===")
print(f"Final message: {result['messages'][-1].content}")
print(f"Interrupted: {'__interrupt__' in result}")

In [None]:
# Scenario 2: Destructive operation (requires approval)
config2 = {"configurable": {"thread_id": "file-agent-2"}}

# Create another test file to delete
test_file_2 = os.path.join(test_dir, "to_delete.txt")
with open(test_file_2, 'w') as f:
    f.write("This file will be deleted.")

# Request deletion (will interrupt)
result = file_agent.invoke(
    {"messages": [HumanMessage(content=f"Delete the file at {test_file_2}")]},
    config=config2
)

print("\n=== Destructive Operation: Delete File (First Call) ===")
print(f"Interrupted: {'__interrupt__' in result}")

if "__interrupt__" in result:
    print(f"Interrupt data: {result['__interrupt__']}")

In [None]:
# Approve the deletion
print(f"\nFile exists before approval: {os.path.exists(test_file_2)}")

result = file_agent.invoke(
    Command(resume="approve"),
    config=config2
)

print("\n=== After Approval ===")
print(f"Final message: {result['messages'][-1].content}")
print(f"File exists after approval: {os.path.exists(test_file_2)}")

In [None]:
# Scenario 3: Rejection
config3 = {"configurable": {"thread_id": "file-agent-3"}}

# Create another test file
test_file_3 = os.path.join(test_dir, "protected.txt")
with open(test_file_3, 'w') as f:
    f.write("This file is protected.")

# Request deletion
result = file_agent.invoke(
    {"messages": [HumanMessage(content=f"Delete the file at {test_file_3}")]},
    config=config3
)

print("\n=== Rejection Scenario ===")
print(f"File exists before rejection: {os.path.exists(test_file_3)}")

# Reject the deletion
result = file_agent.invoke(
    Command(resume="reject"),
    config=config3
)

print(f"Final message: {result['messages'][-1].content}")
print(f"File exists after rejection: {os.path.exists(test_file_3)}")
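
The file agent compares the resume value to the exact string `"approve"` and treats anything else as a rejection. If decisions come from free-form input, a small normalization step keeps that comparison reliable. This is a sketch only: `APPROVE_WORDS` and `normalize_decision` are hypothetical, and the accepted vocabulary is an assumption.

```python
APPROVE_WORDS = {"approve", "approved", "yes", "y"}  # hypothetical vocabulary


def normalize_decision(raw: str) -> str:
    """Map free-form input to "approve" or "reject"; unknown input rejects."""
    return "approve" if raw.strip().lower() in APPROVE_WORDS else "reject"


print(normalize_decision("  Yes "))
print(normalize_decision("maybe later"))
```

The result could then be passed as `Command(resume=normalize_decision(user_input))`. Defaulting to `"reject"` means ambiguous input never triggers the destructive action.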

## 6. Checkpointer Requirements

HITL patterns **require a checkpointer** because:

1. **State Persistence**: The graph's state must be saved when execution is interrupted
2. **Resumption**: When the user provides input, the graph needs to restore the exact state
3. **Thread Continuity**: Multiple interactions need to be linked via a `thread_id`
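
Because the paused state lives in the checkpointer, it can be inspected between calls. The sketch below assumes the snapshot returned by a compiled graph's `get_state(config)` exposes `tasks`, each carrying an `interrupts` tuple whose items have a `.value`; `pending_interrupts` and `FakeGraph` are hypothetical names, and the fake graph exists only so the snippet runs without LangGraph installed.

```python
from types import SimpleNamespace


def pending_interrupts(graph, thread_id: str) -> list:
    """Collect interrupt payloads still waiting on this thread, if any."""
    config = {"configurable": {"thread_id": thread_id}}
    snapshot = graph.get_state(config)
    return [i.value for task in snapshot.tasks for i in task.interrupts]


# Minimal stand-in for a compiled graph so the sketch runs without LangGraph.
class FakeGraph:
    def get_state(self, config):
        waiting = SimpleNamespace(interrupts=(SimpleNamespace(value="approve?"),))
        return SimpleNamespace(tasks=(waiting,))


print(pending_interrupts(FakeGraph(), "user-123-session-456"))
```

An empty list would indicate that nothing on that thread is waiting for a human.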

### Checkpointer Options

LangGraph provides several checkpointer implementations:

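- **`MemorySaver`**: In-memory checkpointer (for development/testing); state is lost when the process exits
- **`SqliteSaver`**: SQLite-based persistence (for local or single-process use; installed separately as `langgraph-checkpoint-sqlite`)
- **`PostgresSaver`**: PostgreSQL-based persistence (for production; installed separately as `langgraph-checkpoint-postgres`)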

### Thread IDs

The `thread_id` in the config links related invocations:

```python
config = {"configurable": {"thread_id": "user-123-session-456"}}

# First call - interrupts
result1 = agent.invoke(input, config=config)

# Second call - resumes with same thread_id
result2 = agent.invoke(Command(resume=value), config=config)
```
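
A small helper can keep thread IDs consistent across the interrupt and resume calls. This is a sketch: `thread_config` is a hypothetical name, and the `user-…-session-…` naming scheme simply follows the example above.

```python
def thread_config(user_id: str, session_id: str) -> dict:
    """Build an invoke() config whose thread_id is unique per user session."""
    return {"configurable": {"thread_id": f"user-{user_id}-session-{session_id}"}}


config = thread_config("123", "456")
print(config)
```

Deriving the ID from stable identifiers, rather than generating it at call time, lets a later request rebuild the same config and resume the right thread.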

## 7. Production Patterns

In production environments, HITL implementations need additional considerations:

### 1. Persistent Checkpointers

Use database-backed checkpointers for production:

```python
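from langgraph.checkpoint.postgres import PostgresSaver

# from_conn_string() returns a context manager, so use it in a `with` block
# (requires the separate langgraph-checkpoint-postgres package)
with PostgresSaver.from_conn_string(
    "postgresql://user:pass@host:5432/db"
) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables; needed on first use
    app = workflow.compile(checkpointer=checkpointer)
    # Invoke the graph inside this block, while the connection is open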
```

### 2. Webhook Integration

For async approval flows:

1. Agent interrupts and saves state
2. System sends notification to user (email, Slack, etc.)
3. User clicks approval link
4. Webhook receives approval and resumes agent

```python
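# Webhook endpoint sketch (assumes FastAPI and a compiled graph named `agent`)
from fastapi import FastAPI
from langgraph.types import Command

api = FastAPI()  # named `api` to avoid clashing with the compiled graph `app`

@api.post("/approve/{thread_id}")
def handle_approval(thread_id: str, decision: str):
    config = {"configurable": {"thread_id": thread_id}}
    result = agent.invoke(Command(resume=decision), config=config)
    # Return plain JSON rather than the raw state, which holds message objects
    return {"final_message": result["messages"][-1].content}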
```
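
Since an approval link triggers a sensitive action, the webhook should be able to check that the link was issued by the system. One possible approach, sketched here with the standard library, is to sign the thread ID and decision with HMAC and verify the signature before resuming. `SECRET_KEY`, `sign`, and `verify` are hypothetical names, and the key shown is a placeholder.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder; load from config in practice


def sign(thread_id: str, decision: str) -> str:
    """Return a hex HMAC-SHA256 signature over the thread ID and decision."""
    message = f"{thread_id}:{decision}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(thread_id: str, decision: str, signature: str) -> bool:
    """Check a signature in constant time before resuming the agent."""
    return hmac.compare_digest(sign(thread_id, decision), signature)


token = sign("thread-123", "approve")
print(verify("thread-123", "approve", token))
print(verify("thread-123", "reject", token))
```

The signature would travel in the link alongside the thread ID and decision. A token signed for `"approve"` does not validate for `"reject"`, so the link cannot be edited to change the decision.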

### 3. Timeout Handling

Implement timeouts for pending approvals:

```python
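from datetime import datetime, timedelta

from langgraph.types import Command

# Store approval requests with timestamps
pending_approvals = {
    "thread-123": {
        "created_at": datetime.now(),
        "timeout": timedelta(hours=24),
        "action": "delete_file",
    }
}

# Background task to check for timeouts (assumes a compiled graph named `agent`)
def check_timeouts():
    # Iterate over a copy so expired entries can be removed during the loop
    for thread_id, approval in list(pending_approvals.items()):
        if datetime.now() > approval["created_at"] + approval["timeout"]:
            # Auto-reject on timeout: any value other than "approve" is a rejection
            config = {"configurable": {"thread_id": thread_id}}
            agent.invoke(Command(resume="timeout_reject"), config=config)
            # Drop the entry so the same thread is not resumed again
            del pending_approvals[thread_id]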
```
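
The expiry rule itself can be separated from the resume call so that it is easy to test with a fixed clock. This is a sketch: `expired_threads` is a hypothetical helper, and the record shape follows the `created_at`/`timeout` fields used above.

```python
from datetime import datetime, timedelta


def expired_threads(pending: dict, now: datetime) -> list[str]:
    """Return thread IDs whose approval window has elapsed at time `now`."""
    return [
        thread_id
        for thread_id, request in pending.items()
        if now > request["created_at"] + request["timeout"]
    ]


start = datetime(2025, 1, 1, 9, 0)
pending = {
    "thread-old": {"created_at": start, "timeout": timedelta(hours=24)},
    "thread-new": {"created_at": start + timedelta(hours=20), "timeout": timedelta(hours=24)},
}
print(expired_threads(pending, now=start + timedelta(hours=25)))
```

Passing `now` as an argument, instead of calling `datetime.now()` inside the function, is what makes the rule deterministic under test.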

### 4. Approval Audit Trail

Log all approval decisions for compliance:

```python
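from datetime import datetime, timezone

audit_log = []  # in-memory for illustration; use durable storage in production

def log_approval(thread_id: str, action: str, decision: str, user: str):
    audit_log.append({
        # UTC ISO-8601 timestamps are unambiguous and JSON-serializable
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "thread_id": thread_id,
        "action": action,
        "decision": decision,
        "user": user,
    })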
```

### 5. Multi-Level Approvals

Implement approval chains for sensitive operations:

```python
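def check_approval_level(state):
    action = state["pending_action"]

    if action["type"] != "delete_database":
        # Other actions need no sign-off in this example
        return {"approval": "approve"}

    # Requires manager approval first; a rejection stops the chain here
    manager_approval = interrupt({"level": "manager", "action": action})
    if manager_approval != "approve":
        return {"approval": "reject"}

    # Then requires director approval
    director_approval = interrupt({"level": "director", "action": action})
    # "approval" is an assumed state key; nodes return state updates, not bare strings
    return {"approval": director_approval}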
```
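
The same chain can be written as a loop over approval levels, with the "ask a human" step injected as a callable. This is a sketch under assumptions: `run_approval_chain` and its return shape are hypothetical, and inside a LangGraph node `ask` could be `interrupt`, provided the node issues its interrupts in the same order each time it re-runs.

```python
from typing import Callable


def run_approval_chain(action: dict, levels: list[str], ask: Callable[[dict], str]) -> dict:
    """Ask each level in order; stop at the first answer that is not "approve"."""
    for level in levels:
        decision = ask({"level": level, "action": action})
        if decision != "approve":
            return {"approved": False, "stopped_at": level}
    return {"approved": True, "stopped_at": None}


# Scripted answers stand in for real reviewers.
answers = {"manager": "approve", "director": "reject"}
outcome = run_approval_chain(
    {"type": "delete_database"},
    ["manager", "director"],
    ask=lambda request: answers[request["level"]],
)
print(outcome)
```

Injecting `ask` keeps the escalation logic testable without a running graph, and a rejection at any level ends the chain without consulting later approvers.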

## Summary

In this notebook, we covered:

1. **Why HITL matters**: Safety, compliance, trust, and quality for production LLM apps
2. **Basic interrupt pattern**: Using `interrupt()` and `Command(resume=value)` for human input
3. **Practical example**: File operations agent with selective approval for destructive actions
4. **Checkpointer requirements**: Why HITL needs state persistence and how to configure it
5. **Production patterns**: Webhooks, timeouts, audit trails, and multi-level approvals

### Key Takeaways

- **Always use a checkpointer** with HITL patterns
- **Use thread IDs** to link interrupt and resume calls
- **Selective HITL**: Apply approval only to high-risk operations
- **Production considerations**: Persistence, webhooks, timeouts, and audit trails

### Next Steps

- Implement HITL in your own agents
- Explore `SqliteSaver` for local persistence or `PostgresSaver` for production
- Build webhook integrations for async approval flows
- Add audit logging and compliance features

### Resources

- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Human-in-the-Loop Guide](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/)
- [Checkpointers](https://langchain-ai.github.io/langgraph/concepts/persistence/)

In [None]:
# Cleanup test files
import shutil

try:
    shutil.rmtree(test_dir)
    print(f"Cleaned up test directory: {test_dir}")
except Exception as e:
    print(f"Error cleaning up: {e}")