# Customer Technical Issue Resolution with Checkpoints

## Introduction

This tutorial demonstrates a **customer technical support agent** built with Strands Agents, using AgentCore Memory for checkpoint management. The agent follows a six-step workflow:

1. **Issue Analysis**: Agent analyzes the technical issue
2. **Resolution Finding**: Agent finds potential solutions
3. **Checkpoint Storage**: Agent stores checkpoint before human validation
4. **Human-in-the-Loop**: User validates/confirms the resolution
5. **Checkpoint Retrieval**: Agent retrieves checkpoint and continues
6. **Resolution Execution**: Agent executes the approved resolution
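
The steps above can be read as a small state machine. The sketch below is illustrative only: `ALLOWED_TRANSITIONS` and `advance` are hypothetical names that are not part of Strands or AgentCore, and the path from a rejected proposal back to solution finding is an assumption the tutorial does not cover. The state values mirror the `WorkflowState` enum defined in Step 2.

```python
from enum import Enum


class WorkflowState(Enum):
    ANALYZING = "analyzing_issue"
    FINDING_SOLUTION = "finding_solution"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing_resolution"
    COMPLETED = "completed"


# Hypothetical transition table: each state lists the states it may move to.
ALLOWED_TRANSITIONS = {
    WorkflowState.ANALYZING: {WorkflowState.FINDING_SOLUTION},
    WorkflowState.FINDING_SOLUTION: {WorkflowState.AWAITING_APPROVAL},
    # Assumption: a rejected proposal sends the agent back to find another solution.
    WorkflowState.AWAITING_APPROVAL: {WorkflowState.EXECUTING, WorkflowState.FINDING_SOLUTION},
    WorkflowState.EXECUTING: {WorkflowState.COMPLETED},
    WorkflowState.COMPLETED: set(),
}


def advance(current: WorkflowState, target: WorkflowState) -> WorkflowState:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


state = advance(WorkflowState.ANALYZING, WorkflowState.FINDING_SOLUTION)
print(state.value)  # finding_solution
```

Making the transitions explicit means an attempt to jump from analysis straight to execution fails loudly instead of skipping the human approval step.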

### Tutorial Details

| Information         | Details                                                                          |
|:--------------------|:---------------------------------------------------------------------------------|
| Tutorial type       | Technical Support with Checkpoints                                              |
| Agent type          | Customer Support Agent                                                           |
| Agentic Framework   | Strands Agents                                                                   |
| LLM model           | Anthropic Claude 3.7 Sonnet                                                      |
| Tutorial components | AgentCore Memory (STM + Blob checkpoints), Human-in-the-Loop                    |
| Example complexity  | Intermediate                                                                     |

You'll learn to:
- Store and retrieve checkpoints using blob payloads
- Implement human-in-the-loop validation
- Resume agent execution from checkpoints
- Handle technical support workflows

## Prerequisites

- Python 3.10+
- AWS credentials with AgentCore Memory permissions
- Access to Amazon Bedrock models

Let's get started!

## Step 1: Setup and Imports

In [None]:
!pip install -qr requirements.txt

In [None]:
import logging
import json
import base64
from datetime import datetime
from enum import Enum
from typing import Dict, Any, Optional

# Setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tech-support-agent")

In [None]:
# Imports (datetime is already imported in the previous cell)
import os
import boto3
from strands import Agent, tool
from strands.hooks import AgentInitializedEvent, HookProvider, HookRegistry, MessageAddedEvent
from bedrock_agentcore.memory import MemoryClient

# Configuration
REGION = os.getenv('AWS_REGION', 'us-west-2')
ACTOR_ID = "support_agent_001"
SESSION_ID = "tech_support_session_001"

## Step 2: Workflow States and Checkpoint Management

In [None]:
class WorkflowState(Enum):
    ANALYZING = "analyzing_issue"
    FINDING_SOLUTION = "finding_solution"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing_resolution"
    COMPLETED = "completed"

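class CheckpointManager:
    """Save and load workflow checkpoints as blob events in AgentCore Memory."""

    def __init__(self, memory_id: str, actor_id: str, session_id: str):
        # Use the same region as the MemoryClient so both talk to the same memory resource
        self.client = boto3.client('bedrock-agentcore', region_name=REGION)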
        self.memory_id = memory_id
        self.actor_id = actor_id
        self.session_id = session_id
    
    def save_checkpoint(self, state: WorkflowState, data: Dict[str, Any]) -> str:
        """Save checkpoint as blob payload"""
        checkpoint = {
            "state": state.value,
            "timestamp": datetime.now().isoformat(),
            "data": data
        }
        
        # Encode checkpoint as base64 blob
        checkpoint_json = json.dumps(checkpoint)
        checkpoint_blob = base64.b64encode(checkpoint_json.encode()).decode()
        
        # Store as blob payload
        self.client.create_event(
            memoryId=self.memory_id,
            actorId=self.actor_id,
            sessionId=self.session_id,
            payload=[
                {"blob": checkpoint_blob }
            ],
            eventTimestamp=datetime.now(),
        )
        logger.info(f"✅ Checkpoint saved: {state.value}")
        return checkpoint_json

    def load_latest_checkpoint(self) -> Optional[Dict[str, Any]]:
        """Load the most recent checkpoint"""
        try:
            response = self.client.list_events(
                memoryId=self.memory_id,
                actorId=self.actor_id,
                sessionId=self.session_id,
                maxResults=10,
                includePayloads=True,
            )

            # boto3 returns a dict; the event list is under the 'events' key.
            # Sort newest first so the first blob found is the latest checkpoint.
            events = sorted(
                response.get('events', []),
                key=lambda e: e['eventTimestamp'],
                reverse=True,
            )

            for event in events:
                for payload in event.get('payload', []):
                    # save_checkpoint stores each checkpoint as {"blob": <base64 JSON>}
                    blob = payload.get('blob')
                    if not blob:
                        continue
                    checkpoint = json.loads(base64.b64decode(blob).decode())
                    logger.info(f"✅ Checkpoint loaded: {checkpoint['state']}")
                    return checkpoint

            return None
        except Exception as e:
            logger.error(f"Failed to load checkpoint: {e}")
            return None

logger.info("✅ Checkpoint management ready")
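
The blob format used by `save_checkpoint` can be checked locally without calling AWS. This is a minimal sketch, assuming the same JSON-then-base64 encoding as in the class above; `encode_checkpoint` and `decode_checkpoint` are hypothetical helper names introduced only for this illustration.

```python
import base64
import json
from datetime import datetime
from typing import Any, Dict


def encode_checkpoint(state: str, data: Dict[str, Any]) -> str:
    """Serialize a checkpoint to JSON and wrap it in base64 text."""
    checkpoint = {"state": state, "timestamp": datetime.now().isoformat(), "data": data}
    return base64.b64encode(json.dumps(checkpoint).encode()).decode()


def decode_checkpoint(blob: str) -> Dict[str, Any]:
    """Reverse encode_checkpoint: base64 text back to a dict."""
    return json.loads(base64.b64decode(blob).decode())


blob = encode_checkpoint("awaiting_approval", {"issue_category": "network"})
restored = decode_checkpoint(blob)
print(restored["state"], restored["data"])
```

A round trip like this is a quick way to confirm that whatever goes into `data` is JSON-serializable before it is written to memory.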

## Step 3: Technical Support Tools

In [None]:
@tool
def analyze_issue(issue_description: str) -> str:
    """Analyze a technical issue and categorize it.
    
    Args:
        issue_description: Description of the technical problem
    
    Returns:
        Analysis results with issue category and severity
    """
    # Simulate issue analysis
    categories = {
        "network": ["connection", "internet", "wifi", "ethernet"],
        "software": ["application", "program", "software", "app"],
        "hardware": ["device", "computer", "laptop", "hardware"],
        "account": ["login", "password", "account", "access"]
    }
    
    issue_lower = issue_description.lower()
    detected_category = "general"
    
    for category, keywords in categories.items():
        if any(keyword in issue_lower for keyword in keywords):
            detected_category = category
            break
    
    severity = "medium"
    if any(word in issue_lower for word in ["urgent", "critical", "down", "broken"]):
        severity = "high"
    elif any(word in issue_lower for word in ["minor", "small", "question"]):
        severity = "low"
    
    return f"Issue Category: {detected_category}, Severity: {severity}, Analysis: Issue appears to be {detected_category}-related with {severity} priority."

@tool
def find_resolution(issue_category: str, issue_description: str) -> str:
    """Find potential resolution steps for a technical issue.
    
    Args:
        issue_category: Category of the issue (network, software, hardware, account)
        issue_description: Detailed description of the issue
    
    Returns:
        Suggested resolution steps
    """
    resolutions = {
        "network": [
            "Check network cable connections",
            "Restart router and modem",
            "Run network diagnostics",
            "Update network drivers"
        ],
        "software": [
            "Restart the application",
            "Check for software updates",
            "Clear application cache",
            "Reinstall the software if needed"
        ],
        "hardware": [
            "Check all physical connections",
            "Run hardware diagnostics",
            "Update device drivers",
            "Check for overheating issues"
        ],
        "account": [
            "Verify username and password",
            "Check account status",
            "Reset password if necessary",
            "Contact account administrator"
        ]
    }
    
    steps = resolutions.get(issue_category, ["Contact technical support for specialized assistance"])
    return "Resolution Steps:\n" + "\n".join(f"{i+1}. {step}" for i, step in enumerate(steps))

@tool
def execute_resolution(resolution_steps: str) -> str:
    """Execute the approved resolution steps.
    
    Args:
        resolution_steps: The approved resolution steps to execute
    
    Returns:
        Execution results
    """
    # Simulate execution
    return f"✅ Resolution executed successfully. Steps completed:\n{resolution_steps}\n\nPlease test the system and confirm if the issue is resolved."

logger.info("✅ Technical support tools ready")
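
To sanity-check the keyword rules without invoking the model, the matching logic from `analyze_issue` can be exercised as a plain function. The sketch below copies those rules into a hypothetical `categorize` helper (not part of the tutorial's tools) and shows two properties worth knowing: the first matching category wins, and matching is by substring, so `"app"` also matches inside `"happening"`.

```python
from typing import Dict, List, Tuple

CATEGORIES: Dict[str, List[str]] = {
    "network": ["connection", "internet", "wifi", "ethernet"],
    "software": ["application", "program", "software", "app"],
    "hardware": ["device", "computer", "laptop", "hardware"],
    "account": ["login", "password", "account", "access"],
}


def categorize(issue_description: str) -> Tuple[str, str]:
    """Return (category, severity) using the same substring rules as analyze_issue."""
    text = issue_description.lower()
    # First category with any matching keyword wins; dicts keep insertion order.
    category = next(
        (name for name, words in CATEGORIES.items() if any(w in text for w in words)),
        "general",
    )
    if any(w in text for w in ["urgent", "critical", "down", "broken"]):
        severity = "high"
    elif any(w in text for w in ["minor", "small", "question"]):
        severity = "low"
    else:
        severity = "medium"
    return category, severity


print(categorize("My internet connection keeps dropping every few minutes."))  # ('network', 'medium')
print(categorize("The wifi app on my laptop is broken"))  # ('network', 'high')
print(categorize("Nothing is happening on screen"))  # ('software', 'medium')
```

The last call is a false positive caused by substring matching, which is one reason the summary lists more sophisticated categorization as a next step.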

## Step 4: Create Memory Resource

In [None]:
from botocore.exceptions import ClientError

# Initialize Memory Client
client = MemoryClient(region_name=REGION)
memory_name = "TechSupportMemory"

try:
    # Create memory resource for checkpoints and conversation
    memory = client.create_memory_and_wait(
        name=memory_name,
        strategies=[],  # Short-term memory for checkpoints
        description="Memory for technical support agent with checkpoints",
        event_expiry_days=7,
    )
    memory_id = memory['id']
    logger.info(f"✅ Created memory: {memory_id}")
except ClientError as e:
    if e.response['Error']['Code'] == 'ValidationException' and "already exists" in str(e):
        memories = client.list_memories()
        memory_id = next((m['id'] for m in memories if m['id'].startswith(memory_name)), None)
        logger.info(f"Memory already exists. Using: {memory_id}")
    else:
        logger.error(f"❌ ERROR: {e}")
        raise

# Initialize checkpoint manager
checkpoint_manager = CheckpointManager(memory_id, ACTOR_ID, SESSION_ID)

## Step 5: Memory Hook with Checkpoint Support

In [None]:
class TechSupportMemoryHook(HookProvider):
    def __init__(self, memory_client: MemoryClient, memory_id: str, actor_id: str, session_id: str, checkpoint_manager: CheckpointManager):
        self.memory_client = memory_client
        self.memory_id = memory_id
        self.actor_id = actor_id
        self.session_id = session_id
        self.checkpoint_manager = checkpoint_manager
        self.current_state = WorkflowState.ANALYZING
    
    def on_agent_initialized(self, event: AgentInitializedEvent):
        """Load conversation history and latest checkpoint"""
        try:
            # Load recent conversation
            recent_turns = self.memory_client.get_last_k_turns(
                memory_id=self.memory_id,
                actor_id=self.actor_id,
                session_id=self.session_id,
                k=5
            )
            
            # Load latest checkpoint
            checkpoint = self.checkpoint_manager.load_latest_checkpoint()
            
            context_parts = []
            
            if recent_turns:
                context_messages = []
                for turn in recent_turns:
                    for message in turn:
                        role = message['role']
                        content = message['content']['text']
                        context_messages.append(f"{role}: {content}")
                context_parts.append(f"Recent conversation:\n{chr(10).join(context_messages)}")
            
            if checkpoint:
                self.current_state = WorkflowState(checkpoint['state'])
                context_parts.append(f"\nCheckpoint State: {checkpoint['state']}")
                context_parts.append(f"Checkpoint Data: {json.dumps(checkpoint['data'], indent=2)}")
            
            if context_parts:
                event.agent.system_prompt += f"\n\n{chr(10).join(context_parts)}"
                logger.info(f"✅ Loaded context with state: {self.current_state.value}")
                
        except Exception as e:
            logger.error(f"Context load error: {e}")
    
    def on_message_added(self, event: MessageAddedEvent):
        """Store user messages in memory (messages with other roles are skipped)"""
        messages = event.agent.messages
        try:
            if messages[-1].get("role").lower() == "user":
                self.memory_client.create_event(
                    memory_id=self.memory_id,
                    actor_id=self.actor_id,
                    session_id=self.session_id,
                    messages=[(str(messages[-1].get("content", "")), messages[-1]["role"])]
                )
        except Exception as e:
            logger.error(f"Memory save error: {e}")
    
    def register_hooks(self, registry: HookRegistry):
        registry.add_callback(MessageAddedEvent, self.on_message_added)
        registry.add_callback(AgentInitializedEvent, self.on_agent_initialized)
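
The context assembly in `on_agent_initialized` can be tested in isolation by pulling the formatting into a pure function. A minimal sketch, assuming turns arrive as lists of messages shaped like `{"role": ..., "content": {"text": ...}}` as the hook above expects; `build_context` is a hypothetical helper, not a Strands or AgentCore API.

```python
import json
from typing import Any, Dict, List, Optional


def build_context(
    recent_turns: List[List[Dict[str, Any]]],
    checkpoint: Optional[Dict[str, Any]],
) -> str:
    """Format conversation turns and an optional checkpoint into prompt text."""
    parts = []
    if recent_turns:
        lines = [
            f"{message['role']}: {message['content']['text']}"
            for turn in recent_turns
            for message in turn
        ]
        parts.append("Recent conversation:\n" + "\n".join(lines))
    if checkpoint:
        parts.append(f"\nCheckpoint State: {checkpoint['state']}")
        parts.append(f"Checkpoint Data: {json.dumps(checkpoint['data'], indent=2)}")
    return "\n".join(parts)


# Hand-written sample inputs for illustration.
turns = [[{"role": "USER", "content": {"text": "My internet keeps dropping."}}]]
checkpoint = {"state": "awaiting_approval", "data": {"issue_category": "network"}}
print(build_context(turns, checkpoint))
```

Keeping the formatting separate from the memory calls makes it easy to see exactly what text is appended to the system prompt when an agent resumes.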

## Step 6: Create Technical Support Agent

In [None]:
def create_tech_support_agent():
    """Create technical support agent with checkpoint capabilities"""
    memory_hook = TechSupportMemoryHook(client, memory_id, ACTOR_ID, SESSION_ID, checkpoint_manager)
    
    agent = Agent(
        name="TechnicalSupportAgent",
        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        system_prompt=f"""You are a technical customer support agent. Follow this workflow:

1. ANALYZING: Use analyze_issue tool to understand the problem
2. FINDING_SOLUTION: Use find_resolution tool to get solution steps
3. AWAITING_APPROVAL: Present solution and ask for user confirmation
4. EXECUTING: Use execute_resolution tool after approval
5. COMPLETED: Confirm resolution

IMPORTANT CHECKPOINT RULES:
- Before asking for user approval, save a checkpoint with the proposed resolution
- After user approval, retrieve the checkpoint and proceed with execution
- Always be clear about what step you're in

Current date: {datetime.today().strftime('%Y-%m-%d')}
Be professional, helpful, and thorough.""",
        hooks=[memory_hook],
        tools=[analyze_issue, find_resolution, execute_resolution],
    )
    
    # Store reference to checkpoint manager in agent for manual checkpoint operations
    agent.checkpoint_manager = checkpoint_manager
    return agent

# Create agent
agent = create_tech_support_agent()
logger.info("✅ Technical support agent created")

## Step 7: Test the Workflow

Let's test the complete workflow with checkpoint management:

In [None]:
# Start with a technical issue
print("=== Customer Technical Issue ===\n")
issue = "My internet connection keeps dropping every few minutes. It's affecting my work calls."
print(f"Customer: {issue}\n")
print("Agent Response:")
response = agent(issue)

In [None]:
# Manually save checkpoint before human approval (simulating agent's internal process)
print("\n=== Saving Checkpoint Before Human Approval ===\n")

# Simulate the agent saving a checkpoint with the proposed resolution
proposed_resolution = {
    "issue_category": "network",
    "issue_description": "Internet connection dropping frequently",
    "proposed_steps": [
        "Check network cable connections",
        "Restart router and modem", 
        "Run network diagnostics",
        "Update network drivers"
    ],
    "severity": "medium"
}

# Persist the proposed resolution so a new agent instance can pick it up after approval
checkpoint_data = agent.checkpoint_manager.save_checkpoint(
    WorkflowState.AWAITING_APPROVAL,
    proposed_resolution
)

print("Checkpoint saved with proposed resolution. Awaiting human approval...")

In [None]:
# Human-in-the-loop: User provides approval
print("\n=== Human-in-the-Loop Validation ===\n")
user_approval = "Yes, please proceed with those steps. They look reasonable."
print(f"Customer: {user_approval}\n")

# Create new agent instance to simulate session continuation
print("=== Agent Retrieving Checkpoint and Continuing ===\n")
new_agent = create_tech_support_agent()

print("Agent Response:")
response = new_agent(user_approval)
print(f"\n{response}")
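
In this tutorial the model itself interprets the customer's reply. Where a deterministic gate is preferred, the approval decision can be made in code before the agent is called. The sketch below is one possible approach under stated assumptions: `classify_reply`, `next_action`, and the keyword sets are all hypothetical, and real deployments would more likely use an explicit approve/reject control than free-text matching.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical keyword sets, matched against whole words only.
APPROVE_WORDS = {"yes", "proceed", "approve", "approved", "ok", "okay"}
REJECT_WORDS = {"no", "stop", "cancel", "reject", "don't"}


def classify_reply(reply: str) -> str:
    """Classify a free-text reply as 'approved', 'rejected', or 'unclear'."""
    words = set(re.findall(r"[a-z']+", reply.lower()))
    if words & REJECT_WORDS:
        return "rejected"  # any rejection word takes precedence
    if words & APPROVE_WORDS:
        return "approved"
    return "unclear"


def next_action(checkpoint: Optional[Dict[str, Any]], reply: str) -> str:
    """Decide what to do with a reply, given the latest checkpoint (if any)."""
    if not checkpoint or checkpoint.get("state") != "awaiting_approval":
        return "no_pending_approval"
    decision = classify_reply(reply)
    if decision == "approved":
        return "execute"
    if decision == "rejected":
        return "find_another_solution"
    return "ask_again"


pending = {"state": "awaiting_approval", "data": {"issue_category": "network"}}
print(next_action(pending, "Yes, please proceed with those steps. They look reasonable."))  # execute
print(next_action(pending, "No, that sounds risky."))  # find_another_solution
print(next_action(None, "Yes"))  # no_pending_approval
```

Checking the checkpoint state first ensures that a stray "yes" cannot trigger execution when nothing is waiting for approval.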

In [None]:
# Final confirmation
print("\n=== Final Confirmation ===\n")
confirmation = "Great! The connection seems stable now. Thank you for your help."
print(f"Customer: {confirmation}\n")

print("Agent Response:")
response = new_agent(confirmation)
print(f"\n{response}")

## Step 8: View Stored Memory and Checkpoints

In [None]:
# Check conversation history
print("=== Conversation History ===\n")
recent_turns = client.get_last_k_turns(
    memory_id=memory_id,
    actor_id=ACTOR_ID,
    session_id=SESSION_ID,
    k=3
)

for i, turn in enumerate(recent_turns, 1):
    print(f"Turn {i}:")
    for message in turn:
        role = message['role']
        content = message['content']['text'][:150] + "..." if len(message['content']['text']) > 150 else message['content']['text']
        print(f"  {role}: {content}")
    print()

# Check stored checkpoints
print("\n=== Stored Checkpoints ===\n")
events = client.list_events(
    memory_id=memory_id,
    actor_id=ACTOR_ID,
    session_id=SESSION_ID,
    max_results=5
)

checkpoint_count = 0
for event in events:
    for payload in event.get('payload', []):
        if payload.get('blob'):
            checkpoint_count += 1
            blob_data = base64.b64decode(payload['blob']).decode()
            checkpoint = json.loads(blob_data)
            print(f"Checkpoint {checkpoint_count}:")
            print(f"  State: {checkpoint['state']}")
            print(f"  Timestamp: {checkpoint['timestamp']}")
            print(f"  Data: {json.dumps(checkpoint['data'], indent=4)}")
            print()

if checkpoint_count == 0:
    print("No checkpoints found in recent events.")
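
When a session contains both conversation events and checkpoint blobs, selecting the latest checkpoint means skipping non-blob payloads and ordering by timestamp. A minimal local sketch with hand-written sample events follows; `latest_checkpoint` and `make_blob` are hypothetical helpers, and the event shape (`eventTimestamp`, `payload`, `blob`, `conversational`) is assumed to match what the listing code above reads.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def make_blob(state: str) -> str:
    """Build a base64 checkpoint blob like the ones stored in this tutorial."""
    return base64.b64encode(json.dumps({"state": state, "data": {}}).encode()).decode()


def latest_checkpoint(events: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the decoded checkpoint from the newest event that carries a blob."""
    for event in sorted(events, key=lambda e: e["eventTimestamp"], reverse=True):
        for payload in event.get("payload", []):
            if payload.get("blob"):
                return json.loads(base64.b64decode(payload["blob"]).decode())
    return None


# Hand-written sample events, deliberately out of order.
events = [
    {"eventTimestamp": datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc),
     "payload": [{"blob": make_blob("analyzing_issue")}]},
    {"eventTimestamp": datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc),
     "payload": [{"conversational": {"role": "USER", "content": {"text": "Yes, proceed."}}}]},
    {"eventTimestamp": datetime(2025, 1, 1, 10, 2, tzinfo=timezone.utc),
     "payload": [{"blob": make_blob("awaiting_approval")}]},
]
print(latest_checkpoint(events)["state"])  # awaiting_approval
```

Sorting explicitly avoids relying on the order in which events happen to be returned.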

## Step 9: Test Checkpoint Recovery

Now create a completely new agent instance and check whether it can recover the context from the stored checkpoint:

In [None]:
# Create a fresh agent instance to test checkpoint recovery
print("=== Testing Checkpoint Recovery ===\n")
print("Creating fresh agent instance...\n")

recovery_agent = create_tech_support_agent()

# Test if it remembers the context
recovery_query = "What was the issue we were working on?"
print(f"Customer: {recovery_query}\n")

print("Agent Response:")
response = recovery_agent(recovery_query)
print(f"\n{response}")

## Summary

This tutorial demonstrated a complete customer technical support workflow with checkpoint management:

**Key Features Implemented:**
- **Workflow States**: Defined clear states for the support process
- **Checkpoint Storage**: Used blob payloads to store workflow state
- **Human-in-the-Loop**: Implemented approval step before execution
- **Session Continuity**: A new agent instance can resume the same session from the stored checkpoint
- **Technical Tools**: Issue analysis, resolution finding, and execution

**Checkpoint Management Benefits:**
- Reliable state persistence across agent restarts
- Human validation before critical actions
- Audit trail of decision points
- Recovery from interruptions

**Next Steps:**
- Add more sophisticated issue categorization
- Implement escalation workflows
- Add integration with ticketing systems
- Enhance checkpoint metadata for better tracking
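
As one possible starting point for richer checkpoint metadata, the checkpoint dictionary could carry an identifier, a schema version, and a link to an external ticket. Everything in this sketch is an assumption for illustration: `build_checkpoint`, `checkpoint_id`, `schema_version`, and `ticket_id` are hypothetical names that do not appear in the tutorial's `CheckpointManager`.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def build_checkpoint(state: str, data: Dict[str, Any], ticket_id: str) -> Dict[str, Any]:
    """Build a checkpoint dict with extra tracking metadata (hypothetical fields)."""
    return {
        "checkpoint_id": str(uuid.uuid4()),  # unique handle for audit logs
        "schema_version": 1,                 # lets readers handle format changes
        "ticket_id": ticket_id,              # link to an external ticketing system
        "state": state,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }


checkpoint = build_checkpoint("awaiting_approval", {"issue_category": "network"}, "TICKET-123")
print(json.dumps(checkpoint, indent=2))
```

Because the result is still a plain JSON-serializable dictionary, it could be encoded with the same JSON-then-base64 approach used for the existing checkpoints.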

## Cleanup (Optional)

In [None]:
# Uncomment to delete memory resource
# client.delete_memory_and_wait(memory_id)
# logger.info(f"✅ Deleted memory: {memory_id}")