# Lab 5: GenAIOps - AgentCore Runtime with Langfuse Observability

This lab explores GenAI Operations (GenAIOps) by deploying agents to Amazon Bedrock AgentCore Runtime with comprehensive observability through Langfuse. Learn how to implement production-ready agent monitoring, tracing, and analytics for enterprise GenAI applications.

**Observability Approaches:**
- `lab5_agent.py`: AgentCore default observability (CloudWatch)
- `lab5_agent_tools.py`: Custom tools with Langfuse tracing
- `lab5_agent_mcp.py`: MCP tools with Langfuse tracing

**GenAIOps Focus:** Production monitoring, performance tracking, and operational insights for AI agents.

## Prerequisites

- Python 3.10+
- AWS credentials configured
- Amazon Bedrock AgentCore SDK
- Docker running
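- Langfuse environment variables for Steps 2 and 3: `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and optionally `LANGFUSE_HOST` (the agent code defaults to `https://cloud.langfuse.com`)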
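
Before deploying the Langfuse-enabled agents, it can help to confirm that the credentials are actually set. The following is a minimal sketch, not part of the lab files; the helper name `missing_langfuse_vars` is hypothetical, and the variable names are the ones read by the agent code later in this lab.

```python
from typing import List, Mapping

# Variable names as read by the agent files in this lab; LANGFUSE_HOST is optional there.
REQUIRED_LANGFUSE_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required Langfuse variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_LANGFUSE_VARS if not env.get(name)]


# Example with a plain dict; in the notebook you would pass os.environ instead.
print(missing_langfuse_vars({"LANGFUSE_PUBLIC_KEY": "pk-example"}))
```

Checking up front avoids deploying an agent that builds its authorization header from missing values.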

In [None]:
# Install required packages
!pip install --force-reinstall -U -r requirements.txt --quiet

## Step 1: Create, Deploy, and Invoke the First Agent
This section will create the first agent file (`lab5_agent.py`), deploy it to AgentCore Runtime, and invoke it remotely.

### Agent 1: Built-in Tool Agent (calculator)
This agent demonstrates AgentCore Runtime's default observability mode using the built-in `calculator` tool from Strands. When deployed to AgentCore Runtime, observability is automatically enabled through AWS OpenTelemetry instrumentation, providing comprehensive tracing to Amazon CloudWatch without requiring external tools like Langfuse.

**AgentCore Observability on Amazon CloudWatch:**
- **Automatic instrumentation**: AgentCore Runtime automatically instruments your agent with OpenTelemetry
- **CloudWatch integration**: All traces, metrics, and logs are sent to Amazon CloudWatch
- **Agent lifecycle tracking**: Monitor agent invocations, tool usage, and response times
- **No additional setup**: Works out-of-the-box without external observability platforms

**Key features:**
- Uses the `calculator` tool for mathematical operations
- Deployed as a BedrockAgentCoreApp entrypoint with default observability
- Traces automatically appear in CloudWatch for monitoring and debugging

This demonstrates the simplest observability setup - AgentCore's built-in CloudWatch integration.

In [None]:
%%writefile lab5_agent.py
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent
from strands_tools import calculator

app = BedrockAgentCoreApp()
MODEL = "global.anthropic.claude-sonnet-4-20250514-v1:0"


@app.entrypoint
def lab5_agent(payload):
    user_input = payload.get("prompt")
    print("LAB5: User input:", user_input)

    agent = Agent(
        system_prompt="You're a helpful assistant. You can do simple math calculations.",
        tools=[calculator],
        model=MODEL,
        name="lab5-agentcore-agent",
    )

    response = agent(user_input)
    response_text = response.message["content"][0]["text"]

    print(f"LAB5: Response preview: {response_text[:100]}...")

    return response_text


if __name__ == "__main__":
    app.run()
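
The entrypoint reads the reply with `response.message["content"][0]["text"]`, which assumes the first content block is a text block. The sketch below is a more defensive variant under the assumption that a message is a dict with a `content` list of block dicts; the helper name `first_text_block` and the sample message are hypothetical.

```python
from typing import Any, Dict, Optional


def first_text_block(message: Dict[str, Any]) -> Optional[str]:
    """Return the text of the first content block that has a "text" key, or None."""
    for block in message.get("content", []):
        if isinstance(block, dict) and "text" in block:
            return block["text"]
    return None


# Hypothetical message whose first block is a tool call rather than text.
sample = {
    "content": [
        {"toolUse": {"name": "calculator", "input": {"expression": "2 + 2"}}},
        {"text": "2 + 2 = 4"},
    ]
}
print(first_text_block(sample))
```

Returning `None` instead of raising lets the entrypoint decide how to report an empty reply.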

In [None]:
# Deploy lab5_agent.py to AgentCore Runtime
from bedrock_agentcore_starter_toolkit import Runtime
from boto3.session import Session
import os
from pathlib import Path

boto_session = Session()
region = boto_session.region_name
agentcore_runtime = Runtime()

agent_name = "lab5_strands_agent"
agent_file = str(Path('lab5_agent.py').absolute())

print(f"Configuring agent: {agent_name}")
configure_response = agentcore_runtime.configure(
    entrypoint=agent_file,
    auto_create_execution_role=True,
    auto_create_ecr=True,
    requirements_file=str(Path('requirements.txt').absolute()),
    region=region,
    agent_name=agent_name,
    )

print("Deploying to AgentCore...")
launch_result = agentcore_runtime.launch(auto_update_on_conflict=True)
print(f"Agent deployed: {launch_result.agent_arn}")
print("\n✅ AgentCore observability automatically enabled:")
print("   - OpenTelemetry instrumentation active")
print("   - Traces will appear in Amazon CloudWatch")
print("   - No additional configuration required")

### Understanding AgentCore's Default Observability

When you deploy an agent to AgentCore Runtime using the starter toolkit, observability is automatically configured:

1. **OpenTelemetry Instrumentation**: The generated Dockerfile starts the agent through the `opentelemetry-instrument` command
2. **CloudWatch Integration**: All telemetry data flows to Amazon CloudWatch
3. **Trace Collection**: Agent invocations, tool calls, and model interactions are traced
4. **No External Dependencies**: Unlike Steps 2 and 3, which use Langfuse, this step uses AWS-native observability

You can view traces in the CloudWatch console under **X-Ray traces** or query the related logs with **CloudWatch Logs Insights**.
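
To confirm that a generated Dockerfile really launches the agent through `opentelemetry-instrument`, you can inspect its `CMD` line. This is a minimal sketch assuming the exec-form `CMD [...]` syntax that the later steps of this lab patch; the helper names and the sample Dockerfile text (including its base image) are hypothetical.

```python
import json
from typing import List, Optional


def dockerfile_cmd(dockerfile_text: str) -> Optional[List[str]]:
    """Return the last exec-form CMD as a list of strings, or None if absent."""
    cmd = None
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("CMD ["):
            # Exec-form CMD arguments are a JSON array.
            cmd = json.loads(stripped[len("CMD "):])
    return cmd


def uses_otel_instrument(dockerfile_text: str) -> bool:
    """True when the container command is wrapped by opentelemetry-instrument."""
    cmd = dockerfile_cmd(dockerfile_text)
    return bool(cmd) and cmd[0] == "opentelemetry-instrument"


# Hypothetical Dockerfile content for illustration only.
sample = (
    "FROM python:3.12-slim\n"
    'CMD ["opentelemetry-instrument", "python", "-m", "lab5_agent"]\n'
)
print(uses_otel_instrument(sample))
```

In the notebook, `Path("Dockerfile").read_text()` would supply the real file content after `configure()` has run.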

In [None]:
# Invoke lab5_agent remotely using AgentCore API
import boto3
import json
import uuid

def parse_response(response):
    if 'response' in response:
        response_bytes = response['response']
        if hasattr(response_bytes, 'read'):
            return response_bytes.read().decode('utf-8')
        elif isinstance(response_bytes, bytes):
            return response_bytes.decode('utf-8')
        else:
            return str(response_bytes)
    return "No response content found"

def invoke_agent(agent_arn, prompt, media=None):
    payload_data = {"prompt": prompt}
    if media:
        payload_data["media"] = media
    try:
        response = data_client.invoke_agent_runtime(
            agentRuntimeArn=agent_arn,
            runtimeSessionId=str(uuid.uuid4()),
            payload=json.dumps(payload_data).encode()
        )
        return parse_response(response)
    except Exception as e:
        return f"Error invoking agent: {e}"

control_client = boto3.client('bedrock-agentcore-control')
data_client = boto3.client('bedrock-agentcore')

response = control_client.list_agent_runtimes()
agents = response.get('agentRuntimes', [])
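# Match the name exactly: the other lab5 agents share the 'lab5_strands_agent' prefix
lab5_agent = next((a for a in agents if a['agentRuntimeName'] == 'lab5_strands_agent'), None)

if not lab5_agent:
    print("No lab5 agent found. Available agents:")
    for agent in agents:
        print(f"  - {agent['agentRuntimeName']}: {agent['agentRuntimeArn']}")
    # Raise instead of calling exit(1), which would shut down the notebook kernel
    raise RuntimeError("lab5_strands_agent not found; run the deployment cell first")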

agent_arn = lab5_agent['agentRuntimeArn']
print(f"Found agent: {agent_arn}")

# Example invocation
print("\n=== Test: Basic Agent ===")
response = invoke_agent(agent_arn, "What is 1234 multiplied by 5678?")
print(response)
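
Depending on how the runtime serializes the entrypoint's return value, the decoded body may arrive as a JSON string literal (wrapped in quotes, with escaped newlines) rather than plain text. The source does not state which, so the sketch below handles both cases; `decode_body` is a hypothetical helper, not part of the AgentCore SDK.

```python
import json


def decode_body(raw: bytes) -> str:
    """Decode a response body, unwrapping it if it is a JSON-encoded string."""
    text = raw.decode("utf-8")
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON at all: return the text unchanged.
        return text
    # Only unwrap JSON strings; leave objects, arrays, and numbers as raw text.
    return value if isinstance(value, str) else text


print(decode_body(b'"2 + 2 = 4"'))
print(decode_body(b"plain text"))
```

A helper like this could be applied to the bytes read inside `parse_response` before printing the result.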

## Step 2: Create, Deploy, and Invoke the Second Agent

Now we'll explore using **Langfuse** instead of AgentCore's default CloudWatch observability. Langfuse provides advanced GenAIOps capabilities including detailed trace analysis, cost tracking, and performance analytics specifically designed for LLM applications.

This section will create the second agent file (`lab5_agent_tools.py`), deploy it to AgentCore Runtime, and invoke it remotely.
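
The Langfuse-enabled agents send traces over OTLP by setting two standard OpenTelemetry environment variables: an endpoint under `/api/public/otel` and a Basic authorization header built from the Langfuse key pair. The sketch below isolates that step so it can be checked on its own; the function name `langfuse_otlp_settings` is hypothetical, and failing on missing keys is a design choice added here, not something the lab code does.

```python
import base64
from typing import Dict


def langfuse_otlp_settings(
    public_key: str,
    secret_key: str,
    host: str = "https://cloud.langfuse.com",
) -> Dict[str, str]:
    """Build the OTLP endpoint and auth header values used for Langfuse export."""
    if not public_key or not secret_key:
        raise ValueError("Langfuse public and secret keys are both required")
    # Basic auth token: base64("public_key:secret_key")
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": host.rstrip("/") + "/api/public/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


# Placeholder keys for illustration only.
settings = langfuse_otlp_settings("pk-example", "sk-example")
print(settings["OTEL_EXPORTER_OTLP_ENDPOINT"])
```

Raising early avoids silently encoding the string `None:None` when a key is unset.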

### Agent 2: Custom Tool Agent (AWS Health Status Checker)
This agent extends the previous setup by introducing a custom tool: `aws_health_status_checker`. It uses the AWS Health Dashboard RSS feed to check the operational status of AWS services in a specified region.

**Key features:**
- Defines a custom tool using the `@tool` decorator to fetch and parse the AWS Health RSS feed.
- Responds to user queries about AWS service health, filtered by region and optionally by service name.
- Langfuse tracing is enabled for all agent actions and tool invocations.
- Deployed as a BedrockAgentCoreApp entrypoint, similar to Agent 1.

This agent demonstrates how to add custom Python tools to your agent and trace their usage for observability.
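
The filtering logic of the custom tool can be tried offline before deploying. The sketch below applies the same title-based matching to an in-memory feed; `filter_events` is a hypothetical helper, and `SAMPLE_RSS` is made-up data whose title format is an assumption about what the real feed looks like.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Made-up feed for illustration; real item titles may be formatted differently.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Service degradation: Amazon EC2 (us-east-1)</title><pubDate>placeholder</pubDate></item>
  <item><title>Informational message: Amazon S3 (eu-west-1)</title><pubDate>placeholder</pubDate></item>
</channel></rss>"""


def filter_events(
    rss_xml: str, region: str, service_name: Optional[str] = None
) -> List[Dict[str, str]]:
    """Keep items whose title mentions the region and, if given, the service."""
    root = ET.fromstring(rss_xml)
    events = []
    for item in root.findall(".//item"):
        title = item.findtext("title", default="") or ""
        if region.lower() not in title.lower():
            continue
        if service_name and service_name.lower() not in title.lower():
            continue
        events.append(
            {"title": title, "published_date": item.findtext("pubDate", default="") or ""}
        )
    return events


print(filter_events(SAMPLE_RSS, "us-east-1"))
```

Because matching is a case-insensitive substring test on the title, an item that omits the region code from its title would not be reported.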

In [None]:
%%writefile lab5_agent_tools.py
import base64
import os
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
import requests
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from dotenv import load_dotenv
from strands import Agent, tool
from strands.telemetry import StrandsTelemetry
from strands_tools import image_reader

load_dotenv()
langfuse_public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")
langfuse_secret_key = os.environ.get("LANGFUSE_SECRET_KEY")
langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
LANGFUSE_AUTH = base64.b64encode(
    f"{langfuse_public_key}:{langfuse_secret_key}".encode()
    ).decode()
os.environ["LANGFUSE_PROJECT_NAME"] = "my_llm_project"
os.environ["DISABLE_ADOT_OBSERVABILITY"] = "true"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = (
    os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com") + "/api/public/otel"
)
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"
for k in [
    "OTEL_EXPORTER_OTLP_LOGS_HEADERS",
    "AGENT_OBSERVABILITY_ENABLED",
    "OTEL_PYTHON_DISTRO",
    "OTEL_RESOURCE_ATTRIBUTES",
    "OTEL_PYTHON_CONFIGURATOR",
    "OTEL_PYTHON_EXCLUDED_URLS",
]:
    os.environ.pop(k, None)
app = BedrockAgentCoreApp()
MODEL = "global.anthropic.claude-sonnet-4-20250514-v1:0"
@tool
def aws_health_status_checker(region: str = "us-east-1", service_name: Optional[str] = None) -> Dict:
    try:
        rss_url = "https://status.aws.amazon.com/rss/all.rss"
        response = requests.get(rss_url, timeout=10)
        response.raise_for_status()
        root = ET.fromstring(response.content)
        items = root.findall(".//item")
        events = []
        for item in items:
            title = item.find("title").text if item.find("title") is not None else ""
            description = item.find("description").text if item.find("description") is not None else ""
            pub_date = item.find("pubDate").text if item.find("pubDate") is not None else ""
            link = item.find("link").text if item.find("link") is not None else ""
            if region.lower() in title.lower():
                if service_name is None or service_name.lower() in title.lower():
                    events.append({"title": title, "description": description, "published_date": pub_date, "link": link})
        if not events:
            return {"status": "healthy", "message": f"All services in {region} appear to be operating normally" if not service_name else f"{service_name} in {region} appears to be operating normally", "events": []}
        return {"status": "service_disruption", "message": f"Service disruptions detected in {region}", "events": events}
    except Exception as e:
        return {"status": "error", "message": f"Error checking AWS service status: {str(e)}", "events": []}
@app.entrypoint
def lab5_agent(payload):
    user_input = payload.get("prompt")
    print("LAB5: User input:", user_input)
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()
    agent = Agent(
        system_prompt="You are an AWS service status agent. Use the aws_health_status_checker tool to tell the user the status of the service they ask about.",
        tools=[aws_health_status_checker],
        model=MODEL,
        name="lab5_strands_agent_custom_tool_example",
        trace_attributes={
            "session.id": "aws-mcp-agent-demo-session",
            "user.id": "example-user@example.com",
            "langfuse.tags": ["AWS-Strands-Agent", "Custom-Tool"],
            "metadata": {"environment": "development", "version": "1.0.0", "query_type": "storage_recommendation"}
        }
    )
    response = agent(user_input)
    response_text = response.message["content"][0]["text"]
    print(f"LAB5: Response preview: {response_text[:100]}...")
    return response_text
if __name__ == "__main__":
    app.run()

In [None]:
# Deploy lab5_agent_tools.py to AgentCore Runtime
agent_name = "lab5_strands_agent_custom_tool_example"
agent_file = str(Path('lab5_agent_tools.py').absolute())

print(f"Configuring agent: {agent_name}")
configure_response = agentcore_runtime.configure(
    entrypoint=agent_file,
    auto_create_execution_role=True,
    auto_create_ecr=True,
    requirements_file=str(Path('requirements.txt').absolute()),
    region=region,
    agent_name=agent_name,
    )

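dockerfile_path = Path("Dockerfile")
if dockerfile_path.exists():
    dockerfile_content = dockerfile_path.read_text()
    otel_cmd = 'CMD ["opentelemetry-instrument", "python", "-m", "lab5_agent_tools"]'
    if otel_cmd in dockerfile_content:
        # Drop the opentelemetry-instrument wrapper so the agent's own OTLP exporter is used
        dockerfile_path.write_text(
            dockerfile_content.replace(otel_cmd, 'CMD ["python", "-m", "lab5_agent_tools"]')
        )
        print("✅ Dockerfile modified to disable opentelemetry-instrument")
    else:
        # Report honestly when the expected line is missing instead of claiming success
        print("⚠️ Expected CMD line not found; Dockerfile left unchanged")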

env_vars = {
    "LANGFUSE_HOST": os.environ.get("LANGFUSE_HOST"),
    "LANGFUSE_PUBLIC_KEY": os.environ.get("LANGFUSE_PUBLIC_KEY"),
    "LANGFUSE_SECRET_KEY": os.environ.get("LANGFUSE_SECRET_KEY"),
    "PYTHONUNBUFFERED": "1",
}

print("Deploying to AgentCore...")
launch_result = agentcore_runtime.launch(env_vars=env_vars, auto_update_on_conflict=True)
print(f"Agent deployed: {launch_result.agent_arn}")

In [None]:
# Invoke lab5_agent_tools remotely using AgentCore API
response = control_client.list_agent_runtimes()
agents = response.get('agentRuntimes', [])
lab5_agent_tool = next((a for a in agents if 'lab5_strands_agent_custom_tool_example' in a['agentRuntimeName']), None)

if not lab5_agent_tool:
    print("No lab5 agent tool found. Available agents:")
    for agent in agents:
        print(f"  - {agent['agentRuntimeName']}: {agent['agentRuntimeArn']}")
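    # Raise instead of calling exit(1), which would shut down the notebook kernel
    raise RuntimeError("lab5_strands_agent_custom_tool_example not found; run the deployment cell first")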

agent_arn = lab5_agent_tool['agentRuntimeArn']
print(f"Found agent: {agent_arn}")

# Example invocation
print("\n=== Test: AWS Health Tool Agent ===")
response = invoke_agent(agent_arn, "Check the AWS service health for any disruptions in the us-east-1 region.")
print(response)

## Step 3: Create, Deploy, and Invoke the Third Agent

In this final step, we'll combine **Langfuse observability** with **MCP (Model Context Protocol) tools**. This showcases how Langfuse can trace complex multi-tool interactions and provide deep insights into agent orchestration patterns.

This section will create the third agent file (`lab5_agent_mcp.py`), deploy it to AgentCore Runtime, and invoke it remotely.

### Agent 3: MCP Tool Agent (AWS Documentation & Pricing)
This agent showcases advanced orchestration by integrating MCP (Model Context Protocol) tools for AWS documentation and pricing queries. It demonstrates how agents can leverage external MCP servers to answer complex questions.

**Key features:**
- Uses MCPClient to connect to the AWS Documentation and AWS Pricing MCP servers.
- Combines multiple MCP tools for multi-step reasoning and answers.
- Custom system prompt instructs the model to limit tool calls and respond efficiently.
- Langfuse tracing is enabled for all agent and MCP tool interactions.
- Requires a custom IAM execution role for pricing permissions during deployment.

This agent is ideal for scenarios requiring external tool orchestration, context-aware responses, and enterprise-grade observability.
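
The agents in this lab pass a fixed `session.id` in `trace_attributes`, so every invocation is reported under the same session. If you want traces grouped per conversation instead, one option is to build the attributes per request. This is a sketch only: `build_trace_attributes` is a hypothetical helper, and how the observability backend groups traces by `session.id` is assumed rather than shown in this lab.

```python
import uuid
from typing import Any, Dict, List, Optional


def build_trace_attributes(
    tags: List[str], user_id: str, session_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build trace attributes, generating a fresh session ID when none is given."""
    return {
        "session.id": session_id or f"lab5-{uuid.uuid4()}",
        "user.id": user_id,
        "langfuse.tags": list(tags),
    }


attrs = build_trace_attributes(["AWS-Strands-Agent", "MCP-Tools"], "example-user@example.com")
print(attrs["session.id"])
```

The result can be passed as `trace_attributes=` when constructing the `Agent`; a caller-supplied session ID (for example, one read from the request payload) keeps multi-turn conversations together.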

In [None]:
%%writefile lab5_agent_mcp.py
import base64
import os
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
import requests
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from dotenv import load_dotenv
from strands import Agent, tool
from strands.telemetry import StrandsTelemetry
from strands_tools import image_reader
from mcp import StdioServerParameters, stdio_client
from strands.tools.mcp import MCPClient
load_dotenv()
langfuse_public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")
langfuse_secret_key = os.environ.get("LANGFUSE_SECRET_KEY")
langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
LANGFUSE_AUTH = base64.b64encode(f"{langfuse_public_key}:{langfuse_secret_key}".encode()).decode()
os.environ["LANGFUSE_PROJECT_NAME"] = "my_llm_project"
os.environ["DISABLE_ADOT_OBSERVABILITY"] = "true"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = (os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com") + "/api/public/otel")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"
for k in ["OTEL_EXPORTER_OTLP_LOGS_HEADERS", "AGENT_OBSERVABILITY_ENABLED", "OTEL_PYTHON_DISTRO", "OTEL_RESOURCE_ATTRIBUTES", "OTEL_PYTHON_CONFIGURATOR", "OTEL_PYTHON_EXCLUDED_URLS"]:
    os.environ.pop(k, None)
app = BedrockAgentCoreApp()
MODEL = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"  # Use cross-region inference
@tool
def aws_health_status_checker(region: str = "us-east-1", service_name: Optional[str] = None) -> Dict:
    try:
        rss_url = "https://status.aws.amazon.com/rss/all.rss"
        response = requests.get(rss_url, timeout=10)
        response.raise_for_status()
        root = ET.fromstring(response.content)
        items = root.findall(".//item")
        events = []
        for item in items:
            title = item.find("title").text if item.find("title") is not None else ""
            description = item.find("description").text if item.find("description") is not None else ""
            pub_date = item.find("pubDate").text if item.find("pubDate") is not None else ""
            link = item.find("link").text if item.find("link") is not None else ""
            if region.lower() in title.lower():
                if service_name is None or service_name.lower() in title.lower():
                    events.append({"title": title, "description": description, "published_date": pub_date, "link": link})
        if not events:
            return {"status": "healthy", "message": f"All services in {region} appear to be operating normally" if not service_name else f"{service_name} in {region} appears to be operating normally", "events": []}
        return {"status": "service_disruption", "message": f"Service disruptions detected in {region}", "events": events}
    except Exception as e:
        return {"status": "error", "message": f"Error checking AWS service status: {str(e)}", "events": []}
@app.entrypoint
def lab5_agent(payload):
    user_input = payload.get("prompt")
    print("LAB5: User input:", user_input)
    strands_telemetry = StrandsTelemetry()
    strands_telemetry.setup_otlp_exporter()
    aws_docs_mcp = MCPClient(lambda: stdio_client(StdioServerParameters(command="uvx", args=["awslabs.aws-documentation-mcp-server@latest"])))
    aws_pricing_mcp = MCPClient(lambda: stdio_client(StdioServerParameters(command="uvx", args=["awslabs.aws-pricing-mcp-server@latest"])))
    with aws_docs_mcp, aws_pricing_mcp:
        mcp_tools = aws_docs_mcp.list_tools_sync() + aws_pricing_mcp.list_tools_sync()
        agent = Agent(
            system_prompt="""You are an AWS expert assistant. Follow these rules strictly:\n\n1. TOOL USAGE LIMITS:\n   - Maximum 3 tool calls per response\n   - For pricing questions: Use get_pricing_service_attributes ONCE, then answer completely\n   - For documentation: Use search_documentation ONCE with specific keywords\n   - NEVER make repetitive calls to the same tool\n\n2. RESPONSE STRATEGY:\n   - Provide comprehensive answers based on available tool results\n   - If first tool call provides sufficient information, answer immediately\n   - Do not search for additional information unless critically missing\n\n3. EFFICIENCY REQUIREMENTS:\n   - Combine multiple concepts in single tool calls\n   - Use your existing knowledge to supplement tool results\n   - Stop tool usage once you have enough information to answer\n\nAnswer user questions thoroughly but efficiently.""",
            tools=mcp_tools,
            model=MODEL,
            name="lab5_strands_agent_mcp_example",
            trace_attributes={
                "session.id": "aws-mcp-agent-demo-session",
                "user.id": "example-user@example.com",
                "langfuse.tags": ["AWS-Strands-Agent", "MCP-Tools"],
                "metadata": {"environment": "development", "version": "1.0.0", "query_type": "aws_assistance"}
            }
        )
        response = agent(user_input)
        response_text = response.message["content"][0]["text"]
        print(f"LAB5: Response preview: {response_text[:100]}...")
        return response_text
if __name__ == "__main__":
    app.run()

In [None]:
# Deploy lab5_agent_mcp.py to AgentCore Runtime (with custom execution role for pricing permissions)
import boto3
import json

iam = boto3.client('iam')
sts = boto3.client('sts')
account_id = sts.get_caller_identity()['Account']

role_name = f"BedrockAgentCore-MCP-ExecutionRole-{region}"
execution_role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"

# Check if role exists, create if not
try:
    iam.get_role(RoleName=role_name)
    print(f"✅ Using existing role: {role_name}")
except iam.exceptions.NoSuchEntityException:
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }]
    }
    execution_policy = {
        "Version": "2012-10-17",
        "Statement": [
            # ... (see deploy_mcp.py for full policy, or copy from above)
        ]
    }
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
        Description="Execution role for Bedrock AgentCore with AWS Pricing permissions"
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName='BedrockAgentCoreExecutionPolicy',
        PolicyDocument=json.dumps(execution_policy)
    )
    print(f"✅ Created role with pricing permissions: {role_name}")

agent_name = "lab5_strands_agent_mcp_example"
agent_file = str(Path('lab5_agent_mcp.py').absolute())

print(f"Configuring agent: {agent_name}")
configure_response = agentcore_runtime.configure(
    entrypoint=agent_file,
    execution_role=execution_role_arn,
    auto_create_ecr=True,
    requirements_file=str(Path('requirements.txt').absolute()),
    region=region,
    agent_name=agent_name,
    )

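dockerfile_path = Path("Dockerfile")
if dockerfile_path.exists():
    dockerfile_content = dockerfile_path.read_text()
    otel_cmd = 'CMD ["opentelemetry-instrument", "python", "-m", "lab5_agent_mcp"]'
    if otel_cmd in dockerfile_content:
        # Drop the opentelemetry-instrument wrapper so the agent's own OTLP exporter is used
        dockerfile_path.write_text(
            dockerfile_content.replace(otel_cmd, 'CMD ["python", "-m", "lab5_agent_mcp"]')
        )
        print("✅ Dockerfile modified to disable opentelemetry-instrument")
    else:
        # Report honestly when the expected line is missing instead of claiming success
        print("⚠️ Expected CMD line not found; Dockerfile left unchanged")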

env_vars = {
    "LANGFUSE_HOST": os.environ.get("LANGFUSE_HOST"),
    "LANGFUSE_PUBLIC_KEY": os.environ.get("LANGFUSE_PUBLIC_KEY"),
    "LANGFUSE_SECRET_KEY": os.environ.get("LANGFUSE_SECRET_KEY"),
    "PYTHONUNBUFFERED": "1",
    "PATH": "/usr/local/bin:/usr/bin:/bin",
}

print("Deploying to AgentCore...")
launch_result = agentcore_runtime.launch(env_vars=env_vars, auto_update_on_conflict=True)
print(f"Agent deployed: {launch_result.agent_arn}")

In [None]:
# Invoke lab5_agent_mcp remotely using AgentCore API
response = control_client.list_agent_runtimes()
agents = response.get('agentRuntimes', [])
lab5_agent_mcp = next((a for a in agents if 'lab5_strands_agent_mcp_example' in a['agentRuntimeName']), None)

if not lab5_agent_mcp:
    print("No lab5 MCP agent found. Available agents:")
    for agent in agents:
        print(f"  - {agent['agentRuntimeName']}: {agent['agentRuntimeArn']}")
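    # Raise instead of calling exit(1), which would shut down the notebook kernel
    raise RuntimeError("lab5_strands_agent_mcp_example not found; run the deployment cell first")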

agent_arn = lab5_agent_mcp['agentRuntimeArn']
print(f"Found agent: {agent_arn}")

# Example invocation
print("\n=== Test: MCP Agent ===")
response = invoke_agent(agent_arn, "What is the pricing model for Amazon S3?")
print(response)

## Cleanup: Remove All Agent Resources

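Delete the AgentCore Runtime agents created in this lab to avoid ongoing charges. The cell below removes only agent runtimes whose name contains `lab5`; the ECR repositories and IAM roles created during deployment are left in place and must be removed separately.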
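
If the account holds many agent runtimes, a single `list_agent_runtimes()` call may not return all of them. The sketch below collects every page, assuming the call follows the usual boto3 `nextToken` convention (an assumption to verify against the API reference). `list_all_agent_runtimes` and `FakeControlClient` are hypothetical names; the fake client exists only so the example runs without AWS access.

```python
from typing import Any, Dict, List


def list_all_agent_runtimes(client: Any) -> List[Dict[str, Any]]:
    """Follow nextToken until every page of agent runtimes has been collected."""
    agents: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = client.list_agent_runtimes(**kwargs)
        agents.extend(page.get("agentRuntimes", []))
        token = page.get("nextToken")
        if not token:
            return agents
        kwargs = {"nextToken": token}


class FakeControlClient:
    """Stand-in for the boto3 control-plane client; serves pre-built pages."""

    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self._pages = pages

    def list_agent_runtimes(self, nextToken: str = "") -> Dict[str, Any]:
        index = int(nextToken) if nextToken else 0
        page: Dict[str, Any] = {"agentRuntimes": self._pages[index]}
        if index + 1 < len(self._pages):
            page["nextToken"] = str(index + 1)
        return page


fake = FakeControlClient([[{"agentRuntimeName": "lab5_a"}], [{"agentRuntimeName": "lab5_b"}]])
print([a["agentRuntimeName"] for a in list_all_agent_runtimes(fake)])
```

With a real client, the returned list can replace the single-page `agents` list before filtering for lab5 agents.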

In [None]:
# Clean up all AgentCore Runtime resources
import boto3

control_client = boto3.client('bedrock-agentcore-control')

# List all agent runtimes
response = control_client.list_agent_runtimes()
agents = response.get('agentRuntimes', [])

# Filter lab5 agents
lab5_agents = [a for a in agents if 'lab5' in a['agentRuntimeName'].lower()]

if not lab5_agents:
    print("No lab5 agents found to clean up.")
else:
    print(f"Found {len(lab5_agents)} lab5 agents to delete:")
    for agent in lab5_agents:
        print(f"  - {agent['agentRuntimeName']}")
    
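    # Delete each agent (the delete call is keyed by runtime ID rather than ARN)
    failed = []
    for agent in lab5_agents:
        try:
            control_client.delete_agent_runtime(
                agentRuntimeId=agent['agentRuntimeId']
            )
            print(f"✅ Deleted: {agent['agentRuntimeName']}")
        except Exception as e:
            failed.append(agent['agentRuntimeName'])
            print(f"❌ Failed to delete {agent['agentRuntimeName']}: {e}")

    # Only report full success when no deletion failed
    if failed:
        print(f"\n⚠️ Cleanup finished with {len(failed)} failure(s): {', '.join(failed)}")
    else:
        print("\n🧹 Cleanup completed. All lab5 agents have been removed.")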