# Agent Tutorial: Building Intelligent ETL Error Resolution Agents

In this tutorial, you will learn how to build an agent with both LangGraph and Strands Agents. We'll build an intelligent ETL error resolution agent that can analyze error tickets and provide step-by-step solutions. The agent is equipped with two tools, `log_identifier` and `information_retriever`, to identify the error behind a ticket ID and retrieve the matching solution from a knowledge base.

![Sample Architecture](./static/agent/1_agent/sample-architect.png)

The agent should resolve a ticket by:
1. Taking a ticket ID as input
2. Using the log_identifier tool to find the error type associated with that ticket
3. Using the information_retriever tool to get step-by-step solutions
4. Providing clear, actionable resolution steps to the user
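Before adding an LLM, it can help to see the same four steps as a plain, deterministic Python function. The sketch below uses a small hypothetical in-memory sample (`SAMPLE_LOGS`, `SAMPLE_KB`) standing in for the tutorial's `log_data` and `knowledge_base`; the agents built later perform the same lookups but let the model decide when to call each tool.

```python
# Hypothetical stand-ins for the tutorial's log_data and knowledge_base
SAMPLE_LOGS = [
    {"id": "TICKET-001", "error_name": "Connection Timeout"},
    {"id": "TICKET-006", "error_name": "Disk Space Full"},
]
SAMPLE_KB = {
    "Connection Timeout": "1. Check network connectivity between client and server\n"
                          "2. Increase the connection timeout settings",
}


def resolve_ticket(ticket_id: str) -> str:
    """Deterministic version of the four-step flow, with no LLM involved."""
    # Step 1: the ticket ID arrives as the function argument
    # Step 2: find the error type recorded for that ticket
    error_type = next(
        (item["error_name"] for item in SAMPLE_LOGS if item["id"] == ticket_id),
        None,
    )
    if error_type is None:
        return "ticket id not found in the database"
    # Step 3: look up the resolution steps for that error type
    steps = SAMPLE_KB.get(error_type)
    if steps is None:
        return f"no solution found for {error_type!r}"
    # Step 4: return the actionable steps to the user
    return steps.strip()


print(resolve_ticket("TICKET-001"))
```

The fixed order here is exactly what the agent's system prompt later asks the model to follow; the agent versions trade this hard-coded control flow for model-driven tool selection.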

Let's start by setting up our data and building the agent step by step. 

### Load data and libraries

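```python
import boto3
from data.data import log_data_set, log_data
from data.solution_book import knowledge_base

# Creating a client fails only if no AWS region is configured; missing
# credentials surface when a request is signed, so the STS call forces that check.
try:
    boto3.client("bedrock-runtime")
    boto3.client("sts").get_caller_identity()
except Exception as e:
    print(f"Error configuring AWS credentials: {e}")
    print("Please set your AWS credentials before proceeding.")
```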

First, let's examine our synthetic sample data, which represents a typical support-ticket system. `log_data` is a list of dictionaries, each containing a ticket `id` and an `error_name`. For this workshop, we use this simplified structure to look up the `error_name` for a given ticket ID.

```python
log_data
```

```
[{'id': 'TICKET-001', 'error_name': 'Connection Timeout'},
 {'id': 'TICKET-002', 'error_name': 'Database Authentication Failed'},
 {'id': 'TICKET-003', 'error_name': 'Memory Overflow'},
 {'id': 'TICKET-004', 'error_name': 'API Rate Limit Exceeded'},
 {'id': 'TICKET-005', 'error_name': 'Invalid SSL Certificate'},
 {'id': 'TICKET-006', 'error_name': 'Disk Space Full'},
 {'id': 'TICKET-007', 'error_name': 'Network Connectivity Lost'},
 {'id': 'TICKET-008', 'error_name': 'Permission Denied'},
 {'id': 'TICKET-009', 'error_name': 'Service Unavailable'},
 {'id': 'TICKET-010', 'error_name': 'Configuration File Missing'}]
```
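Scanning `log_data` in a loop is fine for ten tickets. If the log grows, one option (an assumption, not what the tutorial's tools do) is to build an `id -> error_name` dictionary once and look tickets up in constant time. The snippet uses a two-entry `sample_log_data` copied from the output above so it runs on its own without overwriting the real `log_data`.

```python
# Two entries copied from the log_data output above, so the snippet runs on its own
sample_log_data = [
    {"id": "TICKET-001", "error_name": "Connection Timeout"},
    {"id": "TICKET-002", "error_name": "Database Authentication Failed"},
]

# Build the index once: ticket ID -> error name
ticket_index = {item["id"]: item["error_name"] for item in sample_log_data}


def lookup_error(ticket_id: str) -> str:
    # dict.get returns the fallback message when the ticket ID is unknown
    return ticket_index.get(ticket_id, "ticket id not found in the database")


print(lookup_error("TICKET-002"))  # Database Authentication Failed
print(lookup_error("TICKET-404"))  # ticket id not found in the database
```

An index like this could replace both the membership check and the loop inside a lookup tool, since a single `dict.get` covers the found and not-found cases.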

We use a dictionary to simulate a knowledge base: each key is an error name, and each value is a string of numbered resolution steps.

```python
knowledge_base
```

```
{'Connection Timeout': '1. Check network connectivity between client and server\n2. Verify if the server is running and accessible\n3. Increase the connection timeout settings\n4. Check for firewall rules blocking the connection\n5. Monitor network latency and bandwidth\n    ',
 'Database Authentication Failed': '1. Verify database credentials are correct\n2. Check if the database user account is locked\n3. Ensure database service is running\n4. Review database access permissions\n5. Check for recent password changes\n    ',
 'Memory Overflow': '1. Analyze application memory usage patterns\n2. Increase available memory or swap space\n3. Look for memory leaks in the application\n4. Optimize database queries and caching\n5. Consider implementing memory pooling\n    ',
 'API Rate Limit Exceeded': '1. Implement request throttling\n2. Use caching to reduce API calls\n3. Review API usage patterns\n4. Contact service provider for limit increase\n5. Optimize API call frequency\n    ',
 'Invali
```
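Each knowledge-base value is a single string of newline-separated numbered steps with trailing whitespace. If you need the steps as a list (for example, to render them or count them), a small helper like the hypothetical `split_steps` below strips the numbering; it assumes every step starts with `<number>. ` as in the output above.

```python
import re

# One value copied from the knowledge_base output above, trailing whitespace included
connection_timeout = (
    "1. Check network connectivity between client and server\n"
    "2. Verify if the server is running and accessible\n"
    "3. Increase the connection timeout settings\n"
    "4. Check for firewall rules blocking the connection\n"
    "5. Monitor network latency and bandwidth\n    "
)


def split_steps(solution: str) -> list[str]:
    """Turn a numbered-steps string into a list of step texts without the numbers."""
    steps = []
    for line in solution.strip().splitlines():
        # Keep only lines shaped like "N. text" and drop the "N. " prefix
        match = re.match(r"\s*\d+\.\s*(.*)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps


print(split_steps(connection_timeout))
```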

### LangGraph

First, let's import the building blocks we need from LangGraph and LangChain:
1. `create_react_agent` (from `langgraph.prebuilt`): creates an agent that uses ReAct prompting.
2. `init_chat_model` (from `langchain.chat_models`): initializes a chat model in a single line from the model's name and provider.
3. `tool` (from `langchain_core.tools`): a decorator that turns a function or coroutine into a tool the agent can call.

```python
from langgraph.prebuilt import create_react_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
```

Now let's create the tools that our agent will use to interact with the data:

```python
@tool
def log_identifier(ticket_id: str) -> str:
    """Get the error type for a ticket ID

    Args:
        ticket_id: the ticket ID, e.g. "TICKET-001"

    Returns:
        the error type recorded for that ticket
    """
    if ticket_id not in log_data_set:
        return "ticket id not found in the database"

    for item in log_data:
        if item["id"] == ticket_id:
            return item["error_name"]
    # Fallback so the tool always returns a string, even if the two data sets disagree
    return "ticket id not found in the database"


@tool(return_direct=True)
def information_retriever(error_type: str) -> str:
    """Retrieve the resolution steps for an error type

    Args:
        error_type: the error type returned by log_identifier

    Returns:
        a string of numbered resolution steps
    """
    if error_type not in knowledge_base:
        return "error type not found in the knowledge base, please use your own knowledge"

    return knowledge_base[error_type]
```



**Explanation**:
1. The `log_identifier` tool is our first agent tool. It takes a ticket ID as input and searches the log data for the corresponding error type. The `@tool` decorator from LangChain converts this function into a tool that the agent can use. Notice the detailed docstring: it is crucial, because the agent relies on it to understand when and how to use the tool.
2. The `information_retriever` tool looks up solutions in the knowledge base. The `return_direct=True` parameter is important here: it tells the agent to return the tool's result directly to the user without further processing by the model, which suits our final solution steps. One consequence is that the fallback message ("please use your own knowledge") is also returned verbatim, so the model never gets the chance to act on it.

In both cases, the `@tool` decorator above the function definition is what turns a plain Python function into a tool the LangGraph agent can call.
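To see why the docstring and type hints matter, here is a hypothetical, stripped-down `toy_tool` decorator that collects the same kind of metadata a tool decorator exposes to the model: a name, a description, and the argument types. It is only an illustration built with the standard `inspect` and `typing` modules, not how LangChain or Strands implement `@tool` (they also build a full JSON schema and handle many more cases). The example function is named `toy_log_identifier` so it does not overwrite the real tool.

```python
import inspect
from typing import Callable, get_type_hints


def toy_tool(func: Callable) -> dict:
    """Hypothetical decorator: collect the metadata a model needs to call `func`."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        # The first docstring line becomes the short description
        "description": doc.splitlines()[0] if doc else "",
        # Argument name -> type name, read from the type hints
        "args": {name: hints.get(name, object).__name__ for name in params},
        "func": func,
    }


@toy_tool
def toy_log_identifier(ticket_id: str) -> str:
    """Get error type from ticket number

    Args:
        ticket_id: ticket id
    """
    return "Connection Timeout" if ticket_id == "TICKET-001" else "ticket id not found in the database"


print(toy_log_identifier["name"], toy_log_identifier["args"])
print(toy_log_identifier["description"])
```

A real tool decorator passes this kind of metadata to the model, which is why a vague docstring or a missing type hint can lead the model to pick the wrong tool or pass the wrong argument.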

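### Setting Up the Language Model

We initialize the language model using Claude 3.5 Haiku on Amazon Bedrock. This model powers the agent's reasoning and decision-making. The `init_chat_model` function provides a standardized way to initialize models from different LLM providers; the `bedrock_converse` provider requires the `langchain-aws` package. The `us.` prefix in the model ID selects a US cross-region inference profile.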

```python
llm = init_chat_model(
    model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    model_provider="bedrock_converse",
)
```

The system prompt is crucial for defining the agent's personality and behavior. We tell the agent it's an ETL error resolution expert and provide clear instructions about the available tools and the expected workflow. The formatting requirements ensure consistent, clean output that users can easily follow.

The `create_react_agent` function creates a ReAct (Reasoning and Acting) agent. This type of agent can reason about problems and decide which tools to use in what order. We pass in our language model, the tools we created, and our system prompt to define the agent's capabilities and behavior.

```python
system_prompt = """
You are an expert at resolving ETL errors. You are equipped with two tools:
1. log_identifier: Get the error type from a ticket ID
2. information_retriever: Retrieve the error solution based on the error type

Use the ticket ID to gather information about the error with the log_identifier tool.
Then search the knowledge base for information on how to resolve the error with the information_retriever tool.

Return ONLY the numbered steps without any introduction or conclusion. Format as:
1. step 1 text
2. step 2 text
...
"""

agent = create_react_agent(
    model=llm,
    tools=[log_identifier, information_retriever],
    prompt=system_prompt,
)
```
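Conceptually, the agent that `create_react_agent` builds alternates between a model step and a tool step until the model stops requesting tools, or until a tool marked `return_direct=True` runs. The sketch below imitates that loop with a scripted, hypothetical `scripted_model` in place of Claude and plain functions in place of the tools; it is a simplified illustration of the control flow, not LangGraph's implementation.

```python
from typing import Callable


# Plain-function stand-ins for the two tools (tiny hypothetical data)
def log_identifier_fn(ticket_id: str) -> str:
    return {"TICKET-001": "Connection Timeout"}.get(ticket_id, "ticket id not found in the database")


def information_retriever_fn(error_type: str) -> str:
    kb = {"Connection Timeout": "1. Check network connectivity\n2. Increase the timeout"}
    return kb.get(error_type, "error type not found in the knowledge base")


TOOLS: dict[str, Callable[[str], str]] = {
    "log_identifier": log_identifier_fn,
    "information_retriever": information_retriever_fn,
}
RETURN_DIRECT = {"information_retriever"}  # mirrors @tool(return_direct=True)


def scripted_model(messages: list[dict]) -> dict:
    """Hypothetical model: pick the next action from the conversation so far."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "log_identifier", "arg": messages[0]["content"]}
    if len(tool_results) == 1:
        return {"tool": "information_retriever", "arg": tool_results[0]["content"]}
    return {"answer": tool_results[-1]["content"]}


def run_react_loop(user_input: str, max_steps: int = 5) -> list[dict]:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        action = scripted_model(messages)              # reason: choose the next step
        if "answer" in action:                         # no tool requested: finish
            messages.append({"role": "assistant", "content": action["answer"]})
            break
        result = TOOLS[action["tool"]](action["arg"])  # act: run the chosen tool
        messages.append({"role": "tool", "name": action["tool"], "content": result})
        if action["tool"] in RETURN_DIRECT:            # return_direct: stop right here
            break
    return messages


history = run_react_loop("TICKET-001")
print(history[-1]["content"])
```

With `return_direct`, the last message is the tool's output rather than a model-written reply, which matches the LangGraph result later in this tutorial being exactly the knowledge-base text.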

Now let's define a small helper for interacting with the agent. It wraps the ticket ID in the chat message format, invokes the agent, and returns the full response state; we extract the final message content when printing the result. The agent automatically uses the tools in the correct sequence to resolve the ticket.

```python
def get_langGraph_agent_response(ticket_id="TICKET-001"):
    # Wrap the ticket ID in the chat message format the agent expects
    agent_input = {"messages": [{"role": "user", "content": ticket_id}]}
    # invoke() runs the full ReAct loop and returns the final graph state
    return agent.invoke(agent_input)
```
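`agent.invoke` returns the whole final state, so the answer has to be pulled out of the last message. Depending on the model integration, a message's `content` can be a plain string or a list of content blocks (dicts carrying a `text` entry, alongside non-text blocks such as tool-use requests), so a defensive helper can accept both. The hypothetical `content_to_text` below is a sketch under that assumption; it works on the `.content` value, so you would call it as `content_to_text(response["messages"][-1].content)`.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Normalize message content (str or list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content or []:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and "text" in block:
            # Blocks without a "text" key (e.g. tool-use requests) are skipped
            parts.append(block["text"])
    return "\n".join(parts)


print(content_to_text([
    {"type": "text", "text": "1. Check network connectivity"},
    {"type": "tool_use", "name": "log_identifier"},
]))
```

Because it only looks for a `text` key, the same helper also accepts Strands-style content blocks such as `[{"text": "..."}]`.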

Finally, we test our agent with a sample ticket ID. The agent should:

1. Use the log_identifier tool to find that TICKET-001 corresponds to "Connection Timeout"
2. Use the information_retriever tool to get the solution steps
3. Return the formatted solution steps to the user

When you run this code, you should see a numbered list of steps for resolving connection timeout issues, demonstrating that your agent successfully chained the tools together to provide a complete solution.

```python
langGraph_agent_response = get_langGraph_agent_response(ticket_id="TICKET-001")
print(langGraph_agent_response["messages"][-1].content)
```

```
1. Check network connectivity between client and server
2. Verify if the server is running and accessible
3. Increase the connection timeout settings
4. Check for firewall rules blocking the connection
5. Monitor network latency and bandwidth
```

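The system prompt asks for numbered steps only, with no introduction or conclusion. A quick, hypothetical check like `is_numbered_steps_only` can confirm that a response follows that format, for example when you try several ticket IDs; it assumes steps are numbered consecutively starting at 1.

```python
import re


def is_numbered_steps_only(text: str) -> bool:
    """True if every non-blank line is 'N. text' with N = 1, 2, 3, ... in order."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return False
    for expected, line in enumerate(lines, start=1):
        match = re.match(r"(\d+)\.\s+\S", line)
        if match is None or int(match.group(1)) != expected:
            return False
    return True


response = """1. Check network connectivity between client and server
2. Verify if the server is running and accessible
3. Increase the connection timeout settings
    """
print(is_numbered_steps_only(response))                                # True
print(is_numbered_steps_only("Sure! Here are the steps:\n1. Check"))  # False
```

In the notebook you would pass `langGraph_agent_response["messages"][-1].content` instead of the sample string.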
### Strands Agents

Strands Agents provides custom-tool functionality very similar to LangGraph's. Let's first import what we need from the Strands Agents framework:

```python
from strands import Agent, tool
```


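In Strands Agents, you define your own tools with the Strands `@tool` decorator, just as we did for the LangGraph agent. The tool code is the same except that `information_retriever` has no `return_direct=True`, so its output goes back to the model, which then writes the final answer.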

```python
@tool
def log_identifier(ticket_id: str) -> str:
    """Get the error type for a ticket ID

    Args:
        ticket_id: the ticket ID, e.g. "TICKET-001"

    Returns:
        the error type recorded for that ticket
    """
    if ticket_id not in log_data_set:
        return "ticket id not found in the database"

    for item in log_data:
        if item["id"] == ticket_id:
            return item["error_name"]
    # Fallback so the tool always returns a string, even if the two data sets disagree
    return "ticket id not found in the database"


@tool
def information_retriever(error_type: str) -> str:
    """Retrieve the resolution steps for an error type

    Args:
        error_type: the error type returned by log_identifier

    Returns:
        a string of numbered resolution steps
    """
    if error_type not in knowledge_base:
        return "error type not found in the knowledge base, please use your own knowledge"

    return knowledge_base[error_type]
```



We reuse the same system prompt as the LangGraph agent. In Strands Agents, you define an agent with the `Agent` class: `Agent(model=..., system_prompt=..., tools=[...])`. Passing a model ID string, as below, uses Amazon Bedrock as the model provider.

```python
system_prompt = """
You are an expert at resolving ETL errors. You are equipped with two tools:
1. log_identifier: Get the error type from a ticket ID
2. information_retriever: Retrieve the error solution based on the error type

Use the ticket ID to gather information about the error with the log_identifier tool.
Then search the knowledge base for information on how to resolve the error with the information_retriever tool.

Return ONLY the numbered steps without any introduction or conclusion. Format as:
1. step 1 text
2. step 2 text
...
"""

strands_agent = Agent(
    model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    system_prompt=system_prompt,
    tools=[log_identifier, information_retriever],
)
```

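Now let's define a `get_strands_agent_response` function that invokes the agent and extracts the final response content. The agent automatically uses the tools in the correct sequence to resolve the ticket. By default, the Strands agent also streams its intermediate messages and tool calls to standard output, which is why they appear in the output before the final answer.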

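```python
def get_strands_agent_response(ticket_id="TICKET-001"):
    # Calling the agent runs the full tool-use loop and returns an AgentResult
    strands_agent_result = strands_agent(ticket_id)

    try:
        # The final assistant message is a dict whose "content" is a list of blocks;
        # the last block normally holds the answer text
        return strands_agent_result.message["content"][-1]["text"]
    except (KeyError, IndexError, TypeError):
        # Fall back to the raw content blocks if the expected shape is missing
        return strands_agent_result.message["content"]


strands_agent_response = get_strands_agent_response(ticket_id="TICKET-001")
print("\nFinal Answer: \n", strands_agent_response)
```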

```
I'll help you resolve the error for TICKET-001 by first identifying the error type and then retrieving the solution.
Tool #1: log_identifier
Now, I'll retrieve the solution for the Connection Timeout error:
Tool #2: information_retriever
1. Check network connectivity between client and server
2. Verify if the server is running and accessible
3. Increase the connection timeout settings
4. Check for firewall rules blocking the connection
5. Monitor network latency and bandwidth
Final Answer: 
 1. Check network connectivity between client and server
2. Verify if the server is running and accessible
3. Increase the connection timeout settings
4. Check for firewall rules blocking the connection
5. Monitor network latency and bandwidth
```
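Because both helper functions take a ticket ID, you can drive either framework through the same loop. The hypothetical `run_batch` below accepts any callable that maps a ticket ID to text; the usage example passes a stub so it runs without AWS access. In the notebook you would pass `get_strands_agent_response`, or a small wrapper such as `lambda t: get_langGraph_agent_response(t)["messages"][-1].content`, since the LangGraph helper returns the full state rather than text.

```python
from typing import Callable, Dict, Iterable


def run_batch(get_response: Callable[[str], str], ticket_ids: Iterable[str]) -> Dict[str, str]:
    """Call `get_response` once per ticket and collect the answers by ticket ID."""
    results = {}
    for ticket_id in ticket_ids:
        try:
            results[ticket_id] = get_response(ticket_id)
        except Exception as exc:  # keep going if one ticket fails (e.g. a throttling error)
            results[ticket_id] = f"ERROR: {exc}"
    return results


# Hypothetical stub in place of a real agent, so this runs offline
def fake_agent(ticket_id: str) -> str:
    if ticket_id == "TICKET-404":
        raise ValueError("unknown ticket")
    return f"1. Steps for {ticket_id}"


print(run_batch(fake_agent, ["TICKET-001", "TICKET-404"]))
```

One design point to keep in mind: a Strands `Agent` instance keeps its conversation history between calls, so batching through a single `strands_agent` leaves earlier tickets in the model's context; creating a fresh `Agent` per ticket is one way to avoid that.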
