# SageMaker MLflow Agent Introduction

In this tutorial, you will learn how to build an agent with the LangGraph framework and use MLflow to log its traces. We'll build an intelligent ETL error-resolution agent that analyzes error tickets and provides step-by-step solutions. The agent is equipped with two tools, `log_identifier` and `information_retriever`, to:
1. Identify errors from ticket IDs.
2. Retrieve relevant solutions from a solution book.

![Sample architecture](./static/sample-architect.png)

Here are the ideal resolution steps:
1. The user asks for help with a ticket, identified by its `ticket_id`.
2. The agent uses the `log_identifier` tool to find the error type associated with that ticket.
3. The agent uses the `information_retriever` tool to get step-by-step solutions.
4. The agent returns clear, actionable resolution steps to the user.
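Stripped of the LLM, the four steps above are a fixed two-stage lookup. The sketch below shows that sequence as plain Python so the agent's target behavior is explicit. It is illustrative only: `resolve_ticket`, the sample ticket, and the sample steps are hypothetical stand-ins, not the workshop's data or tools.

```python
from typing import Callable, Optional


def resolve_ticket(
    ticket_id: str,
    identify_error: Callable[[str], Optional[str]],
    retrieve_solution: Callable[[str], Optional[str]],
) -> str:
    """Hypothetical helper: run the two lookups in the fixed order the agent should follow."""
    # Step 2: ticket ID -> error type
    error_type = identify_error(ticket_id)
    if error_type is None:
        return f"No error found for {ticket_id}"
    # Step 3: error type -> resolution steps
    steps = retrieve_solution(error_type)
    if steps is None:
        return f"No solution found for {error_type}"
    # Step 4: hand the steps back to the user
    return steps


# Hypothetical sample data standing in for the workshop's log data and solution book
errors = {"TICKET-001": "Connection Timeout"}
solutions = {"Connection Timeout": "1. Check network connectivity\n2. Retry the job"}

print(resolve_ticket("TICKET-001", errors.get, solutions.get))
```

In the agent, this ordering is not hard-coded: the LLM decides which tool to call next based on the system prompt and the tool docstrings.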

Let's start by setting up our data and building the agent step by step. 

In [None]:
import os
# Use the AWS CLI profile and region configured for this workshop
os.environ["AWS_PROFILE"] = "mlflow-workshop"
os.environ["AWS_REGION"] = "us-east-1"

## Load data and libraries

In [None]:
import boto3

## Import data
from data.data import log_data_set, log_data
from data.solution_book import solution_book

## Import MLflow libs
import sagemaker_mlflow  # SageMaker plugin that lets MLflow use a tracking server ARN as the tracking URI
import mlflow

## Check AWS credentials
try:
    # Creating a client alone does not validate credentials; an STS call does
    boto3.client('sts').get_caller_identity()
except Exception as e:
    print(f"Error configuring AWS credentials: {e}")
    print("Please set your AWS credentials before proceeding.")

> At the time of writing, the latest MLflow version supported by SageMaker AI is MLflow 3.0, with Python 3.9 or later. You can find more information [here](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html). To make sure the notebook runs with compatible versions, install `sagemaker_mlflow==0.1.0` and `mlflow==3.0.0`.
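If you are unsure which versions are installed in your kernel, a quick check with the standard library's `importlib.metadata` can catch a mismatch before anything else runs. This is a minimal sketch: the pins simply mirror the note above, and `check_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the note above
EXPECTED = {"sagemaker_mlflow": "0.1.0", "mlflow": "3.0.0"}


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if installed != wanted:
            problems[package] = f"installed {installed}, expected {wanted}"
    return problems


for package, problem in check_versions(EXPECTED).items():
    print(f"{package}: {problem}")
```

No output means both pins match.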


First, let's examine our synthetic sample data, which represents a typical support ticket system. `log_data` is a list of dictionaries, each containing a ticket ID and an error name; for this workshop, we use this simplified version to look up the `error_name` for a given `ticket_id`. (The tools also use `log_data_set`, imported alongside `log_data`, to check whether a ticket ID exists.) `solution_book` is a dictionary whose keys are error names and whose values are the solution steps. The dictionary mimics a real-world setup, which would normally retrieve relevant solutions from a vector store knowledge base.
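To make the shapes concrete, here is a hypothetical miniature version of the two structures. The field names `id` and `error_name` match what the tools read later; the records and solution text are invented for illustration, and deriving the ID set from the list is an assumption about what `log_data_set` holds (the tools only use it for membership checks). The `sample_` prefix keeps these from overwriting the real imported data.

```python
# Hypothetical miniature versions of the workshop data (not the real contents)
sample_log_data = [
    {"id": "TICKET-001", "error_name": "Connection Timeout"},
    {"id": "TICKET-002", "error_name": "Schema Mismatch"},
]

# Assumption: log_data_set holds the ticket IDs, making "is this ticket known?" a set lookup
sample_log_data_set = {item["id"] for item in sample_log_data}

# Keys are error names; values are the resolution steps as one string
sample_solution_book = {
    "Connection Timeout": "1. Verify network connectivity to the source\n2. Increase the connection timeout\n3. Re-run the job",
    "Schema Mismatch": "1. Compare source and target schemas\n2. Update the column mapping\n3. Re-run the job",
}

# Simplified lookup: ticket ID -> error name -> solution steps
ticket_id = "TICKET-001"
error_name = next(item["error_name"] for item in sample_log_data if item["id"] == ticket_id)
print(error_name)
print(sample_solution_book[error_name])
```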

In [None]:
log_data

In [None]:
solution_book

## Setting Up MLflow

Now put your tracking server ARN into the placeholder string below. You can use any experiment name; in this workshop, we use `agent-mlflow-demo`.

In [None]:
tracking_server_arn = "ENTER YOUR MLFLOW TRACKING SERVER ARN HERE"
experiment_name = "agent-mlflow-demo"
mlflow.set_tracking_uri(tracking_server_arn) 
mlflow.set_experiment(experiment_name)
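A common stumbling block is leaving the placeholder in place or pasting the wrong ARN. The sketch below is a hypothetical pre-flight format check, assuming SageMaker MLflow tracking server ARNs follow the pattern `arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<name>`; confirm the exact ARN in the SageMaker console.

```python
import re

# Assumed ARN layout: arn:<partition>:sagemaker:<region>:<12-digit account>:mlflow-tracking-server/<name>
TRACKING_SERVER_ARN = re.compile(
    r"^arn:aws[\w-]*:sagemaker:[a-z0-9-]+:\d{12}:mlflow-tracking-server/[A-Za-z0-9-]+$"
)


def looks_like_tracking_server_arn(value: str) -> bool:
    """Hypothetical helper: cheap format check before calling mlflow.set_tracking_uri."""
    return bool(TRACKING_SERVER_ARN.match(value.strip()))


print(looks_like_tracking_server_arn("ENTER YOUR MLFLOW TRACKING SERVER ARN HERE"))
print(looks_like_tracking_server_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:mlflow-tracking-server/my-server"
))
```

In the notebook you could call it on `tracking_server_arn` and stop if it returns `False`. It only checks the shape of the string, not whether the server exists or whether you can access it.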

Since we are using a LangGraph agent, let's enable automatic tracing. LangGraph is built on LangChain, so MLflow's LangChain autologging also traces LangGraph agents.

> `mlflow.langchain.autolog()` is a function in the MLflow LangChain flavor that enables automatic logging of key details about LangChain models and their execution, so you don't need explicit logging statements. By default, it logs traces of your LangChain components, giving a visual representation of how data flows through chains, agents, and retrievers. Traced calls include the `invoke`, `batch`, `stream`, `ainvoke`, `abatch`, and `astream` methods, `get_relevant_documents` (for retrievers), and `__call__` (for Chains and AgentExecutors).

> For **Strands Agents**, you can use `mlflow.strands.autolog()`. However, it is only supported in MLflow 3.4.0 or later.

In [None]:
mlflow.langchain.autolog()

## LangGraph Agent Implementation

First, let's import the libraries we need from LangGraph and LangChain:
1. `create_react_agent`: creates an agent that uses ReAct-style reasoning, calling tools in a loop until it can answer.
2. `init_chat_model`: initializes a chat model in a single line from the model's name and provider.
3. `tool` (from `langchain_core.tools`): a decorator that turns a function or coroutine into a tool. You can customize the tool by passing arguments to `@tool`.
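By default, `@tool` takes the tool's name from the function name, its description from the docstring, and its argument schema from the type-annotated parameters; that is what the model reads when deciding which tool to call. The stdlib-only sketch below imitates that idea so you can see which information is harvested. `describe_tool` and `example_log_identifier` are hypothetical, and this is not how LangChain implements `@tool` internally.

```python
import inspect


def describe_tool(func) -> dict:
    """Hypothetical helper: gather the function details a tool-calling model relies on."""
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            params[name] = "unannotated"
        else:
            params[name] = getattr(annotation, "__name__", str(annotation))
    return {
        "name": func.__name__,                      # @tool uses this as the tool name by default
        "description": inspect.getdoc(func) or "",  # @tool uses the docstring as the description by default
        "parameters": params,                       # type hints feed the argument schema
    }


def example_log_identifier(ticket_id: str) -> str:
    """Get error type from ticket number"""
    return "Connection Timeout"  # placeholder body; only the metadata matters here


print(describe_tool(example_log_identifier))
```

This is why the docstrings and type hints in the tools below matter: a vague docstring gives the model little to decide with.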

In [None]:
from langgraph.prebuilt import create_react_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool

Now let's create the tools that our agent will use to interact with the data:

In [None]:
@tool 
def log_identifier(ticket_id: str) -> str:
    """Get error type from ticket number

    Args:
        ticket_id: ticket id

    Returns:
        an error type

    """
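    if ticket_id not in log_data_set:
        return "ticket id not found in the database"

    for item in log_data:
        if item["id"] == ticket_id:
            return item["error_name"]

    # Fallback in case log_data_set and log_data are out of sync
    return "ticket id not found in the database"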

@tool(return_direct=True)
def information_retriever(error_type: str) -> str:
    """Retrieve error solution based on error type

    Args:
        error_type: user input error type

    Returns:
        a str of steps
    """

    if error_type not in solution_book:
        return "error type not found in the knowledge base, please use your own knowledge"

    return solution_book[error_type]



**Explanation**: 
1. The `log_identifier` tool is our first agent tool. It takes a ticket ID as input and searches the log data for the corresponding error type. The `@tool` decorator from LangChain converts this function into a tool that the agent can use. Notice the detailed docstring: it is crucial, because the agent uses it to understand when and how to use the tool.
2. The `information_retriever` tool looks up solutions in our knowledge base. The `return_direct=True` parameter is important here: it tells the agent to return the tool's result directly to the user without further processing by the model, which suits our final solution steps.

Both functions carry the `@tool` decorator above their definitions, which marks them as LangChain tools that the LangGraph agent can call.
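`log_identifier` scans the whole `log_data` list on every call, which is fine for a small workshop dataset. If the log data grew, one option (our suggestion, not what the workshop code does) is to build an ID-to-error dictionary once and look tickets up in constant time. `build_error_index`, `identify_error`, and the sample records are hypothetical.

```python
def build_error_index(records: list[dict]) -> dict[str, str]:
    """Hypothetical helper: map each ticket ID to its error name, built once up front."""
    return {item["id"]: item["error_name"] for item in records}


def identify_error(index: dict[str, str], ticket_id: str) -> str:
    # Same fallback message as the log_identifier tool
    return index.get(ticket_id, "ticket id not found in the database")


sample_records = [
    {"id": "TICKET-001", "error_name": "Connection Timeout"},
    {"id": "TICKET-002", "error_name": "Schema Mismatch"},
]
error_index = build_error_index(sample_records)
print(identify_error(error_index, "TICKET-002"))
print(identify_error(error_index, "TICKET-404"))
```

One behavioral difference: if a ticket ID appears twice, the dictionary keeps the last record, whereas the loop in `log_identifier` returns the first match. To use this as an agent tool, you would still wrap the lookup in a function decorated with `@tool` and keep its docstring.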

### Setting Up Language Model

We initialize our language model with Claude 3.5 Haiku through Amazon Bedrock. This model powers the agent's reasoning and decision-making. The `init_chat_model` function provides a standardized way to initialize models from different LLM providers.

In [None]:
llm = init_chat_model(
    model="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    model_provider="bedrock_converse",
)

The system prompt is crucial for defining the agent's personality and behavior. We tell the agent it's an ETL error resolution expert and provide clear instructions about the available tools and the expected workflow. The formatting requirements ensure consistent, clean output that users can easily follow.

The `create_react_agent` function creates a ReAct (Reasoning and Acting) agent. This type of agent can reason about problems and decide which tools to use in what order. We pass in our language model, the tools we created, and our system prompt to define the agent's capabilities and behavior.
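Conceptually, the agent runs a loop: the model looks at the conversation so far and either requests a tool call or answers, and each tool result is appended as an observation for the next round. A tool marked `return_direct=True` ends the loop with its own output. The toy below imitates that control flow with a scripted stand-in for the LLM; `scripted_model`, `run_agent_loop`, `Decision`, and the sample data are hypothetical, the user message is just a bare ticket ID, and the real `create_react_agent` graph is more involved (message objects, parallel tool calls, error handling).

```python
from typing import Callable, NamedTuple, Optional

# Hypothetical sample data and plain-function tools (not the workshop's tools)
ERRORS = {"TICKET-001": "Connection Timeout"}
SOLUTIONS = {"Connection Timeout": "1. Check connectivity\n2. Retry the job"}

TOOLS: dict[str, Callable[[str], str]] = {
    "log_identifier": lambda ticket_id: ERRORS.get(ticket_id, "ticket id not found in the database"),
    "information_retriever": lambda error_type: SOLUTIONS.get(
        error_type, "error type not found in the knowledge base"
    ),
}
RETURN_DIRECT = {"information_retriever"}  # mirrors @tool(return_direct=True)


class Decision(NamedTuple):
    tool: Optional[str] = None  # tool to call, or None when the model answers
    arg: str = ""               # argument for the tool call
    answer: str = ""            # final answer when tool is None


def scripted_model(history: list[tuple[str, str]]) -> Decision:
    """Stand-in for the LLM: picks the next action from what it has observed so far."""
    observations = [content for role, content in history if role == "tool"]
    if not observations:
        # Reason: no error type yet, so identify it from the ticket ID in the user message
        return Decision(tool="log_identifier", arg=history[0][1])
    if len(observations) == 1:
        # Reason: the error type is known, so look up its solution
        return Decision(tool="information_retriever", arg=observations[0])
    return Decision(answer=observations[-1])


def run_agent_loop(user_message: str, max_steps: int = 5) -> str:
    history = [("user", user_message)]
    for _ in range(max_steps):
        decision = scripted_model(history)
        if decision.tool is None:
            return decision.answer                        # model chose to answer
        observation = TOOLS[decision.tool](decision.arg)  # act
        history.append(("tool", observation))             # observe
        if decision.tool in RETURN_DIRECT:
            return observation                            # return_direct ends the loop
    return "stopped: step limit reached"


print(run_agent_loop("TICKET-001"))
```

If you remove `information_retriever` from `RETURN_DIRECT`, the loop takes one more model turn and the scripted model answers with the last observation, which here yields the same text.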

In [None]:
system_prompt = """
You are an expert at resolving ETL errors. You are equipped with two tools:
1. log_identifier: Get error type from ticket number
2. information_retriever: Retrieve error solution based on error type

You will use the ticket ID to gather information about the error using the log_identifier tool.
Then you should search the database for information on how to resolve the error using the information_retriever tool.

Return ONLY the numbered steps without any introduction or conclusion. Format as:
1. step 1 text
2. step 2 text
...
"""

agent = create_react_agent(
    model=llm,
    tools=[log_identifier, information_retriever],
    prompt=system_prompt
)

Now let's define a small helper for interacting with the agent. It formats the input as a chat message, invokes the agent, and returns the content of the final message. The agent should then use the tools in the right sequence on its own to resolve the ticket.

In [None]:
def get_langGraph_agent_response(user_prompt):
    # Prepare input for the agent
    agent_input = {"messages": [{"role": "user", "content": user_prompt}]}
    response = agent.invoke(agent_input)
    return response['messages'][-1].content
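The helper returns `response['messages'][-1].content`. That value is usually a string, but depending on the model provider and on whether the last message came from the model or from a `return_direct` tool, it can also be a list of content blocks (for example `{"type": "text", "text": ...}` dictionaries). If you hit that case, a small normalizer like the hypothetical `content_to_text` below keeps the output printable; it works on plain Python values, so it can be tested without calling Bedrock.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Hypothetical helper: flatten message content (str or list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                parts.append(block.get("text", ""))
            # Other block types (e.g. tool-use requests) are skipped
        return "".join(parts)
    return str(content)


print(content_to_text("1. Restart the job"))
print(content_to_text([{"type": "text", "text": "1. Restart "}, {"type": "text", "text": "the job"}]))
```

In the notebook you could return `content_to_text(response['messages'][-1].content)` instead of the raw content.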

Finally, we test our agent with a sample ticket ID. The agent should:

1. Use the log_identifier tool to find that TICKET-001 corresponds to "Connection Timeout"
2. Use the information_retriever tool to get the solution steps
3. Return the formatted solution steps to the user

When you run this code, you should see a numbered list of steps for resolving connection timeout issues, demonstrating that your agent successfully chained the tools together to provide a complete solution.
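Because the system prompt asks for numbered steps only, you can check the response format programmatically instead of by eye. This is a minimal sketch using a regular expression; `parse_numbered_steps` and the sample text are hypothetical, and the pattern assumes one step per line in the `1. step text` form the prompt requests.

```python
import re

STEP_LINE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def parse_numbered_steps(text: str) -> list[str]:
    """Hypothetical helper: return the step texts if every non-empty line is a numbered step in order, else []."""
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = STEP_LINE.match(line)
        # Reject intros, conclusions, or out-of-order numbering
        if match is None or int(match.group(1)) != len(steps) + 1:
            return []
        steps.append(match.group(2))
    return steps


sample = "1. Check network connectivity\n2. Increase the timeout\n3. Re-run the job"
print(parse_numbered_steps(sample))
```

After running the cells below, `parse_numbered_steps(langGraph_agent_response)` returning an empty list would flag extra text around the steps. Because `information_retriever` uses `return_direct=True`, the text you get back is the solution book entry itself, so this also checks that entry's format.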

In [None]:
langGraph_agent_response = get_langGraph_agent_response(user_prompt='Can you help me with this ticket_id : TICKET-001?')

In [None]:
print(langGraph_agent_response)

You should now be able to see the trace for this agent run in the MLflow UI, under the `agent-mlflow-demo` experiment.