# MLflow Tracing for AI Agents

In this section, we show you how to use MLflow to see how an agent reasons, selects tools, and executes them to produce the final response.

## Load data & MLflow Package

In [1]:
import boto3
import os

from data.data import log_data_set, log_data
from data.solution_book import knowledge_base

import sagemaker_mlflow
import mlflow

print(sagemaker_mlflow.__version__)
print(mlflow.__version__)

0.1.0
3.0.0


> The latest MLflow version supported with SageMaker AI is MLflow 3.0, which requires Python 3.9 or later. You can find more information [here](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html). To make sure the code runs with compatible versions, install `sagemaker_mlflow==0.1.0` and `mlflow==3.0.0`.
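
If you want to confirm the pinned versions before running the rest of the notebook, a check along the following lines may help. This is a minimal sketch: `check_pins` is a hypothetical helper, and the distribution name `sagemaker-mlflow` is assumed to be the name under which `sagemaker_mlflow` is installed.

```python
from importlib import metadata

# Pins taken from the note above. "sagemaker-mlflow" is the assumed
# distribution name for the sagemaker_mlflow package.
REQUIRED = {"sagemaker-mlflow": "0.1.0", "mlflow": "3.0.0"}


def check_pins(required, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in required.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Report only the packages whose installed version differs from the pin.
for package, (expected, installed) in check_pins(REQUIRED).items():
    print(f"{package}: expected {expected}, found {installed}")
```

If nothing is printed, both packages match the pinned versions.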

Once you have started the tracking server, copy its `tracking_server_arn` and use it to set the tracking URI so that you can monitor your agent experiments in the MLflow user interface. You can find the ARN here:

![MLflow Tracking Server ARN](./static/agent/2_agent_mlflow/mlflow_tracking_server_arn.png) 

Now paste your tracking server ARN into the placeholder string below. You can choose any experiment name; in this workshop, we use `agent-mlflow-demo`.

In [None]:
tracking_server_arn = "" 
experiment_name = "agent-mlflow-demo"
mlflow.set_tracking_uri(tracking_server_arn) 
mlflow.set_experiment(experiment_name)

<Experiment: artifact_location='s3://agent-mlflow/1', creation_time=1758206034961, experiment_id='1', last_update_time=1758206034961, lifecycle_stage='active', name='agent-mlflow-demo', tags={}>
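
Because the placeholder starts out empty, it can be useful to check the value before passing it to `mlflow.set_tracking_uri()`. The sketch below is only an illustration: `looks_like_tracking_server_arn` is a hypothetical helper, and the ARN shape in the regular expression is an assumption about how SageMaker formats tracking server ARNs, so adjust it if yours looks different.

```python
import re

# Assumed ARN shape for a SageMaker MLflow tracking server:
# arn:<partition>:sagemaker:<region>:<12-digit account>:mlflow-tracking-server/<name>
_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:mlflow-tracking-server/.+$"
)


def looks_like_tracking_server_arn(value: str) -> bool:
    """Return True when `value` matches the assumed tracking server ARN shape."""
    return bool(_ARN_PATTERN.match(value.strip()))


tracking_server_arn = ""  # paste your ARN here, as in the cell above
if looks_like_tracking_server_arn(tracking_server_arn):
    print("ARN looks valid; pass it to mlflow.set_tracking_uri().")
else:
    print("Set tracking_server_arn before calling mlflow.set_tracking_uri().")
```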

Let's open MLflow and take a look at the user interface.

![Open MLflow](./static/agent/2_agent_mlflow/open_mlflow.png) 

Click ***Open MLflow*** to go to the MLflow UI page. There you can see your **agent-mlflow-demo** experiment under the Experiments section at the top left.

## MLflow Tracing for LangGraph

Let's use the same example as in the Agent section. Before we look at traces, let's set up automatic tracing for LangGraph.

In [3]:
mlflow.langchain.autolog()

> `mlflow.langchain.autolog()` is a function within the MLflow LangChain flavor that enables automatic logging of crucial details about LangChain models and their execution. This feature simplifies experiment tracking and analysis by eliminating the need for explicit logging statements. By default, `mlflow.langchain.autolog()` automatically logs traces of your LangChain components, providing a visual representation of data flow through chains, agents, and retrievers. This includes invocations of methods like `invoke`, `batch`, `stream`, `ainvoke`, `abatch`, `astream`, `get_relevant_documents` (for retrievers), and `__call__` (for Chains and AgentExecutors).

> For **Strands Agents**, you can use `mlflow.strands.autolog`. However, this is only supported in MLflow 3.4.0 and later.

We will use the exact same code we showed for LangGraph:

In [4]:
from langgraph.prebuilt import create_react_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool

@tool
def log_identifier(ticket_id: str) -> str:
    """Get the error type for a ticket ID

    Args:
        ticket_id: ticket id

    Returns:
        an error type

    """
    if ticket_id not in log_data_set:
        return "ticket id not found in the database"

    for item in log_data:
        if item["id"] == ticket_id:
            return item["error_name"]

    # Return the same message if the id set and the records disagree,
    # so the tool never returns None.
    return "ticket id not found in the database"

@tool(return_direct=True)
def information_retriever(error_type: str) -> str:
    """Retrieve the error solution based on error type

    Args:
        error_type: user input error type

    Returns:
        a str of steps
    """
    # return_direct=True makes the agent return this output as the final answer
    if error_type not in knowledge_base:
        return "error type not found in the knowledge base, please use your own knowledge"

    return knowledge_base[error_type]

llm = init_chat_model(
    model= "us.anthropic.claude-3-5-haiku-20241022-v1:0",
    model_provider="bedrock_converse",
)

system_prompt = """
You are an expert at resolving ETL errors. You are equipped with two tools:
1. log_identifier: Get error type from ticket number
2. information_retriever: Retrieve error solution based on error type

You will use the ticket ID to gather information about the error using the log_identifier tool. 
Then you should search the database for information on how to resolve the error using the information_retriever tool

Return ONLY the numbered steps without any introduction or conclusion. Format as:
1. step 1 text
2. step 2 text
...
"""

agent = create_react_agent(
    model=llm,
    tools=[log_identifier, information_retriever],
    prompt=system_prompt,
)

def get_langGraph_agent_response(ticket_id="TICKET-001"):
    # Send the ticket ID to the agent as a single user message
    agent_input = {"messages": [{"role": "user", "content": ticket_id}]}
    response = agent.invoke(agent_input)
    return response

langGraph_agent_response = get_langGraph_agent_response(ticket_id="TICKET-001")
# The last message holds the final answer (here, the direct tool output)
print(langGraph_agent_response["messages"][-1].content)

content.str
  Input should be a valid string [type=string_type, input_value=[{'type': 'tool_use', 'na...AHhtWz0QzWx-VRT_Xr93w'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
content.list[tagged-union[TextContentPart,ImageContentPart,AudioContentPart]].0
  Input tag 'tool_use' found using 'type' does not match any of the expected tags: 'text', 'image_url', 'input_audio' [type=union_tag_invalid, input_value={'type': 'tool_use', 'nam...gAHhtWz0QzWx-VRT_Xr93w'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/union_tag_invalid


1. Check network connectivity between client and server
2. Verify if the server is running and accessible
3. Increase the connection timeout settings
4. Check for firewall rules blocking the connection
5. Monitor network latency and bandwidth
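
Stripped of the LLM, the two tools in the cell above amount to a two-step lookup: ticket ID to error type, then error type to resolution steps. The sketch below walks through that chain in plain Python. The sample records and the `identify_error` and `resolve` helpers are hypothetical; the real data comes from the `data.data` and `data.solution_book` modules and may differ.

```python
# Hypothetical sample records in the shape the tools above expect.
log_data = [
    {"id": "TICKET-001", "error_name": "Connection Timeout"},
    {"id": "TICKET-002", "error_name": "Schema Mismatch"},
]
log_data_set = {item["id"] for item in log_data}
knowledge_base = {
    "Connection Timeout": "1. Check network connectivity\n2. Increase the timeout",
}


def identify_error(ticket_id):
    """Step 1: map a ticket ID to its error type (mirrors log_identifier)."""
    if ticket_id not in log_data_set:
        return None
    for item in log_data:
        if item["id"] == ticket_id:
            return item["error_name"]
    return None


def resolve(ticket_id):
    """Chain both lookups in the order the system prompt asks for."""
    error_type = identify_error(ticket_id)
    if error_type is None:
        return "ticket id not found in the database"
    # Step 2: map the error type to its steps (mirrors information_retriever).
    return knowledge_base.get(
        error_type,
        "error type not found in the knowledge base, please use your own knowledge",
    )


print(resolve("TICKET-001"))
```

The agent adds the decision layer on top of this chain: the LLM chooses which tool to call and with which arguments, and that decision process is what the trace records.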
    


Now you can invoke the agent to get the results for TICKET-001:

:::code{showCopyAction=true showLineNumbers=true language=python}
langGraph_agent_response = get_langGraph_agent_response(ticket_id = 'TICKET-001')
print(langGraph_agent_response)
:::

The trace now appears in the UI. As you run more invocations, more traces appear in the dashboard.

![MLflow Trace](./static/agent/2_agent_mlflow/trace-example.png) 

When you click a Request ID, you can see the full agent trajectory, including both the agent's decisions and its tool use.

![MLflow Trace](./static/agent/2_agent_mlflow/trace-details.png) 

Let's take a look at how the agent works. In the initial stage, the LLM starts reasoning and decides which tool to use. It generates the tool name and the tool arguments needed to execute the tool.

![MLflow Trace](./static/agent/2_agent_mlflow/langgraph-step1.png) 

The agent then executes the tool. You can see the tool name, argument details, and output in the orange box.

![MLflow Trace](./static/agent/2_agent_mlflow/langgraph-tool1.png) 

After the tool executes, the agent decides whether to call another tool based on the current context.

![MLflow Trace](./static/agent/2_agent_mlflow/langgraph-step2.png) 

The agent then executes the second tool. Because `information_retriever` is defined with `return_direct=True`, its output is returned directly as the final answer, and the agent stops here.

![MLflow Trace](./static/agent/2_agent_mlflow/langgraph-tool2.png) 
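
The same trajectory can also be read from the `messages` list in the agent response. The sketch below is a rough illustration rather than part of the workshop code: `summarize_trajectory` is a hypothetical helper, and it assumes each message exposes `type` and `content`, with `tool_calls` on AI messages and `name` on tool messages. The `SimpleNamespace` objects are stand-ins so the example runs without calling the model.

```python
from types import SimpleNamespace


def summarize_trajectory(messages):
    """Return one line per step: user input, tool requests, and tool results."""
    lines = []
    for message in messages:
        kind = getattr(message, "type", "unknown")
        if kind == "ai":
            tool_calls = getattr(message, "tool_calls", None) or []
            for call in tool_calls:
                lines.append(f"ai -> call {call['name']}({call['args']})")
            if not tool_calls:
                lines.append(f"ai -> {message.content}")
        elif kind == "tool":
            lines.append(f"tool {getattr(message, 'name', '?')} -> {message.content}")
        else:
            lines.append(f"{kind} -> {message.content}")
    return lines


# Stand-in messages that mimic the four steps shown in the trace above.
sample_messages = [
    SimpleNamespace(type="human", content="TICKET-001"),
    SimpleNamespace(
        type="ai",
        content="",
        tool_calls=[{"name": "log_identifier", "args": {"ticket_id": "TICKET-001"}}],
    ),
    SimpleNamespace(type="tool", name="log_identifier", content="Connection Timeout"),
    SimpleNamespace(
        type="ai",
        content="",
        tool_calls=[
            {"name": "information_retriever", "args": {"error_type": "Connection Timeout"}}
        ],
    ),
    SimpleNamespace(
        type="tool", name="information_retriever", content="1. Check network connectivity"
    ),
]

for line in summarize_trajectory(sample_messages):
    print(line)
```

With the real response, you would pass `langGraph_agent_response["messages"]` instead of `sample_messages`. Because the second tool returns directly, the list ends on a tool message rather than on a final AI message.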
