#### **How to use MLflow with LLMs**

The same tools that we have seen for traditional models can also be used with LLMs:
- **Experiment tracking**: collects models, prompts, traces and metrics in a single place, along with information about document retrieval, data queries and tool calls.
- **Tracing**: collects runtime information such as retrieval steps, tool calls and data queries.
- **Packaging**: manages the moving pieces of a GenAI system.
- **Evaluation**: makes it possible to compare different models on metrics such as latency and answer correctness.
- **Model Serving**: models can be deployed on a Kubernetes cluster, on cloud providers, etc.
- **Prompt Engineering UI**: used to modify the prompt in order to obtain better results.
- **MLflow AI Gateway**: offers a unified endpoint for accessing models and LLM providers.

The main difference between MLflow model serving and the MLflow AI Gateway is that the former exposes a single model that we query through an HTTP request, while the latter is a service built on top of MLflow that sits in front of several models or LLM providers and offers a unified endpoint, which makes deployment, scaling and management across different environments and infrastructure easier.
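To make the first half of that comparison concrete, here is a minimal sketch of the HTTP request a client could send to a model served with `mlflow models serve`. The host, the port (5001) and the payload content are assumptions made for illustration; the `/invocations` path and the `inputs` key follow MLflow's REST scoring convention. The request is only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Hypothetical address of a locally served model (an assumption, not taken from the text)
SERVING_URL = "http://localhost:5001/invocations"


def build_invocation_request(question: str) -> urllib.request.Request:
    """Build (but do not send) a scoring request for an MLflow-served model."""
    payload = {"inputs": [question]}
    return urllib.request.Request(
        SERVING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request("Why is the sky blue?")
print(request.get_method(), request.full_url)
# With a server running, the request would be sent with:
# urllib.request.urlopen(request).read()
```

The exact shape of the payload depends on the signature of the served model, so treat the `inputs` list above as a placeholder.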

#### **Tracing**

Once an experiment has been set, a trace works like a run but carries more information.

An MLflow trace consists of:
- TraceInfo: a lightweight snapshot of critical data about the overall trace. It includes logistical information such as the experiment_id (which determines where the trace is stored), trace-level data such as the start time and total execution time, and the tags and status of the trace as a whole.
- TraceData: the core of the trace information. It holds a list of Span objects that represent the individual steps of the trace. These **spans** are linked to one another in a hierarchical relationship, giving a clear order-of-operations view of what happened within your application during the trace. Each Span object contains information about the step being instrumented, including the span_id, name, start_time, parent_id, status, inputs, outputs, attributes, and events.
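The structure described above can be sketched with a few plain dataclasses. These are simplified stand-ins written for illustration, not MLflow's actual entity classes: the class names mirror the concepts, but the field selection, the `children_of` helper and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    span_id: str
    name: str
    parent_id: Optional[str]  # None marks the root span
    status: str = "OK"
    inputs: Any = None
    outputs: Any = None


@dataclass
class TraceInfo:
    experiment_id: str
    execution_time_ms: int
    status: str
    tags: dict = field(default_factory=dict)


@dataclass
class TraceData:
    spans: list


@dataclass
class Trace:
    info: TraceInfo
    data: TraceData


def children_of(trace: Trace, parent_id: Optional[str]) -> list:
    """Names of the spans whose parent is `parent_id` (hypothetical helper)."""
    return [s.name for s in trace.data.spans if s.parent_id == parent_id]


trace = Trace(
    info=TraceInfo(experiment_id="0", execution_time_ms=5587, status="OK"),
    data=TraceData(
        spans=[
            Span("1", "agent", None, inputs={"question": "Weather in Paris?"}),
            Span("2", "llm_call", "1"),
            Span("3", "tool_call", "1", inputs={"city": "Paris"}, outputs="rainy"),
        ]
    ),
)

print(children_of(trace, None))  # ['agent']
print(children_of(trace, "1"))   # ['llm_call', 'tool_call']
```

The `parent_id` links are what turn the flat list of spans into the hierarchy shown in the UI.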

You can track run information and traces at the same time: when the traced code executes inside a run, the traces are stored within that run.

Spans can be defined with the @mlflow.trace decorator, applied to a function that is called within the application, or with mlflow.start_span() when the span needs to be customized.

A trace is like a run that stores more information about the execution, and it can be explored through the MLflow UI.

In [1]:
import mlflow

# Enable auto-tracing for calls made through the OpenAI client
mlflow.openai.autolog()

# Optional: set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
# The experiment is named after Ollama, which serves open-source models locally
exp = mlflow.set_experiment("Ollama-v4")

In [2]:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # The local Ollama REST endpoint
    api_key="dummy",  # Required to instantiate OpenAI client, it can be a random string
)

In [30]:
# Because the call happens inside a run, the same trace is shown both in the run and in the Traces tab

with mlflow.start_run():
    response = client.chat.completions.create(
        model="llama3.2:1b",
        messages=[
            {"role": "system", "content": "You are a science teacher."},
            {"role": "user", "content": "Why is the sky blue?"},
        ],
        temperature=0.1,
        max_tokens=1000,
        # stream=True,  # with streaming, the chunks are stored in the Events tab
    )

    # Regular run data can be logged alongside the trace
    mlflow.log_param("p", 1)

    # With stream=True, consume the chunks as they arrive:
    # for chunk in response:
    #     print(chunk.choices[0].delta.content or "", end="")

🏃 View run nervous-stork-873 at: http://localhost:5000/#/experiments/923280573827939376/runs/06fe450af8284586b66654689c0569e6
🧪 View experiment at: http://localhost:5000/#/experiments/923280573827939376


#### **Trace Customization**

With autologging, the span type is assigned by default. Alternatively, you can set the span type yourself:
- @mlflow.trace(span_type=SpanType.RETRIEVER) uses one of the predefined span types
- with mlflow.start_span(name='add', span_type='MATH') defines a new, custom span type

Autologging can be mixed with manually defined spans.

To group multiple invocations together, set the same session_id on each of them.

In [3]:
import json
from openai import OpenAI
import mlflow
from mlflow.entities import SpanType

mlflow.openai.autolog()

with mlflow.start_run() as run: # otherwise Exception
    # Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
    @mlflow.trace(name="GetWeather", span_type=SpanType.TOOL, attributes={"Description": "Get weather for a given city"})
    def get_weather(city: str) -> str:
        if city == "Tokyo":
            return "sunny"
        elif city == "Paris":
            return "rainy"
        return "unknown"


    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                },
            },
        }
    ]

    _tool_functions = {"get_weather": get_weather}


    # Define a simple tool calling agent
    @mlflow.trace(span_type=SpanType.AGENT)
    def run_tool_agent(question: str):
        mlflow.update_current_trace(tags={"Agent": "Chat"})
        messages = [{"role": "user", "content": question}]

        # Invoke the model with the given question and available tools
        response = client.chat.completions.create(
            model="llama3.2:1b",
            messages=messages,
            tools=tools,
        )
        ai_msg = response.choices[0].message
        messages.append(ai_msg)

        # If the model requests tool call(s), invoke the function with the specified arguments
        if tool_calls := ai_msg.tool_calls:
            for tool_call in tool_calls:
                function_name = tool_call.function.name
                if tool_func := _tool_functions.get(function_name):
                    args = json.loads(tool_call.function.arguments)
                    tool_result = tool_func(**args)
                else:
                    raise RuntimeError("An invalid tool is returned from the assistant!")

                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": tool_result,
                    }
                )

            # Send the tool results to the model and get a new response
            response = client.chat.completions.create(
                model="llama3.2:1b", messages=messages
            )

        return response.choices[0].message.content

    # Run the tool calling agent
    question = "What's the weather like in Paris today?"
    answer = run_tool_agent(question)

🏃 View run big-tern-304 at: http://localhost:5000/#/experiments/923280573827939376/runs/37f775059bc245bba9d9b285837bb52e
🧪 View experiment at: http://localhost:5000/#/experiments/923280573827939376


#### **Query Traces**

In [4]:
# Returns a table (pandas DataFrame) with one row per trace, identified by its request_id
mlflow.search_traces()

|   | request_id | trace | timestamp_ms | status | execution_time_ms | request | response | request_metadata | spans | tags | assessments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 98ee4daa5daa40e2bbfcc5b637d59045 | `Trace(request_id=98ee4daa5daa40e2bbfcc5b637d59...` | 1742313806499 | TraceStatus.OK | 5587 | `{'question': 'What's the weather like in Paris...` | It seems that the weather in Paris is currentl... | `{'mlflow.sourceRun': 'b8ef5bfcb527470bba0ee2c9...` | `[{'name': 'run_tool_agent', 'context': {'span_...` | `{'Agent': 'Chat', 'mlflow.artifactLocation': '...` | `[]` |
| 1 | 70d5ffcbf2c14c7c973df76b03449b86 | `Trace(request_id=70d5ffcbf2c14c7c973df76b03449...` | 1742313778413 | TraceStatus.ERROR | 45 | `{'question': 'What's the weather like in Paris...` |  | `{'mlflow.sourceRun': 'bc8795352ab74029a37ebf07...` | `[{'name': 'run_tool_agent', 'context': {'span_...` | `{'Agent': 'Chat', 'mlflow.artifactLocation': '...` | `[]` |
| 2 | b2743314768543e8a3d6414fa368ca0b | `Trace(request_id=b2743314768543e8a3d6414fa368c...` | 1742313776289 | TraceStatus.ERROR | 41 | `{'question': 'What's the weather like in Paris...` |  | `{'mlflow.sourceRun': 'a637d4de6d004e12a2e91313...` | `[{'name': 'run_tool_agent', 'context': {'span_...` | `{'Agent': 'Chat', 'mlflow.artifactLocation': '...` | `[]` |
| 3 | 5374367d987e4d53800ee55bb18c4bd0 | `Trace(request_id=5374367d987e4d53800ee55bb18c4...` | 1742313614432 | TraceStatus.OK | 14222 | `{'model': 'llama3.2:1b', 'messages': [{'role':...` | `{'id': 'chatcmpl-307', 'choices': [{'finish_re...` | `{'mlflow.sourceRun': '06fe450af8284586b6665468...` | `[{'name': 'Completions', 'context': {'span_id'...` | `{'mlflow.artifactLocation': 'mlflow-artifacts:...` | `[]` |
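
A table like this lends itself to quick summaries, for example the number of traces and the mean execution time per status. The sketch below uses plain Python on the `status` and `execution_time_ms` values copied from the four rows above; with the real DataFrame the same aggregation would typically be a group-by on those columns. The `summarize` helper is a hypothetical name.

```python
from collections import defaultdict

# (status, execution_time_ms) pairs copied from the four rows of the table
rows = [
    ("OK", 5587),
    ("ERROR", 45),
    ("ERROR", 41),
    ("OK", 14222),
]


def summarize(rows):
    """Return {status: (count, mean execution time in ms)}."""
    grouped = defaultdict(list)
    for status, elapsed_ms in rows:
        grouped[status].append(elapsed_ms)
    return {s: (len(v), sum(v) / len(v)) for s, v in grouped.items()}


summary = summarize(rows)
print(summary)  # {'OK': (2, 9904.5), 'ERROR': (2, 43.0)}
```

Here the failed traces end after a few tens of milliseconds, which suggests they stopped before a model response was produced.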


In [5]:
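# Returns a list of Trace objects, each identified by its request_id

from mlflow import MlflowClient

# Note: this rebinds `client`, which previously held the OpenAI client;
# re-create the OpenAI client before calling the model again
client = MlflowClient()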

client.search_traces(experiment_ids=[exp.experiment_id])

[Trace(request_id=7e529c6708b64e079f928d9940394f1b),
 Trace(request_id=79231af57b3d4ea698f5d959e8facac4),
 Trace(request_id=7670eef0bd324d4c8e9c921df1b255bf),
 Trace(request_id=98ee4daa5daa40e2bbfcc5b637d59045),
 Trace(request_id=70d5ffcbf2c14c7c973df76b03449b86),
 Trace(request_id=b2743314768543e8a3d6414fa368ca0b),
 Trace(request_id=5374367d987e4d53800ee55bb18c4bd0)]

In [8]:
client.search_traces(
    experiment_ids=[exp.experiment_id],
    filter_string="trace.name = 'run_tool_agent'",
)

[Trace(request_id=7e529c6708b64e079f928d9940394f1b),
 Trace(request_id=79231af57b3d4ea698f5d959e8facac4),
 Trace(request_id=7670eef0bd324d4c8e9c921df1b255bf),
 Trace(request_id=98ee4daa5daa40e2bbfcc5b637d59045),
 Trace(request_id=70d5ffcbf2c14c7c973df76b03449b86),
 Trace(request_id=b2743314768543e8a3d6414fa368ca0b)]

In [9]:
client.search_traces(
    experiment_ids=[exp.experiment_id],
    filter_string="tag.Agent = 'Chat'",
)

[Trace(request_id=7e529c6708b64e079f928d9940394f1b),
 Trace(request_id=79231af57b3d4ea698f5d959e8facac4),
 Trace(request_id=7670eef0bd324d4c8e9c921df1b255bf),
 Trace(request_id=98ee4daa5daa40e2bbfcc5b637d59045),
 Trace(request_id=70d5ffcbf2c14c7c973df76b03449b86),
 Trace(request_id=b2743314768543e8a3d6414fa368ca0b)]

In [11]:
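# `Status` is not a tag set on these traces, so this filter matches nothing (hence the empty list below).
# The trace status is a trace attribute rather than a tag; filtering on it (e.g. "status = 'OK'") is the likely intent.
client.search_traces(
    experiment_ids=[exp.experiment_id],
    filter_string="tag.Status = 'OK'",
)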

[]

Runs and traces can be combined by using mlflow.start_run together with mlflow.start_span: in that case, the new trace is created inside the run.