#### **How to use MLflow with LLMs**

MLflow offers the same tools for LLMs that we have seen for traditional models, plus a few LLM-specific ones:
- **Experiment tracking**: collects models, prompts, traces and metrics in a single place.
- **Tracing**: collects runtime information such as document retrieval, tool calls and data queries.
- **Packaging**: manages the moving pieces of a GenAI system.
- **Evaluation**: makes it possible to compare different models on metrics such as latency and answer correctness.
- **Model Serving**: models can be deployed on a Kubernetes cluster, on cloud providers, and so on.
- **Prompt Engineering UI**: used to modify the prompt in order to obtain better results.
- **MLflow AI Gateway**: exposes a unified endpoint for querying LLMs from different providers.

The main difference between MLflow model serving and the MLflow AI Gateway is that the former exposes a single model that we can query through an HTTP request, while the latter is a service built on top of MLflow that puts several models and providers behind one unified endpoint, which makes them easier to deploy, scale and manage across different environments and infrastructure.
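As a rough illustration of querying a served model over HTTP, the sketch below builds a POST request for the `/invocations` endpoint of an MLflow scoring server. The host, the port `5001` and the helper name `build_invocation_request` are assumptions made for this example, and the accepted payload shape depends on the signature of the model that is actually served.

```python
import json
import urllib.request


def build_invocation_request(rows, host="http://localhost:5001"):
    """Build (but do not send) a POST request for an MLflow scoring server.

    `host` is a hypothetical address: adjust it to wherever the model is served.
    """
    payload = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request(["Why is the sky blue?"])

# With a model server running, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))
```

The request is only constructed here, so the cell runs even when no server is listening.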

#### **Tracing**

In [None]:
import mlflow
import os 

# os.environ["MLFLOW_TRACKING_TOKEN"] = ""
# os.environ["OPENAI_API_KEY"] = ""

In [25]:
import mlflow

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Ollama-v3")

<Experiment: artifact_location='mlflow-artifacts:/486592786572829437', creation_time=1742222202058, experiment_id='486592786572829437', last_update_time=1742222202058, lifecycle_stage='active', name='Ollama-v3', tags={}>

In [26]:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # The local Ollama REST endpoint
    api_key="dummy",  # Required to instantiate OpenAI client, it can be a random string
)

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[
        {"role": "system", "content": "You are a science teacher."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

An MLflow trace is made up of two parts:
- TraceInfo: metadata about the trace, such as the experiment it belongs to, its status and its duration.
- TraceData: the runtime information. Each trace contains one or more spans, which record the key data about each step within your GenAI application.

Spans can be defined with the @mlflow.trace decorator, applied to a function that is called within the application, or with mlflow.start_span(), which gives finer control over what the span covers.

A trace is similar to a run: it stores information about a single execution.

In [4]:
import mlflow

@mlflow.trace(span_type="func", attributes={"key": "value"})
def add_1(x):
    return x + 1


@mlflow.trace(span_type="func", attributes={"key1": "value1"})
def minus_1(x):
    return x - 1


@mlflow.trace(name="Trace Test")
def trace_test(x):
    step1 = add_1(x)
    return minus_1(step1)


trace_test(4)

4

In [22]:
mlflow.search_traces()

Unnamed: 0,request_id,trace,timestamp_ms,status,execution_time_ms,request,response,request_metadata,spans,tags,assessments
0,e12f87c5ca0d4d66a52706bbf5a1d569,Trace(request_id=e12f87c5ca0d4d66a52706bbf5a1d...,1742222240588,TraceStatus.OK,16471,"{'model': 'llama3.2:1b', 'messages': [{'role':...","{'id': 'chatcmpl-690', 'choices': [{'finish_re...","{'mlflow.traceInputs': '{""model"": ""llama3.2:1b...","[{'name': 'Completions', 'context': {'span_id'...",{'mlflow.artifactLocation': 'mlflow-artifacts:...,[]
1,248fd79e3ac148e1bfbedb3712b945fc,Trace(request_id=248fd79e3ac148e1bfbedb3712b94...,1742222204166,TraceStatus.OK,14722,"{'model': 'llama3.2:1b', 'messages': [{'role':...","{'id': 'chatcmpl-897', 'choices': [{'finish_re...","{'mlflow.traceInputs': '{""model"": ""llama3.2:1b...","[{'name': 'Completions', 'context': {'span_id'...",{'mlflow.artifactLocation': 'mlflow-artifacts:...,[]


Runs and traces can be combined using mlflow.start_run and mlflow.start_span: when a span is started inside an active run, the resulting trace is associated with that run.