
# Mosaic AI Agent Framework: Author, deploy, and trace a simple agent

This notebook demonstrates how to build and manage a simple gen AI agent:
- Author a simple gen AI agent with the MLflow 3 `ResponsesAgent` API.
- Manually test the agent, and run batch evaluation using MLflow.
- Log and deploy the agent with Mosaic AI Agent Framework.
- Trace and monitor the agent in real time.

You can use this pattern with any Agent Framework agent ([AWS](https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/author-agent) | [GCP](https://docs.databricks.com/gcp/en/generative-ai/agent-framework/author-agent)).

MLflow 3 ([AWS](https://docs.databricks.com/aws/en/mlflow3/genai) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai) | [OSS](https://mlflow.org/docs/latest/genai)) provides observability features allowing you to:
- Track quality and operational performance (latency, request volume, errors, and so on).
- Run LLM-based evaluations on production traffic to detect drift or regressions using Agent Evaluation's LLM judges.
- Deep dive into individual requests to debug and improve agent responses.
- Transform real-world logs into evaluation sets to drive continuous improvements.

## Prerequisites

Address `TODO`s in this notebook before clicking `Run all`.

In [0]:
%pip install -q backoff databricks-openai uv databricks-agents
dbutils.library.restartPython()


## Define the agent in code
Define the agent code in a single cell below. This lets you easily write the agent code to a local Python file `agent.py`, using the `%%writefile` magic command, for subsequent logging and deployment.
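
The agent in the next cell implements `predict` by draining `predict_stream` and keeping only the events whose type is `response.output_item.done`. The following is a minimal, standard-library sketch of that collect-from-stream pattern. `StreamEvent`, `fake_predict_stream`, and `collect_done_items` are hypothetical names used for illustration only; they are not MLflow classes or functions.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class StreamEvent:
    """Hypothetical stand-in for a streamed agent event (not the MLflow class)."""

    type: str
    delta: Optional[str] = None
    item: Optional[dict[str, Any]] = None


def fake_predict_stream(text: str) -> Iterator[StreamEvent]:
    """Emit one delta event per word, then a single completed output item."""
    for word in text.split():
        yield StreamEvent(type="response.output_text.delta", delta=word)
    yield StreamEvent(
        type="response.output_item.done",
        item={
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text}],
        },
    )


def collect_done_items(events: Iterator[StreamEvent]) -> list[dict[str, Any]]:
    """Drain the stream and keep only the completed output items."""
    return [e.item for e in events if e.type == "response.output_item.done"]


items = collect_done_items(fake_predict_stream("5 + 5 is 10"))
print(items[0]["content"][0]["text"])
```

Building the non-streaming path on top of the streaming one keeps a single code path for the LLM call, so both entry points behave the same way.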

In [0]:
%%writefile agent.py
import json
import warnings
from typing import Any, Generator

import backoff
import mlflow
import openai
from databricks.sdk import WorkspaceClient
from mlflow.entities import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
)
from openai import OpenAI

# TODO: Replace with your model serving endpoint
LLM_ENDPOINT_NAME = "databricks-claude-3-7-sonnet"

# TODO: Update with your system prompt
SYSTEM_PROMPT = """
You are a helpful assistant that provides brief, clear responses.
"""


class SimpleChatAgent(ResponsesAgent):
    """
    Simple chat agent that calls an LLM using the Databricks OpenAI client API.

    You can replace this with your own agent.
    The @mlflow.trace decorator tells MLflow Tracing to record calls to the decorated method.
    """

    def __init__(self):
        self.workspace_client = WorkspaceClient()
        self.client: OpenAI = self.workspace_client.serving_endpoints.get_open_ai_client()
        self.llm_endpoint = LLM_ENDPOINT_NAME
        self.SYSTEM_PROMPT = SYSTEM_PROMPT

    @backoff.on_exception(backoff.expo, openai.RateLimitError)
    @mlflow.trace(span_type=SpanType.LLM)
    def call_llm(self, messages: list[dict[str, Any]]) -> Generator[dict[str, Any], None, None]:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message="PydanticSerializationUnexpectedValue")
            for chunk in self.client.chat.completions.create(
                model=self.llm_endpoint,
                messages=self.prep_msgs_for_cc_llm(messages),
                stream=True,
            ):
                yield chunk.to_dict()

    # With autologging, you do not need @mlflow.trace here, but you can add it to override the span type.
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        outputs = [
            event.item
            for event in self.predict_stream(request)
            if event.type == "response.output_item.done"
        ]
        return ResponsesAgentResponse(output=outputs, custom_outputs=request.custom_inputs)

    # With autologging, you do not need @mlflow.trace here, but you can add it to override the span type.
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        messages = [{"role": "system", "content": self.SYSTEM_PROMPT}] + [
            i.model_dump() for i in request.input
        ]
        yield from self.output_to_responses_items_stream(chunks=self.call_llm(messages))


mlflow.openai.autolog()
AGENT = SimpleChatAgent()
mlflow.models.set_model(AGENT)

## Test the agent

Interact with the agent to test its output. 

Because `call_llm` is decorated with `@mlflow.trace`, you can view the trace for each step the agent takes. LLM calls made through the OpenAI SDK are also traced automatically by `mlflow.openai.autolog()`.

Replace this placeholder input with an appropriate domain-specific example for your agent.
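
When checking outputs by hand, it can help to pull just the assistant text out of the response. The sketch below assumes the response has already been converted to a plain dict (for example with the Pydantic `model_dump()` method) and that message items carry a `content` list of `output_text` parts. `extract_output_text` is a hypothetical helper, not part of MLflow.

```python
from typing import Any


def extract_output_text(response: dict[str, Any]) -> str:
    """Concatenate every output_text part across all message items."""
    parts: list[str] = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# Illustrative response shape; the text is made up for this example.
sample_response = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "5 + 5 = 10"}],
        }
    ]
}
print(extract_output_text(sample_response))
```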

In [0]:
dbutils.library.restartPython()

In [0]:
from agent import AGENT

AGENT.predict({"input": [{"role": "user", "content": "What is 5+5?"}]})

In [0]:
for event in AGENT.predict_stream(
    {"input": [{"role": "user", "content": "What is 5+5?"}]}
):
    print(event, "-----------\n")

## Log the agent as an MLflow model and register it to Unity Catalog

Log the agent as code from the `agent.py` file. See [MLflow - Models from Code](https://mlflow.org/docs/latest/models.html#models-from-code).

The same logging call can also register the model to Unity Catalog, which is required to deploy the agent later in this notebook. To learn more about Models in Unity Catalog, see the Databricks documentation ([AWS](https://docs.databricks.com/aws/en/machine-learning/manage-model-lifecycle/) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/manage-model-lifecycle/) | [GCP](https://docs.databricks.com/gcp/en/machine-learning/manage-model-lifecycle/)).
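
Models in Unity Catalog are addressed by a three-level name of the form `catalog.schema.model_name`, which is how `UC_MODEL_NAME` is built in this notebook. The sketch below assembles that name and fails early on empty parts. `build_uc_model_name` is a hypothetical helper, and rejecting periods inside a part is a conservative assumption made here to keep the name unambiguous, not a complete statement of Unity Catalog naming rules.

```python
def build_uc_model_name(catalog: str, schema: str, model_name: str) -> str:
    """Join the three parts into catalog.schema.model_name after basic checks."""
    parts = {"catalog": catalog, "schema": schema, "model_name": model_name}
    for label, value in parts.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must be a non-empty string")
        if "." in value:
            # Assumption: a period inside a part would make the full name ambiguous.
            raise ValueError(f"{label} must not contain a period: {value!r}")
    return ".".join(parts.values())


print(build_uc_model_name("AgenticAI", "week1", "simple-agent"))
```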

In [0]:
from importlib.metadata import version

import mlflow
from agent import LLM_ENDPOINT_NAME
from mlflow.models.resources import DatabricksServingEndpoint

# The model registry is already set to Databricks Unity Catalog by default,
# but you can change the registry below as needed.
mlflow.set_registry_uri("databricks-uc")

# TODO: For a UC-registered model, define the catalog, schema, and model name:
catalog = "AgenticAI"
schema = "week1"
model_name = "simple-agent"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        # Change the model/agent name to be more descriptive for your use case:
        name="agent",
        # Specify the model via the Python file created above:
        python_model="agent.py",
        # Pin all required dependencies to compatible versions to avoid environment build errors
        extra_pip_requirements=[
            f"mlflow=={version('mlflow')}",  # Pin to the currently installed mlflow version
        ],
        # Declare the serving endpoint the agent depends on:
        resources=[DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME)],
        # This optional parameter lets you register the model at the same time as logging it:
        registered_model_name=UC_MODEL_NAME,
    )

## Pre-deployment agent validation
Before deploying the agent, perform pre-deployment checks.

* **Manual vibe checks** using the [mlflow.models.predict() API](https://mlflow.org/docs/latest/python_api/mlflow.models.html#mlflow.models.predict). See the Databricks documentation ([AWS](https://docs.databricks.com/en/machine-learning/model-serving/model-serving-debug.html#validate-inputs) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/model-serving-debug#before-model-deployment-validation-checks) | [GCP](https://docs.databricks.com/gcp/en/machine-learning/model-serving/model-serving-debug)).
* **Dataset evaluation checks** using the [mlflow.genai.evaluate() API](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.genai.html#mlflow.genai.evaluate).  See the Databricks documentation ([AWS](https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/evaluate-app) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/eval-monitor/evaluate-app) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai/eval-monitor/evaluate-app)).
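
Before running either check, it can be useful to confirm that test payloads have the simple chat shape used throughout this notebook (`{"input": [{"role": ..., "content": ...}]}`). The following is a minimal sketch under that assumption. `validate_request` and `ALLOWED_ROLES` are hypothetical names, and the validator deliberately covers only plain-text chat messages, not every input item type an agent may accept.

```python
from typing import Any

# Assumption: only these roles are expected in simple chat-style test payloads.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_request(payload: Any) -> list[str]:
    """Return a list of human-readable problems; an empty list means the payload looks fine."""
    if not isinstance(payload, dict):
        return ["payload must be a dict"]
    messages = payload.get("input")
    if not isinstance(messages, list) or not messages:
        return ["'input' must be a non-empty list of messages"]
    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} must be a dict")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has an unexpected role: {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} must have non-empty string content")
    return problems


print(validate_request({"input": [{"role": "user", "content": "Hello!"}]}))
```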

In [0]:
print(logged_agent_info.registered_model_version)

In [0]:

mlflow.models.predict(
    model_uri=f"runs:/{logged_agent_info.run_id}/agent",
    input_data={"input": [{"role": "user", "content": "Hello!"}]},
    env_manager="uv",
)

### Batch evaluation

Next, use MLflow to evaluate the agent on a batch of traces (see the Databricks documentation: [AWS](https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/eval-monitor/) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai/eval-monitor/)). The evaluation has three steps:
* Collect traces to score.
* Select scorers (LLM judges) to run.
* Run evaluation to compute metrics.
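
To make the "compute metrics" step concrete, here is a small code-based check that could complement an LLM judge for conciseness. It is a hypothetical heuristic, not an MLflow scorer: `is_concise`, `pass_rate`, and the 50-word threshold are assumptions chosen for illustration. MLflow also supports custom scorers, and the MLflow documentation describes the exact API for wrapping a function like this.

```python
def is_concise(response_text: str, max_words: int = 50) -> bool:
    """Return True when the response has at most max_words whitespace-separated words."""
    return len(response_text.split()) <= max_words


def pass_rate(responses: list[str], max_words: int = 50) -> float:
    """Fraction of responses that pass the conciseness check (0.0 for an empty batch)."""
    if not responses:
        return 0.0
    passed = sum(is_concise(text, max_words) for text in responses)
    return passed / len(responses)


# Made-up responses for illustration: one short, one deliberately long.
batch = ["5 + 5 = 10.", "word " * 60]
print(pass_rate(batch))
```

A deterministic check like this is cheap to run on every trace, while an LLM judge can capture qualities, such as tone or relevance, that a word count cannot.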

In [0]:
traces = mlflow.search_traces(max_results=10)

In [0]:
display(traces)

In [0]:
from mlflow.genai.scorers import (
    RelevanceToQuery,
    Safety,
    Guidelines,
)

scorers = [
  RelevanceToQuery(),  # Checks if the response addresses the user's request
  Safety(),  # Checks for harmful or inappropriate content
  # Custom guideline below:
  Guidelines(
      name="concise_communication",
      guidelines="The response MUST be concise and to the point.",
  ),
]

# Run evaluation with the scorers selected above.
eval_results = mlflow.genai.evaluate(
    data=traces,
    model_id=logged_agent_info.model_id,
    scorers=scorers,
)

## Deploy the agent

Deploy the agent using Agent Framework ([AWS](https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/author-agent) | [GCP](https://docs.databricks.com/gcp/en/generative-ai/agent-framework/author-agent)). By default, the deployed agent logs its traces to the current MLflow experiment and, if enabled, to inference tables.

In [0]:
from databricks import agents

agents.deploy(UC_MODEL_NAME, model_version=logged_agent_info.registered_model_version, tags={"endpointSource": "docs"})

## View **real-time** traces from your endpoint

By default, the deployed agent logs its traces in **real time** to the MLflow Experiment attached to this notebook. To log traces to a different MLflow Experiment, call `mlflow.set_experiment(...)` before calling `agents.deploy(...)`.

You can optionally enable production monitoring to copy traces from the MLflow experiment into a Delta table ([AWS](https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/production-monitoring) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai/eval-monitor/production-monitoring)). If you enable monitoring, you can use the MLflow Experiment's **Monitoring** tab to update the quality scorers that run on your production traces.

In [0]:
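from mlflow.utils import databricks_utils

# Look up the workspace hostname and the experiment attached to this notebook.
host = databricks_utils.get_browser_hostname()
experiment = mlflow.get_experiment_by_name(databricks_utils.get_notebook_path())
print(
    "\nView traces from your endpoint in REAL TIME in the MLflow experiment here: "
    f"https://{host}/ml/experiments/{experiment.experiment_id}/traces"
)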

## Next steps

After your agent is deployed, you can chat with it in AI Playground to perform additional checks, share it with your organization for feedback, or embed it in a production application.

## Resources

* Agent Framework documentation: [AWS](https://docs.databricks.com/aws/en/generative-ai/agent-framework/author-agent) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/author-agent) | [GCP](https://docs.databricks.com/gcp/en/generative-ai/agent-framework/author-agent)
* MLflow 3 documentation: [AWS](https://docs.databricks.com/aws/en/mlflow3/genai) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai) | [OSS](https://mlflow.org/docs/latest/genai)
* MLflow Evaluation documentation: [AWS](https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow3/genai/eval-monitor/) | [GCP](https://docs.databricks.com/gcp/en/mlflow3/genai/eval-monitor/) | [OSS](https://mlflow.org/docs/latest/genai/eval-monitor/)