# Mosaic AI Agent Framework: Author and deploy a multi-agent system with Genie

This notebook demonstrates how to build a multi-agent system using Mosaic AI Agent Framework and [LangGraph](https://blog.langchain.dev/langgraph-multi-agent-workflows/), where [Genie](https://www.databricks.com/product/ai-bi/genie) is one of the agents.
In this notebook, you:
1. Author a multi-agent system using LangGraph.
1. Wrap the LangGraph agent with MLflow `ChatAgent` to ensure compatibility with Databricks features.
1. Manually test the multi-agent system's output.
1. Log and deploy the multi-agent system.

This example is based on [LangGraph documentation - Multi-agent supervisor example](https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/multi_agent/agent_supervisor.ipynb).

## Why use a Genie agent?

Multi-agent systems consist of multiple AI agents working together, each with specialized capabilities. As one of those agents, Genie allows users to interact with their structured data using natural language.

Unlike SQL functions, which can only run pre-defined queries, Genie has the flexibility to create novel queries to answer user questions.

## Prerequisites

- Address all `TODO`s in this notebook.
- Create a Genie Space, see Databricks documentation ([AWS](https://docs.databricks.com/aws/genie/set-up) | [Azure](https://learn.microsoft.com/azure/databricks/genie/set-up)).

In [0]:
%pip install -U -qqq mlflow langgraph==0.3.4 databricks-langchain databricks-agents uv
dbutils.library.restartPython()


## Define the multi-agent system

Create a multi-agent system in LangGraph using a supervisor agent node directing the following agent nodes:
- **GenieAgent**: The Genie agent that queries and reasons over structured data.
- **Tool-calling agent**: An agent that calls Unity Catalog function tools.

In this example, the tool-calling agent uses the built-in Unity Catalog function `system.ai.python_exec` to execute Python code.
For examples of other tools you can add to your agents, see Databricks documentation ([AWS](https://docs.databricks.com/aws/generative-ai/agent-framework/agent-tool) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/agent-tool)).


#### Wrap the LangGraph agent using the `ChatAgent` interface

Databricks recommends using `ChatAgent` to ensure compatibility with Databricks AI features and to simplify authoring multi-turn conversational agents using an open source standard. 

The `LangGraphChatAgent` class implements the `ChatAgent` interface to wrap the LangGraph agent.

See MLflow's [ChatAgent documentation](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.ChatAgent).

#### Write agent code to file

Define the agent code in the cells below. Each cell appends to a local Python file, using the `%%writefile -a` magic command, for subsequent logging and deployment.


## A component-based compound AI system

<img src="https://docs.databricks.com/aws/en/assets/images/multi-agent-framework-63c273cb78929a1904fec94e34611e38.png" />

In [0]:
%pip install mlflow  --upgrade --pre
dbutils.library.restartPython()

## Writing an external file
Since we'll likely be updating this notebook over time, it's a good idea to check whether a previous version of the file exists and remove it if it does. The cells below append to the file, so a leftover copy would end up with new code added after the old code rather than replacing it. This ambiguity is likely to cause runtime errors, so it's better to remove the old file first and start fresh.
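
To see why the stale file matters, here is a minimal standard-library sketch (not part of the agent) that mimics `%%writefile -a` with Python's append mode. The file name `demo_agent.py` and the helper `append_cell` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def append_cell(path: str, source: str) -> None:
    """Mimic `%%writefile -a`: append a cell's source to the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(source)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo_agent.py")  # hypothetical file name

# The first notebook run writes the definition once.
append_cell(path, "MAX_ITERATIONS = 3\n")
# A second run without cleanup appends again, leaving two definitions.
append_cell(path, "MAX_ITERATIONS = 5\n")
with open(path, encoding="utf-8") as f:
    stale_lines = f.read().splitlines()

# Removing the file first means the rerun starts from an empty file.
os.remove(path)
append_cell(path, "MAX_ITERATIONS = 5\n")
with open(path, encoding="utf-8") as f:
    clean_lines = f.read().splitlines()

print(len(stale_lines), len(clean_lines))
```

Without the cleanup step the file holds both definitions. With it, only the latest one survives.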

In [0]:
import os

#Check if the multi_agent.py file exists and delete it if it does
#This is what allows us to run the notebook over again. If our logic changes, we can also ensure we're not appending stale code to the file or corrupting it.

if os.path.exists("multi_agent.py"):
    os.remove("multi_agent.py")
    print("multi_agent.py has been deleted.")
else:
    print("multi_agent.py does not exist.")

## Dependencies and third-party libraries
Assuming that this will be coalesced into a single file, it's also important that we manage our imports at the top of the file so all functions have access to what they need. I typically break my imports into blocks of related libraries to make management easier in the long-term. Adding category descriptors also helps keep things organized.

In [0]:
%%writefile -a multi_agent.py
#Python libs
import functools
import os
from typing import Any, Generator, Literal, Optional

#Databricks sdk & Databricks langchain implementation
from databricks.sdk import WorkspaceClient
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
from databricks_langchain.genie import GenieAgent

#LangChain tools (LangGraph is our agent lib)
from langchain_core.runnables import RunnableLambda
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.prebuilt import create_react_agent
from langchain.agents import AgentType, initialize_agent, agent

#MLflow stuff
import mlflow
from mlflow.langchain.chat_agent_langgraph import ChatAgentState
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
    ChatAgentChunk,
    ChatAgentMessage,
    ChatAgentResponse,
    ChatContext,
)

#Parsing libs
from pydantic import BaseModel



## Creating Genie Agents
Databricks Genie now includes the Genie Spaces API. This allows us to easily define and interact with different Genie spaces in an agentic fashion. With proper labeling and descriptions, the supervisor that manages the conversation is aware of each agent's capabilities. This is how we scope out and identify which agent is used for each conversational turn. Turns can either be with the user in the form of prompts, or internal, between the supervisor and an agent.

One thing that might seem confusing is the block containing the `host` and `token` values. These are read from environment variables, so the real values are supplied later by whatever runs the agent. Since we won't have access to the `dbutils` API within the agent (as it's a fully encapsulated object), we have to rely on the calling entity, such as this notebook or the serving endpoint, to set them.
<br/>
<br/>
(example)
```python
        host=os.getenv("DB_MODEL_SERVING_HOST_URL"),
        token=os.getenv("DATABRICKS_GENIE_PAT")
```
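
Because `os.getenv` returns `None` when a variable is unset, a missing value only surfaces later, when the client first tries to authenticate. A minimal sketch of a fail-fast check is shown here. The helper `require_env` and the `DEMO_*` variable names are hypothetical and are not part of the notebook's agent code.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Simulate the calling entity injecting a value before the agent code runs.
os.environ["DEMO_HOST_URL"] = "https://example.test"  # placeholder value
os.environ.pop("DEMO_MISSING_TOKEN", None)  # make sure this one is absent

host = require_env("DEMO_HOST_URL")

try:
    require_env("DEMO_MISSING_TOKEN")
    missing = None
except RuntimeError as err:
    missing = str(err)

print(host)
print(missing)
```

A check like this could run before the `GenieAgent` objects are built, so a misconfigured endpoint fails at load time with a readable message.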

In [0]:
%%writefile -a multi_agent.py
#You can find the ID in the URL of the genie room /genie/rooms/<GENIE_SPACE_ID>
#In lieu of the locally scoped variable for the PAT, we'll use the one from our secrets store for security.

GENIE_SPACE_ID_1 = "01f026a703761605b18fa1d904cf1a64"
genie_agent_description_1 = "This genie agent can answer any questions around billing and Databricks or AWS related expenses associated with the account. It is assumed that all relevant billing data is included in this agent."

genie_agent_1 = GenieAgent(
    genie_space_id=GENIE_SPACE_ID_1,
    genie_agent_name="Genie_DBX_Billing",
    description=genie_agent_description_1,
    client=WorkspaceClient(
        host=os.getenv("DB_MODEL_SERVING_HOST_URL"),
        token=os.getenv("DATABRICKS_GENIE_PAT")
        #token=secret,
    ),
)

GENIE_SPACE_ID_2 = "01f02ad494421be2953d3e5ba3818319"
genie_agent_description_2 = "This genie agent can answer any questions concerning hotels, hotel rates and preferences of employees for the hotels."

genie_agent_2 = GenieAgent(
    genie_space_id=GENIE_SPACE_ID_2,
    genie_agent_name="Genie_DBX_Hotel",
    description=genie_agent_description_2,
    client=WorkspaceClient(
        host=os.getenv("DB_MODEL_SERVING_HOST_URL"),
        token=os.getenv("DATABRICKS_GENIE_PAT")
        #token=secret,
    ),
)

GENIE_SPACE_ID_3 = "01f02ad431cb12a9a93030fac014b105"
genie_agent_description_3 = "This genie agent can answer any questions concerning employees and employee data. This includes things like name, salary, job, date of birth (dob) and location."

genie_agent_3 = GenieAgent(
    genie_space_id=GENIE_SPACE_ID_3,
    genie_agent_name="Genie_DBX_Employee",
    description=genie_agent_description_3,
    client=WorkspaceClient(
        host=os.getenv("DB_MODEL_SERVING_HOST_URL"),
        token=os.getenv("DATABRICKS_GENIE_PAT")
        #token=secret,
    ),
)

## Defining our LLM Foundation Model
The LLM Foundation Model provides the interpretation layer for communications both internally between agents and the supervisor, as well as with the user prompt and response. This is usually the point where we'd want to pick a good foundational model for our task. Remember that every AI application, no matter how general, still needs to have focus and intent for maintenance and stability. Databricks provides several registered models in the `databricks-uc` registry for use and is updating them globally on a regular basis.

In [0]:
%%writefile -a multi_agent.py
#Multi-agent Genie works best with claude 3.7 or gpt 4o models. Both of these are served using the system.ai.* databricks-uc namespace.
LLM_ENDPOINT_NAME = "databricks-claude-3-7-sonnet"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME)



## Creating a code authoring agent
Agents can be anything that has the ability to make a logical decision. We've seen up until now that we can use pre-built wrappers (`ChatAgent` and its Databricks-specific sister `GenieAgent`), but there are also several pre-built agents available for use natively in Databricks. Let's add a list of tools (called a toolbox) for another agent we'll call 'Coder' to provide the ability to return code suggestions to the invoking entity (whether that's the supervisor or the user). This is a very helpful agent that can actually author, generate and execute code in the background. For security reasons, it's always best practice to put strict guardrails around coding agents.

In [0]:
%%writefile -a multi_agent.py
############################################################
# You can also create agents with access to additional tools
############################################################
toolbox = []

#If you want to add more tools to the toolbox, add them here and update the description of this agent
uc_tool_names = ["system.ai.*"]
uc_toolkit = UCFunctionToolkit(function_names=uc_tool_names)

toolbox.extend(uc_toolkit.tools)

code_agent_description = (
    "The Coder agent specializes in solving programming challenges, generating code snippets, debugging issues, and explaining complex coding concepts."
)
code_agent = create_react_agent(llm, tools=toolbox)



## Creating the supervisor agent

**THIS PART IS REALLY IMPORTANT**

Now that we have all of our node agents defined for our application graph, let's tie them together with a supervisor agent. This agent is special because it has exclusive access to each of the agents. Although agents can technically talk directly to one another, having a brokerage is a pattern-safe design. We'll always try to decouple and encapsulate agents by responsibility for reasons of security, portability and generalizability. We want our agents to be aware enough of what they can do, but not so aware that they start creating their own graph edges. If we're considering object accountability from a design perspective this keeps the agents 'in line' with their least-privileged responsibility.

In a nutshell, we're creating the supervisor agent as a function that builds its runnable chain programmatically, based on the agents that are present (`worker_descriptions`) and the generated input description (hence `system_prompt` as the main directive of the application). The supervisor chain in the `supervisor_agent()` function essentially takes all of the upstream definitions and flattens them into a single, deployable object.
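
The control flow of the notebook's `supervisor_agent()` function can be exercised without an LLM. The sketch below is a standard-library simulation under stated assumptions: the worker names, the `supervisor_step` function and the `always_billing` stub router are hypothetical stand-ins, and the stub replaces the structured-output LLM call.

```python
from typing import Callable

MAX_ITERATIONS = 5
FINISH = {"next_node": "FINISH"}

# Hypothetical workers, standing in for the Genie and Coder agents.
workers = {
    "Billing": "Answers questions about account costs.",
    "Coder": "Writes and runs Python code.",
}
system_prompt = "Route to one of these workers, or FINISH:\n" + "\n".join(
    f"- {name}: {desc}" for name, desc in workers.items()
)
options = ["FINISH"] + list(workers)


def supervisor_step(state: dict, route: Callable[[str, dict], str]) -> dict:
    """One supervisor turn: pick the next node or decide to finish."""
    count = state.get("iteration_count", 0) + 1
    if count > MAX_ITERATIONS:
        return FINISH  # rule 1: iteration budget exhausted
    next_node = route(system_prompt, state)
    if next_node not in options:
        raise ValueError(f"Unknown route: {next_node}")
    if state.get("next_node") == next_node:
        return FINISH  # rule 2: same worker chosen twice in a row
    return {"iteration_count": count, "next_node": next_node}


def always_billing(prompt: str, state: dict) -> str:
    """Stub router standing in for the LLM call."""
    return "Billing"


state: dict = {}
trace = []
while True:
    state.update(supervisor_step(state, always_billing))
    trace.append(state["next_node"])
    if state["next_node"] == "FINISH":
        break

print(trace)  # ['Billing', 'FINISH']
```

With a router that always picks the same worker, the second rule ends the loop after one worker turn, well before the iteration budget is reached.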

In [0]:
%%writefile -a multi_agent.py
#NOTE: This cell is just the DEFINITION of the agent graph. All we're doing here is describing how to compose the composite application.

#Update the max number of iterations between supervisor and worker nodes before returning to the user. This is how many internal 'conversations' the supervisor has with the other agents. This is a maximum value - if an answer is sufficient with fewer iterations, then great.
MAX_ITERATIONS = 5

#Add the description for each agent we're going to use as a dictionary
worker_descriptions = {
    "Genie_DBX_Billing": genie_agent_description_1,
    "Genie_DBX_Hotel": genie_agent_description_2,
    "Genie_DBX_Employee": genie_agent_description_3,
    "Coder": code_agent_description,
}

#Flatten the descriptions into a single string variable
formatted_descriptions = "\n".join(
    f"- {name}: {desc}" for name, desc in worker_descriptions.items()
)

#Tell the LM in plain language about the agents it has access to
system_prompt = f"Decide between routing between the following workers or ending the conversation if an answer is provided. \n{formatted_descriptions}"
options = ["FINISH"] + list(worker_descriptions.keys())
FINISH = {"next_node": "FINISH"}

#Make use of all the above definitions and create the supervisor. This is what we'll be interfacing with and logging in MLflow.
def supervisor_agent(state):
    count = state.get("iteration_count", 0) + 1
    if count > MAX_ITERATIONS:
        return FINISH
    
    #Define our chaining logic
    class nextNode(BaseModel):
        next_node: Literal[tuple(options)]

    #Assemble the entire chain, defining the supervisor and callable agents with some simple recursion logic.
    preprocessor = RunnableLambda(
        lambda state: [{"role": "system", "content": system_prompt}] + state["messages"]
    )
    supervisor_chain = preprocessor | llm.with_structured_output(nextNode)
    next_node = supervisor_chain.invoke(state).next_node
    
    #If the supervisor routes to the same node twice in a row, exit the loop. A repeated route suggests the conversation is no longer making progress.
    if state.get("next_node") == next_node:
        return FINISH
    return {
        "iteration_count": count,
        "next_node": next_node
    }



## Defining the node graph
This part seems more complicated than it really is. The first thing we need to do is define how the agent nodes interact with the supervisor. Really, all we're doing here is invoking an agent node (any of them) with the current state and relabelling the last message of its result as an assistant message tagged with the node's name. Once we've done this for each of our agent nodes, we add the supervisor and a `final_answer` node that assembles the reply. Then we specify the entry point, connect the edges, and call `compile()` on the `StateGraph` (from `langgraph.graph`) to package the whole thing up as one runnable object. Because the compiled graph is later wrapped in an MLflow `ChatAgent`, which is a `pyfunc` model, it can be logged with MLflow and registered in Unity Catalog.
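
The node adapter pattern can be tried in isolation. In this minimal sketch, `EchoAgent` is a hypothetical stand-in for a real agent (anything exposing `invoke`), and `functools.partial` binds the agent and its display name so the resulting node takes only the graph state.

```python
import functools
from types import SimpleNamespace


class EchoAgent:
    """Hypothetical stand-in for an agent that exposes `invoke`."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def invoke(self, state: dict) -> dict:
        last = state["messages"][-1]["content"]
        reply = SimpleNamespace(content=f"{self.prefix}: {last}")
        return {"messages": [reply]}


def agent_node(state: dict, agent, name: str) -> dict:
    """Run one agent and relabel its final message as a named assistant turn."""
    result = agent.invoke(state)
    return {
        "messages": [
            {
                "role": "assistant",
                "content": result["messages"][-1].content,
                "name": name,
            }
        ]
    }


# Bind the agent and name up front; the node now only needs the state.
hotel_node = functools.partial(agent_node, agent=EchoAgent("hotel"), name="Hotel")

state = {"messages": [{"role": "user", "content": "cheapest room?"}]}
update = hotel_node(state)
print(update)
```

Each node returns only the new message rather than the whole history. In the notebook, the graph state type is responsible for merging that update into the running message list.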

In [0]:
%%writefile -a multi_agent.py
#This is the function that composes the message that interfaces with the LLM.
def agent_node(state, agent, name):
    result = agent.invoke(state)
    return {
        "messages": [
            {
                "role": "assistant",
                "content": result["messages"][-1].content,
                "name": name,
            }
        ]
    }


#This is the callable object that contains the response payload.
def final_answer(state):
    prompt = "Using only the content in the messages, respond to the previous user question using the answer given by the other assistant messages."
    preprocessor = RunnableLambda(
        lambda state: state["messages"] + [{"role": "user", "content": prompt}]
    )
    final_answer_chain = preprocessor | llm
    return {"messages": [final_answer_chain.invoke(state)]}


#This object definition is technically just a struct to keep tabs on the agent.
class AgentState(ChatAgentState):
    next_node: str
    iteration_count: int

#Use a functools wrapper to build out the actual agent objects based on their descriptors
code_node = functools.partial(agent_node, agent=code_agent, name="Coder")
genie_node_1 = functools.partial(agent_node, agent=genie_agent_1, name="Genie_DBX_Billing")
genie_node_2 = functools.partial(agent_node, agent=genie_agent_2, name="Genie_DBX_Hotel")
genie_node_3 = functools.partial(agent_node, agent=genie_agent_3, name="Genie_DBX_Employee")

#Build the graph from the nodes, including something to send a result back to whatever's invoking the application (aka final answer).
workflow = StateGraph(AgentState)
workflow.add_node("Genie_DBX_Billing", genie_node_1)
workflow.add_node("Genie_DBX_Hotel", genie_node_2)
workflow.add_node("Genie_DBX_Employee", genie_node_3)
workflow.add_node("Coder", code_node)
workflow.add_node("supervisor", supervisor_agent)
workflow.add_node("final_answer", final_answer)

workflow.set_entry_point("supervisor")
# We want our workers to ALWAYS "report back" to the supervisor when done
for worker in worker_descriptions.keys():
    workflow.add_edge(worker, "supervisor")

# Let the supervisor decide which next node to go
workflow.add_conditional_edges(
    "supervisor",
    lambda x: x["next_node"],
    {**{k: k for k in worker_descriptions.keys()}, "FINISH": "final_answer"},
)
workflow.add_edge("final_answer", END)
multi_agent = workflow.compile()



## Adding a Chat interface
Now that we have our compiled `multi_agent` graph, all we need to do is wrap it in a class that exposes the MLflow `ChatAgent` interface. The class accepts the compiled graph when instantiated and adds two behaviours. `predict()` runs the graph to completion and returns all of the messages in a single response. `predict_stream()` yields the same messages one at a time as each node finishes, which lets a caller display intermediate steps while the graph is still running.
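
The relationship between the two methods can be sketched with plain generators. Here `fake_stream` is a hypothetical stand-in for the per-node updates emitted by the graph, and the `*_sketch` functions are illustrations rather than the `ChatAgent` methods themselves.

```python
from typing import Iterator


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for the graph's per-node update events."""
    yield {"supervisor": {"next_node": "Hotel"}}  # routing update, no messages
    yield {"Hotel": {"messages": [{"role": "assistant", "content": "Rate is 120"}]}}
    yield {"final_answer": {"messages": [{"role": "assistant", "content": "120 per night"}]}}


def predict_stream_sketch() -> Iterator[dict]:
    """Yield each message as soon as the node that produced it finishes."""
    for event in fake_stream():
        for node_data in event.values():
            yield from node_data.get("messages", [])


def predict_sketch() -> list[dict]:
    """Collect the same messages and return them all at once."""
    return list(predict_stream_sketch())


chunks = [m["content"] for m in predict_stream_sketch()]
full = predict_sketch()
print(chunks)
print(len(full))
```

Updates that carry no `messages` key, such as the supervisor's routing decision, are skipped in both paths.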

In [0]:
%%writefile -a multi_agent.py
class LangGraphChatAgent(ChatAgent):
    #Class constructor. This defines how the LangGraphChatAgent is initialized.
    def __init__(self, agent: CompiledStateGraph):
        self.agent = agent

    #Run the graph to completion and return every message produced along the way in a single response. We should probably create an installable library for this since it's pretty typical and can benefit from override and extension functionality.
    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        request = {
            "messages": [m.model_dump_compat(exclude_none=True) for m in messages]
        }

        messages = []
        for event in self.agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                messages.extend(
                    ChatAgentMessage(**msg) for msg in node_data.get("messages", [])
                )
        return ChatAgentResponse(messages=messages)

    #Streaming variant: yield each message as a chunk as soon as a node produces it, so callers can follow the intermediate steps.
    def predict_stream(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> Generator[ChatAgentChunk, None, None]:
        request = {
            "messages": [m.model_dump_compat(exclude_none=True) for m in messages]
        }
        for event in self.agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                yield from (
                    ChatAgentChunk(**{"delta": msg})
                    for msg in node_data.get("messages", [])
                )



## Logging to MLflow
Now that everything is assembled, we can pass the whole thing into MLflow. Since we've been appending everything to `multi_agent.py` up to this point, the entire definition for the application can be logged just like we would any other ML model. By adding autolog functionality to our application, we can use MLflow to trace our application and, with inference tables, track the decisions made during a conversation.

In [0]:
%%writefile -a multi_agent.py
#Create the agent object, and specify it as the agent object to use when loading the agent back for inference via mlflow.models.set_model()
mlflow.langchain.autolog()
AGENT = LangGraphChatAgent(multi_agent)
mlflow.models.set_model(AGENT)

## Test the agent

Interact with the agent to test its output. Since this notebook called `mlflow.langchain.autolog()` you can view the trace for each step the agent takes.

In [0]:
#Restart the Python process so the following cells import multi_agent.py from a clean state.
dbutils.library.restartPython()

## Create a Personal Access Token (PAT) as a Databricks secret
In order to access the Genie Space and its underlying resources, we need to create a PAT.
- This can either be your own PAT or that of a service principal ([AWS](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/oauth-m2m)). You will have to rotate this token yourself upon expiry.
- Add secrets-based environment variables to a model serving endpoint ([AWS](https://docs.databricks.com/aws/en/machine-learning/model-serving/store-env-variable-model-serving#add-secrets-based-environment-variables) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/store-env-variable-model-serving#add-secrets-based-environment-variables)).
- You can reference the table in the deploy docs for the right permissions level for each resource: ([AWS](https://docs.databricks.com/aws/en/generative-ai/agent-framework/deploy-agent#automatic-authentication-passthrough) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/deploy-agent#automatic-authentication-passthrough)).
  - Provision with `CAN RUN` on the Genie Space
  - Provision with `CAN USE` on the SQL Warehouse powering the Genie Space
  - Provision with `SELECT` on underlying Unity Catalog Tables 
  - Provision with `EXECUTE` on underlying Unity Catalog Functions

In [0]:
import os
from dbruntime.databricks_repl_context import get_context

#Set the variables for the PAT in the Databricks Secrets store
secret_scope_name = "general"
secret_key_name = "genie_access"

#Inject the variables into the agent for use.
os.environ["DB_MODEL_SERVING_HOST_URL"] = "https://" + get_context().workspaceUrl
assert os.environ["DB_MODEL_SERVING_HOST_URL"] is not None
os.environ["DATABRICKS_GENIE_PAT"] = dbutils.secrets.get(
    scope=secret_scope_name, key=secret_key_name
)
assert os.environ["DATABRICKS_GENIE_PAT"] is not None, (
    "The DATABRICKS_GENIE_PAT was not properly set to the PAT secret"
)

## Sample testing our application
Now that we've written our entire application definition out to `multi_agent.py` and restarted our python interpreter, we can validate the source code that we prepped for deployment. All we need is a simple code stub to simulate how the serving endpoint interfaces with the application. This will show us a breakdown of how each agent is invoked based on different prompts. Experiment with the message prompts for different results. Once we're satisfied, we can consider deploying this to a production-grade serving endpoint.

In [0]:
from multi_agent import AGENT, genie_agent_description_1

assert genie_agent_description_1 != "This genie agent can answer ...", (
    "Remember to update the genie agent description for higher quality answers."
)
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you help me get the total of my aws costs?",
        }
    ]
}
AGENT.predict(input_example)

## Viewing the logic chain
If we want to translate the logic chain into something a bit more human-readable, we can examine the supervisor's conversation with the agents. By iterating through the `predict_stream()` events, we can see what this looks like.

In [0]:
for event in AGENT.predict_stream(input_example):
    print(event, "-----------\n")

## Validating and flattening the agent logic
Notice that each cell containing agent composition logic starts with a `%%writefile -a multi_agent.py` command. This appends the contents of the cell to a single file called `multi_agent.py` (much like `>>` in a shell). Since the agent is logged as code through `mlflow.pyfunc`, this is the easiest and most consistent way to produce the model artifact in MLflow for registration in Unity Catalog.
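
One inexpensive way to check the assembled file before logging it is to parse it without executing it. The following is a minimal sketch using the standard-library `ast` module. The helper `top_level_names` and the throwaway `demo_agent.py` file are hypothetical, and in the notebook you would point the helper at `multi_agent.py` instead.

```python
import ast
import os
import tempfile


def top_level_names(path: str) -> set[str]:
    """Parse a Python file without running it and return its module-level names."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)  # raises SyntaxError if invalid
    names = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
    return names


# Demo on a throwaway file that imitates a few appended cells.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_agent.py")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(
        "LLM_ENDPOINT_NAME = 'demo'\n\n"
        "def agent_node(state):\n    return state\n\n"
        "AGENT = object()\n"
    )

names = top_level_names(demo_path)
print(sorted(names))
```

A syntax error from a half-written cell shows up here immediately, and the returned set makes it easy to confirm that names such as `AGENT` were actually written to the file.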

## Log the agent as an MLflow model

Log the agent as code from the `multi_agent.py` file. See [MLflow - Models from Code](https://mlflow.org/docs/latest/models.html#models-from-code).

### Enable automatic authentication for Databricks resources
For the most common Databricks resource types, Databricks supports and recommends declaring resource dependencies for the agent upfront during logging. This enables automatic authentication passthrough when you deploy the agent. With automatic authentication passthrough, Databricks automatically provisions, rotates, and manages short-lived credentials to securely access these resource dependencies from within the agent endpoint.

To enable automatic authentication, specify the dependent Databricks resources when calling `mlflow.pyfunc.log_model()`.
  - **TODO**: If your Unity Catalog tool queries a [vector search index](docs link) or leverages [external functions](docs link), you need to include the dependent vector search index and UC connection objects, respectively, as resources. See docs ([AWS](https://docs.databricks.com/generative-ai/agent-framework/log-agent.html#specify-resources-for-automatic-authentication-passthrough) | [Azure](https://learn.microsoft.com/azure/databricks/generative-ai/agent-framework/log-agent#resources)).

## Creating the MLflow definition
When logging a model, project or application to MLflow, we need to define a few components to tell MLflow what the project 'looks' like. In other words, we need to provide the context around how it operates. This is what allows MLflow to serve the project in a consistent and repeatable fashion. Basically, we define the tools and resources that are required for the project to run and be served. Since we're also relying on other Databricks resources (a serving endpoint and a few Genie spaces), those need to be both defined and present for the application to run. The same goes for the system and any AI tools that we defined in the code agent. Once all is in order, we can create our MLflow run that logs our agent application.

In [0]:
# Determine Databricks resources to specify for automatic auth passthrough at deployment time
import mlflow
from multi_agent import GENIE_SPACE_ID_1, GENIE_SPACE_ID_2, GENIE_SPACE_ID_3, LLM_ENDPOINT_NAME, toolbox
from databricks_langchain import UnityCatalogTool, VectorSearchRetrieverTool
from mlflow.models.resources import (
    DatabricksFunction,
    DatabricksGenieSpace,
    DatabricksServingEndpoint,
)
from pkg_resources import get_distribution

resources = [
    DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME),
    DatabricksGenieSpace(genie_space_id=GENIE_SPACE_ID_1),
    DatabricksGenieSpace(genie_space_id=GENIE_SPACE_ID_2),
    DatabricksGenieSpace(genie_space_id=GENIE_SPACE_ID_3),
]
for tool in toolbox:
    if isinstance(tool, VectorSearchRetrieverTool):
        resources.extend(tool.resources)
    elif isinstance(tool, UnityCatalogTool):
        resources.append(DatabricksFunction(function_name=tool.uc_function_name))

with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="multi_agent.py",
        input_example=input_example,
        extra_pip_requirements=[f"databricks-connect=={get_distribution('databricks-connect').version}"],
        resources=resources,
    )

## Pre-deployment agent validation
Before registering and deploying the agent, perform pre-deployment checks using the [mlflow.models.predict()](https://mlflow.org/docs/latest/python_api/mlflow.models.html#mlflow.models.predict) API. See Databricks documentation ([AWS](https://docs.databricks.com/en/machine-learning/model-serving/model-serving-debug.html#validate-inputs) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/model-serving-debug#before-model-deployment-validation-checks)).

In [0]:
# mlflow.models.predict(
#     model_uri=f"runs:/{logged_agent_info.run_id}/agent",
#     input_data=input_example,
#     env_manager="uv",
# )

## Register the model to Unity Catalog
Once our model has been logged, we can register it as the latest version and prepare it for serving. Here we define the location for the registered model in Unity Catalog, where it will be stored, versioned and governed.

In [0]:
mlflow.set_registry_uri("databricks-uc")

# TODO: define the catalog, schema, and model name for your UC model
catalog = "ademianczuk"
schema = "general"
model_name = "multi_agent_demo"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# register the model to UC
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME
)

## Deploy our application
The last step is to deploy our application to a Databricks serving endpoint. All we need to do is call `agents.deploy()` from the `databricks-agents` package and pass in a reference to the access token that's stored in the secrets store, rather than the token itself. The reference populates the `DATABRICKS_GENIE_PAT` environment variable on the endpoint, which gives the agent a privileged credential for reaching the Genie spaces on the user's behalf. Creating the serving infrastructure can take a while (usually about 20 minutes for a first-time deployment) because the entire isolated ephemeral network and container runtime are defined, built, deployed, configured and networked for the duration of the application serving lifetime. If an endpoint is terminated, the deployment descriptors remain from the previous configuration for reuse.
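
The secret reference passed to the endpoint is built with an f-string that contains several runs of braces, which can be hard to read. This short sketch evaluates the same expression on its own, using the scope and key names defined earlier in the notebook.

```python
secret_scope_name = "general"
secret_key_name = "genie_access"

# In an f-string, "{{" and "}}" each produce one literal brace,
# so four braces in a row produce two literal braces in the result.
secret_reference = f"{{{{secrets/{secret_scope_name}/{secret_key_name}}}}}"

print(secret_reference)  # {{secrets/general/genie_access}}
```

The resulting string only names the secret's scope and key, so the token value itself does not appear in the deployment call.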

In [0]:
from databricks import agents

agents.deploy(
    UC_MODEL_NAME,
    uc_registered_model_info.version,
    tags={"endpointSource": "docs"},
    environment_vars={
        "DATABRICKS_GENIE_PAT": f"{{{{secrets/{secret_scope_name}/{secret_key_name}}}}}"
    },
)

## Next steps

After your agent is deployed, you can chat with it in AI Playground to perform additional checks, share it with SMEs in your organization for feedback, or embed it in a production application. See Databricks documentation ([AWS](https://docs.databricks.com/en/generative-ai/deploy-agent.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/deploy-agent)).