# Custom Tools Demo <br/>
<img src="https://eu-images.contentstack.com/v3/assets/blt6b0f74e5591baa03/blt5d5323f706ef288f/637c0195c162df49beaae102/Untitled_design_(77).png?width=1280&auto=webp&quality=95&format=jpg&disable=upscale" width=500 /><br/>
## Part 2: Assembling The Agentic Application
Now that we've prototyped our agents in the previous notebook and worked through debugging them, we can start thinking about composing our application. This version is a little different than a GenieSpaces agentic workflow because we're relying on open libraries rather than the Databricks Genie Agents API. This time we'll need to take into consideration how we're invoking our agents and toolchains.

## Designing Our Custom Application

In this section, we'll take the tools we built previously and compose them into a fully agentic, compound AI system. Using the two prototype agents from the previous notebook, we'll use the Databricks LangChain agent framework to create a fully integrated system. Once it's working, we'll register our application in our MLflow registry and serve a version of the model to a front-end application. To get an idea of the relationships between tools, toolboxes, node agents and supervisors, compare this project to a fully integrated GenieAgent version of a compound AI system: <br/>
<br/>
<img src="https://github.com/andrijdemianczuk/Heroic-Hare/blob/main/Notebooks/Custom%20Tools%20for%20Agents%20Demo/Compound%20AI%20Schematic.jpg?raw=true" width=750/>

## Assembling our new agents into a compound AI system
Since we've already prototyped our two agents (and, by extension, their two toolboxes) in the last notebook, we can start putting together our agentic application. We'll use the Databricks Agent framework to help us out with this. We'll need to coalesce and decouple our agents (including a supervisor) into new files and publish them to MLflow for serving.

### Building the artifact file
Since we've already established the workflow of our tooling, we will need to create a coalesced application file. Technically we could break this up further into separate classes, but for the sake of demonstration we will be exporting just a single file for agent registration as an artifact in Unity Catalog with MLflow.

### A quick note on the logic of this project
Much of the same logic (and documentation) is directly repeated from the multi-agent composite demo. Where this differs is in its definition and use of custom tools and how they're being managed. Each tool is formatted for different queries that can be translated to other API calls, which was one of the most challenging aspects of this project.

In [0]:
%pip install --upgrade --quiet langchain-core langchain databricks-vectorsearch langchain-community feedparser
%pip install -U -qqq langgraph==0.3.4 databricks-langchain databricks-agents uv
%pip install mlflow

dbutils.library.restartPython()

## The asset file
Most of the logic and processing is split up as naturally as possible in this notebook. In order to register our application as a 'model' in MLFlow it needs a definition of what that artifact looks like. Since we're not actually compiling a model (but rather an application), we need to build a pseudo file in place of something like a `pkl` file.

In [0]:
import os

#Check if the asset_agent.py file exists and delete it if it does
#This is what allows us to run the notebook over again. If our logic changes, we can also ensure that we're not appending garbage to the file or causing it to fail.

if os.path.exists("asset_agent.py"):
    os.remove("asset_agent.py")
    print("asset_agent.py has been deleted.")
else:
    print("asset_agent.py does not exist.")
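
Because every later cell appends to the file with `%%writefile -a`, rerunning the notebook without this cleanup would duplicate each definition. The following is a minimal sketch of that effect using plain file I/O in a temporary directory; `append_cell` and `run_notebook` are hypothetical stand-ins for the notebook magic, not part of the project.

```python
import os
import tempfile


def append_cell(path: str, source: str) -> None:
    """Hypothetical stand-in for `%%writefile -a`: append a cell's source."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(source)


def run_notebook(path: str, reset: bool) -> int:
    """Simulate one notebook run and return the file's line count."""
    if reset and os.path.exists(path):
        os.remove(path)
    append_cell(path, "import os\n")
    append_cell(path, "LLM_ENDPOINT_NAME = 'example'\n")
    with open(path, encoding="utf-8") as fh:
        return len(fh.readlines())


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "asset_agent.py")
    first_run = run_notebook(target, reset=True)
    rerun_without_reset = run_notebook(target, reset=False)  # duplicates lines
    rerun_with_reset = run_notebook(target, reset=True)      # clean file again
```

Without the reset, the second run doubles the file; with it, the file always reflects exactly one pass through the notebook.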

## Dependencies and third-party libraries
Assuming that this will be coalesced into a single file, it's also important that we manage our imports at the top of the file so all functions have access to what they need. I typically break my imports into blocks of related libraries to make management easier in the long-term. Adding category descriptors also helps keep things organized.

In [0]:
%%writefile -a asset_agent.py
#Python libs
import functools
import os
from typing import Any, Generator, Literal, Optional, Type
import requests
import feedparser

#Databricks sdk & Databricks langchain implementation
from databricks.sdk import WorkspaceClient
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
# from databricks_langchain.genie import GenieAgent

#Langchain tools (langraph is our agent lib)
from langchain_core.runnables import RunnableLambda
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.prebuilt import create_react_agent
from langchain.agents import AgentType, initialize_agent, agent
from langchain.tools import Tool
from langchain.tools import BaseTool
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

#MLflow stuff
import mlflow
from mlflow.langchain.chat_agent_langgraph import ChatAgentState
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
    ChatAgentChunk,
    ChatAgentMessage,
    ChatAgentResponse,
    ChatContext,
)

#Parsing libs
from pydantic import BaseModel, Field



## Defining our LLM Foundation Model
The LLM Foundation Model provides the interpretation layer for internal communications between agents and the supervisor, as well as the user prompt and response. This is usually the point where we'd want to pick a good foundational model for our task. Remember that every AI application, no matter how general, still needs to have focus and intent for maintenance and stability. Databricks provides several registered models in the `databricks-uc` registry for use and is updating them globally on a regular basis. 

In [0]:
%%writefile -a asset_agent.py
#This is the foundation LLM that we'll be using for the basis of our agents
LLM_ENDPOINT_NAME = "databricks-claude-3-7-sonnet"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, extra_params={"temperature": 0.3})



## Defining our tools
The following logic is the exact same logic we used in the prototypes of the previous notebook. The most important changes, however, are in the tool descriptions. They are slightly more verbose and descriptive here since we'll be relying on them for tool routing by the agent.

### Agent expectations (taken from our prototype explanation of expectations)
Since we're going to be leveraging LangChain's BaseTool and BaseModel classes, we need to implement them properly. `BaseModel()` will provide us with an input interface and `BaseTool()` will provide us with the structured framework we will rely on for our application. It's important to choose an agent framework that's consistent with how we're going to make use of our agents later on. When we create an implementation of the `BaseTool()` class, we're obligated to include two attributes (`name` and `description`) and two behaviors (`_run` and `_arun`).

In [0]:
%%writefile -a asset_agent.py
#Input Schema for the weather class. We'll be using this to compose the tool
class WeatherInput(BaseModel):
    latitude: float = Field(..., description="Latitude of the location")
    longitude: float = Field(..., description="Longitude of the location")

#Create an implementation inheriting the BaseTool abstract class from LangChain. _run() and _arun() are both required implementations. The name and description attributes are also required.
class FetchWeatherTool(BaseTool):
    name: str = "fetch_weather"
    description: str = "Fetch hourly weather temperature data for a given latitude and longitude. When asked about weather, assume that the user is always referring to temperature only. temperature_2m refers to the temperature in degrees celsius."
    args_schema: Type[BaseModel] = WeatherInput #This is the input schema we defined above.

    #The _run() function is always called when invoked. It's essentially doing the same thing as a class constructor, but since we're invoking the class from a toolchain in lieu of instancing the class as an object, this is run instead. This behaviour is what we're inheriting from the BaseTool() class.
    def _run(self, latitude: float, longitude: float):
        url = "https://api.open-meteo.com/v1/forecast"
        params = {
            "latitude": latitude,
            "longitude": longitude,
            "hourly": "temperature_2m",
        }
        response = requests.get(url, params=params)
        if response.status_code == 200:
            return response.json()["hourly"]["temperature_2m"][:5]  # Preview
        return f"Failed to fetch data: {response.status_code}"

    def _arun(self, *args, **kwargs):
        raise NotImplementedError("Async not supported.")
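
The tool above returns only the first five temperature values. As a rough sketch of how the preview could be labelled with timestamps, the snippet below pairs each reading with its hour. It assumes the `hourly` block of the response carries a `time` list alongside `temperature_2m`; the payload here is made-up sample data, and `preview_temperatures` is a hypothetical helper.

```python
from itertools import islice
from typing import Any

# Made-up sample data; the shape is an assumption about the API's "hourly" block.
sample_response: dict[str, Any] = {
    "hourly": {
        "time": ["2025-01-01T00:00", "2025-01-01T01:00", "2025-01-01T02:00"],
        "temperature_2m": [-4.1, -4.6, -5.0],
    }
}


def preview_temperatures(payload: dict[str, Any], limit: int = 5) -> list[str]:
    """Pair each timestamp with its temperature and keep the first `limit`."""
    hourly = payload["hourly"]
    pairs = zip(hourly["time"], hourly["temperature_2m"])
    return [f"{ts}: {temp} °C" for ts, temp in islice(pairs, limit)]


preview = preview_temperatures(sample_response, limit=2)
```

Returning labelled readings gives the LLM less room to guess which hour a value belongs to.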



In [0]:
%%writefile -a asset_agent.py
class ArxivSearchInput(BaseModel):
    query: str = Field(..., description="Search keywords for the arXiv research papers")
    max_results: Optional[int] = Field(5, description="Max number of papers to return")

class SearchArxivTool(BaseTool):
    name: str = "search_arxiv"
    description: str = "Search for academic papers on arXiv related to a given keyword."
    args_schema: Type[BaseModel] = ArxivSearchInput

    #The default matches the ArxivSearchInput schema so both paths return the same number of papers.
    def _run(self, query: str, max_results: int = 5):
        url = "http://export.arxiv.org/api/query"
        params = {
            "search_query": f"all:{query}",
            "start": 0,
            "max_results": max_results,
        }
        response = requests.get(url, params=params)
        feed = feedparser.parse(response.text)
        results = []
        for entry in feed.entries[:max_results]:
            results.append(f"{entry.title} — {entry.link}")
        return "\n".join(results) if results else "No papers found."

    def _arun(self, *args, **kwargs):
        raise NotImplementedError("Async not supported.")
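
The search tool relies on `feedparser` to read the Atom feed. For readers who want to see what that parsing step amounts to, here is a standard-library sketch that swaps in `xml.etree.ElementTree`. The feed below is placeholder data (not a real arXiv response), `summarize_entries` is a hypothetical helper, and the whitespace normalization is an assumption about titles that wrap across lines.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Placeholder feed for illustration only; identifiers and titles are invented.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>A Placeholder Paper
      Title</title>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/0000.00002v1</id>
    <title>Second Placeholder</title>
  </entry>
</feed>"""


def summarize_entries(feed_xml: str, max_results: int = 5) -> str:
    """Return one 'title | link' line per Atom entry, capped at max_results."""
    root = ET.fromstring(feed_xml)
    lines = []
    for entry in root.findall("atom:entry", ATOM_NS)[:max_results]:
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        title = " ".join(raw_title.split())  # collapse wrapped whitespace
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip()
        lines.append(f"{title} | {link}")
    return "\n".join(lines) if lines else "No papers found."


summary = summarize_entries(SAMPLE_FEED)
```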



In [0]:
%%writefile -a asset_agent.py
class AssetSearchInput(BaseModel):
    asset_query: str = Field(..., description="The name of the location or asset to search for")

class SearchAssetsTool(BaseTool):
    name: str = "search_assets"
    description: str = "Search for geographic or structural asset metadata using OpenStreetMap. "
    args_schema: Type[BaseModel] = AssetSearchInput

    def _run(self, asset_query: str):
        url = "https://nominatim.openstreetmap.org/search"
        params = {"q": asset_query, "format": "json"}
        headers = {"User-Agent": "LangChainAgent/1.0 (andrij.demianczuk@databricks.com)"}

        response = requests.get(url, params=params, headers=headers)
        if response.status_code != 200:
            return f"Search API returned {response.status_code}. Cannot continue using search_assets."
        
        data = response.json()
        if data:
            result = data[0]
            return f"{result['display_name']} (lat: {result['lat']}, lon: {result['lon']})"
        else:
            return f"No results found for '{asset_query}'"

    def _arun(self, *args, **kwargs):
        raise NotImplementedError("Async not supported.")



## Agent 1: Weather and Articles
Our first agent will cover the duties of answering questions about weather with known coordinates and retrieving research articles. The purpose of bundling these two tools together is to show how tools with different signatures can be called upon in the same manner with predictable output (that is, they work). This capability is a form of *polymorphism*, one of the critical concepts of object-oriented programming: every tool exposes the same `BaseTool` interface regardless of what it does internally. What we **want** is predictable behavior (even if the results are probabilistic) regardless of how the agent operates. To put it another way, we need to make sure that our agents are engaged in a *declarative* manner.
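
To make the idea concrete without pulling in LangChain, here is a small sketch of two tools with different signatures being invoked through one uniform call path. `SimpleTool`, `EchoCoordinates`, `EchoQuery` and `call_tool` are all hypothetical names used for illustration; they only mimic the shape of the real tool classes.

```python
from abc import ABC, abstractmethod
from typing import Any


class SimpleTool(ABC):
    """Hypothetical stand-in for a tool base class: a name, a description, a run()."""

    name: str
    description: str

    @abstractmethod
    def run(self, **kwargs: Any) -> str:
        ...


class EchoCoordinates(SimpleTool):
    name = "echo_coordinates"
    description = "Takes a latitude and a longitude."

    def run(self, latitude: float, longitude: float) -> str:
        return f"lat={latitude}, lon={longitude}"


class EchoQuery(SimpleTool):
    name = "echo_query"
    description = "Takes a keyword query and an optional result cap."

    def run(self, query: str, max_results: int = 5) -> str:
        return f"query={query!r}, max_results={max_results}"


toolbox = {tool.name: tool for tool in (EchoCoordinates(), EchoQuery())}


def call_tool(name: str, arguments: dict[str, Any]) -> str:
    """One call path for every tool, whatever its signature looks like."""
    return toolbox[name].run(**arguments)


coords = call_tool("echo_coordinates", {"latitude": 51.0, "longitude": -114.0})
papers = call_tool("echo_query", {"query": "agents"})
```

The caller never branches on the tool type; it only needs the tool's name and a dictionary of arguments, which is the same contract the agent relies on.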

In [0]:
%%writefile -a asset_agent.py
#Specify what tools are available in the toolbox
weather_tools = [
    FetchWeatherTool(),
    SearchArxivTool(),
]

#A good description so the supervisor knows what the node agent does and is responsible for.
#Note: no trailing comma after the string, otherwise this becomes a one-element tuple rather than a str.
weatherdoc_agent_description = (
    "The Weather and Document agent specializes in retrieving weather (temperature) information from known coordinates. This agent can also retrieve documents from arxiv with a keyword search. This agent focuses on returning useful information to the prompter for weather and research articles. It can have internal conversations to get the most complete information from its tools."
)

#Not directly used at this time, but handy to have for conversation tracking in the future.
weather_doc_conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)

#Create an instance of the agent.
weatherdoc_bot = initialize_agent(
    tools=weather_tools,
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=10
)



## Agent 2: Asset Lookup
Notice here that the agent descriptions are much more verbose. Since we know they won't be interfaced with directly by the user, we need good descriptions to tell the parent (supervisor) agent what each one does. This is very similar in concept to a derived class.


In [0]:
%%writefile -a asset_agent.py
asset_tools = [
    SearchAssetsTool()
]

#Note: no trailing comma after the string, otherwise this becomes a one-element tuple rather than a str.
asset_agent_description = (
    "The asset agent looks up details for asset searches. Typically this will return coordinates but also contains information about the timezone for the assets as well. This input expects a string for the query and is generally adept at major landmarks in North America."
)

asset_conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=10,
    return_messages=True
)

asset_bot = initialize_agent(
    tools=asset_tools,
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=10
)



In [0]:
#Only used for debugging
# asset_bot("Where is the Saddledome?")

In [0]:
#Only used for debugging
# weatherdoc_bot("Can you find me some articles on NVidia's AI Technology?")

## Defining the Supervisor Agent

**THIS PART IS REALLY IMPORTANT**

Now that we have all of our node agents defined for our application graph, let's tie them together with a supervisor agent. This agent is special because it has exclusive access to each of the agents. Although agents can technically talk directly to one another, having a brokerage is a pattern-safe design. We'll always try to decouple and encapsulate agents by responsibility for reasons of security, portability and generalizability. We want our agents to be aware enough of what they can do, but not so aware that they start creating their own graph edges. If we're considering object accountability from a design perspective this keeps the agents 'in line' with their least-privileged responsibility.

In a nutshell, what we're doing is creating the supervisor agent as a concrete implementation of an iterator that builds its runnable chain programmatically based on the presence of agents (`workers`) and the generated input description (hence `system_prompt` as the main directive of the application). The supervisor chain in the `supervisor_agent()` function essentially takes all of the upstream definitions and flattens them all into a single, deployable object.
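
As a quick illustration of that flattening step, the sketch below builds a system prompt and routing options from a dictionary of worker descriptions. The description strings are shortened placeholders rather than the real ones. It also shows a general Python pitfall worth watching for here: a trailing comma after a parenthesized string creates a one-element tuple, which then leaks its `repr` into the prompt.

```python
# Shortened placeholder descriptions, for illustration only.
worker_descriptions = {
    "Weatherdoc_Agent": "Retrieves temperatures for known coordinates and searches arXiv.",
    "Asset_Agent": "Looks up coordinates for a named asset or landmark.",
}

# One bullet per worker, joined into a single block of text.
formatted_descriptions = "\n".join(
    f"- {name}: {desc}" for name, desc in worker_descriptions.items()
)
system_prompt = (
    "Decide between routing between the following workers or ending the "
    f"conversation if an answer is provided. \n{formatted_descriptions}"
)
options = ["FINISH"] + list(worker_descriptions)

# Pitfall: the trailing comma makes this a tuple, not a string.
as_tuple = ("Looks up coordinates.",)
tuple_line = f"- Asset_Agent: {as_tuple}"
```

Printing `system_prompt` before wiring it into the chain is a cheap way to confirm the supervisor sees clean prose rather than tuple syntax.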

In [0]:
%%writefile -a asset_agent.py
#Update the max number of iterations between supervisor and worker nodes before returning to the user. This is how many internal 'conversations' the supervisor has with the other agents. This is a maximum value - if an answer is sufficient with fewer iterations, then great.
MAX_ITERATIONS = 10

#Add the description for each agent we're going to use as a dictionary
worker_descriptions = {
    "Weatherdoc_Agent":weatherdoc_agent_description,
    "Asset_Agent":asset_agent_description,
}

#Flatten the descriptions into a single string variable
formatted_descriptions = "\n".join(
    f"- {name}: {desc}" for name, desc in worker_descriptions.items()
)

#Tell the LLM in plain language about the agents it has access to
system_prompt = f"Decide between routing between the following workers or ending the conversation if an answer is provided. \n{formatted_descriptions}"
options = ["FINISH"] + list(worker_descriptions.keys())
FINISH = {"next_node": "FINISH"}

#Make use of all the above definitions and create the supervisor. This is what we'll be interfacing with and logging in MLFlow.
def supervisor_agent(state):
    count = state.get("iteration_count", 0) + 1
    if count > MAX_ITERATIONS:
        return FINISH
    
    #Define our chaining logic
    class nextNode(BaseModel):
        next_node: Literal[tuple(options)]

    #Assemble the entire chain, defining the supervisor and callable agents with some simple recursion logic.
    preprocessor = RunnableLambda(
        lambda state: [{"role": "system", "content": system_prompt}] + state["messages"]
    )
    supervisor_chain = preprocessor | llm.with_structured_output(nextNode)
    next_node = supervisor_chain.invoke(state).next_node
    
    #If the response routed back to the same node, exit the loop. This identifies when the conversation has reached its peak epoch.
    if state.get("next_node") == next_node:
        return FINISH
    return {
        "iteration_count": count,
        "next_node": next_node
    }
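
The two stop conditions in `supervisor_agent` (the iteration cap and the repeated-route check) can be exercised without an LLM. The sketch below mirrors that control flow with a hypothetical `pick_next` callable standing in for the structured-output chain; it is an illustration of the routing rules, not a replacement for the function above.

```python
from typing import Callable

MAX_ITERATIONS = 10
FINISH = {"next_node": "FINISH"}


def supervise(state: dict, pick_next: Callable[[dict], str]) -> dict:
    """Mirror the supervisor's routing rules with a pluggable router."""
    count = state.get("iteration_count", 0) + 1
    if count > MAX_ITERATIONS:
        return FINISH  # hard cap on supervisor/worker round trips
    next_node = pick_next(state)
    if state.get("next_node") == next_node:
        return FINISH  # routed to the same node twice in a row
    return {"iteration_count": count, "next_node": next_node}


def always_asset(state: dict) -> str:
    """Hypothetical router that always picks the asset agent."""
    return "Asset_Agent"


first = supervise({}, always_asset)
second = supervise(first, always_asset)
capped = supervise({"iteration_count": MAX_ITERATIONS}, always_asset)
```

Here `first` routes to the worker, `second` finishes because the router chose the same worker again, and `capped` finishes because the iteration budget is exhausted.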



## Defining the node graph
This part seems more complicated than it really is. The first thing we need to do is define the conversational structure of the agent nodes and how they interact with the supervisor. Really, all we're doing here is invoking the agent node (any of them) and getting a response based on the input and classifying the response as an assistant response. Once we've done this for each of our agent nodes, we also do it for the supervisor agent and an assembled final answer. We wrap this all together under a `pyfunc` banner so we can have one fully built and bundled application. Then we just specify the entry and exit points and run the whole object through the `compile()` method (part of the `langgraph.graph` library) to package the whole thing up. Under the `pyfunc` banner, we can also use MLflow logging with the `databricks-uc` model registry.

### Developer's note
Most of the problems I encountered came up here. The node agents weren't receiving any input from the supervisor, and it took a lot of time to tease out exactly what was going on in the stack traces at various points. My error stemmed from the fact that I was originally relying on an implicit reference to `llm_chain.prompt`, rather than building the prompt for each node agent programmatically.

In [0]:
%%writefile -a asset_agent.py
#This is the function that composes the message that interfaces with the LLM.
def agent_node(state, agent, name, tools):
    prompt = agent.agent.create_prompt(tools=tools)
    agent.agent.llm_chain.prompt = prompt
    user_messages = [msg for msg in state["messages"] if msg["role"] == "user"]
    if not user_messages:
        raise ValueError("No user message found to use as input")

    last_user_message = user_messages[-1]["content"]
    result = agent.invoke({"input": last_user_message})

    return {
        "messages": [
            {
                "role": "assistant",
                "content": f"The {name} has determined: {result['output']}",
                "name": name,
            }
        ]
    }


#This is the callable object that contains the response payload.
def final_answer(state):
    prompt = f"""You are the final summarizer.

                Here is a conversation between a user and multiple assistant agents.

                Each assistant agent was responsible for solving part of the user's question.

                Your job is to answer the original question as clearly as possible based only on the assistant messages below.

                Do not say "I don't know" unless no assistant provided a relevant answer.
                If multiple assistants provided conflicting or redundant answers, pick the most confident and relevant one.

                Respond to the user's original question with the best possible answer.
                """

    preprocessor = RunnableLambda(
        lambda state: state["messages"] + [{"role": "user", "content": prompt}]
    )
    final_answer_chain = preprocessor | llm
    return {"messages": [final_answer_chain.invoke(state)]}


#This object definition is technically just a struct to keep tabs on the agent.
class AgentState(ChatAgentState):
    next_node: str
    iteration_count: int

#Use a functools wrapper to build out the actual agent objects based on their descriptors
weatherdoc_node = functools.partial(agent_node, agent=weatherdoc_bot, name="Weatherdoc_Agent", tools=weather_tools)
assetdoc_node = functools.partial(agent_node, agent=asset_bot, name="Asset_Agent", tools=asset_tools)

#Build the graph from the nodes, including something to send a result back to whatever's invoking the application (aka final answer).
workflow = StateGraph(AgentState)
workflow.add_node("Weatherdoc_Agent", weatherdoc_node)
workflow.add_node("Asset_Agent", assetdoc_node)
workflow.add_node("supervisor", supervisor_agent)
workflow.add_node("final_answer", final_answer)

workflow.set_entry_point("supervisor")
# We want our workers to ALWAYS "report back" to the supervisor when done
for worker in worker_descriptions.keys():
    workflow.add_edge(worker, "supervisor")

# Let the supervisor decide which next node to go
workflow.add_conditional_edges(
    "supervisor",
    lambda x: x["next_node"],
    {**{k: k for k in worker_descriptions.keys()}, "FINISH": "final_answer"},
)
workflow.add_edge("final_answer", END)
multi_agent = workflow.compile()
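
To see the shape of the compiled graph without LangGraph, the following sketch walks the same topology in plain Python: the supervisor is the entry point, workers always report back to it, and `FINISH` leads to the final answer. The node functions are scripted stubs, `run_graph` is a hypothetical runner, and it assumes messages are appended to the state rather than overwritten.

```python
from typing import Callable

State = dict


def supervisor(state: State) -> State:
    # Scripted stand-in for the LLM router: call the worker once, then finish.
    answered = any(m["role"] == "assistant" for m in state["messages"])
    return {"next_node": "FINISH" if answered else "Asset_Agent"}


def asset_agent(state: State) -> State:
    return {"messages": [{"role": "assistant", "name": "Asset_Agent", "content": "stub result"}]}


def final_answer(state: State) -> State:
    return {"messages": [{"role": "assistant", "content": "stub summary"}]}


NODES: dict[str, Callable[[State], State]] = {
    "supervisor": supervisor,
    "Asset_Agent": asset_agent,
    "final_answer": final_answer,
}


def next_step(current: str, state: State) -> str:
    """Edges: supervisor routes conditionally, workers return to the supervisor."""
    if current == "supervisor":
        target = state["next_node"]
        return "final_answer" if target == "FINISH" else target
    if current == "final_answer":
        return "END"
    return "supervisor"


def run_graph(state: State, entry: str = "supervisor", max_steps: int = 20):
    visited = []
    current = entry
    while current != "END" and len(visited) < max_steps:
        visited.append(current)
        update = NODES[current](state)
        messages = state.get("messages", []) + update.get("messages", [])
        state = {**state, **update, "messages": messages}
        current = next_step(current, state)
    return state, visited


final_state, path = run_graph({"messages": [{"role": "user", "content": "Where is it?"}]})
```

The visited path makes the round trip explicit: supervisor, worker, supervisor again, then the final answer.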



## Adding a Chat interface
Now that we have our compiled `multi_agent` graph defined, all we need to do is create a wrapper class that colours the behaviour of our agent graph. The class accepts the compiled agent graph as input once instantiated and decorates it with two behaviours. `predict()` is the public behaviour that interfaces with the user or external system and returns the complete response. `predict_stream()` yields the same messages incrementally as each node finishes, which is what lets us follow the supervisor's conversations with each of the agent nodes.

In [0]:
%%writefile -a asset_agent.py
class LangGraphChatAgent(ChatAgent):
    #Class constructor. This defines how the LangGraphChatAgent is initialized.
    def __init__(self, agent: CompiledStateGraph):
        self.agent = agent

    #This function is a behaviour that returns a response. It defines the chat structure between the agents. I.E., how they talk back and forth with the supervisor agent. We should probably create an installable library for this since it's pretty typical and can benefit from override and extension functionality.
    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        request = {
            "messages": [m.model_dump_compat(exclude_none=True) for m in messages]
        }

        messages = []
        for event in self.agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                messages.extend(
                    ChatAgentMessage(**msg) for msg in node_data.get("messages", [])
                )
        return ChatAgentResponse(messages=messages)

    #This behaviour is how the supervisor keeps track of internal conversations. This is important as it allows agents to pass context to one another.
    def predict_stream(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> Generator[ChatAgentChunk, None, None]:
        request = {
            "messages": [m.model_dump_compat(exclude_none=True) for m in messages]
        }
        for event in self.agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                yield from (
                    ChatAgentChunk(**{"delta": msg})
                    for msg in node_data.get("messages", [])
                )
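
Both methods walk the same stream of per-node updates; the only difference is whether the messages are collected into one response or handed out one at a time. The sketch below shows that relationship with plain dictionaries. `fake_stream` is a hypothetical stand-in for the graph's update stream, and the event shape (node name mapped to a partial state) is assumed from the loops above.

```python
from typing import Iterator


def fake_stream() -> Iterator[dict]:
    """Hypothetical update stream: one {node_name: partial_state} dict per step."""
    yield {"supervisor": {"next_node": "Asset_Agent", "iteration_count": 1}}
    yield {"Asset_Agent": {"messages": [
        {"role": "assistant", "name": "Asset_Agent", "content": "found it"}
    ]}}
    yield {"final_answer": {"messages": [
        {"role": "assistant", "content": "Here is the answer."}
    ]}}


def stream_messages(events: Iterator[dict]) -> Iterator[dict]:
    """Streaming variant: yield each message as soon as its node reports."""
    for event in events:
        for node_data in event.values():
            # Supervisor updates carry no "messages" key, hence the default.
            yield from node_data.get("messages", [])


def collect_messages(events: Iterator[dict]) -> list[dict]:
    """Blocking variant: drain the stream and return everything at once."""
    return list(stream_messages(events))


all_messages = collect_messages(fake_stream())
first_chunk = next(stream_messages(fake_stream()))
```

The `.get("messages", [])` default matters: routing-only updates from the supervisor would otherwise raise a `KeyError`.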



## Logging to MLflow
Now that everything is assembled, we can pass the whole thing into MLflow. Since we've been appending everything to `asset_agent.py` up to this point, the entire definition for the application can be logged just like any other ML model. By adding auto-logging functionality to our application, we can use MLflow for tracing our application and tracking decisions within conversations using inference tables.

In [0]:
%%writefile -a asset_agent.py
#Create the agent object, and specify it as the agent object to use when loading the agent back for inference via mlflow.models.set_model()
mlflow.langchain.autolog()
AGENT = LangGraphChatAgent(multi_agent)
mlflow.models.set_model(AGENT)

## Test the Agent

In [0]:
#Kill the python context to validate the file.
dbutils.library.restartPython()

## Create a Personal Access Token (PAT) as a Databricks secret
In order to access the underlying resources, we need to create a PAT:
- This can either be your own PAT or that of a service principal ([AWS](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/oauth-m2m)). You will have to rotate this token yourself upon expiry.
- Add secrets-based environment variables to a model serving endpoint ([AWS](https://docs.databricks.com/aws/en/machine-learning/model-serving/store-env-variable-model-serving#add-secrets-based-environment-variables) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/store-env-variable-model-serving#add-secrets-based-environment-variables)).
- You can reference the table in the deploy docs for the right permissions level for each resource: ([AWS](https://docs.databricks.com/aws/en/generative-ai/agent-framework/deploy-agent#automatic-authentication-passthrough) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/agent-framework/deploy-agent#automatic-authentication-passthrough)).

### Developer's Note
For this application we shouldn't need to use the PAT that we previously used to access UC resources HOWEVER I'm likely going to create a GenieAgent soon to add to this project so I thought I'd leave it in for now.

In [0]:
import os
from dbruntime.databricks_repl_context import get_context

#Set the variables for the PAT in the Databricks Secrets store
secret_scope_name = "general"
secret_key_name = "genie_access"

#Inject the variables into the agent for use.
os.environ["DB_MODEL_SERVING_HOST_URL"] = "https://" + get_context().workspaceUrl
assert os.environ["DB_MODEL_SERVING_HOST_URL"] is not None
os.environ["DATABRICKS_GENIE_PAT"] = dbutils.secrets.get(
    scope=secret_scope_name, key=secret_key_name
)
assert os.environ["DATABRICKS_GENIE_PAT"] is not None, (
    "The DATABRICKS_GENIE_PAT was not properly set to the PAT secret"
)

## Testing our compiled compound AI application
Let's quickly run a test for each of our three tools, and both of our agents as defined. This is important and handy to understand how each agent expects to be invoked. This helps us understand the tooling of each agent and the stages of reasoning that occur.

### Sample testing our application
Now that we've written our entire application definition out to `asset_agent.py` and restarted our Python interpreter, we can validate the source code that we prepped for deployment. All we need is a simple code stub to simulate how the serving endpoint interfaces with the application. This will show us a breakdown of how each agent is invoked based on different prompts. Experiment with the message prompts for different results. Once we're satisfied, we can consider deploying this to a production-grade serving endpoint.

In [0]:
from asset_agent import AGENT

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What is the weather at 51.0447° N, 114.0719° W?",
        }
    ]
}
AGENT.predict(input_example)
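
The prompt above gives coordinates with hemisphere letters, while the weather tool expects signed floats, so the agent's LLM has to do that conversion itself. When checking the agent's tool calls, it can help to have a deterministic reference. The sketch below is one way to derive it; `parse_coordinates` is a hypothetical helper, not part of the application, and it only handles the decimal-degrees format used in this prompt.

```python
import re

# Decimal degrees followed by an optional degree sign and a hemisphere letter.
_COORD = re.compile(r"(\d+(?:\.\d+)?)\s*°?\s*([NSEW])\b")


def parse_coordinates(text: str) -> tuple[float, float]:
    """Return (latitude, longitude) as signed floats; south and west are negative."""
    latitude = longitude = None
    for value, hemisphere in _COORD.findall(text):
        signed = float(value) * (-1 if hemisphere in "SW" else 1)
        if hemisphere in "NS":
            latitude = signed
        else:
            longitude = signed
    if latitude is None or longitude is None:
        raise ValueError("Expected one N/S value and one E/W value")
    return latitude, longitude


expected = parse_coordinates("What is the weather at 51.0447° N, 114.0719° W?")
```

If the trace shows `fetch_weather` being called with a positive longitude for this prompt, the model dropped the hemisphere and the temperatures belong to a different place.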

In [0]:
from asset_agent import AGENT

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What are the coordinates of the Calgary Saddledome?",
        }
    ]
}
AGENT.predict(input_example)

In [0]:
from asset_agent import AGENT

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "Can you find me the top 5 articles about Nvidia?",
        }
    ]
}
AGENT.predict(input_example)

## Log the agent as an MLflow model

Log the agent as code from the `asset_agent.py` file. See [MLflow - Models from Code](https://mlflow.org/docs/latest/models.html#models-from-code).

### Enable automatic authentication for Databricks resources
For the most common Databricks resource types, Databricks supports and recommends declaring resource dependencies for the agent upfront during logging. This enables automatic authentication passthrough when you deploy the agent. With automatic authentication passthrough, Databricks automatically provisions, rotates, and manages short-lived credentials to securely access these resource dependencies from within the agent endpoint.

To enable automatic authentication, specify the dependent Databricks resources when calling `mlflow.pyfunc.log_model()`.
  - **TODO**: If your Unity Catalog tool queries a [vector search index](docs link) or leverages [external functions](docs link), you need to include the dependent vector search index and UC connection objects, respectively, as resources. See docs ([AWS](https://docs.databricks.com/generative-ai/agent-framework/log-agent.html#specify-resources-for-automatic-authentication-passthrough) | [Azure](https://learn.microsoft.com/azure/databricks/generative-ai/agent-framework/log-agent#resources)).

## Creating the MLflow definition
When logging a model, project or application to MLflow, we need to define a few components to tell MLflow what the project 'looks' like. In other words, we need to provide the context around how it operates. This is what allows MLflow to serve the project in a consistent and repeatable fashion. Basically, we define the tools and resources that are required for the project to run and be served. Since we're also relying on other Databricks resources (a serving endpoint and a few custom tools), those need to be both defined and present for the application to run. The same goes for the system and any AI tools that we defined in the code agent. Once all is in order, we can create our MLflow run that logs our agent application.

In [0]:
# Determine Databricks resources to specify for automatic auth passthrough at deployment time
import mlflow
from asset_agent import LLM_ENDPOINT_NAME, weather_tools, asset_tools
from databricks_langchain import UnityCatalogTool, VectorSearchRetrieverTool
from mlflow.models.resources import (
    DatabricksFunction,
    DatabricksGenieSpace,
    DatabricksServingEndpoint,
)
from pkg_resources import get_distribution

#This is what enables endpoint authentication. If this doesn't exist we won't be able to connect to any served endpoints providing this application.
resources = [
    DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME)
]

for tool in weather_tools:
    if isinstance(tool, UnityCatalogTool):
        resources.append(DatabricksFunction(function_name=tool.uc_function_name))

for tool in asset_tools:
    if isinstance(tool, UnityCatalogTool):
        resources.append(DatabricksFunction(function_name=tool.uc_function_name))

#Create our MLflow run and log all the things.
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="asset_agent",
        python_model="asset_agent.py",
        input_example=input_example,
        extra_pip_requirements=[f"databricks-connect=={get_distribution('databricks-connect').version}"],
        resources=resources
    )

## Register the model to Unity Catalog
Once our model has been uploaded, we can then register it as the latest version and prepare it for serving. Here we define the location for the registered model in Unity Catalog, where it will be stored and managed as a knowledge object.

In [0]:
mlflow.set_registry_uri("databricks-uc")

# TODO: define the catalog, schema, and model name for your UC model
catalog = "ademianczuk"
schema = "general"
model_name = "asset_agent"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# register the model to UC
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME
)

## Deploy our application
The last step is to deploy our application to a Databricks serving endpoint. All we need to do is invoke the `deploy()` function from the `databricks.agents` library and pass in a reference to the access token that's stored in the secrets store. This delegates access to a privileged credential for the agent to act on the user's behalf. Creating the serving infrastructure can take a while (usually about 20 minutes for a first-time deployment) because the entire isolated ephemeral network and container runtime are defined, built, deployed, configured and networked for the duration of the application serving lifetime. If an endpoint is terminated, the deployment descriptors remain from the previous configuration for reuse.

In [0]:
from databricks import agents

agents.deploy(
    UC_MODEL_NAME,
    uc_registered_model_info.version,
    tags={"endpointSource": "docs"},
    environment_vars={
        "DATABRICKS_GENIE_PAT": f"{{{{secrets/{secret_scope_name}/{secret_key_name}}}}}"
    },
)
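
The quadruple braces in the `environment_vars` value are easy to misread. Inside an f-string, `{{` and `}}` are escapes for literal braces, so the expression produces a secret reference wrapped in double braces rather than the secret itself. A quick check, reusing the scope and key names from the earlier cell:

```python
secret_scope_name = "general"
secret_key_name = "genie_access"

# "{{{{" renders as "{{" and "}}}}" renders as "}}"; the inner braces interpolate.
secret_reference = f"{{{{secrets/{secret_scope_name}/{secret_key_name}}}}}"
print(secret_reference)
```

The token value never appears in the notebook or in the deployment call; only this reference string is passed along.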

## Next steps

After your agent is deployed, you can chat with it in AI playground to perform additional checks, share it with SMEs in your organization for feedback, or embed it in a production application. See Databricks documentation ([AWS](https://docs.databricks.com/en/generative-ai/deploy-agent.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/generative-ai/deploy-agent)).