# Building a Databricks Mosaic AI Agent: A Complete Walkthrough

As a Solution Architect at Databricks, I'm excited to walk you through this comprehensive notebook that demonstrates how to build an intelligent AI agent using Databricks' Mosaic AI platform. This notebook showcases the power of combining retrieval-augmented generation (RAG) with tool-calling capabilities to create a sophisticated question-answering system for Databricks documentation.

## Overview: What We're Building

This notebook creates a **DocsAgent** - an intelligent assistant that can answer questions about Databricks by retrieving relevant documentation chunks and leveraging Unity Catalog functions. The agent uses TF-IDF-based retrieval combined with a large language model to provide accurate, contextual responses.

Let's dive deep into each component:

---

## Cell 1: Essential Imports and Dependencies

```python
from typing import Any, Optional, Sequence, Union
import mlflow
import pandas as pd
from databricks_langchain import ChatDatabricks
from databricks_langchain.uc_ai import (
    DatabricksFunctionClient,
    UCFunctionToolkit,
    set_uc_function_client,
)
from langchain_core.language_models import LanguageModelLike
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_core.tools import BaseTool, tool
from langgraph.graph import END, StateGraph
from langgraph.graph.graph import CompiledGraph
from langgraph.prebuilt.tool_node import ToolNode
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode
from mlflow.models import ModelConfig
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
    ChatAgentMessage,
    ChatAgentResponse,
    ChatContext,
)
from sklearn.feature_extraction.text import TfidfVectorizer
```

### Purpose and Breakdown:

**Type Hints & Core Libraries:**
- `typing` imports supply type hints, which document the code and support static checking (they are not enforced at runtime)
- `mlflow` handles model lifecycle management and experiment tracking
- `pandas` manages data manipulation for our document corpus

**Databricks-Specific Integrations:**
- `ChatDatabricks`: Interface to Databricks-hosted language models
- `DatabricksFunctionClient` & `UCFunctionToolkit`: Enable integration with Unity Catalog functions
- These allow our agent to call registered functions from Unity Catalog as tools

**LangChain Framework Components:**
- `langchain_core`: Provides abstractions for language models and runnables
- `langgraph`: Enables building complex agent workflows with state management
- These create the orchestration layer for our agent's decision-making process

**ML/NLP Libraries:**
- `TfidfVectorizer`: Creates vector representations of documents for similarity search
- This forms the backbone of our retrieval system

---

## Cell 2: Data Loading and Preprocessing

```python
databricks_docs_url = "https://raw.githubusercontent.com/databricks/genai-cookbook/refs/heads/main/quick_start_demo/chunked_databricks_docs_filtered.jsonl"
parsed_docs_df = pd.read_json(databricks_docs_url, lines=True)

documents = parsed_docs_df
doc_vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = doc_vectorizer.fit_transform(documents["content"])
```

### What's Happening Here:

**Document Corpus Loading:**
- Downloads pre-processed Databricks documentation chunks in JSONL format
- Each line contains a document chunk with content, metadata, and source URI
- The documentation is already split into chunks, so no separate chunking step is needed before retrieval

**TF-IDF Vectorization Process:**
1. **TfidfVectorizer Configuration:**
   - `stop_words="english"`: Removes common English words that don't carry semantic meaning
   - Creates a vocabulary from all unique words in the document corpus

2. **Matrix Creation:**
   - `fit_transform()` converts each document's content into a numerical vector
   - Each dimension represents the TF-IDF score for a specific term
   - This creates a sparse matrix where documents can be compared mathematically

**Why TF-IDF?**
- **Term Frequency (TF):** Measures how often a word appears in a document
- **Inverse Document Frequency (IDF):** Reduces weight of common words across the corpus
- Results in vectors that highlight distinctive terms for each document
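
To make the weighting concrete, here is a minimal pure-Python sketch of the calculation. It assumes the smoothed IDF formula and L2 normalization that scikit-learn documents as `TfidfVectorizer` defaults; the toy corpus and the `tfidf_weights` helper are hypothetical, and stop-word removal and scikit-learn's tokenizer are left out.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, not taken from the Databricks docs dataset)
corpus = [
    "delta table stores data",
    "spark reads data",
    "delta lake uses spark",
]


def tfidf_weights(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        raw = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        weighted.append({t: w / norm for t, w in raw.items()})
    return weighted


weights = tfidf_weights(corpus)
# In the first document, "table" (found in one document) outweighs
# "data" (found in two documents), even though each appears once.
```

The rarer term ends up with the larger weight, which is the effect the IDF component is meant to produce.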

---

## Cell 3: Document Retrieval Tool

```python
@tool
@mlflow.trace(name="LittleIndex", span_type=mlflow.entities.SpanType.RETRIEVER)
def find_relevant_documents(query: str, top_n: int = 5) -> list[dict[str, Any]]:
    """gets relevant documents for the query"""
    query_tfidf = doc_vectorizer.transform([query])
    similarities = (tfidf_matrix @ query_tfidf.T).toarray().flatten()
    ranked_docs = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)

    result = []
    for idx, score in ranked_docs[:top_n]:
        row = documents.iloc[idx]
        content = row["content"]
        doc_entry = {
            "page_content": content,
            "metadata": {
                "doc_uri": row["doc_uri"],
                "score": score,
            },
        }
        result.append(doc_entry)
    return result
```

### Detailed Function Analysis:

**Decorators:**
- `@tool`: Marks this function as a LangChain tool that can be called by the agent
- `@mlflow.trace()`: Records each call as a span in an MLflow trace for observability and debugging
- `SpanType.RETRIEVER`: Categorizes this as a retrieval operation in MLflow traces

**Retrieval Algorithm:**
1. **Query Vectorization:**
   ```python
   query_tfidf = doc_vectorizer.transform([query])
   ```
   - Converts the user's query into the same TF-IDF vector space as documents
   - Uses the fitted vocabulary from the document corpus

2. **Similarity Computation:**
   ```python
   similarities = (tfidf_matrix @ query_tfidf.T).toarray().flatten()
   ```
   - Performs matrix multiplication between document matrix and query vector
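   - Because `TfidfVectorizer` L2-normalizes each vector by default, this dot product equals the cosine similarity between the query and each document
   - Higher scores indicate more overlap in distinctive terms (a lexical match, not a semantic one)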

3. **Ranking and Selection:**
   ```python
   ranked_docs = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)
   ```
   - Sorts documents by similarity score in descending order
   - `enumerate()` preserves original document indices
   - Returns top N most relevant documents

**Output Structure:**
Each returned document contains:
- `page_content`: The actual text content
- `metadata`: Source URI and similarity score for provenance tracking
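
The ranking step can be sketched without scikit-learn to show why a plain dot product is enough once vectors are L2-normalized. The vectors and the helper names (`l2_normalize`, `dot`, `top_n`) below are hypothetical, and `heapq.nlargest` is swapped in for the full `sorted(...)` call as one possible way to avoid sorting every document.

```python
import heapq
import math


def l2_normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse dot product; for unit vectors this is the cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_n(query_vec, doc_vecs, n=2):
    """Return the n highest (score, doc_index) pairs."""
    scores = [(dot(doc, query_vec), idx) for idx, doc in enumerate(doc_vecs)]
    return heapq.nlargest(n, scores)


doc_vecs = [
    l2_normalize(v)
    for v in (
        {"delta": 2.0, "table": 1.0},
        {"spark": 3.0},
        {"delta": 1.0, "spark": 1.0},
    )
]
query_vec = l2_normalize({"delta": 1.0})
ranked = top_n(query_vec, doc_vecs)
```

Documents that share no terms with the query score exactly zero, so a query made only of out-of-vocabulary words returns an arbitrary top N with zero scores.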

---

## Cell 4: Agent Workflow Architecture

```python
def create_tool_calling_agent(
    model: LanguageModelLike,
    tools: Union[ToolNode, Sequence[BaseTool]],
    agent_prompt: Optional[str] = None,
) -> CompiledGraph:
    model = model.bind_tools(tools)

    def routing_logic(state: ChatAgentState):
        last_message = state["messages"][-1]
        if last_message.get("tool_calls"):
            return "continue"
        else:
            return "end"

    if agent_prompt:
        system_message = {"role": "system", "content": agent_prompt}
        preprocessor = RunnableLambda(
            lambda state: [system_message] + state["messages"]
        )
    else:
        preprocessor = RunnableLambda(lambda state: state["messages"])
    model_runnable = preprocessor | model

    def call_model(
        state: ChatAgentState,
        config: RunnableConfig,
    ):
        response = model_runnable.invoke(state, config)
        return {"messages": [response]}

    workflow = StateGraph(ChatAgentState)
    workflow.add_node("agent", RunnableLambda(call_model))
    workflow.add_node("tools", ChatAgentToolNode(tools))
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        routing_logic,
        {
            "continue": "tools",
            "end": END,
        },
    )
    workflow.add_edge("tools", "agent")
    return workflow.compile()
```

### Agent Architecture Breakdown:

**Tool Binding:**
```python
model = model.bind_tools(tools)
```
- Informs the LLM about available tools and their schemas
- Enables the model to generate structured tool calls in its responses

**Routing Logic Function:**
```python
def routing_logic(state: ChatAgentState):
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "continue"
    else:
        return "end"
```
- Examines the last message in the conversation state
- Decides whether to continue to tool execution or end the conversation
- Forms the decision-making core of the agent workflow

**Message Preprocessing:**
```python
if agent_prompt:
    system_message = {"role": "system", "content": agent_prompt}
    preprocessor = RunnableLambda(
        lambda state: [system_message] + state["messages"]
    )
```
- Prepends system prompt to every conversation if provided
- Ensures consistent agent behavior and role definition
- Uses LangChain's `RunnableLambda` for functional composition

**Workflow Graph Construction:**
1. **Node Creation:**
   - `"agent"`: Handles LLM inference and decision-making
   - `"tools"`: Executes tool calls and returns results

2. **Edge Definitions:**
   - Entry point starts with the agent
   - Conditional edges from agent based on routing logic
   - Deterministic edge from tools back to agent
   - Creates a loop for multi-step reasoning

**Graph Compilation:**
- `workflow.compile()` creates an executable state machine
- Validates the graph structure before anything runs
- Returns a `CompiledGraph` ready for invocation

---

## Cell 5: DocsAgent Class Implementation

```python
class DocsAgent(ChatAgent):
    def __init__(self, config, tools):
        self.config = ModelConfig(development_config=config)
        self.tools = tools
        self.agent = self._build_agent_from_config()

    def _build_agent_from_config(self):
        llm = ChatDatabricks(
            endpoint=self.config.get("endpoint_name"),
            temperature=self.config.get("temperature"),
            max_tokens=self.config.get("max_tokens"),
        )
        agent = create_tool_calling_agent(
            llm,
            tools=self.tools,
            agent_prompt=self.config.get("system_prompt"),
        )
        return agent

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        request = {"messages": self._convert_messages_to_dict(messages)}
        output = self.agent.invoke(request)
        return ChatAgentResponse(**output)
```

### Class Architecture Analysis:

**Inheritance Structure:**
- Extends `mlflow.pyfunc.ChatAgent` for MLflow integration
- Provides standardized interface for model serving
- Enables deployment to Databricks Model Serving endpoints

**Initialization Process:**
1. **Configuration Management:**
   ```python
   self.config = ModelConfig(development_config=config)
   ```
   - Wraps configuration in MLflow's ModelConfig
   - Handles environment-specific settings (development vs. production)
   - `development_config` is the fallback used during development; a `model_config` supplied when the model is logged takes precedence

2. **Agent Construction:**
   ```python
   self.agent = self._build_agent_from_config()
   ```
   - Delegates agent building to separate method for clarity
   - Maintains separation of concerns between configuration and construction

**LLM Configuration:**
```python
llm = ChatDatabricks(
    endpoint=self.config.get("endpoint_name"),
    temperature=self.config.get("temperature"),
    max_tokens=self.config.get("max_tokens"),
)
```
- **endpoint_name**: Specifies which Databricks-hosted model to use
- **temperature**: Controls response randomness (lower = more deterministic)
- **max_tokens**: Limits response length to prevent excessive token usage
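
As a rough illustration of why a low temperature gives near-deterministic output, the sketch below applies temperature scaling to a softmax over made-up next-token scores. This is the common textbook formulation; how the serving endpoint actually samples is not shown in the notebook, and the `logits` values are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
sharp = softmax_with_temperature(logits, 0.01)  # almost all mass on the top token
flat = softmax_with_temperature(logits, 1.0)    # mass spread across tokens
```

At a temperature of 0.01 the top-scoring token takes essentially all of the probability mass, so repeated calls tend to produce the same text.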

**Prediction Interface:**
```python
def predict(self, messages: list[ChatAgentMessage], ...):
    request = {"messages": self._convert_messages_to_dict(messages)}
    output = self.agent.invoke(request)
    return ChatAgentResponse(**output)
```

**Message Processing Flow:**
1. **Format Conversion:** Uses built-in helper to convert framework-specific messages to dictionaries
2. **Agent Invocation:** Triggers the compiled workflow graph
3. **Response Formatting:** Ensures output matches expected ChatAgentResponse schema

---

## Cell 6: Configuration and Initialization

```python
# TODO fill in your catalog and schema name
catalog = ""
schema = ""

# TODO: Replace with your model serving endpoint
LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"

baseline_config = {
    "endpoint_name": LLM_ENDPOINT,
    "temperature": 0.01,
    "max_tokens": 1000,
    "system_prompt": """You are a helpful assistant that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.

    You answer questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.
    """,
}

tools = [find_relevant_documents]
uc_client = DatabricksFunctionClient()
set_uc_function_client(uc_client)
uc_toolkit = UCFunctionToolkit(function_names=[f"{catalog}.{schema}.*"])
tools.extend(uc_toolkit.tools)

AGENT = DocsAgent(baseline_config, tools)
mlflow.models.set_model(AGENT)
```

### Configuration Deep Dive:

**Unity Catalog Integration:**
```python
catalog = ""  # Your Unity Catalog name
schema = ""   # Your schema name
```
- These placeholders need to be filled with actual Unity Catalog coordinates
- Enables access to registered functions in your data governance layer

**Model Endpoint Selection:**
```python
LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"
```
- Uses Meta's Llama 3.3 70B model hosted on Databricks
- This is an instruction-tuned model optimized for following directions
- 70B parameters provide strong reasoning capabilities for complex queries

**Agent Configuration Parameters:**
1. **temperature: 0.01**
   - Near-deterministic responses for consistent behavior
   - Reduces run-to-run variation; it does not by itself prevent hallucination, which is why answers are grounded in retrieved documents

2. **max_tokens: 1000**
   - Reasonable limit for most documentation queries
   - Prevents excessive token usage while allowing comprehensive answers

3. **system_prompt**
   - Defines agent's role and scope (Databricks-focused)
   - Sets expectations for tool usage and clarifying questions
   - Establishes boundaries for relevant vs. irrelevant queries

**Tool Assembly Process:**
1. **Base Tools:** Starts with our document retrieval function
2. **Unity Catalog Integration:**
   ```python
   uc_client = DatabricksFunctionClient()
   set_uc_function_client(uc_client)
   uc_toolkit = UCFunctionToolkit(function_names=[f"{catalog}.{schema}.*"])
   ```
   - Creates client for Unity Catalog function calls
   - Sets global client for the toolkit
   - Discovers all functions matching the pattern `catalog.schema.*`
   - Automatically converts UC functions into LangChain tools

3. **Tool Combination:** Extends the base tool list with UC functions
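
Because `catalog` and `schema` start as empty strings, the wildcard name would be malformed if the placeholders were left unfilled. A small guard like the following could catch that early; `uc_function_pattern` is a hypothetical helper, not part of the Databricks libraries, and `main` and `default` are placeholder names.

```python
def uc_function_pattern(catalog: str, schema: str) -> str:
    """Build the wildcard name for all functions in a schema, rejecting blanks."""
    if not catalog.strip() or not schema.strip():
        raise ValueError("Set both catalog and schema before building the toolkit")
    return f"{catalog}.{schema}.*"


pattern = uc_function_pattern("main", "default")  # placeholder names
```

The returned string could then be passed as the single entry in `function_names`.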

**Model Registration:**
```python
AGENT = DocsAgent(baseline_config, tools)
mlflow.models.set_model(AGENT)
```
- Instantiates the complete agent with configuration and tools
- Designates this object as the model to log when the file is logged through MLflow's models-from-code workflow
- Once logged and registered, the agent can be deployed to a Model Serving endpoint

---

## How It All Works Together

This notebook creates a sophisticated RAG-enhanced agent through several integrated components:

1. **Document Processing Pipeline:** TF-IDF vectorization enables fast keyword-based search across Databricks documentation

2. **Tool-Calling Architecture:** The agent can dynamically decide when to retrieve documents or call Unity Catalog functions based on user queries

3. **Workflow Orchestration:** LangGraph manages the complex decision-making process, allowing for multi-step reasoning and tool chaining

4. **Production Readiness:** MLflow integration ensures the agent can be deployed, monitored, and managed in production environments

5. **Extensibility:** The Unity Catalog integration allows easy addition of custom business functions as agent capabilities

## Key Benefits and Use Cases

**For Data Engineers:**
- Quick access to Databricks documentation during development
- Integration with custom data processing functions via Unity Catalog

**For Data Scientists:**
- Interactive assistance with ML workflows and best practices
- Access to both documentation and computational tools in one interface

**For Solution Architects:**
- Comprehensive platform knowledge at fingertips
- Ability to validate architectural decisions against documentation

This notebook demonstrates the power of combining retrieval-augmented generation with tool-calling capabilities, creating an intelligent assistant that can both access knowledge and perform actions - a crucial pattern for building practical AI applications in the enterprise.

The architecture shown here can be easily extended to support additional data sources, more sophisticated retrieval methods (like vector databases), or domain-specific tool integrations, making it a robust foundation for enterprise AI agent development.

[Source: Databricks Documentation](https://docs.databricks.com/aws/en/notebooks/source/generative-ai/mosaic-ai-agent-demo.html)

# Step-by-Step Breakdown: `create_tool_calling_agent` Function

Let me walk you through the `create_tool_calling_agent` function step by step, explaining each line and concept in detail.

## Function Overview

```python
def create_tool_calling_agent(
    model: LanguageModelLike,
    tools: Union[ToolNode, Sequence[BaseTool]],
    agent_prompt: Optional[str] = None,
) -> CompiledGraph:
```

**What this function does**: Creates an AI agent that can decide when to use tools vs. when to respond directly to users.

**Think of it like**: Building a smart assistant that knows when to look things up vs. when to answer from memory.

---

## Step 1: Bind Tools to the Model

```python
model = model.bind_tools(tools)
```

### What's happening here?

**Before binding:**
```python
# Model only knows how to chat
model = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
# Model can only generate text responses
```

**After binding:**
```python
# Model now knows about available tools
model = model.bind_tools([find_relevant_documents, *uc_toolkit.tools])
# Model can now generate both text AND tool calls
```

### Detailed explanation:

1. **Tool Schema Registration**: The model learns about each tool's:
   - Name (e.g., "find_relevant_documents")
   - Description (e.g., "gets relevant documents for the query")
   - Parameters (e.g., query: string, top_n: integer)

2. **Response Format Enhancement**: The model can now generate responses like:
   ```python
   # Option 1: Regular text response
   {"role": "assistant", "content": "Delta tables are..."}
   
   # Option 2: Tool calling response
   {
       "role": "assistant", 
       "content": "Let me search for that information",
       "tool_calls": [
           {
               "function": {
                   "name": "find_relevant_documents",
                   "arguments": '{"query": "delta tables", "top_n": 5}'
               }
           }
       ]
   }
   ```

**Real-world analogy**: Like giving a librarian (model) a catalog of all available resources (tools) they can use to help visitors.
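
To show what happens to a tool call like the one above once it leaves the model, here is a simplified dispatch sketch. `find_docs_stub`, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names for illustration; the tool node used in the notebook handles more than this (for example, matching results to call IDs).

```python
import json


def find_docs_stub(query: str, top_n: int = 5) -> list[str]:
    """Hypothetical stand-in for a real retrieval tool."""
    return [f"doc about {query}"] * top_n


# Map tool names, as the model emits them, to Python callables
TOOL_REGISTRY = {"find_relevant_documents": find_docs_stub}


def run_tool_call(tool_call: dict) -> dict:
    """Look up the requested tool, decode its arguments, and wrap the result."""
    function = tool_call["function"]
    tool = TOOL_REGISTRY[function["name"]]
    kwargs = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = tool(**kwargs)
    return {"role": "tool", "name": function["name"], "content": json.dumps(result)}


tool_message = run_tool_call(
    {
        "function": {
            "name": "find_relevant_documents",
            "arguments": '{"query": "delta tables", "top_n": 2}',
        }
    }
)
```

The key detail is that the model only produces a name and a JSON string of arguments; the surrounding code is what actually executes the function.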

---

## Step 2: Define Routing Logic

```python
def routing_logic(state: ChatAgentState):
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "continue"
    else:
        return "end"
```

### What's the purpose?

This function decides what happens after the model generates a response. It's like a traffic controller for the conversation flow.

### Step-by-step breakdown:

1. **Get the latest message**: 
   ```python
   last_message = state["messages"][-1]
   ```
   - Looks at the most recent message in the conversation
   - This will be the response the model just generated

2. **Check for tool calls**:
   ```python
   if last_message.get("tool_calls"):
   ```
   - Examines if the model requested to use any tools
   - `get("tool_calls")` safely checks if this field exists

3. **Make routing decision**:
   ```python
   return "continue"  # Go execute tools
   # OR
   return "end"       # Finish the conversation
   ```

### Visual flow:
```
Model Response → Routing Logic → Decision
     ↓                ↓             ↓
"Let me search..." → Has tool_calls? → "continue" → Execute tools
     ↓                ↓             ↓
"Here's the answer" → No tool_calls? → "end" → Finish
```
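
A quick way to check the routing behavior is to call the function on hand-built states. The sketch below restates `routing_logic` with a plain `dict` type so it runs without MLflow installed; the three sample states are hypothetical.

```python
def routing_logic(state: dict) -> str:
    """Route to the tools node only when the last message requests a tool."""
    last_message = state["messages"][-1]
    return "continue" if last_message.get("tool_calls") else "end"


with_tools = {
    "messages": [{"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]}]
}
empty_list = {"messages": [{"role": "assistant", "content": "Done", "tool_calls": []}]}
no_field = {"messages": [{"role": "assistant", "content": "Done"}]}

decisions = [routing_logic(s) for s in (with_tools, empty_list, no_field)]
```

Both a missing `tool_calls` key and an empty list are falsy, so either one ends the run.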

---

## Step 3: Set Up Message Preprocessing

```python
if agent_prompt:
    system_message = {"role": "system", "content": agent_prompt}
    preprocessor = RunnableLambda(
        lambda state: [system_message] + state["messages"]
    )
else:
    preprocessor = RunnableLambda(lambda state: state["messages"])
```

### Why do we need preprocessing?

Every time we call the model, we want to remind it of its role and instructions.

### Detailed walkthrough:

**Case 1: With system prompt**
```python
# Input state
state = {
    "messages": [
        {"role": "user", "content": "How do I create a Delta table?"}
    ]
}

# System message creation
system_message = {
    "role": "system", 
    "content": "You are a helpful assistant that answers questions about Databricks..."
}

# Preprocessor function
preprocessor = RunnableLambda(
    lambda state: [system_message] + state["messages"]
)

# Result after preprocessing
[
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "How do I create a Delta table?"}
]
```

**Case 2: Without system prompt**
```python
# Just passes messages through unchanged
preprocessor = RunnableLambda(lambda state: state["messages"])
```

### Why use RunnableLambda?

```python
RunnableLambda(lambda state: [system_message] + state["messages"])
```

- **RunnableLambda**: Wraps a simple function to work with LangChain's pipeline system
- **Lambda function**: Anonymous function that takes state and returns processed messages
- **Integration**: Allows this preprocessing to be part of a larger pipeline

---

## Step 4: Create the Model Pipeline

```python
model_runnable = preprocessor | model
```

### What's the `|` operator?

This creates a **pipeline** where output of one step becomes input of the next:

```python
# Pipeline flow:
# state → preprocessor → processed_messages → model → response

# Equivalent to:
processed_messages = preprocessor.invoke(state)
response = model.invoke(processed_messages)
```

### Visual representation:
```
Input State
    ↓
[Preprocessor] → Adds system prompt
    ↓
Processed Messages
    ↓
[Model] → Generates response (with possible tool calls)
    ↓
Response
```
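
The composition idea behind `|` can be imitated in a few lines by overloading `__or__`. The `Step` class below is a toy stand-in written for this explanation, not LangChain's implementation, which supports much more (configs, batching, async).

```python
class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The output of this step becomes the input of the next one
        return Step(lambda value: other.invoke(self.invoke(value)))


add_prefix = Step(lambda msgs: ["system: be brief"] + msgs)
count_messages = Step(len)

pipeline = add_prefix | count_messages
result = pipeline.invoke(["user: hi"])
```

Here `result` is the number of messages after the prefix step has run, which confirms the left step executes first.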

---

## Step 5: Define the Model Calling Function

```python
def call_model(
    state: ChatAgentState,
    config: RunnableConfig,
):
    response = model_runnable.invoke(state, config)
    return {"messages": [response]}
```

### What does this function do?

1. **Takes the current state**: All conversation history
2. **Runs the pipeline**: Preprocessing + Model inference
3. **Returns formatted response**: In the format the workflow expects

### Step-by-step execution:
```python
# Step 1: Receive state
state = {
    "messages": [
        {"role": "user", "content": "What is Spark?"}
    ]
}

# Step 2: Pipeline execution
# preprocessor adds system prompt
# model generates response
response = {
    "role": "assistant",
    "content": "I'll search for information about Spark",
    "tool_calls": [...]
}

# Step 3: call_model returns the response wrapped in the expected format:
# {"messages": [response]}
```

### Why return `{"messages": [response]}`?

This format tells the workflow system: "Add this new message to the conversation history."
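
The append behavior can be pictured as a small merge function. This is a simplified sketch: `apply_update` is a hypothetical name, and the actual state handling in `ChatAgentState` may do more than plain concatenation.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with the update's messages appended to the history."""
    return {**state, "messages": state["messages"] + update.get("messages", [])}


state = {"messages": [{"role": "user", "content": "What is Spark?"}]}
update = {"messages": [{"role": "assistant", "content": "Spark is ..."}]}

new_state = apply_update(state, update)
```

A node therefore returns only what is new, and the graph takes care of combining it with the existing history.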

---

## Step 6: Build the Workflow Graph

```python
workflow = StateGraph(ChatAgentState)
```

### What's StateGraph?

Think of it as building a flowchart for your AI agent:

```
[Start] → [Agent] → [Decision] → [Tools] → [Agent] → [End]
                       ↓
                    [End]
```

**StateGraph**: A framework for creating state machines where each node can process and modify the conversation state.

---

## Step 7: Add Nodes to the Graph

```python
workflow.add_node("agent", RunnableLambda(call_model))
workflow.add_node("tools", ChatAgentToolNode(tools))
```

### What are nodes?

Nodes are the **processing stations** in your workflow:

**Agent Node**: 
```python
workflow.add_node("agent", RunnableLambda(call_model))
```
- **Name**: "agent"
- **Function**: `call_model` (wrapped in RunnableLambda)
- **Purpose**: Generate responses and decide on tool usage

**Tools Node**:
```python
workflow.add_node("tools", ChatAgentToolNode(tools))
```
- **Name**: "tools"  
- **Function**: `ChatAgentToolNode` (imported from `mlflow.langchain.chat_agent_langgraph`)
- **Purpose**: Execute the tools that the agent requested

### Visual representation:
```
┌─────────────┐       ┌─────────────┐
│    Agent    │       │    Tools    │
│             │       │             │
│ - call_model│       │ - execute   │
│ - decide    │       │   tool_calls│
│ - respond   │       │ - return    │
│             │       │   results   │
└─────────────┘       └─────────────┘
```

---

## Step 8: Set Entry Point

```python
workflow.set_entry_point("agent")
```

### What does this mean?

When a conversation starts, it always begins at the "agent" node.

```
User Question → [Agent Node] → Decision...
```

**Why start with agent?**
- The agent needs to understand the question first
- Then it decides whether to use tools or respond directly

---

## Step 9: Add Conditional Edges

```python
workflow.add_conditional_edges(
    "agent",
    routing_logic,
    {
        "continue": "tools",
        "end": END,
    },
)
```

### Breaking this down:

1. **Source node**: `"agent"` - Where the decision is made
2. **Decision function**: `routing_logic` - How to decide
3. **Possible outcomes**: Dictionary mapping decisions to destinations

### Visual flow:
```python
# After the agent generates a response, the graph effectively does:
decision = routing_logic(state)  # "continue" or "end"
next_node = {"continue": "tools", "end": END}[decision]
```

### Complete conditional flow:
```
[Agent] generates response
    ↓
[Routing Logic] checks response
    ↓
Has tool_calls?
    ↓         ↓
   Yes       No
    ↓         ↓
[Tools]    [END]
```

---

## Step 10: Add Direct Edge

```python
workflow.add_edge("tools", "agent")
```

### What's this for?

After tools execute, we **always** go back to the agent to process the results.

```python
# Tools execute and return results
tool_results = execute_tools(tool_calls)

# These results get added to conversation
state["messages"].append(tool_results)

# Flow goes back to agent to synthesize the information
# Agent can now provide a final answer OR call more tools
```

### Complete workflow loop:
```
[Agent] → decides to use tools → [Tools] → execute → [Agent] → synthesize results
   ↓                                                      ↓
decides no tools needed                            provide final answer
   ↓                                                      ↓
 [END]                                                  [END]
```

---

## Step 11: Compile the Workflow

```python
return workflow.compile()
```

### What does compile do?

1. **Validates the graph**: Ensures all connections make sense
2. **Builds the runtime**: Turns the node and edge definitions into a runnable object
3. **Returns executable**: Creates a `CompiledGraph` ready to run

### Before vs After compilation:

**Before (workflow definition):**
```python
workflow = StateGraph(ChatAgentState)
# Just a blueprint/definition
```

**After (compiled graph):**
```python
compiled_agent = workflow.compile()
# Executable agent ready for conversations
```

---

## Complete Flow Example

Let's trace through a complete conversation:

### User asks: "How do I create a Delta table?"

**Step 1**: Start at agent node
```python
state = {"messages": [{"role": "user", "content": "How do I create a Delta table?"}]}
```

**Step 2**: `call_model` executes
```python
# Preprocessor adds system prompt
# Model sees: [system_prompt, user_question]
# Model decides to search for information
response = {
    "role": "assistant",
    "content": "I'll search for Delta table information",
    "tool_calls": [{"function": {"name": "find_relevant_documents", ...}}]
}
```

**Step 3**: `routing_logic` checks response
```python
# Sees tool_calls in response
# Returns "continue"
```

**Step 4**: Flow goes to tools node
```python
# Executes find_relevant_documents
# Gets documentation about Delta tables
tool_result = {"role": "tool", "content": "Delta table documentation..."}
```

**Step 5**: Flow returns to agent node  
```python
# Model now sees: [system_prompt, user_question, agent_response, tool_result]
# Model synthesizes information
final_response = {
    "role": "assistant", 
    "content": "Based on the documentation, here's how to create a Delta table: ..."
}
```

**Step 6**: `routing_logic` checks final response
```python
# No tool_calls this time
# Returns "end"
```

**Step 7**: Conversation ends with complete answer
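
The seven steps above can be replayed with stubs in place of the LLM and the tools. Everything in this sketch is hypothetical (`run_agent`, `scripted_model`, `fake_tools`), and `max_steps` is an assumption added only to keep the loop from running forever; it is a hand-written imitation of the graph, not LangGraph itself.

```python
def run_agent(call_model, run_tools, messages, max_steps=5):
    """Alternate between the model and the tools until no tool is requested."""
    state = {"messages": list(messages)}
    for _ in range(max_steps):
        reply = call_model(state["messages"])  # "agent" node
        state["messages"].append(reply)
        if not reply.get("tool_calls"):  # routing decision: "end"
            return state
        state["messages"].extend(run_tools(reply["tool_calls"]))  # "tools" node
    raise RuntimeError("agent did not finish within max_steps")


def scripted_model(messages):
    """Hypothetical model: requests a tool once, then answers."""
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "Here is how to create a Delta table."}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "find_relevant_documents"}],
    }


def fake_tools(tool_calls):
    """Hypothetical tool executor that returns one message per call."""
    return [{"role": "tool", "content": f"result of {c['name']}"} for c in tool_calls]


final_state = run_agent(
    scripted_model,
    fake_tools,
    [{"role": "user", "content": "How do I create a Delta table?"}],
)
roles = [m["role"] for m in final_state["messages"]]
```

The resulting role sequence is user, assistant (tool request), tool, assistant (final answer), matching the trace described above.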

---

## Key Benefits of This Architecture

### 1. **Flexible Decision Making**
The agent can choose when to use tools vs. respond directly:
```python
# Simple questions → Direct response
# "What is Databricks?" → Direct answer from training

# Complex questions → Tool usage
# "How do I optimize my specific ETL pipeline?" → Search documentation + call UC functions
```

### 2. **Multi-Step Reasoning**
The agent can chain multiple tool calls:
```python
# Step 1: Search documentation
# Step 2: Analyze user's context  
# Step 3: Call Unity Catalog function for specific data
# Step 4: Provide tailored recommendation
```

### 3. **Error Recovery**
If a tool call fails and the failure is returned to the model as a message, the agent can try an alternative:
```python
# Tool call fails → Agent tries different approach
# No relevant docs found → Agent asks clarifying questions
```

### 4. **Observability**
With MLflow tracing enabled, each step is recorded as a span:
```python
# Track: What decisions were made? Which tools were called? How long did it take?
```

This architecture creates a sophisticated AI agent that can reason about when and how to use tools, making it much more capable than a simple question-answering system.

# Create, evaluate, and deploy an AI agent

This notebook walks you through building, evaluating, and deploying an AI agent that combines retrieval and tool usage. You'll work with a pre-chunked subset of Databricks documentation as your dataset.

## Supporting documentation
For a comprehensive understanding of this notebook's contents, including the rationale behind the code and the challenges it addresses, see the accompanying Databricks documentation page. ([AWS](https://docs.databricks.com/aws/en/generative-ai/tutorials/agent-framework-notebook) | [Azure](https://learn.microsoft.com/azure/databricks/generative-ai/tutorials/agent-framework-notebook))

In [0]:
%pip install -U -qqqq mlflow langchain langgraph==0.3.4 databricks-langchain pydantic databricks-agents unitycatalog-langchain[databricks]
dbutils.library.restartPython()

# Create an agent and tools

In [0]:
from databricks_langchain import ChatDatabricks

# TODO: Replace with your model serving endpoint
LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT)

In [0]:
import pandas as pd

databricks_docs_url = "https://raw.githubusercontent.com/databricks/genai-cookbook/refs/heads/main/quick_start_demo/chunked_databricks_docs_filtered.jsonl"
parsed_docs_df = pd.read_json(databricks_docs_url, lines=True)

In [0]:
from databricks_langchain.uc_ai import (
    DatabricksFunctionClient,
    UCFunctionToolkit,
    set_uc_function_client,
)

uc_client = DatabricksFunctionClient()
set_uc_function_client(uc_client)


def tfidf_keywords(text: str) -> list[str]:
    """
    Extracts keywords from the provided text using TF-IDF.

    Args:
        text (string): Input text.
    Returns:
        list[str]: List of extracted keywords in descending order of importance.
    """
    from sklearn.feature_extraction.text import TfidfVectorizer

    def extract_keywords(text, top_n=5):
        """Extracts top keywords using a TF-IDF vectorizer fit on the input text only"""
        keyword_vectorizer = TfidfVectorizer(
            stop_words="english"
        )  # New vectorizer for query
        query_tfidf = keyword_vectorizer.fit_transform([text])  # Fit on query only
        scores = query_tfidf.toarray()[0]
        indices = scores.argsort()[-top_n:][::-1]  # Get top N keywords
        return [
            keyword_vectorizer.get_feature_names_out()[i]
            for i in indices
            if scores[i] > 0
        ]

    return extract_keywords(text)

# TODO fill in your catalog and schema name
catalog = "agentic_ai"
schema = "databricks"

assert (catalog and schema)

# Create the function within the Unity Catalog catalog and schema specified
function_info = uc_client.create_python_function(
    func=tfidf_keywords,
    catalog=catalog,
    schema=schema,
    replace=True,  # Set to True to overwrite if the function already exists
)

uc_tool_names = [f"{catalog}.{schema}.tfidf_keywords"]
uc_toolkit = UCFunctionToolkit(function_names=uc_tool_names)

In [0]:
print(uc_toolkit.tools[0])
uc_toolkit.tools[0].invoke({"text": "The quick brown fox jumped over the lazy brown dog."})
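
Because `tfidf_keywords` fits a fresh vectorizer on the input text alone, every term receives the same IDF weight, so the ranking reduces to term frequency within that text. The sketch below mimics that behavior with the standard library only. `keywords_by_frequency` is a hypothetical name, `STOP_WORDS` is a short stand-in for scikit-learn's much longer English stop-word list, and the order of equally frequent terms may differ from what the real function returns.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list (scikit-learn's is much longer).
STOP_WORDS = {"the", "over", "a", "an", "and", "of", "to", "in"}


def keywords_by_frequency(text: str, top_n: int = 5) -> list[str]:
    """Rank non-stop-word tokens by how often they occur in `text`."""
    tokens = re.findall(r"\b\w\w+\b", text.lower())  # tokens of 2+ characters
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


top = keywords_by_frequency("The quick brown fox jumped over the lazy brown dog.")
```

Here `brown` comes first because it is the only term that occurs twice; the remaining terms are tied.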

In [0]:
from typing import Any

import mlflow
from langchain_core.tools import tool
from sklearn.feature_extraction.text import TfidfVectorizer

documents = parsed_docs_df
doc_vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = doc_vectorizer.fit_transform(documents["content"])


@tool
@mlflow.trace(name="LittleIndex", span_type=mlflow.entities.SpanType.RETRIEVER)
def find_relevant_documents(query: str, top_n: int = 5) -> list[dict[str, Any]]:
    """gets relevant documents for the query"""
    query_tfidf = doc_vectorizer.transform([query])
    similarities = (tfidf_matrix @ query_tfidf.T).toarray().flatten()
    ranked_docs = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)

    result = []
    for idx, score in ranked_docs[:top_n]:
        row = documents.iloc[idx]
        content = row["content"]
        doc_entry = {
            "page_content": content,
            "metadata": {
                "doc_uri": row["doc_uri"],
                "score": score,
            },
        }
        result.append(doc_entry)
    return result
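
The retriever above scores each document by the dot product between its TF-IDF vector and the query's TF-IDF vector, then keeps the `top_n` highest scores. A dependency-free sketch of that ranking step follows. The three-document corpus, the helper names, and the simplified weighting (raw term counts times a smoothed IDF, with no L2 normalization or stop-word filtering) are assumptions made for illustration, so the absolute scores will not match the notebook's.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the documentation chunks.
DOCS = [
    "delta tables store data in parquet files",
    "clusters run spark jobs",
    "delta live tables pipelines process streaming data",
]


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def idf_weights(docs: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def rank_documents(query: str, docs: list[str], top_n: int = 2) -> list[tuple[int, float]]:
    """Return (doc index, score) pairs sorted by descending dot product."""
    idf = idf_weights(docs)
    query_vec = tfidf_vector(query, idf)
    scores = []
    for idx, doc in enumerate(docs):
        doc_vec = tfidf_vector(doc, idf)
        score = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


ranked = rank_documents("spark clusters", DOCS)
```

Only the second document shares terms with the query, so it ranks first with a positive score while the others score zero.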


In [0]:
from typing import Optional, Sequence, Union

from langchain_core.language_models import LanguageModelLike
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_core.tools import BaseTool
from langgraph.graph import END, StateGraph
from langgraph.graph.graph import CompiledGraph
from langgraph.prebuilt.tool_node import ToolNode
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode


def create_tool_calling_agent(
    model: LanguageModelLike,
    tools: Union[ToolNode, Sequence[BaseTool]],
    agent_prompt: Optional[str] = None,
) -> CompiledGraph:
    model = model.bind_tools(tools)

    def routing_logic(state: ChatAgentState):
        last_message = state["messages"][-1]
        if last_message.get("tool_calls"):
            return "continue"
        else:
            return "end"

    if agent_prompt:
        system_message = {"role": "system", "content": agent_prompt}
        preprocessor = RunnableLambda(
            lambda state: [system_message] + state["messages"]
        )
    else:
        preprocessor = RunnableLambda(lambda state: state["messages"])
    model_runnable = preprocessor | model

    def call_model(
        state: ChatAgentState,
        config: RunnableConfig,
    ):
        response = model_runnable.invoke(state, config)

        return {"messages": [response]}

    workflow = StateGraph(ChatAgentState)

    workflow.add_node("agent", RunnableLambda(call_model))
    workflow.add_node("tools", ChatAgentToolNode(tools))

    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        routing_logic,
        {
            "continue": "tools",
            "end": END,
        },
    )
    workflow.add_edge("tools", "agent")

    return workflow.compile()

In [0]:
import mlflow

mlflow.langchain.autolog()

agent = create_tool_calling_agent(llm, tools=[*uc_toolkit.tools, find_relevant_documents])
agent.invoke({"messages": [{"role": "user", "content": "How can I create a Delta Live Tables pipeline that processes CDC events using apply_changes() in Python, including handling out-of-order data?"}]})

In [0]:
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
    ChatAgentChunk,
    ChatAgentMessage,
    ChatAgentResponse,
    ChatContext,
)
from typing import Any, Optional

class DocsAgent(ChatAgent):
    def __init__(self, agent):
        self.agent = agent

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # ChatAgent has a built-in helper method to help convert framework-specific messages, like langchain BaseMessage to a python dictionary
        request = {"messages": self._convert_messages_to_dict(messages)}

        # Use the agent stored on this instance, not the notebook-level `agent` variable
        output = self.agent.invoke(request)
        # 'output' is a dict that already matches the ChatAgentResponse schema; building a new instance makes the ChatAgent return type explicit for this demonstration
        return ChatAgentResponse(**output)

In [0]:
AGENT = DocsAgent(agent=agent)
AGENT.predict({"messages": [{"role": "user", "content": "How can I create a Delta Live Tables pipeline that processes CDC events using apply_changes() in Python, including handling out-of-order data?"}]})

In [0]:
from mlflow.models import ModelConfig

baseline_config = {
   "endpoint_name": "databricks-meta-llama-3-3-70b-instruct",
   "temperature": 0.01,
   "max_tokens": 1000,
   "system_prompt": """You are a helpful assistant that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.

    You answer questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.
    """,
   "tool_list": [f"{catalog}.{schema}.*"],
}


class DocsAgent(ChatAgent):
    def __init__(self):
        self.config = ModelConfig(development_config=baseline_config)
        self.agent = self._build_agent_from_config()

    def _build_agent_from_config(self):
        temperature = self.config.get("temperature")
        max_tokens = self.config.get("max_tokens")
        system_prompt = self.config.get("system_prompt")
        llm_endpoint_name = self.config.get("endpoint_name")
        tool_list = self.config.get("tool_list")

        llm = ChatDatabricks(endpoint=llm_endpoint_name, temperature=temperature, max_tokens=max_tokens)
        toolkit = UCFunctionToolkit(function_names=tool_list)
        agent = create_tool_calling_agent(llm, tools=[*toolkit.tools, find_relevant_documents], agent_prompt=system_prompt)

        return agent
    
    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # ChatAgent has a built-in helper method to help convert framework-specific messages, like langchain BaseMessage to a python dictionary
        request = {"messages": self._convert_messages_to_dict(messages)}

        output = self.agent.invoke(request)
        # 'output' is a dict that already matches the ChatAgentResponse schema; building a new instance makes the ChatAgent return type explicit for this demonstration
        return ChatAgentResponse(**output)

agent = DocsAgent()
agent.predict({"messages": [{"role": "user", "content": "What is DLT"}]})

In [0]:
response = agent.predict({"messages": [{"role": "user", "content": "How can I create a Delta Live Tables pipeline that processes CDC events using apply_changes() in Python, including handling out-of-order data?"}]})


In [0]:
response

In [0]:
%%writefile getting_started_agent.py
from typing import Any, Optional, Sequence, Union

import mlflow
import pandas as pd
from databricks_langchain import ChatDatabricks
from databricks_langchain.uc_ai import (
    DatabricksFunctionClient,
    UCFunctionToolkit,
    set_uc_function_client,
)
from langchain_core.language_models import LanguageModelLike
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_core.tools import BaseTool, tool
from langgraph.graph import END, StateGraph
from langgraph.graph.graph import CompiledGraph
from langgraph.prebuilt.tool_node import ToolNode
from mlflow.langchain.chat_agent_langgraph import ChatAgentState, ChatAgentToolNode
from mlflow.models import ModelConfig
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import (
    ChatAgentMessage,
    ChatAgentResponse,
    ChatContext,
)
from sklearn.feature_extraction.text import TfidfVectorizer

databricks_docs_url = "https://raw.githubusercontent.com/databricks/genai-cookbook/refs/heads/main/quick_start_demo/chunked_databricks_docs_filtered.jsonl"
parsed_docs_df = pd.read_json(databricks_docs_url, lines=True)

documents = parsed_docs_df
doc_vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = doc_vectorizer.fit_transform(documents["content"])


@tool
@mlflow.trace(name="LittleIndex", span_type=mlflow.entities.SpanType.RETRIEVER)
def find_relevant_documents(query: str, top_n: int = 5) -> list[dict[str, Any]]:
    """gets relevant documents for the query"""
    query_tfidf = doc_vectorizer.transform([query])
    similarities = (tfidf_matrix @ query_tfidf.T).toarray().flatten()
    ranked_docs = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)

    result = []
    for idx, score in ranked_docs[:top_n]:
        row = documents.iloc[idx]
        content = row["content"]
        doc_entry = {
            "page_content": content,
            "metadata": {
                "doc_uri": row["doc_uri"],
                "score": score,
            },
        }
        result.append(doc_entry)
    return result


def create_tool_calling_agent(
    model: LanguageModelLike,
    tools: Union[ToolNode, Sequence[BaseTool]],
    agent_prompt: Optional[str] = None,
) -> CompiledGraph:
    model = model.bind_tools(tools)

    def routing_logic(state: ChatAgentState):
        last_message = state["messages"][-1]
        if last_message.get("tool_calls"):
            return "continue"
        else:
            return "end"

    if agent_prompt:
        system_message = {"role": "system", "content": agent_prompt}
        preprocessor = RunnableLambda(
            lambda state: [system_message] + state["messages"]
        )
    else:
        preprocessor = RunnableLambda(lambda state: state["messages"])
    model_runnable = preprocessor | model

    def call_model(
        state: ChatAgentState,
        config: RunnableConfig,
    ):
        response = model_runnable.invoke(state, config)

        return {"messages": [response]}

    workflow = StateGraph(ChatAgentState)

    workflow.add_node("agent", RunnableLambda(call_model))
    workflow.add_node("tools", ChatAgentToolNode(tools))

    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        routing_logic,
        {
            "continue": "tools",
            "end": END,
        },
    )
    workflow.add_edge("tools", "agent")

    return workflow.compile()


class DocsAgent(ChatAgent):
    def __init__(self, config, tools):
        # Load config
        # When this agent is deployed to Model Serving, the configuration loaded here is replaced with the config passed to mlflow.pyfunc.log_model(model_config=...)
        self.config = ModelConfig(development_config=config)
        self.tools = tools
        self.agent = self._build_agent_from_config()

    def _build_agent_from_config(self):
        llm = ChatDatabricks(
            endpoint=self.config.get("endpoint_name"),
            temperature=self.config.get("temperature"),
            max_tokens=self.config.get("max_tokens"),
        )
        agent = create_tool_calling_agent(
            llm,
            tools=self.tools,
            agent_prompt=self.config.get("system_prompt"),
        )
        return agent

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # ChatAgent has a built-in helper method to help convert framework-specific messages, like langchain BaseMessage to a python dictionary
        request = {"messages": self._convert_messages_to_dict(messages)}

        output = self.agent.invoke(request)
        # 'output' is a dict that already matches the ChatAgentResponse schema; building a new instance makes the ChatAgent return type explicit for this demonstration
        return ChatAgentResponse(**output)
    

# TODO fill in your catalog and schema name
catalog = "agentic_ai"
schema = "databricks"

# TODO: Replace with your model serving endpoint
LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"

baseline_config = {
    "endpoint_name": LLM_ENDPOINT,
    "temperature": 0.01,
    "max_tokens": 1000,
    "system_prompt": """You are a helpful assistant that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.

    You answer questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.
    """,
}

tools = [find_relevant_documents]
uc_client = DatabricksFunctionClient()
set_uc_function_client(uc_client)
uc_toolkit = UCFunctionToolkit(function_names=[f"{catalog}.{schema}.*"])
tools.extend(uc_toolkit.tools)


AGENT = DocsAgent(baseline_config, tools)
mlflow.models.set_model(AGENT)

In [0]:
dbutils.library.restartPython()

In [0]:
from getting_started_agent import AGENT

AGENT.predict({"messages": [{"role": "user", "content": "How can I create a Delta Live Tables pipeline that processes CDC events using apply_changes() in Python, including handling out-of-order data?"}]})

In [0]:
import mlflow
from mlflow.tracking import MlflowClient

# Enable tracing at the start of your notebook/script
mlflow.langchain.autolog()

# Alternative: Set tracing environment variable
import os
os.environ["MLFLOW_ENABLE_TRACING"] = "true"

In [0]:
import mlflow
from getting_started_agent import LLM_ENDPOINT, baseline_config, tools
from mlflow.models.resources import DatabricksFunction, DatabricksServingEndpoint
from unitycatalog.ai.langchain.toolkit import UnityCatalogTool

resources = [DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT)]
for tool in tools:
    if isinstance(tool, UnityCatalogTool):
        resources.append(DatabricksFunction(function_name=tool.uc_function_name))

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model="getting_started_agent.py",
        artifact_path="agent",
        model_config=baseline_config,
        resources=resources,
        pip_requirements=[
            "mlflow",
            "langchain",
            "langgraph==0.3.4",
            "databricks-langchain",
            "unitycatalog-langchain[databricks]",
            "pydantic",
        ],
        input_example={
            "messages": [{"role": "user", "content": "How can I create a Delta Live Tables pipeline that processes CDC events using apply_changes() in Python, including handling out-of-order data?"}]
        },
    )

In [0]:
import pandas as pd
from databricks.agents.evals import generate_evals_df

agent_description = """
The agent is a RAG chatbot that answers questions about Databricks. Questions unrelated to Databricks are irrelevant.
"""
question_guidelines = """
# User personas
- A developer who is new to the Databricks platform
- An experienced, highly technical Data Scientist or Data Engineer


# Example questions
- what API lets me parallelize operations over rows of a delta table?
- Which cluster settings will give me the best performance when using Spark?


# Additional Guidelines
- Questions should be succinct, and human-like
"""


databricks_docs_url = "https://raw.githubusercontent.com/databricks/genai-cookbook/refs/heads/main/quick_start_demo/chunked_databricks_docs_filtered.jsonl"
parsed_docs_df = pd.read_json(databricks_docs_url, lines=True)


num_evals = 25
evals = generate_evals_df(
    docs=parsed_docs_df[
        :500
    ],  # Pass your docs. They should be in a Pandas or Spark DataFrame with columns `content STRING` and `doc_uri STRING`.
    num_evals=num_evals,  # How many synthetic evaluations to generate
    agent_description=agent_description,
    question_guidelines=question_guidelines,
)
display(evals)

In [0]:
from databricks.agents.evals import metric
from getting_started_agent import catalog, schema
@metric
def uses_keywords_and_retriever(request, trace):
  retriever_spans = trace.search_spans(span_type='RETRIEVER')
  keyword_tool_spans = trace.search_spans(name=f"{catalog}__{schema}__tfidf_keywords")
  return len(keyword_tool_spans) > 0 and len(retriever_spans) > 0
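
The metric above looks up the keyword tool's span under a name in which the dots of the three-part Unity Catalog function name are replaced by double underscores, as its f-string shows. A small helper makes that mapping explicit; `uc_tool_span_name` is a hypothetical name introduced here and is not part of any library.

```python
def uc_tool_span_name(full_function_name: str) -> str:
    """Convert 'catalog.schema.function' to the 'catalog__schema__function' form."""
    parts = full_function_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.function, got {full_function_name!r}")
    return "__".join(parts)


span_name = uc_tool_span_name("agentic_ai.databricks.tfidf_keywords")
```

Deriving the span name from the full function name keeps the metric in step with the tool list if the catalog or schema changes.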

In [0]:
with mlflow.start_run(run_name="my_agent"):
  eval_results = mlflow.evaluate(
      data=evals,  # Your evaluation set
      model=model_info.model_uri,  # Logged agent from above
      model_type="databricks-agent",  # Activate Mosaic AI Agent Evaluation
      extra_metrics=[uses_keywords_and_retriever]
  )

In [0]:
import mlflow
from databricks import agents

# Connect to the Unity Catalog model registry
mlflow.set_registry_uri("databricks-uc")


# TODO: define the catalog and schema for your UC model
catalog = "agentic_ai"
schema = "databricks"
assert (catalog and schema)
UC_MODEL_NAME = f"{catalog}.{schema}.getting_started_agent"


# Register the model in Unity Catalog
uc_registered_model_info = mlflow.register_model(
    model_uri=model_info.model_uri, name=UC_MODEL_NAME
)


In [0]:
# Deploy to enable the review app and create an API endpoint
deployment_info = agents.deploy(UC_MODEL_NAME, uc_registered_model_info.version, deploy_feedback_model=False)