# AAI 594 — Assignment 3

## Building Agent Tools

**In this lab you will:**
- **Required (Sections 1–5):** Create **Unity Catalog function tools** — one SQL function and one Python function — and test them.
- **Required (Section 6):** Set up **Vector Search** on the UltraFeedback dataset so an agent can find similar instructions by meaning.
- **Required (Section 7):** Configure an **external MCP server** (You.com web search) in Cursor so your agent can access live web information.
- **Optional, strongly encouraged (Section 8):** Create an **Agent Skill** (`SKILL.md`) that documents the tools you built.

### The big picture

Over Weeks 3–5 you are building an **UltraFeedback Expert** agent — an AI assistant that helps users explore and understand LLM preference data. This week you create the **tools**; next week you wire them into a working agent; in Week 5 you evaluate how well it performs.

| Week | What you do | Deliverable |
|------|------------|-------------|
| 3 (this week) | Build tools: UC functions, Vector Search, MCP | Tested tools + MCP config |
| 4 | Wire tools into an agent; register a prompt; compare LLMs | Working agent |
| 5 | Evaluate the agent with judges and an eval dataset | Evaluation report |

**Readings this week:**
- [Practical Guide for Agentic AI Workflows](https://arxiv.org/pdf/2512.08769)
- [MCP Architecture](https://modelcontextprotocol.io/docs/learn/architecture)

**Key docs:**
- [Create AI agent tools with UC functions](https://docs.databricks.com/aws/en/generative-ai/agent-framework/create-custom-tool)
- [Vector Search: Create endpoints and indexes](https://docs.databricks.com/aws/en/vector-search/create-vector-search)
- [You.com MCP Server](https://docs.you.com/developer-resources/mcp-server)

---
## 1. Why agents need tools *(Required)*

An LLM on its own can only generate text. **Tools** give agents the ability to *act*: query databases, search the web, look up facts, run computations. In this assignment you'll create four kinds of tools:

| Tool type | What it does | Example |
|-----------|-------------|--------|
| **UC SQL function** | Deterministic lookup against structured data | "How many rows come from `evol_instruct`?" |
| **UC Python function** | Custom computation or text processing | "Analyze the complexity of this instruction" |
| **Vector Search** | Semantic similarity search over text | "Find instructions similar to *Explain quantum tunneling*" |
| **External MCP** | Access external services (web search, APIs) | "Search the web for recent LLM benchmarks" |

Each tool is registered in a place the agent can discover it — Unity Catalog for functions and Vector Search, MCP for external services.
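
To make "registered in a place the agent can discover it" concrete, here is a minimal sketch of a tool registry, assuming a plain in-memory dictionary. The names `Tool`, `REGISTRY`, `register`, `call_tool`, and the `count_words` tool are hypothetical and exist only for illustration; in this lab Unity Catalog and MCP fill the registry role.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    """A callable plus the description an agent reads to decide when to use it."""
    name: str
    description: str
    func: Callable[..., str]


# Hypothetical in-memory registry (Unity Catalog and MCP play this role in the lab).
REGISTRY: Dict[str, Tool] = {}


def register(tool: Tool) -> None:
    """Make a tool discoverable under its name."""
    REGISTRY[tool.name] = tool


def call_tool(name: str, **kwargs) -> str:
    """Look up a tool by name and invoke it with keyword arguments."""
    if name not in REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return REGISTRY[name].func(**kwargs)


# Stand-in tool: counts words locally instead of querying a table.
register(Tool(
    name="count_words",
    description="Returns the number of whitespace-separated words in a text.",
    func=lambda text: str(len(text.split())),
))

print(call_tool("count_words", text="Explain quantum tunneling"))
```

The description field matters as much as the function: it is the only information an agent has when choosing between tools.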

---
## 2. Install dependencies *(Required)*

We need two packages:
- `unitycatalog-ai[databricks]` — the Unity Catalog AI client for creating and testing UC functions as agent tools.
- `databricks-vectorsearch` — the Vector Search SDK for creating endpoints and indexes.

**Docs:** [Unity Catalog AI](https://docs.unitycatalog.io/ai/) · [Vector Search SDK](https://api-docs.databricks.com/python/vector-search/index.html)

In [None]:
# Install the UC AI client (for creating/testing UC functions as tools)
# and the Vector Search SDK (for creating endpoints and indexes)
%pip install unitycatalog-ai[databricks] databricks-vectorsearch
dbutils.library.restartPython()

---
## 3. Verify your data *(Required)*

Confirm the UltraFeedback table from Assignment 1 is still available. If you get an error, re-run Assignment 1 first.

In [None]:
# Quick check: confirm the table exists, show schema and row count
df = spark.table("main.default.assignment_file")
print(f"Row count: {df.count():,}")
print(f"Columns:  {df.columns}")
df.printSchema()
display(df.limit(3))

---
## 4. Create Unity Catalog function tools *(Required)*

Unity Catalog functions are UDFs registered under a three-level name, `catalog.schema.function_name`. When an agent needs a tool, it calls the function by that name. Two patterns are common:

1. **SQL functions** — best for deterministic lookups against tables (e.g., counts, filters, joins).
2. **Python functions** — best for custom logic, text processing, or computations that don't map cleanly to SQL.

You'll create one of each, then build your own.

**Docs:** [Create AI agent tools with UC functions](https://docs.databricks.com/aws/en/generative-ai/agent-framework/create-custom-tool) · [CREATE FUNCTION syntax](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function)

### 4.1 SQL function: `lookup_source_info`

This function takes a source name (like `evol_instruct` or `sharegpt`) and returns the row count plus a sample instruction. An agent can call this to understand what kind of data each source contains.

**Key points:**
- The `COMMENT` on the function and its parameters helps the agent understand *when* and *how* to use the tool. Write clear, descriptive comments.
- The function returns a `STRING` — this is the simplest return type for agent tool calling.
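
As a rough guide to the output format, here is a pure-Python sketch that mirrors the string the SQL function builds. It assumes rows are plain dictionaries; `lookup_source_info_local` and the toy rows are made up for illustration and are not part of the assignment data.

```python
from typing import Dict, List, Optional


def lookup_source_info_local(rows: List[Dict[str, Optional[str]]], source_name: str) -> str:
    """Pure-Python stand-in that mirrors the string format of the SQL function."""
    matches = [r for r in rows if r.get("source") == source_name]
    # The SQL FIRST() picks an arbitrary matching row; here we take list order.
    sample = matches[0].get("instruction") if matches else None
    return (
        f"Source: {source_name}"
        f" | Row count: {len(matches)}"
        f" | Sample instruction: {sample if sample is not None else 'N/A'}"
    )


# Made-up rows, only to show the shape of the result.
toy_rows = [
    {"source": "evol_instruct", "instruction": "Rewrite this prompt to be harder."},
    {"source": "evol_instruct", "instruction": "Add one more constraint."},
    {"source": "sharegpt", "instruction": "Summarize this chat."},
]

print(lookup_source_info_local(toy_rows, "evol_instruct"))
print(lookup_source_info_local(toy_rows, "no_such_source"))
```

The second call shows the case worth testing in SQL too: an unknown source should still produce a string, with a row count of 0 and `N/A` as the sample, rather than an error.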

In [None]:
%%sql
-- Create a SQL UC function that looks up source information.
-- The COMMENT fields are critical: they tell the agent what this tool does.
CREATE OR REPLACE FUNCTION main.default.lookup_source_info(
  source_name STRING COMMENT 'Name of the data source to look up (e.g., evol_instruct, sharegpt, ultrachat, flan_v2).'
)
RETURNS STRING
COMMENT 'Returns the row count and a sample instruction for a given source in the UltraFeedback dataset. Use this to understand what kind of data each source contains and how much is available.'
RETURN
  SELECT CONCAT(
    'Source: ', source_name,
    ' | Row count: ', CAST(COUNT(*) AS STRING),
    ' | Sample instruction: ', COALESCE(FIRST(instruction), 'N/A')
  )
  FROM main.default.assignment_file
  WHERE source = source_name;

In [None]:
%%sql
-- Test the function with a known source name.
-- Try different sources: evol_instruct, sharegpt, ultrachat, flan_v2, false_qa, etc.
SELECT main.default.lookup_source_info('evol_instruct') AS result;

### 4.2 Python function: `analyze_instruction`

This function takes an instruction text and returns complexity metrics (word count, sentence count, estimated complexity level). An agent could use this to assess how complex a prompt is before deciding how to handle it.

**Key points:**
- Python UC functions must have **type hints** on all arguments and the return value.
- **Imports go inside the function body** — they won't be resolved otherwise.
- Use [Google-style docstrings](https://google.github.io/styleguide/pyguide.html#383-functions-and-methods) so the agent can parse the description.
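
Before registering a function, you can check the first rule locally. The sketch below uses the standard `inspect` module; `missing_type_hints`, `good`, and `bad` are hypothetical helper names for illustration, not part of the UC client.

```python
import inspect
from typing import Callable, List


def missing_type_hints(func: Callable) -> List[str]:
    """Return the names of parameters (and 'return') that lack annotations."""
    sig = inspect.signature(func)
    missing = [
        name for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


def good(instruction: str) -> str:
    """Fully annotated, with the import inside the function body."""
    import json
    return json.dumps({"length": len(instruction)})


def bad(instruction):
    """No annotations: not suitable for registration as written."""
    return instruction


print(missing_type_hints(good))
print(missing_type_hints(bad))
```

An empty list means every argument and the return value are annotated. This checks annotations only; it does not verify that imports are inside the body or that the docstring is well formed.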

In [None]:
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

# Initialize the Databricks Function Client
uc_client = DatabricksFunctionClient()

# Define the Python function with type hints and a clear docstring.
# NOTE: all imports must be INSIDE the function body.
def analyze_instruction(instruction: str) -> str:
    """
    Analyzes the complexity and characteristics of an instruction prompt.

    Returns word count, sentence count, average word length, estimated
    complexity level (low/medium/high), and whether the text is a question.
    Use this to assess instruction difficulty before generating a response.

    Args:
        instruction: The instruction or prompt text to analyze.

    Returns:
        A JSON string with analysis metrics.
    """
    import json
    import re

    words = instruction.split()
    word_count = len(words)
    sentences = [s.strip() for s in re.split(r'[.!?]+', instruction) if s.strip()]
    sentence_count = len(sentences)
    avg_word_length = round(sum(len(w) for w in words) / max(word_count, 1), 1)
    is_question = instruction.strip().endswith('?')

    if word_count > 50 or sentence_count > 3:
        complexity = "high"
    elif word_count > 20:
        complexity = "medium"
    else:
        complexity = "low"

    return json.dumps({
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_word_length": avg_word_length,
        "complexity": complexity,
        "is_question": is_question
    })

# Register the function in Unity Catalog (main.default schema)
function_info = uc_client.create_python_function(
    func=analyze_instruction,
    catalog="main",
    schema="default",
    replace=True  # overwrite if it already exists
)
print(f"Registered: {function_info.full_name}")

In [None]:
# Test the Python function through the UC client
result = uc_client.execute_function(
    function_name="main.default.analyze_instruction",
    parameters={"instruction": "Explain the process of photosynthesis in detail, including the light-dependent and light-independent reactions."}
)
print(result.value)

# Try a simpler instruction for comparison
result2 = uc_client.execute_function(
    function_name="main.default.analyze_instruction",
    parameters={"instruction": "What is 2 + 2?"}
)
print(result2.value)

### 4.3 Your turn: create a function *(Required)*

Create **one additional UC function** (SQL or Python) that would be useful for the UltraFeedback Expert agent. Some ideas:

| Idea | Type | What it does |
|------|------|--------------|
| `count_model_appearances` | SQL | Count how often a model appears as chosen vs. rejected |
| `get_sample_pairs` | SQL | Return N example chosen/rejected pairs for a given source |
| `format_comparison` | Python | Take a chosen and rejected response and format them side-by-side |
| `extract_keywords` | Python | Pull key terms from an instruction for categorization |

Make sure your function has:
- A clear `COMMENT` (SQL) or docstring (Python) explaining what it does and when to use it
- Type hints (Python) or typed parameters (SQL)
- A test cell showing it works

In [None]:
# CREATE YOUR FUNCTION HERE
# Use either %%sql for a SQL function or Python with uc_client.create_python_function()
# Then add a test cell below to verify it works.


In [None]:
# TEST YOUR FUNCTION HERE


---
## 5. List your registered tools *(Required)*

Before moving on, verify all your UC functions are registered. The cell below lists functions in `main.default`.

In [None]:
%%sql
-- List all functions you've created in main.default
SHOW USER FUNCTIONS IN main.default;

---
## 6. Vector Search *(Required)*

Vector Search lets an agent find **semantically similar** text — not just exact keyword matches. For the UltraFeedback Expert, this means an agent can find instructions similar to a user's question, even if the wording is different.

**How it works:**
1. You create a **Vector Search endpoint** (the compute that serves queries).
2. You create a **Delta Sync Index** on a source table — Databricks automatically computes embeddings and keeps the index in sync.
3. The agent queries the index with natural language and gets back similar rows.
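
The idea behind step 3 can be sketched in a few lines. This is a toy illustration, assuming hand-made 3-dimensional vectors and cosine similarity; real embeddings have many more dimensions, and the managed index may use a different metric. The names `cosine`, `top_k`, and `toy_index` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every indexed text against the query and keep the k best."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-made "embeddings" for illustration only.
toy_index = {
    "Explain quantum tunneling": [0.9, 0.1, 0.0],
    "Describe how electrons cross a barrier": [0.8, 0.2, 0.1],
    "Write a haiku about autumn": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend this encodes a physics question

for text, score in top_k(query, toy_index):
    print(f"{score:.3f}  {text}")
```

The two physics instructions rank above the haiku even though they share few words, which is the behavior keyword matching cannot give you.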

**Free Edition limits:** One Vector Search endpoint, one unit. No Direct Vector Access. This is enough for our purposes.

**Docs:** [Create Vector Search endpoints and indexes](https://docs.databricks.com/aws/en/vector-search/create-vector-search) · [Vector Search SDK reference](https://api-docs.databricks.com/python/vector-search/databricks.vector_search.html)

> **Important:** Vector Search endpoints consume resources even when idle. You will **delete the endpoint at the end of this section**. You can recreate it in Assignment 4 when you build the agent.

### 6.1 Prepare a source table

Vector Search requires:
- A **primary key** column (unique identifier for each row).
- A **text column** to embed (we'll use the `instruction` column).
- **Change Data Feed** enabled on the Delta table (required for Delta Sync Index).

We'll create a focused table with 1,000 unique instructions — this keeps indexing fast and stays within Free Edition quotas.

In [None]:
from pyspark.sql.functions import monotonically_increasing_id

# Create a focused table for Vector Search:
# - Unique instructions only (deduplicated)
# - Limited to 1,000 rows (keeps indexing fast on Free Edition)
# - Includes an ID column for the primary key (values are unique but not consecutive)
vs_source = (
    spark.table("main.default.assignment_file")
    .select("source", "instruction")
    .dropDuplicates(["instruction"])
    .limit(1000)
    .withColumn("id", monotonically_increasing_id())
)

# Write as a new Delta table with Change Data Feed enabled
vs_source.write.format("delta") \
    .option("delta.enableChangeDataFeed", "true") \
    .mode("overwrite") \
    .saveAsTable("main.default.ultrafeedback_vs_source")

# Verify
print(f"VS source table rows: {spark.table('main.default.ultrafeedback_vs_source').count()}")
display(spark.table("main.default.ultrafeedback_vs_source").limit(3))

### 6.2 Create a Vector Search endpoint

The endpoint is the compute resource that serves similarity queries. On Free Edition you get one endpoint with one unit.

In [None]:
from databricks.vector_search.client import VectorSearchClient

# Initialize the Vector Search client (auto-detects notebook credentials)
vs_client = VectorSearchClient()

VS_ENDPOINT_NAME = "aai594_vs_endpoint"

# Create the endpoint (this may take 1-2 minutes)
try:
    vs_client.create_endpoint_and_wait(
        name=VS_ENDPOINT_NAME,
        endpoint_type="STANDARD"
    )
    print(f"Endpoint '{VS_ENDPOINT_NAME}' is ready.")
except Exception as e:
    # If the endpoint already exists, that's fine
    if "already exists" in str(e).lower():
        print(f"Endpoint '{VS_ENDPOINT_NAME}' already exists — reusing it.")
    else:
        raise

### 6.3 Create a Delta Sync Index

The index tells Vector Search which table to embed and how. We use **managed embeddings** — Databricks automatically computes embeddings using a Foundation Model API endpoint (`databricks-gte-large-en`).

- `pipeline_type="TRIGGERED"` means the index syncs when you explicitly ask it to (not continuously). This saves resources.
- `embedding_source_column="instruction"` — the text column to embed.

> **Note:** If `databricks-gte-large-en` is not available in your workspace, check which embedding endpoints are available under **Serving** in the left sidebar and substitute the endpoint name below.

In [None]:
VS_INDEX_NAME = "main.default.ultrafeedback_vs_index"

# Create a Delta Sync Index with managed embeddings
try:
    index = vs_client.create_delta_sync_index_and_wait(
        endpoint_name=VS_ENDPOINT_NAME,
        source_table_name="main.default.ultrafeedback_vs_source",
        index_name=VS_INDEX_NAME,
        pipeline_type="TRIGGERED",
        primary_key="id",
        embedding_source_column="instruction",
        embedding_model_endpoint_name="databricks-gte-large-en"
    )
    print(f"Index '{VS_INDEX_NAME}' created and synced.")
except Exception as e:
    if "already exists" in str(e).lower():
        print(f"Index '{VS_INDEX_NAME}' already exists — reusing it.")
        index = vs_client.get_index(
            endpoint_name=VS_ENDPOINT_NAME,
            index_name=VS_INDEX_NAME
        )
    else:
        raise

In [None]:
# Check the index status — it should show as ONLINE after syncing
# If status is still PROVISIONING, wait a minute and re-run this cell.
index.describe()

### 6.4 Query the index

Now test a similarity search. The query text is embedded automatically and compared against the indexed instructions.
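
Indexing rows by position (`row[1]`, `row[-1]`) is easy to get wrong. A small helper can pair each value with its column name instead. This sketch assumes the response shape the cells in this section rely on (`result` → `data_array`, with the score as the last element of each row); `rows_as_dicts` is a hypothetical helper and the mocked values are made up.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any], columns: List[str]) -> List[Dict[str, Any]]:
    """Pair each returned row with the requested column names plus a trailing score."""
    names = columns + ["score"]
    data = response.get("result", {}).get("data_array", []) or []
    return [dict(zip(names, row)) for row in data]


# Mocked response with made-up values, shaped like the one read in this section.
mock_response = {
    "result": {
        "data_array": [
            [7, "How is a neural network trained?", "sharegpt", 0.61],
            [12, "Describe gradient descent.", "evol_instruct", 0.58],
        ]
    }
}

for hit in rows_as_dicts(mock_response, ["id", "instruction", "source"]):
    print(f"{hit['score']:.4f} | {hit['source']} | {hit['instruction'][:100]}")
```

Pass the same `columns` list you gave to `similarity_search` so the names line up with the returned values.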

In [None]:
# Similarity search: find instructions related to a query
results = index.similarity_search(
    query_text="Explain how machine learning models are trained",
    columns=["id", "instruction", "source"],
    num_results=5
)

# Display results
for row in results.get("result", {}).get("data_array", []):
    print(f"Score: {row[-1]:.4f} | Source: {row[2]} | Instruction: {row[1][:100]}...")
    print()

In [None]:
# Try your own query — replace the text below
results2 = index.similarity_search(
    query_text="Write a Python function to sort a list",
    columns=["id", "instruction", "source"],
    num_results=3
)

for row in results2.get("result", {}).get("data_array", []):
    print(f"Score: {row[-1]:.4f} | {row[1][:120]}...")
    print()

### 6.5 Clean up — delete endpoint and index

**This step is critical.** Vector Search endpoints consume resources even when idle. On Free Edition you only get one, and leaving it running counts against your daily quota.

Delete the index first, then the endpoint. You'll recreate them in Assignment 4 when you build the full agent.

In [None]:
# Step 1: Delete the index
try:
    vs_client.delete_index(
        endpoint_name=VS_ENDPOINT_NAME,
        index_name=VS_INDEX_NAME
    )
    print(f"Index '{VS_INDEX_NAME}' deleted.")
except Exception as e:
    print(f"Index deletion note: {e}")

# Step 2: Delete the endpoint
try:
    vs_client.delete_endpoint(name=VS_ENDPOINT_NAME)
    print(f"Endpoint '{VS_ENDPOINT_NAME}' deleted.")
except Exception as e:
    print(f"Endpoint deletion note: {e}")

In [None]:
# Verify cleanup: the output should list no endpoints (or at least not include yours)
vs_client.list_endpoints()

---
## 7. Configure an external MCP server *(Required)*

The **Model Context Protocol (MCP)** is an open standard that lets AI assistants connect to external tools and data sources. By adding an MCP server to Cursor, your agent gains access to live capabilities — in this case, **web search**.

You'll configure the **You.com MCP server**, which provides:
- `you-search` — web and news search with filtering
- `you-contents` — extract content from URLs in markdown format

This means your agent will be able to search the web for current information about LLMs, benchmarks, and research papers — something it can't do with just the UltraFeedback dataset.
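
Under the hood, an MCP client talks to the server with JSON-RPC 2.0 messages; invoking a tool uses the `tools/call` method. The sketch below only builds such a message and does not send it. `build_tool_call` is a hypothetical helper, and the `query` argument name is an assumption: the server's own tool schema defines the real argument names.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The `query` argument name is an assumption, used here only for illustration.
payload = build_tool_call(1, "you-search", {"query": "latest LLM benchmarks"})
print(payload)
```

Cursor handles this exchange for you; the point is that a tool call is just a named method plus structured arguments, which is why clear tool names and descriptions matter.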

**Docs:** [You.com MCP Server](https://docs.you.com/developer-resources/mcp-server) · [MCP in Cursor](https://cursor.com/docs/context/mcp) · [MCP Architecture](https://modelcontextprotocol.io/docs/learn/architecture)

### 7.1 Get a You.com API key

1. Go to [you.com/platform](https://you.com/platform).
2. Sign in or create an account.
3. Generate an API key and copy it. **Keep it safe — you'll need it in the next step.**

### 7.2 Add the MCP server to Cursor

Cursor reads MCP server configuration from a JSON file. You can configure it at the **project level** (only this project) or **globally** (all projects).

#### Option A: Project-level (recommended for this course)

Create or edit `.cursor/mcp.json` in your project root:

```json
{
  "mcpServers": {
    "ydc-server": {
      "url": "https://api.you.com/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR-YOU-COM-API-KEY>"
      }
    }
  }
}
```

Replace `<YOUR-YOU-COM-API-KEY>` with your actual API key.

#### Option B: Global

Edit `~/.cursor/mcp.json` to make this available across all your Cursor projects.

#### One-click install

Alternatively, you can install directly from Cursor's MCP directory: visit the [You.com MCP page](https://docs.you.com/developer-resources/mcp-server) and click the **"Install MCP Server"** button for Cursor.

> **Tip:** After saving `mcp.json`, restart Cursor or reload the window (`Cmd+Shift+P` on macOS or `Ctrl+Shift+P` on Windows/Linux, then "Reload Window"). You should then see the You.com tools available in the Agent chat.

### 7.3 Test your MCP connection

Open Cursor's **Agent chat** (not the regular chat) and try a query that requires live web search:

- *"Search the web for the latest LLM benchmarks from 2025-2026."*
- *"What are the top open-source LLMs released in the last 6 months?"*

The agent should use the `you-search` tool to fetch live results. You'll see a tool-call indicator in the chat.

> **Take a screenshot** of the agent using the You.com MCP tool in Cursor. Include it in your submission as `screenshots/mcp_you_com.png`.

> **Troubleshooting:**
> - If the tools don't appear, check that `mcp.json` is valid JSON (no trailing commas, correct quoting).
> - Verify your API key is active at [you.com/platform](https://you.com/platform).
> - Try restarting Cursor after editing the config.
> - Go to Cursor Settings → Agents tab and turn off Cursor's built-in web search to avoid conflicts.
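
The first troubleshooting item can be checked with a short script. This is a minimal sketch, assuming the config shape shown in Section 7.2; `check_mcp_config` is a hypothetical helper, and its checks (a `url` or `command` field, no leftover `<...>` placeholder) are illustrative rather than a full validation of what Cursor accepts.

```python
import json
from typing import List


def check_mcp_config(text: str) -> List[str]:
    """Return a list of problems found in an mcp.json document (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ["Missing or empty 'mcpServers' object"]

    problems: List[str] = []
    for name, server in servers.items():
        if not isinstance(server, dict):
            problems.append(f"{name}: entry must be a JSON object")
            continue
        # Remote servers use 'url'; locally launched servers commonly use 'command'.
        if "url" not in server and "command" not in server:
            problems.append(f"{name}: expected a 'url' or 'command' field")
        headers = server.get("headers")
        auth = str(headers.get("Authorization", "")) if isinstance(headers, dict) else ""
        if "<" in auth or ">" in auth:
            problems.append(f"{name}: Authorization header still contains a placeholder")
    return problems


# The unedited template from Section 7.2: valid JSON, but the key is a placeholder.
template = """
{
  "mcpServers": {
    "ydc-server": {
      "url": "https://api.you.com/mcp",
      "headers": {"Authorization": "Bearer <YOUR-YOU-COM-API-KEY>"}
    }
  }
}
"""
print(check_mcp_config(template))

# A trailing comma is a common way to break the file.
broken = '{"mcpServers": {"ydc-server": {"url": "https://api.you.com/mcp",}}}'
print(check_mcp_config(broken))
```

To check your real file, read it with `pathlib.Path(".cursor/mcp.json").read_text()` and pass the text to the function. An empty list means none of these particular checks failed.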

### 7.4 Alternative MCP servers

You.com is the required MCP server for this assignment, but you're welcome to configure additional servers. Here are some useful options:

| MCP Server | What it provides | Get started |
|------------|-----------------|-------------|
| **You.com** (required) | Web search, news, content extraction | [you.com/platform](https://you.com/platform) |
| **Brave Search** | Privacy-focused web search | [brave.com/search/api](https://brave.com/search/api/) |
| **Tavily** | AI-optimized search for agents | [tavily.com](https://tavily.com/) |
| **GitHub** | Code search, issues, PRs | [github.com/github/github-mcp-server](https://github.com/github/github-mcp-server) |
| **Filesystem** | Read/write local files | Built into many MCP clients |

Most follow the same pattern: get an API key or token if the server needs one, add a server entry to `mcp.json`, and restart Cursor. The [MCP Registry](https://registry.modelcontextprotocol.io/) has a full catalog of available servers.

---
## 8. Bonus: Create an Agent Skill *(Optional, strongly encouraged)*

An **Agent Skill** is a markdown document (`SKILL.md`) that gives an AI assistant domain knowledge. When a skill is loaded, the assistant knows how to use specific tools, follow procedures, and avoid common mistakes — without you having to explain everything in every prompt.

Think of it as a **user manual for your agent's tools**, written so another AI can follow it.

### Why this matters

You've just built several tools (UC functions, Vector Search, MCP). But an AI assistant doesn't automatically know *when* to use each one, *how* to call them, or *what to watch out for*. A skill bridges that gap.

### Create a skill

Create a file called `SKILL.md` in your `assignment_3/` folder with the following structure:

```markdown
---
name: ultrafeedback-expert
description: >
  Tools and knowledge for exploring the UltraFeedback LLM preference dataset.
  Activate when: user asks about LLM preferences, model comparisons, or
  instruction quality in the UltraFeedback dataset.
---

# UltraFeedback Expert

## When to Use This Skill

**Trigger patterns:**
- "UltraFeedback" or "preference data" or "chosen vs rejected"
- "Which model is preferred" or "model comparison"
- "Find similar instructions" or "semantic search"

## Available Tools

| Tool | Type | What it does |
|------|------|--------------|
| `main.default.lookup_source_info` | UC SQL | Returns row count and sample for a source |
| `main.default.analyze_instruction` | UC Python | Analyzes instruction complexity |
| `main.default.<your_function>` | UC SQL/Python | <your description> |
| Vector Search index | Databricks VS | Semantic search over 1K instructions |
| You.com MCP | External MCP | Live web search for current LLM info |

## Procedures

### Answering "What sources are in the dataset?"
1. Call `lookup_source_info` for each known source.
2. Summarize counts and sample instructions.

### Finding similar instructions
1. Query the Vector Search index with the user's text.
2. Return the top 3-5 matches with their sources.

## Gotchas
- Vector Search endpoint must be running (recreate if deleted).
- Column names with hyphens (e.g., `chosen-model`) need backtick escaping.
```

Fill in the details based on the actual tools you created. You can use this skill in Cursor by placing it in `~/.cursor/skills/` or referencing it in a project rule.
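
A quick way to catch a malformed header is to confirm the front matter is delimited by `---` lines and contains the two fields the template uses, `name` and `description`. The sketch below is a simplified check, not a YAML parser; `front_matter_keys` is a hypothetical helper.

```python
from typing import List


def front_matter_keys(skill_text: str) -> List[str]:
    """Return top-level keys found in the leading '---' delimited front matter."""
    lines = skill_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys: List[str] = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start at column 0; indented lines continue a previous value.
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # closing delimiter never found


sample = """---
name: ultrafeedback-expert
description: >
  Tools and knowledge for exploring the UltraFeedback LLM preference dataset.
  Activate when: user asks about LLM preferences.
---

# UltraFeedback Expert
"""

keys = front_matter_keys(sample)
print(keys)
print("OK" if {"name", "description"} <= set(keys) else "Missing required keys")
```

Indented continuation lines are skipped on purpose, so a colon inside the folded description (as in "Activate when:") is not mistaken for a key.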

**Docs:** [Agent Skills standard](https://github.com/xnano-ai/agentskills) · [Cursor Rules](https://cursor.com/docs/context/rules)

---
## Lab complete

### Required (Sections 1–7)
- [ ] **Section 3:** Verified the UltraFeedback table exists.
- [ ] **Section 4:** Created and tested the SQL function (`lookup_source_info`) and Python function (`analyze_instruction`).
- [ ] **Section 4.3:** Created and tested your own UC function.
- [ ] **Section 5:** Listed all registered UC functions.
- [ ] **Section 6:** Created a Vector Search endpoint and index, queried it successfully, then **deleted both**.
- [ ] **Section 7:** Configured the You.com MCP server in Cursor and tested it (screenshot taken).

### Optional but strongly encouraged (Section 8)
- [ ] **Section 8:** Created a `SKILL.md` documenting your tools.

**Submit:** Your executed notebook (`.ipynb` with all outputs) and the completed `SUBMISSION_3.md`. Include screenshots in the `screenshots/` folder.

*Next week you'll wire these tools into a working agent, register a prompt, and compare different LLMs.*