38 changes: 20 additions & 18 deletions docs/en/guides/51-ai-functions/01-external-functions.md
@@ -1,26 +1,28 @@
---
title: External Functions for Custom AI/ML
title: Custom AI/ML with External Functions
---

# External Functions for Custom AI/ML
# Custom AI/ML with External Functions

For advanced AI/ML scenarios, Databend supports external functions that connect your data with custom AI/ML infrastructure written in languages like Python.
Build powerful AI/ML capabilities by connecting Databend with your own infrastructure. External functions let you deploy custom models, leverage GPU acceleration, and integrate with any ML framework while keeping your data secure.

| Feature | Description | Benefits |
|---------|-------------|----------|
| **Model Flexibility** | Use open-source models or your internal AI/ML infrastructure | • Freedom to choose any model<br/>• Leverage existing ML investments<br/>• Stay up-to-date with latest AI advancements |
| **GPU Acceleration** | Deploy external function servers on GPU-equipped machines | • Faster inference for deep learning models<br/>• Handle larger batch sizes<br/>• Support compute-intensive workloads |
| **Custom ML Models** | Deploy and use your own machine learning models | • Proprietary algorithms<br/>• Domain-specific models<br/>• Fine-tuned for your data |
| **Advanced AI Pipelines** | Build complex AI workflows with specialized libraries | • Multi-step processing<br/>• Custom transformations<br/>• Integration with ML frameworks |
| **Scalability** | Handle resource-intensive AI operations outside Databend | • Independent scaling<br/>• Optimized resource allocation<br/>• High-throughput processing |
## Key Capabilities

## Implementation Overview
| Feature | Benefits |
|---------|----------|
| **Custom Models** | Use any open-source or proprietary AI/ML models |
| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference |
| **Data Privacy** | Keep your data within your infrastructure |
| **Scalability** | Independent scaling and resource optimization |
| **Flexibility** | Support for any programming language and ML framework |

1. Create an external server with your AI/ML code (Python with [databend-udf](https://pypi.org/project/databend-udf))
2. Register the server with Databend using `CREATE FUNCTION`
3. Call your AI/ML functions directly in SQL queries
## How It Works

## Example: Custom AI Model Integration
1. **Create AI Server**: Build your AI/ML server using Python and [databend-udf](https://pypi.org/project/databend-udf)
2. **Register Function**: Connect your server to Databend with `CREATE FUNCTION`
3. **Use in SQL**: Call your custom AI functions directly in SQL queries

## Example: Text Embedding Function

```python
# Simple embedding UDF server demo
@@ -74,7 +76,7 @@ ORDER BY similarity ASC
LIMIT 5;
```

## Next Steps
## Learn More

1. **[External Functions Guide](/guides/query/external-function)** - Detailed setup instructions
2. **[Databend Cloud](https://databend.com)** - Try AI functions with a free trial
- **[External Functions Guide](/guides/query/external-function)** - Complete setup and deployment instructions
- **[Databend Cloud](https://databend.com)** - Try external functions with a free account
227 changes: 227 additions & 0 deletions docs/en/guides/51-ai-functions/02-mcp.md
@@ -0,0 +1,227 @@
# MCP Server for Databend

[mcp-databend](https://github.com/databendlabs/mcp-databend) is an MCP (Model Context Protocol) server that enables AI assistants to interact directly with your Databend database using natural language.

## What mcp-databend Can Do

- **execute_sql** - Execute SQL queries with timeout protection
- **show_databases** - List all available databases
- **show_tables** - List tables in a database (with optional filter)
- **describe_table** - Get detailed table schema information
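
An MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message and sends nothing. The helper `tool_call_request` is hypothetical, and the `sql` argument name for `execute_sql` is an assumption, so check the tool schema that mcp-databend actually advertises.

```python
import json
from itertools import count

# Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client
# sends to invoke a tool. It does not open a connection to any server.
_request_ids = count(1)


def tool_call_request(name, arguments=None):
    """Return a `tools/call` request for the given tool name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def to_wire(request):
    """Serialize the request the way it would travel over stdio."""
    return json.dumps(request)
```

In practice an agent framework such as Agno builds and sends these messages for you; the sketch only shows what travels between the assistant and the server.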

## Build a ChatBI Tool

This tutorial shows you how to build a conversational Business Intelligence tool using mcp-databend and the Agno framework. You'll create a local agent that can answer data questions in natural language.

## Step-by-Step Tutorial

### Step 1: Set Up Databend Connection

First, you need a Databend database to connect to:

1. **Sign up for [Databend Cloud](https://app.databend.com)** (free tier available)
2. **Create a warehouse and database**
3. **Get your connection string** from the console

For detailed DSN format and examples, see [Connection String Documentation](https://docs.databend.com/developer/drivers/#connection-string-dsn).

| Deployment | Connection String Example |
|------------|---------------------------|
| **Databend Cloud** | `databend://user:pwd@host:443/database?warehouse=wh` |
| **Self-hosted** | `databend://user:pwd@localhost:8000/database?sslmode=disable` |
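
To sanity-check a connection string before using it, you can split it into its parts with the standard library. This is a minimal sketch: `describe_dsn` is a hypothetical helper, not part of any Databend driver, and it assumes the DSN follows the URL shape shown in the table above.

```python
from urllib.parse import urlparse, parse_qs


def describe_dsn(dsn: str) -> dict:
    """Split a databend:// DSN into its components (password omitted)."""
    parts = urlparse(dsn)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # The path carries the database name, e.g. "/database".
        "database": parts.path.lstrip("/"),
        # Query parameters such as warehouse or sslmode.
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }
```

The password is deliberately left out of the result so the output can be logged safely.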

### Step 2: Install Dependencies

Create a virtual environment and install the required packages:

```bash
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install packages
pip install packaging openai agno openrouter sqlalchemy fastapi mcp-databend
```

### Step 3: Create ChatBI Agent

Now create your ChatBI agent that uses mcp-databend to interact with your database.

Create a file `agent.py`:

```python
from contextlib import asynccontextmanager
import os
import logging
import sys

from agno.agent import Agent
from agno.playground import Playground
from agno.storage.sqlite import SqliteStorage
from agno.tools.mcp import MCPTools
from agno.models.openrouter import OpenRouter
from fastapi import FastAPI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def check_env_vars():
    """Check required environment variables"""
    required = {
        "DATABEND_DSN": "https://docs.databend.com/developer/drivers/#connection-string-dsn",
        "OPENROUTER_API_KEY": "https://openrouter.ai/settings/keys"
    }

    missing = [var for var in required if not os.getenv(var)]

    if missing:
        print("❌ Missing environment variables:")
        for var in missing:
            print(f" • {var}: {required[var]}")
        print("\nExample: export DATABEND_DSN='...' OPENROUTER_API_KEY='...'")
        sys.exit(1)

    print("✅ Environment variables OK")

check_env_vars()

class DatabendTool:
    def __init__(self):
        self.mcp = None
        self.dsn = os.getenv("DATABEND_DSN")

    def create(self):
        env = os.environ.copy()
        env["DATABEND_DSN"] = self.dsn
        self.mcp = MCPTools(
            command="python -m mcp_databend",
            env=env,
            timeout_seconds=300
        )
        return self.mcp

    async def init(self):
        try:
            await self.mcp.connect()
            logger.info("✓ Connected to Databend")
            return True
        except Exception as e:
            logger.error(f"✗ Databend connection failed: {e}")
            return False

databend = DatabendTool()

agent = Agent(
    name="ChatBI",
    model=OpenRouter(
        id=os.getenv("MODEL_ID", "anthropic/claude-sonnet-4"),
        api_key=os.getenv("OPENROUTER_API_KEY")
    ),
    tools=[],
    instructions=[
        "You are ChatBI - a Business Intelligence assistant for Databend.",
        "Help users explore and analyze their data using natural language.",
        "Always start by exploring available databases and tables.",
        "Format query results in clear, readable tables.",
        "Provide insights and explanations with your analysis."
    ],
    storage=SqliteStorage(table_name="chatbi", db_file="chatbi.db"),
    add_datetime_to_instructions=True,
    add_history_to_messages=True,
    num_history_responses=5,
    markdown=True,
    show_tool_calls=True,
)

@asynccontextmanager
async def lifespan(app: FastAPI):
    tool = databend.create()
    if not await databend.init():
        logger.error("Failed to initialize Databend")
        raise RuntimeError("Databend connection failed")

    agent.tools.append(tool)
    logger.info("ChatBI initialized successfully")

    yield

    if databend.mcp:
        await databend.mcp.close()

playground = Playground(
    agents=[agent],
    name="ChatBI with Databend",
    description="Business Intelligence Assistant powered by Databend"
)

app = playground.get_app(lifespan=lifespan)

if __name__ == "__main__":
    print("🤖 Starting MCP Server for Databend")
    print("Open http://localhost:7777 to start chatting!")
    playground.serve(app="agent:app", host="127.0.0.1", port=7777)
```
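
The `lifespan` function in `agent.py` controls the order of operations: connect the MCP tool, register it with the agent, serve requests, then close the connection. The sketch below reproduces that ordering with the standard library only. `FakeTool`, `make_lifespan`, and `run_demo` are hypothetical stand-ins, not Agno or mcp-databend APIs. Unlike the original, the sketch wraps the `yield` in `try`/`finally`, a variation that closes the tool even when serving raises.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeTool:
    """Hypothetical stand-in for the MCP tool; records lifecycle events."""

    def __init__(self, events):
        self.events = events

    async def connect(self):
        self.events.append("connect")

    async def close(self):
        self.events.append("close")


def make_lifespan(tool, tools):
    """Build a lifespan context manager around the given tool."""

    @asynccontextmanager
    async def lifespan(app):
        await tool.connect()   # startup: runs before any request is served
        tools.append(tool)     # register the tool only after it connected
        try:
            yield              # the application serves requests here
        finally:
            await tool.close() # shutdown: runs even if serving raised

    return lifespan


async def run_demo():
    """Run one startup/serve/shutdown cycle and return what happened."""
    events, tools = [], []
    lifespan = make_lifespan(FakeTool(events), tools)
    async with lifespan(app=None):
        events.append("serving")
    return events, tools
```

Registering the tool only after a successful connection means the agent never sees a tool it cannot call, which is why `agent.py` starts with `tools=[]`.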

### Step 4: Configure Environment

Set up your API keys and database connection:

```bash
# Set your OpenRouter API key
export OPENROUTER_API_KEY="your-openrouter-key"

# Set your Databend connection string
export DATABEND_DSN="your-databend-connection-string"
```

### Step 5: Start Your ChatBI Agent

Run your agent to start the local server:

```bash
python agent.py
```

You should see:
```
✅ Environment variables OK
🤖 Starting MCP Server for Databend
Open http://localhost:7777 to start chatting!
INFO Starting playground on http://127.0.0.1:7777
INFO: Started server process [189851]
INFO: Waiting for application startup.
INFO:agent:✓ Connected to Databend
INFO:agent:ChatBI initialized successfully
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit)
```

### Step 6: Set Up Web Interface

For a better user experience, you can set up Agno's web interface:

```bash
# Create the Agent UI
npx create-agent-ui@latest

# Enter 'y' when prompted, then run:
cd agent-ui && npm run dev
```

**Connect to Your Agent:**
1. Open [http://localhost:3000](http://localhost:3000)
2. Select "localhost:7777" as your endpoint
3. Start asking questions about your data!

**Try These Queries:**
- "Show me all databases"
- "What tables do I have?"
- "Describe the structure of my tables"
- "Run a query to show sample data"

## Resources

- **GitHub Repository**: [databendlabs/mcp-databend](https://github.com/databendlabs/mcp-databend)
- **PyPI Package**: [mcp-databend](https://pypi.org/project/mcp-databend)
- **Agno Framework**: [Agno MCP](https://docs.agno.com/tools/mcp/mcp)
- **Agent UI**: [Agent UI](https://docs.agno.com/agent-ui/introduction)
16 changes: 14 additions & 2 deletions docs/en/guides/51-ai-functions/index.md
@@ -1,6 +1,6 @@
# AI Functions in Databend
# AI & ML Integration

Databend provides AI and ML capabilities through external functions, allowing you to integrate custom AI models and leverage advanced ML infrastructure while maintaining data privacy and control.
Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language.

## External Functions - The Recommended Approach

@@ -14,6 +14,18 @@ External functions enable you to connect your data with custom AI/ML infrastruct
| **Scalability** | Independent scaling and resource optimization |
| **Flexibility** | Support for any programming language and ML framework |

## MCP Server - Natural Language Data Interaction

The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools.

| Feature | Benefits |
|---------|----------|
| **Natural Language** | Query your data using plain English |
| **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents |
| **Real-time Analysis** | Get instant insights from your data |

## Getting Started

**[External Functions Guide](01-external-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance

**[MCP Server Guide](02-mcp.md)** - Build a conversational BI tool using mcp-databend and natural language queries