diff --git a/docs/en/guides/51-ai-functions/01-external-functions.md b/docs/en/guides/51-ai-functions/01-external-functions.md index 9c717343b4..7a8218ed26 100644 --- a/docs/en/guides/51-ai-functions/01-external-functions.md +++ b/docs/en/guides/51-ai-functions/01-external-functions.md @@ -1,26 +1,28 @@ --- -title: External Functions for Custom AI/ML +title: Custom AI/ML with External Functions --- -# External Functions for Custom AI/ML +# Custom AI/ML with External Functions -For advanced AI/ML scenarios, Databend supports external functions that connect your data with custom AI/ML infrastructure written in languages like Python. +Build powerful AI/ML capabilities by connecting Databend with your own infrastructure. External functions let you deploy custom models, leverage GPU acceleration, and integrate with any ML framework while keeping your data secure. -| Feature | Description | Benefits | -|---------|-------------|----------| -| **Model Flexibility** | Use open-source models or your internal AI/ML infrastructure | • Freedom to choose any model
• Leverage existing ML investments
• Stay up-to-date with latest AI advancements | -| **GPU Acceleration** | Deploy external function servers on GPU-equipped machines | • Faster inference for deep learning models
• Handle larger batch sizes
• Support compute-intensive workloads | -| **Custom ML Models** | Deploy and use your own machine learning models | • Proprietary algorithms
• Domain-specific models
• Fine-tuned for your data | -| **Advanced AI Pipelines** | Build complex AI workflows with specialized libraries | • Multi-step processing
• Custom transformations
• Integration with ML frameworks | -| **Scalability** | Handle resource-intensive AI operations outside Databend | • Independent scaling
• Optimized resource allocation
• High-throughput processing | +## Key Capabilities -## Implementation Overview +| Feature | Benefits | +|---------|----------| +| **Custom Models** | Use any open-source or proprietary AI/ML models | +| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference | +| **Data Privacy** | Keep your data within your infrastructure | +| **Scalability** | Independent scaling and resource optimization | +| **Flexibility** | Support for any programming language and ML framework | -1. Create an external server with your AI/ML code (Python with [databend-udf](https://pypi.org/project/databend-udf)) -2. Register the server with Databend using `CREATE FUNCTION` -3. Call your AI/ML functions directly in SQL queries +## How It Works -## Example: Custom AI Model Integration +1. **Create AI Server**: Build your AI/ML server using Python and [databend-udf](https://pypi.org/project/databend-udf) +2. **Register Function**: Connect your server to Databend with `CREATE FUNCTION` +3. **Use in SQL**: Call your custom AI functions directly in SQL queries + +## Example: Text Embedding Function ```python # Simple embedding UDF server demo @@ -74,7 +76,7 @@ ORDER BY similarity ASC LIMIT 5; ``` -## Next Steps +## Learn More -1. **[External Functions Guide](/guides/query/external-function)** - Detailed setup instructions -2. **[Databend Cloud](https://databend.com)** - Try AI functions with a free trial \ No newline at end of file +- **[External Functions Guide](/guides/query/external-function)** - Complete setup and deployment instructions +- **[Databend Cloud](https://databend.com)** - Try external functions with a free account \ No newline at end of file diff --git a/docs/en/guides/51-ai-functions/02-mcp.md b/docs/en/guides/51-ai-functions/02-mcp.md new file mode 100644 index 0000000000..463878a7f3 --- /dev/null +++ b/docs/en/guides/51-ai-functions/02-mcp.md @@ -0,0 +1,227 @@ +# MCP Server for Databend + +[mcp-databend](https://github.com/databendlabs/mcp-databend) is an MCP (Model Context Protocol) server that enables AI assistants to interact directly with your Databend database using natural language. + +## What mcp-databend Can Do + +- **execute_sql** - Execute SQL queries with timeout protection +- **show_databases** - List all available databases +- **show_tables** - List tables in a database (with optional filter) +- **describe_table** - Get detailed table schema information + +## Build a ChatBI Tool + +This tutorial shows you how to build a conversational Business Intelligence tool using mcp-databend and the Agno framework. You'll create a local agent that can answer data questions in natural language. + +## Step-by-Step Tutorial + +### Step 1: Setup Databend Connection + +First, you need a Databend database to connect to: + +1. **Sign up for [Databend Cloud](https://app.databend.com)** (free tier available) +2. **Create a warehouse and database** +3. **Get your connection string** from the console + +For detailed DSN format and examples, see [Connection String Documentation](https://docs.databend.com/developer/drivers/#connection-string-dsn). + +| Deployment | Connection String Example | +|------------|---------------------------| +| **Databend Cloud** | `databend://user:pwd@host:443/database?warehouse=wh` | +| **Self-hosted** | `databend://user:pwd@localhost:8000/database?sslmode=disable` | + +### Step 2: Install Dependencies + +Create a virtual environment and install the required packages: + +```bash +# Create virtual environment +python3 -m venv .venv +source .venv/bin/activate + +# Install packages +pip install packaging openai agno openrouter sqlalchemy fastapi mcp-databend +``` + +### Step 3: Create ChatBI Agent + +Now create your ChatBI agent that uses mcp-databend to interact with your database. + +Create a file `agent.py`: + +```python +from contextlib import asynccontextmanager +import os +import logging +import sys + +from agno.agent import Agent +from agno.playground import Playground +from agno.storage.sqlite import SqliteStorage +from agno.tools.mcp import MCPTools +from agno.models.openrouter import OpenRouter +from fastapi import FastAPI + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +def check_env_vars(): + """Check required environment variables""" + required = { + "DATABEND_DSN": "https://docs.databend.com/developer/drivers/#connection-string-dsn", + "OPENROUTER_API_KEY": "https://openrouter.ai/settings/keys" + } + + missing = [var for var in required if not os.getenv(var)] + + if missing: + print("❌ Missing environment variables:") + for var in missing: + print(f" • {var}: {required[var]}") + print("\nExample: export DATABEND_DSN='...' OPENROUTER_API_KEY='...'") + sys.exit(1) + + print("✅ Environment variables OK") + +check_env_vars() + +class DatabendTool: + def __init__(self): + self.mcp = None + self.dsn = os.getenv("DATABEND_DSN") + + def create(self): + env = os.environ.copy() + env["DATABEND_DSN"] = self.dsn + self.mcp = MCPTools( + command="python -m mcp_databend", + env=env, + timeout_seconds=300 + ) + return self.mcp + + async def init(self): + try: + await self.mcp.connect() + logger.info("✓ Connected to Databend") + return True + except Exception as e: + logger.error(f"✗ Databend connection failed: {e}") + return False + +databend = DatabendTool() + +agent = Agent( + name="ChatBI", + model=OpenRouter( + id=os.getenv("MODEL_ID", "anthropic/claude-sonnet-4"), + api_key=os.getenv("OPENROUTER_API_KEY") + ), + tools=[], + instructions=[ + "You are ChatBI - a Business Intelligence assistant for Databend.", + "Help users explore and analyze their data using natural language.", + "Always start by exploring available databases and tables.", + "Format query results in clear, readable tables.", + "Provide insights and explanations with your analysis." + ], + storage=SqliteStorage(table_name="chatbi", db_file="chatbi.db"), + add_datetime_to_instructions=True, + add_history_to_messages=True, + num_history_responses=5, + markdown=True, + show_tool_calls=True, +) + +@asynccontextmanager +async def lifespan(app: FastAPI): + tool = databend.create() + if not await databend.init(): + logger.error("Failed to initialize Databend") + raise RuntimeError("Databend connection failed") + + agent.tools.append(tool) + logger.info("ChatBI initialized successfully") + + yield + + if databend.mcp: + await databend.mcp.close() + +playground = Playground( + agents=[agent], + name="ChatBI with Databend", + description="Business Intelligence Assistant powered by Databend" +) + +app = playground.get_app(lifespan=lifespan) + +if __name__ == "__main__": + print("🤖 Starting MCP Server for Databend") + print("Open http://localhost:7777 to start chatting!") + playground.serve(app="agent:app", host="127.0.0.1", port=7777) +``` + +### Step 4: Configure Environment + +Set up your API keys and database connection: + +```bash +# Set your OpenRouter API key +export OPENROUTER_API_KEY="your-openrouter-key" + +# Set your Databend connection string +export DATABEND_DSN="your-databend-connection-string" +``` + +### Step 5: Start Your ChatBI Agent + +Run your agent to start the local server: + +```bash +python agent.py +``` + +You should see: +``` +✅ Environment variables OK +🤖 Starting MCP Server for Databend +Open http://localhost:7777 to start chatting! +INFO Starting playground on http://127.0.0.1:7777 +INFO: Started server process [189851] +INFO: Waiting for application startup. +INFO:agent:✓ Connected to Databend +INFO:agent:ChatBI initialized successfully +INFO: Application startup complete. +INFO: Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit) +``` + +### Step 6: Setup Web Interface + +For a better user experience, you can set up Agno's web interface: + +```bash +# Create the Agent UI +npx create-agent-ui@latest + +# Enter 'y' when prompted, then run: +cd agent-ui && npm run dev +``` + +**Connect to Your Agent:** +1. Open [http://localhost:3000](http://localhost:3000) +2. Select "localhost:7777" as your endpoint +3. Start asking questions about your data! + +**Try These Queries:** +- "Show me all databases" +- "What tables do I have?" +- "Describe the structure of my tables" +- "Run a query to show sample data" + +## Resources + +- **GitHub Repository**: [databendlabs/mcp-databend](https://github.com/databendlabs/mcp-databend) +- **PyPI Package**: [mcp-databend](https://pypi.org/project/mcp-databend) +- **Agno Framework**: [Agno MCP](https://docs.agno.com/tools/mcp/mcp) +- **Agent UI**: [Agent UI](https://docs.agno.com/agent-ui/introduction) \ No newline at end of file diff --git a/docs/en/guides/51-ai-functions/index.md b/docs/en/guides/51-ai-functions/index.md index 710b80393a..4436880483 100644 --- a/docs/en/guides/51-ai-functions/index.md +++ b/docs/en/guides/51-ai-functions/index.md @@ -1,6 +1,6 @@ -# AI Functions in Databend +# AI & ML Integration -Databend provides AI and ML capabilities through external functions, allowing you to integrate custom AI models and leverage advanced ML infrastructure while maintaining data privacy and control. +Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language. ## External Functions - The Recommended Approach @@ -14,6 +14,18 @@ External functions enable you to connect your data with custom AI/ML infrastruct | **Scalability** | Independent scaling and resource optimization | | **Flexibility** | Support for any programming language and ML framework | +## MCP Server - Natural Language Data Interaction + +The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools. + +| Feature | Benefits | +|---------|----------| +| **Natural Language** | Query your data using plain English | +| **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents | +| **Real-time Analysis** | Get instant insights from your data | + ## Getting Started **[External Functions Guide](01-external-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance + +**[MCP Server Guide](02-mcp.md)** - Build a conversational BI tool using mcp-databend and natural language queries