# Using Foundation Models on Databricks: A Comprehensive Guide

Foundation models have changed how teams approach AI and machine learning tasks. Databricks lets you query and deploy these models through its Model Serving infrastructure. This guide covers the query methods, setup steps, model types, and advanced features involved in using foundation models on Databricks.

## 🎯 What Are Foundation Models on Databricks?

Foundation models are large-scale AI models that can be applied to a wide range of tasks. Databricks offers:
- **Databricks-hosted models** through Foundation Model APIs
- **External models** from providers like OpenAI, Anthropic, and Google
- **Unified OpenAI-compatible API** for consistent querying across all models

---

## 🔧 Query Options Available

Databricks offers multiple methods to interact with foundation models:

### 1. **OpenAI Client**
- Query endpoints using the familiar OpenAI SDK
- Specify the model serving endpoint name as the model input
- Supports chat, embeddings, and completions models
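
As a minimal sketch of this approach, the helpers below point the OpenAI SDK at a workspace and pass the serving endpoint name as `model`. The helper names (`serving_base_url`, `make_client`, `ask`) and the placeholder host and token are illustrative assumptions, and the `openai` package must be installed before `make_client` is called.

```python
def serving_base_url(host: str) -> str:
    """Build the OpenAI-compatible base URL for a workspace host."""
    return host.rstrip("/") + "/serving-endpoints"


def make_client(host: str, token: str):
    """Return an OpenAI client that talks to the workspace's serving endpoints."""
    from openai import OpenAI  # imported lazily so serving_base_url works without the package

    return OpenAI(api_key=token, base_url=serving_base_url(host))


def ask(client, endpoint_name: str, prompt: str, max_tokens: int = 256) -> str:
    """Send one user message and return the text of the first choice."""
    response = client.chat.completions.create(
        model=endpoint_name,  # the serving endpoint name stands in for the model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


# Example call (needs network access and real credentials):
#   client = make_client("https://<your-workspace-host>", "<your-token>")
#   print(ask(client, "databricks-meta-llama-3-3-70b-instruct", "What is a foundation model?"))
```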

### 2. **AI Functions (SQL)**
- Use `ai_query()` SQL function for direct model inference
- Ideal for data analysts working in SQL environments
- Seamlessly integrate AI into your data pipelines
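
To keep the examples in one language, here is a rough Python sketch that builds an `ai_query()` statement as a string, which could then be run with `spark.sql` in a notebook. The `reviews` table, the `review_text` column, and both helper functions are hypothetical, and the quoting assumes the default backslash-escaped string literals.

```python
def sql_string(value: str) -> str:
    """Quote a Python string as a SQL string literal (backslash-escaped)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ai_query_sql(endpoint_name: str, table: str, text_column: str, instruction: str) -> str:
    """Build a SELECT that sends `instruction` plus each row's text to ai_query()."""
    # `table` and `text_column` are interpolated as-is, so pass only trusted identifiers.
    request = f"CONCAT({sql_string(instruction)}, {text_column})"
    return (
        f"SELECT {text_column}, "
        f"ai_query({sql_string(endpoint_name)}, {request}) AS answer "
        f"FROM {table}"
    )


query = ai_query_sql(
    "databricks-meta-llama-3-3-70b-instruct", "reviews", "review_text", "Summarize: "
)
print(query)
# In a Databricks notebook, where `spark` is predefined, run it with: spark.sql(query)
```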

### 3. **Serving UI**
- Web-based interface for testing queries
- Insert JSON format input data
- Load pre-logged input examples
- Perfect for quick experimentation

### 4. **REST API**
- Standard HTTP API calls
- POST to `/serving-endpoints/{name}/invocations`
- Programmatic access for production applications
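
The REST call can be sketched with only the standard library. The function below builds the POST request without sending it. The host, endpoint name, and token are placeholders, and the chat-style payload is an assumption that fits chat endpoints only.

```python
import json
import urllib.request


def build_invocation_request(host, endpoint_name, token, prompt, max_tokens=256):
    """Build (but do not send) a POST to /serving-endpoints/{name}/invocations."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder values; nothing is sent until send(req) is called.
req = build_invocation_request(
    "https://example.cloud.databricks.com", "my-chat-endpoint", "dummy-token", "Hello"
)
print(req.get_method(), req.full_url)
```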

### 5. **MLflow Deployments SDK**
- Use the `predict()` function
- Python-native interface
- Integrated with the MLflow ecosystem
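
A short sketch of the `predict()` route follows. It assumes `mlflow` is installed and Databricks credentials are already configured. The `chat_inputs` and `predict_chat` helpers and the default parameter values are illustrative, not prescribed by the source.

```python
def chat_inputs(prompt: str, max_tokens: int = 256, temperature: float = 0.1) -> dict:
    """Build the `inputs` dictionary for a chat endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def predict_chat(endpoint_name: str, prompt: str):
    """Query a serving endpoint through the MLflow Deployments client."""
    from mlflow.deployments import get_deploy_client  # lazy import; requires mlflow

    client = get_deploy_client("databricks")
    return client.predict(endpoint=endpoint_name, inputs=chat_inputs(prompt))


# Example call (needs credentials and network access):
#   predict_chat("databricks-meta-llama-3-3-70b-instruct", "What is MLflow?")
```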

### 6. **Databricks Python SDK**
- High-level abstraction over REST API
- Automatic authentication handling
- Simplified development experience
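
The sketch below shows roughly what a query through the Databricks Python SDK looks like. It assumes `databricks-sdk` is installed and that `WorkspaceClient()` can resolve credentials on its own. The helper names are illustrative, and the stand-in response at the end only demonstrates the response shape offline.

```python
from types import SimpleNamespace


def first_reply(response) -> str:
    """Return the text of the first choice in a chat response."""
    return response.choices[0].message.content


def query_chat(endpoint_name: str, prompt: str, max_tokens: int = 128) -> str:
    """Query a chat endpoint through the SDK's serving_endpoints API."""
    from databricks.sdk import WorkspaceClient  # lazy import; requires databricks-sdk
    from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

    w = WorkspaceClient()  # picks up credentials from the environment or notebook
    response = w.serving_endpoints.query(
        name=endpoint_name,
        messages=[ChatMessage(role=ChatMessageRole.USER, content=prompt)],
        max_tokens=max_tokens,
    )
    return first_reply(response)


# Offline stand-in with the same attribute layout as a real response.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="hi there"))]
)
print(first_reply(fake_response))
```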

---

## 📦 Installation & Setup

### Package Requirements by Method:

**For OpenAI Client:**
```python
!pip install "databricks-sdk[openai]>=0.35.0"
```

**For MLflow Deployments:**
```python
!pip install mlflow
```

**For Databricks SDK:**
- Pre-installed on Databricks Runtime 13.3 LTS and above
- Manual installation required for Runtime 12.2 LTS and below

### 🔐 Authentication Best Practices:
- **Production**: Use machine-to-machine OAuth tokens
- **Development/Testing**: Use a personal access token that belongs to a service principal rather than a workspace user
- Avoid using personal user tokens in production

---

## 🤖 Foundation Model Types & Use Cases

### 1. **General Purpose (Chat Models)**

**Supported Models:**
- `databricks-gpt-5-1`, `databricks-gpt-5`, `databricks-gpt-5-mini`, `databricks-gpt-5-nano`
- `databricks-gemini-3-pro`, `databricks-gemini-2-5-pro`, `databricks-gemini-2-5-flash`
- `databricks-claude-sonnet-4-5`, `databricks-claude-opus-4-1`
- `databricks-llama-4-maverick`, `databricks-meta-llama-3-3-70b-instruct`
- External: OpenAI GPT, Anthropic Claude, Google Gemini

**Best Use Cases:**
- Virtual assistants and chatbots
- Customer support automation
- Interactive tutoring systems
- Multi-turn dialogue applications

---

### 2. **Embeddings Models**

**Supported Models:**
- `databricks-gte-large-en`
- `databricks-bge-large-en`
- External: OpenAI, Cohere, Google text embeddings

**Best Use Cases:**
- Semantic search engines
- Retrieval Augmented Generation (RAG)
- Topic clustering and classification
- Sentiment analysis
- Document similarity comparison

**Key Benefit:** Transform complex data into compact numerical vectors for efficient comparison and analysis
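
To make the "efficient comparison" concrete, here is a small sketch of cosine similarity over embedding vectors. The three-dimensional vectors are toy values chosen for illustration. Real embeddings would come from an embeddings endpoint and have many more dimensions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs: dict) -> list:
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )


# Toy vectors standing in for real embeddings.
docs = {"a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.5, 0.5, 0.0]}
ranking = rank_by_similarity([1.0, 0.0, 0.0], docs)
print(ranking)

# With the OpenAI client, real vectors could be fetched with something like:
#   client.embeddings.create(model="databricks-gte-large-en", input=["some text"])
```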

---

### 3. **Vision Models**

**Supported Models:**
- GPT-5 series with vision capabilities
- Gemini series with vision
- Claude series with vision
- Gemma-3-12b, Llama-4-maverick

**Best Use Cases:**
- Object detection and recognition
- Image classification and segmentation
- Document understanding and OCR
- Visual content analysis
- Medical imaging analysis
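
For vision-capable chat endpoints, an image is typically sent as part of a user message. The sketch below assumes the OpenAI-style `image_url` content block with a base64 data URL. The `image_message` helper is illustrative, and the three bytes used here are only a stand-in for real image data.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Stand-in bytes; in practice read them from a file opened in binary mode.
message = image_message("Describe this image.", b"\xff\xd8\xff")
print(message["content"][1]["image_url"]["url"])
```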

---

### 4. **Reasoning Models**

**Supported Models:**
- Advanced GPT-5 series
- Gemini Pro series
- Claude Sonnet and Opus series
- Specialized reasoning variants

**Best Use Cases:**
- Complex code generation
- Content creation and summarization
- Agent orchestration
- Multi-step problem solving
- Logical inference tasks

**Key Feature:** Work through a problem in intermediate steps before answering, which makes the path to a result easier to inspect

---

## 🚀 Advanced Features

### ⚡ Function Calling
- OpenAI-compatible function calling
- Available for Foundation Model APIs and external models
- Enable models to interact with external tools and APIs
- Perfect for building AI agents
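
A minimal sketch of the function-calling loop, assuming the OpenAI-style `tools` schema and tool-call shape. The `get_weather` tool, the registry, and the sample tool call are all hypothetical. In a real exchange the tool call would come back from the model rather than being written by hand.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Tool definition passed to the model in the `tools` request parameter.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and wrap the result as a `tool` message."""
    function = REGISTRY[tool_call["function"]["name"]]
    arguments = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": function(**arguments),
    }


# Hand-written stand-in for a tool call returned by the model.
sample_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
result = run_tool_call(sample_call)
print(result)
```

The resulting `tool` message is appended to the conversation and sent back so the model can produce its final answer.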

### 📊 Structured Outputs
- Enforce specific output formats
- JSON schema validation
- Available for Foundation Model APIs
- Ensures predictable, parseable responses
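
As a sketch of how a structured-output request might look, the block below defines an OpenAI-style `response_format` with a JSON schema and a small parser for the reply. The `ticket_summary` schema and the `parse_reply` helper are made up for illustration. The client-side key check is a lightweight sanity check, not full schema validation.

```python
import json

# Passed as the `response_format` parameter of a chat request.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_summary",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["title", "priority"],
        },
        "strict": True,
    },
}


def parse_reply(content: str) -> dict:
    """Parse the model's JSON reply and confirm the required keys are present."""
    data = json.loads(content)
    required = RESPONSE_FORMAT["json_schema"]["schema"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


# Hand-written stand-in for the `content` of a model reply.
parsed = parse_reply('{"title": "Login fails", "priority": "high"}')
print(parsed)
```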

### 💾 Prompt Caching
**Supported for Databricks-hosted Claude models**

**Cacheable Components:**
- Text content in messages
- Thinking messages content
- Image content blocks
- Tool use definitions and results

**Cache Control Example:**
```json
{
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "What's the date today?",
      "cache_control": {"type": "ephemeral"}
    }]
  }]
}
```

**Benefits:**
- Reduced latency for repeated prompts
- Cost optimization for similar queries
- Improved performance for RAG applications
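
In a RAG-style request, the large shared context is the natural candidate for caching, while the question changes on every call. The sketch below builds such a message using the same `cache_control` field as the JSON example above. The helper name and the split into two text blocks are assumptions for illustration.

```python
def cached_context_messages(context: str, question: str) -> list:
    """Build a user message whose reusable context block is marked for caching."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": context,  # long, repeated content benefits most from caching
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": question},  # varies per request, left uncached
            ],
        }
    ]


messages = cached_context_messages(
    "<long reference document>", "What does section 2 say?"
)
print(messages[0]["content"][0]["cache_control"])
```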

---

## 🎮 AI Playground

Interactive chat-like environment for:
- Testing and experimenting with LLMs
- Comparing different models side-by-side
- Prompt engineering and optimization
- No code required

**Access:** Available directly in your Databricks workspace

---

## 📋 Prerequisites & Requirements

### ✅ Must Have:
1. Active model serving endpoint
2. Databricks workspace in a supported region
3. Databricks API token (a personal access token for a user or service principal)

### 🌍 Regional Availability:
- Check Foundation Model APIs supported regions
- Verify external models regional support
- Ensure your workspace is in a compatible region

---

## 🔄 Important API Differences

### Databricks vs. Anthropic REST API:
- **Output field**: `choices` (not `content`)
- **Stop reasons**: `stop`, `length`, `tool_calls` (OpenAI-style values, rather than Anthropic's `end_turn`, `max_tokens`, and `tool_use`)
- **Streaming format**: Consistent chunk format with usage in every chunk
- **OpenAI-compatible**: Ensures broader ecosystem compatibility
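
In practice, these differences mean client code reads the OpenAI-style fields even when the endpoint serves a Claude model. The sketch below parses a response body along those lines. The sample body is hand-written for illustration and the `read_chat_response` helper is hypothetical.

```python
def read_chat_response(body: dict) -> dict:
    """Extract the reply text and interpret the OpenAI-style finish reason."""
    choice = body["choices"][0]  # output lives under `choices`, not `content`
    reason = choice.get("finish_reason")
    return {
        "text": choice["message"].get("content"),
        "finish_reason": reason,
        "truncated": reason == "length",  # hit the token limit
        "wants_tool": reason == "tool_calls",  # model asked to call a tool
    }


# Hand-written stand-in for a response body.
sample_body = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}
summary = read_chat_response(sample_body)
print(summary)
```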

---

## 📚 Additional Resources & Next Steps

### Monitoring & Operations:
- Monitor with AI Gateway-enabled inference tables
- Deploy batch inference pipelines
- Set up model endpoint management

### Learning Resources:
- Foundation Model APIs documentation
- External models integration guides
- OpenAI models tutorial
- REST API reference documentation

### Model Information:
- Browse supported Databricks-hosted models
- Review acceptable use policies
- Check mitigation requirements (OpenAI models)
- Understand gen AI model maintenance policy

---

## ⚠️ Important Notes

### Model Retirement Notice:
- **Meta-Llama-3.1-405B-Instruct** will be retired:
  - February 15, 2026: Pay-per-token workloads
  - May 15, 2026: Provisioned throughput workloads
- Check documentation for recommended replacement models
- Plan migration during deprecation period

---

## 🎯 Key Takeaways

1. **Flexibility**: Multiple query methods suit different use cases and workflows
2. **Compatibility**: OpenAI-compatible API enables easy migration and integration
3. **Variety**: Wide range of model types for diverse AI tasks
4. **Advanced Features**: Function calling, structured outputs, and prompt caching
5. **Enterprise-Ready**: Strong authentication, monitoring, and governance features
6. **Unified Platform**: Single interface for both hosted and external models

---

**Get Started Today**: Set up your model serving endpoint and start querying foundation models through your preferred method. The unified API makes it easy to experiment and move to production quickly.

[Source](https://docs.databricks.com/aws/en/machine-learning/model-serving/score-foundation-models)