# Module 5: Mosaic AI Vector Search in Databricks

## Learning Objectives
- Understand Mosaic AI Vector Search features and capabilities
- Learn how vector search works in Databricks
- Set up vector search endpoints and indexes
- Integrate vector search with the Databricks ecosystem
- Implement vector search in RAG applications


## 1. Introduction to Mosaic AI Vector Search in Databricks

### 1.1 What is Mosaic AI Vector Search?

**Mosaic AI Vector Search** is Databricks' managed vector database service. Its main characteristics are:

- **Tightly integrated with Lakehouse**: Direct integration with Delta Lake
- **Scalable vector representation**: Handle millions of vectors
- **Metadata support**: Store and filter by metadata
- **Low latency production service**: Sub-second query times
- **Zero operational overhead**: Fully managed service
- **ACL using Unity Catalog**: Fine-grained access control
- **API for real-time similarity search**: REST API and Python client
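
To make "similarity search with metadata filtering" concrete, here is a minimal brute-force sketch in plain Python. It only illustrates the idea and is not how the managed service is implemented; the toy vectors and the names `cosine_similarity` and `search` are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(records, query_vector, num_results=2, filters=None):
    """Return the top records by cosine similarity after exact-match metadata filtering."""
    filters = filters or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["vector"], query_vector),
        reverse=True,
    )
    return ranked[:num_results]

# Toy data: three chunks with made-up 3-dimensional embeddings.
records = [
    {"id": "chunk_001", "vector": [1.0, 0.0, 0.0], "metadata": {"section": "Introduction"}},
    {"id": "chunk_002", "vector": [0.9, 0.1, 0.0], "metadata": {"section": "Setup"}},
    {"id": "chunk_003", "vector": [0.0, 1.0, 0.0], "metadata": {"section": "Introduction"}},
]

top = search(records, [1.0, 0.0, 0.0], num_results=2, filters={"section": "Introduction"})
print([r["id"] for r in top])
```

The filter removes `chunk_002` before ranking, even though its vector is close to the query. This is the behavior to expect from metadata filters in general: they restrict the candidate set, and similarity decides the order within it.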

### 1.2 Key Features

#### Tightly Integrated with Lakehouse

- **Delta Lake Integration**: Vectors stored alongside data in Delta tables
- **Automatic Sync**: Changes in Delta tables propagate to the vector index, either continuously (`CONTINUOUS` pipeline) or whenever a sync is triggered (`TRIGGERED` pipeline)
- **Unified Governance**: Unity Catalog manages access and lineage

#### Scalable Vector Representation

- **Scale**: Handle millions to billions of vectors
- **Performance**: Low-latency queries even at scale
- **Metadata**: Store rich metadata with vectors

#### Production-Ready Service

- **Low Latency**: Sub-second query response times
- **High Availability**: Built for production workloads
- **Managed Service**: No infrastructure management needed

#### Security and Governance

- **Unity Catalog Integration**: Fine-grained access control
- **Index-level ACL**: Control access at index level
- **Audit Logging**: Track all access and queries


## 2. How Does Vector Search Work?

Mosaic AI Vector Search supports three methods for working with vectors:

### Method 1: Delta Sync API with Managed Embeddings

**Approach**: Databricks automatically computes embeddings

**Flow**:
1. Store chunks in Delta table
2. Create vector search index pointing to Delta table
3. Databricks automatically:
   - Computes embeddings using specified model
   - Syncs to vector index
   - Keeps index updated as Delta table changes

**Benefits**:
- **Simplest**: No embedding code needed
- **Automatic**: Embeddings computed and synced automatically
- **Always in sync**: Index stays updated with Delta table

**Use Case**: When you want Databricks to handle embeddings

### Method 2: Delta Sync API with Self-Managed Embeddings

**Approach**: You compute embeddings, Databricks syncs them

**Flow**:
1. Compute embeddings yourself (using your own code/model)
2. Store embeddings in Delta table
3. Create vector search index pointing to Delta table
4. Databricks automatically syncs embeddings to vector index

**Benefits**:
- **Control**: You control embedding computation
- **Flexibility**: Use any embedding model or approach
- **Automatic Sync**: Index stays updated automatically

**Use Case**: When you need custom embedding logic
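
Steps 1 and 2 of this flow can be sketched as follows. This is a minimal illustration, not a recommended model: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words) for whatever embedding model you actually use, and the 8-value dimension is chosen only to keep the output readable.

```python
import hashlib
import math

EMBEDDING_DIM = 8  # toy dimension, chosen only for this example

def toy_embed(text, dim=EMBEDDING_DIM):
    """Hypothetical stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vector = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

def add_embeddings(chunks):
    """Return new rows with an `embedding_vector` column added to each chunk."""
    return [{**chunk, "embedding_vector": toy_embed(chunk["chunk_text"])} for chunk in chunks]

# Made-up sample chunks.
chunks = [
    {"chunk_id": "chunk_001", "chunk_text": "RAG is a pattern for grounding model answers"},
    {"chunk_id": "chunk_002", "chunk_text": "Delta tables store chunks and embeddings"},
]
rows = add_embeddings(chunks)
print(len(rows[0]["embedding_vector"]))
```

In a notebook, rows like these would then be written to a Delta table. Every vector in the column must have the same length, because that length becomes the embedding dimension of the index.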

### Method 3: Direct Access CRUD API

**Approach**: Direct access to vector index

**Flow**:
1. Directly insert/update/delete vectors in index
2. Bypass Delta table sync
3. Full control over index operations

**Benefits**:
- **Direct Control**: Full control over index
- **Real-time Updates**: Immediate index updates
- **Flexibility**: Custom update patterns

**Use Case**: When you need real-time updates or custom workflows
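
The insert/update/delete semantics can be illustrated with a small in-memory class. This is a conceptual sketch only: `ToyDirectIndex` is a hypothetical name, it ranks by plain dot product, and it does not reflect how the service stores or searches vectors.

```python
class ToyDirectIndex:
    """In-memory illustration of upsert/delete/query semantics keyed by a primary key."""

    def __init__(self, primary_key, dimension):
        self.primary_key = primary_key
        self.dimension = dimension
        self._rows = {}

    def upsert(self, rows):
        """Insert new rows or overwrite existing rows that share a primary key."""
        for row in rows:
            if len(row["embedding_vector"]) != self.dimension:
                raise ValueError("embedding has the wrong dimension")
            self._rows[row[self.primary_key]] = row

    def delete(self, primary_keys):
        """Remove rows by primary key; unknown keys are ignored."""
        for key in primary_keys:
            self._rows.pop(key, None)

    def query(self, query_vector, num_results=5):
        """Rank rows by dot product with the query vector, highest first."""
        def score(row):
            return sum(a * b for a, b in zip(row["embedding_vector"], query_vector))
        return sorted(self._rows.values(), key=score, reverse=True)[:num_results]

index = ToyDirectIndex(primary_key="chunk_id", dimension=2)
index.upsert([
    {"chunk_id": "a", "embedding_vector": [1.0, 0.0]},
    {"chunk_id": "b", "embedding_vector": [0.0, 1.0]},
])
index.upsert([{"chunk_id": "a", "embedding_vector": [0.0, 1.0]}])  # overwrites row "a"
index.delete(["b"])
print([row["chunk_id"] for row in index.query([0.0, 1.0])])
```

The `databricks-vectorsearch` client exposes analogous operations on a Direct Access index; consult its reference for the exact method signatures before relying on them.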

### Comparison

| Method | Embedding Computation | Sync | Use Case |
|--------|----------------------|------|----------|
| Method 1 | Databricks (managed) | Automatic | Simplest, standard use cases |
| Method 2 | You (self-managed) | Automatic | Custom embedding logic |
| Method 3 | You (self-managed) | Manual | Real-time, custom workflows |


## 3. Setting Up Vector Search

### Step 1: Create a Vector Search Endpoint

**Vector Search Endpoint**: Compute resource associated with vector search

**Purpose**: 
- Provides compute for vector search operations
- Manages vector indexes
- Handles query processing

**Configuration**:
- Choose an endpoint type (for example, `STANDARD`) based on workload
- Configure for your scale and latency requirements

**Example**:
```python
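# Create a vector search endpoint (requires the databricks-vectorsearch package)
from databricks.vector_search.client import VectorSearchClient

endpoint_name = "my-vector-search-endpoint"
client = VectorSearchClient()  # picks up workspace credentials in a notebook
client.create_endpoint(name=endpoint_name, endpoint_type="STANDARD")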
```
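
A newly created endpoint is not usable until provisioning finishes, so setup scripts usually wait for it. The helper below is a generic polling sketch, with assumptions stated plainly: `wait_until_online` is a hypothetical name, and the state strings `"PROVISIONING"` and `"ONLINE"` are assumed values that should be checked against what your workspace reports.

```python
import time

def wait_until_online(get_state, timeout_s=600.0, poll_interval_s=10.0, sleep=time.sleep):
    """Poll `get_state()` until it returns "ONLINE" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ONLINE":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint still in state {state!r} after {timeout_s}s")
        sleep(poll_interval_s)

# Simulated endpoint that becomes ready on the third poll.
states = iter(["PROVISIONING", "PROVISIONING", "ONLINE"])
print(wait_until_online(lambda: next(states), sleep=lambda _: None))
```

In a workspace, `get_state` could wrap a call that fetches the endpoint description from the client and extracts its state field; the exact response shape is not shown here and should be taken from the client documentation.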

### Step 2: Create Model Serving Endpoint (if using managed embeddings)

**Model Serving Endpoint**: Hosts embedding models for automatic embedding computation

**Supported Models**:
- **Foundation Model APIs**: BGE, E5, etc.
- **External Models**: OpenAI `text-embedding-ada-002`, etc.
- **Custom Models**: Your own embedding models

**Configuration**:
- Select embedding model
- Configure model parameters
- Set up scaling

**Example**:
```python
# Create model serving endpoint for embeddings
embedding_endpoint = "my-embedding-endpoint"
# Configure with BGE or other embedding model
```

### Step 3: Create a Vector Search Index

**Vector Search Index**: Created and auto-synced from Delta table

**Process**:
1. Define source Delta table (with chunks or embeddings)
2. Configure index settings:
   - Embedding model (if using managed embeddings)
   - Index type (`DELTA_SYNC` or `DIRECT_ACCESS`) and, for Delta Sync, pipeline type (`TRIGGERED` or `CONTINUOUS`)
   - Metadata fields
3. Create index - Databricks handles the rest

**Index-Level ACL**: Control access at index level using Unity Catalog

**Example**:
```python
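# Names used when creating the index
index_name = "catalog.schema.my_vector_index"  # indexes are Unity Catalog objects
source_table = "catalog.schema.chunks_table"

# Option A: managed embeddings (Databricks embeds the text column).
# embedding_endpoint is the model serving endpoint name from Step 2.
managed_config = {
    "primary_key": "chunk_id",
    "embedding_source_column": "chunk_text",
    "embedding_model_endpoint_name": embedding_endpoint,
}

# Option B: self-managed embeddings (the table already holds vectors)
self_managed_config = {
    "primary_key": "chunk_id",
    "embedding_vector_column": "embedding_vector",  # array<float> column
    "embedding_dimension": 1024,  # assumption: set to your model's output size
}
# Other columns of the source table are synced as metadata and can be filtered on.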
```
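
Managed and self-managed embeddings are mutually exclusive for a given index, so it can help to check a configuration before submitting it. The function below is a small sketch of such a check. `validate_index_config` is a hypothetical helper, and the key names are assumed to follow the keyword arguments of `create_delta_sync_index` in the `databricks-vectorsearch` client.

```python
def validate_index_config(config):
    """Return "managed" or "self-managed"; raise ValueError for an ambiguous or incomplete config."""
    if not config.get("primary_key"):
        raise ValueError("primary_key is required")
    managed = "embedding_source_column" in config
    self_managed = "embedding_vector_column" in config
    if managed == self_managed:
        raise ValueError("set exactly one of embedding_source_column or embedding_vector_column")
    if managed and not config.get("embedding_model_endpoint_name"):
        raise ValueError("managed embeddings need embedding_model_endpoint_name")
    if self_managed and not config.get("embedding_dimension"):
        raise ValueError("self-managed embeddings need embedding_dimension")
    return "managed" if managed else "self-managed"

mode = validate_index_config({
    "primary_key": "chunk_id",
    "embedding_source_column": "chunk_text",
    "embedding_model_endpoint_name": "my-embedding-endpoint",
})
print(mode)
```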


## 4. Setting Up Vector Search in Databricks - Detailed Steps

### 4.1 Prerequisites

**Required**:
- Databricks workspace with Vector Search enabled
- Unity Catalog enabled
- Delta table with document chunks
- Appropriate permissions

### 4.2 Complete Setup Workflow

#### Step 1: Prepare Your Data

**Create Delta Table with Chunks**:
```python
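from pyspark.sql import Row

# Example: chunks table structure.
# Metadata is kept in flat columns so it can be used in query filters.
chunks_table = spark.createDataFrame([
    Row(
        chunk_id="chunk_001",
        document_id="doc_123",
        chunk_text="RAG is a pattern...",
        section="Introduction",
        page=1,
    ),
    # ... more chunks
])

# Write to Delta table
chunks_table.write.format("delta").saveAsTable("catalog.schema.chunks")

# Delta Sync indexes require Change Data Feed on the source table
spark.sql(
    "ALTER TABLE catalog.schema.chunks "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)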
```
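
The rows of a chunks table have to come from somewhere. The sketch below shows one simple way to produce them: fixed-size character windows with overlap. The function name `chunk_text`, the ID format, and the default sizes are assumptions for illustration; real pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_text(document_id, text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows and return one row per chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    rows = []
    for start in range(0, len(text), step):
        rows.append({
            "chunk_id": f"{document_id}_chunk_{len(rows) + 1:03d}",
            "document_id": document_id,
            "chunk_text": text[start:start + chunk_size],
        })
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return rows

rows = chunk_text("doc_123", "x" * 450, chunk_size=200, overlap=50)
print([len(r["chunk_text"]) for r in rows])
```

Each window starts `chunk_size - overlap` characters after the previous one, so neighboring chunks share `overlap` characters of context. The generated `chunk_id` values are unique within a document, which is what the index primary key needs.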

#### Step 2: Create Vector Search Endpoint

**Using UI or API**:
```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Create endpoint
endpoint = client.create_endpoint(
    name="my-endpoint",
    endpoint_type="STANDARD"
)
```

#### Step 3: Create Embedding Model Endpoint (if using managed embeddings)

**Using Model Serving**:
```python
# Create model serving endpoint for embeddings
# Configure with BGE, OpenAI, or custom model
```

#### Step 4: Create Vector Search Index

**Using Python Client**:
```python
# Create index with managed embeddings
index = client.create_delta_sync_index(
    endpoint_name="my-endpoint",
    index_name="catalog.schema.my_index",  # fully qualified Unity Catalog name
    source_table_name="catalog.schema.chunks",
    primary_key="chunk_id",
    pipeline_type="TRIGGERED",
    embedding_source_column="chunk_text",
    embedding_model_endpoint_name="my-embedding-endpoint",
)
```

**Or with self-managed embeddings**:
```python
# Create index with pre-computed embeddings
index = client.create_delta_sync_index(
    endpoint_name="my-endpoint",
    index_name="catalog.schema.my_index",  # fully qualified Unity Catalog name
    source_table_name="catalog.schema.chunks_with_embeddings",
    primary_key="chunk_id",
    pipeline_type="TRIGGERED",
    embedding_vector_column="embedding_vector",
    embedding_dimension=1024,  # assumption: must equal your model's output size
)
```

### 4.3 Query the Index

**Using Python Client**:
```python
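# Get a handle to the index, then query it with text (managed embeddings)
index = client.get_index(
    endpoint_name="my-endpoint",
    index_name="catalog.schema.my_index",
)
results = index.similarity_search(
    query_text="What is RAG?",
    columns=["chunk_id", "chunk_text"],  # columns to return
    num_results=5,
)

# Or with a query vector (self-managed embeddings)
query_vector = embed("What is RAG?")  # embed() is a placeholder for your own embedding function
results = index.similarity_search(
    query_vector=query_vector,
    columns=["chunk_id", "chunk_text"],
    num_results=5,
    filters={"section": "Introduction"},  # metadata filtering
)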
```
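
Query results come back as a nested structure rather than a list of records, so a small conversion step is common. The sketch below assumes the response holds column names under `manifest.columns` and row values under `result.data_array`; that shape is an assumption to verify against a real response, and `rows_from_response` is a hypothetical helper name. The sample payload is hand-written, not real output.

```python
def rows_from_response(response):
    """Pair column names from the manifest with each row of `data_array`."""
    columns = [col["name"] for col in response["manifest"]["columns"]]
    data = response.get("result", {}).get("data_array") or []
    return [dict(zip(columns, row)) for row in data]

# Hand-written sample in the assumed response shape (not real output).
sample = {
    "manifest": {"columns": [{"name": "chunk_id"}, {"name": "chunk_text"}, {"name": "score"}]},
    "result": {
        "row_count": 2,
        "data_array": [
            ["chunk_001", "RAG is a pattern...", 0.82],
            ["chunk_007", "Retrieval adds context...", 0.64],
        ],
    },
}

for row in rows_from_response(sample):
    print(row["chunk_id"], row["score"])
```

Converting to dictionaries keeps downstream RAG code independent of column order, which changes whenever the requested `columns` list changes.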

**Using REST API**:
```python
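import requests

# workspace_url, token, and index_name must be set for your workspace;
# index_name is the fully qualified name (catalog.schema.index)
response = requests.post(
    f"{workspace_url}/api/2.0/vector-search/indexes/{index_name}/query",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_text": "What is RAG?",
        "columns": ["chunk_id", "chunk_text"],  # columns to return
        "num_results": 5
    }
)
response.raise_for_status()  # fail loudly on HTTP errors
results = response.json()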
```


## 5. Integration with Databricks Ecosystem

### 5.1 Delta Lake Integration

**Automatic Sync**:
- Changes to the Delta table propagate to the vector index
- `CONTINUOUS` pipelines sync without manual refresh; `TRIGGERED` pipelines sync when a sync run is started
- Supports incremental updates

**Benefits**:
- Single source of truth (Delta table)
- Automatic consistency
- Version control through Delta Lake
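
"Incremental updates" means that only rows that changed since the last sync need to be re-embedded and re-indexed. The sketch below illustrates the idea by diffing two snapshots of a table. It is a conceptual illustration only: the service derives changes from the Delta table itself rather than by comparing snapshots, and `diff_snapshots` is a hypothetical name.

```python
def diff_snapshots(old_rows, new_rows, primary_key="chunk_id"):
    """Compare two table snapshots and return (rows_to_upsert, keys_to_delete)."""
    old_by_key = {row[primary_key]: row for row in old_rows}
    new_by_key = {row[primary_key]: row for row in new_rows}
    upserts = [row for key, row in new_by_key.items() if old_by_key.get(key) != row]
    deletes = [key for key in old_by_key if key not in new_by_key]
    return upserts, deletes

# Made-up snapshots: "a" is edited, "b" is untouched, "c" is new.
v1 = [
    {"chunk_id": "a", "chunk_text": "old text"},
    {"chunk_id": "b", "chunk_text": "unchanged"},
]
v2 = [
    {"chunk_id": "a", "chunk_text": "new text"},
    {"chunk_id": "b", "chunk_text": "unchanged"},
    {"chunk_id": "c", "chunk_text": "added"},
]
upserts, deletes = diff_snapshots(v1, v2)
print([row["chunk_id"] for row in upserts], deletes)
```

Row `b` is skipped entirely, which is where the savings come from when embeddings are expensive to compute.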

### 5.2 Unity Catalog Integration

**Access Control**:
- Index-level permissions
- Table-level permissions
- Fine-grained access control

**Governance**:
- Data lineage tracking
- Audit logging
- Compliance support

### 5.3 Model Serving Integration

**Embedding Models**:
- Host embedding models via Model Serving
- Support for foundation models
- Support for external models (OpenAI, etc.)
- Support for custom models

**Generation Models**:
- Host LLMs for RAG generation
- Complete RAG pipeline in Databricks

### 5.4 Workflow Integration

**Automated Pipelines**:
- Use Databricks Workflows to orchestrate
- Automated data preparation
- Automated index updates
- Scheduled refreshes

**Example Workflow**:
```
1. Ingest documents → Delta table
2. Process and chunk → Chunks table
3. (Optional) Compute embeddings → Embeddings table
4. Create/update vector index
5. Sync to vector search
```

### 5.5 Notebook Integration

**Interactive Development**:
- Develop and test in notebooks
- Iterate on chunking strategies
- Test embedding models
- Evaluate retrieval quality
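
Evaluating retrieval quality usually starts with a small labeled set: for each test question, the chunk IDs that a good answer should draw on. Two common metrics, recall@k and reciprocal rank, can be sketched as below. The chunk IDs are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant chunks that appear in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0

# Made-up example: one of two relevant chunks is retrieved, at rank 2.
retrieved = ["chunk_004", "chunk_001", "chunk_009"]
relevant = ["chunk_001", "chunk_002"]
print(recall_at_k(retrieved, relevant, k=3), reciprocal_rank(retrieved, relevant))
```

Averaging these values over all test questions gives a single number per configuration, which makes it easier to compare chunking strategies or embedding models side by side.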


## 6. Best Practices

### 6.1 Index Configuration

**Choose Right Index Type**:
- **DELTA_SYNC**: For automatic sync from Delta tables
- **DIRECT_ACCESS**: For direct CRUD operations

**Configure Metadata**:
- Include useful metadata fields
- Enable filtering on common fields
- Balance between flexibility and performance

### 6.2 Embedding Model Selection

**Considerations**:
- Domain match (general, code, multilingual)
- Dimension size (storage vs. quality trade-off)
- Query vs. document embeddings
- Performance requirements
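
The storage side of the dimension trade-off is simple arithmetic. The sketch below estimates the raw size of the embeddings alone, assuming 32-bit floats; it deliberately ignores index structures, metadata, and replication, so treat the result as a lower bound. The vector count and dimensions are example values.

```python
def raw_vector_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw size of the embeddings alone, assuming 32-bit floats by default."""
    return num_vectors * dimension * bytes_per_value

# Example: 10 million vectors at three candidate dimensions.
for dim in (384, 768, 1024):
    gib = raw_vector_bytes(10_000_000, dim) / 2**30
    print(f"{dim} dims: {gib:.1f} GiB")
```

Raw size grows linearly with dimension, so doubling the dimension doubles storage for the same number of chunks.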

### 6.3 Monitoring and Optimization

**Monitor**:
- Query latency
- Index size
- Sync lag
- Error rates
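
Query latency is best tracked as percentiles rather than an average, because a few slow queries can hide behind a good mean. The sketch below times a query function from the client side. `measure_latency` is a hypothetical helper, and the stand-in query function does local arithmetic only, so the numbers it produces say nothing about a real index.

```python
import statistics
import time

def measure_latency(run_query, queries):
    """Time each query and return p50/p95/max latency in milliseconds."""
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index 49 is p50 and index 94 is p95.
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cut_points[49], "p95": cut_points[94], "max": max(latencies_ms)}

# Stand-in query function; replace it with a real call against your index.
stats = measure_latency(lambda q: sum(range(10_000)), ["What is RAG?"] * 20)
print(sorted(stats))
```

Client-side timing includes network and serialization overhead, which is usually what end users experience.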

**Optimize**:
- Tune index parameters
- Optimize chunk sizes
- Choose appropriate embedding models
- Balance accuracy vs. latency

### 6.4 Security

**Access Control**:
- Use Unity Catalog for access control
- Set appropriate permissions
- Regular access reviews

**Data Privacy**:
- Mask sensitive data before indexing
- Use row-level security
- Audit all access
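
Masking before indexing matters because chunk text is returned verbatim in query results. A minimal sketch for one kind of sensitive value, email addresses, is shown below. The regular expression is a simple approximation rather than a full address validator, and real pipelines typically need additional patterns or a dedicated PII detection step.

```python
import re

# Approximate pattern for email addresses; not a complete validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)

print(mask_emails("Contact jane.doe@example.com for access."))
```

Applying the mask to `chunk_text` before writing the chunks table keeps the sensitive values out of both the Delta table and the index.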


## 7. Summary and Next Steps

### Key Takeaways

1. **Mosaic AI Vector Search** provides a managed vector database in Databricks
2. **Three methods** for working with vectors (managed embeddings, self-managed, direct access)
3. **Tight integration** with Delta Lake and Unity Catalog
4. **Zero operational overhead** - fully managed service
5. **Production-ready** with low latency and high availability

### Next Module: Building RAG Applications with MLflow

In the next module, we'll explore:
- Assembling complete RAG applications
- Using MLflow for RAG solutions
- Managing RAG chains
- Deploying RAG applications


## Exercises

1. **Exercise 1**: Set up a vector search endpoint and index in Databricks
2. **Exercise 2**: Compare managed vs self-managed embeddings - when would you use each?
3. **Exercise 3**: Create a vector search index with metadata filtering
4. **Exercise 4**: Query a vector search index and analyze results
5. **Exercise 5**: Design a complete data pipeline from documents to vector search
