# My Vector Database - Demo

This notebook demonstrates the complete functionality of the vector database and SDK.

**Topics Covered**:
1. Architecture Overview
2. Creating Data
3. Reading & Updating
4. Vector Search
5. Filtering
6. Persistence
7. Agno Integration
8. Design Patterns

---

## 1. Architecture Overview

The vector database is organized in a **3-tier hierarchy**:
- **Libraries**: Top-level containers with vector index configuration
- **Documents**: Logical groupings within a library
- **Chunks**: Individual searchable units with text, embeddings, and metadata

### Key Design Principles

1. **Layered Architecture**: Clean separation (API → Service → Storage → Index)
2. **Thread-Safe**: RLock-based synchronization for concurrent operations
3. **Type-Safe**: Full Pydantic validation throughout
4. **Persistence**: JSON snapshots with atomic writes
5. **Filtering**: Post-filtering strategy with declarative and custom options
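
As a rough illustration of the RLock-based synchronization mentioned above, the sketch below guards a dictionary with one re-entrant lock. `InMemoryStore` and its methods are hypothetical names chosen for this example and are not taken from the project's storage layer.

```python
import threading


class InMemoryStore:
    """Hypothetical store; every public method takes the same re-entrant lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._items: dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        with self._lock:
            self._items[key] = value

    def get(self, key: str) -> dict:
        with self._lock:
            return dict(self._items.get(key, {}))

    def increment(self, key: str, field: str) -> None:
        # Read-modify-write under one lock hold. The nested get() and put()
        # re-acquire the lock, which a plain Lock would deadlock on.
        with self._lock:
            item = self.get(key)
            item[field] = item.get(field, 0) + 1
            self.put(key, item)


store = InMemoryStore()


def worker() -> None:
    for _ in range(1000):
        store.increment("chunk-1", "views")


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(store.get("chunk-1"))  # {'views': 4000}
```

A single coarse lock keeps the reasoning simple at the cost of serializing all operations, which matches the "Coarse RLock" entry in the summary table at the end of this notebook.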

### Connection & Validation

First, verify that the API server is running and check its initial state.

In [None]:
from my_vector_db.sdk import VectorDBClient

client = VectorDBClient(base_url="http://localhost:8000")

# Validate connection and check initial state
status = client.get_persistence_status()
print("Connected to API")
print(f"Persistence: {status['enabled']}")

---

## 2. Creating Data: Hierarchical Structure

The hierarchical design allows flexible organization:
- **Libraries** define index configuration (FLAT, HNSW) and distance metrics
- **Documents** group related chunks (e.g., chapters in a book)
- **Chunks** are the actual searchable units with embeddings

In [None]:
# Create library with index configuration
library = client.create_library(
    name="tech_articles",
    index_type="flat",
    index_config={"metric": "cosine"},
    metadata={"description": "Technology articles", "version": "1.0"},
)

print(f"Created library: {library.id}")
print(f"Index type: {library.index_type}")
print(f"Metric: {library.index_config['metric']}")

In [None]:
# Create document within library
document = client.create_document(
    library_id=library.id,
    name="tech_articles_2024",
    metadata={"year": 2024, "source": "tech blogs"},
)

print(f"Created document: {document.id}")
print(f"Name: {document.name}")

### Sample Dataset

This dataset includes:
- **7 articles** across 3 categories (AI, Cloud, Security)
- **Rich metadata**: category, topic, confidence scores, authors
- **5D embeddings**: Simplified for demo purposes (production typically uses 768-1536 dimensions)

In [None]:
articles = [
    # AI/ML articles
    {
        "text": "Machine learning models use neural networks for pattern recognition",
        "embedding": [0.9, 0.8, 0.1, 0.2, 0.3],
        "metadata": {
            "category": "ai",
            "topic": "machine learning",
            "confidence": 0.95,
            "author": "Alice",
        },
    },
    {
        "text": "Deep learning architectures enable complex AI applications",
        "embedding": [0.85, 0.75, 0.15, 0.25, 0.35],
        "metadata": {
            "category": "ai",
            "topic": "deep learning",
            "confidence": 0.88,
            "author": "Bob",
        },
    },
    {
        "text": "Reinforcement learning enables autonomous decision making",
        "embedding": [0.87, 0.77, 0.13, 0.23, 0.33],
        "metadata": {
            "category": "ai",
            "topic": "reinforcement learning",
            "confidence": 0.90,
            "author": "Alice",
        },
    },
    # Cloud computing articles
    {
        "text": "Cloud infrastructure provides scalable computing resources",
        "embedding": [0.3, 0.2, 0.9, 0.8, 0.1],
        "metadata": {
            "category": "cloud",
            "topic": "infrastructure",
            "confidence": 0.92,
            "author": "Charlie",
        },
    },
    {
        "text": "Kubernetes orchestrates containerized applications in the cloud",
        "embedding": [0.35, 0.25, 0.85, 0.75, 0.15],
        "metadata": {
            "category": "cloud",
            "topic": "kubernetes",
            "confidence": 0.89,
            "author": "Alice",
        },
    },
    # Security articles
    {
        "text": "Cybersecurity best practices protect against data breaches",
        "embedding": [0.1, 0.2, 0.3, 0.9, 0.8],
        "metadata": {
            "category": "security",
            "topic": "cybersecurity",
            "confidence": 0.91,
            "author": "Bob",
        },
    },
    {
        "text": "Machine learning detects anomalies in network security",
        "embedding": [0.6, 0.5, 0.4, 0.7, 0.6],
        "metadata": {
            "category": "security",
            "topic": "machine learning",
            "confidence": 0.82,
            "author": "Charlie",
        },
    },
]

print(f"Prepared {len(articles)} articles")

### Batch Insert

**Design Pattern**: Batch operations are preferred over individual inserts.

**Benefits**:
- Single API call reduces HTTP round-trips
- Atomic transactions (all-or-nothing)
- Better performance for large datasets

**Best Practice**: Always use `add_chunks()` for multiple inserts rather than looping with individual `create_chunk()` calls.
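
For datasets too large to send in one request, one option is to split the list into fixed-size batches and call `add_chunks()` once per batch. The helper below is a minimal sketch; `batched` and the batch size are assumptions for illustration, not part of the SDK. Note that atomicity would then hold per batch rather than for the whole dataset.

```python
from typing import Iterator


def batched(items: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


payload = [{"text": f"chunk {i}"} for i in range(7)]
sizes = [len(batch) for batch in batched(payload, batch_size=3)]
print(sizes)  # [3, 3, 1]
```

Each yielded batch could then be passed as the `chunks` argument of `client.add_chunks(...)`.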

In [None]:
# Batch insert all chunks at once
chunks = client.add_chunks(document_id=document.id, chunks=articles)

print(f"Inserted {len(chunks)} chunks")
print(f"Sample: {chunks[0].text[:50]}...")

---

## 3. Reading & Updating Data

After creation, all entities can be accessed by their UUID without specifying parent relationships. This simplifies API usage while maintaining referential integrity.

In [None]:
# List operations at each level
libraries = client.list_libraries()
print(f"Libraries: {len(libraries)}")

documents = client.list_documents(library_id=library.id)
print(f"Documents: {len(documents)}")

chunks = client.list_chunks(document_id=document.id)
print(f"Chunks: {len(chunks)}")

### Update Operations

Updates are performed on full objects. Notice how the `updated_at` timestamp changes while `created_at` remains unchanged.

In [None]:
# Update metadata on all chunks
for chunk in client.list_chunks(document_id=document.id):
    chunk.metadata["reviewed_by"] = "demo_bot"
    client.update_chunk(chunk)

print("Updated all chunks with review metadata")

# Verify update
sample = client.list_chunks(document_id=document.id)[0]
sample

---

## 4. Vector Search

The vector database performs **k-nearest neighbor (kNN)** search using the configured distance metric.

**Current Implementation**: FLAT index
- **Complexity**: O(n) - exhaustive search
- **Recall**: 100% (exact search, guaranteed true nearest neighbors)
- **Best for**: < 10,000 vectors

**Alternative**: HNSW index (planned)
- **Complexity**: O(log n) - approximate search
- **Recall**: ~95-99% (tunable)
- **Best for**: Millions of vectors
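
To make the O(n) exhaustive search concrete, here is a minimal pure-Python sketch of a flat kNN search with cosine similarity. The function names are hypothetical, and it assumes that a higher score means a closer match; the server's actual scoring convention may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Score every stored vector against the query and return the k best keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Three embeddings taken from the sample dataset above.
vectors = {
    "ml": [0.9, 0.8, 0.1, 0.2, 0.3],
    "cloud": [0.3, 0.2, 0.9, 0.8, 0.1],
    "security": [0.1, 0.2, 0.3, 0.9, 0.8],
}

top = flat_knn([0.88, 0.78, 0.12, 0.22, 0.32], vectors, k=2)
print(top)  # ['ml', 'cloud']
```

Every query touches every vector, which is why recall is exact and why cost grows linearly with the size of the library.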

In [None]:
# Query about AI/ML topics
query_vector = [0.88, 0.78, 0.12, 0.22, 0.32]

search_results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
)

print(f"Query time: {search_results.query_time_ms:.2f}ms")
print(f"Returned {len(search_results.results)} results")
print(f"Index type: {library.index_type}\n")

for i, result in enumerate(search_results.results, 1):
    print(f"{i}. Score: {result.score:.4f}")
    print(f"   {result.text}")
    print(f"   Category: {result.metadata['category']}")
    print()

---

## 5. Filtering: Two Approaches

The SDK provides two complementary filtering strategies:

### Approach 1: Declarative Filters (API-Compatible)
- **JSON-serializable** filter definitions
- **Server-side** filtering via REST API
- Works with any HTTP client
- **Use for**: Production deployments, cross-language clients

### Approach 2: Custom Functions (SDK-Only)
- **Python functions** for maximum flexibility
- **Client-side** filtering with access to similarity scores
- Complex logic without API changes
- **Use for**: Prototyping, complex business logic, ad-hoc queries

In [None]:
from my_vector_db.domain.models import (
    SearchFilters,
    FilterGroup,
    MetadataFilter,
    FilterOperator,
    LogicalOperator,
)

# Simple metadata filter
filters = SearchFilters(
    metadata=FilterGroup(
        operator=LogicalOperator.AND,
        filters=[
            MetadataFilter(field="category", operator=FilterOperator.EQUALS, value="ai")
        ],
    )
)

results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
    filters=filters,
)

print("Filtered by category='ai':\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text[:60]}...")
    print(f"   Category: {result.metadata['category']}")
    print(f"   Score: {result.score:.4f}")

### Complex AND/OR Filters

Filters support nested logic for sophisticated queries. This example finds:
- (AI articles with confidence > 0.9) **OR** (any security article)
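
The nested logic can be pictured as a small recursive evaluation over a tree of groups and leaf conditions. The sketch below uses plain dictionaries with hypothetical keys and operator names (`"and"`, `"or"`, `"eq"`, `"gt"`); it is not the SDK's wire format or its server-side implementation.

```python
import operator

# Hypothetical operator names mapped to Python comparisons.
COMPARISONS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def matches(node: dict, metadata: dict) -> bool:
    """Evaluate a filter tree against one chunk's metadata."""
    if "filters" in node:  # group node: combine the children
        results = (matches(child, metadata) for child in node["filters"])
        return all(results) if node["operator"] == "and" else any(results)
    if node["field"] not in metadata:  # leaf node on a missing field
        return False
    compare = COMPARISONS[node["operator"]]
    return compare(metadata[node["field"]], node["value"])


# (category == "ai" AND confidence > 0.9) OR category == "security"
query = {
    "operator": "or",
    "filters": [
        {
            "operator": "and",
            "filters": [
                {"field": "category", "operator": "eq", "value": "ai"},
                {"field": "confidence", "operator": "gt", "value": 0.9},
            ],
        },
        {"field": "category", "operator": "eq", "value": "security"},
    ],
}

print(matches(query, {"category": "ai", "confidence": 0.95}))        # True
print(matches(query, {"category": "ai", "confidence": 0.90}))        # False
print(matches(query, {"category": "security", "confidence": 0.82}))  # True
print(matches(query, {"category": "cloud", "confidence": 0.92}))     # False
```

Because the comparison is strictly greater-than, an AI article with confidence exactly 0.90 does not match.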

In [None]:
complex_filters = SearchFilters(
    metadata=FilterGroup(
        operator=LogicalOperator.OR,
        filters=[
            # High-confidence AI articles
            FilterGroup(
                operator=LogicalOperator.AND,
                filters=[
                    MetadataFilter(
                        field="category", operator=FilterOperator.EQUALS, value="ai"
                    ),
                    MetadataFilter(
                        field="confidence",
                        operator=FilterOperator.GREATER_THAN,
                        value=0.9,
                    ),
                ],
            ),
            # OR any security article
            MetadataFilter(
                field="category", operator=FilterOperator.EQUALS, value="security"
            ),
        ],
    )
)

results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=10,
    filters=complex_filters,
)

print("Filtered by (AI AND confidence>0.9) OR security:\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. [{result.metadata['category']}] {result.text[:50]}...")
    print(
        f"   Confidence: {result.metadata['confidence']}, Score: {result.score:.4f}\n"
    )

### Custom Filter Functions (SDK Only)

For complex filtering logic, pass a Python function to `filter_function`. The function receives `SearchResult` objects (not `Chunk` objects), which include the similarity score.

**Implementation Detail**: 
- Uses over-fetch strategy (k×3) to compensate for client-side filtering
- Operates on `SearchResult` to avoid circular dependencies
- Enables filtering based on similarity scores
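
The over-fetch idea can be sketched independently of the SDK: request `k × 3` candidates, apply the predicate, and keep the first `k` survivors. Everything below (`search_with_overfetch`, `fake_search`, the sample scores) is hypothetical and only illustrates the strategy.

```python
from typing import Callable

Result = dict  # hypothetical stand-in for the SDK's SearchResult


def search_with_overfetch(
    search_fn: Callable[[int], list[Result]],
    k: int,
    predicate: Callable[[Result], bool],
    overfetch: int = 3,
) -> list[Result]:
    """Fetch k * overfetch ranked candidates, filter them, keep the first k."""
    candidates = search_fn(k * overfetch)
    return [r for r in candidates if predicate(r)][:k]


# Hypothetical ranked results, best score first.
ranked = [
    {"text": "ml", "category": "ai", "score": 0.99},
    {"text": "rl", "category": "ai", "score": 0.98},
    {"text": "dl", "category": "ai", "score": 0.97},
    {"text": "anomaly", "category": "security", "score": 0.90},
    {"text": "k8s", "category": "cloud", "score": 0.65},
    {"text": "infra", "category": "cloud", "score": 0.60},
    {"text": "cyber", "category": "security", "score": 0.55},
]


def fake_search(limit: int) -> list[Result]:
    return ranked[:limit]


def is_security(result: Result) -> bool:
    return result["category"] == "security"


without = search_with_overfetch(fake_search, k=2, predicate=is_security, overfetch=1)
with_3x = search_with_overfetch(fake_search, k=2, predicate=is_security, overfetch=3)
print(len(without), len(with_3x))  # 0 1
```

Over-fetching improves the odds of filling `k` slots, but a selective predicate can still return fewer than `k` results, as the second call shows.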

In [None]:
from my_vector_db.sdk.models import SearchResult


# Custom filter combining multiple conditions
def high_quality_ai_by_alice(result: SearchResult) -> bool:
    return (
        result.score > 0.95
        and result.metadata.get("category") == "ai"
        and result.metadata.get("author") == "Alice"
        and result.metadata.get("confidence", 0) > 0.85
    )


results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
    filter_function=high_quality_ai_by_alice,
)

print("Custom filter: high score, AI, Alice, high confidence\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text}")
    print(
        f"   Score: {result.score:.4f}, Confidence: {result.metadata['confidence']}\n"
    )

In [None]:
# Lambda filter example - filter by text content
results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
    filter_function=lambda r: "learning" in r.text.lower(),
)

print("Lambda filter: contains 'learning'\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text}")

---

## 6. Persistence & Durability

The database supports optional persistence with a simple snapshot-based approach.

**Design Decision**: JSON snapshots
- Saves entire state to JSON every N operations (default: 10)
- Atomic writes using temp file + rename pattern
- Human-readable format for debugging
- Configurable via environment variables

**Tradeoff**: 
- **Pro**: Simple implementation, easy debugging, no threading complexity
- **Con**: A crash may lose the operations performed since the last snapshot (up to N)
- **Production Alternative**: PostgreSQL + pgvector with write-ahead logging
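
The temp file + rename pattern can be sketched with the standard library alone. `write_snapshot` is a hypothetical name, and the server's actual persistence code may differ in detail; the sketch assumes the temporary file and the target live on the same filesystem.

```python
import json
import os
import tempfile


def write_snapshot(state: dict, path: str) -> None:
    """Write state as JSON so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    snapshot_path = os.path.join(workdir, "snapshot.json")
    write_snapshot({"libraries": 1}, snapshot_path)
    write_snapshot({"libraries": 2}, snapshot_path)  # replaces the first file
    with open(snapshot_path, encoding="utf-8") as fh:
        restored = json.load(fh)
    leftovers = os.listdir(workdir)

print(restored, leftovers)  # {'libraries': 2} ['snapshot.json']
```

If the process dies mid-write, only the temporary file is affected and the previous snapshot stays intact.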

In [None]:
import json

# Check current persistence status
status = client.get_persistence_status()

print("Persistence Status:")
print(json.dumps(status, indent=2))

### Save/Restore Demo

Demonstrate persistence by saving a snapshot, performing a destructive operation, then restoring.

In [None]:
# Save current state
result = client.save_snapshot()
print(json.dumps(result, indent=2))

In [None]:
# Delete the library (destructive operation)
client.delete_library(library_id=library.id)

libraries_after_delete = client.list_libraries()
print(f"After delete - Libraries: {len(libraries_after_delete)}")

In [None]:
# Restore from snapshot
result = client.restore_snapshot()

# Verify restoration
libraries_after_restore = client.list_libraries()
print(f"\nVerification - Libraries: {len(libraries_after_restore)}")
for lib in libraries_after_restore:
    print(f"  {lib.name} ({lib.id})")

---

## 7. Real-World Integration: Agno RAG Agent

This section demonstrates integration with the [Agno](https://github.com/agno-agi/agno) agent framework for building RAG (Retrieval-Augmented Generation) applications.

**RAG Pattern**:
1. User asks a question
2. Agent searches vector DB for relevant context
3. Context augments the LLM prompt
4. LLM generates informed response
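
Step 3 of the pattern is plain string assembly. The sketch below shows one way retrieved chunk texts might be folded into a prompt; `build_rag_prompt` and the prompt wording are hypothetical, and Agno handles this step internally in its own way.

```python
def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunk texts and the user question into one prompt."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, 1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


# In a real flow these texts would come from the vector search results.
retrieved = [
    "Machine learning models use neural networks for pattern recognition",
    "Deep learning architectures enable complex AI applications",
]

prompt = build_rag_prompt("How do ML models recognize patterns?", retrieved)
print(prompt)
```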

**Why This Works**: The hierarchical design (libraries/documents/chunks) naturally maps to knowledge organization, making integration straightforward.

See `examples/agno_example.py` for a complete working implementation.

In [None]:
# Example integration pattern (see agno_example.py for full code)


from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.models.anthropic import Claude
from my_vector_db.db import MyVectorDB

# Create vector database connection
vector_db = MyVectorDB(
    api_base_url="http://localhost:8000",
    library_name="Python Programming Guide",
    index_type="flat",
)

# Create knowledge base that uses our vector DB
knowledge = Knowledge(name="Tech Knowledge Base", vector_db=vector_db, max_results=5)

# Create agent with RAG capabilities
agent = Agent(
    name="Tech Assistant",
    knowledge=knowledge,
    model=Claude(id="claude-sonnet-4-5"),
    search_knowledge=True,  # Enable RAG
)

# Ask a single question (the agent searches the vector DB automatically)
agent.print_response(
    "what are the latest trends in AI and cloud computing?", stream=False
)

---

## 8. Design Patterns & Best Practices

This section summarizes key design decisions and recommended practices.

### 1. Batch Operations Pattern

**Recommended**:
```python
chunks = client.add_chunks(document_id=doc.id, chunks=large_list)
```

**Avoid**:
```python
for chunk in large_list:
    client.create_chunk(...)  # Many HTTP round-trips
```

**Takeaway**: Batch operations reduce network overhead and enable atomic transactions.

### 2. Index Selection: FLAT vs HNSW

| Property | Flat Index | HNSW Index |
|--------|------------|------------|
| Search | O(n) - exact | O(log n) - approximate |
| Insert | O(1) | O(log n) |
| Recall | 100% | 95-99% (tunable) |
| Best For | <10K vectors | Millions of vectors |

**Recommendation**: Start with FLAT for accuracy and simplicity. Migrate to HNSW when the dataset grows beyond 10,000 vectors.

### 3. Post-Filtering Strategy

**Algorithm**:
1. Perform kNN vector search → get candidates
2. Fetch full chunk data from storage
3. Apply metadata filters
4. Return top k results

**Tradeoff**: 
- **Pro**: Simple implementation, works with any index type
- **Pro**: Index layer doesn't need filter logic
- **Con**: May not return k results if filters are highly selective
- **Con**: Requires over-fetching (k×3 for custom filters)

**Production Alternative**: Pre-filtering with bitmap indexes for highly selective queries.
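
To contrast the two strategies, the sketch below runs the same selective query both ways. The bitmap is modeled as a Python integer with one bit per item position; the item list, scores, and function names are hypothetical and only illustrate the idea.

```python
def build_bitmaps(items: list[dict], field: str) -> dict[str, int]:
    """Map each field value to an integer whose set bits mark matching positions."""
    bitmaps: dict[str, int] = {}
    for position, item in enumerate(items):
        bitmaps[item[field]] = bitmaps.get(item[field], 0) | (1 << position)
    return bitmaps


def top_k(items: list[dict], k: int) -> list[dict]:
    return sorted(items, key=lambda it: it["score"], reverse=True)[:k]


def post_filter_search(items: list[dict], k: int, category: str) -> list[str]:
    # Rank first, then drop non-matching results: may return fewer than k.
    return [it["id"] for it in top_k(items, k) if it["category"] == category]


def pre_filter_search(items: list[dict], k: int, bitmap: int) -> list[str]:
    # Restrict the candidate set via the bitmap, then rank only the survivors.
    allowed = [it for pos, it in enumerate(items) if (bitmap >> pos) & 1]
    return [it["id"] for it in top_k(allowed, k)]


# Hypothetical similarity scores for a query about AI topics.
items = [
    {"id": "ml", "category": "ai", "score": 0.99},
    {"id": "dl", "category": "ai", "score": 0.98},
    {"id": "rl", "category": "ai", "score": 0.97},
    {"id": "infra", "category": "cloud", "score": 0.60},
    {"id": "k8s", "category": "cloud", "score": 0.65},
    {"id": "cyber", "category": "security", "score": 0.55},
    {"id": "anomaly", "category": "security", "score": 0.90},
]

bitmaps = build_bitmaps(items, "category")
print(post_filter_search(items, k=3, category="security"))     # []
print(pre_filter_search(items, k=3, bitmap=bitmaps["security"]))  # ['anomaly', 'cyber']
```

Post-filtering returns nothing here because the top three candidates are all AI articles, while pre-filtering finds both security articles at the cost of maintaining an extra index.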

### 4. Type Safety Throughout

**Pattern**: Pydantic models everywhere
```python
library: Library = client.create_library(...)  # Type-checked
document: Document = client.create_document(...)  # Validated
```

**Benefits**:
- Runtime validation catches errors early
- IDE autocomplete improves developer experience
- Auto-generated OpenAPI documentation
- Prevents entire classes of bugs

**Tradeoff**: Slight performance overhead (~10-15%), but worth it for reliability.

### 5. Layered Architecture

**Layers**: API → Service → Storage → Index

**Benefits**:
- **Separation of concerns**: Each layer has clear responsibilities
- **Testable**: Can test each layer in isolation
- **Extensible**: Can swap implementations without affecting other layers

**Example**: Adding HNSW index requires only implementing the index interface. API and Service layers remain unchanged.
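
A minimal sketch of what such an index interface could look like, using `typing.Protocol`. The names `VectorIndex`, `FlatIndex`, and `SearchService` are hypothetical and are not the project's actual classes; squared Euclidean distance is used here only to keep the example short.

```python
from typing import Protocol


class VectorIndex(Protocol):
    """Hypothetical contract that every index implementation would satisfy."""

    def add(self, key: str, vector: list[float]) -> None: ...

    def search(self, query: list[float], k: int) -> list[str]: ...


class FlatIndex:
    """Exhaustive search over all stored vectors."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        self._vectors[key] = vector

    def search(self, query: list[float], k: int) -> list[str]:
        def squared_distance(key: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(query, self._vectors[key]))

        return sorted(self._vectors, key=squared_distance)[:k]


class SearchService:
    """Depends only on the VectorIndex contract, not on a concrete index."""

    def __init__(self, index: VectorIndex) -> None:
        self._index = index

    def nearest(self, query: list[float], k: int) -> list[str]:
        return self._index.search(query, k)


index = FlatIndex()
index.add("ml", [0.9, 0.8, 0.1, 0.2, 0.3])
index.add("cloud", [0.3, 0.2, 0.9, 0.8, 0.1])

service = SearchService(index)
print(service.nearest([0.88, 0.78, 0.12, 0.22, 0.32], k=1))  # ['ml']
```

An HNSW class exposing the same two methods could be passed to `SearchService` without touching the service code.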

### Summary: Pragmatic Design Choices

| Feature | Current | Production Alternative |
|---------|---------|------------------------|
| Persistence | JSON snapshots | PostgreSQL + WAL |
| Filtering | Post-filtering | Pre-filtering with bitmaps |
| Index | FLAT (O(n)) | HNSW (O(log n)) |
| Locking | Coarse RLock | Fine-grained locks |
| Storage | In-memory | Distributed (Redis, etc.) |

**Philosophy**: Start simple, scale where needed.

The current design is:
- Easy to understand and debug
- Production-ready for moderate scale (<100K vectors)
- Extensible for larger scale with clear upgrade paths

---

## Cleanup

In [None]:
# Delete the demo library, then close the client connection
client.delete_library(library_id=library.id)
client.close()
print("Client connection closed")

---

## Next Steps

1. **API Documentation**: http://localhost:8000/docs
2. **SDK Reference**: `docs/README.md`
3. **More Examples**: `examples/` directory
4. **Run Tests**: `uv run pytest`

**Discussion Questions**:
- When would you switch from FLAT to HNSW index?
- How would you implement distributed storage?
- What monitoring and observability would you add?
- How would you handle schema migrations?