# My Vector Database 

## 1. Architecture Overview <a id="architecture"></a>

### Key Design Principles

1. **Layered Architecture**: Clean separation of concerns (API → Service → Storage → Index)
2. **Thread-Safe**: RLock-based synchronization for concurrent operations
3. **Type-Safe**: Full Pydantic models with validation
4. **Persistence**: Periodic JSON snapshots with atomic writes
5. **Filtering**: Post-filtering strategy with over-fetch mitigation

## 2. Setup & Connection <a id="setup"></a>

First, ensure the Vector Database API is running:
```bash
docker compose up -d
```

In [1]:
from my_vector_db.sdk import VectorDBClient

# Initialize client
client = VectorDBClient(base_url="http://localhost:8000")
print("✓ Connected to Vector Database API")

✓ Connected to Vector Database API


## 3. Basic Client Operations <a id="crud"></a>

In [27]:
# Create a library (collection of documents)
library = client.create_library(
    name="tech_articles",
    index_type="flat",  # Options: 'flat', 'hnsw'
    index_config={"metric": "cosine"},  # Options: 'cosine', 'euclidean', 'dot_product'
    metadata={"description": "Technology articles dataset", "version": "1.0"},
)

print(f"✓ Created library: {library.id}")
print(f"  Name: {library.name}")
print(f"  Index: {library.index_type}")
print(f"  Metric: {library.index_config['metric']}")

✓ Created library: 00fa6322-88ad-4f27-bfa6-1a0802acd19d
  Name: tech_articles
  Index: IndexType.FLAT
  Metric: cosine


In [28]:
# Create a document (logical grouping of chunks)
document = client.create_document(
    library_id=library.id,
    name="tech_articles_2024",
    metadata={"year": 2024, "source": "tech blogs"},
)

print(f"✓ Created document: {document.id}")
print(f"  Name: {document.name}")
print(f"  Library: {document.library_id}")

✓ Created document: 3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5
  Name: tech_articles_2024
  Library: 00fa6322-88ad-4f27-bfa6-1a0802acd19d


### Sample Dataset

We'll use a curated dataset of technology articles with:
- Multiple categories (AI, Cloud, Security)
- Varying confidence scores
- Different authors
- 5-dimensional embeddings (simplified for demo)

In [29]:
# Sample articles with embeddings and metadata
articles = [
    # AI/ML articles
    {
        "text": "Machine learning models use neural networks for pattern recognition",
        "embedding": [0.9, 0.8, 0.1, 0.2, 0.3],
        "metadata": {
            "category": "ai",
            "topic": "machine learning",
            "confidence": 0.95,
            "author": "Alice",
        },
    },
    {
        "text": "Deep learning architectures enable complex AI applications",
        "embedding": [0.85, 0.75, 0.15, 0.25, 0.35],
        "metadata": {
            "category": "ai",
            "topic": "deep learning",
            "confidence": 0.88,
            "author": "Bob",
        },
    },
    {
        "text": "Reinforcement learning enables autonomous decision making",
        "embedding": [0.87, 0.77, 0.13, 0.23, 0.33],
        "metadata": {
            "category": "ai",
            "topic": "reinforcement learning",
            "confidence": 0.90,
            "author": "Alice",
        },
    },
    # Cloud computing articles
    {
        "text": "Cloud infrastructure provides scalable computing resources",
        "embedding": [0.3, 0.2, 0.9, 0.8, 0.1],
        "metadata": {
            "category": "cloud",
            "topic": "infrastructure",
            "confidence": 0.92,
            "author": "Charlie",
        },
    },
    {
        "text": "Kubernetes orchestrates containerized applications in the cloud",
        "embedding": [0.35, 0.25, 0.85, 0.75, 0.15],
        "metadata": {
            "category": "cloud",
            "topic": "kubernetes",
            "confidence": 0.89,
            "author": "Alice",
        },
    },
    # Security articles
    {
        "text": "Cybersecurity best practices protect against data breaches",
        "embedding": [0.1, 0.2, 0.3, 0.9, 0.8],
        "metadata": {
            "category": "security",
            "topic": "cybersecurity",
            "confidence": 0.91,
            "author": "Bob",
        },
    },
    {
        "text": "Machine learning detects anomalies in network security",
        "embedding": [0.6, 0.5, 0.4, 0.7, 0.6],
        "metadata": {
            "category": "security",
            "topic": "machine learning",
            "confidence": 0.82,
            "author": "Charlie",
        },
    },
]

print(f"Prepared {len(articles)} articles for insertion")

Prepared 7 articles for insertion


### Batch Insert with add_chunks()

**Design Decision**: Batch operations for efficiency
- Single API call instead of multiple requests
- Atomic transactions (all-or-nothing)
- Better for large datasets
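
One common way to get all-or-nothing behaviour is to validate the whole batch before mutating any state. The sketch below illustrates that idea only; `add_chunks_atomic`, the plain-dict `store`, and the `dimension` argument are hypothetical names for this example, not the project's server-side implementation.

```python
from typing import Any
from uuid import uuid4


def add_chunks_atomic(
    store: dict[str, dict[str, Any]],
    chunks: list[dict[str, Any]],
    dimension: int,
) -> list[str]:
    """Insert every chunk or none of them (hypothetical helper)."""
    # Phase 1: validate the whole batch without touching the store.
    for position, chunk in enumerate(chunks):
        if not chunk.get("text"):
            raise ValueError(f"chunk {position}: missing text")
        if len(chunk.get("embedding", [])) != dimension:
            raise ValueError(f"chunk {position}: expected {dimension} dimensions")

    # Phase 2: commit. Validation has already passed for every chunk.
    new_ids = []
    for chunk in chunks:
        chunk_id = str(uuid4())
        store[chunk_id] = chunk
        new_ids.append(chunk_id)
    return new_ids


store: dict[str, dict[str, Any]] = {}
good = {"text": "ok", "embedding": [0.1, 0.2, 0.3, 0.4, 0.5]}
bad = {"text": "wrong size", "embedding": [0.1, 0.2]}

try:
    add_chunks_atomic(store, [good, bad], dimension=5)
except ValueError as exc:
    print(f"rejected batch: {exc}")

print(len(store))  # the valid chunk was not inserted either
```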

In [30]:
# Add all chunks in a single batch operation
chunks = client.add_chunks(document_id=document.id, chunks=articles)

print(f"✓ Inserted {len(chunks)} chunks")
print("\nSample chunk:")
print(f"  ID: {chunks[0].id}")
print(f"  Text: {chunks[0].text[:50]}...")
print(f"  Metadata: {chunks[0].metadata}")

✓ Inserted 7 chunks

Sample chunk:
  ID: bdb78e0f-b8c2-4338-9950-b314be507ea1
  Text: Machine learning models use neural networks for pa...
  Metadata: {'category': 'ai', 'topic': 'machine learning', 'confidence': 0.95, 'author': 'Alice'}


### Client methods for working with libraries, documents, and chunks

In [31]:
libraries = client.list_libraries()
libraries

[Library(id=UUID('00fa6322-88ad-4f27-bfa6-1a0802acd19d'), name='tech_articles', document_ids=[UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5')], metadata={'description': 'Technology articles dataset', 'version': '1.0'}, index_type=<IndexType.FLAT: 'flat'>, index_config={'metric': 'cosine'}, created_at=datetime.datetime(2025, 10, 31, 22, 37, 48, 978691), updated_at=datetime.datetime(2025, 10, 31, 22, 37, 48, 978691))]

In [32]:
documents = client.list_documents(library_id=library.id)
documents

[Document(id=UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5'), name='tech_articles_2024', chunk_ids=[UUID('bdb78e0f-b8c2-4338-9950-b314be507ea1'), UUID('d3faf2db-c3ab-453a-a307-3c9152cb191d'), UUID('da84ef24-b029-46c4-9c98-50cd1c57527f'), UUID('5fe03a9e-8fb6-41fe-9fd7-b48135918fe0'), UUID('24f5c889-2215-4a3a-9b30-8e700711be6c'), UUID('986bfa72-788e-49e1-94f5-ab14a2ba103a'), UUID('e57f543b-483c-4c27-be28-10a435eeb255')], metadata={'year': 2024, 'source': 'tech blogs'}, library_id=UUID('00fa6322-88ad-4f27-bfa6-1a0802acd19d'), created_at=datetime.datetime(2025, 10, 31, 22, 37, 50, 838904), updated_at=datetime.datetime(2025, 10, 31, 22, 37, 50, 838904))]

In [33]:
chunks = client.list_chunks(document.id)
chunks

[Chunk(id=UUID('bdb78e0f-b8c2-4338-9950-b314be507ea1'), text='Machine learning models use neural networks for pattern recognition', embedding=[0.9, 0.8, 0.1, 0.2, 0.3], metadata={'category': 'ai', 'topic': 'machine learning', 'confidence': 0.95, 'author': 'Alice'}, document_id=UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5'), created_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379313), updated_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379313)),
 Chunk(id=UUID('d3faf2db-c3ab-453a-a307-3c9152cb191d'), text='Deep learning architectures enable complex AI applications', embedding=[0.85, 0.75, 0.15, 0.25, 0.35], metadata={'category': 'ai', 'topic': 'deep learning', 'confidence': 0.88, 'author': 'Bob'}, document_id=UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5'), created_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379329), updated_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379329)),
 Chunk(id=UUID('da84ef24-b029-46c4-9c98-50cd1c57527f'), text='Reinforcement learning enables autonomous

In [None]:
# Let's update all chunks with new metadata
for chunk in client.list_chunks(document_id=document.id):
    print(f"update chunk: {chunk.id}")

    # update the metadata on the chunk object
    chunk.metadata["reviewed_by"] = "the code bot"
    client.update_chunk(chunk)

update chunk: bdb78e0f-b8c2-4338-9950-b314be507ea1
update chunk: d3faf2db-c3ab-453a-a307-3c9152cb191d
update chunk: da84ef24-b029-46c4-9c98-50cd1c57527f
update chunk: 5fe03a9e-8fb6-41fe-9fd7-b48135918fe0
update chunk: 24f5c889-2215-4a3a-9b30-8e700711be6c
update chunk: 986bfa72-788e-49e1-94f5-ab14a2ba103a
update chunk: e57f543b-483c-4c27-be28-10a435eeb255


In [36]:
client.list_chunks(document_id=document.id)

[Chunk(id=UUID('bdb78e0f-b8c2-4338-9950-b314be507ea1'), text='Machine learning models use neural networks for pattern recognition', embedding=[0.9, 0.8, 0.1, 0.2, 0.3], metadata={'category': 'ai', 'topic': 'machine learning', 'confidence': 0.95, 'author': 'Alice', 'reviewed_by': 'the code bot'}, document_id=UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5'), created_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379313), updated_at=datetime.datetime(2025, 10, 31, 22, 40, 0, 900990)),
 Chunk(id=UUID('d3faf2db-c3ab-453a-a307-3c9152cb191d'), text='Deep learning architectures enable complex AI applications', embedding=[0.85, 0.75, 0.15, 0.25, 0.35], metadata={'category': 'ai', 'topic': 'deep learning', 'confidence': 0.88, 'author': 'Bob', 'reviewed_by': 'the code bot'}, document_id=UUID('3fa0fb29-2bdb-43f3-b2b0-4815e69d6cc5'), created_at=datetime.datetime(2025, 10, 31, 22, 37, 55, 379329), updated_at=datetime.datetime(2025, 10, 31, 22, 40, 0, 902480)),
 Chunk(id=UUID('da84ef24-b029-46c4-9c98-50c

## 4. Vector Search <a id="search"></a>

### Search Algorithm: kNN with Cosine Similarity

**Design Decision**: Post-filtering strategy
1. Perform vector search on index
2. Retrieve full chunk data
3. Apply metadata filters
4. Return top k results

**Tradeoff**: Simple implementation vs. optimal performance for highly selective filters
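
The four steps can be sketched in plain Python. This is a minimal illustration, assuming an exact (flat) scan; `search_post_filter`, `candidate_count`, and the tuple-based records are hypothetical and not taken from the library. The three records reuse embeddings from the sample dataset.

```python
import math
from typing import Callable, Optional

Record = tuple[str, list[float], dict]  # (text, embedding, metadata)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def search_post_filter(
    records: list[Record],
    query: list[float],
    k: int,
    candidate_count: int,
    predicate: Optional[Callable[[dict], bool]] = None,
) -> list[tuple[float, str]]:
    # 1. Exact kNN: score every record, keep the best `candidate_count`.
    candidates = sorted(
        ((cosine_similarity(query, emb), text, meta) for text, emb, meta in records),
        key=lambda item: item[0],
        reverse=True,
    )[:candidate_count]
    # 2-3. The chunk data is already in hand here; apply the metadata filter.
    if predicate is not None:
        candidates = [item for item in candidates if predicate(item[2])]
    # 4. Return at most k results.
    return [(score, text) for score, text, _ in candidates[:k]]


records: list[Record] = [
    ("Machine learning models use neural networks for pattern recognition",
     [0.9, 0.8, 0.1, 0.2, 0.3], {"category": "ai"}),
    ("Reinforcement learning enables autonomous decision making",
     [0.87, 0.77, 0.13, 0.23, 0.33], {"category": "ai"}),
    ("Machine learning detects anomalies in network security",
     [0.6, 0.5, 0.4, 0.7, 0.6], {"category": "security"}),
]
query = [0.88, 0.78, 0.12, 0.22, 0.32]


def is_security(meta: dict) -> bool:
    return meta["category"] == "security"


print(search_post_filter(records, query, k=2, candidate_count=2))
# With only 2 candidates the security article is cut before filtering:
print(search_post_filter(records, query, k=2, candidate_count=2, predicate=is_security))
# Fetching one more candidate recovers it:
print(search_post_filter(records, query, k=2, candidate_count=3, predicate=is_security))
```

The second call returns an empty list, which is the shortfall that selective filters cause under post-filtering.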

In [37]:
# Basic vector search
query_vector = [0.88, 0.78, 0.12, 0.22, 0.32]  # Query about AI/ML

results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,  # Return top 5 results
)

print(f"Query time: {results.query_time_ms:.2f}ms")
print(f"\nTop {len(results.results)} results:\n")

for i, result in enumerate(results.results, 1):
    print(f"{i}. Score: {result.score:.4f}")
    print(f"   Text: {result.text}")
    print(f"   Category: {result.metadata['category']}")
    print(f"   Author: {result.metadata['author']}")
    print()

Query time: 2.20ms

Top 5 results:

1. Score: 0.9999
   Text: Reinforcement learning enables autonomous decision making
   Category: ai
   Author: Alice

2. Score: 0.9995
   Text: Machine learning models use neural networks for pattern recognition
   Category: ai
   Author: Alice

3. Score: 0.9987
   Text: Deep learning architectures enable complex AI applications
   Category: ai
   Author: Bob

4. Score: 0.8285
   Text: Machine learning detects anomalies in network security
   Category: security
   Author: Charlie

5. Score: 0.5382
   Text: Kubernetes orchestrates containerized applications in the cloud
   Category: cloud
   Author: Alice



## 5. Advanced Filtering <a id="filtering"></a>

### Declarative Metadata Filtering

**Design Decision**: Two filtering approaches
1. **Declarative filters** (API-compatible): JSON-serializable filter definitions
2. **Custom Python functions** (SDK-only): Flexible client-side filtering

Both support complex logic (AND/OR operators, nested conditions)
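
To make the nested AND/OR logic concrete, here is a minimal sketch of how a JSON-serializable filter tree could be evaluated against one chunk's metadata. The dict schema, the operator names (`"eq"`, `"gt"`, ...), and the `matches` function are assumptions made for this example; the real `FilterGroup` and `MetadataFilter` models may be shaped differently.

```python
import operator
from typing import Any

# Hypothetical operator names; the real FilterOperator members may differ.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def matches(node: dict[str, Any], metadata: dict[str, Any]) -> bool:
    """Recursively evaluate a filter tree against one metadata dict."""
    if "filters" in node:  # group node: combine the child results
        child_results = (matches(child, metadata) for child in node["filters"])
        if node["operator"] == "and":
            return all(child_results)
        if node["operator"] == "or":
            return any(child_results)
        raise ValueError(f"unknown logical operator: {node['operator']}")
    # Leaf node: a missing field never matches.
    if node["field"] not in metadata:
        return False
    compare = COMPARATORS[node["operator"]]
    return compare(metadata[node["field"]], node["value"])


# (category == "ai" AND confidence > 0.9) OR category == "security"
tree = {
    "operator": "or",
    "filters": [
        {
            "operator": "and",
            "filters": [
                {"field": "category", "operator": "eq", "value": "ai"},
                {"field": "confidence", "operator": "gt", "value": 0.9},
            ],
        },
        {"field": "category", "operator": "eq", "value": "security"},
    ],
}

print(matches(tree, {"category": "ai", "confidence": 0.95}))        # True
print(matches(tree, {"category": "ai", "confidence": 0.90}))        # False: 0.90 is not > 0.9
print(matches(tree, {"category": "security", "confidence": 0.82}))  # True
print(matches(tree, {"category": "cloud", "confidence": 0.92}))     # False
```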

In [38]:
from my_vector_db.domain.models import (
    SearchFilters,
    FilterGroup,
    MetadataFilter,
    FilterOperator,
    LogicalOperator,
)

# Example 1: Simple metadata filter
filters = SearchFilters(
    metadata=FilterGroup(
        operator=LogicalOperator.AND,
        filters=[
            MetadataFilter(field="category", operator=FilterOperator.EQUALS, value="ai")
        ],
    )
)

results = client.search(
    library_id=library.id, embedding=query_vector, k=5, filters=filters
)

print("Filtered by category='ai':\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text[:60]}... (score: {result.score:.4f})")

Filtered by category='ai':

1. Reinforcement learning enables autonomous decision making... (score: 0.9999)
2. Machine learning models use neural networks for pattern reco... (score: 0.9995)
3. Deep learning architectures enable complex AI applications... (score: 0.9987)


In [39]:
# Example 2: Complex AND/OR filter
complex_filters = SearchFilters(
    metadata=FilterGroup(
        operator=LogicalOperator.OR,
        filters=[
            # High-confidence AI articles
            FilterGroup(
                operator=LogicalOperator.AND,
                filters=[
                    MetadataFilter(
                        field="category", operator=FilterOperator.EQUALS, value="ai"
                    ),
                    MetadataFilter(
                        field="confidence",
                        operator=FilterOperator.GREATER_THAN,
                        value=0.9,
                    ),
                ],
            ),
            # OR any security article
            MetadataFilter(
                field="category", operator=FilterOperator.EQUALS, value="security"
            ),
        ],
    )
)

results = client.search(
    library_id=library.id, embedding=query_vector, k=10, filters=complex_filters
)

print("Filtered by (AI AND confidence>0.9) OR security:\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. [{result.metadata['category']}] {result.text[:50]}...")
    print(f"   Confidence: {result.metadata['confidence']}, Score: {result.score:.4f}")
    print()

Filtered by (AI AND confidence>0.9) OR security:

1. [ai] Machine learning models use neural networks for pa...
   Confidence: 0.95, Score: 0.9995

2. [security] Machine learning detects anomalies in network secu...
   Confidence: 0.82, Score: 0.8285

3. [security] Cybersecurity best practices protect against data ...
   Confidence: 0.91, Score: 0.4679



### Custom Filter Functions (SDK Only)

**Design Decision**: Client-side custom filtering
- Filter functions receive `SearchResult` objects (not `Chunk`)
- Over-fetch strategy (k×3) to compensate for filtering
- Enables complex text analysis, score-based filtering, etc.

**Tradeoff**: More network transfer, but maximum flexibility
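
The over-fetch idea can be sketched independently of the SDK. In the example below, `Hit` is a stand-in for `SearchResult`, and `filtered_search` and `fake_search` are hypothetical helpers; only the k×3 factor comes from the design notes above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hit:
    """Stand-in for SearchResult, reduced to the fields used here."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def filtered_search(
    search_fn: Callable[[int], list[Hit]],
    k: int,
    predicate: Callable[[Hit], bool],
    overfetch: int = 3,
) -> list[Hit]:
    # Ask for k * overfetch candidates so that filtering still leaves ~k hits.
    candidates = search_fn(k * overfetch)
    kept = [hit for hit in candidates if predicate(hit)]
    return kept[:k]


# A pre-ranked result list playing the role of the server response.
ranked = [
    Hit("A learning", 0.99),
    Hit("B cloud", 0.90),
    Hit("C security", 0.80),
    Hit("D learning", 0.70),
    Hit("E learning", 0.60),
    Hit("F cloud", 0.50),
]


def fake_search(limit: int) -> list[Hit]:
    return ranked[:limit]


def mentions_learning(hit: Hit) -> bool:
    return "learning" in hit.text.lower()


with_overfetch = filtered_search(fake_search, k=2, predicate=mentions_learning)
without_overfetch = filtered_search(fake_search, k=2, predicate=mentions_learning, overfetch=1)

print([hit.text for hit in with_overfetch])     # two hits
print([hit.text for hit in without_overfetch])  # only one hit survives
```

Over-fetching reduces the chance of a shortfall but cannot remove it: a predicate that rejects nearly everything will still return fewer than k results.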

In [40]:
from my_vector_db.sdk.models import SearchResult


# Custom filter: High-quality AI articles by Alice
def quality_ai_by_alice(result: SearchResult) -> bool:
    """
    Custom filter that combines:
    - High similarity score (>0.95)
    - AI category
    - Author is Alice
    - High confidence (>0.85)
    """
    return (
        result.score > 0.95
        and result.metadata.get("category") == "ai"
        and result.metadata.get("author") == "Alice"
        and result.metadata.get("confidence", 0) > 0.85
    )


results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
    filter_function=quality_ai_by_alice,  # Pass function directly!
)

print("Custom filter (score>0.95, AI, Alice, confidence>0.85):\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text}")
    print(f"   Score: {result.score:.4f}, Confidence: {result.metadata['confidence']}")
    print()

Custom filter (score>0.95, AI, Alice, confidence>0.85):

1. Reinforcement learning enables autonomous decision making
   Score: 0.9999, Confidence: 0.9

2. Machine learning models use neural networks for pattern recognition
   Score: 0.9995, Confidence: 0.95



In [41]:
# Lambda filter: Articles mentioning "learning"
results = client.search(
    library_id=library.id,
    embedding=query_vector,
    k=5,
    filter_function=lambda result: "learning" in result.text.lower(),
)

print("Lambda filter (text contains 'learning'):\n")
for i, result in enumerate(results.results, 1):
    print(f"{i}. {result.text}")

Lambda filter (text contains 'learning'):

1. Reinforcement learning enables autonomous decision making
2. Machine learning models use neural networks for pattern recognition
3. Deep learning architectures enable complex AI applications
4. Machine learning detects anomalies in network security


## 6. Batch Operations <a id="batch"></a>

### Batch Insert Timing

**Design Decision**: Batch API for efficiency
- Reduces HTTP round-trips
- Atomic transactions
- Better for production workloads

In [None]:
import time

# Create test document
test_doc = client.create_document(library_id=library.id, name="batch_test")

# Generate test data
test_chunks = [
    {
        "text": f"Test article number {i}",
        "embedding": [0.1 * i, 0.2 * i, 0.3 * i, 0.4 * i, 0.5 * i],
        "metadata": {"index": i},
    }
    for i in range(1, 11)
]

# Batch insert, timed with a monotonic clock meant for measuring intervals
start = time.perf_counter()
client.add_chunks(document_id=test_doc.id, chunks=test_chunks)
batch_time = time.perf_counter() - start

print(f"Batch insert ({len(test_chunks)} chunks): {batch_time * 1000:.2f}ms")
print(f"Throughput: {len(test_chunks) / batch_time:.1f} chunks/sec")


## 7. Persistence & Durability <a id="persistence"></a>

### Persistence Strategy: Periodic JSON Snapshots

**Design Decision**: Simple periodic snapshots
- Saves entire state to JSON every N operations (default: 10)
- Atomic writes (temp file + rename)
- Configurable via environment variables

**Architecture**:
```python
# In storage.py - no circular dependencies!
from serialization import serialize_to_json

def save_snapshot(self):
    serialize_to_json(libraries, documents, chunks, path)
```

**Tradeoff**: Simplicity vs WAL-based durability
- ✅ Easy to understand and debug
- ✅ Human-readable JSON format
- ✅ No threading complexity
- ⚠️ May lose last N operations on crash
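
The "temp file + rename" write and the "every N operations" trigger can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `storage.py`: `write_snapshot_atomically` and `SnapshotPolicy` are hypothetical names, and the state is a plain dict.

```python
import json
import os
import tempfile
from pathlib import Path


def write_snapshot_atomically(state: dict, path: Path) -> None:
    """Write JSON to a temp file in the target directory, then rename it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the file in one step, so readers see either the
        # old snapshot or the new one, never a half-written file.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


class SnapshotPolicy:
    """Save a snapshot after every `save_every` recorded operations."""

    def __init__(self, path: Path, save_every: int = 10) -> None:
        self.path = path
        self.save_every = save_every
        self._pending = 0

    def record_operation(self, state: dict) -> bool:
        self._pending += 1
        if self._pending >= self.save_every:
            write_snapshot_atomically(state, self.path)
            self._pending = 0
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = Path(tmp) / "snapshot.json"
    policy = SnapshotPolicy(snapshot_path, save_every=3)
    saved = [policy.record_operation({"ops": n}) for n in range(1, 8)]
    on_disk = json.loads(snapshot_path.read_text(encoding="utf-8"))

print(saved)    # snapshots happen on operations 3 and 6
print(on_disk)  # operation 7 is not on disk: this is the crash-loss window
```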

### Check Persistence Status

In [None]:
# List all libraries (will show persistence working across restarts)
libraries = client.list_libraries()

print(f"Total libraries: {len(libraries)}\n")
for lib in libraries:
    print(f"Library: {lib.name}")
    print(f"  ID: {lib.id}")
    print(f"  Documents: {len(lib.document_ids)}")
    print(f"  Index: {lib.index_type}")
    for doc_id in lib.document_ids:
        doc = client.get_document(document_id=doc_id)
        print(f"    Document: {doc.name} (ID: {doc.id}) - Chunks: {len(doc.chunk_ids)}")

In [None]:
print(f"Total libraries: {len(libraries)}\n")
for lib in libraries:
    print(f"Library: {lib.name}")
    print(f"  ID: {lib.id}")
    print(f"  Documents: {len(lib.document_ids)}")
    print(f"  Index: {lib.index_type}")
    for doc_id in lib.document_ids:
        doc = client.get_document(document_id=doc_id)
        print(f"    Document: {doc.name} (ID: {doc.id}) - Chunks: {len(doc.chunk_ids)}")
        for chunk_id in doc.chunk_ids:
            chunk = client.get_chunk(chunk_id=chunk_id)
            print(f"      Chunk ID: {chunk.id} - Text: {chunk.text[:30]}...")

In [None]:
print(f"Total libraries: {len(libraries)}\n")
for lib in libraries:
    print(f"Library: {lib.name}")
    print(f"  ID: {lib.id}")
    print(f"  Documents: {len(lib.document_ids)}")
    print(f"  Index: {lib.index_type}")
    for doc_id in lib.document_ids:
        doc = client.get_document(document_id=doc_id)
        print(f"    Document: {doc.name} (ID: {doc.id}) - Chunks: {len(doc.chunk_ids)}")
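        # Rename the document so the change is visible in the listing below
        client.update_document(document_id=doc.id, name=doc.name + "_updated")


# Re-fetch each document to confirm the new names were stored
for lib in libraries: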
    print(f"Library: {lib.name}")
    print(f"  ID: {lib.id}")
    print(f"  Documents: {len(lib.document_ids)}")
    print(f"  Index: {lib.index_type}")
    for doc_id in lib.document_ids:
        doc = client.get_document(document_id=doc_id)
        print(f"    Document: {doc.name} (ID: {doc.id}) - Chunks: {len(doc.chunk_ids)}")

### Docker Configuration for Persistence

```yaml
# docker-compose.yml
environment:
  - ENABLE_PERSISTENCE=true
  - PERSISTENCE_DIR=/app/data
  - PERSISTENCE_SAVE_EVERY=10  # Save every 10 operations
volumes:
  - ./data:/app/data  # Persist across restarts
```

**Test**: Restart the Docker container and re-run this notebook - data will be preserved!
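
For illustration, the three environment variables could be read into a typed settings object as follows. The variable names come from the compose file above and the default of 10 from the persistence notes; `PersistenceSettings`, `load_persistence_settings`, and the other fallback values are assumptions for this sketch, so the service's own configuration code may differ.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class PersistenceSettings:
    enabled: bool
    directory: Path
    save_every: int


def load_persistence_settings(
    env: Optional[Mapping[str, str]] = None,
) -> PersistenceSettings:
    """Parse persistence settings from environment-style key/value pairs."""
    env = os.environ if env is None else env
    # Fallbacks for ENABLE_PERSISTENCE and PERSISTENCE_DIR are assumed here.
    enabled = env.get("ENABLE_PERSISTENCE", "false").strip().lower() in {"1", "true", "yes"}
    directory = Path(env.get("PERSISTENCE_DIR", "./data"))
    save_every = int(env.get("PERSISTENCE_SAVE_EVERY", "10"))
    if save_every < 1:
        raise ValueError("PERSISTENCE_SAVE_EVERY must be at least 1")
    return PersistenceSettings(enabled, directory, save_every)


settings = load_persistence_settings(
    {
        "ENABLE_PERSISTENCE": "true",
        "PERSISTENCE_DIR": "/app/data",
        "PERSISTENCE_SAVE_EVERY": "10",
    }
)
print(settings)
```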

In [None]:
client.save_snapshot()

In [None]:
client.delete_library(library_id=library.id)

In [None]:
print(client.list_libraries())

client.restore_snapshot()
libraries = client.list_libraries()
print(f"Total libraries: {len(libraries)}\n")
for lib in libraries:
    print(f"Library: {lib.name}")

## 8. Design Decisions & Tradeoffs <a id="design"></a>

### 1. Layered Architecture

**Decision**: Separate concerns into layers (API → Service → Storage → Index)

**Pros**:
- Clean separation of concerns
- Easy to test each layer independently
- Can swap implementations (e.g., different indexes)

**Cons**:
- More code
- Slight performance overhead from abstraction
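
As a rough illustration of why the layering makes indexes swappable, the service layer can depend on a small interface rather than a concrete index. Everything in this sketch (`VectorIndex`, `FlatDotIndex`, `SearchService`) is hypothetical and far simpler than the real layers; it scores by dot product to stay short.

```python
from typing import Protocol


class VectorIndex(Protocol):
    """The only surface the service layer relies on."""

    def add(self, vector_id: str, vector: list[float]) -> None: ...

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]: ...


class FlatDotIndex:
    """Exact search by scanning every stored vector."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, vector_id: str, vector: list[float]) -> None:
        self._vectors[vector_id] = vector

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        scored = [
            (vector_id, sum(q * v for q, v in zip(query, vector)))
            for vector_id, vector in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


class SearchService:
    """Knows nothing about how the index works internally."""

    def __init__(self, index: VectorIndex) -> None:
        self._index = index

    def top_ids(self, query: list[float], k: int) -> list[str]:
        return [vector_id for vector_id, _ in self._index.search(query, k)]


index = FlatDotIndex()
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [0.7, 0.7])

service = SearchService(index)
print(service.top_ids([1.0, 0.2], k=2))
```

Any other class with the same two methods, such as an approximate index, could be passed to `SearchService` without changing it.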

---

### 2. Post-Filtering Strategy

**Decision**: Apply metadata filters AFTER vector search

```python
# Algorithm (pseudocode)
# 1. kNN search → get candidates
# 2. Fetch chunk data
# 3. Apply filters
# 4. Return top k
```

**Pros**:
- Simple implementation
- Index layer doesn't need filter logic
- Works with any index type

**Cons**:
- May not return k results if filters are very selective
- Requires over-fetching (k×3)

**Production Alternative**: Pre-filtering with bitmap indexes for highly selective queries

---

### 3. Flat Index (Current) vs HNSW (Future)

| Metric | Flat Index | HNSW Index |
|--------|------------|------------|
| Search | O(n) - exact | O(log n) - approximate |
| Insert | O(1) | O(log n) |
| Memory | O(n·d) | O(n·(d + M)) |
| Recall | 100% | ~95-99% (tunable) |
| Best For | <10K vectors | Millions of vectors |

**Decision**: Start with Flat, add HNSW for scale

---

### 4. JSON Persistence vs Database

**Decision**: JSON snapshots for take-home assessment

**Pros**:
- Human-readable
- Easy to debug
- No external dependencies
- Simple implementation (~200 LOC)

**Cons**:
- Not suitable for production at scale
- May lose last N operations

**Production Alternative**: PostgreSQL with pgvector, or specialized vector DB like Qdrant

---

### 5. Thread Safety: RLock

**Decision**: Reentrant locks for synchronization

```python
with self._lock:
    # Thread-safe operations
    self._libraries[id] = library
```

**Pros**:
- Simple and correct
- Allows nested locking (same thread can acquire multiple times)

**Cons**:
- Coarse-grained locking (entire storage)

**Production Alternative**: Fine-grained locking per library or read-write locks
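
The value of a reentrant lock shows up when one locked method calls another. The sketch below uses a hypothetical `LibraryStore` class to demonstrate this; with a plain `threading.Lock`, the nested call in `rename` would deadlock.

```python
import threading


class LibraryStore:
    """Toy storage layer guarded by a single coarse-grained RLock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._libraries: dict[str, str] = {}

    def put(self, library_id: str, name: str) -> None:
        with self._lock:
            self._libraries[library_id] = name

    def rename(self, library_id: str, new_name: str) -> None:
        with self._lock:
            if library_id not in self._libraries:
                raise KeyError(library_id)
            # Nested acquisition: put() takes the same lock again, which an
            # RLock allows because this thread already holds it.
            self.put(library_id, new_name)

    def count(self) -> int:
        with self._lock:
            return len(self._libraries)


store = LibraryStore()


def worker(worker_id: int) -> None:
    for n in range(100):
        key = f"lib-{worker_id}-{n}"
        store.put(key, "draft")
        store.rename(key, "final")


threads = [threading.Thread(target=worker, args=(worker_id,)) for worker_id in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(store.count())  # 400
```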

---

### 6. Type Safety: Pydantic Models

**Decision**: Full Pydantic validation throughout

**Pros**:
- Runtime validation, plus static checking of type hints with tools such as mypy
- Auto-generated OpenAPI docs
- IDE autocomplete

**Cons**:
- Slight performance overhead
- More verbose code

**Worth it**: Type safety prevents entire classes of bugs

---

### 7. Client-Side Custom Filtering

**Decision**: Custom filter functions work on `SearchResult`, not `Chunk`

**Pros**:
- No circular dependencies (domain ↔ SDK)
- Access to similarity `score` (only in SearchResult)
- No fake data (empty embeddings)

**Cons**:
- Embedding not available for filtering (acceptable - not typically needed)

---

### Summary: Pragmatic Tradeoffs for Take-Home Assessment

| Feature | Current Approach | Production Alternative |
|---------|------------------|------------------------|
| Persistence | JSON snapshots | PostgreSQL + WAL |
| Filtering | Post-filtering | Pre-filtering with bitmaps |
| Index | Flat (O(n)) | HNSW (O(log n)) |
| Locking | Coarse RLock | Fine-grained locks |
| Storage | In-memory | Distributed (Redis, etc.) |

**Philosophy**: Start simple, scale where needed. The current design is:
- ✅ Easy to understand
- ✅ Demonstrably correct
- ✅ Production-ready for moderate scale (<100K vectors)
- ✅ Extensible to handle larger scale

## Cleanup (Optional)

In [None]:
# Delete the demo library
# client.delete_library(library_id=library.id)
# print("✓ Cleaned up demo data")

# Close client connection
client.close()
print("✓ Client connection closed")

## Next Steps

1. **Explore the API**: Visit http://localhost:8000/docs for interactive Swagger docs
2. **Check persistence**: Restart Docker container and re-run to see data preserved
3. **Review code**: See `src/my_vector_db/` for implementation details
4. **Run tests**: `uv run pytest` to verify all functionality

---

**Questions for Discussion**:
- When would you switch from Flat to HNSW index?
- How would you implement distributed storage?
- What's your approach to monitoring and observability?
- How would you handle schema migrations?