# 🗄️ Vector Databases

## Storing and Searching Embeddings at Scale

---


```python
import numpy as np

print('✅ Vector DB concepts!')
```


## Why Vector Databases?

- **Traditional DB**: Exact match on keys or keywords
- **Vector DB**: Semantic similarity search over embeddings

### Core Operations

1. **Insert**: Add vector + metadata
2. **Search**: Find k nearest neighbors
3. **Update**: Modify vectors
4. **Delete**: Remove vectors
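
To make the four operations concrete, here is a minimal brute-force sketch in NumPy. `TinyVectorStore` and its method names are hypothetical and written only for illustration; they are not the API of any real vector database. It assumes non-zero vectors and scans every stored vector on each search.

```python
import numpy as np


class TinyVectorStore:
    """Brute-force, in-memory store (hypothetical, for illustration only)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}   # id -> np.ndarray of shape (dim,)
        self.metadata = {}  # id -> dict

    def insert(self, id_, vector, metadata=None):
        vec = np.asarray(vector, dtype=np.float32)
        if vec.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {vec.shape}")
        self.vectors[id_] = vec
        self.metadata[id_] = metadata or {}

    def update(self, id_, vector, metadata=None):
        # Replace the vector of an existing id; keep old metadata if none is given
        if id_ not in self.vectors:
            raise KeyError(id_)
        keep = self.metadata[id_] if metadata is None else metadata
        self.insert(id_, vector, keep)

    def delete(self, id_):
        self.vectors.pop(id_, None)
        self.metadata.pop(id_, None)

    def search(self, query, k=3):
        """Return up to k (id, cosine score, metadata) tuples, best first."""
        if not self.vectors:
            return []
        ids = list(self.vectors)
        matrix = np.stack([self.vectors[i] for i in ids])
        q = np.asarray(query, dtype=np.float32)
        scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
        top = np.argsort(-scores)[:k]
        return [(ids[i], float(scores[i]), self.metadata[ids[i]]) for i in top]


store = TinyVectorStore(dim=3)
store.insert("doc1", [1.0, 0.0, 0.0], {"text": "AI is amazing"})
store.insert("doc2", [0.0, 1.0, 0.0], {"text": "ML is powerful"})
store.update("doc2", [0.9, 0.1, 0.0])
store.delete("doc1")

hits = store.search([1.0, 0.0, 0.0], k=1)
print(hits[0][0])  # doc2
```

Real vector databases expose the same four operations but replace the linear scan in `search` with an index, so that queries stay fast as the collection grows.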

### Similarity Metrics

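- **Cosine similarity**: $\frac{a \cdot b}{\|a\|\|b\|}$ (most common; higher is closer)
- **Euclidean distance**: $\sqrt{\sum_i (a_i - b_i)^2}$ (lower is closer)
- **Dot product**: $a \cdot b$ (higher is closer; equals cosine when both vectors are unit-normalized)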
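
The three metrics can be checked by hand with a few lines of NumPy. This is a small sketch; the function names and the example vectors `a` and `b` are chosen here for illustration only.

```python
import numpy as np


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))


def dot_product(a, b):
    return float(np.dot(a, b))


a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 4.0, 4.0])  # same direction as a, twice the length

print(cosine_similarity(a, b))   # 1.0  (direction only, length ignored)
print(euclidean_distance(a, b))  # 3.0  (sensitive to length)
print(dot_product(a, b))         # 18.0 (grows with length)

# After normalizing to unit length, dot product and cosine agree
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(np.isclose(dot_product(a_unit, b_unit), cosine_similarity(a, b)))  # True
```

Because `a` and `b` point the same way, cosine treats them as identical while Euclidean distance and dot product still see the difference in length. This is one reason embeddings are often normalized before indexing.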

### HNSW (Hierarchical Navigable Small World)

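**The secret sauce** of fast approximate nearest-neighbor (ANN) search

- **Graph-based** search structure
- **Multiple layers**: Coarse → Fine (sparse upper layers for long jumps, dense bottom layer for refinement)
- **Complexity**: Roughly O(log n) per query in practice vs O(n) for brute force
- **Trade-off**: Extra memory for the graph and approximate (not guaranteed exact) results, in exchange for speed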
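
The core idea behind the graph search can be sketched with a single layer: link each point to its nearest neighbors, then walk the graph greedily toward the query. This is a deliberately simplified illustration, not HNSW itself. Real HNSW stacks several such layers, keeps a beam of candidates instead of one, and uses a smarter neighbor-selection rule. The helper names `build_knn_graph` and `greedy_search`, the random data, and the parameter `m` are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 8))  # 200 random 8-dimensional vectors


def build_knn_graph(points, m=8):
    """Link every point to its m nearest neighbours (brute force, for clarity)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    return np.argsort(dists, axis=1)[:, :m]


def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query."""
    current = entry
    current_dist = float(np.linalg.norm(points[current] - query))
    n_computed = 1  # number of distance computations so far
    while True:
        neighbours = graph[current]
        neighbour_dists = np.linalg.norm(points[neighbours] - query, axis=1)
        n_computed += len(neighbours)
        best = int(np.argmin(neighbour_dists))
        if neighbour_dists[best] >= current_dist:
            # No neighbour is closer: a local minimum, possibly not the true nearest
            return current, current_dist, n_computed
        current = int(neighbours[best])
        current_dist = float(neighbour_dists[best])


graph = build_knn_graph(points)
query = rng.normal(size=8)

found, found_dist, n_computed = greedy_search(points, graph, query)
exact = int(np.argmin(np.linalg.norm(points - query, axis=1)))
print(f"greedy: {found}, exact: {exact}, "
      f"distances computed: {n_computed} of {len(points)}")
```

The greedy walk can stop at a local minimum, which is why the result is approximate. The upper layers in HNSW exist largely to give the walk a good starting point in the dense bottom layer.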


## Chroma Example

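```python
import chromadb

# In-memory client; data is lost when the process exits
client = chromadb.Client()
collection = client.create_collection('docs')

# Add documents; Chroma embeds them with its default embedding function
collection.add(
    documents=['AI is amazing', 'ML is powerful'],
    ids=['doc1', 'doc2']
)

# Query by text; matches come back nearest first
results = collection.query(
    query_texts=['artificial intelligence'],
    n_results=2
)
print(results['ids'], results['distances'])
```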

## Qdrant Example

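```python
from qdrant_client import QdrantClient, models

# ':memory:' runs Qdrant's local mode in-process; nothing is persisted
client = QdrantClient(':memory:')

client.create_collection(
    collection_name='docs',
    # size must match the dimension of the embeddings you store
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE)
)

# upsert inserts new points and overwrites points whose id already exists
client.upsert(
    collection_name='docs',
    points=[
        # Placeholder vector and payload, for illustration only
        models.PointStruct(id=1, vector=[0.1] * 384, payload={'text': 'AI is amazing'})
    ]
)
```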
