# 🗃️ Week 07-08 · Notebook 06 · Vector Database Benchmarking

Compare Chroma, FAISS, and pgvector for storing manufacturing knowledge and supporting plant SLAs.

## 🎯 Learning Objectives
- Ingest identical document sets into Chroma, FAISS, and pgvector.
- Benchmark latency, recall, storage cost, and operational complexity.
- Track evaluation runs in MLflow for future audits.
- Produce a recommendation matrix for plant deployments.
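
Storage cost starts with the raw vectors. A rough lower bound for float32 embeddings is vectors × dimensions × 4 bytes, and `all-MiniLM-L6-v2` (the model used below) produces 384-dimensional vectors. The sketch below is a floor, not a quote: it ignores index structures (for example HNSW graph links), metadata, document text, and Postgres row overhead. The helper name is hypothetical.

```python
FLOAT32_BYTES = 4

def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Lower-bound storage for dense vectors: count x dimension x bytes per value."""
    return num_vectors * dim * bytes_per_value

# 200 documents embedded with a 384-dimensional model, stored as float32
print(raw_vector_bytes(200, 384))  # 307200 bytes, i.e. 300 KiB
```

Comparing this floor with the on-disk size of each store (Chroma directory, FAISS index file, Postgres table plus index) isolates how much each engine adds on top of the vectors themselves.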

## 🧩 Scenario
The Pune and Monterrey plants are debating whether to host the retrieval store on-prem (FAISS) or on a managed service (pgvector on CloudSQL). Leadership requires benchmark data to justify the decision.

```python
import shutil
import time

import mlflow
import numpy as np
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.documents import Document

# --- Setup ---
# Synthetic, templated documents that mimic CNC SOPs and press maintenance logs.
# Every store ingests this identical list so the timings are comparable.
docs = [
    Document(page_content=f'Standard Operating Procedure for CNC Machine {i+1}: Regular maintenance includes checking coolant levels and ensuring safety guards are in place. Spindle calibration is required every 200 hours.', metadata={'doc_id': f'SOP-CNC-{i+1}'}) for i in range(100)
] + [
    Document(page_content=f'Maintenance Log {i+1}: Technician [REDACTED] reported unusual noise from Press Machine {i+1}. Root cause was a loose bolt, which was tightened.', metadata={'doc_id': f'LOG-PRESS-{i+1}'}) for i in range(100)
]

# Standard, lightweight embedding model (384-dimensional output)
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

# --- Benchmarking Function ---
def benchmark(store_cls, name: str, docs: list, embedding_function, n_queries: int = 10):
    """
    Benchmark a vector store for ingestion time and query latency, and log the results to MLflow.
    """
    print(f"--- Benchmarking {name} ---")

    # 1. Ingestion time (includes embedding every document)
    if store_cls is Chroma:
        # Persistent Chroma appends to the existing collection on re-runs;
        # clear the directory first so every run ingests into an empty store.
        persist_dir = f"./chroma_db_{name}"
        shutil.rmtree(persist_dir, ignore_errors=True)
        start_ingest = time.perf_counter()
        vector_store = store_cls.from_documents(docs, embedding_function, persist_directory=persist_dir)
    else:  # FAISS builds an in-memory index
        start_ingest = time.perf_counter()
        vector_store = store_cls.from_documents(docs, embedding_function)
    ingest_time_ms = (time.perf_counter() - start_ingest) * 1000
    print(f"Ingestion Time: {ingest_time_ms:.2f} ms")

    # 2. Query latency (each call embeds the query, then searches the index)
    query = "What is the procedure for spindle calibration?"
    latencies = []
    for _ in range(n_queries):
        start_query = time.perf_counter()
        vector_store.similarity_search(query, k=4)
        latencies.append((time.perf_counter() - start_query) * 1000)

    avg_latency_ms = float(np.mean(latencies))
    p95_latency_ms = float(np.percentile(latencies, 95))  # only indicative with few samples
    print(f"Average Query Latency: {avg_latency_ms:.2f} ms")
    print(f"P95 Query Latency: {p95_latency_ms:.2f} ms\n")

    # 3. Log parameters and metrics to MLflow for later audits
    with mlflow.start_run(run_name=f"VectorStore_Benchmark_{name}"):
        mlflow.log_params({"vector_store": name, "num_documents": len(docs), "num_queries": n_queries})
        mlflow.log_metrics({
            "ingestion_time_ms": ingest_time_ms,
            "avg_query_latency_ms": avg_latency_ms,
            "p95_query_latency_ms": p95_latency_ms,
        })

    return {
        "name": name,
        "ingestion_time_ms": ingest_time_ms,
        "avg_query_latency_ms": avg_latency_ms,
        "p95_query_latency_ms": p95_latency_ms,
    }

# --- Run Benchmarks ---
# pgvector needs a running Postgres instance and is benchmarked separately
vector_stores_to_benchmark = [
    {"class": Chroma, "name": "Chroma"},
    {"class": FAISS, "name": "FAISS"},
]

results = []
for store_info in vector_stores_to_benchmark:
    results.append(benchmark(store_info["class"], store_info["name"], docs, embeddings))

print("--- Benchmark Summary ---")
for res in results:
    print(f"{res['name']}: Ingestion={res['ingestion_time_ms']:.2f}ms, "
          f"Avg Query={res['avg_query_latency_ms']:.2f}ms, P95 Query={res['p95_query_latency_ms']:.2f}ms")
```
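
The timing loop in `benchmark` records 10 samples with no warm-up, so the first call can include one-off costs, and the P95 of 10 samples sits close to the slowest run. A minimal, store-agnostic sketch of a sturdier measurement (the helper name and defaults are hypothetical) runs a few untimed warm-up calls, takes more samples, and reports a nearest-rank P95:

```python
import math
import statistics
import time
from typing import Callable

def measure_latency_ms(fn: Callable[[], object], runs: int = 50, warmup: int = 3) -> dict:
    """Call fn() `warmup` times untimed, then time `runs` calls and summarize in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # absorbs one-off costs (cold caches, lazy initialization); not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    # Nearest-rank percentile: smallest sample with at least 95% of samples at or below it
    p95_index = math.ceil(0.95 * runs) - 1
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "runs": runs,
    }
```

Inside `benchmark`, a call such as `measure_latency_ms(lambda: vector_store.similarity_search(query, k=4))` could replace the hand-written loop. Nearest-rank keeps the P95 equal to a latency that was actually observed instead of an interpolated value.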

> ⚠️ pgvector benchmarking requires a running Postgres instance with the `vector` extension enabled. Refer to `infrastructure/pgvector_setup.sql`.
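
For the pgvector run, the `PGVector` store in `langchain_community` reads a SQLAlchemy-style connection string. A minimal sketch of building one for the CloudSQL dev instance, assuming the `psycopg2` driver; the helper name, host, database, and credentials are all hypothetical placeholders. User and password are URL-encoded so characters such as `@` or `/` do not break parsing.

```python
from urllib.parse import quote_plus

def pgvector_connection_string(host: str, port: int, database: str, user: str,
                               password: str, driver: str = "psycopg2") -> str:
    """Build a URL of the form postgresql+<driver>://user:password@host:port/database."""
    return f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"

# Placeholder values only; load real credentials from a secret manager, not source code
conn = pgvector_connection_string("10.0.0.5", 5432, "rag_dev", "bench_user", "p@ss/word")
print(conn)  # postgresql+psycopg2://bench_user:p%40ss%2Fword@10.0.0.5:5432/rag_dev
```

With the `langchain_community` version, the string can be passed as the `connection_string` keyword to `PGVector.from_documents`. Newer releases move this store to the `langchain_postgres` package, which takes a `connection` argument and the `psycopg` (v3) driver, so check the installed version before reusing the sketch.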

## 📊 Evaluation Matrix
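| Criterion | Chroma | FAISS | pgvector |
|---|---|---|---|
| Ingestion time (ms) |  |  |  |
| Avg query latency (ms) |  |  |  |
| P95 query latency (ms) |  |  |  |
| Recall@k |  |  |  |
| Storage cost |  |  |  |
| Operational complexity |  |  |  |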
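
The latency rows of the matrix can be filled from the `results` list returned by `benchmark`. A minimal sketch (the helper name is hypothetical) that renders those rows as markdown; recall, storage cost, and operational complexity come from the lab steps and are not computed here. A store with no result yet, such as pgvector before its benchmark runs, is shown as `n/a`.

```python
from typing import Dict, List, Sequence

def latency_rows(results: List[Dict], stores: Sequence[str] = ("Chroma", "FAISS", "pgvector")) -> str:
    """Render ingestion and latency rows of the evaluation matrix as markdown table lines."""
    by_name = {r["name"]: r for r in results}
    metrics = [
        ("Ingestion time (ms)", "ingestion_time_ms"),
        ("Avg query latency (ms)", "avg_query_latency_ms"),
        ("P95 query latency (ms)", "p95_query_latency_ms"),
    ]
    lines = []
    for label, key in metrics:
        # One cell per store, in column order; missing stores get "n/a"
        cells = [f"{by_name[s][key]:.1f}" if s in by_name else "n/a" for s in stores]
        lines.append(f"| {label} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling `print(latency_rows(results))` after the benchmark cell yields lines that can be pasted under the matrix header.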

## 🧪 Lab Assignment
1. Run the pgvector benchmark against the CloudSQL dev instance and capture the metrics.
2. Evaluate recall by scoring each store against the labeled maintenance Q&A set.
3. Log all benchmark results to MLflow (`mlflow.log_metrics` and `mlflow.log_dict`).
4. Draft a recommendation memo for the CIO summarizing the trade-offs.
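
For step 2, one common metric is recall@k: the fraction of a question's relevant documents that appear in the top-k results. A minimal sketch, assuming the labeled Q&A set maps each question to the set of relevant `doc_id` values (the label format, helper names, and example IDs are hypothetical); the retrieved IDs for a store would come from each hit's `metadata['doc_id']`.

```python
from typing import Dict, List, Set

def recall_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of relevant doc_ids that appear among the first k retrieved doc_ids."""
    if not relevant_ids:
        raise ValueError("each labeled question needs at least one relevant doc_id")
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

def mean_recall_at_k(runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over every labeled question; runs maps question -> ranked doc_ids."""
    if not labels:
        raise ValueError("labels must contain at least one question")
    scores = [recall_at_k(runs.get(q, []), relevant, k) for q, relevant in labels.items()]
    return sum(scores) / len(scores)

# Hypothetical labels and rankings, for illustration only
labels = {"spindle calibration interval?": {"SOP-CNC-1", "SOP-CNC-2"},
          "press noise root cause?": {"LOG-PRESS-7"}}
runs = {"spindle calibration interval?": ["SOP-CNC-1", "LOG-PRESS-3", "SOP-CNC-9", "SOP-CNC-2"],
        "press noise root cause?": ["LOG-PRESS-2", "LOG-PRESS-7"]}
print(mean_recall_at_k(runs, labels, k=2))  # 0.75
```

Each store's mean could then be logged with `mlflow.log_metrics`, and the per-question scores with `mlflow.log_dict`, which covers step 3 as well.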

## ✅ Checklist
- [ ] Benchmarks executed for all stores
- [ ] Metrics logged
- [ ] Recommendation memo drafted
- [ ] Governance evidence archived

## 📚 References
- LangChain Vectorstore Docs
- pgvector Extension Guide
- Week 05 Data Storage Policy