Optimization & Best Practices

**Referenced Files in This Document** - [service.ts](file://src/services/embedding/service.ts) - [bm25-tokenizer.ts](file://src/services/embedding/bm25-tokenizer.ts) - [providers.ts](file://src/services/embedding/providers.ts) - [config.ts](file://src/services/embedding/config.ts) - [types.ts](file://src/services/embedding/types.ts) - [embedding-metrics.ts](file://src/services/metrics/embedding-metrics.ts) - [audit.ts](file://src/services/embedding/audit.ts) - [health.ts](file://src/services/embedding/health.ts) - [store-init.ts](file://src/services/memory/store-init.ts) - [search.ts](file://src/services/qdrant/search.ts) - [memory-retrieval.ts](file://src/services/qdrant/memory-retrieval.ts) - [qdrant-vector-management.ts](file://src/utils/qdrant-vector-management.ts) - [redis-cache.ts](file://src/services/redis-cache.ts) - [http-health-routes.ts](file://src/http/http-health-routes.ts) - [kairos-search-scores.test.ts](file://tests/integration/kairos-search-scores.test.ts)

Introduction

This document provides comprehensive optimization and best practices for embedding systems in the repository. It covers BM25 tokenizer implementation for keyword-based search enhancement, hybrid search strategies combining dense and sparse vectors, embedding dimension optimization, batch sizing, provider selection, cost optimization, caching, performance benchmarking, quality assessment, model comparison, troubleshooting, storage optimization, retrieval tuning, and scalability for high-volume workloads.

Project Structure

The embedding optimization spans several modules:

Embedding service orchestrates provider selection, batching, anomaly detection, and metrics.
Provider implementations support OpenAI and TEI with retry logic and robust error handling.
BM25 tokenizer enables keyword-based sparse vector search for hybrid retrieval.
Qdrant integration supports hybrid search, vector management, and pagination.
Redis cache accelerates search results and integrates with invalidation channels.
Health checks and audit/logging provide observability and anomaly detection.

graph TB
subgraph "Embedding Layer"
ES["EmbeddingService<br/>service.ts"]
CFG["Embedding Config<br/>config.ts"]
PRV["Providers<br/>providers.ts"]
BM25["BM25 Tokenizer<br/>bm25-tokenizer.ts"]
AUD["Audit & Anomalies<br/>audit.ts"]
HM["Health Check<br/>health.ts"]
MET["Metrics<br/>embedding-metrics.ts"]
end
subgraph "Vector Store"
QCONN["Qdrant Connection<br/>search.ts"]
QVM["Vector Management<br/>qdrant-vector-management.ts"]
QINIT["Collection Init & BM25<br/>store-init.ts"]
end
subgraph "Caching"
RC["Redis Cache<br/>redis-cache.ts"]
end
ES --> PRV
ES --> CFG
ES --> MET
ES --> AUD
ES --> HM
ES --> BM25
ES --> QCONN
QCONN --> QVM
QCONN --> QINIT
QCONN --> RC

Diagram sources

service.ts:38-284
providers.ts:251-278
config.ts:12-36
bm25-tokenizer.ts:37-56
audit.ts:94-157
health.ts:16-119
embedding-metrics.ts:11-47
search.ts:11-82
qdrant-vector-management.ts:13-114
store-init.ts:130-148
redis-cache.ts:21-211

Section sources

service.ts:14-284
providers.ts:1-280
bm25-tokenizer.ts:1-57
config.ts:1-40
embedding-metrics.ts:1-51
audit.ts:1-197
health.ts:1-121
search.ts:1-82
qdrant-vector-management.ts:1-301
store-init.ts:1-155
redis-cache.ts:1-211

Core Components

EmbeddingService: Central orchestration for generating single and batch embeddings, selecting providers, validating dimensions, emitting metrics, and logging anomalies.
Providers: OpenAI and TEI implementations with retry/backoff, error categorization, and audit logging.
BM25 Tokenizer: Lightweight tokenizer producing sparse vectors for Qdrant BM25 search.
Qdrant Hybrid Search: Dense similarity search combined with BM25 sparse retrieval and reciprocal rank fusion.
Vector Management: Named vector creation, migration between dimensions, and safe recreation strategies.
Redis Cache: Search result caching with TTL and invalidation patterns.
Metrics and Audit: Embedding request counters, durations, vector sizes, batch sizes, and anomaly detection.

Section sources

service.ts:38-284
providers.ts:77-278
bm25-tokenizer.ts:37-56
search.ts:11-82
qdrant-vector-management.ts:13-202
redis-cache.ts:21-211
embedding-metrics.ts:11-47
audit.ts:94-157

Architecture Overview

The embedding pipeline integrates with Qdrant for hybrid retrieval and caches hot queries in Redis. Health checks and audit trails ensure reliability and observability.

sequenceDiagram
participant Client as "Client"
participant ES as "EmbeddingService"
participant PRV as "Providers"
participant Q as "Qdrant"
participant RC as "RedisCache"
Client->>ES : generateEmbedding(query)
ES->>PRV : postEmbeddings(query)
PRV-->>ES : embeddings[]
ES->>ES : validate dimension, norms, latency
ES-->>Client : embedding
Client->>RC : getSearchResult(query, limit)
alt cache miss
RC-->>Client : null
Client->>ES : generateEmbedding(query)
ES->>Q : hybrid search (dense + BM25 RRF)
Q-->>ES : results
ES-->>RC : cache results
ES-->>Client : results
else cache hit
RC-->>Client : cached results
end

Diagram sources

service.ts:47-127
providers.ts:251-278
search.ts:11-82
redis-cache.ts:36-70

Detailed Component Analysis

EmbeddingService: Dimension, Batching, Metrics, and Anomalies

Dimension resolution: Resolved at first successful call and enforced on subsequent calls to prevent mismatches.
Batch processing: Validates inputs, tracks batch size, and ensures consistent dimensions across vectors.
Metrics: Tracks request counts, durations, vector sizes, and batch sizes with tenant/provider labels.
Anomaly detection: Flags high latency, unusual vector norms, and dimension mismatches.

flowchart TD
Start(["generateEmbedding"]) --> Normalize["Normalize input text"]
Normalize --> EmptyCheck{"Empty?"}
EmptyCheck --> |Yes| ThrowEmpty["Throw error"]
EmptyCheck --> |No| CallProvider["postEmbeddings()"]
CallProvider --> Parse["Parse embeddings"]
Parse --> ValidateDim["Validate dimension"]
ValidateDim --> Anomaly["Detect anomalies"]
Anomaly --> |Critical| ThrowMismatch["Throw dimension mismatch"]
Anomaly --> |OK| EmitMetrics["Emit metrics & audit"]
EmitMetrics --> Return["Return embedding"]

Diagram sources

service.ts:47-127
audit.ts:94-157
embedding-metrics.ts:11-47

Section sources

service.ts:38-284
audit.ts:94-157
embedding-metrics.ts:11-47

Providers: OpenAI and TEI with Retry and Audit

Provider selection: Explicit preference or auto-detection with fallback.
Retry/backoff: Network transient errors and specific HTTP statuses retried with bounded attempts.
Audit logging: Captures provider calls with status, dimensions, latency, and optional HTTP status/error messages.

flowchart TD
PStart(["postEmbeddings"]) --> Pref{"Provider pref?"}
Pref --> |openai| OA["postEmbeddingsOpenAI"]
Pref --> |tei| TEI["postEmbeddingsTEI"]
Pref --> |auto| AutoOA{"OpenAI configured?"}
AutoOA --> |Yes| TryOA["Try OpenAI"]
TryOA --> OAErr{"Error?"}
OAErr --> |Yes & TEI configured| TEI
OAErr --> |No| Done["Return embeddings"]
AutoOA --> |No| TEI
TEI --> Done

Diagram sources

providers.ts:251-278
providers.ts:77-175
providers.ts:177-249

Section sources

providers.ts:14-47
providers.ts:77-249

BM25 Tokenizer: Sparse Vectors for Keyword Search

Tokenization: Lowercase, non-alphanumeric split, stop word removal, minimum token length.
Indexing: FNV-1a hash modulo sparse dimension.
Values: Sublinear term frequency 1 + log(count).
Output: Indices and values arrays for Qdrant sparse vector search.

flowchart TD
TStart(["tokenizeToSparse"]) --> Lower["Lowercase"]
Lower --> Split["Split on non-alphanumeric"]
Split --> Filter["Remove stop words & short tokens"]
Filter --> Count["Count term frequencies"]
Count --> Hash["FNV-1a hash mod 30000"]
Hash --> TF["Compute 1 + log(count)"]
TF --> Emit["Emit {indices, values}"]

Diagram sources

bm25-tokenizer.ts:37-56

Section sources

bm25-tokenizer.ts:1-57

Hybrid Search: Dense + BM25 with Reciprocal Rank Fusion

Dense search: Vector similarity using the primary dense vector.
BM25 sparse search: Keyword matching using BM25 sparse vectors.
Fusion: Reciprocal rank fusion (RRF) combines results from multiple prefetch queries.
Scoring: Boosts for title, activation patterns, labels, and tags.

sequenceDiagram
participant S as "Search"
participant ES as "EmbeddingService"
participant Q as "Qdrant"
participant BM25 as "BM25 Tokenizer"
S->>ES : generateEmbedding(query)
ES-->>S : query vector
S->>Q : queryVector (using primary dense)
S->>Q : BM25 query (sparse)
Q-->>S : dense results
Q-->>S : BM25 results
S->>S : RRF fusion with boosts
S-->>Client : ranked results

Diagram sources

search.ts:11-82
bm25-tokenizer.ts:37-56

Section sources

search.ts:11-82
store-init.ts:130-148

Vector Management: Adding Named Vectors, Migration, and Safe Recreation

Add named vectors: Attempts updateCollection; falls back to safe recreation preserving data.
Migrate vector space: Scrolls points with old vector, re-embeds content, and upserts new vectors in batches.
Remove vector: Recreates collection without the target vector and restores data.

flowchart TD
VMStart(["addVectorsToCollection"]) --> TryUpdate["Try updateCollection"]
TryUpdate --> |Success| Done["Done"]
TryUpdate --> |Fail| Gather["Scroll all points"]
Gather --> Recreate["Delete & recreate collection with merged vectors"]
Recreate --> Restore["Upsert points in batches"]
Restore --> Done

Diagram sources

qdrant-vector-management.ts:13-114
qdrant-vector-management.ts:126-202
qdrant-vector-management.ts:208-301

Section sources

qdrant-vector-management.ts:13-114
qdrant-vector-management.ts:126-202
qdrant-vector-management.ts:208-301

Redis Cache: Search Result Caching and Invalidation

Caching: Stores search results with TTL and distinguishes collapsed vs natural modes.
Invalidation: Provides targeted invalidation for begin/activate caches and a pub/sub channel for broader invalidation.
Integration: Used by higher-level search flows to accelerate repeated queries.

flowchart TD
RS(["getSearchResult"]) --> BuildKey["Build cache key"]
BuildKey --> KVGet["keyValueStore.getJson"]
KVGet --> Hit{"Cache hit?"}
Hit --> |Yes| IncHit["Increment hits"] --> ReturnHit["Return cached"]
Hit --> |No| IncMiss["Increment misses"] --> ReturnMiss["Return null"]
WS(["setSearchResult"]) --> BuildKey2["Build cache key"]
BuildKey2 --> Serialize["Serialize results"]
Serialize --> KVSet["keyValueStore.setJson(TTL)"]
KVSet --> Done["Done"]

Diagram sources

redis-cache.ts:36-70
redis-cache.ts:186-211

Section sources

redis-cache.ts:21-211

Health Checks and Observability

Health: Determines provider availability, handles rate limits and auth failures, and probes endpoint readiness.
Audit: Emits structured logs for successes and errors, including dimensions, latency, and error messages.
Metrics: Exposes counters and histograms for embedding requests, durations, errors, vector sizes, and batch sizes.

flowchart TD
HC(["runEmbeddingHealthCheck"]) --> Pref{"Provider pref?"}
Pref --> |openai| OAHC["Check OpenAI"]
Pref --> |tei| TEIHC["Check TEI"]
Pref --> |auto| TryOAHC["Try OpenAI"]
TryOAHC --> OAHC
OAHC --> OAStatus{"Operational?"}
OAStatus --> |Yes| ReportOK["Report healthy"]
OAStatus --> |No| CheckTEI["Try TEI if configured"]
CheckTEI --> TEIStatus{"Operational?"}
TEIStatus --> |Yes| ReportOK
TEIStatus --> |No| ReportFail["Report unhealthy"]

Diagram sources

health.ts:16-119
audit.ts:60-92
embedding-metrics.ts:11-47

Section sources

health.ts:16-119
audit.ts:60-92
embedding-metrics.ts:11-47

Dependency Analysis

EmbeddingService depends on provider implementations, config, metrics, audit, and health modules.
Qdrant search depends on EmbeddingService for query vectors and BM25 tokenizer for sparse queries.
Vector management utilities depend on Qdrant client APIs and collection metadata.
Redis cache integrates with the key-value store abstraction and invalidation patterns.

graph LR
ES["EmbeddingService"] --> PRV["Providers"]
ES --> CFG["Config"]
ES --> MET["Metrics"]
ES --> AUD["Audit"]
ES --> HM["Health"]
ES --> BM25["BM25 Tokenizer"]
ES --> QSRCH["Qdrant Search"]
QSRCH --> QVM["Vector Management"]
QSRCH --> QINIT["Collection Init"]
QSRCH --> RC["Redis Cache"]

Diagram sources

service.ts:38-284
providers.ts:251-278
config.ts:12-36
embedding-metrics.ts:11-47
audit.ts:94-157
health.ts:16-119
bm25-tokenizer.ts:37-56
search.ts:11-82
qdrant-vector-management.ts:13-114
store-init.ts:130-148
redis-cache.ts:21-211

Section sources

service.ts:38-284
providers.ts:251-278
search.ts:11-82
qdrant-vector-management.ts:13-114
redis-cache.ts:21-211

Performance Considerations

Embedding dimension optimization
- Resolve dimension at startup via a minimal embedding probe to cache and enforce consistent dimensions across requests.
- Ensure vector size metrics reflect float32 size (4 bytes per float) to estimate storage and network overhead.
Batch size tuning
- Track batch sizes with histograms to identify optimal throughput vs latency trade-offs.
- Filter out empty inputs and compute character lengths for cost-aware batching.
Provider selection strategies
- Prefer explicit provider configuration for predictable behavior; auto-detection with OpenAI preferred when both are configured.
- Use health checks to detect rate limits and authentication failures and adjust fallbacks accordingly.
Hybrid search tuning
- Adjust prefetch limits per vector type and combine with RRF to balance precision and recall.
- Apply boost weights for title, activation patterns, labels, and tags to align with domain semantics.
Caching and storage
- Use Redis cache for frequent queries with TTL; invalidate on memory updates to maintain freshness.
- For vector migrations, batch upserts and preserve payload to minimize downtime and memory pressure.
Cost optimization
- Monitor embedding vector sizes and request counts; reduce dimensionality or query frequency where appropriate.
- Use BM25 sparse vectors to improve keyword recall without increasing dense vector costs.
Benchmarking and quality assessment
- Maintain baselines for top scores and query performance; compare current results against historical baselines to detect regressions.
- Use anomaly detection for latency and vector norms to flag potential provider or model issues.

[No sources needed since this section provides general guidance]

Troubleshooting Guide

Common issues and resolutions:

Provider misconfiguration
- Symptoms: Authentication failures, rate limits, or missing endpoints.
- Actions: Verify API keys and model names; use health checks to confirm connectivity and status codes.
Dimension mismatch
- Symptoms: Errors indicating unexpected embedding dimensions.
- Actions: Ensure a startup probe resolves the dimension; enforce dimension validation on all subsequent calls.
Latency spikes and anomalies
- Symptoms: High-latency requests or unusual vector norms.
- Actions: Review audit logs and anomaly events; adjust provider selection or retry policies.
Cache invalidation
- Symptoms: Stale search results after memory updates.
- Actions: Trigger invalidation for begin/activate caches and ensure pub/sub invalidation is enabled.
Vector migration failures
- Symptoms: Partial migrations or inconsistencies after dimension changes.
- Actions: Use safe recreation strategy and batch upserts; monitor progress and handle partial failures gracefully.

Section sources

health.ts:16-119
audit.ts:94-157
redis-cache.ts:186-211
qdrant-vector-management.ts:13-114

Conclusion

This repository implements a robust, observable, and scalable embedding system with hybrid dense/sparse retrieval, provider flexibility, and strong operational controls. By leveraging BM25 tokenization, vector management utilities, Redis caching, and comprehensive metrics and audit logging, teams can optimize embedding quality, performance, and cost while maintaining reliability at scale.

[No sources needed since this section summarizes without analyzing specific files]

Appendices

Practical Examples and Techniques

Embedding quality assessment
- Compare vector norms and latencies over time; flag anomalies exceeding thresholds.
- Validate embedding dimensions across batches and reject inconsistent results.
Model comparison
- Run health checks and measure latency distributions for different providers/models.
- Benchmark hybrid search top scores against historical baselines to detect regressions.
Benchmarking methods
- Use histogram metrics for embedding durations and batch sizes to identify bottlenecks.
- Track embedding vector sizes to estimate storage and bandwidth needs.

Section sources

audit.ts:94-157
embedding-metrics.ts:11-47
kairos-search-scores.test.ts:103-134

Optimization & Best Practices

Optimization & Best Practices

Table of Contents

Introduction

Project Structure

Core Components

Architecture Overview

Detailed Component Analysis

EmbeddingService: Dimension, Batching, Metrics, and Anomalies

Providers: OpenAI and TEI with Retry and Audit

BM25 Tokenizer: Sparse Vectors for Keyword Search

Hybrid Search: Dense + BM25 with Reciprocal Rank Fusion

Vector Management: Adding Named Vectors, Migration, and Safe Recreation

Redis Cache: Search Result Caching and Invalidation

Health Checks and Observability

Dependency Analysis

Performance Considerations

Troubleshooting Guide

Conclusion

Appendices

Practical Examples and Techniques

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!