Embedding Services Architecture
Changes Made
- Enhanced provider abstraction documentation with comprehensive fallback mechanisms
- Expanded configuration management coverage including environment variable precedence
- Added detailed health monitoring and anomaly detection sections
- Updated BM25 tokenizer integration with hybrid search capabilities
- Improved audit trail system documentation with structured logging
- Enhanced Qdrant integration documentation with vector management
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document describes the KAIROS MCP embedding services layer. It covers the provider abstraction over multiple backends (OpenAI, TEI), initialization and provider selection logic, health monitoring, hybrid search with BM25 tokenization, audit trails, configuration management, error handling and fallbacks, caching and batching, and the relationship between embedding models and Qdrant vector storage.
Project Structure
The embedding subsystem is organized into the following modules:
- Provider abstraction and selection logic with automatic fallback
- Health checks and anomaly detection
- BM25 tokenizer for sparse vectors
- Audit logging and metrics
- Integration with Qdrant vector storage and hybrid search
graph TB
subgraph "Embedding Layer"
ES["EmbeddingService<br/>service.ts"]
CFG["Embedding Config<br/>config.ts"]
PRV["Providers<br/>providers.ts"]
AUD["Audit & Anomalies<br/>audit.ts"]
HLT["Health Check<br/>health.ts"]
BM25["BM25 Tokenizer<br/>bm25-tokenizer.ts"]
TYP["Types<br/>types.ts"]
end
subgraph "Qdrant Integration"
QSRV["QdrantService<br/>qdrant/service.ts"]
QSRC["Qdrant Search<br/>qdrant/search.ts"]
QINIT["Store Init & Vectors<br/>memory/store-init.ts"]
QMETH["Memory Methods<br/>memory/store-methods.ts"]
QVT["Vector Types<br/>utils/qdrant-vector-types.ts"]
QVM["Vector Management<br/>utils/qdrant-vector-management.ts"]
end
ES --> PRV
ES --> CFG
ES --> AUD
ES --> HLT
ES --> BM25
ES --> TYP
QSRV --> QSRC
QSRV --> QINIT
QSRV --> QMETH
QSRV --> QVT
QSRV --> QVM
QSRC --> ES
QMETH --> ES
QINIT --> ES
Diagram sources
- service.ts:38-284
- providers.ts:251-278
- config.ts:12-36
- audit.ts:60-157
- health.ts:16-119
- bm25-tokenizer.ts:37-56
- types.ts:1-17
- qdrant/service.ts:16-152
- search.ts:11-82
- store-methods.ts:150-231
- store-init.ts:1-28
- qdrant-vector-types.ts:8-35
- qdrant-vector-management.ts:126-202
Section sources
- service.ts:1-293
- providers.ts:1-280
- config.ts:1-40
- audit.ts:1-197
- health.ts:1-121
- bm25-tokenizer.ts:1-57
- types.ts:1-17
- qdrant/service.ts:1-152
- search.ts:1-82
- store-methods.ts:150-231
- store-init.ts:1-28
- qdrant-vector-types.ts:1-57
- qdrant-vector-management.ts:1-301
Core Components
- EmbeddingService: orchestrates embedding generation, batch processing, cosine similarity, memory embedding composition, health checks, provider selection, and configuration exposure. It integrates metrics, audit logs, and anomaly detection.
- Providers: encapsulate OpenAI and TEI embedding endpoints, including retry logic, transient error handling, and response normalization.
- Config: manages runtime dimension resolution and endpoint construction for embedding providers.
- Health: performs provider-specific health checks and reports operational status.
- Audit: records embedding operations and detects anomalies (latency, vector norms, dimension mismatches).
- BM25 Tokenizer: converts text to sparse vectors for hybrid search with Qdrant.
- Qdrant Integration: initializes collections with named vectors, executes hybrid queries combining dense and sparse signals, and manages vector migrations.
Section sources
- service.ts:38-284
- providers.ts:77-278
- config.ts:12-36
- health.ts:16-119
- audit.ts:60-157
- bm25-tokenizer.ts:37-56
- qdrant/service.ts:16-152
Architecture Overview
The embedding layer follows a provider abstraction pattern with explicit selection logic and a fallback from OpenAI to TEI. Qdrant consumes the resulting embeddings for hybrid search, which combines dense vector similarity with BM25 sparse vector matching.
sequenceDiagram
participant Client as "Client"
participant ES as "EmbeddingService"
participant PRV as "Providers"
participant OA as "OpenAI Endpoint"
participant TI as "TEI Endpoint"
participant QD as "Qdrant"
Client->>ES : generateEmbedding(query)
ES->>ES : getProvider()
alt Provider=OpenAI
ES->>PRV : postEmbeddingsOpenAI(query)
PRV->>OA : POST /v1/embeddings
OA-->>PRV : embeddings[]
PRV-->>ES : embeddings[]
else Provider=TEI
ES->>PRV : postEmbeddingsTEI(query)
PRV->>TI : POST /v1/embeddings
TI-->>PRV : embeddings[]
PRV-->>ES : embeddings[]
else Auto/Local
ES->>PRV : postEmbeddings(query)
PRV->>OA : POST /v1/embeddings
OA-->>PRV : embeddings[] or error
alt OpenAI error and TEI configured
PRV->>TI : POST /v1/embeddings
TI-->>PRV : embeddings[]
end
PRV-->>ES : embeddings[]
end
ES->>ES : detectEmbeddingAnomalies()
ES->>QD : searchMemory(embedding)
QD-->>Client : results
Diagram sources
- service.ts:47-127
- providers.ts:251-278
- providers.ts:77-175
- providers.ts:177-249
- search.ts:11-82
Detailed Component Analysis
- Explicit preference via environment variable selects OpenAI or TEI.
- Auto-detection prefers OpenAI when credentials are present; falls back to TEI if configured.
- Local fallback is available when neither provider is configured.
- Both providers implement retry logic for transient network errors and specific HTTP statuses.
flowchart TD
Start(["postEmbeddings(input)"]) --> Pref["Read EMBEDDING_PROVIDER"]
Pref --> |openai| CheckOA{"OPENAI_API_KEY + MODEL set?"}
CheckOA --> |Yes| OA["postEmbeddingsOpenAI"]
CheckOA --> |No| TEIChk{"TEI_BASE_URL + MODEL set?"}
Pref --> |tei| CheckTI{"TEI_BASE_URL + MODEL set?"}
CheckTI --> |Yes| TI["postEmbeddingsTEI"]
CheckTI --> |No| Error["Throw: TEI not configured"]
Pref --> |auto| OAauto{"OPENAI vars present?"}
OAauto --> |Yes| TryOA["Try OpenAI"]
TryOA --> OAok{"Success?"}
OAok --> |Yes| Done["Return embeddings"]
OAok --> |No| TEIauto{"TEI vars present?"}
TEIauto --> |Yes| TryTI["Try TEI"]
TryTI --> Done
OAauto --> |No| TEIauto2{"TEI vars present?"}
TEIauto2 --> |Yes| TI2["postEmbeddingsTEI"]
TEIauto2 --> |No| Error2["Throw: no provider configured"]
Diagram sources
- providers.ts:251-278
- service.ts:258-265
Section sources
- service.ts:258-265
- providers.ts:251-278
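The selection flow above can be approximated with a small sketch. This is not the code in providers.ts: the function names, the `Env` shape, and the variable names `OPENAI_EMBEDDING_MODEL` and `TEI_MODEL` are assumptions standing in for the "MODEL" settings the flowchart mentions. The behavior when `openai` is requested but not configured is also assumed here (it falls through to auto-detection).

```typescript
// Hypothetical sketch of provider selection with fallback; names are illustrative.
type Provider = "openai" | "tei";
type Env = Record<string, string | undefined>;
type EmbedFn = (input: string[]) => Promise<number[][]>;

// Returns the providers to try, in order, for the given environment.
function resolveProviderOrder(env: Env): Provider[] {
  const preference = (env.EMBEDDING_PROVIDER ?? "auto").toLowerCase();
  // Assumed variable names for the model settings.
  const openaiReady = Boolean(env.OPENAI_API_KEY && env.OPENAI_EMBEDDING_MODEL);
  const teiReady = Boolean(env.TEI_BASE_URL && env.TEI_MODEL);

  if (preference === "tei") {
    if (!teiReady) throw new Error("TEI not configured");
    return ["tei"];
  }
  if (preference === "openai" && openaiReady) return ["openai"];

  // Auto-detection: prefer OpenAI, keep TEI as the fallback when configured.
  const order: Provider[] = [];
  if (openaiReady) order.push("openai");
  if (teiReady) order.push("tei");
  if (order.length === 0) throw new Error("No embedding provider configured");
  return order;
}

// Tries each provider in order and returns the first successful result.
async function embedWithFallback(
  input: string[],
  order: Provider[],
  backends: Record<Provider, EmbedFn>,
): Promise<number[][]> {
  let lastError: unknown = new Error("No provider attempted");
  for (const provider of order) {
    try {
      return await backends[provider](input);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next provider
    }
  }
  throw lastError;
}
```

Separating the ordering decision from the network calls keeps the selection rules testable without any HTTP traffic.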
- Dimension probing ensures consistent vector sizes across requests.
- Metrics capture request counts, durations, errors, vector sizes, and batch sizes.
- Audit logs record success and error events with tenant and request identifiers.
- Anomaly detection flags high latency, unusual vector norms, and dimension mismatches.
classDiagram
class EmbeddingService {
-embeddingDimension : number
+generateEmbedding(text) : Promise~EmbeddingResult~
+generateBatchEmbeddings(texts) : Promise~BatchEmbeddingResult~
+calculateCosineSimilarity(a,b) : number
+generateMemoryEmbedding(memory) : Promise~number[]~
+healthCheck() : Promise~Result~
+getProvider() : Provider
+getConfig() : Config
}
class Audit {
+logEmbeddingAuditSuccess(payload)
+logEmbeddingAuditError(payload)
+detectEmbeddingAnomalies(params)
}
class Metrics {
+embeddingRequests
+embeddingDuration
+embeddingErrors
+embeddingVectorSize
+embeddingBatchSize
}
EmbeddingService --> Audit : "uses"
EmbeddingService --> Metrics : "records"
Diagram sources
- service.ts:38-284
- audit.ts:60-157
Section sources
- service.ts:38-127
- audit.ts:60-157
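The class diagram lists `calculateCosineSimilarity(a, b)`. As a reference for what such a method computes, here is a minimal standalone sketch. The function name, the error on mismatched lengths, and the zero-vector handling are assumptions, not the behavior of service.ts.

```typescript
// Minimal cosine similarity sketch: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumed convention: similarity with a zero vector is reported as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` yield 0, and vectors pointing in the same direction yield a value at or very near 1.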
- Provider-specific health checks validate endpoint reachability and basic response shape.
- OpenAI and TEI health checks handle authentication failures, rate limits, and generic errors.
- The HTTP health route aggregates Qdrant, Redis/cache, and embedding provider health.
sequenceDiagram
participant HC as "runEmbeddingHealthCheck()"
participant OA as "OpenAI"
participant TI as "TEI"
participant CFG as "Config"
HC->>CFG : Read EMBEDDING_PROVIDER
alt Provider=openai
HC->>OA : POST /v1/embeddings("health check")
OA-->>HC : 200 + embeddings
HC-->>HC : healthy=true
else Provider=tei
HC->>TI : POST /v1/embeddings("health check")
TI-->>HC : 200 + embeddings
HC-->>HC : healthy=true
else Auto
HC->>OA : Try OpenAI
OA-->>HC : Success/Failure
alt Failure and TEI configured
HC->>TI : Try TEI
TI-->>HC : Success/Failure
end
end
Diagram sources
- health.ts:16-119
- http-health-routes.ts:46-78
Section sources
- health.ts:16-119
- http-health-routes.ts:46-78
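To illustrate how a health check might map a provider response to a status, and how a route might aggregate component results, consider the sketch below. The `HealthResult` shape, the messages, and both function names are hypothetical. The only external fact relied on is that OpenAI-style embedding responses carry the vector at `data[0].embedding`.

```typescript
// Hypothetical health classification and aggregation; not the code in health.ts.
interface HealthResult {
  healthy: boolean;
  message: string;
}

// Maps an HTTP status and parsed body from an embeddings call to a health result.
function classifyEmbeddingResponse(status: number, body: unknown): HealthResult {
  if (status === 401 || status === 403) return { healthy: false, message: "authentication failed" };
  if (status === 429) return { healthy: false, message: "rate limited" };
  if (status < 200 || status >= 300) return { healthy: false, message: `unexpected status ${status}` };

  // Basic shape check: expect data[0].embedding to be a non-empty number array.
  const data = (body as { data?: unknown } | null)?.data;
  const first = Array.isArray(data)
    ? (data[0] as { embedding?: unknown } | undefined)?.embedding
    : undefined;
  const valid =
    Array.isArray(first) && first.length > 0 && first.every((v) => typeof v === "number");
  return valid
    ? { healthy: true, message: "ok" }
    : { healthy: false, message: "malformed embedding response" };
}

// Combines per-component results (for example qdrant, cache, embedding) into one.
function aggregateHealth(components: Record<string, HealthResult>): HealthResult {
  const failing = Object.keys(components).filter((name) => !components[name].healthy);
  return failing.length === 0
    ? { healthy: true, message: "ok" }
    : { healthy: false, message: `unhealthy: ${failing.join(", ")}` };
}
```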
- Tokenizer converts text to sparse vectors with indices and values suitable for Qdrant sparse vector search.
- Stop words are removed, tokens are lowercased, hashed via FNV-1a modulo 30000, and values use sublinear term frequency.
- Hybrid search combines dense vectors (using the embedding dimension) with BM25 sparse vectors and full-text fields.
flowchart TD
A["Input Text"] --> B["Lowercase & Split on Non-Alphanumeric"]
B --> C["Filter Short Tokens & Stop Words"]
C --> D["Compute Term Frequencies"]
D --> E["FNV-1a Hash Mod 30000"]
E --> F["Values = 1 + Log(Term Frequency)"]
F --> G["Produce {indices[], values[]}"]
Diagram sources
- bm25-tokenizer.ts:37-56
Section sources
- bm25-tokenizer.ts:1-57
- store-methods.ts:150-231
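The tokenizer pipeline described above can be sketched as follows. The stop-word list and minimum token length are placeholders, and summing the frequencies of tokens that hash to the same index is an assumption made so that the output indices stay unique; bm25-tokenizer.ts may handle these details differently.

```typescript
// Illustrative sparse-vector tokenizer: FNV-1a hashing into 30000 buckets,
// with values computed as 1 + ln(term frequency).
interface SparseVector {
  indices: number[];
  values: number[];
}

const VOCAB_SIZE = 30000;
const MIN_TOKEN_LENGTH = 2; // assumed cutoff for "short" tokens
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is", "it", "for"]); // placeholder list

// 32-bit FNV-1a over the token's characters (tokens are ASCII after splitting).
function fnv1a32(token: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // reinterpret as unsigned
}

function tokenizeToSparse(text: string): SparseVector {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length >= MIN_TOKEN_LENGTH && !STOP_WORDS.has(t));

  // Count term frequency per hashed index; colliding tokens share a bucket.
  const tfByIndex = new Map<number, number>();
  for (const token of tokens) {
    const index = fnv1a32(token) % VOCAB_SIZE;
    tfByIndex.set(index, (tfByIndex.get(index) ?? 0) + 1);
  }

  const indices = Array.from(tfByIndex.keys()).sort((a, b) => a - b);
  const values = indices.map((i) => 1 + Math.log(tfByIndex.get(i) ?? 1));
  return { indices, values };
}
```

For the input `"the cat cat"`, the stop word is dropped and the single remaining term has frequency 2, so the result holds one index with value `1 + Math.log(2)`.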
- Structured audit logs capture provider, model, input/output characteristics, latency, and error messages.
- Anomaly detection emits warnings or errors for high latency, unusual vector norms, and dimension mismatches.
- Metrics track embedding usage and performance.
Section sources
- audit.ts:60-157
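The three anomaly classes named above (high latency, unusual vector norms, dimension mismatches) lend themselves to a simple threshold check. The sketch below is only an illustration: the type names, the function name, and every threshold value are hypothetical and not taken from audit.ts.

```typescript
// Hypothetical anomaly detection over a single embedding result.
type Anomaly = "high_latency" | "unusual_norm" | "dimension_mismatch";

interface AnomalyInput {
  latencyMs: number;
  vector: number[];
  expectedDimension: number;
}

interface Thresholds {
  maxLatencyMs: number;
  minNorm: number;
  maxNorm: number;
}

// Placeholder thresholds; real values would be tuned per provider and model.
const DEFAULT_THRESHOLDS: Thresholds = { maxLatencyMs: 5000, minNorm: 0.5, maxNorm: 1.5 };

function detectAnomalies(input: AnomalyInput, t: Thresholds = DEFAULT_THRESHOLDS): Anomaly[] {
  const anomalies: Anomaly[] = [];
  if (input.latencyMs > t.maxLatencyMs) anomalies.push("high_latency");

  // L2 norm of the vector; normalized embeddings are expected to be near 1.
  const norm = Math.sqrt(input.vector.reduce((sum, v) => sum + v * v, 0));
  if (norm < t.minNorm || norm > t.maxNorm) anomalies.push("unusual_norm");

  if (input.vector.length !== input.expectedDimension) anomalies.push("dimension_mismatch");
  return anomalies;
}
```

Returning a list rather than a boolean lets the caller decide which anomalies warrant a warning and which an error.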
- Environment-driven configuration supports OpenAI and TEI backends.
- Provider selection logic respects explicit preferences and auto-detection rules.
- Endpoint construction and dimension caching are handled centrally.
Section sources
- config.ts:67-74
- config.ts:5-10
- service.ts:267-283
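As an illustration of central endpoint construction and dimension caching, the sketch below builds an embeddings URL from a base URL and resolves the vector dimension once from a probe request. `buildEmbeddingsUrl`, `DimensionCache`, and the assumption that the base URL does not already include `/v1` are all hypothetical.

```typescript
// Hypothetical helpers; the real logic lives in config.ts and may differ.

// Joins a base URL (assumed to exclude "/v1") with the embeddings path.
function buildEmbeddingsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/v1/embeddings`;
}

// Resolves the embedding dimension once by probing the provider, then reuses it.
class DimensionCache {
  private pending: Promise<number> | null = null;

  resolve(probe: () => Promise<number[]>): Promise<number> {
    if (this.pending === null) {
      this.pending = probe()
        .then((vector) => {
          if (vector.length === 0) throw new Error("Probe returned an empty vector");
          return vector.length;
        })
        .catch((err) => {
          this.pending = null; // allow a later retry after a failed probe
          throw err;
        });
    }
    return this.pending;
  }
}
```

Caching the promise rather than the number means concurrent callers share a single probe request instead of each issuing their own.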
- Retries for transient network errors and specific HTTP statuses (rate limits, gateway timeouts).
- Fallback from OpenAI to TEI when configured and when OpenAI fails.
- Errors that survive retries and fallback propagate to the caller and are recorded in structured logs and metrics.
Section sources
- providers.ts:14-47
- providers.ts:263-272
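A retry policy of the kind described above could look like the following sketch. The set of retryable statuses, the backoff parameters, and the rule that errors without an HTTP status count as network failures are assumptions for illustration; providers.ts may classify errors differently.

```typescript
// Hypothetical retry helper with exponential backoff.
const RETRYABLE_STATUSES = new Set([429, 502, 503, 504]); // assumed set

// Errors without a numeric `status` are treated as network-level failures.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status !== "number" || RETRYABLE_STATUSES.has(status);
}

// Doubles the delay per attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 4000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A caller would wrap a single provider request, for example `withRetry(() => postToProvider(input))`, where `postToProvider` is a placeholder for the actual request function.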
- Batch embedding reduces overhead by sending multiple texts in a single request.
- Metrics track batch sizes and vector sizes to optimize memory usage.
- Hybrid search leverages precomputed named vectors and sparse BM25 vectors to improve recall and precision.
Section sources
- service.ts:129-221
- service.ts:223-234
- store-methods.ts:150-231
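Batching can be sketched as splitting the input into fixed-size chunks and sending one request per chunk. The batch size of 64, the helper names, and the injected `embedBatch` function are hypothetical; the actual limits depend on the provider and on how service.ts is implemented.

```typescript
// Hypothetical batching helpers.
const MAX_BATCH_SIZE = 64; // assumed limit, not taken from the source

// Splits items into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embeds all texts, one request per chunk, preserving input order.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize: number = MAX_BATCH_SIZE,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    const result = await embedBatch(batch);
    if (result.length !== batch.length) {
      throw new Error(`Expected ${batch.length} vectors, got ${result.length}`);
    }
    vectors.push(...result);
  }
  return vectors;
}
```

Checking that each response contains one vector per input guards against silently misaligned results when a provider truncates a batch.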
- Named vectors in Qdrant use the resolved embedding dimension for primary, adapter title, and activation pattern vectors.
- Vector management utilities add, migrate, and remove named vectors safely across collections.
- Hybrid search queries combine dense vectors with BM25 sparse vectors and full-text fields.
graph LR
DIM["Resolved Embedding Dimension<br/>config.ts"] --> NAMED["Named Vectors<br/>qdrant-vector-types.ts"]
NAMED --> COL["Qdrant Collection<br/>store-init.ts"]
COL --> HYBRID["Hybrid Query<br/>store-methods.ts"]
ESVC["EmbeddingService<br/>service.ts"] --> COL
ESVC --> HYBRID
Diagram sources
- config.ts:24-36
- qdrant-vector-types.ts:8-35
- store-init.ts:1-28
- store-methods.ts:150-231
- service.ts:38-127
Section sources
- qdrant-vector-types.ts:8-35
- qdrant-vector-management.ts:126-202
- store-methods.ts:150-231
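To make the relationship between the embedding dimension and Qdrant concrete, the sketch below builds two plain request bodies for the Qdrant REST API: a collection definition with named dense vectors plus a sparse vector (`PUT /collections/{collection_name}`), and a hybrid query that prefetches dense and sparse candidates and fuses them (`POST /collections/{collection_name}/points/query`). The vector names, the cosine distance, the prefetch limits, and the use of reciprocal rank fusion are assumptions; the source does not state how store-methods.ts combines the signals.

```typescript
// Hypothetical request-body builders for Qdrant; names and parameters are illustrative.
interface SparseQuery {
  indices: number[];
  values: number[];
}

// Every named dense vector uses the single resolved embedding dimension.
function buildCollectionConfig(dimension: number) {
  const dense = { size: dimension, distance: "Cosine" as const };
  return {
    vectors: {
      primary: dense,
      adapter_title: dense,
      activation_pattern: dense,
    },
    sparse_vectors: {
      bm25: {},
    },
  };
}

// Prefetches candidates from the dense and sparse vectors, then fuses the rankings.
function buildHybridQuery(denseQuery: number[], sparseQuery: SparseQuery, limit: number) {
  return {
    prefetch: [
      { query: denseQuery, using: "primary", limit: limit * 2 },
      { query: sparseQuery, using: "bm25", limit: limit * 2 },
    ],
    query: { fusion: "rrf" as const }, // reciprocal rank fusion (assumed choice)
    limit,
    with_payload: true,
  };
}
```

Because all dense vectors share one `size`, changing the embedding model to one with a different dimension requires recreating or migrating these named vectors, which is the job the vector management utilities are described as handling.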
Dependency Analysis
The embedding layer depends on configuration, metrics, tenant context, and audit/logging utilities. Qdrant integration depends on embedding dimensions and vector naming conventions.
graph TB
CFG["config.ts"] --> ESCFG["embedding/config.ts"]
CFG --> EPROV["embedding/providers.ts"]
CFG --> EHEALTH["embedding/health.ts"]
ESCFG --> ES["embedding/service.ts"]
EPROV --> ES
ES --> AUD["embedding/audit.ts"]
ES --> QSRCH["qdrant/search.ts"]
QSRCH --> QINIT["memory/store-init.ts"]
QSRCH --> QMETH["memory/store-methods.ts"]
QINIT --> QVT["utils/qdrant-vector-types.ts"]
QMETH --> QVT
Diagram sources
- config.ts:67-74
- config.ts:1-40
- providers.ts:1-280
- health.ts:1-121
- service.ts:1-293
- audit.ts:1-197
- search.ts:1-82
- store-init.ts:1-28
- store-methods.ts:150-231
- qdrant-vector-types.ts:1-57
Section sources
- config.ts:67-74
- service.ts:1-293
- providers.ts:1-280
- health.ts:1-121
- audit.ts:1-197
- search.ts:1-82
- store-init.ts:1-28
- store-methods.ts:150-231
- qdrant-vector-types.ts:1-57
Performance Considerations
- Prefer batch embedding for throughput improvements.
- Monitor embedding vector sizes and adjust models accordingly.
- Use hybrid search to balance dense and sparse retrieval for better recall.
- Leverage Qdrant's named vectors and rescore parameters for efficient similarity search.
- Apply anomaly detection thresholds to proactively identify performance regressions.
Troubleshooting Guide
- Provider selection: verify environment variables for the chosen provider and fallback behavior.
- Health checks: review provider health endpoints and error messages for authentication, rate limits, or connectivity issues.
- Dimension mismatches: ensure a single embedding dimension is resolved at startup and used consistently across Qdrant vectors.
- Audit logs: inspect structured audit events for anomalies and error traces.
- Hybrid search: confirm BM25 sparse vector configuration and full-text indexing in Qdrant.
Section sources
- health.ts:16-119
- audit.ts:104-157
- store-init.ts:157-169
- http-health-routes.ts:46-78
Conclusion
The embedding services layer provides a configurable, observable foundation for vector embeddings across multiple providers. Its integration with Qdrant enables hybrid search, while health monitoring, audit logging, and anomaly detection support reliable operation in production.