Skip to content

Provider Implementations

github-actions[bot] edited this page May 23, 2026 · 1 revision

Provider Implementations

**Referenced Files in This Document** - [providers.ts](file://src/services/embedding/providers.ts) - [config.ts](file://src/services/embedding/config.ts) - [service.ts](file://src/services/embedding/service.ts) - [types.ts](file://src/services/embedding/types.ts) - [audit.ts](file://src/services/embedding/audit.ts) - [health.ts](file://src/services/embedding/health.ts) - [config.ts](file://src/config.ts) - [embedding-metrics.ts](file://src/services/metrics/embedding-metrics.ts) - [http-health-routes.ts](file://src/http/http-health-routes.ts) - [mem-resources-boot-dedupe-regression.test.ts](file://tests/integration/mem-resources-boot-dedupe-regression.test.ts) - [metrics-endpoint.test.ts](file://tests/integration/metrics-endpoint.test.ts)

Table of Contents

  1. Introduction
  2. Project Structure
  3. Core Components
  4. Architecture Overview
  5. Detailed Component Analysis
  6. Dependency Analysis
  7. Performance Considerations
  8. Troubleshooting Guide
  9. Conclusion
  10. Appendices

Introduction

This document explains the embedding provider implementations and the abstraction layer that enables pluggable backends. It covers:

  • OpenAI embeddings integration: API key authentication, model selection, request formatting, and error handling
  • TEI (Text Embeddings Inference) provider: local model serving, HTTP API communication, and model endpoint configuration
  • The provider abstraction layer that selects and routes requests to the appropriate backend
  • The postEmbeddings function and its role in normalizing provider differences
  • Provider-specific optimizations, batch processing, and performance characteristics
  • Error responses, rate limiting, and resilience patterns

Project Structure

The embedding subsystem resides under src/services/embedding and integrates with configuration, metrics, and health endpoints.

graph TB
subgraph "Embedding Service"
SVC["service.ts<br/>EmbeddingService"]
CFG["config.ts<br/>Endpoints, dimension cache"]
TYPES["types.ts<br/>EmbeddingResult, BatchEmbeddingResult"]
AUDIT["audit.ts<br/>Anomaly detection & audit logs"]
HEALTH["health.ts<br/>Health checks"]
PROVIDERS["providers.ts<br/>postEmbeddings, OpenAI/TEI handlers"]
end
subgraph "System Integration"
CONF["config.ts<br/>Environment variables"]
METRICS["embedding-metrics.ts<br/>Prometheus metrics"]
HTTP_HEALTH["http-health-routes.ts<br/>/health integration"]
end
SVC --> PROVIDERS
SVC --> CFG
SVC --> TYPES
SVC --> AUDIT
SVC --> HEALTH
SVC --> METRICS
CONF --> CFG
CONF --> PROVIDERS
HTTP_HEALTH --> HEALTH
Loading

Diagram sources

  • service.ts:1-293
  • config.ts:1-40
  • types.ts:1-17
  • audit.ts:1-197
  • health.ts:1-121
  • providers.ts:1-280
  • config.ts:67-74
  • embedding-metrics.ts:1-51
  • http-health-routes.ts:23-44

Section sources

  • service.ts:1-40
  • providers.ts:1-20
  • config.ts:1-20

Core Components

  • EmbeddingService: Orchestrates embedding generation, batching, anomaly detection, metrics, and health checks. Exposes generateEmbedding, generateBatchEmbeddings, and helper utilities.
  • Providers: Encapsulates provider-specific logic for OpenAI and TEI, including authentication, request formatting, response parsing, and retries.
  • Config: Defines endpoints, runtime dimension caching, and environment-driven settings.
  • Audit: Provides anomaly detection and structured audit logging for embedding operations.
  • Health: Performs provider health checks and reports operational status.
  • Types: Defines standardized result structures for single and batch embeddings.

Section sources

  • service.ts:38-284
  • providers.ts:77-278
  • config.ts:5-40
  • audit.ts:94-157
  • health.ts:16-119
  • types.ts:1-17

Architecture Overview

The provider abstraction layer centralizes embedding generation behind a single entry point that selects the backend based on configuration and availability.

sequenceDiagram
participant Client as "Caller"
participant Service as "EmbeddingService"
participant Providers as "Providers"
participant OpenAI as "OpenAI API"
participant TEI as "TEI Server"
Client->>Service : generateEmbedding(text) or generateBatchEmbeddings(texts)
Service->>Service : getProvider() and validate inputs
Service->>Providers : postEmbeddings(input)
alt Provider preference is "openai"
Providers->>OpenAI : POST /v1/embeddings (Authorization : Bearer ...)
OpenAI-->>Providers : embeddings[]
Providers-->>Service : embeddings[]
else Provider preference is "tei"
Providers->>TEI : POST /v1/embeddings (optional x-api-key)
TEI-->>Providers : embeddings[]
Providers-->>Service : embeddings[]
else Auto-discovery
Providers->>OpenAI : Try OpenAI first
OpenAI-->>Providers : embeddings[] or error
alt OpenAI fails and fallback available
Providers->>TEI : Fallback to TEI
TEI-->>Providers : embeddings[]
end
Providers-->>Service : embeddings[]
end
Service->>Service : Anomaly detection, metrics, audit
Service-->>Client : EmbeddingResult or BatchEmbeddingResult
Loading

Diagram sources

  • service.ts:47-221
  • providers.ts:251-278
  • config.ts:5-10
  • config.ts:67-74

Detailed Component Analysis

OpenAI Provider Implementation

  • Authentication: Uses Authorization header with Bearer token from OPENAI_API_KEY.
  • Model selection: Uses OPENAI_EMBEDDING_MODEL; defaults to a commonly supported model when not set.
  • Endpoint: POST to OPENAI_ENDPOINT (/v1/embeddings).
  • Request formatting: Sends model and input array; supports both single string and array of strings.
  • Error handling: Parses JSON responses, handles non-JSON bodies, and maps HTTP statuses to meaningful errors. Includes retry logic for transient network errors and specific HTTP statuses (429, 502, 503, 504).
  • Rate limiting: Detects 429 and surfaces a clear error message.
  • Resilience: Uses exponential backoff-like delays between retries and logs warnings.
flowchart TD
Start(["postEmbeddingsOpenAI"]) --> Validate["Validate OPENAI_API_KEY and model"]
Validate --> BuildBody["Build request body {model,input[]}"]
BuildBody --> Headers["Set Content-Type and Authorization"]
Headers --> Fetch["POST to OPENAI_ENDPOINT"]
Fetch --> Parse["Parse JSON response"]
Parse --> Ok{"HTTP OK?"}
Ok --> |No| Err["Map error (401 auth, 429 rate limit, others)"]
Ok --> |Yes| Shape["Validate response shape and extract embeddings"]
Shape --> Dim["Set resolved embedding dimension"]
Dim --> Audit["Audit success"]
Audit --> Return(["Return embeddings"])
Err --> Retry{"Retryable?"}
Retry --> |Yes| Backoff["Wait and retry"]
Backoff --> Fetch
Retry --> |No| Throw["Throw error"]
Loading

Diagram sources

  • providers.ts:77-175
  • config.ts:5-6
  • config.ts:67-70

Section sources

  • providers.ts:77-175
  • config.ts:5-6
  • config.ts:67-70

TEI Provider Implementation

  • Authentication: Optional x-api-key header when TEI_API_KEY is configured.
  • Endpoint: Uses TEI_EMBEDDING_ENDPOINT derived from TEI_BASE_URL; falls back to TEI_BASE_URL if endpoint is not set.
  • Model selection: Uses TEI_MODEL; required for TEI provider.
  • Request formatting: Sends input array and model; supports both single string and array of strings.
  • Response parsing: Handles multiple TEI server response shapes by attempting to extract embeddings from embeddings, data.embedding, result, or direct arrays.
  • Error handling: Logs non-JSON responses, maps HTTP statuses to errors, and detects 401 and 429 conditions.
  • Resilience: Uses the same retry mechanism as OpenAI for transient failures.
flowchart TD
Start(["postEmbeddingsTEI"]) --> Validate["Validate TEI_BASE_URL and TEI_MODEL"]
Validate --> Body["Build body {input,model}"]
Body --> Headers["Set Content-Type (+ optional x-api-key)"]
Headers --> Url["Resolve TEI_EMBEDDING_ENDPOINT or TEI_BASE_URL"]
Url --> Fetch["POST to endpoint"]
Fetch --> Json["Parse JSON"]
Json --> Ok{"HTTP OK?"}
Ok --> |No| Err["Map error (401, 429, others)"]
Ok --> |Yes| Extract["Extract embeddings from various shapes"]
Extract --> Dim["Set resolved embedding dimension"]
Dim --> Audit["Audit success"]
Audit --> Return(["Return embeddings"])
Err --> Throw["Throw error"]
Loading

Diagram sources

  • providers.ts:177-249
  • config.ts:8-10
  • config.ts:73-74

Section sources

  • providers.ts:177-249
  • config.ts:8-10
  • config.ts:73-74

Provider Abstraction Layer and postEmbeddings

  • Selection logic: EMBEDDING_PROVIDER controls explicit provider choice ("openai", "tei"). When set to "auto", the system prefers OpenAI if both OPENAI_API_KEY and OPENAI_EMBEDDING_MODEL are present; otherwise falls back to TEI if configured.
  • Fallback behavior: If OpenAI fails and TEI is configured, the system attempts TEI as a fallback.
  • Error propagation: Throws descriptive errors when required environment variables are missing or when provider returns unexpected shapes.
flowchart TD
Start(["postEmbeddings"]) --> Pref["Read EMBEDDING_PROVIDER"]
Pref --> OpenAIOnly{"Pref == 'openai'?"}
OpenAIOnly --> |Yes| OA["postEmbeddingsOpenAI"]
OpenAIOnly --> |No| TEIOnly{"Pref == 'tei'?"}
TEIOnly --> |Yes| TEI["postEmbeddingsTEI"]
TEIOnly --> |No| Auto["Auto-discovery"]
Auto --> HasOA{"OPENAI vars present?"}
HasOA --> |Yes| TryOA["Try OpenAI"]
TryOA --> OAErr{"Error?"}
OAErr --> |Yes| HasTEI{"TEI configured?"}
HasTEI --> |Yes| TEI
HasTEI --> |No| Throw["Throw original error"]
OAErr --> |No| ReturnOA["Return embeddings"]
HasOA --> |No| HasTEI2{"TEI configured?"}
HasTEI2 --> |Yes| TEI
HasTEI2 --> |No| Throw2["No provider configured"]
Loading

Diagram sources

  • providers.ts:251-278
  • config.ts:71-71

Section sources

  • providers.ts:251-278
  • config.ts:71-71

EmbeddingService: Provider-Agnostic API

  • Single embedding: generateEmbedding validates input, delegates to postEmbeddings, performs anomaly detection, updates metrics, and audits success/error.
  • Batch embedding: generateBatchEmbeddings filters empty inputs, tracks batch size, validates dimensions, and records metrics.
  • Provider selection: getProvider returns the selected provider based on configuration and environment.
  • Configuration inspection: getConfig returns current model, dimension, provider status, and preferences.
  • Health: healthCheck delegates to runEmbeddingHealthCheck and is integrated into /health.
classDiagram
class EmbeddingService {
+generateEmbedding(text) EmbeddingResult
+generateBatchEmbeddings(texts) BatchEmbeddingResult
+calculateCosineSimilarity(a,b) number
+generateMemoryEmbedding(memory) number[]
+healthCheck() Promise~{healthy,message}~
+getProvider() "openai|tei|local"
+getConfig() object
-embeddingDimension number
-getModelName(provider) string
}
class Providers {
+postEmbeddings(input) number[][]
+postEmbeddingsOpenAI(input) number[][]
+postEmbeddingsTEI(input) number[][]
}
class Audit {
+logEmbeddingAuditSuccess(payload) void
+logEmbeddingAuditError(payload) void
+detectEmbeddingAnomalies(params) {hasCritical}
}
EmbeddingService --> Providers : "delegates"
EmbeddingService --> Audit : "audits"
Loading

Diagram sources

  • service.ts:38-284
  • providers.ts:251-278
  • audit.ts:60-92

Section sources

  • service.ts:47-221
  • service.ts:254-283
  • audit.ts:60-92

Health Checks and Integration

  • runEmbeddingHealthCheck: Validates provider configuration, attempts a small embedding request, and returns a health status with a message. Distinguishes between authentication failures, rate limits, and other errors.
  • /health integration: The health route bounds embedding health checks with a timeout to keep the endpoint responsive.
sequenceDiagram
participant Route as "HTTP /health"
participant Service as "EmbeddingService"
participant Health as "runEmbeddingHealthCheck"
participant Providers as "Providers"
Route->>Service : healthCheck()
Service->>Health : runEmbeddingHealthCheck()
alt Provider pref is "openai"
Health->>Providers : postEmbeddingsOpenAI("health check")
else Provider pref is "tei"
Health->>Providers : postEmbeddingsTEI("health check")
else Auto
Health->>Providers : Try OpenAI, fallback to TEI
end
Providers-->>Health : embeddings[] or error
Health-->>Service : {healthy,message}
Service-->>Route : {healthy,message}
Loading

Diagram sources

  • health.ts:16-119
  • http-health-routes.ts:23-44

Section sources

  • health.ts:16-119
  • http-health-routes.ts:23-44

Dependency Analysis

  • Environment configuration drives provider selection and endpoints.
  • EmbeddingService depends on Providers for actual embedding calls and on Audit/Metrics for observability.
  • Health checks depend on Providers to validate connectivity and basic operation.
graph LR
CONF["config.ts<br/>Environment"] --> CFG["config.ts<br/>Endpoints & dimension cache"]
CONF --> PROVIDERS["providers.ts<br/>Provider handlers"]
CONF --> SERVICE["service.ts<br/>EmbeddingService"]
SERVICE --> PROVIDERS
SERVICE --> AUDIT["audit.ts<br/>Anomalies & audit"]
SERVICE --> METRICS["embedding-metrics.ts<br/>Prometheus"]
HEALTH["health.ts<br/>Health checks"] --> PROVIDERS
HTTP["http-health-routes.ts<br/>/health"] --> HEALTH
Loading

Diagram sources

  • config.ts:67-74
  • config.ts:5-10
  • providers.ts:251-278
  • service.ts:38-284
  • audit.ts:94-157
  • embedding-metrics.ts:11-47
  • health.ts:16-119
  • http-health-routes.ts:23-44

Section sources

  • config.ts:67-74
  • providers.ts:251-278
  • service.ts:38-284

Performance Considerations

  • Batch processing: generateBatchEmbeddings accepts arrays of strings, reducing overhead and enabling throughput scaling.
  • Vector size tracking: Metrics record vector sizes in bytes, useful for capacity planning.
  • Latency monitoring: Histograms track embedding durations per provider and tenant.
  • Dimension caching: setResolvedEmbeddingDimension caches the first resolved dimension and validates subsequent calls, preventing mismatches and redundant work.
  • Retries and timeouts: Built-in retry logic for transient network errors and bounded health checks prevent cascading failures.

Section sources

  • service.ts:129-221
  • embedding-metrics.ts:11-47
  • config.ts:12-31
  • http-health-routes.ts:23-44

Troubleshooting Guide

Common issues and resolutions:

  • Missing configuration
    • Symptom: Errors indicating missing OPENAI_API_KEY/OPENAI_EMBEDDING_MODEL or TEI_BASE_URL/TEI_MODEL.
    • Resolution: Set required environment variables according to configuration.
  • Authentication failures
    • Symptom: 401 errors from OpenAI or TEI.
    • Resolution: Verify OPENAI_API_KEY or TEI_API_KEY and ensure correct model names.
  • Rate limiting
    • Symptom: 429 errors.
    • Resolution: Implement client-side backoff or reduce request frequency; consider switching providers.
  • Non-JSON responses
    • Symptom: Errors indicating non-JSON responses.
    • Resolution: Check provider endpoint configuration and network stability.
  • Unexpected response shape
    • Symptom: Errors indicating unexpected embedding shape.
    • Resolution: Confirm TEI response compatibility; the provider handler attempts multiple shapes but may require server adjustments.
  • Dimension mismatch
    • Symptom: Errors about embedding dimension mismatch.
    • Resolution: Ensure consistent model usage and call probeEmbeddingDimension at startup to cache the dimension.

Operational checks:

  • Use EmbeddingService.getConfig to inspect current provider and dimension.
  • Use EmbeddingService.healthCheck to validate connectivity and status.
  • Review audit logs for anomaly events and error messages.

Section sources

  • providers.ts:77-175
  • providers.ts:177-249
  • audit.ts:94-157
  • health.ts:16-119
  • service.ts:267-283

Conclusion

The embedding subsystem provides a robust, pluggable abstraction over OpenAI and TEI. It centralizes provider selection, authentication, request formatting, error handling, and observability. The postEmbeddings function ensures provider differences are hidden behind a consistent interface, while EmbeddingService adds batching, anomaly detection, metrics, and health checks. Proper configuration and dimension probing are essential for reliable operation.

Appendices

Environment Variables and Settings

  • OPENAI_API_KEY: OpenAI API key for authentication.
  • OPENAI_EMBEDDING_MODEL: Model identifier for OpenAI embeddings.
  • OPENAI_API_URL: Base URL for OpenAI API (no trailing slash).
  • EMBEDDING_PROVIDER: Provider preference ("auto", "openai", "tei").
  • TEI_BASE_URL: Base URL for TEI server.
  • TEI_MODEL: Model identifier for TEI.
  • TEI_API_KEY: Optional API key for TEI.
  • EMBEDDING_LATENCY_WARN_MS: Threshold for embedding latency warnings.
  • EMBEDDING_NORM_MIN, EMBEDDING_NORM_MAX: Expected vector norm bounds.
  • SEARCH_SCORE_WARN_THRESHOLD: Threshold for search score warnings.

Section sources

  • config.ts:67-83

Startup Dimension Probing

  • Call probeEmbeddingDimension at application startup to resolve and cache the embedding dimension before initializing dependent components like Qdrant stores.

Section sources

  • service.ts:288-292
  • mem-resources-boot-dedupe-regression.test.ts:29-29

Metrics Exposure

  • Embedding metrics are exposed via the metrics endpoint and include counters and histograms for requests, duration, errors, vector sizes, and batch sizes.

Section sources

  • embedding-metrics.ts:11-47
  • metrics-endpoint.test.ts:70-78

Clone this wiki locally