Embedding Provider Abstraction

github-actions[bot] edited this page May 23, 2026 · 1 revision

**Referenced Files in This Document**

  • [providers.ts](file://src/services/embedding/providers.ts)
  • [config.ts](file://src/services/embedding/config.ts)
  • [service.ts](file://src/services/embedding/service.ts)
  • [health.ts](file://src/services/embedding/health.ts)
  • [audit.ts](file://src/services/embedding/audit.ts)
  • [types.ts](file://src/services/embedding/types.ts)
  • [bm25-tokenizer.ts](file://src/services/embedding/bm25-tokenizer.ts)
  • [embedding-metrics.ts](file://src/services/metrics/embedding-metrics.ts)
  • [config.ts](file://src/config.ts)
  • [index.ts](file://src/index.ts)
  • [README.md](file://README.md)

Table of Contents

  1. Introduction
  2. Project Structure
  3. Core Components
  4. Architecture Overview
  5. Detailed Component Analysis
  6. Dependency Analysis
  7. Performance Considerations
  8. Troubleshooting Guide
  9. Conclusion
  10. Appendices

Introduction

This document explains the embedding provider abstraction layer that enables pluggable embedding generation across OpenAI, Text Embeddings Inference (TEI), and a local provider path. It covers provider selection logic, environment-driven detection, configuration requirements, authentication, endpoint configuration, health checking, rate limiting, fallback behavior, and the impact of provider choice on embedding dimensions and performance. It also documents cost considerations and operational best practices.

Project Structure

The embedding subsystem is organized around a small set of focused modules:

  • Provider implementations and selection logic
  • Configuration and dimension caching
  • Service orchestration and metrics
  • Health checking and auditing
  • Supporting utilities for sparse tokenization and metrics
graph TB
subgraph "Embedding Layer"
SVC["EmbeddingService<br/>service.ts"]
CFG["Embedding Config<br/>config.ts"]
PRV["Providers<br/>providers.ts"]
HLT["Health Check<br/>health.ts"]
AUD["Audit & Anomaly Detection<br/>audit.ts"]
TYP["Types<br/>types.ts"]
MET["Metrics<br/>embedding-metrics.ts"]
end
subgraph "Environment"
ENV["Global Config<br/>src/config.ts"]
end
subgraph "Runtime"
IDX["App Bootstrap<br/>index.ts"]
end
ENV --> CFG
ENV --> PRV
ENV --> HLT
SVC --> PRV
SVC --> CFG
SVC --> MET
SVC --> AUD
HLT --> PRV
PRV --> AUD
IDX --> SVC

Diagram sources

  • service.ts:38-284
  • config.ts:1-40
  • providers.ts:251-278
  • health.ts:16-119
  • audit.ts:94-157
  • embedding-metrics.ts:11-47
  • config.ts:67-74
  • index.ts:89-90

Section sources

  • service.ts:1-37
  • config.ts:1-40
  • providers.ts:251-278
  • health.ts:16-119
  • audit.ts:94-157
  • embedding-metrics.ts:11-47
  • config.ts:67-74
  • index.ts:89-90

Core Components

  • Provider selection and dispatch:
    • Explicit preference via EMBEDDING_PROVIDER ('auto' | 'openai' | 'tei')
    • Auto-detection: prefer OpenAI if both OPENAI_API_KEY and OPENAI_EMBEDDING_MODEL are set; otherwise TEI if TEI_BASE_URL and TEI_MODEL are set; otherwise local provider path
    • Fallback: when auto-selected OpenAI fails, attempt TEI if configured
  • Provider implementations:
    • OpenAI: sends Authorization header with Bearer token; endpoint constructed from OPENAI_API_URL and model from OPENAI_EMBEDDING_MODEL
    • TEI: sends Content-Type JSON; optional x-api-key header if TEI_API_KEY is set; endpoint constructed from TEI_BASE_URL or TEI_EMBEDDING_ENDPOINT
  • Dimension resolution and validation:
    • First successful embedding determines output dimension; subsequent calls assert consistent dimension
    • Startup probe ensures dimension is resolved before memory store initialization
  • Health checking:
    • Dedicated health routine validates provider readiness and reports status and messages
  • Auditing and anomaly detection:
    • Logs structured audit events for successes and errors
    • Detects anomalies such as high latency, unusual vector norms, and dimension mismatches
  • Metrics:
    • Tracks request counts, durations, error counts, vector sizes, and batch sizes

Section sources

  • providers.ts:251-278
  • service.ts:258-283
  • config.ts:12-36
  • health.ts:16-119
  • audit.ts:94-157
  • embedding-metrics.ts:11-47

Architecture Overview

The embedding subsystem composes a provider-agnostic service that delegates to provider-specific implementations. The service orchestrates configuration, health checks, auditing, and metrics, while providers handle transport and authentication.

sequenceDiagram
participant Client as "Caller"
participant Service as "EmbeddingService"
participant Providers as "postEmbeddings*"
participant OpenAI as "OpenAI Endpoint"
participant TEI as "TEI Endpoint"
Client->>Service : generateEmbedding()/generateBatchEmbeddings()
Service->>Service : getProvider()
alt Provider == "openai"
Service->>Providers : postEmbeddingsOpenAI(input)
Providers->>OpenAI : POST /v1/embeddings (Authorization : Bearer)
OpenAI-->>Providers : embeddings[]
Providers-->>Service : embeddings[]
else Provider == "tei"
Service->>Providers : postEmbeddingsTEI(input)
Providers->>TEI : POST /v1/embeddings (optional x-api-key)
TEI-->>Providers : embeddings[]
Providers-->>Service : embeddings[]
else Provider == "local"
Service-->>Client : fallback to local provider path
end
Service->>Service : audit + metrics
Service-->>Client : embeddings + metadata

Diagram sources

  • service.ts:47-221
  • providers.ts:77-175
  • providers.ts:177-249
  • config.ts:5
  • config.ts:8-10

Detailed Component Analysis

Provider Selection Logic

  • Preference:
    • EMBEDDING_PROVIDER='openai' forces OpenAI
    • EMBEDDING_PROVIDER='tei' forces TEI
    • EMBEDDING_PROVIDER='auto' enables auto-detection
  • Auto-detection:
    • If OPENAI_API_KEY and OPENAI_EMBEDDING_MODEL are set, use OpenAI
    • Else if TEI_BASE_URL and TEI_MODEL are set, use TEI
    • Else fallback to local provider path
  • Fallback behavior:
    • When auto-selected OpenAI fails, attempt TEI if configured; otherwise rethrow original error
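
The selection rules above can be sketched as a pure function over the environment. This is a minimal illustration, not the code in providers.ts: `resolveProvider`, `fallbackFor`, `isSet`, and the `EmbeddingEnv` type are hypothetical names, and treating an empty string as unset is an assumption.

```typescript
// Hypothetical sketch of the selection rules; names are illustrative only.
type ResolvedProvider = "openai" | "tei" | "local";

type EmbeddingEnv = Partial<Record<
  | "EMBEDDING_PROVIDER"
  | "OPENAI_API_KEY"
  | "OPENAI_EMBEDDING_MODEL"
  | "TEI_BASE_URL"
  | "TEI_MODEL",
  string
>>;

// Assumption: an empty or whitespace-only value counts as "not set".
function isSet(value: string | undefined): boolean {
  return value !== undefined && value.trim() !== "";
}

function resolveProvider(env: EmbeddingEnv): ResolvedProvider {
  const preference = env.EMBEDDING_PROVIDER ?? "auto";
  // An explicit preference wins over detection.
  if (preference === "openai" || preference === "tei") return preference;
  // Auto-detection: OpenAI first, then TEI, then the local provider path.
  if (isSet(env.OPENAI_API_KEY) && isSet(env.OPENAI_EMBEDDING_MODEL)) return "openai";
  if (isSet(env.TEI_BASE_URL) && isSet(env.TEI_MODEL)) return "tei";
  return "local";
}

// Returns the provider to try after a failure, or null to rethrow the original error.
function fallbackFor(env: EmbeddingEnv, failed: ResolvedProvider): ResolvedProvider | null {
  const auto = (env.EMBEDDING_PROVIDER ?? "auto") === "auto";
  const teiReady = isSet(env.TEI_BASE_URL) && isSet(env.TEI_MODEL);
  return auto && failed === "openai" && teiReady ? "tei" : null;
}
```

In this sketch the fallback applies only to an auto-selected OpenAI provider, so forcing EMBEDDING_PROVIDER='openai' makes `fallbackFor` return null and the original error propagates.
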
flowchart TD
Start(["Start"]) --> Pref{"EMBEDDING_PROVIDER?"}
Pref --> |openai| OA["Use OpenAI"]
Pref --> |tei| TEI["Use TEI"]
Pref --> |auto| Auto["Auto-detect"]
Auto --> CheckOA{"OPENAI_API_KEY<br/>and MODEL set?"}
CheckOA --> |Yes| OA
CheckOA --> |No| CheckTEI{"TEI_BASE_URL<br/>and MODEL set?"}
CheckTEI --> |Yes| TEI
CheckTEI --> |No| Local["Local provider path"]
OA --> TryOA["Try OpenAI"]
TryOA --> OA_OK{"OK?"}
OA_OK --> |Yes| Done(["Done"])
OA_OK --> |No| TEI_Fallback{"TEI configured?"}
TEI_Fallback --> |Yes| TEI
TEI_Fallback --> |No| Fail["Rethrow OpenAI error"]
TEI --> Done
Local --> Done

Diagram sources

  • providers.ts:251-278
  • service.ts:258-265

Section sources

  • providers.ts:251-278
  • service.ts:258-265

Provider-Specific Configuration and Authentication

  • OpenAI
    • Required: OPENAI_API_KEY, OPENAI_EMBEDDING_MODEL
    • Endpoint: OPENAI_API_URL + '/v1/embeddings'
    • Authentication: Authorization: Bearer <OPENAI_API_KEY>
    • Behavior: Retries on transient network errors and specific HTTP statuses; parses JSON; audits and logs errors; records dimension on first successful call
  • TEI
    • Required: TEI_BASE_URL, TEI_MODEL
    • Optional: TEI_API_KEY (x-api-key header)
    • Endpoint: TEI_BASE_URL or TEI_EMBEDDING_ENDPOINT (constructed similarly to OpenAI)
    • Behavior: Robust response shape extraction across different server variants; retries on transient network errors; audits and logs errors; records dimension on first successful call
  • Local Provider Path
    • Used when neither OpenAI nor TEI is configured; the service returns a local provider designation for routing
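
As a rough illustration of the authentication rules above, the following sketch builds the request for each provider without sending it. The function and type names are hypothetical. Two points are assumptions rather than facts from the source: that TEI_EMBEDDING_ENDPOINT, when set, is a full URL that overrides the derived one, and that the TEI server accepts the same `{ model, input }` body as OpenAI.

```typescript
// Hypothetical request builders; nothing here is sent over the network.
interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

interface OpenAIEnv {
  OPENAI_API_URL: string; // base URL, no trailing slash
  OPENAI_API_KEY: string;
  OPENAI_EMBEDDING_MODEL: string;
}

interface TeiEnv {
  TEI_BASE_URL: string;
  TEI_MODEL: string;
  TEI_API_KEY?: string;
  TEI_EMBEDDING_ENDPOINT?: string; // assumption: full URL override
}

function buildOpenAIRequest(env: OpenAIEnv, input: string[]): EmbeddingRequest {
  return {
    url: `${env.OPENAI_API_URL}/v1/embeddings`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: env.OPENAI_EMBEDDING_MODEL, input }),
  };
}

function buildTeiRequest(env: TeiEnv, input: string[]): EmbeddingRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // The x-api-key header is only attached when a key is configured.
  if (env.TEI_API_KEY) headers["x-api-key"] = env.TEI_API_KEY;
  return {
    url: env.TEI_EMBEDDING_ENDPOINT ?? `${env.TEI_BASE_URL}/v1/embeddings`,
    headers,
    body: JSON.stringify({ model: env.TEI_MODEL, input }), // assumed body shape
  };
}
```

Either result can be passed to `fetch(url, { method: "POST", headers, body })`.
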

Section sources

  • config.ts:5
  • config.ts:8-10
  • providers.ts:77-175
  • providers.ts:177-249
  • config.ts:67-74

Endpoint Configuration and Environment Variables

  • OpenAI
    • OPENAI_API_URL: base URL for OpenAI API (no trailing slash)
    • OPENAI_API_KEY: secret key
    • OPENAI_EMBEDDING_MODEL: model identifier
  • TEI
    • TEI_BASE_URL: base URL for TEI server
    • TEI_MODEL: model identifier
    • TEI_API_KEY: optional API key for TEI
  • Provider Preference
    • EMBEDDING_PROVIDER: 'auto' | 'openai' | 'tei'

Section sources

  • config.ts:67-74
  • config.ts:71
  • config.ts:5
  • config.ts:8-10

Health Checking

  • Dedicated health routine evaluates provider readiness:
    • If EMBEDDING_PROVIDER is 'openai' or 'tei', validates that required variables are set and performs a small embedding request
    • If 'auto', tries OpenAI first; on failure, attempts TEI if configured
    • Returns structured status and message indicating operational state and any throttling (rate limit) conditions
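
The health routine described above might look roughly like the sketch below. The probes are injected so that the ordering and message logic can be read in isolation; `checkEmbeddingHealth`, `describeFailure`, `statusOf`, and the message wording are hypothetical and are not taken from health.ts. The sketch also assumes that a failed request surfaces its HTTP status as a numeric `status` property on the thrown error.

```typescript
// Hypothetical health check sketch with injected probes.
type Probe = () => Promise<void>; // resolves when a small embedding request succeeds

interface HealthResult {
  healthy: boolean;
  message: string;
}

// Assumption: HTTP failures carry a numeric `status` property.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Separates throttling (HTTP 429) from every other kind of failure.
function describeFailure(provider: string, err: unknown): string {
  if (statusOf(err) === 429) return `${provider} is rate limited (HTTP 429)`;
  const reason = err instanceof Error ? err.message : String(err);
  return `${provider} is unreachable: ${reason}`;
}

async function checkEmbeddingHealth(
  preference: "auto" | "openai" | "tei",
  probes: { openai?: Probe; tei?: Probe },
): Promise<HealthResult> {
  // 'auto' tries OpenAI first and then TEI; an explicit preference tries only that provider.
  const order: Array<"openai" | "tei"> = preference === "auto" ? ["openai", "tei"] : [preference];
  const failures: string[] = [];
  for (const name of order) {
    const probe = probes[name];
    if (!probe) {
      failures.push(`${name} is not configured`);
      continue;
    }
    try {
      await probe();
      return { healthy: true, message: `${name} is operational` };
    } catch (err) {
      failures.push(describeFailure(name, err));
    }
  }
  return { healthy: false, message: failures.join("; ") };
}
```
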
sequenceDiagram
participant HC as "Health Routine"
participant Prov as "postEmbeddings*"
participant OA as "OpenAI"
participant TEI as "TEI"
HC->>HC : Determine preferred provider
alt Preferred == "openai"
HC->>Prov : postEmbeddingsOpenAI("health check")
Prov->>OA : POST /v1/embeddings
OA-->>Prov : embeddings[]
Prov-->>HC : embeddings[]
else Preferred == "tei"
HC->>Prov : postEmbeddingsTEI("health check")
Prov->>TEI : POST /v1/embeddings
TEI-->>Prov : embeddings[]
Prov-->>HC : embeddings[]
else Auto
HC->>Prov : postEmbeddingsOpenAI("health check")
Prov->>OA : POST /v1/embeddings
OA-->>Prov : embeddings[] or error
alt OA ok
HC-->>HC : healthy
else OA error
HC->>Prov : postEmbeddingsTEI("health check")
Prov->>TEI : POST /v1/embeddings
TEI-->>Prov : embeddings[] or error
Prov-->>HC : embeddings[] or error
end
end
HC-->>Caller : {healthy, message}

Diagram sources

  • health.ts:16-119
  • providers.ts:77-175
  • providers.ts:177-249

Section sources

  • health.ts:16-119

Retry and Backoff Behavior

  • Network-level retries:
    • Retries on transient conditions such as fetch failures, timeouts, connection resets, DNS failures
    • Exponential backoff with bounded jitter across configured attempts
  • Provider-level retries:
    • OpenAI: retries on HTTP 429, 502, 503, 504; parses JSON and handles non-JSON bodies; audits and logs errors
  • Purpose:
    • Improve resilience against temporary network or upstream issues
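
A minimal sketch of exponential backoff with bounded jitter follows. The base delay, cap, jitter ratio, and attempt count are illustrative defaults, not the values used in providers.ts, and `backoffDelayMs` and `withRetry` are hypothetical names. The status set mirrors the OpenAI list above.

```typescript
// HTTP statuses treated as retriable for OpenAI, per the list above.
const RETRIABLE_STATUSES = new Set([429, 502, 503, 504]);

// Delay doubles per attempt up to maxMs; jitter adds at most jitterRatio of that delay.
// All default values are assumptions chosen for illustration.
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  maxMs = 8000,
  jitterRatio = 0.2,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const jitter = exponential * jitterRatio * random();
  return Math.round(exponential + jitter);
}

// Runs `operation` up to maxAttempts times, sleeping between retriable failures.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on the last attempt or on errors that are not transient.
      if (attempt + 1 >= maxAttempts || !isRetriable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting `random` and `sleep` keeps the delay calculation deterministic under test; production callers would normally leave both at their defaults.
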

Section sources

  • providers.ts:7-47
  • providers.ts:26-29
  • providers.ts:92-175

Audit, Anomaly Detection, and Metrics

  • Audit logging:
    • Structured events capture provider, model, input counts, character lengths, output dimensions, latency, and error messages
  • Anomaly detection:
    • Flags high latency, unusual vector norms, and dimension mismatches
  • Metrics:
    • Counters and histograms track embedding requests, durations, errors, vector sizes, and batch sizes
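
The three anomaly checks listed above reduce to simple comparisons. The sketch below is an assumption about their shape, not the signature of `detectEmbeddingAnomalies` in audit.ts: the threshold fields loosely correspond to EMBEDDING_LATENCY_WARN_MS and EMBEDDING_NORM_MIN/MAX, and the anomaly labels are invented for illustration.

```typescript
// Hypothetical anomaly detection sketch.
interface AnomalyThresholds {
  latencyWarnMs: number;
  normMin: number;
  normMax: number;
}

interface EmbeddingObservation {
  vector: number[];
  latencyMs: number;
  expectedDimension?: number;
}

// Euclidean (L2) norm of the vector.
function l2Norm(vector: number[]): number {
  return Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
}

function detectAnomalies(obs: EmbeddingObservation, thresholds: AnomalyThresholds): string[] {
  const anomalies: string[] = [];
  if (obs.latencyMs > thresholds.latencyWarnMs) anomalies.push("high_latency");
  const norm = l2Norm(obs.vector);
  if (norm < thresholds.normMin || norm > thresholds.normMax) anomalies.push("unusual_norm");
  if (obs.expectedDimension !== undefined && obs.vector.length !== obs.expectedDimension) {
    anomalies.push("dimension_mismatch");
  }
  return anomalies;
}
```

Models that return unit-normalized vectors should produce norms close to 1, so a narrow range such as 0.9 to 1.1 is a plausible starting point; other models may need a wider range.
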
classDiagram
class EmbeddingService {
+generateEmbedding(text)
+generateBatchEmbeddings(texts)
+calculateCosineSimilarity(a,b)
+generateMemoryEmbedding(memory)
+healthCheck()
+getProvider()
+getConfig()
}
class Providers {
+postEmbeddings(input)
+postEmbeddingsOpenAI(input)
+postEmbeddingsTEI(input)
}
class Audit {
+logEmbeddingAuditSuccess(payload)
+logEmbeddingAuditError(payload)
+detectEmbeddingAnomalies(params)
}
class Metrics {
+embeddingRequests
+embeddingDuration
+embeddingErrors
+embeddingVectorSize
+embeddingBatchSize
}
EmbeddingService --> Providers : "delegates"
EmbeddingService --> Audit : "audits"
EmbeddingService --> Metrics : "records"

Diagram sources

  • service.ts:38-284
  • providers.ts:251-278
  • audit.ts:60-92
  • embedding-metrics.ts:11-47

Section sources

  • audit.ts:60-92
  • audit.ts:94-157
  • embedding-metrics.ts:11-47

Impact on Dimensions and Performance

  • Dimension resolution:
    • First successful embedding sets the resolved dimension; subsequent calls validate consistency
    • Startup probe ensures dimension is known before memory store initialization
  • Performance:
    • Vector size is tracked in bytes, assuming float32 components (4 bytes per dimension)
    • Batch size histogram supports throughput analysis
    • Latency histogram bucketed for performance profiling
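
The resolve-once, validate-afterwards behavior can be sketched as a small cache. `DimensionCache` and `vectorSizeBytes` are hypothetical names, and the error text is illustrative; the real logic lives in the embedding config module.

```typescript
// Hypothetical dimension cache: the first embedding fixes the dimension,
// later embeddings must match it.
class DimensionCache {
  private resolved: number | null = null;

  record(vector: number[]): number {
    if (vector.length === 0) throw new Error("Embedding is empty");
    if (this.resolved === null) {
      this.resolved = vector.length; // first successful embedding wins
    } else if (vector.length !== this.resolved) {
      throw new Error(
        `Embedding dimension mismatch: expected ${this.resolved}, got ${vector.length}`,
      );
    }
    return this.resolved;
  }

  get dimension(): number | null {
    return this.resolved;
  }
}

// Size in bytes under the float32 assumption (4 bytes per component).
function vectorSizeBytes(dimension: number): number {
  return dimension * 4;
}
```

A startup probe would call `record` once with a real embedding before the memory store is created, so the store can be sized from `dimension`.
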

Section sources

  • config.ts:12-36
  • index.ts:89-90
  • embedding-metrics.ts:33-47

Rate Limiting Considerations

  • OpenAI:
    • HTTP 429 handled as retriable condition; service logs and surfaces rate limit errors
  • TEI:
    • HTTP 429 treated as retriable; service logs and surfaces rate limit errors
  • Health check:
    • Distinguishes rate-limited vs. unreachable states

Section sources

  • providers.ts:26-29
  • providers.ts:133-142
  • providers.ts:212-215
  • health.ts:29-46

Cost Implications

  • OpenAI:
    • Costs depend on selected model and consumed tokens; cost estimation placeholder in service
  • TEI:
    • Self-hosted; costs primarily relate to compute and infrastructure
  • Recommendation:
    • Monitor token usage and model selection; consider cost-aware model choices and batching strategies
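
Since the service only carries a cost estimation placeholder, the following is one possible shape for such an estimate, not the project's implementation. The price is supplied by the caller because it varies by model and changes over time. The four-characters-per-token figure is a rough rule of thumb for English text and is an assumption; exact counts require the provider's tokenizer.

```typescript
// Assumption: roughly 4 characters per token for English text.
function approximateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cost for a given token count at a caller-supplied price per million tokens.
function estimateCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

// Convenience wrapper for a batch of texts.
function estimateBatchCostUsd(texts: string[], usdPerMillionTokens: number): number {
  const tokens = texts.reduce((sum, text) => sum + approximateTokens(text), 0);
  return estimateCostUsd(tokens, usdPerMillionTokens);
}
```
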

Section sources

  • service.ts:249-252

Provider Switching Scenarios and Examples

  • Switch from OpenAI to TEI:
    • Set EMBEDDING_PROVIDER='tei', or keep 'auto' and unset OPENAI_API_KEY or OPENAI_EMBEDDING_MODEL so that auto-detection skips OpenAI; in both cases ensure TEI_BASE_URL and TEI_MODEL are set
  • Auto-switch on failure:
    • With EMBEDDING_PROVIDER='auto', if OpenAI fails, the system attempts TEI if configured
  • Local fallback:
    • If neither provider is configured, the service returns a local provider designation for routing

Section sources

  • providers.ts:251-278
  • service.ts:258-265

Sparse Tokenization (BM25-style)

  • Utility for converting text to sparse vectors suitable for BM25-like retrieval
  • Produces indices and values for Qdrant sparse vector search
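
To make the output shape concrete, here is a simplified sketch of a sparse encoder. It is not the algorithm in bm25-tokenizer.ts: the ASCII-only tokenization, the FNV-1a hash used to map tokens to indices, and the use of raw term frequency as the value are all assumptions chosen for brevity. Only the `{ indices, values }` result shape is taken from the description above.

```typescript
interface SparseVector {
  indices: number[];
  values: number[];
}

// 32-bit FNV-1a over UTF-16 code units; yields an unsigned 32-bit index.
function fnv1a32(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function toSparseVector(text: string): SparseVector {
  // Simplification: lowercase and split on anything that is not a-z or 0-9.
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  const counts = new Map<number, number>();
  for (const token of tokens) {
    const index = fnv1a32(token);
    counts.set(index, (counts.get(index) ?? 0) + 1); // colliding tokens share an index
  }
  const indices = Array.from(counts.keys()).sort((a, b) => a - b);
  const values = indices.map((index) => counts.get(index) ?? 0);
  return { indices, values };
}
```

Hashing avoids maintaining a vocabulary, at the price of occasional collisions where two different tokens land on the same index.
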

Section sources

  • bm25-tokenizer.ts:1-57

Dependency Analysis

The embedding subsystem exhibits clear separation of concerns:

  • Global configuration supplies environment variables to providers and config
  • Service depends on providers, config, metrics, and audit
  • Health routines depend on providers and environment
  • Index coordinates startup probe and memory store initialization
graph LR
ENV["src/config.ts"] --> CFG["services/embedding/config.ts"]
ENV --> PRV["services/embedding/providers.ts"]
ENV --> HLT["services/embedding/health.ts"]
PRV --> AUD["services/embedding/audit.ts"]
CFG --> SVC["services/embedding/service.ts"]
MET["services/metrics/embedding-metrics.ts"] --> SVC
AUD --> SVC
IDX["src/index.ts"] --> SVC
SVC --> MEM["Memory Store Init"]

Diagram sources

  • config.ts:67-74
  • config.ts:5
  • providers.ts:251-278
  • health.ts:16-119
  • audit.ts:60-92
  • embedding-metrics.ts:11-47
  • index.ts:89-90

Section sources

  • config.ts:67-74
  • config.ts:5
  • providers.ts:251-278
  • health.ts:16-119
  • audit.ts:60-92
  • embedding-metrics.ts:11-47
  • index.ts:89-90

Performance Considerations

  • Dimension consistency:
    • Ensure startup probe resolves dimension before memory store initialization to avoid runtime errors
  • Batch processing:
    • Use batch APIs to reduce overhead; monitor batch size distribution
  • Vector size:
    • Track vector sizes to estimate storage and network bandwidth needs
  • Latency:
    • Monitor embedding duration histograms to identify slow providers or models
  • Resilience:
    • Rely on built-in retries for transient failures; consider external retry/backoff policies for downstream consumers
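
For the batch processing point, a caller with many texts might split them into fixed-size batches before embedding. In this sketch `chunk` and `embedInBatches` are hypothetical helpers, and `embedBatch` stands in for whatever batch call the service exposes (for example `generateBatchEmbeddings`); a suitable batch size depends on the provider and is not specified here.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embeds texts batch by batch, preserving input order in the result.
async function embedInBatches(
  texts: string[],
  batchSize: number,
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

Batches are processed sequentially here, which keeps request pressure predictable when the provider enforces rate limits.
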

Troubleshooting Guide

  • No provider configured:
    • Ensure OPENAI_API_KEY + OPENAI_EMBEDDING_MODEL or TEI_BASE_URL + TEI_MODEL is set
  • Authentication failures:
    • OpenAI: 401 indicates incorrect or missing OPENAI_API_KEY
    • TEI: 401 indicates missing or incorrect TEI_API_KEY
  • Rate limiting:
    • Either provider can respond with HTTP 429; the health check distinguishes a throttled provider from an unreachable one
  • Unexpected response shapes:
    • Providers attempt to normalize various server response formats; failures indicate misconfiguration or incompatible server
  • Dimension mismatch:
    • After startup probe, subsequent calls validate dimension; mismatch indicates provider change or model switch without re-probing
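
When diagnosing the "no provider configured" case, a small check that names the missing variables can save time. The helper below is a hypothetical diagnostic, not part of the codebase; the required variables per provider are taken from the configuration sections of this page, and treating blank values as missing is an assumption.

```typescript
// Required variables per provider, as documented on this page.
const REQUIRED_VARS = {
  openai: ["OPENAI_API_KEY", "OPENAI_EMBEDDING_MODEL"],
  tei: ["TEI_BASE_URL", "TEI_MODEL"],
} as const;

// Returns the names of required variables that are absent or blank.
function missingVars(
  provider: "openai" | "tei",
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_VARS[provider].filter((name) => (env[name] ?? "").trim() === "");
}
```

For example, `missingVars("openai", process.env)` in a Node.js process lists exactly which OpenAI settings still need to be supplied.
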

Section sources

  • providers.ts:78-80
  • providers.ts:129-142
  • providers.ts:209-215
  • config.ts:24-31
  • health.ts:29-46

Conclusion

The embedding provider abstraction cleanly separates provider selection, transport, and authentication from service orchestration, auditing, and metrics. It supports auto-detection from environment variables, fallback from OpenAI to TEI, and observability through audit events, anomaly detection, and metrics. Configuring the environment variables correctly and using the health checks and anomaly detection help keep operation reliable across OpenAI and TEI deployments.

Appendices

Environment Variables Reference

  • OPENAI_API_KEY: OpenAI secret key
  • OPENAI_API_URL: Base URL for OpenAI API (no trailing slash)
  • OPENAI_EMBEDDING_MODEL: Model identifier for embeddings
  • TEI_BASE_URL: Base URL for TEI server
  • TEI_MODEL: Model identifier for TEI
  • TEI_API_KEY: Optional API key for TEI
  • EMBEDDING_PROVIDER: 'auto' | 'openai' | 'tei'
  • EMBEDDING_LATENCY_WARN_MS: Threshold for embedding latency warnings
  • EMBEDDING_NORM_MIN/MAX: Expected range for vector norms
  • SEARCH_SCORE_WARN_THRESHOLD: Threshold for low search scores

Section sources

  • config.ts:67-74
  • config.ts:75-83

Startup and Initialization

  • The application probes embedding dimension at startup and initializes the memory store afterward to ensure consistent vector dimensions.

Section sources

  • index.ts:89-90

Provider Configuration Examples

  • OpenAI:
    • Set OPENAI_API_KEY and OPENAI_EMBEDDING_MODEL; optionally OPENAI_API_URL for Azure or custom endpoints
  • TEI:
    • Set TEI_BASE_URL and TEI_MODEL; optionally TEI_API_KEY for protected endpoints
  • Auto:
    • Leave EMBEDDING_PROVIDER unset or set to 'auto'; the system prefers OpenAI if available, otherwise TEI

Section sources

  • config.ts:67-74
  • config.ts:71
  • providers.ts:251-278
