Embedding Processing & Operations
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
This document explains the embedding processing pipeline and core service functionality for generating dense vector embeddings. It covers the EmbeddingService class methods for single and batch embedding generation, text normalization and validation, error handling, cosine similarity computation, memory embedding creation, and vector dimension validation. It also documents integration points with Qdrant for storage and retrieval, performance metrics, latency measurement, and embedding quality assurance techniques.
The embedding subsystem is organized around a central service that orchestrates provider selection, normalization, validation, and telemetry. Supporting modules handle provider-specific HTTP calls, dimension caching, anomaly detection, health checks, and metrics.
graph TB
subgraph "Embedding Service Layer"
ES["EmbeddingService<br/>service.ts"]
CFG["Embedding Config<br/>config.ts"]
TYPES["Embedding Types<br/>types.ts"]
AUDIT["Embedding Audit<br/>audit.ts"]
HEALTH["Embedding Health<br/>health.ts"]
METRICS["Embedding Metrics<br/>embedding-metrics.ts"]
end
subgraph "Providers"
PROV["Providers<br/>providers.ts"]
end
subgraph "Integration"
MEMSTORE["Memory Store<br/>memory-store.ts"]
STOREHAND["Store Handlers<br/>store-adapter-*.ts"]
QUPD["Qdrant Updates<br/>memory-updates.ts"]
BM25["BM25 Tokenizer<br/>bm25-tokenizer.ts"]
end
ES --> PROV
ES --> CFG
ES --> TYPES
ES --> AUDIT
ES --> HEALTH
ES --> METRICS
STOREHAND --> ES
QUPD --> ES
MEMSTORE --> ES
ES -. "sparse vectors" .-> BM25
Diagram sources
- service.ts:38-286
- providers.ts:251-278
- config.ts:12-36
- audit.ts:94-157
- health.ts:16-119
- embedding-metrics.ts:11-47
- store-adapter-default-handler.ts:108-142
- store-adapter-header-handler.ts:53-77
- memory-store.ts:14-25
- memory-updates.ts:51-77
- bm25-tokenizer.ts:37-52
Section sources
- service.ts:1-293
- providers.ts:1-280
- config.ts:1-40
- audit.ts:1-197
- health.ts:1-121
- embedding-metrics.ts:1-51
- EmbeddingService: Central orchestration class providing generateEmbedding, generateBatchEmbeddings, calculateCosineSimilarity, generateMemoryEmbedding, healthCheck, provider selection, and configuration inspection.
- Providers: Provider-agnostic embedding calls to OpenAI and TEI, with retry logic and response parsing.
- Config: Embedding dimension caching and resolution, plus endpoint construction.
- Audit: Anomaly detection and audit logging for embedding requests.
- Health: Operational health checks for configured providers.
- Metrics: Prometheus counters and histograms for embedding requests, durations, errors, vector sizes, and batch sizes.
Key responsibilities:
- Text normalization and validation for single and batch inputs.
- Dimension validation against a cached expected dimension.
- Latency measurement and anomaly detection.
- Cosine similarity computation for similarity scoring.
- Memory embedding generation by concatenating content and metadata fields.
- Integration with Qdrant for storing and retrieving embeddings.
Section sources
- service.ts:38-286
- providers.ts:251-278
- config.ts:12-36
- audit.ts:94-157
- health.ts:16-119
- embedding-metrics.ts:11-47
The embedding pipeline follows a layered design:
- Input normalization and validation occur in the service layer.
- Provider selection and HTTP calls are delegated to providers.
- Responses are parsed, dimension cached, and anomalies detected.
- Metrics and audit logs capture performance and quality signals.
- Qdrant integration stores and retrieves vectors with proper vector naming.
sequenceDiagram
participant Caller as "Caller"
participant Service as "EmbeddingService"
participant Provider as "postEmbeddings"
participant OpenAI as "OpenAI Provider"
participant TEI as "TEI Provider"
participant Config as "Embedding Config"
participant Audit as "Audit/Detection"
Caller->>Service : generateEmbedding(text)
Service->>Service : normalize + validate
Service->>Provider : postEmbeddings(normalized)
alt Provider=OpenAI
Provider->>OpenAI : HTTP POST /v1/embeddings
OpenAI-->>Provider : embeddings[]
Provider-->>Service : embeddings[]
else Provider=TEI
Provider->>TEI : HTTP POST /v1/embeddings
TEI-->>Provider : embeddings[]
Provider-->>Service : embeddings[]
end
Service->>Config : setResolvedEmbeddingDimension(dim)
Service->>Audit : detectEmbeddingAnomalies(...)
Service-->>Caller : {embedding, model, usage}
Diagram sources
- service.ts:47-127
- providers.ts:251-278
- config.ts:16-31
- audit.ts:94-157
- generateEmbedding(text: string): Normalizes the input, rejects empty text, delegates to the provider, validates the shape of the returned embedding, detects anomalies, records metrics, and returns the embedding result.
- generateBatchEmbeddings(texts: string[]): Filters out empty or whitespace-only inputs, rejects a batch with no valid texts, tracks batch size, validates dimensions, detects anomalies, and returns the batch result.
- calculateCosineSimilarity(embedding1, embedding2): Computes cosine similarity with dimensionality check.
- generateMemoryEmbedding(memory): Concatenates content and metadata fields into a single text, then generates a single embedding.
- healthCheck(): Runs provider health checks with timeouts and appropriate status messages.
- getProvider()/getConfig(): Determines provider preference and exposes configuration.
flowchart TD
Start(["generateEmbedding Entry"]) --> Normalize["Normalize text<br/>trim + length"]
Normalize --> ValidateEmpty{"Non-empty?"}
ValidateEmpty -- "No" --> ThrowEmpty["Throw 'Text cannot be empty'"]
ValidateEmpty -- "Yes" --> CallProvider["Call postEmbeddings(normalized)"]
CallProvider --> ParseResp["Parse provider response<br/>extract embedding"]
ParseResp --> ValidateShape{"Embedding array?"}
ValidateShape -- "No" --> ThrowShape["Throw 'OpenAI returned no embedding'"]
ValidateShape -- "Yes" --> DetectAnomaly["detectEmbeddingAnomalies(...)"]
DetectAnomaly --> DimMismatch{"Dimension OK?"}
DimMismatch -- "No" --> ThrowDim["Throw 'dimension mismatch'"]
DimMismatch -- "Yes" --> Metrics["Record metrics + audit success"]
Metrics --> Return(["Return {embedding, model, usage}"])
Diagram sources
- service.ts:47-127
- audit.ts:94-157
Section sources
- service.ts:47-127
- service.ts:129-221
- service.ts:223-247
- service.ts:254-283
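The single-embedding flow above can be sketched roughly as follows. This is a minimal illustration, not the code in service.ts: the `PostEmbeddingsFn` signature, the `expectedDimension` parameter, and the exact error wording are assumptions, and the `usage` field, anomaly detection, and metrics are omitted.

```typescript
// Hypothetical provider signature; the real postEmbeddings may differ.
type PostEmbeddingsFn = (
  inputs: string[],
) => Promise<{ embeddings: number[][]; model: string }>;

interface SingleEmbeddingResult {
  embedding: number[];
  model: string;
}

async function generateEmbeddingSketch(
  text: string,
  postEmbeddings: PostEmbeddingsFn,
  expectedDimension?: number, // assumed stand-in for the cached dimension
): Promise<SingleEmbeddingResult> {
  // Normalize, then reject empty input before any network call is made.
  const normalized = text.trim();
  if (normalized.length === 0) {
    throw new Error("Text cannot be empty");
  }

  const response = await postEmbeddings([normalized]);
  const embedding = response.embeddings[0];

  // Shape check: the provider must return an array for the first input.
  if (!Array.isArray(embedding)) {
    throw new Error("Provider returned no embedding");
  }

  // Dimension check against the expected dimension, when one is known.
  if (expectedDimension !== undefined && embedding.length !== expectedDimension) {
    throw new Error(
      `Embedding dimension mismatch: expected ${expectedDimension}, got ${embedding.length}`,
    );
  }

  return { embedding, model: response.model };
}
```

Passing the provider as a function keeps the sketch testable with a stub that returns fixed vectors.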
- Single embedding: Trims input; rejects empty strings; validates returned embedding is an array; throws descriptive errors for malformed responses or authentication failures.
- Batch embedding: Filters out empty/whitespace inputs; validates non-zero valid count; ensures all embeddings match the cached dimension; logs dimension mismatches.
- Provider-level error handling: Retries transient network errors and specific HTTP statuses; parses JSON safely; audits provider calls with status, latency, and dimensions.
flowchart TD
StartB(["generateBatchEmbeddings Entry"]) --> Filter["Filter valid texts<br/>trim + length > 0"]
Filter --> AnyLeft{"Any valid texts?"}
AnyLeft -- "No" --> ThrowBatch["Throw 'No valid texts provided'"]
AnyLeft -- "Yes" --> ObserveBatch["Observe batch size histogram"]
ObserveBatch --> CallProv["Call postEmbeddings(valid)"]
CallProv --> CheckDims["Check all embeddings<br/>match cached dimension"]
CheckDims --> WrongDim{"Any wrong dimension?"}
WrongDim -- "Yes" --> LogWarn["Log dimension mismatch"]
WrongDim -- "No" --> Continue["Continue to anomaly detection"]
Diagram sources
- service.ts:129-153
- providers.ts:31-47
- providers.ts:116-143
Section sources
- service.ts:129-153
- providers.ts:77-175
- providers.ts:177-249
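The batch filtering and dimension check described above might look like the sketch below. The helper names are hypothetical, and whether the real service forwards trimmed or original strings is an assumption here.

```typescript
// Keep only texts that are non-empty after trimming; throw if none remain.
function filterValidTexts(texts: string[]): string[] {
  const valid = texts.map((t) => t.trim()).filter((t) => t.length > 0);
  if (valid.length === 0) {
    throw new Error("No valid texts provided");
  }
  return valid;
}

// Return the positions of embeddings whose length differs from the
// expected dimension, so the caller can log or reject them.
function findDimensionMismatches(
  embeddings: number[][],
  expectedDimension: number,
): number[] {
  const mismatched: number[] = [];
  embeddings.forEach((embedding, index) => {
    if (embedding.length !== expectedDimension) {
      mismatched.push(index);
    }
  });
  return mismatched;
}
```

One consequence of filtering is that output positions no longer line up with input positions once an entry is dropped, so callers that need a one-to-one mapping would have to track the original indices themselves.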
- calculateCosineSimilarity returns 0 when either embedding is missing and throws when the two embeddings have different dimensions.
- Computes dot product and magnitudes; returns 0 if magnitude is zero; otherwise returns dot/(||a||*||b||).
flowchart TD
CSStart(["calculateCosineSimilarity"]) --> CheckArgs{"Both embeddings provided?"}
CheckArgs -- "No" --> ReturnZero["Return 0"]
CheckArgs -- "Yes" --> SameDim{"Same dimensions?"}
SameDim -- "No" --> ThrowDim["Throw 'dimensions differ'"]
SameDim -- "Yes" --> Compute["dot += a*b; n1+=a*a; n2+=b*b"]
Compute --> Mag["mag = sqrt(n1)*sqrt(n2)"]
Mag --> ZeroMag{"mag == 0?"}
ZeroMag -- "Yes" --> ReturnZero
ZeroMag -- "No" --> ReturnSim["Return dot/mag"]
Diagram sources
- service.ts:223-234
Section sources
- service.ts:223-234
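A minimal sketch of the computation in the flowchart above, written as a standalone function. The function name and error wording are illustrative and are not taken from service.ts.

```typescript
function cosineSimilaritySketch(
  a: number[] | null | undefined,
  b: number[] | null | undefined,
): number {
  // Missing input yields 0, matching the flowchart's first branch.
  if (!a || !b) return 0;
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // A zero vector has no direction, so similarity is defined as 0 here.
  const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
  return magnitude === 0 ? 0 : dot / magnitude;
}
```

Orthogonal vectors such as [1, 0] and [0, 1] give 0, and any vector compared with itself gives a value of 1 up to floating-point rounding.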
- generateMemoryEmbedding combines content, resource, task, tags, and type into a single normalized string separated by newlines.
- Generates a single embedding from the concatenated text.
sequenceDiagram
participant Service as "EmbeddingService"
participant Provider as "postEmbeddings"
Service->>Service : Build text from content + metadata
Service->>Provider : generateEmbedding(concatenated)
Provider-->>Service : embedding[]
Service-->>Service : return embedding
Diagram sources
- service.ts:236-247
Section sources
- service.ts:236-247
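The text-building step could be sketched as below. The `MemoryLike` shape is hypothetical: the source names the fields but not their types, so treating `tags` as a string array joined with spaces, and skipping empty fields, are assumptions.

```typescript
// Hypothetical shape; only the field names come from the description above.
interface MemoryLike {
  content: string;
  resource?: string;
  task?: string;
  tags?: string[];
  type?: string;
}

// Concatenate content and metadata into one newline-separated string,
// dropping fields that are absent or blank.
function buildMemoryText(memory: MemoryLike): string {
  const parts: Array<string | undefined> = [
    memory.content,
    memory.resource,
    memory.task,
    memory.tags?.join(" "),
    memory.type,
  ];
  return parts
    .map((part) => (part ?? "").trim())
    .filter((part) => part.length > 0)
    .join("\n");
}
```

The resulting string would then be passed to the single-embedding method, so the empty-text validation described earlier still applies to a memory whose fields are all blank.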
- First successful embedding determines the dimension; subsequent calls assert the cached dimension remains constant.
- EmbeddingService validates returned embeddings against the cached dimension and throws on mismatch.
- Health checks probe the provider to resolve dimension early.
flowchart TD
FirstCall["First embedding call"] --> SetDim["setResolvedEmbeddingDimension(dim)"]
SetDim --> Cache["Cache dimension"]
Subsequent["Subsequent calls"] --> Compare{"dim == cached?"}
Compare -- "No" --> ThrowMismatch["Throw dimension mismatch"]
Compare -- "Yes" --> OK["Proceed"]
Diagram sources
- config.ts:16-31
- service.ts:63-73
- service.ts:153-163
Section sources
- config.ts:12-36
- service.ts:39-41
- service.ts:148-152
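The cache-then-assert behavior can be illustrated with a small class. This is a sketch under assumptions: `DimensionCache` and its methods are hypothetical names, and config.ts may store and validate the dimension differently.

```typescript
class DimensionCache {
  private dimension: number | null = null;

  // The first call fixes the dimension; later calls must agree with it.
  resolve(dim: number): void {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error(`Invalid embedding dimension: ${dim}`);
    }
    if (this.dimension === null) {
      this.dimension = dim;
      return;
    }
    if (this.dimension !== dim) {
      throw new Error(
        `Embedding dimension mismatch: cached ${this.dimension}, got ${dim}`,
      );
    }
  }

  get(): number | null {
    return this.dimension;
  }
}
```

Resolving the dimension once at startup, for example from a health probe, means a later provider or model change surfaces as an explicit error instead of as vectors of the wrong size reaching storage.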
- Provider preference is controlled by environment variables; auto-detection prefers OpenAI when available, with TEI fallback.
- Retries are applied to transient network errors and specific HTTP statuses; JSON parsing errors are retried when appropriate.
flowchart TD
Pref["EMBEDDING_PROVIDER"] --> Choice{"openai | tei | auto"}
Choice --> |openai| OA["Use OpenAI"]
Choice --> |tei| TEI["Use TEI"]
Choice --> |auto| Auto["Check OPENAI vars"]
Auto --> HasOA{"OPENAI vars set?"}
HasOA --> |Yes| OA
HasOA --> |No| TEI
OA --> Retry["Retry on transient errors"]
TEI --> Retry
Diagram sources
- providers.ts:251-278
- providers.ts:31-47
Section sources
- providers.ts:251-278
- providers.ts:31-47
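The selection logic in the flowchart above might be sketched as follows. EMBEDDING_PROVIDER and OPENAI_API_KEY appear in this document, but checking only OPENAI_API_KEY in auto mode, and treating unrecognized values as auto, are assumptions.

```typescript
type EmbeddingProviderName = "openai" | "tei";

function selectProviderSketch(
  env: Record<string, string | undefined>,
): EmbeddingProviderName {
  const preference = (env.EMBEDDING_PROVIDER ?? "auto").trim().toLowerCase();

  // An explicit preference wins.
  if (preference === "openai" || preference === "tei") {
    return preference;
  }

  // Auto mode: prefer OpenAI when its credentials are present, else TEI.
  const apiKey = env.OPENAI_API_KEY ?? "";
  return apiKey.length > 0 ? "openai" : "tei";
}
```

Taking the environment as a parameter rather than reading a global keeps the function pure, which makes each branch straightforward to test.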
- Batch embedding generation is used to produce primary, title, and activation pattern vectors for adapter memories.
- Individual text updates trigger embedding generation and vector upsert with proper vector naming.
- Memory store upserts points with vector fields named by dimension.
sequenceDiagram
participant Handler as "Store Adapter Handler"
participant Service as "EmbeddingService"
participant Qdrant as "Qdrant Store"
Handler->>Service : generateBatchEmbeddings(texts[])
Service-->>Handler : {embeddings[], model, usage}
Handler->>Qdrant : upsert(points with vs<dim>)
Diagram sources
- store-adapter-default-handler.ts:108-142
- store-adapter-header-handler.ts:53-77
- memory-store.ts:57-97
Section sources
- store-adapter-default-handler.ts:108-142
- store-adapter-header-handler.ts:53-77
- memory-store.ts:57-97
- memory-updates.ts:51-77
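To illustrate dimension-based vector naming, the sketch below builds a point object with a single named vector. It assumes the `vs<dim>` placeholder expands to `vs` followed by the vector length (for example `vs768`), which the source does not confirm. The `NamedVectorPoint` type is a simplified stand-in for the point shape used with Qdrant named vectors, not the actual client type.

```typescript
// Simplified, hypothetical point shape with named vectors.
interface NamedVectorPoint {
  id: string;
  vector: Record<string, number[]>;
  payload: Record<string, unknown>;
}

// Assumed naming rule: "vs" followed by the dimension.
function vectorNameForDimension(dimension: number): string {
  return `vs${dimension}`;
}

function toNamedVectorPoint(
  id: string,
  embedding: number[],
  payload: Record<string, unknown>,
): NamedVectorPoint {
  const name = vectorNameForDimension(embedding.length);
  return { id, vector: { [name]: embedding }, payload };
}
```

Deriving the name from the embedding itself keeps writes and reads consistent, provided queries compute the vector name with the same rule.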
- tokenizeToSparse converts text into sparse vectors using FNV-1a hashing modulo 30000 and sublinear term-frequency values (1 + log(tf)).
- Useful for BM25-like retrieval alongside dense embeddings.
flowchart TD
TStart["tokenizeToSparse(text)"] --> Normalize["Lowercase + split on non-alphanumeric"]
Normalize --> Filter["Remove stop words + short tokens"]
Filter --> TF["Compute term frequencies"]
TF --> Hash["FNV-1a hash mod 30000"]
Hash --> Values["values = 1 + log(tf)"]
Values --> ReturnSV["Return {indices[], values[]}"]
Diagram sources
- bm25-tokenizer.ts:37-52
Section sources
- bm25-tokenizer.ts:1-57
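A rough sketch of the tokenization steps above. The stop-word list, the two-character minimum token length, the use of the natural logarithm, and the handling of hash collisions (counts for colliding terms are summed) are assumptions. Only the 32-bit FNV-1a hash, the modulus of 30000, and the 1 + log(tf) weighting come from the description.

```typescript
const SPARSE_VOCAB_SIZE = 30000;
// Illustrative stop-word list; the real list is not given in the source.
const SKETCH_STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is", "it"]);

// 32-bit FNV-1a over character codes (tokens are ASCII after splitting).
function fnv1a32(token: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime
  }
  return hash >>> 0;
}

function tokenizeToSparseSketch(text: string): { indices: number[]; values: number[] } {
  const counts = new Map<number, number>();
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token.length < 2 || SKETCH_STOP_WORDS.has(token)) continue;
    const index = fnv1a32(token) % SPARSE_VOCAB_SIZE;
    // Colliding terms share a bucket, so their counts are summed.
    counts.set(index, (counts.get(index) ?? 0) + 1);
  }
  const indices = Array.from(counts.keys()).sort((x, y) => x - y);
  // Sublinear term frequency: 1 + ln(tf).
  const values = indices.map((i) => 1 + Math.log(counts.get(i) ?? 1));
  return { indices, values };
}
```

Hashing into a fixed range avoids maintaining a vocabulary, at the cost of occasional collisions between unrelated terms.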
- EmbeddingService depends on providers for HTTP calls, config for dimension caching, audit for anomaly detection, and metrics for observability.
- Memory store handlers depend on EmbeddingService for batch embeddings and on Qdrant for persistence.
- Health checks depend on providers and config to validate operational status.
graph TB
ES["EmbeddingService"] --> PROV["Providers"]
ES --> CFG["Embedding Config"]
ES --> AUD["Audit"]
ES --> MET["Metrics"]
STOREH["Store Handlers"] --> ES
QSTORE["Qdrant Store"] --> ES
HEALTH["Health Check"] --> PROV
HEALTH --> CFG
Diagram sources
- service.ts:15-36
- providers.ts:1-6
- config.ts:1-6
- embedding-metrics.ts:1-6
- store-adapter-default-handler.ts:108-142
- memory-store.ts:1-11
- health.ts:16-119
Section sources
- service.ts:15-36
- providers.ts:1-6
- config.ts:1-6
- embedding-metrics.ts:1-6
- store-adapter-default-handler.ts:108-142
- memory-store.ts:1-11
- health.ts:16-119
- Latency measurement: Each embedding call measures elapsed time and records a histogram with provider and tenant labels.
- Batch optimization: Batch embeddings reduce overhead; batch size is tracked to identify throughput patterns.
- Vector size tracking: Records vector size in bytes for each embedding to monitor memory footprint.
- Retry strategy: Controlled retries for transient network errors and specific HTTP statuses minimize failures.
- Health checks: Health checks are bounded by timeouts so that a slow provider probe cannot block system readiness.
Recommendations:
- Prefer batch embeddings for bulk operations to reduce per-request overhead.
- Monitor embedding_duration_seconds and embedding_batch_size histograms to identify saturation points.
- Tune the EMBEDDING_LATENCY_WARN_MS and EMBEDDING_NORM_MIN/MAX thresholds to balance anomaly detection sensitivity against false alarms.
- Ensure probeEmbeddingDimension runs at startup to avoid runtime dimension mismatches.
Section sources
- embedding-metrics.ts:11-47
- service.ts:52-56
- service.ts:134-146
- providers.ts:31-47
- health.ts:23-44
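The retry strategy mentioned above could take a shape like the following. The set of retryable statuses, the attempt count, the exponential delay, and the convention that errors carry a numeric `status` property are all assumptions, since the source does not list them.

```typescript
// Assumed set of transient HTTP statuses.
const RETRYABLE_HTTP_STATUSES = new Set([429, 502, 503, 504]);

// Errors with a numeric `status` are retried only for transient statuses;
// errors without one are treated as network-level failures and retried.
function isRetryableError(err: unknown): boolean {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" && RETRYABLE_HTTP_STATUSES.has(status);
  }
  return true;
}

async function withRetrySketch<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts || !isRetryableError(err)) break;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Non-retryable failures such as a 401 are rethrown immediately, so retry delays never slow down the reporting of authentication problems.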
Common issues and resolutions:
- Authentication failures: Verify OPENAI_API_KEY or TEI API key configuration; provider health checks distinguish 401 scenarios.
- Rate limiting: Provider health checks detect 429 responses; consider backoff or switching providers.
- Non-JSON responses: Provider code retries on specific transient HTTP statuses and logs parse failures.
- Dimension mismatches: Ensure probeEmbeddingDimension is called at startup; EmbeddingService validates returned embeddings against cached dimension.
- Empty or whitespace inputs: The single method rejects empty input with an error, while the batch method filters out empty entries and throws only if no valid text remains; ensure meaningful content is provided.
Operational checks:
- Use EmbeddingService.healthCheck() to validate provider availability and configuration.
- Review audit logs for embedding_high_latency, embedding_unusual_norm, and embedding_dimension_mismatch anomalies.
- Confirm vector naming in Qdrant uses the correct dimension key (e.g., vs<dim>) to avoid retrieval issues.
Section sources
- health.ts:16-119
- providers.ts:116-143
- audit.ts:94-157
- config.ts:16-31
- memory-store.ts:57-97
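A timeout-bounded health probe in the spirit of the checks above can be sketched as follows. The `HealthStatusSketch` shape, the default timeout, the status messages, and the `status` property on errors are assumptions. The real healthCheck() in health.ts may report different fields.

```typescript
interface HealthStatusSketch {
  healthy: boolean;
  message: string;
}

// Map a failed probe to a human-readable message; 401 and 429 are
// singled out because the troubleshooting list treats them separately.
function describeProbeFailure(err: unknown): string {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    if (status === 401) return "authentication failed";
    if (status === 429) return "rate limited";
  }
  return "provider probe failed";
}

// Resolve with whichever happens first: the probe settling or the timeout.
function boundedHealthCheckSketch(
  probe: () => Promise<void>,
  timeoutMs = 5000,
): Promise<HealthStatusSketch> {
  return new Promise<HealthStatusSketch>((resolve) => {
    const timer = setTimeout(() => {
      resolve({ healthy: false, message: `health check timed out after ${timeoutMs} ms` });
    }, timeoutMs);
    probe().then(
      () => {
        clearTimeout(timer);
        resolve({ healthy: true, message: "ok" });
      },
      (err: unknown) => {
        clearTimeout(timer);
        resolve({ healthy: false, message: describeProbeFailure(err) });
      },
    );
  });
}
```

Because the function always resolves and never rejects, a readiness endpoint can await it directly and report the message without extra error handling.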
The embedding subsystem provides a robust, observable, and resilient pipeline for generating dense embeddings. It normalizes inputs, validates dimensions, detects anomalies, and integrates seamlessly with Qdrant for memory storage and retrieval. By leveraging batch processing, bounded health checks, and comprehensive metrics, the system supports high-throughput, low-latency operations while maintaining quality and reliability.