The Embedder Module is a document processing service that generates vector embeddings for text content. It uses Deep Java Library (DJL) with PyTorch to create high-quality embeddings for document chunks and fields, enabling semantic search and similarity matching in the Rokkon pipeline.
- Generates vector embeddings for document chunks
- Supports multiple embedding models
- GPU acceleration (when available)
- Configurable batch processing for optimal performance
- Reactive processing with backpressure handling
- Comprehensive error handling
- Automatic model selection and fallback
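The backpressure handling mentioned above can be illustrated with a bounded buffer that drops the oldest item when full. This is a minimal sketch of the DROP_OLDEST strategy named in the module configuration, not the module's actual Mutiny-based implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DropOldestBuffer {
    final Deque<String> buffer = new ArrayDeque<>();
    final int capacity;

    DropOldestBuffer(int capacity) { this.capacity = capacity; }

    // Add a chunk; if the buffer is full, drop the oldest entry
    // so a slow consumer never blocks a fast producer.
    void offer(String chunk) {
        if (buffer.size() == capacity) {
            buffer.pollFirst(); // drop oldest
        }
        buffer.addLast(chunk);
    }

    public static void main(String[] args) {
        DropOldestBuffer b = new DropOldestBuffer(2);
        b.offer("c1");
        b.offer("c2");
        b.offer("c3"); // buffer is full, so "c1" is dropped
        System.out.println(b.buffer); // prints [c2, c3]
    }
}
```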
The Embedder Module supports various embedding models, including:
- ALL_MINILM_L6_V2 (default): Lightweight and fast, great for general-purpose sentence embeddings
- ALL_MPNET_BASE_V2: Higher accuracy than MiniLM, though a bit larger and slower
- ALL_DISTILROBERTA_V1: Based on DistilRoBERTa; good balance between performance and speed
- PARAPHRASE_MINILM_L3_V2: Even smaller and faster than the L6 and L12 variants; ideal for low-latency scenarios
- PARAPHRASE_MULTILINGUAL_MINILM_L12_V2: Multilingual support (50+ languages) + small model size
- E5_SMALL_V2: Smaller sibling of e5-base-v2, good for retrieval tasks, especially with query-document use cases
- E5_LARGE_V2: Larger and more accurate than e5-base-v2, better embeddings at the cost of speed/memory
- MULTI_QA_MINILM_L6_COS_V1: Fine-tuned for semantic search and QA
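Whichever model is chosen, similarity matching over the resulting vectors is typically done with cosine similarity. A minimal, self-contained sketch (not the module's internal code):

```java
public class CosineSimilarity {
    // Cosine similarity between two embedding vectors: the dot product
    // divided by the product of the vector norms. Values near 1.0 mean
    // semantically similar text; values near 0.0 mean unrelated text.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f};
        float[] v2 = {1f, 0f};
        float[] v3 = {0f, 1f};
        System.out.println(cosine(v1, v2)); // prints 1.0 (identical direction)
        System.out.println(cosine(v1, v3)); // prints 0.0 (orthogonal)
    }
}
```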
The Embedder Module:
- Receives documents with chunks through gRPC
- Processes chunks in optimized batches
- Generates vector embeddings using the configured model
- Adds embeddings to the document chunks
- Returns the document with embedded chunks
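The batching step above can be sketched as follows. This is an illustrative helper, assuming batches are capped at the configured max_batch_size; the module's real batching logic is internal:

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Split chunk texts into batches no larger than maxBatchSize,
    // so each batch can be embedded in a single model inference call.
    static List<List<String>> toBatches(List<String> chunks, int maxBatchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i += maxBatchSize) {
            batches.add(chunks.subList(i, Math.min(i + maxBatchSize, chunks.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("c1", "c2", "c3", "c4", "c5");
        // 5 chunks with maxBatchSize 2 -> 3 batches: [c1,c2], [c3,c4], [c5]
        System.out.println(toBatches(chunks, 2).size()); // prints 3
    }
}
```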
- JDK 21 or later
- Docker (for containerized deployment)
- Access to Pipeline Engine and Consul services
- GPU with CUDA support (optional, for accelerated processing)
The module can be built using the provided docker-build.sh script:

# Build in production mode
./docker-build.sh

# Build in development mode
./docker-build.sh dev

After building, you can run the module using Docker:
# Production mode
docker run -i --rm -p 49093:49093 \
  -e ENGINE_HOST=engine \
  -e CONSUL_HOST=consul \
  rokkon/embedder-module:latest

# Development mode (uses host networking)
docker run -i --rm --network=host rokkon/embedder-module:dev

Use the provided test-docker.sh script to test the module:
# Test in development mode
./test-docker.sh

# Test in production mode
./test-docker.sh prod

The module recognizes the following environment variables:

- MODULE_HOST: Host address for the module (default: 0.0.0.0)
- MODULE_PORT: Port for the module (default: 9090)
- ENGINE_HOST: Host address for the Rokkon Engine (default: localhost)
- ENGINE_PORT: Port for the Rokkon Engine (default: 8081)
- CONSUL_HOST: Host address for Consul (default: empty)
- CONSUL_PORT: Port for Consul (default: -1)
- HEALTH_CHECK: Whether to perform health checks (default: true)
- MAX_RETRIES: Maximum number of registration retries (default: 3)
The embedder accepts the following configuration options in the custom_json_config:
- embedding_models: List of embedding models to use (default: ["ALL_MINILM_L6_V2"])
- check_chunks: Whether to check for and process chunks in the document (default: true)
- check_document_fields: Whether to check and process document fields if chunks are not present (default: true)
- document_fields: List of document fields to process if chunks are not present (default: ["body", "title"])
- custom_field_mappings: Custom field mappings for embedding specific fields (default: [])
- process_keywords: Whether to process keywords in the document (default: true)
- keyword_ngram_sizes: List of n-gram sizes to use for keywords (default: [1])
- max_token_size: Maximum token size for text to be embedded (default: 512)
- max_batch_size: Maximum batch size for GPU processing (default: 32)
- backpressure_strategy: Strategy for handling backpressure in reactive streams (default: "DROP_OLDEST")
- log_prefix: Prefix to add to log messages (default: "")
- result_set_name_template: Template for naming the result set (default: "%s_embeddings_%s")
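A custom_json_config combining several of these options might look like the following. The values are illustrative, not recommendations:

```json
{
  "embedding_models": ["ALL_MINILM_L6_V2", "ALL_MPNET_BASE_V2"],
  "check_chunks": true,
  "check_document_fields": true,
  "document_fields": ["body", "title"],
  "process_keywords": true,
  "keyword_ngram_sizes": [1, 2],
  "max_token_size": 512,
  "max_batch_size": 32,
  "backpressure_strategy": "DROP_OLDEST",
  "result_set_name_template": "%s_embeddings_%s"
}
```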
The Embedder Module integrates with the Rokkon pipeline through:
- gRPC Service: Implements the `PipeStepProcessor` service defined in the Rokkon protobuf
- Registration: Automatically registers with the Rokkon Engine on startup
- Health Checks: Provides health status through the standard gRPC health check protocol
The module is built with Quarkus and uses:
- Deep Java Library (DJL) for ML model inference
- PyTorch as the ML engine
- Mutiny for reactive programming
- Micrometer and OpenTelemetry for observability
To contribute to the module, follow the standard Rokkon development workflow.
The Embedder Module supports processing documents that have been chunked multiple times. When a document has been processed by the chunker module twice, producing two chunk sets, the embedder generates embeddings for every chunk set on each run; two chunk sets across two embedding runs therefore yield four embedding sets (2 × 2).
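Assuming the two placeholders in result_set_name_template are filled with a chunk-set identifier and a model name (an assumption about the module's naming scheme; the identifiers below are hypothetical), the distinct embedding sets could be named like this:

```java
public class ResultSetNames {
    public static void main(String[] args) {
        // Default template from the module configuration.
        String template = "%s_embeddings_%s";
        // Hypothetical chunk-set identifiers from two chunker runs.
        String[] chunkSets = {"chunker_run1", "chunker_run2"};
        // Two configured embedding models.
        String[] models = {"ALL_MINILM_L6_V2", "ALL_MPNET_BASE_V2"};
        // 2 chunk sets x 2 models -> 4 distinct embedding set names.
        for (String cs : chunkSets) {
            for (String m : models) {
                System.out.println(String.format(template, cs, m));
            }
        }
        // first line printed: chunker_run1_embeddings_ALL_MINILM_L6_V2
    }
}
```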