
Vector Embedder Microservice

A Flask-based microservice for generating text embeddings using SentenceTransformers models. Optimized for offline operation with models bundled in the Docker image.

Features

  • Offline Operation: Models are pre-downloaded during build, no internet required at runtime
  • Fast Cold Starts: Bundling models in the image eliminates download time at startup
  • Configurable Models: Use any SentenceTransformers model via build arguments
  • Optimized Concurrency: Configured for 10-50 concurrent requests
  • Google Cloud Run Ready: Optimized for serverless deployment

Build Arguments

The service supports customizing the embedding model at build time:

Available Build Arguments

Argument | Default | Description
EMBEDDING_MODEL | multi-qa-MiniLM-L6-cos-v1 | SentenceTransformers model for generating embeddings
TOKENIZER_MODEL | sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | HuggingFace tokenizer model (should match the embedding model)
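
For context, here is a minimal sketch of how these two build arguments typically map onto Python loads at startup, assuming the service uses SentenceTransformers for embeddings and a HuggingFace tokenizer (the exact code in this repository may differ):

from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

# Build-time defaults shown; see the table above. Both loads resolve from
# the model cache bundled into the image, so no network access is needed
# at runtime.
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")  # EMBEDDING_MODEL
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")  # TOKENIZER_MODEL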

Popular Model Options

Model | Size | Use Case
multi-qa-MiniLM-L6-cos-v1 (default) | ~90MB | Question answering, semantic search
all-MiniLM-L6-v2 | ~80MB | General purpose, fast inference
all-mpnet-base-v2 | ~420MB | High quality, slower inference
paraphrase-multilingual-MiniLM-L12-v2 | ~470MB | Multilingual support (50+ languages)

See the SentenceTransformers documentation for more models.
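
If you are choosing between these models, a short local snippet (not part of the service) can confirm each candidate's embedding dimension, which determines the length of the vectors the API returns:

from sentence_transformers import SentenceTransformer

for name in ["multi-qa-MiniLM-L6-cos-v1", "all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)  # downloads to the local cache on first use
    print(name, model.get_sentence_embedding_dimension())  # e.g. 384 for the MiniLM models, 768 for mpnet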

Building the Image

Default Build (multi-qa-MiniLM-L6-cos-v1)

docker build -t vector-embedder-microservice .

Custom Model Build

# Using all-MiniLM-L6-v2 (general purpose)
docker build \
  --build-arg EMBEDDING_MODEL=all-MiniLM-L6-v2 \
  --build-arg TOKENIZER_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
  -t vector-embedder-microservice .

# Using all-mpnet-base-v2 (higher quality)
docker build \
  --build-arg EMBEDDING_MODEL=all-mpnet-base-v2 \
  --build-arg TOKENIZER_MODEL=sentence-transformers/all-mpnet-base-v2 \
  -t vector-embedder-microservice .

# Using multilingual model
docker build \
  --build-arg EMBEDDING_MODEL=paraphrase-multilingual-MiniLM-L12-v2 \
  --build-arg TOKENIZER_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 \
  -t vector-embedder-microservice .

Using Pre-built Public Images

Public Docker images are automatically built and published to GitHub Container Registry (ghcr.io) whenever changes are pushed to the main branch.

Pull and Run from ghcr.io

# Pull the latest image (no authentication needed for public images)
docker pull ghcr.io/OWNER/REPO:latest

# Run the container
docker run -d \
  -e PORT=5001 \
  -e VECTOR_EMBEDDER_API_KEY=your-api-key \
  -p 5001:5001 \
  ghcr.io/OWNER/REPO:latest

Replace OWNER/REPO with your GitHub username and repository name in lowercase (e.g., jman/vectorembeddermicroservice); Docker image references must use lowercase names.

Available Tags

  • latest - Latest build from main branch
  • main - Same as latest
  • v1.0.0 - Specific version tags
  • v1.0 - Minor version tags
  • v1 - Major version tags
  • main-sha-abc1234 - Specific commit SHA

Making the Image Public

After the first build, you need to make the package public:

  1. Go to your GitHub repository
  2. Click Packages in the right sidebar
  3. Click on your package name
  4. Click Package settings (bottom of right sidebar)
  5. Scroll to Danger Zone
  6. Click Change visibility and select Public
  7. Type the package name to confirm

Once public, anyone can pull the image without authentication.

Automated Builds

The Docker image is automatically built and published by GitHub Actions:

Automatic triggers:

  • Push to main → Builds latest and main tags
  • Git tags (e.g., v1.0.0) → Builds versioned tags
  • Pull requests → Builds image but doesn't push

Manual builds: You can trigger a manual build with custom model selection:

  1. Go to Actions tab in GitHub
  2. Click Build and Publish Docker Image workflow
  3. Click Run workflow
  4. Optionally specify custom embedding and tokenizer models
  5. Click Run workflow

Creating versioned releases:

# Tag a version
git tag v1.0.0
git push origin v1.0.0

# This automatically builds and publishes:
# - ghcr.io/OWNER/REPO:v1.0.0
# - ghcr.io/OWNER/REPO:v1.0
# - ghcr.io/OWNER/REPO:v1
# - ghcr.io/OWNER/REPO:latest (if on main branch)

Building Custom Model Variants

To build images with different embedding models:

Via GitHub Actions (recommended):

  1. Go to Actions → Build and Publish Docker Image
  2. Click Run workflow
  3. Set custom model parameters:
    • Embedding model: all-mpnet-base-v2
    • Tokenizer model: sentence-transformers/all-mpnet-base-v2
  4. Run workflow

This creates a tagged image with your custom model that you can reference by commit SHA.

Deploying to Google Cloud Run

You have two options for deploying to Google Cloud Run:

Option 1: Deploy from GitHub Container Registry (Easiest)

Deploy directly from the public ghcr.io image:

gcloud run deploy vector-embedder-microservice \
  --image ghcr.io/OWNER/REPO:latest \
  --region us-central1 \
  --memory 1Gi \
  --cpu 2 \
  --allow-unauthenticated \
  --set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key

This pulls the pre-built image from GitHub Container Registry; no local build is required.

Option 2: Build and Push to Google Artifact Registry

If you prefer to use Google's registry:

# Build locally
docker build -t vector-embedder-microservice .

# Tag for Google Artifact Registry
docker tag vector-embedder-microservice \
  us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice

# Push to registry
docker push us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice

# Deploy to Cloud Run
gcloud run deploy vector-embedder-microservice \
  --image us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice \
  --region us-central1 \
  --memory 1Gi \
  --cpu 2 \
  --set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key

Replace YOUR-PROJECT-ID with your Google Cloud project ID.

Resource Requirements

Minimum (default model)

  • Memory: 512MB
  • CPU: 1 vCPU
  • Disk: 600MB
  • Concurrency: 8 requests

Recommended (production)

  • Memory: 1GB
  • CPU: 2 vCPU
  • Disk: 600MB-1GB (depending on model)
  • Concurrency: 16 requests (2 workers × 8 threads)

High Performance (larger models)

  • Memory: 2GB+
  • CPU: 4 vCPU
  • Disk: 1GB+
  • Concurrency: 32 requests (4 workers × 8 threads)
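
The workers × threads figures above suggest a Gunicorn-style WSGI server. As an illustration only (the repository's actual server configuration may differ), the recommended production profile could be expressed in a gunicorn.conf.py like this:

# gunicorn.conf.py -- hypothetical example matching the "Recommended (production)" profile
bind = "0.0.0.0:5001"  # matches the default PORT
workers = 2            # 2 worker processes
threads = 8            # 8 threads each, i.e. 16 concurrent requests
timeout = 120          # allow time for embedding long inputs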

Testing Offline Capability

# Run container without network access
docker run --network none \
  -e PORT=5001 \
  -e VECTOR_EMBEDDER_API_KEY=test123 \
  -p 5001:5001 \
  vector-embedder-microservice

# Test the endpoint
curl -X POST http://localhost:5001/embeddings \
  -H "Content-Type: application/json" \
  -H "X-API-Key: test123" \
  -d '{"text": "This is a test sentence"}'

API Usage

Generate Embeddings

Endpoint: POST /embeddings

Headers:

  • Content-Type: application/json
  • X-API-Key: <your-api-key>

Request Body:

{
  "text": "Your text to embed"
}

Response:

{
  "embeddings": [0.123, -0.456, 0.789, ...]
}
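
A minimal Python client for this endpoint might look like the following; the URL and API key are placeholders:

import requests

response = requests.post(
    "http://localhost:5001/embeddings",
    headers={"X-API-Key": "your-api-key"},
    json={"text": "Your text to embed"},  # sets Content-Type: application/json
)
response.raise_for_status()
embedding = response.json()["embeddings"]
print(len(embedding))  # vector length depends on the embedding model, e.g. 384 for MiniLM models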

Configuration

Environment Variables

Variable | Required | Default | Description
VECTOR_EMBEDDER_API_KEY | No | abc123 | API key for authentication
PORT | No | 5001 | Port to run the service on
EMBEDDING_MODEL | No | From build arg | Override model at runtime (not recommended)
TOKENIZER_MODEL | No | From build arg | Override tokenizer at runtime (not recommended)

Note: EMBEDDING_MODEL and TOKENIZER_MODEL are set during build. Only override at runtime if you have the desired models already cached in the image.
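
A hedged sketch of how the service likely resolves these variables at startup (the actual code may differ):

import os

# Defaults mirror the table above; the model names fall back to the values
# baked in at build time.
API_KEY = os.environ.get("VECTOR_EMBEDDER_API_KEY", "abc123")
PORT = int(os.environ.get("PORT", "5001"))
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "multi-qa-MiniLM-L6-cos-v1")
TOKENIZER_MODEL = os.environ.get("TOKENIZER_MODEL", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")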

Development

Running Tests Locally

The project includes comprehensive unit tests for both the embedding logic and API endpoints.

Install development dependencies:

pip install -r requirements-dev.txt

Run all tests:

pytest

Run tests with coverage report:

pytest --cov=. --cov-report=html

Run specific test file:

pytest test_embeddings.py
pytest test_main.py

View coverage report: After running tests with coverage, open htmlcov/index.html in your browser to see a detailed coverage report.

Test Structure

  • test_embeddings.py: Tests for embedding generation and text chunking logic (a sketch of the chunk-and-average technique follows this list)

    • Model loading configuration
    • Text chunking with various lengths
    • Embedding generation and averaging
    • Edge cases and error handling
  • test_main.py: Tests for Flask API endpoints

    • Authentication and API key validation
    • Request/response format validation
    • Error handling (missing fields, invalid data)
    • HTTP method validation
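
The chunk-and-average behaviour referenced in the test descriptions can be pictured with a short sketch. This illustrates the general technique, not the repository's exact implementation: long input is split into chunks that fit the model, each chunk is embedded, and the chunk vectors are averaged into a single embedding.

import numpy as np
from sentence_transformers import SentenceTransformer

def embed_long_text(model: SentenceTransformer, text: str, max_words: int = 200) -> np.ndarray:
    """Split text into word chunks, embed each chunk, and average the vectors."""
    words = text.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)] or [text]
    vectors = model.encode(chunks)   # shape: (num_chunks, embedding_dim)
    return vectors.mean(axis=0)      # single vector of length embedding_dim

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
print(embed_long_text(model, "This is a test sentence").shape)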

Continuous Integration

Tests run automatically on:

  • Push to main/develop branches
  • Pull requests to main/develop
  • Manual workflow dispatch

The CI pipeline:

  1. Tests against Python 3.9, 3.10, and 3.11
  2. Runs linting checks with flake8
  3. Executes full test suite with pytest
  4. Generates and uploads coverage reports
  5. Archives coverage HTML report as artifact

View test results in the Actions tab of the GitHub repository.
