A Flask-based microservice for generating text embeddings using SentenceTransformers models. Optimized for offline operation with models bundled in the Docker image.
- Offline Operation: Models are pre-downloaded during build, no internet required at runtime
- Fast Cold Starts: Models bundled in image eliminate download time
- Configurable Models: Use any SentenceTransformers model via build arguments
- Optimized Concurrency: Configured for 10-50 concurrent requests
- Google Cloud Run Ready: Optimized for serverless deployment
The service supports customizing the embedding model at build time:
| Argument | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `multi-qa-MiniLM-L6-cos-v1` | SentenceTransformers model for generating embeddings |
| `TOKENIZER_MODEL` | `sentence-transformers/multi-qa-MiniLM-L6-cos-v1` | HuggingFace tokenizer model (should match the embedding model) |
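Because the image must work offline, the build needs a step that downloads and caches these models. A minimal sketch of such a step is below; the script name (`download_models.py`) and the exact cache handling are assumptions, not necessarily what this repo's Dockerfile does:

```python
# download_models.py -- hypothetical build-time script, invoked from the
# Dockerfile (e.g., RUN python download_models.py) after the build args are
# exported as environment variables. Instantiating the classes pulls the
# weights into the local HuggingFace cache, which then ships inside the
# image layer, so no network access is needed at runtime.
import os

from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

embedding_model = os.environ.get("EMBEDDING_MODEL", "multi-qa-MiniLM-L6-cos-v1")
tokenizer_model = os.environ.get(
    "TOKENIZER_MODEL", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
)

SentenceTransformer(embedding_model)
AutoTokenizer.from_pretrained(tokenizer_model)
```

Any of the models below can be substituted via these build arguments: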
| Model | Size | Use Case |
|---|---|---|
| `multi-qa-MiniLM-L6-cos-v1` (default) | ~90MB | Question answering, semantic search |
| `all-MiniLM-L6-v2` | ~80MB | General purpose, fast inference |
| `all-mpnet-base-v2` | ~420MB | High quality, slower inference |
| `paraphrase-multilingual-MiniLM-L12-v2` | ~470MB | Multilingual support (50+ languages) |
See SentenceTransformers documentation for more models.
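These models also differ in output dimensionality, not just size: the MiniLM variants produce 384-dimensional vectors, while `all-mpnet-base-v2` produces 768-dimensional ones. This matters if you store vectors in a database with a fixed schema. A quick local check before baking a model into the image:

```python
# Compare candidate models locally before choosing a build argument.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
print(model.get_sentence_embedding_dimension())  # 384 for the MiniLM variants

vector = model.encode("How do I reset my password?")
print(vector.shape)  # (384,)
```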
```bash
# Using the default model
docker build -t vector-embedder-microservice .

# Using all-MiniLM-L6-v2 (general purpose)
docker build \
--build-arg EMBEDDING_MODEL=all-MiniLM-L6-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
-t vector-embedder-microservice .
# Using all-mpnet-base-v2 (higher quality)
docker build \
--build-arg EMBEDDING_MODEL=all-mpnet-base-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/all-mpnet-base-v2 \
-t vector-embedder-microservice .
# Using multilingual model
docker build \
--build-arg EMBEDDING_MODEL=paraphrase-multilingual-MiniLM-L12-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 \
-t vector-embedder-microservice .
```

Public Docker images are automatically built and published to GitHub Container Registry (ghcr.io) whenever changes are pushed to the main branch.
```bash
# Pull the latest image (no authentication needed for public images)
docker pull ghcr.io/OWNER/REPO:latest
# Run the container
docker run -d \
-e PORT=5001 \
-e VECTOR_EMBEDDER_API_KEY=your-api-key \
-p 5001:5001 \
ghcr.io/OWNER/REPO:latest
```

Replace OWNER/REPO with your GitHub username and repository name (e.g., jman/vector-embedder-microservice).
Available tags:

- `latest` - Latest build from main branch
- `main` - Same as `latest`
- `v1.0.0` - Specific version tags
- `v1.0` - Minor version tags
- `v1` - Major version tags
- `main-sha-abc1234` - Specific commit SHA
After the first build, you need to make the package public:
- Go to your GitHub repository
- Click Packages in the right sidebar
- Click on your package name
- Click Package settings (bottom of right sidebar)
- Scroll to Danger Zone
- Click Change visibility → Public
- Type the package name to confirm
Once public, anyone can pull the image without authentication.
The Docker image is automatically built and published by GitHub Actions:
Automatic triggers:
- Push to main → Builds `latest` and `main` tags
- Git tags (e.g., `v1.0.0`) → Builds versioned tags
- Pull requests → Builds image but doesn't push
Manual builds: You can trigger a manual build with custom model selection:
- Go to Actions tab in GitHub
- Click Build and Publish Docker Image workflow
- Click Run workflow
- Optionally specify custom embedding and tokenizer models
- Click Run workflow
Creating versioned releases:
```bash
# Tag a version
git tag v1.0.0
git push origin v1.0.0
# This automatically builds and publishes:
# - ghcr.io/OWNER/REPO:v1.0.0
# - ghcr.io/OWNER/REPO:v1.0
# - ghcr.io/OWNER/REPO:v1
# - ghcr.io/OWNER/REPO:latest (if on main branch)
```

To build images with different embedding models:
Via GitHub Actions (recommended):
- Go to Actions → Build and Publish Docker Image
- Click Run workflow
- Set custom model parameters:
  - Embedding model: `all-mpnet-base-v2`
  - Tokenizer model: `sentence-transformers/all-mpnet-base-v2`
- Run workflow
This creates a tagged image with your custom model that you can reference by commit SHA.
You have two options for deploying to Google Cloud Run:
Deploy directly from the public ghcr.io image:
```bash
gcloud run deploy vector-embedder-microservice \
--image ghcr.io/OWNER/REPO:latest \
--region us-central1 \
--memory 1Gi \
--cpu 2 \
--allow-unauthenticated \
--set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key
```

This pulls the pre-built image from GitHub Container Registry, no build required. Note, however, that Cloud Run typically deploys images from Artifact Registry or Container Registry; if referencing ghcr.io directly fails in your project, mirror the image into Artifact Registry (or use the Google registry option below).
If you prefer to use Google's registry:
```bash
# Build locally
docker build -t vector-embedder-microservice .
# Tag for Google Artifact Registry
docker tag vector-embedder-microservice \
us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice
# Push to registry
docker push us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice
# Deploy to Cloud Run
gcloud run deploy vector-embedder-microservice \
--image us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice \
--region us-central1 \
--memory 1Gi \
--cpu 2 \
--set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key
```

Replace YOUR-PROJECT-ID with your Google Cloud project ID.
Minimum:

- Memory: 512MB
- CPU: 1 vCPU
- Disk: 600MB
- Concurrency: 8 requests

Recommended:

- Memory: 1GB
- CPU: 2 vCPU
- Disk: 600MB-1GB (depending on model)
- Concurrency: 16 requests (2 workers × 8 threads)

High throughput:

- Memory: 2GB+
- CPU: 4 vCPU
- Disk: 1GB+
- Concurrency: 32 requests (4 workers × 8 threads)
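The worker × thread arithmetic above maps naturally onto a Gunicorn-style configuration. A sketch for the recommended tier follows; the actual server and settings baked into this image are assumptions here:

```python
# gunicorn.conf.py -- hypothetical config matching the recommended tier:
# 2 worker processes × 8 threads each = 16 concurrent requests.
# Threads (rather than extra workers) keep memory down, since each worker
# process loads its own copy of the embedding model.
workers = 2
threads = 8
bind = "0.0.0.0:5001"
timeout = 120  # embedding long texts can exceed the 30s default
```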
```bash
# Run container without network access
docker run --network none \
  -e PORT=5001 \
  -e VECTOR_EMBEDDER_API_KEY=test123 \
  -p 5001:5001 \
  vector-embedder-microservice

# Test the endpoint
curl -X POST http://localhost:5001/embeddings \
  -H "Content-Type: application/json" \
  -H "X-API-Key: test123" \
  -d '{"text": "This is a test sentence"}'
```

Endpoint: `POST /embeddings`
Headers:

- `Content-Type: application/json`
- `X-API-Key: <your-api-key>`
Request Body:

```json
{
  "text": "Your text to embed"
}
```

Response:

```json
{
  "embeddings": [0.123, -0.456, 0.789, ...]
}
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `VECTOR_EMBEDDER_API_KEY` | No | `abc123` | API key for authentication |
| `PORT` | No | `5001` | Port to run the service on |
| `EMBEDDING_MODEL` | No | From build arg | Override model at runtime (not recommended) |
| `TOKENIZER_MODEL` | No | From build arg | Override tokenizer at runtime (not recommended) |
Note: `EMBEDDING_MODEL` and `TOKENIZER_MODEL` are set during build. Only override them at runtime if the desired models are already cached in the image.
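For reference, a minimal Python client for the endpoint described above (URL, port, and key taken from the local testing example):

```python
# client.py -- minimal example call using the requests library.
import requests

response = requests.post(
    "http://localhost:5001/embeddings",
    headers={"X-API-Key": "test123"},
    json={"text": "This is a test sentence"},  # requests sets Content-Type
    timeout=30,
)
response.raise_for_status()
embedding = response.json()["embeddings"]
print(len(embedding))  # vector dimension, e.g. 384 for the default model
```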
The project includes comprehensive unit tests for both the embedding logic and API endpoints.
Install development dependencies:
```bash
pip install -r requirements-dev.txt
```

Run all tests:

```bash
pytest
```

Run tests with coverage report:

```bash
pytest --cov=. --cov-report=html
```

Run specific test file:

```bash
pytest test_embeddings.py
pytest test_main.py
```

View coverage report:
After running tests with coverage, open `htmlcov/index.html` in your browser to see a detailed coverage report.
- `test_embeddings.py`: Tests for embedding generation and text chunking logic (see the sketch below)
  - Model loading configuration
  - Text chunking with various lengths
  - Embedding generation and averaging
  - Edge cases and error handling
- `test_main.py`: Tests for Flask API endpoints
  - Authentication and API key validation
  - Request/response format validation
  - Error handling (missing fields, invalid data)
  - HTTP method validation
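As a rough illustration of the chunk-and-average behavior those tests cover: long inputs are split into token-sized chunks, each chunk is embedded, and the chunk vectors are averaged into a single embedding. The chunk size and decoding details below are assumptions, not the repo's exact implementation:

```python
# Sketch of chunked embedding with averaging (hypothetical parameters).
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
tokenizer = AutoTokenizer.from_pretrained(
    "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
)

def embed(text: str, max_tokens: int = 256) -> np.ndarray:
    tokens = tokenizer.encode(text, add_special_tokens=False)
    # Split the token stream into fixed-size chunks, decoded back to text.
    chunks = [
        tokenizer.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ] or [""]  # guard against empty input
    vectors = model.encode(chunks)   # one embedding per chunk
    return np.mean(vectors, axis=0)  # average into a single vector
```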
Tests run automatically on:
- Push to main/develop branches
- Pull requests to main/develop
- Manual workflow dispatch
The CI pipeline:
- Tests against Python 3.9, 3.10, and 3.11
- Runs linting checks with flake8
- Executes full test suite with pytest
- Generates and uploads coverage reports
- Archives coverage HTML report as artifact
View test results in the Actions tab of the GitHub repository.