A Flask-based microservice for generating text embeddings using SentenceTransformers models. Optimized for offline operation with models bundled in the Docker image.
- Offline Operation: Models are pre-downloaded during build, no internet required at runtime
- Fast Cold Starts: Models bundled in image eliminate download time
- Configurable Models: Use any SentenceTransformers model via build arguments
- Optimized Concurrency: Configured for 10-50 concurrent requests
- Google Cloud Run Ready: Optimized for serverless deployment
The service supports customizing the embedding model at build time:
| Argument | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `multi-qa-MiniLM-L6-cos-v1` | SentenceTransformers model for generating embeddings |
| `TOKENIZER_MODEL` | `sentence-transformers/multi-qa-MiniLM-L6-cos-v1` | HuggingFace tokenizer model (should match the embedding model) |
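Because the image must work offline, the build needs a step that downloads and caches these models. A minimal sketch of such a step is below; the script name (`download_models.py`) and the exact cache handling are assumptions, not necessarily what this repo's Dockerfile does:

```python
# download_models.py -- hypothetical build-time script, invoked from the
# Dockerfile (e.g., RUN python download_models.py) after the build args are
# exported as environment variables. Instantiating the classes pulls the
# weights into the local HuggingFace cache, which then ships inside the
# image layer, so no network access is needed at runtime.
import os

from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

embedding_model = os.environ.get("EMBEDDING_MODEL", "multi-qa-MiniLM-L6-cos-v1")
tokenizer_model = os.environ.get(
    "TOKENIZER_MODEL", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
)

SentenceTransformer(embedding_model)
AutoTokenizer.from_pretrained(tokenizer_model)
```

Any of the models below can be substituted via these build arguments: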
| Model | Size | Use Case |
|---|---|---|
| `multi-qa-MiniLM-L6-cos-v1` (default) | ~90MB | Question answering, semantic search |
| `all-MiniLM-L6-v2` | ~80MB | General purpose, fast inference |
| `all-mpnet-base-v2` | ~420MB | High quality, slower inference |
| `paraphrase-multilingual-MiniLM-L12-v2` | ~470MB | Multilingual support (50+ languages) |
See SentenceTransformers documentation for more models.
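These models also differ in output dimensionality, not just size: the MiniLM variants produce 384-dimensional vectors, while `all-mpnet-base-v2` produces 768-dimensional ones. This matters if you store vectors in a database with a fixed schema. A quick local check before baking a model into the image:

```python
# Compare candidate models locally before choosing a build argument.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
print(model.get_sentence_embedding_dimension())  # 384 for the MiniLM variants

vector = model.encode("How do I reset my password?")
print(vector.shape)  # (384,)
```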
```bash
# Using the default model
docker build -t vector-embedder-microservice .

# Using all-MiniLM-L6-v2 (general purpose)
docker build \
--build-arg EMBEDDING_MODEL=all-MiniLM-L6-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/all-MiniLM-L6-v2 \
-t vector-embedder-microservice .
# Using all-mpnet-base-v2 (higher quality)
docker build \
--build-arg EMBEDDING_MODEL=all-mpnet-base-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/all-mpnet-base-v2 \
-t vector-embedder-microservice .
# Using multilingual model
docker build \
--build-arg EMBEDDING_MODEL=paraphrase-multilingual-MiniLM-L12-v2 \
--build-arg TOKENIZER_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 \
-t vector-embedder-microservice .
```

Public Docker images are automatically built and published to GitHub Container Registry (ghcr.io) whenever changes are pushed to the main branch.
```bash
# Pull the latest image (no authentication needed for public images)
docker pull ghcr.io/OWNER/REPO:latest
# Run the container
docker run -d \
-e PORT=5001 \
-e VECTOR_EMBEDDER_API_KEY=your-api-key \
-p 5001:5001 \
ghcr.io/OWNER/REPO:latest
```

Replace OWNER/REPO with your GitHub username and repository name (e.g., jman/vector-embedder-microservice).
Available tags:

- `latest` - Latest build from main branch
- `main` - Same as `latest`
- `v1.0.0` - Specific version tags
- `v1.0` - Minor version tags
- `v1` - Major version tags
- `main-sha-abc1234` - Specific commit SHA
After the first build, you need to make the package public:
- Go to your GitHub repository
- Click Packages in the right sidebar
- Click on your package name
- Click Package settings (bottom of right sidebar)
- Scroll to Danger Zone
- Click Change visibility → Public
- Type the package name to confirm
Once public, anyone can pull the image without authentication.
The Docker image is automatically built and published by GitHub Actions:
Automatic triggers:
- Push to main → Builds `latest` and `main` tags
- Git tags (e.g., `v1.0.0`) → Builds versioned tags
- Pull requests → Builds image but doesn't push
Manual builds: You can trigger a manual build with custom model selection:
- Go to Actions tab in GitHub
- Click Build and Publish Docker Image workflow
- Click Run workflow
- Optionally specify custom embedding and tokenizer models
- Click Run workflow
Creating versioned releases:
```bash
# Tag a version
git tag v1.0.0
git push origin v1.0.0
# This automatically builds and publishes:
# - ghcr.io/OWNER/REPO:v1.0.0
# - ghcr.io/OWNER/REPO:v1.0
# - ghcr.io/OWNER/REPO:v1
# - ghcr.io/OWNER/REPO:latest (if on main branch)
```

To build images with different embedding models:
Via GitHub Actions (recommended):
- Go to Actions → Build and Publish Docker Image
- Click Run workflow
- Set custom model parameters:
  - Embedding model: `all-mpnet-base-v2`
  - Tokenizer model: `sentence-transformers/all-mpnet-base-v2`
- Run workflow
This creates a tagged image with your custom model that you can reference by commit SHA.
You have two options for deploying to Google Cloud Run:
Deploy directly from the public ghcr.io image:
```bash
gcloud run deploy vector-embedder-microservice \
--image ghcr.io/OWNER/REPO:latest \
--region us-central1 \
--memory 1Gi \
--cpu 2 \
--allow-unauthenticated \
--set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key
```

This pulls the pre-built image from GitHub Container Registry, no build required. Note, however, that Cloud Run typically deploys images from Artifact Registry or Container Registry; if referencing ghcr.io directly fails in your project, mirror the image into Artifact Registry (or use the Google registry option below).
If you prefer to use Google's registry:
```bash
# Build locally
docker build -t vector-embedder-microservice .
# Tag for Google Artifact Registry
docker tag vector-embedder-microservice \
us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice
# Push to registry
docker push us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice
# Deploy to Cloud Run
gcloud run deploy vector-embedder-microservice \
--image us-central1-docker.pkg.dev/YOUR-PROJECT-ID/models/vector-embedder-microservice \
--region us-central1 \
--memory 1Gi \
--cpu 2 \
--set-env-vars VECTOR_EMBEDDER_API_KEY=your-api-key
```

Replace YOUR-PROJECT-ID with your Google Cloud project ID.
Minimum:

- Memory: 512MB
- CPU: 1 vCPU
- Disk: 600MB
- Concurrency: 8 requests

Recommended:

- Memory: 1GB
- CPU: 2 vCPU
- Disk: 600MB-1GB (depending on model)
- Concurrency: 16 requests (2 workers × 8 threads)

High throughput:

- Memory: 2GB+
- CPU: 4 vCPU
- Disk: 1GB+
- Concurrency: 32 requests (4 workers × 8 threads)
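The worker × thread arithmetic above maps naturally onto a Gunicorn-style configuration. A sketch for the recommended tier follows; the actual server and settings baked into this image are assumptions here:

```python
# gunicorn.conf.py -- hypothetical config matching the recommended tier:
# 2 worker processes × 8 threads each = 16 concurrent requests.
# Threads (rather than extra workers) keep memory down, since each worker
# process loads its own copy of the embedding model.
workers = 2
threads = 8
bind = "0.0.0.0:5001"
timeout = 120  # embedding long texts can exceed the 30s default
```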
```bash
# Run container without network access
docker run --network none \
  -e PORT=5001 \
  -e VECTOR_EMBEDDER_API_KEY=test123 \
  -p 5001:5001 \
  vector-embedder-microservice

# Test the endpoint
curl -X POST http://localhost:5001/embeddings \
  -H "Content-Type: application/json" \
  -H "X-API-Key: test123" \
  -d '{"text": "This is a test sentence"}'
```

Endpoint: `POST /embeddings`
Headers:

- `Content-Type: application/json`
- `X-API-Key: <your-api-key>`
Request Body:

```json
{
  "text": "Your text to embed"
}
```

Response:

```json
{
  "embeddings": [0.123, -0.456, 0.789, ...]
}
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `VECTOR_EMBEDDER_API_KEY` | No | `abc123` | API key for authentication |
| `PORT` | No | `5001` | Port to run the service on |
| `EMBEDDING_MODEL` | No | From build arg | Override model at runtime (not recommended) |
| `TOKENIZER_MODEL` | No | From build arg | Override tokenizer at runtime (not recommended) |
Note: `EMBEDDING_MODEL` and `TOKENIZER_MODEL` are set during build. Only override them at runtime if the desired models are already cached in the image.
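For reference, a minimal Python client for the endpoint described above (URL, port, and key taken from the local testing example):

```python
# client.py -- minimal example call using the requests library.
import requests

response = requests.post(
    "http://localhost:5001/embeddings",
    headers={"X-API-Key": "test123"},
    json={"text": "This is a test sentence"},  # requests sets Content-Type
    timeout=30,
)
response.raise_for_status()
embedding = response.json()["embeddings"]
print(len(embedding))  # vector dimension, e.g. 384 for the default model
```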
The project includes comprehensive unit tests for both the embedding logic and API endpoints.
Install development dependencies:
```bash
pip install -r requirements-dev.txt
```

Run all tests:

```bash
pytest
```

Run tests with coverage report:

```bash
pytest --cov=. --cov-report=html
```

Run specific test file:

```bash
pytest test_embeddings.py
pytest test_main.py
```

View coverage report:
After running tests with coverage, open `htmlcov/index.html` in your browser to see a detailed coverage report.
- `test_embeddings.py`: Tests for embedding generation and text chunking logic (see the sketch below)
  - Model loading configuration
  - Text chunking with various lengths
  - Embedding generation and averaging
  - Edge cases and error handling
- `test_main.py`: Tests for Flask API endpoints
  - Authentication and API key validation
  - Request/response format validation
  - Error handling (missing fields, invalid data)
  - HTTP method validation
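As a rough illustration of the chunk-and-average behavior those tests cover: long inputs are split into token-sized chunks, each chunk is embedded, and the chunk vectors are averaged into a single embedding. The chunk size and decoding details below are assumptions, not the repo's exact implementation:

```python
# Sketch of chunked embedding with averaging (hypothetical parameters).
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
tokenizer = AutoTokenizer.from_pretrained(
    "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
)

def embed(text: str, max_tokens: int = 256) -> np.ndarray:
    tokens = tokenizer.encode(text, add_special_tokens=False)
    # Split the token stream into fixed-size chunks, decoded back to text.
    chunks = [
        tokenizer.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ] or [""]  # guard against empty input
    vectors = model.encode(chunks)   # one embedding per chunk
    return np.mean(vectors, axis=0)  # average into a single vector
```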
Tests run automatically on:
- Push to main/develop branches
- Pull requests to main/develop
- Manual workflow dispatch
The CI pipeline:
- Tests against Python 3.9, 3.10, and 3.11
- Runs linting checks with flake8
- Executes full test suite with pytest
- Generates and uploads coverage reports
- Archives coverage HTML report as artifact
View test results in the Actions tab of the GitHub repository.