Knowledge corpus service with vector search - your AI's long-term memory.
Engram is a FastAPI-based service providing semantic search over a knowledge corpus using PostgreSQL with pgvector. It runs on GPU-enabled infrastructure for fast embedding generation with sentence-transformers.
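Semantic search here means ranking corpus entries by vector distance between a query embedding and stored document embeddings. As a minimal sketch (plain Python, no dependencies), this is the cosine distance that pgvector's `<=>` operator computes server-side:

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0; orthogonal vectors have distance 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # 1.0
```

In the service itself, the query vector comes from the sentence-transformers model and the comparison runs inside PostgreSQL, so Python never iterates over the corpus.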
- Python: 3.11+ (3.12+ recommended for production)
- GPU: NVIDIA GPU with CUDA support
- Base Image: NVIDIA PyTorch 25.12 (for GPU deployment)
- Database: PostgreSQL 16 with pgvector extension
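The PostgreSQL requirement above implies a schema along these lines. This is a hypothetical sketch (the actual table and column names may differ); the `vector(1024)` dimension matches the BAAI/bge-large-en-v1.5 embeddings the service produces:

```sql
-- Hypothetical schema sketch; Engram's real table names may differ.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(1024)  -- BAAI/bge-large-en-v1.5 output dimension
);
```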
Internet/Tailscale → Caddy (TLS termination) → Engram (port 8800) → PostgreSQL + pgvector
- Caddy: Reverse proxy with automatic TLS via Step CA ACME
  - Terminates TLS at engram.your-domain.example.com
  - Proxies to Engram on localhost:8800
- Engram App: FastAPI service
  - GPU-accelerated embedding generation
  - Runs in NVIDIA PyTorch container
  - Uses BAAI/bge-large-en-v1.5 model (1024-dim embeddings)
- PostgreSQL: Vector database backend
  - pgvector extension for similarity search
  - Separate container on internal network
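A similarity search against this backend reduces to ordering by vector distance. A hedged example query (assuming a hypothetical `documents` table with an `embedding vector(1024)` column):

```sql
-- $1 is the 1024-dim query embedding produced by the service.
SELECT id, content
FROM documents
ORDER BY embedding <=> $1  -- pgvector's cosine-distance operator
LIMIT 5;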
TLS certificates are automatically managed via:
- Step CA: Internal ACME server at your-ca.example.com:9000
- Auto-renewal: Caddy handles certificate lifecycle
- Zero-trust: Mesh-only access via Tailscale
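The Caddy side of this setup can be sketched as a Caddyfile like the following. This is an illustrative fragment, not the deployed config; in particular the ACME directory path (`/acme/acme/directory` is Step CA's default) is an assumption:

```
engram.your-domain.example.com {
    tls {
        # Step CA's internal ACME endpoint (default directory path assumed)
        ca https://your-ca.example.com:9000/acme/acme/directory
    }
    reverse_proxy localhost:8800
}
```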
See deploy/README.md for deployment instructions.
# Install dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run locally
python -m engram.cli

License: MIT