Minimal vector search stack:
- server (Go): HTTP API for training, adding vectors, and searching.
- faiss (Python/FastAPI): FAISS index service (GPU-first).
Configuration is TOML (config/vsearch.toml) and is mounted into containers at /etc/vsearch/vsearch.toml.
client (embeddings) ──HTTP──> server (Go) ──HTTP──> faiss service (FastAPI + FAISS)
- The system is vector-in / vector-out. There is no embedding model or document ingestion yet.
- Index parameters live under [index] in config/vsearch.toml and are consumed by the FAISS service.
Prereqs:
- Docker + Docker Compose v2
- NVIDIA Container Toolkit and a GPU-capable runtime (currently required by the faiss service)
Start the stack:
docker compose -f docker/compose.yaml up --build
No local GPU? You can still run only the Go API container and point it at a remote FAISS service by setting the [faiss] section of config/vsearch.toml:
docker compose -f docker/compose.yaml up --no-deps --build server
Endpoints (default):
- API server: http://localhost:8080
- FAISS service: http://localhost:50051
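A script can wait for both services to come up before sending requests. Below is a minimal stdlib sketch. It assumes that an HTTP 200 from /health means the service is ready; the README does not document the response body. The function names and the 60-second default deadline are hypothetical.

```python
import time
import urllib.request

# Default health URLs from this README.
HEALTH_URLS = (
    "http://localhost:8080/health",
    "http://localhost:50051/health",
)


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 (assumed 'ready' signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts and refused connections
        return False


def wait_until_healthy(urls=HEALTH_URLS, deadline_s=60.0, interval_s=1.0, probe=is_healthy) -> bool:
    """Poll until every URL is healthy; return False if the deadline passes first."""
    end = time.monotonic() + deadline_s
    pending = list(urls)
    while True:
        # Keep only the URLs that are still failing.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

The probe parameter lets tests substitute a fake check without touching the network.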
Health:
curl -sS http://localhost:8080/health
curl -sS http://localhost:50051/health
Single source of truth for configuration: config/vsearch.toml.
Key sections:
- [server]: bind address, timeouts
- [faiss]: where the Go server reaches the FAISS service
- [index]: FAISS index shape and search parameters (dimension, metric, nlist, nprobe)
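The nlist and nprobe parameters come from FAISS's IVF (inverted file) index. Vectors are bucketed into nlist lists by nearest coarse centroid. A search then scans only the nprobe lists whose centroids are closest to the query, trading recall for speed. The toy below illustrates that trade-off in pure Python. It uses hand-picked centroids instead of k-means training, and it is not how the FAISS service is implemented.

```python
def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    return min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))


def build_lists(centroids, items):
    """Assign each (id, vector) to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in items:
        lists[nearest_centroid(vec, centroids)].append((vid, vec))
    return lists


def ivf_search(centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    probed = order[:nprobe]
    candidates = sorted(
        (sq_l2(query, vec), vid) for c in probed for vid, vec in lists[c]
    )
    return [vid for _, vid in candidates[:k]]


# nlist = 2 hand-picked centroids (a trained index would get these from k-means).
centroids = [[0, 0], [10, 0]]
lists = build_lists(centroids, [("a", [-3, 0]), ("b", [6, 0])])

query = [4, 0]  # closer to centroid 0, but its true nearest neighbour "b" sits in list 1
print(ivf_search(centroids, lists, query, k=1, nprobe=1))  # ['a']
print(ivf_search(centroids, lists, query, k=1, nprobe=2))  # ['b']
```

With nprobe = 1, the search misses the true neighbour because it lives in the unprobed list. Raising nprobe to nlist makes the search exhaustive.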
Notes:
- index.use_gpu is not yet wired into the FAISS service; the service always attempts GPU initialization.
- server.max_concurrent_requests and [metrics] are present in the config but are not yet enforced or exposed by the Go server.
The Go server exposes:
- GET /health
- POST /v1/train (train the IVF index)
- POST /v1/vectors (add vectors with external IDs)
- POST /v1/search (kNN search)
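The three POST endpoints can also be called from Python with only the standard library. Below is a minimal sketch whose request bodies mirror the curl examples in this README. The helper names are hypothetical. The send helper assumes the server replies with JSON, since the response shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default API server address


def _post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request to the Go API server."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def train_request(vectors, base_url=BASE_URL):
    return _post("/v1/train", {"vectors": vectors}, base_url)


def add_request(vectors, ids, base_url=BASE_URL):
    return _post("/v1/vectors", {"vectors": vectors, "ids": ids}, base_url)


def search_request(vectors, k, base_url=BASE_URL):
    return _post("/v1/search", {"vectors": vectors, "k": k}, base_url)


def send(request, timeout=10.0):
    """Send a request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack running and the quick-test config applied, print(send(search_request([[0.0, 0.0, 0.0, 0.0]], k=2))) issues the same search as the curl example.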
Every vector in a payload must have exactly index.dimension elements.
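Checking shapes on the client gives a clearer error than a rejected request. Below is a minimal sketch of such a check. The function names are hypothetical. Requiring one ID per vector is an assumption based on the /v1/vectors example, not a documented server rule.

```python
def dimension_errors(vectors, dimension):
    """List every vector whose length differs from index.dimension."""
    return [
        f"vectors[{i}] has length {len(vec)}, expected {dimension}"
        for i, vec in enumerate(vectors)
        if len(vec) != dimension
    ]


def add_payload(vectors, ids, dimension):
    """Build a /v1/vectors body, rejecting bad shapes before they reach the server."""
    errors = dimension_errors(vectors, dimension)
    # Assumption: one external ID per vector, as in the quick-test example.
    if len(ids) != len(vectors):
        errors.append(f"got {len(ids)} ids for {len(vectors)} vectors")
    if errors:
        raise ValueError("; ".join(errors))
    return {"vectors": [[float(x) for x in vec] for vec in vectors], "ids": list(ids)}
```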
For a quick manual test, temporarily set the following in config/vsearch.toml:
index.dimension = 4
index.nlist = 1
index.nprobe = 1
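The quick test uses nlist = 1 because it trains on only three vectors. FAISS's k-means training rejects a training set with fewer points than clusters. By default it also warns when there are fewer than about 39 points per centroid. Below is a hypothetical preflight check. Its nprobe range check is a client-side sanity choice, not a FAISS requirement.

```python
def ivf_config_problems(n_train, nlist, nprobe):
    """Sanity-check IVF parameters against the size of the training set."""
    problems = []
    # FAISS k-means refuses to train with fewer points than clusters.
    if n_train < nlist:
        problems.append(f"{n_train} training vectors < nlist={nlist}")
    # Client-side choice: probing 0 lists finds nothing, more than nlist adds nothing.
    if not 1 <= nprobe <= nlist:
        problems.append(f"nprobe={nprobe} outside [1, nlist={nlist}]")
    return problems


# The quick test trains on 3 vectors.
print(ivf_config_problems(3, nlist=1, nprobe=1))   # []
print(ivf_config_problems(3, nlist=16, nprobe=4))  # ['3 training vectors < nlist=16']
```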
Then restart the containers and run:
Train:
curl -sS -X POST http://localhost:8080/v1/train \
-H 'Content-Type: application/json' \
-d '{"vectors":[[0.0,0.0,0.0,0.0],[1.0,1.0,1.0,1.0],[2.0,2.0,2.0,2.0]]}'
Add:
curl -sS -X POST http://localhost:8080/v1/vectors \
-H 'Content-Type: application/json' \
-d '{"vectors":[[0.1,0.1,0.1,0.1],[1.1,1.1,1.1,1.1]],"ids":["a","b"]}'
Search:
curl -sS -X POST http://localhost:8080/v1/search \
-H 'Content-Type: application/json' \
-d '{"vectors":[[0.0,0.0,0.0,0.0]],"k":2}'
Repository layout:
- cmd/vsearch/: Go entrypoint
- internal/server/: HTTP routes and request/response shapes
- internal/faiss/: Go client for the FAISS service
- faiss_service/: Python FastAPI service hosting the FAISS index
- config/vsearch.toml: runtime config (mounted into containers)
- docker/: Dockerfiles and docker/compose.yaml
Known limitations:
- The FAISS index is in-memory only; there is no persistence/snapshots.
- No ingestion pipeline: PDFs/HTML/code parsing, chunking, embeddings, dedup, and backfills are not implemented.
- No authn/z, multitenancy, quotas, or schema/versioning for stored content.
- Metrics are not exposed by the Go server (the metrics package exists but is unused).
See PHASE2_PLAN.md.