Hiro is a local semantic search engine / AI knowledgebase. It crawls web pages, turns their content into embeddings with a SentenceTransformer model, stores those embeddings in Postgres using pgvector, and exposes a small web UI for vector-based search.
At a high level, the system works like this:
Website URL
│
▼
Protagonist crawler
│ crawls pages and extracts title/body/description
▼
Wintermute embedding service
│ creates embeddings for crawled content
▼
Postgres + pgvector
│ stores documents and vectors
▼
Wintermute search service
│ embeds user queries and runs vector similarity search
▼
Yours-Truly web UI
│ displays matching pages in the browser
protagonist/ is a Go command-line crawler. Its entry point is:
protagonist/cmd/crawl.go
It uses Colly to crawl pages from a starting URL up to a configurable maximum depth. For every fetched page, it extracts:
- the page URL
- the <title> text
- the <meta name="description"> content
- the body text
- discovered links
It then sends the page to Wintermute's embedding service over gRPC at localhost:50052.
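Protagonist itself is written in Go on top of Colly. Purely to illustrate which fields are pulled from each page, the extraction can be sketched in Python with the standard library. The PageFields class and its attribute names are hypothetical and do not mirror the crawler's actual types.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collects title, meta description, body text, and links from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.body_parts = []
        self.links = []
        self._stack = []  # currently open tags among title/body/script/style

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag in ("title", "body", "script", "style"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the matching open tag.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        current = self._stack[-1]
        if current == "title":
            self.title += text
        elif current == "body":
            # Text inside <script> or <style> is skipped because those
            # tags sit on top of the stack while they are open.
            self.body_parts.append(text)
```

Feeding a page through `PageFields().feed(html)` fills `title`, `description`, `links`, and `body_parts`. Joining `body_parts` with spaces gives a rough body text.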
wintermute/ is a Python backend exposing two gRPC services.
The embedding service is implemented in:
wintermute/embed/server.py
It receives crawled pages using the service defined in proto/embedding.proto. For each page, it loads the BAAI/bge-base-en SentenceTransformer model, embeds the page content, and inserts or updates the document in Postgres.
The search service is implemented in:
wintermute/search/server.py
It receives search queries using the service defined in proto/search.proto. For each query, it creates an embedding with the same model, calls the Postgres match_documents(...) function, and returns the highest-ranked hybrid matches.
Postgres stores the crawled documents and their embeddings in a documents table. The embedding column uses pgvector's vector(768) type, matching the output size of BAAI/bge-base-en.
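Because the column is declared as vector(768), pgvector rejects an embedding of any other length at insert time. A minimal sketch of a guard that checks the length and renders pgvector's text form ([v1,v2,...]) before the insert. The helper name to_pgvector_literal is hypothetical, and the real service may pass vectors through a driver adapter instead.

```python
EMBEDDING_DIM = 768  # output size of BAAI/bge-base-en, matching vector(768)


def to_pgvector_literal(embedding, dim=EMBEDDING_DIM):
    """Render a sequence of numbers as a pgvector text literal.

    Raises ValueError if the length does not match the column dimension,
    so the mismatch surfaces before the database round trip.
    """
    values = [float(v) for v in embedding]
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(v) for v in values) + "]"
```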
Search uses a hybrid ranking strategy:
- semantic similarity via pgvector cosine distance
- keyword relevance via PostgreSQL full-text search using tsvector, websearch_to_tsquery, and ts_rank_cd
The current fixed ranking blend is 70% semantic similarity and 30% keyword relevance. The README setup below creates the table, the match_documents(...) helper function, an HNSW vector index, and a GIN full-text index.
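The 70/30 blend can be illustrated in plain Python. This is a sketch of the arithmetic only, not the SQL inside match_documents(...). It assumes semantic similarity is taken as 1 minus the cosine distance and that the keyword score has already been scaled to the 0..1 range, neither of which is specified here. The function names are hypothetical.

```python
import math

SEMANTIC_WEIGHT = 0.7  # share given to vector similarity
KEYWORD_WEIGHT = 0.3   # share given to full-text relevance


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as pgvector's <=> operator defines it."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def hybrid_score(query_vec, doc_vec, keyword_score):
    """Blend semantic similarity with an (assumed normalized) keyword score."""
    semantic = 1.0 - cosine_distance(query_vec, doc_vec)
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword_score
```

With these weights, a document that matches the query vector perfectly but shares no keywords scores 0.7, while a document with a perfect keyword score and an orthogonal vector scores 0.3.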
yours-truly/ is a Go web application using Fiber, Go HTML templates, HTMX, and static CSS/JS. Its entry point is:
yours-truly/cmd/server.go
It serves the search UI on port 8973. When a user searches, it calls Wintermute's search gRPC service at localhost:50053, transforms the response into display-friendly results, and renders them in:
yours-truly/views/search.gohtml
HTMX is used for partial search-result updates through /htmx/search, while /search?q=... renders the full page.
The gRPC contracts live in:
proto/embedding.proto
proto/search.proto
These definitions are used to generate Python gRPC stubs for Wintermute and Go gRPC clients for Protagonist and Yours-Truly.
A small evaluation harness lives in:
eval/run_eval.py
eval/queries.example.json
It measures search quality against a labeled set of queries. Create eval/queries.json with either binary relevance:
[
{
"query": "contact support",
"relevant_urls": [
"https://example.com/contact",
"https://example.com/help"
]
}
]
or graded relevance for NDCG:
[
{
"query": "pricing plans",
"relevance": {
"https://example.com/pricing": 2,
"https://example.com/blog/how-pricing-works": 1
}
}
]
Then run it while the Wintermute search service is running:
uv run python eval/run_eval.py --queries eval/queries.json --host localhost:50053
The harness reports:
- Precision@k: how many of the top k results are relevant
- Recall@k: how many known relevant documents were found in the top k
- MRR: whether the first relevant result appears near the top
- MAP: ranking quality across all relevant results
- NDCG@k: ranking quality with graded relevance labels
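For reference, the per-query metrics can be sketched in a few lines of Python. These are the common textbook definitions and not necessarily the exact formulas in eval/run_eval.py. The function names are hypothetical. Here ranked is the list of returned URLs in rank order, relevant is a set of relevant URLs, and grades is a URL-to-grade mapping as in the graded example.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for url in ranked[:k] if url in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all known relevant URLs found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for url in ranked[:k] if url in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result; MRR is the mean over queries."""
    for rank, url in enumerate(ranked, start=1):
        if url in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision at each relevant hit; MAP is the mean over queries."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, url in enumerate(ranked, start=1):
        if url in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked, grades, k):
    """DCG of the top k divided by the DCG of an ideal ordering of the grades."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    actual = dcg([grades.get(url, 0) for url in ranked[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```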
Useful options:
uv run python eval/run_eval.py --queries eval/queries.json --k 1 5 10 --show-cases
uv run python eval/run_eval.py --queries eval/queries.json --json-output eval/results.json
Current limitations:
- Search pagination fields exist in proto/search.proto, but the current search service always requests 10 matches.
- The SentenceTransformer device is hardcoded to mps, which is appropriate for Apple Silicon. Use cpu or cuda if running elsewhere.
- Local service hosts and the Postgres DSN are currently hardcoded for development.
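One way to avoid editing the hardcoded device by hand is to pick it at startup. A minimal sketch, assuming PyTorch is importable (SentenceTransformer depends on it). Both helpers are hypothetical and not part of Wintermute.

```python
def pick_device(mps_available, cuda_available):
    """Prefer Apple's MPS backend, then CUDA, then fall back to the CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are usable; use the CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(
        torch.backends.mps.is_available(),
        torch.cuda.is_available(),
    )
```

The returned string could then be passed as the device argument when constructing the SentenceTransformer model.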
- Protagonist: crawler
- Wintermute: indexer + hybrid query engine
- Yours-Truly: search UI
- Postgres + pgvector: database, embedding storage, full-text search, and vector search
Start Postgres with pgvector and run database migrations:
docker compose up -d postgres
go install github.com/jackc/tern/v2@latest # if tern is not already installed
tern migrate --migrations db/migrations --config db/tern.conf
Install Python dependencies with uv:
uv sync
If you do not have uv installed, see https://docs.astral.sh/uv/getting-started/installation/.
Run Wintermute's embedding and search services in separate terminals:
uv run python -m wintermute.embed.server
uv run python -m wintermute.search.server
Crawl a site into the embedding index:
cd protagonist
go run ./cmd -url https://example.com -max-depth 2
Run the web UI:
cd yours-truly
go run ./cmd
Then open:
http://localhost:8973
docker-compose.yml starts Postgres with pgvector on host port 51432. Schema changes are managed with tern.
Install tern if needed:
go install github.com/jackc/tern/v2@latest
Run migrations:
tern migrate --migrations db/migrations --config db/tern.conf
The migrations create:
- the vector extension
- the documents table
- the match_documents(...) hybrid search function
- the HNSW embedding index
- the generated full-text search_vector column and GIN index
The local development defaults match the current application code:
dbname=hiro user=hiro password=hiro host=localhost port=51432
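Since the DSN is currently hardcoded, one low-friction way to override it locally would be an environment variable with the development default as fallback. A sketch only: the variable name HIRO_DSN and both helper names are hypothetical, and the current code does not read them.

```python
import os

# Development default, copied from the value above.
DEFAULT_DSN = "dbname=hiro user=hiro password=hiro host=localhost port=51432"


def get_dsn(env=None):
    """Return the DSN from HIRO_DSN (hypothetical) or the development default."""
    env = os.environ if env is None else env
    return env.get("HIRO_DSN", DEFAULT_DSN)


def parse_dsn(dsn):
    """Split a simple keyword=value DSN (no quoted values) into a dict."""
    return dict(pair.split("=", 1) for pair in dsn.split())
```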
Generated gRPC stubs are committed intentionally so normal development does not require protoc. Regenerate them only when files in proto/ change.
Python stubs for Wintermute:
uv run python -m grpc_tools.protoc -I proto \
--python_out=wintermute/embed/stubs \
--pyi_out=wintermute/embed/stubs \
--grpc_python_out=wintermute/embed/stubs \
proto/embedding.proto
uv run python -m grpc_tools.protoc -I proto \
--python_out=wintermute/search/stubs \
--pyi_out=wintermute/search/stubs \
--grpc_python_out=wintermute/search/stubs \
proto/search.proto
If regenerated, keep the Python gRPC imports package-relative:
from . import embedding_pb2 as embedding__pb2
from . import search_pb2 as search__pb2
Go stubs for Protagonist and Yours-Truly:
cd protagonist
protoc -I=../proto --go_out=adapters/index/grpc --go_opt=paths=source_relative \
--go-grpc_out=adapters/index/grpc --go-grpc_opt=paths=source_relative \
../proto/embedding.proto
cd ../yours-truly
protoc -I=../proto --go_out=adapters/search/grpc --go_opt=paths=source_relative \
--go-grpc_out=adapters/search/grpc --go-grpc_opt=paths=source_relative \
../proto/search.proto