cairn-geocoder/cairn

Cairn

Offline, airgap-ready geocoder written in Rust.

Cairn (n.) — a pile of stones marking a trail. Each tile is a stone. Drop the pile on disk. The geocoder reads it.

Status

Alpha. Forward search, autocomplete, fuzzy matching, layer filtering, focus bias, structured search, and reverse geocoding all work end-to-end on a Liechtenstein dataset (OSM PBF + WhosOnFirst SQLite).

Try the live demo: nine preset queries run against a real backend at cairn.kaldera.dev (Hetzner k3s, Liechtenstein bundle). The free-form composer below the presets lets you craft any /v1/search, /v1/structured, or /v1/reverse call.

Goals

  • Forward + reverse geocoding, autocomplete, structured search
  • Single static binary + single bundle artifact (tar)
  • Zero network at runtime — full airgap deploy
  • Region extracts via tile-tree subset (Valhalla-style 3-level grid)
  • Single-machine commodity hardware, no cluster

Non-goals

  • Multi-tenant SaaS
  • Live OSM diff replication (planned post-MVP)
  • Cloud-native horizontal scaling

Architecture

Three layers:

  1. Builder (cairn-build) — ingests OSM PBF, WhosOnFirst SQLite, OpenAddresses CSV. Emits per-tile rkyv blobs, a tantivy text index, an admin polygon layer (bincode), and a centroid layer for nearest fallback. Writes a manifest.toml with blake3 hashes.
  2. Bundle — flat directory of immutable mmap-ready files.
  3. Server (cairn-serve) — axum HTTP API. Loads the bundle once at startup; no DB, no daemon dependencies.

Tile model

64-bit PlaceId: [level: 3 | tile_id: 22 | local_id: 39]
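
The layout above can be packed and unpacked with plain shifts and masks. The following is a minimal sketch, not the code from cairn-place: it assumes the fields are laid out most-significant first in the order written (level, then tile_id, then local_id), and the method names are hypothetical.

```rust
// Bit widths taken from the layout above: 3 + 22 + 39 = 64.
const LEVEL_BITS: u32 = 3;
const TILE_BITS: u32 = 22;
const LOCAL_BITS: u32 = 39;

/// A 64-bit place identifier.
/// ASSUMPTION: `level` occupies the most significant bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PlaceId(pub u64);

impl PlaceId {
    /// Packs the three fields, returning `None` if any value
    /// does not fit in its bit width.
    pub fn new(level: u8, tile_id: u32, local_id: u64) -> Option<Self> {
        let fits = u64::from(level) < (1u64 << LEVEL_BITS)
            && u64::from(tile_id) < (1u64 << TILE_BITS)
            && local_id < (1u64 << LOCAL_BITS);
        if !fits {
            return None;
        }
        Some(PlaceId(
            (u64::from(level) << (TILE_BITS + LOCAL_BITS))
                | (u64::from(tile_id) << LOCAL_BITS)
                | local_id,
        ))
    }

    /// Top 3 bits.
    pub fn level(self) -> u8 {
        (self.0 >> (TILE_BITS + LOCAL_BITS)) as u8
    }

    /// Middle 22 bits.
    pub fn tile_id(self) -> u32 {
        ((self.0 >> LOCAL_BITS) & ((1u64 << TILE_BITS) - 1)) as u32
    }

    /// Low 39 bits.
    pub fn local_id(self) -> u64 {
        self.0 & ((1u64 << LOCAL_BITS) - 1)
    }
}
```

Putting the level in the high bits means IDs sort by level first and then by tile, which keeps places from the same tile adjacent in any ordered container. Whether Cairn relies on that ordering is not stated here.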

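| Level | Cell size     | Contents                                |
|-------|---------------|-----------------------------------------|
| 0     | 4° × 4°       | Countries, regions                      |
| 1     | 1° × 1°       | Cities, counties, postcodes             |
| 2     | 0.25° × 0.25° | Streets, addresses, POIs, neighborhoods |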
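
To make the grid concrete, here is a hedged sketch of mapping a coordinate to a tile at a given level. The cell sizes come from the table above; the row-major numbering starting at the south-west corner (-90, -180) is an assumption borrowed from Valhalla's tiling scheme, and the function names are hypothetical rather than the cairn-tile API.

```rust
/// Cell size in degrees for each level, from the table above.
pub fn cell_size_deg(level: u8) -> Option<f64> {
    match level {
        0 => Some(4.0),
        1 => Some(1.0),
        2 => Some(0.25),
        _ => None,
    }
}

/// Maps a WGS84 coordinate to a tile id at the given level.
/// ASSUMPTION: tiles are numbered row-major, with row 0 at latitude -90
/// and column 0 at longitude -180.
pub fn tile_id(level: u8, lat: f64, lon: f64) -> Option<u32> {
    let size = cell_size_deg(level)?;
    // Rejects out-of-range values and NaN (NaN is in no range).
    if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
        return None;
    }
    let cols = (360.0 / size) as u32;
    let rows = (180.0 / size) as u32;
    // Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    let row = (((lat + 90.0) / size) as u32).min(rows - 1);
    let col = (((lon + 180.0) / size) as u32).min(cols - 1);
    Some(row * cols + col)
}
```

Under this numbering the finest level has 1440 × 720 = 1,036,800 cells, which fits comfortably in the 22-bit tile_id field (2^22 = 4,194,304).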

Workspace

crates/
  cairn-geocoder/         umbrella, re-exports
  cairn-place/            Place, PlaceId, schema (rkyv-archived)
  cairn-tile/             tile coords, manifest, blob IO, blake3 verify
  cairn-text/             tantivy index + autocomplete + fuzzy + geo-bias
  cairn-spatial/          R*-tree PIP for admin polygons + nearest centroids
  cairn-parse/            address parsing (libpostal FFI deferred)
  cairn-import-osm/       OSM PBF: place / POI nodes + named highway ways
  cairn-import-wof/       WhosOnFirst SPR + multilingual names + polygons
  cairn-import-oa/        OpenAddresses CSV
  cairn-import-geonames/  Geonames TSV (stub)
  cairn-api/              axum handlers
bins/
  cairn-build/            CLI build / extract / verify / info
  cairn-serve/            HTTP runtime

Quick start

# 1. Fetch source data (one-time, can be mirrored offline after)
mkdir -p data
curl -fsSL -o data/liechtenstein.osm.pbf \
  https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf
curl -fsSL -o data/wof-li.db.bz2 \
  https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-li-latest.db.bz2
bunzip2 data/wof-li.db.bz2

# 2. Build the workspace
cargo build --release -p cairn-build -p cairn-serve

# 3. Build a bundle
./target/release/cairn-build build \
  --osm data/liechtenstein.osm.pbf \
  --wof data/wof-li.db \
  --out bundle \
  --bundle-id liechtenstein

# 4. Verify integrity
./target/release/cairn-build verify --bundle bundle
# OK: 6 tiles verified, text=ok, admin=ok, points=ok

# 5. Inspect
./target/release/cairn-build info --bundle bundle

# 6. Serve
./target/release/cairn-serve --bundle bundle --bind 127.0.0.1:8080

Endpoints

GET /healthz
GET /readyz                         200 ready / 503 if no text index
GET /v1/search                      forward + autocomplete
GET /v1/structured                  field-by-field search
GET /v1/reverse                     PIP + nearest fallback

/v1/search

| Param                | Type                 | Notes                                          |
|----------------------|----------------------|------------------------------------------------|
| q                    | string (required)    | Free-text query.                               |
| mode                 | search\|autocomplete | Default search.                                |
| limit                | int (1–100)          | Default 10.                                    |
| fuzzy                | int 0–2              | Edit distance. Forward mode only.              |
| layer                | csv                  | Restrict to kinds (e.g. country,city,street).  |
| focus.lat, focus.lon | float                | Focus point for distance-biased rerank.        |
| focus.weight         | float                | Distance penalty weight (default 0.5).         |

curl 'http://localhost:8080/v1/search?q=Vaduz&layer=city&focus.lat=47.165&focus.lon=9.51'
curl 'http://localhost:8080/v1/search?q=Vad&mode=autocomplete'
curl 'http://localhost:8080/v1/search?q=vaaduz&fuzzy=2'
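
The README does not spell out how focus.weight enters the score, so the following is only an illustration of what a distance-biased rerank can look like. The penalty formula `score / (1 + weight * distance_km)`, the `Hit` struct, and the function names are all hypothetical; only the haversine great-circle distance is standard.

```rust
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance between two WGS84 points (haversine formula).
pub fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlambda = (lon2 - lon1).to_radians();
    let a = (dphi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (dlambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * a.sqrt().asin()
}

/// A text-search hit with its raw relevance score (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub name: String,
    pub score: f64,
    pub lat: f64,
    pub lon: f64,
}

/// Rescales each score by distance to the focus point, then sorts
/// best-first.
/// ASSUMPTION: the penalty is score / (1 + weight * distance_km);
/// Cairn's actual formula may differ.
pub fn rerank(mut hits: Vec<Hit>, focus: (f64, f64), weight: f64) -> Vec<Hit> {
    for hit in &mut hits {
        let d = haversine_km(focus.0, focus.1, hit.lat, hit.lon);
        hit.score /= 1.0 + weight * d;
    }
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits
}
```

With a weight of 0 the ranking is purely textual; larger weights pull nearby results up even when their text score is lower.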

/v1/structured

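| Param                                                | Type     | Notes          |
|------------------------------------------------------|----------|----------------|
| house_number / road / unit                           | string   | Address parts. |
| postcode / city / district / region / country        | string   | Admin parts.   |
| limit, focus.*                                       | as above |                |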

Builds a concatenated query, picks a layer hint based on the finest non-empty field (address → street → city → region → country).

curl 'http://localhost:8080/v1/structured?road=Aeulestrasse&city=Vaduz'
curl 'http://localhost:8080/v1/structured?country=Liechtenstein'
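
The layer-hint ladder described above (address → street → city → region → country) can be sketched as a first-match scan over the fields from finest to coarsest. This is an illustration rather than the cairn-api handler: the struct, the field-to-layer mapping, and the concatenation order are assumptions, and unit, postcode, and district are left out because the text does not say which layer they map to.

```rust
/// A subset of the /v1/structured parameters (hypothetical struct).
#[derive(Debug, Default, Clone)]
pub struct StructuredQuery {
    pub house_number: Option<String>,
    pub road: Option<String>,
    pub city: Option<String>,
    pub region: Option<String>,
    pub country: Option<String>,
}

/// Returns the trimmed value if the field is present and not blank.
fn non_empty(field: &Option<String>) -> Option<&str> {
    field.as_deref().map(str::trim).filter(|s| !s.is_empty())
}

impl StructuredQuery {
    /// Layer of the finest non-empty field, or `None` if all are empty.
    /// ASSUMPTION: house_number implies "address" and road implies "street".
    pub fn layer_hint(&self) -> Option<&'static str> {
        let ladder = [
            (&self.house_number, "address"),
            (&self.road, "street"),
            (&self.city, "city"),
            (&self.region, "region"),
            (&self.country, "country"),
        ];
        for (field, layer) in ladder {
            if non_empty(field).is_some() {
                return Some(layer);
            }
        }
        None
    }

    /// Joins the non-empty fields into one free-text query.
    /// ASSUMPTION: fields are joined finest-first, separated by spaces.
    pub fn query_text(&self) -> String {
        let mut parts = Vec::new();
        for field in [&self.house_number, &self.road, &self.city, &self.region, &self.country] {
            if let Some(value) = non_empty(field) {
                parts.push(value);
            }
        }
        parts.join(" ")
    }
}
```

For the first curl call above (road=Aeulestrasse, city=Vaduz) this sketch yields the hint "street", since road is the finest field supplied.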

/v1/reverse

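| Param    | Type             | Notes                                          |
|----------|------------------|------------------------------------------------|
| lat, lon | float (required) |                                                |
| limit    | int 1–50         | Default 10.                                    |
| nearest  | int 0–50         | Fallback K-nearest centroids when PIP empty.   |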

The response includes source: "pip" | "nearest". PIP (point-in-polygon) results are sorted with the finest containing polygon first; the admin chain is available via admin_path.

curl 'http://localhost:8080/v1/reverse?lat=47.141&lon=9.523'
curl 'http://localhost:8080/v1/reverse?lat=48.0&lon=10.5&nearest=5'
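
The PIP-then-nearest flow can be sketched with a ray-casting point-in-polygon test and a brute-force nearest scan. This is deliberately simplified and is not the cairn-spatial implementation, which uses an R*-tree: the types are hypothetical, polygons are single rings without holes, "finest" is assumed to mean smallest area, and nearest distance is measured in planar degrees.

```rust
/// An admin polygon as one outer ring of (lon, lat) vertices (hypothetical).
pub struct AdminPolygon {
    pub name: &'static str,
    pub ring: Vec<(f64, f64)>,
}

/// A place centroid used for the nearest fallback (hypothetical).
pub struct Centroid {
    pub name: &'static str,
    pub lon: f64,
    pub lat: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Pip,
    Nearest,
}

/// Ray casting: count edge crossings of a ray going east from the point.
pub fn contains(ring: &[(f64, f64)], lon: f64, lat: f64) -> bool {
    let n = ring.len();
    let mut inside = false;
    for i in 0..n {
        let (xi, yi) = ring[i];
        let (xj, yj) = ring[(i + n - 1) % n];
        if (yi > lat) != (yj > lat) && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi {
            inside = !inside;
        }
    }
    inside
}

/// Shoelace area in square degrees; used only to order polygons.
fn ring_area(ring: &[(f64, f64)]) -> f64 {
    let n = ring.len();
    let twice: f64 = (0..n)
        .map(|i| {
            let (x1, y1) = ring[i];
            let (x2, y2) = ring[(i + 1) % n];
            x1 * y2 - x2 * y1
        })
        .sum();
    twice.abs() / 2.0
}

/// Returns containing polygons (smallest first), or, if there are none,
/// up to `nearest` centroids ordered by distance.
pub fn reverse(
    polygons: &[AdminPolygon],
    centroids: &[Centroid],
    lon: f64,
    lat: f64,
    nearest: usize,
) -> (Source, Vec<&'static str>) {
    let mut hits: Vec<&AdminPolygon> =
        polygons.iter().filter(|p| contains(&p.ring, lon, lat)).collect();
    if !hits.is_empty() {
        hits.sort_by(|a, b| ring_area(&a.ring).total_cmp(&ring_area(&b.ring)));
        return (Source::Pip, hits.iter().map(|p| p.name).collect());
    }
    // Squared planar distance in degrees: a simplification for the sketch.
    let mut near: Vec<(f64, &'static str)> = centroids
        .iter()
        .map(|c| ((c.lon - lon).powi(2) + (c.lat - lat).powi(2), c.name))
        .collect();
    near.sort_by(|a, b| a.0.total_cmp(&b.0));
    (Source::Nearest, near.into_iter().take(nearest).map(|(_, name)| name).collect())
}
```

A linear scan like this is fine for a handful of polygons; a spatial index matters once the admin layer covers more than a small country.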

Bundle layout

bundle/
├── manifest.toml              schema, source hashes, per-tile blake3
├── tiles/<level>/<row>/<col>/<id>.bin     rkyv-archived Place blobs
├── index/text/                tantivy segments (mmap'd at runtime)
└── spatial/
    ├── admin.bin              bincode AdminLayer (polygons + metadata)
    └── points.bin             bincode PointLayer (centroids for nearest fallback)

Build sources

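| Source        | Format    | Coverage                           | Loaded by         |
|---------------|-----------|------------------------------------|-------------------|
| OpenStreetMap | *.osm.pbf | Global                             | --osm             |
| WhosOnFirst   | SQLite    | Per-country admin bundles          | --wof             |
| OpenAddresses | CSV       | Per-region authoritative addresses | --oa              |
| Geonames      | TSV       | Global populated places            | --geonames (stub) |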

Quality gates

cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace

26 unit tests cover Place ID encoding, tile blob roundtrip, tile blake3 corruption detection, OSM tag classification, OA row validation, WoF parent-chain walking, tantivy search/autocomplete/fuzzy/layer/focus, admin PIP ordering, and nearest-K queries.

Security

  • Continuous Trivy scans of the published image and the source tree (every push, every PR, daily cron). Results land in the GitHub Code Scanning tab as the live, version-pinned CVE list.
  • Bundle integrity is anchored by per-tile blake3 hashes in manifest.toml; cairn-build verify recomputes every tile, every spatial blob, and every tantivy segment hash and bails on mismatch.
  • API key auth and per-IP rate limiting are opt-in via env vars; X-Forwarded-For is honored only when the per-connection peer is inside a configured CIDR allowlist.
  • Full threat model, triage policy, and reporting instructions: SECURITY.md.
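
The X-Forwarded-For rule in the list above can be illustrated with a small trust check. This is a sketch under stated assumptions, not the cairn-serve code: it handles IPv4 only, the `Cidr4` type and `client_ip` function are hypothetical, and it assumes the client address is the right-most header entry that is not itself a trusted proxy.

```rust
use std::net::Ipv4Addr;

/// An IPv4 CIDR block such as 10.0.0.0/8 (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
pub struct Cidr4 {
    network: u32,
    mask: u32,
}

impl Cidr4 {
    /// Parses "a.b.c.d/len"; returns `None` on malformed input.
    pub fn parse(s: &str) -> Option<Self> {
        let (addr, len) = s.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let len: u32 = len.parse().ok()?;
        if len > 32 {
            return None;
        }
        // A shift by 32 would overflow, so /0 is handled separately.
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        Some(Cidr4 { network: u32::from(addr) & mask, mask })
    }

    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.network
    }
}

/// Picks the address to rate-limit on.
/// The header is consulted only when the TCP peer is a trusted proxy;
/// otherwise a client could spoof its identity by sending the header itself.
/// ASSUMPTION: the right-most untrusted entry is the real client.
pub fn client_ip(peer: Ipv4Addr, xff: Option<&str>, trusted: &[Cidr4]) -> Ipv4Addr {
    let is_trusted = |ip: Ipv4Addr| trusted.iter().any(|c| c.contains(ip));
    if !is_trusted(peer) {
        return peer;
    }
    if let Some(header) = xff {
        for hop in header.rsplit(',') {
            match hop.trim().parse::<Ipv4Addr>() {
                Ok(ip) if is_trusted(ip) => continue,
                Ok(ip) => return ip,
                // Unparseable entry: stop and fall back to the peer.
                Err(_) => break,
            }
        }
    }
    peer
}
```

Walking the header from the right matters because entries to the left of the first untrusted hop are supplied by the client and cannot be verified.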

Roadmap

See ROADMAP.md for deferred phases (libpostal FFI, address interpolation, OSM admin relations, per-tile spatial partitioning, distribution tooling).

License

Dual-licensed: MIT OR Apache-2.0. Pick whichever fits.
