Maharashtra's official cadastral plot outlines sit several metres off the real fields — an artifact of how hand-drawn paper maps were georeferenced onto satellite imagery. This is a method that, for each plot, predicts its true on-the-ground boundary, attaches a calibrated confidence, and flags the plots it can't responsibly place.
It runs on classical computer vision plus a calibrated confidence model: no GPU, no deep
learning, no API keys. One command turns a village bundle into a contract-valid
`predictions.geojson`.
uv sync
uv run python -m src.pipeline # both villages → data/<village>/predictions.geojson
uv run python -m src.pipeline data/<village_dir> # a single (e.g. hidden) village bundle
uv run python -m pytest # 15 unit tests

(Windows console: prefix with `$env:PYTHONUTF8 = "1"` so Unicode prints don't crash.)
Data: the village rasters (`imagery.tif`, `boundaries.tif`) are kept out of git to keep the repository lean, so a fresh clone won't have them. Drop each bundle's files into `data/<village_slug>/` (alongside `input.geojson`) before running. The pipeline takes any `village_dir` and needs no `example_truths.geojson`, so it runs on hidden villages as-is.
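Because the rasters are not in the repository, it can help to check a bundle before launching a run. The following is a minimal sketch, not part of the pipeline: the helper name `missing_bundle_files` is hypothetical, and the three file names are the ones listed above.

```python
from pathlib import Path

# File names come from the README; the helper itself is a hypothetical convenience.
REQUIRED_FILES = ("input.geojson", "imagery.tif", "boundaries.tif")


def missing_bundle_files(village_dir):
    """Return the required file names that are absent from village_dir."""
    root = Path(village_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the bundle has everything the pipeline is documented to need; `example_truths.geojson` is deliberately not checked because the pipeline does not require it.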
The boundary in the land record is the right shape but the wrong place — and sometimes the shape itself disagrees with the recorded area, in which case no amount of moving will fix it. The judgment being tested: which edge is right, how much to trust it, and which records to believe.
A six-stage pipeline, every parameter derived from village statistics — nothing is tuned to the example truths.
- Edge-affinity signal (`src/signals.py`): per plot-patch, fuse imagery edges (Scharr ∨ blurred-Canny) with the sparse `boundaries.tif` hint into a 0–1 edge map. Worked per-patch, not as a 65 MP village raster.
- Per-plot match (`src/match.py`): FFT cross-correlate the plot's outline against the edge map; the offset that lands the outline on real edges is the correction. The search radius scales to plot size (a plot rarely needs to move much past its own dimension), which kills the distant false peaks that plague dense terrain. Each match reports a peak-to-sidelobe ratio (PSR) and edge-support score.
- Global drift warp (`src/warp.py`): the drift is a smooth rubber-sheet distortion, so we fit an affine field (official → true) with RANSAC over the confident matches, which discards false-peak outliers. This places every plot, including ones the local match can't.
- Combine (`src/refine.py`): restraint first. If the local matcher reliably says the official is already in place (sharp peak, near-zero shift), keep it; don't let the warp move an already-correct plot. Otherwise trust the local match when its peak is sharp (real per-plot drift can deviate from the smooth warp), or moderately sharp and agreeing with the warp; else fall back to the warp with a tight prior-anchored refine. Open terrain trusts the local match; dense terrain leans on the warp.
- Confidence (`src/confidence.py`, `src/synthetic.py`): a gradient-boosted regressor whose output is the predicted IoU, used as confidence (so it ranks corrections by how good they are, serving both AUC and rank-correlation). It's trained deployment-faithfully: we bootstrap local truths from the corrector's own confident outputs (no plot is naturally on-field under the global drift), build several synthetic displaced villages with control plots, run the full combine on them, and learn `features → IoU`. Features include a `score_official × shift` term that flags plots moved despite an already-strong official edge.
- Decide (`src/decide.py`): flag area errors (drawn area vs recorded extent outside a data-driven band), degenerate geometry, and sub-floor confidence; correct everything else. The confidence floor serves accuracy, calibration and restraint at once.
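To make the per-plot match stage concrete, here is a minimal NumPy sketch of FFT cross-correlation with a bounded search window and a PSR score. It is an illustration under assumptions, not the code in `src/match.py`: the function name `match_offset`, the `exclude` half-width around the peak, and the exact PSR formula (peak minus sidelobe mean, over sidelobe standard deviation) are all choices made for this example. Both arrays are assumed to have the same shape.

```python
import numpy as np


def match_offset(edge_map, outline_mask, max_shift, exclude=2):
    """Return (dy, dx, psr): the shift of outline_mask that best fits edge_map.

    Uses circular cross-correlation computed with FFTs and searches only
    shifts of at most max_shift pixels along each axis.
    """
    h, w = edge_map.shape
    if not 0 < max_shift < min(h, w) // 2:
        raise ValueError("max_shift must be positive and below half the patch size")
    # corr[dy, dx] = sum over (y, x) of edge_map[y + dy, x + dx] * outline_mask[y, x]
    spectrum = np.fft.rfft2(edge_map) * np.conj(np.fft.rfft2(outline_mask))
    corr = np.fft.fftshift(np.fft.irfft2(spectrum, s=edge_map.shape))
    cy, cx = h // 2, w // 2  # zero shift sits here after fftshift
    win = corr[cy - max_shift:cy + max_shift + 1, cx - max_shift:cx + max_shift + 1]
    py, px = np.unravel_index(np.argmax(win), win.shape)
    # PSR (assumed definition): peak height relative to the spread away from the peak
    sidelobe = np.ones(win.shape, dtype=bool)
    sidelobe[max(py - exclude, 0):py + exclude + 1, max(px - exclude, 0):px + exclude + 1] = False
    side = win[sidelobe]
    psr = (win[py, px] - side.mean()) / (side.std() + 1e-9)
    return int(py) - max_shift, int(px) - max_shift, float(psr)


# Toy check: a rectangular outline whose "true" edges sit 3 px down and 5 px left.
outline = np.zeros((64, 64))
outline[20, 20:45] = outline[40, 20:45] = 1.0   # top and bottom sides
outline[20:41, 20] = outline[20:41, 44] = 1.0   # left and right sides
edges = np.roll(outline, shift=(3, -5), axis=(0, 1))
dy, dx, psr = match_offset(edges, outline, max_shift=8)
print(dy, dx)  # 3 -5
```

Cropping the correlation surface to `max_shift` is what implements the "search radius scales to plot size" idea: peaks outside the window are never considered, however strong they are.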
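The global drift warp can be sketched in the same spirit: a plain RANSAC loop around a least-squares affine fit. The function names, the inlier tolerance `tol`, the iteration count and the three-point minimal sample are assumptions for illustration; `src/warp.py` may differ in all of them.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine: dst ≈ [src, 1] @ M, with M of shape (3, 2)."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M


def apply_affine(M, pts):
    """Map an (n, 2) array of points through the affine M."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M


def ransac_affine(src, dst, tol=1.0, iters=200, seed=0):
    """Fit official -> true positions while ignoring false-peak outliers.

    src, dst: (n, 2) arrays of matched centroids (for example, in metres).
    Returns (M, inlier_mask). Assumes at least 3 non-collinear good matches.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        M = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(M, src) - dst, axis=1)
        inliers = err < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on every inlier of the best hypothesis for a more stable estimate
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

Once `M` is known, `apply_affine(M, centroids)` gives a position for every plot, including those whose local match was too weak to trust, which is the role the warp plays in the pipeline.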
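The decide stage reads naturally as a short rule chain. The sketch below is a guess at its shape rather than a copy of `src/decide.py`: the median ± k·MAD band on the log area ratio, the `conf_floor` default and the returned labels are all hypothetical stand-ins for whatever the pipeline and `CONTRACT.md` actually use.

```python
import math
from statistics import median


def area_band(ratios, k=3.0):
    """Robust band on log(drawn / recorded): median ± k * 1.4826 * MAD.

    ratios: drawn-to-recorded area ratios for the whole village, so the band
    is derived from village statistics rather than from a fixed threshold.
    """
    logs = [math.log(r) for r in ratios]
    med = median(logs)
    mad = median(abs(v - med) for v in logs)
    half_width = k * 1.4826 * mad
    return med - half_width, med + half_width


def decide(drawn_area, recorded_area, confidence, band, conf_floor=0.5):
    """Return a hypothetical action label for one plot."""
    if drawn_area <= 0 or recorded_area <= 0:
        return "flag:degenerate"
    lo, hi = band
    if not lo <= math.log(drawn_area / recorded_area) <= hi:
        return "flag:area_mismatch"
    if confidence < conf_floor:
        return "flag:low_confidence"
    return "correct"
```

The ordering matters in this sketch: an area mismatch is flagged before confidence is consulted, because moving a plot cannot repair a shape that disagrees with its recorded extent.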
Self-scored on the public example truths (only 6/3 plots — directional, not a grade), and validated at scale on synthetic displaced villages with known truth (hundreds of plots, train/eval separated, with control plots for restraint):
| metric | Nashik (open) | Kolhapur (dense) |
|---|---|---|
| median IoU, official → predicted (truths) | 0.61 → 0.89 | 0.51 → 0.80 |
| centroid error (truths) | ~9 m → 2.9 m | ~8 m → 2.2 m |
| coverage | 2178 corrected / 279 flagged | 2093 / 415 |
| held-out calibration AUC (deployment-faithful) | 0.87 | 0.77 |
| held-out Spearman(conf, IoU) | 0.44 | 0.54 |
| restraint false-shift on synthetic controls | 0.10 | 0.33 |
Ablation at scale (median IoU, each layer earns its place):
| method | Nashik | Kolhapur |
|---|---|---|
| no-op (drifted) | 0.60 | 0.32 |
| global-median shift | 0.64 | 0.51 |
| warp only | 0.85 | 0.66 |
| full pipeline (corrected) | 0.97 | 0.90 |
Reproduce: `uv run benchmarks/ablation.py` (layer ablation) · `uv run benchmarks/restraint.py` (restraint + deployment-faithful calibration, guard on/off) · `uv run qa/analyze.py` (predictions audit).
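For readers who want to see what the two calibration rows measure, here is a dependency-free sketch of ROC AUC (via rank sums) and Spearman correlation between confidence and IoU. It is not the scoring tool itself: the function names and the 0.8 IoU cut-off that defines an "accurate" plot are assumptions made for this example.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def roc_auc(conf, accurate):
    """Probability that an accurate plot outranks an inaccurate one, or None."""
    n_pos = sum(1 for a in accurate if a)
    n_neg = len(accurate) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # only one class present: AUC is undefined
    rank_sum = sum(r for r, a in zip(average_ranks(conf), accurate) if a)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def spearman(x, y):
    """Pearson correlation of the ranks, or None if either side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


conf = [0.9, 0.8, 0.4, 0.3]
iou = [0.95, 0.70, 0.85, 0.20]
accurate = [v >= 0.8 for v in iou]  # 0.8 is a hypothetical cut-off
print(roc_auc(conf, accurate))      # 0.75
print(spearman(conf, iou))          # 0.8
print(roc_auc([0.9, 0.8, 0.7], [True, True, True]))  # None
```

The last call shows why a handful of truths is a weak basis for AUC: when every scored plot is accurate there is no negative class to rank against, so only a rank correlation remains.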
- No truth-tuning. Thresholds are generic defaults or per-village statistics; the example truths are only a directional sanity check, never fit on.
- Deployment-faithful self-supervised calibration. Confidence is learned from the full combine on each village's own synthetic displaced villages — so it reflects the exact paths and features we deploy, on that village, with no overfitting to a handful of truths.
- Restraint by evidence. An already-correct plot is kept in place when the imagery says so, and a plot moved despite a strong official edge is distrusted — rather than moving everything.
- The public calibration number is noise — only 6/3 truths, and when all are accurate the tool can't compute AUC and falls back to a 3-point rank-correlation. Across confidence-model variants that left the geometry byte-identical, that public number swung from +0.94 to −0.5; it is unoptimisable and the contract says not to chase it. The graded signal is held-out AUC (0.87 / 0.77), which is stable.
- Restraint in dense terrain is imperfect (~0.33 control false-shift on Kolhapur): a control that the warp nudges onto a nearby edge is genuinely hard to distinguish from a real correction. The guard + flagging cut it roughly in half; it isn't zero.
- The synthetic benchmark bootstraps from on-field plots, so its absolute numbers are roughly an upper bound for the recoverable subset; the relative ablation ordering is robust.
- The affine (vs higher-order) warp and a mild matcher bias leave a few metres on the table; the synthetic drift is affine, so warp recovery there is mildly optimistic — flagged as an assumption.
src/ method — signals · match · warp · refine · confidence · synthetic · decide · pipeline
benchmarks/ calibrate.py (confidence) · ablation.py (at-scale validation)
qa/ EDA + overlay/montage renderers (visual audit)
tests/ pytest suite (geometry, decide, confidence, schema)
bhume/ starter kit (vendored, unmodified) — load/score/patch helpers
data/<village>/ input.geojson · imagery.tif · boundaries.tif · example_truths · predictions.geojson
CONTRACT.md the data + submission spec
transcripts/ AI usage (problem understanding + solution build)
Setup uses uv; uv sync installs everything (GDAL ships in the
rasterio/geopandas wheels — no system GDAL needed). See CONTRACT.md for the I/O spec.