Design: identity resolution — matching the same place across sources

## Goal

When the same physical place exists in multiple sources (e.g. OSM node 123 and Wheelmap node 456 both describe the same café), we need to (a) detect that they're the same and (b) merge them into one `places` row with both IDs recorded in `external_ids`.

Tracking under #10.

## Why this is hard

External sources do not share IDs. There is no authoritative cross-reference. We have to infer identity from observable features:

- Geographic proximity (within N meters)
- Name similarity (fuzzy match, language-aware)
- Category compatibility (a café and a parking lot can't be the same place even if collocated)
- Address agreement when available

False matches collapse distinct places into one row. False non-matches duplicate places. Both degrade the registry.
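
To make the trade-off concrete, here is a minimal sketch of the "start simple" option: a category gate, then a distance threshold, then a name-similarity threshold. Everything in it is an assumption for illustration, not a decided design: the `Place` type, the `categoryGroups` table, the helper names, and the threshold values passed in `main`. Name similarity uses plain Levenshtein distance on lowercased names, which is not language-aware.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Place is a hypothetical, minimal view of one source record.
type Place struct {
	Name     string
	Category string
	Lat, Lon float64
}

// categoryGroups is an illustrative table; real groupings are undecided.
var categoryGroups = map[string]string{
	"cafe": "food", "restaurant": "food", "parking": "transport",
}

// categoriesCompatible reports whether two categories could describe one place.
func categoriesCompatible(a, b string) bool {
	if a == b {
		return true
	}
	ga, okA := categoryGroups[a]
	gb, okB := categoryGroups[b]
	return okA && okB && ga == gb
}

// haversineMeters returns the great-circle distance between two points.
func haversineMeters(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadius = 6371000.0
	rad := func(deg float64) float64 { return deg * math.Pi / 180 }
	sLat := math.Sin(rad(lat2-lat1) / 2)
	sLon := math.Sin(rad(lon2-lon1) / 2)
	a := sLat*sLat + math.Cos(rad(lat1))*math.Cos(rad(lat2))*sLon*sLon
	return 2 * earthRadius * math.Asin(math.Sqrt(math.Min(1, a)))
}

// levenshtein returns the edit distance between two rune slices.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] // substitution (free when runes match)
			if a[i-1] != b[j-1] {
				best++
			}
			if d := prev[j] + 1; d < best { // deletion
				best = d
			}
			if d := cur[j-1] + 1; d < best { // insertion
				best = d
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(b)]
}

// nameSimilarity maps edit distance to [0, 1]; empty names give no evidence.
func nameSimilarity(a, b string) float64 {
	ra := []rune(strings.ToLower(strings.TrimSpace(a)))
	rb := []rune(strings.ToLower(strings.TrimSpace(b)))
	if len(ra) == 0 || len(rb) == 0 {
		return 0
	}
	longest := math.Max(float64(len(ra)), float64(len(rb)))
	return 1 - float64(levenshtein(ra, rb))/longest
}

// SamePlace applies the three checks in order of cost.
func SamePlace(a, b Place, maxMeters, minNameSim float64) bool {
	if !categoriesCompatible(a.Category, b.Category) {
		return false
	}
	if haversineMeters(a.Lat, a.Lon, b.Lat, b.Lon) > maxMeters {
		return false
	}
	return nameSimilarity(a.Name, b.Name) >= minNameSim
}

func main() {
	osm := Place{"Café Mitte", "cafe", 52.52000, 13.40500}
	wheelmap := Place{"Cafe Mitte", "restaurant", 52.52005, 13.40505}
	fmt.Println(SamePlace(osm, wheelmap, 50, 0.8))
}
```

Ordering the checks as category, then distance, then name keeps the most expensive comparison last. The two thresholds are the values the design doc would need to document.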

## Open design questions

- **Matching algorithm**: simple distance + name similarity threshold, or something heavier (vector embeddings, learned model)? Start simple, document the threshold values.
- **When does matching run**: on every ingest of a new source, or as a separate periodic job over the whole `places` table?
- **Match confidence**: do we record a confidence score per `external_ids` entry so that low-confidence matches can be reviewed later?
- **Human-in-the-loop**: should low-confidence matches go to a review queue rather than auto-merge?
- **Schema impact**: `external_ids` is currently `[]string` (commit `009d937`) — does it need to become `[]{source, id, confidence, added_at}`?
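
As a starting point for the schema and review-queue questions above, the following sketch shows one possible shape for a structured `external_ids` entry and a two-threshold routing rule. The field names follow the `{source, id, confidence, added_at}` shape floated in the last bullet; the `Decision` type, the `Decide` function, and the threshold values are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"time"
)

// ExternalID is one possible structured replacement for a plain string entry.
type ExternalID struct {
	Source     string    `json:"source"`     // e.g. "osm" or "wheelmap"
	ID         string    `json:"id"`         // the ID within that source
	Confidence float64   `json:"confidence"` // match confidence in [0, 1]
	AddedAt    time.Time `json:"added_at"`   // when the link was recorded
}

// Decision is the outcome of routing a candidate match.
type Decision int

const (
	KeepSeparate Decision = iota // too weak: leave as distinct places
	NeedsReview                  // ambiguous: send to a review queue
	AutoMerge                    // strong: merge without human input
)

// Decide routes a match by confidence. It assumes reviewAt <= mergeAt.
func Decide(confidence, reviewAt, mergeAt float64) Decision {
	switch {
	case confidence >= mergeAt:
		return AutoMerge
	case confidence >= reviewAt:
		return NeedsReview
	default:
		return KeepSeparate
	}
}

func main() {
	link := ExternalID{
		Source:     "wheelmap",
		ID:         "456",
		Confidence: 0.72,
		AddedAt:    time.Now().UTC(),
	}
	// Illustrative thresholds: review from 0.6, auto-merge from 0.9.
	if Decide(link.Confidence, 0.6, 0.9) == NeedsReview {
		fmt.Printf("queue %s:%s for review\n", link.Source, link.ID)
	}
}
```

Storing the confidence on the entry itself means the thresholds can be changed later and existing links re-evaluated without re-running the matcher.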

## Out of scope

- Implementing the matcher. This issue is the design.
- Splitting an existing merged row back apart (un-merge). Separate concern.

## Acceptance

A short design doc (in-repo, under `docs/`) covering: matching algorithm choice, schema changes to `external_ids` if any, when matching runs, and how low-confidence cases are handled. Implementation is then split into separate issues.


Design: identity resolution — matching the same place across sources #61
