Design: Source interface + multi-source ingestion pipeline #60

@koinsaari

Goal

Lift the OSM-specific pipeline out of cmd/ingestion/main.go into a Source abstraction so that adding a new external source (Wheelmap, AXSMap, government open data, etc.) is mechanical and doesn't require touching the main loop.

Tracking under #10.

Current state

cmd/ingestion/main.go hardcodes the OSM path:
osm.StreamNodes → osm.Evaluate → osm.TransformNode → place.Repository.UpsertBatch.

Adding a second source today would mean either duplicating the main loop or branching on flags, and neither is a good option.

Proposed shape (to validate)

// internal/sources/source.go
type Source interface {
    Name() string                                  // "osm", "wheelmap", ...
    Stream(ctx context.Context, sink func(models.Place) error) error
}

  • Each source lives under internal/sources/<name>/. OSM moves there from internal/osm.
  • cmd/ingestion selects a source by subcommand or flag and runs it through a shared batcher into UpsertBatch.
  • Sources are responsible for emitting models.Place records; they do not see the DB.
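
To make the "shared batcher" idea concrete, here is a minimal sketch of how cmd/ingestion might drive any Source into UpsertBatch. It is not a proposal for the final code. Place stands in for models.Place, and Upserter, Run, sliceSource, memRepo and demo are hypothetical names introduced only for this example; the real place.Repository.UpsertBatch signature may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Place stands in for models.Place; its single field is hypothetical.
type Place struct{ ID string }

// Source mirrors the interface proposed in this issue.
type Source interface {
	Name() string
	Stream(ctx context.Context, sink func(Place) error) error
}

// Upserter is a hypothetical seam in front of place.Repository.
// The real UpsertBatch signature may differ.
type Upserter interface {
	UpsertBatch(ctx context.Context, places []Place) error
}

// Run streams src through a fixed-size batcher into dst and returns
// how many places were written. The batch slice is reused between
// flushes, so dst must not retain it after UpsertBatch returns.
func Run(ctx context.Context, src Source, dst Upserter, batchSize int) (int, error) {
	if batchSize < 1 {
		return 0, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	batch := make([]Place, 0, batchSize)
	written := 0
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := dst.UpsertBatch(ctx, batch); err != nil {
			return fmt.Errorf("upsert batch: %w", err)
		}
		written += len(batch)
		batch = batch[:0]
		return nil
	}
	sink := func(p Place) error {
		batch = append(batch, p)
		if len(batch) >= batchSize {
			return flush()
		}
		return nil
	}
	if err := src.Stream(ctx, sink); err != nil {
		return written, fmt.Errorf("source %s: %w", src.Name(), err)
	}
	// Flush the final, possibly partial, batch.
	if err := flush(); err != nil {
		return written, fmt.Errorf("source %s: %w", src.Name(), err)
	}
	return written, nil
}

// sliceSource is a fake source that emits a fixed list of places.
type sliceSource struct{ places []Place }

func (s sliceSource) Name() string { return "fake" }

func (s sliceSource) Stream(ctx context.Context, sink func(Place) error) error {
	for _, p := range s.places {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := sink(p); err != nil {
			return err
		}
	}
	return nil
}

// memRepo records the size of every batch it receives.
type memRepo struct{ batchSizes []int }

func (r *memRepo) UpsertBatch(_ context.Context, places []Place) error {
	r.batchSizes = append(r.batchSizes, len(places))
	return nil
}

// demo runs five places through the batcher with a batch size of two.
func demo() (int, []int, error) {
	src := sliceSource{places: []Place{{"a"}, {"b"}, {"c"}, {"d"}, {"e"}}}
	repo := &memRepo{}
	n, err := Run(context.Background(), src, repo, 2)
	return n, repo.batchSizes, err
}

func main() {
	n, sizes, err := demo()
	fmt.Println(n, sizes, err) // prints: 5 [2 2 1] <nil>
}
```

Because the source only ever sees the sink callback, it never touches the DB, which matches the third bullet. A sink error (for example a failed upsert) propagates back through Stream, so a source is expected to stop as soon as the sink returns non-nil.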

Open questions

  • Does Source also own filtering (today: osm.Evaluate) or is that a separate stage in the pipeline?
  • Where does category derivation (osm.DeriveRank) live — in the source, or in a normalization stage between source and batcher?
  • How do sources signal progress / errors uniformly (slog fields)?
  • Does the interface need a Mode (full vs. diff) or do those become two methods?
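
For the first two questions, one option worth weighing is to model filtering and normalization as wrappers around the sink, so they sit between the source and the batcher without changing the Source interface. The sketch below only illustrates that option and does not decide the question. Sink, Stage, Filter, Map, Chain and the Place fields are all hypothetical names, and the filter and rank rules are placeholders rather than what osm.Evaluate or osm.DeriveRank actually do.

```go
package main

import "fmt"

// Place stands in for models.Place; the fields are hypothetical.
type Place struct {
	Name     string
	Category string
	Rank     int
}

// Sink has the same shape as the sink argument of Source.Stream.
type Sink func(Place) error

// Stage wraps a sink with extra behavior and returns a new sink.
type Stage func(next Sink) Sink

// Filter drops places for which keep returns false.
func Filter(keep func(Place) bool) Stage {
	return func(next Sink) Sink {
		return func(p Place) error {
			if !keep(p) {
				return nil
			}
			return next(p)
		}
	}
}

// Map rewrites each place before passing it on.
func Map(fn func(Place) Place) Stage {
	return func(next Sink) Sink {
		return func(p Place) error {
			return next(fn(p))
		}
	}
}

// Chain composes stages so that the first stage listed runs first
// and final receives whatever survives all of them.
func Chain(final Sink, stages ...Stage) Sink {
	s := final
	for i := len(stages) - 1; i >= 0; i-- {
		s = stages[i](s)
	}
	return s
}

// demo pushes three places through a filter and a rank-deriving stage.
func demo() ([]Place, error) {
	var out []Place
	collect := func(p Place) error {
		out = append(out, p)
		return nil
	}
	sink := Chain(collect,
		// Placeholder filter: drop places without a name.
		Filter(func(p Place) bool { return p.Name != "" }),
		// Placeholder normalization: give cafes rank 1.
		Map(func(p Place) Place {
			if p.Category == "cafe" {
				p.Rank = 1
			}
			return p
		}),
	)
	input := []Place{
		{Name: "Kahvila", Category: "cafe"},
		{Category: "cafe"},
		{Name: "Kirjasto", Category: "library"},
	}
	for _, p := range input {
		if err := sink(p); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	out, err := demo()
	fmt.Println(out, err) // prints: [{Kahvila cafe 1} {Kirjasto library 0}] <nil>
}
```

The trade-off is that a stage only sees models.Place, so any filter that needs raw, source-specific tags would still have to run inside the source.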

Out of scope

  • Implementing a second source. This issue is the abstraction only.
  • Identity resolution across sources — tracked separately.

Acceptance

  • internal/sources/source.go exists with the Source interface.
  • OSM is migrated under internal/sources/osm/ and implements Source.
  • cmd/ingestion/main.go selects sources by name and contains no OSM-specific logic.
  • Existing OSM unit tests still pass; integration test (separate issue) still works.
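
As a rough illustration of the "selects sources by name" criterion, the main package could hold a small name-to-constructor registry. This is a sketch under assumptions: Register, Lookup, Names and stubSource are hypothetical names, Place stands in for models.Place, and reading the name from the first command-line argument is just one way to do the subcommand or flag selection mentioned earlier.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"sort"
)

// Place stands in for models.Place; its single field is hypothetical.
type Place struct{ ID string }

// Source mirrors the interface proposed in this issue.
type Source interface {
	Name() string
	Stream(ctx context.Context, sink func(Place) error) error
}

// registry maps a source name to a constructor for that source.
var registry = map[string]func() Source{}

// Register makes a source selectable by name.
func Register(name string, factory func() Source) {
	registry[name] = factory
}

// Names returns the registered source names in sorted order.
func Names() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Lookup builds the source registered under name, or reports which
// names are available when the requested one is unknown.
func Lookup(name string) (Source, error) {
	factory, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown source %q (available: %v)", name, Names())
	}
	return factory(), nil
}

// stubSource is a placeholder that emits nothing.
type stubSource struct{ name string }

func (s stubSource) Name() string { return s.name }

func (s stubSource) Stream(ctx context.Context, sink func(Place) error) error {
	return nil
}

func main() {
	// In the real layout this registration would point at the
	// implementation under internal/sources/osm/.
	Register("osm", func() Source { return stubSource{name: "osm"} })

	name := "osm"
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	src, err := Lookup(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("selected source:", src.Name())
}
```

With this shape, main.go knows only the registry and the Source interface, so adding a source would come down to one more Register call.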

Metadata

Assignees

No one assigned

    Labels

    area:ingestion (OSM and other data source ingestion), enhancement (New feature or request), priority:should (Should-have, rough edges)
