Skip to content

ETL: sync cities, space stations, outposts, and POIs into station_* tables #193

@GitAddRemote

Description

@GitAddRemote

User Story

As a backend engineer, I need station_cities, station_space_stations, station_outposts, and station_pois populated from UEX API data with correct parent FKs to planets, moons, and orbits, so that inventory terminals have valid location references and the full settlement hierarchy is queryable.

Definition of Done

  • CitiesSyncStep, SpaceStationsSyncStep, OutpostsSyncStep, PoisSyncStep ETL step classes created
  • All steps registered in CatalogEtlService at tier-5 (after planets and moons)
  • Each table populated with idempotent upsert by uex_id
  • Parent FKs resolved from in-memory maps built from prior steps
  • Unresolvable parent FKs emit severity='warn' warnings and skip the record
  • jurisdiction_id FK resolved from station_jurisdictions where provided by UEX
  • faction_id FK resolved from station_factions where provided by UEX
  • Unit tests for each step: success, orphan parent handling, upsert idempotency
  • pnpm test passes

Acceptance Criteria

  • After ETL: all station_space_stations rows have a non-null orbit_id FK or a corresponding warning row
  • After ETL: station_cities rows reference valid planet_id or moon_id (at least one non-null)
  • All four tables support idempotent re-run without duplicates
  • SELECT count(*) FROM station_space_stations matches UEX space stations endpoint count minus records with unresolvable FKs

Technical Elaboration

UEX Endpoints

  • GET /cities{ id, name, code, id_planet, id_moon, id_orbit, id_jurisdiction, is_available_live, ... }
  • GET /space_stations{ id, name, code, id_orbit, id_faction, id_jurisdiction, is_available_live, pad_types, ... }
  • GET /outposts{ id, name, code, id_orbit, id_planet, id_moon, is_available_live, ... }
  • GET /pois{ id, name, code, id_orbit, id_star_system, is_available_live, ... }

Target Schema Highlights

station_cities: uex_id INT UNIQUE, name VARCHAR(255), code VARCHAR(50), planet_id UUID NULLABLE FK, moon_id UUID NULLABLE FK, orbit_id UUID NULLABLE FK, jurisdiction_id UUID NULLABLE FK, is_available_live BOOLEAN

station_space_stations: uex_id INT UNIQUE, name VARCHAR(255), code VARCHAR(50), orbit_id UUID FK, faction_id UUID NULLABLE FK, jurisdiction_id UUID NULLABLE FK, pad_types TEXT[], is_available_live BOOLEAN

station_outposts: uex_id INT UNIQUE, name VARCHAR(255), orbit_id UUID NULLABLE FK, planet_id UUID NULLABLE FK, moon_id UUID NULLABLE FK, is_available_live BOOLEAN

station_pois: uex_id INT UNIQUE, name VARCHAR(255), orbit_id UUID NULLABLE FK, star_system_id UUID NULLABLE FK, is_available_live BOOLEAN

pad_types Handling

UEX returns pad_types as a comma-separated string (e.g. "S,M,L,XL"). Split on comma and store as TEXT[]:

padTypes: rawStation.pad_types?.split(',').map(s => s.trim()) ?? []

Index with GIN for = ANY(pad_types) queries.

FK Resolution

Use the same in-memory UUID map pattern from prior steps. Pass context object containing all resolved maps through the ETL step chain:

interface EtlContext {
  orbitByUexId: Map<number, string>;
  planetByUexId: Map<number, string>;
  moonByUexId: Map<number, string>;
  factionByUexId: Map<number, string>;
  jurisdictionByUexId: Map<number, string>;
}

Parent Nullable Rules

  • Cities: at least one of planet_id, moon_id must resolve — both null emits severity='error'
  • Space stations: orbit_id is required — null emits severity='error'
  • Outposts and POIs: all parent FKs are nullable — emit severity='warn' only if none resolve

Design Elaboration

These four tables are the terminal nodes of the location hierarchy — they are the places where station_terminals and inventory rows point. Correctness of parent FK relationships here determines the integrity of every catalog query that joins across location types.

The pad_types TEXT[] with GIN index enables efficient WHERE 'XL' = ANY(pad_types) filtering for the ship hangar / large-pad landing queries that are anticipated in the route planning feature. Storing as an array (rather than JSONB or normalized pad_type rows) is appropriate because the set of valid pad types is small (~5 values) and filters are always single-value point queries.


Depends on: #191, #192

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendBackend services and logicdatabaseSchema, migrations, indexingenhancementNew feature or requesttech-storyTechnical implementation storyuex-syncUEXcorp API sync and integration

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions