Skip to content

ETL: sync terminals and terminal distances with hourly update cadence #195

@GitAddRemote

Description

@GitAddRemote

User Story

As a backend engineer, I need station_terminals and station_terminal_distances populated from UEX API data on an hourly schedule with a 12-hour skip guard, so that inventory lookup always has accurate terminal locations and travel distances without overloading the UEX API or violating their cache TTL.

Definition of Done

  • TerminalsSyncStep ETL step created and registered at tier-7 (after space stations, outposts, cities)
  • TerminalDistancesSyncStep ETL step created and registered at tier-7 (after terminals)
  • station_terminals populated with idempotent upsert by uex_id
  • Parent location FK resolved using (location_type, location_uex_id) — matched against station_space_stations, station_outposts, or station_cities
  • station_terminal_distances populated with upsert by (terminal_id, orbit_id) composite
  • Scheduler runs terminal sync every hour (@Cron('0 * * * *'))
  • Skip guard: if last terminal sync was < 12 hours ago, skip and log at DEBUG level (respects UEX 12-hour cache TTL on terminal data)
  • Unresolvable location FKs emit severity='warn' and the terminal is still inserted with location_uex_id preserved
  • Unit tests: upsert, skip guard behavior, unresolvable location FK
  • pnpm test passes

Acceptance Criteria

  • After first ETL run: SELECT count(*) FROM station_terminals matches UEX terminals count
  • Scheduler triggered within 12 hours of last run: terminal sync step exits early, no UEX API calls made
  • Scheduler triggered after 12 hours: terminal sync runs full upsert cycle
  • Re-running terminal sync: no duplicate rows
  • All station_terminal_distances rows have valid terminal_id and orbit_id FKs

Technical Elaboration

UEX Endpoints

  • GET /terminals{ id, name, code, id_space_station, id_outpost, id_city, type, is_available_live, ... }
  • GET /terminal_distances{ id_terminal, id_orbit, distance_km, ... }

Target Schema

station_terminals:

uex_id             INTEGER UNIQUE,
name               VARCHAR(255),
code               VARCHAR(50),
location_type      VARCHAR(30),  -- 'space_station' | 'outpost' | 'city'
location_uex_id    INTEGER,      -- raw UEX id for the parent location
space_station_id   UUID NULLABLE FK → station_space_stations,
outpost_id         UUID NULLABLE FK → station_outposts,
city_id            UUID NULLABLE FK → station_cities,
terminal_type      VARCHAR(50),
is_available_live  BOOLEAN,

station_terminal_distances:

terminal_id  UUID FK → station_terminals,
orbit_id     UUID FK → station_orbits,
distance_km  DECIMAL(15,2),
PRIMARY KEY (terminal_id, orbit_id)

Location FK Resolution

function resolveTerminalLocation(terminal: UexTerminal, ctx: EtlContext) {
  if (terminal.id_space_station) {
    return {
      type: 'space_station',
      uexId: terminal.id_space_station,
      spaceStationId: ctx.spaceStationByUexId.get(terminal.id_space_station) ?? null,
    };
  } else if (terminal.id_outpost) {
    return { type: 'outpost', uexId: terminal.id_outpost, outpostId: ctx.outpostByUexId.get(...) ?? null };
  } else if (terminal.id_city) {
    return { type: 'city', uexId: terminal.id_city, cityId: ctx.cityByUexId.get(...) ?? null };
  }
  return null; // emit error warning
}

Always store location_type and location_uex_id regardless of FK resolution outcome — these are the canonical identifiers for location display even when the UUID FK is unavailable.

Skip Guard Implementation

Check station_etl_run_state for the last successful terminal sync run:

const lastRun = await this.runStateRepo.findOne({
  where: { stepName: 'terminals-sync', status: 'completed' },
  order: { completedAt: 'DESC' },
});
const hoursSince = lastRun
  ? (Date.now() - lastRun.completedAt.getTime()) / (1000 * 60 * 60)
  : Infinity;
if (hoursSince < 12) {
  this.logger.debug('Terminal sync skipped: last run was ${hoursSince.toFixed(1)}h ago');
  return;
}

Cron Registration

@Cron('0 * * * *')
async syncTerminals() {
  await this.catalogEtlService.runStep('terminals-sync');
}

Register in CatalogEtlModule alongside the main ETL scheduler.

Design Elaboration

Terminal data changes more frequently than other location data (new terminals added, terminal types change) but the UEX API caches terminal data with a 12-hour TTL. The skip guard enforces this by checking elapsed time since the last successful run — avoiding redundant API calls that would return stale cached data anyway.

The location_type + location_uex_id dual storage on station_terminals enables location display without a UUID join when the FK is unavailable (e.g., a terminal whose parent space station hasn't been synced yet). Once the parent is synced, the UUID FK is backfilled on the next terminal sync run.


Depends on: #193

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendBackend services and logiccronjobsScheduled jobs and background tasksdatabaseSchema, migrations, indexingenhancementNew feature or requesttech-storyTechnical implementation storyuex-syncUEXcorp API sync and integration

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions