Skip to content

ETL: sync jump points with synthetic reverse-direction rows #194

@GitAddRemote

Description

@GitAddRemote

User Story

As a backend engineer, I need station_jump_points populated from UEX API data with synthetic reverse-direction rows generated for each one-way jump point entry, so that route planning queries can traverse jump point connections bidirectionally without requiring the UI to manually invert source/destination.

Definition of Done

  • JumpPointsSyncStep ETL step created and registered at tier-6 (after star systems)
  • Real jump point rows upserted by uex_id with is_synthetic = FALSE
  • For each real row, a synthetic reverse row created with is_synthetic = TRUE and source_uex_id = uex_id of the real row
  • Synthetic rows use a stable generated UUID (derived from source uex_id so re-runs are idempotent)
  • Both entry_orbit_id and exit_orbit_id resolved from station_orbits via uex_id
  • Unresolvable orbit FKs emit severity='warn' and skip both the real and synthetic row
  • size field mapped from UEX string values to ENUM('S','M','L','XL','XXL') — unmapped values emit warning and default to null
  • Unit tests: real row insert, synthetic row generation, idempotent re-run, unresolvable orbit
  • pnpm test passes

Acceptance Criteria

  • After ETL: SELECT count(*) FROM station_jump_points WHERE is_synthetic = FALSE equals UEX jump points count
  • After ETL: SELECT count(*) FROM station_jump_points WHERE is_synthetic = TRUE equals the same count
  • Re-running ETL: total row count unchanged (no duplicates added)
  • Synthetic row source_uex_id correctly references the real row's uex_id
  • Querying jump points between two star systems works in both directions using entry_orbit_id or exit_orbit_id

Technical Elaboration

UEX Endpoint

GET /jump_points{ id, name, id_star_system_origin, id_star_system_destination, id_orbit_entry, id_orbit_exit, size, is_available_live, ... }

Target Schema

station_jump_points:

uex_id          INTEGER UNIQUE NULLABLE,  -- null for synthetic rows
is_synthetic    BOOLEAN NOT NULL DEFAULT FALSE,
source_uex_id   INTEGER NULLABLE,         -- uex_id of the real row this was derived from
name            VARCHAR(255),
entry_orbit_id  UUID FK → station_orbits,
exit_orbit_id   UUID FK → station_orbits,
size            VARCHAR(10),              -- S, M, L, XL, XXL
is_available_live BOOLEAN,

Synthetic Row UUID Generation

Generate a deterministic UUID for synthetic rows using uuidv5 with a fixed namespace and the source uex_id as the name:

import { v5 as uuidv5 } from 'uuid';
const SYNTHETIC_JP_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

function syntheticJpUuid(sourceUexId: number): string {
  return uuidv5(`synthetic-jp-${sourceUexId}`, SYNTHETIC_JP_NAMESPACE);
}

This ensures the same synthetic UUID is generated on every ETL run, making ON CONFLICT (id) DO UPDATE work correctly for synthetic rows.

Upsert Strategy

Real rows: ON CONFLICT (uex_id) DO UPDATE SET ...

Synthetic rows: ON CONFLICT (id) DO UPDATE SET ... (using the generated UUID as PK directly)

Size Mapping

const SIZE_MAP: Record<string, string> = {
  small: 'S', medium: 'M', large: 'L', 'extra-large': 'XL', 'xx-large': 'XXL',
  s: 'S', m: 'M', l: 'L', xl: 'XL', xxl: 'XXL',
};

Unmapped values: emit severity='warn' warning, set size = null.

Idempotency for Synthetic Rows

On re-run, the step should upsert synthetic rows by their generated UUID. Since the UUID is deterministic, the upsert will update the existing synthetic row rather than inserting a duplicate.

Design Elaboration

UEX only records jump points in one direction — from the discovered origin to the destination. For bidirectional route graph traversal, both directions must exist as rows. Generating synthetic reverse rows in the ETL step keeps the data model simple: all consumers query station_jump_points without needing to apply bidirectional logic in application code.

The is_synthetic flag preserves data provenance — operators and future migrations can distinguish real UEX data from application-generated data. The source_uex_id field traces each synthetic row back to its origin, enabling cleanup if a real row is deleted.


Depends on: #191

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendBackend services and logicdatabaseSchema, migrations, indexingenhancementNew feature or requesttech-storyTechnical implementation storyuex-syncUEXcorp API sync and integration

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions