feat: ETL sync for categories and category attributes (ISSUE-196)#225

Merged: GitAddRemote merged 3 commits into main from feature/ISSUE-196 on May 26, 2026

@GitAddRemote (Owner)

Closes #196

Summary

  • CategoriesSyncStep: fetches GET /categories from UEX and populates station_category and station_category_attribute
  • Two-level hierarchy: synthetic section rows are upserted first, one per unique (type, section) pair, using ON CONFLICT (type, name) WHERE is_section = TRUE … RETURNING id. Leaf category rows are then upserted in a second pass with parent_id set to the BIGSERIAL id returned by the section upsert. Because sections always exist before their leaves, the self-referencing FK constraint is satisfied
  • Leaf conflict target: ON CONFLICT (uex_id) WHERE uex_id IS NOT NULL DO UPDATE SET keeps name, section, type, flags, and timestamps current on re-runs
  • Attribute upsert: each attributes[] entry is upserted into station_category_attribute keyed on uex_id; is_lower_better is cast to boolean or null; attributes with a missing name emit a severity=warn warning and are skipped
  • Type validation: the type string is validated against the ('item', 'service', 'contract') CHECK constraint values; unknown types are stored as null and emit a severity=warn warning
  • Registered in CatalogEtlModule providers and appended to ETL_STEPS at tier-8 (after jump-points-sync, before items/vehicles/commodities)
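
The two-pass flow in the summary can be sketched roughly as follows. This is a minimal sketch, not the code in categories-sync.step.ts: the `UexCategory` shape, the `Queryable` interface, and the helpers `sectionKey` and `syncCategories` are hypothetical, the SQL uses the COALESCE conflict target that the later commits in this PR switched to, and flags, timestamps, and attributes are left out. The no-op `DO UPDATE` on the section upsert is a deliberate choice in the sketch: with `DO NOTHING`, Postgres returns no row from `RETURNING` for a conflicting insert, so a re-run would not get the existing section id back.

```typescript
// Hypothetical shapes; the real UEX payload and DB client may differ.
interface UexCategory {
  id: number;
  type: string | null;
  section: string | null;
  name: string;
}

interface Queryable {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<{ id: number }> }>;
}

// Section upsert: the no-op DO UPDATE makes RETURNING id yield the existing
// row's id on re-runs (DO NOTHING would return no row on conflict).
const SECTION_UPSERT = `
  INSERT INTO station_category (type, name, is_section)
  VALUES ($1, $2, TRUE)
  ON CONFLICT ((COALESCE(type, '')), name) WHERE is_section = TRUE
  DO UPDATE SET name = EXCLUDED.name
  RETURNING id`;

// Leaf upsert keyed on uex_id (flags and timestamps omitted for brevity).
const LEAF_UPSERT = `
  INSERT INTO station_category (uex_id, parent_id, type, section, name, is_section)
  VALUES ($1, $2, $3, $4, $5, FALSE)
  ON CONFLICT (uex_id) WHERE uex_id IS NOT NULL
  DO UPDATE SET parent_id = EXCLUDED.parent_id, type = EXCLUDED.type,
                section = EXCLUDED.section, name = EXCLUDED.name`;

// In-memory key mirroring the (COALESCE(type, ''), name) conflict target.
function sectionKey(type: string | null, section: string): string {
  return `${type ?? ''}\u0000${section}`;
}

async function syncCategories(db: Queryable, categories: UexCategory[]): Promise<Map<string, number>> {
  const sectionIds = new Map<string, number>();

  // Pass 1: upsert one section row per unique (type, section) pair.
  for (const cat of categories) {
    const section = cat.section?.trim() || null;
    if (section === null) continue;
    const key = sectionKey(cat.type, section);
    if (sectionIds.has(key)) continue;
    const { rows } = await db.query(SECTION_UPSERT, [cat.type, section]);
    const id = rows[0]?.id;
    if (id === undefined) throw new Error(`section upsert returned no id for ${section}`);
    sectionIds.set(key, id);
  }

  // Pass 2: upsert leaves; every parent row already exists, so the self-FK holds.
  for (const cat of categories) {
    const section = cat.section?.trim() || null;
    const parentId = section === null ? null : sectionIds.get(sectionKey(cat.type, section)) ?? null;
    await db.query(LEAF_UPSERT, [cat.id, parentId, cat.type, section, cat.name]);
  }
  return sectionIds;
}
```

Because all section upserts finish before the first leaf upsert, parent_id can be taken from the in-memory map instead of a second lookup query; categories without a section simply get parent_id = null.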

Test plan

  • pnpm test --filter backend passes (149 tests, 14 suites green)
  • One section row upserted per unique (type, section) pair, not per category
  • Same section produces same conflict target on re-run (idempotent)
  • parent_id on leaf row equals id returned by section upsert
  • Categories with section: null get parent_id = null
  • Unknown type stored as null, warning emitted, row still upserted
  • Attributes upserted with correct category_uex_id and is_lower_better boolean
  • Attribute with missing name skipped with warning
  • Empty category list produces no inserts and no warnings

- Fetches GET /categories from UEX and upserts into station_category
- Synthetic section rows created per unique (type, section) pair using
  ON CONFLICT (type, name) WHERE is_section = TRUE; RETURNING id used to
  resolve parent_id FK for leaf rows in the same execute() call
- Section rows always upserted before leaf rows to satisfy self-referencing FK
- Leaf rows conflict on uex_id WHERE uex_id IS NOT NULL; parent_id, type,
  section, is_game_related, is_mining kept current on re-runs
- Category attributes upserted into station_category_attribute per uex_id
- Warns on a missing name (category or attribute) and on an unknown type (stored as null)
- Registered in CatalogEtlModule and ETL_STEPS pipeline at tier-8
  (after jump-points-sync, before items/vehicles/commodities)
- 22 unit tests: section creation, parent FK resolution, attribute upsert,
  idempotency, warnings, empty list handling
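
The validation and mapping rules in the lists above (type whitelist, is_lower_better cast, skip-with-warning on a missing name) might look like this as pure functions. It is a sketch, not the PR's implementation: `normalizeType`, `toNullableBoolean`, `mapAttributes`, and the `EtlWarning` shape are hypothetical; the assumption that UEX sends is_lower_better as 0/1 or a boolean is unverified; and treating a whitespace-only name as missing is a choice the PR text does not confirm.

```typescript
// Hypothetical shapes and helper names; the PR's actual types may differ.
type CategoryType = 'item' | 'service' | 'contract';

interface EtlWarning {
  severity: 'warn';
  message: string;
}

interface UexCategoryAttribute {
  id: number;
  name?: string | null;
  is_lower_better?: number | boolean | null; // assumed 0/1 or boolean from UEX
}

interface AttributeRow {
  uex_id: number;
  category_uex_id: number;
  name: string;
  is_lower_better: boolean | null;
}

// Mirrors the ('item', 'service', 'contract') CHECK constraint values.
const ALLOWED_TYPES = new Set<string>(['item', 'service', 'contract']);

// Unknown types become null (so the CHECK constraint passes) and emit a warning.
function normalizeType(raw: string | null | undefined, warnings: EtlWarning[]): CategoryType | null {
  if (raw === null || raw === undefined) return null;
  if (ALLOWED_TYPES.has(raw)) return raw as CategoryType;
  warnings.push({ severity: 'warn', message: `unknown category type "${raw}"; stored as null` });
  return null;
}

// Missing values stay null; numbers are treated as 0 = false, anything else = true.
function toNullableBoolean(value: number | boolean | null | undefined): boolean | null {
  if (value === null || value === undefined) return null;
  return typeof value === 'boolean' ? value : value !== 0;
}

// Builds attribute rows for one category, skipping (with a warning) any entry without a name.
function mapAttributes(categoryUexId: number, attrs: UexCategoryAttribute[], warnings: EtlWarning[]): AttributeRow[] {
  const rows: AttributeRow[] = [];
  for (const attr of attrs) {
    const name = attr.name?.trim();
    if (!name) {
      warnings.push({ severity: 'warn', message: `category ${categoryUexId}: attribute ${attr.id} has no name; skipped` });
      continue;
    }
    rows.push({
      uex_id: attr.id,
      category_uex_id: categoryUexId,
      name,
      is_lower_better: toNullableBoolean(attr.is_lower_better),
    });
  }
  return rows;
}
```

Keeping these checks free of database access makes the warning cases straightforward to unit-test; the step would then pass the returned rows to its attribute upsert.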
Copilot AI review requested due to automatic review settings May 25, 2026 20:57
GitAddRemote self-assigned this May 25, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new Catalog ETL step to ingest UEX /categories into the station_category and station_category_attribute tables, building a two-level hierarchy (synthetic section rows → leaf categories) and persisting per-category attributes.

Changes:

  • Added CategoriesSyncStep to fetch UEX categories, upsert section + leaf category rows, and upsert category attributes with warning emission for invalid data.
  • Registered CategoriesSyncStep in CatalogEtlModule and appended it to the CatalogEtlService ETL step sequence.
  • Added unit tests covering section grouping, parent FK resolution, leaf/attribute upserts, warnings, and basic idempotency expectations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file

| File | Description |
| --- | --- |
| backend/src/modules/catalog-etl/steps/categories-sync.step.ts | New ETL step implementing section/leaf category upserts and attribute upserts from UEX /categories. |
| backend/src/modules/catalog-etl/steps/categories-sync.step.spec.ts | Unit tests for the new categories ETL step behavior. |
| backend/src/modules/catalog-etl/catalog-etl.service.ts | Wires the new step into the ETL execution order. |
| backend/src/modules/catalog-etl/catalog-etl.service.spec.ts | Updates DI test setup to include the new step provider. |
| backend/src/modules/catalog-etl/catalog-etl.module.ts | Registers the new step in the Nest module providers. |

Comment thread backend/src/modules/catalog-etl/steps/categories-sync.step.ts
Comment thread backend/src/modules/catalog-etl/steps/categories-sync.step.ts (outdated)
…erts

Two correctness bugs in the categories-sync step:

- NULL type in ON CONFLICT: Postgres unique indexes treat NULLs as
  distinct by default, so ON CONFLICT (type, name) WHERE is_section = TRUE
  would insert a new section row on every run when type is NULL. Changed the
  index and ON CONFLICT clause to use COALESCE(type, '') so NULL types
  are treated as a single deterministic sentinel value.

- Empty-string section: the cat.section truthiness check correctly skipped ''
  during section-row collection, but record.section ?? null still passed ''
  into the leaf INSERT, because ?? only replaces null and undefined. Normalize
  once via section?.trim() || null and use that value for section-row
  collection, parent lookup, and the leaf insert.

Updated spec: renamed conflict-target test to assert the COALESCE form;
added tests for empty-string and whitespace-only section normalization.
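
The NULL-distinct problem and the COALESCE fix can be illustrated with a toy in-memory model of a unique index. This is an illustration under simplifying assumptions, not Postgres internals or the PR's code: `sqlEquals`, `upsert`, `plainKey`, `coalescedKey`, and `normalizeSection` are hypothetical names, and only the default NULLS DISTINCT behaviour is modelled (PostgreSQL 15 and later also accept NULLS NOT DISTINCT on unique indexes, which is another way to get a similar effect).

```typescript
// Toy model of a two-column unique index, for illustration only.
type SqlValue = string | null;
type SectionRow = [SqlValue, string]; // [type, name]

// SQL equality: NULL = anything is UNKNOWN, so a key containing NULL never
// matches an existing row (default NULLS DISTINCT behaviour).
function sqlEquals(a: SqlValue, b: SqlValue): boolean {
  return a !== null && b !== null && a === b;
}

// Simulates INSERT ... ON CONFLICT against a unique index on keyOf(row).
// Returns true if a new row was inserted, false if the conflict arm was taken.
function upsert(table: SectionRow[], row: SectionRow, keyOf: (r: SectionRow) => SectionRow): boolean {
  const key = keyOf(row);
  const hit = table.some((existing) => {
    const k = keyOf(existing);
    return sqlEquals(k[0], key[0]) && sqlEquals(k[1], key[1]);
  });
  if (!hit) table.push(row);
  return !hit;
}

// Index on (type, name): rows with a NULL type never conflict.
const plainKey = (r: SectionRow): SectionRow => [r[0], r[1]];
// Index on (COALESCE(type, ''), name): every NULL type collapses to one sentinel.
const coalescedKey = (r: SectionRow): SectionRow => [r[0] ?? '', r[1]];

// The normalization from the commit message: null, '' and whitespace-only all become null.
function normalizeSection(section: string | null | undefined): string | null {
  return section?.trim() || null;
}
```

Running the same NULL-type upsert three times grows the plain-index table to three rows but leaves the COALESCE-index table at one, which is the re-run duplication the commit describes.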
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread backend/src/modules/catalog-etl/steps/categories-sync.step.ts (outdated)
Comment thread backend/src/migrations/1748000000000-BigBangBaselineMigration.ts
Comment thread backend/src/modules/catalog-etl/steps/categories-sync.step.spec.ts (outdated)
…gories section index

- Wrap COALESCE expression in extra parens in ON CONFLICT clause:
  (COALESCE(type, ''), name) → ((COALESCE(type, '')), name) so Postgres
  matches it against the expression index rather than treating it as a
  column list (which would fail with no matching unique constraint)
- Update spec assertion to match corrected ON CONFLICT form
- Add migration 1779700000000 to drop/recreate uq_categories_section_type
  on already-applied databases: drops the old plain-column index
  ("type", "name") and creates the expression index (COALESCE("type", ''), "name")
  so existing environments stay in sync without re-running the baseline
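
A migration like 1779700000000 might be shaped roughly as below, in the style of a TypeORM migration. Several details are assumptions rather than facts from this PR: that the project uses TypeORM at all (suggested only by the timestamped migration file names), the class name `FixCategoriesSectionIndex1779700000000`, the `QueryRunnerLike` stand-in for TypeORM's `QueryRunner`, the table name `station_category` (the index name hints the real table name may differ), and the `WHERE "is_section" = TRUE` predicate, inferred from the partial ON CONFLICT clause.

```typescript
// Minimal stand-in for TypeORM's QueryRunner; only query() is used here.
interface QueryRunnerLike {
  query(sql: string): Promise<unknown>;
}

// Hypothetical migration; table name and partial-index predicate are assumptions.
class FixCategoriesSectionIndex1779700000000 {
  // Replace the plain-column index with the COALESCE expression index.
  async up(queryRunner: QueryRunnerLike): Promise<void> {
    await queryRunner.query(`DROP INDEX IF EXISTS "uq_categories_section_type"`);
    // Wrapping the expression in its own parentheses is always valid in CREATE INDEX.
    await queryRunner.query(
      `CREATE UNIQUE INDEX "uq_categories_section_type" ON "station_category" ((COALESCE("type", '')), "name") WHERE "is_section" = TRUE`,
    );
  }

  // Restore the original plain-column index.
  async down(queryRunner: QueryRunnerLike): Promise<void> {
    await queryRunner.query(`DROP INDEX IF EXISTS "uq_categories_section_type"`);
    await queryRunner.query(
      `CREATE UNIQUE INDEX "uq_categories_section_type" ON "station_category" ("type", "name") WHERE "is_section" = TRUE`,
    );
  }
}
```

One caveat the sketch does not handle: if the old index already let duplicate NULL-type section rows accumulate, `CREATE UNIQUE INDEX` in `up` would fail until those duplicates are removed.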
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

GitAddRemote merged commit 292ffc6 into main on May 26, 2026
10 checks passed
GitAddRemote deleted the feature/ISSUE-196 branch on May 26, 2026 03:26
Labels: none
Projects: none
Development: linked issue "ETL: sync categories and category attributes into station_* tables"

2 participants