Skip to content

feat(flowsheet): explicit metadata_status enum lifecycle (BS#891)#1004

Merged
jakebromberg merged 2 commits into
mainfrom
feat/891-metadata-status-column
May 22, 2026
Merged

feat(flowsheet): explicit metadata_status enum lifecycle (BS#891)#1004
jakebromberg merged 2 commits into
mainfrom
feat/891-metadata-status-column

Conversation

@jakebromberg
Copy link
Copy Markdown
Member

Summary

Replaces the implicit two-column state machine (metadata_attempt_at + populated-column inspection) with an explicit metadata_status enum on flowsheet. Five states encode the full enrichment lifecycle:

Companion enriching_since timestamptz carries the race-claim timestamp; the C6 (#896) recovery sweep uses a 60s TTL against it to revert stuck rows.

Migration shape

DDL-only. The CASE-derived backfill from existing state is documented but not inlined — a 2.6M-row UPDATE inside the migration transaction would block writes for the duration. Backfill recipe at docs/flowsheet-metadata-status-backfill.md.

  • CREATE TYPE wxyc_schema.metadata_status_enum AS ENUM(...)
  • ALTER TABLE flowsheet ADD COLUMN metadata_status ... NOT NULL DEFAULT 'pending' — constant-default ADD COLUMN is metadata-only on PG11+; no row rewrite, no AccessExclusiveLock beyond the catalog update.
  • ALTER TABLE flowsheet ADD COLUMN enriching_since timestamptz — nullable, no default.
  • Two partial indexes with IF NOT EXISTS (same pattern as 0070 / 0074):
    • flowsheet_metadata_status_pending_idx on (id) WHERE entry_type='track' AND artist_name IS NOT NULL AND metadata_status='pending' — covers the C6 sweep.
    • flowsheet_metadata_status_enriching_stale_idx on (enriching_since) WHERE metadata_status='enriching' — covers the stale-claim recovery sweep.

Ops note: Build both indexes CONCURRENTLY on prod before merging. The migration's IF NOT EXISTS clauses make the apply a no-op when the prebuilt indexes are present. Statements for the CONCURRENTLY build are in the migration's comment block.

V2 wire format

transformToV2 now projects metadata_status onto track rows. iOS branches on it to decide whether to render inline metadata or fall back to the proxy-fetch path (WXYC/wxyc-ios-64#287 already shipped the decoder; #270 will ship the consumer logic).

metadata_attempt_at stays as a historical marker — once Epic C C6 (#896) flips the cron predicate to metadata_status = 'pending', it's no longer used for control flow. The metadata_attempt_at partials in schema.ts and the corresponding cron query are not removed in this PR; they get dropped alongside the cron flip.

Test plan

  • tests/unit/database/schema.flowsheet-metadata-status.test.ts — 14 schema-source tests pin migration + schema.ts to the documented enum order, the constant-default ADD COLUMN shape, the two partial-index predicates, the no-CONCURRENTLY-in-DDL rule, and the no-inline-backfill rule.
  • tests/unit/services/flowsheet.transformToV2.metadata-status.test.ts — 6 unit tests covering all 5 enum values projected onto track rows + the non-track exclusion regression guard.
  • Full unit suite: 2059/2059 passing.
  • node scripts/validate-migrations.mjs clean.
  • npm run lint introduces no new warnings/errors on the touched files.
  • npm run format:check clean.

Critical-path next steps (out of scope for this PR)

  1. Build the two partial indexes CONCURRENTLY on prod.
  2. Merge this PR.
  3. Run the backfill per docs/flowsheet-metadata-status-backfill.md.
  4. Ship [C2] Build CDC-driven enrichment consumer (separate worker) #892 (consumer claim), [C7] Verify ETL-inserted rows reach the enrichment consumer via CDC #896 (cron predicate flip), and the other Epic C children.
  5. Drop the now-unused metadata_attempt_at_pending_* partials.

Related

Replaces the implicit two-column state machine ({metadata_attempt_at,
artwork_url/discogs_url}) with an explicit `metadata_status` enum column
on flowsheet. Five states: pending, enriching, enriched_match,
enriched_no_match, failed_no_retry. Backfill recipe in
docs/flowsheet-metadata-status-backfill.md.

Companion `enriching_since timestamptz` column carries the race-claim
timestamp Epic C C2 (BS#892) will use; the stale-claim recovery sweep
Epic C C6 (BS#896) will use a 60s TTL against it. Two partial indexes
ship with the migration: pending-sweep (id-keyed) and stale-enriching
(enriching_since-keyed). Both must be built CONCURRENTLY on prod first
per the 0070/0074 pattern; the migration's IF NOT EXISTS clauses make
the apply a no-op when the prebuilt indexes are present.

Migration is DDL-only — backfill SQL is documented but not inlined.
Per bulk-update playbook, the 2.6M-row UPDATE runs in batched ops sessions
with synchronous_commit=off, paired with ANALYZE per BS#934.

V2 wire format now projects metadata_status onto track rows so iOS
(WXYC/wxyc-ios-64#287, #270) can branch on it without falling back to
the proxy-fetch path for already-enriched rows.

Schema-source tests pin the migration <-> schema.ts <-> projection contract.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Schema constraint shape report

no schema or migration changes detected in this PR

@jakebromberg
Copy link
Copy Markdown
Member Author

Code review

Found 2 issues:

  1. Wrong issue reference: the PR body, schema.ts comments, docs/migrations.md, and docs/flowsheet-metadata-status-backfill.md all attribute the cron predicate flip to "Epic C C6 ([C7] Verify ETL-inserted rows reach the enrichment consumer via CDC #896)" / "BS#896". But #896 is titled [C7] Verify ETL-inserted rows reach the enrichment consumer via CDC — it does not flip the cron predicate. The actual C6 (Retune flowsheet-metadata-backfill cron as safety-net, which contains the metadata_attempt_at IS NULLmetadata_status = 'pending' predicate change) is #895. Since these references are load-bearing context for the staged drop of metadata_attempt_at partials and the backfill ordering, they should point to [C6] Retune flowsheet-metadata-backfill cron as safety-net (hourly, 15-min grace) #895.

),
// BS#891. Partial B-tree on (id) covering the `metadata_status = 'pending'`
// slice. Replaces the `metadata_attempt_at IS NULL` partials above as the
// sweep predicate once Epic C C6 (#896) flips the cron to read this
// column. Both partials coexist during the transition — drop the old
// `*_metadata_attempt_pending_*` partials in the same PR that flips the
// cron, not here. Same predicate shape as 0070: filter on the three
// clauses the cron query carries (entry_type, artist_name, the
// lifecycle column itself).
//
// Built CONCURRENTLY out-of-band on prod first via:
// CREATE INDEX CONCURRENTLY flowsheet_metadata_status_pending_idx
// ON wxyc_schema.flowsheet (id)
// WHERE entry_type = 'track' AND artist_name IS NOT NULL
// AND metadata_status = 'pending';
// Migration SQL carries IF NOT EXISTS so the apply is a no-op against
// the prod DB where the index is already present.
index('flowsheet_metadata_status_pending_idx')
.on(table.id)
.where(
sql`${table.entry_type} = 'track' AND ${table.artist_name} IS NOT NULL AND ${table.metadata_status} = 'pending'`
),
// BS#891. Partial B-tree on `enriching_since` covering the stale-claim
// recovery sweep that Epic C C6 (#896) will run. Keyed on
// `enriching_since` (not id) so the sweep can range-scan the slice
// older than the TTL directly:

## When to run
After the migration deploys and the CONCURRENTLY-built partial indexes are in place, and **before** [BS#896](https://github.com/WXYC/Backend-Service/issues/896) ships the cron predicate change. Order:
1. Build `flowsheet_metadata_status_pending_idx` CONCURRENTLY on prod (see `0078_flowsheet-metadata-status.sql` for the exact statement).
2. Build `flowsheet_metadata_status_enriching_stale_idx` CONCURRENTLY on prod.
3. Merge the migration. Its `IF NOT EXISTS` clauses make the apply a no-op.
4. Run this backfill.
5. Ship BS#896 (cron predicate change).

`flowsheet` carries two `timestamp with time zone` markers that record "this row was attempted by job X" so subsequent passes can target only the rows that still need work, without confusing tried-and-no-match for tried-and-failed:
- `legacy_link_attempted_at` — set when the `legacy_release_id → library.id` resolver ran for the row and could not link. Migration 0063, populated by `jobs/broken-fk-recovery`. Lets B-2.2's LML backfill query both never-had-legacy-id rows and the broken-FK residual in one predicate.
- `metadata_attempt_at` — set when the LML metadata fetch responded for the row (success-with-match OR success-no-match). Migration 0069 (#639), stamped at runtime by `apps/backend/services/metadata/enrichment.service.ts` inside `.then()` (the `.catch` branch deliberately leaves it NULL so transient LML failures stay retryable). The historical drain (#638) and the recurring drift-repair sweep (#639 Phase 2) target `metadata_attempt_at IS NULL`. **Load-bearing dependency: the "stay retryable" guarantee only holds because `jobs/flowsheet-metadata-backfill/` runs nightly (`0 6 * * *` UTC, cron-registered via deploy-base) and re-attempts every `metadata_attempt_at IS NULL` row. If that cron is paused or removed, the `.catch` arm silently strands the row's enrichment forever — the runtime path has no second chance.** Being deprecated as a control-flow signal by `metadata_status` (BS#891, migration 0078): once Epic C C6 (BS#896) flips the cron predicate to `metadata_status = 'pending'`, this column stays as a historical marker only. Backfill recipe at [`docs/flowsheet-metadata-status-backfill.md`](flowsheet-metadata-status-backfill.md).

  1. Stale runbook path in migration SQL comment: line 33 of the migration points to docs/runbooks/flowsheet-metadata-status-backfill.md, but there is no docs/runbooks/ directory in the repo — the actual runbook lives at docs/flowsheet-metadata-status-backfill.md (the path the PR body uses). Comment-only fix, but the migration's content hash is frozen via applied-hashes.json, so updating the comment requires re-running node scripts/freeze-migration-hashes.mjs (or pinning per the existing HISTORICAL_NO_ANALYZE_NEEDED_TAGS-style allowlist if you'd rather leave the file untouched).

-- DDL-only. The CASE-derived backfill from {metadata_attempt_at,
-- artwork_url, discogs_url} to {enriched_match, enriched_no_match} is
-- a separate one-shot ops step gated by the bulk-update playbook
-- (sync_commit=off, batched). A 2.6M-row UPDATE inside this migration's
-- transaction would block writes for the duration — see CLAUDE.md
-- "Migrations are DDL-only" and project_bulk_update_playbook in memory.
-- Runbook: docs/runbooks/flowsheet-metadata-status-backfill.md.
--

…k path

Reviewer-flagged drift in cross-references:
- C6 (cron predicate flip to metadata_status='pending') is #895, not #896.
  #896 is C7 (CDC delivery verification). Five surfaces corrected:
  schema.ts comments, migration SQL comment, migrations.md, runbook body
  and Related section.
- Migration SQL comment referenced docs/runbooks/flowsheet-metadata-status-backfill.md
  but the file is at docs/flowsheet-metadata-status-backfill.md (no
  runbooks/ subdir exists).
- Also: prettier format fix on docs/flowsheet-metadata-status-backfill.md
  (markdown table alignment) so CI's format:check passes.

Migration SQL hash refrozen via scripts/freeze-migration-hashes.mjs after
the comment-only edit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant