feat(enrichment-worker): scaffold CDC consumer + claim primitive (BS#892 PR-1) #1007
Merged
…mitive (BS#892 PR-1)

First PR of Epic C C2: stand up the apps/enrichment-worker/ package with the log-only CDC dispatcher and the atomic claim primitive that the N×N consumer cardinality depends on. No DB writes; no LML calls.

Components:
- claim.ts: claimRowForEnrichment(id), which runs UPDATE flowsheet SET metadata_status = 'enriching', enriching_since = now() WHERE id = $1 AND metadata_status = 'pending'. The narrow WHERE makes this race-free across N consumer instances (the first claimer wins, every sibling sees 0 rows and skips).
- cdc-subscriber.ts: filterForEnrichment + makeLogOnlyHandler. The filter is the perimeter that decides which CDC events the consumer acts on (flowsheet INSERT, entry_type=track, metadata_status=pending, non-empty artist_name).
- worker.ts: entrypoint. Starts the LISTEN connection and registers the log-only handler. Graceful SIGTERM/SIGINT shutdown.

CDC fan-out audit (acceptance pre-condition for the cardinality decision):
- cdc-listener.ts:43-49 opens a per-process postgres() client, so there is no pool collapse across N BS instances or N workers.
- cdc-websocket.ts:89-99 is pure fan-out, with no upstream dedup or coalescing.
- pg_notify is fire-and-forget; missed events fall to the C6 (#895) cron.
- The audit is documented in cdc-subscriber.ts's file header so the next reader can trust the dispatch path.

PR-2 will swap the log-only handler for claim → @wxyc/lml-client.lookupMetadata → finalize UPDATE → SSE broadcast (also closes #893/#628). The split lets the dispatch path be verified in prod (deploy this PR, watch the would-enrich logs flow on real CDC traffic) before any write-side risk.

Tests: 16 unit tests pin the claim contract (5: pending row claimed; WHERE narrows by id + status; sibling-claimed no-op; terminal-state no-op; DB errors propagate) and the filter perimeter (11: happy path + every skip case from the documented criteria). Suite: 2070/2070 pass.

Mock: tests/mocks/database.mock.ts adds metadata_status + enriching_since to the flowsheet mock so the new primitive's .set() typechecks under the mock's Drizzle moduleNameMapper.

CLAUDE.md: new package row in the monorepo table.
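The filter perimeter described in this commit can be sketched as below. This is a minimal illustration, not the code in cdc-subscriber.ts: the `CdcEvent` and `EnrichmentCandidate` shapes, the `Sketch` function names, the injected `log` callback, and the choice to coerce missing titles to `null` are all assumptions made for the example. Only the criteria themselves (flowsheet INSERT, entry_type=track, metadata_status=pending, non-empty artist_name, numeric id) and the would-enrich log label come from the PR text.

```typescript
// Hypothetical event shape; the real CDC payload type is not shown in this PR.
type CdcEvent = {
  table: string;
  action: "INSERT" | "UPDATE" | "DELETE";
  data: Record<string, unknown> | null;
};

// Hypothetical result shape for an event that passes the perimeter.
type EnrichmentCandidate = {
  id: number;
  artistName: string;
  albumTitle: string | null;
  trackTitle: string | null;
};

// Assumption: null/undefined/non-string titles are coerced to null.
function toNullableString(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Returns a candidate when every documented criterion holds, otherwise null.
function filterForEnrichmentSketch(event: CdcEvent): EnrichmentCandidate | null {
  if (event.table !== "flowsheet" || event.action !== "INSERT") return null;
  const row = event.data;
  if (row === null) return null;
  if (row["entry_type"] !== "track") return null;
  if (row["metadata_status"] !== "pending") return null;

  const id = row["id"];
  if (typeof id !== "number") return null;

  const artist = row["artist_name"];
  if (typeof artist !== "string" || artist.length === 0) return null;

  return {
    id,
    artistName: artist,
    albumTitle: toNullableString(row["album_title"]),
    trackTitle: toNullableString(row["track_title"]),
  };
}

// Log-only handler: reports what would be enriched and performs no writes.
function makeLogOnlyHandlerSketch(log: (message: string) => void) {
  return (event: CdcEvent): void => {
    const candidate = filterForEnrichmentSketch(event);
    if (candidate !== null) {
      log(`would-enrich id=${candidate.id} artist=${candidate.artistName}`);
    }
  };
}
```

Returning `null` for every skip case keeps the handler a single branch, which is one way to make each skip case easy to pin with its own unit test.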
…ry guard

Review feedback on PR-1:
- claim.ts: WHERE switches from raw `sql\`\`` template to `and(eq(flowsheet.id, id), eq(flowsheet.metadata_status, 'pending'))`. The typed builders give compile-time checking on the metadata_status enum literal (the 5-state column from BS#891) and drop hand-quoted column names. Matches the backend services convention; the closest analog (jobs/flowsheet-metadata-backfill/enrich.ts) uses raw SQL, but for the same idempotency guard the typed builders are the more defensive shape.
- worker.ts: shutdown handler now latches on first signal. SIGTERM+SIGINT (or a duplicate signal) racing through stopCdcListener + closeDatabaseConnection in parallel was previously possible; the inner functions are idempotent so this was safe but not robust.
- claim.test.ts: 'narrows the WHERE' assertion swapped from SQL-string-matching to a structural check on the .where() call. The id + 'pending' contract is now compile-time-enforced by the typed builders; the runtime behavior is still pinned by the sibling-claimed / terminal-state / DB-error tests.
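A latch of the kind described for worker.ts might look like the sketch below. It is an illustration under assumptions, not the merged code: `makeLatchedShutdown` and the `Closer` type are hypothetical names, and the closers stand in for stopCdcListener and closeDatabaseConnection, whose real signatures are not shown in this PR.

```typescript
// Hypothetical type for an async teardown step.
type Closer = () => Promise<void>;

// Returns a shutdown function that runs the closers at most once.
// Later calls (a second signal, or SIGTERM racing SIGINT) get the same
// in-flight promise back instead of starting a parallel teardown.
function makeLatchedShutdown(closers: Closer[]): () => Promise<void> {
  let inFlight: Promise<void> | null = null;

  return (): Promise<void> => {
    if (inFlight === null) {
      inFlight = (async () => {
        // Run teardown steps in order, e.g. stop the listener before
        // closing the database connection.
        for (const close of closers) {
          await close();
        }
      })();
    }
    return inFlight;
  };
}

// Possible wiring in a Node entrypoint (shown as a comment so the sketch
// stays free of Node type dependencies):
//
//   const shutdown = makeLatchedShutdown([stopListener, closeDb]);
//   process.on("SIGTERM", () => void shutdown());
//   process.on("SIGINT", () => void shutdown());
```

Storing the promise rather than a boolean flag means a second caller can still await completion of the first teardown.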
This was referenced May 22, 2026
First PR of BS#892 (Epic C C2). Stands up the `apps/enrichment-worker/` package with the log-only CDC dispatcher and the atomic claim primitive that the N×N consumer cardinality depends on. No DB writes; no LML calls. Those land in PR-2.

Why split
The keystone consumer is too big for one PR (acceptance: package + dispatcher + claim + LML wiring + finalize + SSE broadcast + two-consumer integration test). Splitting lets the dispatch path be verified in prod (deploy this PR, watch the `would-enrich` logs flow on real CDC traffic) before any write-side risk.

PR-2 (follow-up): swap the log-only handler for `claim` → `@wxyc/lml-client.lookupMetadata` → finalize UPDATE → SSE broadcast (also closes #893/#628). The two-consumer integration test from #892 acceptance also lands in PR-2 since it needs the full pipeline.

What's in this PR
- `apps/enrichment-worker/claim.ts`: `claimRowForEnrichment(id)`. Atomic `UPDATE flowsheet SET metadata_status='enriching', enriching_since=now() WHERE id=$1 AND metadata_status='pending'`. Race-free across N consumer instances: the first claimer wins, every sibling sees 0 rows and skips. Returns `{ claimed: true, id }` or `{ claimed: false }`.
- `apps/enrichment-worker/cdc-subscriber.ts`: `filterForEnrichment(event)` + `makeLogOnlyHandler()`. The filter is the perimeter for which CDC events the consumer acts on: flowsheet INSERT, `entry_type='track'`, `metadata_status='pending'`, non-empty `artist_name`, numeric id.
- `apps/enrichment-worker/worker.ts`: entrypoint. Starts the LISTEN connection via `@wxyc/database`'s shared `startCdcListener()` and registers the log-only handler. Graceful SIGTERM/SIGINT shutdown closes the LISTEN connection and the DB pool cleanly.
- `package.json` (`@wxyc/enrichment-worker`), `tsconfig.json` (references `shared/database`), `tsup.config.ts` (ESM, node20, single-file output).
- `tests/mocks/database.mock.ts`: adds `metadata_status` + `enriching_since` to the `flowsheet` mock so the new primitive's `.set()` typechecks under the existing Drizzle mock moduleNameMapper.
- `CLAUDE.md`: new package row.

CDC fan-out audit (acceptance pre-condition)
Documented inline in `cdc-subscriber.ts`'s file header so the next reader can trust the dispatch path:
- `cdc-listener.ts:43-49` opens a per-process `postgres()` client, so there is no pool collapse across N BS instances or N workers. Each Node process gets its own LISTEN connection; PG `pg_notify` broadcasts to every listener.
- `cdc-websocket.ts:89-99` is pure fan-out, with no upstream dedup or coalescing of events.
- `pg_notify` is fire-and-forget per `docs/cdc.md:25`; a worker that drops its LISTEN connection misses events until reconnect, with no replay endpoint. The C6 cron is the mandatory complement, not an optional safety net.

This validates the N×N cardinality decision in #892's body. No fallback to N×1 leader election needed.
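To illustrate why N×N fan-out is safe once the claim is conditional, here is a small in-memory simulation. It is a sketch under stated assumptions: `FakeFlowsheet`, `claimRowSketch`, and `simulateFanOut` are hypothetical names, the table is a `Map` rather than Postgres, and only the two states named in this PR are modeled. In the real system the atomicity comes from the single conditional UPDATE; this single-threaded model only shows the resulting contract (every worker sees every event, exactly one claim succeeds).

```typescript
type ClaimResult = { claimed: true; id: number } | { claimed: false };

// Only the two states named in the PR are modeled; the real column has five.
type Status = "pending" | "enriching";

interface Row {
  metadata_status: Status;
  enriching_since: Date | null;
}

// In-memory stand-in for the flowsheet table (an assumption for this sketch).
class FakeFlowsheet {
  private readonly rows = new Map<number, Row>();

  insertPending(id: number): void {
    this.rows.set(id, { metadata_status: "pending", enriching_since: null });
  }

  // Mirrors UPDATE ... WHERE id = $1 AND metadata_status = 'pending'
  // and returns the affected-row count (0 or 1).
  updateIfPending(id: number): number {
    const row = this.rows.get(id);
    if (row === undefined || row.metadata_status !== "pending") return 0;
    row.metadata_status = "enriching";
    row.enriching_since = new Date();
    return 1;
  }
}

// The claim succeeds only for the caller whose update touched a row.
function claimRowSketch(table: FakeFlowsheet, id: number): ClaimResult {
  if (table.updateIfPending(id) === 1) {
    return { claimed: true, id };
  }
  return { claimed: false };
}

// Delivers every inserted id to every worker (pure fan-out, no dedup) and
// records which worker indexes won each claim.
function simulateFanOut(workerCount: number, ids: number[]) {
  const table = new FakeFlowsheet();
  const winners = new Map<number, number[]>();
  let attempts = 0;

  for (const id of ids) {
    table.insertPending(id);
    const won: number[] = [];
    for (let worker = 0; worker < workerCount; worker++) {
      attempts++;
      if (claimRowSketch(table, id).claimed) won.push(worker);
    }
    winners.set(id, won);
  }
  return { winners, attempts };
}
```

With 3 workers and 2 rows, the simulation makes 6 claim attempts and records exactly one winner per row; the other attempts are the "sibling sees 0 rows and skips" case.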
Tests
- `tests/unit/apps/enrichment-worker/claim.test.ts`: 5 tests pinning the claim contract: pending row claimed; WHERE narrows by `id` + `metadata_status='pending'`; sibling-claimed no-op; terminal-state no-op; DB errors propagate.
- `tests/unit/apps/enrichment-worker/cdc-subscriber.test.ts`: 11 tests pinning the filter perimeter: happy path + every skip case from the documented criteria (non-flowsheet, UPDATE, DELETE, null data, non-track entry_type, already-claimed metadata_status, null/empty artist_name, non-number id, null/undefined album_title/track_title coercion). Plus 1 test for `makeLogOnlyHandler` behavior.

Pre-flight
- `npm run typecheck`: clean
- `npm run lint`: 0 errors, 422 warnings (none new)
- `npm run format:check`: clean
- `npm run build --workspace=@wxyc/enrichment-worker`: clean
- `npm run test:unit`: 2070/2070 pass

Out of scope (PR-2)
`@wxyc/lml-client` carries its own per BS#906/G4)

Related
`metadata_status` enum column (merged 2026-05-22)