Skip to content

fix(llmo): make syncBrandSites non-destructive to the primary citation row (LLMO-4621)#2295

Open
rainer-friederich wants to merge 3 commits intomainfrom
fix/LLMO-4621
Open

fix(llmo): make syncBrandSites non-destructive to the primary citation row (LLMO-4621)#2295
rainer-friederich wants to merge 3 commits intomainfrom
fix/LLMO-4621

Conversation

@rainer-friederich
Copy link
Copy Markdown
Contributor

@rainer-friederich rainer-friederich commented Apr 30, 2026

Related Issues

Problem

syncBrandSites (src/support/brands-storage.js) did an unconditional DELETE then early-returned on empty urls without re-INSERT. A second caller — most plausibly POST /v2/orgs/{orgId}/brands from Brandalf brand-sync — re-upserting an active root with urls=[] for an existing active brand silently wiped the primary citation row. The audit-time symptom: 13 active root brands with brands.site_id set, bs_count=0, and zero prompts. Bleed stopped at Satori 2026-04-28 20:58 UTC by coincidence.

The Pattern A investigation is on LLMO-4621 comment 54398141.

Fix split

The LLMO-4621 invariant — "every active brand with brands.site_id set has at least one brand_sites row mirroring brands.site_id" — has two halves. Each half is enforced where the codebase rules say it belongs.

invariant enforced where mechanism
Primary brand_sites row exists when brands.site_id is set DB brands_ensure_primary_brand_site trigger in adobe/mysticat-data-service#496, mirrors the existing set_brand_site_organization_id auto-fill pattern
syncBrandSites cannot wipe the primary brand_sites row App (this PR) non-destructive syncBrandSites rewrite
Onboarding upsertBrand failures are visible to operators App (this PR) log.warnlog.error + Slack say()

mysticat-data-service/CLAUDE.md explicitly disrecommends triggers that RAISE on writes ("If you're writing a BEFORE UPDATE trigger that raises an exception, you're probably granting UPDATE when you shouldn't be"), so the non-destruction half cannot live in a BEFORE DELETE … RETURN NULL trigger. It stays in app code.

What's in this PR

src/support/brands-storage.js

  • syncBrandSites rewritten non-destructively. Replaces DELETE-then-INSERT with: load brands.site_id (the protected primary), upsert the desired URL-derived rows, then DELETE only pre-existing rows whose site_id is NOT in the desired set AND NOT the primary. The targeted DELETE uses PK ids to avoid PostgREST not.in quoting edge cases.

src/controllers/llmo/llmo-onboarding.js

  • The swallowed Failed to create initial brand in normalized table is elevated from log.warn to log.error and now also emits a Slack-visible warning via say(). Onboarding still proceeds (the SpaceCat site row exists), but operators see the failure in Coralogix and the onboarding Slack thread immediately rather than weeks later via audit query — this was the silent-failure surface that hid the original bleed.

What's NOT in this PR (deliberately)

  • ensurePrimaryBrandSite helper — superseded by the DB trigger in adobe/mysticat-data-service#496.
  • BEFORE DELETE trigger on brand_sites — would be the BEFORE … RAISE anti-pattern explicitly disrecommended by mysticat-data-service/CLAUDE.md. The non-destruction half stays in syncBrandSites instead.
  • Backfill of the 24 historical brands — covered by adobe-rnd/llmo-data-retrieval-service#1494.

Test plan

  • npx mocha test/support/brands-storage.test.js — 81 passing (3 existing error-path tests revised to match the new flow, 1 new regression test added)
  • npx mocha test/controllers/llmo/llmo-onboarding.test.js — 97 passing (the swallow assertion updated to match the new log.error + say shape)
  • npx mocha test/controllers/brands.test.js — 231 passing
  • npm run lint clean (exit 0; pre-existing deprecation warnings only, none from this change)

New / revised tests covering LLMO-4621

  • throws when brand_sites upsert fails during syncBrandSites — first error path in the new flow (desired-set upsert failure)
  • throws when brand_sites orphan-cleanup delete fails during syncBrandSites — third error path in the new flow (orphan DELETE failure)
  • throws when SELECT existing brand_sites fails during syncBrandSites orphan-cleanup — second error path in the new flow (load existing failure)
  • preserves primary brand_sites row in syncBrandSites when caller urls do not match the primary site — the actual non-destruction invariant (primary site_id is excluded from the orphan DELETE even when the caller's URLs target a different site entirely)

Deploy ordering

The two companion PRs (this one + adobe/mysticat-data-service#496) are independent and can deploy in either order or in parallel. Either one alone closes the api-service write path. Both together cover all writers (api-service + future scripts/services) AND make the invariant impossible to violate via the destructive DELETE pattern. The backfill (adobe-rnd/llmo-data-retrieval-service#1494) runs against prod after this PR merges to repair the 24 historical brands.

🤖 Generated with Claude Code

rainer-friederich and others added 2 commits April 30, 2026 09:56
…andSites (LLMO-4621)

Closes the LLMO-4621 Pattern A bleed where 13 active root brands ended up
with no `brand_sites` linkage despite `brands.site_id` being set. Root
cause: `syncBrandSites` did an unconditional DELETE then early-returned on
empty `urls` without re-INSERT, and `upsertBrand` gated the entire
brand_sites flow on `brand.urls !== undefined`. A second caller (most
plausibly `POST /v2/orgs/{orgId}/brands` from Brandalf brand-sync) passing
`urls=[]` for an existing active brand silently wiped the primary
citation row.

Three changes in `src/support/brands-storage.js`:

1. New `ensurePrimaryBrandSite` helper. After the `brands` upsert, when
   the resulting row has `site_id` set, unconditionally upsert
   `(organization_id, brand_id, site_id, paths=['/'], type='base')` with
   `ON CONFLICT (brand_id, site_id) DO NOTHING`. Runs independent of
   `brand.urls`. No-op when `site_id` is null (pending Brandalf siblings).

2. `syncBrandSites` is now non-destructive to the primary citation row.
   Replaces DELETE-then-INSERT with: load `brands.site_id` (the protected
   primary), upsert the desired URL-derived rows, then DELETE only
   pre-existing rows whose `site_id` is NOT in the desired set AND NOT
   the primary. The targeted DELETE uses PK ids to avoid PostgREST
   `not.in` quoting edge cases.

3. `upsertBrand` and `updateBrand` extend their `.select()` to return
   `site_id` so the helper can run with the resolved primary.

`src/controllers/llmo/llmo-onboarding.js`: the swallowed
`Failed to create initial brand in normalized table` is elevated from
`log.warn` to `log.error` and now also emits a Slack-visible warning
via `say()` so silent regressions surface immediately. Onboarding still
proceeds (the SpaceCat site row exists), but operators see the failure
in Coralogix and the onboarding Slack thread instead of weeks later via
audit query.

Backfill of the 24 historical brands (Pattern A + Pattern B) is tracked
separately in adobe-rnd/llmo-data-retrieval-service#1494.

A DB-side trigger asserting `EXISTS(brand_sites WHERE brand_id=b.id AND
site_id=b.site_id)` for active brands is intentionally NOT in this PR —
it ships as a follow-up against `mysticat-data-service` AFTER the
backfill has run, otherwise the migration would fail on the existing
stranded rows.

Test plan
* All 85 brands-storage unit tests green (3 existing error-path tests
  updated to match the new flow, 5 new regression tests added covering
  the four LLMO-4621 scenarios from the Jira plan).
* All 97 llmo-onboarding tests green (the swallow assertion updated to
  match the new log.error + say shape).
* `npm run lint` clean.

Refs: https://jira.corp.adobe.com/browse/LLMO-4621

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gger (LLMO-4621)

Slims the LLMO-4621 forward fix in this PR to only the parts that must
live in app code. The primary-row existence invariant moves to a DB
trigger landing in `adobe/mysticat-data-service` (companion PR).

Why split:
* `mysticat-data-service/CLAUDE.md` explicitly disrecommends triggers
  that RAISE on writes ("If you're writing a BEFORE UPDATE trigger that
  raises an exception, you're probably granting UPDATE when you
  shouldn't be"). It DOES bless auto-fill triggers like the existing
  `set_brand_site_organization_id`. Splitting puts each invariant where
  the codebase rules say it belongs.
* DB trigger covers all writers (api-service, backfill scripts, future
  services) for the existence invariant.
* `syncBrandSites` non-destruction stays in app code because the only
  pure-DB alternative is a `BEFORE DELETE … RETURN NULL` trigger, which
  is the exact anti-pattern the data-service rules disrecommend.

Removed from this PR:
* `ensurePrimaryBrandSite` helper.
* The unconditional helper calls in `upsertBrand` and `updateBrand`.
* The extended `.select('id, name, site_id')` and `.select('id, site_id')`
  in those functions (no longer needed without the helper).
* Four Part-1 unit tests covering the helper.

Kept in this PR:
* `syncBrandSites` rewritten non-destructively (the destructive
  DELETE-then-conditional-INSERT was the actual bug; protecting the
  primary `brand_sites` row from the orphan-cleanup DELETE is what stops
  the bleed at the api-service write path).
* Onboarding swallow elevation from `log.warn` to `log.error` plus a
  Slack-visible `say()` warning at `controllers/llmo/llmo-onboarding.js`
  line 1351 — independent of the data invariant; closes the
  silent-failure surface that hid the original bleed.
* Three error-path tests for the new `syncBrandSites` (upsert,
  orphan-cleanup DELETE, SELECT existing) plus the
  primary-row-protection test.

Companion PR (DB trigger): adobe/mysticat-data-service follow-up adds
`brands_ensure_primary_brand_site` — `AFTER INSERT OR UPDATE OF site_id
ON brands` that auto-inserts the matching `brand_sites` row with
`ON CONFLICT (brand_id, site_id) DO NOTHING`, mirroring the existing
`set_brand_site_organization_id` pattern.

Test plan
* `npx mocha test/support/brands-storage.test.js` — 81 passing
* `npx mocha test/controllers/llmo/llmo-onboarding.test.js` — 97 passing
* `npx mocha test/controllers/brands.test.js` — 231 passing
* `npm run lint` exit 0

Refs: https://jira.corp.adobe.com/browse/LLMO-4621

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

This PR will trigger a patch release when merged.

@rainer-friederich rainer-friederich changed the title fix(llmo): preserve primary brand_sites row in upsertBrand and syncBrandSites (LLMO-4621) fix(llmo): make syncBrandSites non-destructive to the primary citation row (LLMO-4621) Apr 30, 2026
CI's coverage gate flagged lines 202-203 and 236-237 of brands-storage.js
(the brand-fetch and sites-fetch error throws inside the rewritten
syncBrandSites). Add two tests so coverage returns to 100%.

* `throws when brands SELECT fails during syncBrandSites primary lookup`
  — covers the brandFetchError throw at line 202.
* `throws when sites SELECT fails during syncBrandSites desired-set resolution`
  — covers the sitesError throw at line 236.

Test plan
* `npm test` — 8657 passing, coverage 100%/100%/100%.

Refs: https://jira.corp.adobe.com/browse/LLMO-4621

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant