fix(llmo): make syncBrandSites non-destructive to the primary citation row (LLMO-4621)#2295
Open
rainer-friederich wants to merge 3 commits intomainfrom
Open
fix(llmo): make syncBrandSites non-destructive to the primary citation row (LLMO-4621)#2295rainer-friederich wants to merge 3 commits intomainfrom
rainer-friederich wants to merge 3 commits intomainfrom
Conversation
…andSites (LLMO-4621)
Closes the LLMO-4621 Pattern A bleed where 13 active root brands ended up
with no `brand_sites` linkage despite `brands.site_id` being set. Root
cause: `syncBrandSites` did an unconditional DELETE then early-returned on
empty `urls` without re-INSERT, and `upsertBrand` gated the entire
brand_sites flow on `brand.urls !== undefined`. A second caller (most
plausibly `POST /v2/orgs/{orgId}/brands` from Brandalf brand-sync) passing
`urls=[]` for an existing active brand silently wiped the primary
citation row.
Three changes in `src/support/brands-storage.js`:
1. New `ensurePrimaryBrandSite` helper. After the `brands` upsert, when
the resulting row has `site_id` set, unconditionally upsert
`(organization_id, brand_id, site_id, paths=['/'], type='base')` with
`ON CONFLICT (brand_id, site_id) DO NOTHING`. Runs independent of
`brand.urls`. No-op when `site_id` is null (pending Brandalf siblings).
2. `syncBrandSites` is now non-destructive to the primary citation row.
Replaces DELETE-then-INSERT with: load `brands.site_id` (the protected
primary), upsert the desired URL-derived rows, then DELETE only
pre-existing rows whose `site_id` is NOT in the desired set AND NOT
the primary. The targeted DELETE uses PK ids to avoid PostgREST
`not.in` quoting edge cases.
3. `upsertBrand` and `updateBrand` extend their `.select()` to return
`site_id` so the helper can run with the resolved primary.
`src/controllers/llmo/llmo-onboarding.js`: the swallowed
`Failed to create initial brand in normalized table` is elevated from
`log.warn` to `log.error` and now also emits a Slack-visible warning
via `say()` so silent regressions surface immediately. Onboarding still
proceeds (the SpaceCat site row exists), but operators see the failure
in Coralogix and the onboarding Slack thread instead of weeks later via
audit query.
Backfill of the 24 historical brands (Pattern A + Pattern B) is tracked
separately in adobe-rnd/llmo-data-retrieval-service#1494.
A DB-side trigger asserting `EXISTS(brand_sites WHERE brand_id=b.id AND
site_id=b.site_id)` for active brands is intentionally NOT in this PR —
it ships as a follow-up against `mysticat-data-service` AFTER the
backfill has run, otherwise the migration would fail on the existing
stranded rows.
Test plan
* All 85 brands-storage unit tests green (3 existing error-path tests
updated to match the new flow, 5 new regression tests added covering
the four LLMO-4621 scenarios from the Jira plan).
* All 97 llmo-onboarding tests green (the swallow assertion updated to
match the new log.error + say shape).
* `npm run lint` clean.
Refs: https://jira.corp.adobe.com/browse/LLMO-4621
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gger (LLMO-4621)
Slims the LLMO-4621 forward fix in this PR to only the parts that must
live in app code. The primary-row existence invariant moves to a DB
trigger landing in `adobe/mysticat-data-service` (companion PR).
Why split:
* `mysticat-data-service/CLAUDE.md` explicitly disrecommends triggers
that RAISE on writes ("If you're writing a BEFORE UPDATE trigger that
raises an exception, you're probably granting UPDATE when you
shouldn't be"). It DOES bless auto-fill triggers like the existing
`set_brand_site_organization_id`. Splitting puts each invariant where
the codebase rules say it belongs.
* DB trigger covers all writers (api-service, backfill scripts, future
services) for the existence invariant.
* `syncBrandSites` non-destruction stays in app code because the only
pure-DB alternative is a `BEFORE DELETE … RETURN NULL` trigger, which
is the exact anti-pattern the data-service rules disrecommend.
Removed from this PR:
* `ensurePrimaryBrandSite` helper.
* The unconditional helper calls in `upsertBrand` and `updateBrand`.
* The extended `.select('id, name, site_id')` and `.select('id, site_id')`
in those functions (no longer needed without the helper).
* Four Part-1 unit tests covering the helper.
Kept in this PR:
* `syncBrandSites` rewritten non-destructively (the destructive
DELETE-then-conditional-INSERT was the actual bug; protecting the
primary `brand_sites` row from the orphan-cleanup DELETE is what stops
the bleed at the api-service write path).
* Onboarding swallow elevation from `log.warn` to `log.error` plus a
Slack-visible `say()` warning at `controllers/llmo/llmo-onboarding.js`
line 1351 — independent of the data invariant; closes the
silent-failure surface that hid the original bleed.
* Three error-path tests for the new `syncBrandSites` (upsert,
orphan-cleanup DELETE, SELECT existing) plus the
primary-row-protection test.
Companion PR (DB trigger): adobe/mysticat-data-service follow-up adds
`brands_ensure_primary_brand_site` — `AFTER INSERT OR UPDATE OF site_id
ON brands` that auto-inserts the matching `brand_sites` row with
`ON CONFLICT (brand_id, site_id) DO NOTHING`, mirroring the existing
`set_brand_site_organization_id` pattern.
Test plan
* `npx mocha test/support/brands-storage.test.js` — 81 passing
* `npx mocha test/controllers/llmo/llmo-onboarding.test.js` — 97 passing
* `npx mocha test/controllers/brands.test.js` — 231 passing
* `npm run lint` exit 0
Refs: https://jira.corp.adobe.com/browse/LLMO-4621
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This PR will trigger a patch release when merged. |
CI's coverage gate flagged lines 202-203 and 236-237 of brands-storage.js (the brand-fetch and sites-fetch error throws inside the rewritten syncBrandSites). Add two tests so coverage returns to 100%. * `throws when brands SELECT fails during syncBrandSites primary lookup` — covers the brandFetchError throw at line 202. * `throws when sites SELECT fails during syncBrandSites desired-set resolution` — covers the sitesError throw at line 236. Test plan * `npm test` — 8657 passing, coverage 100%/100%/100%. Refs: https://jira.corp.adobe.com/browse/LLMO-4621 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related Issues
brand_siteslinkage despitebrands.site_idbeing setbrands_ensure_primary_brand_sitetriggerProblem
syncBrandSites(src/support/brands-storage.js) did an unconditional DELETE then early-returned on emptyurlswithout re-INSERT. A second caller — most plausiblyPOST /v2/orgs/{orgId}/brandsfrom Brandalf brand-sync — re-upserting an active root withurls=[]for an existing active brand silently wiped the primary citation row. The audit-time symptom: 13 active root brands withbrands.site_idset,bs_count=0, and zero prompts. Bleed stopped at Satori 2026-04-28 20:58 UTC by coincidence.The Pattern A investigation is on LLMO-4621 comment 54398141.
Fix split
The LLMO-4621 invariant — "every active brand with
brands.site_idset has at least onebrand_sitesrow mirroringbrands.site_id" — has two halves. Each half is enforced where the codebase rules say it belongs.brand_sitesrow exists whenbrands.site_idis setbrands_ensure_primary_brand_sitetrigger in adobe/mysticat-data-service#496, mirrors the existingset_brand_site_organization_idauto-fill patternsyncBrandSitescannot wipe the primarybrand_sitesrowsyncBrandSitesrewriteupsertBrandfailures are visible to operatorslog.warn→log.error+ Slacksay()mysticat-data-service/CLAUDE.mdexplicitly disrecommends triggers that RAISE on writes ("If you're writing a BEFORE UPDATE trigger that raises an exception, you're probably granting UPDATE when you shouldn't be"), so the non-destruction half cannot live in aBEFORE DELETE … RETURN NULLtrigger. It stays in app code.What's in this PR
src/support/brands-storage.jssyncBrandSitesrewritten non-destructively. Replaces DELETE-then-INSERT with: loadbrands.site_id(the protected primary), upsert the desired URL-derived rows, then DELETE only pre-existing rows whosesite_idis NOT in the desired set AND NOT the primary. The targeted DELETE uses PKids to avoid PostgRESTnot.inquoting edge cases.src/controllers/llmo/llmo-onboarding.jsFailed to create initial brand in normalized tableis elevated fromlog.warntolog.errorand now also emits a Slack-visible warning viasay(). Onboarding still proceeds (the SpaceCat site row exists), but operators see the failure in Coralogix and the onboarding Slack thread immediately rather than weeks later via audit query — this was the silent-failure surface that hid the original bleed.What's NOT in this PR (deliberately)
ensurePrimaryBrandSitehelper — superseded by the DB trigger in adobe/mysticat-data-service#496.BEFORE DELETEtrigger onbrand_sites— would be theBEFORE … RAISEanti-pattern explicitly disrecommended bymysticat-data-service/CLAUDE.md. The non-destruction half stays insyncBrandSitesinstead.Test plan
npx mocha test/support/brands-storage.test.js— 81 passing (3 existing error-path tests revised to match the new flow, 1 new regression test added)npx mocha test/controllers/llmo/llmo-onboarding.test.js— 97 passing (the swallow assertion updated to match the newlog.error+sayshape)npx mocha test/controllers/brands.test.js— 231 passingnpm run lintclean (exit 0; pre-existing deprecation warnings only, none from this change)New / revised tests covering LLMO-4621
throws when brand_sites upsert fails during syncBrandSites— first error path in the new flow (desired-set upsert failure)throws when brand_sites orphan-cleanup delete fails during syncBrandSites— third error path in the new flow (orphan DELETE failure)throws when SELECT existing brand_sites fails during syncBrandSites orphan-cleanup— second error path in the new flow (load existing failure)preserves primary brand_sites row in syncBrandSites when caller urls do not match the primary site— the actual non-destruction invariant (primary site_id is excluded from the orphan DELETE even when the caller's URLs target a different site entirely)Deploy ordering
The two companion PRs (this one + adobe/mysticat-data-service#496) are independent and can deploy in either order or in parallel. Either one alone closes the api-service write path. Both together cover all writers (api-service + future scripts/services) AND make the invariant impossible to violate via the destructive DELETE pattern. The backfill (adobe-rnd/llmo-data-retrieval-service#1494) runs against prod after this PR merges to repair the 24 historical brands.
🤖 Generated with Claude Code