
fix(data-access): drop dangling belongs_to: Brand on BrandSemrushProject #1617

Merged
rainer-friederich merged 3 commits into main from fix/brand-semrush-project-no-brand-entity
May 22, 2026


@rainer-friederich
Contributor

What

The BrandSemrushProject schema declares belongs_to: Brand, but this package does not ship a Brand entity — no Brand model or collection is registered in entity.registry.js. With the reference in place, every BrandSemrushProject instantiation throws Collection BrandCollection not found from base.model.js's eager reference resolution in reference.js#toAccessorConfigs:126. The result: every spacecat-api-service /v2/orgs/:org/brands/:brand/semrush/* route 500s the moment it hits a real DB row.

The bug is invisible to the unit tests because test/unit/util.js#createElectroMocks stubs getCollection() to always return a placeholder — reproducing it requires the real EntityRegistry, which only happens at runtime.
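
The failure mode can be illustrated with a minimal sketch. `TinyRegistry`, `resolveReferences`, and the mock below are hypothetical stand-ins, not the package's actual `EntityRegistry` or `createElectroMocks`; they only show why a stubbed `getCollection()` can hide a missing collection that a strict registry reports.

```javascript
// Hypothetical stand-in for a strict registry: unknown collections throw.
class TinyRegistry {
  constructor(collections) {
    this.collections = new Map(Object.entries(collections));
  }

  getCollection(name) {
    if (!this.collections.has(name)) {
      throw new Error(`Collection ${name} not found`);
    }
    return this.collections.get(name);
  }
}

// Hypothetical eager resolver: every declared reference is looked up
// as soon as a model instance is constructed.
function resolveReferences(registry, references) {
  return references.map((ref) => registry.getCollection(`${ref.target}Collection`));
}

const references = [{ type: 'belongs_to', target: 'Brand' }];

// A mock whose getCollection() always returns a placeholder never throws,
// so a unit test built on it passes.
const mockRegistry = { getCollection: () => ({ placeholder: true }) };
const resolvedWithMock = resolveReferences(mockRegistry, references);

// A strict registry without a Brand collection throws on the same input.
const strictRegistry = new TinyRegistry({ BrandSemrushProjectCollection: {} });
let strictError = null;
try {
  resolveReferences(strictRegistry, references);
} catch (e) {
  strictError = e;
}
```

In this sketch the mock path resolves silently while the strict path captures the error in `strictError`, which is the gap between the unit tests and runtime described above.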

Fix

Replace .addReference('belongs_to', 'Brand') with the two things it produced internally (see schema.builder.js#addReference):

  1. An explicit UUID-validated brandId attribute.
  2. .addAllIndex(['brandId']).

This preserves the BrandSemrushProjectCollection.allByBrandId(brandId) accessor that the semrush handlers depend on (spacecat-api-service: src/support/semrush/handlers/prompts.js). The only thing lost is the navigation accessor getBrand() on the model side, which nothing consumes today (and which could not have worked without a Brand entity in the first place).

When/if a Brand entity is added to this package, the attribute + addAllIndex block can be replaced by .addReference('belongs_to', 'Brand') again, which additionally yields getBrand().
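
As a rough illustration of the explicit attribute, the sketch below validates a `brandId` value. The `isValidUUID` helper, the `brandIdAttribute` object shape, and `checkBrandId` are simplified assumptions for illustration; the package's own validator and schema builder calls may differ.

```javascript
// Simplified UUID check (assumption: any 8-4-4-4-12 hex layout is accepted).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const isValidUUID = (value) => typeof value === 'string' && UUID_RE.test(value);

// Hypothetical attribute definition: a required string validated as a UUID.
const brandIdAttribute = {
  type: 'string',
  required: true,
  validate: (value) => isValidUUID(value),
};

// Hypothetical helper that applies the definition to a record.
function checkBrandId(record) {
  const value = record.brandId;
  if (value === undefined || value === null) {
    // A missing value is only acceptable when the attribute is optional.
    return !brandIdAttribute.required;
  }
  return brandIdAttribute.validate(value);
}
```

With this definition, a record without `brandId` or with a non-UUID string is rejected, which matches the "required and UUID-validated" contract the fix aims to keep.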

How I hit this

Discovered while running the semrush proxy locally against the mysticat-data-service docker stack with real BrandSemrushProject rows seeded from dev. Every GET /v2/orgs/.../semrush/{prompts,projects} 500'd with a stack trace ending in EntityRegistry.getCollection (entity.registry.js:154). After applying this fix (initially as a node_modules patch), the same routes returned 200 with the seeded rows enriched correctly.

Test plan

  • npm test -w packages/spacecat-shared-data-access — 2050 passing, coverage thresholds met
  • npm run lint -w packages/spacecat-shared-data-access — clean
  • Updated the schema test's describe block label from "auto-generated by belongs_to Brand" to "auto-generated by addAllIndex(["brandId"])" — same attribute assertions still pass

Related

  • spacecat-api-service follow-up: bump to the new shared-lib version once this merges, so the semrush proxy works against real rows on dev/stage/prod.
  • A separate, larger PR could introduce a proper Brand entity here (model, collection, schema, tests, registry registration); that would unlock getBrand() navigation but is independent of this crash fix.

🤖 Generated with Claude Code

The BrandSemrushProject schema declared `belongs_to: Brand`, but this
package does not ship a Brand entity (no Brand model or collection is
registered in entity.registry.js). Every BrandSemrushProject
instantiation therefore threw "Collection BrandCollection not found"
from base.model.js's eager reference resolution in
reference.js#toAccessorConfigs:126 — 500-ing every spacecat-api-service
/v2/orgs/:org/brands/:brand/semrush/* route end-to-end. The bug was
invisible to the unit tests because test/unit/util.js#createElectroMocks
stubs `getCollection()` to always return a placeholder; reproducing it
requires the real EntityRegistry, which only happens at runtime.

Replace the reference with the two things it produced internally (see
schema.builder.js#addReference): an explicit UUID-validated `brandId`
attribute and an `addAllIndex(['brandId'])`. This preserves the
`BrandSemrushProjectCollection.allByBrandId(brandId)` accessor that the
semrush handlers depend on (see spacecat-api-service:
src/support/semrush/handlers/prompts.js). The only thing lost is the
navigation accessor `getBrand()` on the model side, which nothing
currently consumes (a `Brand` entity would need to exist for that to
work in the first place).

When/if a Brand entity is added to this package, the attribute +
addAllIndex block can be replaced by `addReference('belongs_to', 'Brand')`
again, which additionally yields `getBrand()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@MysticatBot MysticatBot left a comment


Hey @rainer-friederich,

Strengths

  • Correct root-cause identification and fix (brand-semrush-project.schema.js:29-54): The addReference('belongs_to', 'Brand') call invokes reference.js#toAccessorConfigs at instantiation time, calling registry.getCollection('BrandCollection') which throws because no Brand entity is registered. This is a real 500-in-production fix, well-traced through the stack.
  • Validation contract preserved (brand-semrush-project.schema.js:48-51): The brandId attribute retains UUID validation via isValidUUID(value) with required: true, matching exactly what addReference would have generated internally.
  • Minimal blast radius: Only the schema declaration and its unit test change. No model, collection, or handler code is touched. The allByBrandId accessor that consumers depend on continues to exist.
  • Forward compatibility documented: The comment explains a clear path back to addReference('belongs_to', 'Brand') once the Brand entity is registered in this package.

Issues

Important (Should Fix)

Index type mismatch - addAllIndex vs the original belongs_to index (brand-semrush-project.schema.js:55)

The comment block claims the fix "mirrors what addReference('belongs_to', 'Brand') would have produced internally," but the index structures are fundamentally different:

  • addReference('belongs_to', 'Brand') produces a GSI with pk: { composite: ['brandId'] } and sk: { composite: ['updatedAt'] }, type BELONGS_TO. This partitions by brandId - efficient O(1) lookups for a single brand's projects.
  • .addAllIndex(['brandId']) produces a GSI with pk: { template: 'ALL_BRAND_SEMRUSH_PROJECTS' } and sk: { composite: ['brandId'] }, type ALL. This puts all records in a single partition sorted by brandId - effectively a scan filtered by sort-key prefix.

Why it matters: Although both produce an allByBrandId accessor, the query semantics differ. With the ALL index, querying for a specific brand's projects requires scanning the entire partition. With the BELONGS_TO index, it is a direct partition-key lookup. At scale, the ALL-index approach concentrates all records in one partition (DynamoDB's 10 GB partition limit, write throttling), and performance degrades proportionally to total record count.

How to fix: Use .addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] }) instead of .addAllIndex(['brandId']). This creates an index with brandId as partition key and updatedAt as sort key - semantically equivalent to what belongs_to would have produced, without requiring a registered Brand entity. The allByBrandId and allByBrandIdAndUpdatedAt accessors will both be generated.
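
The difference between the two shapes can be sketched with plain objects. The key layouts follow the description above; the `accessorNames` helper is a hypothetical approximation of how accessor names might be derived from an index's keys, not the package's implementation.

```javascript
// Index shapes as described in the review (plain objects for illustration only).
const allIndex = {
  type: 'ALL',
  pk: { template: 'ALL_BRAND_SEMRUSH_PROJECTS' },
  sk: { composite: ['brandId'] },
};

const partitionedIndex = {
  type: 'partitioned-by-brandId',
  pk: { composite: ['brandId'] },
  sk: { composite: ['updatedAt'] },
};

const capitalize = (s) => s.charAt(0).toUpperCase() + s.slice(1);

// Hypothetical derivation: one accessor per key prefix, joined with "And".
function accessorNames(index) {
  const keys = [...(index.pk.composite ?? []), ...(index.sk.composite ?? [])];
  return keys.map((_, i) => `allBy${keys.slice(0, i + 1).map(capitalize).join('And')}`);
}

const allIndexAccessors = accessorNames(allIndex);
const partitionedAccessors = accessorNames(partitionedIndex);
```

Under these assumptions both shapes expose `allByBrandId`, while only the partitioned shape also exposes `allByBrandIdAndUpdatedAt`; the shapes differ in where `brandId` sits (sort key versus partition key).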

Minor (Nice to Have)

  1. Verbose inline comment (brand-semrush-project.schema.js:29-47): 18 lines of comment for a 4-line code change. The bug explanation is thoroughly documented in the PR description and commit message. Trim to 3-4 lines: FK to Brand entity not shipped by this package, mirrors addReference output, replaceable once Brand entity is registered.

  2. Test describe label is inaccurate (brand-semrush-project.schema.test.js:130): The label "auto-generated by addAllIndex" is misleading since addAllIndex does not generate the brandId attribute - it only creates an index referencing the already-declared attribute. Something like brandId attribute (explicit FK to brands table) would be accurate regardless of index implementation.

Recommendations

  • Add a test that verifies the allByBrandId accessor is actually produced by the schema build. The current test validates the attribute contract but does not assert the index/accessor combination, which is what downstream consumers depend on.

Assessment

Ready to merge? No - with one fix needed.

The Important finding (index type mismatch) changes query performance characteristics compared to the original belongs_to behavior. The fix is a one-line change from .addAllIndex(['brandId']) to .addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] }). The minor issues are cosmetic and should not block merge.

Next Steps

  1. Replace .addAllIndex(['brandId']) with .addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] }) to produce the correct GSI structure.
  2. Optionally trim the inline comment and fix the test label.

@MysticatBot MysticatBot added the ai-reviewed Reviewed by AI label May 22, 2026

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 0m 41s | Cost: $3.93 | Commit: 3c57c08aecf8b6e3bcbeec6f4b8a17e6820c88b7
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@aliciadriani aliciadriani self-requested a review May 22, 2026 09:23
aliciadriani previously approved these changes May 22, 2026
Collaborator

@aliciadriani aliciadriani left a comment


PR Review

Author: @rainer-friederich Scope: 2 files, +27/-5, schema fix only.

Summary

Minimal, correct crash fix. BrandSemrushProject declared belongs_to: Brand but spacecat-shared-data-access has no Brand entity registered — so every instantiation threw Collection BrandCollection not found from the schema builder's eager reference resolver, 500ing every /semrush/* route that touched a real DB row.

The fix manually expands the schema to add the brandId attribute plus addAllIndex(['brandId']), preserving the allByBrandId accessor and findBySlice that the semrush handlers depend on. The only thing dropped is getBrand(), which was always broken (no Brand entity to resolve against) and — verified — has no consumers anywhere in the org.

Verification performed

  • CI: all 4 checks green (CLA, Semantic Release, Kodiak, Test)
  • HEAD verified: local clone at 3c57c08 matches PR head
  • Test suite: 2050 passed, 0 failed, 0 skipped
  • Schema instantiation: exercised by "initializes the BrandSemrushProject instance correctly" — the Collection BrandCollection not found crash path is gone
  • getBrand() consumer search: org-wide search across adobe and adobe-rnd returned zero callers of BrandSemrushProject.getBrand(). All 100 getBrand-named hits were unrelated (getBrandForOrgSite, getBrandById, getBrandSlug, getBrandConfig, etc.). index.d.ts in this PR also explicitly omits getBrand() from the TypeScript surface with an explanatory comment, so no typed consumer could have depended on it.

Must Fix

None.

Should Fix

None.

Nits / clarifications worth recording

Two small clarifications on how the schema actually behaves — not blocking, but worth noting in the PR record so future readers don't have to re-derive them:

  1. Index-type equivalence is not exact. addReference('belongs_to', 'Brand') would have produced a BELONGS_TO-type index (pk: brandId, sk: updatedAt). The PR uses addAllIndex(['brandId']) which is an ALL-type index (pk: fixed entity template, sk: brandId). Structurally different.

    This has no runtime effect here because BrandSemrushProject is PostgREST-backed (mysticat-data-service), and the PostgREST query path in queryByIndexKeys applies all key fields as SQL WHERE filters regardless of index type. So the manual expansion is functionally equivalent for this entity, just not structurally identical to what the macro would emit. If anyone later migrates this entity to DynamoDB, the index shape would need a second look.

  2. findBySlice is a real indexed lookup, not a partition scan + filter. The PostgREST path in #queryPage calls #applyKeyFilters(query, keys), which maps all three keys to SQL WHERE conditions. The actual query is:

   SELECT * FROM brand_to_semrush_projects
   WHERE brand_id = ? AND semrush_location_id = ? AND language = ?
   LIMIT 1

The DB's uq_brand_to_semrush_slice UNIQUE(brand_id, semrush_location_id, language) covers all three columns, so this is a proper indexed key lookup. No in-memory filtering or partition scan involved.
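
A minimal sketch of how key fields might be turned into SQL WHERE conditions follows. `toSnakeCase` and `buildWhere` are hypothetical helpers, not the package's `#applyKeyFilters`, and the key values are made up for illustration; the point is only that every key becomes an equality filter regardless of the declared index type.

```javascript
// Convert a camelCase field name to snake_case (e.g. brandId -> brand_id).
const toSnakeCase = (name) => name.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);

// Hypothetical helper: build a parameterized WHERE clause from key fields.
function buildWhere(keys) {
  const entries = Object.entries(keys);
  const clause = entries
    .map(([field], i) => `${toSnakeCase(field)} = $${i + 1}`)
    .join(' AND ');
  return { clause, params: entries.map(([, value]) => value) };
}

// Illustrative key values only; they do not come from real data.
const sliceFilter = buildWhere({
  brandId: 'brand-1',
  semrushLocationId: 1,
  language: 'en',
});
```

Here `sliceFilter.clause` names the same three columns as the query shown above, with the values carried separately as parameters.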

What's good

  • The long comment block in the schema is the right call. It explains the problem, what was lost, what's preserved, and the exact migration path back to addReference if a Brand entity is ever added. Prevents a future contributor from "fixing" it back to the broken state.
  • The fix is exactly as minimal as it should be — no scope creep, no unrelated changes.
  • addAllIndex(['semrushProjectId']) is correctly preserved alongside the new brandId index.
  • The test label update accurately reflects what actually generates the attribute now.

Verdict: ✅ Safe to merge.

Addresses MysticatBot review on #1617:

1. Swap `.addAllIndex(['brandId'])` for
   `.addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] })`.
   The new shape mirrors what `addReference('belongs_to', 'Brand')`
   would have produced internally (pk: composite['brandId'], sk:
   composite['updatedAt']) rather than the ALL-typed template-pk index
   the previous form generated. Same `allByBrandId` accessor is
   produced; additionally yields `allByBrandIdAndUpdatedAt` /
   `findByBrandIdAndUpdatedAt` which the original belongs_to would
   also produce.

2. Trim the inline comment from 18 lines to 7 — the full bug context
   stays in the PR description / commit message.

3. Relabel the brandId describe block in the schema test from
   "auto-generated by addAllIndex(['brandId'])" to "explicit FK to
   brands table" — the attribute is declared explicitly; the index
   only references it.

4. Add composite-key accessor smoke tests:
   `allByBrandIdAndUpdatedAt` and `findByBrandIdAndUpdatedAt`. The
   existing single-key tests already cover `allByBrandId` /
   `findByBrandId`; the new tests catch a regression where the
   sort-key composite is accidentally dropped without breaking the
   single-key forms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

This PR will trigger a patch release when merged.

@rainer-friederich
Contributor Author

Thanks for the review. Addressed all three points in 7e84b4ee:

Important — index shape mismatch. Swapped .addAllIndex(['brandId']) for .addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] }). This mirrors the (pk: composite['brandId'], sk: composite['updatedAt']) shape that addReference('belongs_to', 'Brand') would have produced internally, rather than the ALL-typed (pk: template, sk: composite['brandId']) shape the previous form generated. Same allByBrandId accessor is produced; additionally yields allByBrandIdAndUpdatedAt / findByBrandIdAndUpdatedAt which the original belongs_to would also produce.

One caveat on the "ALL index = partition concentration" concern: v3 of this package is PostgREST/Postgres-backed (DynamoDB v2 is retired), so the GSI shape doesn't translate to a partition concept at the storage layer — the actual hot-path index lives in the brand_to_semrush_projects SQL migration. But the structural argument for mirroring the belongs_to output exactly still stands: addIndex is the more honest expression of intent, and it keeps the door open for a future DynamoDB-backed consumer without a schema rewrite.

Minor — verbose comment. Trimmed from 18 lines to 7. Full bug context kept in the PR description and commit message.

Minor — test describe label. Relabeled "brandId attribute (auto-generated by addAllIndex(["brandId"]))" to "brandId attribute (explicit FK to brands table)". Correct now; the attribute is declared explicitly, the index only references it.

Recommendation — accessor test. The existing auto-generated index accessors block in brand-semrush-project.collection.test.js already covers allByBrandId / findByBrandId / allBySemrushProjectId / findBySemrushProjectId (existence + delegation). I added a new assertion for allByBrandIdAndUpdatedAt / findByBrandIdAndUpdatedAt so a future regression that drops the sort-key composite is caught.
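
A smoke check of this kind could look roughly like the sketch below. `makeCollection` and `smokeCheck` are hypothetical stand-ins that do not use the package's real collection class or test helpers; they only show the idea of asserting that the composite-key accessor exists and forwards both key parts.

```javascript
// Hypothetical stand-in collection: each accessor records the keys it
// was called with and returns an empty result.
function makeCollection(accessors) {
  const calls = [];
  const collection = { calls };
  for (const [name, fields] of Object.entries(accessors)) {
    collection[name] = (...values) => {
      const keys = Object.fromEntries(fields.map((f, i) => [f, values[i]]));
      calls.push({ name, keys });
      return [];
    };
  }
  return collection;
}

const collection = makeCollection({
  allByBrandId: ['brandId'],
  allByBrandIdAndUpdatedAt: ['brandId', 'updatedAt'],
});

// Smoke check: the accessor exists and forwards both key parts.
function smokeCheck(target) {
  if (typeof target.allByBrandIdAndUpdatedAt !== 'function') {
    throw new Error('allByBrandIdAndUpdatedAt is missing');
  }
  target.allByBrandIdAndUpdatedAt('brand-1', '2026-05-22T00:00:00Z');
  return target.calls[target.calls.length - 1].keys;
}
```

If the sort-key composite were dropped, the accessor would no longer be generated and `smokeCheck` would throw, while a check limited to `allByBrandId` would still pass.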

2051 tests passing, lint clean. Ready for re-review.


@MysticatBot MysticatBot left a comment


Hey @rainer-friederich,

Strengths

  • All prior review findings addressed - the index shape, comment verbosity, test label, and accessor coverage were all resolved cleanly in a single follow-up commit.
  • Index shape now mirrors belongs_to semantics (brand-semrush-project.schema.js:34): .addIndex({ composite: ['brandId'] }, { composite: ['updatedAt'] }) produces the correct (pk: brandId, sk: updatedAt) GSI structure, matching what addReference('belongs_to', 'Brand') would generate internally. This also yields the allByBrandIdAndUpdatedAt / findByBrandIdAndUpdatedAt accessors for free.
  • Comment trimmed to essential context (brand-semrush-project.schema.js:21-27): 7 lines covering the why, the workaround, and the path back to addReference - exactly right for a future reader.
  • Regression coverage for composite-key accessors (brand-semrush-project.collection.test.js:55-63): The new assertion catches a future regression where the updatedAt sort-key composite is accidentally dropped.
  • Correct root-cause fix: Replacing addReference('belongs_to', 'Brand') with an explicit brandId attribute + index eliminates the Collection BrandCollection not found crash at model instantiation while preserving the accessor surface that downstream handlers depend on.

Assessment

Ready to merge? Yes

The crash fix is correct, minimal, and all prior review feedback has been addressed. The 2051 tests pass, lint is clean, and the schema now accurately reflects intent without depending on a non-existent Brand entity.


Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 22s | Cost: $0.75 | Commit: 7e84b4eea2ea6a945cc72191edb32892576dcc5a
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@rainer-friederich rainer-friederich merged commit b52d815 into main May 22, 2026
5 checks passed
@rainer-friederich rainer-friederich deleted the fix/brand-semrush-project-no-brand-entity branch May 22, 2026 09:56
solaris007 pushed a commit that referenced this pull request May 22, 2026
## [@adobe/spacecat-shared-data-access-v3.70.1](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-data-access-v3.70.0...@adobe/spacecat-shared-data-access-v3.70.1) (2026-05-22)

### Bug Fixes

* **data-access:** drop dangling belongs_to: Brand on BrandSemrushProject ([#1617](#1617)) ([b52d815](b52d815))
@solaris007
Member

🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.70.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

rainer-friederich added a commit to adobe/spacecat-api-service that referenced this pull request May 22, 2026
Picks up adobe/spacecat-shared#1617 — drops the dangling
`belongs_to: Brand` reference on BrandSemrushProject in favour of an
explicit (brandId, updatedAt) index. Without this, every
BrandSemrushProject instantiation threw "Collection BrandCollection
not found" at runtime because no Brand entity is registered in the
data-access package; the failure 500-ed every /v2/orgs/.../semrush/*
route the moment a real row was returned from PostgREST. Now the
allByBrandId / findBySlice paths work end-to-end against the
mysticat-data-service stack.

Verified locally: 179 semrush-related tests passing; GET
/v2/orgs/.../semrush/projects returns the seeded
brand_to_semrush_projects rows with enrichment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>