Skip to content

feat(data-access): add Preflight entity (SITES-45417)#1622

Merged
GeezerPelletier merged 12 commits into
mainfrom
sites-44675-preflight-entity
May 27, 2026
Merged

feat(data-access): add Preflight entity (SITES-45417)#1622
GeezerPelletier merged 12 commits into
mainfrom
sites-44675-preflight-entity

Conversation

@GeezerPelletier
Copy link
Copy Markdown
Contributor

@GeezerPelletier GeezerPelletier commented May 22, 2026

Summary

  • Adds Preflight entity (model, collection, schema) to spacecat-shared-data-access
  • Tracks on-demand preflight runs per site: status, result, error, timing, and creator info
  • asyncJobId hidden from toJSON() as an internal FK reference; cleanup via ON DELETE CASCADE on the backing AsyncJob
  • PreflightCollection.allBySiteIdAndUrl(siteId, url?) for filtered lookups (url filter applied in-memory)
  • Unit tests: 100% line/statement/branch coverage

Related

  • SITES-44675 (mysticat-data-service DB migration - prerequisite, merged)
  • SITES-44686 (spacecat-api-service REST controllers - depends on this)

Test plan

  • npm test -w packages/spacecat-shared-data-access -- 2225 passing, 0 failing
  • npm run lint -w packages/spacecat-shared-data-access -- clean
  • Integration test requires Docker + ECR -- will be validated in CI

Generated with Claude Code

Adds the Preflight entity (model, collection, schema) for SITES-44675.
Preflights track on-demand preflight runs per site with 7-day TTL.
asyncJobId is hidden from toJSON() as it is an internal FK reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
withRecordExpiry is a DynamoDB-era helper marked postgrestIgnore in v3 —
it never writes to PostgreSQL. The preflights table has no record_expires_at
column; cleanup is handled by ON DELETE CASCADE on async_job_id.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

This PR will trigger a minor release when merged.

Copy link
Copy Markdown
Member

@ekremney ekremney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Guy — reviewing this as the JS-layer counterpart to mysticat-data-service#622, so a couple of the items below cross-reference what's landing on the DB side. Asks are framed around what's cheap to lock in while the entity still has zero consumers; nothing here is shipping-broken.

What's solid here

  • Removing .withRecordExpiry(7) (commit da3b03ec) was the right call — it's postgrestIgnore: true and doesn't reach Postgres in v3. The CASCADE-only lifecycle matches ADR 002 and the data-service migration.
  • Reusing async_job_status at the DB level keeps the status vocabulary unified across async_jobs and preflights.
  • The in-memory allBySiteIdAndUrl filter is fine at low volume; only its JSDoc justification is wrong (item 2 below).
  • Fixture FK references all land on existing rows in async-jobs.fixture.js (b3b1c2e0…, c4d5e6f7…, 12d221bb…) and sites.fixture.js. The generic FK-deferral retry in test/it/util/seed.js will pick them up.
  • The toJSON() override stripping asyncJobId works — it just shouldn't be needed (item 6).

Significant items

1. Missing index.d.ts — every other entity has one

find packages/spacecat-shared-data-access/src/models -name index.d.ts | wc -l returns 41. find … -name index.js -not -path "*/base/*" | wc -l also returns 41. This PR adds the 42nd entity's index.js but no .d.ts, so the barrel exports Preflight and PreflightCollection as untyped values.

Every downstream consumer (spacecat-api-service, spacecat-audit-worker, spacecat-import-worker) authors against these declarations — that's how getSite(), allBySiteId(), the getters/setters surface in their IDEs. Without a .d.ts, the first PR in those repos that touches Preflight either flies blind or writes a duplicate type by hand.

packages/spacecat-shared-data-access/src/models/page-intent/index.d.ts is the minimal template — 34 lines, two interfaces. The Preflight version needs the same plus: getAsyncJobId(): string, getCreatedBy(): { email: string; displayName?: string }, getStartedAt() / setStartedAt(), getEndedAt() / setEndedAt(), getResult() / setResult(), getError() / setError(), getStatus() / setStatus(), and the collection's allBySiteIdAndUrl(siteId: string, url?: string). If you add belongs_to AsyncJob per item 4, also getAsyncJob() and allByAsyncJobId() / findByAsyncJobId().

2. The 7-day TTL story is stale in three places — and one of them is the changelog

PR body says "7-day TTL via withRecordExpiry(7)". Commit da3b03ec removed that call. preflight.collection.js:18-19 carries the same stale assumption into a load-bearing design comment:

"URL filtering is applied in-memory; the 7-day TTL bounds per-site volume to a manageable size, making a composite (site_id, url) index unnecessary at this stage."

Two separate problems:

  1. The reasoning chain in the JSDoc is wrong, not just the literal "7-day" word. The TTL bounds-volume claim is the justification for skipping a (site_id, url) index. Without TTL, the bound doesn't exist, so the justification doesn't hold either. Either the comment says "fine at expected volume, revisit when…" or the index lands.
  2. semantic-release-comment-action on this PR says "This PR will trigger a minor release when merged." Release notes get derived from the PR body. Today's body will tell every consuming repo that Preflight has a 7-day TTL it doesn't have. Six months from now somebody reading the changelog designs around a guarantee that doesn't exist.

Separate concern the three PRs collectively don't address: nothing actually deletes AsyncJobs on a schedule. AsyncJob.recordExpiresAt is postgrestIgnore: true and never reaches Postgres. Today both tables grow indefinitely; the CASCADE only fires when something else removes the AsyncJob. If the team intends a sweeper, worth tracking; if not, the JSDoc should be honest.

3. Preflight.Status duplicates AsyncJob.Status — name the load-time failure case

Forward drift is the obvious risk; the sharper failure mode is load-time:

  • Postgres column type is async_job_status. A future migration that adds PAUSED to that enum will let Postgres accept INSERT … status='PAUSED' into preflights.
  • Preflight.Status (in JS) is the four current values, enforced by BaseCollection.#validateItem against attribute.type (array). Round-tripping that row through setStatus('PAUSED').save() then fails locally, even though it was valid in Postgres a moment before.
  • The reverse is worse: a JS-only addition to Preflight.Status lets a caller setStatus('PAUSED'), validation passes, PostgREST write fails with a Postgres enum-violation surfaced as DataAccessError with a cryptic cause.

Two fixes, pick one:

  • Reuse: import AsyncJob from '../async-job/async-job.model.js'; and use Object.values(AsyncJob.Status) in the schema; expose static Status = AsyncJob.Status on Preflight if call sites want the namespaced name. This mirrors the DB-side decision to reuse async_job_status directly.
  • Lock with a test: expect(Preflight.Status).to.deep.equal(AsyncJob.Status) in the model test. CI fails the moment either side drifts.

The reuse path is structurally better; the test is the cheap second-best.

4. Now that the DB side enforces 1-to-1 on async_job_id, the JS layer should reflect it

The data-service PR now has UNIQUE on async_job_id (per your update on #622). The JS schema still models asyncJobId as a plain string attribute with hidden: true, not a belongs_to AsyncJob reference. That mismatch costs:

  • No auto-generated preflight.getAsyncJob() traversal.
  • No auto-generated findByAsyncJobId(id) on PreflightCollection.
  • When the API service inevitably needs to walk AsyncJob → Preflight (per ADR 002, the AsyncJob lifecycle drives Preflight status), it'll fall back to raw postgrestClient.from('preflights').eq('async_job_id', id) — bypassing validation, bypassing the toJSON() override (so asyncJobId leaks if anyone forgets to strip it manually), bypassing any future entity hooks.

Concrete change:

.addReference('belongs_to', 'AsyncJob', [], { required: true })

and drop the manual addAttribute('asyncJobId', …) — the reference auto-generates the asyncJobId attribute with the right type/validation. The toJSON() override still needs to strip asyncJobId (since hidden: true is inert — item 6), or — better — switch to a getAsyncJob()-based reference walk and stop exposing the FK at all.

5. createdBy validator + properties block — JSONB is locked in, the contract isn't

You've clarified on #622 that createdBy is intentionally an event-log-style audit snapshot of the IMS identity at trigger time. Fair — the JSONB shape is settled. But on the JS side, the contract that should be enforceable in pure model-layer code (no DB cost) currently isn't:

.addAttribute('createdBy', {
  type: 'map',
  required: true,
  properties: {
    email: { type: 'string' },
    displayName: { type: 'string' },
  },
  validate: (value) => !value || (isObject(value) && !!value.email),
})

Two problems:

  1. The properties block is decorative. BaseCollection.#validateItem (base.collection.js:377) only checks isObject(value) for type: 'map'. grep -n properties src/models/base/base.collection.js returns nothing. The { email, displayName } shape declared above is consulted by no validator — pure documentation that looks executable.
  2. The custom validate only checks !!value.email. displayName isn't validated, email isn't constrained to be a non-empty string ({ email: 12345 } passes, { email: '' } correctly fails on falsy). A caller can persist { email: 12345, displayName: ['oops'] }.

You've deferred CHECK (created_by ? 'email') on the DB side to "if the pattern spreads repo-wide." That's defensible at the SQL boundary. It's more defensible at the JS boundary, because:

  • This is the first entity to use the JSON shape. Whatever lands here is the precedent the next entity author copies.
  • The validator change is one line: validate: (v) => isObject(v) && typeof v.email === 'string' && v.email.length > 0 && (v.displayName === undefined || typeof v.displayName === 'string'). No follow-up needed.

If you'd rather drop the properties block instead (since nothing reads it), that's also fine — but having it sit there looking authoritative is worse than not having it.

6. hidden: true and properties: {…} are inert — same class of bug, surface the pattern

Item 5 covered properties. Same pathology bites hidden:

BaseModel.toJSON() (base.model.js:351-361) iterates schema.getAttributes() and returns every defined attribute. It doesn't check attribute.hidden. Grep confirms: nothing in src/ consumes the flag. Site declares externalOwnerId / externalSiteId as hidden: true and doesn't override toJSON(), so those fields do appear in JSON output (pre-existing bug, not yours).

The data-access CLAUDE.md attribute table documents both hidden and properties as if they work. They don't.

For this PR specifically: either drop the hidden: true from asyncJobId (since the manual delete json.asyncJobId in the override does the real work) and drop the properties: blocks from createdBy and error, or call out that they're aspirational. Schemas that look like they document a contract but don't are worse than schemas that don't pretend.

Out of scope but worth tracking: fixing BaseModel.toJSON() and BaseCollection.#validateItem once would let every entity author trust the schema as written. Same class of bug, two base-class fixes.

7. No integration test for the only cleanup mechanism

Every comparable entity has a test/it/<entity>/<entity>.test.js. PageIntent, AsyncJob, Audit, Opportunity, all of them. This PR adds none. The seeder is generic and the FK targets exist, so the seeding side just works — what's missing is the assertion file.

The thing that most warrants a real IT here is the cascade, because that's the only cleanup mechanism this stack has:

it('cascade-deletes when the parent AsyncJob is removed', async () => {
  const asyncJob = await AsyncJob.create({ status: 'IN_PROGRESS' });
  const preflight = await Preflight.create({
    siteId: sampleData.sites[0].getId(),
    asyncJobId: asyncJob.getId(),
    url: 'https://x.example/page',
    createdBy: { email: 'u@x', displayName: 'U' },
  });
  await asyncJob.remove();
  expect(await Preflight.findById(preflight.getId())).to.be.null;
});

Plus the usual: defaults flow through create() (status → IN_PROGRESS), allBySiteId auto-accessor returns proper instances, allBySiteIdAndUrl filters in a real PostgREST round-trip, toJSON() strips asyncJobId after persistence, and — once item 4 lands — findByAsyncJobId(id) returns the expected single result (the DB-side UNIQUE you just added guarantees that).

If this is genuinely out of scope, file a ticket and link it from the PR body so it doesn't get lost.

Worth raising (cheap to fix, or follow-up)

8. URL length: JS schema permissive, SQL strict

Postgres has CHECK (length(url) <= 2048); JS only validates isValidUrl. A long-but-parseable URL passes JS, hits Postgres, round-trips as DataAccessError wrapping a CHECK violation. One-line fix: validate: (value) => isValidUrl(value) && value.length <= 2048.

9. Site doesn't gain has_many 'Preflights'

belongs_to Site is declared by 19 schemas; site.schema.js registers has_many for 13 of them. Missing back-edges include ContactSalesLead, GeoExperiment, Report, SentimentGuideline, SentimentTopic, AuditUrl. So Preflight isn't breaking a universal pattern — it's joining the convention-gap club.

That said: one line in site.schema.js gives you site.getPreflights() for free and shrinks the gap club by one. Worth doing unless there's a reason not to.

10. result column purpose is undocumented

Preflight.result is type: 'any', AsyncJob.result is type: 'any' with a wider validator (isObject(value) || isArray(value) vs Preflight's isObject(value) only). Neither the schema nor the model documents whether Preflight.result is:

  • A denormalized copy of AsyncJob.result (writes happen in both places)
  • A transformation of it (Preflight stores post-processed audit result, AsyncJob stores raw output)
  • Independent state (different things)

Two writers diverging silently is a classic source of "the dashboard says X but the audit log says Y" bugs. A 2-line JSDoc on the result attribute clarifying which is canonical would close this.

Related sibling-parity nit: AsyncJob's result allows arrays; Preflight's doesn't. If preflight results may ever be array-shaped (some audit producers return arrays), mirror.

11. error validator returns a truthy string, not a boolean

validate: (value) => !value || (isObject(value) && value.code && value.message) short-circuits to value.message (a string) on success. BaseCollection.#validateItem checks result === false strictly, so this happens to work — but Boolean(...) or !! would harden it. Catch-all unit test: expect(errorAttr.validate({ code: 'X', message: 'Y' })).to.be.true.

12. error map lacks details field that AsyncJob.error has

async-job.schema.js:40-46 includes details: { type: 'any' }. Preflight's doesn't. If preflight errors should carry structured detail (same code paths often produce both), mirror.

Cosmetic

  • addReference('belongs_to', 'Site', [], { required: true }){ required: true } is the SchemaBuilder default; PageIntent and friends omit both args.
  • No CANCELLED row in the fixture — three rows cover IN_PROGRESS / COMPLETED / FAILED.
  • Preflight not in test/it/util/seed.js:31-63 SEED_PRIORITY — falls to MAX_SAFE_INTEGER. Works (sites + asyncJobs are seeded first), but adding it explicitly keeps the file self-documenting.

What I'd most like resolved before merge

  1. index.d.ts (item 1). It's the public contract; shipping the entity without it pushes the cost onto every consuming repo.
  2. belongs_to AsyncJob (item 4). Now that the DB enforces 1-to-1, the JS layer should give you getAsyncJob() and findByAsyncJobId() cleanly, so the API service doesn't fall back to raw PostgREST calls that bypass the entity layer.
  3. createdBy validator + decorative properties block (item 5). The JSONB shape is settled; the contract should match what the schema looks like it documents, since this is the precedent the next entity author will copy.

Items 2 (TTL narrative), 3 (Status duplication), 6 (inert flags), 7 (CASCADE IT) are also high-value but can land as follow-ups if you'd rather keep this PR small. Items 8–12 are happily deferrable.

The bones are solid — the asks are surgical, but they're asks that get much more expensive once consumers exist. Better now while you're the only one touching the surface.

@GeezerPelletier GeezerPelletier changed the title feat(data-access): add Preflight entity feat(data-access): add Preflight entity (SITES-45417) May 26, 2026
- Add index.d.ts with typed interfaces for Preflight and PreflightCollection
- Switch asyncJobId from manual addAttribute to belongs_to AsyncJob reference,
  giving auto-generated getAsyncJob() and findByAsyncJobId()
- Tighten createdBy validator (typeof email === 'string' && length > 0);
  drop inert properties blocks from createdBy and error
- Add URL length guard (value.length <= 2048) to match DB CHECK constraint
- Fix stale TTL JSDoc in collection.js; cleanup is via ON DELETE CASCADE
- Add Preflight.Status deep-equals AsyncJob.Status test to guard against drift

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

GeezerPelletier commented May 26, 2026

Thanks for the thorough review, ekremney! Here's what landed in commit 95d9845:

Fixed in this PR:

Item 1 — index.d.ts: Added. Covers all getters/setters, getSite()/getAsyncJob() traversals, allBySiteId, findByAsyncJobId, and allBySiteIdAndUrl.

Item 2 (TTL narrative): Fixed in collection.js JSDoc — now reads "volume is bounded by AsyncJob lifecycle (preflights cascade-delete when their backing AsyncJob is removed)". PR body was already updated in a prior commit.

Item 3 (Status duplication): Added expect(Preflight.Status).to.deep.equal(AsyncJob.Status) to the model test as the CI drift-guard. The reuse path would create a circular import (preflight.model.jsasync-job.model.js via the schema), so the test guard is the cleaner fix here.

Item 4 — belongs_to AsyncJob: Switched from manual addAttribute('asyncJobId', { hidden: true, ... }) to .addReference('belongs_to', 'AsyncJob', [], { required: true }). Auto-generates getAsyncJobId(), getAsyncJob(), and findByAsyncJobId() on the collection. The toJSON() override in the model still manually strips asyncJobId (since hidden is inert as you noted in item 6).

Item 5 (createdBy validator): Tightened to typeof value.email === 'string' && value.email.length > 0. Dropped the inert properties blocks from both createdBy and error.

Item 6 (inert flags): hidden: true removed (gone with the addAttribute replacement in item 4). properties blocks dropped from createdBy and error.

Item 8 (URL length): Added value.length <= 2048 alongside isValidUrl to match the DB CHECK constraint.

Deferred with follow-up tickets:

  • Item 7 (CASCADE integration test) — SITES-45422
  • Item 9 (Site.has_many 'Preflights') — SITES-45423
  • Items 10–12 (result/error docs, error.details parity) — SITES-45424

@GeezerPelletier GeezerPelletier requested a review from ekremney May 26, 2026 15:38
@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

@ekremney all your must-fix items have been addressed in commit 95d9845 — would appreciate a re-review when you get a chance! @MysticatBot re-review please.

- Add test/it/preflight/preflight.test.js covering: findById, create with
  IN_PROGRESS default, allBySiteId, allBySiteIdAndUrl with and without url
  filter, findByAsyncJobId, toJSON() strips asyncJobId, and cascade-delete
  when the backing AsyncJob is removed
- Add asyncJobs and preflights to SEED_PRIORITY in seed.js so FK ordering
  is explicit rather than relying on the deferral loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

Integration test for the cascade-delete behavior (originally tracked as SITES-45422) has been folded directly into this PR — see commit 9b3332b8.\n\ntest/it/preflight/preflight.test.js covers: findById, create defaulting to IN_PROGRESS, allBySiteId, allBySiteIdAndUrl with and without a URL filter, findByAsyncJobId, toJSON() stripping asyncJobId, and cascade-delete when the backing AsyncJob is removed. SITES-45422 has been closed as it is captured here under SITES-45417.

Guy Pelletier and others added 2 commits May 27, 2026 08:50
Includes the preflights table migration from mysticat-data-service PR #622.
Previous pin (v5.15.0) caused IT test skips with schema cache miss on the
preflights table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier GeezerPelletier requested review from MysticatBot-Dev and removed request for MysticatBot-Dev May 27, 2026 13:34
Copy link
Copy Markdown

@MysticatBot-Dev MysticatBot-Dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @GeezerPelletier,

Re-review after your latest round of commits. The prior review feedback from ekremney has been thoroughly addressed - nice work on the responsive iteration.

Strengths

  • All 8 actionable items from ekremney's review cleanly resolved: index.d.ts added, belongs_to AsyncJob reference replacing manual attribute, createdBy validator tightened, inert hidden/properties dropped, URL length validation added, TTL narrative corrected, drift test added, integration test folded in.
  • Using addReference('belongs_to', 'AsyncJob') (preflight.schema.js:22-23) is the right pattern - gives consumers getAsyncJob() and findByAsyncJobId() for free without manual wiring.
  • The cascade-delete integration test (test/it/preflight/preflight.test.js:71-85) exercises the only cleanup mechanism end-to-end. This is the highest-value test for this entity.
  • The expect(Preflight.Status).to.deep.equal(AsyncJob.Status) drift guard is a pragmatic CI tripwire that avoids the circular-import problem the author identified.
  • Collection unit tests verify filtering behavior with assertions on output shape, not just interaction stubs.
  • Seeding asyncJobs and preflights in explicit SEED_PRIORITY order (test/it/util/seed.js:35-36) documents the FK ordering dependency rather than relying on retry.

Issues

Important (Should Fix)

  1. preflight.schema.js:36 - createdBy validator allows null/falsy bypass on a required field.

    The validate function starts with !value ||, which means null, undefined, '', and 0 all pass. For optional fields like startedAt/endedAt/result/error, this pattern is correct (falsy means "absent, skip validation"). For createdBy with required: true, it creates a semantic hole: a caller doing preflight.setCreatedBy(null).save() passes the validator despite the field requiring a populated object. The required flag prevents omission at creation but does not guard post-creation setter mutations.

    Fix: Remove the !value || prefix:

    validate: (value) => isObject(value) && typeof value.email === 'string' && value.email.length > 0

Minor (Nice to Have)

  1. preflight.schema.js:52 - error validator uses JS truthiness rather than explicit type checks. value.code && value.message evaluates to value.message (a string) on success, not a boolean. The framework's === false check makes this currently safe, but { code: '', message: 'x' } fails unexpectedly (empty code is falsy), and { code: 0, message: 'x' } also fails. The semantics depend on truthiness rather than an explicit contract.

    Fix: validate: (value) => !value || (isObject(value) && typeof value.code === 'string' && value.code.length > 0 && typeof value.message === 'string' && value.message.length > 0)

  2. index.d.ts:16 - setStatus(status: string) accepts any string but the runtime restricts to the four Status enum values. A string literal union type ('IN_PROGRESS' | 'COMPLETED' | 'FAILED' | 'CANCELLED') would give downstream consumers compile-time feedback.

  3. preflight.collection.js:37 - if (!url) treats empty string as falsy, so allBySiteIdAndUrl(siteId, '') returns all records unfiltered. This might surprise a caller expecting zero results from an explicit empty-string filter.

Recommendations

  • Add a validator-focused test block exercising boundary inputs for createdBy (null, empty email, non-string email, non-string displayName), error (empty code, empty message), and url (2048 chars passes, 2049 rejected). These validators set precedent as the first JSONB-map entity with this shape.
  • The in-memory filter in allBySiteIdAndUrl works at current volume but scales with AsyncJob retention. Consider adding a JSDoc note: "If per-site preflight count exceeds low hundreds, migrate to a PostgREST-level filter on (site_id, url)." Makes the trade-off explicit for future maintainers.

Assessment

Ready to merge? With fixes - address the createdBy validator null-bypass (item 1). The remaining items are low-risk improvements.

Next Steps

  1. Fix the createdBy validate function (remove !value || prefix). One-line change.
  2. Optionally tighten the error validator in the same pass - it is cheap and mirrors the fix.

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 2m 59s | Cost: $2.97 | Commit: 6121ef9385eb51f781affb366b34be75fe41bcea
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@MysticatBot-Dev MysticatBot-Dev added the ai-reviewed Reviewed by AI label May 27, 2026
Copy link
Copy Markdown
Member

@ekremney ekremney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the turnaround, Guy — every blocking item addressed, plus the integration test landed in-PR rather than deferred (cascade-deletes assertion at test/it/preflight/preflight.test.js:471 is the headline). belongs_to AsyncJob cleanly gives us findByAsyncJobId, and the .d.ts now declares the contract for consuming repos.

The Status drift-guard test (preflight.model.test.js:541-543) does the job of locking the enum even though I'd have preferred direct reuse; I'm not fully convinced about the circular-import concern (the test file imports both AsyncJob and Preflight without issue), but the test guard is functionally equivalent so happy either way.

One small note on item 5: validator only contracts email, but the .d.ts declares displayName?: string. A caller can persist { email: 'x@y', displayName: 12345 } — runtime accepts, types say invalid. Optional one-line tighten if you want symmetry, otherwise fine as "displayName is best-effort."

Deferred items (9, 10–12) tracked with tickets — that's the right call.

Approving. 🚀

…ES-45417)

- Remove !value || bypass from createdBy validator (required field must always
  enforce non-null; post-creation setCreatedBy(null) now correctly fails)
- Tighten error validator to use explicit typeof/length checks instead of JS
  truthiness (empty string code/message now correctly rejected)
- Narrow setStatus() type in index.d.ts to string literal union for compile-time
  safety in consuming repos
- Add preflight.schema.test.js with boundary tests for createdBy, error, and url
  validators

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

Addressed MysticatBot's review in 7c46eb51:

  1. createdBy validator — removed !value || short-circuit. required: true doesn't protect post-creation setter mutations; the validator now always enforces the non-null constraint, so setCreatedBy(null).save() correctly fails.
  2. error validator — replaced JS truthiness checks with explicit typeof value.code === 'string' && value.code.length > 0 (and same for message). Empty-string code or message now correctly rejects.
  3. index.d.ts setStatus — narrowed from string to 'IN_PROGRESS' | 'COMPLETED' | 'FAILED' | 'CANCELLED' for compile-time safety in consuming repos.
  4. preflight.schema.test.js — added new schema validator boundary tests covering all three attributes (null/empty/non-string inputs for createdBy and error, 2048-char limit for url).

@GeezerPelletier
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @ekremney — the blocking items (.d.ts, belongs_to AsyncJob, createdBy validator, inert flags, URL length, TTL narrative, cascade IT) all landed in the previous round, and the deferred ones (Site back-edge, result column doc, error.details parity) are tracked with follow-up tickets. Appreciate you working through the cross-repo context with the data-service PR as well. 🙏

@GeezerPelletier GeezerPelletier requested review from MysticatBot-Dev and removed request for MysticatBot-Dev May 27, 2026 14:27
Copy link
Copy Markdown

@MysticatBot-Dev MysticatBot-Dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @GeezerPelletier,

Re-review after commit 7c46eb5. All prior findings addressed cleanly.

Strengths

  • All three prior findings resolved in a single focused commit - good turnaround.
  • createdBy validator (preflight.schema.js:36): removing the !value || guard on a required: true field correctly closes the semantic hole where setCreatedBy(null).save() bypassed validation.
  • error validator (preflight.schema.js:52): explicit typeof checks with length guards eliminate the truthiness ambiguity - { code: 0, message: 'x' } and { code: '', message: 'x' } now correctly fail.
  • setStatus type (index.d.ts:29): string literal union gives downstream consumers compile-time feedback without introducing a runtime dependency on the model's Status object.
  • New schema test file exercises validator boundaries methodically (null, undefined, empty string, wrong type, valid cases for each validated attribute). Sets a good precedent for future JSONB-map entities.

Assessment

Ready to merge? Yes.

The entity design is architecturally sound, the validator fixes are surgical and correct, and the human reviewer's approval aligns with this assessment. No new concerns.


Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 0m 25s | Cost: $1.45 | Commit: 7c46eb5102bf905ae34af5da846844d791b0901c
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@GeezerPelletier GeezerPelletier merged commit 2b57e6a into main May 27, 2026
5 checks passed
@GeezerPelletier GeezerPelletier deleted the sites-44675-preflight-entity branch May 27, 2026 14:40
@solaris007
Copy link
Copy Markdown
Member

🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.72.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

rainer-friederich added a commit that referenced this pull request May 28, 2026
## Why

This PR adds a small CI gate that would have caught the regression we
just fixed in #1636 at PR time, not days later in a downstream
consumer's CI run.

**The original incident (3.72.0):**
[#1622](#1622) introduced
the Preflight entity with `createdBy` and `error` declared as `type:
'map'` without a `properties` block. ElectroDB validates schema shape
eagerly at `new Service(...)` time and throws
`InvalidAttributeDefinition` on that combination. Every existing unit
test in this package passed because **the v3 PostgREST runtime never
instantiates an ElectroDB Service** — so invalid attribute declarations
are entirely invisible to this repo's CI.

The bug surfaced two days later in
[spacecat-api-service#2506](adobe/spacecat-api-service#2506)
(the LLMO-5190 Phase 1 PR that bumped to 3.73.0) because
`test/controllers/fixes.test.js` constructs the full v1 Service from
`EntityRegistry.getEntities()` for indexing-side-effect reasons. By
then, 3.72.0 was already published.

Fix shipped in
[#1636](#1636) → 3.73.1.
This PR is the prevention.

## What

- Adds `electrodb@3.5.0` as a **devDependency** of
`spacecat-shared-data-access` (no runtime dep added — the test-only
import).
- Adds `test/unit/models/base/electrodb-service-construction.test.js`
with two assertions:
1. `new electrodb.Service(EntityRegistry.getEntities())` does not throw.
2. Every registered entity is exposed on the resulting service (sanity
check that the registry/service mapping stays 1:1).

## Why this version of electrodb

Pinned to `electrodb@3.5.0` — the version transitively resolved by
`spacecat-api-service` today. That's the version that actually exercises
the failing code path in downstream CI, so this is the version we want
to defend against. Bumping locally won't help if api-service stays on
the older version.

## Why no per-entity wiring

The test reads the full registry at runtime. Any entity added to
`entity.registry.js` is automatically covered — no boilerplate per new
entity. The cost is one ~720ms test.

## Verified the test actually catches the bug

Ran it against `origin/main` (pre-#1636) — failed exactly as predicted:

```
ElectroError: Invalid "properties" definition for Map attribute: "undefined".
The "properties" definition must describe the attributes that the Map will accept
  at Attribute.buildChildMapProperties (/.../electrodb/src/schema.js:208:13)
  at new MapAttribute (/.../electrodb/src/schema.js:658:34)
  ...
  at new Service (/.../electrodb/src/service.js:168:14)
```

So this is not a tautological "the schema is what the schema is" check —
it genuinely exercises the validation path that broke.

## Out of scope (intentional)

- Not converting the existing schemas to a "proper" ElectroDB Map (with
`properties` blocks). That's the kind of cleanup that creates more drift
than it removes — the `validate` functions already enforce shape, and
adding redundant `properties` declarations would be dual-source-of-truth
without buying anything.
- Not adding a runtime ElectroDB dependency. The package's PostgREST
runtime stays unchanged; this is strictly a test-time safety net for
downstream consumers.

## Dependency on #1636

This branch cherry-picks the
[#1636](#1636) fix commit
so CI passes immediately. The two PRs are functionally independent —
whichever lands first, the other rebases cleanly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants