Skip to content

Added lazy routing#28356

Open
vershwal wants to merge 5 commits into
mainfrom
lazy-routing
Open

Added lazy routing#28356
vershwal wants to merge 5 commits into
mainfrom
lazy-routing

Conversation

@vershwal
Copy link
Copy Markdown
Member

@vershwal vershwal commented Jun 4, 2026

ref https://linear.app/ghost/issue/HKG-1817

Why

The eager UrlService precomputes a full resource → URL map at boot and holds every published resource in memory. We want to replace it with an on-demand service that computes each answer per call. The earlier experiment (#28336, behind config.lazyRouting) was reverted (HKG-1816); this rebuilds it as production-ready code with cleaner, smaller, individually-testable pieces.

This PR only adds the new service and its tests — nothing is wired into the request path yet. The UrlServiceFacade already has the seams, but no instance selects the lazy backend. Wiring and the eager-removal cutover are separate milestone issues.

The overriding goal is parity: the lazy service must return the same answer as the eager service for the same inputs, because downstream issues will shadow-compare the two before we switch over.

What

Built as four small modules so each piece is testable in isolation, plus a real-DB parity suite:

  • permalink-matcher.ts — pure, regex-free function that matches a request path against a permalink template (/:slug/, /:year/:month/:slug/) and extracts the params. No ReDoS surface; malformed/unsupported shapes are non-matches.
  • router-filter.ts — the NQL routing semantics (expansions, the page: transformer, routerTypeOf, buildFilter, filterMatches). A deliberate copy of the eager generator's rules, not a shared module, so a shared bug can't hide identical wrong output on both sides while eager is the comparison oracle.
  • lazy-find-resource.ts — the single DB-touching seam, injected into resolveUrl. Mirrors the eager resourceConfig visibility rules (published-only posts/pages, public TagPublic/Author that gate on having posts) so a guessed slug can't surface anything the eager path hid. Queries only by real column params (id/uuid/slug) and prunes the loaded record to the exact shape the eager service exposes — dropping the post body and other excluded fields, and trimming tags/authors/primary_tag/primary_author to {id, slug} — so the resolved record never leaks fields eager strips.
  • lazy-url-service.ts — composes the helpers. Forward lookups are pure; all DB access is confined to resolveUrl. Ownership is exclusive (first matching router of the type wins), matching eager's reservation. The /404/ fallback matches eager's miss path exactly, including subdirectory handling. Logs a thin-resource error when a filtered router can't see a relation the caller didn't load — that's exactly the case where lazy would /404/ while eager returns a URL.

Notable design choice: resolveUrl re-validates the resolved record against its canonical URL. The slug-only DB lookup can't filter on derived or relation permalink segments (:year/:month, :primary_tag), so it regenerates the record's URL with the same replacePermalink the eager generator uses and confirms the captured params match. This makes the reverse path agree with eager by construction: the eager service only resolves a URL that equals a resource's generated (canonical) URL, so a wrong-date (/2026/05/<slug>/) or wrong-tag URL 404s in both backends rather than resolving the post by slug alone.

Testing

  • Restored the preserved lazy-url-service contract spec from skipped to running (HKG-1817's acceptance criterion).
  • Added unit tests for every module: matcher (13), router-filter (14), find-resource (19), service (33, incl. ownership exclusivity and the canonical re-check).
  • Added an eager/lazy parity integration test that drives both services from identical routing sets and feeds lazy the exact records eager cached, comparing all directions a caller depends on — forward (relative + absolute, including /404/), ownership (every router × every resource), and reverse (every eager-generated URL resolves back to the same resource, with full record-shape parity). A featured-collection scenario exercises priority fallthrough (this caught and fixed an ownership-exclusivity bug during development), and a date-based scenario asserts both services 404 a non-canonical (wrong-date) URL.
  • tsc --noEmit, eslint, all unit suites, and the parity integration suite pass.

Known follow-ups (deferred to the wiring/comparison issue)

  • resolveUrl queries findResource only by real column params (id/uuid/slug) and validates derived/relation segments via the canonical re-check, so custom permalinks (/:year/:month/:slug/, /:primary_tag/:slug/) resolve and 404 in parity with eager. Permalinks that key off a field the pruned record doesn't carry remain a wiring-phase follow-up.
  • The thin-resource warning covers filter-referenced relations, not permalink-referenced fields.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 313c444f-49e4-4f7e-b861-59c8632f96dd

📥 Commits

Reviewing files that changed from the base of the PR and between 709e013 and d27f2e5.

📒 Files selected for processing (9)
  • ghost/core/core/server/services/url/lazy-find-resource.ts
  • ghost/core/core/server/services/url/lazy-url-service.ts
  • ghost/core/core/server/services/url/permalink-matcher.ts
  • ghost/core/core/server/services/url/router-filter.ts
  • ghost/core/test/integration/url-service-parity.test.js
  • ghost/core/test/unit/server/services/url/lazy-find-resource.test.js
  • ghost/core/test/unit/server/services/url/lazy-url-service.test.js
  • ghost/core/test/unit/server/services/url/permalink-matcher.test.js
  • ghost/core/test/unit/server/services/url/router-filter.test.js
🚧 Files skipped from review as they are similar to previous changes (9)
  • ghost/core/test/unit/server/services/url/permalink-matcher.test.js
  • ghost/core/test/unit/server/services/url/router-filter.test.js
  • ghost/core/test/unit/server/services/url/lazy-find-resource.test.js
  • ghost/core/core/server/services/url/router-filter.ts
  • ghost/core/core/server/services/url/permalink-matcher.ts
  • ghost/core/test/unit/server/services/url/lazy-url-service.test.js
  • ghost/core/core/server/services/url/lazy-url-service.ts
  • ghost/core/test/integration/url-service-parity.test.js
  • ghost/core/core/server/services/url/lazy-find-resource.ts

Walkthrough

This PR adds a lazy URL-resolution stack: NQL-based router filter utilities, a permalink matcher, a per-request DB lookup factory that returns eager-shaped records, and LazyUrlService which registers routers, generates forward URLs, determines ownership, and resolves incoming paths to resources. It includes unit tests for each utility/service and an integration suite verifying parity with the existing eager UrlService.

Possibly related PRs

  • TryGhost/Ghost#28335: Directly related as it reverts/removes the lazy URL service implementation introduced in this PR.
  • TryGhost/Ghost#27714: Related config-flag and boot/bridge wiring that integrates or conditionally enables the new LazyUrlService/createFindResource modules.
  • TryGhost/Ghost#27938: Changes callers/serializers to ensure LazyUrlService receives required relation data so router filters evaluate correctly.

Suggested reviewers

  • kevinansfield
  • allouis
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Added lazy routing' directly and clearly describes the main change: adding a new lazy routing/URL service as an alternative to the eager service.
Description check ✅ Passed The description comprehensively explains the rationale, design choices, and testing approach for the lazy routing implementation, and is directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lazy-routing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
ghost/core/test/integration/url-service-parity.test.js (1)

237-238: ⚡ Quick win

Make the zero-post gating case deterministic instead of fixture-dependent.

assert.ok(checked > 0, ...) makes this test fail when fixture composition changes, even if lazy/eager parity is still correct. Consider creating an explicit public tag/author with no posts in test setup for this case so coverage stays stable over fixture churn.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/integration/url-service-parity.test.js` around lines 237 -
238, The test currently uses assert.ok(checked > 0, ...) which is
fixture-dependent; modify the test setup to create an explicit public tag or
author with zero posts (e.g. create a public Tag/Author entity in the test's
before/arrange step) and then assert deterministically against that created
entity instead of relying on checked > 0; update the assertion (the assert.ok
call and any usage of checked) to verify that the newly created public
tag/author is present and has zero posts, so the parity check is stable
regardless of fixture composition.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ghost/core/core/server/services/url/lazy-url-service.ts`:
- Line 130: The reverse-filter check in resolveUrl() skips normalization and
calls filterMatches(config.compiledFilter, resource) directly; change
resolveUrl() to pass the resolved resource through the existing normalization
helper _recordForFilter() (i.e. call _recordForFilter(resource) and use that
result for filterMatches) so page: filters are evaluated against the same
normalized shape used by getUrlForResource() and ownsResource(); update any
variable names accordingly and add/adjust tests to cover resolveUrl() behavior
for page:false routers.

In `@ghost/core/core/server/services/url/permalink-matcher.ts`:
- Around line 36-55: The params object should be a null-prototype map so
captured placeholder names like "__proto__" aren't swallowed by
Object.prototype; change the declaration of params in permalink-matcher.ts (the
variable currently declared as const params: Record<string, string> = {}) to use
a null-prototype object (e.g. Object.create(null)) or an ES Map and update the
local type annotations accordingly, keep the rest of the capture logic
(fieldName check, decodeURIComponent assignment) the same; then add a unit test
in ghost/core/test/unit/server/services/url/permalink-matcher.test.js that
matches a template with a placeholder like "/:__proto__/" and asserts the
returned params includes "__proto__" => the decoded value (and that
JSON.stringify-based cache keys or findResource lookups would receive that
param), to prevent regression.

---

Nitpick comments:
In `@ghost/core/test/integration/url-service-parity.test.js`:
- Around line 237-238: The test currently uses assert.ok(checked > 0, ...) which
is fixture-dependent; modify the test setup to create an explicit public tag or
author with zero posts (e.g. create a public Tag/Author entity in the test's
before/arrange step) and then assert deterministically against that created
entity instead of relying on checked > 0; update the assertion (the assert.ok
call and any usage of checked) to verify that the newly created public
tag/author is present and has zero posts, so the parity check is stable
regardless of fixture composition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f3fdf7bf-af42-42d8-a240-96eb49e258e4

📥 Commits

Reviewing files that changed from the base of the PR and between 6000d28 and aa6bace.

📒 Files selected for processing (9)
  • ghost/core/core/server/services/url/lazy-find-resource.ts
  • ghost/core/core/server/services/url/lazy-url-service.ts
  • ghost/core/core/server/services/url/permalink-matcher.ts
  • ghost/core/core/server/services/url/router-filter.ts
  • ghost/core/test/integration/url-service-parity.test.js
  • ghost/core/test/unit/server/services/url/lazy-find-resource.test.js
  • ghost/core/test/unit/server/services/url/lazy-url-service.test.js
  • ghost/core/test/unit/server/services/url/permalink-matcher.test.js
  • ghost/core/test/unit/server/services/url/router-filter.test.js

Comment thread ghost/core/core/server/services/url/lazy-url-service.ts Outdated
Comment thread ghost/core/core/server/services/url/permalink-matcher.ts Outdated
vershwal added 5 commits June 4, 2026 12:30
ref https://linear.app/ghost/issue/HKG-1817

- the lazy URL service needs to turn a request path back into a resource, which means matching it against each router's permalink template (/:slug/, /:year/:month/:slug/) and extracting the placeholders
- kept this as a standalone pure function rather than folding it into the service so the matching rules can be exhaustively unit-tested in isolation, with no service or DB setup
- matched segment-by-segment instead of compiling a regex per template: Ghost only uses whole-segment placeholders, so a literal walk both keeps the logic obvious and removes any ReDoS surface on attacker-controlled request paths
- validated placeholder names with a character scan rather than a small regex so the module is regex-free end to end, matching its documented design; the check only ever ran on template field names and never on request paths, so this is a consistency change rather than a security fix
- treated malformed %-escapes and mixed literal+placeholder segments as non-matches rather than throwing, since neither is a valid Ghost permalink and a reverse lookup should quietly fall through to 404
ref https://linear.app/ghost/issue/HKG-1817

- a router decides whether it owns a resource by running its routes.yaml filter (featured:true, tag:news, page:false) against the resource, so the lazy service needs the same NQL semantics the eager UrlGenerator uses
- extracted the expansions, the page:true/false transformer, and the type mapping into a small module of pure helpers so the service file stays about routing decisions and these rules can be tested directly
- copied the expansions and transformer from the eager generator on purpose instead of sharing one module: the eager service is our parity oracle while both run side by side, and a shared helper could hide a bug by producing the same wrong answer on both sides
- made a missing relation (e.g. a thin record with no tags) a non-match rather than an error, matching the eager generator's try/catch, so a filtered route quietly falls through instead of throwing mid-request
- routerTypeOf accepts both the singular DB `type` ('post') and the plural router key ('posts') because migrated callers pass either shape
ref https://linear.app/ghost/issue/HKG-1817

- the lazy service resolves a request path to a resource on demand, so it needs a way to load one record by the params a permalink captured; this hook is that single DB-touching seam, injected into resolveUrl so the service itself stays pure and testable without a database
- mirrored the eager resourceConfig's visibility rules (posts/pages published-only, tags/authors through the public TagPublic/Author models that gate on having posts) so a guessed slug can never surface a draft, private tag, or staff user that the eager path kept hidden
- restricted the lookup to real, unique column params (id/uuid/slug) and treated a paramset with none as a clean non-match, so a multi-segment permalink like /:year/:month/:slug/ or /:primary_tag/:slug/ can't pass a non-column key into findOne and turn a simple URL miss into a 500; validating the derived/relation segments against the record is the deferred column-aware follow-up
- pruned every loaded record to the same shape the eager resourceConfig exposes, sourcing the exclude lists from that config so the two can't drift: dropped the post body/status, tag description/meta, author bio and contact fields, and the auto-loaded posts_meta relation, because the resolved record is what downstream consumers see and e.g. ContentStatsService.getResourceTitle branches on title being absent for posts/pages
- trimmed tags/authors/primary_tag/primary_author down to {id, slug} to match the eager withRelatedFields, so reverse lookups can't leak author emails or full tag rows out of the URL service
- mirrored eager's quirk of exposing primary_tag/primary_author as null on pages while never carrying the tags/authors arrays there, so the page record shape matches the eager service field for field
- loaded tags and authors only for posts, because collection routers filter on tag:/author: and resolveUrl re-checks that filter against the record; pages always come from the filterless static-pages router, so the extra relations would be wasted work
- derived primary_author from the first author to match the eager withRelatedPrimary, since Post.toJSON already computes primary_tag but not primary_author, and date/tag/author permalinks depend on it
- built it as a factory over injected models rather than reaching for the model layer directly, keeping the module unit-testable and leaving the choice of real models to the later wiring step
ref https://linear.app/ghost/issue/HKG-1817

- this is the on-demand replacement for the eager UrlService: it answers the same forward (resource -> URL), ownership, and reverse (URL -> resource) questions, but computes each answer per call from the registered router configs instead of precomputing and holding a full resource map at boot
- composed it from the already-tested permalink-matcher and router-filter helpers so this file is only about the routing decision (pick the first registered router of the resource's type whose filter matches) and stays small enough to reason about against the eager generator
- made ownership exclusive like the eager service: eager reserves each resource for the first router that claims it, so ownsResource only returns true when no earlier router of the same type also matches, otherwise a catch-all router would wrongly claim resources already owned by a higher-priority collection
- kept forward lookups pure and confined every DB hit to resolveUrl via the injected findResource hook, so the whole service is unit-testable without a database and there is one obvious seam where on-demand loading happens
- memoized findResource within a single resolveUrl call, keyed by (resourceType, params), so routers that share a permalink shape or fall through across filters don't issue the same DB query twice, while router priority and the per-config filter re-check are preserved
- re-validated the resolved record against its canonical URL: the slug-only DB lookup can't filter on derived/relation permalink segments (year/month, primary_tag), so without this a wrong-date or wrong-tag URL would resolve a post the eager service 404s; regenerating the record's URL with the same replacePermalink the eager generator uses and comparing the captured params makes the reverse path agree with eager by construction
- matched the eager getUrlByResourceId 404 path exactly (no subdirectory prefix on the miss fallback) so the two services agree even on subdirectory installs
- logged a thin-resource error when a filtered router can't see a relation the caller didn't load: that is precisely the case where lazy would 404 while eager (full data in memory) returns a URL, so surfacing it protects parity during the shadow-comparison window
- restored the preserved contract spec from skipped to running now that the implementation it pins exists again
ref https://linear.app/ghost/issue/HKG-1817

- the whole point of this milestone is to swap the eager service for the lazy one without changing any URL, so the lazy service is only trustworthy if it provably returns the same answer as the eager service for the same inputs; this integration test is that proof, run against real fixtures and a real findResource over the model layer
- drove both services from one identical routing set per scenario and fed the lazy service the exact records the eager service cached, so any difference the test reports is a routing-logic divergence and not a data-shape artefact
- compared all three directions a caller depends on: forward (relative and absolute URL for every cached resource, including the resources that resolve to /404/), ownership (every router against every resource), and reverse (every eager-generated URL resolves back to the same resource and type)
- asserted the reverse lookup returns the same record shape as the eager service, not just the same id and type: exact field-set parity, matching slug/name, no leaked title, and relations trimmed to {id, slug}, so a divergence like lazy returning the post body or untrimmed author rows can't slip through
- added a date-based permalink scenario and asserted both services 404 a wrong-date variant of a generated URL, locking the canonical re-check so lazy can't resolve a non-canonical URL the eager service rejects
- added a featured-collection scenario alongside the default set specifically to exercise priority fallthrough; this is what caught the ownership exclusivity bug now fixed in the service, where a catch-all router claimed resources already owned by a higher-priority collection
- guarded against a vacuous pass by asserting the fixtures actually produce cached resources and generated URLs, so the parity assertions can never silently iterate over nothing
@vershwal vershwal requested a review from allouis June 4, 2026 07:51
return !!compiledFilter.queryJSON(record);
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
debug('NQL match failed', message);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not an actual logging.warn or logging.error? Do we expect errors here?

* the `:field` placeholders, or returns `null` when the path doesn't match.
*/
export function matchPermalink(template: string, urlPath: string): Record<string, string> | null {
if (typeof template !== 'string' || typeof urlPath !== 'string') {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever expected? Should we error here, or log?

export function createFindResource(models: Models): FindResource {
const loadOne = async (
Model: BookshelfModel,
query: Record<string, unknown>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be a stronger type? it's either {slug: string} or {id: string} or {uuid: string| right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe it can be a combination of those? I think really we only ever need one though.

if (config.resourceType !== routerType) {
continue;
}
this._warnIfThin(config, resource, routerType);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a production ready implementation this should throw completely - it should never happen. I also wonder if we can just check once at the top of getUrlForResource

* Matches a permalink template (e.g. `/:slug/`) against a URL path and extracts
* the `:field` placeholders, or returns `null` when the path doesn't match.
*/
export function matchPermalink(template: string, urlPath: string): Record<string, string> | null {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only care about the slug id and uuid params here? Maybe we can be specific to that, as well as doing some basic checking of the values?

e.g. a permalink of /post/:uuid should not match /post/blahblah because "blahblah" is not a valid uuid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants