Add llms.txt exports and AI crawler toggle#27984
Conversation
|
It looks like this PR contains a migration 👀 General requirements
Schema changes
Data changes
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds LLM export/discovery: a new site setting (llms_enabled), admin toggle and fixtures, LlmsService producing cached llms.txt and llms-full.txt with size bounding, markdown rendering utilities, HTTP handlers for /llms.txt, /llms-full.txt and .md entry routes, discovery middleware, frontend wiring (pretty-URLs skip, static-theme fallthrough, site middleware and entry controller negotiation), defaults/config, migration, dependency addition, and comprehensive unit/e2e/acceptance tests. Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/service.js (1)
88-91: 🏗️ Heavy liftAvoid parallel full-body loading of all pages and posts.
Fetching both collections concurrently with
limit: 'all'and body fields raises peak memory pressure on large sites.Refactor direction
- const [pages, posts] = await Promise.all([ - this.#fetchPublicEntries('page'), - this.#fetchPublicEntries('post') - ]); + const pages = await this.#fetchPublicEntries('page'); + const posts = await this.#fetchPublicEntries('post');Then consider chunked iteration/pagination inside
#fetchPublicEntriesso entries are appended progressively within the byte budget.Also applies to: 111-114, 234-239
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/llms/service.js` around lines 88 - 91, The current code loads all pages and posts in parallel with full bodies, causing peak memory pressure; update the two-call pattern that uses this.#fetchPublicEntries('page') and this.#fetchPublicEntries('post') so you do not fetch both collections fully in parallel and avoid using limit: 'all' with body fields. Modify `#fetchPublicEntries` to support chunked pagination/iteration (e.g., page/limit or cursor-based) and have the callers fetch one collection at a time and append entries progressively within a configurable byte/item budget; apply the same change to the other call sites that fetch pages and posts concurrently (the other invocations of this.#fetchPublicEntries) so all consumers stream or paginate entries instead of loading full bodies into memory at once.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 110-112: The function getLlmsIndexUrl has an unused parameter
entry causing a lint error; remove the unused parameter from its declaration
(change function getLlmsIndexUrl(entry) to function getLlmsIndexUrl()) and
update any call sites that pass an argument to call it without parameters; also
apply the same removal for the duplicate occurrence referenced at the 141-141
location so both definitions/signatures no longer include the unused entry
parameter.
- Around line 43-50: The function getResourcePathFromMarkdownPath currently maps
'/index.md' to '/index/' instead of '/'—modify it so after stripping the '.md'
suffix (in getResourcePathFromMarkdownPath) you check if the resulting
resourcePath is '' or '/index' and return '/' (with trailing slash) in those
cases; otherwise keep the existing behavior of ensuring the returned path ends
with a '/' by returning resourcePath + '/' when needed.
- Around line 68-76: The function markdownFromHtml currently calls
collapseWhitespace on nhm.translate(html) which flattens line breaks and
destroys markdown structure; instead, call nhm.translate(html || '') directly,
normalize CRLF to \n and then collapse only horizontal whitespace within each
line (not across newlines) or remove collapseWhitespace entirely so
headings/lists/paragraph breaks are preserved; keep the existing
.replace(/\n{3,}/g, '\n\n').trim() step. Update markdownFromHtml to use
nhm.translate and an in-line normalization that preserves \n (or a helper that
collapses spaces per-line) rather than collapseWhitespace.
In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 135-137: The truncation footer is appended after the 5 MiB cap
check which lets the final output exceed the declared limit; update the assembly
logic in service.js (the code that sets output and checks wasTruncated) to
reserve space for the footer before enforcing the 5 MiB cap — i.e., compute the
footer length (the string starting with '\n_Truncated after 5 MiB..._\n'),
subtract that from the max-bytes limit when truncating/assembling output, and
only append the footer if it fits within the reserved space (or trim the main
output to make room) so the final payload never exceeds the declared 5 MiB cap.
In `@ghost/core/core/frontend/services/routing/controllers/entry.js`:
- Around line 74-76: The current early return when markdownEntry is falsy (the
"if (!markdownEntry) { return next(); }" branch) causes a 404 during content
negotiation; remove the return-next short-circuit so the controller falls back
to the normal entry rendering path (i.e., stop calling next() here and allow the
subsequent standard entry rendering logic to run), or explicitly invoke the
standard entry renderer instead of next() so missing markdown simply uses the
existing entry rendering flow.
In `@ghost/core/core/frontend/web/middleware/llms-discovery.js`:
- Around line 4-16: The appendHeaderValue function treats a comma-delimited
header string as a single token, allowing duplicate relations to be appended;
update appendHeaderValue to split existingValue (and newValue if it may contain
commas) by ',' into trimmed tokens, deduplicate by checking tokens membership,
and then return a comma+space joined string of the merged tokens; keep the
function name appendHeaderValue and ensure it handles both string and array
existingValue inputs and preserves order while avoiding duplicates.
In `@ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js`:
- Around line 210-214: The test references a nonexistent symbol
routerManagerGetResourceByIdStub which causes a ReferenceError; remove that
stale call or replace it with a correctly declared stub for
routerManager.getResourceById (e.g., use the existing routerManager stub used
elsewhere in the test or create a sinon stub named
routerManagerGetResourceByIdStub that stubs routerManager.getResourceById) so
the test no longer references an undeclared identifier and can proceed to
assertions.
In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js`:
- Line 15: The test currently stubs settingsCache.get to always return false
which forces llms_enabled to false and short-circuits discovery header checks;
update the stub in the public-site test so settingsCache.get returns values per
key (e.g. return true for 'llms_enabled' and appropriate values for other keys)
by using a callsFake/implementation that switches on the key, ensuring
llms_enabled is true so the discovery headers code path is exercised (refer to
settingsCache.get and the llms_enabled check in the llms-discovery test).
---
Nitpick comments:
In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 88-91: The current code loads all pages and posts in parallel with
full bodies, causing peak memory pressure; update the two-call pattern that uses
this.#fetchPublicEntries('page') and this.#fetchPublicEntries('post') so you do
not fetch both collections fully in parallel and avoid using limit: 'all' with
body fields. Modify `#fetchPublicEntries` to support chunked pagination/iteration
(e.g., page/limit or cursor-based) and have the callers fetch one collection at
a time and append entries progressively within a configurable byte/item budget;
apply the same change to the other call sites that fetch pages and posts
concurrently (the other invocations of this.#fetchPublicEntries) so all
consumers stream or paginate entries instead of loading full bodies into memory
at once.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bb3038e0-8514-4fde-bfd7-30a0d0bb1b1f
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (31)
apps/admin-x-framework/src/test/msw-utils.tsapps/admin-x-framework/src/test/responses/settings.jsonapps/admin-x-settings/src/components/settings/general/general-settings.tsxapps/admin-x-settings/src/components/settings/general/seo-meta.tsxapps/admin-x-settings/test/acceptance/general/seometa.test.tsghost/core/core/frontend/services/llms/handler.jsghost/core/core/frontend/services/llms/markdown.jsghost/core/core/frontend/services/llms/service.jsghost/core/core/frontend/services/routing/controllers/entry.jsghost/core/core/frontend/web/middleware/index.jsghost/core/core/frontend/web/middleware/llms-discovery.jsghost/core/core/frontend/web/middleware/static-theme.jsghost/core/core/frontend/web/site.jsghost/core/core/server/api/endpoints/utils/serializers/input/settings.jsghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-group-mapper.jsghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-type-mapper.jsghost/core/core/server/data/migrations/versions/6.31/2026-04-14-22-07-44-add-llms-setting.jsghost/core/core/server/data/schema/default-settings/default-settings.jsonghost/core/core/server/web/shared/middleware/pretty-urls.jsghost/core/core/shared/config/defaults.jsonghost/core/core/shared/settings-cache/cache-manager.jsghost/core/package.jsonghost/core/test/e2e-frontend/llms.test.jsghost/core/test/unit/frontend/services/llms/handler.test.jsghost/core/test/unit/frontend/services/llms/service.test.jsghost/core/test/unit/frontend/services/routing/controllers/entry.test.jsghost/core/test/unit/frontend/web/middleware/llms-discovery.test.jsghost/core/test/unit/frontend/web/middleware/static-theme.test.jsghost/core/test/unit/server/data/schema/integrity.test.jsghost/core/test/unit/server/web/shared/middleware/pretty-urls.test.jsghost/core/test/utils/fixtures/default-settings.json
There was a problem hiding this comment.
♻️ Duplicate comments (1)
ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js (1)
15-21:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStub
llms_enabledexplicitly in this test path.This stub still leaves
llms_enabledunset (null), which can short-circuit discovery header behavior and make the test miss the intended code path.Suggested fix
sinon.stub(settingsCache, 'get').callsFake((key) => { if (key === 'is_private') { return false; } + if (key === 'llms_enabled') { + return true; + } return null; });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js` around lines 15 - 21, The settingsCache.get stub used in the test leaves 'llms_enabled' as null, which can short-circuit the discovery header path; update the sinon.stub(settingsCache, 'get').callsFake callback to explicitly return the intended test value for 'llms_enabled' (e.g., return true or false depending on the test scenario) alongside the existing 'is_private' handling so the llms discovery code path is exercised (refer to the stub in the test where settingsCache.get is defined).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js`:
- Around line 15-21: The settingsCache.get stub used in the test leaves
'llms_enabled' as null, which can short-circuit the discovery header path;
update the sinon.stub(settingsCache, 'get').callsFake callback to explicitly
return the intended test value for 'llms_enabled' (e.g., return true or false
depending on the test scenario) alongside the existing 'is_private' handling so
the llms discovery code path is exercised (refer to the stub in the test where
settingsCache.get is defined).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c098ceb3-3568-4d22-a917-5092839a6a6e
⛔ Files ignored due to path filters (1)
ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
apps/admin-x-settings/test/acceptance/search.test.tsghost/core/core/frontend/services/llms/markdown.jsghost/core/test/e2e-api/admin/settings.test.jsghost/core/test/unit/frontend/services/routing/controllers/entry.test.jsghost/core/test/unit/frontend/web/middleware/llms-discovery.test.jsghost/core/test/unit/server/data/exporter/index.test.js
💤 Files with no reviewable changes (1)
- ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js
✅ Files skipped from review due to trivial changes (2)
- ghost/core/test/e2e-api/admin/settings.test.js
- ghost/core/test/unit/server/data/exporter/index.test.js
9ff9b66 to
35b8909
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
ghost/core/core/frontend/services/llms/markdown.js (2)
48-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMap
/index.mdback to root route.Line 48/49 currently resolves
/index.mdto/index/, which breaks reversible routing for the root page.Proposed fix
function getResourcePathFromMarkdownPath(pathname) { if (!pathname || !pathname.endsWith('.md')) { return null; } + if (pathname === '/index.md') { + return '/'; + } + const resourcePath = pathname.slice(0, -3) || '/'; return resourcePath.endsWith('/') ? resourcePath : `${resourcePath}/`; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 48 - 49, The current logic computes resourcePath from pathname.slice(0, -3) and then forces a trailing slash, which turns '/index.md' into '/index/'; change the post-processing to map the special case '/index' back to '/' (i.e. if resourcePath === '/index' return '/'), otherwise keep the existing trailing-slash behavior; update the code around the resourcePath computation and the return expression that currently uses resourcePath.endsWith('/') to handle this '/index' special case.
69-75:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve markdown block structure during HTML conversion.
Line 69 collapses all whitespace (including newlines), flattening headings/lists/paragraphs in generated markdown.
Proposed fix
function markdownFromHtml(html) { - const markdown = collapseWhitespace(nhm.translate(html || '')); + const markdown = (nhm.translate(html || '') || '') + .replace(/\r\n/g, '\n') + .trim(); if (!markdown) { return null; } - return markdown.replace(/\n{3,}/g, '\n\n').trim(); + return markdown + .replace(/[ \t]+\n/g, '\n') + .replace(/\n{3,}/g, '\n\n') + .trim(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 69 - 75, The current code calls collapseWhitespace(nhm.translate(html || '')) which removes all newlines and flattens block-level Markdown (headings, lists, paragraphs); change the call so whitespace collapsing preserves newline boundaries (e.g., use a variant/option of collapseWhitespace that does not remove \n or introduce a helper that only collapses horizontal whitespace but keeps \n), keeping the rest of the flow (variable markdown, null check, and the final replace(/\n{3,}/g, '\n\n').trim()). Update the usage of collapseWhitespace in this function to a newline-preserving variant so block structure produced by nhm.translate is retained.ghost/core/core/frontend/services/routing/controllers/entry.js (1)
74-76:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFallback to normal entry rendering when markdown fetch misses.
Line 75 returning
next()can turn a valid entry request into a 404 during content negotiation. Use the existing HTML renderer fallback instead.Proposed fix
return llmsService.fetchPublicEntry(res.routerOptions.resourceType, entry.id) .then((markdownEntry) => { if (!markdownEntry) { - return next(); + return renderer.renderEntry(req, res)(entry); } res.type(markdownContentType); res.send(renderEntryMarkdown(markdownEntry)); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/routing/controllers/entry.js` around lines 74 - 76, The current code returns next() when markdownEntry is falsy, which causes valid requests to be treated as 404s; instead, remove the early return and invoke the existing HTML renderer fallback for the same entry path (i.e., when markdownEntry is not found call the project's HTML renderer rather than next()). Locate the check using markdownEntry and replace the "return next();" with a call into the existing HTML rendering path used elsewhere in this controller (the HTML renderer function/handler already used for non-markdown responses), passing the same req, res, and next so content negotiation falls back to HTML.
🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/service.js (1)
111-114: 🏗️ Heavy liftAvoid preloading posts when pages already consume the full export budget.
#buildLlmsFullTxt()loads pages and posts in parallel, but posts are unnecessary if the pages section already truncates. Deferring post fetch until after the pages budget check would cut heavy work on large sites.Proposed direction
- const [pages, posts] = await Promise.all([ - this.#fetchPublicEntries('page'), - this.#fetchPublicEntries('post') - ]); + const pages = await this.#fetchPublicEntries('page'); ... if (!wasTruncated) { + const posts = await this.#fetchPublicEntries('post'); const postSection = this.#appendBoundedSection(output, 'Posts', posts, entry => this.#buildFullEntry(entry)); output = postSection.output; wasTruncated = postSection.wasTruncated; }Also applies to: 129-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/llms/service.js` around lines 111 - 114, The current parallel fetch (const [pages, posts] = await Promise.all([...])) in `#buildLlmsFullTxt` triggers fetching posts even when pages already hit the export/truncation budget; change the logic to fetch pages first via this.#fetchPublicEntries('page'), inspect the pages result for truncation/budget exhaustion (the same check used later in the function), and only call this.#fetchPublicEntries('post') if pages did not already consume the full export budget; apply the same sequential-fetch fix to the other occurrence noted (the block around lines 129–133) so posts are only fetched when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ghost/core/test/unit/frontend/services/llms/service.test.js`:
- Around line 125-130: The truncation test needs an explicit byte-size
assertion: after calling service.getLlmsFullTxt() and existing content checks
(llmsFullTxt, /Truncated after 5 MiB/), add an assertion that
Buffer.byteLength(llmsFullTxt, 'utf8') is less than or equal to 5 * 1024 * 1024
(5 MiB) to prevent regressions; update the test that uses llmsFullTxt in
ghost/core/test/unit/frontend/services/llms/service.test.js to include this
byte-length check so the test fails if the payload exceeds the hard cap.
---
Duplicate comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 48-49: The current logic computes resourcePath from
pathname.slice(0, -3) and then forces a trailing slash, which turns '/index.md'
into '/index/'; change the post-processing to map the special case '/index' back
to '/' (i.e. if resourcePath === '/index' return '/'), otherwise keep the
existing trailing-slash behavior; update the code around the resourcePath
computation and the return expression that currently uses
resourcePath.endsWith('/') to handle this '/index' special case.
- Around line 69-75: The current code calls
collapseWhitespace(nhm.translate(html || '')) which removes all newlines and
flattens block-level Markdown (headings, lists, paragraphs); change the call so
whitespace collapsing preserves newline boundaries (e.g., use a variant/option
of collapseWhitespace that does not remove \n or introduce a helper that only
collapses horizontal whitespace but keeps \n), keeping the rest of the flow
(variable markdown, null check, and the final replace(/\n{3,}/g,
'\n\n').trim()). Update the usage of collapseWhitespace in this function to a
newline-preserving variant so block structure produced by nhm.translate is
retained.
In `@ghost/core/core/frontend/services/routing/controllers/entry.js`:
- Around line 74-76: The current code returns next() when markdownEntry is
falsy, which causes valid requests to be treated as 404s; instead, remove the
early return and invoke the existing HTML renderer fallback for the same entry
path (i.e., when markdownEntry is not found call the project's HTML renderer
rather than next()). Locate the check using markdownEntry and replace the
"return next();" with a call into the existing HTML rendering path used
elsewhere in this controller (the HTML renderer function/handler already used
for non-markdown responses), passing the same req, res, and next so content
negotiation falls back to HTML.
---
Nitpick comments:
In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 111-114: The current parallel fetch (const [pages, posts] = await
Promise.all([...])) in `#buildLlmsFullTxt` triggers fetching posts even when pages
already hit the export/truncation budget; change the logic to fetch pages first
via this.#fetchPublicEntries('page'), inspect the pages result for
truncation/budget exhaustion (the same check used later in the function), and
only call this.#fetchPublicEntries('post') if pages did not already consume the
full export budget; apply the same sequential-fetch fix to the other occurrence
noted (the block around lines 129–133) so posts are only fetched when needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f7de417e-e6d2-452e-880a-14a562ee5829
⛔ Files ignored due to path filters (2)
ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snapis excluded by!**/*.snappnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (35)
apps/admin-x-framework/src/test/msw-utils.tsapps/admin-x-framework/src/test/responses/settings.jsonapps/admin-x-settings/src/components/settings/general/general-settings.tsxapps/admin-x-settings/src/components/settings/general/seo-meta.tsxapps/admin-x-settings/test/acceptance/general/seometa.test.tsapps/admin-x-settings/test/acceptance/search.test.tsghost/core/core/frontend/services/llms/handler.jsghost/core/core/frontend/services/llms/markdown.jsghost/core/core/frontend/services/llms/service.jsghost/core/core/frontend/services/routing/controllers/entry.jsghost/core/core/frontend/web/middleware/index.jsghost/core/core/frontend/web/middleware/llms-discovery.jsghost/core/core/frontend/web/middleware/static-theme.jsghost/core/core/frontend/web/site.jsghost/core/core/server/api/endpoints/utils/serializers/input/settings.jsghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-group-mapper.jsghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-type-mapper.jsghost/core/core/server/data/migrations/versions/6.40/2026-04-14-22-07-44-add-llms-setting.jsghost/core/core/server/data/schema/default-settings/default-settings.jsonghost/core/core/server/web/shared/middleware/pretty-urls.jsghost/core/core/shared/config/defaults.jsonghost/core/core/shared/settings-cache/cache-manager.jsghost/core/package.jsonghost/core/test/e2e-api/admin/settings.test.jsghost/core/test/e2e-frontend/llms.test.jsghost/core/test/legacy/models/model-settings.test.jsghost/core/test/unit/frontend/services/llms/handler.test.jsghost/core/test/unit/frontend/services/llms/service.test.jsghost/core/test/unit/frontend/services/routing/controllers/entry.test.jsghost/core/test/unit/frontend/web/middleware/llms-discovery.test.jsghost/core/test/unit/frontend/web/middleware/static-theme.test.jsghost/core/test/unit/server/data/exporter/index.test.jsghost/core/test/unit/server/data/schema/integrity.test.jsghost/core/test/unit/server/web/shared/middleware/pretty-urls.test.jsghost/core/test/utils/fixtures/default-settings.json
✅ Files skipped from review due to trivial changes (5)
- ghost/core/test/legacy/models/model-settings.test.js
- ghost/core/test/unit/server/data/exporter/index.test.js
- ghost/core/test/unit/server/data/schema/integrity.test.js
- ghost/core/core/shared/settings-cache/cache-manager.js
- apps/admin-x-framework/src/test/responses/settings.json
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #27984 +/- ##
==========================================
+ Coverage 73.81% 73.85% +0.03%
==========================================
Files 1523 1527 +4
Lines 128935 129593 +658
Branches 15479 15629 +150
==========================================
+ Hits 95177 95708 +531
- Misses 32799 32923 +124
- Partials 959 962 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/markdown.js (1)
83-89: 💤 Low valueConsider defensive handling for invalid date values.
new Date(value).toISOString()throwsRangeErrorifvalueis a non-falsy but invalid date string. Ghost data is typically validated, but a defensive wrapper would prevent unexpected crashes during markdown generation.Proposed fix
function formatIsoDate(value) { if (!value) { return null; } - return new Date(value).toISOString(); + const date = new Date(value); + return isNaN(date.getTime()) ? null : date.toISOString(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 83 - 89, The function formatIsoDate currently calls new Date(value).toISOString() which will throw for non-falsy but invalid date values; update formatIsoDate to defensively construct a Date object, verify it's valid (e.g., using isNaN(date.getTime()) or Number.isFinite(date.getTime())), and only call toISOString() when valid, returning null (or a safe fallback) when the date is invalid or unparsable; reference the formatIsoDate function and its return path so the change is localized and avoids throwing during markdown generation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 83-89: The function formatIsoDate currently calls new
Date(value).toISOString() which will throw for non-falsy but invalid date
values; update formatIsoDate to defensively construct a Date object, verify it's
valid (e.g., using isNaN(date.getTime()) or Number.isFinite(date.getTime())),
and only call toISOString() when valid, returning null (or a safe fallback) when
the date is invalid or unparsable; reference the formatIsoDate function and its
return path so the change is localized and avoids throwing during markdown
generation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0ab175cb-86dc-43c1-9288-644354929ff3
⛔ Files ignored due to path filters (2)
ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snapis excluded by!**/*.snapghost/core/test/e2e-api/members/__snapshots__/well-known.test.js.snapis excluded by!**/*.snap
📒 Files selected for processing (4)
ghost/core/core/frontend/services/llms/markdown.jsghost/core/core/frontend/services/llms/service.jsghost/core/core/frontend/services/routing/controllers/entry.jsghost/core/core/frontend/web/middleware/llms-discovery.js
Ghost now exposes llms.txt and per-entry markdown for public content, with a matching admin setting so sites can disable the feature without theme overrides. The setting defaults to on, redirects llms-specific URLs back to canonical HTML when disabled, and keeps the markdown export subdirectory-aware and strictly public-only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The stub returned false for every settingsCache.get() call, which made llms_enabled === false and caused the middleware to bail out before adding discovery headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getByLabel('Search') resolved to 2 elements because both the sidebar
search input and the SEO Meta 'Search' tab matched. Use
getByPlaceholder('Search settings') for an unambiguous selector.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
routerManager was removed from the entry controller on main. The markdown content negotiation test no longer needs it — the controller reads resourceType from res.routerOptions directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The llms-discovery middleware adds Link and X-Llms-Txt headers to all siteApp responses, including the /members/.well-known/jwks.json endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snapshot template literals need double-backslash-quote for literal escaped quotes in serialized values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getMarkdownPath("/") produces "/index.md", but the reverse
getResourcePathFromMarkdownPath("/index.md") returned "/index/"
instead of "/". Now handles the /index.md special case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
collapseWhitespace replaces all whitespace sequences with a single space, which flattens markdown block structure (lists, code blocks, paragraphs) into one line. The node-html-markdown output already has proper structure; only excessive blank lines need normalizing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The truncation note was appended after the size check, so the final output could exceed 5 MiB. Now the budget accounts for the footer size upfront. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When fetchPublicEntry returns null during content negotiation, the controller called next() which skipped the resolved entry and could 404. Now falls through to normal HTML rendering instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
appendHeaderValue checked array elements with includes(), but a single header string like "a, b" is one element and would not match "b". Now splits comma-delimited values before checking for duplicates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… entry The llms_enabled migration now lives in 6.40 (from merged PR #27995). Removed the old 6.31 migration and a duplicate snapshot entry left by the rebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8d87f2c to
3e154ee
Compare
… to async Reverted admin UI toggle changes (general-settings keywords, seo-meta toggle, seometa acceptance test) - these should ship in a follow-up PR. Converted entry controller markdown test from done() callback to async/await to match the vitest migration on main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The admin UI belongs in this PR since this is where the setting starts controlling behavior (llms.txt generation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These were left over from the original feature commit - the setting is already in the correct position from the merged PR #27995. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ng in cache-manager JSDoc Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Closing in favour of #28042 |
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.
Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.
Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)
Ref #27984
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
llms.txtandllms-full.txtgeneration plus per-entry Markdown export for public posts and pagesllms_enabledsite setting and expose it in Admin under Meta data > Search.txtand.mdroutes back to canonical HTML routes when the setting is disabledRebased from #27400 onto latest main (resolved conflicts in general-settings.tsx, entry.test.js, integrity.test.js).
Known CI fixes still needed
Must fix (CI blockers)
entryparam —markdown.js:110getLlmsIndexUrl(entry)never usesentry, failsno-unused-varsllms-discovery.test.js:15stubssettingsCache.getto returnfalsefor all keys, which makesllms_enabled === falseand the middleware bails out before adding headers. Stub needs per-key valuesexporter/index.test.js:232expects 110 settings butllms_enabledbumps it to 111 (likely higher after rebase). Count needs updatingtest/e2e-api/admin/settings.test.js:69snapshot doesn't includellms_enabled. Needs regeneratinggetByLabel('Search')resolves to 2 elements after adding'ai search engines'to metadata keywords. Test selector needs to be more specificrouterManagerwas removed from entry controller and its test on main. The new markdown test at line ~252 still stubsrouterManagerGetResourceByIdStub. Needs rewriting without routerManagermodel-settings.test.jsSETTINGS_LENGTH needs bumping forllms_enabledllms_enabledbeforemeta_title, not just onellms_enabledin responseLinkandX-Llms-Txtheaders to all siteApp responses including/members/.well-known/jwks.jsonNice to have (code quality)
collapseWhitespaceinmarkdownFromHtml—markdown.js:68-76wrapsnhm.translate()withcollapseWhitespace()which replaces all\s+with a single space, flattening markdown block structure (lists, code blocks, paragraphs) into one line. Remove the wrapper and only normalize excessive blank linesgetResourcePathFromMarkdownPath('/index.md')returns/index/— should return/for root-route reversibility sincegetMarkdownPath('/')produces/index.md#buildLlmsFullTxtpreloads all post bodies — fetches all pages and posts withlimit: 'all'includinghtml/plaintextcolumns in parallel before the 5 MiB bounding logic runs. On large sites this is a lot of memory. Consider sequential fetching with a budget (requires larger refactor)#appendBoundedSectionstops atFIVE_MIBbut the truncation note is appended afterward. Reserve space for the footernext()on null markdown —entry.js:96-98callsnext()whenfetchPublicEntryreturns null during content negotiation, which skips the resolved entry and can 404. Should fall through to normal HTML renderingllms-discovery.js:4-15appendHeaderValuechecksvalues.includes(newValue)on the whole string, not individual comma-separated entries