Skip to content

Add llms.txt exports and AI crawler toggle#27984

Closed
ErisDS wants to merge 20 commits into
mainfrom
codex/llms-setting-pr
Closed

Add llms.txt exports and AI crawler toggle#27984
ErisDS wants to merge 20 commits into
mainfrom
codex/llms-setting-pr

Conversation

@ErisDS
Copy link
Copy Markdown
Member

@ErisDS ErisDS commented May 20, 2026

Summary

  • Add llms.txt and llms-full.txt generation plus per-entry Markdown export for public posts and pages
  • Add an llms_enabled site setting and expose it in Admin under Meta data > Search
  • Redirect llms-specific .txt and .md routes back to canonical HTML routes when the setting is disabled

Rebased from #27400 onto latest main (resolved conflicts in general-settings.tsx, entry.test.js, integrity.test.js).

image

Known CI fixes still needed

Must fix (CI blockers)

  • Lint: unused entry parammarkdown.js:110 getLlmsIndexUrl(entry) never uses entry, fails no-unused-vars
  • Unit test: llms-discovery stubllms-discovery.test.js:15 stubs settingsCache.get to return false for all keys, which makes llms_enabled === false and the middleware bails out before adding headers. Stub needs per-key values
  • Unit test: exporter settings countexporter/index.test.js:232 expects 110 settings but llms_enabled bumps it to 111 (likely higher after rebase). Count needs updating
  • Acceptance test: settings API snapshottest/e2e-api/admin/settings.test.js:69 snapshot doesn't include llms_enabled. Needs regenerating
  • Admin-X Settings: search testgetByLabel('Search') resolves to 2 elements after adding 'ai search engines' to metadata keywords. Test selector needs to be more specific
  • Entry controller test: routerManager removedrouterManager was removed from entry controller and its test on main. The new markdown test at line ~252 still stubs routerManagerGetResourceByIdStub. Needs rewriting without routerManager
  • Legacy test: settings countmodel-settings.test.js SETTINGS_LENGTH needs bumping for llms_enabled
  • Settings API snapshot: all 19 blocks — all snapshot blocks needed llms_enabled before meta_title, not just one
  • Stripe settings header snapshot — content-length changed from 5343 to 5379 due to llms_enabled in response
  • JWKS header snapshot — llms-discovery middleware adds Link and X-Llms-Txt headers to all siteApp responses including /members/.well-known/jwks.json

Nice to have (code quality)

  • collapseWhitespace in markdownFromHtmlmarkdown.js:68-76 wraps nhm.translate() with collapseWhitespace() which replaces all \s+ with a single space, flattening markdown block structure (lists, code blocks, paragraphs) into one line. Remove the wrapper and only normalize excessive blank lines
  • getResourcePathFromMarkdownPath('/index.md') returns /index/ — should return / for root-route reversibility since getMarkdownPath('/') produces /index.md
  • #buildLlmsFullTxt preloads all post bodies — fetches all pages and posts with limit: 'all' including html/plaintext columns in parallel before the 5 MiB bounding logic runs. On large sites this is a lot of memory. Consider sequential fetching with a budget (requires larger refactor)
  • Truncation footer can exceed 5 MiB cap#appendBoundedSection stops at FIVE_MIB but the truncation note is appended afterward. Reserve space for the footer
  • Entry controller next() on null markdownentry.js:96-98 calls next() when fetchPublicEntry returns null during content negotiation, which skips the resolved entry and can 404. Should fall through to normal HTML rendering
  • Link header dedup misses comma-delimited formllms-discovery.js:4-15 appendHeaderValue checks values.includes(newValue) on the whole string, not individual comma-separated entries

@ErisDS ErisDS added the migration [pull request] Includes migration for review label May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

It looks like this PR contains a migration 👀
Here's the checklist for reviewing migrations:

General requirements

  • ⚠️ Tested performance on staging database servers, as performance on local machines is not comparable to a production environment
  • Satisfies idempotency requirement (both up() and down())
  • Does not reference models
  • Filename is in the correct format (and correctly ordered)
  • Targets the next minor version
  • All code paths have appropriate log messages
  • Uses the correct utils
  • Contains a minimal changeset
  • Does not mix DDL/DML operations
  • Tested in MySQL and SQLite

Schema changes

  • Both schema change and related migration have been implemented
  • For index changes: has been performance tested for large tables
  • For new tables/columns: fields use the appropriate predefined field lengths
  • For new tables/columns: field names follow the appropriate conventions
  • Does not drop a non-alpha table outside of a major version

Data changes

  • Mass updates/inserts are batched appropriately
  • Does not loop over large tables/datasets
  • Defends against missing or invalid data
  • For settings updates: follows the appropriate guidelines

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds LLM export/discovery: a new site setting (llms_enabled), admin toggle and fixtures, LlmsService producing cached llms.txt and llms-full.txt with size bounding, markdown rendering utilities, HTTP handlers for /llms.txt, /llms-full.txt and .md entry routes, discovery middleware, frontend wiring (pretty-URLs skip, static-theme fallthrough, site middleware and entry controller negotiation), defaults/config, migration, dependency addition, and comprehensive unit/e2e/acceptance tests.

Suggested reviewers

  • kevinansfield
  • vershwal
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding llms.txt exports and an AI crawler toggle feature for the Ghost platform.
Description check ✅ Passed The description is well-detailed and directly related to the changeset, explaining the new llms.txt/llms-full.txt generation, the llms_enabled setting, and redirect behavior when disabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/llms-setting-pr

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/service.js (1)

88-91: 🏗️ Heavy lift

Avoid parallel full-body loading of all pages and posts.

Fetching both collections concurrently with limit: 'all' and body fields raises peak memory pressure on large sites.

Refactor direction
-        const [pages, posts] = await Promise.all([
-            this.#fetchPublicEntries('page'),
-            this.#fetchPublicEntries('post')
-        ]);
+        const pages = await this.#fetchPublicEntries('page');
+        const posts = await this.#fetchPublicEntries('post');

Then consider chunked iteration/pagination inside #fetchPublicEntries so entries are appended progressively within the byte budget.

Also applies to: 111-114, 234-239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/llms/service.js` around lines 88 - 91, The
current code loads all pages and posts in parallel with full bodies, causing
peak memory pressure; update the two-call pattern that uses
this.#fetchPublicEntries('page') and this.#fetchPublicEntries('post') so you do
not fetch both collections fully in parallel and avoid using limit: 'all' with
body fields. Modify `#fetchPublicEntries` to support chunked pagination/iteration
(e.g., page/limit or cursor-based) and have the callers fetch one collection at
a time and append entries progressively within a configurable byte/item budget;
apply the same change to the other call sites that fetch pages and posts
concurrently (the other invocations of this.#fetchPublicEntries) so all
consumers stream or paginate entries instead of loading full bodies into memory
at once.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 110-112: The function getLlmsIndexUrl has an unused parameter
entry causing a lint error; remove the unused parameter from its declaration
(change function getLlmsIndexUrl(entry) to function getLlmsIndexUrl()) and
update any call sites that pass an argument to call it without parameters; also
apply the same removal for the duplicate occurrence referenced at the 141-141
location so both definitions/signatures no longer include the unused entry
parameter.
- Around line 43-50: The function getResourcePathFromMarkdownPath currently maps
'/index.md' to '/index/' instead of '/'—modify it so after stripping the '.md'
suffix (in getResourcePathFromMarkdownPath) you check if the resulting
resourcePath is '' or '/index' and return '/' (with trailing slash) in those
cases; otherwise keep the existing behavior of ensuring the returned path ends
with a '/' by returning resourcePath + '/' when needed.
- Around line 68-76: The function markdownFromHtml currently calls
collapseWhitespace on nhm.translate(html) which flattens line breaks and
destroys markdown structure; instead, call nhm.translate(html || '') directly,
normalize CRLF to \n and then collapse only horizontal whitespace within each
line (not across newlines) or remove collapseWhitespace entirely so
headings/lists/paragraph breaks are preserved; keep the existing
.replace(/\n{3,}/g, '\n\n').trim() step. Update markdownFromHtml to use
nhm.translate and an in-line normalization that preserves \n (or a helper that
collapses spaces per-line) rather than collapseWhitespace.

In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 135-137: The truncation footer is appended after the 5 MiB cap
check which lets the final output exceed the declared limit; update the assembly
logic in service.js (the code that sets output and checks wasTruncated) to
reserve space for the footer before enforcing the 5 MiB cap — i.e., compute the
footer length (the string starting with '\n_Truncated after 5 MiB..._\n'),
subtract that from the max-bytes limit when truncating/assembling output, and
only append the footer if it fits within the reserved space (or trim the main
output to make room) so the final payload never exceeds the declared 5 MiB cap.

In `@ghost/core/core/frontend/services/routing/controllers/entry.js`:
- Around line 74-76: The current early return when markdownEntry is falsy (the
"if (!markdownEntry) { return next(); }" branch) causes a 404 during content
negotiation; remove the return-next short-circuit so the controller falls back
to the normal entry rendering path (i.e., stop calling next() here and allow the
subsequent standard entry rendering logic to run), or explicitly invoke the
standard entry renderer instead of next() so missing markdown simply uses the
existing entry rendering flow.

In `@ghost/core/core/frontend/web/middleware/llms-discovery.js`:
- Around line 4-16: The appendHeaderValue function treats a comma-delimited
header string as a single token, allowing duplicate relations to be appended;
update appendHeaderValue to split existingValue (and newValue if it may contain
commas) by ',' into trimmed tokens, deduplicate by checking tokens membership,
and then return a comma+space joined string of the merged tokens; keep the
function name appendHeaderValue and ensure it handles both string and array
existingValue inputs and preserves order while avoiding duplicates.

In `@ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js`:
- Around line 210-214: The test references a nonexistent symbol
routerManagerGetResourceByIdStub which causes a ReferenceError; remove that
stale call or replace it with a correctly declared stub for
routerManager.getResourceById (e.g., use the existing routerManager stub used
elsewhere in the test or create a sinon stub named
routerManagerGetResourceByIdStub that stubs routerManager.getResourceById) so
the test no longer references an undeclared identifier and can proceed to
assertions.

In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js`:
- Line 15: The test currently stubs settingsCache.get to always return false
which forces llms_enabled to false and short-circuits discovery header checks;
update the stub in the public-site test so settingsCache.get returns values per
key (e.g. return true for 'llms_enabled' and appropriate values for other keys)
by using a callsFake/implementation that switches on the key, ensuring
llms_enabled is true so the discovery headers code path is exercised (refer to
settingsCache.get and the llms_enabled check in the llms-discovery test).

---

Nitpick comments:
In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 88-91: The current code loads all pages and posts in parallel with
full bodies, causing peak memory pressure; update the two-call pattern that uses
this.#fetchPublicEntries('page') and this.#fetchPublicEntries('post') so you do
not fetch both collections fully in parallel and avoid using limit: 'all' with
body fields. Modify `#fetchPublicEntries` to support chunked pagination/iteration
(e.g., page/limit or cursor-based) and have the callers fetch one collection at
a time and append entries progressively within a configurable byte/item budget;
apply the same change to the other call sites that fetch pages and posts
concurrently (the other invocations of this.#fetchPublicEntries) so all
consumers stream or paginate entries instead of loading full bodies into memory
at once.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb3038e0-8514-4fde-bfd7-30a0d0bb1b1f

📥 Commits

Reviewing files that changed from the base of the PR and between 255f180 and 0dc9307.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (31)
  • apps/admin-x-framework/src/test/msw-utils.ts
  • apps/admin-x-framework/src/test/responses/settings.json
  • apps/admin-x-settings/src/components/settings/general/general-settings.tsx
  • apps/admin-x-settings/src/components/settings/general/seo-meta.tsx
  • apps/admin-x-settings/test/acceptance/general/seometa.test.ts
  • ghost/core/core/frontend/services/llms/handler.js
  • ghost/core/core/frontend/services/llms/markdown.js
  • ghost/core/core/frontend/services/llms/service.js
  • ghost/core/core/frontend/services/routing/controllers/entry.js
  • ghost/core/core/frontend/web/middleware/index.js
  • ghost/core/core/frontend/web/middleware/llms-discovery.js
  • ghost/core/core/frontend/web/middleware/static-theme.js
  • ghost/core/core/frontend/web/site.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/settings.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-group-mapper.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-type-mapper.js
  • ghost/core/core/server/data/migrations/versions/6.31/2026-04-14-22-07-44-add-llms-setting.js
  • ghost/core/core/server/data/schema/default-settings/default-settings.json
  • ghost/core/core/server/web/shared/middleware/pretty-urls.js
  • ghost/core/core/shared/config/defaults.json
  • ghost/core/core/shared/settings-cache/cache-manager.js
  • ghost/core/package.json
  • ghost/core/test/e2e-frontend/llms.test.js
  • ghost/core/test/unit/frontend/services/llms/handler.test.js
  • ghost/core/test/unit/frontend/services/llms/service.test.js
  • ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js
  • ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js
  • ghost/core/test/unit/frontend/web/middleware/static-theme.test.js
  • ghost/core/test/unit/server/data/schema/integrity.test.js
  • ghost/core/test/unit/server/web/shared/middleware/pretty-urls.test.js
  • ghost/core/test/utils/fixtures/default-settings.json

Comment thread ghost/core/core/frontend/services/llms/markdown.js
Comment thread ghost/core/core/frontend/services/llms/markdown.js
Comment thread ghost/core/core/frontend/services/llms/markdown.js Outdated
Comment thread ghost/core/core/frontend/services/llms/service.js
Comment thread ghost/core/core/frontend/services/routing/controllers/entry.js
Comment thread ghost/core/core/frontend/web/middleware/llms-discovery.js
Comment thread ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js Outdated
Comment thread ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js (1)

15-21: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stub llms_enabled explicitly in this test path.

This stub still leaves llms_enabled unset (null), which can short-circuit discovery header behavior and make the test miss the intended code path.

Suggested fix
 sinon.stub(settingsCache, 'get').callsFake((key) => {
     if (key === 'is_private') {
         return false;
     }
+    if (key === 'llms_enabled') {
+        return true;
+    }

     return null;
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js` around
lines 15 - 21, The settingsCache.get stub used in the test leaves 'llms_enabled'
as null, which can short-circuit the discovery header path; update the
sinon.stub(settingsCache, 'get').callsFake callback to explicitly return the
intended test value for 'llms_enabled' (e.g., return true or false depending on
the test scenario) alongside the existing 'is_private' handling so the llms
discovery code path is exercised (refer to the stub in the test where
settingsCache.get is defined).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js`:
- Around line 15-21: The settingsCache.get stub used in the test leaves
'llms_enabled' as null, which can short-circuit the discovery header path;
update the sinon.stub(settingsCache, 'get').callsFake callback to explicitly
return the intended test value for 'llms_enabled' (e.g., return true or false
depending on the test scenario) alongside the existing 'is_private' handling so
the llms discovery code path is exercised (refer to the stub in the test where
settingsCache.get is defined).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c098ceb3-3568-4d22-a917-5092839a6a6e

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc9307 and d335381.

⛔ Files ignored due to path filters (1)
  • ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snap is excluded by !**/*.snap
📒 Files selected for processing (6)
  • apps/admin-x-settings/test/acceptance/search.test.ts
  • ghost/core/core/frontend/services/llms/markdown.js
  • ghost/core/test/e2e-api/admin/settings.test.js
  • ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js
  • ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js
  • ghost/core/test/unit/server/data/exporter/index.test.js
💤 Files with no reviewable changes (1)
  • ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js
✅ Files skipped from review due to trivial changes (2)
  • ghost/core/test/e2e-api/admin/settings.test.js
  • ghost/core/test/unit/server/data/exporter/index.test.js

@ErisDS ErisDS force-pushed the codex/llms-setting-pr branch from 9ff9b66 to 35b8909 Compare May 20, 2026 10:54
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
ghost/core/core/frontend/services/llms/markdown.js (2)

48-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map /index.md back to root route.

Line 48/49 currently resolves /index.md to /index/, which breaks reversible routing for the root page.

Proposed fix
 function getResourcePathFromMarkdownPath(pathname) {
     if (!pathname || !pathname.endsWith('.md')) {
         return null;
     }
 
+    if (pathname === '/index.md') {
+        return '/';
+    }
+
     const resourcePath = pathname.slice(0, -3) || '/';
     return resourcePath.endsWith('/') ? resourcePath : `${resourcePath}/`;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 48 - 49, The
current logic computes resourcePath from pathname.slice(0, -3) and then forces a
trailing slash, which turns '/index.md' into '/index/'; change the
post-processing to map the special case '/index' back to '/' (i.e. if
resourcePath === '/index' return '/'), otherwise keep the existing
trailing-slash behavior; update the code around the resourcePath computation and
the return expression that currently uses resourcePath.endsWith('/') to handle
this '/index' special case.

69-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve markdown block structure during HTML conversion.

Line 69 collapses all whitespace (including newlines), flattening headings/lists/paragraphs in generated markdown.

Proposed fix
 function markdownFromHtml(html) {
-    const markdown = collapseWhitespace(nhm.translate(html || ''));
+    const markdown = (nhm.translate(html || '') || '')
+        .replace(/\r\n/g, '\n')
+        .trim();
 
     if (!markdown) {
         return null;
     }
 
-    return markdown.replace(/\n{3,}/g, '\n\n').trim();
+    return markdown
+        .replace(/[ \t]+\n/g, '\n')
+        .replace(/\n{3,}/g, '\n\n')
+        .trim();
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 69 - 75, The
current code calls collapseWhitespace(nhm.translate(html || '')) which removes
all newlines and flattens block-level Markdown (headings, lists, paragraphs);
change the call so whitespace collapsing preserves newline boundaries (e.g., use
a variant/option of collapseWhitespace that does not remove \n or introduce a
helper that only collapses horizontal whitespace but keeps \n), keeping the rest
of the flow (variable markdown, null check, and the final replace(/\n{3,}/g,
'\n\n').trim()). Update the usage of collapseWhitespace in this function to a
newline-preserving variant so block structure produced by nhm.translate is
retained.
ghost/core/core/frontend/services/routing/controllers/entry.js (1)

74-76: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fallback to normal entry rendering when markdown fetch misses.

Line 75 returning next() can turn a valid entry request into a 404 during content negotiation. Use the existing HTML renderer fallback instead.

Proposed fix
                 return llmsService.fetchPublicEntry(res.routerOptions.resourceType, entry.id)
                     .then((markdownEntry) => {
                         if (!markdownEntry) {
-                            return next();
+                            return renderer.renderEntry(req, res)(entry);
                         }
 
                         res.type(markdownContentType);
                         res.send(renderEntryMarkdown(markdownEntry));
                     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/routing/controllers/entry.js` around lines
74 - 76, The current code returns next() when markdownEntry is falsy, which
causes valid requests to be treated as 404s; instead, remove the early return
and invoke the existing HTML renderer fallback for the same entry path (i.e.,
when markdownEntry is not found call the project's HTML renderer rather than
next()). Locate the check using markdownEntry and replace the "return next();"
with a call into the existing HTML rendering path used elsewhere in this
controller (the HTML renderer function/handler already used for non-markdown
responses), passing the same req, res, and next so content negotiation falls
back to HTML.
🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/service.js (1)

111-114: 🏗️ Heavy lift

Avoid preloading posts when pages already consume the full export budget.

#buildLlmsFullTxt() loads pages and posts in parallel, but posts are unnecessary if the pages section already truncates. Deferring post fetch until after the pages budget check would cut heavy work on large sites.

Proposed direction
-        const [pages, posts] = await Promise.all([
-            this.#fetchPublicEntries('page'),
-            this.#fetchPublicEntries('post')
-        ]);
+        const pages = await this.#fetchPublicEntries('page');

         ...
         if (!wasTruncated) {
+            const posts = await this.#fetchPublicEntries('post');
             const postSection = this.#appendBoundedSection(output, 'Posts', posts, entry => this.#buildFullEntry(entry));
             output = postSection.output;
             wasTruncated = postSection.wasTruncated;
         }

Also applies to: 129-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/llms/service.js` around lines 111 - 114,
The current parallel fetch (const [pages, posts] = await Promise.all([...])) in
`#buildLlmsFullTxt` triggers fetching posts even when pages already hit the
export/truncation budget; change the logic to fetch pages first via
this.#fetchPublicEntries('page'), inspect the pages result for truncation/budget
exhaustion (the same check used later in the function), and only call
this.#fetchPublicEntries('post') if pages did not already consume the full
export budget; apply the same sequential-fetch fix to the other occurrence noted
(the block around lines 129–133) so posts are only fetched when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ghost/core/test/unit/frontend/services/llms/service.test.js`:
- Around line 125-130: The truncation test needs an explicit byte-size
assertion: after calling service.getLlmsFullTxt() and existing content checks
(llmsFullTxt, /Truncated after 5 MiB/), add an assertion that
Buffer.byteLength(llmsFullTxt, 'utf8') is less than or equal to 5 * 1024 * 1024
(5 MiB) to prevent regressions; update the test that uses llmsFullTxt in
ghost/core/test/unit/frontend/services/llms/service.test.js to include this
byte-length check so the test fails if the payload exceeds the hard cap.

---

Duplicate comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 48-49: The current logic computes resourcePath from
pathname.slice(0, -3) and then forces a trailing slash, which turns '/index.md'
into '/index/'; change the post-processing to map the special case '/index' back
to '/' (i.e. if resourcePath === '/index' return '/'), otherwise keep the
existing trailing-slash behavior; update the code around the resourcePath
computation and the return expression that currently uses
resourcePath.endsWith('/') to handle this '/index' special case.
- Around line 69-75: The current code calls
collapseWhitespace(nhm.translate(html || '')) which removes all newlines and
flattens block-level Markdown (headings, lists, paragraphs); change the call so
whitespace collapsing preserves newline boundaries (e.g., use a variant/option
of collapseWhitespace that does not remove \n or introduce a helper that only
collapses horizontal whitespace but keeps \n), keeping the rest of the flow
(variable markdown, null check, and the final replace(/\n{3,}/g,
'\n\n').trim()). Update the usage of collapseWhitespace in this function to a
newline-preserving variant so block structure produced by nhm.translate is
retained.

In `@ghost/core/core/frontend/services/routing/controllers/entry.js`:
- Around line 74-76: The current code returns next() when markdownEntry is
falsy, which causes valid requests to be treated as 404s; instead, remove the
early return and invoke the existing HTML renderer fallback for the same entry
path (i.e., when markdownEntry is not found call the project's HTML renderer
rather than next()). Locate the check using markdownEntry and replace the
"return next();" with a call into the existing HTML rendering path used
elsewhere in this controller (the HTML renderer function/handler already used
for non-markdown responses), passing the same req, res, and next so content
negotiation falls back to HTML.

---

Nitpick comments:
In `@ghost/core/core/frontend/services/llms/service.js`:
- Around line 111-114: The current parallel fetch (const [pages, posts] = await
Promise.all([...])) in `#buildLlmsFullTxt` triggers fetching posts even when pages
already hit the export/truncation budget; change the logic to fetch pages first
via this.#fetchPublicEntries('page'), inspect the pages result for
truncation/budget exhaustion (the same check used later in the function), and
only call this.#fetchPublicEntries('post') if pages did not already consume the
full export budget; apply the same sequential-fetch fix to the other occurrence
noted (the block around lines 129–133) so posts are only fetched when needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f7de417e-e6d2-452e-880a-14a562ee5829

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff9b66 and 35b8909.

⛔ Files ignored due to path filters (2)
  • ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snap is excluded by !**/*.snap
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (35)
  • apps/admin-x-framework/src/test/msw-utils.ts
  • apps/admin-x-framework/src/test/responses/settings.json
  • apps/admin-x-settings/src/components/settings/general/general-settings.tsx
  • apps/admin-x-settings/src/components/settings/general/seo-meta.tsx
  • apps/admin-x-settings/test/acceptance/general/seometa.test.ts
  • apps/admin-x-settings/test/acceptance/search.test.ts
  • ghost/core/core/frontend/services/llms/handler.js
  • ghost/core/core/frontend/services/llms/markdown.js
  • ghost/core/core/frontend/services/llms/service.js
  • ghost/core/core/frontend/services/routing/controllers/entry.js
  • ghost/core/core/frontend/web/middleware/index.js
  • ghost/core/core/frontend/web/middleware/llms-discovery.js
  • ghost/core/core/frontend/web/middleware/static-theme.js
  • ghost/core/core/frontend/web/site.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/settings.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-group-mapper.js
  • ghost/core/core/server/api/endpoints/utils/serializers/input/utils/settings-key-type-mapper.js
  • ghost/core/core/server/data/migrations/versions/6.40/2026-04-14-22-07-44-add-llms-setting.js
  • ghost/core/core/server/data/schema/default-settings/default-settings.json
  • ghost/core/core/server/web/shared/middleware/pretty-urls.js
  • ghost/core/core/shared/config/defaults.json
  • ghost/core/core/shared/settings-cache/cache-manager.js
  • ghost/core/package.json
  • ghost/core/test/e2e-api/admin/settings.test.js
  • ghost/core/test/e2e-frontend/llms.test.js
  • ghost/core/test/legacy/models/model-settings.test.js
  • ghost/core/test/unit/frontend/services/llms/handler.test.js
  • ghost/core/test/unit/frontend/services/llms/service.test.js
  • ghost/core/test/unit/frontend/services/routing/controllers/entry.test.js
  • ghost/core/test/unit/frontend/web/middleware/llms-discovery.test.js
  • ghost/core/test/unit/frontend/web/middleware/static-theme.test.js
  • ghost/core/test/unit/server/data/exporter/index.test.js
  • ghost/core/test/unit/server/data/schema/integrity.test.js
  • ghost/core/test/unit/server/web/shared/middleware/pretty-urls.test.js
  • ghost/core/test/utils/fixtures/default-settings.json
✅ Files skipped from review due to trivial changes (5)
  • ghost/core/test/legacy/models/model-settings.test.js
  • ghost/core/test/unit/server/data/exporter/index.test.js
  • ghost/core/test/unit/server/data/schema/integrity.test.js
  • ghost/core/core/shared/settings-cache/cache-manager.js
  • apps/admin-x-framework/src/test/responses/settings.json

Comment thread ghost/core/test/unit/frontend/services/llms/service.test.js
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 78.45220% with 142 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.85%. Comparing base (7b87322) to head (8d87f2c).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
ghost/core/core/frontend/services/llms/markdown.js 71.18% 50 Missing and 1 partial ⚠️
ghost/core/core/frontend/services/llms/service.js 83.72% 42 Missing ⚠️
ghost/core/core/frontend/services/llms/handler.js 72.46% 37 Missing and 1 partial ⚠️
...ore/core/frontend/web/middleware/llms-discovery.js 81.57% 6 Missing and 1 partial ⚠️
...ore/frontend/services/routing/controllers/entry.js 83.33% 3 Missing ⚠️
...e/core/server/web/shared/middleware/pretty-urls.js 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #27984      +/-   ##
==========================================
+ Coverage   73.81%   73.85%   +0.03%     
==========================================
  Files        1523     1527       +4     
  Lines      128935   129593     +658     
  Branches    15479    15629     +150     
==========================================
+ Hits        95177    95708     +531     
- Misses      32799    32923     +124     
- Partials      959      962       +3     
Flag Coverage Δ
admin-tests 53.54% <ø> (ø)
e2e-tests 73.85% <78.45%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ghost/core/core/frontend/services/llms/markdown.js (1)

83-89: 💤 Low value

Consider defensive handling for invalid date values.

new Date(value).toISOString() throws RangeError if value is a non-falsy but invalid date string. Ghost data is typically validated, but a defensive wrapper would prevent unexpected crashes during markdown generation.

Proposed fix
 function formatIsoDate(value) {
     if (!value) {
         return null;
     }
 
-    return new Date(value).toISOString();
+    const date = new Date(value);
+    return isNaN(date.getTime()) ? null : date.toISOString();
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/frontend/services/llms/markdown.js` around lines 83 - 89, The
function formatIsoDate currently calls new Date(value).toISOString() which will
throw for non-falsy but invalid date values; update formatIsoDate to defensively
construct a Date object, verify it's valid (e.g., using isNaN(date.getTime()) or
Number.isFinite(date.getTime())), and only call toISOString() when valid,
returning null (or a safe fallback) when the date is invalid or unparsable;
reference the formatIsoDate function and its return path so the change is
localized and avoids throwing during markdown generation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ghost/core/core/frontend/services/llms/markdown.js`:
- Around line 83-89: The function formatIsoDate currently calls new
Date(value).toISOString() which will throw for non-falsy but invalid date
values; update formatIsoDate to defensively construct a Date object, verify it's
valid (e.g., using isNaN(date.getTime()) or Number.isFinite(date.getTime())),
and only call toISOString() when valid, returning null (or a safe fallback) when
the date is invalid or unparsable; reference the formatIsoDate function and its
return path so the change is localized and avoids throwing during markdown
generation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ab175cb-86dc-43c1-9288-644354929ff3

📥 Commits

Reviewing files that changed from the base of the PR and between 35b8909 and 8d87f2c.

⛔ Files ignored due to path filters (2)
  • ghost/core/test/e2e-api/admin/__snapshots__/settings.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/e2e-api/members/__snapshots__/well-known.test.js.snap is excluded by !**/*.snap
📒 Files selected for processing (4)
  • ghost/core/core/frontend/services/llms/markdown.js
  • ghost/core/core/frontend/services/llms/service.js
  • ghost/core/core/frontend/services/routing/controllers/entry.js
  • ghost/core/core/frontend/web/middleware/llms-discovery.js

@ErisDS ErisDS mentioned this pull request May 20, 2026
4 tasks
JohnONolan and others added 14 commits May 20, 2026 18:30
Ghost now exposes llms.txt and per-entry markdown for public content, with a matching admin setting so sites can disable the feature without theme overrides.

The setting defaults to on, redirects llms-specific URLs back to canonical HTML when disabled, and keeps the markdown export subdirectory-aware and strictly public-only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The stub returned false for every settingsCache.get() call, which
made llms_enabled === false and caused the middleware to bail out
before adding discovery headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getByLabel('Search') resolved to 2 elements because both the sidebar
search input and the SEO Meta 'Search' tab matched. Use
getByPlaceholder('Search settings') for an unambiguous selector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
routerManager was removed from the entry controller on main. The
markdown content negotiation test no longer needs it — the controller
reads resourceType from res.routerOptions directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The llms-discovery middleware adds Link and X-Llms-Txt headers to all
siteApp responses, including the /members/.well-known/jwks.json endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snapshot template literals need double-backslash-quote for literal escaped
quotes in serialized values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getMarkdownPath("/") produces "/index.md", but the reverse
getResourcePathFromMarkdownPath("/index.md") returned "/index/"
instead of "/". Now handles the /index.md special case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
collapseWhitespace replaces all whitespace sequences with a single space,
which flattens markdown block structure (lists, code blocks, paragraphs)
into one line. The node-html-markdown output already has proper structure;
only excessive blank lines need normalizing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The truncation note was appended after the size check, so the final
output could exceed 5 MiB. Now the budget accounts for the footer
size upfront.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When fetchPublicEntry returns null during content negotiation, the
controller called next() which skipped the resolved entry and could 404.
Now falls through to normal HTML rendering instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
appendHeaderValue checked array elements with includes(), but a single
header string like "a, b" is one element and would not match "b". Now
splits comma-delimited values before checking for duplicates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… entry

The llms_enabled migration now lives in 6.40 (from merged PR #27995).
Removed the old 6.31 migration and a duplicate snapshot entry left
by the rebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ErisDS ErisDS force-pushed the codex/llms-setting-pr branch from 8d87f2c to 3e154ee Compare May 20, 2026 17:31
ErisDS and others added 4 commits May 20, 2026 19:01
… to async

Reverted admin UI toggle changes (general-settings keywords, seo-meta
toggle, seometa acceptance test) - these should ship in a follow-up PR.
Converted entry controller markdown test from done() callback to
async/await to match the vitest migration on main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The admin UI belongs in this PR since this is where the setting starts
controlling behavior (llms.txt generation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These were left over from the original feature commit - the setting is
already in the correct position from the merged PR #27995.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS and others added 2 commits May 21, 2026 13:45
…ng in cache-manager JSDoc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS added a commit that referenced this pull request May 21, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ErisDS
Copy link
Copy Markdown
Member Author

ErisDS commented May 22, 2026

Closing in favour of #28042

@ErisDS ErisDS closed this May 22, 2026
ErisDS added a commit that referenced this pull request May 27, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS added a commit that referenced this pull request May 27, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS added a commit that referenced this pull request May 27, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS added a commit that referenced this pull request May 28, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ErisDS added a commit that referenced this pull request May 28, 2026
Ghost is moving toward stateless, compute-on-request architecture.
This implements llms.txt/llms-full.txt generation, per-entry .md
export, Accept: text/markdown content negotiation, and discovery
headers using that approach — no in-memory cache, no event listeners,
full dependency injection via factory functions.

Gated behind a config flag ("llms": false by default) so the feature
can be tested before broad rollout. The llms_enabled setting (already
in main) acts as the user-facing toggle within an enabled deployment.

Key design decisions:
- Factory functions (createLlmsService, createLlmsHandler,
  createLlmsDiscovery) receive all dependencies explicitly
- Uses urlServiceFacade for lazy routing compatibility
- Paginated DB queries for llms-full.txt (100/batch, 5 MiB budget)
- Index queries exclude html column to reduce memory
- Cache-Control headers on all responses for CDN/proxy caching
- pretty-urls extension bypass scoped to .md and .txt only
- markdown.js is fully pure (no Ghost singleton imports)

Ref #27984

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

migration [pull request] Includes migration for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants