
Add llms.txt and llms-full.txt generation for LLM-friendly docs #920

Merged
ngnijland merged 9 commits into master from claude/add-llms-txt-docs-CuGVb
Feb 18, 2026

@ngnijland ngnijland commented Feb 11, 2026

Add a build script that generates llms.txt (lightweight index) and llms-full.txt (full documentation content) from the Starlight doc sources. These files follow the llms.txt specification, making the documentation easily consumable by LLMs and indexable by services like Context7 with minimal token usage.

  • llms.txt: structured index with title, description, and URL per page
  • llms-full.txt: all doc content as clean markdown (MDX/HTML stripped)
  • Runs automatically before each build via package.json scripts
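As a rough illustration of the "structured index" bullet above, the sketch below builds an llms.txt-style index from a list of pages. The `DocPage` type, the `buildLlmsIndex` function, the `## Docs` section name, and the example URL are all hypothetical and not taken from the PR's script; the layout (H1 title, blockquote summary, link list) follows the general shape described by the llms.txt proposal.

```typescript
// Hypothetical page shape: the PR does not show how the script models pages.
interface DocPage {
  title: string;
  description: string;
  url: string;
}

// Build a lightweight index: project heading, one-line summary,
// then one markdown link per page with its description.
function buildLlmsIndex(
  projectName: string,
  summary: string,
  pages: DocPage[],
): string {
  const entries = pages.map(
    (page) => `- [${page.title}](${page.url}): ${page.description}`,
  );
  return [
    `# ${projectName}`,
    "",
    `> ${summary}`,
    "",
    "## Docs",
    "",
    ...entries,
    "", // trailing newline at end of file
  ].join("\n");
}

const exampleIndex = buildLlmsIndex("RocketSim", "Docs for RocketSim.", [
  {
    title: "Getting Started",
    description: "Install and launch.",
    url: "https://example.com/docs/getting-started/", // placeholder URL
  },
]);
```

Keeping each entry to a single line is what keeps the index small; the full page bodies would go into llms-full.txt instead.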

claude and others added 6 commits February 11, 2026 15:22
Add a build script that generates llms.txt (lightweight index) and
llms-full.txt (full documentation content) from the Starlight doc
sources. These files follow the llms.txt specification, making the
documentation easily consumable by LLMs and indexable by services
like Context7 with minimal token usage.

- llms.txt: structured index with title, description, and URL per page
- llms-full.txt: all doc content as clean markdown (MDX/HTML stripped)
- Runs automatically before each build via package.json scripts

https://claude.ai/code/session_01Jj2MZELm7URFgydFbwwA8m

Replace the custom build script with the purpose-built
starlight-llms-txt plugin, which generates llms.txt, llms-full.txt,
and llms-small.txt from the rendered Starlight documentation at build
time. This makes the docs easily accessible for LLMs and indexable by
services like Context7 with minimal token usage.

- Remove production guard so Starlight builds docs in all environments
- Add starlight-llms-txt plugin with RocketSim project name/description
- Remove custom generate-llms-txt.mjs script (replaced by plugin)
- Revert package.json build script and .gitignore changes

https://claude.ai/code/session_01Jj2MZELm7URFgydFbwwA8m
@ngnijland ngnijland force-pushed the claude/add-llms-txt-docs-CuGVb branch from 98a6fe7 to bb2d730 Compare February 18, 2026 14:00
@ngnijland ngnijland requested a review from Copilot February 18, 2026 14:12
@ngnijland ngnijland marked this pull request as ready for review February 18, 2026 14:12

Copilot AI left a comment

Pull request overview

This PR adds LLM-friendly documentation generation by integrating the starlight-llms-txt plugin and creating a custom post-processing integration. The implementation generates two text file variants (llms-full.txt and llms-small.txt) that follow the llms.txt specification, making documentation easily consumable by LLMs and indexable by services like Context7.

Changes:

  • Integrated starlight-llms-txt plugin (v0.7.0) to generate base llms documentation files
  • Created custom post-processing integration to clean and transform generated content (removes home/404 pages, converts JSX components to markdown links, handles directives)
  • Fixed escaped markdown formatting in multiple documentation files
  • Added missing status bar image file and corrected broken image reference

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| docs/astro.config.ts | Added starlight-llms-txt plugin configuration and custom post-process integration |
| docs/src/integrations/llms-txt-post-process.ts | New integration that post-processes generated llms files with content transformations and cleanup |
| docs/package.json | Added starlight-llms-txt dependency (v0.7.0) |
| docs/package-lock.json | Dependency lock updates for starlight-llms-txt and its transitive dependencies |
| docs/src/content/docs/docs/features/app-actions/user-defaults-editor.mdx | Fixed escaped markdown formatting in link text |
| docs/src/content/docs/docs/features/app-actions/network-speed-control-and-simulator-airplane-mode.mdx | Fixed escaped markdown formatting in link text |
| docs/src/content/docs/docs/features/app-actions/general-app-actions.mdx | Fixed escaped markdown formatting in multiple link texts |
| docs/src/content/docs/docs/features/capturing/statusbar-appearance.md | Fixed broken image reference |
| docs/src/content/docs/docs/features/capturing/statusbar-appearance/status_bar_override_9_41-1024x416.jpg | Added missing status bar screenshot image |
| docs/src/styles/starlight-custom.css | Increased search modal max-width from 40rem to 45rem |
Files not reviewed (1)
  • docs/package-lock.json: Language not supported


/^:::(tip|note|caution|danger)\n([\s\S]*?)^:::/gm,
(_match, type: string, content: string) => {
const label = type.charAt(0).toUpperCase() + type.slice(1);
const quoted = content

Copilot AI Feb 18, 2026

The regex pattern uses ^::: anchors which require the delimiters to be at the start of a line. However, the multiline flag m is used, and the pattern [\s\S]*? in the middle is non-greedy. If the closing ::: is not at the start of a line (e.g., indented or has trailing content on the same line), this pattern won't match correctly. Consider whether the closing delimiter should strictly be at the start of a line, or if the pattern should be adjusted to handle cases where it might be indented or have other content.
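To make the concern above concrete, the sketch below compares the strict pattern quoted from the PR with a hypothetical tolerant variant that accepts leading whitespace before either fence. The `tolerantDirective` pattern, the `countDirectives` helper, and the sample strings are illustrative assumptions, not code from the PR.

```typescript
// Strict pattern as quoted in the review: both ":::" fences must start at column 0.
const strictDirective = /^:::(tip|note|caution|danger)\n([\s\S]*?)^:::/gm;

// Hypothetical tolerant variant (an assumption, not the PR's code):
// allows spaces or tabs before each fence and requires the closing
// fence to be the only thing on its line.
const tolerantDirective =
  /^[ \t]*:::(tip|note|caution|danger)[ \t]*\n([\s\S]*?)^[ \t]*:::[ \t]*$/gm;

// Count how many directive blocks a pattern finds in the source text.
// With the global flag, String.prototype.match returns every match (or null).
function countDirectives(pattern: RegExp, source: string): number {
  return (source.match(pattern) ?? []).length;
}

// A directive flush with the left margin.
const flush = ":::tip\nUse the shortcut.\n:::\n";

// A directive indented inside a list item.
const indented = "- item\n  :::note\n  Nested note.\n  :::\n";

const strictFlush = countDirectives(strictDirective, flush);
const strictIndented = countDirectives(strictDirective, indented);
const tolerantFlush = countDirectives(tolerantDirective, flush);
const tolerantIndented = countDirectives(tolerantDirective, indented);
```

Under these inputs the strict pattern finds the flush directive but skips the indented one, while the tolerant variant finds both. Whether indented directives actually occur in the generated files is a question for the docs sources, so the stricter pattern may well be sufficient.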

Copilot uses AI. Check for mistakes.
border-radius: 8px;
background: var(--sl-color-gray-6);
- max-width: 40rem;
+ max-width: 45rem;

Copilot AI Feb 18, 2026

This CSS change increases the max-width from 40rem to 45rem, but it's not mentioned in the PR description and doesn't seem related to the llms.txt generation feature. If this is an intentional improvement to the search modal width, it should be mentioned in the PR description. If it's an accidental change, consider reverting it or creating a separate PR for this UI adjustment.

Comment on lines +54 to +57
// Filter out unwanted pages
const filteredPages = pages.filter((page) => {
const titleMatch = page.match(/^# (.+)/);
if (!titleMatch) return true;

Copilot AI Feb 18, 2026

The split regex uses a positive lookahead (?=^# ) with the multiline flag m, which splits on positions just before lines starting with "# ". However, this could potentially create empty strings in the array if there are consecutive matches. While the filter on line 58 checks for titleMatch, if a page section doesn't have a title (titleMatch is null), it's still kept (returns true on line 60). Consider whether pages without titles should be kept or filtered out, and add explicit handling for edge cases like empty sections or malformed content.

Suggested change
- // Filter out unwanted pages
- const filteredPages = pages.filter((page) => {
-   const titleMatch = page.match(/^# (.+)/);
-   if (!titleMatch) return true;
+ // Filter out unwanted pages and empty/malformed sections
+ const filteredPages = pages.filter((page) => {
+   const trimmed = page.trim();
+   // Drop empty or whitespace-only sections (can be created by the split regex)
+   if (!trimmed) return false;
+   const titleMatch = page.match(/^# (.+)/);
+   if (!titleMatch) {
+     // Keep non-empty sections without an explicit "# " title
+     return true;
+   }
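A self-contained sketch of the split-then-filter step discussed above may help when reasoning about the edge cases. The `filterPages` function name, the excluded titles, and the sample text are hypothetical; only the lookahead split `(?=^# )` with the `m` flag and the `/^# (.+)/` title match come from the review.

```typescript
// Split the combined file into per-page chunks and drop unwanted ones.
// `excludedTitles` is a hypothetical parameter; the PR's actual list is not shown.
function filterPages(fullText: string, excludedTitles: Set<string>): string[] {
  // Zero-width lookahead: split just before every line that starts with "# ",
  // so each chunk keeps its own heading.
  const pages = fullText.split(/(?=^# )/m);

  return pages.filter((page) => {
    // Drop empty or whitespace-only chunks.
    if (!page.trim()) return false;

    // No "m" flag here, so "^" only matches at the start of the chunk.
    const titleMatch = page.match(/^# (.+)/);

    // Keep non-empty chunks without a heading (for example, a preamble).
    if (!titleMatch) return true;

    const title = (titleMatch[1] ?? "").trim();
    return !excludedTitles.has(title);
  });
}

const sample =
  "Intro text\n\n# Home\nWelcome\n\n# Guide\nSteps\n\n# 404\nNot found\n";

const kept = filterPages(sample, new Set(["Home", "404"]));
```

With this sample, `kept` holds the untitled preamble and the "Guide" page. One limitation to keep in mind: the split also fires on any line starting with `# ` inside a page body, such as a comment in a shell snippet, so real content may need a stricter page separator.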

Comment on lines +106 to +111
const quoted = content
.trimEnd()
.split("\n")
.map((line) => `> ${line}`)
.join("\n");
return `> **${label}:** ${quoted.slice(2)}`;

Copilot AI Feb 18, 2026

The blockquote generation logic on line 114 uses quoted.slice(2) to remove the first "> " from the quoted string. However, this assumes that the quoted string always starts with "> ", which is true because line 112 maps each line to add the "> " prefix. But the logic is a bit confusing because it adds "> " to all lines including the first one, then slices off the first 2 characters. Consider refactoring to make the intent clearer, such as handling the first line separately or using a more explicit approach like: > **${label}:** concatenated with the content lines properly quoted.

Suggested change
- const quoted = content
-   .trimEnd()
-   .split("\n")
-   .map((line) => `> ${line}`)
-   .join("\n");
- return `> **${label}:** ${quoted.slice(2)}`;
+ const lines = content.trimEnd().split("\n");
+ const [firstLine, ...restLines] = lines;
+ const quotedLines: string[] = [];
+ if (firstLine !== undefined) {
+   quotedLines.push(`> **${label}:** ${firstLine}`);
+ }
+ for (const line of restLines) {
+   quotedLines.push(`> ${line}`);
+ }
+ return quotedLines.join("\n");
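For reference, here is a compact standalone version of the directive-to-blockquote conversion that handles the first line separately, in the spirit of the suggestion above. The function name `directiveToBlockquote` is hypothetical; the label capitalization and the `> **Label:**` prefix follow the snippets quoted in this thread.

```typescript
// Convert the body of a ":::type ... :::" directive into a markdown blockquote.
// The first line carries the bold label; every later line gets a plain "> " prefix.
function directiveToBlockquote(type: string, content: string): string {
  // "tip" -> "Tip"
  const label = type.charAt(0).toUpperCase() + type.slice(1);

  // split() always returns at least one element, but the default keeps
  // the destructuring type-safe under strict index checks.
  const [firstLine = "", ...restLines] = content.trimEnd().split("\n");

  return [
    `> **${label}:** ${firstLine}`,
    ...restLines.map((line) => `> ${line}`),
  ].join("\n");
}

const quotedTip = directiveToBlockquote("tip", "Line one\nLine two\n");
```

For the inputs tried here, this yields the same string as the original `quoted.slice(2)` approach, so the change is about readability rather than behavior.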

Comment thread docs/astro.config.ts
Comment on lines +46 to +53
plugins: [
starlightLlmsTxt({
projectName: "RocketSim",
description:
"RocketSim enhances the iOS Simulator with features for capturing screenshots and recordings, comparing designs, testing push notifications, deep links, location simulation, network speed control, accessibility toggles, and more.",
rawContent: true,
}),
],

Copilot AI Feb 18, 2026

The PR description mentions generating "llms.txt (lightweight index)" but the actual implementation generates "llms-small.txt" and "llms-full.txt" files. Consider updating either the PR description to match the implementation, or ensure that the starlight-llms-txt plugin actually generates a file named "llms.txt" as mentioned in the description. The discrepancy could confuse users trying to understand what files are being generated.

result = result.replace(/<Tweet[\s\S]*?\/>/g, (match) => {
const idMatch = match.match(/id="([^"]+)"/);
const id = idMatch?.[1];
return id ? `https://x.com/x/status/${id}` : match;

Copilot AI Feb 18, 2026

The generated X/Twitter link URL appears to be incorrect. The pattern "https://x.com/x/status/${id}" includes an extra "/x/" in the path. X/Twitter status URLs should follow the format "https://x.com/[username]/status/[id]" or simply "https://x.com/i/web/status/[id]". The current implementation will generate invalid URLs like "https://x.com/x/status/123456" which won't work. Consider using a format like "https://x.com/i/web/status/${id}" instead, or extract the username from the Tweet component if available.

Suggested change
- return id ? `https://x.com/x/status/${id}` : match;
+ return id ? `https://x.com/i/web/status/${id}` : match;
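The replacement can be exercised in isolation with a sketch like the one below. The wrapper function `replaceTweets` and the sample markdown are hypothetical; the two regular expressions come from the quoted snippet, and the URL format is the one proposed in the suggestion above, used here without independent verification.

```typescript
// Replace self-closing <Tweet ... /> components with a plain status URL.
// Components without an id attribute are left untouched.
function replaceTweets(markdown: string): string {
  return markdown.replace(/<Tweet[\s\S]*?\/>/g, (match) => {
    // Pull the id out of the component's attributes, if present.
    const id = match.match(/id="([^"]+)"/)?.[1];
    // URL shape taken from the review suggestion (assumed, not verified here).
    return id ? `https://x.com/i/web/status/${id}` : match;
  });
}

const withTweet = replaceTweets('Before\n<Tweet id="1234567890" />\nAfter');
const withoutId = replaceTweets("<Tweet />");
```

Because `[\s\S]*?` is lazy, each match stops at the first `/>`, so components spread over several lines are handled and two components in one file are replaced independently.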

Comment thread docs/src/integrations/llms-txt-post-process.ts
@ngnijland ngnijland merged commit 706a5d4 into master Feb 18, 2026
1 check passed
@ngnijland ngnijland deleted the claude/add-llms-txt-docs-CuGVb branch February 18, 2026 14:20

3 participants