
Include blog posts in llms discovery files #646

Merged
Atharva-Kanherkar merged 2 commits into main from codex/seo-llms-blog-posts
May 7, 2026

@Atharva-Kanherkar
Collaborator

Summary

  • add public blog post links to /llms.txt
  • add blog markdown bodies to /llms-full.txt with source, publish metadata, and normalized links
  • preserve existing docs, agent skills, and platform page coverage

Review

  • Cursor Agent ran a gstack `/review`-style pass and recommended shipping as-is.

Tests

  • pnpm -C web test -- docs.test.ts
  • pnpm -C web lint
  • pnpm -C web build
  • built llms.txt and llms-full.txt smoke checks for the blog URL/title and normalized links

Scope Guardrails

  • no homepage/main landing UI changes
  • no shared marketing footer changes
  • no authenticated/internal UI changes
  • no DB or destructive changes

@Atharva-Kanherkar merged commit a42d405 into main on May 7, 2026
3 checks passed
@Atharva-Kanherkar deleted the codex/seo-llms-blog-posts branch on May 7, 2026, 20:43
@greptile-apps
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR adds public blog posts to the /llms.txt index and /llms-full.txt bundle by integrating the existing blog module into docs.ts. A new renderBlogMarkdown helper formats each post with its title, description, source URL, publish date, author, and link-normalized content, while normalizeBlogMarkdownForExport extends the existing docs normalization to handle non-/docs absolute paths in blog content.

  • buildLlmsIndex gains a ## Blog posts TOC section listing every post returned by getAllPosts().
  • buildLlmsFull appends the same TOC section and then embeds the full rendered markdown of each post, ordered before the existing docs sections.
  • Tests are updated to assert that a known blog post's URL, title, source header, and normalized internal links are present in both outputs.
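
A minimal sketch of how the TOC section and per-post rendering described above might look. This is not the code from `docs.ts`: the function names `renderBlogTocSketch` and `renderBlogPostSketch`, the `/blog/<slug>` URL layout, the header labels, and the trimmed `BlogPostWithContent` shape are all assumptions made for illustration.

```typescript
// Hypothetical shape based on the fields listed in the flowchart;
// the real type lives in the blog module.
interface BlogPostWithContent {
  slug: string;
  title: string;
  date: string;
  author: string;
  description: string;
  content: string;
}

// Assumption: posts are served at `${origin}/blog/${slug}`.
function blogPostUrl(post: BlogPostWithContent, origin: string): string {
  return `${origin}/blog/${post.slug}`;
}

// Builds a "## Blog posts" section with one bullet per post.
function renderBlogTocSketch(posts: BlogPostWithContent[], origin: string): string {
  const bullets = posts.map(
    (post) => `- [${post.title}](${blogPostUrl(post, origin)}): ${post.description}`,
  );
  return ["## Blog posts", "", ...bullets].join("\n");
}

// Renders one post: title, description, source and publish metadata,
// then the body passed through a caller-supplied link normalizer.
function renderBlogPostSketch(
  post: BlogPostWithContent,
  origin: string,
  normalize: (markdown: string, origin: string) => string,
): string {
  return [
    `# ${post.title}`,
    "",
    post.description,
    "",
    `Source: ${blogPostUrl(post, origin)}`,
    `Published: ${post.date}`,
    `Author: ${post.author}`,
    "",
    normalize(post.content, origin),
  ].join("\n");
}
```

Passing the normalizer in as a parameter keeps the rendering step independent of how links are rewritten, which makes each piece easier to test in isolation.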

Confidence Score: 4/5

Safe to merge; changes are additive and isolated to the llms discovery file generation path with no runtime user-facing impact.

The logic is straightforward and the tests cover the key assertions. The double-read pattern in buildLlmsFull is wasteful but harmless at build time, and the link-expansion regex running over raw content including code blocks is a latent concern that won't affect current posts.

web/src/lib/docs.ts — specifically the buildLlmsFull blog post fetch and the normalizeBlogMarkdownForExport regex scope.

Important Files Changed

| Filename | Overview |
| --- | --- |
| web/src/lib/docs.ts | Adds blog post inclusion to llms.txt/llms-full.txt; introduces normalizeBlogMarkdownForExport and renderBlogMarkdown; double file-read pattern in buildLlmsFull and a subtle regex ordering concern worth noting. |
| web/src/lib/docs.test.ts | Extends existing llms.txt and llms-full.txt test assertions to cover blog post URL, title, source header, and normalized links for the first blog post. |
| testing/codex-seo-llms-blog-posts.md | New test-contract document for the blog-posts-in-llms feature; captures unit/lint/build/smoke and manual cURL verification steps. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["content/blog/*.mdx"] -->|getAllPosts + getPostBySlug| B["BlogPostWithContent\nslug, title, date, author, description, content"]

    B --> C["buildLlmsIndex"]
    B --> D["buildLlmsFull"]

    C -->|"TOC bullet per post"| E["/llms.txt\n## Blog posts section"]

    D -->|"TOC bullet per post"| F["/llms-full.txt\n## Blog posts section"]
    D -->|"renderBlogMarkdown"| G["normalizeBlogMarkdownForExport"]
    G -->|"Step 1: normalizeMarkdownForExport\nCallout to blockquote\n/docs/* to full docs-md URL"| H["Intermediate content"]
    H -->|"Step 2: remaining abs paths to full URL"| I["Normalized post body"]
    I --> F


Reviews (1): Last reviewed commit: "fix(seo): include blog posts in llms fil..."

Comment thread web/src/lib/docs.ts
Comment on lines +1640 to +1642

      const blogPosts = getAllPosts()
        .map((post) => getPostBySlug(post.slug))
        .filter((post): post is BlogPostWithContent => Boolean(post));

P2: `buildLlmsFull` fetches metadata for every blog post via `getAllPosts()` and then immediately calls `getPostBySlug(post.slug)` for each one, reading every file twice. `getPostBySlug` already returns the full `BlogPostWithContent` (metadata + body), so `getAllPosts()` can be replaced with a direct map over `getAllSlugs()` to avoid the redundant reads.

Suggested change: replace

      const blogPosts = getAllPosts()
        .map((post) => getPostBySlug(post.slug))
        .filter((post): post is BlogPostWithContent => Boolean(post));

with

      const blogPosts = getAllSlugs()
        .map((slug) => getPostBySlug(slug))
        .filter((post): post is BlogPostWithContent => Boolean(post));
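
To make the cost of the double read concrete, here is a small self-contained sketch that counts simulated file reads for both variants. The stubs reuse the names `getAllPosts`, `getPostBySlug`, and `getAllSlugs` from the review, but their bodies, the in-memory `fakeFiles` map, and the trimmed `BlogPostWithContent` type are assumptions; in particular, it assumes `getAllSlugs` lists names without reading file contents.

```typescript
// Hypothetical, trimmed-down post type for this sketch.
interface BlogPostWithContent {
  slug: string;
  title: string;
  content: string;
}

// In-memory stand-in for content/blog/*.mdx (first line = title, rest = body).
const fakeFiles: Record<string, string> = {
  "first-post": "First post\nHello from the first post.",
  "second-post": "Second post\nHello from the second post.",
};

let fileReads = 0; // counts simulated disk reads

function readPostFile(slug: string): BlogPostWithContent | null {
  const raw = fakeFiles[slug];
  if (raw === undefined) return null;
  fileReads += 1;
  const lines = raw.split("\n");
  return { slug, title: lines[0] ?? "", content: lines.slice(1).join("\n") };
}

// Stubs mirroring the names in the review; the bodies are assumptions.
const getAllSlugs = (): string[] => Object.keys(fakeFiles); // names only, no reads
const getPostBySlug = (slug: string): BlogPostWithContent | null => readPostFile(slug);
const getAllPosts = (): BlogPostWithContent[] =>
  getAllSlugs()
    .map((slug) => readPostFile(slug)) // reads each file to extract metadata
    .filter((post): post is BlogPostWithContent => Boolean(post));

// Variant from the PR: metadata pass, then a second read per post.
function loadViaAllPosts(): BlogPostWithContent[] {
  return getAllPosts()
    .map((post) => getPostBySlug(post.slug))
    .filter((post): post is BlogPostWithContent => Boolean(post));
}

// Variant from the suggestion: one read per post.
function loadViaSlugs(): BlogPostWithContent[] {
  return getAllSlugs()
    .map((slug) => getPostBySlug(slug))
    .filter((post): post is BlogPostWithContent => Boolean(post));
}

// Resets the counter, runs a loader, and reports how many reads it caused.
function countReads(load: () => unknown): number {
  fileReads = 0;
  load();
  return fileReads;
}

console.log(countReads(loadViaAllPosts)); // 4 (two files, each read twice)
console.log(countReads(loadViaSlugs)); // 2 (two files, each read once)
```

One thing worth checking before adopting the suggestion: if the real `getAllPosts()` sorts or filters posts (for example by date or draft status), mapping over `getAllSlugs()` could change the order or the set of posts in `llms-full.txt`.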

Comment thread web/src/lib/docs.ts
Comment on lines +1436 to +1440

      return normalizeMarkdownForExport(content, origin).replace(
        /\]\((\/[^)\s]*)\)/g,
        (_, href) => `](${origin}${href})`,
      );
    }

P2: Link-expansion regex runs over code-block content

`normalizeBlogMarkdownForExport` applies `/\]\((\/[^)\s]*)\)/g` to the raw string result of `normalizeMarkdownForExport`, without skipping fenced or indented code blocks. If a future blog post contains a literal markdown link example inside a code block (e.g., `](` `/platform/foo` `)`), the regex will rewrite it to a full URL, corrupting the code sample in `llms-full.txt`. The same latent issue exists in `normalizeMarkdownForExport` for the `/docs/` replacement, but the broader `\/[^)\s]*` pattern in the new function catches a wider set of paths.
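
One possible way to address this, sketched under assumptions: process the markdown line by line, track whether the current line sits inside a fenced code block, and apply the same link pattern only outside fences. The function name `expandAbsoluteLinksOutsideFences` is hypothetical, and the sketch covers only the second normalization step (it does not call `normalizeMarkdownForExport`). Indented code blocks and inline code spans are deliberately left unhandled here.

```typescript
// Expands absolute-path markdown links (`](/path)`) to full URLs,
// leaving lines inside fenced code blocks untouched.
function expandAbsoluteLinksOutsideFences(markdown: string, origin: string): string {
  const linkPattern = /\]\((\/[^)\s]*)\)/g; // same pattern as in the PR
  const fencePattern = /^ {0,3}(`{3,}|~{3,})(.*)$/;
  let openFence: string | null = null; // marker of the fence currently open, if any

  return markdown
    .split("\n")
    .map((line) => {
      const fence = line.match(fencePattern);
      if (fence) {
        const marker = fence[1];
        const rest = fence[2];
        if (openFence === null) {
          // Opening fence; an info string such as a language tag is allowed.
          openFence = marker;
        } else if (
          marker[0] === openFence[0] &&
          marker.length >= openFence.length &&
          rest.trim() === ""
        ) {
          // Closing fence: same character, at least as long, nothing after it.
          openFence = null;
        }
        return line; // fence lines are never rewritten
      }
      if (openFence !== null) return line; // inside a code block
      return line.replace(linkPattern, (_match, href: string) => `](${origin}${href})`);
    })
    .join("\n");
}
```

A line-based pass is a small change relative to the existing regex. A markdown parser that exposes link nodes would be more robust, since it would also cover indented code and inline code, at the cost of an extra dependency.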

