Include blog posts in llms discovery files by Atharva-Kanherkar · Pull Request #646 · agentclash/agentclash

Atharva-Kanherkar · 2026-05-07T20:40:30Z

Summary

add public blog post links to /llms.txt
add blog markdown bodies to /llms-full.txt with source, publish metadata, and normalized links
preserve existing docs, agent skills, and platform page coverage

Review

Cursor Agent ran a gstack /review style pass and recommended shipping as-is.

Tests

pnpm -C web test -- docs.test.ts
pnpm -C web lint
pnpm -C web build
smoke checks against the built llms.txt and llms-full.txt for the blog URL/title and normalized links

Scope Guardrails

no homepage/main landing UI changes
no shared marketing footer changes
no authenticated/internal UI changes
no DB or destructive changes

greptile-apps · 2026-05-07T20:45:26Z

Greptile Summary

This PR adds public blog posts to the /llms.txt index and /llms-full.txt bundle by integrating the existing blog module into docs.ts. A new renderBlogMarkdown helper formats each post with its title, description, source URL, publish date, author, and link-normalized content, while normalizeBlogMarkdownForExport extends the existing docs normalization to handle non-/docs absolute paths in blog content.

buildLlmsIndex gains a ## Blog posts TOC section listing every post returned by getAllPosts().
buildLlmsFull appends the same TOC section and then embeds the full rendered markdown of each post, ordered before the existing docs sections.
Tests are updated to assert that a known blog post's URL, title, source header, and normalized internal links are present in both outputs.
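The TOC section and the per-post rendering described above can be sketched roughly as follows. This is a minimal illustration, not the code in `docs.ts`: the `BlogPostSketch` type, the `/blog/<slug>` URL shape, and the exact header labels (`Source:`, `Published:`, `Author:`) are assumptions made for the example.

```typescript
// Hypothetical shape; the real BlogPostWithContent type lives in the blog module.
interface BlogPostSketch {
  slug: string;
  title: string;
  date: string;
  author: string;
  description: string;
  content: string;
}

// Builds a "## Blog posts" TOC section with one bullet per post.
// The /blog/<slug> URL shape is an assumption.
function buildBlogToc(posts: BlogPostSketch[], origin: string): string {
  const bullets = posts.map(
    (post) => `- [${post.title}](${origin}/blog/${post.slug}): ${post.description}`,
  );
  return ["## Blog posts", "", ...bullets].join("\n");
}

// Renders one post as a metadata header followed by its body.
// The header labels are illustrative, not taken from the PR.
function renderBlogPostSketch(post: BlogPostSketch, origin: string): string {
  return [
    `# ${post.title}`,
    "",
    post.description,
    "",
    `Source: ${origin}/blog/${post.slug}`,
    `Published: ${post.date}`,
    `Author: ${post.author}`,
    "",
    post.content.trim(),
  ].join("\n");
}
```

In this sketch `/llms.txt` would use only `buildBlogToc`, while `/llms-full.txt` would emit the TOC and then each rendered post.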

Confidence Score: 4/5

Safe to merge; changes are additive and isolated to the llms discovery file generation path with no runtime user-facing impact.

The logic is straightforward and the tests cover the key assertions. The double-read pattern in buildLlmsFull is wasteful but harmless at build time, and the link-expansion regex running over raw content including code blocks is a latent concern that won't affect current posts.

web/src/lib/docs.ts — specifically the buildLlmsFull blog post fetch and the normalizeBlogMarkdownForExport regex scope.

Important Files Changed

| Filename | Overview |
| --- | --- |
| web/src/lib/docs.ts | Adds blog post inclusion to llms.txt/llms-full.txt; introduces `normalizeBlogMarkdownForExport` and `renderBlogMarkdown`; double file-read pattern in `buildLlmsFull` and a subtle regex ordering concern worth noting. |
| web/src/lib/docs.test.ts | Extends existing llms.txt and llms-full.txt test assertions to cover blog post URL, title, source header, and normalized links for the first blog post. |
| testing/codex-seo-llms-blog-posts.md | New test-contract document for the blog-posts-in-llms feature; captures unit/lint/build/smoke and manual cURL verification steps. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["content/blog/*.mdx"] -->|getAllPosts + getPostBySlug| B["BlogPostWithContent\nslug, title, date, author, description, content"]

    B --> C["buildLlmsIndex"]
    B --> D["buildLlmsFull"]

    C -->|"TOC bullet per post"| E["/llms.txt\n## Blog posts section"]

    D -->|"TOC bullet per post"| F["/llms-full.txt\n## Blog posts section"]
    D -->|"renderBlogMarkdown"| G["normalizeBlogMarkdownForExport"]
    G -->|"Step 1: normalizeMarkdownForExport\nCallout to blockquote\n/docs/* to full docs-md URL"| H["Intermediate content"]
    H -->|"Step 2: remaining abs paths to full URL"| I["Normalized post body"]
    I --> F
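One way to see why the two normalization steps in the flowchart have to run in this order is the sketch below. It is an illustration under assumptions: the `ORIGIN` value and the `/docs-md/<path>.md` URL scheme are hypothetical stand-ins for whatever `normalizeMarkdownForExport` actually produces, and the callout-to-blockquote conversion is left out.

```typescript
// Hypothetical origin and docs-md URL scheme, used only for illustration.
const ORIGIN = "https://example.com";

// Step 1 (assumed behaviour): rewrite /docs/* links to a markdown export URL.
function expandDocsLinks(markdown: string, origin: string): string {
  return markdown.replace(
    /\]\(\/docs\/([^)\s]*)\)/g,
    (_match, rest: string) => `](${origin}/docs-md/${rest}.md)`,
  );
}

// Step 2: expand any remaining root-relative link to a full URL.
// This is the same pattern quoted in the review comment.
function expandRemainingLinks(markdown: string, origin: string): string {
  return markdown.replace(
    /\]\((\/[^)\s]*)\)/g,
    (_match, href: string) => `](${origin}${href})`,
  );
}

const input = "See [setup](/docs/setup) and [pricing](/pricing).";

// Docs links first, then everything else: both links end up where intended.
const correctOrder = expandRemainingLinks(expandDocsLinks(input, ORIGIN), ORIGIN);

// Reversed: the broad pattern claims the /docs/ link first, so the docs
// rewrite no longer finds a root-relative path to match.
const reversedOrder = expandDocsLinks(expandRemainingLinks(input, ORIGIN), ORIGIN);

console.log(correctOrder);
console.log(reversedOrder);
```

Because step 1 leaves behind absolute URLs, the broader step 2 pattern cannot match them again, which is what makes the composition safe in this order.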

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
web/src/lib/docs.ts:1640-1642
`buildLlmsFull` fetches metadata for every blog post via `getAllPosts()` and then immediately calls `getPostBySlug(post.slug)` for each one, reading every file twice. `getPostBySlug` already returns the full `BlogPostWithContent` (metadata + body), so `getAllPosts()` can be replaced with a direct map over `getAllSlugs()` to avoid the redundant reads.

```suggestion
  const blogPosts = getAllSlugs()
    .map((slug) => getPostBySlug(slug))
    .filter((post): post is BlogPostWithContent => Boolean(post));
```

### Issue 2 of 2
web/src/lib/docs.ts:1436-1440
**Link-expansion regex runs over code-block content**

`normalizeBlogMarkdownForExport` applies `/\]\((\/[^)\s]*)\)/g` to the raw string result of `normalizeMarkdownForExport`, without skipping fenced or indented code blocks. If a future blog post contains a literal markdown link example inside a code block — e.g., `](` `/platform/foo` `)` — the regex will rewrite it to a full URL, corrupting the code sample in `llms-full.txt`. The same latent issue exists in `normalizeMarkdownForExport` for the `/docs/` replacement, but the broader `\/[^)\s]*` pattern in the new function catches a wider set of paths.

Reviews (1): Last reviewed commit: "fix(seo): include blog posts in llms fil..."

greptile-apps · 2026-05-07T20:45:30Z

+  const blogPosts = getAllPosts()
+    .map((post) => getPostBySlug(post.slug))
+    .filter((post): post is BlogPostWithContent => Boolean(post));


buildLlmsFull fetches metadata for every blog post via getAllPosts() and then immediately calls getPostBySlug(post.slug) for each one, reading every file twice. getPostBySlug already returns the full BlogPostWithContent (metadata + body), so getAllPosts() can be replaced with a direct map over getAllSlugs() to avoid the redundant reads.

Suggested change

-  const blogPosts = getAllPosts()
-    .map((post) => getPostBySlug(post.slug))
-    .filter((post): post is BlogPostWithContent => Boolean(post));
+  const blogPosts = getAllSlugs()
+    .map((slug) => getPostBySlug(slug))
+    .filter((post): post is BlogPostWithContent => Boolean(post));
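The read-count difference behind this suggestion can be demonstrated with in-memory stand-ins. Everything below is hypothetical: the `*Sketch` functions only mimic the behaviour the review describes (`getAllPosts` reads every file for metadata, `getAllSlugs` lists slugs without reading), and the real blog module may differ.

```typescript
interface PostSketch {
  slug: string;
  title: string;
  content: string;
}

// In-memory stand-in for content/blog; `reads` counts simulated file reads.
const files: Record<string, PostSketch> = {
  "first-post": { slug: "first-post", title: "First", content: "..." },
  "second-post": { slug: "second-post", title: "Second", content: "..." },
};
let reads = 0;

function readPost(slug: string): PostSketch | null {
  reads += 1;
  return files[slug] ?? null;
}

function isPost(post: PostSketch | null): post is PostSketch {
  return post !== null;
}

// Assumed behaviour: listing slugs does not read any file.
function getAllSlugsSketch(): string[] {
  return Object.keys(files);
}

// Assumed behaviour: collecting metadata reads every file once.
function getAllPostsSketch(): PostSketch[] {
  return getAllSlugsSketch().map((slug) => readPost(slug)).filter(isPost);
}

function getPostBySlugSketch(slug: string): PostSketch | null {
  return readPost(slug);
}

// Original pattern: metadata pass plus a second read per post.
reads = 0;
const viaPosts = getAllPostsSketch()
  .map((post) => getPostBySlugSketch(post.slug))
  .filter(isPost);
const readsViaPosts = reads;

// Suggested pattern: one read per post.
reads = 0;
const viaSlugs = getAllSlugsSketch()
  .map((slug) => getPostBySlugSketch(slug))
  .filter(isPost);
const readsViaSlugs = reads;

console.log({ readsViaPosts, readsViaSlugs });
```

One thing to check before adopting the slug-based version: if `getAllPosts()` sorts posts (for example by date) and `getAllSlugs()` does not, the output order in `llms-full.txt` could change. The source does not say either way, so an explicit sort may be needed.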


greptile-apps · 2026-05-07T20:45:30Z

+  return normalizeMarkdownForExport(content, origin).replace(
+    /\]\((\/[^)\s]*)\)/g,
+    (_, href) => `](${origin}${href})`,
+  );
+}


Link-expansion regex runs over code-block content

normalizeBlogMarkdownForExport applies /\]\((\/[^)\s]*)\)/g to the raw string result of normalizeMarkdownForExport, without skipping fenced or indented code blocks. If a future blog post contains a literal markdown link example inside a code block — e.g., ]( /platform/foo ) — the regex will rewrite it to a full URL, corrupting the code sample in llms-full.txt. The same latent issue exists in normalizeMarkdownForExport for the /docs/ replacement, but the broader \/[^)\s]* pattern in the new function catches a wider set of paths.
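A possible mitigation is to apply the link expansion line by line and skip anything inside a fenced code block. The sketch below is one hedged way to do that; `expandLinksOutsideFences` is a hypothetical helper, not part of the PR, and it deliberately ignores indented code blocks and inline code spans.

```typescript
// Expands root-relative markdown links, leaving fenced code blocks untouched.
// Hypothetical helper: handles ``` and ~~~ fences only.
function expandLinksOutsideFences(markdown: string, origin: string): string {
  let fence: string | null = null; // marker of the currently open fence, if any

  return markdown
    .split("\n")
    .map((line) => {
      const match = line.match(/^ {0,3}(`{3,}|~{3,})(.*)$/);
      if (match) {
        const marker = match[1];
        const rest = match[2];
        if (fence === null) {
          // Opening fence: remember its marker so the matching close is recognized.
          fence = marker;
        } else if (
          marker[0] === fence[0] &&
          marker.length >= fence.length &&
          rest.trim() === ""
        ) {
          // Closing fence: same character, at least as long, no info string.
          fence = null;
        }
        return line;
      }
      if (fence !== null) {
        return line; // inside a fenced block: leave the sample as written
      }
      return line.replace(
        /\]\((\/[^)\s]*)\)/g,
        (_match, href: string) => `](${origin}${href})`,
      );
    })
    .join("\n");
}
```

A full markdown parser would be more robust, but for build-time export a line scanner like this keeps the change small. The same approach could be applied to the `/docs/` replacement in `normalizeMarkdownForExport`.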


Atharva-Kanherkar added 2 commits May 8, 2026 02:05

test(seo): lock llms blog discovery contract

a40921e

fix(seo): include blog posts in llms files

1e4baf3

Atharva-Kanherkar merged commit a42d405 into main May 7, 2026
3 checks passed

Atharva-Kanherkar deleted the codex/seo-llms-blog-posts branch May 7, 2026 20:43

greptile-apps Bot reviewed May 7, 2026
