Include blog posts in llms discovery files #646
Greptile Summary
This PR adds public blog posts to the
Confidence Score: 4/5
Safe to merge; changes are additive and isolated to the llms discovery file generation path with no runtime user-facing impact. The logic is straightforward and the tests cover the key assertions. The double-read pattern in web/src/lib/docs.ts, specifically the
| Filename | Overview |
|---|---|
| web/src/lib/docs.ts | Adds blog post inclusion to llms.txt/llms-full.txt; introduces normalizeBlogMarkdownForExport and renderBlogMarkdown; double file-read pattern in buildLlmsFull and a subtle regex ordering concern worth noting. |
| web/src/lib/docs.test.ts | Extends existing llms.txt and llms-full.txt test assertions to cover blog post URL, title, source header, and normalized links for the first blog post. |
| testing/codex-seo-llms-blog-posts.md | New test-contract document for the blog-posts-in-llms feature; captures unit/lint/build/smoke and manual cURL verification steps. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["content/blog/*.mdx"] -->|getAllPosts + getPostBySlug| B["BlogPostWithContent\nslug, title, date, author, description, content"]
B --> C["buildLlmsIndex"]
B --> D["buildLlmsFull"]
C -->|"TOC bullet per post"| E["/llms.txt\n## Blog posts section"]
D -->|"TOC bullet per post"| F["/llms-full.txt\n## Blog posts section"]
D -->|"renderBlogMarkdown"| G["normalizeBlogMarkdownForExport"]
G -->|"Step 1: normalizeMarkdownForExport\nCallout to blockquote\n/docs/* to full docs-md URL"| H["Intermediate content"]
H -->|"Step 2: remaining abs paths to full URL"| I["Normalized post body"]
I --> F
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
web/src/lib/docs.ts:1640-1642
`buildLlmsFull` fetches metadata for every blog post via `getAllPosts()` and then immediately calls `getPostBySlug(post.slug)` for each one, reading every file twice. `getPostBySlug` already returns the full `BlogPostWithContent` (metadata + body), so `getAllPosts()` can be replaced with a direct map over `getAllSlugs()` to avoid the redundant reads.
```suggestion
const blogPosts = getAllSlugs()
  .map((slug) => getPostBySlug(slug))
  .filter((post): post is BlogPostWithContent => Boolean(post));
```
### Issue 2 of 2
web/src/lib/docs.ts:1436-1440
**Link-expansion regex runs over code-block content**
`normalizeBlogMarkdownForExport` applies `/\]\((\/[^)\s]*)\)/g` to the raw string result of `normalizeMarkdownForExport`, without skipping fenced or indented code blocks. If a future blog post contains a literal markdown link example inside a code block — e.g., `](` `/platform/foo` `)` — the regex will rewrite it to a full URL, corrupting the code sample in `llms-full.txt`. The same latent issue exists in `normalizeMarkdownForExport` for the `/docs/` replacement, but the broader `\/[^)\s]*` pattern in the new function catches a wider set of paths.
Reviews (1): Last reviewed commit: "fix(seo): include blog posts in llms fil..."
Code under review (web/src/lib/docs.ts, lines 1640-1642):

    const blogPosts = getAllPosts()
      .map((post) => getPostBySlug(post.slug))
      .filter((post): post is BlogPostWithContent => Boolean(post));

`buildLlmsFull` fetches metadata for every blog post via `getAllPosts()` and then immediately calls `getPostBySlug(post.slug)` for each one, reading every file twice. `getPostBySlug` already returns the full `BlogPostWithContent` (metadata + body), so `getAllPosts()` can be replaced with a direct map over `getAllSlugs()` to avoid the redundant reads.

Suggested change:

    const blogPosts = getAllSlugs()
      .map((slug) => getPostBySlug(slug))
      .filter((post): post is BlogPostWithContent => Boolean(post));
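To make the cost difference concrete, here is a minimal sketch that counts simulated file reads for both variants. The in-memory `files` map, the `readPost` helper, and the bodies of `getAllPosts`, `getAllSlugs`, and `getPostBySlug` are hypothetical stand-ins, not the repository's implementations; in particular it assumes that `getAllSlugs()` only enumerates file names while `getAllPosts()` reads each file once.

```typescript
// Hypothetical in-memory stand-ins for the blog helpers. The real helpers
// read MDX files from disk and may differ in shape and behaviour.
type BlogPost = { slug: string; title: string };
type BlogPostWithContent = BlogPost & { content: string };

const files: Record<string, { title: string; content: string }> = {
  "hello-world": { title: "Hello", content: "First post" },
  "second-post": { title: "Second", content: "Another post" },
};

let reads = 0;

// Simulates a single file read and counts it.
function readPost(slug: string): BlogPostWithContent | null {
  reads += 1;
  const file = files[slug];
  return file ? { slug, title: file.title, content: file.content } : null;
}

// Assumption: listing slugs enumerates file names without reading file bodies.
function getAllSlugs(): string[] {
  return Object.keys(files);
}

// Assumption: collecting metadata reads every file once.
function getAllPosts(): BlogPost[] {
  const posts: BlogPost[] = [];
  for (const slug of getAllSlugs()) {
    const post = readPost(slug);
    if (post) posts.push({ slug: post.slug, title: post.title });
  }
  return posts;
}

function getPostBySlug(slug: string): BlogPostWithContent | null {
  return readPost(slug);
}

const isPost = (post: BlogPostWithContent | null): post is BlogPostWithContent =>
  Boolean(post);

// Resets the counter, runs one loading strategy, and reports the reads it caused.
function countReads(load: () => BlogPostWithContent[]): number {
  reads = 0;
  load();
  return reads;
}

const readsBefore = countReads(() =>
  getAllPosts().map((post) => getPostBySlug(post.slug)).filter(isPost),
);
const readsAfter = countReads(() =>
  getAllSlugs().map((slug) => getPostBySlug(slug)).filter(isPost),
);

console.log({ readsBefore, readsAfter }); // { readsBefore: 4, readsAfter: 2 }
```

One thing worth checking before applying the suggestion: if `getAllPosts()` also sorts or filters posts (for example by date or draft status), the `getAllSlugs()` variant would need to preserve that behaviour. The review text does not say either way.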
Code under review (web/src/lib/docs.ts, lines 1436-1440):

      return normalizeMarkdownForExport(content, origin).replace(
        /\]\((\/[^)\s]*)\)/g,
        (_, href) => `](${origin}${href})`,
      );
    }

**Link-expansion regex runs over code-block content**

`normalizeBlogMarkdownForExport` applies `/\]\((\/[^)\s]*)\)/g` to the raw string result of `normalizeMarkdownForExport`, without skipping fenced or indented code blocks. If a future blog post contains a literal markdown link example inside a code block (e.g., `](` `/platform/foo` `)`), the regex will rewrite it to a full URL, corrupting the code sample in `llms-full.txt`. The same latent issue exists in `normalizeMarkdownForExport` for the `/docs/` replacement, but the broader `\/[^)\s]*` pattern in the new function catches a wider set of paths.
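One possible way to address this, sketched under assumptions: walk the markdown line by line, leave fenced code blocks untouched, and apply the same link regex only to text outside simple inline code spans. `expandLinksOutsideCode` is a hypothetical helper name, `https://example.com` is a placeholder origin, and the sketch deliberately does not handle indented code blocks or inline spans delimited by multiple backticks.

```typescript
// Sketch only: `expandLinksOutsideCode` is a hypothetical helper, not part of the PR.
// The link pattern is the one quoted in the review comment.
const ABSOLUTE_PATH_LINK = /\]\((\/[^)\s]*)\)/g;
const FENCE_OPEN = /^ {0,3}(`{3,}|~{3,})/;
const FENCE_CLOSE = /^ {0,3}(`{3,}|~{3,})\s*$/;

function expandLinksOutsideCode(markdown: string, origin: string): string {
  let inFence = false;
  let openMarker = "";

  return markdown
    .split("\n")
    .map((line) => {
      if (inFence) {
        // A closing fence uses the same character and is at least as long as the opener.
        const close = line.match(FENCE_CLOSE);
        if (close && close[1][0] === openMarker[0] && close[1].length >= openMarker.length) {
          inFence = false;
        }
        return line; // Lines inside a fence are never rewritten.
      }

      const open = line.match(FENCE_OPEN);
      if (open) {
        inFence = true;
        openMarker = open[1];
        return line;
      }

      // Split on simple inline code spans; odd-indexed parts are the spans themselves.
      return line
        .split(/(`[^`]*`)/)
        .map((part, index) =>
          index % 2 === 1
            ? part
            : part.replace(ABSOLUTE_PATH_LINK, (_match, href: string) => `](${origin}${href})`),
        )
        .join("");
    })
    .join("\n");
}

// Build the fence marker programmatically so this example nests cleanly in markdown.
const fenceMarker = "`".repeat(3);
const sample = [
  "See [pricing](/pricing).",
  `${fenceMarker}md`,
  "[example](/platform/foo)",
  fenceMarker,
  "Inline `[x](/y)` stays as written.",
].join("\n");

const rewritten = expandLinksOutsideCode(sample, "https://example.com").split("\n");
console.log(rewritten[0]); // See [pricing](https://example.com/pricing).
console.log(rewritten[2]); // [example](/platform/foo)
```

A line-based scan keeps the change small and dependency-free. A markdown AST pass would be more robust, at the cost of adding a parser to the export path.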
Summary
- `/llms.txt`
- `/llms-full.txt` with source, publish metadata, and normalized links

Review
- `/review` style pass and recommended shipping as-is.

Tests
- `pnpm -C web test -- docs.test.ts`
- `pnpm -C web lint`
- `pnpm -C web build`
- `llms.txt` and `llms-full.txt` smoke checks for the blog URL/title and normalized links

Scope Guardrails