feat: MCP Server, llms.txt, adapter scaffold CLI #20

Merged
Digidai merged 9 commits into main from pr2/mcp-llmstxt-cli on Mar 26, 2026

Digidai commented Mar 25, 2026

Summary

  • MCP Server (packages/mcp/): convert_url tool for Claude Desktop, Cursor, and other MCP clients. npm workspaces configured for monorepo publishing.
  • llms.txt (src/handlers/llms-txt.ts): GET /llms.txt and /.well-known/llms.txt serve an API description for AI discoverability. fetchTargetLlmsTxt() caches results in KV (24h TTL, with negative caching).
  • Adapter scaffold CLI (scripts/create-adapter.ts): Generates adapter and test boilerplate. Usage: npx ts-node scripts/create-adapter.ts <name> <url-pattern>
  • New tests: 36 new tests (17 MCP + 8 llms.txt + 4 formatOutput + 5 CLI + 2 import chain)

Test Coverage

  • 563 tests pass (up from 527 in PR1)
  • 3 pre-existing failures unchanged (idempotency key, deepcrawl checkpoint, stale task)
  • tsc --noEmit passes clean

Files Changed

| Area | Files | Lines |
|---|---|---|
| MCP Server | 6 new files | +503 |
| llms.txt | 2 new files + index.ts routes | +310 |
| Adapter CLI | 2 new files | +243 |
| Tests | 5 new test files | +547 |

Test plan

  • 563 tests pass (vitest run); the 3 pre-existing failures listed above are unchanged
  • TypeScript compiles clean (tsc --noEmit)
  • MCP Server: 17 tests cover convert_url tool, API errors, timeout, custom URL
  • llms.txt: 8 tests cover routes, KV cache hits/misses, negative caching, timeout
  • Adapter CLI: 5 tests cover generation, validation, duplicate detection
  • formatOutput: 4 tests cover all output formats
  • Import chain: 2 tests verify all 22 modules load without error

🤖 Generated with Claude Code

Digidai and others added 5 commits March 25, 2026 23:56
Split monolithic index.ts into 14 focused modules:
- src/runtime-state.ts: shared runtime counters and rate limit maps
- src/middleware/auth.ts: Bearer token authentication
- src/middleware/rate-limit.ts: IP and distributed rate limiting
- src/helpers/crypto.ts: sha256, stableStringify
- src/helpers/format.ts: formatOutput(), consolidating a switch(format) block that was repeated 4x
- src/helpers/response.ts: CSP constants, error responses
- src/handlers/convert.ts: core convertUrl() split into 6 sub-functions
- src/handlers/stream.ts: SSE streaming
- src/handlers/health.ts: /api/health endpoint
- src/handlers/batch.ts: /api/batch endpoint
- src/handlers/extract.ts: /api/extract endpoint
- src/handlers/deepcrawl.ts: /api/deepcrawl endpoint
- src/handlers/jobs.ts: /api/jobs + JobCoordinator Durable Object
- src/handlers/og-image.ts: OG image generation

Key improvements:
- convertUrl() broken into orchestrator + 6 sub-functions for readability
- formatOutput() eliminates 4x repeated switch(format) pattern
- Prepares codebase for open-source release (Apache-2.0)

Test results: 527/530 pass (3 pre-existing failures unchanged)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
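
The commit above names a stableStringify helper in src/helpers/crypto.ts but does not show it. Helpers of this kind usually serialize objects with sorted keys so that logically equal payloads produce the same string (and therefore the same hash). The following is a minimal sketch under that assumption; the implementation is hypothetical and handles plain JSON-like data only.

```typescript
// Hypothetical implementation: the PR only names stableStringify, it does not show its body.
// Assumes plain JSON-like input (no Dates, Maps, or class instances).
function stableStringify(value: unknown): string {
  if (value === undefined) {
    // Mirror JSON.stringify's treatment of undefined inside arrays.
    return "null";
  }
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const parts = Object.keys(obj)
    .sort() // deterministic key order is the whole point
    .filter((key) => obj[key] !== undefined) // JSON.stringify drops undefined properties
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${parts.join(",")}}`;
}
```

With this sketch, `stableStringify({ b: 1, a: 2 })` and `stableStringify({ a: 2, b: 1 })` return the same string, which is what makes the output usable as hash input.
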
- LICENSE (Apache-2.0)
- CONTRIBUTING.md (dev setup, adapter guide, PR process)
- SECURITY.md (responsible disclosure, SSRF, rate limiting)
- docs/ARCHITECTURE.md (system overview with ASCII diagrams)
- docs/designs/open-source-commercialization.md (strategy doc)
- TODOS.md (deferred work items from CEO/Eng review)
- wrangler.toml: replace KV namespace ID with placeholder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- packages/mcp/ with convert_url tool using @modelcontextprotocol/sdk
- npm workspaces configured for monorepo
- 7 MCP server tests covering API calls, errors, timeouts, and params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- scripts/adapter-scaffold.ts: pure logic for generating adapter + test boilerplate
- scripts/create-adapter.ts: CLI entry point for scaffold generation
- src/helpers/format.ts: extracted formatOutput with markdown/html/text/json support
- 5 CLI tests, 4 formatOutput tests, 1 import chain test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
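
The formatOutput helper is described as a single function covering markdown, html, text, and json output. A rough sketch of that shape follows. The ConvertResult and FormattedOutput types, the escapeHtml helper, and the naive text conversion are all assumptions made for illustration; the PR does not show the real signature.

```typescript
// Hypothetical types: the real signature of formatOutput() is not shown in the PR.
type OutputFormat = "markdown" | "html" | "text" | "json";

interface ConvertResult {
  url: string;
  title: string;
  markdown: string;
}

interface FormattedOutput {
  contentType: string;
  body: string;
}

// Minimal HTML escaping so markdown can be embedded in a <pre> block.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// One place that maps a format name to a body and content type,
// instead of repeating the switch in every handler.
function formatOutput(result: ConvertResult, format: OutputFormat): FormattedOutput {
  switch (format) {
    case "markdown":
      return { contentType: "text/markdown; charset=utf-8", body: result.markdown };
    case "html":
      return {
        contentType: "text/html; charset=utf-8",
        body: `<pre>${escapeHtml(result.markdown)}</pre>`,
      };
    case "text":
      // Naive: strips only leading heading markers; real code would likely do more.
      return {
        contentType: "text/plain; charset=utf-8",
        body: result.markdown.replace(/^#+[ \t]*/gm, ""),
      };
    case "json":
      return {
        contentType: "application/json; charset=utf-8",
        body: JSON.stringify({ url: result.url, title: result.title, markdown: result.markdown }),
      };
  }
}
```

Because OutputFormat is a closed union, the compiler flags any handler that passes an unsupported format, which is one reason to centralize the switch.
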
- GET /llms.txt and /.well-known/llms.txt serve API description
- fetchTargetLlmsTxt with KV cache (24h TTL, negative caching)
- 8 tests for llms.txt handler and cache logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
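
The caching behavior described for fetchTargetLlmsTxt (positive results cached for 24h, misses cached as well) can be sketched roughly as below. This is not the PR's code: the KvLike interface, the cachedLlmsTxt name, the key prefix, and the sentinel value are assumptions, and the fetch is injected so the sketch stays self-contained. The 1h negative TTL reflects the later fix commit in this PR.

```typescript
// Hypothetical minimal KV interface, loosely modelled on Cloudflare KV's get/put.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const POSITIVE_TTL_SECONDS = 24 * 60 * 60; // 24h, as stated in the PR
const NEGATIVE_TTL_SECONDS = 60 * 60; // 1h, per the later fix commit
const NEGATIVE_MARKER = "__none__"; // assumed sentinel meaning "target has no llms.txt"

// fetchText is injected; it should resolve to null when the target has no llms.txt.
async function cachedLlmsTxt(
  kv: KvLike,
  origin: string,
  fetchText: (url: string) => Promise<string | null>,
): Promise<string | null> {
  const key = `llmstxt:${origin}`;
  const cached = await kv.get(key);
  if (cached !== null) {
    // Cache hit: either real content or a remembered miss.
    return cached === NEGATIVE_MARKER ? null : cached;
  }
  const text = await fetchText(`${origin}/llms.txt`);
  if (text === null) {
    // Negative caching: remember the miss, but for a shorter time.
    await kv.put(key, NEGATIVE_MARKER, { expirationTtl: NEGATIVE_TTL_SECONDS });
    return null;
  }
  await kv.put(key, text, { expirationTtl: POSITIVE_TTL_SECONDS });
  return text;
}
```

The shorter negative TTL limits how long a site that adds an llms.txt later stays invisible, while still avoiding a network request on every conversion.
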
cloudflare-workers-and-pages bot commented Mar 25, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed | website2markdown | cfd8813 | Mar 26 2026, 02:38 AM |

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 763eade8f7

Comment on lines +14 to +16
"dependencies": {
  "@modelcontextprotocol/sdk": "^1.0.0"
},

P1: Declare zod as a direct MCP package dependency

The MCP entrypoint imports zod (packages/mcp/src/index.ts), but packages/mcp/package.json only declares @modelcontextprotocol/sdk as a runtime dependency. In installs that do not expose transitive dependencies (common with pnpm and some production packaging setups), mcp-website2markdown will fail at startup with a module resolution error for zod, making the CLI unusable.

import * as path from "node:path";

/** Valid adapter name: lowercase alphanumeric and hyphens, starting with a letter or digit. */
const VALID_NAME_RE = /^[a-z0-9][a-z0-9-]*$/;

P2: Reject all-digit adapter names in scaffold validation

The current name regex accepts numeric-only values (for example "36"), but toCamelCaseExportName then produces an identifier like 36Adapter, which is invalid TypeScript in the generated adapter file (export const 36Adapter ...). This causes scaffolded code to fail compilation immediately for those inputs.
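
One way to address this is to require a leading letter, which also covers names such as "3d-models" that start with a digit without being all digits. The sketch below assumes that stricter rule; it is not necessarily the fix the PR adopted. STRICT_NAME_RE and isValidAdapterName are hypothetical names, and toCamelCaseExportName is re-implemented here for illustration because the real function is not shown.

```typescript
// Assumed stricter rule: lowercase letters, digits, and hyphens, starting with a letter.
const STRICT_NAME_RE = /^[a-z][a-z0-9-]*$/;

function isValidAdapterName(name: string): boolean {
  return STRICT_NAME_RE.test(name);
}

// Hypothetical re-implementation; the real toCamelCaseExportName is not shown in the PR.
// "hacker-news" becomes "hackerNewsAdapter".
function toCamelCaseExportName(name: string): string {
  const camel = name.replace(/-+([a-z0-9])?/g, (_match: string, next?: string) =>
    next ? next.toUpperCase() : "",
  );
  return `${camel}Adapter`;
}
```

Because every accepted name starts with a letter, the derived export name always starts with a letter too, so the generated `export const ...Adapter` line cannot begin with a digit.
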

Digidai and others added 4 commits March 26, 2026 00:57
- Regenerate package-lock.json for npm workspaces
- Pin @modelcontextprotocol/sdk to ~1.0.0 to avoid breaking API changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- SSRF: fetchTargetLlmsTxt uses redirect:"error" instead of "follow"
- Missing dep: add zod to MCP package dependencies
- Injection: escape urlPattern with JSON.stringify in adapter scaffold
- Cache: use 1h TTL for negative llms.txt cache (vs 24h for positive)
- OOM: add 10MB response size limit in MCP convertUrl

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
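
The injection fix in the commit above escapes urlPattern with JSON.stringify before embedding it in generated source. The idea can be illustrated as follows. The renderAdapterSource function and the template it emits are hypothetical; the real template in scripts/adapter-scaffold.ts is not shown in the PR.

```typescript
// Hypothetical generator used only to illustrate the escaping technique.
function renderAdapterSource(exportName: string, urlPattern: string): string {
  // JSON.stringify returns a quoted, escaped string literal, so quotes,
  // backslashes, and newlines in urlPattern cannot terminate the literal
  // and smuggle extra code into the generated file.
  const patternLiteral = JSON.stringify(urlPattern);
  return [
    `export const ${exportName} = {`,
    `  match: (url: string): boolean => url.includes(${patternLiteral}),`,
    `};`,
    ``,
  ].join("\n");
}
```

Interpolating the raw pattern between hand-written quotes would let an input containing `"` close the string early; emitting the literal through JSON.stringify keeps the whole pattern inside one string.
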
- .github/workflows/publish-mcp.yml: publish on mcp-v* tag
- TODOS.md: mark MCP Monorepo Tooling as completed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- deepcrawl resume: remove maxDepth/maxPages from checkpoint config
  comparison (operational limits, not structural config) and fix test
  to use higher max_pages on resume
- jobs idempotency: clone request before first fetch to avoid
  consumed-body error on second call
- jobs rerun: use maxRetries:1 so failed task is retryable

All 566 tests now pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
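
The deepcrawl resume fix above separates structural configuration, which must match the checkpoint, from operational limits such as maxDepth and maxPages, which may change on resume. A minimal sketch of such a comparison is shown below. The CrawlConfig shape, the seedUrl and includePatterns fields, and the function names are assumptions; only maxDepth and maxPages come from the commit message.

```typescript
// Hypothetical config shape; only maxDepth and maxPages are named in the PR.
interface CrawlConfig {
  seedUrl: string;
  includePatterns: string[];
  maxDepth: number;
  maxPages: number;
}

// Build a comparison key from structural fields only.
// Operational limits are deliberately left out.
function structuralKey(config: CrawlConfig): string {
  return JSON.stringify({
    seedUrl: config.seedUrl,
    includePatterns: config.includePatterns,
  });
}

// A checkpoint can be resumed when the structural parts match,
// even if the caller raises or lowers the limits.
function canResume(checkpoint: CrawlConfig, requested: CrawlConfig): boolean {
  return structuralKey(checkpoint) === structuralKey(requested);
}
```

Under this sketch, resuming with a higher maxPages is accepted, which matches the test change described in the commit, while a different seed URL is still treated as a different crawl.
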
Digidai merged commit cfd8813 into main on Mar 26, 2026
1 of 2 checks passed