congard/playwright-reader-mcp

playwright-reader-mcp

Just another web fetching MCP. But mine.

MCP server that reads web pages through a headless Firefox browser and returns clean, structured Markdown content.

Given a URL, the server navigates to the target page in a background Firefox instance, waits for the page content to stabilize, extracts the main article body using Mozilla's Readability API, and converts it into a token-efficient Markdown document. Because it drives a real browser rather than a plain HTTP client, it can load pages whose anti-bot checks reject simple fetchers.
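
The README does not say how "waits for the page content to stabilize" is decided. One plausible approach, sketched here purely as an assumption, is to sample a cheap measure of the page (for example the length of its rendered text) at intervals and treat the page as stable once the last few samples agree. The names `isStable` and `waitForStable`, and all option names, are hypothetical and not taken from the project's source.

```typescript
// Hypothetical sketch: not taken from the project's source.
// Returns true when the last `window` samples are all equal.
function isStable(samples: number[], window: number): boolean {
  if (window < 1 || samples.length < window) return false;
  const recent = samples.slice(-window);
  return recent.every((value) => value === recent[0]);
}

// Polls `sample` until the readings stabilize or `maxSamples` is reached.
// `sample` stands in for something like "measure the page's text length";
// `sleep` is injected so the loop can be exercised without real delays.
async function waitForStable(
  sample: () => Promise<number>,
  sleep: (ms: number) => Promise<void>,
  options = { window: 3, intervalMs: 250, maxSamples: 40 },
): Promise<boolean> {
  const history: number[] = [];
  for (let i = 0; i < options.maxSamples; i++) {
    history.push(await sample());
    if (isStable(history, options.window)) return true;
    await sleep(options.intervalMs);
  }
  return false; // gave up: the content kept changing
}
```

Capping the number of samples matters in a sketch like this: pages with tickers or rotating ads may never settle, so the caller needs a way to proceed with whatever has loaded.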

Note

This project was created during experiments with local inference: Qwen3.6 27B (Qwen3.6-27B-UD-Q4_K_XL-MTP.gguf) + llama.cpp + 7900XTX (ROCm) + VS Code Insiders 1.124 + Copilot.

Note

Tested on Linux only. There is no guarantee it works on other systems (especially Windows).

Features

  • Headless Firefox - renders pages with a real browser engine, which gets past bot detection that blocks plain HTTP clients
  • Content stability detection - waits for dynamic content to fully load before extracting
  • Readability extraction - uses Mozilla's Readability API to isolate the main article body
  • Markdown output - converts HTML to clean, structured Markdown with links preserved
  • Dual mode - runs as an MCP server (stdio) or as a CLI tool for direct Markdown output
  • Extensible tool architecture - abstract McpTool base class for adding new tools
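
The README names an abstract McpTool base class but does not show its shape. As a rough illustration of the idea, a base class of this kind might look like the sketch below. The members `name`, `description`, and `execute`, the `WordCountTool` subclass, and the `tools` registry are all invented for this example; the real class in src/tools.ts may differ.

```typescript
// Hypothetical sketch of an abstract tool base class; the real McpTool in
// src/tools.ts may have a different shape.
abstract class McpTool<Input, Output> {
  abstract readonly name: string;
  abstract readonly description: string;
  abstract execute(input: Input): Promise<Output>;
}

// Example subclass, invented for illustration only.
class WordCountTool extends McpTool<{ text: string }, number> {
  readonly name = "word_count";
  readonly description = "Count whitespace-separated words in a string.";

  async execute(input: { text: string }): Promise<number> {
    const trimmed = input.text.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  }
}

// A registry keyed by tool name can dispatch calls without knowing
// the concrete tool classes.
const tools = new Map<string, McpTool<any, any>>();
const wordCount = new WordCountTool();
tools.set(wordCount.name, wordCount);
```

With this layout, adding a tool means writing one subclass and registering it; the dispatch code does not change.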

Installation

npm install
npx playwright install firefox

Usage

CLI Mode

Read a webpage and output Markdown to stdout:

npm run build
node dist/main.js https://example.com

CLI Options

| Option | Default | Description |
| --- | --- | --- |
| --headless | true | Run browser in headless mode |
| --no-headless | - | Run browser in visible (headed) mode |
| --no-close | - | Keep browser open after reading (useful for debugging) |
| --user-agent | Firefox 128 UA | Custom User-Agent header |
| --viewport-width | 1280 | Viewport width in pixels |
| --viewport-height | 720 | Viewport height in pixels |
| --user-data-dir | ~/.config/playwright-reader-mcp/firefox-profile | Persistent browser profile directory |
| --no-links | false | Disable the "Links on this page" section |
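
To make the relationship between these flags and their defaults concrete, here is a minimal sketch of a config object with a merge helper. The project structure mentions a BrowserConfig interface with defaults and a merge helper, but its actual fields are not shown in the README, so every field name below (and `DEFAULT_CONFIG`, `mergeConfig`) is a guess derived from the flag names. The default User-Agent string is left out because the README does not spell it out.

```typescript
// Hypothetical shape; field names are guesses derived from the CLI flags.
interface BrowserConfig {
  headless: boolean;
  closeAfterRead: boolean; // false when --no-close is passed
  userAgent?: string;      // README default is a Firefox 128 UA string, not reproduced here
  viewportWidth: number;
  viewportHeight: number;
  userDataDir: string;
  includeLinks: boolean;   // false when --no-links is passed
}

const DEFAULT_CONFIG: BrowserConfig = {
  headless: true,
  closeAfterRead: true,
  viewportWidth: 1280,
  viewportHeight: 720,
  userDataDir: "~/.config/playwright-reader-mcp/firefox-profile", // tilde expansion omitted
  includeLinks: true,
};

// Overlays user-supplied values on the defaults. Using `??` means an
// unset (undefined) override never erases a default.
function mergeConfig(overrides: Partial<BrowserConfig> = {}): BrowserConfig {
  return {
    headless: overrides.headless ?? DEFAULT_CONFIG.headless,
    closeAfterRead: overrides.closeAfterRead ?? DEFAULT_CONFIG.closeAfterRead,
    userAgent: overrides.userAgent ?? DEFAULT_CONFIG.userAgent,
    viewportWidth: overrides.viewportWidth ?? DEFAULT_CONFIG.viewportWidth,
    viewportHeight: overrides.viewportHeight ?? DEFAULT_CONFIG.viewportHeight,
    userDataDir: overrides.userDataDir ?? DEFAULT_CONFIG.userDataDir,
    includeLinks: overrides.includeLinks ?? DEFAULT_CONFIG.includeLinks,
  };
}
```

For instance, `mergeConfig({ headless: false })` would correspond to passing only --no-headless: the browser runs headed while the viewport, profile directory, and links section keep their defaults.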

Examples

# Read a webpage
node dist/main.js https://example.com

# Read without links section
node dist/main.js https://example.com --no-links

# Run in visible (headed) mode for debugging
node dist/main.js https://example.com --no-headless

MCP Server Mode

Start the MCP server on stdio (no URL argument):

node dist/main.js

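Configure your MCP client to connect via stdio transport. The args path in this example is relative, so it only resolves if the client starts the server from the repository root; an absolute path to dist/main.js is usually safer. For example: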

{
  "mcpServers": {
    "playwright-reader": {
      "command": "node",
      "args": ["dist/main.js"]
    }
  }
}

MCP Tool

read_webpage - Navigate to a URL, extract the main article, and return clean Markdown.

  • Input: { url: string }
  • Output: Markdown text with article title, source URL, body content, and page links
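
The README lists what the output contains (title, source URL, body, page links) but not its exact layout. The sketch below shows one way such a document could be assembled, only to make the parts concrete. The types `PageLink` and `ExtractedPage`, the function `renderPage`, and the specific heading and "Source:" line are assumptions; only the "Links on this page" section name comes from the CLI options.

```typescript
// Hypothetical types and layout: the README names the parts of the output
// but does not specify how they are arranged.
interface PageLink {
  text: string;
  href: string;
}

interface ExtractedPage {
  title: string;
  url: string;
  body: string; // main article body, already converted to Markdown
  links: PageLink[];
}

// Joins the parts into one Markdown document. The links section is
// skipped when disabled or when the page has no links.
function renderPage(page: ExtractedPage, includeLinks = true): string {
  const parts = [`# ${page.title}`, `Source: ${page.url}`, page.body.trim()];
  if (includeLinks && page.links.length > 0) {
    const items = page.links.map((link) => `- [${link.text}](${link.href})`);
    parts.push(["## Links on this page", ...items].join("\n"));
  }
  return parts.join("\n\n") + "\n";
}
```

Passing `false` as the second argument mirrors what --no-links does in CLI mode: the body is returned without the trailing links section.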

Testing the MCP Server

You can test the MCP server interactively using the MCP Inspector:

npx @modelcontextprotocol/inspector node dist/main.js

This opens a web UI where you can browse and invoke tools.

Alternatively, you can send JSON-RPC messages directly to stdin:

node dist/main.js

Then paste the following JSON-RPC messages (press Enter after each):

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}
{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"read_webpage","arguments":{"url":"https://example.com"}}}
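
If you would rather generate these lines than paste them, a small helper can build them. The sketch below is an illustration, not part of the project; `rpcLine` and `session` are hypothetical names. It also includes the `notifications/initialized` notification that the MCP lifecycle expects a client to send after `initialize`; the two-message sequence above omits it, and whether a given server insists on it may vary.

```typescript
// Builds one newline-delimited JSON-RPC 2.0 message for a stdio transport.
// Omitting `id` produces a notification, which expects no response.
function rpcLine(method: string, params?: unknown, id?: number): string {
  const message = {
    jsonrpc: "2.0",
    ...(id !== undefined ? { id } : {}),
    method,
    ...(params !== undefined ? { params } : {}),
  };
  return JSON.stringify(message) + "\n";
}

// The same session as the manual example, plus the initialized notification.
const session = [
  rpcLine(
    "initialize",
    {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "test", version: "1.0.0" },
    },
    1,
  ),
  rpcLine("notifications/initialized"),
  rpcLine("tools/call", { name: "read_webpage", arguments: { url: "https://example.com" } }, 2),
].join("");
```

The resulting `session` string can be written to the server's stdin in one go, for example by saving it to a file and redirecting that file into `node dist/main.js`.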

Project Structure

src/
  main.ts              # Server entry point (MCP server + CLI mode)
  cli_args.ts          # Command-line argument parsing with yargs
  browser_config.ts    # BrowserConfig interface, defaults, and merge helper
  browser_manager.ts   # Firefox browser lifecycle management
  tools.ts             # Abstract McpTool base class
  stealth_reader.ts    # read_webpage tool implementation
  content_extractor.ts # HTML-to-Markdown conversion with Turndown
  logger.ts            # Structured logging with Pino
tests/
  content_extractor.test.ts  # Unit tests for content extraction

Development

# Build
npm run build

# Watch mode
npm run dev

# Run tests
npm test

# Watch tests
npm run test:watch

Dependencies

License

MIT
