| Feature | Description |
|---|---|
| Cloudflare Bypass | Automatically handles Cloudflare protection using the cloudscraper library |
| Multiple Transports | Supports both stdio and HTTP transport protocols |
| Content Cleaning | Converts HTML to clean, LLM-friendly Markdown format |
| Smart Chunking | Automatically splits large responses into 10k token chunks |
| Docker Support | Production-ready containerized deployment |
| Multiple Methods | Supports GET and POST HTTP methods |
| Binary Handling | Base64 encoding for non-text content |
| File Export | Save scraped content directly to disk |
| Tool | Return Type | Use Case | Chunking Support | File Output |
|---|---|---|---|---|
| scrape_url | String (content only) | Quick content retrieval for AI processing | Yes | No |
| scrape_url_raw | Dictionary (metadata + content) | Full response details with headers and timing | Yes | No |
| scrape_url_to_file | Dictionary (save confirmation) | Export content to workspace files | No | Yes |

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL to scrape |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | true | Convert HTML to Markdown |
| continuation_token | string | No | null | Token for retrieving next chunk |
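
To make the defaults in the parameter table concrete, here is a minimal sketch of a client-side helper that assembles the arguments for a scrape call. The helper name `build_scrape_args` is hypothetical and is not part of the server; only the parameter names and defaults come from the table.

```python
from typing import Any, Optional

# Methods listed in the parameter table.
ALLOWED_METHODS = {"GET", "POST"}


def build_scrape_args(
    url: str,
    method: str = "GET",
    clean_content: bool = True,
    continuation_token: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an arguments dict using the documented defaults (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be GET or POST, got {method!r}")
    args: dict[str, Any] = {
        "url": url,
        "method": method,
        "clean_content": clean_content,
    }
    # Only send the token when asking for a follow-up chunk.
    if continuation_token is not None:
        args["continuation_token"] = continuation_token
    return args
```

Leaving `continuation_token` out of the first request mirrors its documented default of null.
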
| Field | Type | Description |
|---|---|---|
| Response | string | Page content with chunk instructions if applicable |

Note: When content exceeds 10,000 tokens, the response text includes embedded instructions for retrieving the next chunk.

| Field | Type | Always Present | Description |
|---|---|---|---|
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content | string | Yes | Page content or current chunk |
| content_type | string | Yes | MIME type of response |
| response_time | number | Yes | Request duration in seconds |
| chunked | boolean | When chunked | Indicates response was split |
| chunk_index | integer | When chunked | Current chunk number (1-based) |
| total_chunks | integer | When chunked | Total number of chunks |
| continuation_token | string | When more chunks | Token for next chunk retrieval |
| total_tokens | integer | When chunked | Total tokens in full response |
| message | string | When chunked | Human-readable chunk status |
| error | string | On failure | Error description |
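
The chunk fields above suggest a simple retrieval loop: keep calling the tool with the returned `continuation_token` until none comes back. The sketch below assumes a hypothetical `call_tool(name, arguments)` callable standing in for whatever MCP client you use, and it assumes chunks can be concatenated directly; neither detail is confirmed by this document.

```python
from typing import Any, Callable

# Hypothetical client hook: takes a tool name and arguments, returns the result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def fetch_all_chunks(call_tool: ToolCaller, url: str, max_chunks: int = 100) -> str:
    """Follow continuation tokens from scrape_url_raw until the content is complete."""
    args: dict[str, Any] = {"url": url}
    parts: list[str] = []
    for _ in range(max_chunks):
        result = call_tool("scrape_url_raw", args)
        if result.get("error"):
            raise RuntimeError(result["error"])
        parts.append(result["content"])
        token = result.get("continuation_token")
        if not token:
            # No token means this was the last (or only) chunk.
            return "".join(parts)
        # url stays in the request because the parameter table marks it required.
        args = {"url": url, "continuation_token": token}
    raise RuntimeError(f"gave up after {max_chunks} chunks")
```

Because cached chunks expire, a loop like this should run promptly after the first request rather than pausing between chunks.
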

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL to scrape |
| file_path | string | Yes | - | Path where content should be saved |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | false | Convert HTML to Markdown before saving |
| overwrite | boolean | No | false | Replace file if it exists |

| Field | Type | Always Present | Description |
|---|---|---|---|
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content_type | string | Yes | MIME type of saved content |
| response_time | number | Yes | Request duration in seconds |
| file_path | string | On success | Absolute path to saved file |
| bytes_written | integer | On success | Number of bytes written to disk |
| message | string | On success | Confirmation message |
| error | string | On failure | Error description |
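
The `overwrite` flag and the success fields above can be illustrated with a small sketch of the save step. This is not the server's actual code: the function name `save_content` and the wording of the messages are assumptions, and only the field names follow the tables.

```python
from pathlib import Path
from typing import Any


def save_content(file_path: str, data: bytes, overwrite: bool = False) -> dict[str, Any]:
    """Write data to disk, refusing to replace an existing file unless overwrite is set."""
    path = Path(file_path).resolve()  # report an absolute path, as the response table describes
    if path.exists() and not overwrite:
        # Assumed behaviour: report the conflict through the "error" field.
        return {"error": f"File already exists: {path}"}
    written = path.write_bytes(data)  # write_bytes returns the number of bytes written
    return {
        "file_path": str(path),
        "bytes_written": written,
        "message": f"Saved {written} bytes to {path}",
    }
```

Callers can treat the presence of `error` as the failure signal and read `file_path` and `bytes_written` only when it is absent.
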
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Runtime environment |
| uv | Latest | Dependency management |
| Git | Any | Repository cloning |

Clone the repository and install dependencies:

    git clone https://github.com/yourusername/cloudscraper-mcp-server.git
    cd cloudscraper-mcp-server
    uv sync

| Transport | Best For | Configuration |
|---|---|---|
| stdio | Claude Code, VSCode, Direct AI integration | Default mode, no environment variables needed |
| http | n8n, Web apps, API integrations, Remote access | Requires MCP_TRANSPORT=http |

| Variable | Default | Options | Description |
|---|---|---|---|
| MCP_TRANSPORT | stdio | stdio, http | Transport protocol selection |
| MCP_HOST | 0.0.0.0 | Any valid IP | Host binding for HTTP mode |
| MCP_PORT | 8000 | Any valid port | Port for HTTP mode |
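
As an illustration of how these variables and defaults might be read at startup, here is a minimal sketch. The `TransportConfig` class and `load_transport_config` function are hypothetical names, and the validation rules are assumptions rather than the server's confirmed behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransportConfig:
    transport: str
    host: str
    port: int


def load_transport_config(env: Mapping[str, str] = os.environ) -> TransportConfig:
    """Read MCP_TRANSPORT, MCP_HOST and MCP_PORT, falling back to the documented defaults."""
    transport = env.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in ("stdio", "http"):
        raise ValueError(f"MCP_TRANSPORT must be 'stdio' or 'http', got {transport!r}")
    host = env.get("MCP_HOST", "0.0.0.0")
    port = int(env.get("MCP_PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"MCP_PORT out of range: {port}")
    return TransportConfig(transport=transport, host=host, port=port)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.
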

Run in stdio mode (default):

    uv run server.py

Run in HTTP mode:

    MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run server.py

Register the server with Claude Code:

    claude mcp add cloudscraper-mcp \
      --type stdio \
      --command "uv" \
      --args "run" "server.py" \
      --directory "/path/to/cloudscraper-mcp-server"

JSON configuration for clients that read an `mcpServers` block:

    {
      "mcpServers": {
        "cloudscraper-mcp": {
          "type": "stdio",
          "command": "uv",
          "args": [
            "run",
            "server.py"
          ],
          "cwd": "/path/to/cloudscraper-mcp-server"
        }
      }
    }

For containerized deployment instructions, see DOCKER.md.

| Component | Technology | Purpose |
|---|---|---|
| Protocol | FastMCP 2.0+ | Model Context Protocol implementation |
| Scraping | cloudscraper 1.2.71+ | Cloudflare bypass engine |
| Compression | brotli 1.0.9+ | Response decompression |
| Parsing | beautifulsoup4 4.10.0+ | HTML parsing |
| Conversion | markdownify 0.11.6+ | HTML to Markdown transformation |
| Tokenization | tiktoken 0.5.0+ | Token counting for chunking |
| Feature | Value | Description |
|---|---|---|
| Max Tokens Per Chunk | 10,000 | Maximum tokens in a single response |
| Chunk Expiry | 2 minutes | Cache lifetime for chunk retrieval |
| Token Encoding | cl100k_base | tiktoken encoding model |
| Continuation Pattern | chunk_id:index | Token format for sequential retrieval |
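
The chunking settings above can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not the server's implementation: whitespace-separated words stand in for tiktoken `cl100k_base` tokens, the index in `chunk_id:index` is assumed to be the zero-based position of the next chunk, and the class name `ChunkCache` is hypothetical.

```python
import time
import uuid
from typing import Callable, Optional

MAX_TOKENS_PER_CHUNK = 10_000
CHUNK_EXPIRY_SECONDS = 120  # 2 minutes


class ChunkCache:
    """Split text into chunks and hand them out via chunk_id:index tokens."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 max_tokens: int = MAX_TOKENS_PER_CHUNK) -> None:
        self._clock = clock
        self._max_tokens = max_tokens
        self._store: dict[str, tuple[float, list[str]]] = {}

    def put(self, text: str) -> tuple[str, Optional[str]]:
        """Return (first_chunk, continuation_token); the token is None if nothing remains."""
        tokens = text.split()  # stand-in for real tokenization
        step = self._max_tokens
        chunks = [" ".join(tokens[i:i + step]) for i in range(0, len(tokens), step)] or [""]
        if len(chunks) == 1:
            return chunks[0], None
        chunk_id = uuid.uuid4().hex
        self._store[chunk_id] = (self._clock() + CHUNK_EXPIRY_SECONDS, chunks)
        return chunks[0], f"{chunk_id}:1"

    def get(self, token: str) -> tuple[str, Optional[str]]:
        """Return (chunk, next_token) for a token, or raise KeyError if unknown or expired."""
        chunk_id, _, index_text = token.rpartition(":")
        index = int(index_text)
        entry = self._store.get(chunk_id)
        if entry is None:
            raise KeyError("unknown chunk id")
        expires_at, chunks = entry
        if self._clock() >= expires_at:
            del self._store[chunk_id]  # drop stale entries on access
            raise KeyError("chunk expired")
        if not 0 <= index < len(chunks):
            raise KeyError("chunk index out of range")
        next_token = f"{chunk_id}:{index + 1}" if index + 1 < len(chunks) else None
        return chunks[index], next_token
```

The injectable `clock` is a design choice for testability: expiry can be exercised without waiting two real minutes.
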
| Header | Value | Purpose |
|---|---|---|
| User-Agent | Chrome 120 | Browser impersonation |
| Sec-Ch-Ua | Chrome/Chromium | Client hints |
| Sec-Fetch-* | cors/same-origin | Fetch metadata |
| Origin/Referer | Auto-generated | Request legitimacy |
Made with CloudScraper and FastMCP