LLMTooling/cloudscraper-mcp-server

CloudScraper MCP Server

A Model Context Protocol server that enables AI agents to bypass Cloudflare protection and scrape web content


Core Features

| Feature | Description |
| --- | --- |
| Cloudflare Bypass | Automatically handles Cloudflare protection using the cloudscraper library |
| Multiple Transports | Supports both stdio and HTTP transports |
| Content Cleaning | Converts HTML to clean, LLM-friendly Markdown |
| Smart Chunking | Automatically splits large responses into chunks of at most 10,000 tokens |
| Docker Support | Production-ready containerized deployment |
| Multiple Methods | Supports GET and POST HTTP methods |
| Binary Handling | Base64-encodes non-text content |
| File Export | Saves scraped content directly to disk |
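
The binary handling feature can be illustrated with a minimal sketch. The function name `encode_body` and the list of MIME types treated as text are assumptions made for illustration; the server's actual rules for deciding what counts as text may differ.

```python
import base64

# Hypothetical list of MIME prefixes treated as text (an assumption, not the server's list).
TEXT_TYPES = ("text/", "application/json", "application/xml", "application/xhtml+xml")


def encode_body(body: bytes, content_type: str) -> str:
    """Return text bodies decoded as UTF-8 and everything else as Base64."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime.startswith(TEXT_TYPES):
        return body.decode("utf-8", errors="replace")
    return base64.b64encode(body).decode("ascii")
```

A caller that receives Base64 output can recover the original bytes with `base64.b64decode`.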

Available MCP Tools

Tool Comparison

| Tool | Return Type | Use Case | Chunking Support | File Output |
| --- | --- | --- | --- | --- |
| scrape_url | String (content only) | Quick content retrieval for AI processing | Yes | No |
| scrape_url_raw | Dictionary (metadata + content) | Full response details with headers and timing | Yes | No |
| scrape_url_to_file | Dictionary (save confirmation) | Export content to workspace files | No | Yes |

Shared Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Target URL to scrape |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | true | Convert HTML to Markdown |
| continuation_token | string | No | null | Token for retrieving the next chunk |

scrape_url Response Fields

| Field | Type | Description |
| --- | --- | --- |
| Response | string | Page content, with chunk instructions if applicable |

Note: When the content exceeds 10,000 tokens, the response text includes embedded instructions for retrieving the next chunk.


scrape_url_raw Response Fields

| Field | Type | Always Present | Description |
| --- | --- | --- | --- |
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content | string | Yes | Page content or current chunk |
| content_type | string | Yes | MIME type of response |
| response_time | number | Yes | Request duration in seconds |
| chunked | boolean | When chunked | Indicates response was split |
| chunk_index | integer | When chunked | Current chunk number (1-based) |
| total_chunks | integer | When chunked | Total number of chunks |
| continuation_token | string | When more chunks | Token for next chunk retrieval |
| total_tokens | integer | When chunked | Total tokens in full response |
| message | string | When chunked | Human-readable chunk status |
| error | string | On failure | Error description |
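
The fields above suggest how a client could reassemble a chunked response: keep calling the tool with the returned `continuation_token` until none is returned. The sketch below assumes a hypothetical `call_tool` callable that takes the tool arguments as a dictionary and returns the `scrape_url_raw` response dictionary; `fake_scrape_url_raw` is a stand-in used only to make the example runnable, and its token values are invented.

```python
from typing import Callable, List, Optional


def fetch_all_chunks(call_tool: Callable[[dict], dict], url: str) -> str:
    """Follow continuation tokens until the full content has been collected."""
    parts: List[str] = []
    token: Optional[str] = None
    while True:
        args = {"url": url}
        if token is not None:
            args["continuation_token"] = token
        result = call_tool(args)
        if "error" in result:
            raise RuntimeError(result["error"])
        parts.append(result["content"])
        # continuation_token is only present while more chunks remain.
        token = result.get("continuation_token")
        if not token:
            return "".join(parts)


def fake_scrape_url_raw(args: dict) -> dict:
    """Stand-in for the real tool: serves three fixed chunks (illustration only)."""
    chunks = ["alpha ", "beta ", "gamma"]
    token = args.get("continuation_token")
    index = int(token.split(":")[1]) if token else 0
    result = {
        "status_code": 200,
        "content": chunks[index],
        "chunked": True,
        "chunk_index": index + 1,  # 1-based, as documented above
        "total_chunks": len(chunks),
    }
    if index + 1 < len(chunks):
        result["continuation_token"] = f"demo:{index + 1}"
    return result
```

Calling `fetch_all_chunks(fake_scrape_url_raw, "https://example.com")` joins the three stand-in chunks. Against the real server, cached chunks expire (see Response Chunking System), so the loop should not pause for long between calls.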

scrape_url_to_file Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Target URL to scrape |
| file_path | string | Yes | - | Path where content should be saved |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | false | Convert HTML to Markdown before saving |
| overwrite | boolean | No | false | Replace file if it exists |

scrape_url_to_file Response Fields

| Field | Type | Always Present | Description |
| --- | --- | --- | --- |
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content_type | string | Yes | MIME type of saved content |
| response_time | number | Yes | Request duration in seconds |
| file_path | string | On success | Absolute path to saved file |
| bytes_written | integer | On success | Number of bytes written to disk |
| message | string | On success | Confirmation message |
| error | string | On failure | Error description |
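
Because the success-only and failure-only fields never appear together, a caller can branch on the presence of `error`. The helper below is a minimal sketch; the name `describe_save_result` and the sample dictionaries in the usage note are illustrative and not part of the server.

```python
def describe_save_result(result: dict) -> str:
    """Summarize a scrape_url_to_file response in one line."""
    if "error" in result:
        # status_code is read defensively in case the request never completed.
        return f"save failed (HTTP {result.get('status_code')}): {result['error']}"
    return f"saved {result['bytes_written']} bytes to {result['file_path']}"
```

For example, passing `{"status_code": 200, "file_path": "/tmp/page.md", "bytes_written": 2048}` yields a success summary, while any dictionary containing `error` yields the failure summary.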

Installation

Prerequisites

| Requirement | Version | Purpose |
| --- | --- | --- |
| Python | 3.10+ | Runtime environment |
| uv | Latest | Dependency management |
| Git | Any | Repository cloning |

Setup Steps

Clone the repository and install dependencies:

git clone https://github.com/LLMTooling/cloudscraper-mcp-server.git
cd cloudscraper-mcp-server
uv sync

Configuration

Transport Protocols

| Transport | Best For | Configuration |
| --- | --- | --- |
| stdio | Claude Code, VSCode, direct AI integration | Default mode, no environment variables needed |
| http | n8n, web apps, API integrations, remote access | Requires MCP_TRANSPORT=http |

Environment Variables

| Variable | Default | Options | Description |
| --- | --- | --- | --- |
| MCP_TRANSPORT | stdio | stdio, http | Transport protocol selection |
| MCP_HOST | 0.0.0.0 | Any valid IP | Host binding for HTTP mode |
| MCP_PORT | 8000 | Any valid port | Port for HTTP mode |
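
As a rough illustration of how these variables and their defaults fit together, the sketch below resolves them into a small configuration object. `TransportConfig` and `load_transport_config` are hypothetical names; the actual server.py may read and validate its environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransportConfig:
    transport: str
    host: str
    port: int


def load_transport_config(env: Mapping[str, str] = os.environ) -> TransportConfig:
    """Resolve MCP_TRANSPORT, MCP_HOST and MCP_PORT using the documented defaults."""
    transport = env.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in ("stdio", "http"):
        raise ValueError(f"unsupported MCP_TRANSPORT: {transport!r}")
    host = env.get("MCP_HOST", "0.0.0.0")
    port = int(env.get("MCP_PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"MCP_PORT out of range: {port}")
    return TransportConfig(transport, host, port)
```

With no variables set, this resolves to stdio on 0.0.0.0:8000; the host and port only matter in HTTP mode.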

Usage Examples

Running with Stdio Transport (Default)

uv run server.py

Running with HTTP Transport

MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run server.py

Claude Code Integration

claude mcp add --transport stdio cloudscraper-mcp -- \
  uv --directory "/path/to/cloudscraper-mcp-server" run server.py

VSCode/IDE Configuration

{
  "mcpServers": {
    "cloudscraper-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run",
        "server.py"
      ],
      "cwd": "/path/to/cloudscraper-mcp-server"
    }
  }
}

Docker Deployment

For containerized deployment instructions, see DOCKER.md


Technical Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Protocol | FastMCP 2.0+ | Model Context Protocol implementation |
| Scraping | cloudscraper 1.2.71+ | Cloudflare bypass engine |
| Compression | brotli 1.0.9+ | Response decompression |
| Parsing | beautifulsoup4 4.10.0+ | HTML parsing |
| Conversion | markdownify 0.11.6+ | HTML to Markdown transformation |
| Tokenization | tiktoken 0.5.0+ | Token counting for chunking |

Advanced Features

Response Chunking System

| Feature | Value | Description |
| --- | --- | --- |
| Max Tokens Per Chunk | 10,000 | Maximum tokens in a single response |
| Chunk Expiry | 2 minutes | Cache lifetime for chunk retrieval |
| Token Encoding | cl100k_base | tiktoken encoding model |
| Continuation Pattern | chunk_id:index | Token format for sequential retrieval |
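
The chunking behaviour in the table can be sketched in plain Python. This is not the server's implementation: whitespace-separated words stand in for tiktoken tokens, `ChunkCache` and `split_into_chunks` are hypothetical names, and the 0-based index inside the continuation token is an assumption.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple

MAX_TOKENS_PER_CHUNK = 10_000
CHUNK_EXPIRY_SECONDS = 120  # 2 minutes


def split_into_chunks(text: str, max_tokens: int = MAX_TOKENS_PER_CHUNK) -> List[str]:
    """Split text into chunks of at most max_tokens words (stand-in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


class ChunkCache:
    """Keeps chunk lists for a limited time, addressed by 'chunk_id:index' tokens."""

    def __init__(self, ttl: float = CHUNK_EXPIRY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def store(self, chunks: List[str]) -> str:
        """Cache the chunks and return the new chunk_id."""
        chunk_id = uuid.uuid4().hex
        self._entries[chunk_id] = (self._clock() + self._ttl, chunks)
        return chunk_id

    def fetch(self, continuation_token: str) -> Tuple[str, Optional[str]]:
        """Return the requested chunk and the token for the next one, if any."""
        chunk_id, _, index_text = continuation_token.rpartition(":")
        index = int(index_text)
        entry = self._entries.get(chunk_id)
        if entry is None or self._clock() >= entry[0]:
            self._entries.pop(chunk_id, None)
            raise KeyError("unknown or expired continuation token")
        chunks = entry[1]
        if not 0 <= index < len(chunks):
            raise IndexError("chunk index out of range")
        next_token = f"{chunk_id}:{index + 1}" if index + 1 < len(chunks) else None
        return chunks[index], next_token
```

Passing the clock as a parameter keeps the expiry logic testable without sleeping; a production version would also need to evict expired entries that are never requested again.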

Security Headers

| Header | Value | Purpose |
| --- | --- | --- |
| User-Agent | Chrome 120 | Browser impersonation |
| Sec-Ch-Ua | Chrome/Chromium | Client hints |
| Sec-Fetch-* | cors/same-origin | Fetch metadata |
| Origin/Referer | Auto-generated | Request legitimacy |
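
One plausible way to auto-generate the Origin and Referer values is to derive them from the target URL. The sketch below assumes that approach; `derive_origin_headers` is a hypothetical helper, and the server may build these headers differently.

```python
from urllib.parse import urlsplit


def derive_origin_headers(url: str) -> dict:
    """Build Origin and Referer headers that point at the target site's root."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    # Strip any "user:password@" prefix so credentials never leak into headers.
    host = parts.netloc.rpartition("@")[2]
    origin = f"{parts.scheme}://{host}"
    return {"Origin": origin, "Referer": origin + "/"}
```

The returned dictionary can be merged into the other request headers before the request is sent.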

Made with CloudScraper and FastMCP
