A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.
- Local SERP API Alternative: No need to rely on paid search engine results API services; all searches are executed locally
- Advanced Anti-Bot Detection Bypass Techniques:
  - Intelligent browser fingerprint management that simulates real user behavior
  - Automatic saving and restoration of browser state to reduce verification frequency
  - Smart headless/headed mode switching, automatically switching to headed mode when verification is needed
  - Randomization of device and locale settings to reduce detection risk
- Raw HTML Retrieval: Ability to fetch the raw HTML of search result pages (with CSS and JavaScript removed) for analysis and debugging when Google's page structure changes
- Page Screenshot: Automatically captures and saves a full-page screenshot when saving HTML content
- MCP Server Integration: Provides real-time search capabilities to AI assistants like Claude without requiring additional API keys
- Completely Open Source and Free: All code is open source with no usage restrictions, freely customizable and extensible
- Developed with TypeScript, providing type safety and better development experience
- Browser automation based on Playwright, supporting multiple browser engines
- Command-line parameter support for search keywords
- MCP server support for AI assistant integration
- Returns search results with title, link, and snippet
- Option to retrieve raw HTML of search result pages for analysis
- JSON format output
- Support for both headless and headed modes (for debugging)
- Detailed logging output
- Robust error handling
- Browser state saving and restoration to effectively avoid anti-bot detection
# Install from source
git clone https://github.com/web-agent-master/google-search.git
cd google-search
# Install dependencies
npm install
# Or using yarn
yarn
# Or using pnpm
pnpm install
# Compile TypeScript code
npm run build
# Or using yarn
yarn build
# Or using pnpm
pnpm build
# Link package globally (required for MCP functionality)
npm link
# Or using yarn
yarn link
# Or using pnpm
pnpm link
This tool has been specially adapted for Windows environments:
- .cmd files are provided to ensure command-line tools work properly in Windows Command Prompt and PowerShell
- Log files are stored in the system temporary directory instead of the Unix/Linux /tmp directory
- Windows-specific process signal handling has been added to ensure proper server shutdown
- Cross-platform file path handling is used to support Windows path separators
# Direct command line usage
google-search "search keywords"
# Using command line options
google-search --limit 5 --timeout 60000 --no-headless "search keywords"
# Or using npx
npx google-search-cli "search keywords"
# Run in development mode
pnpm dev "search keywords"
# Run in debug mode (showing browser interface)
pnpm debug "search keywords"
# Get raw HTML of search result page
google-search "search keywords" --get-html
# Get HTML and save to file
google-search "search keywords" --get-html --save-html
# Get HTML and save to specific file
google-search "search keywords" --get-html --save-html --html-output "./output.html"
- --limit <number>: Limit the number of results (default: 10)
- --timeout <number>: Set timeout in milliseconds (default: 30000)
- --no-headless: Deprecated: The tool now always starts in headless mode and automatically switches to headed mode if CAPTCHA verification is encountered
- --state-file <path>: Specify browser state file path (default: ./browser-state.json)
- --no-save-state: Do not save browser state
- --get-html: Get raw HTML of search results page instead of parsed results
- --save-html: Save HTML to file (used with --get-html)
- --html-output <path>: Specify HTML output file path (used with --get-html --save-html)
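When the CLI is driven from a script, the options listed here can be assembled into an argument array. The following is a minimal sketch, not part of the tool itself: the `SearchCliOptions` interface and `buildSearchArgs` function are hypothetical names introduced for illustration, and only the flags documented in this README are emitted.

```typescript
// Hypothetical option bag; field names are illustrative, the flags are the
// ones documented in this README.
interface SearchCliOptions {
  limit?: number;
  timeout?: number;
  stateFile?: string;
  saveState?: boolean; // false maps to --no-save-state
  getHtml?: boolean;
  saveHtml?: boolean; // only meaningful together with getHtml
  htmlOutput?: string; // only meaningful together with getHtml and saveHtml
}

// Build the argument list for the google-search command.
function buildSearchArgs(query: string, options: SearchCliOptions = {}): string[] {
  const args: string[] = [];
  if (options.limit !== undefined) args.push("--limit", String(options.limit));
  if (options.timeout !== undefined) args.push("--timeout", String(options.timeout));
  if (options.stateFile !== undefined) args.push("--state-file", options.stateFile);
  if (options.saveState === false) args.push("--no-save-state");
  if (options.getHtml) {
    args.push("--get-html");
    if (options.saveHtml) {
      args.push("--save-html");
      if (options.htmlOutput !== undefined) args.push("--html-output", options.htmlOutput);
    }
  }
  // The search keywords go last, after all options.
  args.push(query);
  return args;
}
```

The resulting array could then be passed to something like Node's `child_process.execFile("google-search", args, ...)`, assuming the package has been linked globally as described in the installation steps.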
Add the following configuration to your MCP settings file (e.g., ~/.cursor/mcp.json or Claude Desktop configuration):
{
  "mcpServers": {
    "google-search": {
      "command": "google-search-mcp"
    }
  }
}
After configuring, restart your AI assistant (Claude Desktop or Cursor) to enable the Google Search tool.
Once configured, you can directly ask the AI assistant to search for information:
- "Search for the latest information about TypeScript 5.0"
- "Find tutorials on using Playwright"
- "What are the recent developments in AI?"
The AI assistant will automatically call the Google Search tool, retrieve real-time information from the web, and provide answers based on the search results.
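For programs that consume the command-line tool's parsed-results JSON (a query plus a list of results, each with title, link, and snippet), a small type and runtime check can catch unexpected output early. This is a sketch under the assumption that the JSON has exactly the documented fields; `SearchResult`, `SearchResponse`, and `parseSearchResponse` are illustrative names, not exports of this package.

```typescript
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Runtime check that a value has the three documented string fields.
function isSearchResult(value: unknown): value is SearchResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["title"] === "string" &&
    typeof record["link"] === "string" &&
    typeof record["snippet"] === "string"
  );
}

// Parse the tool's JSON output and validate its shape, throwing on mismatch.
function parseSearchResponse(json: string): SearchResponse {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  const query = record["query"];
  const results = record["results"];
  if (typeof query !== "string" || !Array.isArray(results) || !results.every(isSearchResult)) {
    throw new Error("Unexpected search response shape");
  }
  return { query, results };
}
```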
{
  "query": "search keywords",
  "results": [
    {
      "title": "Result title",
      "link": "https://example.com",
      "snippet": "Result summary..."
    }
  ]
}
When using the --get-html option:
{
  "query": "search keywords",
  "url": "https://www.google.com/search?q=...",
  "originalHtmlLength": 123456,
  "cleanedHtmlLength": 78900,
  "savedPath": "./google-search-html/query-timestamp.html",
  "screenshotPath": "./google-search-html/query-timestamp.png",
  "htmlPreview": "First 500 characters of HTML..."
}
This tool automatically saves browser state to avoid repeated CAPTCHA verification:
- First run: If CAPTCHA verification is encountered, the tool will automatically switch to headed mode (visible browser) and wait for you to complete verification
- State saving: After successful verification, browser state is automatically saved to ~/.google-search-browser-state.json (for MCP server) or the current directory's browser-state.json (for command line tool)
- Subsequent runs: The tool uses saved state to bypass verification, enabling fast headless searches
- Fingerprint management: Browser fingerprint configuration is automatically saved and reused to reduce detection risk
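The save-and-restore cycle described in this list can be sketched with Playwright's storage state feature, where `browser.newContext({ storageState })` loads a saved file and `context.storageState({ path })` writes one. The code below is an illustration, not the tool's actual implementation: `runWithSavedState` is a hypothetical helper, and `BrowserLike` and `ContextLike` are minimal stand-ins for the parts of Playwright's `Browser` and `BrowserContext` used here, so the sketch compiles without Playwright installed. Fingerprint persistence is not covered.

```typescript
// Minimal structural stand-ins for the Playwright methods used in this sketch.
interface ContextLike {
  storageState(options: { path: string }): Promise<unknown>;
  close(): Promise<void>;
}

interface BrowserLike {
  newContext(options?: { storageState?: string }): Promise<ContextLike>;
}

// Run a task in a context restored from stateFile (if one was saved earlier),
// then write the updated cookies and local storage back to the same file.
async function runWithSavedState<T>(
  browser: BrowserLike,
  stateFile: string,
  hasSavedState: boolean,
  task: (context: ContextLike) => Promise<T>,
): Promise<T> {
  const context = await browser.newContext(
    hasSavedState ? { storageState: stateFile } : {},
  );
  try {
    const result = await task(context);
    // Persist state only after the task succeeded.
    await context.storageState({ path: stateFile });
    return result;
  } finally {
    await context.close();
  }
}
```

A real Playwright `Browser` object should satisfy `BrowserLike` structurally, and `hasSavedState` would typically come from a file-existence check on the state file path.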
This tool uses multiple techniques to bypass Google's anti-bot detection:
- Browser Fingerprint Simulation:
  - Automatically detects and uses host machine's timezone, language and other settings
  - Randomizes device types (Desktop Chrome, Firefox, Safari, Edge)
  - Simulates real browser environment (WebGL, plugins, screen resolution, etc.)
- Behavior Simulation:
  - Random delays simulate human input speed
  - Uses real keyboard events instead of direct value setting
  - Maintains consistent behavior patterns
- State Persistence:
  - Saves and restores cookies and local storage
  - Maintains consistent browser fingerprint
  - Reduces verification frequency by reusing sessions
- Intelligent Mode Switching:
  - Starts in headless mode for optimal performance
  - Automatically switches to headed mode when CAPTCHA is detected
  - Prompts user to complete verification, then saves state for future use
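Two of the techniques above, headless-to-headed mode switching and random input delays, can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the tool's own code: `AttemptOutcome`, `searchWithHeadedFallback`, and `randomDelayMs` are hypothetical names, and CAPTCHA detection is left to the caller-supplied `attempt` function.

```typescript
// Outcome of one search attempt: either a value, or a signal that a CAPTCHA
// page was encountered.
type AttemptOutcome<T> = { kind: "ok"; value: T } | { kind: "captcha" };

// Try headless first; if a CAPTCHA is reported, retry once in headed mode so
// the user can complete verification in a visible browser.
async function searchWithHeadedFallback<T>(
  attempt: (headless: boolean) => Promise<AttemptOutcome<T>>,
): Promise<T> {
  const first = await attempt(true);
  if (first.kind === "ok") return first.value;
  const second = await attempt(false);
  if (second.kind === "ok") return second.value;
  throw new Error("CAPTCHA verification was not completed in headed mode");
}

// Pick a whole number of milliseconds in [minMs, maxMs], for example as the
// pause between simulated keystrokes. The random source is injectable.
function randomDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}
```

Returning a tagged outcome instead of throwing keeps the CAPTCHA case separate from genuine errors such as timeouts, which propagate unchanged.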
All logs are saved to the system temporary directory:
- Unix/Linux/macOS: /tmp/google-search-logs/google-search.log
- Windows: %TEMP%\google-search-logs\google-search.log
Log level can be controlled via the LOG_LEVEL environment variable:
LOG_LEVEL=debug google-search "search keywords"
# Install dependencies
pnpm install
# Compile TypeScript
pnpm build
# Run in development mode
pnpm dev "search keywords"
# Run in debug mode (show browser)
pnpm debug "search keywords"
# Run MCP server
pnpm mcp
# Test build
pnpm test:build
If you frequently encounter CAPTCHA verification:
- Let the tool automatically switch to headed mode and complete verification
- Ensure browser state files are properly saved
- Try using different Google domains (the tool automatically randomizes this)
- Avoid making too many requests in a short time
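For scripts that issue many searches, the last suggestion can be enforced with a simple minimum-interval check. The sketch below is only an illustration: `msUntilNextRequest` is a hypothetical helper, and the interval is left to the caller because this README does not state a safe request rate.

```typescript
// Milliseconds to wait before the next search so that consecutive requests
// are at least minIntervalMs apart. Returns 0 when no wait is needed.
function msUntilNextRequest(
  lastRequestAtMs: number | null,
  nowMs: number,
  minIntervalMs: number,
): number {
  if (lastRequestAtMs === null) return 0; // no earlier request
  return Math.max(0, lastRequestAtMs + minIntervalMs - nowMs);
}
```

A caller would record `Date.now()` after each search and sleep for the returned duration before starting the next one.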
If Playwright browser installation fails:
# Manually install Chromium
npx playwright install chromium
# Or install all browsers
npx playwright install
On Unix/Linux systems, if you encounter permission issues:
# Make bin files executable
chmod +x bin/google-search
chmod +x bin/google-search-mcp
Issues and Pull Requests are welcome! Please ensure code follows existing style and all tests pass.
ISC License
This project is maintained by web-agent-master.
- Playwright - Modern web automation library
- Model Context Protocol - Protocol for AI assistant tool integration
- Claude Desktop - Anthropic's AI assistant desktop application