ACSGenUI/mcp-google-search

Google Search Tool

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches and extract results. It can be used directly as a command-line tool or as a Model Context Protocol (MCP) server to provide real-time search capabilities to AI assistants like Claude.

Key Features

  • Local SERP API Alternative: No need to rely on paid search engine results API services; all searches are executed locally
  • Advanced Anti-Bot Detection Bypass Techniques:
    • Intelligent browser fingerprint management that simulates real user behavior
    • Automatic saving and restoration of browser state to reduce verification frequency
    • Smart headless/headed mode switching, automatically switching to headed mode when verification is needed
    • Randomization of device and locale settings to reduce detection risk
  • Raw HTML Retrieval: Ability to fetch the raw HTML of search result pages (with CSS and JavaScript removed) for analysis and debugging when Google's page structure changes
  • Page Screenshot: Automatically captures and saves a full-page screenshot when saving HTML content
  • MCP Server Integration: Provides real-time search capabilities to AI assistants like Claude without requiring additional API keys
  • Completely Open Source and Free: All code is open source with no usage restrictions, freely customizable and extensible

Technical Features

  • Developed with TypeScript, providing type safety and a better development experience
  • Browser automation based on Playwright, supporting multiple browser engines
  • Command-line parameter support for search keywords
  • MCP server support for AI assistant integration
  • Returns search results with title, link, and snippet
  • Option to retrieve raw HTML of search result pages for analysis
  • JSON format output
  • Support for both headless and headed modes (for debugging)
  • Detailed logging output
  • Robust error handling
  • Browser state saving and restoration to effectively avoid anti-bot detection

Installation

# Install from source
git clone https://github.com/web-agent-master/google-search.git
cd google-search
# Install dependencies
npm install
# Or using yarn
yarn
# Or using pnpm
pnpm install

# Compile TypeScript code
npm run build
# Or using yarn
yarn build
# Or using pnpm
pnpm build

# Link package globally (required for MCP functionality)
npm link
# Or using yarn
yarn link
# Or using pnpm
pnpm link

Windows Environment Notes

This tool has been specially adapted for Windows environments:

  1. .cmd files are provided to ensure command-line tools work properly in Windows Command Prompt and PowerShell
  2. Log files are stored in the system temporary directory instead of the Unix/Linux /tmp directory
  3. Windows-specific process signal handling has been added to ensure proper server shutdown
  4. Cross-platform file path handling is used to support Windows path separators

Usage

Command Line Tool

# Direct command line usage
google-search "search keywords"

# Using command line options
google-search --limit 5 --timeout 60000 --no-headless "search keywords"

# Or using npx
npx google-search-cli "search keywords"

# Run in development mode
pnpm dev "search keywords"

# Run in debug mode (showing browser interface)
pnpm debug "search keywords"

# Get raw HTML of search result page
google-search "search keywords" --get-html

# Get HTML and save to file
google-search "search keywords" --get-html --save-html

# Get HTML and save to specific file
google-search "search keywords" --get-html --save-html --html-output "./output.html"

Available Command Line Options

  • --limit <number>: Limit the number of results (default: 10)
  • --timeout <number>: Set timeout in milliseconds (default: 30000)
  • --no-headless: Deprecated. The tool now always starts in headless mode and automatically switches to headed mode if CAPTCHA verification is encountered
  • --state-file <path>: Specify browser state file path (default: ./browser-state.json)
  • --no-save-state: Do not save browser state
  • --get-html: Get raw HTML of search results page instead of parsed results
  • --save-html: Save HTML to file (used with --get-html)
  • --html-output <path>: Specify HTML output file path (used with --get-html --save-html)
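
When calling the CLI from a script, it can help to build the argument list from a typed object instead of concatenating strings. The following is a minimal sketch, not part of this project: `SearchCliOptions` and `buildSearchArgs` are hypothetical names, and only the flags documented above are used. It assumes the query can be passed last, as in the `--limit` example earlier.

```typescript
// Hypothetical helper (not shipped with this tool): maps a typed options
// object onto the command-line flags documented above.
interface SearchCliOptions {
  limit?: number;        // --limit <number> (documented default: 10)
  timeout?: number;      // --timeout <number>, in milliseconds (documented default: 30000)
  stateFile?: string;    // --state-file <path>
  noSaveState?: boolean; // --no-save-state
  getHtml?: boolean;     // --get-html
  saveHtml?: boolean;    // --save-html (used with --get-html)
  htmlOutput?: string;   // --html-output <path> (used with --get-html --save-html)
}

function buildSearchArgs(query: string, opts: SearchCliOptions = {}): string[] {
  const args: string[] = [];
  if (opts.limit !== undefined) args.push("--limit", String(opts.limit));
  if (opts.timeout !== undefined) args.push("--timeout", String(opts.timeout));
  if (opts.stateFile !== undefined) args.push("--state-file", opts.stateFile);
  if (opts.noSaveState) args.push("--no-save-state");
  // The HTML flags depend on each other, so they are only emitted in valid combinations.
  if (opts.getHtml) {
    args.push("--get-html");
    if (opts.saveHtml) {
      args.push("--save-html");
      if (opts.htmlOutput !== undefined) args.push("--html-output", opts.htmlOutput);
    }
  }
  args.push(query);
  return args;
}

// Example: arguments for a search limited to five results, with the HTML saved to disk.
const exampleArgs = buildSearchArgs("search keywords", { limit: 5, getHtml: true, saveHtml: true });
console.log(["google-search", ...exampleArgs].join(" "));
```

The resulting array can be handed to a process launcher such as Node's `child_process.execFile`, which avoids shell quoting problems with multi-word queries.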

MCP Server Usage

Configuration

Add the following configuration to your MCP settings file (e.g., ~/.cursor/mcp.json or Claude Desktop configuration):

{
  "mcpServers": {
    "google-search": {
      "command": "google-search-mcp"
    }
  }
}

After configuring, restart your AI assistant (Claude Desktop or Cursor) to enable the Google Search tool.

How to Use in AI Assistants

Once configured, you can directly ask the AI assistant to search for information:

  • "Search for the latest information about TypeScript 5.0"
  • "Find tutorials on using Playwright"
  • "What are the recent developments in AI?"

The AI assistant will automatically call the Google Search tool, retrieve real-time information from the web, and provide answers based on the search results.

Output Format

Standard Search Results

{
  "query": "search keywords",
  "results": [
    {
      "title": "Result title",
      "link": "https://example.com",
      "snippet": "Result summary..."
    }
  ]
}
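
If you consume this JSON from your own code, validating its shape before use makes failures easier to diagnose. The sketch below mirrors the fields shown above; the names `SearchResult`, `SearchResponse`, and `parseSearchResponse` are illustrative and are not exported by this tool.

```typescript
// Types mirroring the documented output; the names are illustrative only.
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Type guard: true only if the value has the three documented string fields.
function isSearchResult(value: unknown): value is SearchResult {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r["title"] === "string" &&
    typeof r["link"] === "string" &&
    typeof r["snippet"] === "string"
  );
}

// Parses the CLI's JSON output and throws a descriptive error on an unexpected shape.
function parseSearchResponse(json: string): SearchResponse {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { query, results } = data as Record<string, unknown>;
  if (typeof query !== "string") {
    throw new Error('Missing string field "query"');
  }
  if (!Array.isArray(results) || !results.every(isSearchResult)) {
    throw new Error('Field "results" must be an array of {title, link, snippet}');
  }
  return { query, results };
}

const sample = '{"query":"search keywords","results":[{"title":"Result title","link":"https://example.com","snippet":"Result summary..."}]}';
const parsed = parseSearchResponse(sample);
console.log(parsed.results.map((r) => `${r.title} <${r.link}>`).join("\n"));
```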

HTML Retrieval Results

When using the --get-html option:

{
  "query": "search keywords",
  "url": "https://www.google.com/search?q=...",
  "originalHtmlLength": 123456,
  "cleanedHtmlLength": 78900,
  "savedPath": "./google-search-html/query-timestamp.html",
  "screenshotPath": "./google-search-html/query-timestamp.png",
  "htmlPreview": "First 500 characters of HTML..."
}
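
The two length fields make it easy to see how much markup the cleaning step removed. Here is a small sketch that types this response and summarizes it. The interface name and helper are hypothetical, and treating `savedPath` and `screenshotPath` as optional is an assumption (they are presumably only set when the HTML is saved).

```typescript
// Illustrative type for the --get-html output shown above.
interface HtmlResponse {
  query: string;
  url: string;
  originalHtmlLength: number;
  cleanedHtmlLength: number;
  savedPath?: string;      // assumed optional: only present when the HTML was saved
  screenshotPath?: string; // assumed optional: only present when a screenshot was saved
  htmlPreview: string;
}

// Summarizes how much of the page was stripped (CSS and JavaScript) during cleaning.
function describeHtmlResponse(res: HtmlResponse): string {
  const removed = res.originalHtmlLength - res.cleanedHtmlLength;
  const percent =
    res.originalHtmlLength > 0 ? (removed / res.originalHtmlLength) * 100 : 0;
  const location = res.savedPath ?? "(not saved)";
  return `${res.query}: removed ${removed} characters (${percent.toFixed(1)}%), saved to ${location}`;
}

const htmlSample: HtmlResponse = {
  query: "search keywords",
  url: "https://www.google.com/search?q=...",
  originalHtmlLength: 123456,
  cleanedHtmlLength: 78900,
  savedPath: "./google-search-html/query-timestamp.html",
  screenshotPath: "./google-search-html/query-timestamp.png",
  htmlPreview: "First 500 characters of HTML...",
};
console.log(describeHtmlResponse(htmlSample));
```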

Browser State Management

This tool automatically saves browser state to avoid repeated CAPTCHA verification:

  1. First run: If CAPTCHA verification is encountered, the tool will automatically switch to headed mode (visible browser) and wait for you to complete verification
  2. State saving: After successful verification, browser state is automatically saved to ~/.google-search-browser-state.json (for the MCP server) or to browser-state.json in the current directory (for the command-line tool)
  3. Subsequent runs: The tool uses saved state to bypass verification, enabling fast headless searches
  4. Fingerprint management: Browser fingerprint configuration is automatically saved and reused to reduce detection risk
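
The flow above can be sketched as a headless-first attempt with a single headed retry. This is an illustration of the idea rather than the tool's actual implementation: `resolveStateFile`, `searchWithFallback`, and the `Attempt` callback are hypothetical, and the callback stands in for whatever launches the browser and runs the search.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type Mode = "cli" | "mcp";

// State file locations as documented above: home directory for the MCP server,
// current directory for the command-line tool.
function resolveStateFile(mode: Mode, cwd: string = process.cwd()): string {
  return mode === "mcp"
    ? path.join(os.homedir(), ".google-search-browser-state.json")
    : path.resolve(cwd, "browser-state.json");
}

// An attempt either yields a value or reports that a CAPTCHA page was shown.
type Outcome<T> = { kind: "ok"; value: T } | { kind: "captcha" };

// Hypothetical callback: launches a browser with the given settings and searches.
type Attempt<T> = (opts: { headless: boolean; stateFile: string }) => Promise<Outcome<T>>;

// Try headless first; if a CAPTCHA appears, retry once with a visible browser
// so the user can complete verification. Saving state is left to the callback.
async function searchWithFallback<T>(attempt: Attempt<T>, stateFile: string): Promise<T> {
  const first = await attempt({ headless: true, stateFile });
  if (first.kind === "ok") return first.value;

  const second = await attempt({ headless: false, stateFile });
  if (second.kind === "ok") return second.value;

  throw new Error("CAPTCHA verification was not completed");
}
```

Keeping the browser launch behind a callback makes the fallback logic testable without starting a real browser.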

Technical Details

Anti-Bot Detection Strategy

This tool uses multiple techniques to bypass Google's anti-bot detection:

  1. Browser Fingerprint Simulation:

    • Automatically detects and uses the host machine's timezone, language, and other settings
    • Randomizes device types (Desktop Chrome, Firefox, Safari, Edge)
    • Simulates real browser environment (WebGL, plugins, screen resolution, etc.)
  2. Behavior Simulation:

    • Random delays simulate human input speed
    • Uses real keyboard events instead of direct value setting
    • Maintains consistent behavior patterns
  3. State Persistence:

    • Saves and restores cookies and local storage
    • Maintains consistent browser fingerprint
    • Reduces verification frequency by reusing sessions
  4. Intelligent Mode Switching:

    • Starts in headless mode for optimal performance
    • Automatically switches to headed mode when CAPTCHA is detected
    • Prompts user to complete verification, then saves state for future use
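
To illustrate the behavior simulation point, here is a minimal sketch of typing with randomized per-keystroke delays. It is not taken from this project's source: the function names and the 50 to 150 ms range are assumptions, and `typeChar` is a placeholder for a real keystroke call (with Playwright, something like `(ch) => page.keyboard.type(ch)` could be passed in).

```typescript
// Returns one delay (in ms) per character, each drawn uniformly from [minMs, maxMs].
// The random source is injectable so the function can be tested deterministically.
function typingDelays(
  text: string,
  minMs: number = 50,
  maxMs: number = 150,
  random: () => number = Math.random,
): number[] {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("Expected 0 <= minMs <= maxMs");
  }
  return Array.from(text, () => Math.round(minMs + random() * (maxMs - minMs)));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sends the text one character at a time, pausing after each keystroke.
// `typeChar` is a hypothetical callback that emits a real keyboard event.
async function typeLikeHuman(
  text: string,
  typeChar: (ch: string) => Promise<void>,
  delays: number[] = typingDelays(text),
): Promise<void> {
  let i = 0;
  for (const ch of Array.from(text)) {
    await typeChar(ch);
    await sleep(delays[i] ?? 0);
    i++;
  }
}
```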

Logging

All logs are saved to the system temporary directory:

  • Unix/Linux/macOS: /tmp/google-search-logs/google-search.log
  • Windows: %TEMP%\google-search-logs\google-search.log

The log level can be controlled via the LOG_LEVEL environment variable:

LOG_LEVEL=debug google-search "search keywords"
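
If you want to locate the log file or read the level from your own script, the following sketch shows one way to do it. It assumes the tool derives the directory from Node's `os.tmpdir()`, which this README does not confirm, and the `info` fallback level is also an assumption. Note that on macOS `os.tmpdir()` usually returns a per-user directory rather than /tmp.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Builds the documented log path under a temporary directory.
// Assumption: the tool uses os.tmpdir() as the base directory.
function resolveLogFile(tmpDir: string = os.tmpdir()): string {
  return path.join(tmpDir, "google-search-logs", "google-search.log");
}

// Reads LOG_LEVEL from the environment, normalizing case and whitespace.
// The "info" fallback is an assumption, not a documented default.
function resolveLogLevel(
  env: Record<string, string | undefined> = process.env,
  fallback: string = "info",
): string {
  const raw = env["LOG_LEVEL"]?.trim().toLowerCase();
  return raw ? raw : fallback;
}

console.log(`Log file: ${resolveLogFile()}`);
console.log(`Log level: ${resolveLogLevel()}`);
```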

Development

# Install dependencies
pnpm install

# Compile TypeScript
pnpm build

# Run in development mode
pnpm dev "search keywords"

# Run in debug mode (show browser)
pnpm debug "search keywords"

# Run MCP server
pnpm mcp

# Test build
pnpm test:build

Troubleshooting

CAPTCHA Verification Issues

If you frequently encounter CAPTCHA verification:

  1. Let the tool automatically switch to headed mode and complete verification
  2. Ensure browser state files are properly saved
  3. Try using different Google domains (the tool automatically randomizes this)
  4. Avoid making too many requests in a short time

Playwright Installation Issues

If Playwright browser installation fails:

# Manually install Chromium
npx playwright install chromium

# Or install all browsers
npx playwright install

Permission Issues

On Unix/Linux systems, if you encounter permission issues:

# Make bin files executable
chmod +x bin/google-search
chmod +x bin/google-search-mcp

Contributing

Issues and Pull Requests are welcome! Please ensure that your code follows the existing style and that all tests pass.

License

ISC License

Credits

This project is maintained by web-agent-master.
