REST API - Asynchronous job queue server with full-featured endpoints
Docker Support - Containerized deployment
Chrome Profile Persistence - Session data persists between runs
Quick Start
Option 1: Docker (Recommended)
The easiest way to get started. Docker handles all dependencies, Playwright, and browser setup automatically.
Using docker-compose (Simplest)
# Clone the repository
git clone https://github.com/Ayyouboss0011/SherlockMaps.git
cd SherlockMaps
# Start the API server
docker compose up -d
# The API is now running at http://localhost:8000
# Interactive documentation: http://localhost:8000/docs
# Check job status
curl http://localhost:8000/status
# Get all results
curl http://localhost:8000/results
Stop the container
docker compose down
Using Docker CLI (without docker-compose)
# Clone the repository
git clone https://github.com/Ayyouboss0011/SherlockMaps.git
cd SherlockMaps
# Build the image
cd core && docker build -t sherlock-maps . && cd ..
# Run as API server
docker run -d -p 8000:8000 --name sherlock-maps sherlock-maps
# Run in CLI mode (one-time crawl)
docker run --rm -e PROMPT="restaurants berlin" sherlock-maps python /app/core/main_cli.py
Option 2: Without Docker
Install Python dependencies manually and run the crawler directly.
# Set search term
export PROMPT="restaurants berlin"
# Run the crawler
python main.py
Run the REST API server
# Start the API server on port 8000
python api_main.py
# The API is now running at http://localhost:8000
# Interactive documentation: http://localhost:8000/docs
from core.models import CrawlerConfig, CompanyData
from core.crawler import GoogleMapsCrawler

# Create configuration
config = CrawlerConfig(
    search_prompt="restaurants berlin",
    headless=False,
    output_format="pretty",
)

# Use crawler with context manager
with GoogleMapsCrawler(config) as crawler:
    results = crawler.crawl()

    # Process results
    for company in results:
        if isinstance(company, CompanyData):
            print(f"{company.name}: {company.rating} stars ({company.reviews_count} reviews)")
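Once results are back, a common next step is to filter and rank them. The sketch below relies only on the three attributes used above (`name`, `rating`, `reviews_count`). `Company` is a hypothetical stand-in for `CompanyData` so the snippet runs without the crawler, `top_rated` is an illustrative helper that is not part of the project, and the possibility of a missing rating or review count is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Company:
    """Hypothetical stand-in for CompanyData (only the fields used above)."""
    name: str
    rating: Optional[float] = None       # assumed to be optional
    reviews_count: Optional[int] = None  # assumed to be optional


def top_rated(companies: Iterable[Company], min_reviews: int = 10, limit: int = 5) -> List[Company]:
    """Return up to `limit` rated companies with at least `min_reviews` reviews."""
    eligible = [
        c for c in companies
        if c.rating is not None and (c.reviews_count or 0) >= min_reviews
    ]
    # Best rating first; ties are broken by the larger review count.
    eligible.sort(key=lambda c: (c.rating, c.reviews_count or 0), reverse=True)
    return eligible[:limit]


sample = [
    Company("Cafe A", 4.5, 120),
    Company("Cafe B", 4.8, 3),    # dropped: too few reviews
    Company("Cafe C", None, 50),  # dropped: no rating
    Company("Cafe D", 4.7, 40),
]
for company in top_rated(sample):
    print(f"{company.name}: {company.rating} stars ({company.reviews_count} reviews)")
```

With real crawl output, the same helper could be applied to the `CompanyData` items in `results`, provided they expose the same three attributes.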
Custom Search at Runtime
from core.models import CrawlerConfig
from core.crawler import GoogleMapsCrawler

config = CrawlerConfig(
    search_prompt="cafes berlin",
    output_format="json",
)

with GoogleMapsCrawler(config) as crawler:
    # First search
    results1 = crawler.crawl()
    # Second search with different term
    results2 = crawler.crawl(prompt="restaurants munich")
REST API
The crawler can run as a persistent service with a REST API. The container starts as an API server and processes multiple crawl jobs sequentially.
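To make "multiple crawl jobs sequentially" concrete, here is a minimal sketch of the general pattern: requests are accepted immediately and a single worker runs them one at a time. This is not the project's implementation. The class `SequentialJobQueue`, the status names (`queued`, `running`, `completed`, `failed`), and the `fake_crawl` placeholder are all assumptions made for illustration.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, List


class SequentialJobQueue:
    """Illustrative only: one worker thread runs queued jobs one at a time."""

    def __init__(self, run_job: Callable[[str], List[str]]) -> None:
        self._run_job = run_job
        self._pending: queue.Queue = queue.Queue()
        self.status: Dict[str, str] = {}
        self.results: Dict[str, List[str]] = {}
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, prompt: str) -> str:
        """Enqueue a job and return its id without waiting for it to run."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = "queued"
        self._pending.put((job_id, prompt))
        return job_id

    def _work(self) -> None:
        # A single consumer guarantees that jobs never overlap.
        while True:
            job_id, prompt = self._pending.get()
            self.status[job_id] = "running"
            try:
                self.results[job_id] = self._run_job(prompt)
                self.status[job_id] = "completed"
            except Exception:
                self.status[job_id] = "failed"
            finally:
                self._pending.task_done()

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._pending.join()


def fake_crawl(prompt: str) -> List[str]:
    # Placeholder for a real crawl; returns a dummy result.
    return [f"result for {prompt}"]


jobs = SequentialJobQueue(fake_crawl)
first = jobs.submit("restaurants berlin")
second = jobs.submit("cafes berlin")
jobs.wait()
print(jobs.status[first], jobs.status[second])
```

A single consumer thread keeps only one browser session busy at a time, which is one plausible reason for a crawler service to process jobs sequentially rather than in parallel.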
Start the API
# Build the image
cd core
docker build -t sherlock-maps .
# Start API server (port 8000)
docker run -p 8000:8000 sherlock-maps
# With custom port
docker run -p 8080:8080 -e API_PORT=8080 sherlock-maps
API Endpoints
Health & Status
| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check (for Docker orchestrators) |
| GET | /status | Current status (idle/busy), active jobs, queue length |
| GET | /stats | Detailed statistics |
Crawler Control
| Method | Path | Description |
| --- | --- | --- |
| POST | /crawl | Start a new crawl job |
| GET | /crawl/{job_id} | Get job status |
| GET | /crawl/{job_id}/results | Get job results |
| DELETE | /crawl/{job_id} | Cancel a running job |
| GET | /crawl/history | List all jobs with pagination |
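The API Examples section queries `/crawl/history` with `limit` and `offset` query parameters. As a rough sketch of walking through that history page by page, the hypothetical helper `history_page_urls` below builds one URL per page. It assumes the caller already knows the total number of jobs; whether the API reports a total is not stated here.

```python
from typing import Iterator
from urllib.parse import urlencode


def history_page_urls(base_url: str, total: int, limit: int = 10) -> Iterator[str]:
    """Yield one /crawl/history URL per page needed to cover `total` jobs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for offset in range(0, total, limit):
        query = urlencode({"limit": limit, "offset": offset})
        yield f"{base_url.rstrip('/')}/crawl/history?{query}"


# 25 jobs at 10 per page need three requests (offsets 0, 10 and 20).
for url in history_page_urls("http://localhost:8000", total=25):
    print(url)
```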
Data Management
| Method | Path | Description |
| --- | --- | --- |
| GET | /results | Get all results |
| POST | /results/export | Export results |
| DELETE | /results/clear | Clear all results |
Configuration
| Method | Path | Description |
| --- | --- | --- |
| GET | /config | Get current configuration |
| PUT | /config | Update configuration |
Browser
| Method | Path | Description |
| --- | --- | --- |
| GET | /browser/info | Browser information |
| POST | /browser/restart | Restart browser |
API Examples
# Start a new crawl job
curl -X POST http://localhost:8000/crawl \
-H "Content-Type: application/json" \
-d '{"prompt": "restaurants berlin", "output_format": "json"}'
# Get job status
curl http://localhost:8000/crawl/<job_id>
# Get results
curl http://localhost:8000/crawl/<job_id>/results
# Get all results as CSV
curl "http://localhost:8000/results?format=csv"
# Get status
curl http://localhost:8000/status
# Health check
curl http://localhost:8000/health
# Cancel job
curl -X DELETE http://localhost:8000/crawl/<job_id>
# Job history
curl "http://localhost:8000/crawl/history?limit=10&offset=0"
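The same submit, poll, fetch flow can be scripted from Python with only the standard library. The sketch below is hedged: the endpoints and the request body come from the examples above, but the response field names (`job_id`, `status`) and the status values (`completed`, `failed`) are assumptions, so check the interactive documentation at `/docs` for the actual schema. `http_json` and `crawl_and_wait` are illustrative helpers, not part of the project.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8000"


def http_json(method: str, url: str, payload: Optional[Dict[str, Any]] = None) -> Any:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def crawl_and_wait(
    prompt: str,
    send: Callable[..., Any] = http_json,
    poll_seconds: float = 2.0,
    max_polls: int = 150,
) -> Any:
    """Start a crawl job, poll its status, then return its results."""
    job = send("POST", f"{BASE_URL}/crawl", {"prompt": prompt, "output_format": "json"})
    job_id = job["job_id"]  # ASSUMPTION: name of the id field in the response
    for _ in range(max_polls):
        state = send("GET", f"{BASE_URL}/crawl/{job_id}").get("status")
        if state == "completed":  # ASSUMPTION: terminal status value
            return send("GET", f"{BASE_URL}/crawl/{job_id}/results")
        if state == "failed":  # ASSUMPTION: terminal status value
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

With the API server running, `crawl_and_wait("restaurants berlin")` would submit one job and block until its results are available. The `send` parameter exists so the HTTP layer can be swapped for a stub in tests.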