Google Maps Business Scraper & Dashboard

A production-ready system for scraping business information from Google Maps with a modern web dashboard interface. Deployed on AWS Lambda with MongoDB backend.

🚀 Live Demo

Production URL: generated when you deploy to AWS Lambda using the instructions below.

✨ Features

Web Dashboard

  • 📊 Real-time scraping progress - Live activity tracking during scraping
  • 🔍 Search & filter - Find businesses by name, category, or location
  • 📈 Statistics - Total businesses, average ratings, top categories
  • ✏️ CRUD operations - Edit, delete, and manage scraped data
  • 📥 Export to CSV - Download business data for analysis
  • 🌐 MongoDB integration - Persistent cloud storage

Scraping Engine

  • πŸ—ΊοΈ Google Maps automation - Playwright-based scraping
  • πŸ“± Comprehensive data extraction:
    • Business name, phone, website, address
    • Instagram and WhatsApp (including from Reserve/Order buttons)
    • Ratings and review counts
    • Geographic coordinates
  • 🎯 Smart duplicate detection - Name, phone, and URL matching
  • πŸ’Ύ Immediate database saves - No data loss on errors
  • πŸ›‘οΈ Robust error handling - Graceful recovery and retry logic

AWS Lambda Deployment

  • ⚡ Serverless architecture - No server management
  • 🐳 Dockerized - Consistent environments
  • 🔄 Async invocation - No API Gateway timeouts
  • 📊 CloudWatch logging - Full observability
  • 🌍 API Gateway - RESTful endpoint with /prod stage
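The async invocation pattern works by having the HTTP-facing handler re-invoke the same Lambda with `InvocationType="Event"`, so the long-running scrape escapes the API Gateway timeout. A minimal sketch of building that call (the payload shape and routing key are illustrative, not the repo's exact code):

```python
import json
import os

def build_self_invoke_kwargs(search_query, max_results=20):
    """Build kwargs for boto3's Lambda client invoke() so the function
    re-invokes itself asynchronously to run the scrape in the background.
    Actual invocation would be:
        boto3.client("lambda").invoke(**build_self_invoke_kwargs(...))
    """
    return {
        # AWS sets this env var to the running function's own name.
        "FunctionName": os.environ.get("AWS_LAMBDA_FUNCTION_NAME", "scraper"),
        "InvocationType": "Event",  # fire-and-forget: caller returns immediately
        "Payload": json.dumps({
            "action": "scrape",          # illustrative routing key
            "search_query": search_query,
            "max_results": max_results,
        }),
    }
```

With `"Event"`, AWS queues the invocation and returns right away, which is why the web request never waits on the scrape itself.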

📋 What's Working

| Component | Status | Description |
| --- | --- | --- |
| Web Dashboard | ✅ Working | Flask app with modern UI |
| MongoDB Integration | ✅ Working | External MongoDB @ easypanel.host |
| Google Maps Scraper | ✅ Working | 100% success rate (fixed Oct 22) |
| WhatsApp Extraction | ✅ Enhanced | Extracts from action buttons |
| Lambda Deployment | ✅ Working | 1024MB, 300s timeout |
| Async Scraping | ✅ Working | Self-invocation pattern |
| Real-time Progress | ✅ Working | Live activity updates |
| Error Handling | ✅ Production-ready | Graceful degradation |
| Database Errors | ✅ User-friendly | Clean messages |

πŸ“ Project Structure

scraper_playwright/
├── app.py                    # Main Flask application
├── scrape_businesses_maps.py # Google Maps scraper
├── database.py               # MongoDB operations
├── lambda_handler.py         # AWS Lambda entry point
├── extract_contact_info.py   # Contact extraction (deprecated in scraping flow)
├── json_database.py          # Fallback JSON storage
├── templates/                # Jinja2 templates
│   ├── index.html            # Scraping interface
│   └── dashboard.html        # Business management
├── static/                   # CSS, JS, images
├── infra/                    # Terraform configuration
│   ├── main.tf
│   ├── terraform.tfvars
│   └── outputs.tf
├── Dockerfile                # Lambda container image
├── deploy.sh                 # Deployment automation
└── requirements.txt          # Python dependencies

🔧 Data Structure

Each scraped business contains:

{
  "name": "Business Name",
  "phone": "011 1234-5678",
  "website": "https://example.com",
  "email": null,
  "address": "Street Address, City",
  "rating": "4.8",
  "reviews": 152,
  "instagram": "https://instagram.com/username",
  "whatsapp": "+5491123456789",
  "scraped_at": "2025-10-22T12:34:56.789000",
  "search_query": "plomero, caba"
}
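Before insertion, a scraped record can be normalized to the shape shown above so every document carries the same keys. A hedged sketch (the repo's `database.py` may implement this differently):

```python
from datetime import datetime, timezone

BUSINESS_FIELDS = ("name", "phone", "website", "email", "address",
                   "rating", "reviews", "instagram", "whatsapp")

def make_business_doc(raw, search_query):
    """Fill missing fields with None and stamp scraped_at/search_query,
    matching the document shape shown above."""
    doc = {field: raw.get(field) for field in BUSINESS_FIELDS}
    doc["scraped_at"] = datetime.now(timezone.utc).isoformat()
    doc["search_query"] = search_query
    return doc
```

Keeping absent fields as explicit `null`s (rather than omitting them) makes dashboard queries and CSV export columns predictable.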

🚀 Quick Start

Local Development

  1. Clone and install:
git clone https://github.com/cf2018/scraper_playwright.git
cd scraper_playwright
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
  2. Set environment variables:
cp .env.example .env
# Edit .env with your MongoDB credentials
  3. Run locally:
python app.py
# Dashboard: http://localhost:5000/

Command-Line Scraping

# Scrape specific business type and location
python scrape_businesses_maps.py "plomero, caba" --max-results 10

# Output saved to: json_output/plomero_caba_YYYYMMDD_HHMMSS.json
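Internally, a query like "plomero, caba" has to become a Google Maps search URL before Playwright can navigate to it. A minimal sketch of that step (the scraper's actual URL construction may differ):

```python
from urllib.parse import quote

def maps_search_url(search_query):
    """Turn a query like 'plomero, caba' into a Google Maps search URL.
    quote() percent-encodes spaces and commas so the path is valid."""
    return "https://www.google.com/maps/search/" + quote(search_query)
```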

AWS Lambda Deployment

  1. Configure AWS credentials:
aws configure
  2. Set MongoDB credentials in infra/terraform.tfvars:
mongodb_connection_string = "mongodb://user:pass@host:27017/scraper"
mongodb_database_name = "scraper"
  3. Deploy:
./deploy.sh deploy
  4. Access:
  • Dashboard: https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/

📊 Usage Examples

Scrape via Web Interface

  1. Navigate to / (scraping page)
  2. Enter search query (e.g., "restaurants, new york")
  3. Set max results (default: 20, Lambda max: 20)
  4. Click "Start Scraping"
  5. Watch live progress updates
  6. View results in /dashboard

Scrape via API

# Start scraping
curl -X POST https://your-api.amazonaws.com/prod/api/scrape \
  -H "Content-Type: application/json" \
  -d '{"search_query": "plomero, caba", "max_results": 10}'

# Check status
curl https://your-api.amazonaws.com/prod/api/scraping-status/<task_id>

# Get businesses
curl https://your-api.amazonaws.com/prod/api/businesses
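The start-then-poll flow shown in the curl examples can be wrapped in a small client. The HTTP call is injected as a callable so the logic is easy to test; the endpoint paths follow the examples above, but the response shape (a dict with a `status` key) is an assumption:

```python
import time

def wait_for_scrape(fetch_status, task_id, poll_seconds=2, max_polls=150):
    """Poll /api/scraping-status/<task_id> until the scrape finishes.

    fetch_status: callable taking a task_id and returning a dict like
    {"status": "running" | "completed" | "failed", ...} (assumed shape).
    """
    for _ in range(max_polls):
        status = fetch_status(task_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"scrape {task_id} did not finish in time")
```

In practice `fetch_status` would be a thin wrapper around an HTTP GET to the status endpoint.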

🔒 Environment Variables

| Variable | Description | Required |
| --- | --- | --- |
| MONGODB_CONNECTION_STRING | MongoDB URI | Yes |
| MONGODB_DATABASE_NAME | Database name | Yes |
| LAMBDA_ENVIRONMENT | Set to true in Lambda | Auto |
| API_PREFIX | API Gateway stage prefix | Auto |
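A typical way to read these variables at startup and fail fast on missing required values (a sketch; the app's actual config handling may differ):

```python
import os

def load_config(env=None):
    """Read the variables from the table above, raising early if a
    required one is missing rather than failing mid-request."""
    env = os.environ if env is None else env
    required = ("MONGODB_CONNECTION_STRING", "MONGODB_DATABASE_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {
        "mongodb_uri": env["MONGODB_CONNECTION_STRING"],
        "mongodb_db": env["MONGODB_DATABASE_NAME"],
        # These two are set automatically in the Lambda environment.
        "is_lambda": env.get("LAMBDA_ENVIRONMENT", "").lower() == "true",
        "api_prefix": env.get("API_PREFIX", ""),
    }
```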

🧪 Recent Improvements (Oct 22, 2025)

Fixed Critical Browser Stability Issue ✅

  • Problem: Scraper crashed after first business (browser context closed)
  • Solution: Disabled website contact extraction during multi-business scraping
  • Result: 100% success rate (was 10%)

Enhanced WhatsApp Extraction ✅

  • Extracts WhatsApp from Reserve/Order buttons with wa.me links
  • Handles URL-encoded parameters
  • Validates phone number format (10-15 digits)
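The wa.me extraction described above (URL-decoding plus a 10-15 digit check) can be sketched like this; the scraper's exact regex may differ:

```python
import re
from urllib.parse import unquote

def whatsapp_from_url(url):
    """Pull a phone number out of a wa.me link, e.g. from a Reserve/Order
    button href. Returns '+<digits>' or None if no valid number is found."""
    decoded = unquote(url or "")  # handle URL-encoded hrefs
    match = re.search(r"wa\.me/(?:\+)?(\d+)", decoded)
    if not match:
        return None
    digits = match.group(1)
    # Valid international numbers are 10-15 digits long.
    if 10 <= len(digits) <= 15:
        return "+" + digits
    return None
```

Decoding first matters because Reserve/Order button hrefs often carry the wa.me link as a percent-encoded redirect parameter.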

Improved Browser Fingerprinting ✅

  • Updated to Chrome 131 user agent
  • Argentine locale (es-AR) and timezone
  • Buenos Aires geolocation coordinates
  • Proper Sec-Fetch-* headers
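The fingerprinting settings above map onto Playwright `browser.new_context(...)` options. A sketch under the assumption that the user-agent string and header values below approximate what the repo actually uses:

```python
def fingerprint_context_kwargs():
    """Playwright new_context() options matching the settings listed above.
    Usage: context = browser.new_context(**fingerprint_context_kwargs())
    The UA string is an approximation of a Chrome 131 desktop agent."""
    return {
        "user_agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/131.0.0.0 Safari/537.36"),
        "locale": "es-AR",
        "timezone_id": "America/Argentina/Buenos_Aires",
        "geolocation": {"latitude": -34.6037, "longitude": -58.3816},  # Buenos Aires
        "permissions": ["geolocation"],  # required for geolocation to apply
        "extra_http_headers": {
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Dest": "document",
        },
    }
```

Keeping locale, timezone, and geolocation mutually consistent is the point: a mismatched combination is itself a bot signal.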

Robust Error Recovery ✅

  • Page validity checking before navigation
  • Fallback to re-navigate on errors
  • Graceful exit on unrecoverable failures
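The recovery flow above (check page validity, re-navigate once on error, then give up gracefully) can be sketched with the page object injected for clarity. This is a simplified illustration, not the repo's actual code:

```python
def extract_with_recovery(page, url, extract):
    """Try extract(page); on failure re-navigate once and retry, then
    give up gracefully instead of crashing the whole run.

    page: Playwright-like page exposing is_closed() and goto(url).
    extract: callable(page) -> dict of business fields.
    """
    for attempt in range(2):
        try:
            if page.is_closed():   # validity check before touching the page
                return None        # unrecoverable: browser context is gone
            if attempt > 0:
                page.goto(url)     # fallback: re-navigate and retry once
            return extract(page)
        except Exception:
            if attempt == 1:
                return None        # graceful exit after the retry also fails
    return None
```

Returning `None` for one failed business, rather than raising, is what keeps a single bad listing from aborting the rest of the run.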

πŸ› Known Limitations

  • Email extraction: Disabled to maintain stability (was ~40% coverage)
  • Lambda max results: Limited to 20 businesses due to 300s timeout
  • Website extraction: Disabled in multi-business scraping (stability over completeness)


🔗 Technology Stack

  • Backend: Python 3.12, Flask 3.1.2
  • Scraping: Playwright (Chromium)
  • Database: MongoDB (external cloud)
  • Deployment: AWS Lambda, API Gateway, ECR
  • IaC: Terraform
  • Frontend: Vanilla JS, Tailwind-inspired CSS

📈 Performance

  • Scraping speed: ~3-5 seconds per business
  • Lambda cold start: ~2-3 seconds
  • Lambda warm execution: ~1 second for dashboard
  • Database queries: <100ms average
  • Success rate: 100% (after stability fixes)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is for educational and research purposes.

⚠️ Disclaimer

This scraper is for educational purposes. Always respect Google Maps Terms of Service and implement appropriate rate limiting. The authors are not responsible for misuse of this tool.
