A production-ready system for scraping business information from Google Maps, with a modern web dashboard interface. Deployed on AWS Lambda with a MongoDB backend.
Production URL: generated when you deploy to AWS Lambda using the instructions below.
- **Real-time scraping progress** - live activity tracking during scraping
- **Search & filter** - find businesses by name, category, or location
- **Statistics** - total businesses, average ratings, top categories
- **CRUD operations** - edit, delete, and manage scraped data
- **Export to CSV** - download business data for analysis
- **MongoDB integration** - persistent cloud storage
- **Google Maps automation** - Playwright-based scraping
- **Comprehensive data extraction:**
  - Business name, phone, website, address
  - Instagram and WhatsApp (including from Reserve/Order buttons)
  - Ratings and review counts
  - Geographic coordinates
 
- **Smart duplicate detection** - name, phone, and URL matching (see the sketch after this list)
- **Immediate database saves** - no data loss on errors
- **Robust error handling** - graceful recovery and retry logic
- **Serverless architecture** - no server management
- **Dockerized** - consistent environments
- **Async invocation** - no API Gateway timeouts
- **CloudWatch logging** - full observability
- **API Gateway** - RESTful endpoint with a `/prod` stage
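
The duplicate detection above matches on name, phone, and URL. A minimal sketch of that matching logic, assuming a dict-per-business shape like the JSON schema shown below; the helper names are illustrative, not the project's actual `database.py` API:

```python
# Illustrative sketch of name/phone/URL duplicate matching.
# These helpers are hypothetical, not the project's actual database.py API.
import re

def _normalize_phone(phone):
    """Keep digits only so '011 1234-5678' matches '01112345678'."""
    return re.sub(r"\D", "", phone) if phone else ""

def is_duplicate(new, existing):
    """Treat a business as a duplicate if its name, phone, or website matches."""
    name_new = (new.get("name") or "").strip().lower()
    name_old = (existing.get("name") or "").strip().lower()
    if name_new and name_new == name_old:
        return True

    phone_new = _normalize_phone(new.get("phone"))
    if phone_new and phone_new == _normalize_phone(existing.get("phone")):
        return True

    url_new = (new.get("website") or "").rstrip("/").lower()
    url_old = (existing.get("website") or "").rstrip("/").lower()
    return bool(url_new) and url_new == url_old
```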
| Component | Status | Description |
|---|---|---|
| Web Dashboard | ✅ Working | Flask app with modern UI |
| MongoDB Integration | ✅ Working | External MongoDB @ easypanel.host |
| Google Maps Scraper | ✅ Working | 100% success rate (fixed Oct 22) |
| WhatsApp Extraction | ✅ Enhanced | Extracts from action buttons |
| Lambda Deployment | ✅ Working | 1024 MB memory, 300 s timeout |
| Async Scraping | ✅ Working | Self-invocation pattern (sketched below) |
| Real-time Progress | ✅ Working | Live activity updates |
| Error Handling | ✅ Production-ready | Graceful degradation |
| Database Errors | ✅ User-friendly | Clean messages |
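
The async scraping row refers to a self-invocation pattern: the API-facing handler re-invokes the same Lambda function asynchronously and returns a task ID immediately, so API Gateway never hits its timeout. A minimal sketch; the payload keys (`action`, `task_id`, ...) are assumptions, not the project's exact contract:

```python
# Sketch of the Lambda self-invocation pattern: re-invoke this same
# function asynchronously, then return a task ID right away.
# The payload keys are illustrative assumptions.
import json
import os
import uuid

import boto3

lambda_client = boto3.client("lambda")

def start_scrape_async(search_query, max_results):
    task_id = str(uuid.uuid4())
    lambda_client.invoke(
        # AWS sets AWS_LAMBDA_FUNCTION_NAME automatically inside Lambda.
        FunctionName=os.environ["AWS_LAMBDA_FUNCTION_NAME"],
        InvocationType="Event",  # async: returns immediately
        Payload=json.dumps({
            "action": "scrape",
            "task_id": task_id,
            "search_query": search_query,
            "max_results": max_results,
        }),
    )
    return task_id  # caller polls /api/scraping-status/<task_id>
```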
```
scraper_playwright/
├── app.py                    # Main Flask application
├── scrape_businesses_maps.py # Google Maps scraper
├── database.py               # MongoDB operations
├── lambda_handler.py         # AWS Lambda entry point
├── extract_contact_info.py   # Contact extraction (deprecated in scraping flow)
├── json_database.py          # Fallback JSON storage
├── templates/                # Jinja2 templates
│   ├── index.html            # Scraping interface
│   └── dashboard.html        # Business management
├── static/                   # CSS, JS, images
├── infra/                    # Terraform configuration
│   ├── main.tf
│   ├── terraform.tfvars
│   └── outputs.tf
├── Dockerfile                # Lambda container image
├── deploy.sh                 # Deployment automation
└── requirements.txt          # Python dependencies
```
Each scraped business contains:

```json
{
  "name": "Business Name",
  "phone": "011 1234-5678",
  "website": "https://example.com",
  "email": null,
  "address": "Street Address, City",
  "rating": "4.8",
  "reviews": 152,
  "instagram": "https://instagram.com/username",
  "whatsapp": "+5491123456789",
  "scraped_at": "2025-10-22T12:34:56.789000",
  "search_query": "plomero, caba"
}
```

- Clone and install:
```bash
git clone https://github.com/cf2018/scraper_playwright.git
cd scraper_playwright
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
```

- Set environment variables:
```bash
cp .env.example .env
# Edit .env with your MongoDB credentials
```

- Run locally:
```bash
python app.py
# Dashboard: http://localhost:5000/
```

To scrape a specific business type and location from the command line:

```bash
python scrape_businesses_maps.py "plomero, caba" --max-results 10
# Output saved to: json_output/plomero_caba_YYYYMMDD_HHMMSS.json
```

- Configure AWS credentials:
```bash
aws configure
```

- Set MongoDB credentials in `infra/terraform.tfvars`:

```hcl
mongodb_connection_string = "mongodb://user:pass@host:27017/scraper"
mongodb_database_name     = "scraper"
```

- Deploy:

```bash
./deploy.sh deploy
```

- Access:
- Dashboard: https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/
- Navigate to `/` (the scraping page)
- Enter a search query (e.g., "restaurants, new york")
- Set max results (default: 20, which is also the Lambda maximum)
- Click "Start Scraping"
- Watch live progress updates
- View results in /dashboard
```bash
# Start scraping
curl -X POST https://your-api.amazonaws.com/prod/api/scrape \
  -H "Content-Type: application/json" \
  -d '{"search_query": "plomero, caba", "max_results": 10}'

# Check status
curl https://your-api.amazonaws.com/prod/api/scraping-status/<task_id>

# Get businesses
curl https://your-api.amazonaws.com/prod/api/businesses
```

| Variable | Description | Required |
|---|---|---|
| MONGODB_CONNECTION_STRING | MongoDB URI | Yes |
| MONGODB_DATABASE_NAME | Database name | Yes |
| LAMBDA_ENVIRONMENT | Set to `true` in Lambda | Auto |
| API_PREFIX | API Gateway stage prefix | Auto |
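
A sketch of how these variables might be read at startup; the defaults and the `IS_LAMBDA` convention are assumptions:

```python
# Sketch of reading the variables from the table above at startup.
# The defaults and the IS_LAMBDA naming are assumptions.
import os

MONGODB_CONNECTION_STRING = os.environ["MONGODB_CONNECTION_STRING"]  # required
MONGODB_DATABASE_NAME = os.environ["MONGODB_DATABASE_NAME"]          # required
IS_LAMBDA = os.environ.get("LAMBDA_ENVIRONMENT", "").lower() == "true"
API_PREFIX = os.environ.get("API_PREFIX", "")  # e.g. "/prod" behind API Gateway
```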
- Problem: the scraper crashed after the first business (browser context closed)
- Solution: disabled website contact extraction during multi-business scraping, as shown in the sketch below
- Result: 100% success rate (up from 10%)
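
A sketch of what the stabilized loop looks like: each business is saved as soon as it is parsed, and no extra page is opened for website contact extraction (the step that used to close the browser context). `parse_listing` and `db.save_business` are illustrative names:

```python
# Sketch of the stabilized loop: parse each listing on the Maps page
# itself, save immediately, and never open a second page for website
# contact extraction. Helper names are illustrative.
def scrape_all(page, listings, db):
    saved = 0
    for listing in listings:
        try:
            business = parse_listing(page, listing)  # stays on the Maps page
            db.save_business(business)               # immediate save: no data loss
            saved += 1
        except Exception as exc:
            print(f"Skipping listing after error: {exc}")
            continue  # one bad listing no longer aborts the run
    return saved
```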
- Extracts WhatsApp numbers from Reserve/Order buttons that carry wa.me links
- Handles URL-encoded parameters
- Validates phone number format (10-15 digits)
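
A sketch of how a WhatsApp number could be pulled out of a wa.me link found on an action button; the scraper's actual selectors and helpers may differ:

```python
# Sketch of extracting a WhatsApp number from a wa.me link on a
# Reserve/Order button. The real scraper's helpers may differ.
import re
from urllib.parse import unquote

def extract_whatsapp(href):
    decoded = unquote(href)  # handle URL-encoded parameters
    match = re.search(r"wa\.me/(\+?\d{10,15})", decoded)  # 10-15 digit check
    if not match:
        return None
    number = match.group(1)
    return number if number.startswith("+") else f"+{number}"

# extract_whatsapp("https://wa.me/5491123456789?text=hola") -> "+5491123456789"
```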
- Updated to Chrome 131 user agent
- Argentine locale (es-AR) and timezone
- Buenos Aires geolocation coordinates
- Proper Sec-Fetch-* headers
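
Together these settings translate into a Playwright browser context roughly like the following sketch; the exact header values and launch options are assumptions:

```python
# Sketch of a Playwright context matching the settings listed above;
# the precise header values and launch options are assumptions.
from playwright.sync_api import sync_playwright

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        user_agent=UA,                       # Chrome 131 user agent
        locale="es-AR",                      # Argentine locale
        timezone_id="America/Argentina/Buenos_Aires",
        geolocation={"latitude": -34.6037, "longitude": -58.3816},  # Buenos Aires
        permissions=["geolocation"],
        extra_http_headers={
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Dest": "document",
        },
    )
    page = context.new_page()
```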
- Page validity checking before navigation
- Fallback to re-navigate on errors
- Graceful exit on unrecoverable failures
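
A sketch of the validity-check / re-navigate pattern described above; the retry count and helper structure are assumptions:

```python
# Sketch of the validity-check / re-navigate pattern; the retry count
# and helper structure are assumptions.
def safe_navigate(page, url, retries=2):
    for attempt in range(retries + 1):
        if page.is_closed():       # page validity check before navigating
            return False           # unrecoverable: let the caller exit gracefully
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=30_000)
            return True
        except Exception as exc:
            print(f"Navigation attempt {attempt + 1} failed: {exc}")
    return False
```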
- Email extraction: Disabled to maintain stability (was ~40% coverage)
- Lambda max results: Limited to 20 businesses due to 300s timeout
- Website extraction: Disabled in multi-business scraping (stability over completeness)
- Backend: Python 3.12, Flask 3.1.2
- Scraping: Playwright (Chromium)
- Database: MongoDB (external cloud)
- Deployment: AWS Lambda, API Gateway, ECR
- IaC: Terraform
- Frontend: Vanilla JS, Tailwind-inspired CSS
- Scraping speed: ~3-5 seconds per business
- Lambda cold start: ~2-3 seconds
- Lambda warm execution: ~1 second for dashboard
- Database queries: <100ms average
- Success rate: 100% (after stability fixes)
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is for educational and research purposes only. Always respect the Google Maps Terms of Service and implement appropriate rate limiting. The authors are not responsible for misuse of this tool.