NewPyDev/Maps-Biz-Scraper

Business Scraper - Professional Lead Generation System

Professional Google Maps business lead generation system with Playwright, FastAPI, and smart proxy management.

✨ Features

  • Playwright-based scraping - Reliable, modern browser automation
  • Smart proxy management - Auto-detection, rotation, health tracking
  • FastAPI dashboard - Real-time updates and job management
  • CSV/PDF export - Professional reports with filtering
  • Job queue system - Manage multiple scraping jobs
  • SQLite database - Persistent storage with quality scoring
  • Environment-based config - Type-safe configuration with .env
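The CSV-export-with-filtering feature above could look roughly like this stdlib-only sketch. The field names and the minimum-quality filter are illustrative assumptions, not the project's actual schema or export API:

```python
import csv
import io

def export_leads_csv(leads: list[dict], min_quality: float = 0.0) -> str:
    """Return CSV text for all leads at or above min_quality."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "phone", "website", "quality"])
    writer.writeheader()
    for lead in leads:
        # Quality scoring comes from the SQLite layer; here it is just a float.
        if lead["quality"] >= min_quality:
            writer.writerow(lead)
    return buf.getvalue()
```

The real exporter uses Pandas (see Tech Stack below); this only illustrates the filtering idea.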

🚀 Quick Start

1. Install Dependencies

# Create virtual environment
uv venv
.venv\Scripts\activate  # Windows
source .venv/bin/activate  # Linux/Mac

# Install packages
uv pip install -e ".[dev]"

2. Configure Environment

# Copy environment template
copy .env.example .env  # Windows
cp .env.example .env    # Linux/Mac

# Edit .env with your settings (optional)

3. Run Application

python app.py

4. Access Dashboard

Open http://localhost:8000 in your browser

📖 Documentation

⚙️ Configuration

All configuration is managed through the .env file. Key settings:

# Database
DATABASE_PATH=business_leads.db

# Scraping
MAX_RESULTS_PER_JOB=50
HEADLESS_MODE=false

# Proxy (optional - auto-detected)
PROXIES_FILE=proxies.txt

# Server
HOST=0.0.0.0
PORT=8000

See .env.example for all available options.
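The keys above map to typed settings at startup. The project uses Pydantic Settings for this (see Tech Stack); the following is a stdlib-only sketch of the same idea, with defaults mirroring the values shown:

```python
import os

def load_settings() -> dict:
    """Read the .env-backed variables from the environment, with typed defaults."""
    return {
        "database_path": os.getenv("DATABASE_PATH", "business_leads.db"),
        "max_results_per_job": int(os.getenv("MAX_RESULTS_PER_JOB", "50")),
        # Booleans arrive as strings ("true"/"false") and must be coerced.
        "headless_mode": os.getenv("HEADLESS_MODE", "false").lower() == "true",
        "proxies_file": os.getenv("PROXIES_FILE", "proxies.txt"),
        "host": os.getenv("HOST", "0.0.0.0"),
        "port": int(os.getenv("PORT", "8000")),
    }
```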

🔒 Proxy Support

The scraper includes smart proxy management:

  • Auto-detection - Checks for proxies.txt automatically
  • Works with or without proxies - No code changes needed
  • Rotation & health tracking - Skips failed proxies
  • Multiple formats supported - ip:port:user:pass, http://user:pass@ip:port

To use proxies:

  1. Create proxies.txt in the project root
  2. Add proxies, one per line
  3. Run the scraper - proxies are detected automatically

To disable proxies:

  • Delete or rename proxies.txt

See Smart Proxy Guide for details.
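The two supported line formats could be normalized into a single proxy record along these lines. This is a sketch of the parsing step only; the function name and dict shape are assumptions, not the actual proxy_manager.py API:

```python
from urllib.parse import urlparse

def parse_proxy_line(line: str) -> dict:
    """Normalize ip:port[:user:pass] or scheme://user:pass@ip:port lines."""
    line = line.strip()
    if "://" in line:
        # URL form, e.g. http://user:pass@ip:port
        u = urlparse(line)
        return {"server": f"{u.scheme}://{u.hostname}:{u.port}",
                "username": u.username, "password": u.password}
    parts = line.split(":")
    if len(parts) == 4:
        # Colon form with credentials: ip:port:user:pass
        ip, port, user, pwd = parts
        return {"server": f"http://{ip}:{port}", "username": user, "password": pwd}
    ip, port = parts
    return {"server": f"http://{ip}:{port}", "username": None, "password": None}
```

Rotation and health tracking then operate on these records, skipping any proxy whose recent requests failed.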

📁 Project Structure

business-scraper/
├── config.py              # Configuration management
├── proxy_manager.py       # Smart proxy system
├── database_manager.py    # Database operations
├── google_maps_scraper.py # Main scraper
├── app.py                 # FastAPI application
├── .env                   # Environment configuration
├── .env.example           # Configuration template
├── docs/                  # Documentation
│   ├── guides/            # User guides
│   ├── technical/         # Technical docs
│   └── deployment/        # Deployment guides
└── templates/             # Web templates

🧪 Development

# Run tests
pytest

# Format code
black .

# Lint code
ruff check .

# Type check
mypy .

📊 Tech Stack

  • Backend: FastAPI, Pydantic
  • Scraping: Playwright
  • Database: SQLite
  • Export: Pandas, ReportLab
  • Config: Pydantic Settings, python-dotenv

📝 License

MIT License - see the LICENSE file for details

🤝 Contributing

Contributions welcome! Please read the documentation first.


Professional product ready for production use 🚀
