# AI-Powered Business Contact Finder - Fully Automated
Find hundreds of business contacts with one command. No manual searching, no data entry, completely automated.
## Table of Contents

- What Is This?
- Why Use This?
- Key Features
- Complete Setup Guide
- How to Use
- Configuration Guide
- Understanding the Output
- Troubleshooting
- FAQ
- Technical Architecture
- Contributing
- License
## What Is This?

LeadScraper is an intelligent AI agent that automatically discovers businesses and extracts their complete contact information.
In simple terms: You tell it "Find me 100 restaurants in New York" and it does all the work:
- Searches the internet to find restaurant names
- Researches each restaurant across multiple sources (websites, Facebook, Instagram)
- Extracts phone numbers, emails, addresses, social media
- Verifies data quality and assigns confidence scores
- Saves everything to a Google Sheet
All you do is run one command and wait.
## Why Use This?

- Manual work: 40+ hours to find 100 business contacts
- With LeadScraper: 1 hour, fully automated
- You save: 39 hours of tedious work
Typical results:

- ~70% complete profiles (phone + email + address)
- ~30% partial profiles (some contact info)
- Virtually no empty results - the tool almost always finds something
- Data is cross-verified across multiple sources
Who is this for?

- Sales Teams: Build prospect lists in any industry/location
- Marketers: Gather leads for campaigns
- Recruiters: Find companies in specific sectors
- Event Planners: Discover venues and vendors
- Researchers: Collect business data for analysis
- Anyone who needs business contact information at scale
## Key Features

### AI-Powered Discovery

- AI automatically finds businesses based on your criteria
- Intelligent search strategy planning
- Smart name extraction and validation
- No manual input needed
### Multi-Model AI Rotation

- Uses 4 different AI models simultaneously
- Automatic rotation when rate limits hit
- Never stops - always has a model ready
- 4x higher throughput than single-model systems
### Multi-Source Extraction

Extracts data from:
- Google Search results
- Facebook business pages
- Instagram business profiles
- Official company websites
Then intelligently merges and verifies all data.
### Clean Terminal Output

- Clean, color-coded progress messages
- Real-time completion indicators
- Easy to track which business is processing
- Beautiful startup banner
- No clutter or emojis
### Data Quality Verification

- Cross-verifies information across sources
- Assigns confidence scores (0-100%)
- Handles conflicting data intelligently
- Prioritizes verified information
### Google Sheets Integration

- Results saved to Google Sheets
- Auto-deduplication
- Auto-sorting by confidence
- Real-time status updates
## Complete Setup Guide

### Prerequisites

You need:
- A computer (Windows, Mac, or Linux)
- Internet connection
- 20 minutes for setup
No coding experience needed!
### Step 1: Install Python

What is Python? The programming language that runs this tool.
How to install:
- Go to python.org/downloads
- Click the big yellow "Download Python" button
- Run the installer
- IMPORTANT: Check the box that says "Add Python to PATH"
- Click "Install Now"
- Verify the installation by running:

```
python --version
```

You should see `Python 3.10` or higher.
### Step 2: Download LeadScraper

Option A: Using Git (recommended)
- Install Git: git-scm.com/downloads
- Open Terminal/Command Prompt
- Navigate to where you want the project:

```
cd Documents
```

- Clone the repository:

```
git clone https://github.com/YOUR-USERNAME/LeadScraper.git
cd LeadScraper
```
Option B: Download ZIP
- Click the green "Code" button on GitHub
- Click "Download ZIP"
- Extract the ZIP file
- Open Terminal/Command Prompt
- Navigate to the extracted folder:

```
cd path/to/LeadScraper
```
### Step 3: Install Dependencies

What are dependencies? Libraries this tool needs to work.
- Make sure you're in the LeadScraper folder
- Run this command:

```
pip install -r requirements.txt
```

- Wait 1-2 minutes while it installs
- You'll see lots of text - that's normal!
### Step 4: Get an OpenRouter API Key

What is OpenRouter? The AI service that powers the intelligence.
Cost: FREE with the models we use!
Steps:
- Go to openrouter.ai
- Click "Sign Up" (top right)
- Sign up with Google/GitHub/Email
- Once logged in, click your profile icon
- Click "API Keys"
- Click "Create Key"
- Give it a name: "LeadScraper"
- Click "Create"
- COPY THE KEY - it looks like `sk-or-v1-abc123xyz...`
- Save it somewhere safe - you'll need it in Step 6
Important: Don't share this key with anyone!
### Step 5: Set Up Google Sheets

Why Google Sheets? Your extracted data is saved here like a database.

#### 5.1 Create a Google Cloud Project
- Go to console.cloud.google.com
- Click "Select a project" (top left)
- Click "New Project"
- Project name: "LeadScraper"
- Click "Create"
- Wait 30 seconds for it to create
#### 5.2 Enable the Google Sheets API

- In the search bar, type: "Google Sheets API"
- Click on "Google Sheets API"
- Click "Enable"
- Wait for it to enable
#### 5.3 Create a Service Account

- Click "Credentials" (left sidebar)
- Click "Create Credentials" (top)
- Select "Service Account"
- Service account name: "leadscraper-bot"
- Click "Create and Continue"
- Role: Select "Editor"
- Click "Continue"
- Click "Done"
#### 5.4 Download the JSON Key

- Click on the service account you just created
- Click "Keys" tab
- Click "Add Key" → "Create new key"
- Select "JSON"
- Click "Create"
- A file will download: `leadscraper-bot-xxxxx.json`
- Rename it to `service_account.json`
#### 5.5 Place the Credentials File

- In your LeadScraper folder, create a folder called `credentials`
- Move `service_account.json` into the `credentials` folder

Your structure should look like:

```
LeadScraper/
├── credentials/
│   └── service_account.json
├── modules/
├── main.py
└── ...
```
#### 5.6 Create Your Google Sheet

- Go to sheets.google.com
- Click "Blank" to create new sheet
- Name it: "LeadScraper Database"
- Copy the Sheet ID from the URL:

```
https://docs.google.com/spreadsheets/d/THIS_IS_THE_SHEET_ID/edit
```

- Save this ID - you'll need it in Step 6
#### 5.7 Share the Sheet with the Service Account

- In your Google Sheet, click "Share" (top right)
- Open `credentials/service_account.json` in a text editor
- Find the line `"client_email": "leadscraper-bot@..."`
- Copy that email address
- Back in Google Sheet, paste the email in "Add people"
- Make sure "Editor" is selected
- Uncheck "Notify people"
- Click "Share"
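If you would rather not dig through the JSON by hand, a small standard-library snippet can read out the email address you need to share the sheet with. This is an optional convenience sketch, assuming the `credentials/service_account.json` layout described above:

```python
import json

def service_account_email(path="credentials/service_account.json"):
    """Read the service-account JSON and return the client_email
    that your Google Sheet must be shared with."""
    with open(path) as f:
        return json.load(f)["client_email"]

# Example: print(service_account_email())
```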
### Step 6: Create the .env File

What is .env? A file that stores your API keys securely.
- In the LeadScraper folder, create a file named `.env`
  - On Windows: Right-click → New → Text Document → Rename to `.env`
  - On Mac/Linux: `touch .env`
- Open `.env` in a text editor
- Paste this template:

```
# OpenRouter API Key (from Step 4)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Google Sheet ID (from Step 5.6)
GOOGLE_SHEET_ID=your-sheet-id-here

# Path to Google Sheets credentials
SERVICE_ACCOUNT_FILE=credentials/service_account.json
```

- Replace the placeholders:
  - Replace `sk-or-v1-your-key-here` with your actual OpenRouter API key
  - Replace `your-sheet-id-here` with your actual Google Sheet ID
  - Keep `SERVICE_ACCOUNT_FILE` as is
- Save the file
Example:

```
OPENROUTER_API_KEY=sk-or-v1-abc123xyz789
GOOGLE_SHEET_ID=1a2b3c4d5e6f7g8h9i0j
SERVICE_ACCOUNT_FILE=credentials/service_account.json
```

### Step 7: Test the Setup

Let's make sure everything works!
- Run this command:

```
python main.py --discover "New York" --quantity 3
```

- You should see:
  - Beautiful startup banner
  - Green "[SUCCESS]" messages
  - Cyan "[AI]" messages showing progress
  - Completion separators like:

```
────────────────────────────────────
────────────── 1/3 ────────────────
────────────────────────────────────
```

- Check your Google Sheet - you should see 3 businesses!
If it works: Setup complete! 🎉
If it doesn't: See Troubleshooting
## How to Use

Find businesses in one command:
```
python main.py --discover "LOCATION" --quantity NUMBER
```

Examples:

```
# Find 100 restaurants in New York
python main.py --discover "New York" --quantity 100

# Find 50 cafes in Los Angeles
python main.py --discover "Los Angeles" --quantity 50 --category cafe

# Find 200 gyms in Chicago
python main.py --discover "Chicago" --quantity 200 --category gym

# Quick test with 5 businesses
python main.py --discover "Miami" --quantity 5
```

| Option | Description | Required | Default |
|---|---|---|---|
| `--discover` | Location to search | Yes | - |
| `--quantity` | Number of businesses to find | No | 100 |
| `--category` | Type of business | No | restaurant |
Common categories:
- `restaurant` - Restaurants, eateries
- `cafe` - Coffee shops, cafes
- `gym` - Gyms, fitness centers
- `salon` - Hair salons, beauty
- `bar` - Bars, pubs
- `hotel` - Hotels, accommodations
- `shop` - Retail stores
- `office` - Business offices
- `clinic` - Medical clinics
- `school` - Schools, education
You can use ANY business type!
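For a sense of how a CLI like this maps onto code, here is a minimal argparse sketch. The flag names and defaults mirror the options table above, but this is an illustration, not the tool's actual `main.py`:

```python
import argparse

# Build a parser matching the documented options (sketch only).
parser = argparse.ArgumentParser(description="LeadScraper CLI (sketch)")
parser.add_argument("--discover", required=True, help="Location to search")
parser.add_argument("--quantity", type=int, default=100,
                    help="Number of businesses to find")
parser.add_argument("--category", default="restaurant",
                    help="Type of business")

# Parse an explicit argument list so the example is self-contained.
args = parser.parse_args(["--discover", "Miami", "--quantity", "5"])
print(args.discover, args.quantity, args.category)  # prints: Miami 5 restaurant
```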
### What Happens When You Run It

Phase 1: Discovery (3-5 minutes for 100 businesses)
- AI plans search strategy
- AI generates smart search queries
- AI searches the web
- AI extracts business names
- AI validates and cleans results
- Populates Google Sheet Input tab
Phase 2: Extraction (~70 minutes for 100)
- For each business:
- Searches Google for information
- Searches Facebook for business page
- Searches Instagram for profile
- Crawls official website
- Extracts all contact information
- Fuses data from all sources
- Assigns confidence score
- Writes to Results sheet
You see:
- Beautiful color-coded progress
- Completion separator after each business
- Real-time updates in Google Sheet
Total time for 100 businesses: ~75 minutes
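Under the hood, the contact-extraction step largely comes down to pattern matching over page text. A toy sketch of the idea - the regexes here are illustrative, not the patterns used by the tool's actual `extractor.py`:

```python
import re

# Illustrative patterns: US-style phone numbers and simple email addresses.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extract_contacts(text):
    """Return all phone numbers and emails found in a block of text."""
    return {"phones": PHONE_RE.findall(text),
            "emails": EMAIL_RE.findall(text)}

sample = "Call (212) 555-1234 or email info@pizzahut.com"
print(extract_contacts(sample))
# prints: {'phones': ['(212) 555-1234'], 'emails': ['info@pizzahut.com']}
```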
Colors mean:
- Green [SUCCESS]: Operation completed successfully
- Cyan [AI]: AI is working (planning, extracting, validating)
- Yellow [WARNING]: Warning (rate limit, missing data)
- Red [ERROR]: Error occurred
- White [INFO]: General information
Progress separators:

```
────────────────────────────────────────────────────────────
────────────────────────── 47/100 ─────────────────────────
────────────────────────────────────────────────────────────
```
This means business #47 just finished!
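A separator like this is simple to generate; a minimal sketch (not the actual `display.py` code) that centers the counter in a 60-character rule of box-drawing characters:

```python
def progress_separator(done: int, total: int, width: int = 60) -> str:
    """Center a 'done/total' counter in a horizontal rule."""
    return f" {done}/{total} ".center(width, "─")

print("─" * 60)
print(progress_separator(47, 100))
print("─" * 60)
```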
## Configuration Guide

All settings are in `config.py`. Here are the main ones:
### AI Models

The tool rotates through 4 AI models automatically:

```python
OPENROUTER_MODELS = [
    "arcee-ai/trinity-large-preview:free",  # Free, fast
    "stepfun/step-3.5-flash:free",          # Free, reliable
    "deepseek/deepseek-r1-0528:free",       # Free, accurate
    "openrouter/aurora-alpha",              # Paid, premium
]
```

Want to change models? Browse available models at openrouter.ai/models
### Speed and Delays

```python
DELAY_BETWEEN_LEADS = 1  # Seconds between businesses (1 = fast)
DELAY_JITTER = 1         # Random extra delay of 0-1 seconds
MAX_RETRIES = 2          # How many times to retry on failure
SEARCH_DELAY = 1         # Seconds between searches
```

Want it faster? Reduce the delays (but you may hit rate limits more often).
Want it more reliable? Increase the delays and retries.
### Discovery Settings

```python
DEFAULT_DISCOVERY_QUANTITY = 100      # Default number if not specified
MAX_DISCOVERY_QUANTITY = 500          # Maximum allowed
AUTO_CLEAR_INPUT_ON_DISCOVERY = True  # Clear input before discovery
```

### Data Sources

```python
ENABLE_FACEBOOK = True      # Search Facebook pages
ENABLE_INSTAGRAM = True     # Search Instagram profiles
ENABLE_DIRECTORIES = True   # Extract from directory sites
ENABLE_MULTI_SEARCH = True  # Use multiple search engines
```

Want to skip a source? Set it to `False`.
### Extraction Limits

```python
MAX_PHONES_PER_BUSINESS = 10  # Maximum phone numbers to extract
MAX_EMAILS_PER_BUSINESS = 10  # Maximum emails to extract
MAX_SOCIAL_LINKS = 20         # Maximum social media links
```

## Understanding the Output

Your Google Sheet has 2 tabs:
### Input Tab

Where discovered business names go before extraction.
| Business Name | Category | City | Status |
|---|---|---|---|
| Pizza Hut | restaurant | New York | ✅ Done |
### Results Tab

Where extracted data is saved.
| Business Name | Phone Numbers | Email Addresses | Address | City | State | Website | Facebook | Instagram | Confidence | Sources |
|---|---|---|---|---|---|---|---|---|---|---|
| Pizza Hut | (212) 555-1234, (212) 555-5678 | info@pizzahut.com | 123 Main St | New York | NY | pizzahut.com | facebook.com/pizzahut | instagram.com/pizzahut | 95% | website, facebook, instagram |
### Confidence Scores

What they mean:
- 80-100%: Excellent - Multiple sources verified
- 60-79%: Good - Some verification
- 40-59%: Partial - Limited data
- 0-39%: Low - Minimal information
90%+ means: Data is highly reliable, cross-verified across multiple sources.
### Sources Column

Shows where the data came from:

- `search_results` - Google search snippets
- `facebook_search` - Facebook business page
- `instagram_search` - Instagram profile
- `website` - Official company website
More sources = Higher confidence!
## Troubleshooting

Problem: `.env` file not configured correctly

Solution:

- Make sure the `.env` file exists in the main folder
- Open `.env` and verify: `OPENROUTER_API_KEY=sk-or-v1-your-actual-key`
- No spaces around `=`
- No quotes around the key
Problem: Sheet ID missing from `.env`

Solution:

- Open `.env`
- Add your Sheet ID: `GOOGLE_SHEET_ID=your-actual-sheet-id`
- Get the Sheet ID from the URL: `https://docs.google.com/spreadsheets/d/THIS_IS_IT/edit`
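If you are unsure which part of the URL is the ID, this hypothetical helper extracts it - the ID is the path segment right after `/d/`:

```python
import re

def sheet_id_from_url(url: str) -> str:
    """Extract the spreadsheet ID from a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if not match:
        raise ValueError(f"Not a Google Sheets URL: {url}")
    return match.group(1)

# Example:
# sheet_id_from_url("https://docs.google.com/spreadsheets/d/1a2b3c4d5e6f/edit")
# returns "1a2b3c4d5e6f"
```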
Problem: Google Sheet not shared with service account
Solution:
- Open `credentials/service_account.json`
- Find `"client_email"` and copy the email address
- Open your Google Sheet
- Click "Share"
- Add that email as Editor
- Uncheck "Notify people"
- Click "Share"
Problem: Hitting rate limits (normal, handled automatically!)
What happens: Tool automatically switches to another AI model
What you see: a yellow `[WARNING] Rate limit on model X, rotating...` message
Action needed: None! It's working as designed.
Problem: AI couldn't find businesses in that location
Solutions:
- Try a larger city name
- Increase `--quantity`
- Try a different category
- Check the spelling of the location
Solutions:

- Check `logs/scraper.log` for error details
- Make sure your internet connection is stable
- Restart and try again
- Try a smaller quantity first (`--quantity 5`)
Check:
- Confidence score - Low scores mean uncertain data
- Sources - More sources = more reliable
- Google Sheet permissions - Make sure sheet is editable
## FAQ

Q: Do I need to know coding?
A: No! Just copy-paste the commands shown in this guide.
Q: Does this work on Mac/Windows/Linux?
A: Yes! Works on all platforms.
Q: How much does it cost?
A: $0 with the free AI models we use. OpenRouter offers generous free tiers.
Q: Is this legal?
A: It only gathers publicly available information. That said, review the terms of service of the sites being searched and any data-protection rules that apply to your use case.
Q: Will I get banned?
A: Unlikely. Rate limiting and multi-model rotation keep requests within the providers' limits.
Q: How accurate is the data?
A: 70% of businesses get complete profiles. Data is cross-verified from multiple sources.
Q: Can I trust the confidence scores?
A: Yes. 80%+ confidence means data is verified across multiple sources.
Q: What if data is wrong?
A: Lower confidence scores indicate uncertain data. Always verify critical information.
Q: Can I export to CSV?
A: Yes! In Google Sheets: File → Download → CSV
Q: How does multi-model rotation work?
A: When Model A hits rate limit, it automatically switches to Model B, then C, then D. Each model has 60-second cooldown.
Q: Why use 4 models instead of 1?
A: 4x higher throughput. While one model is on cooldown, others keep working.
Q: Can I add more models?
A: Yes! Edit `OPENROUTER_MODELS` in `config.py` and add model names from openrouter.ai
Q: Can I use paid models?
A: Yes! Add paid model names to the list. They're faster and more accurate.
Q: Where are logs stored?
A: In logs/scraper.log. Check here if something goes wrong.
Q: Can I run this on my server 24/7?
A: Yes! It's designed for long-running operations.
Q: Can I pause and resume?
A: Not yet. But you can stop and it will skip already processed items.
Q: Can I run multiple searches simultaneously?
A: Not recommended. Run them sequentially to avoid rate limits.
Q: What's the maximum quantity I can extract?
A: 500 businesses per run (configurable in config.py)
Q: How long does 100 businesses take?
A: About 75 minutes (5 min discovery + 70 min extraction)
## Technical Architecture

### Pipeline

```
CLI Command → AI Discovery → Multi-Source Extraction → Data Fusion → Google Sheets
```
### Multi-Model Rotation

```
AIManager
├── Model Pool [arcee, stepfun, deepseek, aurora]
├── Rate Limit Detector
├── Auto-Rotation Logic
└── Cooldown Manager (60s per model)
```

When a model hits HTTP 429:

- Mark the model as "on cooldown"
- Select the next available model
- Retry the request with the new model
- After 60s, the original model becomes available again
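The rotation steps above can be sketched in a few lines of Python. This is a simplified illustration of the cooldown idea, not the actual `ai_manager.py` implementation:

```python
import time

COOLDOWN_SECONDS = 60  # how long a rate-limited model sits out

class ModelRotator:
    """Pick the first model that is not cooling down; mark models
    that return HTTP 429 as unavailable for COOLDOWN_SECONDS."""

    def __init__(self, models):
        self.models = list(models)
        self.cooldown_until = {m: 0.0 for m in self.models}

    def next_available(self):
        now = time.monotonic()
        for model in self.models:
            if self.cooldown_until[model] <= now:
                return model
        return None  # every model is cooling down; caller should wait

    def mark_rate_limited(self, model):
        self.cooldown_until[model] = time.monotonic() + COOLDOWN_SECONDS

rotator = ModelRotator(["model-a", "model-b"])
rotator.mark_rate_limited("model-a")  # simulate a 429 on model-a
print(rotator.next_available())       # prints: model-b
```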
### Data Fusion Pipeline

- Collect: Extract from all sources
- Normalize: Clean and standardize
- Verify: Cross-check across sources
- Score: Assign confidence based on:
- Number of sources (more = higher)
- Data consistency (matching = higher)
- Source priority (website > social media)
- Merge: Combine verified data
- Output: Single profile with confidence score
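To make the scoring idea concrete, here is a toy scorer. The weights and point values are hypothetical, chosen only to show how source count, source priority, and agreement could combine into a 0-100 score; they are not the tool's actual numbers:

```python
# Hypothetical weights: official websites outrank social pages and snippets.
SOURCE_WEIGHTS = {"website": 3, "facebook_search": 2,
                  "instagram_search": 2, "search_results": 1}

def confidence_score(values_by_source):
    """Score one field (e.g. a phone number) found across sources.
    values_by_source maps source name -> extracted value (or None)."""
    found = {src: val for src, val in values_by_source.items() if val}
    if not found:
        return 0
    score = 25 * len(found)                                    # more sources
    score += 5 * sum(SOURCE_WEIGHTS.get(s, 1) for s in found)  # source priority
    if len(set(found.values())) == 1:                          # all sources agree
        score += 15
    return min(score, 100)

print(confidence_score({"website": "(212) 555-1234",
                        "facebook_search": "(212) 555-1234"}))  # prints: 90
```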
### Module Structure

```
main.py                          # Orchestrator + CLI
├── ai_manager.py                # Multi-model AI rotation
├── display.py                   # Terminal UI
├── restaurant_discovery.py      # AI discovery
├── search.py                    # Multi-engine search
├── sources/
│   ├── facebook_scraper.py
│   ├── instagram_scraper.py
│   └── search_result_extractor.py
├── fusion.py                    # Data merging + scoring
├── extractor.py                 # Website extraction
├── crawler.py                   # Website crawling
├── sheets.py                    # Google Sheets I/O
└── utils.py                     # Logging, validation
```
## Contributing

Want to improve LeadScraper? Contributions welcome!
- Fork this repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Test thoroughly
- Commit: `git commit -m "Add feature"`
- Push: `git push origin feature-name`
- Open a Pull Request
Ideas for improvements:

- Add more data sources (LinkedIn, Yelp)
- Improve AI prompts for better extraction
- Add export formats (CSV, JSON)
- Create web interface
- Add caching to avoid re-searching
- Implement parallel processing
- Add resume/checkpoint support
## License

MIT License - See LICENSE file for details
Built with:
- OpenRouter - AI models
- DuckDuckGo Search - Web searching
- Google Sheets API - Data storage
- Rich - Terminal UI
- BeautifulSoup - HTML parsing
## Support

Found a bug? Have a question?
- Check Troubleshooting
- Check FAQ
- Check `logs/scraper.log` for error details
- Open an issue on GitHub
Happy lead hunting! 🎯