A professional Python scraper for extracting foreclosure and bankruptcy notices from the Minnesota Public Notice website for lead generation in financial assistance services.
Windows: Double-click run_scraper.bat
Mac/Linux: Double-click run_scraper.sh
The launcher will automatically:
- Install Python 3.11+ if needed
- Install all required packages
- Set up API keys (first time only)
- Run the scraper
- Searches for foreclosure and bankruptcy notices from yesterday's date
- Automatically solves reCAPTCHA challenges using 2captcha service
- Extracts contact information using AI-powered text parsing
- Processes multiple pages of results (typically 200-300+ notices per day)
- Saves results to organized CSV files
- Uses VPN rotation to prevent IP blocking
Each record contains:
- Contact Info: First Name, Last Name, Street Address, City, State, ZIP
- Legal Details: Date Filed, Plaintiff/Creditor Name
- Reference: Link to original notice, Internal notice ID
- Python 3.7+ (auto-installed if missing)
- Playwright browser automation
- Required Python packages (auto-installed)
- Mullvad VPN ($5/month) - Prevents IP blocking
- 2captcha (~$2-5/month) - Solves captchas automatically
- OpenAI API (~$3-10/month) - AI text parsing
Follow the guides in order:
- SETUP_GUIDE.txt - Complete setup walkthrough
- DAILY_USE.txt - How to run daily
- AUTOMATION_GUIDE.txt - Set up automatic daily runs
- TROUBLESHOOTING.txt - Fix common problems
✅ Fully Automated - One-click operation after setup
✅ Enterprise Reliability - Handles all website quirks and errors
✅ AI-Powered Parsing - Superior data extraction accuracy
✅ Multi-page Processing - Gets all available notices, not just first 50
✅ IP Blocking Prevention - VPN rotation and smart rate limiting
✅ Professional Output - Clean, organized CSV files
✅ Self-Updating - Handles Python and package updates automatically
MN_Notice_Scraper/
├── run_scraper.bat # Windows launcher
├── run_scraper.sh # Mac/Linux launcher
├── SETUP_GUIDE.txt # Complete setup instructions
├── DAILY_USE.txt # Daily operation guide
├── AUTOMATION_GUIDE.txt # Scheduling setup
├── TROUBLESHOOTING.txt # Problem solving
├── mn_scraper.py # Main scraper
├── gpt_parser.py # AI text parsing
├── mullvad_manager.py # VPN management
├── requirements.txt # Python dependencies
└── csvs/ # Output folder (auto-created)
- Runtime: 1 hour+ for typical daily batch (200-300 notices)
- Success Rate: 95%+ captcha solving, 90%+ data extraction
- Data Quality: AI parsing eliminates formatting artifacts
- Reliability: Enterprise-grade error handling with automatic recovery
- Automatic (recommended): Set up daily scheduling - see AUTOMATION_GUIDE.txt
- Manual: Double-click launcher file, press Enter when prompted
Results are saved as mn_notices_YYYY-MM-DD.csv
with yesterday's date.
📅 Daily Data: Automatically searches for yesterday's notices
💾 File Preservation: Each day creates a new CSV - old files are kept
🔐 API Keys: Stored securely in .env file (created automatically)
Check the documentation files for detailed help:
- Setup issues → SETUP_GUIDE.txt
- Daily operation → DAILY_USE.txt
- Common problems → TROUBLESHOOTING.txt
- Automation → AUTOMATION_GUIDE.txt
This scraper is designed for legitimate business use in financial assistance services. Always respect website terms of use and rate limits.