Automated web scraping system for monitoring product prices on fireawaysupply.co.uk with daily execution, change detection, and Slack notifications.
- ✅ Async Playwright browser automation
- ✅ ~500 products across 9 categories
- ✅ Price change detection with historical tracking
- ✅ Stock availability monitoring
- ✅ Slack Block Kit notifications
- ✅ CSV exports (daily snapshots + changes)
- ✅ AWS S3 backup (optional)
- ✅ Docker support with multiple execution modes
- ✅ Retry logic with exponential backoff (see the sketch after this list)
- ✅ Error screenshots and alerts
- ✅ Daily scheduling with APScheduler
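The retry behaviour lives in `src/utils/retry.py`; as a rough illustration only (the decorator name and defaults below are assumptions, not the project's actual API), an async retry with exponential backoff looks like this:

```python
import asyncio
import functools
import random


def retry_async(attempts: int = 3, base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry an async callable with exponential backoff and a little jitter."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # retries exhausted, surface the last error
                    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                    await asyncio.sleep(delay + random.uniform(0, 0.5))
        return wrapper
    return decorator


@retry_async(attempts=4, base_delay=2.0)
async def fetch_product_page(url: str) -> str:
    ...  # the real scraper would drive Playwright here
    return ""
```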
- Python 3.11+
- Docker (optional, recommended)
- Slack webhook URL
- AWS S3 credentials (optional)
- Clone and configure:
  git clone <repo-url>
  cd price-updates
  cp .env.example .env   # Edit .env with your credentials
- Build the Docker image:
  chmod +x scripts/*.sh
  ./scripts/docker-build.sh
- Run in scheduled mode (daily at 2 AM):
  ./scripts/docker-run-scheduled.sh
- Or run once manually:
  ./scripts/docker-run-once.sh
- Set up a virtual environment:
  python3.11 -m venv venv
  source venv/bin/activate   # or `venv\Scripts\activate` on Windows
- Install dependencies:
  pip install -r requirements.txt
  playwright install chromium
- Configure:
  cp .env.example .env   # Edit .env with your Fireaway credentials and Slack webhook
- Run once:
  python src/main.py --mode once
- Run scheduled:
  python src/scheduler.py
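For orientation, here is a minimal sketch of an APScheduler setup in the spirit of `src/scheduler.py`; the `run_scrape` stub stands in for whatever entry point `src/main.py` actually exposes:

```python
import asyncio
import os

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger


async def run_scrape() -> None:
    """Placeholder for the real scrape entry point in src/main.py."""
    print("scrape run triggered")


async def main() -> None:
    scheduler = AsyncIOScheduler()
    # SCRAPE_SCHEDULE uses standard cron syntax; the default is 02:00 daily.
    cron = os.getenv("SCRAPE_SCHEDULE", "0 2 * * *")
    scheduler.add_job(run_scrape, CronTrigger.from_crontab(cron))
    scheduler.start()
    await asyncio.Event().wait()  # keep the process alive between runs


if __name__ == "__main__":
    asyncio.run(main())
```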
All configuration via environment variables (.env file):
- `FIREAWAY_USERNAME` - Login username for the Fireaway Supply portal
- `FIREAWAY_PASSWORD` - Login password for the Fireaway Supply portal
- `SLACK_WEBHOOK_URL` - Slack webhook for notifications (highly recommended)
- `AWS_S3_ENABLED=false` - Enable S3 backup for CSV files
- `AWS_S3_BUCKET` - S3 bucket name
- `SCRAPE_SCHEDULE="0 2 * * *"` - Cron schedule (default: 2 AM daily)
- `CONCURRENCY_LIMIT=3` - Number of parallel product scrapes
- `HEADLESS=true` - Headless browser mode (set to false for debugging)
- `LOG_LEVEL=INFO` - Logging level (DEBUG, INFO, WARNING, ERROR)
See .env.example for all available options.
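For reference, a minimal `.env` built from the variables above (all values are placeholders):

```
FIREAWAY_USERNAME=your-username
FIREAWAY_PASSWORD=your-password
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
AWS_S3_ENABLED=false
AWS_S3_BUCKET=your-bucket-name
SCRAPE_SCHEDULE="0 2 * * *"
CONCURRENCY_LIMIT=3
HEADLESS=true
LOG_LEVEL=INFO
```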
price-updates/
├── src/
│   ├── scrapers/                # Web scraping modules
│   │   ├── browser.py           # Playwright browser setup
│   │   ├── auth.py              # Authentication logic
│   │   ├── category.py          # Category page scraper
│   │   └── product.py           # Product detail scraper
│   ├── storage/                 # Data persistence
│   │   ├── database.py          # SQLite operations
│   │   ├── csv_export.py        # CSV file generation
│   │   └── s3_uploader.py       # AWS S3 backup
│   ├── notifications/           # Alert systems
│   │   └── slack.py             # Slack Block Kit notifications
│   ├── utils/                   # Utilities
│   │   ├── logger.py            # Logging configuration
│   │   ├── retry.py             # Retry decorator
│   │   └── config.py            # Configuration loader
│   ├── main.py                  # Main entry point
│   └── scheduler.py             # APScheduler setup
├── data/
│   ├── database.sqlite          # Historical product data
│   ├── exports/                 # CSV export files
│   │   ├── products_*.csv       # Daily snapshots
│   │   └── changes_*.csv        # Price/stock changes
│   └── screenshots/             # Error screenshots
├── config/
│   └── categories.json          # Category configuration
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── scripts/                     # Helper scripts
│   ├── docker-build.sh
│   ├── docker-run-once.sh
│   └── docker-run-scheduled.sh
├── tests/                       # Test suite
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment template
└── README.md                    # This file
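To illustrate how the pieces under `src/scrapers/` fit together, here is a minimal async Playwright login sketch in the spirit of `browser.py`/`auth.py`; the login URL and form selectors are assumptions, not the site's real markup:

```python
import asyncio
import os

from playwright.async_api import Browser, Page, async_playwright


async def login(browser: Browser) -> Page:
    """Authenticate against the supplier portal and return a logged-in page."""
    page = await browser.new_page()
    # URL and selectors are placeholders; the real values belong to the scraper config.
    await page.goto("https://fireawaysupply.co.uk/account/login")
    await page.fill("input[name='email']", os.environ["FIREAWAY_USERNAME"])
    await page.fill("input[name='password']", os.environ["FIREAWAY_PASSWORD"])
    await page.click("button[type='submit']")
    await page.wait_for_load_state("networkidle")
    return page


async def main() -> None:
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=os.getenv("HEADLESS", "true") == "true")
        page = await login(browser)
        print(await page.title())
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```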
Scheduled mode (runs continuously with daily execution):
docker-compose -f docker/docker-compose.yml up -d fireaway-scraper-scheduled
docker-compose -f docker/docker-compose.yml logs -f # View logs
docker-compose -f docker/docker-compose.yml down      # Stop

One-shot mode (manual execution):
docker-compose -f docker/docker-compose.yml --profile manual run --rm fireaway-scraper-once

Test mode (dry run):
docker-compose -f docker/docker-compose.yml --profile test run --rm fireaway-scraper-test

Full Snapshot (products_YYYY-MM-DD_HH-MM-SS.csv):
sku,name,category,price_current,price_original,stock_status,url,image_url,last_checked
FA-001,Product Name,Food,15.99,19.99,in-stock,https://...,https://...,2025-11-06 02:00:00
Changes Report (changes_YYYY-MM-DD_HH-MM-SS.csv):
timestamp,sku,name,change_type,old_value,new_value,price_diff,price_diff_percent
2025-11-06 02:00:00,FA-001,Product Name,price_decrease,19.99,15.99,-4.00,-20.01%
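The `price_diff` and `price_diff_percent` columns are derived from the previous and current snapshot. A small sketch of how such a change record can be built (the function name is illustrative; field names follow the CSV header above):

```python
from datetime import datetime
from typing import Optional


def detect_price_change(sku: str, name: str, old_price: float, new_price: float) -> Optional[dict]:
    """Return a change record matching the changes_*.csv columns, or None if unchanged."""
    if new_price == old_price:
        return None
    diff = round(new_price - old_price, 2)
    diff_percent = round(diff / old_price * 100, 2)
    return {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "sku": sku,
        "name": name,
        "change_type": "price_decrease" if diff < 0 else "price_increase",
        "old_value": old_price,
        "new_value": new_price,
        "price_diff": diff,
        "price_diff_percent": f"{diff_percent}%",
    }


# Example matching the row above: 19.99 -> 15.99 gives -4.0 and roughly -20.01%.
print(detect_price_change("FA-001", "Product Name", 19.99, 15.99))
```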
Rich formatted messages with:
- 📉 Price decreases (with % and absolute difference)
- 📈 Price increases (with alert emoji)
- ✅ Stock now available (out-of-stock → in-stock)
- ❌ Out of stock (in-stock → out-of-stock)
- 🆕 New products (first-time detection)
Example notification:
🔥 Fireaway Supply - Price Update Alert
📉 Price Decreased
Product Name
£19.99 → £15.99 (-£4.00, -20.01%)
Category: Food | SKU: FA-001
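Notifications are posted to the Slack incoming webhook as Block Kit payloads. A minimal sketch of such a post (using `requests` for brevity; the exact block layout the project uses may differ):

```python
import os

import requests


def notify_price_drop(name: str, sku: str, category: str, old: float, new: float) -> None:
    """Post a simple Block Kit message for a price decrease to the incoming webhook."""
    diff = new - old
    pct = diff / old * 100
    sign = "-" if diff < 0 else "+"
    payload = {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": "🔥 Fireaway Supply - Price Update Alert"}},
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"📉 *Price Decreased*\n*{name}*\n"
                        f"£{old:.2f} → £{new:.2f} ({sign}£{abs(diff):.2f}, {pct:+.2f}%)\n"
                        f"Category: {category} | SKU: {sku}"
                    ),
                },
            },
        ]
    }
    response = requests.post(os.environ["SLACK_WEBHOOK_URL"], json=payload, timeout=10)
    response.raise_for_status()


notify_price_drop("Product Name", "FA-001", "Food", 19.99, 15.99)
```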
Test authentication:
python src/main.py --test-auth

Test Slack notifications:
python src/main.py --test-slack

Run the full test suite:
pytest tests/ -v

Test with a visible browser (debugging):
HEADLESS=false python src/main.py --mode once

- Execution time: ~15-20 minutes (for ~500 products)
- Products monitored: ~500 across 9 categories
- Concurrency: 3 parallel product scrapes (configurable; see the sketch after this list)
- Memory usage: ~500MB (Playwright browser + data)
- Storage growth: ~5-10MB per daily run (SQLite + CSVs)
- Network: ~100-200 HTTP requests per run
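As referenced above, a sketch of how `CONCURRENCY_LIMIT` can bound parallel scrapes with an `asyncio.Semaphore` (the `scrape_product` stub stands in for the real Playwright work):

```python
import asyncio
import os

CONCURRENCY_LIMIT = int(os.getenv("CONCURRENCY_LIMIT", "3"))


async def scrape_product(url: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for the real Playwright work
    return {"url": url}


async def scrape_all(urls: list[str]) -> list[dict]:
    semaphore = asyncio.Semaphore(CONCURRENCY_LIMIT)

    async def bounded(url: str) -> dict:
        async with semaphore:  # at most CONCURRENCY_LIMIT scrapes in flight
            return await scrape_product(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(scrape_all([f"https://example.com/product/{i}" for i in range(10)]))
print(len(results), "products scraped")
```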
Install dev dependencies:
pip install -r requirements-dev.txt

Code formatting:
black src/ tests/

Linting:
pylint src/

Type checking:
mypy src/

Run tests with coverage:
pytest tests/ --cov=src --cov-report=html

- ⚠️ Never commit the .env file (it is listed in .gitignore)
- 🔐 Use Docker secrets in production
- 🔄 Rotate credentials regularly
- 🛡️ Limit S3 bucket access with IAM policies
- 📸 Screenshots may contain sensitive data; secure the storage location
- 📁 Use environment-specific .env files (.env.prod, .env.dev)
- ✅ Check the username/password in `.env`
- ✅ Try with `HEADLESS=false` to see browser behavior
- ✅ Check error screenshots in `data/screenshots/`
- ✅ Verify the website is accessible and the login page hasn't changed
- ✅ Verify the webhook URL format is correct
- ✅ Test with `python src/main.py --test-slack`
- ✅ Check webhook permissions in the Slack workspace
- ✅ Review logs for notification errors
- ✅ Check that the AWS credentials are valid
- ✅ Verify the bucket exists and the region is correct
- ✅ Ensure the IAM user has `s3:PutObject` permission (see the sketch after this list)
- ✅ Test with the AWS CLI: `aws s3 ls s3://your-bucket-name/`
- ⬇️ Reduce `CONCURRENCY_LIMIT` (try 1 or 2)
- ⏱️ Increase `REQUEST_DELAY_MS` (try 3000-5000)
- 🧹 Clean up old screenshots and CSV exports
- 💾 Check disk space for database growth
- ✅ Check `config/categories.json` - are all categories enabled?
- ✅ Verify the website structure hasn't changed (selectors)
- ✅ Check logs for scraping errors
- ✅ Try scraping a single category first
- ✅ Check the logs: `docker-compose logs fireaway-scraper-scheduled`
- ✅ Verify the `.env` file is mounted correctly
- ✅ Ensure all required environment variables are set
- ✅ Check for Python syntax errors in the logs
- 📅 Weekly: Review logs for errors
- 📅 Monthly: Back up the database file
- 📅 Monthly: Clean up old CSV exports
- 📅 Quarterly: Update dependencies (`pip install -U -r requirements.txt`)
- 📅 Quarterly: Review and optimize the database (SQLite VACUUM)
# Backup database
cp data/database.sqlite data/database_backup_$(date +%Y%m%d).sqlite
# Optimize database
sqlite3 data/database.sqlite "VACUUM;"
# Check database size
du -h data/database.sqlite

# Archive old logs
gzip data/scraper.log
mv data/scraper.log.gz data/logs/scraper_$(date +%Y%m%d).log.gz

- Increase `CONCURRENCY_LIMIT` (4-6 with a good connection)
- Adjust `REQUEST_DELAY_MS` to balance speed against politeness
- Consider running multiple scrapers for different category groups
- Use managed scheduling (AWS EventBridge, Google Cloud Scheduler)
- Store database in persistent volume or managed database
- Use managed Slack app instead of webhooks
- Implement monitoring (Datadog, New Relic)
- Add alerting for scraper failures
- ✅ Scrape execution time
- ✅ Number of products scraped
- ✅ Number of price changes detected
- ✅ Error rate
- ✅ Database size growth
- ✅ Memory usage
# Check last successful run
sqlite3 data/database.sqlite "SELECT MAX(last_checked) FROM products;"
# Count products in database
sqlite3 data/database.sqlite "SELECT COUNT(*) FROM products;"
# Check for recent errors
tail -n 100 data/scraper.log | grep ERROR

- Web dashboard for viewing price history
- Email notifications (alternative to Slack)
- Price alerts with custom thresholds
- Product filtering by category/price range
- Historical price charts
- API endpoint for querying data
- Multi-site support (expand beyond Fireaway)
- Machine learning for price prediction
MIT License
Copyright (c) 2025
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8
- Use type hints
- Add docstrings to functions
- Write tests for new features
feat: Add price prediction model
fix: Handle missing product images
docs: Update installation instructions
test: Add integration tests for S3 upload
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation: See the `/docs` folder for detailed guides
- Playwright - Browser automation
- APScheduler - Task scheduling
- Slack Block Kit - Rich notifications
- SQLite - Embedded database
Made with ❤️ for efficient price monitoring