A web crawler that gathers information about the Ukraine-Russia war from various sources. It combines anti-detection mechanisms with human-like behavior simulation to get past common anti-scraping measures.
- Browser automation with undetected-chromedriver
- Human-like behavior simulation
- Anti-bot detection bypass
- Dynamic content handling
- Article extraction and processing
- Local storage in JSON/CSV format
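The local storage feature could look roughly like the sketch below, which writes one batch of extracted articles to both a JSON file and a CSV file using only the standard library. The function name `save_articles`, the field list `FIELDS`, and the output file names are assumptions made for illustration; the project's actual schema and module layout are not documented here.

```python
import csv
import json
from pathlib import Path

# Hypothetical article fields; the crawler's real schema may differ.
FIELDS = ["url", "title", "published", "text"]


def save_articles(articles, out_dir):
    """Write a list of article dicts to articles.json and articles.csv.

    Returns the two output paths as (json_path, csv_path).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "articles.json"
    csv_path = out / "articles.csv"

    # JSON keeps every key; ensure_ascii=False preserves Cyrillic text as-is.
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)

    # CSV keeps only the known columns; missing values become empty strings.
    with csv_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for article in articles:
            writer.writerow({key: article.get(key, "") for key in FIELDS})

    return json_path, csv_path
```

Writing both formats from the same list keeps the JSON file as the complete record, while the CSV is a flat view restricted to the fixed columns.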
src/
├── crawler/ # Core crawler implementation
├── utils/ # Utility functions
└── config/ # Configuration files
tests/ # Test files
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Unix/macOS
- Install dependencies:
pip install -r requirements.txt
[Usage instructions will be added as the project develops]
MIT License