A high-performance, production-ready web crawler built with Rust, designed for scalable data extraction from social media platforms and web content.
- High-Performance HTTP Client - Built on `hyper` with connection pooling and async operations
- Real-Time TUI Dashboard - Monitor crawling operations with an interactive terminal interface
- Modular Architecture - Extensible platform-specific scrapers for social media sites
- Scalable Storage - ScyllaDB for time-series data and S3-compatible archival storage
- Anti-Bot Evasion - Rate limiting, proxy rotation, and stealth capabilities
- Compliance Ready - GDPR compliance features and robots.txt respect
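The rate-limiting idea behind the anti-bot features can be sketched with a minimal fixed-interval limiter. This is a standalone illustration using only the standard library, not Swoop's actual implementation (which is async and per-host):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Minimal rate limiter: enforces a fixed delay between requests.
/// `rate` is requests per second (e.g. 2.0 means one request every 500 ms).
struct RateLimiter {
    interval: Duration,
    last: Option<Instant>,
}

impl RateLimiter {
    fn new(rate: f64) -> Self {
        RateLimiter {
            interval: Duration::from_secs_f64(1.0 / rate),
            last: None,
        }
    }

    /// Blocks until the next request is allowed, then records the time.
    fn wait(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.interval {
                sleep(self.interval - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(4.0); // at most 4 requests per second
    let start = Instant::now();
    for i in 0..4 {
        limiter.wait();
        println!("request {} at {:?}", i, start.elapsed());
    }
}
```

In an async crawler the same idea would use `tokio::time::sleep` instead of blocking the thread, typically with one limiter per target host.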
- Rust 1.82.0 or higher
- Optional: ScyllaDB for storage (Docker available)
- Optional: S3-compatible storage
```bash
# Clone the repository
git clone https://github.com/codewithkenzo/swoop.git
cd swoop

# Build the project
cargo build --release
```
The TUI provides a real-time dashboard for monitoring scraper performance and status.
```bash
# Run the TUI interface
cargo run --bin swoop-tui
```
TUI Controls:
- `q` or `Esc`: Quit the application
- `Tab`/`Shift+Tab`: Cycle through tabs
- `←`/`→`: Navigate panes within the Overview tab
- `↑`/`↓`: Scroll through lists (Logs, Targets, etc.)
- `Spacebar`: Pause or resume the scraping engine
- `i`: Enter input mode to add new target URLs
- `l`: Load URLs from the default file (`urls.txt`)
- `e`: Switch to the Export tab
- `d`: Launch the advanced, standalone dashboard
For command-line operations, use the `swoop-cli` binary.
Scrape a single URL:

```bash
cargo run --bin swoop-cli -- --url "https://example.com"
```
Scrape a list of URLs from a file:

```bash
cargo run --bin swoop-cli -- --file urls.txt
```
CLI Options:
- `--url <URL>`: Scrape a single URL.
- `--file <PATH>`: Scrape URLs from a file (one per line).
- `--concurrency <NUM>`: Set the number of concurrent requests (default: 10).
- `--output-dir <DIR>`: Specify the directory for saving results (default: `./test_output`).
- `--format <FORMAT>`: Set the output format (`json` or `csv`; default: `json`).
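As an illustration of how such flags map to values, here is a sketch of flag parsing with `std::env::args` that mirrors the options and defaults above. The real `swoop-cli` may use a dedicated argument parser; the names and structure here are hypothetical:

```rust
use std::env;

/// Illustrative options struct mirroring the CLI flags listed above.
#[derive(Debug)]
struct CliOpts {
    url: Option<String>,
    file: Option<String>,
    concurrency: usize,
    output_dir: String,
    format: String,
}

/// Consume flag/value pairs from an argument iterator.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> CliOpts {
    let mut opts = CliOpts {
        url: None,
        file: None,
        concurrency: 10,                     // default from the docs above
        output_dir: "./test_output".into(),  // default from the docs above
        format: "json".into(),               // default from the docs above
    };
    while let Some(flag) = args.next() {
        let value = args.next();
        match (flag.as_str(), value) {
            ("--url", Some(v)) => opts.url = Some(v),
            ("--file", Some(v)) => opts.file = Some(v),
            ("--concurrency", Some(v)) => {
                opts.concurrency = v.parse().expect("--concurrency expects a number")
            }
            ("--output-dir", Some(v)) => opts.output_dir = v,
            ("--format", Some(v)) => opts.format = v,
            (other, _) => eprintln!("ignoring unknown flag: {}", other),
        }
    }
    opts
}

fn main() {
    // Skip the binary name, then parse the remaining flag/value pairs.
    let opts = parse_args(env::args().skip(1));
    println!("{:?}", opts);
}
```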
- User Guide - Complete setup and usage instructions
- API Reference - Detailed API documentation for all crates
- Architecture - System design and component overview
- Examples - Practical usage examples and tutorials
```
swoop/
├── core/        # HTTP client and networking utilities
├── scrapers/    # Platform-specific content extraction
├── storage/     # Data persistence layer (ScyllaDB + S3)
├── tui/         # Terminal user interface
├── docs/        # Documentation
└── examples/    # Usage examples
```
Swoop uses environment variables for configuration. Create a `.env` file in the root directory:
```bash
# Scraper settings
MAX_CONCURRENT=10
RATE_LIMIT=1.0
USER_AGENT="Swoop/1.0"

# ScyllaDB settings
SCYLLA_NODES="127.0.0.1:9042"
SCYLLA_KEYSPACE="swoop"

# S3 settings
S3_BUCKET="swoop-data"
S3_REGION="us-east-1"
AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
```
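At startup, these variables might be read along the following lines. This is a sketch using only `std::env` that assumes the variables are already exported (e.g. by a dotenv loader); the struct and field names are illustrative, not Swoop's actual config loader:

```rust
use std::env;

/// Scraper settings pulled from the environment, with the same defaults
/// as the sample .env above. Field names here are illustrative.
#[derive(Debug)]
struct Config {
    max_concurrent: usize,
    rate_limit: f64,
    user_agent: String,
    scylla_nodes: String,
}

impl Config {
    /// Read each variable, falling back to a sensible default when it is
    /// missing or fails to parse.
    fn from_env() -> Self {
        Config {
            max_concurrent: env::var("MAX_CONCURRENT")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(10),
            rate_limit: env::var("RATE_LIMIT")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(1.0),
            user_agent: env::var("USER_AGENT").unwrap_or_else(|_| "Swoop/1.0".into()),
            scylla_nodes: env::var("SCYLLA_NODES").unwrap_or_else(|_| "127.0.0.1:9042".into()),
        }
    }
}

fn main() {
    let cfg = Config::from_env();
    println!("{:?}", cfg);
}
```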
```bash
# Development build
cargo build

# Run tests
cargo test

# Check code quality
cargo clippy
cargo fmt
```
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: Please report security vulnerabilities to security@swoop.dev
Built with ❤️ using Rust