Swoop Web Crawler

A high-performance, production-ready web crawler built with Rust, designed for scalable data extraction from social media platforms and web content.


✨ Features

  • High-Performance HTTP Client - Built on hyper with connection pooling and async operations
  • Real-Time TUI Dashboard - Monitor crawling operations with an interactive terminal interface
  • Modular Architecture - Extensible platform-specific scrapers for social media sites
  • Scalable Storage - ScyllaDB for time-series data and S3-compatible archival storage
  • Anti-Bot Evasion - Rate limiting, proxy rotation, and stealth capabilities
  • Compliance Ready - GDPR compliance features and respect for robots.txt

🚀 Quick Start

Prerequisites

  • Rust 1.82.0 or higher
  • Optional: ScyllaDB for storage (can be run locally with Docker; see the sketch below)
  • Optional: S3-compatible storage
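
If you want to try the ScyllaDB backend locally, a single-node instance can be started with the official Docker image. This is only a suggestion for local development; the image name and port below are ScyllaDB defaults, not something Swoop requires to be set up this way.

# Start a local single-node ScyllaDB instance (optional, for development)
docker run --name swoop-scylla -d -p 9042:9042 scylladb/scylla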

Installation

# Clone the repository
git clone https://github.com/codewithkenzo/swoop.git
cd swoop

# Build the project
cargo build --release

Running the TUI Dashboard

The TUI provides a real-time dashboard for monitoring scraper performance and status.

# Run the TUI interface
cargo run --bin swoop-tui

TUI Controls:

  • q or Esc: Quit the application
  • Tab / Shift+Tab: Cycle through tabs
  • ←/→: Navigate panes within the Overview tab
  • ↑/↓: Scroll through lists (Logs, Targets, etc.)
  • Spacebar: Pause or resume the scraping engine
  • i: Enter input mode to add new target URLs
  • l: Load URLs from the default file (urls.txt)
  • e: Switch to the Export tab
  • d: Launch the advanced, standalone dashboard

Running the CLI Scraper

For command-line operations, use the swoop-cli binary.

Scrape a single URL:

cargo run --bin swoop-cli -- --url "https://example.com"

Scrape a list of URLs from a file:

cargo run --bin swoop-cli -- --file urls.txt
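
The file is read one URL per line. A minimal urls.txt might look like this (the URLs below are placeholders):

https://example.com
https://example.org/some/page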

CLI Options:

  • --url <URL>: Scrape a single URL.
  • --file <PATH>: Scrape URLs from a file (one per line).
  • --concurrency <NUM>: Set the number of concurrent requests (default: 10).
  • --output-dir <DIR>: Specify the directory for saving results (default: ./test_output).
  • --format <FORMAT>: Set the output format (json or csv, default: json).
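
Options can be combined. For example, the following run (using only the flags documented above; the values are illustrative) reads targets from urls.txt, raises concurrency to 20, writes results to ./results, and outputs CSV:

cargo run --bin swoop-cli -- --file urls.txt --concurrency 20 --output-dir ./results --format csv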

📚 Documentation

Additional documentation lives in the docs/ directory of the repository.

🏗️ Project Structure

swoop/
├── core/           # HTTP client and networking utilities
├── scrapers/       # Platform-specific content extraction
├── storage/        # Data persistence layer (ScyllaDB + S3)
├── tui/            # Terminal user interface
├── docs/           # Documentation
└── examples/       # Usage examples

🔧 Configuration

Swoop uses environment variables for configuration. Create a .env file in the root directory:

# Scraper settings
MAX_CONCURRENT=10
RATE_LIMIT=1.0
USER_AGENT="Swoop/1.0"

# ScyllaDB settings
SCYLLA_NODES="127.0.0.1:9042"
SCYLLA_KEYSPACE="swoop"

# S3 settings
S3_BUCKET="swoop-data"
S3_REGION="us-east-1"
AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
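
The binaries read their configuration from environment variables; whether the .env file is loaded automatically is not covered here, so if it is not, the variables can be exported into the shell before running. A minimal sketch:

# Export every variable from .env into the current shell, then run the crawler
set -a; source .env; set +a
cargo run --release --bin swoop-cli -- --url "https://example.com"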

🛠️ Development

Building from Source

# Development build
cargo build

# Run tests
cargo test

# Check code quality
cargo clippy
cargo fmt

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋 Support

If you run into problems or have questions, please open an issue on the GitHub repository.

Built with ❤️ using Rust
