Autonomous AI-Powered Privacy Defense Platform
Your data is exposed. Your identity is mapped. Take it back.
Features • Tech Stack • Getting Started • Architecture • Screenshots
DataReaper is an autonomous AI system that hunts down your exposed personal data across the web and forces its deletion, without you lifting a finger. It combines advanced OSINT (Open Source Intelligence) techniques, AI-powered identity resolution, and automated legal compliance workflows to help individuals reclaim their digital privacy.
Data brokers are silently collecting, packaging, and selling your personal data faster than you can track it. DataReaper transforms fragmented public data into a unified identity graph, then weaponizes it to systematically eliminate your digital footprint from data broker databases.
- Comprehensive OSINT Scanning: Continuously scan 100+ platforms and data broker sites using advanced reconnaissance techniques
- AI-Powered Identity Resolution: Build complete digital identity graphs through intelligent cross-platform pivoting
- Automated Legal Compliance: Generate and dispatch legally binding deletion requests (GDPR, CCPA, DPDP Act)
- Autonomous Email Warfare: AI agents handle broker responses, objections, and escalations automatically
- Real-Time Intelligence Dashboard: Monitor exposure levels, track deletion progress, and visualize your identity graph
- War Room Operations: Coordinate multi-broker deletion campaigns with military precision
- Multi-platform account discovery using email, phone, or username seeds
- Username enumeration across social networks and forums
- Profile scraping with anti-detection browser automation
- Maigret integration for deep username reconnaissance
- AI-powered identity resolution using LLM analysis
- Cross-platform data correlation and pivoting
- Visual graph representation of digital footprint
- Real-time graph updates as new data is discovered
- Automated scanning of 100+ data broker catalogs
- Intelligent listing verification with confidence scoring
- Opt-out rule engine for broker-specific workflows
- Contact point discovery and validation
- Multi-jurisdiction compliance (GDPR, CCPA, DPDP Act)
- Automated legal notice generation
- Escalation workflows for non-compliant brokers
- Audit trail and compliance tracking
- Intent classification for incoming broker emails
- Context-aware reply generation
- Objection handling and legal argumentation
- Thread continuity and conversation memory
- Centralized command center for deletion campaigns
- Real-time status tracking across all brokers
- Threat level assessment and prioritization
- Batch operations and bulk actions
The Gateway to Privacy Reclamation
The landing page introduces users to DataReaper's mission with a compelling narrative about data exposure and digital privacy. It features:
- Hero section with clear value proposition
- Problem statement highlighting data broker threats
- Feature showcase explaining the three-pillar approach: Scan, Identify, Terminate
- Process flow visualization showing how DataReaper works
- Call-to-action for launching privacy protection
Seamless Privacy Journey Initialization
The onboarding experience guides users through their first privacy scan with:
- Simple seed input (email, phone, or username)
- Jurisdiction selection for legal compliance
- Privacy preferences configuration
- Real-time scan initialization
- Educational tooltips explaining each step
Mission Control for Your Digital Privacy
The Command Center serves as the central dashboard, providing:
- Exposure Overview: Real-time metrics on discovered accounts, data brokers, and deletion progress
- Active Scans: Monitor ongoing OSINT reconnaissance operations
- Threat Assessment: Visual indicators of exposure severity across different categories
- Quick Actions: Launch new scans, review reports, or access the War Room
- Timeline View: Chronological activity feed of discoveries and deletions
- Statistics Dashboard: Charts and graphs showing privacy improvement over time
Visualize Your Digital Footprint
The Identity Graph provides an interactive visualization of your digital identity:
- Node-Based Visualization: See how your data points connect across platforms
- Relationship Mapping: Understand how brokers correlate your information
- Platform Clustering: Group accounts by social networks, forums, and data brokers
- Interactive Exploration: Click nodes to reveal detailed information
- Export Capabilities: Generate reports from graph data
- Real-Time Updates: Watch the graph evolve as new data is discovered
The graph uses force-directed layout algorithms to show:
- Seed identifiers (email, phone, username)
- Discovered platform accounts
- Extracted usernames and aliases
- Resolved identity attributes (name, location)
- Data broker listings and exposures
Coordinate Deletion Campaigns with Precision
The War Room is where deletion operations are planned and executed:
- Broker Target List: Complete inventory of data brokers holding your information
- Campaign Status: Track deletion request status (pending, in-progress, completed, escalated)
- Email Thread Viewer: Review AI-generated legal notices and broker responses
- Escalation Management: Handle non-compliant brokers with automated escalation workflows
- Batch Operations: Execute mass deletion requests across multiple brokers
- Compliance Tracking: Monitor legal deadlines and response times
- Agent Activity Log: See what your AI agents are doing in real-time
Key features include:
- One-click deletion request dispatch
- Automated follow-up scheduling
- Legal template customization
- Attachment handling for ID verification requests
- Success rate analytics per broker
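The compliance tracking and follow-up scheduling described above depend on knowing when a broker's response is due. The sketch below is a minimal illustration, not DataReaper's implementation: `RESPONSE_WINDOW_DAYS`, `response_deadline`, and `is_overdue` are hypothetical names. It assumes the CCPA's 45-day response window and approximates the GDPR's one-month window as 30 days; both laws allow extensions in some cases, which this sketch ignores, and the DPDP Act is left out.

```python
from datetime import date, timedelta

# Assumed base response windows in days. GDPR's "one month" is approximated
# as 30 days; statutory extensions are deliberately not modeled.
RESPONSE_WINDOW_DAYS = {"GDPR": 30, "CCPA": 45}


def response_deadline(jurisdiction: str, dispatched_on: date) -> date:
    """Return the date by which the broker is expected to respond."""
    return dispatched_on + timedelta(days=RESPONSE_WINDOW_DAYS[jurisdiction])


def is_overdue(jurisdiction: str, dispatched_on: date, today: date) -> bool:
    """True once the response window has fully elapsed (escalation candidate)."""
    return today > response_deadline(jurisdiction, dispatched_on)


sent = date(2025, 1, 1)
ccpa_due = response_deadline("CCPA", sent)
gdpr_due = response_deadline("GDPR", sent)
```

A scheduler could call `is_overdue` once a day per open case and hand overdue cases to the escalation workflow.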
DataReaper follows a modern microservices-inspired architecture with clear separation of concerns:

    +--------------------------------------------------------------+
    |                   Frontend (React + Vite)                    |
    |    +----------+  +----------+  +----------+  +----------+    |
    |    | Landing  |  |Onboarding|  | Command  |  |   War    |    |
    |    |   Page   |  |   Flow   |  |  Center  |  |   Room   |    |
    |    +----------+  +----------+  +----------+  +----------+    |
    |         |             |             |             |          |
    |         +-------------+------+------+-------------+          |
    |                              |                               |
    |                       TanStack Query                         |
    |                              |                               |
    +------------------------------+-------------------------------+
                                   |
                         REST API / WebSocket
                                   |
    +------------------------------+-------------------------------+
    |                      Backend (FastAPI)                       |
    |  +--------------------------------------------------------+  |
    |  |               API Layer (FastAPI Router)               |  |
    |  |   /onboarding  /scans  /dashboard  /war-room  /inbox   |  |
    |  +---------------------------+----------------------------+  |
    |                              |                               |
    |  +---------------------------+----------------------------+  |
    |  |                  Agent Orchestration                   |  |
    |  |  +----------+  +----------+  +----------------------+  |  |
    |  |  |  Sleuth  |  |  Legal   |  |    Communications    |  |  |
    |  |  |  Agent   |  |  Agent   |  |        Agent         |  |  |
    |  |  +----------+  +----------+  +----------------------+  |  |
    |  +---------------------------+----------------------------+  |
    |                              |                               |
    |  +---------------------------+----------------------------+  |
    |  |                     Core Services                      |  |
    |  | +----------+  +----------+  +----------+  +----------+ |  |
    |  | |  OSINT   |  |  Broker  |  |  Email   |  |  Legal   | |  |
    |  | |  Engine  |  | Discovery|  |   Sync   |  |Compliance| |  |
    |  | +----------+  +----------+  +----------+  +----------+ |  |
    |  +---------------------------+----------------------------+  |
    |                              |                               |
    |  +---------------------------+----------------------------+  |
    |  |                  Infrastructure Layer                  |  |
    |  | +----------+  +----------+  +----------+  +----------+ |  |
    |  | |PostgreSQL|  |  Redis   |  |Playwright|  |   ARQ    | |  |
    |  | |  (Neon)  |  |  Cache   |  | Browser  |  |  Queue   | |  |
    |  | +----------+  +----------+  +----------+  +----------+ |  |
    |  +--------------------------------------------------------+  |
    +--------------------------------------------------------------+

- React 18.3 with TypeScript for type-safe UI development
- Vite for lightning-fast development and optimized production builds
- TanStack Query for server state management and caching
- Radix UI for accessible, unstyled component primitives
- Tailwind CSS 4 for utility-first styling
- Motion (Framer Motion) for smooth animations
- FastAPI for high-performance async API endpoints
- Pydantic v2 for request/response validation
- WebSocket support for real-time updates
- CORS middleware for secure cross-origin requests
- Sleuth Agent: OSINT discovery and identity resolution
- Legal Agent: Compliance analysis and notice generation
- Communications Agent: Email intent classification and reply generation
- Prompt Manager: Centralized LLM prompt templates
- OSINT Engine: Account discovery, username enumeration, profile scraping
- Broker Discovery: Catalog management, listing verification, opt-out rules
- Email Sync: Gmail OAuth integration, thread building, attachment handling
- Legal Compliance: Multi-jurisdiction rules, escalation workflows, audit trails
- PostgreSQL (Neon): Primary database with async support via asyncpg
- SQLAlchemy 2.x: Modern async ORM with relationship mapping
- Alembic: Database migrations and schema versioning
- ARQ: Redis-based async task queue for long-running operations
- APScheduler: Cron-like scheduling for periodic scans
- Playwright: Headless browser automation for anti-detection scraping
- Maigret: Username OSINT across 3000+ sites
- Groq: Fast LLM inference for AI agents
- Gmail API: OAuth-based email access and sending
| Technology | Purpose | Version |
|---|---|---|
| React | UI framework | 18.3.1 |
| TypeScript | Type-safe JavaScript | 5.0+ |
| Vite | Build tool & dev server | 6.4+ |
| TanStack Query | Server state management | 5.99+ |
| Radix UI | Accessible component primitives | Latest |
| Tailwind CSS | Utility-first CSS framework | 4.1+ |
| Motion | Animation library | 12.23+ |
| Recharts | Data visualization | 2.15+ |
| React Router | Client-side routing | 7.13+ |
| Axios | HTTP client | 1.15+ |
| Sonner | Toast notifications | 2.0+ |
| Technology | Purpose | Version |
|---|---|---|
| Python | Core language | 3.11+ |
| FastAPI | Web framework | 0.116+ |
| SQLAlchemy | ORM | 2.0+ |
| Pydantic | Data validation | 2.11+ |
| Alembic | Database migrations | 1.14+ |
| Uvicorn | ASGI server | 0.34+ |
| PostgreSQL | Primary database | Latest |
| asyncpg | Async PostgreSQL driver | 0.30+ |
| ARQ | Async task queue | 0.26+ |
| Redis | Cache & queue backend | Latest |
| Technology | Purpose |
|---|---|
| Groq | Fast LLM inference |
| Playwright | Browser automation |
| Maigret | Username OSINT |
| BeautifulSoup4 | HTML parsing |
| Trafilatura | Web content extraction |
| curl-cffi | Anti-detection HTTP client |
| Technology | Purpose |
|---|---|
| Google API Client | Gmail integration |
| google-auth-oauthlib | OAuth 2.0 flow |
| email-validator | Email validation |
| Technology | Purpose |
|---|---|
| pytest | Testing framework |
| pytest-asyncio | Async test support |
| pytest-cov | Coverage reporting |
| Ruff | Fast Python linter |
| Mypy | Static type checking |
| Vitest | Frontend testing |
- Python 3.11+ installed
- Node.js 18+ and pnpm installed
- PostgreSQL database (or Neon account)
- Redis server running
- Gmail API credentials (for email features)
- Groq API key (for AI agents)

- Clone the repository

      git clone https://github.com/yourusername/datareaper.git
      cd datareaper/backend

- Install Python dependencies

      # Using uv (recommended)
      pip install uv
      uv sync

      # Or using pip
      pip install -e ".[dev]"

- Configure environment variables

      cp .env.example .env

  Edit .env and configure:

      # Database
      DATABASE_URL=postgresql+asyncpg://user:pass@localhost/datareaper

      # Redis
      REDIS_URL=redis://localhost:6379

      # AI
      GROQ_API_KEY=your_groq_api_key

      # Gmail (optional)
      GMAIL_CLIENT_ID=your_client_id
      GMAIL_CLIENT_SECRET=your_client_secret

      # App
      APP_DEBUG=true
      APP_CORS_ORIGINS=["http://localhost:5173"]

- Run database migrations

      alembic upgrade head

- Seed initial data (optional)

      python scripts/import_broker_catalog.py
      python scripts/import_platform_catalog.py
      python scripts/seed_demo_data.py

- Start the backend stack

      # Windows PowerShell
      .\scripts\start_stack.ps1

      # Or manually start API and worker
      uvicorn datareaper.main:app --reload --app-dir src --port 8000
      arq datareaper.worker.WorkerSettings

The API will be available at http://localhost:8000
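
To illustrate how the environment variables from the configuration step might be consumed, here is a minimal standard-library sketch. It is an assumption for illustration only: the project may load its settings differently (for example through Pydantic), and the `Settings` class and `load_settings` function are hypothetical names. Only the variable names come from the `.env` example above.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    database_url: str
    redis_url: str
    groq_api_key: str
    debug: bool
    cors_origins: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required variables and fail early if any of them is missing."""
    required = ("DATABASE_URL", "REDIS_URL", "GROQ_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        groq_api_key=env["GROQ_API_KEY"],
        debug=env.get("APP_DEBUG", "false").lower() == "true",
        # APP_CORS_ORIGINS is written as a JSON list in the .env example.
        cors_origins=json.loads(env.get("APP_CORS_ORIGINS", "[]")),
    )


example = load_settings({
    "DATABASE_URL": "postgresql+asyncpg://user:pass@localhost/datareaper",
    "REDIS_URL": "redis://localhost:6379",
    "GROQ_API_KEY": "your_groq_api_key",
    "APP_DEBUG": "true",
    "APP_CORS_ORIGINS": '["http://localhost:5173"]',
})
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.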

- Navigate to frontend directory

      cd ../frontend

- Install dependencies

      pnpm install

- Configure environment (if needed)

      # Create .env.local
      echo "VITE_API_URL=http://localhost:8000" > .env.local

- Start development server

      pnpm dev

The frontend will be available at http://localhost:5173

To quickly explore DataReaper with pre-populated demo data:

    cd backend
    python scripts/seed_demo_data.py

This creates:
- Sample scan results
- Demo identity graph
- Mock broker cases
- Example email threads
Once the backend is running, visit:
- Interactive API Docs (Swagger): http://localhost:8000/docs
- Alternative API Docs (ReDoc): http://localhost:8000/redoc
- Health Check: http://localhost:8000/health

- POST /api/onboarding/start - Initialize new privacy scan
- GET /api/scans/{scan_id} - Get scan status and results
- POST /api/scans/{scan_id}/resume - Resume paused scan

- GET /api/dashboard/overview - Get exposure metrics
- GET /api/dashboard/timeline - Get activity timeline

- GET /api/recon/graph/{scan_id} - Get identity graph data
- GET /api/targets/{scan_id} - Get discovered broker targets

- GET /api/war-room/cases - List all broker cases
- POST /api/war-room/cases/{case_id}/dispatch - Send deletion request
- GET /api/war-room/cases/{case_id}/thread - Get email thread

- GET /api/inbox/threads - List email threads
- POST /api/inbox/sync - Sync Gmail inbox
- POST /api/inbox/reply - Generate AI reply
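
As a rough illustration of calling two of these endpoints from Python, the sketch below builds the requests with the standard library without sending them. The paths and base URL come from this README. The JSON field names (`seed`, `jurisdiction`) and the helper function names are assumptions; the real request schema is whatever the Swagger page at /docs reports.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"


def start_scan_request(seed: str, jurisdiction: str) -> Request:
    """Build a POST /api/onboarding/start request (payload shape is assumed)."""
    body = json.dumps({"seed": seed, "jurisdiction": jurisdiction}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/onboarding/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scan_status_request(scan_id: str) -> Request:
    """Build a GET /api/scans/{scan_id} request, escaping the id for the path."""
    return Request(f"{BASE_URL}/api/scans/{quote(scan_id, safe='')}", method="GET")


start_req = start_scan_request("jdoe@example.com", "GDPR")
status_req = scan_status_request("abc 123")
```

With the backend running, each request can be sent with `urllib.request.urlopen(start_req)` and the response body decoded as JSON.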

    cd backend

    # Run all tests
    pytest

    # Run with coverage
    pytest --cov=datareaper --cov-report=html

    # Run specific test file
    pytest tests/test_sleuth_agent.py

    # Run with verbose output
    pytest -v

    cd frontend

    # Run all tests
    pnpm test

    # Run in watch mode
    pnpm test --watch

    # Run with coverage
    pnpm test --coverage

DataReaper is built with security and privacy as core principles:
- End-to-end encryption for sensitive user data
- Secure credential storage using environment variables
- No third-party tracking or analytics
- Local-first architecture - your data stays in your database
- GDPR compliant deletion workflows
- CCPA compliant opt-out mechanisms
- DPDP Act (India) support
- Audit trails for all operations
- OAuth 2.0 for Gmail integration
- JWT-based session management
- Rate limiting on API endpoints
- Input validation with Pydantic
- SQL injection protection via SQLAlchemy ORM
- CORS configuration for API security
DataReaper is designed for legitimate privacy protection only. Users must:
- Only scan their own personal information
- Comply with applicable laws and regulations
- Respect platform terms of service
- Use automation responsibly

    datareaper/
    ├── backend/
    │   ├── src/datareaper/
    │   │   ├── agents/          # AI agent implementations
    │   │   ├── api/             # FastAPI routes and endpoints
    │   │   ├── brokers/         # Data broker catalog and discovery
    │   │   ├── comms/           # Email sync and communication
    │   │   ├── compliance/      # Legal compliance engine
    │   │   ├── core/            # Core utilities and config
    │   │   ├── identity/        # Identity resolution
    │   │   ├── osint/           # OSINT discovery tools
    │   │   ├── scraper/         # Web scraping orchestration
    │   │   └── worker/          # Background task workers
    │   ├── migrations/          # Alembic database migrations
    │   ├── scripts/             # Utility scripts
    │   ├── tests/               # Test suite
    │   └── data/                # Static data (catalogs, prompts)
    ├── frontend/
    │   ├── src/
    │   │   ├── components/      # Reusable UI components
    │   │   ├── pages/           # Page components
    │   │   ├── lib/             # Utilities and API client
    │   │   ├── hooks/           # Custom React hooks
    │   │   └── styles/          # Global styles
    │   └── public/              # Static assets
    └── Screenshots/             # UI screenshots for documentation

DataReaper uses a multi-stage OSINT pipeline:
- Seed Input: Start with email, phone, or username
- Account Discovery: Find associated platform accounts
- Username Enumeration: Extract usernames and aliases
- Profile Scraping: Collect detailed profile information
- Identity Resolution: Use AI to correlate data points
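
The stages above can be pictured as functions that each enrich a shared scan context in turn. The following is a minimal sketch under that assumption, not the project's actual pipeline: `ScanContext`, the stage functions, and `run_pipeline` are hypothetical, and every stage is a stub that fabricates placeholder values instead of touching the network or an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScanContext:
    """State handed from one stage to the next (hypothetical shape)."""
    seed: str
    accounts: list[str] = field(default_factory=list)
    usernames: list[str] = field(default_factory=list)
    profiles: dict[str, dict[str, str]] = field(default_factory=dict)
    identity: dict[str, str] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


def discover_accounts(ctx: ScanContext) -> None:
    # Stub: pretend the seed's local part has an account on one example platform.
    handle = ctx.seed.split("@")[0]
    ctx.accounts.append(f"example-forum/{handle}")


def enumerate_usernames(ctx: ScanContext) -> None:
    # Extract the unique handle from each discovered account.
    ctx.usernames = sorted({account.split("/")[-1] for account in ctx.accounts})


def scrape_profiles(ctx: ScanContext) -> None:
    # Stub: a real stage would fetch pages; here the profile is derived from the name.
    ctx.profiles = {name: {"display_name": name.title()} for name in ctx.usernames}


def resolve_identity(ctx: ScanContext) -> None:
    # Stub for the AI correlation step: take the first display name found.
    for profile in ctx.profiles.values():
        ctx.identity["name"] = profile["display_name"]
        break


STAGES: list[Callable[[ScanContext], None]] = [
    discover_accounts,
    enumerate_usernames,
    scrape_profiles,
    resolve_identity,
]


def run_pipeline(seed: str) -> ScanContext:
    """Run every stage in order and record which stages executed."""
    ctx = ScanContext(seed=seed)
    for stage in STAGES:
        stage(ctx)
        ctx.trace.append(stage.__name__)
    return ctx


result = run_pipeline("jdoe@example.com")
```

Keeping the stages in a plain list makes the order explicit and lets a paused scan resume from the first stage missing in `trace`.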
The identity graph is a node-edge data structure representing:
- Nodes: Identifiers, accounts, usernames, attributes, brokers
- Edges: Relationships like "pivoted_to", "discovered_username", "found_on_broker"
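
A small in-memory version of such a graph might look like the sketch below. The relation names are the ones listed above; the `Node` and `IdentityGraph` classes, the node kinds, and the sample identifiers are assumptions made for illustration and say nothing about how DataReaper stores its graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    kind: str   # e.g. "identifier", "account", "username", "attribute", "broker"
    label: str


@dataclass
class IdentityGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is a (source id, relation, target id) triple.
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.append((source, relation, target))

    def neighbors(self, node_id: str, relation: str | None = None) -> list[Node]:
        """Nodes reachable from node_id in one hop, optionally filtered by relation."""
        return [
            self.nodes[target]
            for source, rel, target in self.edges
            if source == node_id and (relation is None or rel == relation)
        ]


graph = IdentityGraph()
graph.add_node(Node("seed:email", "identifier", "jane@example.com"))
graph.add_node(Node("acct:forum", "account", "example-forum account"))
graph.add_node(Node("user:jdoe", "username", "jdoe"))
graph.add_node(Node("broker:example", "broker", "Example Broker"))
graph.add_edge("seed:email", "pivoted_to", "acct:forum")
graph.add_edge("acct:forum", "discovered_username", "user:jdoe")
graph.add_edge("user:jdoe", "found_on_broker", "broker:example")
```

Here `graph.neighbors("user:jdoe", "found_on_broker")` returns the broker node, which is the kind of lookup a dashboard would need when a user clicks a node.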
Each data broker exposure becomes a "case" with:
- Discovery metadata (when, how, confidence)
- Opt-out workflow (email, form, phone)
- Legal notice generation
- Email thread tracking
- Status progression (pending → dispatched → completed)
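
The status progression for a case can be modeled as a small state machine that rejects out-of-order transitions. This is a sketch under assumptions: `CaseStatus`, `ALLOWED_TRANSITIONS`, and `advance` are hypothetical names, and only the three states named in the list above are modeled (the War Room's escalated state is left out).

```python
from enum import Enum


class CaseStatus(str, Enum):
    PENDING = "pending"
    DISPATCHED = "dispatched"
    COMPLETED = "completed"


# Each status maps to the set of statuses it may move to next.
ALLOWED_TRANSITIONS = {
    CaseStatus.PENDING: {CaseStatus.DISPATCHED},
    CaseStatus.DISPATCHED: {CaseStatus.COMPLETED},
    CaseStatus.COMPLETED: set(),
}


def advance(current: CaseStatus, new: CaseStatus) -> CaseStatus:
    """Return the new status, or raise if the transition skips a step."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move a case from {current.value} to {new.value}")
    return new


status = advance(CaseStatus.PENDING, CaseStatus.DISPATCHED)
status = advance(status, CaseStatus.COMPLETED)
```

Encoding the transitions as data means adding a state later only requires new entries in `ALLOWED_TRANSITIONS`.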
Three specialized agents work autonomously:
- Sleuth Agent: Reconnaissance and discovery
- Legal Agent: Compliance analysis and notice drafting
- Communications Agent: Email triage and response generation
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 for Python code
- Use TypeScript for all frontend code
- Write tests for new features
- Update documentation as needed
- Run linters before committing:

      # Backend
      ruff check .
      mypy src/

      # Frontend
      pnpm lint

This project is licensed under the MIT License - see the LICENSE file for details.
- Maigret - Username OSINT framework
- Playwright - Browser automation
- FastAPI - Modern Python web framework
- React - UI library
- Radix UI - Accessible component primitives
- Tailwind CSS - Utility-first CSS framework
For questions, issues, or feature requests:
- GitHub Issues: Create an issue
- Email: support@datareaper.io
- Documentation: docs.datareaper.io
- Multi-user support with role-based access control
- Mobile app (React Native)
- Browser extension for real-time monitoring
- Expanded broker catalog (200+ brokers)
- Automated form submission for non-email opt-outs
- Phone call automation for phone-based opt-outs
- Integration with password managers
- Dark web monitoring
- Enterprise features (team management, SSO)
- API for third-party integrations
- Compliance reporting dashboard
- AI model fine-tuning for better accuracy




