ScanGuard - Privacy-First Malware Analysis Portal

🎯 Project Overview

ScanGuard is a self-hosted, privacy-respecting malware analysis web portal inspired by VirusTotal, but designed with a defensive security mindset. Built as a cybersecurity graduate portfolio project, it demonstrates secure file handling, malware detection, and privacy-first architecture.

Key Differentiators from VirusTotal

| Feature | VirusTotal | ScanGuard |
|---|---|---|
| Malware Engines | 70+ (cloud-based) | 1 (ClamAV, self-hosted) |
| Data Storage | Files stored indefinitely | Zero permanent storage |
| Privacy | Shared with security vendors | Fully isolated, no sharing |
| User Tracking | Account-based, logged | No accounts, no tracking |
| Threat Intel | Multi-source aggregation | Single-engine detection |
| Use Case | Production malware analysis | Educational/Research |
| Architecture | Proprietary cloud service | Open-source, self-hosted |

🛡️ Security Architecture

Core Security Principles

Isolation First: All file processing occurs in isolated temporary directories
Immediate Deletion: Files and their temporary directory are deleted as soon as the scan finishes, including when the scan fails with an error
No Execution: Files are NEVER executed under any circumstances
Minimal Attack Surface: Single scanning engine, no external API calls
Rate Limiting: In-memory rate limiting prevents abuse (10 requests/60s)
Input Validation: Strict file size, extension, and hash format validation
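
The rate-limiting principle above (10 requests per 60 seconds, held in memory) could be sketched as a sliding-window limiter. This is an illustration only, not the code in server.py: the class name, the `key` argument, and the injectable `clock` are assumptions made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Because the counters live in a plain dictionary, they disappear on restart and are not shared between worker processes. Keys for clients that never return are also never removed in this sketch, which is one reason the stack section points to Redis for production.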

File Analysis Pipeline

┌─────────────────┐
│  File Upload    │
│  (Client-Side)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Size & Type    │◄──── Max 10MB
│  Validation     │◄──── Extension Allowlist
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Isolated Temp  │◄──── Temporary Directory
│  Directory      │◄──── Restricted Permissions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SHA-256 Hash   │
│  Computation    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  ClamAV Scan    │◄──── No File Execution
│  (Signature)    │◄──── Pattern Matching Only
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Return Result  │◄──── Clean / Suspicious / Malicious
│  + Metadata     │◄──── Detection Name (if found)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  File & Dir     │◄──── CRITICAL: Always executed
│  Deletion       │◄──── Even on errors/exceptions
└─────────────────┘
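
A minimal sketch of the pipeline above, assuming the scanner is passed in as a callable so the example stays independent of ClamAV. The names `analyze_upload` and `ScanFn` and the fixed on-disk name `sample.bin` are hypothetical; the returned keys mirror a subset of the API response fields.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# (status, detection_name): a stand-in for whatever the real ClamAV call returns.
ScanFn = Callable[[Path], Tuple[str, Optional[str]]]


def analyze_upload(data: bytes, scan: ScanFn) -> dict:
    """Write the upload to an isolated temp directory, hash it, scan it, clean up."""
    # mkdtemp creates a uniquely named directory accessible only to the current user.
    temp_dir = Path(tempfile.mkdtemp(prefix="scan_"))
    try:
        # Fixed on-disk name: the client-supplied filename is never used as a path.
        sample = temp_dir / "sample.bin"
        sample.write_bytes(data)
        file_hash = hashlib.sha256(data).hexdigest()
        status, detection_name = scan(sample)  # the file is read, never executed
        return {
            "file_hash": file_hash,
            "file_size": len(data),
            "status": status,
            "detection_name": detection_name,
        }
    finally:
        # Runs on success and on any exception, matching the last pipeline stage.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Passing a fake `scan` function makes it easy to check the deletion guarantee in a unit test: after the call returns or raises, the sample's parent directory should no longer exist.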

🚨 Threat Model

Assumptions (What We Defend Against)

Malicious File Uploads: Users uploading actual malware samples
Exploitation Attempts: Files crafted to exploit vulnerabilities in the scanning engine
Abuse/DoS: Repeated uploads to overwhelm the system
Privacy Attacks: Attempts to infer user identity from scans
Curious Probing: Users testing system boundaries

Security Measures

| Threat | Mitigation |
|---|---|
| Malware Execution | Files never executed; signature-based scanning only |
| Path Traversal | Isolated temp directories with unique names |
| Resource Exhaustion | 10MB file limit + rate limiting (10 req/min) |
| User Tracking | No IP logging, no user accounts, no permanent logs |
| Signature Evasion | Regular ClamAV definition updates (not automated in demo) |
| Zero-Day Malware | Limited Protection (single-engine limitation) |
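
The path-traversal and resource-exhaustion mitigations could look roughly like the check below. It is a sketch under stated assumptions: the allowlist contents are hypothetical (the README does not list the real extensions), "10MB" is read as 10 MiB, and `validate_upload` is an illustrative name.

```python
from pathlib import PurePosixPath

MAX_FILE_SIZE = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".zip", ".exe", ".com"}  # hypothetical allowlist


def validate_upload(filename: str, size: int) -> str:
    """Return a display-safe file name, or raise ValueError if the upload is rejected."""
    if not 0 <= size <= MAX_FILE_SIZE:
        raise ValueError("file size outside the allowed range")
    # Keep only the final path component so "../" or "C:\..." prefixes are discarded.
    display_name = PurePosixPath(filename.replace("\\", "/")).name
    if PurePosixPath(display_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("file extension not on the allowlist")
    return display_name
```

The returned name is meant for display in the result only. Storing the sample under a server-generated name inside its own temp directory means the client-supplied name never reaches the filesystem at all.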

Known Limitations (Out of Scope)

Zero-Day Threats: ClamAV relies on signatures; novel malware may not be detected
Advanced Evasion: Polymorphic/metamorphic malware may bypass detection
Nation-State Attacks: Not hardened against advanced persistent threats (APTs)
Production Hardening: Demo setup; not audited for production deployment
Limited Coverage: A single engine detects less than multi-engine analysis, which is VirusTotal's strength

🏗️ Technical Stack

Backend

Framework: FastAPI (Python 3.11+)
Malware Engine: ClamAV (via python-clamd)
File Handling: tempfile + shutil (secure temp directories)
Rate Limiting: In-memory dictionary (production: Redis)
Hashing: Python hashlib (SHA-256)

Frontend

Framework: React 19 (SPA)
UI Library: shadcn/ui + Tailwind CSS
Animations: Framer Motion
HTTP Client: Axios
Notifications: Sonner (toast notifications)

Infrastructure

Server: FastAPI on Uvicorn
Database: None (no persistent storage by design)
Deployment: Docker container (ClamAV + App)

🚀 Getting Started

Prerequisites

Python 3.11+
Node.js 18+ (with Yarn, used by the frontend setup below)
ClamAV installed and running

Installation

1. Install ClamAV

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install clamav clamav-daemon
sudo systemctl start clamav-daemon
sudo systemctl enable clamav-daemon

# Update virus definitions
sudo freshclam

macOS (Homebrew):

brew install clamav
brew services start clamav

# Update virus definitions
freshclam

Docker (Recommended for Demo):

docker run -d -p 3310:3310 clamav/clamav:latest
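
Before starting the backend, it can help to confirm that the daemon is reachable on port 3310. The sketch below speaks the clamd `PING` command directly over TCP using only the standard library; the function name `clamd_ping` and its defaults are assumptions for this example, and the project itself talks to ClamAV through python-clamd.

```python
import socket


def clamd_ping(host: str = "localhost", port: int = 3310, timeout: float = 3.0) -> bool:
    """Return True if a clamd daemon at host:port answers PING with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The "z" prefix selects clamd's null-terminated command format.
            conn.sendall(b"zPING\0")
            reply = conn.recv(16)
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
    return reply.rstrip(b"\0") == b"PONG"


if __name__ == "__main__":
    print("clamd reachable:", clamd_ping())
```

A freshly started container may need some time to load its signature database before it answers, so a `False` result right after `docker run` is worth retrying.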

2. Backend Setup

cd backend
pip install -r requirements.txt

# Start FastAPI server
uvicorn server:app --host 0.0.0.0 --port 8001 --reload

3. Frontend Setup

cd frontend
yarn install

# Write the backend URL to .env (this overwrites an existing .env)
echo "REACT_APP_BACKEND_URL=http://localhost:8001" > .env

# Start React app
yarn start

4. Verify ClamAV Connection

curl http://localhost:8001/api/health

Expected response:

{
  "status": "operational",
  "clamav_available": true,
  "engine_version": "ClamAV 1.0.0/..."
}

📖 API Documentation

Endpoints

`GET /api/health`

Check API and ClamAV engine status.

Response:

{
  "status": "operational",
  "clamav_available": true,
  "engine_version": "ClamAV 1.0.0/27034/Tue Jan 7 12:00:00 2025"
}

`POST /api/scan/file`

Upload and scan a file.

Request:

Method: POST
Content-Type: multipart/form-data
Body: file (binary file, max 10MB)

Response (Clean File):

{
  "scan_id": "a1b2c3d4-...",
  "file_hash": "e3b0c44298fc1c14...",
  "file_name": "document.pdf",
  "file_size": 524288,
  "status": "clean",
  "detection_name": null,
  "engine_version": "ClamAV 1.0.0/27034",
  "scan_time": "2025-01-07T12:00:00Z",
  "message": "File scanned successfully"
}

Response (Malicious File):

{
  "scan_id": "f5e6d7c8-...",
  "file_hash": "275a021bbfb6489e...",
  "file_name": "malware.exe",
  "file_size": 102400,
  "status": "malicious",
  "detection_name": "Win.Trojan.Generic-12345",
  "engine_version": "ClamAV 1.0.0/27034",
  "scan_time": "2025-01-07T12:05:00Z",
  "message": null
}
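
For scripted use, the upload can be made without a browser. The following is a minimal standard-library client sketch, assuming the backend runs at http://localhost:8001 as in the setup steps and that the form field is named `file`; `build_multipart` and `scan_file` are hypothetical helper names.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def scan_file(path: str, base_url: str = "http://localhost:8001") -> dict:
    """POST a local file to /api/scan/file and return the decoded JSON response."""
    source = Path(path)
    body, content_type = build_multipart("file", source.name, source.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/api/scan/file",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A caller would then branch on the `status` field, for example `scan_file("document.pdf")["status"]`. The encoder does not escape quotes in file names, so it is only suitable for simple names.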

`POST /api/scan/hash`

Lookup a SHA-256 hash.

Request:

{
  "hash": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
}

Response:

{
  "hash": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
  "status": "malicious",
  "message": "Hash matches known threat: EICAR-Test-File"
}
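
The README does not say how the backend resolves a hash, so the following is only a sketch of one plausible shape: validate the format, then consult a small in-memory table. `KNOWN_HASHES`, `lookup_hash`, and the `unknown` status are assumptions; the single entry is the EICAR hash from the example above.

```python
import re

SHA256_RE = re.compile(r"[0-9a-f]{64}")

# Hypothetical local table; the only entry is the EICAR hash used in this README.
KNOWN_HASHES = {
    "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f": "EICAR-Test-File",
}


def lookup_hash(value: str) -> dict:
    """Validate a SHA-256 hex digest and look it up in the local table."""
    normalized = value.strip().lower()
    if SHA256_RE.fullmatch(normalized) is None:
        raise ValueError("expected a 64-character hexadecimal SHA-256 digest")
    name = KNOWN_HASHES.get(normalized)
    if name is None:
        # "unknown" is an assumed status for hashes with no local record.
        return {"hash": normalized, "status": "unknown", "message": None}
    return {
        "hash": normalized,
        "status": "malicious",
        "message": f"Hash matches known threat: {name}",
    }
```

Rejecting malformed input before any lookup keeps the endpoint from doing work on arbitrary strings. A hash that is absent from the table says nothing about whether the file is safe.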

🧪 Testing the Application

Test File: EICAR

The EICAR test file is a standard, harmless file that antivirus engines, including ClamAV, detect by convention as if it were malware.

Download EICAR:

curl -o eicar.com https://secure.eicar.org/eicar.com.txt

Test Upload:

Navigate to http://localhost:3000
Click File Upload tab
Upload eicar.com
Expected result: MALICIOUS, with a detection name such as Eicar-Test-Signature (the exact name depends on the ClamAV signature database version)

Test Hash Lookup:

Click Hash Lookup tab
Enter: 275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f
Expected result: MALICIOUS with message about EICAR

🔒 Privacy Guarantees

What We DO NOT Store

✅ Files: Deleted immediately after scan (verified in code)
✅ User IP Addresses: Not logged in application logs
✅ Upload Timestamps Linked to Users: Only coarse timestamps shown
✅ User Accounts: No authentication system
✅ Session Cookies: No persistent session tracking
✅ File Metadata Beyond Scan: Only hash + scan result kept in memory briefly

What We DO Store (Temporarily)

⏱️ Rate Limit Counters: In-memory only; entries expire after 60 seconds
⏱️ Scan Results: Displayed to user, not persisted to disk/database

Evidence (Code References)

File Deletion (server.py:273-287):

finally:
    # CRITICAL: Always delete temporary files and directory
    if temp_file_path and temp_file_path.exists():
        temp_file_path.unlink()
        logger.info(f"Deleted temporary file: {temp_file_path}")
    
    if temp_dir and Path(temp_dir).exists():
        shutil.rmtree(temp_dir)
        logger.info(f"Deleted temporary directory: {temp_dir}")

⚠️ Ethical & Legal Disclaimer

Intended Use

This tool is designed for:

Educational purposes (cybersecurity coursework/portfolio)
Research (malware analysis in controlled environments)
Personal file verification (scanning your own files)

Prohibited Use

Do NOT use this tool for:

❌ Analyzing files you do not own or have permission to scan
❌ Circumventing security measures on systems you don't control
❌ Production malware analysis without proper hardening
❌ Distributing malware or weaponizing detection gaps

Legal Considerations

Malware Possession: Possessing malware samples may be illegal in some jurisdictions
Liability: This tool is provided "as-is" without warranties
No Guarantees: Single-engine scanning cannot guarantee threat detection
User Responsibility: You are responsible for how you use this tool

📊 Project Roadmap (Future Enhancements)

Automated ClamAV Updates: Schedule freshclam runs via cron
Hash Database: Local cache of scanned hashes (SQLite, with TTL)
YARA Rules Integration: Custom signature creation
Sandbox Analysis: Integrate Cuckoo Sandbox for dynamic analysis
Multi-Engine Support: Add additional scanners (Windows Defender API, VirusTotal API)
Threat Intelligence Feeds: Integrate with MISP, OTX, or similar
Containerized Scanning: Isolate scans in Docker containers
Metrics Dashboard: Scan statistics (total scans, detection rate)
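
As a starting point for the Hash Database item, a TTL cache on top of the standard-library `sqlite3` module might look like the sketch below. Nothing here exists in the project yet; the class name `HashCache`, the table layout, and the one-hour default TTL are all assumptions.

```python
import sqlite3
import time
from typing import Callable, Optional


class HashCache:
    """Cache scan verdicts by SHA-256 hash, expiring entries after `ttl` seconds."""

    def __init__(self, path: str = ":memory:", ttl: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.clock = clock
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "hash TEXT PRIMARY KEY, status TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def put(self, file_hash: str, status: str) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
                (file_hash, status, self.clock() + self.ttl),
            )

    def get(self, file_hash: str) -> Optional[str]:
        with self.db:
            # Purge expired rows first so stale verdicts are never returned.
            self.db.execute("DELETE FROM results WHERE expires_at <= ?", (self.clock(),))
        row = self.db.execute(
            "SELECT status FROM results WHERE hash = ?", (file_hash,)
        ).fetchone()
        return row[0] if row else None
```

The default `:memory:` path keeps the cache off disk, in line with the no-persistent-storage design. Pointing `path` at a file would change the privacy guarantees and should be documented if adopted.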

🤝 Contributing

This is a portfolio project, but contributions are welcome:

Security Issues: Report vulnerabilities privately (for example through GitHub's private vulnerability reporting, if enabled for the repository) rather than in a public issue
Bug Fixes: Submit pull requests with clear descriptions
Documentation: Improve README, add code comments
Feature Requests: Open issues with detailed use cases

📝 License

This project is licensed under the MIT License.

🙏 Acknowledgments

ClamAV: Open-source antivirus engine (https://www.clamav.net/)
EICAR: Standard test file for AV testing (https://www.eicar.org/)
FastAPI: Modern Python web framework (https://fastapi.tiangolo.com/)
shadcn/ui: Beautiful UI components (https://ui.shadcn.com/)
VirusTotal: Inspiration for multi-engine analysis concept (https://www.virustotal.com/)

📧 Contact

For questions or feedback:

GitHub: Open an issue
Email: your.email@example.com (replace with your contact)

Built with 🛡️ for Defensive Security
Think like a blue-team engineer, not a SaaS founder.
