Security Policy

This document outlines the security measures, scanning tools, and best practices implemented in the EvidenceLab AI project.

Reporting Security Vulnerabilities

If you discover a security vulnerability in this project, please report it responsibly:

Do NOT create a public GitHub issue for security vulnerabilities
Email evidencelab@astrobagel.com directly with details of the vulnerability
Include steps to reproduce the issue
Allow reasonable time for the issue to be addressed before public disclosure

We aim to acknowledge security reports within 48 hours and provide a fix timeline within 7 days.

Security Architecture

Defense in Depth

The project implements multiple layers of security:

Pre-commit Hooks: Catch issues before code enters the repository
CI/CD Security Scans: Automated checks on every push and PR
Dependency Monitoring: Automated alerts for vulnerable dependencies
Container Scanning: Vulnerability checks on Docker images
Runtime Protections: Input validation, CORS restrictions, rate limiting
Request Size Limits: Body size enforcement at the middleware layer
Error Sanitisation: Internal details stripped from production responses
Structured Logging: JSON-formatted audit trail for SIEM ingestion

Automated Security Scanning

Pre-commit Hooks

Security-focused pre-commit hooks run on every commit:

Tool	Purpose	Configuration
Bandit	Python SAST - detects security issues like SQL injection, hardcoded passwords, unsafe deserialization	`.pre-commit-config.yaml`
Hadolint	Dockerfile linting - checks for security misconfigurations	`.pre-commit-config.yaml`
detect-secrets	Prevents accidental credential commits	`.secrets.baseline`
Gitleaks	Enhanced secret detection with comprehensive regex patterns	`.pre-commit-config.yaml`

CI/CD Security Jobs

The GitHub Actions workflow includes dedicated security jobs:

`security-scan` Job

Check	Tool	Description
Python Dependencies	pip-audit	Scans `requirements.txt` for known vulnerabilities
JavaScript Dependencies	npm audit	Scans `package.json` for known vulnerabilities
Python SAST	Bandit	Static analysis for security issues (JSON report artifact)
Dockerfile Linting	Hadolint	Checks all Dockerfiles for security best practices
Secret Scanning	Gitleaks	Scans entire repository for exposed secrets

`container-scan` Job

Check	Tool	Description
API Image	Trivy	Scans built Docker image for OS and application vulnerabilities
UI Image	Trivy	Scans built Docker image for OS and application vulnerabilities

Frontend Security Linting

ESLint plugins provide JavaScript/TypeScript security analysis:

Plugin	Purpose
eslint-plugin-security	Detects potential security issues (eval, object injection, etc.)
eslint-plugin-sonarjs	Code quality rules that catch security anti-patterns

Dependency Management

Dependabot Configuration

Dependabot (.github/dependabot.yml) automatically monitors and creates PRs for:

Python (pip): Weekly scans of requirements.txt
JavaScript (npm): Weekly scans of ui/frontend/package.json
Docker: Weekly scans of base images in all Dockerfiles
GitHub Actions: Weekly scans of action versions

Dependency Update Policy

Security updates are prioritized and should be merged promptly
Major version updates require manual review for breaking changes
ML libraries (torch, transformers) have major version updates ignored to prevent breaking changes

Code Security Measures

Input Validation

Path Traversal Protection

The /file/{file_path} endpoint implements comprehensive path traversal protection:

Double URL decoding to catch encoded attacks (%2e%2e, %252e%252e)
Null byte rejection
Path canonicalization using Path.resolve()
Directory containment verification using relative_to()
Explicit file extension whitelist

Data Source Validation

API endpoints validate data_source parameters against a whitelist loaded from config.json, preventing:

Cache pollution attacks
Unintended database connections
Resource exhaustion

CORS Configuration

CORS is configured securely:

Origins: Read from CORS_ALLOWED_ORIGINS environment variable; defaults to localhost for development (never *)
Headers: Read from CORS_ALLOWED_HEADERS environment variable; defaults to Content-Type, Authorization, X-API-Key, X-CSRF-Token, Accept, Accept-Language (never *)
Explicit HTTP method whitelist (GET, POST, PUT, PATCH, DELETE, OPTIONS)
Credentials supported only for allowed origins

Request Body Size Limits

Request body size is enforced at the middleware layer (ASVS V13.1.3):

Requests with Content-Length exceeding the limit are rejected with HTTP 413
Default limit: 2 MB; configurable via MAX_REQUEST_BODY_BYTES env var
GET and other bodyless requests are unaffected

Error Detail Sanitisation

Production error responses never expose internal details (ASVS V7.1.1, V7.4.1):

ValueError responses return a generic message unless the error matches a known-safe prefix (e.g. "Invalid data_source:")
Unhandled exceptions return "Internal server error" with no stack traces, paths, or internal state
Full details available when API_DEBUG=true (development only)
Full tracebacks are always logged server-side regardless of debug mode

Structured Logging

JSON-structured log output for SIEM integration (ASVS V7.1.3):

JSONLogFormatter outputs {timestamp, level, logger, message, module, function, line} with UTC ISO 8601 timestamps
Security-relevant extra fields (user_id, user_email, ip_address, event_type, request_id) included when present
Exception tracebacks included in the exception field
Enabled via LOG_FORMAT=json env var; plain text remains the default

Rate Limiting

API endpoints are protected by rate limiting (slowapi):

Search operations: Configurable via RATE_LIMIT_SEARCH
AI operations: Configurable via RATE_LIMIT_AI
Default operations: Configurable via RATE_LIMIT_DEFAULT

API Key Authentication

API key authentication via X-API-Key header
Timing-safe comparison using secrets.compare_digest()
OpenAPI/Swagger docs disabled in production

Container Security

Dockerfile Best Practices

Hadolint enforces:

Avoiding latest tags for base images
Minimizing layer count
Using specific package versions where practical
Multi-stage builds to reduce attack surface

Image Scanning

Trivy scans Docker images for:

OS package vulnerabilities
Application dependency vulnerabilities
Misconfigurations

API Security

API Key Authentication

API key required for all endpoints except /health, /auth/*, file serving
API key validated using timing-safe comparison
Auth routes exempt from API key — protected by their own rate-limiting and CSRF
Development mode allows unauthenticated access (no API_SECRET_KEY set)

User Authentication Module

When USER_MODULE is set to on_passive or on_active, fastapi-users provides full user lifecycle management. In on_passive mode, authentication is optional and anonymous users can browse freely. In on_active mode, all access requires login:

Control	Implementation
Token storage	httpOnly cookies only; no localStorage (XSS mitigation)
Token lifetime	1-hour JWTs for access; separate configurable lifetimes for reset (24h) and verify (7d) tokens
Cookie flags	`httponly`, `secure`, `samesite=lax`
CSRF protection	Double-submit cookie (`evidencelab_csrf` + `X-CSRF-Token` header); cookie cleared on logout/account deletion
Secret validation	`AUTH_SECRET_KEY` must be 32+ chars; insecure defaults rejected
Input validation	`display_name` max 255 chars, whitespace-stripped, blank-to-None
Password policy	Minimum length + digit + letter (configurable via `AUTH_MIN_PASSWORD_LENGTH`)
Password history	Prevents reuse of last N passwords; configurable via `AUTH_PASSWORD_HISTORY_COUNT` (default 5, ASVS V2.1.10)
Account lockout	Lock after N consecutive failures for M minutes; counters reset on password reset (`AUTH_LOCKOUT_THRESHOLD`, `AUTH_LOCKOUT_DURATION_MINUTES`)
Timing-attack mitigation	Password hash always computed even for non-existent users
Registration control	Email domain whitelist via `AUTH_ALLOWED_EMAIL_DOMAINS` (registration only; not enforced on password change)
Rate limiting	Per-IP sliding window on `/auth/*` (default 10 req/60s)
Permission model	Deny-by-default; unauthenticated users see no datasources
Error handling	Permission failures logged and return empty data (no leak)
Audit logging	All auth events (login, failure, lockout, register, password reset) logged to `audit_log` table
OAuth	Google and Microsoft SSO with explicit minimal scopes (`openid, email, profile`)
Email verification	Required after registration; token sent via SMTP

Active Authentication Enforcement (`on_active` mode)

When USER_MODULE_MODE=on_active, the ActiveAuthMiddleware enforces authentication on all data endpoints at the middleware layer. This ensures that no data endpoint can be accessed without valid credentials, even if an individual route handler lacks its own auth dependency.

How It Works

The middleware intercepts every incoming request and validates credentials before the request reaches any route handler. It checks (in order):

API key — X-API-Key header compared against API_SECRET_KEY using secrets.compare_digest() (timing-safe)
Bearer token — Authorization: Bearer <jwt> header decoded with HS256 and audience ["fastapi-users:auth"]
Session cookie — evidencelab_auth httpOnly cookie decoded with the same JWT secret and audience

If none of the above yield a valid credential, the middleware returns 401 {"detail": "Authentication required"} without forwarding the request.

Exempt Paths

The following paths are exempt from middleware authentication because they either have their own auth mechanisms or must be publicly accessible:

Path prefix	Reason
`/health`	Health check endpoint (monitoring/load balancer probes)
`/auth/`	Login, register, verify, password reset — has own auth flow
`/config/auth-status`	Frontend feature flag (must be accessible before login)
`/users/`	Protected by fastapi-users `current_active_user` dependency
`/groups/`	Protected by fastapi-users `current_active_user` dependency
`/ratings/`	Protected by fastapi-users `current_active_user` dependency
`/activity/`	Protected by fastapi-users `current_active_user` dependency
`/docs`, `/redoc`, `/openapi.json`	OpenAPI docs (disabled in production)
`OPTIONS`	CORS preflight requests

Security Properties

Property	Detail
Scope	Only active when `USER_MODULE_MODE == "on_active"`
Layer	Starlette middleware — runs before all route handlers
JWT validation	Verifies signature, expiry, and audience claim
API key comparison	Timing-safe via `secrets.compare_digest()`
Fail-closed	Denies by default; only permits explicitly validated credentials
Logging	Denied requests logged with method, path, and client IP
Test coverage	23 dedicated unit tests covering all auth paths and edge cases

Relationship to Other Auth Controls

The middleware provides defense in depth alongside existing controls:

/config/datasources — already denies unauthenticated users (returns empty list); the middleware adds a hard 401 for all other data endpoints
fastapi-users routes (/users/, /groups/, etc.) — have their own Depends(current_active_user) checks; the middleware exempts these to avoid double-validation
API key auth — the middleware accepts the same API_SECRET_KEY used elsewhere, so programmatic API access continues to work
Frontend login wall — AuthGate component blocks UI access; the middleware ensures the same protection at the API layer

Authorization

Data source access controlled via whitelist validation and group-based RBAC
File serving restricted to specific directories and file types
Superusers bypass datasource filtering; regular users see only granted sources

Security Headers

Application-level security headers middleware provides defence-in-depth:

Content-Security-Policy — configurable via CSP_POLICY env var; defaults to strict self-only policy with frame-ancestors 'none', SHA-256 hash-based inline script allowlisting (no 'unsafe-inline' in script-src), and explicit connect-src for analytics domains
X-Content-Type-Options: nosniff — prevents MIME-sniffing
X-Frame-Options: DENY — prevents clickjacking
Referrer-Policy: strict-origin-when-cross-origin — limits referrer leakage
Permissions-Policy — restricts camera, microphone, geolocation
Strict-Transport-Security — HSTS with preload directive when HTTPS is configured; warning logged when AUTH_COOKIE_SECURE=false
Cache-Control: no-store + Pragma: no-cache on sensitive endpoints (/auth/, /users/, /groups/, /ratings/, /activity/) — prevents browsers and proxies from caching tokens or personal data (ASVS V8.1.4)

Production deployments via Caddy additionally include:

Automatic HTTPS with Let's Encrypt
API key validation for protected endpoints
Proper proxy headers (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)

Development Security Practices

Environment Variables

Sensitive values stored in .env files (gitignored)
.env.example provides templates without real values
Secrets managed via GitHub Actions secrets for CI/CD
AUTH_SECRET_KEY is required in production — Docker Compose uses :? syntax to fail-fast if unset; startup logs a CRITICAL warning for insecure or short values (ASVS V14.1.2)

Secret Management

detect-secrets baseline prevents new secrets from being committed
Gitleaks provides additional coverage with comprehensive patterns
Pre-commit hooks catch secrets before they enter git history

Secure Coding Guidelines

Never hardcode secrets - Use environment variables
Validate all input - Especially file paths and user-provided data
Use parameterized queries - Prevent SQL injection
Avoid dangerous functions - eval(), exec(), subprocess with shell=True
Keep dependencies updated - Review Dependabot PRs promptly
Follow least privilege - Containers and services should have minimal permissions

OWASP ASVS L2 Compliance

The project targets OWASP ASVS Level 2 compliance. Current estimated coverage is ~82% across the 14 ASVS verification categories. Key controls mapped to ASVS requirements:

ASVS Requirement	Control	Status
V2.1.10	Password history / reuse prevention	Done
V7.1.1, V7.4.1	Error detail sanitisation	Done
V7.1.3	Structured JSON logging	Done
V8.1.4	Sensitive endpoint cache control	Done
V13.1.3	Request body size limits	Done
V14.1.2	Startup credential validation	Done
V14.4.3	CSP with hash-based inline scripts	Done

Security Checklist for Contributors

Before submitting a PR, ensure:

No hardcoded secrets or credentials
Input validation for user-provided data
Pre-commit hooks pass (including Bandit, Gitleaks)
No new security warnings in CI
Sensitive operations are properly authenticated
File operations validate paths against allowed directories

Tools Reference

Tool	Purpose	Documentation
Bandit	Python SAST	https://bandit.readthedocs.io/
Hadolint	Dockerfile linting	https://github.com/hadolint/hadolint
detect-secrets	Secret detection	https://github.com/Yelp/detect-secrets
Gitleaks	Secret detection	https://github.com/gitleaks/gitleaks
pip-audit	Python dependency scanning	https://github.com/pypa/pip-audit
npm audit	JS dependency scanning	https://docs.npmjs.com/cli/v8/commands/npm-audit
Trivy	Container scanning	https://aquasecurity.github.io/trivy/
eslint-plugin-security	JS security linting	https://github.com/eslint-community/eslint-plugin-security

Changelog

2026-03-10: ASVS L2 Tier 1 + Tier 2 hardening (33 new tests)
- Added request body size limit middleware — rejects >2 MB with HTTP 413 (MAX_REQUEST_BODY_BYTES, ASVS V13.1.3)
- Added error detail sanitisation — production responses never expose internals (API_DEBUG, ASVS V7.1.1 / V7.4.1)
- Added startup credential validation — CRITICAL log on insecure AUTH_SECRET_KEY; Docker Compose fails-fast if unset (ASVS V14.1.2)
- Hardened CSP with SHA-256 hash for inline GA script — removed need for 'unsafe-inline' in script-src (ASVS V14.4.3)
- Added structured JSON logging via LOG_FORMAT=json for SIEM ingestion (ASVS V7.1.3)
- Added password history — prevents reuse of last N passwords (AUTH_PASSWORD_HISTORY_COUNT, ASVS V2.1.10)
- Added Alembic migration 0016_add_password_history (nullable JSONB column)
- Added Cache-Control: no-store on sensitive endpoint responses (ASVS V8.1.4)
2026-03-03: Active authentication enforcement middleware
- Added ActiveAuthMiddleware for on_active mode — denies unauthenticated requests to all data endpoints
- Middleware validates JWT cookies, Bearer tokens, and API keys at the Starlette layer (before route handlers)
- Fail-closed design: denies by default, only permits explicitly validated credentials
- Exempt paths for health checks, auth flows, and routes with their own fastapi-users dependencies
- Frontend AuthGate component enforces login wall in the UI for on_active mode
- 23 dedicated unit tests covering all auth paths, exempt routes, and edge cases
2026-03-01: Security hardening for enterprise pen testing
- Added Content-Security-Policy header (env-configurable via CSP_POLICY)
- Changed CORS allow_headers from * to env-configurable whitelist (CORS_ALLOWED_HEADERS)
- Added display_name input validation (max 255 chars, whitespace stripping)
- Moved email domain whitelist check to registration only (no longer triggers on password change)
- Added lockout counter reset on successful password reset
- Separated token lifetimes: reset tokens (24h) and verification tokens (7d) independently configurable
- Added warning log when no default group is configured for new users
- Added CSRF cookie clearing on account deletion
- Added explicit minimal OAuth scopes (openid, email, profile) for Google and Microsoft
- Added HSTS preload directive for HSTS preload list eligibility
- Added warning log when AUTH_COOKIE_SECURE is disabled (HTTP-only development)
2026-03-01: User authentication & permissions module
- Added cookie-based JWT auth with httpOnly, secure, samesite=lax flags
- Added CSRF double-submit cookie middleware (evidencelab_csrf)
- Added security response headers middleware (X-Content-Type-Options, X-Frame-Options, etc.)
- Added per-IP rate limiting on auth endpoints (sliding window)
- Added account lockout with timing-attack mitigation
- Added password complexity validation (length + digit + letter)
- Added email domain whitelisting for registration
- Added immutable audit log table for all auth events
- Added group-based RBAC with deny-by-default datasource permissions
- Added Google and Microsoft OAuth2 SSO support
- Exempted /auth/* routes from API key requirement
2026-02-05: Comprehensive security policy
- Added Bandit for Python SAST
- Added Hadolint for Dockerfile linting
- Added Gitleaks for enhanced secret detection
- Added eslint-plugin-security for frontend
- Added pip-audit and npm audit to CI
- Added Trivy container scanning
- Configured Dependabot for automated updates
- Fixed CORS misconfiguration
- Fixed path traversal vulnerability
- Added data source validation

Security: dividor/evidencelab

Security

SECURITY.md

Security Policy

Table of Contents

Reporting Security Vulnerabilities

Security Architecture

Defense in Depth

Automated Security Scanning

Pre-commit Hooks

CI/CD Security Jobs

security-scan Job

container-scan Job

Frontend Security Linting

Dependency Management

Dependabot Configuration

Dependency Update Policy

Code Security Measures

Input Validation

Path Traversal Protection

Data Source Validation

CORS Configuration

Request Body Size Limits

Error Detail Sanitisation

Structured Logging

Rate Limiting

API Key Authentication

Container Security

Dockerfile Best Practices

Image Scanning

API Security

API Key Authentication

User Authentication Module

Active Authentication Enforcement (on_active mode)

How It Works

Exempt Paths

Security Properties

Relationship to Other Auth Controls

Authorization

Security Headers

Development Security Practices

Environment Variables

Secret Management

Secure Coding Guidelines

OWASP ASVS L2 Compliance

Security Checklist for Contributors

Tools Reference

Changelog

There aren’t any published security advisories

`security-scan` Job

`container-scan` Job

Active Authentication Enforcement (`on_active` mode)