docs: Add comprehensive project planning documents#2
Conversation
Generate complete planning documentation suite for RAG Processor WebUI: **Core Planning Documents:** - Project Vision & Scope: Problem statement, target users, success metrics - Technical Specification: Architecture, API, data model, security - Development Roadmap: 4 phases, 7 user stories, risk register - ADR-001: React + FastAPI architecture with security and error handling **Detailed Phase Plans (54 sprints, 188 hours):** - Phase 0: Foundation (10 sprints, 33 hours) - Docker, CI/CD, dev environment - Phase 1: MVP Core (24 sprints, 85 hours) - Auth, upload, routing, queue, WebSocket - Phase 2: Enhancement (10 sprints, 35 hours) - Vector store handoff, downloads, history - Phase 3: Polish (10 sprints, 35 hours) - Testing, docs, performance, security **Key Features:** - Each sprint is 3-4 hours with clear goals and exit criteria - Bidirectional references between PROJECT-PLAN.md and phase plans - Git branch strategy per phase (feat/phase-0-foundation, etc.) - Comprehensive validation checklists - Risk mitigation strategies **Quality:** - All documents pass markdown linting - All cross-references validated - Expert review conducted on all documents - 22% time buffer (52 hours) across 6-week plan 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughCreates comprehensive project planning documentation for a RAG Processor WebUI including project plan, architectural decisions, phase-based execution plans (Phases 0–3), project vision, roadmap, and technical specification. All changes are documentation-only. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~40 minutes
Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
docs/planning/PROJECT-PLAN.md(1 hunks)docs/planning/adr/adr-001-react-fastapi-architecture.md(1 hunks)docs/planning/phase-0-foundation.md(1 hunks)docs/planning/phase-1-mvp-core.md(1 hunks)docs/planning/phase-2-enhancement.md(1 hunks)docs/planning/phase-3-polish.md(1 hunks)docs/planning/project-vision.md(1 hunks)docs/planning/roadmap.md(1 hunks)docs/planning/tech-spec.md(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Documentation
docs/planning/PROJECT-PLAN.md
[warning] 1-1: Doc file 'planning/PROJECT-PLAN.md' contains an unrecognized relative link './adr/', it was left as is. Did you mean 'adr/README.md'?
🪛 LanguageTool
docs/planning/tech-spec.md
[style] ~304-~304: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...es, audio, Office docs - Mock pipelines with success/failure/timeout scenarios ## 9. Deploy...
(EN_WORDINESS_PREMIUM_WITH_SUCCESS)
docs/planning/adr/adr-001-react-fastapi-architecture.md
[grammar] ~141-~141: Ensure spelling is correct
Context: ...Cold starts*: Lambda cold starts (100-500ms) add latency to file uploads - ❌ **Vend...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~298-~298: Ensure spelling is correct
Context: ...Ds per connection (in-memory LRU cache, 5min TTL) Graceful Shutdown: - SIGTERM...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~314-~314: Ensure spelling is correct
Context: ..._active_connections` (alert if >180 for 5min) ### Performance Budget | Operation |...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
docs/planning/phase-3-polish.md
[style] ~78-~78: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...ncrypted PDFs - Zero-byte files - Extremely large files (near limit) 2. Write error hand...
(EN_WEAK_ADJECTIVE)
docs/planning/phase-2-enhancement.md
[grammar] ~132-~132: Ensure spelling is correct
Context: ...doff triggers on batch completion - [ ] Qdrant integration works - [ ] Milvus integrat...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (7)
docs/planning/adr/adr-001-react-fastapi-architecture.md (1)
1-455: Comprehensive and well-reasoned ADR with thorough options evaluation and implementation details.ADR-001 effectively documents the architectural decision with clear rationale, honest trade-offs, and specific implementation guidance. The decision to use React 18 + FastAPI with microservices architecture is well-justified given the constraints (single developer, 4-6 week timeline, self-hosted infrastructure).
Key strengths:
- Options evaluation is fair and transparent (4 options with pros/cons)
- Trade-offs are acknowledged with concrete mitigations
- Implementation section is detailed enough to guide Phase 1 work
- Security model covers critical areas (auth, input validation, secrets)
- Error handling addresses realistic failure modes (pipeline timeouts, Redis failures, WebSocket reconnects)
- Performance budgets are specific and testable
- Pipeline Adapter pattern enables future extensibility without core changes
- Validation instrumentation plan supports ongoing performance monitoring
docs/planning/tech-spec.md (6)
1-64: Strong foundational technical specification with clear version constraints and dependencies.The Technology Stack and TL;DR sections set clear expectations: Python 3.12+, React 18.2+, FastAPI 0.109+, and comprehensive tooling choices (Ruff, BasedPyright, pytest, Vitest). The explicit dependency pinning—particularly
redis >=5.0.0,rq >=1.15.0, andfastapi >=0.109.0—provides implementers with concrete targets. The modern Python 3.10+ union syntax (str | None) used throughout is appropriate for the stated Python 3.12+ requirement and aligns with the tech stack commitment.
65-109: Architecture section is clear and well-illustrated; verify alignment with ADR-001 sections referenced.The component diagram and responsibility matrix effectively communicate the microservices pattern with Cloudflare edge integration. Cross-reference to ADR-001 (line 69) should map to a detailed decision rationale on why this pattern was chosen over alternatives (monolith, serverless, etc.). Confirm the ADR includes discussion of API Gateway responsibilities and trade-offs.
Given the PR claim of "cross-references verified (9/9)," these are likely in place, but ensure each reference point (ADR-001 §security-model, §redis-ha-mvp, §performance-budget, etc.) exists with the expected content during merge.
110-164: Data Model is well-structured with Pydantic; validates against the API Specification examples.The core entities—Batch, Job, FileClassification, Pipeline—use modern Pydantic BaseModel syntax and proper enums. The relationships (Batch→Job one-to-many, Job→Pipeline many-to-one) align with the storage strategy (Redis for metadata, file system for uploads/results). UUID usage in example payloads (lines 200–205) is consistent.
227-251: Security section covers authentication, authorization, data protection, and input validation; flag phase-gating of RBAC.The security strategy is sound: Cloudflare Access for OAuth, JWT validation, whitelisting (PDF, Office, audio, image, video), and magic byte scanning. Input limits (100MB/file, 500MB/batch) are reasonable for MVP. The note on line 238 that "All authenticated users can upload" with RBAC deferred to Phase 2 is a pragmatic MVP boundary, though this should be documented in phase plans as a Phase 2 security hardening.
329-329: Verify worker service intentionally reuses gateway build context.Line 329 specifies
build: ./gatewayfor the worker service (same as line 319 for the gateway). Confirm this is intentional (code reuse with different entrypoint viacommand: rq worker...) vs. should reference a separate worker build context. If intentional, this is good container optimization; if not, clarify the separate build.
351-354: Cross-references to related documents are well-positioned; confirm all linked docs are present in the PR.The Related Documents section appropriately links to project-vision.md, ADR-001, and roadmap.md. Per the PR objectives, all 9 cross-references are verified. Ensure that during merge, all referenced files are committed and that the doc paths are correct (e.g.,
./adr/adr-001-react-fastapi-architecture.mdon line 354).Confirm all cross-referenced documents (project-vision.md, roadmap.md, adr/adr-001-react-fastapi-architecture.md) are included in this PR and paths are accurate.
| - [Project Vision & Scope](./project-vision.md): Problem statement, target users, scope boundaries, success metrics | ||
| - [Technical Specification](./tech-spec.md): Architecture, API endpoints, data model, security, performance | ||
| - [Development Roadmap](./roadmap.md): Phase overview with user stories and dependencies | ||
| - [Architecture Decisions](./adr/): ADR-001 (React + FastAPI architecture) |
There was a problem hiding this comment.
Fix unrecognized relative link causing markdown linting failure.
Line 415 references ./adr/ which the linter cannot resolve. Update to point to the actual ADR-001 file or create an ADR directory index.
- - [Architecture Decisions](./adr/)
+ - [Architecture Decisions](./adr/adr-001-react-fastapi-architecture.md)Alternatively, if you'd like a directory index pattern, create docs/planning/adr/README.md and update the link accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - [Architecture Decisions](./adr/): ADR-001 (React + FastAPI architecture) | |
| - [Architecture Decisions](./adr/adr-001-react-fastapi-architecture.md): ADR-001 (React + FastAPI architecture) |
🤖 Prompt for AI Agents
In docs/planning/PROJECT-PLAN.md around line 415, the relative link
"[Architecture Decisions](./adr/): ADR-001 (React + FastAPI architecture)" is
unresolvable by the linter; update the link to point to the actual ADR file (for
example ./adr/ADR-001-React-FastAPI.md) or create a directory index at
docs/planning/adr/README.md and change the link to ./adr/ or ./adr/README.md
accordingly so the linter can resolve it.
| ## 4. API Specification | ||
|
|
||
| - **Python**: 3.12 | ||
| - **Package Manager**: UV | ||
| - **Linter**: Ruff | ||
| - **Type Checker**: BasedPyright | ||
| - **Testing**: pytest | ||
| - **Container**: Docker | ||
| ### Endpoints | ||
|
|
||
| ## What This Document Contains | ||
| | Method | Path | Purpose | Auth | | ||
| | ------ | ---- | ------- | ---- | | ||
| | POST | /api/v1/ingest | Upload files | Yes (JWT) | | ||
| | GET | /api/v1/batch/{id} | Get batch status | Yes | | ||
| | GET | /api/v1/batch/{id}/jobs | List jobs | Yes | | ||
| | GET | /api/v1/job/{id} | Get job details | Yes | | ||
| | GET | /api/v1/job/{id}/result | Download result | Yes | | ||
| | POST | /api/v1/batch/{id}/handoff | Trigger vector store handoff | Yes | | ||
| | WS | /ws/batch/{id} | Real-time status | Yes (token param) | | ||
| | GET | /health | Health check | No | | ||
| | GET | /metrics | Prometheus metrics | No | | ||
|
|
||
| ### Request/Response Examples | ||
|
|
||
| **POST /api/v1/ingest**: | ||
|
|
||
| Request (multipart/form-data): | ||
|
|
||
| ```http | ||
| Cf-Access-Jwt-Assertion: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9... | ||
| Content-Type: multipart/form-data | ||
|
|
||
| files=<binary> | ||
| priority=high | ||
| target_vector_store=qdrant-prod | ||
| ``` | ||
|
|
||
| Response: | ||
|
|
||
| ```json | ||
| { | ||
| "batch_id": "550e8400-e29b-41d4-a716-446655440000", | ||
| "created_by": "user@example.com", | ||
| "status": "queued", | ||
| "total_files": 1, | ||
| "jobs": [{ | ||
| "job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", | ||
| "filename": "document.pdf", | ||
| "classification": "born_digital_pdf", | ||
| "routed_to": "doc_processing", | ||
| "status": "queued" | ||
| }], | ||
| "websocket_url": "wss://host/ws/batch/550e8400-e29b-41d4-a716-446655440000" | ||
| } | ||
| ``` | ||
|
|
||
| Once generated, this document will include: | ||
| **WebSocket** (`/ws/batch/{id}?cf_access_token={JWT}&last_event_id={N}`): | ||
|
|
||
| - **Technology Stack**: Complete tech choices with versions | ||
| - **Architecture**: Component diagram and responsibilities | ||
| - **Data Model**: Entity definitions and relationships | ||
| - **API Specification**: Endpoints, request/response formats | ||
| - **Security**: Auth, authorization, data protection | ||
| - **Error Handling**: Strategy and error codes | ||
| - **Performance Requirements**: Targets and measurement | ||
| ```json | ||
| { | ||
| "event_id": 42, | ||
| "type": "job_update", | ||
| "job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", | ||
| "status": "completed", | ||
| "result_url": "/api/v1/job/7c9e6679-7425-40de-944b-e07fc1f90ae7/result" | ||
| } | ||
| ``` |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Cloudflare Access JWT header name Cf-Access-Jwt-Assertion
💡 Result:
Yes — Cloudflare Access attaches the application JWT in the request header named Cf-Access-Jwt-Assertion (also available as the CF_Authorization cookie). Validate that JWT (RS256) against your team’s public keys at https://.cloudflareaccess.com/cdn-cgi/access/certs and prefer validating the Cf-Access-Jwt-Assertion header over the cookie. [1][2]
Sources:
- Cloudflare docs — Validate JWTs / Authorization cookie. [1] https://developers.cloudflare.com/cloudflare-one/access-controls/applications/http-apps/authorization-cookie/validating-json/
- Cloudflare docs — Authorization cookie / Application token. [2] https://developers.cloudflare.com/cloudflare-one/identity/authorization-cookie/
🌐 Web query:
Cloudflare Access WebSocket authentication token parameter cf_access_token
💡 Result:
Short answer: Cloudflare Access does not expose a supported "cf_access_token" query parameter for WebSocket authentication. Access authenticates via the CF_Authorization cookie (JWT) or by request headers (e.g., Cf-Access-Token / Cf-Access-Client-Id & Cf-Access-Client-Secret for service tokens). For WebSocket clients that cannot set arbitrary headers (browsers), Cloudflare examples use the Sec‑WebSocket‑Protocol (or a custom header when possible) to carry an auth token — not a documented cf_access_token query parameter. [1][2][3]
Sources
- Cloudflare Access: CF_Authorization cookie / JWT behavior. [1]
- Cloudflare blog: WebSocket auth via header or sec-websocket-protocol for browser clients. [2]
- cloudflared / Access client behavior: uses Cf-Access-Token and service token headers for auth. [3]
Fix unsupported WebSocket authentication parameter.
The JWT header Cf-Access-Jwt-Assertion (line 186) is correct per Cloudflare Access documentation. However, the WebSocket endpoint (line 215) uses an unsupported cf_access_token query parameter. Cloudflare Access does not expose a documented cf_access_token query parameter. Instead, authenticate WebSockets via the CF_Authorization cookie (JWT) or by passing the JWT in request headers (e.g., Authorization header or Sec-WebSocket-Protocol). Update the WebSocket URL example to reflect one of these supported authentication methods.
🤖 Prompt for AI Agents
In docs/planning/tech-spec.md around lines 165 to 225, the WebSocket example
uses an unsupported cf_access_token query parameter; remove the query param and
show a supported authentication pattern instead — e.g., use the bare URL
wss://host/ws/batch/{id} and state that clients must authenticate by sending the
Cloudflare Access JWT in the CF_Authorization cookie or by including the JWT in
the WebSocket request headers (Authorization or Sec-WebSocket-Protocol); update
the websocket_url and example note to reflect one of these supported methods and
remove the cf_access_token example.
|
|
||
| - **At Rest**: Docker volume permissions (0600), Redis password, AOF encryption | ||
| - **In Transit**: TLS 1.3 (Cloudflare Tunnel), internal Docker network unencrypted | ||
| - **Sensitive Data**: JWT never logged, email logged in `created_by` only |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Document internal Docker network encryption as Phase 2 security improvement.
The note on line 242—"internal Docker network unencrypted"—is an acceptable MVP trade-off (single-machine Docker Compose is lower-risk than distributed Kubernetes), but this should be explicitly captured in Phase 2 enhancements and the risk register (referenced in the roadmap). Suggest adding a brief comment: # Internal Docker network unencrypted: Phase 2 improvement pending distributed deployment.
🤖 Prompt for AI Agents
In docs/planning/tech-spec.md around lines 241 to 244, the doc currently marks
the internal Docker network as unencrypted; capture this as a Phase 2 security
improvement by adding a short clarifying note and registering the risk: add the
comment "# Internal Docker network unencrypted: Phase 2 improvement pending
distributed deployment" next to the "internal Docker network unencrypted" line,
and append a brief entry to the Phase 2 enhancements / risk register referencing
this pending encryption work and its mitigation timeline.
| ## 8. Testing Strategy | ||
|
|
||
| ### Coverage | ||
|
|
||
| - **Minimum**: 80% (enforced by pytest --cov-fail-under=80) | ||
| - **Critical paths**: 100% (file routing, auth, queue) | ||
|
|
||
| ### Test Types | ||
|
|
||
| - **Unit**: File detection, PDF classification, job queue (fakeredis), React components | ||
| - **Integration**: Upload → queue → worker → WebSocket broadcast, mocked pipeline APIs (respx) | ||
| - **E2E**: Playwright tests against staging with real Cloudflare Access | ||
|
|
||
| ### Fixtures | ||
|
|
||
| - Sample files: PDFs (scanned/born-digital), images, audio, Office docs | ||
| - Mock pipelines with success/failure/timeout scenarios | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Testing strategy is comprehensive; consider fixture inventory in Phase 0 checklist.
The 80% minimum coverage requirement (line 292) with 100% for critical paths (file routing, auth, queue) is well-motivated. Fixtures are specified (sample PDFs, mock pipelines), but ensure these are inventoried and committed to the repo during Phase 0 infrastructure setup so that Phase 1 development has test data ready.
Add a Phase 0 task: "Commit sample fixtures (scanned/born-digital PDFs, images, audio, Office docs) to tests/fixtures/."
🧰 Tools
🪛 LanguageTool
[style] ~304-~304: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...es, audio, Office docs - Mock pipelines with success/failure/timeout scenarios ## 9. Deploy...
(EN_WORDINESS_PREMIUM_WITH_SUCCESS)
🤖 Prompt for AI Agents
In docs/planning/tech-spec.md around lines 288 to 305, add a Phase 0 checklist
item to commit the sample test fixtures referenced (scanned and born-digital
PDFs, images, audio, Office docs) into the repository under tests/fixtures/ so
Phase 1 has ready test data; update the "Fixtures" or "Phase 0" section to
include the task text: "Commit sample fixtures (scanned/born-digital PDFs,
images, audio, Office docs) to tests/fixtures/."
| ### Fixtures | ||
|
|
||
| - Sample files: PDFs (scanned/born-digital), images, audio, Office docs | ||
| - Mock pipelines with success/failure/timeout scenarios |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Minor: Consider tightening phrasing on line 304.
"Mock pipelines with success/failure/timeout scenarios" — per LanguageTool, "with success" is slightly wordy. Consider: "Mock pipelines for success/failure/timeout scenarios" for brevity.
🧰 Tools
🪛 LanguageTool
[style] ~304-~304: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...es, audio, Office docs - Mock pipelines with success/failure/timeout scenarios ## 9. Deploy...
(EN_WORDINESS_PREMIUM_WITH_SUCCESS)
🤖 Prompt for AI Agents
In docs/planning/tech-spec.md around line 304, tighten the phrasing by replacing
"Mock pipelines with success/failure/timeout scenarios" with "Mock pipelines for
success/failure/timeout scenarios" to improve brevity and clarity.
| ## 9. Deployment | ||
|
|
||
| ### Docker Compose | ||
|
|
||
| ```yaml | ||
| services: | ||
| cloudflared: | ||
| image: cloudflare/cloudflared:latest | ||
| command: tunnel run | ||
| environment: | ||
| TUNNEL_TOKEN: ${CLOUDFLARE_TUNNEL_TOKEN} | ||
|
|
||
| gateway: | ||
| build: ./gateway | ||
| environment: | ||
| CLOUDFLARE_TEAM_DOMAIN: ${TEAM} | ||
| CLOUDFLARE_AUDIENCE_TAG: ${AUDIENCE} | ||
| REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379 | ||
| volumes: | ||
| - upload-data:/data/uploads | ||
| - result-data:/data/results | ||
|
|
||
| worker: | ||
| build: ./gateway | ||
| command: rq worker high default low | ||
| volumes: | ||
| - upload-data:/data/uploads | ||
| - result-data:/data/results | ||
|
|
||
| rag-ui: | ||
| build: ./rag-ui | ||
|
|
||
| redis: | ||
| image: redis:7-alpine | ||
| command: redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes | ||
|
|
||
| prometheus: | ||
| image: prom/prometheus:latest | ||
| volumes: | ||
| - ./prometheus.yml:/etc/prometheus/prometheus.yml | ||
|
|
||
| grafana: | ||
| image: grafana/grafana:latest | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Docker Compose provides MVP structure; add service dependencies and health checks for Phase 1 production readiness.
The Docker Compose configuration covers all required services (cloudflared, gateway, worker, rag-ui, redis, prometheus, grafana), but the MVP version lacks:
- depends_on: Worker and gateway should explicitly declare Redis dependency (lines 318–333).
- Health checks: Redis (line 338) and gateway (line 318) should include health checks for orchestration.
- Port mappings: Prometheus and Grafana lack port mappings (likely intentional if only accessed via tunnel, but should be explicit).
These omissions are acceptable for MVP (Phase 0 infra setup), but recommend adding to Phase 1 production-readiness checklist.
Apply this diff to add service dependencies (essential for production):
worker:
build: ./gateway
command: rq worker high default low
+ depends_on:
+ redis:
+ condition: service_healthy
volumes:
- upload-data:/data/uploads
- result-data:/data/results
gateway:
build: ./gateway
environment:
CLOUDFLARE_TEAM_DOMAIN: ${TEAM}
CLOUDFLARE_AUDIENCE_TAG: ${AUDIENCE}
REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379
+ depends_on:
+ redis:
+ condition: service_healthy
volumes:
- upload-data:/data/uploads
- result-data:/data/results
redis:
image: redis:7-alpine
command: redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes
+ healthcheck:
+ test: ["CMD", "redis-cli", "ping"]
+ interval: 10s
+ timeout: 5s
+ retries: 3PR #26 review (Important #1 and #2). #1 (bypass identity mismatch): verify_ws_token returned {"email": "anonymous@local", "user_id": "local"} in dev-bypass mode, while CloudflareAuthMiddleware._get_bypass_user returned email="dev@localhost" / user_id="dev-user-001". After batch_is_owned_by was introduced, a batch created over the HTTP path (creator = dev@localhost) could no longer be subscribed to over WebSocket (caller = anonymous@local) — ownership check failed. Align verify_ws_token to return the same identity the HTTP path uses, and (Suggested #4) emit the same CRITICAL log so production misconfiguration is loud on every WS upgrade too. #2 (unhandled network errors): verify_cloudflare_token can raise httpx.HTTPStatusError / ConnectError / TimeoutException and json.JSONDecodeError when the JWKS fetch fails. These aren't subclasses of jwt.InvalidTokenError, so they previously propagated out of verify_ws_token, closing the socket with 1011 (internal error) instead of 1008 (policy violation). The HTTP middleware already handles this with a broad except at cloudflare.py:145-147; add the matching fallback in the WS path. #6 (PII in success log): swap user_email out of the success-path WebSocket connection log for parity with the redacted unauthorized path. Tests: - Updated test_verify_ws_token_bypass_mode (test_websocket.py) + test_verify_ws_token_auth_disabled (test_websocket_router.py) to assert the dev@localhost / dev-user-001 identity. - New test_verify_ws_token_swallows_network_errors covering httpx.ConnectError → returns None. Docs: - SECURITY-FINDINGS.md §6 (Summary of fixes): note that sonarcloud.yml has been migrated on main (#27, 4055d57) so the SHA-pin from this PR's first commit will be overwritten on merge. - SECURITY-FINDINGS.md follow-ups: track tightening the batch_is_owned_by email fallback once all batches have user_id. 334 passed, 1 skipped (pre-existing). ruff/format clean.
* fix(security): close batch IDOR, harden CORS, pin CI actions
- api/batch.py, websocket/router.py: require ownership (created_by_user_id /
created_by_email) on GET /batch/{id}, GET /batch/job/{id}, and the
/ws/batch/{id} upgrade; non-owners get 404 / WS_1008_POLICY_VIOLATION so
batch existence is not leaked.
- core/config.py, main.py: read CORS origins from
RAG_PROCESSOR_CORS_ALLOWED_ORIGINS; refuse to start on "*" with
credentials; replace allow_methods/allow_headers wildcards with explicit
lists.
- auth/cloudflare.py: log CRITICAL on every request while bypass mode
(cloudflare_enabled=false) is active so a prod misconfig is loud.
- workflows/sonarcloud.yml: pin sonarsource/sonarqube-quality-gate-action
from @master to v1.2.0 SHA; add harden-runner to both jobs.
- workflows/dependency-review.yml: pin actions/dependency-review-action
from @v4 to v4.9.0 SHA; add harden-runner.
- SECURITY-FINDINGS.md: full review with severities and follow-ups.
* test(security): cover batch/WS ownership branches; fix ruff format
- ruff format ran across the new ownership condition in
websocket/router.py; commit the formatter output.
- New tests/unit/test_batch_authz.py exercises _user_owns_batch and the
owner / non-owner / missing-batch / missing-job / orphaned-job branches of
GET /batch/{batch_id} and GET /batch/job/{job_id}.
- Added a WebSocket non-owner test alongside the existing
rejects-unauthenticated / rejects-invalid-batch cases.
Total coverage 81.12% -> 82.76%. api/batch.py coverage 57% -> ~95%,
websocket/router.py 91% -> 96%. SonarCloud "new code coverage" should
now clear the 80% gate.
* ci: re-trigger checks after org workflow update
* security(cors): drop hardcoded http:// dev defaults, fail closed
SonarCloud python:S5332 (correctly) flagged the previous
cors_allowed_origins default list of `http://localhost:*` URLs as
insecure plaintext origins baked into production source. The defaults
also weakened the secure-by-default posture the rest of the PR was
aiming for: a misconfigured prod deploy would silently trust dev
origins.
- core/config.py: default cors_allowed_origins to an empty list. Empty
combined with the existing wildcard guard in main.py means the
backend fails closed; CORS preflights are denied until the operator
sets RAG_PROCESSOR_CORS_ALLOWED_ORIGINS explicitly.
- .env.example: document the recommended local-dev value so reviewers
/ new contributors aren't stuck guessing what the dev origins are.
- SECURITY-FINDINGS.md §2.2: updated to reflect the new default and
to note the SonarCloud rule that's now resolved.
* test(cors): set dev origins in conftest after empty-default change
The previous commit moved cors_allowed_origins to empty-by-default
(SonarCloud S5332 + secure-by-default), which broke the existing
TestCORSHeaders.test_cors_headers_present_on_get integration test:
the test sends Origin: http://localhost:3000 and asserts the
Access-Control-Allow-Origin header comes back, but with no origins
configured the middleware (correctly) returns no CORS headers.
CORSMiddleware reads allow_origins at app construction, so the env
var has to be set in conftest.py before the app module is imported -
the same pattern already used for RAG_PROCESSOR_RATE_LIMITING_ENABLED
and RAG_PROCESSOR_CLOUDFLARE_ENABLED.
* style(batch): use Annotated[] form for FastAPI Depends parameters
SonarCloud python:S8410 flagged the legacy
`user: CloudflareUser = Depends(get_current_user)` form on the two
endpoints added in this PR. The Annotated[] form is FastAPI's
recommended pattern and (per Sonar) avoids subtle issues with mutable
default values in function signatures.
No behavior change; tests/unit/test_batch_authz.py still passes its
mocked user via the same kwarg name.
* security(authz): consolidate ownership check + redact PII from logs
Addresses 6 CodeRabbit review comments on PR #26.
1. (security) src/rag_processor/websocket/router.py: the WS ownership
check used `owns_by_id OR owns_by_email`, so a caller whose email
coincidentally matched the batch owner's email could authenticate
as the owner even with a different Cloudflare user_id. The REST
path in api/batch.py had the correct user_id-takes-precedence
logic; the two paths could not diverge silently.
Extract the helper to auth/dependencies.py::batch_is_owned_by
(keyword-only requester_user_id / requester_email so both the REST
path with a CloudflareUser and the WS path with a verify-token
dict can call it). Both endpoints now share the same enforcement.
2. (privacy) api/batch.py + websocket/router.py: unauthorized-access
warnings previously logged owner_email, requester_email, and the
requester user_id. owner_email lets an attacker probing batch IDs
harvest owner identities from logs; requester_email is PII we
don't need for incident response when we already have
requester_user_id. Strip both fields; keep batch_id / job_id /
requester_user_id only.
3. (tests) tests/unit/test_batch_authz.py: add
test_user_id_mismatch_does_not_fall_back_to_email regression
covering the bypass described above.
4. (tests) tests/unit/test_websocket_router.py: add the WebSocket
variant — same email, mismatched user_id, assert close with
WS_1008_POLICY_VIOLATION and connection_manager.connect never
called.
5. (style) SECURITY-FINDINGS.md §"Summary of fixes applied in this
PR": replace the wide markdown table with a bulleted list so no
line exceeds the project's 120-char limit.
Tests: 333 passed, 1 skipped (pre-existing). ruff check + format
both clean.
* security(ws): align bypass identity + catch JWKS network errors
PR #26 review (Important #1 and #2).
#1 (bypass identity mismatch): verify_ws_token returned
{"email": "anonymous@local", "user_id": "local"}
in dev-bypass mode, while CloudflareAuthMiddleware._get_bypass_user
returned email="dev@localhost" / user_id="dev-user-001". After
batch_is_owned_by was introduced, a batch created over the HTTP
path (creator = dev@localhost) could no longer be subscribed to
over WebSocket (caller = anonymous@local) — ownership check failed.
Align verify_ws_token to return the same identity the HTTP path
uses, and (Suggested #4) emit the same CRITICAL log so production
misconfiguration is loud on every WS upgrade too.
#2 (unhandled network errors): verify_cloudflare_token can raise
httpx.HTTPStatusError / ConnectError / TimeoutException and
json.JSONDecodeError when the JWKS fetch fails. These aren't
subclasses of jwt.InvalidTokenError, so they previously propagated
out of verify_ws_token, closing the socket with 1011 (internal
error) instead of 1008 (policy violation). The HTTP middleware
already handles this with a broad except at cloudflare.py:145-147;
add the matching fallback in the WS path.
#6 (PII in success log): swap user_email out of the success-path
WebSocket connection log for parity with the redacted unauthorized
path.
Tests:
- Updated test_verify_ws_token_bypass_mode (test_websocket.py) +
test_verify_ws_token_auth_disabled (test_websocket_router.py) to
assert the dev@localhost / dev-user-001 identity.
- New test_verify_ws_token_swallows_network_errors covering
httpx.ConnectError → returns None.
Docs:
- SECURITY-FINDINGS.md §6 (Summary of fixes): note that
sonarcloud.yml has been migrated on main (#27, 4055d57) so the
SHA-pin from this PR's first commit will be overwritten on merge.
- SECURITY-FINDINGS.md follow-ups: track tightening the
batch_is_owned_by email fallback once all batches have user_id.
334 passed, 1 skipped (pre-existing). ruff/format clean.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Summary
Complete project planning documentation suite for RAG Processor WebUI, including:
Documents Generated (9 total)
Core Planning Documents
project-vision.md (8.1 KB)
tech-spec.md (12 KB)
roadmap.md (12 KB)
adr/adr-001-react-fastapi-architecture.md (20 KB)
Synthesized Plan
Detailed Phase Plans (Sprint-Level)
phase-0-foundation.md (15 KB)
phase-1-mvp-core.md (24 KB)
phase-2-enhancement.md (12 KB)
phase-3-polish.md (14 KB)
Sprint Breakdown Summary
Buffer: 52 hours (22% of 240 available hours)
Quality Validation
How to Use
Start Development:
During Development:
Next Steps After Merge
git checkout -b feat/phase-0-foundationRelated Issues
None (initial planning)
Checklist
🤖 Generated with Claude Code
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.