Pr audit test hardening by MrChengLen · Pull Request #27 · MrChengLen/FileMorph

MrChengLen · 2026-05-08T21:57:31Z

No description provided.

…oarding

Three connected fixes addressing the post-audit Doc-A findings (H5 api-reference.md staleness, M2 MAX_UPLOAD inconsistency) and the PM-Agent's R3 (.env.example structure). Aimed at making the Self-Hoster's first 30 minutes work without trial-and-error: a Technology-First / developer-discovery investment.

1. app/core/config.py: MAX_UPLOAD_SIZE_MB default 2000 → 100 MB

   The .env.example, api-reference.md, and self-hosting.md all said the default was 100 MB; the code default was 2000 MB (2 GB). With zero real users yet, the canonical value can move freely. 100 MB is sane for unconfigured Self-Hosters (avoids OOM-by-default), matches the quota tiers in app/core/quotas.py for the anonymous tier, and tracks the docs that were already published. Operators with bigger payloads override via env-var.

2. .env.example: sectioned by deployment edition

   Reorganised into four labelled sections so a self-hoster reading top-down hits only the keys their deployment needs:

   - Required for every deployment (host/port, API_KEYS_FILE, MAX_UPLOAD, CORS, APP_BASE_URL, optional API_BASE_URL split)
   - Cloud-overlay (JWT_SECRET, DATABASE_URL, Stripe, SMTP, PRICING_PAGE_ENABLED); empty values keep features off
   - Compliance-Edition tunables (AUDIT_FAIL_CLOSED, RETENTION_HOURS)
   - Operational knobs (METRICS_ENABLED, sweep cadence, concurrency cap)

   Variables that were missing from the example (JWT_SECRET, the Stripe keys, SMTP fields) are now visible as commented-out entries with purpose notes, so a Self-Hoster who wants to enable accounts can see the exact set of env-vars to set without grep-hunting through code.

3. docs/api-reference.md: append, do not rewrite

   Existing single-file structure preserved. Added:

   - Authentication: explicit two-scheme table (X-API-Key for Community / scripts; Authorization: Bearer for Cloud overlay). Login / refresh examples for the JWT path. Token placeholder syntax (<access-token>) chosen so static-analysis tools don't mis-flag the example as a leaked secret.
   - Cloud-Edition endpoints summary: /api/v1/auth/*, /api/v1/keys, /api/v1/billing/*, each as a one-line entry with auth requirement and purpose. Avoids re-documenting schema; defers to the auto-generated Swagger UI at /docs for request bodies.
   - Batch endpoints: /api/v1/convert/batch + /api/v1/compress/batch with their multipart shape and 200/422 semantics.
   - Response Headers section: X-Output-SHA256 (every conversion), X-Data-Classification (BSI taxonomy echo), X-FileMorph-Achieved-Bytes / X-FileMorph-Final-Quality (target_size_kb path), Retry-After (503 path).
   - Error Responses: added 403, 415, 503 with semantic notes.
   - Rate Limiting table now includes /ready and the billing endpoints.

4. .githooks/pre-{commit,push}: allow .env.example

   The hook's SECRET_ASSIGN regex correctly catches lines like `JWT_SECRET=...`, but `.env.example` is by definition the place to show those keys with placeholder values for self-hosters. Added `\.env\.example` to ALLOW_RE so legitimate documentation updates to that file aren't blocked.

Verified: 473 tests passing, ruff clean, drift-check unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
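For readers who want to picture the default-plus-override behaviour described in item 1, here is a minimal stdlib-only sketch. It is not the project's app/core/config.py, whose actual implementation is not shown on this page; the function name `max_upload_size_mb` and the assumption that the environment variable is spelled `MAX_UPLOAD_SIZE_MB` are illustrative.

```python
import os

# Default taken from the commit message: 100 MB when nothing is configured.
DEFAULT_MAX_UPLOAD_SIZE_MB = 100


def max_upload_size_mb(env=None):
    """Return the upload cap in MB, honouring an env-var override.

    `env` is any mapping of strings; it defaults to os.environ.
    The variable name MAX_UPLOAD_SIZE_MB is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("MAX_UPLOAD_SIZE_MB", "").strip()
    if not raw:
        # Unset or empty value: fall back to the conservative default.
        return DEFAULT_MAX_UPLOAD_SIZE_MB
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("MAX_UPLOAD_SIZE_MB must be a positive integer")
    return value
```

Under these assumptions, an unconfigured deployment gets 100, while an operator who sets the variable to 2000 restores the previous 2 GB ceiling.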

…6/M7/M9/M10

Phase 3 of the post-audit remediation plan (logical-beaming-brooks.md). Pure regression coverage, with zero behaviour change in production code. The codebase already satisfies every assertion; this PR keeps it that way.

Findings closed
---------------

H3: tests/test_billing_consent.py

Two new tests pin the SHA-256 hash-chain end-to-end:

test_audit_event_chain_intact_across_two_writes asserts verify_chain returns None after two real /billing/checkout writes. Catches a regression where a future refactor switches the canonical-JSON serialiser, the hashing primitive, or the chaining order: events would still record, but verify_chain would no longer detect tampering.

test_audit_event_chain_detects_payload_tampering mutates one row's payload_json after the fact and asserts verify_chain returns that row's id. Pins the property that record_hash binds the payload.

Without these guards, a silent break in dispute reproducibility (BORA §50, BeurkG §39a, ISO 27001 A.12.4.1) would only surface at audit time.

H4: tests/test_hook_allowlist_regression.py (NEW)

60 parametrized cases across the three regexes shared by .githooks/pre-commit and .githooks/pre-push (ALLOW_RE, FORBIDDEN_PATHS, INTERNAL_PATHS). Pins:

- 17 paths that MUST be allowed (locale/*.po, address-bearing legal templates, public DPA template, .env.example, ...).
- 4 application files that must NOT be allowed (content-pattern scans must run on app code).
- 8 ops-only paths that must be FORBIDDEN (compose.prod.yml, deploy.sh, runbooks/, docs-internal/, root CLAUDE.md, ...).
- 14 internal-doc paths that must redirect to docs-internal/ (admin-cockpit, email-setup, marketing-plan, ...).
- drift-check that pre-commit and pre-push regexes stay identical (otherwise --no-verify defeats the local hook AND the pre-push backstop scans different rules).

Without this guard, dropping `locale/.*` from ALLOW_RE silently blocks every i18n update on every developer's machine, and the developer blames their content, not the regex.
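To illustrate the property the H3 tests pin, the following is a self-contained sketch of a SHA-256 hash chain over canonical JSON. It is not FileMorph's implementation: the row layout, the `GENESIS` value, and the helper names `canonical`, `record_hash`, and `append_event` are hypothetical. Only the `verify_chain` contract (return None when intact, otherwise the id of the offending row) is taken from the commit message.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first row


def canonical(payload):
    """Serialise a payload deterministically (sorted keys, no whitespace)."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def record_hash(prev_hash, payload_json):
    """Bind a row's payload to the hash of the row before it."""
    return hashlib.sha256((prev_hash + payload_json).encode("utf-8")).hexdigest()


def append_event(rows, payload):
    """Append a new audit row chained to the last one (in-memory stand-in for a table)."""
    prev = rows[-1]["record_hash"] if rows else GENESIS
    payload_json = canonical(payload)
    rows.append({
        "id": len(rows) + 1,
        "payload_json": payload_json,
        "prev_hash": prev,
        "record_hash": record_hash(prev, payload_json),
    })


def verify_chain(rows):
    """Return None if the chain is intact, else the id of the first bad row."""
    prev = GENESIS
    for row in rows:
        expected = record_hash(prev, row["payload_json"])
        if row["prev_hash"] != prev or row["record_hash"] != expected:
            return row["id"]
        prev = row["record_hash"]
    return None
```

In this sketch, two appended events verify cleanly, and editing any row's `payload_json` afterwards makes `verify_chain` return that row's id, which mirrors the two assertions described for the new tests.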
M10: tests/test_billing_consent.py (existing tests amended)

test_checkout_*_with_acknowledgement_records_audit_event now pin rows[0].actor_ip == "testclient" (TestClient default client host). Without this, a future commit dropping `request.client.host` from the audit-event recorder would still pass, but Compliance Edition customers would lose dispute reproducibility (no IP attribution).

M6: tests/test_public_pages_reachability.py

test_enterprise_de_renders_authoritative_german now pins <html lang="de" (locale resolution) AND `DSGVO or Behörden` (DSGVO is the German GDPR label, untranslatable in EN). Either drift independently breaks the test, so a copy edit "Behörden" → "Verwaltung" no longer slips through silently.

M7: tests/test_public_pages_reachability.py

test_impressum_en_has_preamble_then_german now asserts text.index(preamble) < text.index("Verantwortlich"). A template inversion (DE body above EN preamble) would still satisfy a presence-only check but breaks the document's purpose.

M9: tests/test_i18n.py

Four new parametrized assertions on /de/<page> (privacy, terms, impressum, security) that pin a stable DE-only marker per page. The 200-status smoke above passes even when messages.mo is missing, corrupt, or out-of-sync, because gettext silently falls back to the EN msgid. With this guard, a corrupt catalog surfaces as a hard failure rather than a silent regression to English.

Verification
------------

pytest tests/test_i18n.py tests/test_public_pages_reachability.py tests/test_billing_consent.py tests/test_hook_allowlist_regression.py → 115 passed
pytest tests/ → 539 passed, 15 skipped (no regressions)
ruff check + ruff format --check → clean

Out of scope (deferred to follow-up)
------------------------------------

- L4: /de/dashboard auth-gated content assertion.
- L5: drop-zone hidden initial state assertion.
- Phase 2 doc fixes (M1 Caddyfile syntax, M3 UFW order, H6 docs/email-setup.md decision): separate PR.
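The silent fallback that M9 guards against can be reproduced with the standard library alone. The sketch below assumes the catalog is loaded with `fallback=True`, which is one common way to get this behaviour; how FileMorph actually loads its catalog is not shown here, and the `translate` helper and the "Privacy Policy" msgid are illustrative.

```python
import gettext
import tempfile


def translate(msgid, localedir, language="de"):
    """Look up msgid in the 'messages' domain, tolerating a missing catalog."""
    catalog = gettext.translation(
        "messages",
        localedir=localedir,
        languages=[language],
        fallback=True,  # no messages.mo found: return NullTranslations, not an error
    )
    return catalog.gettext(msgid)


# An empty directory stands in for a deployment whose messages.mo is missing.
empty_localedir = tempfile.mkdtemp()
result = translate("Privacy Policy", empty_localedir)
```

Here `result` is the untranslated English msgid, and no exception is raised. A status-code-only smoke test cannot tell this apart from a working German page, which is why pinning a DE-only marker per page turns the failure into a visible one.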

MrChengLen and others added 2 commits May 8, 2026 22:56

MrChengLen merged commit d484744 into main May 8, 2026
4 checks passed

MrChengLen merged 2 commits into main from pr-audit-test-hardening
