PR audit test hardening #27
Merged
…oarding
Three connected fixes addressing the post-audit Doc-A findings (H5
api-reference.md staleness, M2 MAX_UPLOAD inconsistency) and the
PM-Agent's R3 (.env.example structure), plus a supporting git-hook
change (item 4) that lets the .env.example update pass the secret
scan. Aimed at making the Self-Hoster's first 30 minutes work without
trial-and-error: a Technology-First / developer-discovery investment.
1. app/core/config.py — MAX_UPLOAD_SIZE_MB default 2000 → 100 MB
The .env.example, api-reference.md, and self-hosting.md all said the
default was 100 MB; the code default was 2000 MB (2 GB). With zero
real users yet, the canonical value can move freely. 100 MB is sane
for unconfigured Self-Hosters (avoids OOM-by-default), matches the
quota tiers in app/core/quotas.py for the anonymous tier, and tracks
the docs that were already published. Operators with bigger payloads
override via env-var.
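A minimal sketch of the override behaviour described above, assuming the
setting is read from an environment variable named MAX_UPLOAD_SIZE_MB. The
helper max_upload_size_mb is hypothetical; the real app/core/config.py may
use a settings library instead.

```python
import os

DEFAULT_MAX_UPLOAD_SIZE_MB = 100  # new default in this change (was 2000)


def max_upload_size_mb(environ=None):
    """Return the upload cap in MB: the env override if set, else the default.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    raw = env.get("MAX_UPLOAD_SIZE_MB", "").strip()
    if not raw:
        # Unset or empty: fall back to the documented default.
        return DEFAULT_MAX_UPLOAD_SIZE_MB
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("MAX_UPLOAD_SIZE_MB must be a positive integer")
    return value
```

An operator with bigger payloads would set, for example,
MAX_UPLOAD_SIZE_MB=2000 in the environment to restore the old limit.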
2. .env.example — sectioned by deployment edition
Reorganised into four labelled sections so a self-hoster reading
top-down hits only the keys their deployment needs:
- Required for every deployment (host/port, API_KEYS_FILE,
MAX_UPLOAD, CORS, APP_BASE_URL, optional API_BASE_URL split)
- Cloud-overlay (JWT_SECRET, DATABASE_URL, Stripe, SMTP,
PRICING_PAGE_ENABLED) — empty values keep features off
- Compliance-Edition tunables (AUDIT_FAIL_CLOSED, RETENTION_HOURS)
- Operational knobs (METRICS_ENABLED, sweep cadence,
concurrency cap)
Variables that were missing from the example (JWT_SECRET, the Stripe
keys, SMTP fields) are now visible as commented-out entries with
purpose notes — a Self-Hoster who wants to enable accounts can see
the exact set of env-vars to set without grep-hunting through code.
3. docs/api-reference.md — append, do not rewrite
Existing single-file structure preserved. Added:
- Authentication: explicit two-scheme table (X-API-Key for
Community / scripts; Authorization: Bearer for Cloud overlay).
Login / refresh examples for the JWT path. Token placeholder
syntax (<access-token>) chosen so static-analysis tools don't
mis-flag the example as a leaked secret.
- Cloud-Edition endpoints summary: /api/v1/auth/*, /api/v1/keys,
/api/v1/billing/* — each as a one-line entry with auth
requirement and purpose. Avoids re-documenting schema; defers to
the auto-generated Swagger UI at /docs for request bodies.
- Batch endpoints: /api/v1/convert/batch + /api/v1/compress/batch
with their multipart shape and 200/422 semantics.
- Response Headers section: X-Output-SHA256 (every conversion),
X-Data-Classification (BSI taxonomy echo),
X-FileMorph-Achieved-Bytes / X-FileMorph-Final-Quality
(target_size_kb path), Retry-After (503 path).
- Error Responses: added 403, 415, 503 with semantic notes.
- Rate Limiting table now includes /ready and the billing
endpoints.
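As an illustration of how a client might use the X-Output-SHA256 header
listed above, here is a sketch that assumes the header carries the
hex-encoded SHA-256 digest of the response body. That encoding is an
assumption, not something this change states; output_matches_header is a
hypothetical helper.

```python
import hashlib
import hmac


def output_matches_header(body, headers):
    """Compare the body's SHA-256 with the X-Output-SHA256 header value.

    Assumes a hex-encoded digest and a plain dict of headers keyed with
    this exact capitalisation (real HTTP clients usually expose
    case-insensitive header mappings).
    """
    expected = headers.get("X-Output-SHA256", "").strip().lower()
    actual = hashlib.sha256(body).hexdigest()
    # Constant-time comparison; also returns False when the header is missing.
    return hmac.compare_digest(expected, actual)
```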
4. .githooks/pre-{commit,push} — allow .env.example
The hook's SECRET_ASSIGN regex correctly catches lines like
`JWT_SECRET=...`, but `.env.example` is by definition the place to
show those keys with placeholder values for self-hosters. Added
`\.env\.example` to ALLOW_RE so legitimate documentation updates
to that file aren't blocked.
Verified: 473 tests passing, ruff clean, drift-check unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…6/M7/M9/M10
Phase 3 of the post-audit remediation plan (logical-beaming-brooks.md).
Pure regression coverage — zero behaviour change in production code. The
codebase already satisfies every assertion; this PR keeps it that way.
Findings closed
---------------
H3 — tests/test_billing_consent.py
Two new tests pin the SHA-256 hash-chain end-to-end:
test_audit_event_chain_intact_across_two_writes asserts verify_chain
returns None after two real /billing/checkout writes. Catches a
regression where a future refactor switches the canonical-JSON
serialiser, the hashing primitive, or the chaining order — events
would still record, but verify_chain would no longer detect tampering.
test_audit_event_chain_detects_payload_tampering mutates one row's
payload_json after-the-fact and asserts verify_chain returns that
row's id. Pins the property that record_hash binds the payload.
Without these guards, a silent break in dispute reproducibility (BORA
§50, BeurkG §39a, ISO 27001 A.12.4.1) would only surface at audit
time.
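The property these two tests pin can be sketched as follows. This is a
simplified in-memory model, not the project's audit code: the names
payload_json, record_hash and verify_chain come from the description above,
while prev_hash, GENESIS, canonical_json and append_event are assumptions
made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash value for the first row


def canonical_json(payload):
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def record_hash(prev_hash, payload_json):
    # Each hash binds both the previous record and this row's payload.
    return hashlib.sha256((prev_hash + payload_json).encode("utf-8")).hexdigest()


def append_event(rows, payload):
    prev = rows[-1]["record_hash"] if rows else GENESIS
    payload_json = canonical_json(payload)
    rows.append({
        "id": len(rows) + 1,
        "payload_json": payload_json,
        "prev_hash": prev,
        "record_hash": record_hash(prev, payload_json),
    })


def verify_chain(rows):
    """Return None if the chain is intact, else the id of the first bad row."""
    prev = GENESIS
    for row in rows:
        expected = record_hash(prev, row["payload_json"])
        if row["prev_hash"] != prev or row["record_hash"] != expected:
            return row["id"]
        prev = row["record_hash"]
    return None


rows = []
append_event(rows, {"event": "checkout", "plan": "pro"})
append_event(rows, {"event": "checkout", "plan": "team"})
intact_result = verify_chain(rows)

# Mutate one row's payload after the fact, as the tampering test does.
rows[0]["payload_json"] = canonical_json({"event": "checkout", "plan": "free"})
tampered_result = verify_chain(rows)
```

In this model intact_result is None and tampered_result is the id of the
mutated row, which mirrors the two assertions described above.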
H4 — tests/test_hook_allowlist_regression.py (NEW)
60 parametrized cases across the three regexes shared by
.githooks/pre-commit and .githooks/pre-push (ALLOW_RE,
FORBIDDEN_PATHS, INTERNAL_PATHS). Pins:
- 17 paths that MUST be allowed (locale/*.po, address-bearing
legal templates, public DPA template, .env.example, ...).
- 4 application files that must NOT be allowed (content-pattern
scans must run on app code).
- 8 ops-only paths that must be FORBIDDEN (compose.prod.yml,
deploy.sh, runbooks/, docs-internal/, root CLAUDE.md, ...).
- 14 internal-doc paths that must redirect to docs-internal/
(admin-cockpit, email-setup, marketing-plan, ...).
- drift-check that pre-commit and pre-push regexes stay
identical (otherwise --no-verify defeats the local hook AND
the pre-push backstop scans different rules).
Without this guard, dropping `locale/.*` from ALLOW_RE silently
blocks every i18n update on every developer's machine — the
developer blames their content, not the regex.
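A reduced sketch of the two ideas in this guard: table-driven allowlist
cases and a drift check between the two hooks. Everything concrete here is
assumed for illustration: the pattern is a tiny stand-in for the real
ALLOW_RE, the hooks are assumed to assign it as NAME='regex', and Python's
re is used in place of whatever regex dialect the hooks run. The real test
uses pytest parametrization; a plain case table keeps this self-contained.

```python
import re

# Hypothetical stand-in; the real ALLOW_RE has many more alternatives.
PATTERN = r"^(locale/.*\.po|\.env\.example)$"

# Assumed hook layout: a shell assignment of the form NAME='regex'.
PRE_COMMIT = f"#!/bin/sh\nALLOW_RE='{PATTERN}'\n"
PRE_PUSH = f"#!/bin/sh\n# backstop\nALLOW_RE='{PATTERN}'\n"


def extract_pattern(hook_text, name):
    """Pull the single-quoted regex assigned to `name` out of a hook script."""
    match = re.search(rf"^{re.escape(name)}='([^']*)'", hook_text, re.MULTILINE)
    if match is None:
        raise LookupError(f"{name} not found in hook")
    return match.group(1)


def is_allowed(path, pattern):
    return re.search(pattern, path) is not None


# (path, must_be_allowed) pairs; paths chosen for illustration.
CASES = [
    ("locale/de/LC_MESSAGES/messages.po", True),
    (".env.example", True),
    ("app/core/config.py", False),
    (".env", False),
]


def failing_cases(pattern):
    return [path for path, expected in CASES if is_allowed(path, pattern) != expected]


def hooks_in_sync(name):
    # Drift check: both hooks must carry the identical pattern text.
    return extract_pattern(PRE_COMMIT, name) == extract_pattern(PRE_PUSH, name)
```

Dropping the locale alternative from the pattern makes failing_cases report
the .po path, which is the failure mode the paragraph above describes.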
M10 — tests/test_billing_consent.py (existing tests amended)
test_checkout_*_with_acknowledgement_records_audit_event now pin
rows[0].actor_ip == "testclient" (TestClient default client host).
Without this, a future commit dropping `request.client.host` from
the audit-event recorder would still pass — but Compliance Edition
customers would lose dispute reproducibility (no IP attribution).
M6 — tests/test_public_pages_reachability.py
test_enterprise_de_renders_authoritative_german now pins
<html lang="de" (locale resolution) AND `DSGVO or Behörden`
(DSGVO is the German GDPR label, untranslatable in EN). Either
drift independently breaks the test — copy edit "Behörden" →
"Verwaltung" no longer slips through silently.
M7 — tests/test_public_pages_reachability.py
test_impressum_en_has_preamble_then_german now asserts
text.index(preamble) < text.index("Verantwortlich"). A template
inversion (DE body above EN preamble) would still satisfy a
presence-only check but breaks the document's purpose.
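The difference between a presence-only check and the ordering check can be
shown with a small sketch. The PREAMBLE string and both HTML snippets are
invented for illustration; only the "Verantwortlich" marker and the
index comparison come from the description above.

```python
def has_both(text, first, second):
    """Presence-only check: passes regardless of order."""
    return first in text and second in text


def appears_before(text, first, second):
    """Ordering check: both markers present and `first` starts earlier."""
    return has_both(text, first, second) and text.index(first) < text.index(second)


PREAMBLE = "This legal notice is provided in German"  # hypothetical preamble text
correct = f"<p>{PREAMBLE}</p><h2>Verantwortlich</h2>"
inverted = f"<h2>Verantwortlich</h2><p>{PREAMBLE}</p>"
```

Both snippets satisfy has_both, but only the first satisfies
appears_before, so a template inversion fails the stricter assertion.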
M9 — tests/test_i18n.py
Four new parametrized assertions on /de/<page> (privacy, terms,
impressum, security) that pin a stable DE-only marker per page.
The 200-status smoke test above passes even when messages.mo is missing,
corrupt, or out-of-sync — gettext silently falls back to the EN
msgid. With this guard, a corrupt catalog surfaces as a hard
failure rather than silent regression to English.
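The silent fallback can be reproduced with Python's standard gettext
module. This sketch assumes the application loads its catalog in a way
that behaves like gettext.translation(..., fallback=True); the msgid and
the DE marker string are made up for illustration.

```python
import gettext
import tempfile

MSGID = "Privacy Policy"    # hypothetical EN msgid
DE_MARKER = "Datenschutz"   # hypothetical DE-only marker

with tempfile.TemporaryDirectory() as empty_localedir:
    # No messages.mo exists under this directory, so with fallback=True
    # gettext returns a NullTranslations object instead of raising.
    translations = gettext.translation(
        "messages", localedir=empty_localedir, languages=["de"], fallback=True
    )
    rendered = translations.gettext(MSGID)
    fell_back = type(translations) is gettext.NullTranslations

# The page would still render (a 200 smoke test passes), but in English.
marker_present = DE_MARKER in rendered
```

Here rendered equals the EN msgid and marker_present is False, so an
assertion on a DE-only marker fails where a status-code check would not.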
Verification
------------
pytest tests/test_i18n.py tests/test_public_pages_reachability.py
tests/test_billing_consent.py tests/test_hook_allowlist_regression.py
→ 115 passed
pytest tests/ → 539 passed, 15 skipped (no regressions)
ruff check + ruff format --check → clean
Out of scope (deferred to follow-up)
------------------------------------
- L4 — /de/dashboard auth-gated content assertion.
- L5 — drop-zone hidden initial state assertion.
- Phase 2 doc fixes (M1 Caddyfile syntax, M3 UFW order, H6
docs/email-setup.md decision) — separate PR.
No description provided.