Handbook: downloadable RealUnit legal documents (PDF + DOCX) as derived export #658

@TaprootFreak

Goal

Add a "Rechtsdokumente — Downloads" section to the handbook (handbook.realunit.app/de/) that exposes RealUnit's own in-app legal documents as downloadable PDF and DOCX files, generated as a derived export of the repository's Markdown sources — the exact same upstream/downstream model the store-listing and mail sections already use.

This is the permanent, single-source-of-truth answer to David Lehner's repeated request (29.05 + 02.06) for the legal texts "als editierbares File (docx o.ä.)" ("as an editable file, docx or similar") for legal review. Instead of a one-off email attachment, the handbook becomes the canonical place where the current texts are always downloadable in DE/EN, as PDF and DOCX.

Scope (decided)

In scope — exactly the 3 RealUnit documents that are rendered in-app from repo Markdown:

| Document | Source assets | App ARB title key | Title (de) |
|---|---|---|---|
| Datenschutzbestimmungen | assets/legal/privacy_policy_<lang>.md | legalDisclaimerCheckboxPrivacyPolicy | Datenschutzbestimmungen |
| Nutzungsbedingungen | assets/legal/terms_of_use_<lang>.md | termsOfUse | Nutzungsbedingungen |
| Registrierungsvereinbarung | assets/legal/registration_agreement_<lang>.md | legalDisclaimerCheckboxRegistrationAgreement | Registrierungsvereinbarung |

  • Languages: every language variant present in the repo — do not hardcode de/en. Discover by globbing assets/legal/<base>_*.md. Today that is de + en → 3 docs × 2 langs = 6 sources → 6 PDF + 6 DOCX = 12 files. A future _fr.md must appear automatically with no code change.
  • Formats: PDF + DOCX only. No HTML download.
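
The language discovery and the file count above can be sketched in a few lines of Python. This is a minimal illustration, not the planned generator: the function names `discover_languages` and `expected_outputs` are hypothetical, and only the three base names and the `assets/legal/<base>_*.md` glob pattern come from this issue.

```python
from pathlib import Path

# The three in-scope document bases named in the scope table.
BASES = ["privacy_policy", "terms_of_use", "registration_agreement"]


def discover_languages(legal_dir: Path, base: str) -> list[str]:
    """Return the sorted language codes found for one base via glob."""
    prefix = f"{base}_"
    return sorted(p.stem[len(prefix):] for p in legal_dir.glob(f"{base}_*.md"))


def expected_outputs(legal_dir: Path) -> list[str]:
    """List every PDF/DOCX file name the export should produce."""
    names = []
    for base in BASES:
        for lang in discover_languages(legal_dir, base):
            for ext in ("pdf", "docx"):
                names.append(f"{base}_{lang}.{ext}")
    return names
```

With de and en present for all three bases, `expected_outputs` yields 12 names; dropping a `privacy_policy_fr.md` into the directory adds two more without touching the code.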

Explicitly OUT of scope (do not include):

  • DFX documents (DfxDocumentsConfig → docs.dfx.swiss/...): external partner docs.
  • Aktionariat documents (AktionariatDocumentsConfig → aktionariat.com/...): external platform docs.
  • The 5 externally-hosted RealUnit corporate documents in LegalDocumentsConfig.informationalDocuments (EU securities prospectuses, CH stock-exchange prospectus, articles of association, investment regulations). These live only as official/signed PDFs on realunit.ch/realunit.de and have no Markdown source in the repo, so they cannot be a derived export. (See "Open points" for an optional follow-up.)

Rationale for the boundary: only these 3 are rendered in-app from repo-local Markdown (LegalDocumentPage reads assets/legal/<base>_<lang>.md via rootBundle; terms_of_use is wired through router_config.dart → termsOfUse). Everything else is an external link. Generating a derived export only makes sense where the repo is the source.

How this fits the existing handbook architecture (verified)

The handbook image (Dockerfile.handbook) is a multi-stage nginx static host. Two precedents already implement "derived export from repo":

  1. Screenshots: the screenshots-builder stage runs scripts/assemble-handbook-screenshots.sh → PNGs → /usr/share/nginx/html/screenshots/. The PNGs are git-ignored (generated only in the image).
  2. Store-listing: the store-listing-builder stage runs scripts/assemble-handbook-store-listing.py, which (a) copies assets into /out/... and (b) rewrites the <!-- BEGIN:store-listing --> / <!-- END:store-listing --> block in docs/handbook/de/index.html in place. The rewritten index.html is COPY-ed over the verbatim one in the final stage. A sync gate in handbook-build-check.yaml re-runs the generator and fails if git diff docs/handbook/de/index.html is non-empty (works only because the generator is pure-stdlib and deterministic).

This feature follows the same model, with one critical split dictated by determinism:

  • The HTML block (the list of download links + repo source links) is deterministic → committed into index.html, sync-gated. Generated by a pure-stdlib Python script.
  • The PDF/DOCX binaries are produced by pandoc, whose output is NOT deterministic (embedded timestamps, tool-version metadata, UUIDs). They must therefore be treated like the screenshots: generated only inside the image, git-ignored, never committed, never sync-gated.
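
The deterministic half of that split comes down to an idempotent marker-block replacement. The sketch below shows one way to do it with the standard library only. The function name `replace_block` is hypothetical, and the existing store-listing generator may implement this differently; only the two marker strings are taken from this issue.

```python
import re

BEGIN = "<!-- BEGIN:legal-downloads -->"
END = "<!-- END:legal-downloads -->"


def replace_block(page: str, rendered: str) -> str:
    """Swap whatever sits between the markers for `rendered`."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(page):
        raise ValueError("legal-downloads markers not found")
    replacement = f"{BEGIN}\n{rendered}\n{END}"
    # A function replacement avoids backslash/group interpretation in `rendered`.
    return pattern.sub(lambda _match: replacement, page, count=1)
```

Because the output depends only on the page and the rendered rows, running it twice gives the same bytes, which is what lets a `git diff` sync gate work.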

Implementation

1. New generator — scripts/assemble-handbook-legal.py (pure stdlib, deterministic)

Mirror the structure of scripts/assemble-handbook-store-listing.py (same <output-dir> arg convention, same marker-block rewrite, same _PLACEHOLDER template approach).

Responsibilities:

  1. Discover the document set: for each base in ["privacy_policy", "terms_of_use", "registration_agreement"], glob assets/legal/<base>_*.md to find available languages. Error out if a base has zero languages.
  2. Resolve titles by reading assets/languages/strings_<lang>.arb (JSON, stdlib json) and looking up the mapped ARB key per base (table above). This keeps the handbook titles in lockstep with the in-app titles. Fall back to the de title if a language's ARB lacks the key.
  3. Render scripts/templates/legal-downloads.html.tmpl into the <!-- BEGIN:legal-downloads --> / <!-- END:legal-downloads --> block in docs/handbook/de/index.html (idempotent, in place). For each (base, lang):
    • link to ../legal/<base>_<lang>.pdf and ../legal/<base>_<lang>.docx (paths relative to /de/, matching how ../store/... and ../screenshots/... work);
    • a link to the Markdown source: https://github.com/RealUnitCH/app/blob/develop/assets/legal/<base>_<lang>.md (develop, per the precedent set by the store-listing source links — see PR fix(handbook): store-listing source links point at develop, not main #650).
    • HTML-escape every interpolated value (titles come from ARB, treat as untrusted text).
  4. Does NOT call pandoc and does NOT emit PDFs/DOCX. It only writes the HTML block. This keeps it deterministic and sync-gateable.
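
Step 2 above (title lookup with a de fallback) could look roughly like the following. This is a sketch under assumptions: the helper names `load_arb` and `resolve_title` are hypothetical, and treating a missing ARB file like a missing key is a choice made here for illustration, not something this issue specifies.

```python
import json
from pathlib import Path

# Base name -> ARB key, as listed in the scope table.
TITLE_KEYS = {
    "privacy_policy": "legalDisclaimerCheckboxPrivacyPolicy",
    "terms_of_use": "termsOfUse",
    "registration_agreement": "legalDisclaimerCheckboxRegistrationAgreement",
}


def load_arb(languages_dir: Path, lang: str) -> dict:
    """Read strings_<lang>.arb as JSON; assumption: a missing file means no keys."""
    path = languages_dir / f"strings_{lang}.arb"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_title(languages_dir: Path, base: str, lang: str) -> str:
    """Look up the in-app title for (base, lang), falling back to the de title."""
    key = TITLE_KEYS[base]
    title = load_arb(languages_dir, lang).get(key)
    if title is None and lang != "de":
        title = load_arb(languages_dir, "de").get(key)
    if title is None:
        raise KeyError(f"no title for {base!r} (ARB key {key!r})")
    return title
```

Failing loudly when even the de title is missing keeps a broken ARB from silently producing an untitled row in the committed block.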

Header section/intro copy for the rendered block (mirror the store-listing wording): a short "Live aus Repo" badge + sentence explaining that assets/legal/*.md is the single source of truth, the in-app LegalDocumentPage and these downloads render from the same files, and any change goes through a PR on those Markdown files.

2. New template — scripts/templates/legal-downloads.html.tmpl

A <details id="spec-legal-downloads" class="spec"> section consistent with the existing spec styling. One sub-card per document, each listing its languages with PDF + DOCX buttons and the source link. Use the same placeholder syntax ({{ name }}) the store-listing template uses. The generator fills a repeating block per (base, lang) — implement by building the rows in Python and substituting a single {{ rows }} placeholder (the store-listing template is a fixed layout; here the row set is dynamic, so generate the rows in code).
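
Building the rows in code and substituting a single {{ rows }} placeholder might be sketched as follows. The `<li>` markup and the names `render_row` / `render_template` are placeholders invented for this example; the real row markup should follow the existing spec styling. The link targets and the escaping rule are the ones stated in this issue.

```python
from html import escape

# Source-link prefix from the generator responsibilities (develop branch).
SOURCE_URL = "https://github.com/RealUnitCH/app/blob/develop/assets/legal/"


def render_row(base: str, lang: str, title: str) -> str:
    """Render one (base, lang) row; every interpolated value is HTML-escaped."""
    stem = f"{base}_{lang}"
    source = SOURCE_URL + stem + ".md"
    return (
        f"<li>{escape(title)} ({escape(lang)}): "
        f'<a href="../legal/{escape(stem)}.pdf">PDF</a> '
        f'<a href="../legal/{escape(stem)}.docx">DOCX</a> '
        f'<a href="{escape(source)}">Markdown</a></li>'
    )


def render_template(template: str, entries: list[tuple[str, str, str]]) -> str:
    """Fill the single {{ rows }} placeholder with one row per entry."""
    rows = "\n".join(render_row(base, lang, title) for base, lang, title in entries)
    return template.replace("{{ rows }}", rows)
```

Passing entries in a fixed order (for example sorted by base, then language) keeps the rendered block stable across runs.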

3. PDF/DOCX builder — scripts/build-legal-downloads.sh (pandoc, image-only)

scripts/build-legal-downloads.sh <output-dir>

For every assets/legal/<base>_<lang>.md where base ∈ {privacy_policy, terms_of_use, registration_agreement}:

  • pandoc "$md" -o "<out>/legal/<base>_<lang>.docx" (native DOCX writer)
  • pandoc "$md" --pdf-engine=weasyprint -o "<out>/legal/<base>_<lang>.pdf"

Engine decision: weasyprint (Alpine weasyprint package) — HTML/CSS-based, avoids pulling a full TeX Live (~hundreds of MB) for PDF. A minimal RealUnit-branded CSS (logo, brand colour, A4 margins) via --css is a nice-to-have, not MVP-blocking — plain default styling is acceptable for v1. DOCX branding via --reference-doc is likewise an optional later polish.

(If a single Python script is preferred over a bash + python split, the builder may instead be a second mode of the generator, but keep the deterministic HTML-block writing and the non-deterministic pandoc invocation as separate entry points so the sync gate only ever runs the deterministic part.)
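
If the Python variant is chosen, the pandoc entry point could be sketched like this. The names `pandoc_commands` and `build_all` are hypothetical; the two pandoc invocations mirror the ones listed above, and the code assumes `pandoc` and `weasyprint` are on the PATH inside the build stage.

```python
import subprocess
from pathlib import Path

BASES = ["privacy_policy", "terms_of_use", "registration_agreement"]


def pandoc_commands(md: Path, out_dir: Path) -> list[list[str]]:
    """Return the DOCX and PDF pandoc invocations for one Markdown source."""
    target = out_dir / "legal" / md.stem
    return [
        ["pandoc", str(md), "-o", f"{target}.docx"],
        ["pandoc", str(md), "--pdf-engine=weasyprint", "-o", f"{target}.pdf"],
    ]


def build_all(legal_dir: Path, out_dir: Path) -> None:
    """Non-deterministic entry point: run only inside the image, never in the sync gate."""
    (out_dir / "legal").mkdir(parents=True, exist_ok=True)
    for base in BASES:
        for md in sorted(legal_dir.glob(f"{base}_*.md")):
            for cmd in pandoc_commands(md, out_dir):
                subprocess.run(cmd, check=True)
```

Keeping command construction in a pure function means it can be unit-tested without pandoc installed, while `build_all` stays the only part that touches the external tools.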

4. Dockerfile.handbook — new legal-docs-builder stage (chained after store-listing)

The store-listing stage already rewrites index.html. To avoid a clobber, chain the legal stage on top of the store-listing output so both blocks end up in the final index.html:

FROM alpine:3.20 AS legal-docs-builder
WORKDIR /work
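# bash is required by build-legal-downloads.sh (Alpine ships only ash by default).
# Verify the exact pandoc package name for the chosen Alpine release before merging.
RUN apk add --no-cache bash python3 pandoc weasyprint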
COPY scripts/assemble-handbook-legal.py ./scripts/
COPY scripts/build-legal-downloads.sh ./scripts/
COPY scripts/templates/legal-downloads.html.tmpl ./scripts/templates/
COPY assets/legal/ ./assets/legal/
COPY assets/languages/ ./assets/languages/
# Take the store-listing-rewritten index.html as input so BOTH blocks survive.
COPY --from=store-listing-builder /out/index.html ./docs/handbook/de/index.html
RUN python3 ./scripts/assemble-handbook-legal.py /out \
 && bash ./scripts/build-legal-downloads.sh /out \
 && cp ./docs/handbook/de/index.html /out/index.html

Final nginx stage: add after the existing store-listing copies, and make the legal stage's index.html the authoritative one:

COPY --from=legal-docs-builder /out/legal/ /usr/share/nginx/html/legal/
COPY --from=legal-docs-builder /out/index.html /usr/share/nginx/html/de/index.html

(The legal-docs-builder heavyweight layers — pandoc/weasyprint — stay in the build stage; only the small generated PDFs/DOCX and the final index.html are copied into the runtime image, so the served image does not bloat.)

5. docs/handbook/de/index.html (committed)

Carries the <!-- BEGIN:legal-downloads --> / <!-- END:legal-downloads --> block written by the generator in step 1. The block is committed and sync-gated, as described under the determinism split above.

6. handbook.nginx.conf — DOCX content type / download disposition

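try_files already serves /legal/.... nginx's stock mime.types maps both .pdf and .docx (the latter to application/vnd.openxmlformats-officedocument.wordprocessingml.document), so the content type is correct as long as handbook.nginx.conf includes that file; without a mapping, .docx would fall back to application/octet-stream, which still downloads fine. To force a download rather than inline rendering, add a small location block: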

location ~* \.(pdf|docx)$ {
    add_header Content-Disposition "attachment" always;
    # restate the server-level security headers (add_header is all-or-nothing per location)
    add_header X-Content-Type-Options "nosniff" always;
    add_header Cache-Control "private, no-store" always;
}

(Keep it behind the existing Basic-Auth gate — these are internal review documents, same audience as the rest of the handbook.)

7. .gitignore

Add docs/handbook/legal/ (the local-preview output dir), consistent with the already-ignored docs/handbook/screenshots/ and docs/handbook/mails/.

8. .github/workflows/handbook-build-check.yaml — extend the sync gate

  • Add to paths: scripts/assemble-handbook-legal.py, scripts/build-legal-downloads.sh, scripts/templates/legal-downloads.html.tmpl, assets/legal/**, assets/languages/**.
  • Add a step mirroring the existing store-listing sync check:
    python3 scripts/assemble-handbook-legal.py /tmp/legal-out
    if ! git diff --quiet docs/handbook/de/index.html; then
      echo "::error::docs/handbook/de/index.html legal-downloads block is stale — re-run scripts/assemble-handbook-legal.py and commit."
      git diff docs/handbook/de/index.html; exit 1
    fi
    
    Run this after the store-listing sync step (or run both generators then a single diff), so the chained block state matches.
  • Extend the container smoke test: assert a representative legal download is present (not 404), e.g.:
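    for f in legal/privacy_policy_de.pdf legal/privacy_policy_de.docx legal/terms_of_use_en.pdf; do
      code=$(curl -s -o /dev/null -w '%{http_code}' -u "x:x" "http://127.0.0.1:8080/$f")
      # if/fi instead of `[ ... ] && { ... }`: the short-circuit form leaves a non-zero
      # status behind when the file IS present, which fails the step if the loop runs last.
      if [ "$code" = "404" ]; then echo "$f missing"; docker logs handbook; exit 1; fi
    done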
    

Acceptance criteria

  • handbook.realunit.app/de/ shows a new "Rechtsdokumente" (nav L) section listing the 3 RealUnit documents, each with PDF + DOCX download links per available language and a link to its .md source.
  • Downloads resolve (200 behind auth): /legal/<base>_<lang>.pdf and .docx for all (base, lang) found in assets/legal/.
  • PDF and DOCX content matches the current Markdown (same text the app renders).
  • No DFX and no Aktionariat documents appear in the section.
  • Adding a hypothetical assets/legal/privacy_policy_fr.md makes a French row appear with no code change (language discovery is by glob).
  • PDFs/DOCX are git-ignored and exist only in the built image (not committed).
  • The committed index.html legal block is in sync (the new gate is green).
  • handbook-build-check.yaml builds the image, the container smoke test passes including the legal-download presence checks.

Validation plan

  1. Local: python3 scripts/assemble-handbook-legal.py docs/handbook then bash scripts/build-legal-downloads.sh docs/handbook, open docs/handbook/de/index.html, verify links + rendered PDFs/DOCX.
  2. docker build -f Dockerfile.handbook -t realunit-handbook:legal .; run it; curl -u x:x http://127.0.0.1:8080/legal/terms_of_use_de.docx -o /tmp/t.docx and open in Word to confirm it is a valid, editable document (this is the artifact David needs).
  3. Confirm the PR's Handbook Build Check passes.

Branch / workflow

Per repo conventions (CONTRIBUTING.md): feature branch → PR against staging (not develop), open as Draft. Use the 3-subagent review loop. The handbook deploy (handbook-deploy.yaml, develop→DEV→PRD) picks the change up automatically once promoted.

Open points (non-blocking)

  • Registration agreement also exists as official signed PDFs on realunit.ch (_registrationAgreementPdfUrls in legal_documents_config.dart). The generated PDF here reflects the in-app Markdown text, which may differ from the signed original. Decide later whether to additionally surface the signed realunit.ch PDF as a separate link in that document's card. Not required for v1.
  • Corporate documents (prospectuses, articles of association, investment regulations) could later get their own "extern" sub-block that simply links to the realunit.ch/realunit.de URLs already in LegalDocumentsConfig — read-only links, no derived export. Out of scope here.
  • Branding of PDF/DOCX (logo, brand colours via --css / --reference-doc) is a later polish; v1 may ship default pandoc styling.
