feat(backup): configuration backup service (collect ingest + backend UI API)#81

Open
edospadoni wants to merge 35 commits into main from feat/backup-service

Conversation


@edospadoni edospadoni commented Apr 21, 2026

Preview link

Summary

Adds a first-class configuration-backup subsystem for NS8 and NethSecurity clients, replacing the legacy external backupd.nethesis.it. Clients stream a GPG-encrypted snapshot of their configuration to collect; users consume it (list / download / delete) from backend with the usual Logto JWT and RBAC.

Storage is bring-your-own S3: the code targets any S3-compatible bucket (DigitalOcean Spaces, AWS S3, Cloudflare R2, Backblaze B2, MinIO, Garage). MY never holds the plaintext — the ciphertext is produced on the client before upload.

Architecture

┌──────────────┐ GPG blob, Basic auth  ┌──────────────┐   PutObject    ┌──────────────┐
│    Client    │ ────────────────────► │   collect    │ ──────────────►│   S3 bucket  │
│ (NS8 / NSEC) │ POST /systems/backups │  (ingest)    │                └──────▲───────┘
└──────────────┘                       └──────────────┘                       │ presigned GET
                                                                              │
                                                                      ┌──────────────┐
                                                                      │   backend    │
                                                                      │  (UI reads)  │
                                                                      └──────────────┘

S3 object key layout — one prefix per system, recognisable at a glance in any bucket browser:

{org_id}/{system_key}/{backup_id}.{ext}

backup_id is a UUIDv7 (time-ordered); system_key is the stable user-facing NETH-… identifier the client authenticates with.
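For illustration, a minimal Go sketch of assembling that layout. The helper name, the simple filepath.Ext split, and the fallback handling are assumptions; the PR only fixes the layout, the UUIDv7 choice, and (in a later hardening commit) the .bin fallback for unknown suffixes.

package main

import (
	"fmt"
	"path/filepath"
	"strings"

	"github.com/google/uuid"
)

// objectKey builds the {org_id}/{system_key}/{backup_id}.{ext} layout described
// above. Hypothetical helper; it only mirrors the documented layout.
func objectKey(orgID, systemKey, filename string) (string, error) {
	id, err := uuid.NewV7() // time-ordered, so keys sort by ingest time
	if err != nil {
		return "", err
	}
	ext := strings.TrimPrefix(filepath.Ext(filename), ".")
	if ext == "" {
		ext = "bin" // unknown suffixes collapse to .bin per a later hardening commit in this PR
	}
	return fmt.Sprintf("%s/%s/%s.%s", orgID, systemKey, id.String(), ext), nil
}

func main() {
	key, _ := objectKey("org-123", "NETH-00042", "dump.json.gz.gpg")
	fmt.Println(key) // org-123/NETH-00042/<uuidv7>.gpg
}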

What changed

collect (ingest + client-facing read)

  • POST /api/systems/backups — stream body to S3 with SHA-256 io.TeeReader, metadata reconciled via same-key CopyObject, inline retention enforcement under a Redis SET NX lock, per-system ingest rate limit (upload path sketched after this list).
  • GET /api/systems/backups — list own backups (paginated; never truncated at the S3 1000-item response cap).
  • GET /api/systems/backups/:id — download own backup (via presigned URL for external clients; cross-tenant isolation enforced by S3 prefix).
  • Cross-service auth invalidation bus on Redis pub/sub (my:auth:invalidate): when backend regenerates a system secret, collect drops the in-memory + Redis auth cache within ~30 ms.
  • Per-system rate limit (6/min, 429 + Retry-After) and per-system / per-org aggregate quotas.
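A hedged sketch of the upload path referenced in the first bullet, against aws-sdk-go-v2: the request body streams through an io.TeeReader feeding a SHA-256 hasher, and the digest is attached afterwards with a same-key CopyObject. The function name, signature, and error handling are illustrative, not the PR's actual handler.

package methods

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// storeBackup streams the body to S3 while hashing it, then rewrites the
// object metadata with the final digest via a same-key CopyObject. The real
// handler also deletes the object and returns 502 if the CopyObject fails.
func storeBackup(ctx context.Context, s3c *s3.Client, bucket, key string, body io.Reader, size int64) (string, error) {
	h := sha256.New()
	tee := io.TeeReader(body, h) // hash while streaming; the blob is never buffered

	if _, err := s3c.PutObject(ctx, &s3.PutObjectInput{
		Bucket:               aws.String(bucket),
		Key:                  aws.String(key),
		Body:                 tee,
		ContentLength:        aws.Int64(size),
		ServerSideEncryption: types.ServerSideEncryptionAes256,
	}); err != nil {
		return "", err
	}

	sum := hex.EncodeToString(h.Sum(nil))

	// The digest is only known after the stream completes, so a same-key copy
	// replaces the object metadata in place.
	_, err := s3c.CopyObject(ctx, &s3.CopyObjectInput{
		Bucket:            aws.String(bucket),
		Key:               aws.String(key),
		CopySource:        aws.String(bucket + "/" + key),
		Metadata:          map[string]string{"sha256": sum},
		MetadataDirective: types.MetadataDirectiveReplace,
	})
	return sum, err
}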

backend (UI-facing read + GDPR)

  • GET /api/systems/:id/backups — list with size, SHA-256, uploader IP, aggregate quota counters.
  • GET /api/systems/:id/backups/:backup_id/download — issues a short-lived presigned URL (TTL capped server-side at 15 minutes; presign sketched after this list).
  • DELETE /api/systems/:id/backups/:backup_id — deletes one backup.
  • purgeSystemBackups runs inline on DestroySystem for GDPR Article 17 (right to erasure); destroy is refused if the purge fails, so the operator can retry.
  • Same RBAC as GET /systems/:id — a user sees a system's backups only if their org covers the system creator in the Owner → Distributor → Reseller → Customer hierarchy.
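A minimal sketch of the presign step with the 15-minute server-side clamp, using s3.NewPresignClient. The 5-minute default and the cap come from the description; the helper name and plumbing are illustrative.

package storage

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

const maxPresignTTL = 15 * time.Minute // server-side cap described above

// presignDownload issues a short-lived GET URL for a stored backup, clamping
// whatever TTL the configuration asks for.
func presignDownload(ctx context.Context, client *s3.Client, bucket, key string, ttl time.Duration) (string, error) {
	if ttl <= 0 {
		ttl = 5 * time.Minute // documented default
	}
	if ttl > maxPresignTTL {
		ttl = maxPresignTTL
	}
	ps := s3.NewPresignClient(client)
	req, err := ps.PresignGetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	}, s3.WithPresignExpires(ttl))
	if err != nil {
		return "", err
	}
	return req.URL, nil
}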

Storage abstraction

  • Shared storage/s3.go in both services. Runs against any S3-compatible endpoint; backend and collect can use separate credentials scoped to read-only vs. write-only on the same bucket to contain blast radius if one service is ever compromised.

Docs

  • New user-facing page under docs/systems/backups (EN + IT).
  • collect/README.md + backend/README.md cover env setup, bucket layout, split credentials, and end-to-end round-trip commands.
  • AGENTS.md updated with the new endpoint family and key layout.

Client API

Base URL: https://<collect-host> (whatever collect is deployed on — in prod my.nethesis.it).

All three endpoints use HTTP Basic auth with the system_key:system_secret pair returned at system registration.

Upload a backup

curl --user "$SYSTEM_KEY:$SYSTEM_SECRET" \
     -H "Content-Type: application/octet-stream" \
     -H "X-Filename: dump.json.gz.gpg" \
     --data-binary @/path/to/backup.gpg \
     https://<collect-host>/api/systems/backups

201 Created on success:

{
  "code": 201,
  "message": "backup stored",
  "data": {
    "id": "019db047-c5f7-766c-bae6-6f407da3e1c9.gpg",
    "filename": "dump.json.gz.gpg",
    "size": 130,
    "sha256": "b7d974f638ed5d07696dfce118fdbb20e2a447edfb3c79e563742925440c75b0",
    "mimetype": "application/octet-stream",
    "uploaded_at": "2026-04-21T13:43:07.502668Z",
    "uploader_ip": "127.0.0.1"
  }
}

Failure modes the client should handle: 401 (invalid/expired credentials → re-fetch from MY), 413 (body too large → trim or split), 429 Retry-After: N (rate-limited → back off N seconds), 5xx (transient → exponential retry).
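A sketch of a client-side upload loop handling those responses. The endpoint and credentials follow the curl example above; the attempt count and backoff values are assumptions, only the per-status handling mirrors the documented contract.

package backupclient

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"time"
)

// uploadWithRetry posts one encrypted blob and handles the failure modes listed above.
func uploadWithRetry(blob []byte, filename string) error {
	endpoint := os.Getenv("COLLECT_URL") + "/api/systems/backups"
	backoff := 2 * time.Second
	for attempt := 0; attempt < 5; attempt++ {
		req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(blob))
		if err != nil {
			return err
		}
		req.SetBasicAuth(os.Getenv("SYSTEM_KEY"), os.Getenv("SYSTEM_SECRET"))
		req.Header.Set("Content-Type", "application/octet-stream")
		req.Header.Set("X-Filename", filename)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			time.Sleep(backoff)
			backoff *= 2
			continue // network error: treat as transient
		}
		resp.Body.Close()

		switch {
		case resp.StatusCode == http.StatusCreated:
			return nil
		case resp.StatusCode == http.StatusUnauthorized:
			return fmt.Errorf("credentials rejected: re-fetch system secret from MY")
		case resp.StatusCode == http.StatusRequestEntityTooLarge:
			return fmt.Errorf("payload too large: trim or split the snapshot")
		case resp.StatusCode == http.StatusTooManyRequests:
			if s, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
				time.Sleep(time.Duration(s) * time.Second)
				continue // back off exactly as instructed, then retry
			}
			fallthrough
		default: // 5xx and anything unexpected: exponential retry
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return fmt.Errorf("upload failed after retries")
}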

List own backups

curl --user "$SYSTEM_KEY:$SYSTEM_SECRET" \
     https://<collect-host>/api/systems/backups
{
  "code": 200,
  "message": "backups listed",
  "data": {
    "backups": [
      {
        "id": "019db047-c5f7-766c-bae6-6f407da3e1c9.gpg",
        "filename": "dump.json.gz.gpg",
        "size": 130,
        "sha256": "b7d974f638ed5d07696dfce118fdbb20e2a447edfb3c79e563742925440c75b0",
        "uploaded_at": "2026-04-21T13:43:07.322Z",
        "uploader_ip": "127.0.0.1"
      }
    ]
  }
}

Pagination is handled server-side; the client receives the full list.

Download a backup

curl --user "$SYSTEM_KEY:$SYSTEM_SECRET" \
     -o restore.gpg \
     https://<collect-host>/api/systems/backups/019db047-c5f7-766c-bae6-6f407da3e1c9.gpg

Streams the ciphertext body with Content-Type: application/octet-stream. The client decrypts locally with its own GPG passphrase. Cross-tenant reads return 404 (a foreign backup_id simply does not exist under the caller's S3 prefix).

Client-side dedup (optional)

Before uploading, the client can compute a local hash of the just-encrypted blob (MD5 in the backupd-era flow) and compare it with the hash recorded after the previous run. If unchanged, the upload can be skipped to save bandwidth. Retention and quota enforcement live server-side regardless.
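A sketch of that check. The MD5 choice and the state-file approach are carried over from the backupd-era flow and are not mandated by collect.

package backupclient

import (
	"crypto/md5"
	"encoding/hex"
	"os"
)

// shouldUpload reports whether the freshly encrypted blob differs from the
// digest recorded after the previous successful upload. The caller persists
// the returned digest only once the upload has succeeded.
func shouldUpload(blob []byte, statePath string) (upload bool, digest string) {
	sum := md5.Sum(blob)
	digest = hex.EncodeToString(sum[:])
	previous, err := os.ReadFile(statePath)
	if err == nil && string(previous) == digest {
		return false, digest // unchanged since last run: skip the upload
	}
	return true, digest
}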

Deployment notes

New environment variables — identical on backend and collect:

BACKUP_S3_ENDPOINT=https://ams3.digitaloceanspaces.com
BACKUP_S3_REGION=ams3
BACKUP_S3_BUCKET=my-backups
BACKUP_S3_ACCESS_KEY=...
BACKUP_S3_SECRET_KEY=...
BACKUP_S3_USE_PATH_STYLE=false   # true only for local MinIO/Garage
#BACKUP_PRESIGN_TTL=5m            # capped at 15m server-side

Both .env.example and render.yaml are updated. Credentials can be split between backend (read-only) and collect (write-only) on providers that support per-key policies.
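One plausible way to wire those variables into an aws-sdk-go-v2 client. The PR's shared storage/s3.go is not shown here, so the option wiring below is an assumption.

package storage

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// newClient reads the BACKUP_S3_* variables and builds an S3 client.
func newClient(ctx context.Context) (*s3.Client, error) {
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion(envOr("BACKUP_S3_REGION", "us-east-1")), // SDK fallback; ams3 in the Spaces examples
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
			os.Getenv("BACKUP_S3_ACCESS_KEY"), os.Getenv("BACKUP_S3_SECRET_KEY"), "")),
	)
	if err != nil {
		return nil, err
	}
	return s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String(os.Getenv("BACKUP_S3_ENDPOINT"))
		o.UsePathStyle = os.Getenv("BACKUP_S3_USE_PATH_STYLE") == "true" // true only for local MinIO/Garage
	}), nil
}

func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}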

@edospadoni edospadoni deployed to feat/backup-service - my-collect-qa PR #81 April 21, 2026 13:57 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 21, 2026 13:57 — with Render Active
Expose the appliance-facing HTTP surface for configuration backups:

- POST /api/systems/backups   — stream upload with SHA256 tee + retention
- GET  /api/systems/backups   — list own backups with rich metadata
- GET  /api/systems/backups/:id — restore stream for the uploading system

Backups land in a DigitalOcean Spaces bucket (BACKUP_S3_* env) sharing
the S3 account already wired up for Mimir. Per-system retention caps
(BACKUP_MAX_PER_SYSTEM, BACKUP_MAX_SIZE_PER_SYSTEM) are enforced inline
by pruning the oldest objects under the system's prefix after each
upload. SHA256 is attached as object metadata via a same-key CopyObject
once the stream completes.

Object layout: {org_id}/{system_id}/{backup_id}.{ext}

Authentication reuses the existing system_key:system_secret BasicAuth
middleware; the authenticated system_id drives the prefix and is the
only backup surface each appliance can read or write.
Expose the user-facing backup surface on the backend with standard JWT
auth and org-based RBAC:

- GET    /api/systems/:id/backups                        — list backups + aggregate usage counters
- GET    /api/systems/:id/backups/:backup_id/download    — 302 to a short-lived presigned S3 URL
- DELETE /api/systems/:id/backups/:backup_id             — remove a stored backup

Backend reads directly from the same DigitalOcean Spaces bucket used by
collect for ingest (shared S3 account with Mimir, BACKUP_S3_* env).
Downloads are handed off to the browser via presigned URLs so the API
never streams object bodies itself; TTL defaults to 5 minutes and is
configurable via BACKUP_PRESIGN_TTL.

Access control reuses systemsService.GetSystem so a user may only see,
download, or delete backups for systems their organization owns.
Add OpenAPI spec entries for the three backend backup endpoints and a
shared `BackupMetadata` schema describing per-object metadata (id,
filename, size, sha256, uploader ip/ua/version).

Change the download endpoint response from a 302 redirect to a JSON
payload carrying `download_url` and `expires_in_seconds`. Browsers drop
the Authorization header when following 3xx redirects, so the frontend
must fetch the URL with its JWT and then navigate to it explicitly.
Wire the backup storage configuration into both environments for backend
and collect services. All four values are set via the Render dashboard
(sync: false): BACKUP_S3_ENDPOINT, BACKUP_S3_ACCESS_KEY, and
BACKUP_S3_SECRET_KEY reuse the same DigitalOcean Spaces account already
configured for Mimir; BACKUP_S3_BUCKET is environment-specific and
points at a bucket dedicated to appliance configuration backups.
Headers relied on emoji icons (🚀 🌐 💾 🛠️ 🛑 ⚙️) that rendered
inconsistently across terminals and IDEs. Replace them with plain
section labels; the content is unchanged.
Bring up an S3-compatible store on the developer machine so the full
backup round-trip (appliance upload → collect ingest → Garage →
backend list → presigned download) can be exercised without a cloud
provider.

- services/backup/docker-compose.local.yml: single-node Garage plus a
  one-shot bootstrap that assigns the cluster layout, creates the
  bucket, imports a fixed service key, and grants access. Every
  command is idempotent.
- services/backup/garage-local.toml: local-only Garage config with a
  weak RPC secret (the port is never exposed outside the container
  network).
- services/backup/README.md: step-by-step setup, env-var wiring for
  both host-run and container-run components, and inspection commands
  (garage CLI via podman exec, plus aws-cli recipes).

Backend also gains BACKUP_S3_PRESIGN_ENDPOINT (optional). When set, the
presigner signs download URLs with that hostname instead of
BACKUP_S3_ENDPOINT, so browsers can follow a URL generated by a
backend that reaches Garage through a container-internal hostname.
Empty in production; both clients use the same Spaces endpoint.
Shell helper that reproduces the NS8 / NethSecurity upload pipeline
against the locally running collect endpoint: build a minimal JSON
payload, gzip it, encrypt with GPG symmetric AES-256 (same flags the
appliances use), and PUT-as-POST the blob via HTTP Basic auth.

Targets collect at $COLLECT_URL with credentials from $SYSTEM_KEY /
$SYSTEM_SECRET, then inspects Garage with the bundled CLI and (if
available) awscli to confirm the object landed under the right prefix.

Useful for exercising the round-trip end-to-end without standing up an
actual appliance or pulling files off demo-heron2.
Extend the Testing section with a curl example for the new backup
upload endpoint and add a short Appliance integration block that lists
the three endpoints (upload/list/download) appliance teams need when
porting NS8 and NethSecurity to the new collect target.

Mentions client-side MD5 dedup as a useful optimization the appliance
can keep from the backupd-era flow.
Bring the directory in line with services/mimir conventions and promote
it to a first-class component in version.json:

- Add VERSION (0.5.0), Containerfile, .env.example, .render-build-trigger
  for parity with services/mimir (Garage is kept local-only — the
  Containerfile exists for ad-hoc self-hosted deployments, but Render
  provisions no instance and production backups go to DO Spaces).
- Add Makefile mirroring services/mimir targets (dev-up, dev-down,
  dev-restart, dev-logs, dev-status, dev-ready, dev-shell, dev-objects,
  dev-reset) plus a dev-setup that injects BACKUP_S3_* into backend/.env
  and collect/.env, and a test-roundtrip wrapper.
- Restructure the README around topology, quick start, ports table,
  credentials, object layout, retention, and production note, matching
  the shape of services/mimir/README.md.
- List services/backup under the Components block in README.md.
- Register services/backup in version.json and update release.sh so
  future bumps propagate to services/backup/VERSION alongside the other
  components.
- Document the component in AGENTS.md (`### 3.6 Backup`), promote the
  "six first-class components" line to seven, and add the new entry to
  the component README index.
Close the two gaps the conformance audit flagged:

- collect/methods/backups_unit_test.go covers the two pure helpers
  (extractFilename / extractExtension, including path-traversal and
  compound-extension cases) and the three handlers' auth / content-
  length short-circuits.
- backend/methods/backups_unit_test.go pins the JSON shape of
  BackupMetadata and BackupListResponse (the frontend contract) and
  exercises the handlers' guard against missing user context.
- docs/docs/systems/backups.md and its Italian sibling land as draft
  placeholders: the user-facing documentation will be written together
  with the Backups UI; for now the file carries only the frontmatter
  and a comment pointing readers at the server-side references.
…harden local Garage secrets

Three critical findings from the security audit:

1. Path traversal via `backup_id` path parameter. `collect.DownloadBackup`
   and backend `DownloadSystemBackup` / `DeleteSystemBackup` now refuse
   any identifier that does not match a strict UUIDv7 + known-extension
   allowlist, so the segment can never escape the authenticated
   system's S3 prefix. Covered by new unit tests in both components.
   A sketch of such an allowlist check follows this list.

2. Upload integrity was best-effort. `UploadBackup` now (a) refuses
   requests with Content-Length <= 0, (b) counts the bytes actually
   read via a TeeReader wrapping countingReader and rejects uploads
   whose stream diverges from the declared length, and (c) deletes the
   stored object and returns 502 when the final sha256 CopyObject
   fails, instead of silently serving `sha256: "pending"` forever.

3. Garage local dev shipped with a committed RPC secret and admin
   token, and exposed the admin API on 0.0.0.0. The config template
   is now expanded at runtime by entrypoint.sh via envsubst, the
   Containerfile copies the template (not the resolved file), the
   compose file binds 127.0.0.1:13900/13903 only, and
   `make dev-secrets` generates fresh GARAGE_RPC_SECRET /
   GARAGE_ADMIN_TOKEN into services/backup/.env on first boot.
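A sketch of the allowlist check from finding 1. The exact pattern and extension set are illustrative, not the PR's actual regex.

package methods

import "regexp"

// validBackupID accepts only a UUIDv7 followed by a known extension, so the
// path parameter can never contain "/" or ".." and never escapes the
// authenticated system's prefix.
var validBackupID = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\.(gpg|gz|bin)$`)

func isValidBackupID(id string) bool {
	return validBackupID.MatchString(id)
}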
…L, GDPR purge, audit download

Seven high-severity findings addressed:

- Filename/system-version headers are now whitelisted (filenames map to
  [A-Za-z0-9._-] and cap at 255, versions map to [A-Za-z0-9._+-] and cap
  at 64), neutralising CRLF-injection into S3 metadata and stored-XSS
  into the UI via Content-Disposition and list responses.
- uploader_ip is recorded from the true peer address (RemoteAddr),
  not from c.ClientIP(), so a malicious appliance cannot spoof audit
  metadata via X-Forwarded-For.
- Retention no longer uses LastModified. Objects are sorted by key
  (whose leading UUIDv7 embeds the ingest timestamp monotonically),
  and a Redis SET NX lock per system_id serialises concurrent pruning
  so a race cannot delete the just-uploaded backup (sketched after
  this list).
- BACKUP_PRESIGN_TTL is clamped at 15 minutes and a new
  validateBackupEndpoint refuses BACKUP_S3_(PRESIGN_)?ENDPOINT values
  that are not https, unless the host is a loopback/localtest name or
  BACKUP_S3_ALLOW_INSECURE=true is set explicitly for dev. Applies to
  both collect and backend.
- DestroySystem now purges the system's S3 prefix before the DB row is
  removed (GDPR Article 17). A purge failure refuses the destroy so
  the operator can retry instead of leaving orphan PII in the bucket.
- Backend DownloadSystemBackup emits LogBusinessOperation so presigned
  URL issuance appears in the audit trail alongside delete.
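A sketch of the retention change from the third bullet: sort by key and serialise pruning under a SET NX lock. The lock key, TTL, and helper signature are assumptions; the key listing is assumed to come from the paginated helper described in the next commit.

package methods

import (
	"context"
	"sort"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/redis/go-redis/v9"
)

// pruneOldest deletes the oldest objects under a system's prefix until the cap
// is met. keys is the full, already-paginated listing for the prefix.
func pruneOldest(ctx context.Context, rdb *redis.Client, s3c *s3.Client, bucket, prefix string, keys []string, maxPerSystem int) error {
	ok, err := rdb.SetNX(ctx, "backup:retention:"+prefix, "1", 30*time.Second).Result()
	if err != nil || !ok {
		return err // another pruner holds the lock: skip, the next upload retries
	}
	defer rdb.Del(ctx, "backup:retention:"+prefix)

	sort.Strings(keys) // oldest first, thanks to the time-ordered UUIDv7 segment
	for len(keys) > maxPerSystem {
		if _, err := s3c.DeleteObject(ctx, &s3.DeleteObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(keys[0]),
		}); err != nil {
			return err
		}
		keys = keys[1:]
	}
	return nil
}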
The four backup-related ListObjectsV2 calls all consumed a single
response, which S3 caps at 1,000 keys. With current per-system limits
(10 backups) this never triggered, but the bug was latent: raising
BACKUP_MAX_PER_SYSTEM or repurposing a prefix would silently drop
objects from the UI list and leave the retention loop under-counting
the system's quota. Switch every call to s3.NewListObjectsV2Paginator
so the full key set is exhausted before the handlers act on it:

- collect listBackupsForSystem (appliance list)
- collect enforceBackupRetention (post-upload pruning)
- backend listSystemBackups (UI list)
- backend purgeSystemBackups (GDPR destroy)
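A sketch of the paginated listing those four call sites would share; the helper name is assumed.

package storage

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// listAllObjects drains every page under a prefix instead of trusting a single
// ListObjectsV2 response (which S3 caps at 1,000 keys).
func listAllObjects(ctx context.Context, client *s3.Client, bucket, prefix string) ([]types.Object, error) {
	var all []types.Object
	p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	for p.HasMorePages() {
		page, err := p.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Contents...)
	}
	return all, nil
}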
…script, warn on single-node garage

- Pass ServerSideEncryption=AES256 on PutObject and CopyObject so the
  at-rest encryption relies on an explicit request rather than the
  default bucket policy. Defense-in-depth — the payload is already
  GPG-encrypted by the appliance.
- services/backup/test-roundtrip.sh now pipes the curl `user =` config
  over stdin and pipes the GPG passphrase via --passphrase-fd, so
  neither the system_secret nor the GPG passphrase ends up in process
  argv or shell history.
- services/backup/garage-local.toml gets a warning comment next to
  replication_factor = 1 so the template cannot be copy-pasted into
  a production cluster without noticing.
Two hardening items left open by the previous security pass:

- SystemAuthCacheTTL default drops from 24h to 10m. Credential
  rotation, system delete, and system destroy done on the backend now
  propagate to collect within ten minutes instead of up to thirty
  hours. Each miss still hits Redis and then Postgres, so the cost is
  limited to a small rise in DB lookups on low-traffic systems.
- UploadBackup now passes through enforceIngestRateLimit, a Redis-
  backed per-system-id counter with two buckets (minute + hour) whose
  caps default to 6/minute and 60/hour — generous vs the daily-timer
  cadence real appliances use, tight enough to block flood-style
  abuse. Both caps are configurable via BACKUP_RATE_LIMIT_PER_*;
  setting both to 0 disables the limiter. A Redis outage fails open
  so the limiter never becomes the cause of an outage itself.
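A sketch of the two-bucket limiter described in the last item. The key names and the INCR/EXPIRE pattern are assumptions; only the minute/hour caps, the zero-disables behaviour, and the fail-open rule come from the text above.

package middleware

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// allowIngest counts uploads per system in a minute bucket and an hour bucket,
// failing open when Redis is unreachable.
func allowIngest(ctx context.Context, rdb *redis.Client, systemID string, perMin, perHour int64) bool {
	if perMin == 0 && perHour == 0 {
		return true // limiter disabled via BACKUP_RATE_LIMIT_PER_*
	}
	over := func(key string, limit int64, window time.Duration) bool {
		if limit == 0 {
			return false
		}
		n, err := rdb.Incr(ctx, key).Result()
		if err != nil {
			return false // fail open: a Redis outage must not block uploads
		}
		if n == 1 {
			rdb.Expire(ctx, key, window) // first hit in the window starts the clock
		}
		return n > limit
	}
	now := time.Now().UTC()
	minuteKey := "backup:rl:" + systemID + ":" + now.Format("200601021504")
	hourKey := "backup:rl:" + systemID + ":" + now.Format("2006010215")
	return !over(minuteKey, perMin, time.Minute) && !over(hourKey, perHour, time.Hour)
}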
Four items were deferred because each needs an ops or design decision
outside this component. Record them in the backup README so they stay
visible after the PR ships:

- Split S3 credentials between collect (write) and backend (read).
- Per-organization aggregate quota ceiling.
- Move retention enforcement to an async worker.
- Replace TTL-based cache expiry with a pub/sub invalidation bus.
Move gopkg.in/yaml.v3 out of the indirect block — a direct consumer
lives in the merged alerting code from main.
Six alerting paths under /services/mimir/alertmanager/api/v2/silences
referenced `#/components/responses/RequestEntityTooLarge`, which was
never declared — the redocly validator rejected the whole spec with
unresolved-ref errors. Add the missing component alongside the other
shared error responses so the validator is green again and the 413
semantics are documented at a single source of truth.
…d and collect READMEs

services/backup only ever existed as a local Garage scaffold to
exercise the round-trip before any real S3 bucket was wired up. The
actual integration lives entirely in backend and collect: each
service reads BACKUP_S3_* env vars pointing at any S3-compatible
bucket the operator provisions (DigitalOcean Spaces in prod; any
dev bucket of choice for local work). Removing the directory avoids
having two sources of truth and a first-class component we never
deploy.

Changes:
- Delete services/backup/ (Containerfile, Makefile, docker-compose,
  Garage config, entrypoint, test-roundtrip script, VERSION, env,
  and README).
- Drop the entry from version.json and the matching update blocks
  in release.sh (jq expression + VERSION file bump).
- Drop from AGENTS.md's component tree, return the header count to
  "six first-class components", remove section 3.6 Backup (renumber
  Proxy back to 3.6), and move the backup architecture note inline
  under 3.2 Collect. Add a pointer to backups.go under 3.1 Backend's
  methods catalog.
- Drop from the root README's component list.
- Delete docs/docs/systems/backups.md (EN and IT placeholder scaffolds).
- Add a "Backup storage" block to backend/.env example inside
  backend/README.md and an expanded "Backup storage" section to
  collect/README.md (storage model, object layout, metadata headers,
  and a curl-based round-trip recipe the operator can run against
  any S3 provider).
- Mention the new collect/storage/ directory in collect/README.md's
  project structure tree.
…on bus

Close the last two security follow-ups that did not require ops work:

- Add `BACKUP_MAX_SIZE_PER_ORG` (collect). When non-zero, ingest
  rejects uploads that would push the authenticated system's
  organization past the aggregate byte ceiling. A paginated
  ListObjectsV2 over the `{org_id}/` prefix sums current usage; a
  transient failure is logged and fails open. Defaults to 0
  (unlimited) so existing deployments are unaffected.

- Introduce a Redis pub/sub channel `my:auth:invalidate` that
  backend publishes on whenever a system is soft-deleted, hard-
  destroyed, or has its secret regenerated. Collect starts a
  subscriber at boot (`middleware.StartAuthInvalidator`) that purges
  the matching entries from both the in-process sync.Map and from
  Redis via SCAN + DEL. This cuts cache staleness from the
  SystemAuthCacheTTL window down to sub-second propagation, while
  keeping the TTL as a natural fallback if the bus misfires.
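A sketch of the subscriber side. The channel name and StartAuthInvalidator come from the description above; the message payload format and the Redis key pattern are assumptions.

package middleware

import (
	"context"
	"log"
	"sync"

	"github.com/redis/go-redis/v9"
)

// StartAuthInvalidator subscribes to my:auth:invalidate and drops the cached
// credentials for each system key published on it, from both the in-process
// sync.Map and Redis.
func StartAuthInvalidator(ctx context.Context, rdb *redis.Client, cache *sync.Map) {
	sub := rdb.Subscribe(ctx, "my:auth:invalidate")
	go func() {
		for msg := range sub.Channel() {
			systemKey := msg.Payload // assumed: the plain system_key
			cache.Delete(systemKey)  // in-process entry

			// Redis-side entries for the same system (key pattern assumed).
			iter := rdb.Scan(ctx, 0, "auth:system:"+systemKey+"*", 0).Iterator()
			for iter.Next(ctx) {
				if err := rdb.Del(ctx, iter.Val()).Err(); err != nil {
					log.Printf("auth invalidator: del %s: %v", iter.Val(), err)
				}
			}
		}
	}()
}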
Replace the earlier draft scaffold with full user-oriented guides (EN
and IT) under docs/docs/systems/backups.md. Covers the end-to-end
encryption boundary, authentication model, storage layout and
metadata headers, retention + quota settings, GDPR-driven deletion
behaviour, and the administrator API surface.

Wire the new page into the Systems category of sidebars.ts so it
shows up alongside management / registration / inventory-heartbeat.
Cover the same sequence of S3 operations the ingest and read handlers
rely on — PutObject, same-key CopyObject for the sha256 metadata
reconciliation, HeadObject, ListObjectsV2, GetObject, DeleteObject —
against a live S3-compatible endpoint.

- collect/methods/backups_integration_test.go: guarded by the
  `integration` build tag so it never runs in the default test
  pipeline, skips silently when BACKUP_S3_* env vars are missing, and
  cleans up after itself so an accidental run against a real bucket
  leaves nothing behind.
- collect/Makefile: new `test-integration` target wrapping the same
  `go test -tags=integration` invocation for local use.
- .github/workflows/ci-main.yml: new `backup-integration` job that
  spins up a MinIO service container on :9000, waits for
  /minio/health/live, and runs the integration test against the
  auto-created `my-backups-ci` bucket. The build job now gates on it.
Security review flagged the single shared access key as a blast-radius
concern. The code already supports different keys per service (each
reads BACKUP_S3_ACCESS_KEY/SECRET_KEY from its own env), so the fix is
purely an operational note: call out the supported split in the
backup storage section, describe the minimum IAM surface each side
needs, and list the providers that support scoped keys.
The code default stays `us-east-1` — that matches the AWS SDK's own
fallback and is the right universal value for any S3-compatible
endpoint. Examples in the `.env.example` and README files, however,
already name DigitalOcean Spaces as the production target, so set
`ams3` in the shown config and add a short note explaining when to
change it (AWS region, or any non-empty string for MinIO/Garage).
Use the user-facing `NETH-...` identifier as the middle segment of the
object key instead of the internal UUID. Operators browsing a raw
bucket listing can now recognise each system at a glance without
cross-referencing the DB. Keys become:

    {org_id}/{system_key}/{backup_id}.{ext}

Touches both sides of the backup path — collect (ingest, list,
download, retention, rate limit) and backend (list, download, delete,
GDPR purge) — plus the user-facing and developer docs.
@edospadoni edospadoni force-pushed the feat/backup-service branch from 3eec0c1 to 2007caf Compare April 21, 2026 14:05
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 21, 2026 14:05 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-collect-qa PR #81 April 21, 2026 14:05 — with Render Active
@edospadoni edospadoni temporarily deployed to feat/backup-service - my-mimir-qa PR #81 April 22, 2026 07:15 — with Render Destroyed
@edospadoni edospadoni temporarily deployed to feat/backup-service - my-proxy-qa PR #81 April 22, 2026 07:15 — with Render Destroyed
@edospadoni edospadoni closed this Apr 22, 2026
@github-actions
Contributor

🗑️ Redirect URIs Removed from Logto

The following redirect URIs have been automatically removed from the Logto application configuration:

Redirect URIs:

  • https://my-proxy-qa-pr-81.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-proxy-qa-pr-81.onrender.com/login

Cleanup completed for PR #81.

@edospadoni edospadoni reopened this Apr 22, 2026
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 22, 2026 07:30 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-collect-qa PR #81 April 22, 2026 07:30 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-frontend-qa PR #81 April 22, 2026 07:30 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-mimir-qa PR #81 April 22, 2026 07:30 — with Render Active
@github-actions
Contributor

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

  • https://my-proxy-qa-pr-81.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-proxy-qa-pr-81.onrender.com/login

These will be automatically removed when the PR is closed or merged.

@edospadoni edospadoni deployed to feat/backup-service - my-collect-qa PR #81 April 22, 2026 08:59 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 22, 2026 09:00 — with Render Active
… 200

DownloadSystemBackup was handing out a presigned GET URL without
checking that the object actually existed — the client got a 200 with
a URL that 404ed on fetch, which leaked the ID mismatch only after an
extra round-trip and confused anyone reading the response shape.

DeleteSystemBackup had the same UX gap for the opposite reason: S3
DeleteObject is idempotent and returns success whether the key was
there or not, so the NoSuchKey branch was dead code and phantom IDs
silently reported "deleted".

Both now probe with HeadObject first, map *s3types.NotFound to 404,
and only proceed when the object exists. Lists already returned 200
with an empty array and the collect-side download already surfaced
404 via GetObject's error path — unchanged.
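A sketch of the HeadObject probe both handlers now run before presigning or deleting; the helper name is illustrative.

package methods

import (
	"context"
	"errors"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// backupExists maps a missing key cleanly to "not found" so the handler can
// answer 404 instead of a working-looking 200.
func backupExists(ctx context.Context, client *s3.Client, bucket, key string) (bool, error) {
	_, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	var notFound *s3types.NotFound
	if errors.As(err, &notFound) {
		return false, nil // phantom id: caller returns 404
	}
	return err == nil, err
}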
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 22, 2026 09:25 — with Render Active
The four cards on the system Overview (NethSecurity, Status, Subscription,
Additional services) were visually out of sync: NethSecurity uses a
32px SystemLogo while the others use a 20px FontAwesome icon, which
pushed the first-row divider 12px lower in the NethSecurity card; and
AdditionalServicesCard hand-built its rows with py-3 instead of the
py-4 used by DataItem, shaving 8px off every subsequent row.

Normalise both: pin each header to h-10 so the icon size does not bleed
into the row position, and rework the AdditionalServices rows to match
DataItem's flex / padding / typography so rows line up across cards.
@edospadoni edospadoni deployed to feat/backup-service - my-frontend-qa PR #81 April 22, 2026 10:16 — with Render Active
The Render proxy was inheriting nginx defaults on the /collect/api/
location: client_max_body_size defaulting to 1 MB and proxy_read_timeout
capped at 30 s. A 21 MB test upload via the my-ent translation proxy
hit 413 here even though my-ent had already accepted the body, because
the Render nginx in front of collect refused anything over 1 MB.

Add a dedicated ^~ match on /collect/api/systems/backups that raises
the body cap to 2 GB, extends read/send timeouts to 600 s, and turns
off proxy_request_buffering so the GPG blob streams through with a
flat memory footprint. Also prevents 504s when an appliance downloads
a multi-MB backup for restore (GET /api/systems/backups/:id).

Also document the plural/singular REST convention in AGENTS.md.
Hardens the backup subsystem against issues flagged by the security review:

- Destroy no longer leaks backups when the system was already soft-deleted:
  GetSystemIncludingDeleted bypasses the deleted_at filter so the purge
  still resolves (org_id, system_key) in the two-step soft-delete→destroy
  path. Docs clarified: soft delete keeps backups (for Restore), only
  destroy triggers the GDPR erasure.
- Download forces Content-Type: application/octet-stream + nosniff,
  removing the reflected-XSS vector from a client-supplied upload MIME.
- Ingest extension allowlist aligned with the backend regex: unknown
  suffixes collapse to .bin so no stored object becomes un-listable or
  un-deletable from the admin UI.
- BACKUP_MAX_SIZE_PER_ORG defaults to 100 GiB (was 0 = disabled), with
  a startup warning when explicitly disabled.
- uploader_ip removed from API responses (backend + collect + frontend
  types/i18n). It stays in S3 metadata for operator audit but is no
  longer shown to any RBAC tier — the value is inaccurate via the
  translation proxy anyway and is a reconnaissance aid otherwise.
- used_bytes dropped from the 413 quota response; moved to the server
  log so a compromised appliance cannot profile the org's consumption.
The nethsecurity UI drives a "delete backup" action from the
restore/backups list. The appliance already has Basic-Auth access to
upload/list/download under its own prefix; mirror the same contract for
delete so the new my flow can retire a backup directly, without round-
tripping through the admin backend.

Server-derives the S3 key from the authenticated identity, so the
caller can only touch objects under its own {org_id}/{system_key}/
prefix. HeadObject gates a phantom id to 404 instead of the idempotent
200 DeleteObject would otherwise return.
@edospadoni edospadoni deployed to feat/backup-service - my-frontend-qa PR #81 April 23, 2026 08:05 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-backend-qa PR #81 April 23, 2026 08:05 — with Render Active
@edospadoni edospadoni deployed to feat/backup-service - my-collect-qa PR #81 April 23, 2026 08:05 — with Render Active