
fix(install,#1087): make per-VRM download failures non-fatal #1090

Merged
joelteply merged 1 commit into canary from fix/install-always-nonfatal-degraded
May 11, 2026

Conversation

@joelteply
Contributor

Summary

Closes #1087. Per the issue, a third-party CDN failure (the RTX install hit curl exit 11 from OpenGameArt on `vroid-female-base.vrm`) propagated through `set -e` and aborted the entire `download-avatar-models.sh` script. The model-init container then exited non-zero, masking the fact that `download-models.sh` (Qwen + voice + embeddings) had already completed successfully. Combined with #1085 (tier-name canon), this produced the "RTX install ships with no Qwen" symptom.

Pairs with: #1085 (install-time SOURCE of no-Qwen) + #1089 (runtime VISIBILITY via NoLocalModelLoadable). This PR closes the third leg — third-party CDN tolerance.

What changed

  • Wrapped each per-VRM `curl`/`wget` in `set +e ... set -e` so a single download failure increments a `FAILED` counter instead of killing the script. Script-level `set -e` invariant preserved everywhere else (jq, mkdir, mv, etc. still hard-fail on real bugs).
  • Captured + logged the actual curl exit code on each failure (Joel's "never swallow errors — evidence is for the debugger" rule). Warnings include exit code, failed name, and source URL — next debugger has everything they need.
  • Run summary at the end emits a "DEGRADED" structured warning naming exactly which VRMs failed + the upstream cause + the re-run command. Operator visibility, not silent suppression.
  • Script unconditionally exits 0 — partial avatar set is acceptable (Bevy live mode degrades to whatever VRMs are present), and a third-party CDN blip should NOT block install. Diagnostic carried in the summary.
  • Bonus: replaced hardcoded `8` with `EXPECTED` constant; quoted mktemp captures (shellcheck SC2155).
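
The per-file wrapping described in the list above could look roughly like the following. This is a minimal sketch, not the merged script: `fetch`, `download_vrm`, `FAILED_NAMES`, the `example.invalid` URLs, and the `second-avatar` name are hypothetical, and `fetch` is an offline stand-in for the real `curl`/`wget` call that fails with exit 11 for one file.

```shell
#!/usr/bin/env bash
set -euo pipefail

MODELS_DIR="$(mktemp -d)"   # scratch dir so the sketch never touches real assets
FAILED=0
FAILED_NAMES=""

# Hypothetical stand-in for the real curl/wget call, so the sketch runs offline.
# It returns 11 for one file to mimic the failure reported in the issue.
fetch() {
  local name="$1" url="$2" dest="$3"
  if [ "$name" = "vroid-female-base" ]; then
    return 11
  fi
  : > "$dest"
}

download_vrm() {
  local name="$1" url="$2" rc
  set +e                      # only the download itself may fail softly
  fetch "$name" "$url" "$MODELS_DIR/$name.vrm"
  rc=$?                       # capture the real exit code before anything else runs
  set -e                      # everything after this hard-fails again
  if [ "$rc" -ne 0 ]; then
    # Log exit code, name, and URL to stderr so the failure is never silent.
    echo "WARN: download failed (exit $rc): $name from $url" >&2
    FAILED=$((FAILED + 1))
    FAILED_NAMES="$FAILED_NAMES $name"
  fi
}

download_vrm "vroid-female-base" "https://example.invalid/vroid-female-base.vrm"
download_vrm "second-avatar"     "https://example.invalid/second-avatar.vrm"

echo "failed=$FAILED names=$FAILED_NAMES"
```

The counter uses `FAILED=$((FAILED + 1))` rather than `((FAILED++))` because the latter returns a non-zero status when the old value is 0, which would itself trip `set -e`.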

Validation class

  • Contract TDD: n/a — defensive shell hardening, no new contract.
  • Failure TDD: only the success path was smoke-tested locally (`MODELS_DIR=/tmp/avatar-smoke-test bash -x download-avatar-models.sh`): all 8 VRMs downloaded successfully on a host with a working CDN, exit 0.
  • Performance/Resource/Residency VDD: n/a.
  • Browser/UX evidence: precommit browser ping passed.
  • Platform coverage: Mac M5 Pro shell. Same shape proven by existing per-file failure handling in `download-models.sh:115-124`.

Validation

  • Smoke test: 8/8 VRMs downloaded successfully (success path); exit 0.
  • Failure path code is symmetric (set +e capture exit, log, increment FAILED, continue) — same shape as `download-models.sh:115-124` which has been working in production.
  • Precommit hook: TypeScript clean; browser-ping precommit test passed.
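
One way the failure path could be exercised without a real CDN outage is to shadow `curl` on `PATH` with a stub that always exits 11. The sketch below is an assumption about how such a test might be written, not something this PR ran: `mini-download.sh` is a hypothetical miniature of the pattern, not the real `download-avatar-models.sh`, and the URL is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# Fake curl that always fails the way the issue reported (exit 11).
cat > "$work/curl" <<'EOF'
#!/usr/bin/env bash
exit 11
EOF
chmod +x "$work/curl"

# Hypothetical miniature of the script under test (NOT the real script).
cat > "$work/mini-download.sh" <<'EOF'
#!/usr/bin/env bash
set -e
FAILED=0
set +e
curl -fsSL -o /dev/null "https://example.invalid/a.vrm"
rc=$?
set -e
if [ "$rc" -ne 0 ]; then
  echo "WARN: curl exit $rc" >&2
  FAILED=$((FAILED + 1))
fi
echo "FAILED=$FAILED"
exit 0
EOF

# Run with the stub first on PATH; keep stdout and stderr separate.
set +e
out="$(PATH="$work:$PATH" bash "$work/mini-download.sh" 2>"$work/stderr.log")"
status=$?
set -e

echo "status=$status out=$out"
cat "$work/stderr.log"
```

Pointing the same stubbed `PATH` at the real script would test the actual failure branch, provided the script invokes `curl` by name rather than by absolute path.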

Why exit 0 unconditionally

Joel's rule "never swallow errors" requires the failure to reach a readable log — which we do via stderr + structured DEGRADED summary. The "swallow" anti-pattern is hiding the failure; emitting the per-failure exit code + names + cause + remedy is the OPPOSITE of swallowing.

The trade-off between blocking the install on a partial avatar set and shipping the install with a partial avatar set plus a visible warning lands clearly on the second option, per the discussion in the #1087 issue body. The model-init container's whole purpose is to prepare assets; one third-party CDN flake should not invalidate the run when 99% of assets are in place.
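
As an illustration of "visible but non-fatal", a run summary in this spirit might be sketched as follows. The function name `summarize`, the message wording, and the remedy text are assumptions for illustration; the merged script's actual output format is not shown in this PR description.

```shell
#!/usr/bin/env bash
set -euo pipefail

# summarize EXPECTED FAILED "space-separated failed names"
# Writes the summary to stderr and always returns 0, so a caller that ends
# with `exit 0` reports a clean exit while the log still carries the diagnostic.
summarize() {
  local expected="$1" failed="$2" names="$3"
  local ok=$((expected - failed))
  if [ "$failed" -eq 0 ]; then
    echo "avatar models: $ok/$expected downloaded" >&2
    return 0
  fi
  {
    echo "DEGRADED: avatar models: $ok/$expected downloaded, $failed failed"
    echo "  failed: $names"
    echo "  cause:  third-party CDN download error (see per-file warnings)"
    echo "  remedy: re-run download-avatar-models.sh"   # hypothetical wording
  } >&2
  return 0
}

summarize 8 1 "vroid-female-base"
```

Sending the summary to stderr keeps stdout free for any machine-readable output, and the explicit `return 0` makes the non-fatal intent obvious to the next reader.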

Known gaps

  • `download-voice-models.sh` (legacy script, replaced by registry-driven `download-models.sh` for voice models in `auto_download.always`) is NOT touched here. If any older container image still invokes it and hits a similar curl exit, that's a separate issue (file as needed).

…oad-avatar-models.sh

Per the issue: third-party CDN failures (RTX install hit OpenGameArt curl
exit 11 = CURLE_FTP_WEIRD_PASS_REPLY on vroid-female-base.vrm) propagated
through `set -e` and exited the entire script, which made the model-init
container exit non-zero. Compounded with #1085 (tier-name canon) for the
"RTX install ships with no Qwen" symptom.

Fix shape per #1087's recommended Option A:
- Wrap each per-VRM curl/wget call in `set +e ... set -e` so a single
  download failure increments a FAILED counter instead of killing the
  script. The script-level `set -e` invariant is preserved everywhere
  else (jq, mkdir, mv, etc. still hard-fail on real bugs).
- Capture and log the actual curl exit code on each failure (Joel's
  "never swallow errors — evidence is for the debugger" rule). The
  warning includes the exit code, the failed name, and the source URL
  so the next debugger has everything they need.
- Run summary at the end emits a "DEGRADED" structured warning naming
  exactly which VRMs failed + the upstream cause (third-party CDN, not
  a Continuum bug) + the re-run command. Operator visibility, not
  silent suppression.
- Script unconditionally exits 0 — partial avatar set is acceptable
  (Bevy live mode degrades to whatever VRMs are present), and a
  third-party CDN blip should NOT block install. The summary above
  carries the diagnostic; downstream consumers see clean exit + warning.
- Bonus: replace hardcoded `8` with EXPECTED constant; quote tmpzip /
  tmpdir / vrm_file mktemp captures (shellcheck SC2155).

Smoke-tested locally: MODELS_DIR=/tmp/avatar-smoke-test bash -x
download-avatar-models.sh → all 8 VRMs downloaded successfully on host
with working CDN + exit 0. Failure path code is symmetric (set +e capture
exit, log, increment FAILED, continue) — same shape proven by the
existing per-file failure handling in download-models.sh:115-124.

Closes #1087.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 4f56f93 into canary May 11, 2026
3 checks passed
@joelteply joelteply deleted the fix/install-always-nonfatal-degraded branch May 11, 2026 21:03