fix(install,#1087): make per-VRM download failures non-fatal #1090
Merged
…oad-avatar-models.sh

Per the issue: third-party CDN failures (RTX install hit OpenGameArt curl exit 11 = CURLE_FTP_WEIRD_PASS_REPLY on vroid-female-base.vrm) propagated through `set -e` and exited the entire script, which made the model-init container exit non-zero. Compounded with #1085 (tier-name canon) for the "RTX install ships with no Qwen" symptom.

Fix shape per #1087's recommended Option A:

- Wrap each per-VRM curl/wget call in `set +e ... set -e` so a single download failure increments a FAILED counter instead of killing the script. The script-level `set -e` invariant is preserved everywhere else (jq, mkdir, mv, etc. still hard-fail on real bugs).
- Capture and log the actual curl exit code on each failure (Joel's "never swallow errors; evidence is for the debugger" rule). The warning includes the exit code, the failed name, and the source URL so the next debugger has everything they need.
- Run summary at the end emits a "DEGRADED" structured warning naming exactly which VRMs failed + the upstream cause (third-party CDN, not a Continuum bug) + the re-run command. Operator visibility, not silent suppression.
- Script unconditionally exits 0: a partial avatar set is acceptable (Bevy live mode degrades to whatever VRMs are present), and a third-party CDN blip should NOT block install. The summary above carries the diagnostic; downstream consumers see clean exit + warning.
- Bonus: replace hardcoded `8` with EXPECTED constant; quote tmpzip / tmpdir / vrm_file mktemp captures (shellcheck SC2155).

Smoke-tested locally: `MODELS_DIR=/tmp/avatar-smoke-test bash -x download-avatar-models.sh` → all 8 VRMs downloaded successfully on host with working CDN + exit 0. Failure path code is symmetric (set +e, capture exit, log, increment FAILED, continue), the same shape proven by the existing per-file failure handling in download-models.sh:115-124.

Closes #1087.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
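The per-download wrapping described above can be sketched roughly as follows. This is an illustration of the `set +e ... set -e` pattern, not the code from the PR: the function name `fetch_vrm`, the `FAILED_NAMES` variable, the curl flags, and the warning wording are all assumptions; only the `FAILED` counter is named in the commit message.

```shell
#!/usr/bin/env bash
# Sketch only. fetch_vrm, FAILED_NAMES, the curl flags and the log wording
# are hypothetical; they are not copied from download-avatar-models.sh.
set -eu

FAILED=0          # number of downloads that failed
FAILED_NAMES=""   # space-separated names of the failed downloads

# fetch_vrm NAME URL DEST
# Download one file. A failed download is logged and counted, not fatal.
fetch_vrm() {
  local name="$1" url="$2" dest="$3"
  local rc

  set +e                       # tolerate a failure of this one command only
  curl -fsSL -o "$dest" "$url"
  rc=$?                        # capture the real exit code before anything else runs
  set -e                       # restore hard-fail for everything that follows

  if [ "$rc" -ne 0 ]; then
    # Keep the evidence: exit code, name and source URL go to stderr.
    echo "WARN: download failed: name=$name curl_exit=$rc url=$url" >&2
    rm -f "$dest"              # do not leave a partial file behind
    FAILED=$((FAILED + 1))
    FAILED_NAMES="${FAILED_NAMES:+$FAILED_NAMES }$name"
  fi
}
```

A caller would invoke `fetch_vrm` once per model and inspect `FAILED` afterwards. Commands outside the `set +e ... set -e` window still abort the script on error, which is the invariant the commit message says is preserved.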
Summary
Closes #1087. Per the issue: third-party CDN failures (RTX install hit OpenGameArt curl exit 11 on `vroid-female-base.vrm`) propagated through `set -e` and exited the entire `download-avatar-models.sh` script. The model-init container then exited non-zero, masking the fact that download-models.sh (Qwen + voice + embeddings) had already completed successfully. Compounded with #1085 (tier-name canon) for the "RTX install ships with no Qwen" symptom.
Pairs with: #1085 (install-time SOURCE of no-Qwen) + #1089 (runtime VISIBILITY via NoLocalModelLoadable). This PR closes the third leg: third-party CDN tolerance.
What changed
Validation class
Validation
Why exit 0 unconditionally
Joel's rule "never swallow errors" requires the failure to reach a readable log, which this PR does via stderr plus the structured DEGRADED summary. The "swallow" anti-pattern is hiding the failure; emitting the per-failure exit code, names, cause, and remedy is the opposite of swallowing.
The trade-off between blocking the install over a partial avatar set and shipping the install with a partial avatar set plus a visible warning lands clearly on the second option, per the discussion in the #1087 issue body. The model-init container's whole purpose is to prepare assets; one third-party CDN flake should not invalidate the run when 99% of assets are in place.
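As a rough illustration of the reasoning above, the end-of-run summary might look something like the sketch below. `EXPECTED` is named in the commit message; the function name `print_summary`, its arguments, and the exact message text (including the remedy line) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only. print_summary and the message wording are hypothetical;
# the real script's DEGRADED output may differ.
set -eu

EXPECTED=8   # total number of avatar models the script tries to fetch

# print_summary FAILED_COUNT "name1 name2 ..."
# Report the outcome; always returns 0 so a CDN failure never blocks install.
print_summary() {
  local failed="$1" names="$2"
  local ok=$((EXPECTED - failed))

  if [ "$failed" -eq 0 ]; then
    echo "OK: $ok/$EXPECTED avatar models present"
    return 0
  fi

  # The failure stays visible: what failed, why, and how to retry.
  {
    echo "DEGRADED: $ok/$EXPECTED avatar models present"
    echo "  failed: $names"
    echo "  cause:  third-party CDN download failure (see per-file warnings)"
    echo "  remedy: re-run download-avatar-models.sh"
  } >&2
  return 0   # a partial avatar set is acceptable
}
```

In a full script this call would be followed by an unconditional `exit 0`, so the container sees a clean exit while the operator still gets the diagnostic on stderr.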
Known gaps