Install: always[] avatar/voice download failure (curl exit 11) blocks by_tier Qwen download #1087

@joelteply

airc-queue card

Coordinates work via the AIRC queue substrate (airc#562). Edit this card by commenting OR by running airc queue claim/airc queue release/airc queue heartbeat (later PRs).

{
  "kind": "airc-queue-card-v1",
  "id": "#1087",
  "owner": "claude-tab-2",
  "status": "claimed",
  "evidence": "Adopted existing GitHub issue into airc queue.",
  "next_action": "Triage, claim, or close this adopted backlog card."
}

Close this issue when the work is done (status=merged/abandoned).

Original issue body

Pre-adoption body

Summary

src/scripts/download-models.sh runs the always[] downloads (voice, embedding, avatar) before the by_tier[] downloads (Qwen). Because the script runs under set -euo pipefail, a failure in any always[] step exits the whole script and prevents the Qwen artifacts from being fetched. This compounds with #1085 (tier-name canon + TIER pass-through): even with #1085 applied, the install ships with no Qwen if any always[] download fails.
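The failure mode can be reproduced in isolation. The following is a minimal sketch, not the real contents of download-models.sh: the function names download_always and download_by_tier are hypothetical stand-ins, and "return 11" only simulates the failing avatar fetch.

```shell
#!/usr/bin/env bash
# Stand-in for the current script structure. The function names are
# hypothetical; "return 11" simulates the failing avatar fetch.
strict_script='
set -euo pipefail
download_always()  { echo "always: avatar"; return 11; }
download_by_tier() { echo "by_tier: qwen"; }
download_always
download_by_tier   # never reached: set -e has already exited the shell
'

# Run it in a child bash so its set -e cannot affect this shell.
rc=0
output="$(bash -c "$strict_script")" || rc=$?
echo "exit=$rc"
echo "$output"
```

Under these assumptions the child shell should exit with 11 after printing only the always line, which mirrors the container log: the by_tier stage is never entered.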

Observed evidence — Windows/RTX cross-platform validation of PR #1085

While validating PR #1085 on RTX 5090 / Windows / Docker Desktop / WSL2, the model-init container (run with TIER=full in its environment) exited with code 11 before reaching the registry-driven Qwen download.

Crash point per container stdout:

Whisper base model already exists
Piper TTS model already exists
Kokoro TTS model already exists
Pocket-TTS: Skipping voice embeddings (no HF_TOKEN in config.env)
Silero VAD model already exists
Downloading Orpheus TTS (3B, LoRA-trainable, ~2.5GB total)...
  Skipping tokenizer (requires HF_TOKEN in config.env...)
Orpheus TTS download incomplete (some files missing)
Downloading vroid-female-base.vrm from OpenGameArt...
   ← exit 11 here, script never returns to by_tier section

This matches the original carl-1038-rtx-model-init-1 container behavior 4 days ago (also exit 11). The 2026-05-11 Windows/RTX VDD finding "no local Qwen, personas silent" was caused by BOTH bugs compounding:

  1. Tier-name divergence + cgroups bottom-out → no Qwen-by-tier even if always[] succeeds. (FIXED in PR #1085, "fix(install): align tier name to registry canon + pass TIER to model-init + fail loud on unknown tier".)
  2. always[] avatar download failure → set -euo pipefail kills the script BEFORE by_tier[] runs. (NOT addressed in PR #1085; this issue tracks it.)

Why exit 11 specifically

curl exit code 11 maps to CURLE_FTP_WEIRD_PASS_REPLY per the curl docs, which is atypical for an HTTP download. It could surface from a transient OpenGameArt CDN issue, a rate limit, or proxy intermediation specific to certain hosts. Either way, the install should degrade gracefully on avatar/media download failures and still download the Qwen artifacts.
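To make reports like this easier to triage, the download step could log the exit code together with its documented meaning. A small sketch: the helper name describe_curl_exit is hypothetical and not part of download-models.sh; the meanings listed are the ones given in the EXIT CODES section of the curl man page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a curl exit code into its documented meaning.
# Only a handful of common codes are listed; everything else falls through.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    11) echo "FTP weird PASS reply" ;;
    22) echo "HTTP error returned (with --fail)" ;;
    28) echo "operation timeout" ;;
    *)  echo "see curl man page, EXIT CODES" ;;
  esac
}

# Example: annotate the code observed in this issue.
echo "avatar download failed: curl exit 11 ($(describe_curl_exit 11))"
```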

Recommended fix shape (separate from #1085)

Two complementary options; either one alone closes the alpha gap:

Option A: bounded nonfatal stage in download-models.sh

Wrap the always[] categories that can degrade (avatar, optional voice models) in set +e / set -e, and continue to by_tier[] regardless of their outcome. Emit a structured warning instead of exiting, and print a degraded_status summary at the end:

=== install-degraded-status ===
  avatar: FAILED (curl exit 11) - personas will use blank avatars
  qwen: ok (qwen3.5-4b-code-forged 4.36 GiB, qwen2-vl-7b 4.36 GiB)

The required always[] items (voice/embedding/whisper) stay fatal; the optional ones (avatar, lora-trainable tokenizers) degrade.
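One possible shape for Option A is sketched below. It is an illustration under assumptions, not the script's actual code: try_optional, print_degraded_status, download_avatar and download_qwen are hypothetical names, and the sketch uses the "cmd || rc=$?" idiom rather than toggling set +e / set -e (both keep an optional failure from aborting the script).

```shell
#!/usr/bin/env bash
set -euo pipefail

DEGRADED=""   # accumulated notes for optional steps that failed

# Run an optional step. A failure is recorded, not propagated, so set -e
# does not abort the script. Note: because the step runs on the left of
# "||", set -e is also suppressed inside the step itself.
try_optional() {
  local name="$1" impact="$2" rc=0
  shift 2
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    DEGRADED+="  ${name}: FAILED (exit ${rc}) - ${impact}"$'\n'
    echo "WARN optional step '${name}' failed with exit ${rc}" >&2
  fi
  return 0
}

# Print the end-of-run summary in the format proposed above.
print_degraded_status() {
  echo "=== install-degraded-status ==="
  if [ -n "$DEGRADED" ]; then
    printf '%s' "$DEGRADED"
  else
    echo "  (none)"
  fi
}

# Hypothetical stand-ins for the real download functions.
download_avatar() { return 11; }
download_qwen()   { QWEN_DONE=1; }

QWEN_DONE=0
try_optional avatar "personas will use blank avatars" download_avatar
download_qwen            # required step: still fatal under set -e
print_degraded_status
```

Required steps are called directly and so remain fatal; only steps routed through try_optional degrade. One design caveat: since set -e is ignored inside a step invoked this way, a real optional step would need to check its own intermediate commands explicitly.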

Option B: separate model-init services

Split the current model-init service into:

  • model-init-required: voice + embedding + whisper + Qwen artifacts (the things personas can't function without). Fatal on any failure.
  • model-init-optional: avatar VRM, LoRA tokenizers, fine-tuning assets. Best-effort. Reports degraded status to the runtime which surfaces it in the UI.

This pairs cleanly with Lane B-Docker (GPU profile modularization, #1084) — each service has its own health, failure domain, and restart boundary. Lane B-Docker can own the compose split.

Out of scope for this issue

Acceptance criteria

  1. model-init runs with TIER=full → Qwen artifacts land in the model volume even if avatar/optional download fails.
  2. Avatar/optional failures emit a structured degraded-status log line that the runtime can read.
  3. Health check exposes "required artifacts present" separately from "optional artifacts present" so the alpha gate can succeed with degraded avatars.
  4. Windows/RTX fresh install reaches model-ready state OR fails loud at a real Qwen failure, never silent at an avatar download.
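Criteria 2 and 3 could be met by a health check along the following lines. This is a sketch under stated assumptions: the artifact paths, the function names artifacts_state and model_health, and the key=value line format are all hypothetical; the real lists would come from the model registry.

```shell
#!/usr/bin/env bash
# Hypothetical artifact lists, relative to the model volume root.
REQUIRED_ARTIFACTS="whisper/base.bin qwen/model.gguf"
OPTIONAL_ARTIFACTS="avatars/vroid-female-base.vrm"

# Print "present" if every listed file exists and is non-empty under $1,
# otherwise print "missing".
artifacts_state() {
  local root="$1" rel
  shift
  for rel in "$@"; do
    [ -s "$root/$rel" ] || { echo "missing"; return 0; }
  done
  echo "present"
}

# Emit one structured status line. The exit status reflects required
# artifacts only, so a missing avatar does not fail the check.
model_health() {
  local root="$1" required optional
  # Unquoted on purpose: the lists are split on whitespace into arguments.
  required="$(artifacts_state "$root" $REQUIRED_ARTIFACTS)"
  optional="$(artifacts_state "$root" $OPTIONAL_ARTIFACTS)"
  echo "model-health required=$required optional=$optional"
  [ "$required" = "present" ]
}

# Demo: a volume with the required files but no avatar.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/whisper" "$demo_root/qwen"
echo data > "$demo_root/whisper/base.bin"
echo data > "$demo_root/qwen/model.gguf"
health_rc=0
health_line="$(model_health "$demo_root")" || health_rc=$?
echo "$health_line (exit $health_rc)"
rm -rf "$demo_root"
```

Keeping the two states in one line but tying the exit status to the required set lets the alpha gate pass with degraded avatars while the runtime can still parse the optional state for display.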

Cross-references

🤖 Generated with Claude Code

Labels: airc-queue (AIRC-backed agent work queue card)
