
fix(updater,dash): surface apply errors + auto-recover stale install dir #386

Merged
thinmintdev merged 1 commit into main from fix/update-button-feedback on May 28, 2026

@thinmintdev (Contributor)

Summary

  • Wires the Settings → Updates flow end-to-end so the Install update button no longer silently no-ops: the apply call now polls /api/updates/status/{id} until the job reaches a terminal state and toasts the backend's verdict, and the useUpdateCheck hook no longer POSTs to a GET-only route.
  • Auto-recovers stale /usr/lib/hal0/hal0-<v>/ extract dirs left behind by prior failed applies. A directory recognised as a hal0 install (by its VERSION file or a pyproject.toml with name="hal0") is moved aside to <dest>.stale-<unix-ts> so the retry succeeds; foreign non-empty dirs are still refused.
  • Surfaces errors via onError toasts on both the Install update button and the lemonade Check button.

What was broken

End-to-end repro on the LXC: Update available 0.3.1-alpha.1 → click Install update → success toast → nothing happens.

Root cause stack:

  1. Backend POST /api/updates/apply returns 202 with {id, state: "queued"}.
  2. UI shows "Update started" toast and never polls /status/{id}.
  3. The background job hits UpdateExtractError ("install dir already exists and is non-empty: /usr/lib/hal0/hal0-0.3.1-alpha.1", left over from a previous half-failed attempt) and transitions to state: "failed".
  4. The "Check" button on the lemonade row POSTs to /api/updates/check, which is GET-only, so the request fails with a silent 405.

Changes

  • ui/src/api/hooks/useUpdates.ts: fix the HTTP verb and argument shapes; add a useUpdateJob(jobId) poller (polls every 1.5 s, tolerates transient errors while the backend restarts itself, stops on applied/failed).
  • ui/src/dash/settings.jsx: wire applyM with onSuccess(setJobId) + onError(toast), render queued… / installing… inline, fire applied/failed toasts once.
  • src/hal0/updater/updater.py: _extract_tarball quarantines prior hal0 extractions instead of refusing; remove duplicate non-empty check in apply().
  • tests/updater/test_updater.py: refit the "refuses non-empty" test to require a foreign payload; add quarantine-and-retry test.
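The polling behaviour described for useUpdateJob can be sketched as a plain loop. This is a Python illustration of the same logic, not the TypeScript hook itself: `poll_update_job` and `fetch_status` are hypothetical names (the latter stands in for GET /api/updates/status/{id}), and the cap on consecutive errors is an assumption, since the PR only says transient errors during self-restart are tolerated.

```python
import time
from typing import Callable

# Terminal states named in the PR; anything else (e.g. "queued") counts as in progress.
TERMINAL_STATES = {"applied", "failed"}


def poll_update_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 1.5,
    max_consecutive_errors: int = 10,  # assumed cap; not specified in the PR
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job until it reaches a terminal state and return that snapshot.

    fetch_status is a hypothetical stand-in for GET /api/updates/status/{id}.
    Exceptions are treated as transient (e.g. the backend restarting itself
    mid-apply) until more than max_consecutive_errors happen in a row.
    """
    consecutive_errors = 0
    while True:
        try:
            status = fetch_status(job_id)
        except Exception:
            consecutive_errors += 1
            if consecutive_errors > max_consecutive_errors:
                raise  # give up and surface the last error to the caller
        else:
            consecutive_errors = 0
            if status.get("state") in TERMINAL_STATES:
                return status
        sleep(interval_s)


# Usage: a fake backend that is queued, briefly unreachable, then applied.
_responses = iter([
    {"id": "j1", "state": "queued"},
    ConnectionError("backend restarting"),
    {"id": "j1", "state": "applied"},
])


def fake_fetch(job_id: str) -> dict:
    item = next(_responses)
    if isinstance(item, Exception):
        raise item
    return item


result = poll_update_job(fake_fetch, "j1", sleep=lambda _s: None)
print(result["state"])  # applied
```

Resetting the error counter on every successful poll is what lets a restart blip through while a backend that stays down still surfaces an error instead of spinning forever.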

Test plan

  • pytest tests/updater/ — 34/34
  • pytest tests/api/test_updater_routes.py tests/api/test_typed_errors.py tests/api/test_stubs_return_envelope.py — 75/77 (2 skipped)
  • ruff format --check + ruff check clean on edited files
  • tsc --noEmit + npm run build green
  • Manual smoke on LXC: click Install update → see inline progress → terminal toast

🤖 Generated with Claude Code

The Settings → Updates "Install update" button silently no-op'd from
the user's POV: the apply endpoint returned 202 with a job id, the UI
showed a "started" toast and never polled, and the background job
crashed with UpdateExtractError ("install dir already exists and is
non-empty") because a prior failed attempt left /usr/lib/hal0/hal0-<v>/
populated. Three fixes:

1. ui/src/api/hooks/useUpdates.ts:
   - useUpdateCheck now GETs /api/updates/check (was POSTing to a
     GET-only route → 405).
   - useUpdateApply takes an optional `version` (was misnamed
     `channel`); the channel is implicit server-side.
   - New useUpdateJob(jobId) polls /api/updates/status/{id} until
     terminal so the UI can show "queued/installing…" inline and
     toast on applied/failed.

2. ui/src/dash/settings.jsx:
   - "Install update" calls applyM.mutate(undefined, …) with onError
     toast, captures the returned job id, and shows progress + the
     backend's verdict via useUpdateJob.
   - lemonade "Check" calls checkM.mutate(undefined) with onError;
     channel arg was meaningless.

3. src/hal0/updater/updater.py:
   - _extract_tarball now quarantines a prior hal0 extraction at the
     same path to <dest>.stale-<unix-ts> instead of refusing, so a
     retry after a half-failed apply isn't permanently wedged.
     Foreign non-empty dirs are still refused — we identify hal0
     installs by VERSION file or pyproject.toml name="hal0".
   - Removed the duplicate non-empty check in Updater.apply(); the
     extract step is the single source of truth.

Tests: updated test_apply_refuses_when_install_dir_exists_nonempty to
require a foreign payload; added test_apply_quarantines_stale_hal0_install
asserting the recovery path. 34/34 updater + 41/41 touched API tests
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev merged commit 0430190 into main on May 28, 2026
4 checks passed
thinmintdev deleted the fix/update-button-feedback branch on May 28, 2026 at 15:15
thinmintdev added a commit that referenced this pull request May 28, 2026
…#387)

PR #386 fixed the silent Install update bug; this records it in the
CHANGELOG (Unreleased → Fixed) and codifies the underlying pattern
in PLAN §9.

CHANGELOG: new Fixed section under Unreleased explaining the three
fixes (UI hook verbs/signatures + job poller, backend extract
quarantine, dedupe).

PLAN §9 (Update mechanism):
  - Note the new extract-time quarantine behavior for prior hal0
    extractions (no longer blocks retry after a half-failed apply).
  - New "Async-job API contract" subsection: any 202+job_id endpoint
    (updater apply, model pull, future jobs) requires the client to
    poll GET /status/{id} until terminal state and toast the
    verdict. A 202 ack alone is silently wrong — this was the entire
    root cause of #386.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev added a commit that referenced this pull request May 29, 2026
End-of-stream cut for v0.3. Bundles MCP-completion, memory-map redesign,
Settings → Updates fix (#386), silent-eviction dispatcher recovery (#392),
ADR-0020 OpenRouter callback skeleton (#409), persona spending-cap
primitive (#411), δ-harness Hermes coverage (#410), and the docs/internal
pin + dashboard-v3 walkthrough (#389/#390).

After this tag, active scope rolls to v0.4 (install-mode reconciliation
+ UI polish + fully-implemented Agents/UI/Install bootstrapped) and v0.5
(MCP admin + memory wiring across UI and agents).

CHANGELOG merged from two coexisting Unreleased blocks into a single
[v0.3.2-alpha.1] section; added missing entries for #392 (dispatcher),
#387 (async-job polling contract), and the docs PRs #389/#390.

pyproject 0.3.1-alpha.1 → 0.3.2-alpha.1. uv.lock resynced (was stuck at
0.3.0a1 from prior drift).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>