fix(toolbox/flm): pass -noert to XRT build to skip Vitis-only firmware #13

Merged: thinmintdev merged 4 commits into main from fix/flm-toolbox-ci-2026-05-15 on May 16, 2026

Conversation

@thinmintdev
Contributor

Summary

  • XRT's build.sh aborted after 0.3s with the error "XILINX_VITIS is undefined"
  • The XDNA2 NPU doesn't use the MicroBlaze ERT firmware path, so passing the -noert flag bypasses that requirement
  • Unblocks the FLM toolbox build → unblocks the manifest.json flm digest pin → unblocks release.yml (which refuses to publish if any toolbox digest is null)
  • Closes task #15 (chore(ci): reformat 3 files added by post-#2 PRs)
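
The null-digest gate mentioned above could look roughly like the following. This is a minimal sketch, not the actual release.yml logic: the manifest layout (a top-level `toolboxes` mapping with a `digest` field per entry) and the function name `null_digest_toolboxes` are assumptions made for illustration, and the example entries are placeholders.

```python
import json


def null_digest_toolboxes(manifest: dict) -> list[str]:
    """Return the names of toolboxes whose digest is missing, empty, or null.

    Assumes a hypothetical layout: {"toolboxes": {"<name>": {"digest": ...}}}.
    """
    toolboxes = manifest.get("toolboxes", {})
    return sorted(
        name
        for name, entry in toolboxes.items()
        if not (entry or {}).get("digest")
    )


# Placeholder manifest: one unpinned toolbox, one pinned with a dummy digest.
example = json.loads(
    '{"toolboxes": {"flm": {"digest": null},'
    ' "other": {"digest": "sha256:' + "0" * 64 + '"}}}'
)

unpinned = null_digest_toolboxes(example)
if unpinned:
    print("refusing to publish; unpinned toolboxes:", ", ".join(unpinned))
```

A release job built on this idea would exit non-zero whenever the returned list is non-empty, which is why a single null digest blocks publishing.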

Status: DRAFT until CI completes

Test plan

  • toolbox.yml dispatched run succeeds
  • manifest.json:18 shows real sha256:... digest (not null)
  • FLM image pulls + runs on hal0-test

🤖 Generated with Claude Code

thinmintdev and others added 2 commits May 16, 2026 00:42
Squashes 9 incremental fixes from fix/flm-toolbox-ci-2026-05-15 into a
single coherent Dockerfile change.  The continue-on-error mask on the
FLM matrix entry (see follow-up commit for #26) hid each failure as a
green check during the chase; this commit captures the full set of
working build deps in one place.

Highlights:
  - install rustc/cargo via rustup pinned to 1.85.0 (apt's 1.75 is too
    old for transitive crates like unicode-segmentation@1.13.2 which
    require >=1.85)
  - run upstream xrtdeps.sh for XRT system deps (boost, ncurses,
    systemd, opencl, ffmpeg, nasm)
  - cd into src/ for cmake (the build expects to run from there)
  - copy the XRT staging tree directly — ./build.sh -install is broken
    for our packaging path
  - pass -noert to skip Vitis-only firmware (irrelevant for runtime
    inference on consumer Strix Halo)

Result: the FLM build now actually completes end-to-end on the GitHub
runner, ~25-30 min.  Co-located here with the matrix unmasking so
future breaks surface as red checks instead of silent skipped uploads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The FLM matrix entry had `optional: "true"` paired with
`continue-on-error: ${{ matrix.optional == 'true' }}`, meaning a real
build failure surfaced as a green check with the upload step "skipped".

This mask hid ~9 cascading build failures across several CI runs today
(rustc version, boost deps, ncurses, opencl-headers, ffmpeg, build
context, XRT install path, ...).  Each green check made it look like
the toolbox image had been published when in fact no bytes shipped.

Removing the flag so the continue-on-error condition evaluates false
for FLM and real failures fail the matrix — the canonical fix
recommended by Team I after they spotted the masked-failure pattern
on run 25951943017.

ComfyUI keeps `optional: "true"` for now — it's been a true ENOSPC
flakiness issue, not a stuck build, and has a documented local-build
path until we move it to a larger runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
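
The masking behaviour described in this commit can be modelled in a few lines. This is a simplified sketch of the pattern as the commit message describes it, not a reimplementation of how GitHub Actions evaluates expressions; the function names and the status strings are hypothetical.

```python
def continue_on_error(matrix_entry: dict) -> bool:
    """Model of `matrix.optional == 'true'`: a missing key compares unequal."""
    return matrix_entry.get("optional") == "true"


def reported_status(matrix_entry: dict, build_succeeded: bool) -> str:
    """Status a reader of the checks list would see for one matrix entry."""
    if build_succeeded:
        return "success"
    # A failing build is hidden only when the entry opted into continue-on-error.
    return "success (masked)" if continue_on_error(matrix_entry) else "failure"


flm_before = {"name": "flm", "optional": "true"}  # flag present: failures masked
flm_after = {"name": "flm"}                       # flag removed: failures surface

print(reported_status(flm_before, build_succeeded=False))
print(reported_status(flm_after, build_succeeded=False))
```

Dropping the `optional` key is enough to flip the condition to false, so the workflow expression itself does not need to change.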
@thinmintdev thinmintdev force-pushed the fix/flm-toolbox-ci-2026-05-15 branch from e708eef to 2bd1020 Compare May 16, 2026 04:42
thinmintdev and others added 2 commits May 16, 2026 01:10
Ubuntu 24.04's base image ships a default `ubuntu` user at uid 1000
with a matching group at gid 1000.  The hal0 user runs at uid/gid 1000
deliberately (matches the host hal0 user so bind-mounted model caches
under /var/lib/hal0 don't end up owned by nobody), so the groupadd
fails with "GID 1000 is not unique" → exit 4.

Single-line prefix to the user-setup block: drop the conflicting ubuntu
user before claiming uid/gid 1000.  `|| true` keeps the build working
on base images that don't ship the ubuntu user (e.g. Debian).

Found post-rebase by Team I on run 25952734725 — surfaced cleanly now
that the continue-on-error mask is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ghcr.io/hal0ai/hal0-toolbox-flm:v1 @
sha256:6ef99c2f202a0166b3034d474726ba49a36093b16f26e6e472607876a715e690

Sourced from the manifest-pinned artifact produced by toolbox.yml run
25953541525 on cd3fd15, where the FLM job completed end-to-end
(Build & push → cosign sign → digest emit → digest upload) with no
mask suppressing the result. Unblocks task #15 and lets release.yml's
null-digest gate pass for the v1 RC.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
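
A pinned reference of the form shown in this commit (image and tag, then `@`, then a `sha256:` digest) can be sanity-checked before it is written into a manifest. The sketch below is an illustrative assumption rather than part of the repository's tooling; `parse_pinned` is a hypothetical helper, and it only checks the shape of the string, not that the digest exists in the registry.

```python
import re

# image[:tag], optional whitespace around "@", then sha256: plus 64 lowercase hex chars
PINNED = re.compile(r"^(?P<image>[^\s@]+)\s*@\s*(?P<digest>sha256:[0-9a-f]{64})$")


def parse_pinned(ref: str):
    """Return (image, digest) for a well-formed pinned reference, else None."""
    normalized = " ".join(ref.split())  # collapse line breaks and repeated spaces
    match = PINNED.match(normalized)
    if match is None:
        return None
    return match.group("image"), match.group("digest")


ref = (
    "ghcr.io/hal0ai/hal0-toolbox-flm:v1 @\n"
    "sha256:6ef99c2f202a0166b3034d474726ba49a36093b16f26e6e472607876a715e690"
)
print(parse_pinned(ref))
```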
@thinmintdev thinmintdev marked this pull request as ready for review May 16, 2026 06:23
@thinmintdev thinmintdev merged commit ef8a51e into main May 16, 2026
3 of 6 checks passed
@thinmintdev thinmintdev deleted the fix/flm-toolbox-ci-2026-05-15 branch May 16, 2026 06:23