feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#9769
Conversation
📝 Walkthrough

Adds coordinated host binfmt_misc handling to prefer kernel-native 32-bit ARM ELF on aarch64 hosts, conditional qemu deploy/undeploy behavior, and invokes the setup during rootfs creation and image build; setup failures are ignored.

Changes

Native armhf via binfmt_elf mode
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Build as Build Flow
    participant Setup as _native_armhf_setup_binfmt_elf()
    participant Binfmt as /proc/sys/fs/binfmt_misc
    participant Trap as Cleanup Handler
    participant Deploy as deploy_qemu_binary_to_chroot()
    Build->>Setup: invoke (post-rootfs extraction)
    Setup->>Binfmt: read qemu-arm entry/state
    alt qemu-arm enabled
        Setup->>Binfmt: write disable qemu-arm
        Setup->>Trap: register restore handler
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    else qemu-arm absent_or_disabled
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    end
    Build->>Deploy: proceed to deploy step
    Deploy->>Deploy: check ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF
    alt native mode set
        Deploy-->>Build: early return (skip qemu copy)
    else
        Deploy->>Build: copy qemu-arm-static into chroot
    end
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 20-25: The early return in the qemu-static check (when
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF == "yes") prevents the matching undeploy
from running later, leaking host qemu into cached rootfs; instead of returning
from the function, set a clear marker variable (e.g.,
ARMBIAN_QEMU_DEPLOY_SKIPPED=yes or ARMBIAN_QEMU_DEPLOYED=no) after calling
display_alert and allow the function to exit normally so the caller
(rootfs-create.sh) can always run the undeploy path and check that marker;
update the undeploy logic to look for that same marker
(ARMBIAN_QEMU_DEPLOY_SKIPPED or ARMBIAN_QEMU_DEPLOYED) so it only skips removal
when deploy truly never ran.
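The marker-variable idea the comment sketches reduces to a small control-flow demo. This is a minimal sketch, not the real framework code: `ARMBIAN_QEMU_DEPLOYED`, the temp-dir "chroot", and the `touch`/`rm` stand-ins for the actual qemu copy are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: record the deploy decision instead of returning early, so undeploy
# can distinguish "never deployed" from "deployed, must remove".
set -euo pipefail

deploy_qemu_binary_to_chroot() {
	local chroot_target="$1"
	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
		ARMBIAN_QEMU_DEPLOYED=no # marker, checked by undeploy
		return 0
	fi
	touch "${chroot_target}/qemu-arm-static" # stands in for the real copy
	ARMBIAN_QEMU_DEPLOYED=yes
}

undeploy_qemu_binary_from_chroot() {
	local chroot_target="$1"
	if [[ "${ARMBIAN_QEMU_DEPLOYED:-no}" != "yes" ]]; then
		return 0 # deploy truly never ran; nothing to remove
	fi
	rm -f "${chroot_target}/qemu-arm-static"
}

chroot_dir="$(mktemp -d)"
# The flag flips between deploy and undeploy -- the asymmetry from the review.
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=no
deploy_qemu_binary_to_chroot "$chroot_dir"
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
undeploy_qemu_binary_from_chroot "$chroot_dir"
[[ ! -e "${chroot_dir}/qemu-arm-static" ]] && echo "no leak"
rm -rf "$chroot_dir"
```

Because undeploy keys on what deploy actually did, the binary cannot leak into the cached rootfs even when the flag flips mid-build.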
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: eba7fdae-710a-42e6-bcb9-0d41993767df

📒 Files selected for processing (4)
- lib/functions/main/rootfs-image.sh
- lib/functions/rootfs/qemu-static.sh
- lib/functions/rootfs/rootfs-create.sh
- lib/library-functions.sh
Force-pushed: 502287f → 6f7903b
Actionable comments posted: 1
♻️ Duplicate comments (1)
lib/functions/rootfs/qemu-static.sh (1)
20-25: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deploy/undeploy asymmetry can leak host `qemu-arm-static` into the rootfs cache.

The flag check is symmetric inside this file, but at the call sites (`rootfs-create.sh`, `rootfs-image.sh` per the summary), `deploy_qemu_binary_to_chroot` runs before `_native_armhf_setup_binfmt_elf` flips `ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes`, while `undeploy_qemu_binary_from_chroot` runs after. In that ordering the binary gets copied into the chroot, but the matching removal is suppressed by the new early-return at Line 56. The host `qemu-arm-static` then persists into the rootfs cache and into final armhf images on subsequent cache-hit builds.

Track per-(caller, chroot_target) deployment state and only skip undeploy when deploy actually skipped, rather than gating both on the flag alone.
🛠️ Suggested fix
```diff
+declare -gA armbian_qemu_deployed_to_chroot=()
+
 function deploy_qemu_binary_to_chroot() {
 	declare chroot_target="${1}" caller="${2}"
@@
 	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
 		return 0
 	fi
@@
 	display_alert "Deploying qemu-user-static binary to chroot" "${QEMU_BINARY} during ${caller}" "info"
 	run_host_command_logged cp -pv "${src_host}" "${dst_target}"
+	armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]=yes
 	return 0
 }
@@
 function undeploy_qemu_binary_from_chroot() {
@@
-	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
+	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" \
+		&& "${armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]:-no}" != "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
 		return 0
 	fi
```

Also applies to: 55-59
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 169-188: Move the add_cleanup_handler call so it runs before
writing "0" to /proc/sys/fs/binfmt_misc/qemu-arm (register the trap before the
destructive action) and update trap_handler_native_armhf_restore_qemu_arm to not
early-return based on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF; instead key the
cleanup on the actual binfmt_misc state (check [[ -e
/proc/sys/fs/binfmt_misc/qemu-arm ]] and attempt echo 1 >
/proc/sys/fs/binfmt_misc/qemu-arm || true), so a partially completed setup or an
interrupt between the disable and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF assignment
will still re-enable qemu-arm.
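A minimal sketch of that ordering, with a temp file standing in for `/proc/sys/fs/binfmt_misc/qemu-arm` so it runs without CAP_SYS_ADMIN; the subshell's early `exit` plays the role of an interrupt landing between the disable and the flag assignment.

```shell
#!/usr/bin/env bash
# Sketch only: a temp file stands in for /proc/sys/fs/binfmt_misc/qemu-arm.
set -euo pipefail

state="$(mktemp)"
echo 1 > "$state" # qemu-arm currently enabled

(
	# 1. Register the restore handler BEFORE the destructive write, and key
	#    it on the entry itself rather than on a flag that may never be set.
	trap '[[ -e "$state" ]] && echo 1 > "$state" || true' EXIT
	# 2. Only now perform the destructive action.
	echo 0 > "$state"
	# Simulated interrupt: we die before ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF
	# would have been assigned -- the trap still restores the entry.
	exit 7
) || true

read -r restored < "$state"
echo "restored=${restored}"
rm -f "$state"
```

Because the trap fires on any exit path and inspects the entry rather than an env flag, a partially completed setup still re-enables qemu-arm.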
Force-pushed: eab2ab1 → d990278
🧹 Nitpick comments (1)
lib/functions/rootfs/qemu-static.sh (1)
131-134: 💤 Low value | Optional: case-insensitive killswitch matching.

`NATIVE_ARMHF_ON_ARM64=NO` (or `Disabled`, `False`, etc.) would slip past the killswitch and proceed to detection. This is a common ergonomic gotcha when users export env vars from CI configs. Low priority since the function self-aborts on the subsequent guards anyway, but a one-liner makes the contract explicit.

♻️ Optional refactor

```diff
-	# Killswitch — accept no/never/disabled as synonyms; bail before any detection runs.
-	case "${NATIVE_ARMHF_ON_ARM64:-auto}" in
-		no | never | disabled) return 1 ;;
-	esac
+	# Killswitch — accept no/never/disabled (any case) as synonyms; bail before any detection runs.
+	case "${NATIVE_ARMHF_ON_ARM64,,}" in
+		no | never | disabled) return 1 ;;
+	esac
```
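For reference, bash's `${var,,}` expansion gives the case-insensitive match in one line. A standalone sketch — the function name `native_armhf_killswitch` is made up for illustration; the real code inlines the `case` statement:

```shell
#!/usr/bin/env bash
# Demonstrates case-lowering for the killswitch pattern (bash >= 4).
set -euo pipefail

native_armhf_killswitch() {
	local v="${NATIVE_ARMHF_ON_ARM64:-auto}"
	case "${v,,}" in # ${v,,} lowercases, so NO / Never / Disabled match too
		no | never | disabled) return 1 ;;
	esac
	return 0
}

NATIVE_ARMHF_ON_ARM64="Disabled"
if native_armhf_killswitch; then echo "native path"; else echo "killed"; fi
# prints: killed
```

An unset variable falls back to `auto`, which matches none of the kill patterns, so the default behavior is preserved.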
Re: nitpick on the case-insensitive killswitch matching: …
Force-pushed: d990278 → 15789ee
Those who use arm64 builders — try it!
aarch64 host (CIX). Always fails:
Force-pushed: 15789ee → df7aaf9
@igorpecovnik thanks for the report.

Root cause: the pre-flight `arch-test` ran before the binfmt_misc disable, so it passed via qemu emulation instead of reflecting true native capability.

Fix: `arch-test` moved to AFTER the binfmt_misc disable, so it reflects what the chroot exec will actually face. On detection failure we restore qemu-arm immediately and fall back to qemu-static emulation transparently.

After this fix lands on your builder, your build will work as before — slowly, via qemu-arm-static, no regression. To get the speedup, build your builder host's kernel with CONFIG_COMPAT.
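The ordering described here can be sketched in a few lines. Everything below is a stand-in: the temp file models the binfmt_misc entry and `probe_32bit_arm` models `arch-test` on a host without 32-bit support, so the sketch exercises the restore-and-fallback path.

```shell
#!/usr/bin/env bash
# Order-of-operations sketch: probe AFTER the disable, roll back on failure.
set -euo pipefail

entry="$(mktemp)"
echo 1 > "$entry" # qemu-arm registered and enabled

probe_32bit_arm() { return 1; } # stand-in for arch-test on a non-compat host

echo 0 > "$entry"     # disable qemu-arm first ...
if probe_32bit_arm; then # ... then test what chroot exec will actually face
	echo "native armhf active"
else
	echo 1 > "$entry" # restore immediately, fall back to qemu
	echo "fallback to qemu-arm-static"
fi
```

Probing after the disable removes the false positive: qemu can no longer "answer" the probe on the handler's behalf.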
…lel builds
PR review by codex (gpt-5.5) flagged the missing synchronisation around the
shared /proc/sys/fs/binfmt_misc/qemu-arm flag as the main blocker for shipping
NATIVE_ARMHF_ON_ARM64 default-on. The previous refcount-less code disabled
qemu-arm globally in setup and unconditionally re-enabled it in cleanup —
two parallel builds on the same host could:
* have build A disable qemu-arm and register restore;
* have build B observe 'already disabled', set
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes and skip copying
qemu-arm-static into its own chroot;
* have build A finish, cleanup blindly write 1 → qemu-arm enabled;
* have build B continue with no qemu binary in chroot but
qemu-arm globally registered — runtime failure.
Replace with a per-host owner-flock pattern (codex-recommended; refcount/
pidlist breaks under PID namespaces and PID reuse, kernel fd-release on
process exit gives crash safety for free):
* /run/lock/armbian-native-armhf/owners/<ARMBIAN_BUILD_UUID>.lock —
one file per builder, long-lived flock held for the whole native-mode
window. Closing the fd (process exit, normal or crash) releases the
lock; another builder probing it with non-blocking flock detects the
death and prunes the file.
* control.lock — short-lived flock around prune+count+state-write so
first/last-owner decisions stay consistent under concurrency.
* Live owner count = number of owner files we can NOT acquire
non-blocking. Refcount is not used as source of truth.
* First owner records prior qemu-arm state (1 / 0 / missing) into
prior-qemu-arm-state and only writes 0 if prior was 1. Subsequent
owners join without state mutation.
* Last owner restores from the saved prior — never blindly writes 1.
If prior was already disabled or the handler was missing, leave it
alone. State divergence (admin or another tool changed qemu-arm
during the active window) emits a warning, not a fight.
Per-host scope by design — binfmt_misc is per-kernel state, so cross-host
NFS coordination would only serialise independent kernels for no benefit.
The lockdir lives under /run/lock so it is local-to-host and host-shared
(the Armbian build container bind-mounts /run from the host, satisfying
the requirement). Documented in the function header comment.
Other review items addressed:
* The 'caller sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes' comment was
inaccurate (the function sets it itself) — fixed.
* Function header rewritten to spell out the concurrency model and
Docker requirement explicitly.
Reported-by: codex (PR #9769 review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
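The owner-file liveness probe at the heart of this pattern can be demonstrated with util-linux `flock`. Paths are illustrative (a mktemp dir instead of `/run/lock/armbian-native-armhf`); the point is that a held lock makes the non-blocking probe fail, and kernel fd-release on process exit makes it succeed.

```shell
#!/usr/bin/env bash
# Sketch of the owner-lock liveness probe (illustrative paths, not the real
# /run/lock/armbian-native-armhf tree).
set -euo pipefail

lockdir="$(mktemp -d)"
owner="${lockdir}/builder-A.lock"

probe() {
	# A prober that CAN take an exclusive non-blocking flock knows the
	# owner fd is gone (process exited or crashed) and may prune the file.
	if flock -n -x "$owner" -c true; then echo stale; else echo alive; fi
}

exec 9> "$owner" # builder A: long-lived fd for the whole native-mode window
flock -x 9       # hold the owner lock on that fd

r1="$(probe)" # while A lives: the probe fails -> "alive"
exec 9>&-     # A exits; the kernel releases the lock with the fd
r2="$(probe)" # now the probe succeeds -> "stale", safe to prune

echo "$r1 $r2"
rm -rf "$lockdir"
```

This is why the commit message calls fd-release "crash safety for free": no userspace cleanup has to run for a dead owner to become observable.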
…9f27

Codex review of the previous lock-pattern commit (5359f27) found three real holes:

1. trap_handler released the owner lock BEFORE taking the control lock. Between fd close and flock(control) a fresh builder could enter setup, prune+count and see live_count == 1 (only itself), then observe the now-disabled qemu-arm and write '0' as its prior_state. Our subsequent restore would read that overwritten state and never re-enable qemu-arm — turning a coordinated handoff into permanent disablement.

   Fix: take control.lock first, release owner UNDER control, then count and restore. The whole transition is now atomic.

2. The post-disable arch-test failure path released the owner lock manually and then called the trap handler. The handler's first guard ([[ -n ${_native_armhf_owner_fd:-} ]] || return 0) saw the already-cleared global and exited without restore — leaving qemu-arm disabled and the state file leaking.

   Fix: drop control.lock and let the handler do the release+count+restore under its own control lock, matching the normal cleanup path.

3. Stale active window after process kill (SIGKILL / OOM / kill -9 between echo 0 and trap firing): kernel releases the fd so the owner lock is gone, but the state file remains. The next first-owner would observe the leftover-disabled qemu-arm and record disabled as its new prior_state — clean shutdown then never re-enables.

   Fix: in the first-owner branch, before saving prior_state, check for an existing state file. If found and prior was '1', reconcile qemu-arm back to enabled and remove the stale file before observing our own prior_state. Crash recovery is now self-healing on the next build.

Same edit additionally cleans up the disable-failure case: when 'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), no state actually changed, so leave state file and owner lock alone — the cleanup handler will observe current==prior==1 and no-op the restore.

The previous code released owner outside control and rm-ed the state file, both of which introduced the same race the trap_handler change closes elsewhere.

Reported-by: codex (PR #9769 lock-impl review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
…3d4c8cd

Codex re-review of fix2 (commit 3d4c8cd) confirmed the original three races are properly closed but identified two new ones, both producing the same class of runtime failure: a parallel builder ends up running chroot exec against globally-enabled qemu-arm with no qemu binary inside its chroot.

1. Disable-failure path left a live owner. When 'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), fix2 chose to leave the owner lock and the prior-state file in place, on the theory that the cleanup handler would no-op the restore at exit. But the owner stays counted as live for the entire build duration, so a concurrent builder enters its own setup, sees live_count > 1, takes the join branch, skips qemu binary deployment, and fails at chroot exec time.

   Fix: a failed disable now performs a full abort under the held control lock — rm state file, release owner, drop control, return 1. The aborted setup leaves no observable native window for anyone to join.

2. Join path did not verify the actual kernel state. fix2 trusted that live_count > 1 implies qemu-arm is disabled. But an external agent (admin running 'update-binfmts --enable qemu-arm', a stray service) can flip it back on during the active window. Our pre-flight arch-test then passes as a false positive — qemu interprets the ARM stub — and we skip qemu binary deployment, hitting the same runtime failure as case 1.

   Fix: in the join branch, observe the current qemu-arm state explicitly. If it is enabled, this is not a normal join — abort to qemu fallback with a warning naming the external mutator. This is a one-shot invariant check at join time, not a periodic re-verify (which codex previously argued against).

Reported-by: codex (PR #9769 lock-impl re-review)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
Force-pushed: 3d4c8cd → 4998f52
Actionable comments posted: 1
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 324-331: The else branch that runs when live_count > 1 must not
assume qemu-arm is already disabled; before registering the join behavior
(trap_handler_native_armhf_restore_qemu_arm, display_alert and setting
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF), read the actual qemu-arm state (the same
state/sysfs/binfmt path used when writing "0"/"1") and verify it is "0"
(disabled); if it is not "0", fail fast (error out) rather than joining, so the
subsequent arch-test armhf and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF logic do not
incorrectly skip deploying qemu-arm-static. Ensure you reference the same state
file/variable used by the disable code path and keep the cleanup/restore handler
only for true joiners.
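The one-shot invariant check can be sketched as follows; the temp file and `try_join` are illustrative stand-ins for the real binfmt entry and join branch.

```shell
#!/usr/bin/env bash
# Join-branch invariant sketch: never trust live_count alone, read the entry.
set -euo pipefail

entry="$(mktemp)"

try_join() {
	local state
	read -r state < "$entry" || state="missing"
	if [[ "$state" != "0" ]]; then
		echo "refusing to join: qemu-arm is '${state}', not disabled" >&2
		return 1 # caller falls back to deploying qemu-arm-static
	fi
	echo "joined"
}

echo 1 > "$entry" # an external agent re-enabled qemu-arm mid-window
try_join || echo "fell back to qemu deploy"

echo 0 > "$entry" # normal case: first owner already disabled the entry
try_join
```

Failing fast here keeps the invariant local: a joiner that skips qemu deployment has personally verified the handler is off.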
Force-pushed: 4998f52 → 152d4f6
🧹 Nitpick comments (1)
lib/functions/main/rootfs-image.sh (1)
19-19: ⚡ Quick win | Make the non-fatal fallback explicit instead of silently swallowing setup failure.

At Line 19, `|| true` preserves behavior, but it hides when native armhf activation fails. Please wrap this in an explicit `if ! ...; then display_alert ...; fi` so operators can see when the build falls back to qemu-static.

Suggested change

```diff
-	_native_armhf_setup_binfmt_elf || true
+	if ! _native_armhf_setup_binfmt_elf; then
+		display_alert "Native armhf via binfmt_elf unavailable" "Falling back to qemu-static path" "warn"
+	fi
```

Based on learnings: In the Armbian build framework, scripts run with `set -e`, so masked failures should be explicitly handled and not silently ignored.
@CodeRabbit review
✅ Actions performed: Review triggered.
@coderabbitai re your nitpick on …: wrapping the call in … Happy to add an inline comment in ….
The inline comment offer sounds like a good idea — even a one-liner like …

✏️ Learnings added
Force-pushed: 152d4f6 → 5b04967
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/functions/rootfs/qemu-static.sh (1)
527-543: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let native `arch-test` skip `qemu-arm` registration here.

This helper is only entered when `qemu-arm` is missing from `binfmt_misc` or `/usr/share/binfmts`, but `arch-test arm` can still pass natively on a compat-capable aarch64 host. In that case this branch returns without importing/enabling `qemu-arm`, even though the new flow in `rootfs-create.sh` still relies on `mmdebstrap` having that handler available until the later `_native_armhf_setup_binfmt_elf()` switchover. The result is a native-capable host that never registers `qemu-arm` for bootstrap and can break pre-libc maintainer-script execution during `mmdebstrap`.

Possible fix
```diff
-	# to check, we use arch-test; if will return 0 if _either_ the host can natively run armhf, or if qemu-arm is correctly working.
-	if arch-test arm; then
-		display_alert "Host can run armhf natively or emulation is correctly setup already" "no need to enable qemu-arm" "debug"
-	else
-		display_alert "arm64 host can't run armhf natively" "importing enabling qemu-arm" "debug"
-		cat <<- BINFMT_ARM_MAGIC > /usr/share/binfmts/qemu-arm
+	# If we got here, the qemu-arm binfmt registration is incomplete. Even on
+	# hosts that can later run armhf natively, mmdebstrap still needs qemu-arm
+	# registered during bootstrap; `_native_armhf_setup_binfmt_elf()` disables it
+	# only afterwards.
+	display_alert "Ensuring qemu-arm binfmt for bootstrap" "importing/enabling qemu-arm" "debug"
+	cat <<- BINFMT_ARM_MAGIC > /usr/share/binfmts/qemu-arm
 		package qemu-user-static
 		interpreter /usr/bin/qemu-arm-static
 		magic \x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00
 		offset 0
 		mask \xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff
 		credentials yes
 		fix_binary no
 		preserve yes
 	BINFMT_ARM_MAGIC
-		run_host_command_logged update-binfmts --import "qemu-${wanted_arch}"
-		run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}"
+	run_host_command_logged update-binfmts --import "qemu-${wanted_arch}"
+	run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}"
-		# Test again using arch-test.
-		display_alert "Checking if arm 32-bit emulation on arm64 works after enabling" "qemu-arm emulation" "info"
-		run_host_command_logged arch-test arm
-		display_alert "arm 32-bit emulation on arm64" "has been correctly setup" "cachehit"
-	fi
+	# Test again using arch-test.
+	display_alert "Checking if arm 32-bit emulation on arm64 works after enabling" "qemu-arm emulation" "info"
+	run_host_command_logged arch-test arm
+	display_alert "arm 32-bit emulation on arm64" "has been correctly setup" "cachehit"
```
@coderabbitai re your outside-diff comment on …: the helper is gated by line 505 in …. The scenario you describe (compat-capable aarch64 + …) …

If you'd like the defensive hardening regardless (always import+enable on this code path so a custom-stripped host still works), that's a reasonable separate PR — but it's pre-existing legacy code unrelated to native armhf and not introduced by this change.
@igorpecovnik the fix is in …
Works now. |
Force-pushed: 5b04967 → e51b530
Actionable comments posted: 1
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 475-486: The cleanup branch for last-owner in
_native_armhf_cleanup_last_owner currently treats a prior_state of "0" as "leave
as is" and only logs, but must actively restore that saved disabled state if the
host was re-enabled during the window; modify the last-owner branch that handles
prior_state (reference: _NATIVE_ARMHF_PRIOR_STATE and function
_native_armhf_cleanup_last_owner) to call the same restore/unregister logic used
for other prior_state values when prior_state == "0" (use
_native_armhf_observe_qemu_arm_state to check current state and run the
unregister/disable sequence for qemu-arm, and log with display_alert on success
or failure), ensuring the host binfmt_misc is reset to the saved state rather
than left changed by another tool.
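A sketch of the restore-from-saved-state behavior the comment asks for, with illustrative file and variable names: a saved prior state of "0" is treated as actionable, not merely logged.

```shell
#!/usr/bin/env bash
# Last-owner restore sketch: drive the restore from the SAVED prior state,
# including prior == "0" (re-disable if something re-enabled the entry
# mid-window). The temp file stands in for the binfmt entry.
set -euo pipefail

entry="$(mktemp)"
echo 1 > "$entry" # current state: enabled (flipped by another tool)
prior_state="0"   # what the first owner observed at setup time

read -r current < "$entry"
if [[ "$current" != "$prior_state" ]]; then
	echo "state diverged (now ${current}, saved ${prior_state}) - restoring" >&2
	echo "$prior_state" > "$entry" # actively restore, don't just warn
fi

read -r final < "$entry"
echo "$final"
```

Restoring unconditionally from the saved value makes the cleanup idempotent: whatever happened mid-window, the host leaves the build in the state it entered with.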
Force-pushed: e51b530 → fe2733b
Force-pushed: 42770de → 7b782ce
@coderabbitai full review please. Branch force-pushed and squashed into a single commit (…).
✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around lines 329-341: the early `continue` after calling `_native_armhf_setup_binfmt_elf` lets the killswitch path (`NATIVE_ARMHF_ON_ARM64=no`) bypass the later disabled-state guard and skip `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target`. Change the caller to `continue` only when `_native_armhf_setup_binfmt_elf` reports an actual active-native success (an explicit success exit code); when the function returns because of the killswitch (use a distinct exit code or status), fall through instead, so the later `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` call and the disabled-state guard still run. Apply the same fix to the analogous block around `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` (lines 356-371), and ensure `_native_armhf_setup_binfmt_elf` and its callers agree on the exit-code semantics.
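One way to make the caller and the setup function agree on exit-code semantics, sketched with a hypothetical status constant. The function names carry a `_sketch` suffix and the real binfmt handling is elided; only the contract between caller and callee is shown.

```shell
# Hypothetical contract: 0 = native active (caller may skip the qemu path),
# 3 = killswitch declined (caller must fall through to the qemu path).
NATIVE_RC_KILLSWITCH=3

_native_armhf_setup_binfmt_elf_sketch() {
	case "${NATIVE_ARMHF_ON_ARM64:-yes}" in
		no | never | disabled) return "$NATIVE_RC_KILLSWITCH" ;;
	esac
	# ... actual binfmt_misc handling elided in this sketch ...
	return 0
}

caller_sketch() {
	local rc=0
	_native_armhf_setup_binfmt_elf_sketch || rc=$?
	if [[ "$rc" -eq 0 ]]; then
		echo "native" # continue: skip qemu registration
	elif [[ "$rc" -eq "$NATIVE_RC_KILLSWITCH" ]]; then
		echo "qemu"   # fall through: prepare_host_binfmt... still runs
	else
		echo "error"
	fi
}
```

Capturing the return code explicitly (`|| rc=$?`) keeps the pattern safe under `set -e`, which the build framework uses.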
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6912ee78-6640-4866-9e0d-f7c2e50ad718
📒 Files selected for processing (3)
- lib/functions/main/rootfs-image.sh
- lib/functions/rootfs/qemu-static.sh
- lib/functions/rootfs/rootfs-create.sh
When building an armhf image on an aarch64 host that supports 32-bit ARM execution (kernel CONFIG_COMPAT, CPU has 32-bit user mode), disable the qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation on Cortex-A53/A55/A72/A73/A76.

Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params hook), which enabled COMPAT_VDSO in arm64 kernels.

The activation point is delayed until AFTER mmdebstrap. Its cross-arch path extracts base packages and runs their postinst hooks, which require a working qemu-arm registration to populate the chroot before libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to. The speedup applies to all subsequent chroot operations (install_distribution_specific, customize_image, update-initramfs).

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled).

Concurrent armbian builds on the same host kernel are coordinated purely via kernel BSD flock on the binfmt entry itself — no userspace state, no per-builder lockfiles, no /run/lock directory tree:

- Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a long-lived fd. The kernel's BSD-flock holder count is the refcount; the kernel releases the fd on process exit (crash-safe).
- First arrival idempotently `echo 0` to disable. Subsequent arrivals observe 0 and proceed without writing.
- On exit, release LOCK_SH; last-out detects via LOCK_EX|LOCK_NB on a fresh fd, which succeeds iff zero other LOCK_SH holders. Last-out re-enables qemu-arm.

Trade-offs (documented):

- Prior qemu-arm state is not recorded across independent builds. Last-out unconditionally re-enables; an admin's pre-existing `disabled` policy is not preserved.
- No defense against an external agent toggling qemu-arm mid-build. Documented: either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds that should not interfere.
- Cleanup ordering invariant: BSD flock is per open file description, so a forked subshell inheriting our SH fd shares the same lock. add_cleanup_handler runs handlers in registration order; we register after the umount/SDCARD/MOUNT teardown handlers, so by the time we run, the docker container is killed and its child tree (with our inherited fd) is gone. Documented in the trap handler's docstring; if a future refactor breaks the ordering, POSIX F_SETLK or an explicit descendant-kill are the documented escape hatches.

Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble):

- baseline (rootfs cache-miss): 60:35
- this PR (rootfs cache-miss): 19:27 (3.12x)
- this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit)

Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two independent armbian worktrees running concurrently on the same host; both enter the native path, the kernel SH-holder count keeps qemu-arm disabled across both, the first to finish suppresses restore (its LOCK_EX|LOCK_NB probe is blocked by the other's SH), and the last to finish restores qemu-arm to enabled. SIGINT mid-chroot also cleans up correctly.

Reviewed iteratively with codex (gpt-5.5) and coderabbitai. An earlier revision of this work coordinated via a userspace owner-flock directory (~250 lines; it depended on /run/lock being host-bind-mounted into build containers, which Armbian docker does not do); it was replaced with the kernel-flock-on-binfmt-entry approach above.

Assisted-by: Claude:claude-opus-4.7
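The idempotent first-arrival disable can be sketched like this. The function name is a hypothetical stand-in, and the entry path is a parameter so the sketch runs against any file; on the real entry the kernel renders state as `enabled`/`disabled` and accepts `0`/`1` writes.

```shell
disable_qemu_arm_entry() {
	# Idempotent: the first arrival disables; later arrivals observe the
	# already-disabled state and skip the write.
	local entry="${1:-/proc/sys/fs/binfmt_misc/qemu-arm}"
	if head -n1 "$entry" | grep -q '^enabled'; then
		echo 0 > "$entry"
		echo "disabled-now"
	else
		echo "already-disabled"
	fi
}
```

Because every arrival takes the same observe-then-write path, no coordination beyond the kernel flock refcount is needed to decide who writes.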
Force-pushed 7b782ce to 9d8b848
…ative builder probes EX-NB before disabling qemu-arm

The K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes an SH lock on /proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning immediately. The N-builder (default) probes EX-NB on a fresh fd before acquiring its own SH; if the probe fails AND qemu-arm is observably enabled (state==1), a K-builder holds the emulation anchor, and switching to native mid-flight would corrupt their qemu-arm-static routing — refuse with exit_with_error.

Symmetric closure of the K↔N race:

- N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host guard (addressed in 9d8b848).
- K→N: N sees the EX-NB probe fail with state=1 → fail-fast with 'concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit).

Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8):

- N first, K second: K exits after 8s with 'qemu-arm globally disabled by another concurrent build'.
- K first, N second: N exits after 12s with 'cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'.
- Cleanup of both K (release SH) and N (last-out restore: echo 1) leaves qemu-arm in the 'enabled' state.

Assisted-by: Claude:claude-opus-4.7
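The K→N guard described in this commit can be sketched roughly as follows. The function name and messages are illustrative assumptions, and a plain file stands in for the binfmt entry so the sketch is runnable without root.

```shell
native_entry_guard() {
	# N-builder pre-check: probe EX-NB on a fresh fd. If the probe fails while
	# the entry still reads enabled, a K-builder anchors emulation: refuse.
	# A failed probe with the entry disabled just means other N-builders hold
	# SH, which is fine.
	local entry="$1" probe_fd
	exec {probe_fd}<"$entry"
	if ! flock -xn "$probe_fd"; then
		if head -n1 "$entry" | grep -q '^enabled'; then
			exec {probe_fd}<&-
			echo "refuse: concurrent NATIVE_ARMHF_ON_ARM64=no build holds emulation lock" >&2
			return 1
		fi
	fi
	exec {probe_fd}<&- # closing the fd also drops the probe lock
	echo "proceed"
}
```

The probe must be on a fresh fd: reusing an fd that already holds SH would convert rather than test the lock.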
Summary

Continues #9284 (`arm64-compat-vdso` extension + `custom_kernel_make_params` hook). That PR enabled `COMPAT_VDSO` in arm64 kernels; this PR uses the resulting 32-bit-capable kernels to run 32-bit ARM ELF directly instead of through `qemu-arm` emulation in chroot work — typically ~12× faster than `qemu-arm-static` on Cortex-A53/A55/A72/A73/A76.

How

Disable `qemu-arm` in `/proc/sys/fs/binfmt_misc/` for the duration of the build; the kernel's `binfmt_elf` then handles 32-bit ARM ELFs natively via `CONFIG_COMPAT`. Activation is delayed until AFTER `mmdebstrap`, whose cross-arch path needs a working `qemu-arm` registration to populate the chroot before `libc`/`ld-linux-armhf.so.3` exist.

Killswitch: `NATIVE_ARMHF_ON_ARM64=no` (synonyms `never`, `disabled`).

Concurrency — kernel `flock` on the binfmt entry

Concurrent armbian builds on the same host kernel are coordinated purely via kernel BSD `flock(2)` on `/proc/sys/fs/binfmt_misc/qemu-arm` itself. No userspace state, no per-builder lockfiles, no `/run/lock` directory.

- Each builder holds `LOCK_SH` on the binfmt entry via a long-lived fd. The kernel's BSD-flock holder count is the refcount; the kernel releases the fd on process exit (crash-safe).
- First arrival idempotently writes `echo 0` to disable; subsequent arrivals observe 0 and proceed without writing.
- On exit, each builder releases `LOCK_SH`; last-out detects via `LOCK_EX|LOCK_NB` on a fresh fd, which succeeds iff zero other `LOCK_SH` holders. Last-out re-enables `qemu-arm`.

Trade-offs
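The arrival/departure protocol above can be compressed into a small sketch. The entry path is a parameter so the sketch runs against any file; the real code points it at the binfmt entry and holds the SH fd for the whole build rather than within one function.

```shell
flock_refcount_demo() {
	local entry="$1" lock_fd probe_fd
	exec {lock_fd}<"$entry"  # arrival: long-lived fd ...
	flock -s "$lock_fd"      # ... holding LOCK_SH; the kernel counts holders
	# ... build work runs here while qemu-arm stays disabled ...
	exec {lock_fd}<&-        # departure: release our SH
	exec {probe_fd}<"$entry" # last-out probe must use a fresh fd
	if flock -xn "$probe_fd"; then
		echo "last-out"      # zero other SH holders: re-enable qemu-arm here
	else
		echo "not-last"      # another builder still holds SH: leave disabled
	fi
	exec {probe_fd}<&-
}
```

Crash-safety falls out for free: if a builder dies, the kernel closes its fds and the SH count drops, so some later departer's EX-NB probe succeeds and restores the entry.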
qemu-arm.Trade-offs
qemu-armstate is not recorded across builds. Last-outunconditionally re-enables. An admin's pre-existing
disabledpolicy is not preserved across the build window.
qemu-armmid-build.Don't, or use
NATIVE_ARMHF_ON_ARM64=noon builds that should notinterfere.
inheriting our SH-fd shares the same lock entry. We rely on
add_cleanup_handlerrunning registrations in order and the umountteardown handlers (registered first) killing the docker container
before our restore handler runs. Documented in the trap handler's
docstring; if a future refactor breaks ordering, POSIX
F_SETLKorexplicit descendant-kill are the documented escape hatches.
The earlier revision of this work coordinated via a userspace
owner-flock directory (~250 lines, depended on
/run/lockbeinghost-bind-mounted into build containers — which Armbian docker does
not do). Replaced with the kernel-flock approach above; available in
PR history if needed.
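The ordering invariant above assumes cleanup handlers fire in registration order. A toy model of that behavior, with hypothetical handler names (`add_cleanup_handler` here is a stand-in, not the build framework's actual implementation):

```shell
declare -a _cleanup_handlers=()

add_cleanup_handler() { _cleanup_handlers+=("$1"); }

run_cleanup_handlers() {
	# Fire in registration order: teardown (which kills the container and any
	# forked holders of our inherited SH fd) must precede the binfmt restore.
	local h
	for h in "${_cleanup_handlers[@]}"; do "$h"; done
}

teardown_mounts_and_container() { echo "teardown"; }
restore_qemu_arm_binfmt() { echo "restore"; }

add_cleanup_handler teardown_mounts_and_container
add_cleanup_handler restore_qemu_arm_binfmt
```

If a refactor ever reversed this to LIFO, the restore handler's last-out probe could race the container's inherited fd, which is exactly the hazard the trade-off documents.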
Empirical

`./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=noble` on droid (RK3328) and m1 (RK3568, Cortex-A55):

`mmdebstrap` itself (~10 min of every cache-miss build) cannot be accelerated — the speedup applies to the subsequent chroot operations.
Test plan

Phase 1 — feature on iron (single builder):

- restore of `qemu-arm` on exit
- `DOCKER_PRIVILEGED=no` (CAP_SYS_ADMIN unconditional)
- killswitch (`NATIVE_ARMHF_ON_ARM64=no`/`never`/`disabled`) — silent return
- `PREFER_DOCKER=no` smoke

Phase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):

- locks on `/proc/sys/fs/binfmt_misc/qemu-arm` work as expected
- two concurrent native builds: the SH-counter holds qemu-arm disabled, first-finished suppresses restore, last-finished restores
- `NATIVE_ARMHF_ON_ARM64=no` opt-out with a concurrent native builder — the refuse-fast guard rejects (avoids clobbering)
- SIGINT mid-chroot: qemu-arm restored, no stale state

Reviewed with codex (gpt-5.5) over multiple rounds and coderabbitai. The final review surfaced the cleanup-ordering invariant; documented inline and verified by the SIGINT test above.
Summary by CodeRabbit

New Features

Chores

- Concurrent builds coordinate via kernel `flock` on `/proc/sys/fs/binfmt_misc/qemu-arm`; first-arrival disables qemu-arm, last-departure restores it. Setup failures are non-fatal so image builds continue.

Summary by CodeRabbit

New Features

Bug Fixes