feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#9769

Open
iav wants to merge 2 commits into main from feat/native-armhf-on-arm64

Conversation

Contributor

@iav iav commented May 5, 2026

Summary

Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params hook).
That PR enabled COMPAT_VDSO in arm64 kernels; this PR uses the resulting
32-bit-capable kernels to run 32-bit ARM ELF directly instead of through
qemu-arm emulation in chroot work — typically ~12× faster than qemu-arm-static
on Cortex-A53/A55/A72/A73/A76.

How

Disable qemu-arm in /proc/sys/fs/binfmt_misc/ for the duration of the
build; kernel binfmt_elf then handles 32-bit ARM ELFs natively via
CONFIG_COMPAT. Activation is delayed until AFTER mmdebstrap, whose
cross-arch path needs a working qemu-arm registration to populate the
chroot before libc/ld-linux-armhf.so.3 exist.
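For readers unfamiliar with the binfmt_misc interface, the state check behind this step can be sketched roughly as follows. This is an illustration only, not the PR's code: `observe_binfmt_entry` is a hypothetical helper name, and the sketch relies only on the kernel convention that the first line of a registered entry reads `enabled` or `disabled`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): report the state of a binfmt_misc entry.
# The first line of /proc/sys/fs/binfmt_misc/<name> is "enabled" or "disabled".
observe_binfmt_entry() {
	local entry="${1:-/proc/sys/fs/binfmt_misc/qemu-arm}" first_line=""
	if [[ ! -e "${entry}" ]]; then
		echo "missing"
		return 0
	fi
	# read returns non-zero on a missing trailing newline but still fills the variable
	read -r first_line < "${entry}" || true
	case "${first_line}" in
		enabled | disabled) echo "${first_line}" ;;
		*) echo "unknown" ;;
	esac
}
```

Called without arguments it inspects the qemu-arm entry; writing `0` or `1` to the same file (as root) is what disables or re-enables the handler.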

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms never, disabled).

Concurrency — kernel flock on the binfmt entry

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock(2) on /proc/sys/fs/binfmt_misc/qemu-arm itself.
No userspace state, no per-builder lockfiles, no /run/lock directory.

  • Each builder holds LOCK_SH on the binfmt entry via a long-lived fd.
    Kernel BSD-flock counter is the refcount; kernel releases the fd on
    process exit (crash-safe).
  • The first arrival idempotently echoes 0 to disable; subsequent arrivals
    observe 0 and proceed without writing.
  • On exit, release LOCK_SH; last-out detects via LOCK_EX|LOCK_NB on
    a fresh fd, succeeds iff zero other LOCK_SH holders. Last-out
    re-enables qemu-arm.
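The shared-lock and last-out probe described in the bullets above might look roughly like this in bash. It is a sketch under assumptions, not the PR's implementation: the function names and the `SHARED_LOCK_FD` variable are hypothetical, and it uses `flock(1)` from util-linux rather than whatever the build scripts actually call.

```shell
#!/usr/bin/env bash
# Sketch only: names are hypothetical; requires flock(1) from util-linux.

# Hold LOCK_SH on a long-lived fd. Bash picks a free fd and stores its number
# in SHARED_LOCK_FD; the kernel drops the lock when the fd is closed.
hold_shared_lock() {
	exec {SHARED_LOCK_FD}< "$1"
	flock -s "${SHARED_LOCK_FD}"
}

# Release our shared lock by closing the fd.
release_shared_lock() {
	exec {SHARED_LOCK_FD}<&-
}

# Last-out probe: a fresh open of the path can take LOCK_EX without blocking
# only when no other open file description holds a lock on it.
is_last_holder() {
	flock -x -n "$1" true
}
```

A builder would call `hold_shared_lock` on the binfmt entry at setup, then `release_shared_lock` followed by `is_last_holder` at cleanup, and re-enable qemu-arm only when the probe succeeds.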

Trade-offs

  1. Prior qemu-arm state is not recorded across builds. Last-out
    unconditionally re-enables. An admin's pre-existing disabled
    policy is not preserved across the build window.
  2. No defense against an external agent toggling qemu-arm mid-build.
    Don't, or use NATIVE_ARMHF_ON_ARM64=no on builds that should not
    interfere.
  3. Cleanup ordering invariant — BSD flock is per-OFD; a forked subshell
    inheriting our SH-fd shares the same lock entry. We rely on
    add_cleanup_handler running registrations in order and the umount
    teardown handlers (registered first) killing the docker container
    before our restore handler runs. Documented in the trap handler's
    docstring; if a future refactor breaks ordering, POSIX F_SETLK or
    explicit descendant-kill are the documented escape hatches.
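The per-OFD behaviour behind trade-off 3 can be observed on a scratch file. The demo below is a sketch with a hypothetical function name, using `flock(1)` from util-linux: a background child inherits the locked fd, so the lock stays held after the parent closes its own copy and is only freed once the child exits.

```shell
#!/usr/bin/env bash
# Demo only (hypothetical function name); requires flock(1) from util-linux.
# A forked child that inherits the fd shares the same open file description,
# so the shared lock stays held until the child has exited as well.
demo_inherited_flock() {
	local lock_file="$1" fd child
	exec {fd}< "${lock_file}"
	flock -s "${fd}"
	sleep 30 &
	child=$!
	exec {fd}<&- # the parent drops its copy; the child still holds one
	if flock -x -n "${lock_file}" true; then echo "free"; else echo "held by child"; fi
	kill "${child}"
	wait "${child}" 2> /dev/null || true
	if flock -x -n "${lock_file}" true; then echo "free"; else echo "still held"; fi
}
```

This is why a lingering descendant (for example a container process) would keep the last-out probe from succeeding until it has been killed.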

The earlier revision of this work coordinated via a userspace
owner-flock directory (~250 lines, depended on /run/lock being
host-bind-mounted into build containers — which Armbian docker does
not do). Replaced with the kernel-flock approach above; available in
PR history if needed.

Empirical

./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=noble
on droid (RK3328) and m1 (RK3568, Cortex-A55):

| Scenario | Real time | Speedup |
|---|---|---|
| baseline, rootfs cache-miss | 60:35 | |
| this PR, rootfs cache-miss | 19:27 | 3.12× |
| this PR, rootfs cache-hit | 6:38 | ~9× over expected baseline cache-hit |

mmdebstrap itself (~10 min of every cache-miss build) cannot be
accelerated — speedup applies to subsequent chroot operations.

Test plan

Phase 1 — feature on iron (single builder):

  • helios4 cache-miss + cache-hit, full image
  • cleanup re-enables qemu-arm on exit
  • DOCKER_PRIVILEGED=no (CAP_SYS_ADMIN unconditional)
  • killswitch (NATIVE_ARMHF_ON_ARM64=no/never/disabled) — silent return
  • x86_64 host regression — silent return
  • aarch64 native (no-container) PREFER_DOCKER=no smoke

Phase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):

  • flock primitives on /proc/sys/fs/binfmt_misc/qemu-arm work as expected
  • single-tree build: setup + 5 phase markers + last-out restore
  • two independent worktrees concurrent: both enter native path, kernel
    SH-counter holds qemu-arm disabled, first-finished suppresses restore,
    last-finished restores
  • NATIVE_ARMHF_ON_ARM64=no opt-out with concurrent native builder —
    refused-fast guard rejects (avoids clobbering)
  • SIGINT mid-chroot — cleanup ordering invariant holds; last-out fires,
    qemu-arm restored, no stale state

Reviewed with codex (gpt-5.5) over multiple rounds and coderabbitai. The
final review surfaced the cleanup-ordering invariant; documented inline
and verified by SIGINT test above.

Summary by CodeRabbit

  • New Features

    • Enable native 32-bit ARM (armhf) execution on ARM64 hosts during builds so chroot operations can run 32-bit ARM binaries without injecting emulation binaries.
  • Chores

    • Coordinate host binfmt/qemu management directly via kernel flock on /proc/sys/fs/binfmt_misc/qemu-arm; first-arrival disables qemu-arm, last-departure restores it. Setup failures are non‑fatal so image builds continue.

Summary by CodeRabbit

  • New Features

    • Allow native execution of 32-bit ARM binaries on compatible 64-bit hosts to speed and simplify ARM image builds.
    • Skip copying emulation binaries into chroots when native ARM handling is active.
  • Bug Fixes

    • Improve coordination and recovery around host binary-format handling to avoid interference between concurrent builds and ignore non-fatal setup failures.

@iav iav requested a review from a team as a code owner May 5, 2026 18:15
@iav iav requested review from PanderMusubi and rpardini and removed request for a team May 5, 2026 18:15
@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5895d07a-8282-4e43-ab2e-a07211ead009

📥 Commits

Reviewing files that changed from the base of the PR and between 9d8b848 and 5cab79e.

📒 Files selected for processing (1)
  • lib/functions/rootfs/qemu-static.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/functions/rootfs/qemu-static.sh

Walkthrough

Adds coordinated host binfmt_misc handling to prefer kernel-native 32-bit ARM ELF on aarch64 hosts, conditional qemu deploy/undeploy behavior, and invokes the setup during rootfs creation and image build; setup failures are ignored.

Changes

Native armhf via binfmt_elf Mode

| Layer / File(s) | Summary |
|---|---|
| Core Observation: lib/functions/rootfs/qemu-static.sh | Adds _native_armhf_observe_qemu_arm_state() to report qemu-arm binfmt_misc presence and enabled/disabled state. |
| Core Setup & Coordination: lib/functions/rootfs/qemu-static.sh | Adds _native_armhf_setup_binfmt_elf() that acquires a control flock, may disable qemu-arm, runs arch-test armhf, and exports ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes. |
| Cleanup & Owner Restore: lib/functions/rootfs/qemu-static.sh | Adds trap_handler_native_armhf_restore_qemu_arm() that restores qemu-arm only when the last owner exits; otherwise it releases ownership. |
| Killswitch Release Handler: lib/functions/rootfs/qemu-static.sh | Adds trap_handler_native_armhf_release_emul_lock() that releases the shared emulation lock fd without restoring binfmt state. |
| qemu Deploy/Undeploy Wiring: lib/functions/rootfs/qemu-static.sh | deploy_qemu_binary_to_chroot() skips copying qemu when ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes; undeploy_qemu_binary_from_chroot() treats missing chroot qemu as a no-op only in native mode. |
| Prepare Guards & Verification: lib/functions/rootfs/qemu-static.sh | Skip enabling qemu-arm for non-armhf targets on aarch64 hosts; attempt native-armhf setup early for aarch64→armhf; add preflight guard refusing to re-enable an observably-disabled registration; change verification to arch-test armhf and tweak heredoc formatting. |
| Rootfs Creation Integration: lib/functions/rootfs/rootfs-create.sh | Calls `_native_armhf_setup_binfmt_elf |
| Image Build Integration: lib/functions/main/rootfs-image.sh | Calls `_native_armhf_setup_binfmt_elf |

Sequence Diagram(s)

sequenceDiagram
    participant Build as Build Flow
    participant Setup as _native_armhf_setup_binfmt_elf()
    participant Binfmt as /proc/sys/fs/binfmt_misc
    participant Trap as Cleanup Handler
    participant Deploy as deploy_qemu_binary_to_chroot()

    Build->>Setup: invoke (post-rootfs extraction)
    Setup->>Binfmt: read qemu-arm entry/state
    alt qemu-arm enabled
        Setup->>Binfmt: write disable qemu-arm
        Setup->>Trap: register restore handler
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    else qemu-arm absent_or_disabled
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    end
    Build->>Deploy: proceed to deploy step
    Deploy->>Deploy: check ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF
    alt native mode set
        Deploy-->>Build: early return (skip qemu copy)
    else
        Deploy->>Build: copy qemu-arm-static into chroot
    end
    Build-->>Trap: on exit/signal
    Trap->>Binfmt: write restore qemu-arm state

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled at binfmt's quiet seam,
Turned qemu off so armhf could dream,
Flag raised, trap set neat,
Chroots wake on native feet,
Builds hop forward, light and trim.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: enabling native armhf execution on aarch64 hosts by disabling qemu-arm binfmt_misc, which is the primary feature introduced across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@github-actions github-actions Bot added size/medium PR with more then 50 and less then 250 lines 05 Milestone: Second quarter release Needs review Seeking for review Framework Framework components labels May 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 20-25: The early return in the qemu-static check (when
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF == "yes") prevents the matching undeploy
from running later, leaking host qemu into cached rootfs; instead of returning
from the function, set a clear marker variable (e.g.,
ARMBIAN_QEMU_DEPLOY_SKIPPED=yes or ARMBIAN_QEMU_DEPLOYED=no) after calling
display_alert and allow the function to exit normally so the caller
(rootfs-create.sh) can always run the undeploy path and check that marker;
update the undeploy logic to look for that same marker
(ARMBIAN_QEMU_DEPLOY_SKIPPED or ARMBIAN_QEMU_DEPLOYED) so it only skips removal
when deploy truly never ran.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eba7fdae-710a-42e6-bcb9-0d41993767df

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce7dec and 502287f.

📒 Files selected for processing (4)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
  • lib/library-functions.sh

Comment thread lib/functions/rootfs/qemu-static.sh
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 502287f to 6f7903b Compare May 5, 2026 18:36
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
lib/functions/rootfs/qemu-static.sh (1)

20-25: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deploy/undeploy asymmetry can leak host qemu-arm-static into the rootfs cache.

The flag check is symmetric inside this file, but at the call sites (rootfs-create.sh, rootfs-image.sh per the summary), deploy_qemu_binary_to_chroot runs before _native_armhf_setup_binfmt_elf flips ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes, while undeploy_qemu_binary_from_chroot runs after. In that ordering the binary gets copied into the chroot, but the matching removal is suppressed by the new early-return at Line 56. The host qemu-arm-static then persists into the rootfs cache and into final armhf images on subsequent cache-hit builds.

Track per-(caller, chroot_target) deployment state and only skip undeploy when deploy actually skipped, rather than gating both on the flag alone.

🛠️ Suggested fix
+declare -gA armbian_qemu_deployed_to_chroot=()
+
 function deploy_qemu_binary_to_chroot() {
 	declare chroot_target="${1}" caller="${2}"
@@
 	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
 		return 0
 	fi
@@
 	display_alert "Deploying qemu-user-static binary to chroot" "${QEMU_BINARY} during ${caller}" "info"
 	run_host_command_logged cp -pv "${src_host}" "${dst_target}"
+	armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]=yes
 
 	return 0
 }
@@
 function undeploy_qemu_binary_from_chroot() {
@@
-	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
+	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" \
+		&& "${armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]:-no}" != "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
 		return 0
 	fi

Also applies to: 55-59

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/rootfs/qemu-static.sh` around lines 20 - 25, The early-return
based solely on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF in
deploy_qemu_binary_to_chroot/undeploy_qemu_binary_from_chroot causes asymmetry
across separate invocations; change the logic so that when
deploy_qemu_binary_to_chroot skips copying due to the flag it records that
decision with a per-(caller,chroot_target) marker (e.g., a temp file named with
${caller} and the chroot target inside the rootfs cache or a shared temp dir),
and then have undeploy_qemu_binary_from_chroot check that marker and only skip
removal when the deploy previously skipped for that same (caller,chroot_target);
ensure you still clean up the marker when an actual copy was performed and when
removal runs so state cannot leak into later builds (references: functions
deploy_qemu_binary_to_chroot, undeploy_qemu_binary_from_chroot and variable
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 169-188: Move the add_cleanup_handler call so it runs before
writing "0" to /proc/sys/fs/binfmt_misc/qemu-arm (register the trap before the
destructive action) and update trap_handler_native_armhf_restore_qemu_arm to not
early-return based on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF; instead key the
cleanup on the actual binfmt_misc state (check [[ -e
/proc/sys/fs/binfmt_misc/qemu-arm ]] and attempt echo 1 >
/proc/sys/fs/binfmt_misc/qemu-arm || true), so a partially completed setup or an
interrupt between the disable and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF assignment
will still re-enable qemu-arm.
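
The ordering asked for here (register the cleanup first, then perform the destructive write, and key the cleanup on the entry itself) can be sketched as below. This is an editorial illustration with hypothetical names: a plain `trap ... EXIT` stands in for Armbian's add_cleanup_handler, and `BINFMT_CTL` is parameterised so the flow can be tried on a scratch file.

```shell
#!/usr/bin/env bash
# Sketch with hypothetical names; a plain EXIT trap stands in for add_cleanup_handler.
BINFMT_CTL="${BINFMT_CTL:-/proc/sys/fs/binfmt_misc/qemu-arm}"

restore_qemu_arm() {
	# Keyed on the entry existing, not on any "setup finished" flag.
	[[ -e "${BINFMT_CTL}" ]] || return 0
	echo 1 > "${BINFMT_CTL}" || true
}

disable_qemu_arm() {
	trap restore_qemu_arm EXIT # register the cleanup first
	echo 0 > "${BINFMT_CTL}"   # then perform the destructive write
}
```

With this order, an interrupt between the two statements leaves a registered handler and an untouched entry, rather than a disabled entry with no handler.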

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e697b537-3deb-4c61-94a1-02ef9c76d299

📥 Commits

Reviewing files that changed from the base of the PR and between 502287f and 6f7903b.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/rootfs-create.sh

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav iav force-pushed the feat/native-armhf-on-arm64 branch 2 times, most recently from eab2ab1 to d990278 Compare May 5, 2026 19:03
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
lib/functions/rootfs/qemu-static.sh (1)

131-134: 💤 Low value

Optional: case-insensitive killswitch matching.

NATIVE_ARMHF_ON_ARM64=NO (or Disabled, False, etc.) would slip past the killswitch and proceed to detection. This is a common ergonomic gotcha when users export env vars from CI configs. Low priority since the function self-aborts on the subsequent guards anyway, but a one-liner makes the contract explicit.

♻️ Optional refactor
-	# Killswitch — accept no/never/disabled as synonyms; bail before any detection runs.
-	case "${NATIVE_ARMHF_ON_ARM64:-auto}" in
-		no | never | disabled) return 1 ;;
-	esac
+	# Killswitch — accept no/never/disabled (any case) as synonyms; bail before any detection runs.
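+	declare native_armhf_mode="${NATIVE_ARMHF_ON_ARM64:-auto}" # default first; ${var:-auto,,} would not lowercase
+	case "${native_armhf_mode,,}" in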
+		no | never | disabled) return 1 ;;
+	esac
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/rootfs/qemu-static.sh` around lines 131 - 134, The killswitch
case for NATIVE_ARMHF_ON_ARM64 is currently case-sensitive and will miss values
like "NO" or "Disabled"; update the case statement around the
NATIVE_ARMHF_ON_ARM64 check to perform case-insensitive matching (e.g., convert
the value to lowercase with parameter expansion like
"${NATIVE_ARMHF_ON_ARM64,,}" before the case or expand the patterns to bracketed
forms such as
[Nn][Oo]|[Nn][Ee][Vv][Ee][Rr]|[Dd][Ii][Ss][Aa][Bb][Ll][Ee][Dd]|[Ff][Aa][Ll][Ss][Ee]
to accept common synonyms) so that the return 1 kill path in that branch
reliably triggers for uppercase/mixed-case inputs.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff5855c0-1c00-4b4b-8c4e-dc619c12d9a0

📥 Commits

Reviewing files that changed from the base of the PR and between 6f7903b and d990278.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/rootfs-create.sh

@iav
Contributor Author

iav commented May 5, 2026

Re: nitpick on qemu-static.sh:131-134 (case-insensitive killswitch matching) — declined. Project convention for boolean toggles is literal lowercase comparison ([[ "${VAR}" == "yes" ]], [[ "${VAR:-no}" == "yes" ]]). Examples: cli-artifact.sh:56 (ARTIFACT_USE_CACHE), utils-cli.sh:247 (PREFER_DOCKER:-yes), bsp/armbian-bsp-cli-deb.sh:239 (SHOW_DEBUG). No ${var,,} normalization on boolean flags anywhere in lib/functions/. Keeping the killswitch in line with that convention.
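
For reference, a killswitch that follows the literal-lowercase convention described in this reply could be sketched as below. The function name `native_armhf_allowed` is hypothetical; the matched values are the ones listed in the PR description.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical function name) of a killswitch using literal lowercase matching.
native_armhf_allowed() {
	case "${NATIVE_ARMHF_ON_ARM64:-auto}" in
		no | never | disabled) return 1 ;; # killswitch engaged
	esac
	return 0 # anything else, including "NO" or unset, does not trip the killswitch
}
```

If normalisation were wanted, note that bash cannot combine a default value and a case modification in one expansion: `${VAR:-auto,,}` simply uses the literal string `auto,,` as the default and leaves a set value unchanged.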

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from d990278 to 15789ee Compare May 5, 2026 19:42
@iav
Contributor Author

iav commented May 5, 2026

If you use arm64 builders, try it!

@igorpecovnik
Member

aarch64 host (CIX). Always fails:

./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes KERNEL_CONFIGURE=no RELEASE=noble SHARE=yes

[🐳|🌱] git: Fetch from remote completed, rev-parsing... [ 'mmdebstrap-debian-devel' 'main' 'FETCH_HEAD' ]
[🐳|🌱] mmdebstrap version [ '1.5.7 (2025-04-08)' for /armbian/cache/sources/mmdebstrap-debian-devel/mmdebstrap ]
[🐳|🌱] Installing base system with 5 packages [ Stage 1/1 ]
[🐳|🌱] fetch_distro_keyring(noble) [ cache found, skipping ]
[🐳|🌱] Deploying qemu-user-static binary to chroot [ qemu-arm-static during rootfs ]
[🐳|🔨]   '/usr/bin/qemu-arm-static' -> '/armbian/.tmp/rootfs-09e5dde9-e782-4c83-a704-7d2314fd45c0/usr/bin/qemu-arm-static'
[🐳|🔨]   I: automatically chosen mode: root
[🐳|🔨]   I: armhf cannot be executed natively, but transparently using qemu-user binfmt emulation
[🐳|🔨]   I: automatically chosen format: directory
[🐳|🔨]   I: skipping check/empty as requested
[🐳|🔨]   I: running special hook: copy-in /armbian/cache/keyrings/ubuntu/usr /armbian/cache/keyrings/ubuntu/etc /
[🐳|🔨]   I: running --setup-hook in shell: sh -c 'mkdir -p /armbian/cache/aptcache/noble-armhf/archives "$1"/var/cache/apt/archives/' exec /armbian/.tmp/rootfs-09e5dde9-e782-4c83-a704-7d2314fd45c0
[🐳|🔨]   I: running special hook: sync-in /armbian/cache/aptcache/noble-armhf/archives /var/cache/apt/archives/
[🐳|🔨]   I: running apt-get update...
[🐳|🔨]   I: downloading packages with apt...
[🐳|🔨]   I: extracting archives...
[🐳|🔨]   I: installing essential packages...
[🐳|🔨]   I: installing remaining packages inside the chroot...
[🐳|🔨]   I: running special hook: sync-out /var/cache/apt/archives/ /armbian/cache/aptcache/noble-armhf/archives
[🐳|🔨]   I: cleaning package lists and apt cache...
[🐳|🔨]   I: success in 338.3367 seconds
[🐳|🌱] Native armhf via binfmt_elf [ kernel 6.18.8-current-arm64, aarch64 host with COMPAT_VDSO; qemu-arm disabled, kernel binfmt_elf takes over ]
[🐳|🌱] Cleaning up after mmdebstrap [ mmdebstrap cleanup ]
[🐳|🌱] Diverting [ initctl/start-stop-daemon ]
[🐳|🔨]   chroot: failed to run command '/usr/bin/env': Exec format error

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 15789ee to df7aaf9 Compare May 5, 2026 22:07
@iav
Contributor Author

iav commented May 5, 2026

@igorpecovnik thanks for the report.

Root cause: arch-test armhf returns 0 on your CIX host because qemu-arm in binfmt_misc interprets its tiny test stub — false positive on hosts whose kernel was built without COMPAT_VDSO. After my code disabled qemu-arm, the kernel tried to execute armhf ELF natively and failed with Exec format error because the host kernel doesn't have the compat-vdso layer.

Fix: arch-test moved to AFTER the binfmt_misc disable, so it reflects what the chroot exec will actually face. On detection failure we restore qemu-arm immediately and fall back to qemu-static emulation transparently.

After this fix lands on your builder, your build will work as before — slowly, via qemu-arm-static, no regression. To get the speedup, build your builder host's kernel with the arm64-compat-vdso extension enabled (extensions/arm64-compat-vdso, PR #9284); then native armhf via binfmt_elf will activate automatically.
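
The disable-then-probe order described in the fix can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `try_native_armhf` is a hypothetical name, the control file is passed in so the flow can be exercised on a scratch file, and the probe is whatever command follows it (`arch-test armhf` in the description above).

```shell
#!/usr/bin/env bash
# Sketch with hypothetical names. "$1" is the binfmt control file; the remaining
# arguments are the probe command (arch-test armhf in the description above).
try_native_armhf() {
	local binfmt_ctl="$1"
	shift
	echo 0 > "${binfmt_ctl}" || return 1 # disable qemu-arm first
	if "$@"; then
		return 0 # the probe ran without emulation, so native execution works
	fi
	echo 1 > "${binfmt_ctl}" # probe failed: restore qemu-arm and fall back
	return 1
}
```

On a real host this would be invoked as `try_native_armhf /proc/sys/fs/binfmt_misc/qemu-arm arch-test armhf`; a non-zero return means the build should continue through qemu-arm-static.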

iav added a commit that referenced this pull request May 7, 2026
…lel builds

PR review by codex (gpt-5.5) flagged the missing synchronisation around the
shared /proc/sys/fs/binfmt_misc/qemu-arm flag as the main blocker for shipping
NATIVE_ARMHF_ON_ARM64 default-on. The previous refcount-less code disabled
qemu-arm globally in setup and unconditionally re-enabled it in cleanup —
two parallel builds on the same host could:

  * have build A disable qemu-arm and register restore;
  * have build B observe 'already disabled', set
    ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes and skip copying
    qemu-arm-static into its own chroot;
  * have build A finish, cleanup blindly write 1 → qemu-arm enabled;
  * have build B continue with no qemu binary in chroot but
    qemu-arm globally registered — runtime failure.

Replace with a per-host owner-flock pattern (codex-recommended; refcount/
pidlist breaks under PID namespaces and PID reuse, kernel fd-release on
process exit gives crash safety for free):

  * /run/lock/armbian-native-armhf/owners/<ARMBIAN_BUILD_UUID>.lock —
    one file per builder, long-lived flock held for the whole native-mode
    window. Closing the fd (process exit, normal or crash) releases the
    lock; another builder probing it with non-blocking flock detects the
    death and prunes the file.
  * control.lock — short-lived flock around prune+count+state-write so
    first/last-owner decisions stay consistent under concurrency.
  * Live owner count = number of owner files we can NOT acquire
    non-blocking. Refcount is not used as source of truth.
  * First owner records prior qemu-arm state (1 / 0 / missing) into
    prior-qemu-arm-state and only writes 0 if prior was 1. Subsequent
    owners join without state mutation.
  * Last owner restores from the saved prior — never blindly writes 1.
    If prior was already disabled or the handler was missing, leave it
    alone. State divergence (admin or another tool changed qemu-arm
    during the active window) emits a warning, not a fight.
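
The "live owner count" rule in the list above can be sketched like this. It is an illustration with a hypothetical function name, using `flock(1)` from util-linux; in the design described here it would run while control.lock is held, which the sketch leaves out.

```shell
#!/usr/bin/env bash
# Sketch with a hypothetical name; requires flock(1) from util-linux.
# Prints the number of owner lock files still locked by a live process
# and prunes the ones whose owner has gone away.
count_live_owners() {
	local owners_dir="$1" lock_file live=0
	for lock_file in "${owners_dir}"/*.lock; do
		[[ -e "${lock_file}" ]] || continue # the glob matched nothing
		if flock -x -n "${lock_file}" true; then
			rm -f "${lock_file}" # lock acquired: the owner is dead, prune its file
		else
			live=$((live + 1)) # lock refused: the owner is alive
		fi
	done
	echo "${live}"
}
```

A builder that holds its own owner lock on a separate fd is counted too, so a result of 1 means "only me".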

Per-host scope by design — binfmt_misc is per-kernel state, so cross-host
NFS coordination would only serialise independent kernels for no benefit.
The lockdir lives under /run/lock so it is local-to-host and host-shared
(the Armbian build container bind-mounts /run from the host, satisfying
the requirement). Documented in the function header comment.

Other review items addressed:

  * The 'caller sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes' comment was
    inaccurate (the function sets it itself) — fixed.
  * Function header rewritten to spell out the concurrency model and
    Docker requirement explicitly.

Reported-by: codex (PR #9769 review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
iav added a commit that referenced this pull request May 7, 2026
…9f27

Codex review of the previous lock-pattern commit (5359f27) found three
real holes:

1. trap_handler released the owner lock BEFORE taking the control lock.
   Between fd close and flock(control) a fresh builder could enter setup,
   prune+count and see live_count == 1 (only itself), then observe the
   now-disabled qemu-arm and write '0' as its prior_state. Our subsequent
   restore would read that overwritten state and never re-enable
   qemu-arm — turning a coordinated handoff into permanent disablement.
   Fix: take control.lock first, release owner UNDER control, then count
   and restore. The whole transition is now atomic.

2. The post-disable arch-test failure path released the owner lock
   manually and then called the trap handler. The handler's first
   guard ([[ -n ${_native_armhf_owner_fd:-} ]] || return 0) saw the
   already-cleared global and exited without restore — leaving qemu-arm
   disabled and the state file leaking. Fix: drop control.lock and let
   the handler do the release+count+restore under its own control lock,
   matching the normal cleanup path.

3. Stale active window after process kill (SIGKILL / OOM / kill -9
   between echo 0 and trap firing): kernel releases the fd so the
   owner lock is gone, but the state file remains. The next first-owner
   would observe the leftover-disabled qemu-arm and record disabled as
   its new prior_state — clean shutdown then never re-enables. Fix: in
   the first-owner branch, before saving prior_state, check for an
   existing state file. If found and prior was '1', reconcile qemu-arm
   back to enabled and remove the stale file before observing our own
   prior_state. Crash recovery is now self-healing on the next build.
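
The self-healing step from item 3 can be sketched as below. Names and file layout are hypothetical, and in the design described above this would run inside the first-owner branch while control.lock is held.

```shell
#!/usr/bin/env bash
# Sketch with hypothetical names: reconcile a prior-state file left by a killed builder.
reconcile_stale_state() {
	local state_file="$1" binfmt_ctl="$2" prior=""
	[[ -f "${state_file}" ]] || return 0 # no leftover window, nothing to do
	read -r prior < "${state_file}" || true
	if [[ "${prior}" == "1" ]]; then
		echo 1 > "${binfmt_ctl}" # the dead builder had disabled qemu-arm: re-enable it
	fi
	rm -f "${state_file}" # drop the stale record before observing our own prior state
}
```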

The same edit additionally cleans up the disable-failure case: when
'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), no state actually
changed, so leave state file and owner lock alone — the cleanup handler
will observe current==prior==1 and no-op the restore. The previous code
released owner outside control and rm-ed the state file, both of which
introduced the same race the trap_handler change closes elsewhere.

Reported-by: codex (PR #9769 lock-impl review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
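
The crash recovery described in item 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `reconcile_stale_state` and both file paths are hypothetical names, a plain temp file stands in for /proc/sys/fs/binfmt_misc/qemu-arm so the sketch runs without root, and removing the stale file when the recorded prior state was '0' is an assumption the commit message does not spell out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: consume a leftover state file before recording a new prior_state.
# $1 = state file holding the recorded prior state ('1' or '0')
# $2 = binfmt entry (a stand-in file here; the real entry needs root to write)
reconcile_stale_state() {
	local state_file="$1" entry="$2" prior
	[ -f "$state_file" ] || return 0   # no leftover from a killed build
	prior="$(cat "$state_file")"
	if [ "$prior" = "1" ]; then
		echo 1 > "$entry"              # put the handler back to its recorded prior state
	fi
	rm -f "$state_file"                # assumption: the stale file is dropped either way
}

# Demo: a killed build left the handler disabled but had recorded prior=1.
demo_dir="$(mktemp -d)"
echo 0 > "${demo_dir}/entry"
echo 1 > "${demo_dir}/state"
reconcile_stale_state "${demo_dir}/state" "${demo_dir}/entry"
cat "${demo_dir}/entry"
```

In the real flow this would run inside the first-owner branch while the control lock is held, so that no other builder can observe the half-reconciled state. Note that the real binfmt entry is written as 1/0 but reads back as "enabled"/"disabled"; the stand-in file simply stores what was written.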
@github-actions github-actions Bot added size/large PR with 250 lines or more and removed size/medium PR with more then 50 and less then 250 lines labels May 7, 2026
iav added a commit that referenced this pull request May 7, 2026
…3d4c8cd

Codex re-review of fix2 (commit 3d4c8cd) confirmed the original three
races are properly closed but identified two new ones, both producing
the same class of runtime failure: a parallel builder ends up running
chroot exec against globally-enabled qemu-arm with no qemu binary
inside its chroot.

1. Disable-failure path left a live owner. When 'echo 0 > qemu-arm'
   itself fails (no CAP_SYS_ADMIN), fix2 chose to leave the owner lock
   and the prior-state file in place, on the theory that the cleanup
   handler would no-op the restore at exit. But the owner stays
   counted as live for the entire build duration, so a concurrent
   builder enters its own setup, sees live_count > 1, takes the join
   branch, skips qemu binary deployment, and fails at chroot exec
   time. Fix: a failed disable now performs a full abort under the
   held control lock — rm state file, release owner, drop control,
   return 1. The aborted setup leaves no observable native window for
   anyone to join.

2. Join path did not verify the actual kernel state. fix2 trusted that
   live_count > 1 implies qemu-arm is disabled. But an external agent
   (admin running 'update-binfmts --enable qemu-arm', a stray service)
   can flip it back on during the active window. Our pre-flight
   arch-test then passes as a false positive — qemu interprets the
   ARM stub — and we skip qemu binary deployment, hitting the same
   runtime failure as case 1. Fix: in the join branch, observe the
   current qemu-arm state explicitly. If it is enabled, this is not
   a normal join — abort to qemu fallback with a warning naming the
   external mutator. This is a one-shot invariant check at join time,
   not a periodic re-verify (which codex previously argued against).

Reported-by: codex (PR #9769 lock-impl re-review)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
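
The one-shot join-time check from item 2 might look something like the sketch below. Both function names are hypothetical and the warning text is illustrative. The only fact relied on is that the first line of a binfmt_misc entry reads `enabled` or `disabled`; the entry path is passed as a parameter so the logic can be exercised against an ordinary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 0 if the entry is enabled, 1 if disabled, 2 if unreadable.
binfmt_entry_is_enabled() {
	local entry="$1" first_line
	[ -r "$entry" ] || return 2
	IFS= read -r first_line < "$entry" || return 2
	[ "$first_line" = "enabled" ]
}

# Hypothetical join check: joining is only valid while the handler is disabled.
can_join_native_window() {
	local entry="$1" rc=0
	binfmt_entry_is_enabled "$entry" || rc=$?
	case "$rc" in
		1) return 0 ;;   # disabled: a normal join
		0) echo "qemu-arm observably enabled (external agent?); falling back to qemu" >&2
		   return 1 ;;
		*) echo "cannot read ${entry}; falling back to qemu" >&2
		   return 1 ;;
	esac
}
```

A caller would use it as `can_join_native_window /proc/sys/fs/binfmt_misc/qemu-arm || return 1` in the join branch, before setting any "native" marker variable.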
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 3d4c8cd to 4998f52 Compare May 7, 2026 17:09
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 324-331: The else branch that runs when live_count > 1 must not
assume qemu-arm is already disabled; before registering the join behavior
(trap_handler_native_armhf_restore_qemu_arm, display_alert and setting
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF), read the actual qemu-arm state (the same
state/sysfs/binfmt path used when writing "0"/"1") and verify it is "0"
(disabled); if it is not "0", fail fast (error out) rather than joining, so the
subsequent arch-test armhf and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF logic do not
incorrectly skip deploying qemu-arm-static. Ensure you reference the same state
file/variable used by the disable code path and keep the cleanup/restore handler
only for true joiners.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b9577ae-7b84-40f0-972d-8e8b8d70ef1c

📥 Commits

Reviewing files that changed from the base of the PR and between df7aaf9 and 3d4c8cd.

📒 Files selected for processing (1)
  • lib/functions/rootfs/qemu-static.sh

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 4998f52 to 152d4f6 Compare May 7, 2026 17:53
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
lib/functions/main/rootfs-image.sh (1)

19-19: ⚡ Quick win

Make the non-fatal fallback explicit instead of silently swallowing setup failure.

At Line 19, `|| true` preserves behavior, but it hides when native armhf activation fails. Please wrap this in an explicit `if ! ...; then display_alert ...; fi` so operators can see when the build falls back to qemu-static.

Suggested change
-	_native_armhf_setup_binfmt_elf || true
+	if ! _native_armhf_setup_binfmt_elf; then
+		display_alert "Native armhf via binfmt_elf unavailable" "Falling back to qemu-static path" "warn"
+	fi

Based on learnings: In the Armbian build framework, scripts run with set -e, so masked failures should be explicitly handled and not silently ignored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/main/rootfs-image.sh` at line 19, The call to
_native_armhf_setup_binfmt_elf currently uses "|| true" which silently swallows
failures; replace that pattern with an explicit conditional: invoke
_native_armhf_setup_binfmt_elf and if it returns non-zero run a visible fallback
notification (e.g. if ! _native_armhf_setup_binfmt_elf; then display_alert
"native armhf binfmt activation failed; falling back to qemu-static"; fi) so
operators see when the script falls back to qemu-static; keep the existing
non-fatal behavior but surface the failure via display_alert (or the project’s
alert/logging helper).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c0143a21-0438-46b0-9277-ecb11ae8e453

📥 Commits

Reviewing files that changed from the base of the PR and between 4998f52 and 152d4f6.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/rootfs/rootfs-create.sh
  • lib/functions/rootfs/qemu-static.sh

@iav
Contributor Author

iav commented May 7, 2026

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@iav
Contributor Author

iav commented May 7, 2026

@coderabbitai re your nitpick on lib/functions/main/rootfs-image.sh:19 — declining as won't-fix.

`_native_armhf_setup_binfmt_elf` already emits a specific `display_alert` on every failure path it has:

- arch-test pre-flight failed
- qemu-arm not registered
- qemu-arm already disabled by another agent
- cannot create lockdir
- could not acquire control lock
- could not disable qemu-arm (no CAP_SYS_ADMIN?)
- post-disable verification failed
- qemu-arm observably enabled (external agent re-enabled it)
- owner lock for uuid … already held

Each names the actual reason and prints "falling back to qemu-arm-static emulation".

Wrapping the call in `if ! _; then display_alert "fallback"; fi` would emit a second, generic message every time, with strictly less information than the inner one. The `|| true` here is intentional non-fatality, not a masked failure: the function never silently returns 1; it always logs the specific cause first.

Happy to add an inline comment in rootfs-image.sh documenting this if it would help readers — but the duplicate alert itself I'd rather skip.
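
For readers unfamiliar with the pattern, here is a minimal sketch of "self-logging function plus `|| true`" under `set -e`. `try_native_setup` is a hypothetical stand-in, not the real `_native_armhf_setup_binfmt_elf`, and plain `echo` to stderr stands in for `display_alert`.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in: logs its own specific failure reason before returning 1.
try_native_setup() {
	local probe_ok="$1"
	if [ "$probe_ok" != "yes" ]; then
		echo "arch-test pre-flight failed; falling back to qemu-arm-static emulation" >&2
		return 1
	fi
	echo "native armhf active" >&2
}

# failures are self-logged inside; '|| true' is intentional non-fatality
try_native_setup no || true
echo "build continues"
```

Without the `|| true`, `set -e` would abort the script at the failed call; with it, the specific reason is still on stderr and the build simply proceeds along the emulation path.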

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

@iav, that's a well-reasoned justification — thank you for walking through it. Since every failure path inside _native_armhf_setup_binfmt_elf already emits a specific, actionable display_alert before returning 1, a wrapper alert would only add noise without adding signal. The || true as intentional non-fatality with self-documenting internals makes sense.

The inline comment offer sounds like a good idea — even a one-liner like # failures are self-logged inside; || true is intentional non-fatality would help future readers understand the pattern at a glance without needing to dive into the function. Happy to leave it at that.


✏️ Learnings added
Learnt from: iav
URL: https://github.com/armbian/build/pull/9769

Timestamp: 2026-05-07T19:12:36.962Z
Learning: In the Armbian build system (`lib/functions/`), `_native_armhf_setup_binfmt_elf || true` is an intentional non-fatal call pattern. The function emits a specific `display_alert` on every internal failure path (e.g., "arch-test pre-flight failed", "qemu-arm not registered", "post-disable verification failed", etc.) before returning 1, so wrapping it in an explicit `if ! _; then display_alert ...; fi` would produce a duplicate, less-informative alert. The `|| true` must not be flagged as a masked failure.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 152d4f6 to 5b04967 Compare May 7, 2026 20:01
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/functions/rootfs/qemu-static.sh (1)

527-543: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let native arch-test skip qemu-arm registration here.

This helper is only entered when qemu-arm is missing from binfmt_misc or /usr/share/binfmts, but arch-test arm can still pass natively on a compat-capable aarch64 host. In that case this branch returns without importing/enabling qemu-arm, even though the new flow in rootfs-create.sh still relies on mmdebstrap having that handler available until the later _native_armhf_setup_binfmt_elf() switchover. The result is a native-capable host that never registers qemu-arm for bootstrap and can break pre-libc maintainer-script execution during mmdebstrap.

Possible fix
-	# to check, we use arch-test; if will return 0 if _either_ the host can natively run armhf, or if qemu-arm is correctly working.
-	if arch-test arm; then
-		display_alert "Host can run armhf natively or emulation is correctly setup already" "no need to enable qemu-arm" "debug"
-	else
-		display_alert "arm64 host can't run armhf natively" "importing enabling qemu-arm" "debug"
-		cat <<- BINFMT_ARM_MAGIC > /usr/share/binfmts/qemu-arm
+	# If we got here, the qemu-arm binfmt registration is incomplete. Even on
+	# hosts that can later run armhf natively, mmdebstrap still needs qemu-arm
+	# registered during bootstrap; `_native_armhf_setup_binfmt_elf()` disables it
+	# only afterwards.
+	display_alert "Ensuring qemu-arm binfmt for bootstrap" "importing/enabling qemu-arm" "debug"
+	cat <<- BINFMT_ARM_MAGIC > /usr/share/binfmts/qemu-arm
 			package qemu-user-static
 			interpreter /usr/bin/qemu-arm-static
 			magic \x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00
 			offset 0
 			mask \xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff
 			credentials yes
 			fix_binary no
 			preserve yes
 		BINFMT_ARM_MAGIC
-		run_host_command_logged update-binfmts --import "qemu-${wanted_arch}"
-		run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}"
+	run_host_command_logged update-binfmts --import "qemu-${wanted_arch}"
+	run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}"

-		# Test again using arch-test.
-		display_alert "Checking if arm 32-bit emulation on arm64 works after enabling" "qemu-arm emulation" "info"
-		run_host_command_logged arch-test arm
-		display_alert "arm 32-bit emulation on arm64" "has been correctly setup" "cachehit"
-	fi
+	# Test again using arch-test.
+	display_alert "Checking if arm 32-bit emulation on arm64 works after enabling" "qemu-arm emulation" "info"
+	run_host_command_logged arch-test arm
+	display_alert "arm 32-bit emulation on arm64" "has been correctly setup" "cachehit"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/rootfs/qemu-static.sh` around lines 527 - 543, The current
early return when arch-test succeeds skips registering qemu-arm even though this
helper is invoked only when binfmt/qemu handler is missing; remove the
conditional that returns on arch-test (the if arch-test arm; then ... else ...
fi) and always write the /usr/share/binfmts/qemu-arm blob and call
run_host_command_logged update-binfmts --import "qemu-${wanted_arch}" and
run_host_command_logged update-binfmts --enable "qemu-${wanted_arch}" so the
qemu handler is registered regardless of native compatibility; keep the
informational display_alert messages but do not use arch-test to skip the
import/enable steps (refer to arch-test, display_alert, wanted_arch, and
run_host_command_logged in qemu-static.sh).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3cbcc8ad-d97e-44b1-8490-ce550bcfee87

📥 Commits

Reviewing files that changed from the base of the PR and between 152d4f6 and 5b04967.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/functions/main/rootfs-image.sh

@iav
Contributor Author

iav commented May 7, 2026

@coderabbitai re your outside-diff comment on qemu-static.sh:527-543 (prepare_host_binfmt_qemu_cross_arm64_host_armhf_target) — declining as not-applicable in normal flow.

The helper is gated by line 505 in prepare_host_binfmt_qemu_cross: it's only invoked when /proc/sys/fs/binfmt_misc/qemu-arm or /usr/share/binfmts/qemu-arm is missing. Armbian's prepare_host.sh apt-installs qemu-user-static + binfmt-support as part of host preparation; the qemu-user-static postinst creates /usr/share/binfmts/qemu-arm and runs update-binfmts --enable qemu-arm. So in any normal armbian build environment, both files exist before this helper would even be considered.

The scenario you describe (compat-capable aarch64 + arch-test arm passes natively + qemu-arm not registered) requires the operator to have manually removed qemu-user-static or its binfmts file post-install. In that case mmdebstrap's failure mode is "host doesn't have qemu", which is a host-prep issue independent of this PR.

If you'd like the defensive hardening regardless (always import+enable on this code path so a custom-stripped host still works), that's a reasonable separate PR — but it's pre-existing legacy code unrelated to native armhf and not introduced by this change.
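
To make the gating concrete, the condition can be pictured as a small predicate. This is only a sketch: `qemu_arm_registration_incomplete` is a hypothetical name, and the actual check at line 505 of prepare_host_binfmt_qemu_cross may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: true (0) when either registration file is absent.
# Paths are parameters so the logic can be tried against temp files.
qemu_arm_registration_incomplete() {
	local proc_entry="${1:-/proc/sys/fs/binfmt_misc/qemu-arm}"
	local binfmts_file="${2:-/usr/share/binfmts/qemu-arm}"
	[ ! -e "$proc_entry" ] || [ ! -e "$binfmts_file" ]
}
```

On a host where qemu-user-static and binfmt-support are installed and the handler is registered, both files exist, the predicate is false, and the armhf helper is never reached.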

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

@iav, understood — the gating condition at line 505 makes the concern moot for any normally-prepared Armbian host, and the broken scenario you describe is a host-prep issue that predates this PR entirely. Agreed that defensive hardening there, if ever desired, belongs in a separate PR. Thanks for the thorough explanation.


✏️ Learnings added
Learnt from: iav
URL: https://github.com/armbian/build/pull/9769

Timestamp: 2026-05-07T20:24:41.014Z
Learning: In the Armbian build system (`lib/functions/rootfs/qemu-static.sh`), `prepare_host_binfmt_qemu_cross_arm64_host_armhf_target` is only invoked when `/proc/sys/fs/binfmt_misc/qemu-arm` OR `/usr/share/binfmts/qemu-arm` is missing (gated at line 505 of `prepare_host_binfmt_qemu_cross`). In any normal Armbian build environment, `prepare_host.sh` apt-installs `qemu-user-static` + `binfmt-support`, whose postinst creates `/usr/share/binfmts/qemu-arm` and runs `update-binfmts --enable qemu-arm`, so both files exist before this helper is ever reached. A scenario where compat-capable aarch64 + arch-test passes natively + qemu-arm is not registered requires the operator to have manually removed qemu-user-static, which is a host-prep issue unrelated to the native armhf feature. Do not flag missing import/enable hardening in this helper as a concern for this PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@iav
Contributor Author

iav commented May 8, 2026

@igorpecovnik the fix is in 5b049678. On your CIX host (kernel without COMPAT_VDSO) the post-disable arch-test armhf will now correctly fail, qemu-arm gets restored, and the build falls back to qemu-static — same path as before this PR, no Exec format error. Whenever you get a chance, a quick re-run of the same command would confirm the regression is gone on your side.

@igorpecovnik
Member

Whenever you get a chance, a quick re-run of the same command would confirm the regression is gone on your side.

Works now.

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 5b04967 to e51b530 Compare May 8, 2026 21:14
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 475-486: The cleanup branch for last-owner in
_native_armhf_cleanup_last_owner currently treats a prior_state of "0" as "leave
as is" and only logs, but must actively restore that saved disabled state if the
host was re-enabled during the window; modify the last-owner branch that handles
prior_state (reference: _NATIVE_ARMHF_PRIOR_STATE and function
_native_armhf_cleanup_last_owner) to call the same restore/unregister logic used
for other prior_state values when prior_state == "0" (use
_native_armhf_observe_qemu_arm_state to check current state and run the
unregister/disable sequence for qemu-arm, and log with display_alert on success
or failure), ensuring the host binfmt_misc is reset to the saved state rather
than left changed by another tool.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a34882ba-e5e6-4af1-8f27-1ebbd4d5e868

📥 Commits

Reviewing files that changed from the base of the PR and between 5b04967 and e51b530.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/rootfs/rootfs-create.sh
  • lib/functions/main/rootfs-image.sh

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav iav marked this pull request as draft May 9, 2026 03:44
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from e51b530 to fe2733b Compare May 10, 2026 00:43
@iav iav marked this pull request as ready for review May 10, 2026 00:57
@iav iav marked this pull request as draft May 10, 2026 01:18
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 42770de to 7b782ce Compare May 10, 2026 01:45
@github-actions github-actions Bot added size/medium PR with more then 50 and less then 250 lines and removed size/large PR with 250 lines or more labels May 10, 2026
@iav iav marked this pull request as ready for review May 10, 2026 01:54
@iav
Contributor Author

iav commented May 10, 2026

@coderabbitai full review please.

Branch force-pushed and squashed into a single commit (7b782ceac). The userspace owner-flock coordination layer (~250 lines) is replaced with kernel flock(2) directly on /proc/sys/fs/binfmt_misc/qemu-arm. Earlier review threads on prior revisions are obsolete; please look at the current revision fresh.

@coderabbitai
Contributor

coderabbitai Bot commented May 10, 2026

@iav, understood — I'll do a fresh full review of the current squashed commit (7b782ceac), treating the kernel flock(2)-on-/proc/sys/fs/binfmt_misc/qemu-arm approach as the canonical implementation and ignoring all prior revision threads.

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 329-341: The early-continue when calling
_native_armhf_setup_binfmt_elf lets the "killswitch" path
(NATIVE_ARMHF_ON_ARM64=no) bypass the later disabled-state guard and skip
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target; change the logic so the
caller only continues when _native_armhf_setup_binfmt_elf reports an actual
active-native success (explicit success exit code), and when the function
returns due to the killswitch (use a distinct exit code or status), fall through
instead of continuing so the later
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target and the disabled-state
guard still run; apply the same fix to the analogous block around
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target (lines 356-371) and
ensure _native_armhf_setup_binfmt_elf and its callers agree on exit codes
semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6912ee78-6640-4866-9e0d-f7c2e50ad718

📥 Commits

Reviewing files that changed from the base of the PR and between 38c583f and 7b782ce.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh

Comment thread lib/functions/rootfs/qemu-static.sh
…u-arm

When building an armhf image on an aarch64 host that supports 32-bit ARM
execution (kernel CONFIG_COMPAT, CPU has 32-bit user mode), disable the
qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the
build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively
via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation
on Cortex-A53/A55/A72/A73/A76.

Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params
hook), which enabled COMPAT_VDSO in arm64 kernels.

Activation point is delayed to AFTER mmdebstrap. Its cross-arch path
extracts base packages and runs their postinst hooks, which require a
working qemu-arm registration to populate the chroot before
libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to.
The speedup applies to all subsequent chroot operations
(install_distribution_specific, customize_image, update-initramfs).

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled).

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock on the binfmt entry itself — no userspace state, no
per-builder lockfiles, no /run/lock directory tree:

- Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a
  long-lived fd. Kernel BSD-flock counter is the refcount; the kernel
  releases the fd on process exit (crash-safe).
- The first arrival idempotently writes 0 (`echo 0`) to disable qemu-arm.
  Subsequent arrivals observe 0 and proceed without writing.
- On exit, release LOCK_SH; last-out detects via LOCK_EX-LOCK_NB on a
  fresh fd, succeeds iff zero other LOCK_SH holders. Last-out re-enables
  qemu-arm.

Trade-offs (documented):
- Prior qemu-arm state is not recorded across independent builds. Last-
  out unconditionally re-enables. Admin's pre-existing `disabled` policy
  is not preserved.
- No defense against an external agent toggling qemu-arm mid-build.
  Documented; either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds
  that should not interfere.
- Cleanup ordering invariant: BSD flock is per-OFD, so a forked subshell
  inheriting our SH-fd shares the same lock entry. add_cleanup_handler
  runs in registration order; we register after the umount/SDCARD/MOUNT
  teardown handlers, so by the time we run, the docker container is
  killed and its child-tree (with our inherited fd) gone. Documented in
  the trap handler's docstring; if a future refactor breaks ordering,
  POSIX F_SETLK or explicit descendant-kill are the documented escape
  hatches.

Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble):
- baseline (rootfs cache-miss): 60:35
- this PR (rootfs cache-miss): 19:27 (3.12x)
- this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit)

Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two
independent armbian worktrees concurrent on same host, both enter native
path, kernel SH-counter holds qemu-arm disabled across both, first-
finished suppresses restore (LOCK_EX-NB blocked by other's SH), last-
finished restores qemu-arm to enabled. SIGINT mid-chroot also cleans up
correctly.

Reviewed iteratively with codex (gpt-5.5) and coderabbitai. The earlier
revision of this work coordinated via a userspace owner-flock directory
(~250 lines, depended on /run/lock being host-bind-mounted into build
containers, which Armbian docker does not do); replaced with the kernel-
flock-on-binfmt-entry approach above.

Assisted-by: Claude:claude-opus-4.7
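
The shared-lock refcount and the last-out probe from the commit message above can be tried out with flock(1) from util-linux. The sketch below is a demonstration under assumptions, not the PR's code: a temp file stands in for /proc/sys/fs/binfmt_misc/qemu-arm, fd numbers 8 and 9 are arbitrary, `probe_last_out` is a hypothetical name, and fd 8 plays the role of another builder that is still running.

```shell
#!/usr/bin/env bash
# Stand-in for the binfmt entry so the sketch runs without root.
entry="$(mktemp)"

# "Another builder": LOCK_SH held on a long-lived fd.
exec 8< "$entry"
flock -s 8

# Last-out probe: LOCK_EX|LOCK_NB on a fresh fd succeeds only if no other
# open file description currently holds a lock on the file.
probe_last_out() {
	local rc=0
	exec 9< "$entry"
	flock -x -n 9 || rc=$?
	exec 9<&-   # closing the fresh fd also drops the probe lock
	return "$rc"
}

if probe_last_out; then first_probe="last-out"; else first_probe="others-active"; fi

exec 8<&-       # the other builder exits: its LOCK_SH goes away with the fd
if probe_last_out; then second_probe="last-out"; else second_probe="others-active"; fi

echo "while SH held: ${first_probe}; after release: ${second_probe}"
```

The probe conflicts with fd 8 even inside one process because BSD flock is attached to the open file description, and each `exec N< file` opens a new one. That same property is why an fd inherited by a forked child keeps the lock alive until the child is gone, which is the cleanup-ordering concern noted in the trade-offs.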
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 7b782ce to 9d8b848 Compare May 10, 2026 03:00
…ative builder probes EX-NB before disabling qemu-arm

K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes a SH-lock on
/proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning
immediately. N-builder (default) probes EX-NB on a fresh fd before
acquiring its own SH; if probe fails AND qemu-arm is observably enabled
(state==1), a K-builder holds the emulation anchor and switching to
native mid-flight would corrupt their qemu-arm-static routing — refuse
with exit_with_error.

Symmetric closure of the K↔N race:
- N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host
  guard (addressed in 9d8b848).
- K→N: N sees EX-NB fail + state=1 → fail-fast with 'concurrent build
  with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit).

Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8):
- N first, K second: K exits after 8s with 'qemu-arm globally disabled by
  another concurrent build'.
- K first, N second: N exits after 12s with 'cannot enable native armhf:
  concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'.
- Cleanup of both K (release SH) and N (last-out restore echo 1) leaves
  qemu-arm in 'enabled' state.

Assisted-by: Claude:claude-opus-4.7
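
The decision a default (N) builder makes after its EX-NB probe can be summarised as a small table. The sketch below encodes one reading of the commit message; `native_builder_decision` and its string arguments are hypothetical. Only the "refuse" row is stated explicitly above; the "join" and "first" rows follow the first-arrival and subsequent-arrival description in the earlier commit.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper for a default (N) builder.
# $1 = "free" if the EX-NB probe succeeded, "busy" if it failed
# $2 = observed qemu-arm state: 1 (enabled) or 0 (disabled)
native_builder_decision() {
	local probe="$1" state="$2"
	if [ "$probe" = "busy" ] && [ "$state" = "1" ]; then
		echo "refuse"   # a NATIVE_ARMHF_ON_ARM64=no build holds the emulation anchor
	elif [ "$probe" = "busy" ]; then
		echo "join"     # other native builders already disabled qemu-arm
	else
		echo "first"    # nobody holds the entry: take SH and write 0
	fi
}

native_builder_decision busy 1
native_builder_decision busy 0
native_builder_decision free 1
```

The probe result alone is not enough, since both K and N builders hold LOCK_SH; the observed enabled/disabled state is what distinguishes the two kinds of holder.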
@github-actions github-actions Bot added size/large PR with 250 lines or more and removed size/medium PR with more then 50 and less then 250 lines labels May 10, 2026

Labels

05 Milestone: Second quarter release Framework Framework components Needs review Seeking for review size/large PR with 250 lines or more

2 participants