
[codex] Reduce BusyBox boot instability around scheduler and rootfs bring-up #19

Merged: zhoubot merged 1 commit into main from codex/linux-busybox-recovery on May 16, 2026


zhoubot (Collaborator) commented May 16, 2026

What changed

  • update the Linx kernel/task-handoff path around scheduler and atomic bring-up
  • refresh BusyBox rootfs tooling and initramfs glue used by the bring-up flow
  • document the associated ABI notes in the kernel tree

Why

The full-OS bring-up still needs a narrower, more reproducible kernel/rootfs path around the current BusyBox boot regression.

Impact

This packages the current Linux-side bring-up work so the remaining full-OS blocker can be debugged from a cleaner baseline.

Validation

  • no fresh clean-tree BusyBox full-boot rerun in this publish turn

The full-OS bring-up path has been failing in the handoff between early rootfs
setup and scheduler-driven task transitions. This change set keeps the kernel,
atomic helpers, and BusyBox tooling aligned with the current Linx bring-up path
so the remaining regression can be debugged from a narrower post-boot window.

Constraint: BusyBox full-OS regression still localizes near task-switch and return-template handoff paths
Rejected: Leave the boot harness and scheduler plumbing unchanged | keeps the rootfs failure opaque and hard to reproduce
Confidence: low
Scope-risk: broad
Directive: Re-verify finish_task_switch and return-template assumptions before further scheduler cleanup
Not-tested: Fresh BusyBox full-boot rerun from a clean merged kernel tree
zhoubot merged commit 3875990 into main on May 16, 2026
zhoubot deleted the codex/linux-busybox-recovery branch on May 16, 2026 at 06:47

gemini-code-assist (bot) left a comment


Code Review

This pull request updates LinxISA documentation and toolchain profiles, seeds thread state to prevent zero link targets during task switches, and introduces several optnone / noinline workarounds for compiler/hardware bring-up issues in block, atomic64, and scheduler code. It also refactors the busybox boot script to use PTYs for more robust interaction and changes /sbin/init to a hard link. Review feedback identified dead code in the boot script and potential file descriptor leaks when spawning QEMU.
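The summary mentions switching /sbin/init to a hard link. The PR's rootfs tooling is not quoted here, so what follows is only an illustrative sketch of such a staging step in Python; the helper name stage_init_hardlink and the bin/busybox staging layout are assumptions. A hard link is a second directory entry for the same inode, so exec of /sbin/init opens the BusyBox binary directly with no symlink to resolve, while BusyBox still selects the init applet from the basename of argv[0].

```python
import os
import tempfile
from pathlib import Path


def stage_init_hardlink(rootfs: Path) -> Path:
    """Make rootfs/sbin/init a hard link to rootfs/bin/busybox.

    Hypothetical helper; the staging layout is an assumption, not the PR's tooling.
    """
    busybox = rootfs / "bin" / "busybox"
    init = rootfs / "sbin" / "init"
    init.parent.mkdir(parents=True, exist_ok=True)
    # Drop any earlier symlink, copy, or stale link so the step can be rerun.
    if init.is_symlink() or init.exists():
        init.unlink()
    # Hard links must stay on one filesystem; both paths live in the staging tree.
    os.link(busybox, init)
    return init


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "bin").mkdir()
        (root / "bin" / "busybox").write_bytes(b"placeholder")
        init = stage_init_hardlink(root)
        # Same inode means /sbin/init and /bin/busybox are the same file.
        print(init.stat().st_ino == (root / "bin" / "busybox").stat().st_ino)
```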

Comment on lines +39 to +50
```python
def _drain_stdout(proc: subprocess.Popen, out_chunks: list[bytes]) -> None:
    if proc.stdout is None:
        return
    while True:
        try:
            chunk = os.read(proc.stdout.fileno(), 4096)
        except OSError:
            return
        if not chunk:
            return
        out_chunks.append(chunk)
```


medium

The function _drain_stdout is defined but never called within the script. Since _run_once handles reading from the PTY master file descriptor directly, this helper appears to be dead code and should be removed to maintain code cleanliness.
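The reviewer's reading is consistent with subprocess semantics: Popen.stdout is a file object only when stdout=subprocess.PIPE is passed, so with stdout=slave_fd it is None and _drain_stdout would return at its first check even if it were called. PTY output has to be read from master_fd instead. Below is a minimal sketch of such a read loop, assuming the caller has already closed its own copy of slave_fd; drain_master is a hypothetical name and is not the script's _run_once.

```python
import errno
import os
import pty


def drain_master(master_fd: int) -> bytes:
    """Read a PTY master until the slave side is gone (hypothetical helper)."""
    chunks: list[bytes] = []
    while True:
        try:
            chunk = os.read(master_fd, 4096)
        except OSError as exc:
            # Linux reports "all slave fds closed" on the master as EIO, not EOF.
            if exc.errno == errno.EIO:
                break
            raise
        if not chunk:
            # Other platforms may signal the same condition with an empty read.
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    master_fd, slave_fd = pty.openpty()
    os.write(slave_fd, b"boot log line\n")
    os.close(slave_fd)  # last slave reference gone: reads end after buffered data
    try:
        print(drain_master(master_fd))
    finally:
        os.close(master_fd)
```

Unlike _drain_stdout, which returns on any OSError, this sketch treats only EIO as end of output and re-raises anything else; that is a design choice, not something the PR states. A real QEMU harness would also need a timeout, because a hung guest keeps the slave open and the loop then blocks.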

Comment on lines +53 to +60
```diff
+    master_fd, slave_fd = pty.openpty()
+    proc = subprocess.Popen(
+        cmd,
+        stdin=slave_fd,
+        stdout=slave_fd,
+        stderr=slave_fd,
+    )
-    if rerun.returncode == 0:
-        sys.stderr.write("note: boot.py recovered after same-config retry\n")
-        if rerun.stdout:
-            sys.stdout.write(rerun.stdout)
-            sys.stdout.flush()
-        return rerun.returncode
+    os.close(slave_fd)
```


medium

The file descriptors master_fd and slave_fd are opened but not protected by a try...finally block during process initialization. If subprocess.Popen fails (e.g., if the QEMU binary is not found), both descriptors will be leaked. It is recommended to use a try...finally block to ensure slave_fd is closed immediately after spawning and master_fd is closed when the function exits.

Suggested change
```python
master_fd, slave_fd = pty.openpty()
try:
    proc = subprocess.Popen(
        cmd,
        stdin=slave_fd,
        stdout=slave_fd,
        stderr=slave_fd,
    )
finally:
    os.close(slave_fd)
```
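As quoted, the suggestion guarantees that slave_fd is closed, but if Popen raises, master_fd is still left open within these lines; the review text asks for master_fd to be closed as well, presumably elsewhere in the function, which is not shown. Here is a sketch of a spawn helper that releases both descriptors on failure and hands ownership of master_fd to the caller on success; spawn_on_pty is a hypothetical name, not part of the PR.

```python
import os
import pty
import subprocess
import sys


def spawn_on_pty(cmd: list[str]) -> tuple[subprocess.Popen, int]:
    """Start cmd with stdin/stdout/stderr on a new PTY; return (proc, master_fd).

    Hypothetical helper: on success the caller owns master_fd and must close it.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(cmd, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    except BaseException:
        # Spawn failed (e.g. FileNotFoundError for a missing QEMU binary), so
        # nobody will ever read master_fd: release it before propagating.
        os.close(master_fd)
        raise
    finally:
        # The child holds its own copies; the parent's slave fd goes in every case,
        # otherwise reads on master_fd never observe the slave side closing.
        os.close(slave_fd)
    return proc, master_fd


if __name__ == "__main__":
    proc, master_fd = spawn_on_pty([sys.executable, "-c", "print('hello from a pty')"])
    try:
        print("exit status:", proc.wait())
    finally:
        os.close(master_fd)
```

On success the caller reads master_fd and closes it in its own finally block; on failure nothing is left open and the original exception propagates unchanged.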
