[codex] Reduce BusyBox boot instability around scheduler and rootfs bring-up #19
The full-OS bring-up path has been failing in the handoff between early rootfs setup and scheduler-driven task transitions. This change set keeps the kernel, atomic helpers, and BusyBox tooling aligned with the current Linx bring-up path so the remaining regression can be debugged from a narrower post-boot window.

Constraint: BusyBox full-OS regression still localizes near task-switch and return-template handoff paths
Rejected: Leave the boot harness and scheduler plumbing unchanged | keeps the rootfs failure opaque and hard to reproduce
Confidence: low
Scope-risk: broad
Directive: Re-verify finish_task_switch and return-template assumptions before further scheduler cleanup
Not-tested: Fresh BusyBox full-boot rerun from a clean merged kernel tree
Code Review
This pull request updates LinxISA documentation and toolchain profiles, seeds thread state to prevent zero link targets during task switches, and introduces several optnone / noinline workarounds for compiler/hardware bring-up issues in block, atomic64, and scheduler code. It also refactors the BusyBox boot script to use PTYs for more robust interaction and changes /sbin/init to a hard link. Review feedback identified dead code in the boot script and potential file descriptor leaks when spawning QEMU.
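To illustrate what PTY-based interaction with a booting guest can look like, here is a minimal sketch of a deadline-bounded read on the PTY master. The helper name `read_until`, the marker bytes, and the timeout values are assumptions made for illustration; none of them are taken from the boot script in this PR.

```python
import os
import pty
import select
import subprocess
import sys
import time


def read_until(master_fd: int, marker: bytes, timeout: float) -> bytes:
    """Collect PTY output until `marker` appears, the stream ends, or the deadline passes.

    Hypothetical helper for illustration; not part of the PR's boot script.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while marker not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline reached without seeing the marker
        ready, _, _ = select.select([master_fd], [], [], remaining)
        if not ready:
            break  # select timed out
        try:
            chunk = os.read(master_fd, 4096)
        except OSError:
            break  # Linux raises EIO once every slave end has been closed
        if not chunk:
            break  # other platforms report end of stream as an empty read
        buf += chunk
    return buf
```

The caller checks `marker in result` to tell a match from a timeout, which keeps a hung guest from blocking the harness indefinitely.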
    def _drain_stdout(proc: subprocess.Popen, out_chunks: list[bytes]) -> None:
        if proc.stdout is None:
            return
        while True:
            try:
                chunk = os.read(proc.stdout.fileno(), 4096)
            except OSError:
                return
            if not chunk:
                return
            out_chunks.append(chunk)


    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        cmd,
        stdin=slave_fd,
        stdout=slave_fd,
        stderr=slave_fd,
    )
    if rerun.returncode == 0:
        sys.stderr.write("note: boot.py recovered after same-config retry\n")
    if rerun.stdout:
        sys.stdout.write(rerun.stdout)
        sys.stdout.flush()
    return rerun.returncode
    os.close(slave_fd)
The file descriptors master_fd and slave_fd are opened but not protected by a try...finally block during process initialization. If subprocess.Popen fails (e.g., if the QEMU binary is not found), both descriptors will be leaked. It is recommended to use a try...finally block to ensure slave_fd is closed immediately after spawning and master_fd is closed when the function exits.
    -master_fd, slave_fd = pty.openpty()
    -proc = subprocess.Popen(
    -    cmd,
    -    stdin=slave_fd,
    -    stdout=slave_fd,
    -    stderr=slave_fd,
    -)
    -if rerun.returncode == 0:
    -    sys.stderr.write("note: boot.py recovered after same-config retry\n")
    -if rerun.stdout:
    -    sys.stdout.write(rerun.stdout)
    -    sys.stdout.flush()
    -return rerun.returncode
    -os.close(slave_fd)
    +master_fd, slave_fd = pty.openpty()
    +try:
    +    proc = subprocess.Popen(
    +        cmd,
    +        stdin=slave_fd,
    +        stdout=slave_fd,
    +        stderr=slave_fd,
    +    )
    +finally:
    +    os.close(slave_fd)
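The suggested change closes only `slave_fd`. The review comment also asks for `master_fd` to be closed when the function exits. The sketch below shows one way to cover both descriptors with nested `try...finally` blocks. The function name `run_on_pty` and its return shape are hypothetical, chosen for illustration, and are not the PR's code.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(cmd: list[str]) -> tuple[int, bytes]:
    """Run `cmd` attached to a fresh PTY and return (exit code, captured output).

    Hypothetical helper: both PTY descriptors are released on every path,
    including the one where Popen raises (for example, binary not found).
    """
    master_fd, slave_fd = pty.openpty()
    try:
        try:
            proc = subprocess.Popen(
                cmd,
                stdin=slave_fd,
                stdout=slave_fd,
                stderr=slave_fd,
            )
        finally:
            # The child has its own copies after a successful spawn. The parent's
            # copy must go either way, or reads on the master never see the end
            # of the stream.
            os.close(slave_fd)

        chunks: list[bytes] = []
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux raises EIO once the child has closed the terminal
            if not chunk:
                break  # other platforms report end of stream as an empty read
            chunks.append(chunk)
        return proc.wait(), b"".join(chunks)
    finally:
        os.close(master_fd)
```

Note that the read loop here blocks until the child closes its end of the terminal, so a real harness would likely bound it with a deadline.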
What changed
Why
The full-OS bring-up still needs a narrower, more reproducible kernel/rootfs path around the current BusyBox boot regression.
Impact
This packages the current Linux-side bring-up work so the remaining full-OS blocker can be debugged from a cleaner baseline.
Validation