
cc: fold byte-immediate store + movzx reload through local #446

Merged: bboe merged 1 commit into main from bboe/cc-uint8-deref-local-peephole on May 20, 2026

bboe commented May 20, 2026

Summary

The bitfield-struct + kernel_outb driver idiom

struct cr c = { .stop = 1, .rd = 4, .page = 1 };
kernel_outb(0x300, *(uint8_t *)&c);

was emitting a useless trip through a frame slot:

mov byte [ebp-1], 18      ; designated-init const-fold
movzx eax, byte [ebp-1]   ; *(uint8_t *)&c

A new peephole walks forward from each mov byte [bp-N], <imm> store, scanning past instructions that don't touch [bp-N], and folds the matching movzx <reg>, byte [bp-N] into mov <reg>, <imm>. The scan bails on writes to [bp-N], control flow (ret/jmp/jcc/call), or labels, so the rewrite only fires within a single basic block, where it is provably safe.
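The forward scan can be sketched in Python, the language the project's tests are written in. This is an illustration under stated assumptions, not the project's `peephole_fold_byte_immediate_through_local`: it assumes a hypothetical textual IR with one NASM-style instruction or label per string, and every helper name below (`fold_byte_immediate_reloads`, `may_clobber`, `ends_block`) is made up for the example. It is deliberately conservative: any memory write other than a byte store to a different slot ends the scan, because a wider store can overlap the slot and `&c` may alias it through a pointer.

```python
import re

# Hypothetical textual IR: one NASM-style instruction or label per string.
BYTE_STORE = re.compile(r"mov byte \[(e?bp-\d+)\], (\S+)")
MOVZX_RELOAD = re.compile(r"movzx (\w+), byte \[(e?bp-\d+)\]")
IMMEDIATE = re.compile(r"-?\d+")
# Mnemonics that write memory without naming it as their first operand.
IMPLICIT_MEM_WRITES = {"rep", "stosb", "stosw", "stosd", "movsb", "movsw", "movsd"}


def strip_comment(insn: str) -> str:
    """Drop a trailing '; comment' and surrounding whitespace."""
    return insn.split(";", 1)[0].strip()


def ends_block(text: str) -> bool:
    """Labels and control flow (ret, jmp, every jcc, call, ...) end the scan."""
    if text.endswith(":"):
        return True
    mnemonic = text.split(None, 1)[0]
    return mnemonic in ("call", "int") or mnemonic.startswith(("j", "ret", "iret", "loop"))


def may_clobber(text: str, slot: str) -> bool:
    """Conservatively decide whether `text` could change the byte at [slot]."""
    store = BYTE_STORE.fullmatch(text)
    if store:
        # A byte store touches exactly one byte, so only the same slot clobbers it.
        return store.group(1) == slot
    mnemonic, _, operands = text.partition(" ")
    if mnemonic in IMPLICIT_MEM_WRITES:
        return True
    if mnemonic == "xchg":
        return "[" in operands
    # Any other memory destination may be wider than a byte or may alias the
    # slot through a pointer (&local has escaped), so assume it clobbers.
    return "[" in operands.split(",", 1)[0]


def fold_byte_immediate_reloads(insns: list[str]) -> list[str]:
    """Rewrite 'movzx reg, byte [slot]' as 'mov reg, imm' after 'mov byte [slot], imm'."""
    out = list(insns)
    for i in range(len(out)):
        store = BYTE_STORE.fullmatch(strip_comment(out[i]))
        if not store or not IMMEDIATE.fullmatch(store.group(2)):
            continue
        slot = store.group(1)
        value = int(store.group(2)) & 0xFF  # movzx zero-extends the stored byte
        for j in range(i + 1, len(out)):
            text = strip_comment(out[j])
            if not text:
                continue  # blank or comment-only line
            if ends_block(text):
                break
            reload = MOVZX_RELOAD.fullmatch(text)
            if reload and reload.group(2) == slot:
                out[j] = f"mov {reload.group(1)}, {value}"
            elif may_clobber(text, slot):
                break
    return out


listing = [
    "sub esp, 5",
    "mov byte [ebp-1], 18      ; designated-init const-fold",
    "movzx eax, byte [ebp-1]   ; *(uint8_t *)&c",
    "push eax",
    "mov edx, 768",
    "pop eax",
    "out dx, al",
]
print(fold_byte_immediate_reloads(listing)[2])  # mov eax, 18
```

Masking the immediate with `0xFF` matters for negative initialisers: `movzx` zero-extends, so a stored `-1` reloads as 255. The byte store itself is left in place; removing it is a separate dead-store concern.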

Extending peephole_dead_temp_slots to recognise mov byte [bp-N], ... as a store then lets it reclaim the byte store that the fold leaves unreferenced.
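The clean-up step can likewise be sketched in Python. This does not reproduce `peephole_dead_temp_slots`; it assumes a hypothetical format with one NASM-style instruction per string and one function body per list, and drops a `mov byte [ebp-N], ...` store when no other instruction in the function can read that byte. The helper name `drop_dead_byte_stores` and the 10-byte maximum access width are assumptions for illustration. The sketch gives up entirely on indexed or `esp`-relative frame operands, and assumes the frame address escapes only through an explicit `lea reg, [ebp-N]` (which it counts as a reference).

```python
import re

# Hypothetical format: one NASM-style instruction per string, one function per list.
BYTE_STORE = re.compile(r"mov byte \[e?bp(-\d+)\], \S+")  # locals only (negative offsets)
FRAME_REF = re.compile(r"\[e?bp([+-]\d+)\]")
ANY_STACK_OPERAND = re.compile(r"\[[^\]]*\b(e?bp|e?sp)\b[^\]]*\]")
# Widest access we assume any instruction makes (a 10-byte x87 tword).
MAX_ACCESS_BYTES = 10


def drop_dead_byte_stores(insns: list[str]) -> list[str]:
    """Delete byte stores to frame slots that no other instruction can read.

    Assumes ebp is never copied into another register and the frame address
    only escapes via an explicit `lea reg, [ebp-N]`, which counts as a use.
    """
    code = [insn.split(";", 1)[0].strip() for insn in insns]
    read_offsets = []
    for text in code:
        if BYTE_STORE.fullmatch(text):
            continue  # a plain byte store is a write, not a use
        for operand in ANY_STACK_OPERAND.finditer(text):
            plain = FRAME_REF.fullmatch(operand.group(0))
            if plain is None:
                return list(insns)  # indexed or esp-relative access: give up
            read_offsets.append(int(plain.group(1)))

    def is_dead(text: str) -> bool:
        store = BYTE_STORE.fullmatch(text)
        if store is None:
            return False
        offset = int(store.group(1))
        # A reference at r may cover offsets r .. r + MAX_ACCESS_BYTES - 1.
        return not any(r <= offset < r + MAX_ACCESS_BYTES for r in read_offsets)

    return [insn for insn, text in zip(insns, code) if not is_dead(text)]


after_fold = ["sub esp, 5", "mov byte [ebp-1], 18", "mov eax, 18",
              "push eax", "mov edx, 768", "pop eax", "out dx, al"]
print(drop_dead_byte_stores(after_fold))
```

Counting every non-byte-store frame operand as a possible read (including wider stores) keeps the sketch on the safe side: it can miss a dead store, but should not delete a live one under its stated assumptions.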

Impact

Kernel kasm: -217 bytes across the build. Folds fire in ata, fdc, ne2k, ps2, sb16, opl3, and console, i.e. every driver that uses the bitfield-struct + kernel_outb pattern. This partially claws back the +88-byte regression from PR #434 (the bitfield-struct port for 8237 DMA + FDC DOR), as tracked in the binary-reductions backlog.

Test plan

  • New unit: tests/unit/test_cc_codegen.py::test_peephole_fold_byte_immediate_through_local (456 unit tests pass)
  • tests/test_cc_casts.py::test_byte_dereference_after_cast updated to use a runtime value so it keeps exercising the original AddressOf(Var) shortcut; the const-fold path is covered by the new unit test
  • Full matrix locally: test_asm (42), test_programs bbfs (89), test_programs ext2 (119), test_bboefs (6), test_cc_bits (110), test_cc_bitfields (12), test_cc_casts (6), test_cc_local_structs (10), test_cc_compatibility (57), test_kernel_archive (12), test_archive (12), test_pipeline_* (8), test_draw, test_floppy_boot

🤖 Generated with Claude Code

A bitfield-struct local whose only use is ``*(uint8_t *)&local`` (the
driver port-I/O idiom) was emitting:

    sub esp, 5
    mov byte [ebp-1], 18      ; designated-init const-fold
    movzx eax, byte [ebp-1]   ; *(uint8_t *)&c
    push eax
    mov edx, 768
    pop eax
    out dx, al

The movzx provably reads the value just stored, so a direct
``mov eax, <imm>`` is equivalent.  The new
``peephole_fold_byte_immediate_through_local`` walks forward from
each ``mov byte [bp-N], <imm>`` store, scanning past instructions
that don't touch ``[bp-N]`` (sibling designated-init stores, the
rest of ``kernel_outb``'s codegen), and folds the matching
``movzx <reg>, byte [bp-N]`` reload into ``mov <reg>, <imm>``.
The scan stops at any write to ``[bp-N]``, control flow (``ret``,
``jmp``, ``jcc``, ``call``), or label.

Extending ``peephole_dead_temp_slots`` to recognise
``mov byte [bp-N], ...`` as a store (it previously only matched
``mov [bp-N], <reg>`` width-less stores) lets it then reclaim the
unreferenced byte store.

Real-kernel impact: a 217-byte reduction across the kernel kasm build,
with folds firing in ata / fdc / ne2k / ps2 / sb16 / opl3 / console.

``tests/test_cc_casts.py::test_byte_dereference_after_cast`` was
using a const-fold struct literal whose load is now folded by the
peephole; switch it to a runtime-sourced value so the byte-load
idiom remains present (the const-immediate fold path is covered
by the new unit test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bboe merged commit 16527b1 into main on May 20, 2026
27 checks passed
bboe deleted the bboe/cc-uint8-deref-local-peephole branch on May 20, 2026 08:31