cc: fold byte-immediate store + movzx reload through local #446
Merged
A bitfield-struct local whose only use is ``*(uint8_t *)&local`` (the
driver port-I/O idiom) was emitting:

    sub esp, 5
    mov byte [ebp-1], 18        ; designated-init const-fold
    movzx eax, byte [ebp-1]     ; *(uint8_t *)&c
    push eax
    mov edx, 768
    pop eax
    out dx, al

The movzx provably reads the value just stored, so a direct ``mov
eax, <imm>`` is equivalent. The new
``peephole_fold_byte_immediate_through_local`` walks forward from
each ``mov byte [bp-N], <imm>`` store, scanning past instructions
that don't touch ``[bp-N]`` (sibling designated-init stores, the
rest of ``kernel_outb``'s codegen), and folds the matching
``movzx <reg>, byte [bp-N]`` reload into ``mov <reg>, <imm>``.
The scan stops at any write to ``[bp-N]``, control flow (``ret``,
``jmp``, ``jcc``, ``call``), or label.
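A minimal Python sketch of this forward scan, assuming a hypothetical text IR of NASM-style instruction strings with labels written as ``name:``. ``fold_byte_immediate_sketch`` and its helpers are illustrative names, not the actual ``peephole_fold_byte_immediate_through_local``. As assumptions of the sketch, width-less memory destinations are treated as 4 bytes (32-bit code) when checking for overlapping writes, and, like the rule above, only direct ``[bp-N]`` operands are examined, not writes through pointer registers.

```python
import re

# Hypothetical IR: one NASM-style instruction per string, labels as "name:".
BYTE_STORE = re.compile(r"mov\s+byte\s+\[(e?bp)-(\d+)\],\s*(-?\d+)$")
BYTE_RELOAD = re.compile(r"movzx\s+(\w+),\s*byte\s+\[(e?bp)-(\d+)\]$")
# First (destination) operand is bp-relative memory, with an optional width.
MEM_DEST = re.compile(r"\w+\s+(?:(byte|word|dword)\s+)?\[(e?bp)-(\d+)\]")
# Assumption: width-less memory destinations are 4 bytes (32-bit code).
WIDTHS = {"byte": 1, "word": 2, "dword": 4, None: 4}


def is_barrier(insn: str) -> bool:
    """Labels and control flow (ret, call, jmp, jcc) end the basic block."""
    if insn.endswith(":"):
        return True
    parts = insn.split()
    return bool(parts) and (parts[0] in ("ret", "call") or parts[0].startswith("j"))


def writes_slot(insn: str, offset: int) -> bool:
    """Conservative: does a bp-relative destination overlap byte [bp-offset]?"""
    m = MEM_DEST.match(insn)
    if not m:
        return False
    start = -int(m.group(3))
    return start <= -offset < start + WIDTHS[m.group(1)]


def fold_byte_immediate_sketch(insns: list[str]) -> list[str]:
    """Rewrite 'movzx r, byte [bp-N]' to 'mov r, imm' after a matching store."""
    out = [i.strip() for i in insns]
    for idx, insn in enumerate(out):
        store = BYTE_STORE.match(insn)
        if not store:
            continue
        base, offset, imm = store.group(1), int(store.group(2)), int(store.group(3))
        for j in range(idx + 1, len(out)):
            reload = BYTE_RELOAD.match(out[j])
            if reload and reload.group(2) == base and int(reload.group(3)) == offset:
                # movzx zero-extends the stored byte, so fold only its low 8 bits.
                out[j] = f"mov {reload.group(1)}, {imm & 0xFF}"
                break  # this sketch folds only the first matching reload
            if is_barrier(out[j]) or writes_slot(out[j], offset):
                break  # the slot may hold a different value from here on
    return out


listing = [
    "sub esp, 5",
    "mov byte [ebp-1], 18",
    "movzx eax, byte [ebp-1]",
    "push eax",
    "mov edx, 768",
    "pop eax",
    "out dx, al",
]
print(fold_byte_immediate_sketch(listing)[2])  # mov eax, 18
```

Masking with ``imm & 0xFF`` is a choice made in this sketch: ``movzx`` zero-extends the stored byte, so a store written as ``-1`` reloads as 255, whereas folding the raw immediate into ``mov eax, -1`` would produce 0xFFFFFFFF.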
Extending ``peephole_dead_temp_slots`` to recognise
``mov byte [bp-N], ...`` as a store (it previously only matched
``mov [bp-N], <reg>`` width-less stores) lets it then reclaim the
unreferenced byte store.
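The effect of widening the store pattern can be sketched under the same hypothetical NASM-text IR. ``OLD_STORE``, ``NEW_STORE`` and ``drop_dead_slot_stores`` are illustrative names, not the real ``peephole_dead_temp_slots``. The sketch assumes it sees a whole function that addresses locals only through ``bp``, treats any ``lea`` of a frame slot as an escape that keeps every store, and assumes width-less operands are at most 4 bytes.

```python
import re

# Hypothetical IR: one NASM-style instruction per string.
# Old store shape: width-less "mov [bp-N], <reg>" only.
OLD_STORE = re.compile(r"mov\s+\[e?bp-(?P<off>\d+)\],\s*\w+$")
# Extended shape: optional width prefix, so "mov byte [bp-N], <imm>" counts too.
NEW_STORE = re.compile(
    r"mov\s+(?:(?P<width>byte|word|dword)\s+)?\[e?bp-(?P<off>\d+)\],\s*\S+$")
MEM_REF = re.compile(r"(?:(?P<width>byte|word|dword)\s+)?\[e?bp-(?P<off>\d+)\]")
# Assumption: width-less operands are at most 4 bytes (32-bit integer code).
WIDTHS = {"byte": 1, "word": 2, "dword": 4, None: 4}


def byte_range(m: re.Match) -> tuple[int, int]:
    """Half-open range of frame offsets covered by a matched operand."""
    start = -int(m.group("off"))
    return start, start + WIDTHS[m.groupdict().get("width")]


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def drop_dead_slot_stores(insns: list[str], store_re=NEW_STORE) -> list[str]:
    """Remove stores to frame slots that no other instruction references."""
    insns = [i.strip() for i in insns]
    stores = {}  # instruction index -> byte range it writes
    refs = []    # byte ranges touched by every non-store instruction
    for idx, insn in enumerate(insns):
        if insn.split()[:1] == ["lea"] and MEM_REF.search(insn):
            return insns  # a frame address escapes: keep every store
        m = store_re.match(insn)
        if m:
            stores[idx] = byte_range(m)
        else:
            # Unrecognised stores land here too and count as references.
            refs.extend(byte_range(r) for r in MEM_REF.finditer(insn))
    return [insn for idx, insn in enumerate(insns)
            if idx not in stores or any(overlaps(stores[idx], r) for r in refs)]


folded = ["sub esp, 5", "mov byte [ebp-1], 18", "mov eax, 18",
          "push eax", "mov edx, 768", "pop eax", "out dx, al"]
print(drop_dead_slot_stores(folded))                      # byte store removed
print(drop_dead_slot_stores(folded, store_re=OLD_STORE))  # unchanged
```

With the old pattern the byte store is not classified as a store, so its own ``[ebp-1]`` operand counts as a reference and the slot is never reclaimed; in this model, that is the gap the extension closes.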
Real-kernel impact: 217-byte reduction across the kernel kasm build,
folds firing in ata / fdc / ne2k / ps2 / sb16 / opl3 / console.
``tests/test_cc_casts.py::test_byte_dereference_after_cast`` was
using a const-fold struct literal whose load is now folded by the
peephole; switch it to a runtime-sourced value so the byte-load
idiom remains present (the const-immediate fold path is covered
by the new unit test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The bitfield-struct + ``kernel_outb`` driver idiom was emitting a useless trip through a frame slot: a byte-immediate store to ``[ebp-1]`` followed by a ``movzx`` reload of the same slot.
A new peephole walks forward from each ``mov byte [bp-N], <imm>`` store, scanning past instructions that don't touch ``[bp-N]``, and folds the matching ``movzx <reg>, byte [bp-N]`` into ``mov <reg>, <imm>``. The scan bails on writes to ``[bp-N]``, control flow (``ret``/``jmp``/``jcc``/``call``), or labels, so the rewrite is provably safe within a basic block.

Extending ``peephole_dead_temp_slots`` to recognise ``mov byte [bp-N], ...`` as a store lets it then reclaim the unreferenced byte store.

Impact
Kernel kasm: -217 bytes across the build. Folds fire in ata, fdc, ne2k, ps2, sb16, opl3, and console: every driver that uses the bitfield-struct + ``kernel_outb`` pattern. This partially claws back the +88-byte regression from PR #434 (the bitfield-struct port for 8237 DMA + FDC DOR), per the binary-reductions backlog.

Test plan
- ``tests/unit/test_cc_codegen.py::test_peephole_fold_byte_immediate_through_local`` (456 unit tests pass)
- ``tests/test_cc_casts.py::test_byte_dereference_after_cast`` updated to use a runtime value so it keeps exercising the original ``AddressOf(Var)`` shortcut; the const-fold path is covered by the new unit test

🤖 Generated with Claude Code