-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix spillslot allocation to actually reuse spillslots. #56
Conversation
The old logic, which did some linked-list rearranging to try to probe more-likely-to-be-free slots first and which was inherited straight from the original IonMonkey allocator, was slightly broken (error in translation and not in IonMonkey, to be clear): it did not get the list-splicing right, so quite often dropped a slot on the floor and failed to consider it for further reuse. On some experimentation, it seems to work just as well to keep a SmallVec of spillslot indices per size class instead, and save the last probe-point in order to spread load throughout the allocated slots while limiting the number of probes (to bound quadratic behavior). This change moves the maximum slot count from 285 to 92 in `python.wasm` from bytecodealliance/wasmtime#4214, and the maximum frame size from 2384 bytes to 752 bytes.
first_slot = spillslot_iter; | ||
} | ||
for _attempt in 0..std::cmp::min(self.slots_by_size[size].slots.len(), MAX_ATTEMPTS) { | ||
let spillslot = self.slots_by_size[size].slots[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the correctness of this [i]
oddly subtle even though it's ensured mostly by the line immediately above. It wasn't immediately obvious to me that the loop doesn't run when the slots array is empty. If you wanted to make my head hurt less, you could define a method on SpillSlotList
that returns Option<SpillSlotIndex>
and encapsulates the wrapping behavior that's currently split across several parts of this loop body. I don't think that's necessary for merging this fix, though.
Other than that: I'm convinced this patch does what you said it does, and it's nice that it deletes a bunch of code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes, it's not totally clear as-is! I went a bit further and defined an iterator; let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I woke up in the middle of the night wondering whether probe_start is updated if the iteration limit is hit. I see that in either unsuccessful case you always reset it to the newly added slot. So that seems okay, I guess?
Thank you for the extended comments, that helps a lot!
Updated slightly non-trivially, PTAL if you like! |
f9bb41e
to
f01d18d
Compare
Actually, went back to version you reviewed and then did a more minimal revision based more directly on your suggestion; the |
Pulls in an improvement to spillslot allocation (bytecodealliance/regalloc2#56).
Pulls in an improvement to spillslot allocation (bytecodealliance/regalloc2#56).
The old logic, which did some linked-list rearranging to try to probe
more-likely-to-be-free slots first and which was inherited straight from
the original IonMonkey allocator, was slightly broken (error in
translation and not in IonMonkey, to be clear): it did not get the
list-splicing right, so quite often dropped a slot on the floor and
failed to consider it for further reuse.
On some experimentation, it seems to work just as well to keep a
SmallVec of spillslot indices per size class instead, and save the last
probe-point in order to spread load throughout the allocated slots while
limiting the number of probes (to bound quadratic behavior).
This change moves the maximum slot count from 285 to 92 in
python.wasm
from bytecodealliance/wasmtime#4214, and the maximum frame size from
2384 bytes to 752 bytes.
Sightglass results with Cranelift: no differences except on SpiderMonkey
(presumably because of very large stack frames in main interpreter loop):