Fix arm64 register allocation issue for XLOAD. #438
For the arm64 implementation of asm_xload(), it is possible for the dest register selected to be the same as one of the source registers generated in the asm_fusexref() call. To prevent this, exclude the dest register from the list of allowed registers for that call. Thanks to Javier for guidance as well as his script to replicate the issue.
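A minimal sketch of the idea in this description, assuming a register set is a bitmask with one bit per register. The type and function names (`RegSetSketch`, `regset_exclude`, `regset_pick_lowest`) are hypothetical and are not LuaJIT's own identifiers; the sketch only shows how removing the dest register from the allowed set keeps a later pick from returning it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register set: bit r is set when register r may be used. */
typedef uint32_t RegSetSketch;

/* Return the set with register `reg` removed. */
static RegSetSketch regset_exclude(RegSetSketch set, unsigned reg) {
  assert(reg < 32);
  return set & ~((RegSetSketch)1u << reg);
}

/* Test whether register `reg` is in the set. */
static int regset_has(RegSetSketch set, unsigned reg) {
  assert(reg < 32);
  return (int)((set >> reg) & 1u);
}

/* Pick the lowest-numbered register in the set, or -1 if it is empty. */
static int regset_pick_lowest(RegSetSketch set) {
  for (unsigned r = 0; r < 32; r++) {
    if (regset_has(set, r)) return (int)r;
  }
  return -1;
}
```

With registers 0 to 3 allowed and register 0 chosen as dest, `regset_pick_lowest(regset_exclude(0xF, 0))` picks register 1, so the source operand can no longer land on the dest register.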
Could you share your testcase? |
Certainly. This was pointed out by @javierguerragiraldez
It doesn't seem to me that you fixed the real issue. Destination and operand registers don't necessarily need to be different. I've taken a look at this and it seems that constant rematerialization uses one of the operands (which has the same register as the destination) to rematerialize a constant. That is probably the real issue: constant rematerialization shouldn't use modified registers.
hi @sindrom91 if I follow you, in if so, can you elaborate on why |
Or should we add the fix in |
hum... shouldn't the |
I think it gets excluded because it holds data and we don't want it to get evicted and lose whatever data it holds. On the other hand, this doesn't apply to
Yes, I was referring to
This crossed my mind as well. |
@sindrom91 Thanks for that info. The modset change does seem to fix the problem, but I want to look further into whether that's the right way to fix it, or if there's a better/preferred way. |
Yes, there's a more general problem behind that. Constant rematerialization must not use other registers that contain constants, if the register is in-flight. This is an issue for many IR instructions, not just What happens here: The assembly of an IR instruction allocates a constant into a free register. Then it spills another register (due to high register pressure), which is rematerialized using the same constant (which it assumes is now in the allocated register). If the first register also happens to be the destination register, then it's modified before the rematerialization. Boom.
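The failure described above can be modelled with a toy register file. This is only an illustration under stated assumptions: the function name, the four-register file, and the "derive the second constant by adding a delta to the register holding the first" step are all hypothetical stand-ins, not LuaJIT code.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 4 };

/*
 * Toy model, in execution order:
 *   1. constant k sits in r0,
 *   2. the instruction writes `loaded` into register `dest`,
 *   3. rematerialization rebuilds constant k2 in r1 from r0,
 *      assuming r0 still holds k.
 * Returns the value that ends up in r1.
 */
static int64_t remat_after_instruction(int dest, int64_t k, int64_t k2,
                                       int64_t loaded) {
  int64_t regs[NREGS] = {0};
  assert(dest >= 0 && dest < NREGS);
  regs[0] = k;                    /* constant allocated into r0 */
  regs[dest] = loaded;            /* the instruction writes its destination */
  regs[1] = regs[0] + (k2 - k);   /* remat trusts r0 to still hold k */
  return regs[1];
}
```

With `k = 100`, `k2 = 108` and `loaded = 7`, the call returns 108 when `dest` is 2, but 15 when `dest` is 0, because r0 was overwritten before the rematerialization read it.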
Note that it's quite common to use the same register for the destination and one of the sources. This is very helpful for PHIs in loops, too. And the particular problem only happens when the rematerialization can make use of the same constant as one of the sources of the IR instruction. What I'm wondering though: why doesn't this happen more often and on more architectures? There's no guard in cross-register constant rematerialization against that. Even something as basic as Sigh ... running out of time. [ |
Thanks @MikePall for that explanation. It sounds like my initial proposed solution is backwards -- instead of prohibiting rd to be used in the rematk, I should be selecting rd in such a way that it doesn't collide with the rematk. And the
That's a good question... I'll do some digging to see if I can figure that part out. I'd rather fix the root issue here than slap on a (wrong) workaround...
Did some more digging into this issue and did some parallel gdb walkthroughs between arm64, armv7, and ppc. First of all, all three platforms generate the same/similar instruction sequence (load byte, rematerialize the string, compare the loaded-byte value), but armv7 and ppc never seem to hit the same register issue as arm64. It looks like this is due to when both of those platforms call The armv7 and ppc architectures work -- and arm64 fails -- because the In The reason armv7 and ppc both pass But arm64 -- on the other hand -- uses At that point, arm64 drops into the "else" clause at My proposal to fix this for arm64 is to modify the "else" clause at I've made a code change locally, and I'm currently testing it to make sure I didn't break anything else obvious, but figured I'd check to make sure my reasoning was correct. |
while i still don't grasp all the details, the base explanation seems solid (the last check is done only on |
Hi Javier, Testing is coming around pretty good. I ran this patch through LuaJIT-test-cleanup bench and test (with some of the bench tests expanded to hit the JIT a bit harder), and haven't seen any regressions yet. Doesn't mean they're not in there. Will run this against more scripts as I find them in an attempt to see if I can break it. |
For arm64, it's possible for both IRRefs to fail asm_isk32(), but one of them pass irref_isk(). Add a secondary check for the latter call if both asm_isk32() calls fail.
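A rough sketch of the two-stage check this commit message describes, assuming a simplified operand that only records whether it is a constant and what its value is. `OperandSketch`, `fits_k32` and `pick_const_operand` are hypothetical names for illustration; they stand in for the roles of the 32-bit constant check and the plain is-constant check, not for the real functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand: a flag for "is a constant" plus its value. */
typedef struct {
  int is_const;
  int64_t value;
} OperandSketch;

/* Stand-in for the "constant that fits in 32 bits" check. */
static int fits_k32(const OperandSketch *o) {
  return o->is_const && o->value >= INT32_MIN && o->value <= INT32_MAX;
}

/*
 * Decide which operand to treat as the constant one.
 * Returns 1 or 2 for the first or second operand, 0 if neither is constant.
 */
static int pick_const_operand(const OperandSketch *a, const OperandSketch *b) {
  assert(a != 0 && b != 0);
  if (fits_k32(a)) return 1;   /* primary check: 32-bit constants */
  if (fits_k32(b)) return 2;
  if (a->is_const) return 1;   /* secondary check: any constant at all */
  if (b->is_const) return 2;
  return 0;
}
```

A constant such as 2^32 fails the first check but is still caught by the second, which is the case the commit message calls out.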
PR LuaJIT#438 describes an issue with constant rematerialization where a rematerialized constant could use a register it thought was constant but was in fact clobbered in the same instruction that it was defined in. Fix the problem by adding a mask and a safe entry point for the allocator that marks constant-holding registers that are unsafe for rematerialization because they are the same as the destination register of an instruction they are used in. While this is only implemented in asm_fusexref, it should ideally be implemented for all instructions where the destination is likely to clobber one of the source registers and one or more of the source registers hold constants.
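The mask idea in this commit message might look roughly like the following. This is a sketch under assumptions, not the patch itself: `AllocSketch`, `mark_dest`, `clear_unsafe` and `pick_remat_source` are hypothetical names, and the allocator state is reduced to two bitmasks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator state, one bit per register. */
typedef struct {
  uint32_t konst;   /* registers currently holding constants */
  uint32_t unsafe;  /* constant registers the in-flight instruction clobbers */
} AllocSketch;

/* Mark `dest` as unsafe for rematerialization if it holds a constant. */
static void mark_dest(AllocSketch *as, unsigned dest) {
  assert(dest < 32);
  as->unsafe |= as->konst & ((uint32_t)1u << dest);
}

/* Reset the mask once the instruction has been assembled. */
static void clear_unsafe(AllocSketch *as) {
  as->unsafe = 0;
}

/* Pick a constant register that is safe to rematerialize from, or -1. */
static int pick_remat_source(const AllocSketch *as) {
  uint32_t ok = as->konst & ~as->unsafe;
  for (unsigned r = 0; r < 32; r++) {
    if ((ok >> r) & 1u) return (int)r;
  }
  return -1;
}
```

If r0 and r2 hold constants and r0 is also the destination, `mark_dest` removes r0 from consideration and `pick_remat_source` falls back to r2; after `clear_unsafe`, r0 becomes eligible again.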
I spent some time on this and created a new PR to fix this problem here: However I thought about this a bit more and it looks like your approach (the patch still needs a minor fix-up; I'll post it soon) might be the most minimal way to fix this. This problem of remat not getting things right when a const register is reused for the destination can only occur when the const register is allocated before the other operand because this is the only case when ra_rematk sees the const register. This might also explain why the problem is not seen more commonly; we just need to make sure that constants are allocated last. This won't happen when there are two const operands, nor will it happen with two variants. I'll leave my PR as is for now in case there are comments but otherwise I'll also post a small fixup for this patch. |
Even if the offset is a constant, it is not 32-bit since it failed that check earlier before it came here. The code is thus useless and hence removed. This also fixes inconsistencies with op1/op2 renaming that were introduced in PR LuaJIT#438. They were never triggered because the code path is effectively dead for arm64.
Fixed. Thanks! |
@MikePall @siddhesh I spent a lot of time checking, and found your merge here.