Fix GenerateShuffleArray to support cyclic shuffles #26169
janvorli merged 2 commits into dotnet:master
jakobbotsch
left a comment
LGTM.
As part of this change you can also get rid of the following hashset in the ABI stress test:
coreclr/tests/src/JIT/Stress/ABI/Program.cs
Lines 125 to 129 in fdb2bc4
GenerateShuffleArray did not handle the case where the register / stack slot shuffle contains a cycle, which resulted in an infinite loop in this function. This issue is specific to the Unix Amd64 ABI. To fix it, this change reworks the algorithm completely. Besides fixing the issue, the new algorithm also performs better in some cases. Shuffling a cycle requires an extra helper register. However, no general purpose register was available, so I had to use xmm8 for this purpose.
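To illustrate why a cycle needs a helper register, here is a minimal sketch of ordering a set of simultaneous moves. It is not the code from this PR: the function names `sequence_moves` and `apply_moves`, the dictionary representation of the shuffle, and the use of the string `"xmm8"` as the helper location are all assumptions made for illustration. The idea is to emit every move whose destination is no longer needed as a source, and when only cycles remain, to save one value in the helper so that the cycle can be unwound.

```python
def sequence_moves(moves, helper="xmm8"):
    """Order simultaneous moves {dst: src} so that no source is overwritten early.

    Illustrative sketch only; `helper` names a scratch location (hypothetical here)
    that is not otherwise a source or destination of the shuffle.
    """
    # Moves onto themselves need no instruction.
    pending = {dst: src for dst, src in moves.items() if dst != src}
    ordered = []
    while pending:
        sources = set(pending.values())
        # A destination is safe to write once no pending move still reads it.
        ready = [dst for dst in pending if dst not in sources]
        if ready:
            for dst in ready:
                ordered.append((dst, pending.pop(dst)))
            continue
        # Every remaining destination is also a source, so only cycles are left.
        # Save one value in the helper and redirect its readers to the helper.
        victim = next(iter(pending))
        ordered.append((helper, victim))
        for dst, src in list(pending.items()):
            if src == victim:
                pending[dst] = helper
    return ordered


def apply_moves(ordered, state):
    """Execute the ordered moves one at a time on a copy of `state`."""
    state = dict(state)
    for dst, src in ordered:
        state[dst] = state[src]
    return state


# A three-element cycle: rdi <- rsi, rsi <- rdx, rdx <- rdi.
cycle = {"rdi": "rsi", "rsi": "rdx", "rdx": "rdi"}
result = apply_moves(sequence_moves(cycle), {"rdi": 1, "rsi": 2, "rdx": 3})
```

A shuffle without a cycle never touches the helper in this sketch. A naive loop that only waits for a destination to become free would never make progress on the cycle above, which is the kind of hang described in this PR.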
@jakobbotsch I've updated the PR to update the ABI stress test as you've asked.
I wish we just used the JIT to generate code for these. This is getting so complex that there are guaranteed to be bugs.
For a future blog post, I've been digging into all the places where CoreCLR uses stubs, and it turns out there are quite a few!! I did wonder why so many stubs/thunks are hand-written assembly. Is there a reason?
My best guess was that the JIT is tailored towards converting IL to assembly, so the type of code that's needed in stubs isn't what the JIT is suited for or designed to do. Is that right? Or is it for historical reasons, i.e. the stubs were being developed at the same time as the JIT, so it made sense to hand-write them rather than wait for the JIT to be able to create them? (I know that there has been an initiative to move stubs to IL ('FEATURE_STUBS_AS_IL'), but I assume that several types of stubs do things that can't be expressed in IL (moving/writing registers), so they would always have to be done in raw assembly or emitted by the JIT.)
BTW, a slightly related question: is there a technical difference between 'stubs' and 'thunks'? It seems that thunks (i.e. 'shuffle thunks') are longer and more complex, whilst stubs are simpler. Is that the difference?
Do you mean static ones (e.g. the ones in *.S files) or dynamic ones (e.g. the shuffle thunks)?
Static: It is always possible to add a JIT intrinsic that produces a specific machine instruction sequence. A JIT intrinsic is much more expensive to implement than writing a few lines in a .S file. If everything else (e.g. performance) is equal, static assembly helpers are strongly preferred over teaching the JIT new tricks.
Dynamic: Most of these exist for historic reasons. .NET Framework 1.0 ran on x86 only. Hand-emitting x86 instructions is easy and straightforward, so that is what was done originally. As you have mentioned, IL and the JIT being in flux were likely contributing factors too. When porting to a new platform, it is typically easier and cheaper to port the hand-emitted assembly code to the new platform. I had to spend a lot of energy convincing folks to go the extra mile and implement a stubs-as-il variant for the stubs with the highest maintenance costs, to make the ports cheaper and less buggy.
Note that CoreRT implements all dynamically generated stubs via stubs-as-il (including the shuffle thunk that this issue is about). The techniques required to make stubs-as-il possible everywhere were developed in Redhawk and ProjectN. You can think of stubs-as-il as porting the goodness from CoreRT into mainstream .NET Core.
I do not think there is a clear pattern for how these names are used in CoreCLR. They are used interchangeably for the most part. "stub" is the more popular name.
@jkotas thanks (as always) for the useful info, I'd not quite fully appreciated the difference between the 'static' and 'dynamic' assembly code, makes sense now!
Going back to your original message. If that was to happen, would this mean that all the
I did not mean to suggest that we hard-code knowledge of shuffle thunks into the JIT. I meant that it would be nice to switch these to the
@jkotas, makes sense, thanks again for the info
…6169)
* Fix GenerateShuffleArray to support cyclic shuffles
GenerateShuffleArray did not handle the case where the register / stack slot shuffle contains a cycle, which resulted in an infinite loop in this function. This issue is specific to the Unix Amd64 ABI. To fix it, this change reworks the algorithm completely. Besides fixing the issue, the new algorithm also performs better in some cases. Shuffling a cycle requires an extra helper register. However, no general purpose register was available, so I had to use xmm8 for this purpose.
* Remove special handling of the hang from ABI stress
Commit migrated from dotnet/coreclr@94b27e2
GenerateShuffleArray did not handle the case where the register / stack
slot shuffle contains a cycle, which resulted in an infinite loop in
this function. This issue is specific to the Unix Amd64 ABI.
To fix it, this change reworks the algorithm completely. Besides
fixing the issue, the new algorithm also performs better in some cases.
Shuffling a cycle requires an extra helper register. However, no
general purpose register was available, so I had to use xmm8 for this
purpose and implement code emission for the movq instruction.
Close #26054