Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GenerateShuffleArray to support cyclic shuffles #26169

Merged
merged 2 commits into from Aug 15, 2019

Conversation

@janvorli
Copy link
Member

commented Aug 14, 2019

The GenerateShuffleArray was not handling case when there was a cycle in
the register / stack slots shuffle and it resulted in an infinite loop
in this function. This issue is Unix Amd64 ABI specific.
To fix that, this change reworks the algorithm completely. Besides
fixing the issue, it has also better performance in some cases.
To fix the cyclic shuffling, I needed an extra helper register. However,
there was no available general purpose register available, so I had to
use xmm8 for this purpose and implement code emission for the movq
instruction.

Close #26054

@janvorli janvorli added the area-VM label Aug 14, 2019
@janvorli janvorli added this to the 5.0 milestone Aug 14, 2019
@janvorli janvorli requested review from jkotas and jakobbotsch Aug 14, 2019
@janvorli janvorli self-assigned this Aug 14, 2019
Copy link
Member

left a comment

LGTM.

As part of this change you can also get rid of the following hashset in the ABI stress test:

// https://github.com/dotnet/coreclr/issues/26054
private static readonly HashSet<int> s_pinvokeInfiniteLoops = new HashSet<int>
{
56, 113, 173, 611, 734, 863, 897, 960
};

janvorli added 2 commits Aug 13, 2019
The GenerateShuffleArray was not handling case when there was a cycle in
the register / stack slots shuffle and it resulted in an infinite loop
in this function. This issue is Unix Amd64 ABI specific.
To fix that, this change reworks the algorithm completely. Besides
fixing the issue, it has also better performance in some cases.
To fix the cyclic shuffling, I needed an extra helper register. However,
there was no available general purpose register available, so I had to
use xmm8 for this purpose.
@janvorli janvorli force-pushed the janvorli:fix-generate-shuffle-array branch from 7938021 to e689c67 Aug 14, 2019
@janvorli

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

@jakobbotsch I've updated the PR to update the ABI stress as you've asked.

@jkotas

This comment has been minimized.

Copy link
Member

commented Aug 15, 2019

I wish we just use JIT to generate code for these. This is getting so complex that there are guaranteed to be bugs.

@jkotas
jkotas approved these changes Aug 15, 2019
@janvorli janvorli merged commit 94b27e2 into dotnet:master Aug 15, 2019
38 checks passed
38 checks passed
WIP Ready for review
Details
coreclr-ci Build #20190814.18 had test failures
Details
coreclr-ci (Build Linux arm checked) Build Linux arm checked succeeded
Details
coreclr-ci (Build Linux arm64 checked) Build Linux arm64 checked succeeded
Details
coreclr-ci (Build Linux arm64 release) Build Linux arm64 release succeeded
Details
coreclr-ci (Build Linux_musl x64 checked) Build Linux_musl x64 checked succeeded
Details
coreclr-ci (Build Linux_musl x64 release) Build Linux_musl x64 release succeeded
Details
coreclr-ci (Build Linux_rhel6 x64 release) Build Linux_rhel6 x64 release succeeded
Details
coreclr-ci (Build Windows_NT arm Checked) Build Windows_NT arm Checked succeeded
Details
coreclr-ci (Build Windows_NT arm64 Checked) Build Windows_NT arm64 Checked succeeded
Details
coreclr-ci (Build Windows_NT x64 Checked) Build Windows_NT x64 Checked succeeded
Details
coreclr-ci (Build Windows_NT x64 Debug) Build Windows_NT x64 Debug succeeded
Details
coreclr-ci (Build Windows_NT x86 Checked) Build Windows_NT x86 Checked succeeded
Details
coreclr-ci (Build Windows_NT x86 Debug) Build Windows_NT x86 Debug succeeded
Details
coreclr-ci (Formatting Linux x64) Formatting Linux x64 succeeded
Details
coreclr-ci (Test Pri0 CoreFX Linux x64 checked) Test Pri0 CoreFX Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 CoreFX Windows_NT x64 checked) Test Pri0 CoreFX Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux x64 checked) Test Pri0 Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux_musl x64 release) Test Pri0 Linux_musl x64 release succeeded
Details
coreclr-ci (Test Pri0 OSX x64 checked) Test Pri0 OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Linux x64 checked) Test Pri0 R2R Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R OSX x64 checked) Test Pri0 R2R OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Windows_NT x64 checked) Test Pri0 R2R Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 R2R Windows_NT x86 checked) Test Pri0 R2R Windows_NT x86 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT arm checked) Test Pri0 Windows_NT arm checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT arm64 checked) Test Pri0 Windows_NT arm64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x64 checked) Test Pri0 Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x86 checked) Test Pri0 Windows_NT x86 checked succeeded
Details
coreclr-ci (Test crossgen-comparison Linux arm checked) Test crossgen-comparison Linux arm checked succeeded
Details
coreclr-ci (Test pri0 Linux arm checked) Test pri0 Linux arm checked succeeded
Details
coreclr-ci (Test pri0 Linux arm64 checked) Test pri0 Linux arm64 checked succeeded
Details
coreclr-ci (Test pri0 Linux_musl x64 checked) Test pri0 Linux_musl x64 checked succeeded
Details
coreclr-ci (build Linux x64 Checked) build Linux x64 Checked succeeded
Details
coreclr-ci (build OSX x64 Checked) build OSX x64 Checked succeeded
Details
coreclr-ci (build Windows_NT arm Release) build Windows_NT arm Release succeeded
Details
coreclr-ci (build Windows_NT arm64 Release) build Windows_NT arm64 Release succeeded
Details
coreclr-ci (build Windows_NT x64 Release) build Windows_NT x64 Release succeeded
Details
license/cla All CLA requirements met.
Details
@mattwarren

This comment has been minimized.

Copy link
Collaborator

commented Aug 15, 2019

I wish we just use JIT to generate code for these. This is getting so complex that there are guaranteed to be bugs.

For a future blog post, I've been digging into all the places that the CoreCLR uses stubs, turns out there's quite a few!!

I did wonder why so many stubs/thunks are hand-written assembly, is there a reason? My best guess what that the JIT is tailored towards converting IL to assembly, so the type of code that's needed in stubs isn't what the JIT is suited for or designed to do, is that right? Or is it historical reasons, i.e. the stubs were being developed at the same time as the JIT was, so it made sense to have them hand-written rather that waiting for the JIT to be able to create them?

(I know that there has been an initiative to move stubs to IL ('FEATURE_STUBS_AS_IL'), but I assume that there are several types of stubs that do things that can't be expressed in IL (moving/writing regsiters), so they would always have to be do in raw assembly or emitted by this JIT.)

BTW, slightly related question, is there a technical difference between 'stubs' and 'thunks'? It seems that thunks (i.e. 'shuffle thunks') are longer, more complex, whilst stubs are simpler, is that difference?

@jkotas

This comment has been minimized.

Copy link
Member

commented Aug 15, 2019

why so many stubs/thunks are hand-written assembly

Do you mean static ones (e.g. the ones in *.S files) or dynamic ones (e.g. the shuffle thunks)?

Static: It is always possible to add JIT intrinsic that produces a specific machine instruction sequence. JIT intrinsic is much more expensive to implement than writing a few lines in .S file. If everything else (e.g. performance) is equal, static assembly helpers are strongly preferred over teaching JIT new tricks.

Dynamic: Most of these exist for historic reasons. .NET Framework 1.0 run on x86 only. Hand-emitting x86 instruction is easy and straightforward, so that is what was done originally. As you have mentioned, IL and JIT being in flux were likely contributing factors too. When porting to a new platform, it is typically easier and cheaper to port the hand-emitted assembly code to the new platform. I had to spend a lot of energy on convincing folks to go an extra mile and implement stubs-as-il variant for the stubs with highest maintenance costs to make the ports cheaper and less buggy.

Note that CoreRT has implementation for all dynamically generated stubs via stubs-as-il (including the shuffle thunk that this issue is about). The techniques required to make stubs-as-il possible everywhere were developed in Redhawk and ProjectN. You can think about stubs-as-il as porting of the goodness from CoreRT into mainstream .NET Core.

difference between 'stubs' and 'thunks'

I do not think there is a clear pattern for how these names are used in CoreCLR. They are used interchangeably for the most part. "stub" is the more popular name.

BTW: Wikipedia has a description for stub and thunk.

@janvorli janvorli deleted the janvorli:fix-generate-shuffle-array branch Aug 15, 2019
@mattwarren

This comment has been minimized.

Copy link
Collaborator

commented Aug 16, 2019

@jkotas thanks (as always) for the useful info, I'd not quite fully appreciated the difference between the 'static' and 'dynamic' assembly code, makes sense now!

I wish we just use JIT to generate code for these. This is getting so complex that there are guaranteed to be bugs.

Going back to your original message. If that was to happen, would this mean that all the StubLinkerCPU::XXX(..) code would be replaced by code in the JIT that does it instead? Am I right in thinking that at the moment none of the 'Dynamic' assembly goes via the JIT? (except for STUBS_AS_IL which is compiled by it)

@jkotas

This comment has been minimized.

Copy link
Member

commented Aug 17, 2019

It did not mean to suggest that we hard-code knowledge of shuffle thunks into the JIT. I meant that it would be nice to switch these to the STUB_AS_IL plan so that the JIT deals with the register allocation and calling convention corner cases.

@mattwarren

This comment has been minimized.

Copy link
Collaborator

commented Aug 20, 2019

@jkotas, makes sense, thanks again for the info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.