
Add support in crossgen2 for 32-byte alignment #32602

Merged: 1 commit into dotnet:master on Feb 26, 2020

Conversation

MichalStrehovsky (Member)

What's in this pull request won't take effect in crossgen2 right now because crossgen2 doesn't set the Tier-1 flag. This pull request is more of a discussion point that will either see another commit, or will be closed.

32-byte alignment was added in #2249, affecting Tier-1 code only. One of the other currently open pull requests discusses compCodeOpt, and that made me realize this might be worth revisiting: I wanted to switch the check in the JIT from emitComp->opts.jitFlags->IsSet(JitFlags::JIT_FLAG_TIER1) to compCodeOpt() == FAST_CODE (so that this takes effect for crossgen2 as well), only to find out that compCodeOpt is currently hardcoded to return BLENDED.

Do we have an issue tracking what we want to do with compCodeOpt? There are customers who rely solely on R2R and giving them a knob that says "give me fastest code possible" might be useful for them. Crossgen2 has that knob, it just seems like it doesn't do anything.

Alternatively, we could turn on the Tier-1 flag in crossgen2 when the user requests optimizing for time.

Cc @dotnet/crossgen-contrib @AndyAyersMS

@MichalStrehovsky (Member, Author)

Cc @mjsabby since the steady state throughput achievable with R2R is in his area of interest.

@mjsabby (Contributor) commented Feb 21, 2020

Thanks @MichalStrehovsky. Sounds good.

What is the reason for not doing Tier1 code in Crossgen2 by default?

@MichalStrehovsky (Member, Author)

What is the reason for not doing Tier1 code in Crossgen2 by default?

From looking at how the Tier-1 flag is used within RyuJIT, it seems that up until the 32-byte alignment pull request, setting the flag made no difference to the generated code (it does try to enable FAST_CODE, but since compCodeOpt() is hardcoded to return BLENDED, that may not amount to anything meaningful).

Crossgen2 could set it (I'm not sure it would be good to set by default - the default still tries to balance size and speed), but it's more a question of how JIT_FLAG_TIER1 would differ from JIT_FLAG_SPEED_OPT when crossgenning: specifying TIER1 in crossgen feels like a bit of a lie, but on the other hand SPEED_OPT doesn't do anything in RyuJIT.

@jkotas (Member) commented Feb 21, 2020

it feels like specifying TIER1 in crossgen is a bit of a lie

Right, overloading JIT/EE interface flags like this always ends up poorly. The TIER1 flag should be set only when we are running as a JIT and producing Tier-1 code.

on the other hand SPEED_OPT doesn't do anything in RyuJIT

That's something we may want to fix.

@AndyAyersMS (Member)

This hits on some broader issues:

  • We don't have an overall strategy for what the jit should do differently for FAST_CODE (SPEED_OPT) versus BLENDED_CODE. One can presumably up some limits and inline more aggressively, but without care, all this will do is create bigger methods, slower compiles, and a bigger test matrix.
  • The methods in the jit that control optimization need some revision; I've started in on this a few times because I was looking at enabling some opts at TIER0, but I've never ended up with something I was happy with.
  • 32-byte alignment doesn't guarantee faster code -- it is intended to provide more predictable performance. More work is needed in the jit to leverage this for faster code.

To express crossgen2's "fastest code possible" option, SPEED_OPT conveys the right intent. I would keep TIER1 as a jitting-only flag.

@MichalStrehovsky (Member, Author)

it is intended to provide more predictable performance

We might still be interested in that when we run perf tests comparing R2R code with jitted code - so this pull request could still be useful on its own (we can just whack the Tier-1 flag in for testing, but we need to pipe through the resulting alignments).

We don't have an overall strategy for what the jit should do differently for FAST_CODE (SPEED_OPT) versus BLENDED_CODE

Do you know if there's an issue tracking that? I can make one if there isn't.

@AndyAyersMS (Member)

Do you know if there's an issue tracking that? I can make one if there isn't.

I think we can use #7751, it has most of the previous discussion on this topic.

@davidwrighton (Member) left a comment


So, while this has no effect right now because nothing triggers the jit to produce the higher alignments, I believe we should attempt to respect the jit's request for alignment as much as possible (in case something comes along later, like copying Vector256 constants into aligned memory for improved vector performance).

I believe the current opt-for-size/opt-for-speed arguments to crossgen2 actually flow into the CorJitFlag.CORJIT_FLAG_SIZE_OPT flag, which controls swapping between MinimumFunctionAlignment and OptimumFunctionAlignment. We should probably have a separate conversation about the idea of optimize for size/speed/general purpose, as I think this is a relatively small portion of that discussion.

@MichalStrehovsky MichalStrehovsky merged commit 7855aa3 into dotnet:master Feb 26, 2020
@MichalStrehovsky MichalStrehovsky deleted the align32 branch February 26, 2020 09:17
@MichalStrehovsky (Member, Author)

Thanks David! I think this code will also be useful when we want to do perf measurements comparing with JITted code. Additional stability would be useful.

@ghost locked the conversation as resolved and limited it to collaborators on Dec 10, 2020
5 participants