
Add support in crossgen2 for 32-byte alignment #32602

Merged: 1 commit into dotnet:master on Feb 26, 2020

Conversation

MichalStrehovsky (Member)

What's in this pull request won't take effect in crossgen2 right now because crossgen2 doesn't set the Tier-1 flag. This pull request is more of a discussion point that will either see another commit, or will be closed.

32-byte alignment was added in #2249, affecting Tier-1 code only. One of the other currently open pull requests discusses compCodeOpt, and that made me realize this might be worth revisiting: I wanted to switch the check in the JIT from emitComp->opts.jitFlags->IsSet(JitFlags::JIT_FLAG_TIER1) to compCodeOpt() == FAST_CODE (so that this takes effect for crossgen2 as well), only to find out that compCodeOpt is currently hardcoded to return BLENDED.

Do we have an issue tracking what we want to do with compCodeOpt? There are customers who rely solely on R2R and giving them a knob that says "give me fastest code possible" might be useful for them. Crossgen2 has that knob, it just seems like it doesn't do anything.

Alternatively, we could turn on the Tier-1 flag in crossgen2 when the user requests optimizing for time.

Cc @dotnet/crossgen-contrib @AndyAyersMS

@MichalStrehovsky (Member, Author)

Cc @mjsabby since the steady state throughput achievable with R2R is in his area of interest.

@mjsabby (Contributor) commented Feb 21, 2020

Thanks @MichalStrehovsky. Sounds good.

What is the reason for not doing Tier1 code in Crossgen2 by default?

@MichalStrehovsky (Member, Author)

What is the reason for not doing Tier1 code in Crossgen2 by default?

From looking at how the Tier-1 flag is used within RyuJIT, it seems that up until the 32-byte alignment pull request, setting the flag made no difference to the generated code (it does try to enable FAST_CODE, but since compCodeOpt() is hardcoded to return BLENDED, that may not amount to anything meaningful).

Crossgen2 could set it (I'm not sure it would be good to set by default - the default still tries to balance size and speed), but it's more a question of how JIT_FLAG_TIER1 would differ from JIT_FLAG_SPEED_OPT when crossgenning: specifying TIER1 in crossgen feels like a bit of a lie, but on the other hand SPEED_OPT doesn't do anything in RyuJIT.

@jkotas (Member) commented Feb 21, 2020

it feels like specifying TIER1 in crossgen is a bit of a lie

Right, overloading JIT/EE interface flags like this always ends up poorly. The TIER1 flag should be set only when we are running as a JIT and producing Tier-1 code.

on the other hand SPEED_OPT doesn't do anything in RyuJIT

That's something we may want to fix.

@AndyAyersMS (Member)

This hits on some broader issues:

  • We don't have an overall strategy for what the jit should do differently for FAST_CODE (SPEED_OPT) versus BLENDED_CODE. One can presumably up some limits and inline more aggressively, but without care, all this will do is create bigger methods, slower compiles, and a bigger test matrix.
  • The methods in the jit that control optimization need some revision; I've started in on this a few times because I was looking at enabling some opts at TIER0, but I've never ended up with something I was happy with.
  • 32-byte alignment doesn't guarantee faster code -- it is intended to provide more predictable performance. More work is needed in the jit to leverage this for faster code.

To express crossgen2's "fastest code possible" option, SPEED_OPT conveys the right intent. I would keep TIER1 as a jitting-only flag.

@MichalStrehovsky (Member, Author)

it is intended to provide more predictable performance

We might still be interested in that when we run perf tests comparing R2R code with jitted code - so this pull request could still be useful on its own (we can just whack the Tier-1 flag in for testing, but we need to pipe through the resulting alignments).

We don't have an overall strategy for what the jit should do differently for FAST_CODE (SPEED_OPT) versus BLENDED_CODE

Do you know if there's an issue tracking that? I can make one if there isn't.

@AndyAyersMS (Member)

Do you know if there's an issue tracking that? I can make one if there isn't.

I think we can use #7751, it has most of the previous discussion on this topic.

@davidwrighton (Member) left a comment


So, while this has no effect right now because nothing triggers the jit to produce the higher alignments, I believe we should attempt to respect the jit's request for alignment as much as possible (in case something comes along later, like copying Vector256 constants into aligned memory for improved vector performance).

I believe the current opt-for-size/opt-for-speed arguments to crossgen2 actually flow into the CorJitFlag.CORJIT_FLAG_SIZE_OPT flag, which controls swapping between MinimumFunctionAlignment and OptimumFunctionAlignment. We should probably have a separate conversation about the idea of optimize for size/speed/general purpose, as I think this is a relatively small portion of that discussion.

@MichalStrehovsky MichalStrehovsky merged commit 7855aa3 into dotnet:master Feb 26, 2020
@MichalStrehovsky MichalStrehovsky deleted the align32 branch February 26, 2020 09:17
@MichalStrehovsky (Member, Author)

Thanks David! I think this code will also be useful when we want to do perf measurements comparing with JITted code. Additional stability would be useful.

@ghost locked the conversation as resolved and limited it to collaborators on Dec 10, 2020
5 participants