JIT optimization: loop head alignment #8108

Closed
JosephTremoulet opened this issue May 15, 2017 · 7 comments
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
enhancement (Product code improvement that does NOT require public API changes/additions)
optimization
tenet-performance (Performance related issue)

@JosephTremoulet
Contributor

(I'm creating tracking issues for some optimizations that RyuJit doesn't perform, so we'll have a place to reference/note when we see the lack of them affecting particular benchmarks)

This one actually has an implementation plumbed through, but it's triggered only by the unsupported COMPlus_JitAlignLoops flag. It should probably be enabled by default in at least some configurations/methods.

category:cq
theme:loop-opt
skill-level:expert
cost:medium

@JosephTremoulet
Contributor Author

This came up in #4400

@AndyAyersMS
Member

Eugene has some evidence that 32-byte alignment matters for hot loops on recent CPUs. It is more crucial when loop bodies are small.

We'd also, as a prerequisite, need to align method entries. Right now for x64 we always use 8 byte alignment (while somewhat oddly, on x86, we do things differently).
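
As a rough illustration of why method entry alignment is a prerequisite (a sketch that is not from the JIT sources; the class name, the addresses, and the loop offset are all hypothetical): if the JIT pads a loop head to a 32-byte multiple relative to the method start, the absolute address is only 32-byte aligned when the method entry itself is.

```csharp
using System;

static class AlignmentSketch
{
    // Bytes of padding needed to move `address` up to the next multiple of
    // `alignment`. Assumes `alignment` is a power of two.
    public static ulong PaddingFor(ulong address, ulong alignment)
        => (alignment - (address & (alignment - 1))) & (alignment - 1);

    public static void Main()
    {
        // Hypothetical: the loop head sits at a 32-byte multiple within the method body.
        const ulong loopOffsetInMethod = 0x40;

        // Hypothetical method entry addresses, each only guaranteed 8-byte aligned.
        foreach (ulong methodStart in new ulong[] { 0x1000, 0x1008, 0x1010, 0x1018 })
        {
            ulong loopHead = methodStart + loopOffsetInMethod;
            Console.WriteLine(
                $"method 0x{methodStart:X}: loop head 0x{loopHead:X}, " +
                $"off by {loopHead % 32} bytes, needs {PaddingFor(loopHead, 32)} bytes of padding");
        }
    }
}
```

With 8-byte method alignment, only one in four possible entry addresses leaves the loop head on a 32-byte boundary, so padding computed relative to the method start cannot guarantee absolute alignment.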

R2R images don't honor alignment requests (fragile NIs do though).

I would like to collect some distribution data on native code sizes so we could guess at likely costs for different method alignment algorithms. Probably need SPMI working in Core or could try it over in desktop. Benefit would be harder to assess.

So, lots to sort through here.

@adamsitnik
Member

> it's triggered only by the unsupported COMPlus_JitAlignLoops

What do you mean by unsupported? Even if I set the COMPlus_JitAlignLoops env var, is it not going to work?

@AndyAyersMS
Member

IIRC setting the env var will do something, but is not guaranteed to do what you'd expect.

@adamsitnik
Member

@AndyAyersMS thanks!

btw if you would like to test the performance of CoreCLR with some flag enabled/disabled, then you can use the following config to run the default settings vs. the flag configured:

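using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Environments;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
static void Main(string[] args)
    => BenchmarkRunner.Run<Benchmarks>(
        DefaultConfig.Instance
            .With(DisassemblyDiagnoser.Create(new DisassemblyDiagnoserConfig(printAsm: true, printPrologAndEpilog: true)))
            // Baseline job: default JIT settings.
            .With(Job.Default.With(Runtime.Core).AsBaseline())
            // Same job, but with COMPlus_JitAlignLoops=1 set for the benchmark process.
            .With(Job.Default.With(Runtime.Core).With(new[] { new EnvironmentVariable("COMPlus_JitAlignLoops", "1") }))
        );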

@voinokin
voinokin commented Aug 3, 2018

I wonder whether it will be possible some time in the future to allow loop head alignment on a per-method basis, e.g. with [MethodImpl(MethodImplOptions.AlignLoopHeads)]
In my hi-perf sorting app there are lots of different kinds of loops and I'd like to fine-tune some methods to squeeze as much additional perf as possible out of it without affecting some others.
According to Intel's and Agner Fog's manuals, loop alignment affects behavior related to branch prediction and the uop cache.

@kunalspathak
Member

This has been enabled for x86/x64 by #44370.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2021