JIT: profile checking through loop opts #99367

AndyAyersMS · 2024-03-06T17:48:32Z

Keep profile checks enabled until after we have finished running the loop optimizations (recall this is currently just checking for edge likelihood consistency).

Fix various maintenance issues to make this possible. Most are straightforward, but a few are not:

* Whenever we create a new BBJ_COND we have to figure out the right likelihoods. If we're copying an existing one (say, in loop inversion) we currently duplicate the likelihoods. This is a choice, and it may not accurately represent what happens, but we have no better information.
* If we invent new branching structures we need to put in reasonable likelihoods. For cloning we assume flowing to the cold loop is unlikely but can happen.
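To make the two cases concrete, here is a minimal sketch of the idea, not the JIT's actual code. `Edge`, `CondBlock`, `HasConsistentLikelihoods`, `DuplicateCond`, and `MakeCloneGuard` are hypothetical names invented for illustration; only `weight_t` and the notion of per-edge likelihoods come from the PR. It assumes "edge likelihood consistency" means that the outgoing likelihoods of a block sum to 1.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified stand-ins for the JIT's flow graph types.
using weight_t = double;

struct Edge
{
    int      target;     // index of the successor block
    weight_t likelihood; // probability that control leaves along this edge
};

struct CondBlock
{
    Edge trueEdge;
    Edge falseEdge;
};

// Assumed meaning of the check: a conditional block's outgoing
// likelihoods should sum to 1 (within a small tolerance).
bool HasConsistentLikelihoods(const CondBlock& block, weight_t tolerance = 1e-9)
{
    const weight_t sum = block.trueEdge.likelihood + block.falseEdge.likelihood;
    return std::fabs(sum - 1.0) <= tolerance;
}

// Case 1: copying an existing conditional (say, loop inversion).
// The copy gets new targets but reuses the original likelihoods.
CondBlock DuplicateCond(const CondBlock& original, int newTrueTarget, int newFalseTarget)
{
    return CondBlock{{newTrueTarget, original.trueEdge.likelihood},
                     {newFalseTarget, original.falseEdge.likelihood}};
}

// Case 2: inventing a new conditional (say, the guard created by cloning).
// The cold loop is treated as unlikely but still possible.
CondBlock MakeCloneGuard(int fastLoop, int coldLoop, weight_t fastLikelihood)
{
    assert(fastLikelihood >= 0.0 && fastLikelihood <= 1.0);
    return CondBlock{{fastLoop, fastLikelihood}, {coldLoop, 1.0 - fastLikelihood}};
}
```

In both cases the new block is consistent by construction: a copy inherits a pair that already sums to 1, and an invented guard derives the cold likelihood from the fast one.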

Block weights and edge likelihoods are not yet consistent. The plan is to get all the edge likelihoods "correct" and self-consistent, and then start rectifying edge likelihoods and block weights.
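The PR does not spell out what reconciling block weights with edge likelihoods will involve. One plausible reading, sketched below under that assumption, is flow conservation: a block's weight should equal the sum, over its incoming edges, of the predecessor's weight times the edge likelihood. `InEdge`, `ImpliedWeight`, and `WeightMatchesEdges` are hypothetical names, and the sketch ignores special cases such as the method entry block.

```cpp
#include <cmath>
#include <vector>

using weight_t = double;

// Hypothetical incoming edge: the predecessor's block weight and the
// likelihood that the predecessor transfers control along this edge.
struct InEdge
{
    weight_t sourceWeight;
    weight_t likelihood;
};

// Weight that the incoming edges imply for a block.
weight_t ImpliedWeight(const std::vector<InEdge>& preds)
{
    weight_t sum = 0.0;
    for (const InEdge& e : preds)
    {
        sum += e.sourceWeight * e.likelihood;
    }
    return sum;
}

// True if the recorded block weight agrees with the implied weight,
// using a relative tolerance (absolute for weights below 1).
bool WeightMatchesEdges(weight_t blockWeight, const std::vector<InEdge>& preds,
                        weight_t relTolerance = 1e-6)
{
    const weight_t implied = ImpliedWeight(preds);
    const weight_t scale   = std::fmax(std::fabs(blockWeight), std::fabs(implied));
    return std::fabs(blockWeight - implied) <= relTolerance * std::fmax(scale, 1.0);
}
```

For example, a block reached from a predecessor of weight 100 with likelihood 0.25 and another of weight 40 with likelihood 0.5 would be expected to have weight 45 under this reading.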

Contributes to #93020.

Diffs


ghost · 2024-03-06T17:48:41Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	AndyAyersMS
Assignees:	AndyAyersMS
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2024-03-06T17:49:21Z

@amanasifkhalid PTAL
cc @dotnet/jit-contrib

It's possible getting past loop opts was the hard part. We'll see.

jakobbotsch · 2024-03-06T17:52:16Z

Would be good to run some extended testing.

AndyAyersMS · 2024-03-06T18:26:41Z

> Would be good to run some extended testing.

Also seeing some diffs, which is a bit unexpected; didn't think anything relied on likelihoods yet

amanasifkhalid · 2024-03-06T18:36:00Z

src/coreclr/jit/loopcloning.cpp

+    // TODO: this is a bit out of sync with what we do for block weights.
+    // Reconcile.
+    //
+    const weight_t fastLikelihood = 0.999;


Out of curiosity, where does this number of significant figures come from?

It's just an arbitrary factor. For most cloned loops we actually never expect the cold loop to run, since we think its execution may cause exceptions. But setting the likelihood to zero seemed too drastic.

optCloneLoop scales the fast loop blocks by 0.99. There is a comment about it:

runtime/src/coreclr/jit/loopcloning.cpp

Lines 1944 to 1949 in 5f52977

    // We assume that the fast path will run 99% of the time, and thus should get 99% of the block weights.
    // The slow path will, correspondingly, get only 1% of the block weights. It could be argued that we should
    // mark the slow path as "run rarely", since it really shouldn't execute (given the currently optimized loop
    // conditions) except under exceptional circumstances.
    const weight_t fastPathWeightScaleFactor = 0.99;
    const weight_t slowPathWeightScaleFactor = 1.0 - fastPathWeightScaleFactor;

It seems like we should use the same likelihood in both places.

Ah, good point. I will tweak this in the next round of changes.
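A rough sketch of what "the same likelihood in both places" could look like, assuming one fraction feeds both the guard's edge likelihoods and the block weight scaling. `CloneProfile` and `SplitLoopProfile` are hypothetical names, not functions in loopcloning.cpp, and the thread does not say whether 0.99 or 0.999 will be the shared value.

```cpp
#include <cassert>

using weight_t = double;

// Hypothetical bundle of the profile data produced when a loop is cloned.
struct CloneProfile
{
    weight_t fastEdgeLikelihood; // guard -> fast loop
    weight_t slowEdgeLikelihood; // guard -> slow (cold) loop
    weight_t fastLoopWeight;     // scaled weight for fast loop blocks
    weight_t slowLoopWeight;     // scaled weight for slow loop blocks
};

// Derive edge likelihoods and block weight scaling from one fraction,
// so the two cannot drift apart (e.g. 0.999 in one place, 0.99 in another).
CloneProfile SplitLoopProfile(weight_t originalLoopWeight, weight_t fastPathFraction)
{
    assert(fastPathFraction >= 0.0 && fastPathFraction <= 1.0);
    const weight_t slowPathFraction = 1.0 - fastPathFraction;
    return {fastPathFraction, slowPathFraction,
            originalLoopWeight * fastPathFraction,
            originalLoopWeight * slowPathFraction};
}
```

With a single source of truth, the weight assigned to the fast loop is exactly the guard's weight times the likelihood of the edge into it, which is the relationship the TODO in the diff asks to reconcile.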

amanasifkhalid

LGTM, thanks!

AndyAyersMS · 2024-03-06T21:08:06Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

azure-pipelines · 2024-03-06T21:08:24Z

Azure Pipelines successfully started running 2 pipeline(s).

AndyAyersMS · 2024-03-06T23:18:41Z

> > Would be good to run some extended testing.
>
> Also seeing some diffs, which is a bit unexpected; didn't think anything relied on likelihoods yet

CI diffs are a lot more modest than my local diffs. I will dig into a few and see what's up.

AndyAyersMS · 2024-03-06T23:49:20Z

          55 (1.11 % of base) : 110047.dasm - OrchardCore.ContentManagement.Display.ContentDisplay.ContentPartDisplayDriverResolver:GetDisplayModeDrivers(System.String,System.String):System.Collections.Generic.IList`1[OrchardCore.ContentManagement.Display.ContentDisplay.IContentPartDisplayDriver]:this (Tier1)
        -165 (-12.87 % of base) : 71288.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.ModelMetadataIdentity,Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadataProvider+ModelMetadataCacheEntry]:TryGetValueInternal(System.Collections.Concurrent.ConcurrentDictionary`2+Tables[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.ModelMetadataIdentity,Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadataProvider+ModelMetadataCacheEntry],Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.ModelMetadataIdentity,int,byref):ubyte (Tier1)

We mark blocks rare that were not rare before, and this modifies layout and alters allocation.

ghost assigned AndyAyersMS Mar 6, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 6, 2024

AndyAyersMS requested a review from amanasifkhalid March 6, 2024 17:48

amanasifkhalid reviewed Mar 6, 2024


amanasifkhalid approved these changes Mar 6, 2024


build-analysis bot mentioned this pull request Mar 6, 2024

NuGet failing with Response status code does not indicate success: 503 (Service Unavailable) dotnet/arcade#11723

Open

5 tasks

AndyAyersMS merged commit 1c05c06 into dotnet:main Mar 7, 2024
174 checks passed

DrewScoggins mentioned this pull request Mar 12, 2024

[Perf] Windows/x64: 2 Regressions on 3/7/2024 9:14:14 PM #99616

Open

LoopedBard3 mentioned this pull request Mar 14, 2024

[Perf] Linux/arm64: 1 Regression on 3/7/2024 6:42:59 PM dotnet/perf-autofiling-issues#31065

Closed

AndyAyersMS mentioned this pull request Mar 15, 2024

JIT: Flow Graph Modernization and Improved Block Layout #93020

Open

51 tasks

github-actions bot locked and limited conversation to collaborators Apr 7, 2024
