Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/6.0] JIT: fix scalability issue in redundant branch optimizer #68972

Merged
merged 1 commit into from
May 14, 2022

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented May 6, 2022

Backport of #66259 to release/6.0

/cc @AndyAyersMS

Customer Impact

We have now gotten two reports of poor experiences with updating from 5.0 to 6.0 (#68953 and #66076) because a particular IL pattern can cause the JIT to spend an inordinately long time trying to optimize a method.

So far, the problematic patterns seem to come from dynamically generated IL. The problem can be worked around by disabling JIT optimization, but this may not be easy/practical in many cases.

In the case from #68953, there was a method with 38821 bytes of IL, 3782 basic blocks, and 816 recurrences of the IL pattern:

ldarg.1     
castclass    0x200002A (token can vary)

Because we're optimizing this method, this IL sequence gets expanded inline, and turns into a null check on arg1 followed by a check of the method table and eventually a helper call. The redundant branch optimizer then searches through 28196 pairs of these null checks looking to see if the leading member of the pair makes the trailing member of the pair redundant. It fails in every case.

With this fix, the optimizer only considers 3322 such pairs.

Testing

Verified the fix on the customer repro cases. Saw a handful of methods with slightly different codegen via SPMI. From the PR:

Local SPMI run sees fairly minimal diffs (15 methods, maybe 4 different "non-test" methods).

SPMI typically runs through ~2M method instances.

Risk

The core algorithm has potentially quadratic behavior. The fix limits the scope of the optimization to a limited number of occurrences so that the quadratic factor doesn't ever grow too large.

Risk is low. We are foregoing a small number of optimizations.

In methods with long skinny dominator trees and lots of redundant branches
the jit can spend too much time trying to optimize the branches.

Place a limit on the number of redundant branches with matching VNs that
the jit will consider for a given branch.

Fixes #66067.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2022
@ghost
Copy link

ghost commented May 6, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #66259 to release/6.0

/cc @AndyAyersMS

Customer Impact

Testing

Risk

IMPORTANT: If this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS AndyAyersMS added this to the 6.0.x milestone May 6, 2022
@AndyAyersMS
Copy link
Member

@dotnet/jit-contrib need an approver
cc @jeffschwMSFT

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Can you add a little more detail on why the problem occurs and perhaps include a small IL snippet (if available) on what causes the issue? We will take for consideration in 6.0.x.
cc @JulieLeeMSFT

@AndyAyersMS
Copy link
Member

Approved. Can you add a little more detail on why the problem occurs and perhaps include a small IL snippet (if available) on what causes the issue? We will take for consideration in 6.0.x

Added.

@AndyAyersMS
Copy link
Member

Arm64 failures perhaps due to reduced capacity. Going to retry.

FWIW Build analysis was unhelpful here, the pipeline is finished.

image

@AndyAyersMS
Copy link
Member

Another ARM64 failure, this time the console log was simply

Console log: 'JIT.Generics' from job 64b4976f-f8ab-4890-ae52-09e9c32512eb workitem 6a94bb48-1526-422b-ba13-01e34de43bee (windows.10.arm64v8.open) executed on machine DDARM64-147

and the "run" took less than one second:

2022-05-06T21:56:34.536Z	INFO   	executor(833)	_execute_command	running %HELIX_CORRELATION_PAYLOAD%\scripts\c35cc5595c714bf58975ac2fdc929d7a\execute.cmd in D:\h\w\AB4A0952\w\A5AE0961\e max 1800 seconds
2022-05-06T21:56:35.439Z	INFO   	job(38)	wait	Work item's process exited normally.

@jakobbotsch
Copy link
Member

Another ARM64 failure, this time the console log was simply

That one is dotnet/arcade#9284

@carlossanlop
Copy link
Member

@jakobbotsch @AndyAyersMS Please add the servicing consider label when ready, so it goes through Tactics. Code complete due date for servicing is May 16th.

@JulieLeeMSFT JulieLeeMSFT added the Servicing-consider Issue for next servicing release review label May 11, 2022
@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels May 12, 2022
@rbhanda rbhanda modified the milestones: 6.0.x, 6.0.6 May 12, 2022
@carlossanlop carlossanlop merged commit 3743f5e into release/6.0 May 14, 2022
@carlossanlop carlossanlop deleted the backport/pr-66259-to-release/6.0 branch May 14, 2022 00:36
@ghost ghost locked as resolved and limited conversation to collaborators Jun 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants