Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct improvement, part1: create more LCL_FLD #48377

Merged
merged 3 commits into from
Mar 19, 2021

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Feb 17, 2021

When we copy a promoted struct to an unpromoted struct use LCL_FLD unpromoted fldOffset instead of IND(ADDR(LCL_VAR unpromoted, fldOffset), it prevents creating additional instructions, including taking address if `unpromoted struct'.
So it deletes marking 'unpromoted struct' as address is taken and allows more optimizations with it.

Edit: see the updated diffs in the comment below, the PR was rewritten to use LCL_FLD only when FIELD_HANDLE belongs to LCL_VAR's struct type.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 17, 2021
@sandreenko sandreenko marked this pull request as draft February 17, 2021 02:51
@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko sandreenko added the enhancement Product code improvement that does NOT require public API changes/additions label Feb 17, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 18, 2021
@sandreenko sandreenko marked this pull request as ready for review February 26, 2021 02:42
@sandreenko
Copy link
Contributor Author

PTAL @dotnet/jit-contrib, this is the first part of struct improvement for 6.0.

It is known that such optimizations often reveal issues in VN/CSE phases so I am writing tests to add before the merge, please tell me if you know any particular transformations that could be affected by this.

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@sandreenko
Copy link
Contributor Author

Diff example, Method code size: 157->93
base.txt
diff.txt

Base automatically changed from master to main March 1, 2021 09:07
@sandreenko sandreenko mentioned this pull request Mar 1, 2021
10 tasks
@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@sandreenko
Copy link
Contributor Author

The PR has been updated and now it passes all tests, + I am adding a new test here with problematic struct copies + #49504 to improve VN checks.

The updated diffs, framework libraries:

x64 windows crossgen: 
Total bytes of delta: -32543 (-0.10% of base)
6003 total methods with Code Size differences (5602 improved, 401 regressed), 194172 unchanged.
x64 windows pmi:
Total bytes of delta: -62008 (-0.12% of base)
9093 total methods with Code Size differences (8083 improved, 1010 regressed), 253818 unchanged.

arm64 windows crossgen (altjit):
Total bytes of delta: -15496 (-0.03% of base)
1263 total methods with Code Size differences (1251 improved, 12 regressed), 198403 unchanged.
arm64 windows pmi (altjit):
Total bytes of delta: -56156 (-0.10% of base)
3891 total methods with Code Size differences (3827 improved, 64 regressed), 259020 unchanged.

x64 unix crossgen (altjit):
Total bytes of delta: -9012 (-0.02% of base)
1779 total methods with Code Size differences (1415 improved, 364 regressed), 197888 unchanged.
x64 unix pmi (altjit):
Total bytes of delta: -46202 (-0.09% of base)
3395 total methods with Code Size differences (2866 improved, 529 regressed), 259516 unchanged.

Benchmark diffs:

x64 windows crossgen: 
Total bytes of delta: -175 (-0.05% of base)
39 total methods with Code Size differences (38 improved, 1 regressed), 1737 unchanged.
x64 windows pmi:
Total bytes of delta: 76 (0.02% of base)
75 total methods with Code Size differences (64 improved, 11 regressed), 1816 unchanged.

arm64 windows crossgen (altjit):
Total bytes of delta: -12 (-0.00% of base)
2 total methods with Code Size differences (2 improved, 0 regressed), 1772 unchanged.
arm64 windows pmi (altjit):
Total bytes of delta: -368 (-0.07% of base)
23 total methods with Code Size differences (23 improved, 0 regressed), 1868 unchanged.

x64 unix crossgen (altjit):
Total bytes of delta: -8 (-0.00% of base)
3 total methods with Code Size differences (2 improved, 1 regressed), 1771 unchanged.
x64 unix pmi (altjit):
Total bytes of delta: 249 (0.05% of base)
14 total methods with Code Size differences (1 improved, 13 regressed), 1877 unchanged.

The regressions are from:

  1. worse liveness for tracked variables, tracked by Precise liveness for LCL_FLD-s  #49199, we zero-init them in prolog as the result;
  2. LclVars are not marked as address-taken so we allow more tail-calls to happen, code size regression, but an actual performance improvement;

PTAL @dotnet/jit-contrib

// for calls that return multiple registers
// Don't use field by field assignment if the src is a call,
// lowering will handle it without spilling the call result into memory
// to access the individual fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. What is the behavior when assigning a call result to a promoted struct that has a single field.

  2. How does lower patch things up?
    Don't we create a copy block here, which makes the dest become address taken?
    How can lower fix that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the behavior when assigning a call result to a promoted struct that has a single field.

This is a special case covered by CanBeReplacedWithItsField, it checks exactly for promoted struct with 1 field without holes, in this case the dst is not address taken. it happens in LowerStoreLocCommon:

JITDUMP("Replacing an independently promoted local var V%02u with its only field V%02u for the store "

  1. How does lower patch things up?

for the case of ASG(LCL_VAR, call) it would come as STORE_LCL_VAR(call) to LowerStoreSingleRegCallStruct and it will generate STORE_IND or STORE_BLK . Store_blk is used when the struct has the size of 3, 5, 6, 7 bytes. The details are in LowerCallStruct.

void Lowering::LowerStoreSingleRegCallStruct(GenTreeBlk* store)

@sandreenko
Copy link
Contributor Author

ping @dotnet/jit-contrib

I recommend reviewing it with "Hide whitespace changes checked".

The idea is quite simple when we have

ASG struct1type
  dst LCL_VAR struct1type promoted V01
  src LCL_VAR struct1type not-promoted V02

we want to generate a field-by-field assignment for V01. In order to do that we need to access V02 fields and in the past we did as

IND(ADDR(V02) + field offset)

when we can actually do

LCL_FLD V02 fieldOffset

instead, it prevents V02 from becoming address exposed and generates smaller code for the access.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look ok to me.

Brian was recently in this code so should also approve.

fgMorphCopyBlock is complicated and it is hard to have confidence that what it is doing is correct. You might think about how to rework it to make what it is doing more obvious.

@sandreenko
Copy link
Contributor Author

fgMorphCopyBlock is complicated and it is hard to have confidence that what it is doing is correct. You might think about how to rework it to make what it is doing more obvious.

It is on my plate but pretty far, with struct enregistration this transformation needs to be moved from morph into a later phase, I have a description of my design in PostponeMorphingOfBlockOperations.md.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few notes

fgMorphCopyBlock has become an unreadable beast...

src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
@sandreenko sandreenko merged commit a03a675 into dotnet:main Mar 19, 2021
@sandreenko sandreenko deleted the enregStruct_part1_moreLclFld branch March 19, 2021 15:37
@ghost ghost locked as resolved and limited conversation to collaborators Apr 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants