Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Refactor call argument representation #67238

Merged
merged 55 commits into from
Apr 15, 2022

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Mar 28, 2022

This refactors the JIT's representation of call arguments. It replaces
GenTreeCall::Use and fgArgTabEntry with a single class CallArg.
CallArg always contains space for ABI information and contains two
intrusive linked list nodes: one for all args (similar to current
gtCallArgs) where all standard arguments are always in argument order,
and one for late args (similar to current gtCallLateArgs) that may be
reordered. The late args list may also not contain all arguments.

fgArgInfo is also replaced by a new class CallArgs that is stored
inline in GenTreeCall. It encapsulates all handling of arguments
(insertion/removal). The change also begins treating the 'this' argument
as a normal argument with CallArgs providing convenient access to it
when necessary.

The main benefit of this change is to avoid keeping track of the side
table fgArgInfo and having to scan through this side table repeatedly
when we need to query argument information. In addition it gives more
convenient ways to access well known arguments like 'this', the ret
buffer, VSD cell, etc.

Follow-up changes I want to make after this:

  • Remove quirks. In trying to keep this zero-diff several quirks were
    added. In particular we currently miss a significant optimization
    opportunity by not retyping 'this' arguments as unmanaged pointers
    when they are on stack.
  • Remove GT_PUTARG_TYPE nodes. These are used only to transfer
    information from import to morph. Now that we always allocate argument
    information we can just store this information directly.
  • Remove GTF_LATE_ARG
  • Remove CallArg::ArgNum, not necessary since args are in order now
  • Remove DEBUG_ARG_SLOTS which adds some complication and has served its purpose

Since this is a large change I will wait until at least #62843 is merged before marking it as ready.
Also, I need to review this a couple of times and see testing.

Some important differences for JIT devs:

  • CallArg::GetNode() is equivalent to fgArgTabEntry::GetNode(). That means it is the late node if it is present. In particular it means that the following loops are not completely equivalent:
for (GenTreeCall::Use& use : call->gtCallArgs)
{
  GenTree* node = use.GetNode();
  // Do something with early node
}
for (CallArg& arg : call->gtArgs.Args())
{
  GenTree* node = arg.GetNode();
  // Node is actual argument node, not the setup node
}

This is intentional since I think it is a pitfall if GetNode() returns the setup node instead of the actual arg node. Also note that it is quite rare that we want access to the actual setup node after morph and they are equivalent before. The equivalent to the former loop is to use GetEarlyNode().

This refactors the JIT's representation of call arguments. It replaces
`GenTreeCall::Use` and `fgArgTabEntry` with a single class `CallArg`.
`CallArg` always contains space for ABI information and contains two
intrusive linked list nodes: one for all args (similar to current
`gtCallArgs`) where all standard arguments are always in argument order,
and one for late args (similar to current `gtCallLateArgs`) that may be
reordered. The late args list may also not contain all arguments.

`fgArgInfo` is also replaced by a new class `CallArgs` that is stored
inline in `GenTreeCall`. It encapsulates all handling of arguments
(insertion/removal). The change also begins treating the 'this' argument
as a normal argument with `CallArgs` providing convenient access to it
when necessary.

The main benefit of this change is to avoid keeping track of the side
table `fgArgInfo` and having to scan through this side table repeatedly
when we need to query argument information. In addition it gives more
convenient ways to access well known arguments like 'this', the ret
buffer, VSD cell, etc.

Follow-up changes I want to make after this:
* Remove quirks. In trying to keep this zero-diff several quirks were
  added. In particular we currently miss a significant optimization
  opportunity by not retyping 'this' arguments as unmanaged pointers
  when they are on stack.
* Remove GT_PUTARG_TYPE nodes. These are used only to transfer
  information from import to morph. Now that we always allocate argument
  information we can just store this information directly.
@ghost ghost assigned jakobbotsch Mar 28, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 28, 2022
@ghost
Copy link

ghost commented Mar 28, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This refactors the JIT's representation of call arguments. It replaces
GenTreeCall::Use and fgArgTabEntry with a single class CallArg.
CallArg always contains space for ABI information and contains two
intrusive linked list nodes: one for all args (similar to current
gtCallArgs) where all standard arguments are always in argument order,
and one for late args (similar to current gtCallLateArgs) that may be
reordered. The late args list may also not contain all arguments.

fgArgInfo is also replaced by a new class CallArgs that is stored
inline in GenTreeCall. It encapsulates all handling of arguments
(insertion/removal). The change also begins treating the 'this' argument
as a normal argument with CallArgs providing convenient access to it
when necessary.

The main benefit of this change is to avoid keeping track of the side
table fgArgInfo and having to scan through this side table repeatedly
when we need to query argument information. In addition it gives more
convenient ways to access well known arguments like 'this', the ret
buffer, VSD cell, etc.

Follow-up changes I want to make after this:

  • Remove quirks. In trying to keep this zero-diff several quirks were
    added. In particular we currently miss a significant optimization
    opportunity by not retyping 'this' arguments as unmanaged pointers
    when they are on stack.
  • Remove GT_PUTARG_TYPE nodes. These are used only to transfer
    information from import to morph. Now that we always allocate argument
    information we can just store this information directly.

Since this is a large change I will wait until at least #62843 is merged before marking it as ready.
Also, I need to review this a couple of times and see testing.

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@@ -1067,7 +1062,8 @@ GenTree* Compiler::fgOptimizeDelegateConstructor(GenTreeCall* call,
}
else if (oper == GT_CALL && targetMethod->AsCall()->gtCallMethHnd == eeFindHelper(CORINFO_HELP_VIRTUAL_FUNC_PTR))
{
GenTree* handleNode = targetMethod->AsCall()->gtCallArgs->GetNext()->GetNext()->GetNode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see such ->GetNext()->GetNext()->GetNode() are removed 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, although I will have to review these carefully a few times since 'this' is now a regular arg, so some of these might have shifted.

@jakobbotsch
Copy link
Member Author

FYI @dotnet/jit-contrib

@jakobbotsch

This comment was marked as spam.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr crossgen2 outerloop, Fuzzlyn, jit-cfg

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@jakobbotsch
Copy link
Member Author

The CI pipelines ran in https://github.com/dotnet/runtime/pull/67238/checks?check_run_id=5963207833 but I have committed some minor documentation changes since then. All the failures look preexisting, the majority being #67821. I believe this is ready -- zero diffs and I have reviewed it a couple of times myself.

cc @dotnet/jit-contrib, anyone wants to volunteer to review this?

Also cc @shushanhf -- there are a few changes in LoongArch64 for the new call arg representation that you may want to double check/test.

@jakobbotsch jakobbotsch marked this pull request as ready for review April 11, 2022 12:34
@shushanhf
Copy link
Contributor

Also cc @shushanhf -- there are a few changes in LoongArch64 for the new call arg representation that you may want to double check/test.

Thanks, I will check it for LA.

@jakobbotsch
Copy link
Member Author

PIN results on libraries.pmi for x64:
Before: 327070083563
After: 325054392352 (-0.6%)

I suspect most of this is due to avoiding the repeated linear searches for arg table entries. Will post memory stats when I'm back tomorrow.

@jakobbotsch
Copy link
Member Author

Diff of memory stats over libraries.pmi (left: before, right: after): https://www.diffchecker.com/kFqhlvta

Before: 34513223680 bytes
After: 34400698368 bytes (-0.3%)

@AndyAyersMS
Copy link
Member

I looked most everything over just to familiarize myself with the changes (except morph). Will go back through in more detail tomorrow and write up some specific feedback.

Overall I agree this looks like a very nice cleanup.

Seems like we use the "enumerate early args then late args" pattern quite often, I wonder if some sort of combined iterator makes sense.

Also wonder if we can come up with some kind of safeguards to make it harder to look at the wrong enumerations. Or maybe find better descriptive names?

@jakobbotsch
Copy link
Member Author

Seems like we use the "enumerate early args then late args" pattern quite often, I wonder if some sort of combined iterator makes sense.

I think for the most part these cases are "visit all operands of GenTreeCall nodes" and we should probably look into making them use either GenTree::UseEdges, GenTree::Operands or GenTreeVisitor. Also if the ordering is not important we can just use Args() and access both GetEarlyNode() and GetLateNode() in the loop body. However, I think this is a cleanup for another PR.

Also wonder if we can come up with some kind of safeguards to make it harder to look at the wrong enumerations. Or maybe find better descriptive names?

I agree, it would be nice to find a better name for LateArgs that makes it clear that this is a subset of Args and that it may not be in arg order. ArgsNeedingLatePlacement could be one, although a bit verbose...
I do think with this PR the situation is significantly better than before. Iterating Args() and using GetNode() is in almost all scenarios the thing you want (it will never give you a placeholder/setup node). Contrast that with before, where iterating gtCallArgs and using GetNode() would give you setup nodes/placeholder after morph.

Note that many places in this PR are using GetEarlyNode() before morph which is always the same node as GetNode(). New code should probably just use GetNode() here, but when I mass-converted the uses I attempted to make sure the semantics were the same as previously. Perhaps something I might look at simplifying in a future PR.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have looked through everything but morph.cpp.

src/coreclr/jit/flowgraph.cpp Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
GenTreeCall* Compiler::gtNewCallNode(gtCallTypes callType,
CORINFO_METHOD_HANDLE callHnd,
var_types type,
const DebugInfo& di)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're no longer setting node flags based on arg flags.

Presumably this now happens up in the callers? Or does it not matter much since GTF_CALL usually implies all the other side effects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should happen in callers now.
In general the arg list here was only used interestingly for JIT helpers, so I have folded them in as optional arguments in gtNewHelperCallNode that takes care of setting the flags.

For user calls, impImportCall will use impPopCallArgs and set the flags from the arguments after this. With that said, I have reviewed the sites that use impPopCallArgs and it looks like I have missed the flags in a couple of places... will fold it into impPopCallArgs.

I have also considered making CallArgs handle it when inserting arguments. It should be easy to get a back-pointer to the parent GenTreeCall given that it is stored inline. I think it would be nice not to be able to forget, but also it is conceptually wrong to do this after rationalization, so I am not totally sure if it makes sense to do always.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, initially I wanted to replace the previous argument with std::initializer_list, but we cannot use STL :-(

Comment on lines +21254 to +21256
// TODO-CallArgs-REVIEW: Might discard commas otherwise?
assert(thisObj == thisArg->GetEarlyNode());
thisArg->SetEarlyNode(localCopyThis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please open an issue.

src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
@@ -2291,36 +1910,25 @@ GenTree* Compiler::fgInsertCommaFormTemp(GenTree** ppTree, CORINFO_CLASS_HANDLE
return new (this, GT_LCL_VAR) GenTreeLclVar(GT_LCL_VAR, subTree->TypeGet(), lclNum);
}

//------------------------------------------------------------------------
// fgInitArgInfo: Construct the fgArgInfo for the call with the fgArgEntry for each arg
// AddFinalArgsAndDetermineABIInfo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add back the

//---------

line and document the args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we dead set on the full template everywhere?
In this case the documentation would be something like "comp - the compiler" and "call - the call", which is not super useful and just adds noise. I have in many cases omitted documenting args when it would not clarify anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is our convention, yes.

I usually add them even if I think the args are self-descriptive, to at least distinguish those cases from the ones where the args might seem to be self-descriptive but aren't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me include these as part of one of the follow-up changes to avoid rerunning CI for it.

src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
The early arg could be a local that was optimized to a constant later
@jakobbotsch jakobbotsch merged commit dbf5c58 into dotnet:main Apr 15, 2022
@jakobbotsch jakobbotsch deleted the refactor-call-args branch April 15, 2022 08:18
jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Apr 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants