-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: improve types for single def locals and temps #10471
Conversation
Track whether a local has a single definition, and if so, if it has a reference type, try and update its type from the declared type to a better type taken from the value being assigned to the local. Obtain types for some of the 'short-lived' ref type temps that should have a single definition. Use both the tree and the eval stack as sources of type information (the latter can be phased out if/when all tree nodes can return rich type information). Refactor the code that sets or updates lvClassHnd into utilities to provide better auditing of type flow and make the set/update process a bit more rigorous. Cleanup the code that passes argument values a bit by commoning redundant argument lookup expressions.
@JosephTremoulet PTAL Impacts 114 methods in the jit-diff framework set. Overall devirt stats for System.Private.CoreLib:
Baseline rates were 9.42%, 2.23%, and 6.89%. This also lets us devirtualize the obvious things one would hope, eg the call to class B
{
public virtual int F() { return 3; }
}
class D : B
{
public override int F() { return 5; }
}
class X
{
public static int Main(string[] args)
{
B b = new D();
return b.F() + 95;
}
} |
Can you leverage the existing: |
It looks like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a few comments
src/jit/compiler.h
Outdated
unsigned char lvArgWrite : 1; // variable is a parameter and STARG was used on it | ||
unsigned char lvIsTemp : 1; // Short-lifetime compiler temp | ||
unsigned char lvArgWrite : 1; // there is at least one STLOC or STARG on this local | ||
unsigned char lvMultipleArgWrite : 1; // there is more than one STLOC on this local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO these names are confusing now, especially the fact that lvMultipleArgWrite
doesn't apply to args/starg. Maybe something more like HasStore
/HasMultipleStores
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an asymmetry between args and locals that I wrestled with -- args are implicitly written once, so a single starg
or ldarga
creates a multiple-def arg, whereas for locals we need to see two of stloc
or ldloca
to have a multiple def local. I could resolve this by setting lvArgWrite
when the arg temp is allocated and update the logic for args and then the fields would have same meaning. Does that seem preferable?
store
seems a bit too specific for the name since these also apply to address-of operators. I'd use definition
but then the overlap with lvISingleDef
would be even more confusing. How about HasSingleWrite
and HasMultipleWrites
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use definition but then the overlap with lvISingleDef would be even more confusing. How about HasSingleWrite and HasMultipleWrites?
I thought about definition and the same thing occurred to me. I picked "store" because it's the verb used in the IL opcodes, and to my mind they key point is that these properties track which IL opcodes get used and how often, so that you can take advantage of them when importing that single IL operation... I don't think that "write" conveys that (though I'll admit that "store" doesn't do a great job of conveying it). Maybe something along the lines of lvSingleILDef
or lvSingleILStoreOp
?
I could resolve this by setting lvArgWrite when the arg temp is allocated and update the logic for args and then the fields would have same meaning. Does that seem preferable?
maybe? On the one hand, I feel ok with what you have (modulo naming). On the other hand, I had to read over the change a few times to figure out that the asymmetry was intentional and because of the implicit arg def at the callsite... I guess whatever fits with the names is preferable to me, so if you keep them with names matching the opcodes then like you have it seems preferable, but if you switch to something more generic like Def
/Write
then yeah, reworking it to count the implicit arg defs at callsites as defs/writes would make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bringing in IL seems like a good improvement, so lvSingleILStoreOp
and lvMultipleILStoreOp
? And then similarly for the local bools that get introduced....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so
lvSingleILStoreOp
andlvMultipleILStoreOp
?
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, lvSingleILStoreOp
is sounding more precise than it is, it does not mean exactly one. Maybe lvHasILStoreOp
and lvHasMultipleILStoreOp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also am going to split the Set/Update method into two methods instead of passing in a bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
lvHasILStoreOp
andlvHasMultipleILStoreOp
?
Sure.
if (inlArgInfo[argNum].argHasTmp) | ||
const InlArgInfo& argInfo = inlArgInfo[argNum]; | ||
const bool argIsSingleDef = !argInfo.argHasLdargaOp && !argInfo.argHasStargOp; | ||
GenTree* const argNode = inlArgInfo[argNum].argNode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shorten to argInfo.argNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, not sure how I missed these. Will clean them up.
src/jit/flowgraph.cpp
Outdated
{ | ||
assert(inlArgInfo[argNum].argNode->OperIsConst() || inlArgInfo[argNum].argNode->gtOper == GT_ADDR); | ||
assert(argInfo.argNode->OperIsConst() || argInfo.argNode->gtOper == GT_ADDR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use your new local argNode
here?
src/jit/flowgraph.cpp
Outdated
(inlArgInfo[argNum].argNode->gtOper != GT_LCL_VAR || | ||
(inlArgInfo[argNum].argNode->gtFlags & GTF_GLOB_REF))); | ||
noway_assert((argInfo.argIsLclVar == 0) == | ||
(argInfo.argNode->gtOper != GT_LCL_VAR || (argInfo.argNode->gtFlags & GTF_GLOB_REF))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argNode
?
src/jit/flowgraph.cpp
Outdated
|
||
if (inlArgInfo[argNum].argNode->gtOper == GT_OBJ || | ||
inlArgInfo[argNum].argNode->gtOper == GT_MKREFANY) | ||
if (argInfo.argNode->gtOper == GT_OBJ || argInfo.argNode->gtOper == GT_MKREFANY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argNode
? (and again once in each branch of the if/else in the next 10 lines)
src/jit/importer.cpp
Outdated
// We should have seen a stloc in our IL prescan. | ||
assert(lvaTable[lclNum].lvArgWrite); | ||
|
||
const bool isSingleDefLocal = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this has the differences you mentioned vs lvSingleDef
, maybe this should be something more like isSingleStlocLocal
?
} | ||
|
||
tiRetVal = verMakeTypeInfo(resolvedToken.hClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you move this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the stack type handle to set temp types; this makes the more precise type available more generally.
@@ -6491,6 +6601,14 @@ void Compiler::lvaDumpEntry(unsigned lclNum, FrameLayoutState curState, size_t r | |||
{ | |||
printf(" stack-byref"); | |||
} | |||
if (varDsc->lvClassHnd != nullptr) | |||
{ | |||
printf(" class-hnd"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to actually dump the class handle (or look up a name for it and dump that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class names get pretty long and there should now be notes left earlier in the dump stream when this field is set that correlate the variable, the name of the type, and the corresponding handle address. So my instinct is to leave this as is.
I could add this as a dump option, though we seem to have perhaps too many dump options already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add this as a dump option, though we seem to have perhaps too many dump options already
My inclination would be not to add a dump option, and leave how you have it -- thanks for explaining.
} | ||
|
||
// Are we updating the type? | ||
if (varDsc->lvClassHnd != clsHnd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the descriptions (in the comment on this method and commit message), I was expecting more of a meet operation here, not "the new one must be better" -- i.e. that we could have multiple sources of information and call this with each... would it make sense to do something like add a debug check that this only gets called once per local?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is misleading, sorry about that. This is not a meet (where abstractly we'd be moving from subtypes towards supertypes), but is simply modelling assignment. I had been thinking of handling the types for the block boundary spill temps when I wrote this but subsequently realized it can't be done yet. I'll reword this.
Not sure if you are using "local" here in the specific or general sense -- this might be called twice for callee arg temps (which are also caller locals): once to set the type based on the signature, and then again to "improve" the type based on the expression for the caller supplied value. I can add more state to track whether a temp is an callee arg temp and then assert that the update cases only happen where expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me refine that last paragraph -- updates can happen for single-def locals too, as we move from the signature type to the type of the initializing value. So the only cases of temps-corresponding-to-IL variables we won't update are the args in the root method; root method locals, callee args and locals all might be updated.
Compiler temps should never get updated; their types should only be set when the temp is allocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using "local" in the more general sense and didn't realize you have the two cases that can apply to the same callee arg temp. Yes if there's a reasonable way to add checks along these lines (maybe an "is callee local" bit like you suggest, or maybe just have a way to assert this isn't getting called twice but also a way to exempt that callee arg callsite), I'd think that would be useful. Ok without if that gets too unwieldy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updates can happen for single-def locals too, as we move from the signature type to the type of the initializing value
So generally you'd expect zero or one calls to the overload that doesn't take a tree, followed by zero or one calls to the overload that does take a tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Maybe it would be simpler to do this: track for each var if a type update has happened, and assert it can only happen once. Pass an "update expected" flag down from the calling context since there are just two specific locations where we do updates: when we assign to the arg temp (where we won't have a stack handle), and when we see a STLOC for a single IL store local (where we will).
So we could check that updates only happened when expected and only happened once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that sounds good (I'm assuming that by "update" you mean "get a non-null class handle passed in and already had a non-null class handle stored on the type?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly that -- even if it turns out we don't alter the handle or exactness bit.
@JosephTremoulet hopefully this covers most of your feedback. It reads better now so I appreciate your review. |
src/jit/importer.cpp
Outdated
{ | ||
CORINFO_CLASS_HANDLE stkHnd = verCurrentState.esStack[level].seTypeInfo.GetClassHandle(); | ||
lvaSetClass(tnum, tree, stkHnd); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did see a comment in the code base that lvIsTemp variables are single def, but there is nothing that enforces that and I spotted at least in one place where that isn't true.
genReturnLocal = lvaGrabTemp(true DEBUGARG("Single return block return value"));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I don't rely on lvIsTemp
== "single def" to be true in general (otherwise I would do a more comprehensive type capture over in impAssignTempGen
). But in this specific context the temps are single def when lvIsTemp
is true.
Seems like we ought to fix up those places where lvIsTemp
does not imply a single def with an eye towards ultimately making this reliable, at least through the importer / inliner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we use the lvIsSingleDef property when we need that behavior to be true.
We may want to deprecate the lvIsTemp or define is mean more carefully. It origin was as a
hint to the old register allocator that we had a short-lifetime temp and that we should try a bit harder to both track it and place it in a register. It didn't mean that there was a single definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That case that I pointed out may be of interest to your devirtualization work.
If we have many return blocks (5 or more) we create a new LclVar to hold the return value
and each return block gets converted into an assignment of the return value into this new LclVar and a branch to the "single" return block. Thus we end up with a new LclVar that have 5 or more defs and a single use. It has a "short-lifetime" as the JIT can typically find a register usually RAX to hold the value assigned and immediately returned. If you were to treat this as having a single def you might infer that the return type is known when in fact there could be different return types.
This would probably only impact the inlining case and we probably don't try to inline a method with so many return blocks, but I'm not sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For devirt we can't use lvIsSingleDef
as currently constituted, for a couple of reasons:
- It is computed from a global tree walk, and so is not available when importing code when we are trying to devirtualize
- It currently does not consider some constructs as "single def" that can be treated that way during type propagation -- for example, question ops
If you're suggesting that we eliminate/deprecate/redefine lvIsTemp
and start setting lvIsSingleDef
during the importer (and update subsequent code to verify that if it was set during importation it was set correctly) then perhaps we can use it, if we can reconcile the question ops issue. But that is future work.
I'll update the code here to capture the type for newly introduced temps instead of looking at lvIsTemp
; this better captures the intent anyways.
As for return temps -- I have not gone and comprehensively tried to capture types for all temps, in part because some (like this one) may have multiple definitions, and we don't have a general way of computing bounds on types when we have multiple definitions (specifically for cases where we have mixtures of exact and shared types).
We use the return temp more often these days, since it's also used to avoid interference from the post-inline unpin/gc ref nulling, and so quite often it may be a single def. So we could get that case and we can always use the type from the method signature as the initial approximation. I'll leave those as future enhancements.
Updates look good. |
LGTM with comments |
OSX now hitting a java remoting error (log)
Will retry. @dotnet-bot retest OSX10.12 x64 Checked Build and Test |
Looks like OSX is just generally in a bad way lately, the last 6 or so attempts by any PR have all failed. @mmitche can you take a look? |
@AndyAyersMS Took a look. A particular machine was acting up, though I haven't seen the particular error in a while. I rebooted it. |
@dotnet-bot test OSX10.12 x64 Checked Build and Test |
lgtm |
Track whether a local has a single definition, and if so, if it has
a reference type, try and update its type from the declared type to
a better type taken from the value being assigned to the local.
Obtain types for some of the 'short-lived' ref type temps that should
have a single definition. Use both the tree and the eval stack as sources
of type information (the latter can be phased out if/when all tree nodes
can return rich type information).
Refactor the code that sets or updates lvClassHnd into utilities
to provide better auditing of type flow and make the set/update process
a bit more rigorous.
Cleanup the code that passes argument values a bit by commoning redundant
argument lookup expressions.