[JIT] Add BasicBlock::bbFlags helper methods #95139
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib. With these changes, we only have a few spots where we directly access |
I like this change. Left a few notes.
If we've made it this far, would it be worth just making bbFlags private, and adding GetFlags() and SetFlags(...) to cover those last few cases?
With my suggested CopyFlags function, how many more cases are there? I think it would make sense to make it private. I would name the functions GetFlagsRaw/SetFlagsRaw to make it even more clear this is an "escape hatch" and not preferred usage.
src/coreclr/jit/block.h
@@ -869,8 +894,7 @@ struct BasicBlock : private LIR::Range
 {
 if (this->bbWeight == BB_ZERO_WEIGHT)
 {
-this->bbFlags &= ~BBF_RUN_RARELY; // Clear any RarelyRun flag
-this->bbFlags &= ~BBF_PROF_WEIGHT; // Clear any profile-derived flag
+this->RemoveFlag((BBF_RUN_RARELY | BBF_PROF_WEIGHT));
nit:
-this->RemoveFlag((BBF_RUN_RARELY | BBF_PROF_WEIGHT));
+this->RemoveFlag(BBF_RUN_RARELY | BBF_PROF_WEIGHT);
src/coreclr/jit/codegenarmarch.cpp
@@ -5435,7 +5435,7 @@ void CodeGen::genFnEpilog(BasicBlock* block)
 }
 #endif // DEBUG

-bool jmpEpilog = ((block->bbFlags & BBF_HAS_JMP) != 0);
+bool jmpEpilog = (block->HasFlag(BBF_HAS_JMP));
nit:
-bool jmpEpilog = (block->HasFlag(BBF_HAS_JMP));
+bool jmpEpilog = block->HasFlag(BBF_HAS_JMP);
src/coreclr/jit/codegencommon.cpp
 // Use coldness of current block, as this label will
 // be contained in it.
-block->bbFlags |= (compiler->compCurBB->bbFlags & BBF_COLD);
+block->SetFlag(compiler->compCurBB->bbFlags & BBF_COLD);
You could add a CopyFlags function and have:
-block->SetFlag(compiler->compCurBB->bbFlags & BBF_COLD);
+block->CopyFlags(compiler->compCurBB, BBF_COLD);
Sure thing. We do have a few cases where the second argument to CopyFlags would be multiple flags OR'd together, e.g. in Compiler::fgSplitBlockBeforeTree:
block->SetFlag(originalFlags & (BBF_SPLIT_GAINED | BBF_IMPORTED | BBF_GC_SAFE_POINT | BBF_LOOP_PREHEADER | BBF_RETLESS_CALL));
Would you be ok with this pattern?
Something like:
block->CopyFlags(fromBlock, BBF_IMPORTED | BBF_GC_SAFE_POINT | BBF_LOOP_PREHEADER | BBF_RETLESS_CALL);
would be fine.
The more complex cases where we (1) grab the flags, (2) do some computations, then (3) set the flags, might be where the GetFlagsRaw would come in. So the example you reference might need to use that.
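To make the CopyFlags idea concrete, here is a minimal sketch, not the code that was merged. It assumes CopyFlags ORs the masked flags of the source block into the destination, which matches the `SetFlag(other->bbFlags & mask)` pattern it replaces. `SketchBlock`, the bit positions, and the exact signatures are hypothetical; only the method names come from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the block flags; the bit positions are made up.
enum BasicBlockFlags : uint64_t
{
    BBF_EMPTY         = 0,
    BBF_COLD          = 1ull << 0,
    BBF_IMPORTED      = 1ull << 1,
    BBF_GC_SAFE_POINT = 1ull << 2,
};

constexpr BasicBlockFlags operator|(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) | static_cast<uint64_t>(b));
}

constexpr BasicBlockFlags operator&(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) & static_cast<uint64_t>(b));
}

// Simplified stand-in for BasicBlock that keeps the flags private.
class SketchBlock
{
    BasicBlockFlags bbFlags = BBF_EMPTY;

public:
    // "Escape hatch" accessors for the grab/compute/set cases.
    BasicBlockFlags GetFlagsRaw() const
    {
        return bbFlags;
    }

    void SetFlagsRaw(BasicBlockFlags flags)
    {
        bbFlags = flags;
    }

    // Copy only the flags selected by 'mask' from another block.
    // Flags already set on this block are left untouched.
    void CopyFlags(const SketchBlock* from, BasicBlockFlags mask)
    {
        bbFlags = bbFlags | (from->bbFlags & mask);
    }
};
```

With this shape, a caller that only forwards a subset of flags never touches the raw value; the raw accessors remain for code that needs to compute on the flags first.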
src/coreclr/jit/block.h
return !CheckFlag(flag, BBF_EMPTY);
}

void SetFlag(const BasicBlockFlags flag)
Maybe it should be SetFlags/RemoveFlags?
src/coreclr/jit/block.h
return CheckFlag(flag, flag);
}

bool HasFlag(const BasicBlockFlags flag) const
Should this assert that flag has only one bit set? And add an explicit HasAnyFlag that allows for the rare cases where you're checking if any flag is set? (Actually, we should probably just make callers use HasFlag(one) || HasFlag(two) in that case.)
I think that's a good idea. It doesn't come up often, but we do have a few HasFlag(BBF_FLAG1 | BBF_FLAG2) calls, so maybe we can use the templated approach we use for BasicBlock::KindIs? If not, we can just use more than one call to HasFlag in the caller.
I think it's better to make the caller use HasFlag(one) || HasFlag(two), or HasFlag(one) && HasFlag(two).
Yes, we could use the template magic to have HasAnyFlags(one,two,...) and HasAllFlags(one,two,...) but does that happen very often? IMO, we should not have HasFlag(one,two,...) because it's not clear if it's "any" or "all".
> Yes, we could use the template magic to have HasAnyFlags(one,two,...) and HasAllFlags(one,two,...) but does that happen very often?
I don't recall seeing it enough to justify introducing two more methods.
> I think it's better to make the caller use HasFlag(one) || HasFlag(two), or HasFlag(one) && HasFlag(two).
Now that you mention the || vs && cases, I agree it's more readable to just have the caller make two calls.
On second thought, there are some cases where multiple calls to HasFlag quickly become ugly, so I'll introduce HasAnyFlags to cover these scenarios (I haven't found a need for HasAllFlags anywhere).
nit: in English, the preferred wording would be HasAnyFlag and HasAllFlags, IMO
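The naming discussion above is easier to follow with the three checks side by side. The following is a sketch under assumptions: `SketchBlock`, `isPow2Sketch`, and the bit positions are hypothetical, and the method names follow the HasAnyFlag/HasAllFlags wording suggested in the nit rather than whatever was finally merged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the block flags; the bit positions are made up.
enum BasicBlockFlags : uint64_t
{
    BBF_EMPTY         = 0,
    BBF_IMPORTED      = 1ull << 0,
    BBF_INTERNAL      = 1ull << 1,
    BBF_GC_SAFE_POINT = 1ull << 2,
};

constexpr BasicBlockFlags operator|(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) | static_cast<uint64_t>(b));
}

constexpr BasicBlockFlags operator&(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) & static_cast<uint64_t>(b));
}

// Stand-in for the JIT's isPow2 helper: true when exactly one bit is set.
constexpr bool isPow2Sketch(uint64_t value)
{
    return (value != 0) && ((value & (value - 1)) == 0);
}

struct SketchBlock
{
    BasicBlockFlags bbFlags = BBF_EMPTY;

    void SetFlag(BasicBlockFlags flag)
    {
        bbFlags = bbFlags | flag;
    }

    // Single-flag check; asserts that the caller passed exactly one bit.
    bool HasFlag(BasicBlockFlags flag) const
    {
        assert(isPow2Sketch(flag));
        return (bbFlags & flag) != BBF_EMPTY;
    }

    // True if at least one of the bits in 'flags' is set.
    bool HasAnyFlag(BasicBlockFlags flags) const
    {
        return (bbFlags & flags) != BBF_EMPTY;
    }

    // True only if every bit in 'flags' is set.
    bool HasAllFlags(BasicBlockFlags flags) const
    {
        return (bbFlags & flags) == flags;
    }
};
```

Separate names make the "any" versus "all" intent visible at the call site, which is the ambiguity raised against a multi-argument HasFlag.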
src/coreclr/jit/fgopt.cpp
 }

 /* Update the flags for block with those found in bNext */

-block->bbFlags |= (bNext->bbFlags & BBF_COMPACT_UPD);
+block->SetFlag(bNext->bbFlags & BBF_COMPACT_UPD);
Another case for CopyFlag
src/coreclr/jit/flowgraph.cpp
@@ -3026,7 +3026,7 @@ void Compiler::fgInsertFuncletPrologBlock(BasicBlock* block)
 assert(nullptr == fgGetPredForBlock(block, newHead));
 fgAddRefPred(block, newHead);

-assert((newHead->bbFlags & BBF_INTERNAL) == BBF_INTERNAL);
+assert(newHead->CheckFlag(BBF_INTERNAL));
This can probably be HasFlag. It's equivalent if the flag being checked is a single bit. That's probably the case for most/all of the uses of CheckFlag with a single flag.
Thanks for pointing this out. I was able to get rid of CheckFlag altogether.
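The equivalence claimed above can be checked directly. This small sketch uses plain integers rather than the JIT's types; the function and constant names are hypothetical. It shows that the "equals the mask" form and the "not zero" form agree for a single-bit mask and diverge only once the mask has more than one bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-bit flag values.
constexpr uint64_t kInternal = 1ull << 0;
constexpr uint64_t kImported = 1ull << 1;

// The CheckFlag-style test: every bit of the mask must be set.
constexpr bool allBitsSet(uint64_t flags, uint64_t mask)
{
    return (flags & mask) == mask;
}

// The HasFlag-style test: at least one bit of the mask must be set.
constexpr bool anyBitSet(uint64_t flags, uint64_t mask)
{
    return (flags & mask) != 0;
}

// For a single-bit mask, (flags & mask) is either 0 or mask itself,
// so both tests always return the same answer.
static_assert(allBitsSet(kInternal, kInternal) == anyBitSet(kInternal, kInternal), "bit set");
static_assert(allBitsSet(kImported, kInternal) == anyBitSet(kImported, kInternal), "bit clear");

// With a two-bit mask and only one of the bits set, the tests disagree.
static_assert(anyBitSet(kInternal, kInternal | kImported), "any");
static_assert(!allBitsSet(kInternal, kInternal | kImported), "all");
```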
The cases that require
Overall, usage of the raw getters/setters is uncommon.
I don't have a strong opinion on a change like this, but I will say that I don't think the original pattern is verbose, and bitwise manipulation is such a primitive operation that I don't think this makes things clearer, only a bit inconsistent with other flags throughout the JIT. If other people on the team prefer these wrapper methods, then I am ok with it.
I agree this isn't a significant readability improvement, but this does fix the style inconsistencies with checking the flags (like use of implicit boolean, number of parentheses, etc). I noticed from the last run that this change has unexpectedly large TP diffs. I suspect the templated definition of Side note: I tried profiling the TP diff locally, but the recent changes to |
I don't see why |
So the codegen for the
In 6 places in the JIT, neither version is inlined (though there are no instances where one is inlined, and the other isn't). These calls are in larger methods like We could use |
I think this is a good case for the use of
Note that TP diffs purely count instructions; they are not a true measure of run-time performance difference. An instruction count is the best proxy for run-time perf differences that we have: it is extremely low-noise and reproducible.
src/coreclr/jit/block.h
// by checking if it is a power of 2
// (HasFlag expects to check only one flag at a time)
assert(isPow2(flag));
return (bool)(bbFlags & flag);
Does this get us into odd unnormalized bools territory? Why not (bbFlags & flag) != 0?
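One way to read the concern, offered as an illustration rather than as what the reviewer necessarily had in mind: in standard C++ a conversion to `bool` always yields true or false, so the cast and the explicit comparison return the same value. The hazard appears when the result is narrowed to an integer-typed "boolean" instead. `LegacyBool`, `kHighFlag`, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit integer "boolean", as found in some older APIs.
using LegacyBool = uint32_t;

// Hypothetical flag that lives above bit 31 of a 64-bit flags word.
constexpr uint64_t kHighFlag = 1ull << 40;

// Narrowing the masked value drops the upper 32 bits, so a set high
// flag is reported as 0.
inline LegacyBool hasFlagTruncating(uint64_t flags, uint64_t flag)
{
    return static_cast<LegacyBool>(flags & flag);
}

// Conversion to a real bool maps any nonzero value to true.
inline bool hasFlagCast(uint64_t flags, uint64_t flag)
{
    return (bool)(flags & flag);
}

// The explicit comparison states the intent and does not depend on
// the return type's conversion rules.
inline bool hasFlagCompare(uint64_t flags, uint64_t flag)
{
    return (flags & flag) != 0;
}
```

The comparison form stays correct if the return type is ever changed, which is one argument for preferring it even where the cast happens to be safe.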
Good point. Though it looks like the codegen is still worse when the boolean version of this method is inlined (diffs). I'll take a look at the inlined instances and share the codegen here.
This seemed to fix the remaining diffs on x86.
I see, thanks for pointing that out. We're still seeing TP regressions with the inlined boolean implementation of the method. I can confirm that the boolean implementation results in a larger If everyone's ok with the slightly less intuitive code, I'm going to revert to the |
0113d05 to 1ab9def
So the implementation with |
Ah sorry, I linked the wrong diffs. It's bigger than that. |
Also while looking at the disassembly of large methods like |
With the current variation that is running diffs, would you expect TP to universally improve, since there are lots of places using |
Seems like a good experiment to try (the What really matters is the profile-feedback-based Release build. Note that TP diffs, while they use Release builds, disable profile feedback being used by the compiler, to ensure "apples to apples" comparison that doesn't bias the baseline (which presumably has more accurate profile data). |
Good question. This previous run is with |
I checked the disassembly of That snippet is from |
I looked into the largest Windows TP regression in the "bool" returning "(bbFlags & flags) != 0" implementation, and it looks like a CQ limitation of the MSVC compiler: the clang/gcc compilers optimize this better when inlined. I opened https://developercommunity.visualstudio.com/t/Sub-optimal-code-for-inlined-bool-functi/10535537 to request the MSVC team look at improving this. I'm ok merging this with the |
Fix 2 bugs in importer with flags conversion to functions. Implement full `CopyFlags` to eliminate some GetFlagsRaw cases. For BasicBlockFlags operators, mark them as FORCEINLINE. They have always been expected to be fully inlined, but I found some cases in Release builds where they were not.
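As a rough illustration of what force-inlined flag operators can look like, here is a sketch under assumptions. The `FORCEINLINE` definition below is a common portable spelling and is not claimed to be the JIT's own macro; the enum members and bit positions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definition for this sketch only.
#ifndef FORCEINLINE
#if defined(_MSC_VER)
#define FORCEINLINE __forceinline
#else
#define FORCEINLINE inline __attribute__((always_inline))
#endif
#endif

// Hypothetical subset of the block flags; the bit positions are made up.
enum BasicBlockFlags : uint64_t
{
    BBF_EMPTY    = 0,
    BBF_IMPORTED = 1ull << 0,
    BBF_INTERNAL = 1ull << 1,
};

FORCEINLINE BasicBlockFlags operator~(BasicBlockFlags a)
{
    return static_cast<BasicBlockFlags>(~static_cast<uint64_t>(a));
}

FORCEINLINE BasicBlockFlags operator|(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) | static_cast<uint64_t>(b));
}

FORCEINLINE BasicBlockFlags operator&(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) & static_cast<uint64_t>(b));
}

FORCEINLINE BasicBlockFlags& operator|=(BasicBlockFlags& a, BasicBlockFlags b)
{
    a = a | b;
    return a;
}

FORCEINLINE BasicBlockFlags& operator&=(BasicBlockFlags& a, BasicBlockFlags b)
{
    a = a & b;
    return a;
}
```

Each operator is a one-line cast and bit operation, so a call that is not inlined costs far more than the work it performs; that is the motivation for forcing the issue.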
@amanasifkhalid I submitted a PR to update this PR, with fixes to a couple bugs I noticed while reviewing, as well as adding some additional FORCEINLINE cases (that unfortunately don't materially improve measured TP): |
Implement HasAllFlags and fix a couple bugs
@BruceForstall thanks for investigating this further, and for the fixes! Hopefully we'll see an MSVC fix for this soon. |
LGTM if tests pass
No asm diffs (as expected). No TP diffs on linux. Small TP improvements to small regressions on Windows (-0.01% to 0.02%) |
Failures are known NativeAOT issues. |
Andy mentioned in #94239 that it would be nice to have a method handle verbose flag checks like ((block->bbFlags & BBF_SOME_FLAG) != 0), so this adds some helper methods for checking/modifying block flags.
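For readers skimming the thread, a minimal sketch of the kind of helpers being described may help. It is an illustration, not the merged implementation: `SketchBlock` and the bit positions are hypothetical, and only the method and flag names are taken from the diffs quoted in this conversation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the block flags; the bit positions are made up.
enum BasicBlockFlags : uint64_t
{
    BBF_EMPTY       = 0,
    BBF_RUN_RARELY  = 1ull << 0,
    BBF_PROF_WEIGHT = 1ull << 1,
    BBF_HAS_JMP     = 1ull << 2,
};

constexpr BasicBlockFlags operator|(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) | static_cast<uint64_t>(b));
}

constexpr BasicBlockFlags operator&(BasicBlockFlags a, BasicBlockFlags b)
{
    return static_cast<BasicBlockFlags>(static_cast<uint64_t>(a) & static_cast<uint64_t>(b));
}

constexpr BasicBlockFlags operator~(BasicBlockFlags a)
{
    return static_cast<BasicBlockFlags>(~static_cast<uint64_t>(a));
}

// Simplified stand-in for BasicBlock with the three basic helpers.
struct SketchBlock
{
    BasicBlockFlags bbFlags = BBF_EMPTY;

    // Replaces: (block->bbFlags & flag) != 0
    bool HasFlag(BasicBlockFlags flag) const
    {
        return (bbFlags & flag) != BBF_EMPTY;
    }

    // Replaces: block->bbFlags |= flag
    void SetFlag(BasicBlockFlags flag)
    {
        bbFlags = bbFlags | flag;
    }

    // Replaces: block->bbFlags &= ~flag
    void RemoveFlag(BasicBlockFlags flag)
    {
        bbFlags = bbFlags & ~flag;
    }
};
```

A call such as `block.RemoveFlag(BBF_RUN_RARELY | BBF_PROF_WEIGHT)` then clears both bits in one step, mirroring the block.h change reviewed above.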