
JIT: Transform 'CONST - x' to 'xor x, CONST' when possible#126529

Open
BoyBaykiller wants to merge 6 commits into dotnet:main from BoyBaykiller:transform-const-x-to-xor

Conversation

@BoyBaykiller
Contributor

@BoyBaykiller BoyBaykiller commented Apr 4, 2026

Given any CONST - x and range info for x obtained from IntegralRange::ForNode, see if we can transform it to x ^ CONST. These changes motivate improvements in IntegralRange::ForNode and RangeOps (#126553). Working on this also uncovered a missing GT_NOT case in assertion prop, now added to enable VN-based folding.

Found by looking at diffs from #126442

Example:

int Byte255(byte x)
{
    return 255 - x;
}
;; ------ BASE
G_M64250_IG02:  ;; offset=0x0000
       movzx    rax, dl
       neg      eax
       add      eax, 255
						;; size=10 bbWeight=1 PerfScore 0.75
G_M64250_IG03:  ;; offset=0x000A
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00

;; ------ DIFF
G_M64250_IG02:  ;; offset=0x0000
       movzx    rax, dl
       xor      eax, 255
						;; size=8 bbWeight=1 PerfScore 0.50
G_M64250_IG03:  ;; offset=0x0008
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
int IntMin1(int x)
{
    return -1 - x;
}
;; ------ BASE
G_M50835_IG02:  ;; offset=0x0000
       mov      eax, edx
       neg      eax
       dec      eax
						;; size=6 bbWeight=1 PerfScore 0.75
G_M50835_IG03:  ;; offset=0x0006
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
;; ------ DIFF
G_M50835_IG02:  ;; offset=0x0000
       mov      eax, edx
       not      eax
						;; size=4 bbWeight=1 PerfScore 0.50
G_M50835_IG03:  ;; offset=0x0004
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 4, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 4, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BoyBaykiller BoyBaykiller changed the title JIT: Transform 'CONST - x' -> 'xor x, CONST' when possible JIT: Transform 'CONST - x' to 'xor x, CONST' when possible Apr 4, 2026
@BoyBaykiller BoyBaykiller marked this pull request as ready for review April 4, 2026 10:27
@BoyBaykiller
Contributor Author

There is one type of regression where GetRangeFromAssertions doesn't handle XOR and we fail to remove a useless JTRUE. I will fix that in a different PR.

@saucecontrol
Member

saucecontrol commented Apr 5, 2026

As a generalization, the rule is that you can implement subtraction as xor any time you can be sure there won't be a borrow, because binary subtraction is defined as an xor combined with borrow propagation.

This is always true when the left operand is a bit pattern of all ones and the right operand is less than or equal to the left. So naturally, it works for subtraction of any integral type if left has all bits set.

It also happens to work when left has all but the MSB set, because in that case, the only borrow could come from beyond the MSB. This is why the optimization works for int.MaxValue even if subtraction is of type uint (meaning right could be larger).

This also means, for example, that with a byte subtraction, 127 - x can always be implemented as xor, provided you then zero-extend the byte, because any borrow will propagate to all the high bits of the register.

The reason I mention this is that if you have a range assertion that can prove right is smaller for any reason, it can apply there too. We have examples where this optimization is done manually in the libraries code:

// LZCNT returns index starting from MSB, whereas BSR gives the index from LSB.
// 31 ^ BSR here is equivalent to 31 - BSR since the BSR result is always between 0 and 31.
// This saves an instruction, as subtraction from constant requires either MOV/SUB or NEG/ADD.
return 31 ^ (int)X86Base.BitScanReverse(value);

@BoyBaykiller
Contributor Author

BoyBaykiller commented Apr 5, 2026

Thanks for the explanation! I was just about to look for the more general case to use in #126553.
Clang seems to catch it: https://godbolt.org/z/e8PjefYKx

Unfortunately I am not sure there is a standard way to get general range information (or an LE assertion) in morph. And it's only run if fgGlobalMorph...

As for handling smaller types: I skipped that because I still need to see what's the proper & easy way to see through this IR that we create for small types and get the actual type as defined in the C# source:

CAST      int <- ubyte <- int
*  LCL_VAR   int    V01 arg1          (last use)

@saucecontrol
Member

saucecontrol commented Apr 6, 2026

what's the proper & easy way to see through this IR that we create for small types

In IL, you'll always have a cast back to the small type after an arithmetic operation, because IL doesn't define those operations for anything smaller than int32. Within JIT, small types may be flagged as normalize on load or normalize on store, so JIT will force zero/sign extension as appropriate there.

Unfortunately I am not sure there is a standard way to get general range information (or LE assertion) in morph.

I'm not terribly familiar with that part of the code, but you can fetch assertions from morph. Here's an example that might lead you to the right parts.

// Small-typed arguments and aliased locals are normalized on load. Other small-typed locals are
// normalized on store. If this is one of the former, insert a narrowing cast on the load.
// ie. Convert: var-short --> cast-short(var-int)
//
if (fgGlobalMorph && lclNode->OperIs(GT_LCL_VAR) && varDsc->lvNormalizeOnLoad() &&
    /* TODO-ASG: delete this zero-diff quirk */ lclNode->CanCSE())
{
    var_types lclVarType = varDsc->TypeGet();
    // Assertion prop can tell us to omit adding a cast here. This is useful when the local is a small-typed
    // parameter that is passed in a register: in that case, the ABI specifies that the upper bits might be
    // invalid, but the assertion guarantees us that we have normalized when we wrote it.
    if (optLocalAssertionProp &&
        optAssertionIsSubrange(lclNode, IntegralRange::ForType(lclVarType), apLocal) != NO_ASSERTION_INDEX)

And you can see some example assertions that could be interesting for this optimization if only they were more specific:

case NI_AVX2_LeadingZeroCount:
case NI_AVX2_TrailingZeroCount:
case NI_AVX2_X64_LeadingZeroCount:
case NI_AVX2_X64_TrailingZeroCount:
case NI_X86Base_PopCount:
case NI_X86Base_X64_PopCount:
    // Note: No advantage in using a precise range for IntegralRange.
    // Example: IntCns = 42 gives [0..127] with a non-precise range, [42,42] with a precise range.
    return {SymbolicIntegerValue::Zero, SymbolicIntegerValue::ByteMax};

I'm not following exactly why the comment says there's no advantage to a more specific range, though.

* use 'IntegralRange::ForNode' for somewhat better range info (could be improved still)
* move to fgOptimizeAddition
@BoyBaykiller
Contributor Author

I've generalized the transformation to catch all cases, given CNS - x and range info for x.
I've switched to IntegralRange::ForNode to attain this range info, but as you said it's somewhat coarse. Apparently there was once a TODO to make it more precise: #106982. I think this should be revisited.

In theory the case you showed would now be handled naturally if the range for BitScanReverse were precise.

Also in order to handle this byte subtract case you mentioned:

byte Byte127(byte x)
{
    return (byte)(127 - x);
}

I need to detect the outer byte cast so I know the range is in [0, 255], which I don't know how to do properly in fgOptimizeAddition. However, 255 - x works.

@BoyBaykiller
Contributor Author

@EgorBo PTAL

If this gets merged, we can justify adding a general RangeOps::Xor to rangecheck (#126553), as that fixes some regressions from this PR.

I think it would also be nice to revisit making IntegralRange able to handle arbitrary ranges instead of just "symbolic constants". With a precise range the JIT would automatically use XOR here, and this code could be simplified to a plain subtract:

// LZCNT returns index starting from MSB, whereas BSR gives the index from LSB.
// 31 ^ BSR here is equivalent to 31 - BSR since the BSR result is always between 0 and 31.
// This saves an instruction, as subtraction from constant requires either MOV/SUB or NEG/ADD.
return 31 ^ (int)X86Base.BitScanReverse(value);

Copilot AI left a comment
Contributor

Pull request overview

This PR improves CoreCLR JIT morph optimizations by recognizing when CONST - x (canonicalized as ADD(NEG(x), CONST)) can be safely rewritten as XOR(x, CONST) using IntegralRange::ForNode range information, and it enables additional VN-based folding in assertion propagation.

Changes:

  • Add an ADD(NEG(x), CONST) => XOR(x, CONST) transform in fgOptimizeAddition guarded by an IntegralRange-based validity check.
  • Update optVNBasedFoldCurStmt to include GT_NOT among the node kinds eligible for VN-based folding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/jit/morph.cpp Adds a range-checked transform to rewrite certain constant subtractions into XOR for better codegen.
src/coreclr/jit/assertionprop.cpp Enables VN-based folding for GT_NOT nodes by adding it to the foldable opcode set.

Comment on lines +10438 to +10439
uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
mask = lo | mask;

Copilot AI Apr 8, 2026


mask computation can invoke undefined behavior when lo == hi (i.e., lo ^ hi == 0): BitOperations::LeadingZeroCount(0) returns 64, so UINT64_MAX >> 64 is an invalid shift in C++. This can miscompile the JIT itself and may incorrectly allow/deny the transform for singleton ranges. Handle the lo == hi case explicitly (e.g., treat the range mask as just lo, or otherwise ensure the shift count is < 64).

Suggested change
uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
mask = lo | mask;
uint64_t diff = lo ^ hi;
uint64_t mask = lo;
if (diff != 0)
{
    mask |= UINT64_MAX >> BitOperations::LeadingZeroCount(diff);
}

@BoyBaykiller
Contributor Author

@EgorBo


DEBUG_DESTROY_NODE(op1);
auto isSubToXorValid = [=](uint64_t cns, IntegralRange range) {
Member

why do we need a single-use lambda here?

Contributor Author

@BoyBaykiller BoyBaykiller Apr 10, 2026

Just a personal style choice. We will use the same lambda in RangeOps::Xor, so this makes the concept clear and allows us to easily factor it out into a shared function later.

{
    add->ChangeOper(GT_XOR);
    add->gtOp1 = op1->gtGetOp1();
    return fgMorphTree(add);
Member

why the path below just returns the changed node while this calls morph on it?

Contributor Author

We may produce XOR(x, -1) which can be further transformed into NOT(x)

uint64_t cns = (uint64_t)op2->AsIntConCommon()->IntegralValue();
if (isSubToXorValid(cns, range))
{
    add->ChangeOper(GT_XOR);
Member

I assume you need to pass PRESERVE_VN here

Contributor Author

Yeah, ok. I saw enough cases of similar transformations in morph where there was no PRESERVE_VN, so I wasn't sure. Also, should I be using ChangeOper or SetOper? @EgorBo

@@ -6118,6 +6037,7 @@ Compiler::fgWalkResult Compiler::optVNBasedFoldCurStmt(BasicBlock* block,
case GT_LSH:
case GT_RSH:
case GT_RSZ:
case GT_NOT:
Member

how is this related to this PR?

Member

I was trying to check the diffs to find out what exactly this transformation helps with, and from my understanding this change adds additional diffs.

Contributor Author

This PR can produce NOT (from XOR(x, -1)), and that exposed the fact that GT_NOT was being ignored by VN-based (constant) folding. The 29009 libraries_tests_no_tiered_compilation diffs contain an example case:
[screenshot of the diff]

And adding this one line fixes the regression. Although if you really want, I can do it in a separate PR.

add->gtOp2 = op1->AsOp()->gtGetOp1();
if (op2->IsIntegralConst())
{
    // ADD(NEG(x), CONST) => XOR(x, CONST)
Member

what makes it legal if ADD (or NEG?) are in checked context again?

Contributor Author

will fix


return add;
// This mask is an OR of all numbers in [lo, hi]
Member

My understanding is that all we need to check is:

(KnownBits(x) & ~CONST) == 0

Contributor Author

what would KnownBits(x) be?
