New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't optimize MultiplyNoFlags away #21928

Merged
merged 1 commit into from Jan 11, 2019

Conversation

Projects
None yet
5 participants
@fiigii
Copy link
Collaborator

fiigii commented Jan 10, 2019

Contribute to #21899

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 10, 2019

@CarolEidt @AndyAyersMS PTAL

The code below was compiled incorrectly that MultiplyNoFlags was optimized away.

        static unsafe ulong test(ulong a, ulong b)
        {
            ulong r;
	    Bmi2.X64.MultiplyNoFlags(a, b, &r);
            return r;
        }

My current solution is to assign the intrinsic call to a local variable. Although it would have a redundant ASG IR for "normal usages" (i.e., res = Bmi2.X64.MultiplyNoFlags(a, b, &r)), the final codegen does not have redundant movs.

@fiigii fiigii force-pushed the fiigii:fixMulx branch from cc28c7d to f6f81ff Jan 10, 2019

@mikedn

This comment has been minimized.

Copy link
Contributor

mikedn commented Jan 10, 2019

Hmm, this intrinsic should be treated as a memory store. Is it? I guess this is another peculiar thing about it - the other memory stores don't return a value AFAIR.

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 10, 2019

Right, the importer usually forces to append the expr into the IR tree if it returns VOID. But this intrinsic has a real return value and memory store semantic simultaneously.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Jan 10, 2019

Do we know why this isn't this covered by it both being a memory store (https://github.com/dotnet/coreclr/blob/master/src/jit/gentree.cpp#L17920) and the intrinsics having been updated to be special handled in fgComputeLifeLIR and fgCheckRemoveStmt (see #16088 for more details)?

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 10, 2019

@tannergooding this happens in the importer and doesn’t reach that point.

@mikedn

This comment has been minimized.

Copy link
Contributor

mikedn commented Jan 10, 2019

Check how GT_CMPXCHG is handled in the importer, maybe it helps. It has the same characteristics - writes to memory and returns a value.

@CarolEidt

This comment has been minimized.

Copy link
Member

CarolEidt commented Jan 10, 2019

I believe that it needs to be marked as having a side-effect. In the GT_CMPXCHG case, this is the GTF_ASG flag. The method GenTree::OperRequiresAsgFlag() is where the conditions for this are checked. I suspect that any memory store intrinsic should also be handled there.

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 10, 2019

I believe that it needs to be marked as having a side-effect. In the GT_CMPXCHG case, this is the GTF_ASG flag.

Ah, thank you so much, will try.

@fiigii fiigii force-pushed the fiigii:fixMulx branch from f6f81ff to 0c9f78d Jan 10, 2019

@fiigii fiigii changed the title [WIP] Don't optimize MultiplyNoFlags away Don't optimize MultiplyNoFlags away Jan 10, 2019

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 10, 2019

Added GTF_GLOB_REF and GT_ASG flags to solve the issue. Checked JitDump that does not have redundant IRs or codegen.

@CarolEidt @AndyAyersMS @mikedn PTAL

@fiigii fiigii force-pushed the fiigii:fixMulx branch from 0c9f78d to 34569bd Jan 10, 2019

@CarolEidt

This comment has been minimized.

Copy link
Member

CarolEidt commented Jan 11, 2019

I think that you should also change GenTree::OperRequiresAsgFlag() to return true for these intrinsics. @erozenfeld would know better than I, but I believe that, even though these intrinsics are setting it in the constructor, we may need this to also return true for checking of flags.

@erozenfeld

This comment has been minimized.

Copy link
Member

erozenfeld commented Jan 11, 2019

Yes, GenTree::OperRequiresAsgFlag() should also be changed. It's used both for flags validation and for recomputing the flags.

@fiigii

This comment has been minimized.

Copy link
Collaborator

fiigii commented Jan 11, 2019

@CarolEidt @erozenfeld OperRequiresAsgFlag has already had the code to check these intrinsics.

coreclr/src/jit/gentree.cpp

Lines 5063 to 5081 in 459b58a

bool GenTree::OperRequiresAsgFlag()
{
if (OperIs(GT_ASG) || OperIs(GT_XADD, GT_XCHG, GT_LOCKADD, GT_CMPXCHG, GT_MEMORYBARRIER))
{
return true;
}
#ifdef FEATURE_HW_INTRINSICS
if (gtOper == GT_HWIntrinsic)
{
GenTreeHWIntrinsic* hwIntrinsicNode = this->AsHWIntrinsic();
if (hwIntrinsicNode->OperIsMemoryStore())
{
// A MemoryStore operation is an assignment
return true;
}
}
#endif // FEATURE_HW_INTRINSICS
return false;
}

@CarolEidt

This comment has been minimized.

Copy link
Member

CarolEidt commented Jan 11, 2019

OperRequiresAsgFlag has already had the code to check these intrinsics.

Wow, right there as I was looking at it, and I missed it! Thanks.

@CarolEidt

This comment has been minimized.

Copy link
Member

CarolEidt commented Jan 11, 2019

test Ubuntu arm Cross Checked Innerloop Build and Test

@CarolEidt
Copy link
Member

CarolEidt left a comment

LGTM - thanks!

@CarolEidt CarolEidt merged commit b3881b4 into dotnet:master Jan 11, 2019

30 checks passed

CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
Linux-musl x64 Debug Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked crossgen_comparison Build and Test Build finished.
Details
Ubuntu arm Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Release crossgen_comparison Build and Test Build finished.
Details
Ubuntu x64 Checked CoreFX Tests Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
Ubuntu16.04 arm64 Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu16.04 arm64 Cross Checked no_tiered_compilation_innerloop Build and Test Build finished.
Details
Windows_NT arm Cross Debug Innerloop Build Build finished.
Details
Windows_NT arm64 Cross Debug Innerloop Build Build finished.
Details
Windows_NT x64 Checked CoreFX Tests Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release CoreFX Tests Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
license/cla All CLA requirements met.
Details

@fiigii fiigii deleted the fiigii:fixMulx branch Jan 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment