Get rid of unoptimal moves in a popular ML pattern. #13541
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
optimization
ML.NET code has several places where we do a = a * const_int; for example, MurmurHash has 6 imul instructions in the final asm for x64 (https://github.com/dotnet/machinelearning/blob/b861b5d64841cbe0f2c866ee7586872aac450a51/src/Microsoft.ML.Core/Utilities/Hashing.cs#L118), and in some cases we do them with one extra mov:
instead of
The non-optimal codegen happens when we inline MurmurRound and create a temp LCL_VAR for the hash argument: hash = MurmurRound(hash, (uint)len);
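To make the pattern concrete, here is a minimal C++ sketch of one mixing round and the caller shape that triggers the extra temp. It assumes the standard MurmurHash3 32-bit round constants; the names rotl32, murmur_round and mix_length are hypothetical and are not taken from ML.NET or the JIT.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by r bits (valid for 0 < r < 32).
inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// One MurmurHash3-style 32-bit mixing round.
// ASSUMPTION: standard MurmurHash3 constants, not copied from ML.NET.
inline std::uint32_t murmur_round(std::uint32_t hash, std::uint32_t chunk) {
    chunk *= 0xCC9E2D51u;        // a = a * const_int
    chunk = rotl32(chunk, 15);
    chunk *= 0x1B873593u;        // a = a * const_int
    hash ^= chunk;
    hash = rotl32(hash, 13);
    hash *= 5u;                  // a = a * const_int, on the parameter itself
    hash += 0xE6546B64u;
    return hash;
}

// Caller shape from the issue: the local is both the argument and the
// destination, so the call is the last use of the old value of `hash`.
inline std::uint32_t mix_length(std::uint32_t hash, std::uint32_t len) {
    hash = murmur_round(hash, len);
    return hash;
}
```

Because murmur_round writes to its hash parameter, an inliner that copies the argument into a fresh temp introduces a copy that is only needed if the caller still reads the old value afterwards, which is not the case here.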
IR looks like:
and we want to get rid of
STMT00030.
I have thought about 3 possible places where it could be done:
1: Compiler::optCopyProp
2: ContainCheckMul
3: Compiler::fgInlinePrependStatements, https://github.com/dotnet/coreclr/blob/master/src/jit/flowgraph.cpp#L23243.
I have tried all of them and did not get a good result:
3:
fgInlinePrependStatements
can already replace an argument that is used only once with the original tree. I was able to teach it to replace an argument that was originally a lcl_var with loads of that lcl_var (instead of creating a new one), but only if the argument is not modified inside the inlined method (not our case). That means it supports cases like
We can support defs if we know that
inline myMethod(lclVar0);
is the last use of lclVar0 (our case, because we have hash = call(hash)), but inlining happens before we generate liveness information, so we don't know that call(lclVar0) is the last use of lclVar0.
2:
ContainCheckMul
sets contained only on IsContainableMemoryOp, so it doesn't support moves from one register to another; forcing it to set contained on [000077] gave me many asserts.
1:
Compiler::optCopyProp
looks like the best candidate to handle this extra move, but currently it doesn't work because:
1.1
[000361] D------N---- +--* LCL_VAR long V04 loc1 d:3
doesn't have a VN pair, because it is a phi statement that is processed here: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/valuenum.cpp#L5885-L5890
and there we don't set VNPair for the tree, so copyProp ends on: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L203-L207
If we change that and assign a VNPair for [000361], then we would consider it as a candidate for replacing
[000081] D------N---- +--* LCL_VAR int V08 tmp1 d:3 $286
but it would be declined, because 000361 is long and 000081 is int, so copyProp ends on: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L208-L211
If we fix that, we will still have different VN values, so the copy propagation won't happen:
https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L212-L215
But if we somehow skip these checks (manually in a debugger, for example) and do the propagation, then we get the asm that we want without any asserts in later stages.
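The three rejections described above can be summarized with a toy model. This is only a sketch of the reasoning, not the JIT's code: LclVarNode, CopyPropResult and checkCopyPropCandidate are hypothetical names, and the real checks in copyprop.cpp operate on GenTree nodes and value number pairs.

```cpp
#include <cstdint>

enum class VarType { Int, Long };

// Hypothetical, simplified stand-in for a local-variable node in the IR.
struct LclVarNode {
    VarType       type;
    bool          hasVN; // false models "no VN pair was assigned" (the phi case)
    std::uint32_t vn;    // value number, meaningful only when hasVN is true
};

enum class CopyPropResult {
    NoValueNumber,        // step 1.1: candidate has no VN pair
    TypeMismatch,         // next check: long vs int
    DifferentValueNumber, // last check: VNs differ
    Propagate
};

// Applies the checks in the order the issue walks through them.
inline CopyPropResult checkCopyPropCandidate(const LclVarNode& candidate,
                                             const LclVarNode& use) {
    if (!candidate.hasVN || !use.hasVN) {
        return CopyPropResult::NoValueNumber;
    }
    if (candidate.type != use.type) {
        return CopyPropResult::TypeMismatch;
    }
    if (candidate.vn != use.vn) {
        return CopyPropResult::DifferentValueNumber;
    }
    return CopyPropResult::Propagate;
}
```

With the nodes from the dump, a long V04 without a VN and an int V08 with VN $286, the model stops at the first check; fixing each condition in turn exposes the next one, which matches the sequence described above.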
Note: the moves are cheap, but there are many of them, so I expect this to give us at least a measurable code size improvement.
category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
impact:medium