Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of unoptimal moves in a popular ML pattern. #13541

Open
sandreenko opened this issue Oct 7, 2019 · 5 comments
Open

Get rid of unoptimal moves in a popular ML pattern. #13541

sandreenko opened this issue Oct 7, 2019 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@sandreenko
Copy link
Contributor

sandreenko commented Oct 7, 2019

ML.Net code has several places where we do a = a * const_int;, for example, MurmurHash has 6 imul instructions in the final asm for x64 https://github.com/dotnet/machinelearning/blob/b861b5d64841cbe0f2c866ee7586872aac450a51/src/Microsoft.ML.Core/Utilities/Hashing.cs#L118
and in some cases, we do them with one extra mov:

IN0054: 000127 mov      eax, r15d
IN0055: 00012A imul     eax, eax, 0xFFFFFFFFCC9E2D51 (2 instructions, 9 bytes)

instead of

IN0055: 00012A imul     eax, r15d, 0xFFFFFFFFCC9E2D51 (1 instruction, 7 bytes)

the non-optimal codegen happens when we inline MurmurRound and create a temp LCL_VAR for hash argument: hash = MurmurRound(hash, (uint)len);

IR looks like:

***** BB02
STMT00068 (IL   ???...  ???)
[000363] -A------R---              *  ASG       long  
[000361] D------N----              +--*  LCL_VAR   long   V04 loc1         d:3
[000362] ------------              \--*  PHI       long  
[000385] ------------ pred BB04       +--*  PHI_ARG   long   V04 loc1         u:5
[000364] ------------ pred BB08       \--*  PHI_ARG   long   V04 loc1         u:2 $100

***** BB03
STMT00030 (IL 0x010...  ???)
[000125] -A------R---              *  ASG       int    $285
[000124] D------N----              +--*  LCL_VAR   int    V08 tmp1         d:2 $285
[000030] C-----------              \--*  LCL_VAR   int    V04 loc1         u:3 $285

***** BB03
STMT00020 (IL 0x010...  ???)
[000082] -A------R---              *  ASG       int    $286
[000081] D------N----              +--*  LCL_VAR   int    V08 tmp1         d:3 $286
[000079] ------------              \--*  MUL       int    $286
[000077] ------------                 +--*  LCL_VAR   int    V08 tmp1         u:2 (last use) $285
[000078] ------------                 \--*  CNS_INT   int    0xffffffffcc9e2d51 $49

and we want to get rid of STMT00030.

I have thought about 3 possible places where it could be done:

  1. copy propagation https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L131;
  2. lowering set contained for mul https://github.com/dotnet/coreclr/blob/0a00ee7fdfd113c8c2d47c85ed210de78cab4bdd/src/jit/lowerxarch.cpp#L1644;
  3. where we create a lclVar for the argument, Compiler::fgInlinePrependStatements, https://github.com/dotnet/coreclr/blob/master/src/jit/flowgraph.cpp#L23243.

I have tried all of them and did not get a good result,

3: fgInlinePrependStatements already can replace an argument that was single used with the original tree, I was able to teach it to replace an argument that was originally a lcl_var with this lcl_var loads (instead of creating a new one), but only if the argument was not modified inside the inline method (not our case).
That means it supports cases like

inline myMethod(lclVar0);
where myMethod(arg)
{
  multiple uses of arg, but no defs.
}

we can support defs if we now that inline myMethod(lclVar0); is the last use of lclVar0 (our case, because we have hash = call(hash)), but it happens before we generate live information, so we don't know that call(lclVar0) is the last use of lclVar0.

2: ContainCheckMul set contained on IsContainableMemoryOp, so it doesn't support moves from one register to another, forcing it to set contained on [000077] gave me many asserts.

1: Compiler::optCopyProp looks like the best candidate to handle this extra move, but currently, it doesn't work because:
1.1 [000361] D------N---- +--* LCL_VAR long V04 loc1 d:3 doesn't have a VN pair, because it is a phi statement that is processed here:
https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/valuenum.cpp#L5885-L5890
and there we don't set VNPair for the tree, so copyProp ends on:
https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L203-L207

if we change that and assign a VNPair for [000361] then we would consider it as a candidate for [000081] D------N---- +--* LCL_VAR int V08 tmp1 d:3 $286 replacement, but it would be declined, because 000361 is long and 000081 is int, so copyProp ends on:
https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L208-L211
If we fix that we will still have different VN values so the copy propagation won't happen:
https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L212-L215

but if somehow we skip these checks (manually in a debugger for example) and do the propagation, then we have asm that we want without any asserts in later stages.

Note: the moves are cheap but there are many of them so I expect it to give us at least measurable code size improvement.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
impact:medium

@sandreenko
Copy link
Contributor Author

I would like to continue working on this case, could you give me some advice on how to move forward, @AndyAyersMS @CarolEidt @dotnet/jit-contrib?

@mikedn
Copy link
Contributor

mikedn commented Oct 7, 2019

Compiler::optCopyProp looks like the best candidate to handle this extra move, but currently, it doesn't work because:

It does, though this might be a special case that I haven't noticed before - the source of the copy is a PHI.

but it would be declined, because 000361 is long and 000081 is int, so copyProp ends on:

Ah, that's weird. I suppose the variable type is actually long and the JIT retypes the LCL_VAR node to avoid inserting a cast. A bad idea IMO. Granted, adding back the cast won't actually fix the problem since this is no longer a copy. Though then the cast could perhaps be made contained before codegen (since such a narrowing cast is basically a no-op).

@mikedn
Copy link
Contributor

mikedn commented Oct 7, 2019

Though I suppose you may as well modify if (op->TypeGet() != tree->TypeGet()) to recognize this particular pattern of long->int reinterpretation. When in Rome...

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@AndyAyersMS
Copy link
Member

@sandreenko I take it you are no longer actively working on this? Seems like we should move it to future.

@sandreenko
Copy link
Contributor Author

I agree with that.

@AndyAyersMS AndyAyersMS modified the milestones: 5.0, Future May 1, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@sandreenko sandreenko removed the JitUntriaged CLR JIT issues needing additional triage label Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

5 participants