Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] X64 - More replacement sequences for integer multiplication by a constant #77137

Merged
merged 57 commits into from Nov 5, 2022

Conversation

TIHan
Copy link
Member

@TIHan TIHan commented Oct 17, 2022

Description

Resolves this issue: #75119 - we are only planning on doing 3-instruction replacement sequences or less, no more than that.

This PR does these two optimizations:

-       mov      edx, eax
-       shl      edx, 2
+       lea      edx, [4*rax]
-       mov      edx, eax
-       shl      edx, 3
+       lea      edx, [8*rax]

This PR also lifts the restriction that only allowed GT_LCL_VAR to be the first operand for "multiply by constant"; now it will create a tmp local if needs to - this ensures that we can take advantage of these optimizations.

I wanted to keep the "multiply by constant" optimizations in one place, and that place is lowering. Doing this required to disable any "multiply by constant" optimizations from happening in Tier-0, which I think is reasonable. This means Tier-0 will always emit imul, except for the cases that can emit lea.

We only do the "multiply by constant" -> lea instruction in codegen.

Notes:
There are other replacement sequences that were considered, but ultimately, those replacement sequences have higher latency totals than the single imul for modern CPUs. These cases are described and tested in the IntMultiply disasm tests.

Microbenchmark Results

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    namespace Perf
    {
        public class Multiply
        {
            static ulong fieldValue = 1;

            [Benchmark]
            public void shift_2()
            {
                ulong value = fieldValue;
                for (ulong i = 1; i < 10000000; i++)
                {
                    value = i << 2;
                }
                fieldValue = value;
            }

            [Benchmark]
            public void shift_3()
            {
                ulong value = fieldValue;
                for (ulong i = 1; i < 10000000; i++)
                {
                    value = i << 3;
                }
                fieldValue = value;
            }
        }
        static class Program
        {
            static int Main(string[] args)
            {
                BenchmarkRunner.Run<Multiply>(null, args);
                return 0;
            }
        }
    }

Before:

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.674)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2


|  Method |     Mean |     Error |    StdDev |
|-------- |---------:|----------:|----------:|
| shift_2 | 1.842 ms | 0.0032 ms | 0.0030 ms |
| shift_3 | 1.842 ms | 0.0025 ms | 0.0023 ms |

After:

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.674)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), X64 RyuJIT AVX2
  Job-GILPZC : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Toolchain=CoreRun

|  Method |     Mean |     Error |    StdDev |
|-------- |---------:|----------:|----------:|
| shift_2 | 1.828 ms | 0.0048 ms | 0.0045 ms |
| shift_3 | 1.830 ms | 0.0052 ms | 0.0049 ms |

Acceptance Criteria

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 17, 2022
@ghost ghost assigned TIHan Oct 17, 2022
@ghost
Copy link

ghost commented Oct 17, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description
TBD

Refactors some of the codegen multiply optimizations by moving them to lowering.

Acceptance Criteria

Author: TIHan
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Oct 21, 2022

Can you please run fuzz pipelines yourself once you're done with changes?

@TIHan
Copy link
Member Author

TIHan commented Oct 21, 2022

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TIHan
Copy link
Member Author

TIHan commented Oct 21, 2022

Looks like Fuzzlyn x64 passed. I need to update one of the disasm checks as I forgot a minor thing.

@tannergooding
Copy link
Member

tannergooding commented Oct 23, 2022

What about:

-       mov      edx, eax
-       shl      edx, 1
+       lea      edx, [2*rax]

@TIHan
Copy link
Member Author

TIHan commented Oct 24, 2022

@tannergooding We already handle that case for x << 1 in codegen, it emits this:

lea rax, [rcx+rcx]

@TIHan
Copy link
Member Author

TIHan commented Oct 24, 2022

@dotnet/jit-contrib This is ready again - I fixed the tests and one of the earlier commits did pass fuzzlyn.

@build-analysis build-analysis bot mentioned this pull request Oct 25, 2022
2 tasks
@BruceForstall
Copy link
Member

Have you created a BDN microbenchmark to show the performance before/after your optimizations?

@TIHan
Copy link
Member Author

TIHan commented Oct 26, 2022

Have you created a BDN microbenchmark to show the performance before/after your optimizations?

Have not, but will do that now.

@TIHan
Copy link
Member Author

TIHan commented Oct 26, 2022

@BruceForstall I've provided microbenchmark results in the description of this PR.

@BruceForstall
Copy link
Member

Your SuperPMI jobs are failing with:

Traceback (most recent call last):

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 1720, in create_one_artifact

    with open(item_path, 'r') as file_handle:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\h\\w\\B0FF09A8\\p\\artifacts\\spmi\\asm.coreclr_tests.run.windows.x64.checked\\base\\18886.dasm'



The above exception was the direct cause of the following exception:



Traceback (most recent call last):

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 4451, in <module>

    sys.exit(main(args))

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 4342, in main

    success = asm_diffs.replay_with_asm_diffs()

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 1736, in replay_with_asm_diffs

    subproc_helper.run_to_completion(create_replay_artifacts, self, mch_file, asm_dotnet_vars_full_env, text_differences, base_asm_location, diff_asm_location, ".dasm")

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 601, in run_to_completion

    loop.run_until_complete(self.__run_to_completion__(async_callback, *extra_args))

  File "C:\python3\lib\asyncio\base_events.py", line 647, in run_until_complete

    return future.result()

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 584, in __run_to_completion__

    await asyncio.gather(*tasks)

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 556, in __get_item__

    await async_callback(print_prefix, item, *extra_args)

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 1726, in create_replay_artifacts

    base_txt = await create_one_artifact(self.base_jit_path, base_location, flags + base_option_flags_for_diff_artifact)

  File "C:\h\w\B0FF09A8\p\superpmi.py", line 1723, in create_one_artifact

    raise create_exception() from err

Exception: Failure while creating JitStdOutFile.

Exit code: 0

This has been fixed. Maybe you just need to rebase/re-push to trigger updated CI testing?

@BruceForstall
Copy link
Member

@TIHan Note that the formatting job failed. If you don't already, try to get into the habit of running "jit-format -f" before pushing a PR change to GitHub.

Also, it looks like the unix-x64 superpmi-diffs job failed for unknown reasons.

@TIHan
Copy link
Member Author

TIHan commented Nov 4, 2022

@dotnet/jit-contrib @BruceForstall looks like CI is passing - is there anything else that I should do in the PR?

@BruceForstall
Copy link
Member

Diffs

@TIHan TIHan merged commit 2406653 into dotnet:main Nov 5, 2022
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants