Avx2.AlignRight does not generate VPPALIGNR intrinsic #61877

Closed
zvrba opened this issue Nov 20, 2021 · 7 comments

@zvrba

zvrba commented Nov 20, 2021

Description

I have a piece of code that looks like this:

    using V = Vector256<int>;

    [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
    unsafe V Load8(int* v, int c) {
        var m = Avx2.AlignRight(AlternatingMaskLo128, Complement, (byte)((8 - c) << 2));
        return Avx2.BlendVariable(Max, Avx2.MaskLoad(v, m), m);
    }

which is called from various places in the code. When looking at the disassembly, I find the following code:

    00007FF8FCD0A78F  vextractf128 xmm7,ymm6,1
    00007FF8FCD0A795  call        Method stub for: System.Runtime.Intrinsics.X86.Avx2.AlignRight(System.Runtime.Intrinsics.Vector256`1<Int32>, System.Runtime.Intrinsics.Vector256`1<Int32>, Byte) (07FF8FCCFAF58h)
    00007FF8FCD0A79A  vmovupd     ymm0,ymmword ptr [rsi+0D8h]
    00007FF8FCD0A7A2  vmovupd     ymm1,ymmword ptr [rsp+130h]
    00007FF8FCD0A7AB  mov         rax,qword ptr [rsp+168h]
    00007FF8FCD0A7B3  vpmaskmovd  ymm1,ymm1,ymmword ptr [rax]
    00007FF8FCD0A7B8  vmovupd     ymm2,ymmword ptr [rsp+130h]
    00007FF8FCD0A7C1  vpblendvb   ymm8,ymm0,ymm1,ymm2

That is, everything gets nicely inlined and optimized, except that no VPALIGNR instruction is generated: a CALL to a method stub is emitted instead. I cannot even step into the CALL in Visual Studio to see what it does. The code had been executing for a while before I set a breakpoint to disassemble it.

Configuration

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19042.1348 (20H2/October2020Update)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.303
[Host] : .NET Core 3.1.21 (CoreCLR 4.700.21.51404, CoreFX 4.700.21.51508), X64 RyuJIT
DefaultJob : .NET Core 3.1.21 (CoreCLR 4.700.21.51404, CoreFX 4.700.21.51508), X64 RyuJIT

@zvrba zvrba added the tenet-performance Performance related issue label Nov 20, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 20, 2021
@zvrba zvrba changed the title Avx2.AlignRight does not generate PALIGNR intrinsic Avx2.AlignRight does not generate VPPALIGNR intrinsic Nov 20, 2021
@ghost

ghost commented Nov 20, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Author: zvrba
Assignees: -
Labels: area-System.Runtime.Intrinsics, tenet-performance, untriaged

Milestone: -

@saucecontrol
Member

vpalignr requires the byte count to be an immediate (constant) value. Since you are passing a variable count, the JIT cannot emit the instruction inline and must generate a stub that emits the code with the value as a constant.
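
If the count really is only known at run time, one possible workaround is to branch over the allowed values yourself, so that every call site passes a literal. The following is a minimal sketch, assuming `c` ranges from 0 to 8 as the `(8 - c) << 2` expression in the original code suggests. The class `AlignRightSketch` and the helper `AlignRightByElements` are hypothetical names introduced for illustration and are not part of the .NET API.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class AlignRightSketch
{
    // Hypothetical helper: maps a run-time element count c (0..8) to
    // Avx2.AlignRight calls whose byte count is a literal, i.e. (8 - c) << 2.
    // Each case passes a compile-time constant, which is what the
    // instruction's immediate operand requires.
    public static Vector256<int> AlignRightByElements(
        Vector256<int> left, Vector256<int> right, int c)
    {
        switch (c)
        {
            case 0: return Avx2.AlignRight(left, right, 32);
            case 1: return Avx2.AlignRight(left, right, 28);
            case 2: return Avx2.AlignRight(left, right, 24);
            case 3: return Avx2.AlignRight(left, right, 20);
            case 4: return Avx2.AlignRight(left, right, 16);
            case 5: return Avx2.AlignRight(left, right, 12);
            case 6: return Avx2.AlignRight(left, right, 8);
            case 7: return Avx2.AlignRight(left, right, 4);
            case 8: return Avx2.AlignRight(left, right, 0);
            default: throw new ArgumentOutOfRangeException(nameof(c));
        }
    }
}
```

This trades the stub call for a branch in your own code, so whether it is actually faster would need to be measured. When the count is already known at the call site, passing a literal to `Avx2.AlignRight` directly avoids the dispatch altogether.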

@zvrba
Author

zvrba commented Nov 21, 2021

Oh, good catch, thanks. I overlooked that. And C# doesn't have an expressive enough type system to require that this parameter be a compile-time constant.

@zvrba zvrba closed this as completed Nov 21, 2021
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 21, 2021
@jkotas
Member

jkotas commented Nov 21, 2021

We plan to add an analyzer that warns about these situations. See #33771.

@zvrba
Author

zvrba commented Nov 21, 2021

@jkotas One last question: why won't Visual Studio step into stub methods? If I could have seen the disassembled stub (and the generated jump table), I would have figured this out on my own. I even tried pasting the address of the CALL target into the disassembly window, but I got no meaningful result.

@jkotas
Member

jkotas commented Nov 21, 2021

> why won't Visual Studio step into stub methods?

I do not know. I have opened #61890. It is not clear to me whether it is a problem in the runtime or in Visual Studio.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 21, 2021