AVX512: Fold some bitwise operations to vpternlogq #84534

Closed
EgorBo opened this issue Apr 9, 2023 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue

@EgorBo
Member

EgorBo commented Apr 9, 2023

E.g.

bool Test(string s) => s == "https://pkgs.dev.azure.com/dnc";

Currently emits:

; Method Prog:Test(System.String):bool:this
       C5F877               vzeroupper 
       4885D2               test     rdx, rdx
       7431                 je       SHORT G_M52811_IG05
       837A081E             cmp      dword ptr [rdx+08H], 30
       752B                 jne      SHORT G_M52811_IG05
       C5FC10420C           vmovups  ymm0, ymmword ptr[rdx+0CH]
       C5FDEF0535000000     vpxor    ymm0, ymm0, ymmword ptr[reloc @RWD00]
       C5FC104A28           vmovups  ymm1, ymmword ptr[rdx+28H]
       C5F5EF0D48000000     vpxor    ymm1, ymm1, ymmword ptr[reloc @RWD32]
       C5FDEBC1             vpor     ymm0, ymm0, ymm1
       C4E27D17C0           vptest   ymm0, ymm0
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
       EB02                 jmp      SHORT G_M52811_IG06
G_M52811_IG05:              
       33C0                 xor      eax, eax
G_M52811_IG06:              
       C5F877               vzeroupper 
       C3                   ret      
RWD00  	dq	0070007400740068h, 002F002F003A0073h, 00730067006B0070h, 007600650064002Eh
RWD32  	dq	0061002E00760065h, 006500720075007Ah, 006D006F0063002Eh, 0063006E0064002Fh
; Total bytes of code: 63
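
The listing compares the 30-character string with two overlapping 32-byte loads rather than a loop. A minimal Python sketch of that idea follows. The helper names `window_qwords` and `equals_target` are hypothetical, and the claim that the first character sits at offset 0CH is read off the listing, not taken from runtime documentation. The sketch also assumes BMP-only input so that Python's `len` matches the UTF-16 length.

```python
import struct

TARGET = "https://pkgs.dev.azure.com/dnc"  # the literal from the example (30 chars)

def window_qwords(text, first_char):
    """Return the four little-endian qwords covering 16 UTF-16 chars of text."""
    raw = text.encode("utf-16-le")[2 * first_char : 2 * first_char + 32]
    return struct.unpack("<4Q", raw)

def equals_target(s):
    """Emulate the listing: null/length check, two XORs, one OR, one zero test."""
    if s is None or len(s) != len(TARGET):
        return False
    acc = 0
    # Windows start at chars 0 and 14, i.e. byte offsets 0CH and 28H in the listing.
    for first_char in (0, len(TARGET) - 16):
        got = window_qwords(s, first_char)
        want = window_qwords(TARGET, first_char)
        for g, w in zip(got, want):
            acc |= g ^ w          # vpxor against the constant, then vpor
    return acc == 0               # vptest + sete

# The first window reproduces the RWD00 constants from the listing.
print([f"{q:016X}h" for q in window_qwords(TARGET, 0)])
print(equals_target(TARGET))
```

The second window overlaps the first by two characters, which is harmless for an equality test and avoids a separate tail comparison.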

where for

       C5F5EF0D48000000     vpxor    ymm1, ymm1, ymmword ptr[reloc @RWD32]
       C5FDEBC1             vpor     ymm0, ymm0, ymm1

we could emit:

                            vpternlogq      ymm0, ymm1, ymmword ptr [reloc @RWD32], 246

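on CPUs with AVX-512 support (the 256-bit ymm form also needs AVX512VL). Here 246 (0xF6) is the truth-table immediate for ymm0 | (ymm1 ^ mem). The same folding applies to other bitwise patterns that combine three inputs.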
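
The immediate is an 8-entry truth table: for each bit position, the three input bits form an index (first operand as the high bit, memory operand as the low bit), and the imm8 bit at that index is the result. A small sketch of how the immediate for a given expression can be derived; `ternlog_imm8` is a hypothetical helper name, not part of the JIT or any library.

```python
def ternlog_imm8(f):
    """Build the vpternlog imm8 for a three-input bitwise function f(a, b, c).

    a is the first (destination) operand, b the second, c the third (memory).
    """
    imm = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                index = (a << 2) | (b << 1) | c       # truth-table row picked by the inputs
                imm |= (f(a, b, c) & 1) << index      # store that row's result bit
    return imm

# vpxor ymm1, ymm1, mem followed by vpor ymm0, ymm0, ymm1 computes a | (b ^ c).
imm = ternlog_imm8(lambda a, b, c: a | (b ^ c))
print(imm, hex(imm))
```

The same helper gives the immediates for other three-input patterns, for example `a ^ b ^ c` or `a & b & c`.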

Reference: https://godbolt.org/z/Tx53eKxf9

llvm-mca diff: https://www.diffchecker.com/UxW51oqr/

@EgorBo EgorBo added the tenet-performance Performance related issue label Apr 9, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 9, 2023
@EgorBo EgorBo added this to the Future milestone Apr 9, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 9, 2023
@ghost

ghost commented Apr 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

@EgorBo EgorBo added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Apr 9, 2023
@BruceForstall BruceForstall added the avx512 Related to the AVX-512 architecture label Apr 10, 2023
@BruceForstall
Member

@dotnet/avx512-contrib

@DeepakRajendrakumaran
Contributor

The full list of operations is in the Intel Software Developer's Manual: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html (Section 5.1)
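
To sanity-check a candidate fold before wiring it into codegen, the instruction can be emulated on one 64-bit lane and compared with the two-instruction sequence it replaces. This is only a sketch: `vpternlogq_lane`, `FOLDS`, and `folds_agree` are hypothetical names, and the table below is an illustrative selection, not the manual's list.

```python
import random

def vpternlogq_lane(a, b, c, imm8):
    """Apply the imm8 truth table bit by bit to one 64-bit lane."""
    result = 0
    for bit in range(64):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result

# Illustrative immediates and the scalar expressions they are expected to match.
FOLDS = {
    0xF6: lambda a, b, c: a | (b ^ c),   # the vpxor + vpor pattern from this issue
    0x96: lambda a, b, c: a ^ b ^ c,
    0xFE: lambda a, b, c: a | b | c,
    0x80: lambda a, b, c: a & b & c,
}

def folds_agree(trials=200, seed=0):
    """Compare the emulated instruction with each expression on random lanes."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.getrandbits(64) for _ in range(3))
        for imm8, expr in FOLDS.items():
            if vpternlogq_lane(a, b, c, imm8) != expr(a, b, c):
                return False
    return True

print(folds_agree())
```

Because the operation is purely bitwise, checking all eight input combinations per immediate would already be exhaustive; the random lanes are just a convenient way to exercise the emulation.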

@Ruihan-Yin
Contributor

Hi @EgorBo, @anthonycanino and I will be working on this issue. We hope to have a draft PR shortly.

@EgorBo
Member Author

EgorBo commented Oct 21, 2023

Closed by #91227

@EgorBo EgorBo closed this as completed Oct 21, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 20, 2023