Optimize Vector<>.op_Division with constant divisor #43759

Closed
EgorBo opened this issue Oct 23, 2020 · 1 comment

EgorBo commented Oct 23, 2020

From Twitter: https://twitter.com/nietras1/status/1319546076756643842

static Vector<int> DivTest(Vector<int> vec) => vec / new Vector<int>(2);

Current codegen:

G_M28418_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 144
       vzeroupper 
       mov      rsi, rcx
G_M28418_IG02:
       mov      ecx, 2
       vmovd    xmm0, ecx
       vpbroadcastd ymm0, ymm0
       vmovupd  ymm1, ymmword ptr[rdx]
       vmovupd  ymmword ptr[rsp+50H], ymm1
       vmovupd  ymmword ptr[rsp+30H], ymm0
       vxorps   ymm0, ymm0
       vmovupd  ymmword ptr[rsp+70H], ymm0
       xor      rdi, rdi
G_M28418_IG03:
       test     rdi, rdi
       jl       SHORT G_M28418_IG05
G_M28418_IG04:
       cmp      rdi, 8
       setl     cl
       movzx    rcx, cl
       jmp      SHORT G_M28418_IG06
G_M28418_IG05:
       xor      ecx, ecx
G_M28418_IG06:
       mov      rdx, 0xD1FFAB1E
       mov      rbx, gword ptr [rdx]
       mov      rdx, rbx
       mov      gword ptr [rsp+28H], rdx
       mov      rax, rdx
       test     cl, cl
       jne      SHORT G_M28418_IG08
G_M28418_IG07:
       mov      rcx, rax
       mov      rdx, rax
       call     System.Diagnostics.Debug:Fail(System.String,System.String)  ;; Checked config.
G_M28418_IG08:
       lea      rcx, bword ptr [rsp+50H]
       mov      ebx, dword ptr [rcx+4*rdi]
       test     rdi, rdi
       jl       SHORT G_M28418_IG10
G_M28418_IG09:
       cmp      rdi, 8
       setl     cl
       movzx    rcx, cl
       jmp      SHORT G_M28418_IG11
G_M28418_IG10:
       xor      ecx, ecx
G_M28418_IG11:
       mov      rdx, gword ptr [rsp+28H]
       mov      rax, rdx
       test     cl, cl
       jne      SHORT G_M28418_IG13
G_M28418_IG12:
       mov      rcx, rax
       mov      rdx, rax
       call     System.Diagnostics.Debug:Fail(System.String,System.String)
G_M28418_IG13:
       lea      rcx, bword ptr [rsp+30H]
       mov      edx, dword ptr [rcx+4*rdi]
       mov      ecx, ebx
       call     System.Numerics.Vector`1[Int32][System.Int32]:ScalarDivide(int,int):int ;; not inlined?
       mov      ebx, eax
       test     rdi, rdi
       jl       SHORT G_M28418_IG15
G_M28418_IG14:
       cmp      rdi, 8
       setl     cl
       movzx    rcx, cl
       jmp      SHORT G_M28418_IG16
G_M28418_IG15:
       xor      ecx, ecx
G_M28418_IG16:
       mov      rdx, gword ptr [rsp+28H]
       test     cl, cl
       jne      SHORT G_M28418_IG18
G_M28418_IG17:
       mov      rcx, rdx
       call     System.Diagnostics.Debug:Fail(System.String,System.String)
G_M28418_IG18:
       lea      rax, bword ptr [rsp+70H]
       mov      dword ptr [rax+4*rdi], ebx
       inc      rdi
       cmp      rdi, 8
       jl       G_M28418_IG03
G_M28418_IG19:
       vmovupd  ymm0, ymmword ptr[rsp+70H]
       vmovupd  ymmword ptr[rsi], ymm0
       mov      rax, rsi
G_M28418_IG20:
       vzeroupper 
       add      rsp, 144
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
; Total bytes of code 267

Expected codegen:

DivTest: 
        vpsrld  ymm1, ymm0, 31
        vpaddd  ymm0, ymm0, ymm1
        vpsrad  ymm0, ymm0, 1
        ret

x86 has no instructions for dividing vectors of integers. Still, since we expose operator / and users assume it is accelerated just like the other Vector APIs, it makes sense to reuse the "magic division" logic we already have for scalars when the divisor is a constant Vector,
at least for power-of-two divisors.
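
To make the power-of-two case concrete, here is a minimal scalar sketch of what each lane of the expected codegen computes. The names `PowerOfTwoDivisionSketch` and `DivideByPowerOfTwo` are hypothetical and used only for illustration; this is not the JIT's implementation, and it assumes 32-bit lanes with `1 <= k <= 30`. For `k = 1` the bias reduces to the sign bit, which lines up with the `vpsrld` / `vpaddd` / `vpsrad` sequence above.

```csharp
using System;

static class PowerOfTwoDivisionSketch
{
    // Hypothetical helper: signed division by 2^k using only shifts and an add.
    // Assumes 1 <= k <= 30 so that (1 << k) is a positive int.
    public static int DivideByPowerOfTwo(int x, int k)
    {
        // x >> 31 is all ones for negative x and zero otherwise.
        // The logical shift keeps the low k bits, so bias is 2^k - 1 for
        // negative x and 0 for non-negative x.
        int bias = (int)((uint)(x >> 31) >> (32 - k));

        // Adding the bias makes the arithmetic shift round toward zero,
        // which matches the semantics of C# integer division.
        return (x + bias) >> k;
    }

    public static void Main()
    {
        int[] samples = { 0, 1, -1, 7, -7, 8, -8, int.MaxValue, int.MinValue };
        foreach (int x in samples)
        {
            if (DivideByPowerOfTwo(x, 1) != x / 2)
                throw new Exception($"mismatch for {x}");
        }
        Console.WriteLine("all samples match x / 2");
    }
}
```

A vectorized version would apply the same three operations to every lane at once, which is why no per-element loop or call to ScalarDivide should be needed for this case.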

Bonus points:

  1. Revise the current implementation; it looks less efficient than https://godbolt.org/z/81r66h (loops are not unrolled?)
  2. File an API proposal for operator / with a scalar divisor (operator * has a scalar overload, division does not)
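
To illustrate the asymmetry behind the second point, here is a small sketch. `DivideByScalar` is a hypothetical helper, not a framework API; it only shows the broadcast a caller has to write when no scalar division overload is available, as was the case when this issue was filed.

```csharp
using System;
using System.Numerics;

static class ScalarOverloadSketch
{
    // Hypothetical helper (not a framework API): broadcasts the scalar
    // divisor and reuses the Vector<int> / Vector<int> operator.
    public static Vector<int> DivideByScalar(Vector<int> value, int divisor)
        => value / new Vector<int>(divisor);

    public static void Main()
    {
        var v = new Vector<int>(6);

        // Multiplication accepts a scalar factor directly.
        Vector<int> tripled = v * 3;

        // Division goes through an explicit broadcast in this sketch.
        Vector<int> halved = DivideByScalar(v, 2);

        Console.WriteLine(tripled[0]); // 18
        Console.WriteLine(halved[0]);  // 3
    }
}
```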

/cc @tannergooding

@EgorBo added the area-System.Numerics and tenet-performance (Performance related issue) labels on Oct 23, 2020

ghost commented Oct 23, 2020

Tagging subscribers to this area: @tannergooding, @pgovind, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@Dotnet-GitSync-Bot added the untriaged (New issue has not been triaged by the area owner) label on Oct 23, 2020
@EgorBo closed this as completed on Oct 23, 2020
@ghost locked as resolved and limited conversation to collaborators on Dec 6, 2020
@tannergooding removed the untriaged (New issue has not been triaged by the area owner) label on Jun 24, 2024