[mono][jit] Fuse SIMD extract and insert on arm64 #92714

Merged on Oct 4, 2023 (6 commits)

jandupej
Member

Arm64 has the ins instruction, which copies a single lane of one vector register into a chosen lane of another. This PR takes advantage of it by fusing an extract at a constant index with an insert at a constant index (or with an element-wise constructor) into a single ins. Only float and double vectors are affected, because the corresponding mono operations for integer types are missing. The change affects only the mini JIT, not LLVM.
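To make the idea concrete, here is a minimal sketch of such a fusion over a toy instruction list. It is not the code from this PR: the opcode names (OP_EXTRACT, OP_INSERT, OP_INSERT_LANE), the Ins struct and fuse_extract_insert are all hypothetical, and the sketch assumes that each extracted scalar has no other use and that the source vector is not modified between the extract and the insert.

```c
/* Hypothetical toy IR; these names are illustrative and are not mono's opcodes. */
typedef enum { OP_NOP, OP_EXTRACT, OP_INSERT, OP_INSERT_LANE } Opcode;

typedef struct {
    Opcode op;
    int dreg;      /* destination register */
    int sreg;      /* source register: a vector for EXTRACT and INSERT_LANE, a scalar for INSERT */
    int src_lane;  /* lane read by EXTRACT and INSERT_LANE */
    int dst_lane;  /* lane written by INSERT and INSERT_LANE */
} Ins;

/*
 * Rewrite "s = extract v[i]; d[j] = s" into "d[j] = v[i]".
 * Assumption: the scalar s is used only by that insert, so the extract can
 * be dropped. A real pass would have to verify this.
 * Returns the number of fused pairs.
 */
static int fuse_extract_insert(Ins *code, int n)
{
    int fused = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_INSERT)
            continue;
        /* Scan backwards for the nearest definition of the inserted scalar. */
        for (int j = i - 1; j >= 0; j--) {
            if (code[j].op == OP_NOP || code[j].dreg != code[i].sreg)
                continue;
            if (code[j].op == OP_EXTRACT) {
                code[i].src_lane = code[j].src_lane;
                code[i].sreg = code[j].sreg;   /* read straight from the vector */
                code[i].op = OP_INSERT_LANE;   /* maps to a single ins */
                code[j].op = OP_NOP;           /* the extract is now dead */
                fused++;
            }
            break; /* nearest definition found, fused or not */
        }
    }
    return fused;
}
```

In this toy model, four extracts followed by four inserts collapse into four lane-to-lane inserts, one ins per element.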

This code:

private static Vector128<float> DUT1(Vector128<float> x)
{
    return Vector128.Create(
        Vector128.GetElement(x, 1),
        Vector128.GetElement(x, 2),
        Vector128.GetElement(x, 3),
        Vector128.GetElement(x, 0)); 
}

Previously compiled to:

0000000000000000        stp     x29, x30, [sp, #-0x30]!
0000000000000004        mov     x29, sp
0000000000000008        stp     x0, x1, [x29, #0x20]
000000000000000c        ldr     q0, [x29, #0x20]
0000000000000010        dup.4s  v4, v0[1]
0000000000000014        dup.4s  v3, v0[2]
0000000000000018        dup.4s  v2, v0[3]
000000000000001c        dup.4s  v1, v0[0]
0000000000000020        eor.16b v0, v0, v0
0000000000000024        mov.s   v0[0], v4[0]
0000000000000028        mov.s   v0[1], v3[0]
000000000000002c        mov.s   v0[2], v2[0]
0000000000000030        mov.s   v0[3], v1[0]
0000000000000034        str     q0, [x29, #0x10]
0000000000000038        ldp     x0, x1, [x29, #0x10]
000000000000003c        mov     sp, x29
0000000000000040        ldp     x29, x30, [sp], #0x30
0000000000000044        ret

With this change, the codegen is:

0000000000000000        stp     x29, x30, [sp, #-0x30]!
0000000000000004        mov     x29, sp
0000000000000008        stp     x0, x1, [x29, #0x20]
000000000000000c        eor.16b v0, v0, v0
0000000000000010        ldr     q1, [x29, #0x20]
0000000000000014        mov.s   v0[0], v1[1]
0000000000000018        mov.s   v0[1], v1[2]
000000000000001c        mov.s   v0[2], v1[3]
0000000000000020        mov.s   v0[3], v1[0]
0000000000000024        str     q0, [x29, #0x10]
0000000000000028        ldp     x0, x1, [x29, #0x10]
000000000000002c        mov     sp, x29
0000000000000030        ldp     x29, x30, [sp], #0x30
0000000000000034        ret
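As a sanity check on the two listings, the following sketch models a 4 x float register in plain C and replays both instruction sequences. The V128 type and the helpers dup_lane, ins_lane, rotate_unfused and rotate_fused are hypothetical names introduced only for this illustration. Both sequences should produce the same rotated vector, which is why the four dup instructions can be dropped.

```c
#include <string.h>

/* Toy model of a 128-bit register holding four floats (illustrative only). */
typedef struct { float lane[4]; } V128;

/* dup.4s vd, vn[i]: broadcast lane i of vn to every lane of the result. */
static V128 dup_lane(V128 vn, int i)
{
    V128 r;
    for (int k = 0; k < 4; k++)
        r.lane[k] = vn.lane[i];
    return r;
}

/* mov.s vd[d], vn[s] (an alias of ins): copy one lane, keep the others. */
static V128 ins_lane(V128 vd, int d, V128 vn, int s)
{
    vd.lane[d] = vn.lane[s];
    return vd;
}

/* Replays the longer listing: four dups, zero the result, four lane moves. */
static V128 rotate_unfused(V128 x)
{
    V128 v4 = dup_lane(x, 1), v3 = dup_lane(x, 2);
    V128 v2 = dup_lane(x, 3), v1 = dup_lane(x, 0);
    V128 r;
    memset(&r, 0, sizeof r);            /* eor.16b v0, v0, v0 */
    r = ins_lane(r, 0, v4, 0);
    r = ins_lane(r, 1, v3, 0);
    r = ins_lane(r, 2, v2, 0);
    r = ins_lane(r, 3, v1, 0);
    return r;
}

/* Replays the shorter listing: lane moves read the source lanes directly. */
static V128 rotate_fused(V128 x)
{
    V128 r;
    memset(&r, 0, sizeof r);            /* eor.16b v0, v0, v0 */
    r = ins_lane(r, 0, x, 1);
    r = ins_lane(r, 1, x, 2);
    r = ins_lane(r, 2, x, 3);
    r = ins_lane(r, 3, x, 0);
    return r;
}
```

For an input with lanes (10, 20, 30, 40), both functions yield (20, 30, 40, 10), matching what DUT1 is written to return.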

Addresses #85166.

@SamMonoRT
Member

/azp run

@azure-pipelines
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

Member

@fanyang-mono fanyang-mono left a comment

LGTM!

@jandupej
Member Author

jandupej commented Oct 4, 2023

CI failures seem unrelated. Merging.

@jandupej jandupej merged commit 6486250 into dotnet:main Oct 4, 2023
103 of 106 checks passed
@jandupej jandupej deleted the arm64-fused-ins-ext branch October 4, 2023 12:49
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2023
Successfully merging this pull request may close these issues.

[mono] Fuse Vector128.WithElement and Vector128.GetElement on arm64
3 participants