[mono][jit] Optimize Vector128.Create on Arm64 #91211

Open
matouskozak opened this issue Aug 28, 2023 · 0 comments
matouskozak commented Aug 28, 2023

When a Vector128 is initialized with a constant float/double value (e.g., Vector128.Create(2.0f)), the value is first loaded into a 32/64-bit register (fmov) and then duplicated into a 128-bit register (dup.4s/2d). For floating-point constants that are encodable as an Arm64 immediate, this can be optimized into a single FMOV (vector, immediate) instruction, which copies the immediate constant into every element of the 128-bit register.

With AOT LLVM this is already optimized: it uses fmov.4s/2d to fill the 128-bit register directly.
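For context on which constants qualify: the FMOV immediate forms carry the constant in 8 bits (sign, 3-bit exponent, 4-bit fraction), so only values of the form ±(16..31)/16 × 2^n with n between -3 and 4 can be encoded. The sketch below expands such an 8-bit immediate into single-precision bits. The function names are hypothetical and are not part of Mono.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit copy below assumes float is a 32-bit IEEE 754 type. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: expand an 8-bit FMOV immediate (abcdefgh) into
 * single-precision bits laid out as
 *   a : NOT(b) : bbbbb : cdefgh : 19 zero bits */
uint32_t fmov_imm8_to_float_bits(uint8_t imm8)
{
    uint32_t a = (imm8 >> 7) & 1u;      /* sign bit */
    uint32_t b = (imm8 >> 6) & 1u;      /* replicated exponent bit */
    uint32_t cdefgh = imm8 & 0x3Fu;     /* low exponent bits + fraction */

    uint32_t bits = a << 31;
    bits |= (b ^ 1u) << 30;             /* NOT(b) */
    bits |= (b ? 0x1Fu : 0u) << 25;     /* b repeated five times */
    bits |= cdefgh << 19;
    return bits;
}

/* Hypothetical helper: same expansion, returned as a float value. */
float fmov_imm8_to_float(uint8_t imm8)
{
    uint32_t bits = fmov_imm8_to_float_bits(imm8);
    float f;
    memcpy(&f, &bits, sizeof f);        /* bit copy avoids aliasing issues */
    return f;
}
```

With this expansion, imm8 = 0x00 yields 2.0f (the constant from the example above) and imm8 = 0x70 yields 1.0f, while a value such as 0.1f has no 8-bit encoding and would still need the fmov + dup sequence or a different load.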

Steps to take:

  1. Move the Arm64 floating-point immediate constant check into a separate function so that it can be used for both scalar float/double types and vector types.
    // Arm64 floating-point modified immediate constant check (2 * 128 combinations)
    // Float: aBbbbbbc defgh000 00000000 00000000
    // Double: aBbbbbbb bbcdefgh 00000000 00000000 00000000 00000000 00000000 00000000
    // Trailing zeros check
    if ((r_imm & mask_constant) == 0) {
        // Mask for b
        guint8 mask_b;
        int idx_last_b;
        if (is_double) {
            mask_b = 0xFF;
            idx_last_b = 54;
        } else {
            mask_b = 0x1F;
            idx_last_b = 25;
        }
        guint8 masked_b = (r_imm & ((guint64)mask_b << idx_last_b)) >> idx_last_b;
        int size = is_double ? 64 : 32;
        // NOT(B) == b check
        if (((r_imm & ((guint64)1 << (size - 2))) && masked_b == 0)
            || (!(r_imm & ((guint64)1 << (size - 2))) && masked_b == mask_b)) {
            // imm8 = abcdefgh
            guint8 imm8 = ((r_imm & ((guint64)1 << (size - 1))) >> (size - 8))
                | ((r_imm & ((guint64)1 << idx_last_b)) >> (idx_last_b - 6))
                | ((r_imm & ((guint64)0x3F << (idx_last_b - 6))) >> (idx_last_b - 6));
            arm_fmov_imm(code, (is_double ? 0x01 : 0x00), imm8, dreg);
            break;
        }
    }
  2. Use the check above inside the vector constructor code path to emit a single fmov for immediate constants.
@matouskozak matouskozak added this to the Future milestone Aug 28, 2023
@matouskozak matouskozak self-assigned this Aug 28, 2023
@matouskozak matouskozak changed the title [mono][jit] Optimize Vector124.Create on Arm64 [mono][jit] Optimize Vector128.Create on Arm64 Aug 29, 2023
@SamMonoRT SamMonoRT modified the milestones: Future, 9.0.0 Aug 29, 2023
@matouskozak matouskozak modified the milestones: 9.0.0, Future Feb 16, 2024