[mono][jit] Optimize Vector128.Create on Arm64 #91211

matouskozak · 2023-08-28T13:41:35Z

When a Vector128 is initialized with a constant float/double value (e.g., Vector128.Create(2.0f)), the value is first loaded into a 32/64-bit register (fmov) and then duplicated into a 128-bit register (dup.4s/2d). For floating-point constants that can be encoded as an Arm64 immediate, this can be reduced to a single FMOV (vector, immediate), which copies the immediate constant into every element of the 128-bit register.

With AOT LLVM the behavior is already optimized and it uses fmov.4s/2d to directly fill the 128-bit register.

Steps to take:

Move the Arm64 immediate floating-point constant check functionality to a separate function so that it can be used both for float/double types as well as for vector types.

runtime/src/mono/mono/mini/mini-arm64.c

Lines 4876 to 4906 in ff2de36

    // Arm64 floating-point modified immediate constant check (2 * 128 combinations)
    // Float:  aBbbbbbc defgh000 00000000 00000000
    // Double: aBbbbbbb bbcdefgh 00000000 00000000 00000000 00000000 00000000 00000000
    // Trailing zeros check
    if ((r_imm & mask_constant) == 0) {
    	// Mask for b
    	guint8 mask_b;
    	int idx_last_b;
    	if (is_double) {
    		mask_b = 0xFF;
    		idx_last_b = 54;
    	} else {
    		mask_b = 0x1F;
    		idx_last_b = 25;
    	}
    	guint8 masked_b = (r_imm & ((guint64)mask_b << idx_last_b)) >> idx_last_b;
    	int size = is_double ? 64 : 32;
    	// NOT(B) == b check
    	if (((r_imm & ((guint64)1 << (size - 2))) && masked_b == 0)
    	|| (!(r_imm & ((guint64)1 << (size - 2))) && masked_b == mask_b)) {
    		// imm8 = abcdefgh
    		guint8 imm8 = ((r_imm & ((guint64)1 << (size - 1))) >> (size - 8))
    					| ((r_imm & ((guint64)1 << idx_last_b)) >> (idx_last_b - 6))
    					| ((r_imm & ((guint64)0x3F << (idx_last_b - 6))) >> (idx_last_b - 6));
    		arm_fmov_imm(code, (is_double ? 0x01 : 0x00), imm8, dreg);
    		break;
    	}
    }
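One way the check above could be pulled out into a standalone function is sketched below. This is only an illustration under stated assumptions: the names `fp_bits_to_imm8`, `float_to_imm8` and `double_to_imm8` are hypothetical and not part of the Mono sources, and plain C99 types stand in for `guint8`/`guint64`. The logic mirrors the excerpt: the bits below `h` must be zero, all `b` bits must be equal, and `B` must be the inverse of `b`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Mono API): returns true and stores
 * imm8 = abcdefgh when the raw IEEE 754 bits match the Arm64 8-bit
 * floating-point immediate pattern.
 *   Float:  aBbbbbbc defgh000 00000000 00000000
 *   Double: aBbbbbbb bbcdefgh 00000000 ... 00000000 */
static bool
fp_bits_to_imm8 (uint64_t bits, bool is_double, uint8_t *imm8)
{
	int size = is_double ? 64 : 32;
	int idx_last_b = is_double ? 54 : 25;      /* lowest bit of the b run */
	uint64_t mask_b = is_double ? 0xFF : 0x1F; /* all-ones value of the b run */

	/* Trailing zeros check: every bit below h must be clear. */
	uint64_t low_mask = ((uint64_t)1 << (idx_last_b - 6)) - 1;
	if (bits & low_mask)
		return false;

	/* NOT(B) == b check: B set with all b clear, or B clear with all b set. */
	uint64_t b_run = (bits >> idx_last_b) & mask_b;
	bool big_b = ((bits >> (size - 2)) & 1) != 0;
	if (!((big_b && b_run == 0) || (!big_b && b_run == mask_b)))
		return false;

	/* imm8 = a : b : cdefgh */
	uint64_t sign = (bits >> (size - 1)) & 1;
	*imm8 = (uint8_t)((sign << 7) | ((bits >> (idx_last_b - 6)) & 0x7F));
	return true;
}

/* Convenience wrappers that reinterpret the value as its raw bits. */
static bool
float_to_imm8 (float value, uint8_t *imm8)
{
	uint32_t bits;
	memcpy (&bits, &value, sizeof (bits));
	return fp_bits_to_imm8 (bits, false, imm8);
}

static bool
double_to_imm8 (double value, uint8_t *imm8)
{
	uint64_t bits;
	memcpy (&bits, &value, sizeof (bits));
	return fp_bits_to_imm8 (bits, true, imm8);
}
```

Under this sketch, 2.0f maps to imm8 0x00 and 1.0f to 0x70, while 0.0f and 0.1f are rejected because they do not fit the pattern. Returning a bool with an out-parameter lets both the scalar and the vector code paths fall back to their existing sequences when the constant is not encodable.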

Use the check above inside the vector constructor code path to emit a single fmov for immediate constants.
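For reference, the instruction word that the vector path would need to produce could look roughly like the sketch below. It assumes the FMOV (vector, immediate) layout `0 Q op 0111100000 a b c 1111 0 1 d e f g h Rd` as I understand it from the Arm architecture reference, with Q = 1 for the 128-bit forms and op selecting 4S (0) or 2D (1); this should be verified against the manual before use. The function name `encode_fmov_vec_imm` is hypothetical and does not correspond to an existing Mono macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder (not a Mono macro) for the 128-bit forms of
 * FMOV (vector, immediate): fmov Vd.4s, #imm and fmov Vd.2d, #imm.
 * imm8 is the abcdefgh value produced by the immediate check. */
static uint32_t
encode_fmov_vec_imm (bool is_double, uint8_t imm8, unsigned rd)
{
	uint32_t op = is_double ? 1u : 0u;      /* assumed: 0 = 4S, 1 = 2D */
	uint32_t abc = (imm8 >> 5) & 0x7u;      /* upper three immediate bits */
	uint32_t defgh = imm8 & 0x1Fu;          /* lower five immediate bits */

	return (1u << 30)        /* Q = 1: operate on the full 128-bit register */
	     | (op << 29)
	     | (0x0Fu << 24)     /* fixed bits 28..24 = 01111 */
	     | (abc << 16)
	     | (0xFu << 12)      /* cmode = 1111 */
	     | (1u << 10)        /* fixed bit 10 */
	     | (defgh << 5)
	     | (rd & 0x1Fu);     /* destination vector register */
}
```

With imm8 = 0x70 (the encoding of 1.0) and destination register 0, this sketch yields 0x4F03F600 for the 4S form and 0x6F03F600 for the 2D form.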


matouskozak added the area-Codegen-JIT-mono label Aug 28, 2023

matouskozak added this to the Future milestone Aug 28, 2023

matouskozak self-assigned this Aug 28, 2023

matouskozak changed the title ~~[mono][jit] Optimize Vector124.Create on Arm64~~ [mono][jit] Optimize Vector128.Create on Arm64 Aug 29, 2023

SamMonoRT modified the milestones: Future, 9.0.0 Aug 29, 2023

matouskozak modified the milestones: 9.0.0, Future Feb 16, 2024
