compute special constants. #2830

MkazemAkhgary · 2024-04-06T09:08:48Z

Constants whose bits are all 1s on one side and all 0s on the other, such as 0xFFFFFFF8 or 0x000000FF, can be computed directly in a register rather than broadcast from memory, which should be faster. Such numbers are common, e.g. 1, -8, 255, ...

If -1 is already present in a register, vpcmpeqd is not needed and each constant takes just one instruction.

vpcmpeqd    ymm0, ymm0, ymm0 # 0xFFFFFFFF
vpslld    ymm1, ymm0, 3 # -8 = 0xFFFFFFF8

vpcmpeqd    ymm0, ymm0, ymm0 # 0xFFFFFFFF
vpsrld    ymm1, ymm0, 24 # 255 = 0x000000FF
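To check the arithmetic of the two sequences above, here is a minimal C sketch that models a single 32-bit lane. The helper names (`lane_all_ones`, `lane_slld`, `lane_srld`, `high_ones`, `low_ones`) are hypothetical stand-ins for what `vpcmpeqd x, x, x`, `vpslld` and `vpsrld` do to each lane; it is not an intrinsics implementation.

```c
#include <stdint.h>

/* Hypothetical per-lane model: each function mirrors what one instruction
   does to a single 32-bit lane of a ymm/zmm register. */

/* vpcmpeqd x, x, x: a lane always equals itself, so all 32 bits are set. */
static uint32_t lane_all_ones(void) { return UINT32_MAX; }

/* vpslld by immediate: logical shift left; counts above 31 clear the lane. */
static uint32_t lane_slld(uint32_t x, unsigned n) { return n > 31u ? 0u : x << n; }

/* vpsrld by immediate: logical shift right; counts above 31 clear the lane. */
static uint32_t lane_srld(uint32_t x, unsigned n) { return n > 31u ? 0u : x >> n; }

/* k ones at the top (0 <= k <= 32): a single vpslld of all-ones. */
static uint32_t high_ones(unsigned k) { return lane_slld(lane_all_ones(), 32u - k); }

/* k ones at the bottom (0 <= k <= 32): a single vpsrld of all-ones. */
static uint32_t low_ones(unsigned k) { return lane_srld(lane_all_ones(), 32u - k); }
```

For example, high_ones(29) gives 0xFFFFFFF8 (-8) and low_ones(8) gives 0x000000FF (255), matching the shift counts 3 and 24 used above.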

Constants with a run of 1s in the middle can be computed in a similar way, with two shifts. This is perhaps faster than a broadcast, and should be faster if -1 is already present. Such constants are also common (2, 4, -2.0, 0.5, ...).

vpcmpeqd    ymm0, ymm0, ymm0 # 0xFFFFFFFF
vpslld    ymm1, ymm0, 24 # 0xFF000000
vpsrld    ymm1, ymm1, 2 # 1.5f = 0x3FC00000
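The shift counts for any single run of 1s follow from its leading and trailing zero counts: shift left by both to drop every zero bit, then shift right by the leading-zero count to reopen the gap at the top. The sketch below assumes the same order as the sequence above (vpslld then vpsrld); `run_of_ones_shifts` and `float_bits` are hypothetical helpers, and the loops stand in for a count-leading/trailing-zeros instruction to stay portable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: if c is one contiguous run of 1 bits, find counts for
   vpslld t, ones, shl ; vpsrld t, t, shr  such that ((~0u << shl) >> shr) == c.
   Returns 1 on success, 0 if c is zero or has more than one run of ones. */
static int run_of_ones_shifts(uint32_t c, unsigned *shl, unsigned *shr)
{
    if (c == 0u) return 0;
    unsigned tz = 0, lz = 0;
    while (!((c >> tz) & 1u)) tz++;           /* trailing zeros */
    while (!((c << lz) & 0x80000000u)) lz++;  /* leading zeros */
    uint32_t run = c >> tz;                   /* ones now start at bit 0 */
    if (run & (run + 1u)) return 0;           /* not of the form 2^k - 1 */
    *shl = lz + tz;                           /* drop all the zero bits ... */
    *shr = lz;                                /* ... then reopen the top gap */
    return 1;
}

/* Bit pattern of a float, e.g. to look up the lane value for 1.5f. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For 1.5f (0x3FC00000) this yields shl = 24 and shr = 2, the same counts as the sequence above; for 2.0f (0x40000000) it yields 31 and 1. A run that reaches bit 31, such as -2.0f (0xC0000000), yields shr = 0, so the second shift can be dropped.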

A similar trick can compute constants with 0s in the middle, using a shift and a rotate (AVX-512 only, since vprold requires AVX-512F).

vpcmpeqd    zmm0, zmm0, zmm0 # 0xFFFFFFFF
vpsrld    zmm1, zmm0, 1 # 0x7FFFFFFF
vprold    zmm1, zmm1, 2 # -3 = 0xFFFFFFFD
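A constant with 0s in the middle is, read cyclically, a single run of 1s that wraps from bit 31 into bit 0, so one vpsrld plus one vprold can place it. The sketch below models one lane; `lane_rold` and `cyclic_run` are hypothetical names, and `cyclic_run` is an assumed generalization of the -3 sequence above, not something stated in the issue.

```c
#include <stdint.h>

/* Per-lane model of vprold (AVX-512F): rotate left, count taken modulo 32. */
static uint32_t lane_rold(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;  /* avoid an undefined shift by 32 */
}

/* Hypothetical generalization: a run of k ones (1 <= k <= 32) starting at
   bit p and wrapping past bit 31 into bit 0. vpsrld from all-ones leaves
   k low ones; vprold then moves them up to start at bit p. */
static uint32_t cyclic_run(unsigned k, unsigned p)
{
    return lane_rold(UINT32_MAX >> (32u - k), p);
}
```

cyclic_run(31, 2) reproduces -3 (0xFFFFFFFD) with the same counts as above, and cyclic_run(28, 30) gives 0xC3FFFFFF, which has zeros in bits 26 through 29.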

If a negative number is present, vpabsd can be used to get its positive value.
The double of a number can be computed via vpaddd (x + x).
If -1 is present, the complement of a constant can be computed using vpxor (x ^ -1 == ~x).
Adjacent numbers can be computed by adding or subtracting -1.
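The derived-constant tricks listed above can be modeled per lane as follows. This is a sketch with hypothetical helper names; `ones` stands for a register already holding -1, and all arithmetic wraps modulo 2^32 as the vector instructions do.

```c
#include <stdint.h>

/* vpabsd: absolute value of a signed lane; 0x80000000 stays 0x80000000. */
static uint32_t lane_absd(uint32_t x) { return (x & 0x80000000u) ? 0u - x : x; }

/* vpaddd x, x: double the value. */
static uint32_t lane_double(uint32_t x) { return x + x; }

/* vpxor x, ones: bitwise complement, given a register holding -1. */
static uint32_t lane_not(uint32_t x, uint32_t ones) { return x ^ ones; }

/* vpsubd x, ones gives x + 1; vpaddd x, ones gives x - 1. */
static uint32_t lane_inc(uint32_t x, uint32_t ones) { return x - ones; }
static uint32_t lane_dec(uint32_t x, uint32_t ones) { return x + ones; }
```

For example, 8 follows from -8 (0xFFFFFFF8) with one vpabsd, and 0xFFFFFF00 follows from 255 with one vpxor against the all-ones register.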

For some numbers, vpaddd or vpsubd can replace the second of two shifts to reduce port contention. For example, to get the number 2, compute (~0 >> 30) + ~0, i.e. 3 + (-1), instead of (~0 >> 31) << 1 (provided that -1 is present; all shifts here are logical).
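Both ways of building 2 can be checked on a single lane. This sketch uses hypothetical function names and assumes `ones` already holds -1; which variant is faster depends on the target microarchitecture's port assignments, which the sketch does not model.

```c
#include <stdint.h>

/* vpsrld t, ones, 31 ; vpslld t, t, 1  -> two shifts: 1 << 1. */
static uint32_t two_by_shifts(uint32_t ones) { return (ones >> 31) << 1; }

/* vpsrld t, ones, 30 ; vpaddd t, t, ones -> one shift and one add: 3 + (-1). */
static uint32_t two_by_shift_add(uint32_t ones) { return (ones >> 30) + ones; }
```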

Caveats:

These methods rely on keeping a register set to -1, which increases register pressure and may cause spilling in some cases. However, in smaller code sections, or where -1 is already available, they may be beneficial.
Using too many shifts can lead to contention on execution ports, so depending on what other instructions are scheduled, a broadcast might make better use of the available ports.


dbabokin · 2024-04-08T21:29:38Z

Tricks like these are implemented by the LLVM backend (codegen). ISPC could handle this itself, but preferably it should be done in LLVM. I suggest verifying that LLVM doesn't already do this for C/C++ code (using vector extensions), then filing an issue in the LLVM project and linking it to this one, so we make sure it reaches ISPC once it's implemented.

It's important to note in the LLVM issue that this is about vector constants, as by default they would assume it's about scalars.
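One way to do the check suggested above is a small C file using the GCC/Clang vector extensions, compiled with something like `clang -O2 -mavx2 -S`, and then inspecting how each splat constant is materialized (a load or broadcast from memory versus vpcmpeqd followed by shifts). This is a sketch: the function names are hypothetical, and it makes no claim about what LLVM currently emits.

```c
#include <stdint.h>

/* GCC/Clang vector extension: eight 32-bit lanes, i.e. one ymm register. */
typedef int32_t v8si __attribute__((vector_size(32)));

/* Each function stores one splat constant through a pointer, which keeps the
   example free of by-value vector ABI concerns when built without -mavx2. */
void splat_minus_eight(v8si *out)
{
    *out = (v8si){-8, -8, -8, -8, -8, -8, -8, -8};                 /* 0xFFFFFFF8 */
}

void splat_255(v8si *out)
{
    *out = (v8si){255, 255, 255, 255, 255, 255, 255, 255};         /* 0x000000FF */
}

void splat_1_5f_bits(v8si *out)
{
    const int32_t b = 0x3FC00000;                                  /* bits of 1.5f */
    *out = (v8si){b, b, b, b, b, b, b, b};
}
```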

pbrubaker added the Performance label (All issues related to performance/code generation) on Apr 14, 2024.
