This issue could be generalized to the use of broadcast constants. I think it is sometimes better to do a simple computation than to broadcast a constant from memory.
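One common way to build a constant in registers instead of loading it is to start from all-ones and then shift. On x86, `vpcmpeqd` of a register with itself sets every bit in every lane. The scalar C below is only a sketch of that idea. It does not come from the issue or from ispc's code generator. `low_clear_mask` is a hypothetical name, and the assembly in the comment is an assumed instruction sequence shown for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: build a mask that clears the k low bits (0 <= k <= 31)
   without reading any constant from memory.
   Assumed vector equivalent for one AVX2 register (illustration only):
     vpcmpeqd ymm1, ymm1, ymm1   ; every 32-bit lane = 0xffffffff
     vpslld   ymm1, ymm1, k      ; every 32-bit lane = 0xffffffff << k   */
uint32_t low_clear_mask(unsigned k) {
    uint32_t ones = ~(uint32_t)0; /* all bits set, like vpcmpeqd x, x, x */
    return ones << k;             /* shift zeros into the k low bits      */
}
```

For `k = 2` this gives `0xfffffffc`, the mask in this issue. The trade-off is two ALU instructions against one load, which is the same question as shift versus broadcast.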
This function, when targeting AVX2,
compiles to
Wouldn't it be better to just do this?
Using shifts avoids a load from memory and reduces register pressure. The second approach should complete in about 2 cycles, whereas the broadcast alone takes about 5 to 8 cycles (if I'm not mistaken). Am I missing something?
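The original function and both assembly listings are missing from this copy of the issue, and the sketch below does not try to reconstruct them. It only checks the identity the proposal relies on. For unsigned 32-bit lanes, shifting right by 2 and then left by 2 gives the same result as AND with `0xfffffffc`. The 8-element arrays stand in for one 256-bit AVX2 register. The function names are hypothetical. The instruction names in the comments (`vpbroadcastd`, `vpand`, `vpsrld`, `vpslld`) are real AVX2 instructions, but the pairing with each function is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* a 256-bit AVX2 register holds 8 x 32-bit lanes */

/* AND form: roughly what a broadcast of 0xfffffffc (e.g. vpbroadcastd)
   followed by vpand computes in each lane. */
void clear_low2_and(uint32_t v[LANES]) {
    const uint32_t mask = 0xfffffffcu; /* the constant that gets broadcast */
    for (size_t i = 0; i < LANES; i++)
        v[i] &= mask;
}

/* Shift form: a logical right shift by 2 (vpsrld) followed by a left shift
   by 2 (vpslld). The two low bits are shifted out and come back as zeros,
   so no constant is needed. */
void clear_low2_shift(uint32_t v[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        v[i] = (v[i] >> 2) << 2;
}

/* Returns 1 if both forms produce identical lanes for the given input. */
int forms_agree(const uint32_t in[LANES]) {
    uint32_t a[LANES], b[LANES];
    for (size_t i = 0; i < LANES; i++)
        a[i] = b[i] = in[i];
    clear_low2_and(a);
    clear_low2_shift(b);
    for (size_t i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

This only shows that the two forms are equivalent. Whether the two-shift form is actually faster depends on the target microarchitecture and on whether the mask is reused across several operations.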
P.S. There is also a `vpand` form whose third operand is a memory location. I'm not sure why ispc doesn't use it here (it does use `vandps zmm, zmm, m512` for AVX-512).

If there are additional shift operations like this one, or if the value `0xfffffffc` is already in a register, it would be more efficient to use AND instead.