TL;DR: emit a libcall for absolute value so we don't clobber a bunch of registers.
The compiler emits this to do an inlined and branchless absolute value:
int mask = (signed)input >> (BIT_WIDTH - 1);
int output = (input + mask) ^ mask;
But as we know, XOR is slow, so we should tell the compiler to emit this instead:
call __*cmpzero
call m, __*neg
Or possibly a call to a new libcall function dedicated to performing the absolute value.
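
For reference, here is a rough C sketch of the two approaches, so the semantics being compared are explicit. The function names abs_branchless and abs_compare_negate are made up for illustration, INT_BITS stands in for BIT_WIDTH, and reading the libcall pair as "test the sign, then negate only if negative" is my interpretation of the sequence, not something taken from the compiler's output.

```c
#include <limits.h>

/* Width of int in bits; stands in for BIT_WIDTH in the snippet above. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Inlined branchless form.
   mask is 0 for non-negative input and -1 (all ones) for negative input.
   Assumes >> on a negative int is an arithmetic shift, which is
   implementation-defined in C but true on common compilers. */
int abs_branchless(int input)
{
    int mask = input >> (INT_BITS - 1);
    /* Add and XOR in unsigned arithmetic so INT_MIN wraps instead of
       triggering signed overflow. */
    unsigned int u = ((unsigned int)input + (unsigned int)mask) ^ (unsigned int)mask;
    return (int)u;
}

/* Compare-and-negate form, mirroring the proposed libcall pair:
   a sign test followed by a negate that only runs when the value is negative. */
int abs_compare_negate(int input)
{
    if (input < 0) {
        /* Negate through unsigned so INT_MIN does not overflow. */
        input = (int)(0u - (unsigned int)input);
    }
    return input;
}
```

Both functions should agree for every input other than INT_MIN, where the result depends on how the implementation converts an out-of-range unsigned value back to int. A dedicated absolute value libcall would presumably just be the second function behind a single call.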