Inefficient codegen for floating-point operations #1342
Older x86 hardware only had the legacy SSE encoding, which encodes two parameters: dst/op1, and op2. These instructions are considered read-modify-write (RMW) since dst and op1 are encoded in the same parameter. To handle this, you generally need to insert an additional move instruction if
Newer x86 hardware (anything with AVX support) has the newer VEX encoding, which takes three parameters: dst, op1, and op2. This encoding is not RMW and does not require an additional move instruction. The encoding is also more efficient: it takes the same number of bytes to encode (for the same allocated registers), or fewer bytes when dst != op1 (since you don't need to also encode an additional move instruction).
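To make the difference concrete, here is a minimal sketch of how the two encodings shape instruction selection for a scalar operation such as `addss`. It is a toy model, not the JIT's actual emitter: the function names `emit_legacy_sse` and `emit_vex` are hypothetical, registers are plain strings, and the case where dst aliases op2 is deliberately left unhandled.

```python
def emit_legacy_sse(op, dst, op1, op2):
    """Two-operand (RMW) form: `op dst, src` computes dst = dst op src.

    Hypothetical helper for illustration; returns assembly text lines.
    """
    if dst == op1:
        # dst already holds op1, so the RMW form can be used directly.
        return [f"{op} {dst}, {op2}"]
    if dst == op2:
        # Copying op1 into dst would clobber op2. A real code generator
        # would swap commutative operands or use a temporary register;
        # this sketch does not model that.
        raise ValueError("dst aliases op2; needs operand swap or a temp")
    # dst != op1: an extra move is needed before the RMW instruction.
    return [f"movaps {dst}, {op1}", f"{op} {dst}, {op2}"]


def emit_vex(op, dst, op1, op2):
    """Three-operand form: `vop dst, src1, src2` computes dst = src1 op src2."""
    # No extra move is needed, regardless of which register dst is.
    return [f"v{op} {dst}, {op1}, {op2}"]


if __name__ == "__main__":
    for line in emit_legacy_sse("addss", "xmm0", "xmm1", "xmm2"):
        print(line)
    for line in emit_vex("addss", "xmm0", "xmm1", "xmm2"):
        print(line)
```

With dst = xmm0, op1 = xmm1, and op2 = xmm2, the legacy path produces a `movaps` followed by `addss`, while the VEX path produces a single `vaddss`.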
We are already emitting the VEX encoding by default for floating-point instructions; however, codegen for non-intrinsic codepaths is not VEX aware and still treats floating-point operations as RMW, as if the encoding only supports
It would be beneficial if the codegen and register allocator were updated to be VEX aware and to call the appropriate