
Inefficient codegen for floating-point operations #26569

Open
tannergooding opened this issue Sep 6, 2019 · 1 comment

Comments

@tannergooding
Member

commented Sep 6, 2019

Older x86 hardware supports only the legacy SSE encoding, which encodes two operands: dst/op1 and op2. These instructions are effectively RMW (read-modify-write), since dst and op1 share the same encoded operand. To handle this, codegen generally needs to insert an additional move instruction whenever the register allocator does not assign dst and op1 to the same register.
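For example (a sketch; the function and register assignments below are illustrative, not what the JIT necessarily produces):

```c
// Sketch: a scalar float add under the legacy SSE encoding.
// Register assignments are illustrative.
float add(float a, float b) {
    return a + b;
}
// If the register allocator puts dst and op1 in the same register:
//     addss xmm0, xmm1        ; xmm0 = xmm0 + xmm1 (RMW, no copy)
// If dst != op1 (e.g. because op1 is still live afterwards):
//     movaps xmm2, xmm0       ; extra move to satisfy the RMW constraint
//     addss  xmm2, xmm1       ; xmm2 = xmm2 + xmm1
```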

Newer x86 hardware (anything with AVX support) additionally provides the VEX encoding, which encodes three operands: dst, op1, and op2. This encoding is not RMW and never requires an extra move instruction. It is also at least as compact: it takes the same number of bytes for the same register assignment, and fewer bytes when dst != op1, since no additional move instruction has to be encoded.
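Under the VEX encoding the same dst != op1 case needs no fixup move (again illustrative):

```c
// Sketch: the same add under the VEX encoding. All three operands are
// encoded explicitly, so no fixup move is needed when dst != op1.
float add(float a, float b) {
    return a + b;
}
// vaddss xmm2, xmm0, xmm1    ; xmm2 = xmm0 + xmm1, single instruction
```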

We already emit the VEX encoding by default for floating-point instructions; however, codegen for non-intrinsic codepaths is not VEX-aware and still treats floating-point operations as RMW, as if the encoding supported only dst/op1 and op2.
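Concretely, on AVX hardware the non-VEX-aware path can end up emitting a redundant copy (a sketch with illustrative registers):

```c
// Sketch: what an RMW-modeled path emits on AVX hardware versus what a
// VEX-aware path could emit. Registers are illustrative.
//
// RMW-modeled (current):
//     vmovaps xmm2, xmm0         ; redundant copy into dst
//     vaddss  xmm2, xmm2, xmm1   ; dst reused as op1
//
// VEX-aware (desired):
//     vaddss  xmm2, xmm0, xmm1   ; no copy needed
```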

It would be beneficial to update the codegen and register allocator to be VEX-aware and to call the appropriate emit_SIMD codepath (which handles the VEX vs. legacy encoding differences) where possible.

@tannergooding
Member Author

commented Sep 6, 2019

This comment has been minimized.
