Optimizing pow with a literal exponent #1244
Hey guys! The old FXC compiler always optimized certain cases of pow with a literal exponent. The classic case is an exponent of 2.0, as in pow(x, 2.0).
In the DXBC that FXC produces for this, the pow is removed and x is instead multiplied by itself, which I would assume is always cheaper than the log2/mul/exp2 sequence you get with a non-constant exponent. DXC does not appear to do this transformation.
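For reference, the two lowerings can be modelled in C++ (used here instead of HLSL so the sketch can be compiled and checked on a CPU; GPU rounding may differ). `pow_log_mul_exp` and `pow2_expanded` are hypothetical names: the first mirrors the log2/mul/exp2 sequence used for a non-constant exponent, the second the single multiply FXC emits for a literal 2.0. For positive inputs they agree up to rounding.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the generic lowering for a non-constant exponent:
// pow(x, e) = exp2(e * log2(x)).
// The identity only holds for x > 0, so this sketch asserts that.
float pow_log_mul_exp(float x, float e) {
    assert(x > 0.0f && "log2-based form modelled only for positive bases");
    return std::exp2(e * std::log2(x));
}

// Hypothetical model of the FXC-style rewrite for the literal exponent 2.0:
// one multiply, no transcendental instructions.
float pow2_expanded(float x) {
    return x * x;
}
```

For example, `pow2_expanded(3.0f)` returns exactly `9.0f`, while `pow_log_mul_exp(3.0f, 2.0f)` is only approximately 9 and needs two transcendental calls to get there.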
Do you guys know if this is intentional? If not, I think it would be nice to include that optimization for cases where people use pow(x, 1), pow(x, 2) or pow(x, 4) for convenience.
This issue was referenced on Sep 20, 2018 and Sep 25, 2018.
For backwards compatibility, the referenced changes will do what fxc did, as long as you pass -HV 2016 (HLSL Version 2016). This should cover existing code you don't want to edit. The issue is that this expansion isn't always correct according to the spec (and IEEE safe mode doesn't correct it on fxc either).
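One plausible illustration of the spec issue (an assumption; the thread does not say which cases it has in mind) is a negative base. The C++ sketch below uses hypothetical helpers `pow2_via_log`, `pow2_via_mul` and `lowerings_disagree`, and assumes IEEE 754 behaviour in which `log2` of a negative number is NaN. Under that assumption the log2/mul/exp2 form returns NaN for pow(-3, 2), while the multiply expansion returns 9, so the two lowerings are not interchangeable.

```cpp
#include <cmath>

// Hypothetical model of the log2/mul/exp2 lowering of pow(x, 2.0).
// With IEEE 754 semantics, std::log2 of a negative number is NaN, and the
// NaN propagates through the multiply and exp2.
float pow2_via_log(float x) {
    return std::exp2(2.0f * std::log2(x));
}

// Hypothetical model of the multiply expansion of pow(x, 2.0).
float pow2_via_mul(float x) {
    return x * x;
}

// True when the two lowerings give observably different results for x:
// NaN on only one side, or two different non-NaN values.
bool lowerings_disagree(float x) {
    const float a = pow2_via_log(x);
    const float b = pow2_via_mul(x);
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) != std::isnan(b);
    }
    return a != b;
}
```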
Currently, when we lower pow (expanding it either to muls or to log/mul/exp), we don't yet know whether values are marked precise, so we can't decide whether the mul expansion is OK. Later, once precise marking is available, we would have to match the log-mul-exp pattern on values not marked precise and replace it with a mul expansion. Since this is extra work, and the ideal optimization depends on the target device, it's probably best left to the driver to decide whether to do it.
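The decision described above can be sketched as a small function. Everything here is hypothetical: `lower_pow`, its flags, the op-name strings and the exponent cap are illustrative assumptions, not DXC or driver code. The sketch keeps log2/mul/exp2 unless the exponent is a small positive integer literal and the value is not marked precise; otherwise it counts the multiplies a square-and-multiply expansion would need.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Illustrative cap on the exponents we would expand (an assumption for this
// sketch, not a DXC or driver setting).
constexpr float kMaxExpandedExponent = 15.0f;

// Hypothetical lowering decision for pow(x, e). Op names are illustrative
// strings, not real DXIL opcodes.
std::vector<std::string> lower_pow(float e, bool is_literal, bool is_precise) {
    const bool small_int = is_literal && e >= 1.0f &&
                           e <= kMaxExpandedExponent && std::floor(e) == e;
    // Without a small integer literal, or when the value is marked precise,
    // keep the log2/mul/exp2 sequence.
    if (!small_int || is_precise) {
        return {"log2", "mul", "exp2"};
    }
    const unsigned n = static_cast<unsigned>(e);
    // Square-and-multiply cost: one squaring per bit below the highest set
    // bit, plus one extra multiply per additional set bit.
    unsigned squarings = 0;
    while ((n >> (squarings + 1)) != 0) {
        ++squarings;
    }
    unsigned set_bits = 0;
    for (unsigned m = n; m != 0; m >>= 1) {
        set_bits += m & 1u;
    }
    // n == 1 yields no instructions at all: the result is x itself.
    return std::vector<std::string>(squarings + set_bits - 1, std::string("mul"));
}
```

With this cost model, pow(x, 1), pow(x, 2) and pow(x, 4) from the original request need 0, 1 and 2 multiplies respectively, and an exponent of 15 needs 6.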
Additionally, in future shader models, we plan on having a native pow intrinsic in DXIL, so this can be more easily matched for optimization. But the ideal expansion should still be performed by the driver compiler.
For new/modified HLSL, you can use your own manual multiply expansion if that's really what you want. Then you don't have to worry about the spec issues. Here's a function to do the expansion, with overloads for vector sizes, that should optimize to the code you want when using a literal uint up to 15.
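A rough C++ analogue of such a helper might look like the following. It is a sketch, not the function from the thread: `pow_by_mul` and the small `vec` type are hypothetical, a template stands in for HLSL's per-size overloads, and square-and-multiply is one reasonable way to keep the multiply count low. When `n` is a compile-time constant, the loop is a candidate for full unrolling, leaving only the multiplies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size float vector standing in for HLSL's float2/float3/float4.
template <std::size_t N>
struct vec {
    std::array<float, N> c{};

    // Broadcast a scalar to every component.
    explicit vec(float s) { c.fill(s); }
    explicit vec(const std::array<float, N>& a) : c(a) {}

    // Component-wise multiply, matching HLSL's operator* on vectors.
    friend vec operator*(vec a, const vec& b) {
        for (std::size_t i = 0; i < N; ++i) {
            a.c[i] *= b.c[i];
        }
        return a;
    }
};

// x^n by repeated multiplication (left-to-right square-and-multiply).
// Works for float and vec<N>; both are constructible from 1.0f.
template <typename T>
T pow_by_mul(const T& x, unsigned n) {
    assert(n <= 15u && "sketch targets the literal exponents 0..15 from the thread");
    if (n == 0u) {
        return T(1.0f);  // x^0
    }
    // Highest power of two that is <= n.
    unsigned bit = 1u;
    while (bit <= n / 2u) {
        bit <<= 1u;
    }
    T result = x;  // accounts for the highest set bit
    for (bit >>= 1u; bit != 0u; bit >>= 1u) {
        result = result * result;  // square once per lower bit
        if ((n & bit) != 0u) {
            result = result * x;  // fold in x when this bit is set
        }
    }
    return result;
}
```

For instance, `pow_by_mul(2.0f, 10u)` yields `1024.0f`, and squaring the vector (1, 2, 3) gives (1, 4, 9) component-wise.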