Bad codegen for firstbithigh and firstbitlow #169

@dmpots

We generate incorrect code for signed firstbithigh and inefficient code for firstbitlow.

firstbitlow

The HLSL firstbitlow function returns the index of the first set bit, searching from the lsb. If no bit is set, it returns -1.

The DXIL FirstbitLo intrinsic returns the index of the first set bit, searching from the lsb. If no bit is set, it returns -1.

For firstbitlow, we generate this:

  %FirstbitLo = call i32 @dx.op.unaryBits.i32(i32 32, i32 %0)  ; FirstbitLo(value)
  %1 = icmp ne i32 %0, 0
  %2 = select i1 %1, i32 %FirstbitLo, i32 -1

This seems reasonable, but the select is redundant. If the value is 0, the select chooses -1, but in that case the FirstbitLo intrinsic already returns -1. So regardless of the input value we can use the result of the intrinsic directly. This would match the code produced by fxc.
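To make the redundancy concrete, here is a minimal Python sketch that models the FirstbitLo semantics as described above. The function names (`firstbit_lo`, `current_codegen`, `proposed_codegen`) are hypothetical and only mirror the IR in this issue; this is not the actual DXIL implementation.

```python
MASK32 = 0xFFFFFFFF

def firstbit_lo(bits):
    """Model of FirstbitLo as described in this issue (an assumption,
    not the real intrinsic): lsb-relative index of the lowest set bit,
    or -1 if no bit is set."""
    bits &= MASK32
    if bits == 0:
        return -1
    # bits & -bits isolates the lowest set bit.
    return (bits & -bits).bit_length() - 1

def current_codegen(bits):
    """Mirrors the generated IR: intrinsic call, icmp ne, select."""
    result = firstbit_lo(bits)
    return result if (bits & MASK32) != 0 else -1

def proposed_codegen(bits):
    """Uses the intrinsic result directly, with no select."""
    return firstbit_lo(bits)
```

For every input, including 0, both functions agree, which is why the select can be dropped.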

firstbithigh

The HLSL firstbithigh function changes behavior depending on the signedness of its argument. For unsigned values it returns the index of the first set bit, searching from the msb; the index itself, however, is counted from the lsb. For signed values, if the value is negative it returns the index of the first 0 bit from the msb; otherwise it returns the index of the first 1 bit. Again, all indexes are relative to the lsb. If no 1 is found (or no 0, for negative signed values), -1 is returned.

The DXIL FirstbitHi intrinsic returns the index of the first set bit, searching from the msb. If no bit is set, it returns -1. The index is relative to the msb.

The DXIL FirstbitSHi intrinsic depends on the sign of the value. If the value is negative, it returns the index of the first 0 found from the msb, or -1 if no 0 is found. If the value is non-negative, it returns the index of the first 1 found from the msb, or -1 if no 1 is found. The index is relative to the msb.

The codegen for unsigned firstbithigh looks ok:

  %FirstbitHi = call i32 @dx.op.unaryBits.i32(i32 33, i32 %0)  ; FirstbitHi(value)
  %1 = sub i32 31, %FirstbitHi
  %2 = icmp ne i32 %0, 0
  %3 = select i1 %2, i32 %1, i32 -1

The index returned from the FirstbitHi intrinsic is subtracted from 31 to produce an index relative to the lsb. This matches the codegen from fxc.
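As a rough illustration of the index flip, the following Python sketch models FirstbitHi as described above and applies the same sub/select sequence. The names `firstbit_hi` and `unsigned_firstbithigh` are hypothetical, chosen only for this example.

```python
MASK32 = 0xFFFFFFFF

def firstbit_hi(bits):
    """Model of FirstbitHi as described in this issue (an assumption):
    msb-relative index of the highest set bit, or -1 if no bit is set."""
    bits &= MASK32
    if bits == 0:
        return -1
    return 32 - bits.bit_length()

def unsigned_firstbithigh(bits):
    """Mirrors the generated IR: 31 - FirstbitHi, guarded by value != 0."""
    flipped = 31 - firstbit_hi(bits)
    return flipped if (bits & MASK32) != 0 else -1
```

For example, with the input 0x10000000 the modeled intrinsic returns 3 (counting from the msb), and the flipped result is 28 (counting from the lsb). The zero check is sufficient here because 0 is the only unsigned input with no set bit.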

The codegen for signed firstbithigh looks wrong:

  %FirstbitSHi = call i32 @dx.op.unaryBits.i32(i32 34, i32 %0)  ; FirstbitSHi(value)
  %1 = sub i32 31, %FirstbitSHi
  %2 = icmp ne i32 %0, 0
  %3 = select i1 %2, i32 %1, i32 -1

The problem is that it only handles the "not-found" case for non-negative numbers. For a negative number with no 0 bits (i.e., 0xffffffff for int), the codegen produces 32 instead of -1, because 31 - (-1) == 32. Instead of checking the input value for 0, we should check the value returned by the intrinsic for -1 and use that comparison for the select. This would match what fxc does.
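The failure and the suggested fix can be sketched in Python. This models FirstbitSHi per the description in this issue; the function names (`firstbit_shi`, `current_codegen`, `proposed_codegen`) are hypothetical, and the proposed version is only one way to express "check the intrinsic result for -1", not the actual compiler change.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def firstbit_shi(bits):
    """Model of FirstbitSHi as described in this issue (an assumption):
    msb-relative index of the first 0 (negative input) or first 1
    (non-negative input), or -1 if no such bit exists."""
    bits &= MASK32
    if bits & SIGN_BIT:
        # Negative: the first 0 is the first 1 of the complement.
        bits = ~bits & MASK32
    if bits == 0:
        return -1
    return 32 - bits.bit_length()

def current_codegen(bits):
    """Mirrors the generated IR: the select tests the input against 0."""
    flipped = 31 - firstbit_shi(bits)
    return flipped if (bits & MASK32) != 0 else -1

def proposed_codegen(bits):
    """Suggested fix: the select tests the intrinsic result against -1."""
    result = firstbit_shi(bits)
    return 31 - result if result != -1 else -1
```

With the input 0xffffffff, `current_codegen` yields 32 while `proposed_codegen` yields -1. The two agree on every input other than 0xffffffff, since that is the only nonzero value for which the modeled intrinsic returns -1.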
