-
Notifications
You must be signed in to change notification settings - Fork 152
[CIR] Update Accumulate Bits Algorithm #1688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It's difficult to compare the implementations without seeing a side-by-side comparison of the resulting LLVM IR for accessing elements in a structure. I suspect that representing the memory using an array of bytes is going to be bad compared to using power-of-two integers because they result in different physical representations on little-endian machines. Consider this example: https://godbolt.org/z/3eWjee55f
The current incubator produces
I think we could get similar results with your PR by replacing The LLVM IR generated looks like this:
That doesn't seem too bad until you consider that
We had to load two different values and combine them to get the correct result! By contrast, classic codegen emits this:
And the generated x86-64 assembly is this:
It almost looks like classic codegen is cheating by telling LLVM IR to load an |
That is to say, I think you should make the change to get this to emit the same structure that classic codegen emits. |
Yes, @andykaylor 100% you're correct _Z1fv: # @_Z1fv
.L_Z1fv$local:
.type .L_Z1fv$local,@function
.cfi_startproc
# %bb.0:
movzbl .Ls$local+6(%rip), %eax
shll $16, %eax
movzwl .Ls$local+4(%rip), %ecx
orl %eax, %ecx
movl %ecx, %eax
shlq $47, %rax
sarq $47, %rax
movl %eax, -4(%rsp)
movl -4(%rsp), %eax
retq And now, using power-of-two types, it looks much better exactly like CodeGen: _Z1fv: # @_Z1fv
.L_Z1fv$local:
.type .L_Z1fv$local,@function
.cfi_startproc
# %bb.0:
movq .Ls$local(%rip), %rax
shlq $15, %rax
sarq $47, %rax
retq Thanks for the detailed explanation, Andy ;) |
Thanks for fixing this. I agree with Andy here too, matching classic codegen is the way to go! |
This PR addresses the feedback from llvm/llvm-project#142041 (comment). Our algorithm for accumulating bitfields has diverged from CodeGen since Clang 19.
There is one key difference: in CIR, we use the function
getBitfieldStorageType
, which checks whether the bit width of the current accumulation run is a valid fundamental width (i.e., a power of two: 8, 16, 32, 64). If it is, it returns a CIR type of that size otherwise, it returns an array with the closest alignment.For example, given the following struct:
The CodeGen output is:
Whereas the new CIR algorithm produces:
In CIR, the algorithm accumulates up to field
d
, resulting in 50 accumulated bits. Since 50 is not a fundamental width, the closest alignment is 56 bits, which leads to the type!cir.array<!u8i x 7>
. The algorithm stops before accumulating fielde
because including it would exceed the register size (64), which is not ideal.At this point, it's unclear whether this divergence from CodeGen represents an improvement. If we wanted to match CodeGen exactly, we would need to replace the use of
getBitfieldStorageType
withgetUIntNType
. The difference is thatgetUIntNType
always returns the closest power-of-two integer type instead of falling back to an array when the size is not a fundamental width.With this change, CIR would match CodeGen's layout exactly. It would require the following small code change:
You can see a comparison between the two functions https://godbolt.org/z/qjx1MaEWG.
I'm currently unsure whether using one function over the other has performance implications. Regarding the ARM error I mentioned in the meeting: it was an
assert
I had forgotten to update. It's now fixed sorry for the confusion.