Optimization to integer addition #897
Comments
waywardmonkeys
May 8, 2015
Member
I also took the .ll file and hand-edited it to change the xor to a sub, passed that to clang -O2 -c -S, and then diffed the assembly outputs:

batavia:fib bruce$ diff -u fib.xor.S fib.sub.S
--- fib.xor.S 2015-05-08 20:43:43.000000000 +0700
+++ fib.sub.S 2015-05-08 20:44:09.000000000 +0700
@@ -34,7 +34,7 @@
 ## BB#3:
 movq %rbx, %rdi
 callq _KfibVfibI
- xorq $1, %rax
+ decq %rax
 addq %rax, %r14
 jo LBB0_6
 ## BB#4:

But that would do better in other situations where constants could be folded, etc.
waywardmonkeys
May 8, 2015
Member
And to clear things up a bit more, the register dance that he was referring to can be seen here:
callq _KfibVfibI
movq %rax, %r14
subq $8, %rbx
jo LBB0_6
## BB#3:
movq %rbx, %rdi
callq _KfibVfibI
decq %rax
addq %rax, %r14
jo LBB0_6
## BB#4:
movq %r14, %rbx
LBB0_5:
movb $1, %dl
movq %rbx, %rax
pkhuong
May 8, 2015
dec is one byte (+ REX); xor with immediate is 5 (+ REX). More compact code is good ;)
Here's a short example of how LLVM can fold things better with dec than xor:
int
foo (int *x, long index)
{
  return x[index ^ 1];
}

int
bar (int *x, long index)
{
  return x[index - 1];
}
foo compiles to xorq $1, %rsi / movl (%rdi,%rsi,4), %eax, while bar compiles to a straight movl -4(%rdi,%rsi,4), %eax.
waywardmonkeys
May 8, 2015
Member
I'm testing the straightforward way of fixing this with a three-stage bootstrap now.
waywardmonkeys
May 8, 2015
Member
Amusingly:
-rwxr-xr-x 1 bruce staff 2129948 May 8 21:46 Bootstrap.2/lib/libdylan.dylib
-rwxr-xr-x 1 bruce staff 2117804 May 8 21:35 Bootstrap.1/lib/libdylan.dylib
It got bigger!
I haven't looked at the generated code to see how much things may have changed yet.
added a commit to waywardmonkeys/opendylan that referenced this issue on May 8, 2015
waywardmonkeys
May 8, 2015
Member
With the change on that pull request and diffing the disassembly, we can see things like the following (from table.dylan compiled with the C back-end):

- movl 0x1c(%eax), %eax
- xorl $0x1, %eax
- subl %eax, %esi
- movl _dylan_teb+460(%edi), %eax
+ incl %esi
+ subl 0x1c(%eax), %esi
+ movl _dylan_teb+456(%edi), %eax

- andl %ebx, %ecx
- movl %esi, %edx
- movl %edx, -0x58(%ebp)
- xorl $0x1, %esi
- leal 0x5(%esi,%ecx), %eax
+ andl %edx, %ecx
+ leal 0x1(%ecx), %edx
+ leal -0x1(%esi), %eax
+ movl %eax, -0x58(%ebp)
+ leal 0x4(%esi,%ecx), %eax

- movl %ecx, %edx
- movl %eax, %ecx
- xorl $0x1, %ecx
- addl %edx, %ecx
- movl %ecx, %esi
+ leal -0x1(%ecx,%eax), %edx
+ movl %edx, %esi

- movl _Dsecondary_stridesVKi-20622(%ecx), %eax
- movl 0x8(%edx,%eax), %eax
- xorl $0x1, %eax
- movl %eax, -0x44(%ebp)
- nopl _Kuninitialized_table_testVKiI(%eax)
+ movl _Dsecondary_stridesVKi-20334(%ecx), %eax
+ movl $0x1, %esi
+ subl 0x8(%edx,%eax), %esi
+ movl %esi, -0x40(%ebp)
+ nopl (%eax)

 movl %ebx, %eax
- subl -0x44(%ebp), %eax
+ addl -0x40(%ebp), %eax
+ testl %eax, %eax
waywardmonkeys commented May 8, 2015

When we add 2 integers, we avoid untagging and retagging them by doing this:

The definition of strip-integer-tag is:

This results in this sort of thing in the generated assembly (x86_64):

@pkhuong has suggested that we could get better generated code if we did a subtraction instead of an xor operation here, as that would allow LLVM to do better optimization:

This seems like an interesting experiment to try.

This also pointed out an issue: not all back-ends might be using tagged integers (my JS back-end, for example, might not).

An easy way to address that would be to add a new primitive-strip-integer-tag, and perhaps similar primitives for some other operations, so that each back-end can do the right thing.