Partly mitigate bad Clang inlining decision #66
Because a visitor is wrapped several times during visitation, it causes extra temporaries and useless stores and loads that can only be optimized if the
LLVM ticket https://bugs.llvm.org/show_bug.cgi?id=41491
foo:
    sub rsp, 88
    lea rax, [rsp + 64]
    mov qword ptr [rsp + 72], rax
    lea rax, [rsp + 72]
    mov qword ptr [rsp + 80], rax
    mov eax, dword ptr [rcx]
    lea r9, [rcx + 4]
    mov edx, eax
    sar edx, 31
    xor edx, eax
    xorps xmm0, xmm0
    movups xmmword ptr [rsp + 48], xmm0
    lea r8, [rsp + 80]
    mov ecx, eax
    call tail
    nop
    add rsp, 88
    ret
tail:
    add edx, -1
    cmp edx, 8
    ja .LBB1_2
    lea rax, [rip + .LJTI1_0]
    movsxd rcx, dword ptr [rax + 4*rdx]
    add rcx, rax
    jmp rcx

foo:
    sub rsp, 56
    lea rax, [rsp + 40]
    mov qword ptr [rsp + 48], rax
    lea rdx, [rsp + 48]
    call tail
    nop
    add rsp, 56
    ret
tail:
    mov ecx, dword ptr [rcx]
    mov eax, ecx
    sar eax, 31
    xor eax, ecx
    add eax, -1
    cmp eax, 8
    ja .LBB1_2
    lea rcx, [rip + .LJTI1_0]
    movsxd rax, dword ptr [rcx + 4*rax]
    add rax, rcx
    jmp rax
Clang has heuristics on explicit
I do not like force inline, as it is too often misused. I think that the compiler should decide when to inline and when not. If the compiler makes the wrong decision - then the compiler should be fixed, not the code. Fixing codegen with force inline on one platform could make the codegen on other platforms worse.
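For context, force inlining is spelled differently on each compiler, which is part of why its effect is platform-specific. Below is a minimal sketch of how such an annotation is commonly wrapped in a macro and applied to an internal helper only. The names `EXAMPLE_FORCEINLINE`, `detail::forward_to_visitor`, `visit_pair`, and `sum_visitor` are hypothetical and are not taken from this pull request.

```cpp
// Hypothetical macro: selects the compiler-specific force-inline spelling.
#if defined(_MSC_VER)
#  define EXAMPLE_FORCEINLINE __forceinline
#elif defined(__GNUC__)  // also defined by Clang
#  define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#else
#  define EXAMPLE_FORCEINLINE inline  // plain hint on unknown compilers
#endif

namespace detail {
// Internal forwarding layer: the only place the annotation is applied.
template <class Visitor, class T>
EXAMPLE_FORCEINLINE void forward_to_visitor(Visitor& visitor, const T& value) {
    visitor(value);
}
}  // namespace detail

// Public entry point: left to the compiler's own inlining heuristics.
template <class Visitor>
void visit_pair(Visitor& visitor, int a, int b) {
    detail::forward_to_visitor(visitor, a);
    detail::forward_to_visitor(visitor, b);
}

// Hypothetical visitor used only to exercise the sketch.
struct sum_visitor {
    int total = 0;
    void operator()(int v) { total += v; }
};

inline int sum_two(int a, int b) {
    sum_visitor v;
    visit_pair(v, a, b);
    return v.total;
}
```

The annotation changes only the inlining request, not the observable behavior, so any codegen difference has to be checked per compiler and per target.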
Did not know that, seems to be true. https://godbolt.org/z/q8JUle
This will not change anything. The thing is that Clang seems to decide whether to inline exclusively on function size. Explicit
It is not the case. The functions I have added forceinline to are exclusively internal. That's why I did not add the forceinline to
In an ideal world, yes. But you know, "zero cost abstractions" are not always zero cost in reality.
There is no wrong decision on the compiler's side. It did not offer perfect inlining, and I do not think inlining is something that can be done perfectly. GCC seems to be simply much more aggressive, and it's just the other side of the coin. The code is not compiler friendly; optimizing memory pointers is a very tough problem. If the visitor were stored and passed by value, the situation would probably have been much better.
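To illustrate the by-value versus by-reference point, here is a minimal sketch of the two wrapping styles. All names (`sum_visitor`, `by_ref_wrapper`, `by_value_wrapper`, `visit_by_ref`, `visit_by_value`) are hypothetical and do not come from the library. The sketch only shows the structural difference; whether a given compiler generates better code for one form is an assumption that has to be verified on the actual target.

```cpp
#include <utility>

// Hypothetical visitor that accumulates the values it sees.
struct sum_visitor {
    int total = 0;
    void operator()(int v) { total += v; }
};

// Each layer stores a reference, so nested layers form a chain of
// pointers to stack objects that the optimizer must see through.
template <class F>
struct by_ref_wrapper {
    F& f;
    template <class T>
    void operator()(T&& v) const { f(std::forward<T>(v)); }
};

// Each layer stores the wrapped callable itself, so the nested
// wrapper is a single object with no pointers between layers.
template <class F>
struct by_value_wrapper {
    F f;
    template <class T>
    void operator()(T&& v) { f(std::forward<T>(v)); }
};

inline int visit_by_ref(int a, int b) {
    sum_visitor v;
    by_ref_wrapper<sum_visitor> w1{v};
    by_ref_wrapper<by_ref_wrapper<sum_visitor>> w2{w1};
    w2(a);
    w2(b);
    return v.total;
}

inline int visit_by_value(int a, int b) {
    by_value_wrapper<by_value_wrapper<sum_visitor>> w{};
    w(a);
    w(b);
    return w.f.f.total;
}
```

Both functions compute the same result. The by-reference form resembles the address stores (`lea` followed by `mov qword ptr [...]`) visible in the listings above when the wrappers are not inlined away.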