Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass StringRef by value in comparison operators #2875

Closed

Conversation

Daniel-B-Smith
Copy link

Because StringRef is small (sizeof(StringRef) == 12), it can be efficiently passed in registers. Unfortunately, when passing by reference, the StringRef variable has to have address, which leads to the members needing to be loaded from memory inside the function. The difference is small, but in one profile I have, > 25% of CPU time on a storage server is spent inside operator<().

On my r5d.2xl, the operator<() change led to a ~1.5% increase in TPS in the mako benchmark. I changed the other operators just for consistency.

Obligatory GodBolt link: https://godbolt.org/z/Gswzfd

@satherton
Copy link
Contributor

The link shows what each function compiles to independently, but it doesn't show the practical result of calling them when they are inlined.

Since the StringRef comparators are defined in the header and short, I would expect them to almost always be inlined, and I would also expect that the inlined and optimized final assembly would be effectively the same for both parameter versions.

I could be wrong, but if so then basically to avoid this const reference overhead for small types every template class or function that takes const T &thing to avoid copying large T's needs an alternate definition for small types that does not use const reference.

@satherton
Copy link
Contributor

Here's another Godbolt link showing what happens when you call compare1 vs compare2. The compiled assembly is effectively the same as I suspected. Only the labels have changed.

https://godbolt.org/z/TNsfGw

Side-by-Side diff:

callCompare1(StringRef, StringRef):			      |	callCompare2(StringRef, StringRef):
        push    rbp						        push    rbp
        mov     rbp, rcx					        mov     rbp, rcx
        push    rbx						        push    rbx
        mov     rbx, rsi					        mov     rbx, rsi
        mov     rsi, rdx					        mov     rsi, rdx
        mov     edx, ebx					        mov     edx, ebx
        sub     rsp, 8						        sub     rsp, 8
        cmp     ecx, ebx					        cmp     ecx, ebx
        cmovle  edx, ecx					        cmovle  edx, ecx
        test    edx, edx					        test    edx, edx
        jle     .L19					      |	        jle     .L27
        movsx   rdx, edx					        movsx   rdx, edx
        call    memcmp						        call    memcmp
        test    eax, eax					        test    eax, eax
        jne     .L25					      |	        jne     .L33
.L19:							      |	.L27:
        cmp     ebp, ebx					        cmp     ebp, ebx
        setg    al						        setg    al
        add     rsp, 8						        add     rsp, 8
        pop     rbx						        pop     rbx
        pop     rbp						        pop     rbp
        ret							        ret
.L25:							      |	.L33:
        add     rsp, 8						        add     rsp, 8
        shr     eax, 31						        shr     eax, 31
        pop     rbx						        pop     rbx
        pop     rbp						        pop     rbp
        ret							        ret

@sears
Copy link
Contributor

sears commented Apr 15, 2020

Another godbolt, this time with a for loop, just in case calling it in a pass-through style causes it to do something different: https://godbolt.org/z/DrhB7q

The mystery deepens. I have a tree microbenchmark here:
afc655f

Which is the first commit in: https://github.com/apple/foundationdb/pull/2882/commits

I wonder if anything will show up there.

@satherton
Copy link
Contributor

Good idea. Again only the labels have changed.

        push    r12						        push    r12
        xor     r12d, r12d					        xor     r12d, r12d
        push    rbp						        push    rbp
        mov     rbp, rsi					        mov     rbp, rsi
        push    rbx						        push    rbx
        xor     ebx, ebx					        xor     ebx, ebx
        jmp     .L21					      |	        jmp     .L28
.L24:							      |	.L31:
        mov     rsi, QWORD PTR [rbp+8+rbx*8]			        mov     rsi, QWORD PTR [rbp+8+rbx*8]
        mov     rdi, QWORD PTR [rbp+0+rbx*8]			        mov     rdi, QWORD PTR [rbp+0+rbx*8]
        mov     rdx, rbx					        mov     rdx, rbx
        call    memcmp						        call    memcmp
        test    eax, eax					        test    eax, eax
        setle   al						        setle   al
        movzx   eax, al						        movzx   eax, al
        add     r12d, eax					        add     r12d, eax
        cmp     ebx, 99						        cmp     ebx, 99
        je      .L18					      |	        je      .L25
.L22:							      |	.L29:
        add     rbx, 1						        add     rbx, 1
.L21:							      |	.L28:
        test    rbx, rbx					        test    rbx, rbx
        jne     .L24					      |	        jne     .L31
        add     r12d, 1						        add     r12d, 1
        jmp     .L22					      |	        jmp     .L29
.L18:							      |	.L25:
        mov     eax, r12d					        mov     eax, r12d
        pop     rbx						        pop     rbx
        pop     rbp						        pop     rbp
        pop     r12						        pop     r12
        ret							        ret```

@satherton satherton self-requested a review April 16, 2020 00:05
Copy link
Contributor

@satherton satherton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is microbenchmark evidence that this change is having an effect, which would mean we have use cases which are not being inlined, I am not in favor of merging it.

Also, if there are cases that are not being inlined, we should probably find out why and make them inline-able if possible. For example, calling a StringRef comparator via a pointer to member function would not be inlined but might also be avoidable.

@etschannen etschannen closed this May 7, 2020
@Daniel-B-Smith Daniel-B-Smith deleted the stringref-operator branch May 11, 2020 16:07
@Daniel-B-Smith
Copy link
Author

Apologies for the radio silence on this and thanks for closing. I was also experimenting with a more broad change but couldn't convince myself that it would help or hurt performance. I still think that it is good hygiene to consistently pass small types like this by value, but I couldn't convince myself it was worth the churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants