inline_memory: optimized mem_is_zero for non-x64 #15307

Merged
merged 1 commit into ceph:master from ovh:bp-64bit-mem-is-zero on Jun 1, 2017

branch-predictor commented May 26, 2017

mem_is_zero is fast on x64, where 128-bit registers are available, but it's very easy to optimize it for 32-bit Intel and ARM CPUs (and non-x64 CPUs in general) as well. The speed won't be anywhere near the x64 path, but it's still almost 7x faster than the regular byte-by-byte check; I checked this on both a 32-bit x86 system and a 32-bit ARM CPU (ARMv7 Processor rev 3 (v7l)). This works because GCC on 32-bit archs is smart enough to do two loads and one flag compare, so checking 8 bytes at a time is still around 50% faster than checking 4 bytes at a time, even without 64-bit registers:

.L6:
        movl    4(%eax), %edx  # load bytes 4..7
        orl     (%eax), %edx   # OR them with bytes 0..3
        jne     .L4            # branch out if the result is non-zero
.L5:
        addl    $8, %eax
        cmpl    %eax, %ecx

and on ARM:

        ldrd    r2, [r0]    # load 8 bytes: bytes 0..3 into r2, bytes 4..7 into r3
        orrs    r3, r2, r3  # OR the two words together and set flags
        bne     .L13        # branch out if not zero anymore
[..]
.L13:
        movs    r0, #0
        pop     {r4, r5, r6}
        bx      lr

The patch includes an extra test for corner cases that may pop up with such implementations.
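For readers who want to see the idea in source form, here is a minimal sketch of an 8-bytes-at-a-time zero check plus the kind of corner-case check mentioned above. It is not the code from this patch: the names `mem_is_zero_sketch` and `check_corner_cases` are hypothetical, and the use of `memcpy` for the 64-bit load is an assumption made here to stay safe on unaligned input. On 32-bit targets a compiler may lower the 64-bit compare to two 32-bit loads and one OR, as in the listings above, though that depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch, not the function from the patch: returns true if
// all `len` bytes starting at `data` are zero.
inline bool mem_is_zero_sketch(const char *data, std::size_t len) {
  const char *end = data + len;
  // Main loop: test 8 bytes per iteration.
  while (static_cast<std::size_t>(end - data) >= sizeof(std::uint64_t)) {
    std::uint64_t word;
    std::memcpy(&word, data, sizeof(word));  // alignment-safe 8-byte load
    if (word != 0) {
      return false;
    }
    data += sizeof(word);
  }
  // Tail: fewer than 8 bytes remain, check them one by one.
  while (data < end) {
    if (*data != 0) {
      return false;
    }
    ++data;
  }
  return true;
}

// Hypothetical self-check: for every length up to max_len, an all-zero
// buffer must pass, and a single non-zero byte at any offset must fail.
inline bool check_corner_cases(std::size_t max_len) {
  for (std::size_t len = 0; len <= max_len; ++len) {
    std::vector<char> buf(len, 0);
    if (!mem_is_zero_sketch(buf.data(), len)) {
      return false;
    }
    for (std::size_t pos = 0; pos < len; ++pos) {
      buf[pos] = 1;
      if (mem_is_zero_sketch(buf.data(), len)) {
        return false;
      }
      buf[pos] = 0;
    }
  }
  return true;
}
```

The corner cases worth exercising are lengths that are not a multiple of 8 and a non-zero byte sitting in the tail or in the last position of a full 8-byte word, which is what the loop over every `len` and `pos` covers.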

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

inline_memory: optimized mem_is_zero for non-X64
mem_is_zero is fast for X64 where 128-bit registers are available,
but it's very easy to optimize it for 32-bit Intel and ARM CPUs as
well. The speed won't be anywhere near the fastest one, but it's still
almost 7x faster than the regular byte-by-byte check.
Now with an extra test to check for corner cases that may pop up with
such implementations.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

branch-predictor commented May 30, 2017

@markhpc ping

@liewegas liewegas merged commit 703125e into ceph:master Jun 1, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodifed Submodules: submodules for project are unmodified
default: Build finished.

@branch-predictor branch-predictor deleted the ovh:bp-64bit-mem-is-zero branch Dec 7, 2017
