New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crc32c: optimize aarch64 crc32c implementation #12977

Merged
merged 1 commit into from Feb 18, 2017

Conversation

Projects
None yet
4 participants
@weixiaowilliam
Contributor

weixiaowilliam commented Jan 18, 2017

ARMv8 defines PMULL crypto instruction.
This patch optimizes crc32c calculate with the instruction when
available rather than original linear crc instructions.

ceph crc32c performance unit test shows that the optimization get
~ x3.90 speedups on ThunderX ARM Core@2.0GHz (Cavium)
~ x1.45 speedups on ARM Cortex-A57@2.1GHz (Huaiwei)
~ x1.16 speedups on ARM Cortex-A57@2.0GHz (Softiron)

Jira: ENTLLT-358
Change-Id: I657422cd20c9ca78237cd060210a5383f4122575

int main() { foo(0); }" HAVE_ARMV8_CRC)
set(CMAKE_REQUIRED_QUIET ${save_quiet})
set(HAVE_ARM 1)
CHECK_C_COMPILER_FLAG(-march=armv8-a+crc HAVE_ARMV8_CRC)

This comment has been minimized.

@tchaikov

tchaikov Jan 18, 2017

Contributor

this basically reverts the change of e70ab48 by @agraf.

This comment has been minimized.

@agraf

agraf Jan 18, 2017

Contributor

Yes, it does indeed. The rationale for my change was that we're compiling with gcc 4.8, but recent binutils. So -mcpu on the gcc command line doesn't know about +crc.

This comment has been minimized.

@agraf

agraf Jan 18, 2017

Contributor

So a clear NAK from my side :)

This comment has been minimized.

@weixiaowilliam

weixiaowilliam Jan 19, 2017

Contributor

You are right! we need to consider the compatibility with old gcc. I have refined the patch so that it can switch back to original implementation if old gcc doesn't recognize compile flags of "+crc+crypto".

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Jan 19, 2017

@yghannam and @agraf mind taking a look?

@@ -31,18 +31,22 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|AARCH64")
if(HAVE_ARMV8_CRC)
message(STATUS " aarch64 crc extensions supported")
endif()
CHECK_C_COMPILER_FLAG(-march=armv8-a+crc+crypto HAVE_ARMV8_CRC_CRYPTO_INTRINSICS)
if(HAVE_ARMV8_CRC_CRYPTO_INTRINSICS)
set(ARMV8_CRC_COMPILE_FLAGS "${ARMV8_CRC_COMPILE_FLAGS} -march=armv8-a+crc+crypto")

This comment has been minimized.

@agraf

agraf Jan 19, 2017

Contributor

This checks for gcc flags. Don't do that. Instead, copy the code above and ask the assembler whether it supports crypto extensions.

This comment has been minimized.

@tchaikov

tchaikov Jan 19, 2017

Contributor

@agraf but @weixiaowilliam is using the GCC intrinsics to implement the CRC digest. i see it's less portable because lower GCC version might not support these intrinsics, but they are easier to write and easier to read.

@weixiaowilliam maybe we can conditionalize this, by defining CRC32CX and CRC32CW using assembly (i.e. crc32w on aach64) if GCC does not support these intrinsics.

This comment has been minimized.

@agraf

agraf Jan 19, 2017

Contributor

I think we're misunderstanding each other. The problem is that our compiler does not support -march= or -mcpu= on anything but armv8-a.

However, our assembler handles those additional instructions just fine. So all we need to change is the check from the one he's implementing to something that resembles the block above.

This comment has been minimized.

@weixiaowilliam

weixiaowilliam Jan 20, 2017

Contributor

i have refined the patch by adding inline assembly so that the optimization will work even on lower GCC version which doesn't support crc&crypto intrinsics. but if on higher GCC version supporting to the intrinsics, it will prefer intrinsics so that ARM compiler has a chance to further optimize it (such as better instruction schedule via more accurate pipeline latency calculation or adopting newest NEON instructions for better performance). besides performance, intrinsics will provide better readability. so the whole logic is now as follows:

if compiler supporting crc+crypto intrinsics then
    {optimization based on compiler intrinsics}
else if assembler supporting both crc and crypto instructions then
    {optimization based on crc+crypto instructions}
else if assembler only supporting crc instruction then
    {original optimization based linear crc instructions}
else
    {pure software implementation}
#include <arm_acle.h>
#include <arm_neon.h>

This comment has been minimized.

@agraf

agraf Jan 19, 2017

Contributor

After you changed the check to an assembler based check rather than a compiler based check, please add the an asm(".arch_extension crc+crypto") intrinsic here.

@weixiaowilliam

This comment has been minimized.

Contributor

weixiaowilliam commented Jan 22, 2017

@tchaikov @agraf @yghannam mind taking a look at the refined patch?

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 3, 2017

@weixiaowilliam needs rebase

@agraf @yghannam this looks sane to me, do you mind taking a look again?

crc32c: optimize aarch64 crc32c implementation
ARMv8 defines PMULL crypto instruction.
This patch optimizes crc32c calculate with the instruction when
available rather than original linear crc instructions.

ceph crc32c performance unit test shows that the optimization get
~ x3.90 speedups on ThunderX ARM Core@2.0GHz (Cavium)
~ x1.45 speedups on ARM Cortex-A57@2.1GHz (Huaiwei)
~ x1.16 speedups on ARM Cortex-A57@2.0GHz (Softiron)

Jira: ENTLLT-358
Change-Id: I657422cd20c9ca78237cd060210a5383f4122575
Signed-off-by: wei xiao <wei.xiao@linaro.org>
@weixiaowilliam

This comment has been minimized.

Contributor

weixiaowilliam commented Feb 4, 2017

@tchaikov rebase is done.

@weixiaowilliam

This comment has been minimized.

Contributor

weixiaowilliam commented Feb 9, 2017

@agraf @yghannam do you mind taking a look again?
i have refined the patch by adding inline assembly so that the optimization will work even on lower GCC version which doesn't support crc&crypto intrinsics. but if on higher GCC version supporting to the intrinsics, it will prefer intrinsics so that ARM compiler has a chance to further optimize it (such as better instruction schedule via more accurate pipeline latency calculation or adopting newest NEON instructions for better performance). besides performance, intrinsics will provide better readability. so the whole logic is now as follows:

if compiler supporting crc+crypto intrinsics then
    {optimization based on compiler intrinsics}
else if assembler supporting both crc and crypto instructions then
    {optimization based on crc+crypto instructions}
else if assembler only supporting crc instruction then
    {original optimization based linear crc instructions}
else
    {pure software implementation}
@weixiaowilliam

This comment has been minimized.

Contributor

weixiaowilliam commented Feb 15, 2017

@tchaikov @agraf @ @yghannam hi guys, any comments for the aarch64 crc32c optimization?

@tchaikov tchaikov added the needs-qa label Feb 15, 2017

@tchaikov tchaikov self-assigned this Feb 15, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 15, 2017

@weixiaowilliam adding needs-qa so we can include it in the next qa batch.

@tchaikov tchaikov merged commit 0603794 into ceph:master Feb 18, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
@dmick

This comment has been minimized.

Member

dmick commented Mar 24, 2017

It looks like this requires gcc 4.9.0; CentOS 7.3 appears to include 4.8.5 (and there is no arm_acle.h there). Researching the best solution. How much of a difference do the intrinsics make? Should we just disable them on CentOS/RHEL?

@dmick

This comment has been minimized.

Member

dmick commented Mar 24, 2017

it appears as though gcc accepts -march=armv8-a+crc+crypto and yet supplies no arm_acle.h. I'm not sure what this means.

@dmick

This comment has been minimized.

Member

dmick commented Mar 25, 2017

Suggesting a fix like so:

#14132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment