New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crc32c: Add crc32c function optimized for ppc architecture #13909

Merged
merged 1 commit into from Apr 1, 2017

Conversation

Projects
None yet
7 participants
@kestrels
Contributor

kestrels commented Mar 9, 2017

Add crc32c function optimized for ppc architecture

Add a performance optimized crc32c function for ppc64le that uses Altivec assembly instructions. Add an architecture probe for ppc (PowerPC 'ppc64le' architecture). When the architecture probe detects that the ppc specific crc32c function (ceph_crc32c_ppc) can be used, it will do so instead of using the default crc32c function (ceph_crc32c_sctp).

@kestrels kestrels changed the title from Add crc32c function optimized for ppc architecture to crc32c: Add crc32c function optimized for ppc architecture Mar 9, 2017

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Mar 10, 2017

@kestrels Assuming I have a ppc64le machine, how could I test this?

EDIT: clarification - what would be the simplest way to test this?

@smatzek

This comment has been minimized.

Contributor

smatzek commented Mar 10, 2017

@tchaikov @michelmno You've both pushed changes for ppc compiles in the past so you may be interested in reviewing this PR.

@tchaikov tchaikov self-requested a review Mar 10, 2017

@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 10, 2017

@smithfarm

How do I verify the expected values are coming from the ppc specific crc32c calculation?

The unit test code src/test/common/test_crc32c.cc uses ceph_crc32c, which uses whatever crc algorithm is chosen for that hardware (ie it implicitly tests that the probe is working). It compares that against the performance of the two software defined algorithms that don't use optimized assembly (ceph_crc32c_sctp and ceph_crc32c_intel_baseline).

The example output below shows a ~16X boost over the default algorithm, but it is not always that high. I've observed anywhere in the range of ~8X-16X.

cd ceph/build/bin

./unittest_crc32c

Running main() from gmock_main.cc
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from Crc32c
[ RUN ] Crc32c.Small
[ OK ] Crc32c.Small (0 ms)
[ RUN ] Crc32c.PartialWord
[ OK ] Crc32c.PartialWord (0 ms)
[ RUN ] Crc32c.Big
[ OK ] Crc32c.Big (2 ms)
[ RUN ] Crc32c.Performance
populating large buffer
calculating crc
best choice = 16230.5 MB/sec
best choice 0xffffffff = 16198 MB/sec
sctp = 901.288 MB/sec
intel baseline = 260.269 MB/sec
[ OK ] Crc32c.Performance (6901 ms)
[ RUN ] Crc32c.Range
[ OK ] Crc32c.Range (0 ms)
[ RUN ] Crc32c.RangeZero
[ RUN ] Crc32c.RangeNull
[ OK ] Crc32c.RangeNull (0 ms)
[----------] 7 tests from Crc32c (6903 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (6903 ms total)
[ PASSED ] 7 tests.

What would be the visible symptom for ceph at runtime if the crc algorithm isn't returning expected values?

The monitor processes won't even start otherwise. I've seen this from experience early on.

ceph-run ceph-mon -f -i

...
starting mon.asolostack2-mon1-va55n4kmf6rj rank 0 at 172.29.244.19:6789/0 mon_data /var/lib/ceph/mon/ceph-asolostack2-mon1-va55n4kmf6rj fsid 46e0965a-4eb3-44cd-b403-a2ab345f9a80
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
what(): buffer::malformed_input: bad crc, actual 1299425307 != expected 3335742267
*** Caught signal (Aborted) **
in thread 3fffa042cc10 thread_name:ceph-mon
ceph version 10.2.5-5992-gbe61e1e (be61e1e4cc037d8978a3141cf1175e13430b5663)
1: (()+0x5a9c94) [0x560e9c94]
2: [0x3fffa03b04d8]
3: (gsignal()+0x40) [0x3fff9f9dedb0]
4: (abort()+0x2b4) [0x3fff9f9e1314]
5: (__gnu_cxx::__verbose_terminate_handler()+0x204) [0x3fff9fd3c304]
6: (()+0xa8b34) [0x3fff9fd38b34]
7: (std::terminate()+0x20) [0x3fff9fd38bf0]
8: (__cxa_throw()+0xb4) [0x3fff9fd39034]
9: (OSDMap::decode(ceph::buffer::list::iterator&)+0xb68) [0x562101c8]
10: (OSDMap::decode(ceph::buffer::list&)+0x4c) [0x562122fc]
11: (OSDMonitor::update_from_paxos(bool*)+0xad8) [0x55e5bd08]
12: (PaxosService::refresh(bool*)+0x418) [0x55e3d9e8]
13: (Monitor::refresh_from_paxos(bool*)+0x1c0) [0x55dc5d70]
14: (Monitor::init_paxos()+0x14c) [0x55dc624c]
15: (Monitor::preinit()+0xa6c) [0x55df98fc]
16: (main()+0x2864) [0x55d66674]
17: (()+0x2309c) [0x3fff9f9c309c]
18: (__libc_start_main()+0xb8) [0x3fff9f9c3298]
2017-02-20 21:23:25.480999 3fffa042cc10 -1 *** Caught signal (Aborted) **
in thread 3fffa042cc10 thread_name:ceph-mon

How do I verify I'm using the right kind of ppc64le hardware where this code can run?

Set LD_SHOW_AUXV=1 and then run any process. This displays
the loader auxilliary vector which keeps track of which
hardware features are present. The AT_HWCAP2 field
needs to have "vcrypto". This should be expected on all power8
or later systems (ie but not on power7 or below).

LD_SHOW_AUXV=1 sleep 1

...
AT_HWCAP2: htm-nosc vcrypto tar isel ebb dscr htm arch_2_07
...

@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 10, 2017

It isn't clear to me why unmodified submodules check is now failing. The details link is a dead link.

@shinobu-x

This comment has been minimized.

Contributor

shinobu-x commented Mar 11, 2017

retest this please

@michelmno

This comment has been minimized.

Contributor

michelmno commented Mar 13, 2017

nice for me

@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 14, 2017

@tchaikov I noticed you assigned this to yourself for review late last week. Thanks very much for doing that. My goal is to hopefully get this completed in time for the upcoming luminous release.

static unsigned int crc32_align(unsigned int crc, unsigned char const *p,
unsigned long len)
{
while (len--)

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

please fix the indent, we use tab width=8. and indent width=2. see https://github.com/ceph/ceph/blob/master/CodingStyle#L75

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

Done.

unsigned int crc32_vpmsum(unsigned int crc, unsigned char *p,
unsigned long len)
*/
uint32_t ceph_crc32c_ppc_internal(uint32_t crc, unsigned char const *data, unsigned len)

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

i think we can continue using the function name of crc32_vpmsum(), and mark it static here. and the only caller of this function is in the same compilation unit.

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

As you suggested, I changed the name from __crc32_vpmsum to crc32_vpmsum.

I was not able to mark it static, since that causes compiler errors related to the loader and toc. The crc32_vpmsum function is called from src/common/crc32c_ppc.c (which becomes crc32c_ppc.o) while the function itself is defined in src/common/crc32c_ppc_asm.S (which becomes crc32c_ppc_asm.o).

This comment has been minimized.

@tchaikov

tchaikov Mar 17, 2017

Contributor

what i suggested was s/ceph_crc32c_ppc_internal/crc32_vpmsum/ and mark crc32_vpmsum() static actually.

This comment has been minimized.

@kestrels

kestrels Mar 17, 2017

Contributor

I see what you mean now. Sorry for the misunderstanding. I've corrected it in the latest version I just pushed.

#ifndef CEPH_COMMON_CRC32C_PPC_H
#define CEPH_COMMON_CRC32C_PPC_H
#include "arch/ppc-opcode.h"

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

i don't think ppc-opcode.h is needed in the C code.

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

Done.

@@ -0,0 +1,23 @@
#ifndef __OPCODES_H

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

maybe we can move this file into src/common, where crc32c_ppc_asm.S resides. and it is crc32c_ppc_asm.S who is including this header file.

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

Done.

CHECK_C_COMPILER_FLAG("-mcpu=power8" HAS_POWER8)
if(HAS_POWER8)
message(STATUS " HAS_POWER8 yes")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DHAS_POWER8")

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

might want to use add_definitions() instead. or better off, IMO, exposing this information in src/include/config-h.cmake.

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

As suggested, I used src/include/config-h.in.cmake. I renamed HAS_POWER8 to be HAVE_POWER8 to better fit with the naming convention within config-h.in.cmake. I added #include "acconfig.h" to src/common/crc32c_ppc.c.

@@ -415,6 +437,8 @@ set(libcommon_files
common/crc32c.cc
common/crc32c_intel_baseline.c
common/crc32c_intel_fast.c
common/crc32c_ppc.c
${ppc_asm_srcs}

This comment has been minimized.

@tchaikov

tchaikov Mar 15, 2017

Contributor

how about

if(HAS_POWER8)
  list(APPEND libcommon_files
    common/crc32c_ppc_asm.S)
endif()

after libcommon_files is initialized.

This comment has been minimized.

@smatzek

smatzek Mar 15, 2017

Contributor

We can do that, but were trying to follow the example of the Intel assembly ${yasm_srcs} in where and how the list is built and added in. Which way is preferred?

This comment has been minimized.

@kestrels

kestrels Mar 16, 2017

Contributor

Done. I used the list append way, as suggested.

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 15, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 17, 2017

and might want to update your commit message's title (not the one of this PR), so it looks like "crc32c: Add crc32c function optimized for ppc architecture".

@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 17, 2017

Good point. The commit message title now matches the title of the PR.

static unsigned int crc32_align(unsigned int crc, unsigned char const *p,
unsigned long len)
{
while (len--)

This comment has been minimized.

@tchaikov

tchaikov Mar 17, 2017

Contributor

the indent is still wrong. you could try

// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab

if you are using emacs or vim to see if it comply to https://github.com/ceph/ceph/blob/master/CodingStyle#L75.

This comment has been minimized.

@kestrels

kestrels Mar 17, 2017

Contributor

Done.

int ceph_arch_ppc_probe(void)
{
ceph_arch_ppc_crc32 = 0;

This comment has been minimized.

@tchaikov

tchaikov Mar 20, 2017

Contributor

indent.

This comment has been minimized.

@kestrels

kestrels Mar 20, 2017

Contributor

Done.

static unsigned int crc32_align(unsigned int crc, unsigned char const *p,
unsigned long len)
{
while (len--)

This comment has been minimized.

@tchaikov

tchaikov Mar 20, 2017

Contributor

indent.

This comment has been minimized.

@kestrels

kestrels Mar 20, 2017

Contributor

Done.

crc32c: Add crc32c function optimized for ppc architecture
Add a performance optimized crc32c function for
ppc64le that uses Altivec assembly instructions.
Add an architecture probe for ppc (PowerPC
'ppc64le' architecture).  When the architecture
probe detects that the ppc specific crc32c function
(ceph_crc32c_ppc) can be used, it will do so instead
of using the default crc32c function (ceph_crc32c_sctp).

Signed-off-by: Andrew Solomon <asolomon@us.ibm.com>
@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 22, 2017

I've pushed another revision that should address all your current review comments.

@tchaikov tchaikov added the needs-qa label Mar 24, 2017

@kestrels

This comment has been minimized.

Contributor

kestrels commented Mar 30, 2017

@tchaikov I just wanted to double check that my understanding is correct and that you're not waiting on me to do anything in order for this to get merged.

The next step would be for either you, Sage, or Yuri to run it through the QA suite (ie teuthology / pulpito / etc). That seems to be the pattern just looking at the following link:

https://github.com/ceph/ceph/pulls?q=label%3Aneeds-qa+is%3Aclosed

Is that correct?

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Mar 31, 2017

@kestrels yes, i will pull it into my test branch.

@tchaikov tchaikov self-assigned this Mar 31, 2017

@tchaikov

This comment has been minimized.

@tchaikov tchaikov merged commit d65ee4b into ceph:master Apr 1, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment