Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: Leverage a better CRC32C implementation #57271

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tj90241
Copy link

@tj90241 tj90241 commented May 5, 2024

ISA-L provides a few different CRC32C implementations, of
which Ceph has only ever linked against one
(crc32_iscsi_00).

The second implementation of CRC32C provided by ISA-L
(crc32_iscsi_01) improves upon the first as it is used by
Ceph in a couple of ways:

  1. crc32_iscsi_01 explicitly handles and checks for < 8
    byte buffers and computes the CRC32C value using the
    hardware-accelerated CRC32 instruction. In comparison,
    crc32_iscsi_00 prefetches too far in cases of small
    buffers, requiring the Ceph code to explicitly check
    and handle this case differently in software. This
    software-fallback implementation of CRC32 also comes
    with a different set of LUTs (look up tables) and is
    less efficient as it does not make use of the CRC32
    instruction.

  2. crc32_iscsi_00 makes use of large LUTs (look up
    tables) to effectively perform the modular reduction
    required to produce the CRC32C value. In constrast,
    crc32_iscsi_01 uses the PCLMUL instruction set to
    perform reductions 128-bits at a time with smaller
    LUTs, resulting in greater throughput and less data
    cache pollution.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@tj90241
Copy link
Author

tj90241 commented May 5, 2024

cc: @markhpc @Matt1360

ISA-L provides a few different CRC32C implementations, of
which Ceph has only ever linked against one
(crc32_iscsi_00).

The second implementation of CRC32C provided by ISA-L
(crc32_iscsi_01) improves upon the first as it is used by
Ceph in a couple of ways:

  1) crc32_iscsi_01 explicitly handles and checks for < 8
     byte buffers and computes the CRC32C value using the
     hardware-accelerated CRC32 instruction. In comparison,
     crc32_iscsi_00 prefetches too far in cases of small
     buffers, requiring the Ceph code to explicitly check
     and handle this case differently in software. This
     software-fallback implementation of CRC32 also comes
     with a different set of LUTs (look up tables) and is
     less efficient as it does not make use of the CRC32
     instruction.

  2) crc32_iscsi_00 makes use of large LUTs (look up
     tables) to effectively perform the modular reduction
     required to produce the CRC32C value. In constrast,
     crc32_iscsi_01 uses the PCLMUL instruction set to
     perform reductions 128-bits at a time with smaller
     LUTs, resulting in greater throughput and less data
     cache pollution.

Fixes: https://tracker.ceph.com/issues/65791
Signed-off-by: Tyler Stachecki <tstachecki@bloomberg.net>
@tj90241
Copy link
Author

tj90241 commented May 5, 2024

Aside -- I'm not sure the check for overread in crc32_iscsi_00 applies any more either. An ISA-L submodule bump was done here:

commit 441aa3672fe886aed07e08ab944be68fda784d15
Author: Hang Li <lihang48@huawei.com>
Date:   Tue Mar 10 09:37:45 2020 +0800

    Bump the version of isa-l to v2.29.0

At which point this change was swept up: intel/isa-l@105eeb9

Copy link
Contributor

@Matt1360 Matt1360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@tj90241
Copy link
Author

tj90241 commented May 5, 2024

jenkins retest this please

@markhpc markhpc requested a review from a team May 7, 2024 02:25
@markhpc markhpc self-requested a review May 14, 2024 13:56
Copy link
Member

@markhpc markhpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I don't know if there are any corner cases we should watch out for. Probably good to get this in testing early in the release cycle.

@athanatos
Copy link
Contributor

@tj90241 Does this get executed in a normal suite run? If so, lgtm.

@tj90241
Copy link
Author

tj90241 commented May 15, 2024

Hi @athanatos - any kind of cluster or use case will make extremely heavy use of this change as long as the hosts under QA have PCLMUL support (check grep -I pclmul /proc/cpuinfo).. CRC32, unless disabled in ceph.conf, is heavily used by the msgr component by default and typically CRC32C is one of the hottest symbols in a execution profile of Ceph OSDs.

@athanatos
Copy link
Contributor

[sjust@smithi105 ~]$ grep -I pclmul /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti intel_ppin tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi

I see pclmulqdq in cpuinfo -- good enough?

@tj90241
Copy link
Author

tj90241 commented May 15, 2024

Technically you also need sse4_2 (it was a requirement for the previous accelerated CRC32C function as well), but you have that as well. With the above, I can say with certainty that any kind of Ceph cluster which has CRC enabled (by default) will be using this new function heavily when its doing I/O.

If you wanted to be extra pedantic, you can also validate the symbol is hit at runtime with e.g. sudo perf top -p $( pidof ceph-osd | tr ' ' '\n' | head -n1 ) - you will likely see crc32_iscsi_01 somewhere at the top of the hit list when the cluster is doing lots of I/O. Without this commit, you should see crc32_iscsi_00 in its place.

(you could also set a breakpoint in gdb on those symbols)

@athanatos
Copy link
Contributor

@tj90241 I just want to ensure that we're getting test coverage with our normal integration tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants