Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize crc32 & crc32c on NVIDIA Grace #2204

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

krenzland
Copy link

@krenzland krenzland commented May 16, 2024

This pull request adds hardware accelerated routines for CRC32 and CRC32C for Arm AARCH64 CPUs. The changes here have been tested on NVIDIA Grace.
In detail, it contains routines for:

  • Computing CRC32 and CRC32C hashes on dataset using the CRC intrinsics. On Grace/Neoverse V2, this can process 8 bytes/cycle.
  • A vectorized implementation of the gf_multiply_crc32c_hw and gf_multiply_crc32_hw functions used in routines to merge partial CRC checksums. These functions are more or less a 1:1 translation of the x86 vectorized routines.
  • I've introduced feature flags for AES, and SHA extensions for Arm CPUs. The feature checks for the vectorized functions are a bit more messy than on x86 because CPUs can implement a subset of these extensions.

This should resolve issue #2027.

@krenzland krenzland changed the title Optimize crc32 & crc32c on Nvidia Grace Optimize crc32 & crc32c on NVIDIA Grace May 16, 2024
@facebook-github-bot
Copy link
Contributor

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -77,6 +77,10 @@ uint32_t crc32_hw(
}

bool crc32c_hw_supported() {
return crc32_hw_supported_sse42();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a typo, missing the c after crc32.

Correct name should be crc32c_hw_supported_sse42

@@ -105,7 +111,35 @@ static uint32_t gf_multiply_crc32_hw(uint64_t crc1, uint64_t crc2, uint32_t) {
return _mm_cvtsi128_si32(_mm_srli_si128(_mm_xor_si128(res3, res1), 4));
}

#else
#elif FOLLY_NEON && FOLLY_ARM_FEATURE_CRC32 && FOLLY_ARM_FEATURE_AES && FOLLY_ARM_FEATURE_SHA2
Copy link
Contributor

@meteorfox meteorfox May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long.
Can you resubmit after a clang-format?

Sorry, we should have this automated as part of a git hook or something, not sure why we don't have it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Can I suggest to add clang-format to https://github.com/facebook/folly/blob/main/CONTRIBUTING.md?

- Fix typo
- Clang format all changed files
@krenzland
Copy link
Author

Thanks for the review! I forgot to add that this should be compiled with the flags
python3 build/fbcode_builder/getdeps.py --allow-system-packages build --extra-cmake-defines '{"CMAKE_CXX_FLAGS": "-march=armv8.5-a+crc+crypto"}'
or similar (+crypto could be replaced by +aes+sha2?) to enable all required features.

@facebook-github-bot
Copy link
Contributor

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants