@onethumb onethumb commented Oct 30, 2025

The Problem

By relying heavily on compile-time rather than runtime feature detection, the library was fragile (leading to bugs like #14) and could not degrade gracefully across CPU architecture variants from a single build.

The Solution

Rely on runtime feature detection (out of the hot path) to determine which hardware acceleration target to use, enabling graceful degradation across CPU types with a single build, and minimizing the risk of a SIGILL or similar bug sneaking in.
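
To make the idea concrete, here is a minimal sketch of runtime detection performed once and cached, so the per-call path is only a function-pointer call. It is not the code from this PR: `UpdateFn`, `select_backend`, `BACKEND`, and `crc32c` are hypothetical names, and the only accelerated path shown is the SSE4.2 `crc32` instruction on x86_64, chosen for brevity.

```rust
use std::sync::OnceLock;

/// Hypothetical signature shared by every backend in this sketch.
type UpdateFn = fn(u32, &[u8]) -> u32;

/// Portable bitwise CRC-32/ISCSI (Castagnoli) update, used as the fallback.
fn update_software(mut crc: u32, data: &[u8]) -> u32 {
    for &b in data {
        crc ^= u32::from(b);
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    crc
}

/// Same update using the SSE4.2 `crc32` instruction, one byte at a time.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn update_sse42_impl(mut crc: u32, data: &[u8]) -> u32 {
    use std::arch::x86_64::_mm_crc32_u8;
    for &b in data {
        crc = _mm_crc32_u8(crc, b);
    }
    crc
}

#[cfg(target_arch = "x86_64")]
fn update_sse42(crc: u32, data: &[u8]) -> u32 {
    // SAFETY: `select_backend` only returns this function after SSE4.2
    // has been detected at runtime.
    unsafe { update_sse42_impl(crc, data) }
}

/// Runs the (comparatively slow) feature checks and picks a backend.
fn select_backend() -> UpdateFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse4.2") {
            return update_sse42;
        }
    }
    update_software
}

/// Detection result, computed on first use and reused afterwards.
static BACKEND: OnceLock<UpdateFn> = OnceLock::new();

pub fn crc32c(data: &[u8]) -> u32 {
    let update = *BACKEND.get_or_init(select_backend);
    !update(!0, data)
}
```

Because the accelerated function is only reachable through the runtime check, a binary built this way falls back to the portable path on a CPU without the instruction instead of raising SIGILL.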

As a side benefit, AWS Graviton targets are ~36% faster and peak at ~53GiB/s (Graviton4).

Changes

  • Adds a feature detection mechanism at instantiation time to determine which acceleration target is ideal.
  • Removes much of the compile-time detection.
  • Enables a single binary build to gracefully degrade for older CPUs in the same family.
  • Improves Graviton4 performance by ~36% to ~53GiB/s.
  • Improves file structure and directory layout to isolate each acceleration target further.
  • Adds a benchmarking flag to the checksum utility to enable easier benchmarking using a single binary, rather than a source checkout, across platforms.
  • Updates cargo packages to latest supported.
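
For readers who want to reproduce throughput figures in the same unit as the GiB/s numbers quoted above, the following is a small sketch of the arithmetic. It is not the benchmarking flag added to the checksum utility; `gib_per_sec` and `measure` are hypothetical helpers, and a serious benchmark would also need warm-up runs and repeated samples.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Converts a byte count and an elapsed time into GiB/s (1 GiB = 2^30 bytes).
fn gib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / (1u64 << 30) as f64) / elapsed.as_secs_f64()
}

/// Times `iterations` calls of `checksum` over `data` and reports GiB/s.
fn measure<F: FnMut(&[u8]) -> u32>(mut checksum: F, data: &[u8], iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` keeps the optimizer from eliding the work being timed.
        black_box(checksum(black_box(data)));
    }
    gib_per_sec(data.len() as u64 * u64::from(iterations), start.elapsed())
}
```

Passing any checksum closure to `measure`, with a buffer large enough to dominate the timing overhead, gives a rough single-threaded figure for the machine it runs on.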

Planned version bump

  • Which: MINOR
  • Why: non-breaking new functionality

Links

Uses more runtime feature detection, rather than compile-time feature
detection, for improved reliability, maintainability, graceful
degradation across CPU families, and performance.

Should help minimize bugs such as
awesomized#14 in the future.
@onethumb onethumb requested a review from Copilot October 30, 2025 19:55
Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive x86 architecture support (32-bit) and reorganizes CRC-32 fusion implementations to improve code maintainability across multiple architectures (x86, x86_64, and aarch64). The main changes involve:

  • Adding 32-bit x86 support for CRC calculations with native CRC32C and PCLMULQDQ instructions
  • Restructuring the codebase to separate architecture-specific implementations into dedicated modules
  • Adding extensive integration tests for benchmark functionality
  • Implementing a feature detection system for optimal hardware acceleration selection
  • Adding future-proof tests for CRC key storage functionality
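
The selection step mentioned above can be pictured as a pure function from detected capabilities to a tier. The sketch below is an illustration only: `Tier`, `Caps`, and `select_tier` are hypothetical names, and the exact feature combination required for each tier is an assumption rather than what `src/feature_detection.rs` does. The tier names mirror the x86 variants listed in this review (SSE, AVX512 PCLMULQDQ, VPCLMULQDQ).

```rust
/// Hypothetical acceleration tiers, ordered from slowest to fastest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Software,
    SsePclmulqdq,
    Avx512Pclmulqdq,
    Avx512Vpclmulqdq,
}

/// Plain capability flags. In a real build these would be filled in once
/// from runtime detection; keeping them as data makes the selection logic
/// testable on any machine.
#[derive(Debug, Clone, Copy, Default)]
pub struct Caps {
    pub sse4_1: bool,
    pub pclmulqdq: bool,
    pub avx512f: bool,
    pub avx512vl: bool,
    pub vpclmulqdq: bool,
}

/// Picks the fastest tier whose requirements are all met, otherwise
/// degrades to the next one down. The requirements are assumptions.
pub fn select_tier(c: Caps) -> Tier {
    let sse = c.sse4_1 && c.pclmulqdq;
    let avx512 = sse && c.avx512f && c.avx512vl;
    if avx512 && c.vpclmulqdq {
        Tier::Avx512Vpclmulqdq
    } else if avx512 {
        Tier::Avx512Pclmulqdq
    } else if sse {
        Tier::SsePclmulqdq
    } else {
        Tier::Software
    }
}
```

Each tier requires everything the tier below it requires, so a CPU that lacks one feature lands on the next tier down rather than on an unsupported instruction.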

Reviewed Changes

Copilot reviewed 38 out of 44 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/checksum_integration_tests.rs | New integration tests for benchmark flag parsing and various input scenarios |
| src/test/mod.rs | Renamed module reference from future_proof to future_proof_tests |
| src/test/future_proof_tests.rs | Extensive new tests for CRC key storage bounds checking and backwards compatibility |
| src/lib.rs | Added x86 support to conditional compilation, new feature_detection module, enhanced documentation |
| src/feature_detection.rs | New comprehensive feature detection system with performance tier selection |
| src/crc32/mod.rs | Extended fusion support to include x86 (32-bit) architecture |
| src/crc32/fusion/mod.rs | Simplified architecture dispatching with cleaner conditional compilation |
| src/crc32/fusion/x86/mod.rs | New x86-specific CRC implementation with AVX512 and SSE support |
| src/crc32/fusion/x86/iscsi/*.rs | Multiple new files implementing SSE, AVX512 PCLMULQDQ, and VPCLMULQDQ variants |
| src/crc32/fusion/aarch64/mod.rs | Reorganized aarch64 implementation into separate sub-modules |
| src/crc32/fusion/aarch64/iscsi/*.rs | Split iSCSI implementations into PMULL and PMULL+SHA3 variants |
| src/crc32/fusion/aarch64/iso_hdlc/*.rs | Split ISO-HDLC implementations into PMULL and PMULL+SHA3 variants |

@onethumb onethumb merged commit d9b8291 into awesomized:main Oct 30, 2025
76 checks passed
onethumb added a commit that referenced this pull request Oct 30, 2025
* [Improve runtime feature detection (and performance)](#21)
* [remove libc](#20)
* [Enable generating and publishing binary packages](#22)