[Deepin-Kernel-SIG] [linux 6.18-y] [Fromlist] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation#1619
Conversation
Reviewer's Guide

Adds an ARM64 NEON/PMULL-accelerated CRC64-NVMe implementation and wires it into the generic CRC64 architecture layer, with build flags and a chunked SIMD dispatch path that falls back to the generic implementation for short buffers or when PMULL/SIMD is unavailable.

Sequence diagram for crc64_nvme_arch dispatch and fallback:

```mermaid
sequenceDiagram
    participant Caller
    participant crc64_nvme_arch
    participant cpu_have_named_feature
    participant may_use_simd
    participant scoped_ksimd
    participant crc64_nvme_arm64_c
    participant crc64_nvme_generic
    Caller->>crc64_nvme_arch: crc64_nvme_arch(crc, p, len)
    alt len >= 128
        crc64_nvme_arch->>cpu_have_named_feature: cpu_have_named_feature(PMULL)
        cpu_have_named_feature-->>crc64_nvme_arch: has_pmull
        crc64_nvme_arch->>may_use_simd: may_use_simd()
        may_use_simd-->>crc64_nvme_arch: simd_allowed
        alt has_pmull and simd_allowed
            loop while len >= 128
                crc64_nvme_arch->>crc64_nvme_arch: chunk = min(len & ~15, 4KB)
                crc64_nvme_arch->>scoped_ksimd: enter ksimd section
                scoped_ksimd->>crc64_nvme_arm64_c: crc64_nvme_arm64_c(crc, p, chunk)
                crc64_nvme_arm64_c-->>scoped_ksimd: updated_crc
                scoped_ksimd-->>crc64_nvme_arch: leave ksimd section
                crc64_nvme_arch->>crc64_nvme_arch: crc = updated_crc, p += chunk, len -= chunk
            end
        end
    end
    crc64_nvme_arch->>crc64_nvme_generic: crc64_nvme_generic(crc, p, len)
    crc64_nvme_generic-->>crc64_nvme_arch: final_crc
    crc64_nvme_arch-->>Caller: final_crc
```
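The chunking arithmetic in the dispatch loop above can be modeled in plain C. This is a userspace sketch, not the patch itself: `next_chunk`/`simd_bytes` are names I made up, `SZ_4K` is expanded to its value, and the SIMD call is replaced by a byte counter — only the chunk-size logic mirrors the diagram.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_4K 4096

/* Each SIMD section handles at most 4 KiB, rounded down to a
 * 16-byte multiple — i.e. min_t(size_t, len & ~15, SZ_4K). */
static size_t next_chunk(size_t len)
{
    size_t chunk = len & ~(size_t)15;     /* 16-byte-aligned length */
    return chunk < SZ_4K ? chunk : SZ_4K;
}

/* Total bytes the PMULL path would consume; the remainder (always
 * under 128 bytes) is left for the generic fallback. */
static size_t simd_bytes(size_t len)
{
    size_t total = 0;
    while (len >= 128) {                  /* PMULL break-even threshold */
        size_t chunk = next_chunk(len);
        total += chunk;                   /* would run inside scoped_ksimd() */
        len -= chunk;
    }
    return total;
}
```

Capping each chunk at 4 KiB bounds the time spent in any one SIMD critical section on large buffers, which is the preemption-latency point the patch description makes.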
Class diagram for CRC64 ARM64 NEON implementation and dispatch:

```mermaid
classDiagram
    class Crc64Arm64 {
        +u64 crc64_nvme_arm64_c(u64 crc, const u8 *p, size_t len)
        -u64 fold_consts_val[2]
        -u64 bconsts_val[2]
    }
    class Crc64ArchLayer {
        +u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
        +u64 crc64_nvme_generic(u64 crc, const u8 *p, size_t len)
        +u64 crc64_be_arch(u64 crc, const u8 *p, size_t len)
        +u64 crc64_be_generic(u64 crc, const u8 *p, size_t len)
    }
    class CpuFeature {
        +bool cpu_have_named_feature(int feature)
        +const int PMULL
    }
    class SimdSubsystem {
        +bool may_use_simd()
        +scoped_ksimd scoped_ksimd()
    }
    class BuildConfig {
        +CONFIG_CRC64
        +CONFIG_CRC64_ARCH
        +CONFIG_ARM64
    }
    class Objects {
        +crc64_main_o
        +arm64_crc64_neon_inner_o
        +riscv_crc64_lsb_o
        +riscv_crc64_msb_o
        +x86_crc64_pclmul_o
    }
    Crc64ArchLayer --> Crc64Arm64 : uses
    Crc64ArchLayer --> CpuFeature : checks_features
    Crc64ArchLayer --> SimdSubsystem : manages_simd_context
    BuildConfig --> Objects : selects
    Objects --> Crc64Arm64 : links_arm64_neon_path
    Objects --> Crc64ArchLayer : links_common_crc64
```
Hey - I've found 1 issue, and left some high level feedback:
- Consider adding an include guard to lib/crc/arm64/crc64.h to avoid accidental multiple inclusion as this header grows or is reused elsewhere.
- crc64_nvme_arm64_c is only used from the architecture-specific path; making it file-local (static) and exposing only the inline wrapper in crc64.h would better encapsulate the NEON implementation detail and reduce the chance of unintended external use.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider adding an include guard to lib/crc/arm64/crc64.h to avoid accidental multiple inclusion as this header grows or is reused elsewhere.
- crc64_nvme_arm64_c is only used from the architecture-specific path; making it file-local (static) and exposing only the inline wrapper in crc64.h would better encapsulate the NEON implementation detail and reduce the chance of unintended external use.
## Individual Comments
### Comment 1
<location path="lib/crc/arm64/crc64.h" line_range="1" />
<code_context>
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * CRC64 using ARM64 PMULL instructions
</code_context>
<issue_to_address>
**nitpick (bug_risk):** Consider adding a traditional include guard to the new header.
In this codebase we usually prefer `#ifndef`/`#define` guards over `#pragma once`, and they make multiple inclusion behavior explicit and easier to reason about. Please add a guard consistent with nearby headers.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
```diff
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
```
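The guard the reviewers are asking for would wrap the header body as below. This is a userspace illustration, not the actual patch: the guard macro name `_LIB_CRC_ARM64_CRC64_H` is assumed, a variable stands in for the real declarations, and the "second inclusion" is spelled out textually so the mechanism is visible.

```c
#include <assert.h>

/* First "inclusion" of the guarded region: the macro is not yet
 * defined, so the body is compiled. */
#ifndef _LIB_CRC_ARM64_CRC64_H
#define _LIB_CRC_ARM64_CRC64_H

static int crc64_hdr_definitions = 1; /* stands in for the declarations */

#endif /* _LIB_CRC_ARM64_CRC64_H */

/* Second "inclusion": the guard macro is now defined, so the
 * preprocessor drops the body and nothing is redefined. */
#ifndef _LIB_CRC_ARM64_CRC64_H
static int crc64_hdr_definitions = 2; /* never reached */
#endif /* _LIB_CRC_ARM64_CRC64_H */
```

The `#endif` comment repeating the guard name follows the convention of nearby kernel headers.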
Pull request overview
Adds an ARM64 NVMe CRC64 implementation accelerated with NEON/PMULL to improve throughput vs the generic shift/XOR path, and wires it into the existing CRC64 arch-dispatch mechanism.
Changes:
- Enables `CRC64_ARCH` by default on ARM64 and adds an ARM64-specific dispatch header for `crc64_nvme()`.
- Introduces a NEON/PMULL-based CRC64-NVMe inner implementation (`crc64_nvme_arm64_c()`), with chunking to cap SIMD critical sections.
- Updates `lib/crc/Makefile` to build the new ARM64 object with appropriate compiler flags.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/crc/Makefile | Adds ARM64 CRC64-NVMe NEON inner object and per-file compile flags. |
| lib/crc/Kconfig | Defaults CRC64_ARCH to enabled on ARM64. |
| lib/crc/arm64/crc64.h | Adds ARM64 crc64_nvme_arch() dispatch with PMULL + SIMD gating and chunking. |
| lib/crc/arm64/crc64-neon-inner.c | Implements the PMULL-accelerated CRC64-NVMe update routine using NEON intrinsics. |
```c
	if (len >= 128 && cpu_have_named_feature(PMULL) &&
	    likely(may_use_simd())) {
		do {
			size_t chunk = min_t(size_t, len & ~15, SZ_4K);

			scoped_ksimd()
				crc = crc64_nvme_arm64_c(crc, p, chunk);
```
scoped_ksimd() does not exist anywhere in this tree (the arm64 SIMD users use kernel_neon_begin() / kernel_neon_end() after may_use_simd()). As written, this will fail to build. Replace this with a kernel_neon_begin()/kernel_neon_end() pair around the accelerated call (and include <asm/neon.h>), keeping the 4KB chunking to bound preempt-disable time.
```make
CFLAGS_REMOVE_arm64/crc64-neon-inner.o += -mgeneral-regs-only
CFLAGS_arm64/crc64-neon-inner.o += -ffreestanding -march=armv8-a+crypto
CFLAGS_arm64/crc64-neon-inner.o += -isystem $(shell $(CC) -print-file-name=include)
```
The per-object flags here hardcode -ffreestanding and -isystem $(CC ... -print-file-name=include), and use -march=armv8-a+crypto. For consistency with existing arm64 NEON/crypto intrinsics build rules, prefer using the arch-exported CC_FLAGS_FPU/CC_FLAGS_NO_FPU (see arch/arm64/Makefile:36-42) and consider -mcpu=generic+crypto (as done for aegis128-neon-inner.o in crypto/Makefile:110-123) instead of a raw -march override.
```diff
-CFLAGS_REMOVE_arm64/crc64-neon-inner.o += -mgeneral-regs-only
-CFLAGS_arm64/crc64-neon-inner.o += -ffreestanding -march=armv8-a+crypto
-CFLAGS_arm64/crc64-neon-inner.o += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_REMOVE_arm64/crc64-neon-inner.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_arm64/crc64-neon-inner.o += $(CC_FLAGS_FPU) -mcpu=generic+crypto
```
@opsiff That said, backporting it does seem necessary, as it can significantly boost NVMe I/O performance in certain scenarios. I just couldn't come up with an elegant solution within half a minute, so I couldn't be bothered to deal with it.
[Upstream commit 814f541]

Implement the ksimd scoped guard API so that it can be used by code that supports both ARM and arm64.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
[Upstream commit c5b91a1]

Encapsulate kernel_neon_begin() and kernel_neon_end() using a 'ksimd' cleanup guard. This hides the prototype of those functions, allowing them to be changed for arm64 but not ARM, without breaking code that is shared between those architectures (RAID6, AEGIS-128).

It probably makes sense to expose this API more widely across architectures, as it affords more flexibility to the arch code to plumb it in, while imposing more rigid rules regarding the start/end bookends appearing in matched pairs.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
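The cleanup-guard idea in this commit can be sketched in plain C with the compiler's `cleanup` attribute. This is a userspace model, not the kernel's actual macro: the stubs and counter stand in for kernel_neon_begin()/kernel_neon_end() so the enforced pairing is observable.

```c
static int neon_depth; /* tracks begin/end nesting in this model */

static void kernel_neon_begin(void) { neon_depth++; } /* stub */
static void kernel_neon_end(void)   { neon_depth--; } /* stub */

static void ksimd_cleanup(int *unused)
{
    (void)unused;
    kernel_neon_end(); /* the "end" bookend, emitted automatically */
}

/* scoped_ksimd(): runs the statement that follows between a matched
 * begin/end pair; the end half fires on ANY scope exit, including an
 * early return, which is what makes mismatched bookends impossible. */
#define scoped_ksimd()                                          \
    for (int _guard __attribute__((cleanup(ksimd_cleanup)))     \
             = (kernel_neon_begin(), 0);                        \
         !_guard; _guard = 1)

static int depth_inside_section(void)
{
    scoped_ksimd()
        return neon_depth; /* cleanup restores depth after the return */
    return -1; /* not reached */
}
```

The hidden-prototype benefit follows directly: callers only ever see the macro, so the begin/end signatures can change per architecture without touching shared code.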
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.

The acceleration is implemented using C intrinsics (<arm_neon.h>) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:
- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling.
- Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128.

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)

Signed-off-by: Demian Shulhan <demyansh@gmail.com>
Link: https://lore.kernel.org/all/20260329074338.1053550-1-demyansh@gmail.com/
Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn>
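For readers unfamiliar with folding: the "fold constants" mentioned above come from the standard carryless-folding identity. Viewing the buffer as a polynomial over GF(2) and splitting off a 128-bit block $A$ that sits $n$ bits above the remaining data $B$:

$$(A \cdot x^{n} + B) \bmod P \;=\; \big(A \cdot (x^{n} \bmod P) + B\big) \bmod P$$

so each iteration multiplies the accumulated state by the precomputed constant $x^{n} \bmod P$ with a PMULL carryless multiply and XORs the product into the next block of data, deferring the expensive final reduction modulo $P$ (Barrett reduction) until the very end of the buffer.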
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: needs approval from an approver in each of these files.
Approvers can indicate their approval by writing …
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.

The acceleration is implemented using C intrinsics (<arm_neon.h>) rather than raw assembly for better readability and maintainability.

Key highlights of this implementation:
- 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers.
- Fold constants pre-calculated and loaded via vld1q_u64() to minimize register spilling.
- The PMULL path is enabled only for len >= 128, the measured break-even point against the generic implementation.

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)

Link: https://lore.kernel.org/all/20260329074338.1053550-1-demyansh@gmail.com/
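As a cross-check, the generic shift-and-XOR path this patch accelerates can be written in a few lines of portable C. This is a sketch, not the kernel's exact code: `crc64_nvme_bitwise` is my name, 0x9a6c9329ac4bc9b5 is the bit-reflected NVMe polynomial, and (matching my reading of the lib/crc convention) the ~0 initial value and final inversion are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC64-NVMe update, reflected form. One shift plus a
 * conditional XOR per message *bit* -- the per-byte cost that makes
 * the 16-bytes-per-PMULL folding path roughly 6x faster. */
static uint64_t crc64_nvme_bitwise(uint64_t crc, const uint8_t *p, size_t len)
{
    const uint64_t poly = 0x9a6c9329ac4bc9b5ULL; /* reflected NVMe poly */

    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
    }
    return crc;
}
```

Because the update is incremental, computing a buffer in two calls must match one call over the whole buffer — the same property the arch dispatch relies on when it splits work between the PMULL chunks and the generic tail.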
Summary by Sourcery

Add an ARM64 NEON/PMULL-accelerated CRC64-NVMe implementation and wire it into the generic CRC64 architecture-specific path for capable CPUs.

New Features:
- NEON/PMULL-accelerated CRC64-NVMe inner routine (crc64_nvme_arm64_c()) with a chunked crc64_nvme_arch() dispatch path that falls back to the generic implementation for short buffers or when PMULL/SIMD is unavailable.

Build:
- Default CRC64_ARCH to enabled on ARM64 and build the new NEON inner object with the required per-file compiler flags.