Contributor

@Yingqiao-Kong Yingqiao-Kong commented Mar 27, 2025

bugzilla: https://bugzilla.openanolis.cn/show_bug.cgi?id=13056

Reference:
https://gitee.com/anolis/cloud-kernel/pulls/4406/commits
https://gitee.com/openeuler/kernel/pulls/3744

Currently, the kernel grants a qspinlock to waiters in the order in which they requested it. It does not account for the performance cost of non-uniform memory access in NUMA scenarios.

This patch set derives from the compact NUMA-aware (CNA) lock and takes NUMA nodes into account when handing over a qspinlock. Please refer to the latest version posted to the community: https://lore.kernel.org/linux-arm-kernel/20210514200743.3026725-1-alex.kogan@oracle.com

With 50 to 300 threads running the MySQL OLTP read_write workload on Hygon platforms, qspinlock contention is heavy, and this patch set improves TPS by 10% to 15%. For multi-process workloads on tmpfs on Hygon platforms, qspinlock contention is also heavy, and this patch set improves performance by 7%.

Summary by Sourcery

Add NUMA-awareness to qspinlock to improve performance in multi-threaded and multi-process scenarios by optimizing spinlock contention across NUMA nodes

New Features:

  • Implement a NUMA-aware spinlock mechanism that organizes waiting threads into primary and secondary queues based on their NUMA node

Enhancements:

  • Modify qspinlock implementation to consider NUMA node locality when managing lock contention
  • Add configurable threshold for switching NUMA node preference during spinlock waiting

Chores:

  • Update MCS spinlock and qspinlock header files to support NUMA-aware locking
  • Add configuration option to enable/disable NUMA-aware spinlocks

sakogan and others added 7 commits March 27, 2025 18:28
…eneric

cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-2-alex.kogan@oracle.com/

The mcs unlock macro (arch_mcs_lock_handoff) should accept the value to be
stored into the lock argument as another argument. This allows using the
same macro in cases where the value to be stored when passing the lock is
different from 1.
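
To illustrate the idea, here is a minimal user-space sketch of a handoff that takes the stored value as an argument. It uses C11 atomics rather than the kernel's primitives, and `struct mcs_node`, `mcs_lock_handoff()` and `mcs_spin_wait()` are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-in for an MCS queue node (not the kernel's struct). */
struct mcs_node {
    struct mcs_node *next;
    atomic_uint locked;   /* 0 = still waiting, non-zero = lock handed over */
};

/* Hand the lock to a waiter by publishing `val` with release semantics. */
static inline void mcs_lock_handoff(struct mcs_node *waiter, unsigned int val)
{
    atomic_store_explicit(&waiter->locked, val, memory_order_release);
}

/* Spin until a non-zero value is handed over, then return that value. */
static inline unsigned int mcs_spin_wait(struct mcs_node *self)
{
    unsigned int val;

    while ((val = atomic_load_explicit(&self->locked,
                                       memory_order_acquire)) == 0)
        ;   /* a kernel implementation would relax the CPU here */
    return val;
}
```

Because the waiter receives whatever value was stored, the handoff can carry extra information (any non-zero value) instead of a fixed 1.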

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Move some of the code manipulating the spin lock into separate functions.
This would allow easier integration of alternative ways to manipulate
that lock.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

In CNA, spinning threads are organized in two queues, a primary queue for
threads running on the same node as the current lock holder, and a
secondary queue for threads running on other nodes. After acquiring the
MCS lock and before acquiring the spinlock, the MCS lock
holder checks whether the next waiter in the primary queue (if one exists) is
running on the same NUMA node. If it is not, that waiter is detached from
the main queue and moved into the tail of the secondary queue. This way,
we gradually filter the primary queue, leaving only waiters running on
the same preferred NUMA node. For more details, see
https://arxiv.org/abs/1810.05600.
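
The filtering step described above can be sketched as a single-threaded model. This is only an illustration under simplifying assumptions: `struct waiter`, `struct cna_queues` and `cna_filter_next()` are hypothetical names, the queues are plain linked lists, and `primary_head` stands for the waiter that follows the lock holder. The real slowpath works on per-CPU MCS nodes with atomic operations.

```c
#include <stddef.h>

/* Hypothetical waiter record: only the NUMA node and the queue link. */
struct waiter {
    int numa_node;
    struct waiter *next;
};

/* Hypothetical container for the two queues. */
struct cna_queues {
    struct waiter *primary_head;    /* next waiter after the lock holder */
    struct waiter *secondary_head;  /* waiters from other nodes */
    struct waiter *secondary_tail;
};

/*
 * If the next waiter in the primary queue runs on a different node than
 * the lock holder, detach it and append it to the tail of the secondary
 * queue. Returns 1 if a waiter was moved, 0 otherwise.
 */
static int cna_filter_next(struct cna_queues *q, int holder_node)
{
    struct waiter *w = q->primary_head;

    if (w == NULL || w->numa_node == holder_node)
        return 0;

    q->primary_head = w->next;      /* detach from the primary queue */
    w->next = NULL;
    if (q->secondary_tail != NULL)
        q->secondary_tail->next = w;
    else
        q->secondary_head = w;
    q->secondary_tail = w;          /* w is the new secondary tail */
    return 1;
}
```

Calling the function once per lock handoff gradually leaves only same-node waiters at the front of the primary queue, while the secondary queue keeps the arrival order of the waiters that were moved.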

Note that this variant of CNA may introduce starvation by continuously
passing the lock between waiters in the main queue. This issue will be
addressed later in the series.

Enabling CNA is controlled via a new configuration option
(NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at
boot time only if we run on a multi-node machine in a native environment and
the new config is enabled. (For the time being, the patching requires
CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this should be
resolved once static_call() is available.) This default behavior can be
overridden with the new kernel boot command-line option
"numa_spinlock=on/off" (default is "auto").
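
The boot-time decision described in this commit message could look roughly like the sketch below. The enum, the function names and the `nr_nodes` and `native` parameters are assumptions made for illustration. Only the option values "auto", "on" and "off" and the multi-node, native-environment condition come from the text above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the three option values. */
enum numa_spinlock_mode {
    NUMA_SPINLOCK_OFF  = -1,
    NUMA_SPINLOCK_AUTO = 0,
    NUMA_SPINLOCK_ON   = 1,
};

/* Parse the value of "numa_spinlock="; returns false on unknown input. */
static bool parse_numa_spinlock(const char *str, enum numa_spinlock_mode *mode)
{
    if (strcmp(str, "auto") == 0)
        *mode = NUMA_SPINLOCK_AUTO;
    else if (strcmp(str, "on") == 0)
        *mode = NUMA_SPINLOCK_ON;
    else if (strcmp(str, "off") == 0)
        *mode = NUMA_SPINLOCK_OFF;
    else
        return false;   /* leave *mode untouched */
    return true;
}

/* "auto" enables CNA only on a multi-node machine in a native environment. */
static bool cna_should_enable(enum numa_spinlock_mode mode,
                              int nr_nodes, bool native)
{
    if (mode == NUMA_SPINLOCK_OFF)
        return false;
    if (mode == NUMA_SPINLOCK_ON)
        return true;
    return nr_nodes > 1 && native;
}
```

Note that the last commit in this pull request changes the default from "auto" to "off", so the "auto" branch only applies when it is requested explicitly.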

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Keep track of the time the thread at the head of the secondary queue
has been waiting, and force inter-node handoff once this time passes
a preset threshold. The default value for the threshold (1ms) can be
overridden with the new kernel boot command-line option
"qspinlock.numa_spinlock_threshold_ns".
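
As a rough sketch of the fairness check, assuming timestamps in nanoseconds from a monotonic clock: `cna_should_flush_secondary()` and `DEFAULT_THRESHOLD_NS` are hypothetical names, and the real code may compare times differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 ms, the default threshold stated in the commit message. */
#define DEFAULT_THRESHOLD_NS 1000000ULL

/*
 * Returns true once the head of the secondary queue has waited longer
 * than the threshold, meaning the lock should be handed to another node.
 * Unsigned subtraction keeps the comparison valid across clock wrap-around.
 */
static bool cna_should_flush_secondary(uint64_t now_ns,
                                       uint64_t head_start_ns,
                                       uint64_t threshold_ns)
{
    return now_ns - head_start_ns > threshold_ns;
}
```

With the default threshold, a waiter that entered the secondary queue at t = 1,000,000 ns forces an inter-node handoff at the first check after t = 2,000,000 ns.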

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
…s in CNA

cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Prohibit moving certain threads (e.g., in irq and nmi contexts)
to the secondary queue. Those prioritized threads will always stay
in the primary queue, and so will have a shorter wait time for the lock.
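
A minimal sketch of this rule, assuming the waiter's context was recorded when it queued. `struct waiter_ctx` and both helper functions are hypothetical, and the actual patch may prioritize more contexts than the two examples named above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the context a waiter was in when it queued. */
struct waiter_ctx {
    int  numa_node;
    bool in_irq;
    bool in_nmi;
};

/* Waiters queued from irq or nmi context are treated as prioritized. */
static bool cna_is_prioritized(const struct waiter_ctx *w)
{
    return w->in_irq || w->in_nmi;
}

/*
 * A waiter may be moved to the secondary queue only if it is not
 * prioritized and runs on a different node than the lock holder.
 */
static bool cna_may_defer(const struct waiter_ctx *w, int holder_node)
{
    if (cna_is_prioritized(w))
        return false;
    return w->numa_node != holder_node;
}
```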

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

This performance optimization chooses probabilistically to avoid moving
threads from the main queue into the secondary one when the secondary queue
is empty.

It is helpful when the lock is only lightly contended. In particular, it
makes CNA less eager to create a secondary queue, but does not introduce
any extra delays for threads waiting in that queue once it is created.
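
One way to picture the probabilistic check is the sketch below. The xorshift generator, the `bits` parameter and all function names are assumptions chosen for illustration. The patch itself may use a different pseudo-random source and probability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical xorshift32 state; it must never be zero. */
static uint32_t prng_state = 0x9E3779B9u;

static uint32_t next_random(void)
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 17;
    prng_state ^= prng_state << 5;
    return prng_state;
}

/* True with probability of roughly 1 - 1/2^bits (bits must be below 32). */
static bool probably(unsigned int bits)
{
    return (next_random() & ((1u << bits) - 1u)) != 0;
}

/*
 * While the secondary queue is empty, skip moving a waiter most of the
 * time. Once the secondary queue exists, moves are never suppressed.
 */
static bool cna_may_move_waiter(bool secondary_empty, unsigned int bits)
{
    if (secondary_empty && probably(bits))
        return false;
    return true;
}
```

With `bits` set to 7, for example, a lightly contended lock would create a secondary queue on only about 1 in 128 opportunities.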

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>

Disable CNA by default. This default behavior can be overridden with
the kernel boot command-line option "numa_spinlock=on/off/auto".

Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
@sourcery-ai

sourcery-ai bot commented Mar 27, 2025

Reviewer's Guide by Sourcery

This pull request introduces a NUMA-aware qspinlock implementation (CNA) to improve performance in NUMA scenarios. It also refactors MCS spinlock functions for clarity and potential reuse. The CNA implementation manages two queues, a primary queue for threads on the same NUMA node and a secondary queue for threads on other nodes, and moves threads between them based on NUMA node and waiting time. The changes include modifications to the qspinlock structure, the introduction of new functions for queue management, and updates to the slow path to incorporate NUMA awareness.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Change: Introduces NUMA-aware qspinlock implementation (CNA) to improve performance in NUMA scenarios.
Details:
  • Added CONFIG_NUMA_AWARE_SPINLOCKS configuration option.
  • Introduced qspinlock_cna.h which implements the CNA logic.
  • Modified qnode structure to include numa_node, real_numa_node, encoded_tail, and start_time for NUMA awareness.
  • Implemented functions for managing primary and secondary queues for threads waiting on the lock.
  • Implemented logic to move threads between queues based on NUMA node and waiting time.
  • Added module parameter numa_spinlock_threshold_ns to control the waiting time threshold.
  • Modified slow path to incorporate NUMA awareness.
  • Modified MCS spinlock functions to support NUMA awareness.
Files:
  • kernel/locking/qspinlock.c
  • kernel/locking/mcs_spinlock.h
  • arch/arm/include/asm/mcs_spinlock.h
  • arch/x86/include/asm/qspinlock.h
  • arch/x86/kernel/alternative.c
  • include/asm-generic/mcs_spinlock.h
  • kernel/locking/qspinlock_paravirt.h
  • kernel/locking/qspinlock_cna.h
  • arch/x86/Kconfig
  • arch/x86/configs/deepin_x86_desktop_defconfig

Change: Refactors MCS spinlock functions for clarity and potential reuse.
Details:
  • Renamed arch_mcs_spin_lock_contended to arch_mcs_spin_wait.
  • Renamed arch_mcs_spin_unlock_contended to arch_mcs_lock_handoff.
  • Introduced try_clear_tail and mcs_lock_handoff macros for conditional compilation based on CONFIG_NUMA_AWARE_SPINLOCKS and CONFIG_PARAVIRT_SPINLOCKS.
Files:
  • kernel/locking/qspinlock.c
  • kernel/locking/mcs_spinlock.h
  • arch/arm/include/asm/mcs_spinlock.h
  • include/asm-generic/mcs_spinlock.h
  • kernel/locking/qspinlock_paravirt.h

@deepin-ci-robot

Hi @Yingqiao-Kong. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deepin-ci-robot deepin-ci-robot requested review from BLumia and Wenlp March 27, 2025 10:57

@sourcery-ai sourcery-ai bot left a comment

Hey @Yingqiao-Kong - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a Kconfig option to enable/disable the NUMA-aware qspinlock, instead of relying on the numa_spinlock kernel parameter.
  • It would be helpful to include more details on the specific hardware and kernel configurations used for performance testing.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

@opsiff
Member

opsiff commented Mar 27, 2025

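Testing suggestion: following https://gitee.com/openeuler/kernel/issues/I8T8XV, enable CONFIG_DEBUG_LOCKING_API_SELFTESTS=y and
CONFIG_PROVE_LOCKING=y, then enable this feature on the kernel command line and check in the kernel log whether the locking self-tests pass.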

@Yingqiao-Kong Yingqiao-Kong changed the title [linux 6.6-y][DEEPIN] Add NUMA-awareness to qspinlock [DEEPIN-Kernel-SIG][linux 6.6-y]Add NUMA-awareness to qspinlock Mar 27, 2025
@Yingqiao-Kong Yingqiao-Kong changed the title [DEEPIN-Kernel-SIG][linux 6.6-y]Add NUMA-awareness to qspinlock [DEEPIN-Kernel-SIG] [linux 6.6-y]Add NUMA-awareness to qspinlock Mar 27, 2025
@Yingqiao-Kong Yingqiao-Kong changed the title [DEEPIN-Kernel-SIG] [linux 6.6-y]Add NUMA-awareness to qspinlock [DEEPIN-Kernel-SIG] [linux 6.6-y] Add NUMA-awareness to qspinlock Mar 27, 2025
@opsiff
Member

opsiff commented Apr 1, 2025

/ok-to-test

@opsiff
Member

opsiff commented Apr 1, 2025

/approve

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@opsiff opsiff merged commit e6a498c into deepin-community:linux-6.6.y Apr 1, 2025
5 of 7 checks passed

Copilot AI left a comment

Pull request overview

This PR introduces a NUMA-aware qspinlock slowpath (CNA) and the supporting infrastructure, aiming to improve spinlock performance on multi-node systems (especially under high contention) while keeping the existing pvqspinlock machinery compatible.

Changes:

  • Add compact NUMA-aware (CNA) MCS/qspinlock slowpath logic (qspinlock_cna.h) and hook it into the generic qspinlock implementation with macro-based slowpath generation.
  • Generalize MCS spinlock architecture hooks (arch_mcs_spin_wait / arch_mcs_lock_handoff) and adjust qspinlock’s slowpath handoff/clear-tail helpers to be overridable by CNA and paravirt variants.
  • Wire up x86 Kconfig, boot-time configuration, and kernel parameters to control NUMA-aware spinlocks, and document the new knobs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| kernel/locking/qspinlock_paravirt.h | Updates a comment to reference the new arch_mcs_lock_handoff() name, keeping paravirt qspinlock docs in sync with the new MCS handoff API. |
| kernel/locking/qspinlock_cna.h | Introduces the CNA NUMA-aware MCS/qspinlock slowpath implementation, including per-CPU CNA nodes, dual-queue management, probabilistic shuffling, and the numa_spinlock boot parameter and numa_spinlock_threshold_ns module parameter. |
| kernel/locking/qspinlock.c | Integrates CNA and paravirt slowpaths via the _GEN_CNA_LOCK_SLOWPATH / _GEN_PV_LOCK_SLOWPATH recursion scheme, adds generic __try_clear_tail() and __mcs_lock_handoff() helpers, switches MCS wait/handoff calls to the new abstracted hooks, and adds CNA-specific padding to struct qnode. |
| kernel/locking/mcs_spinlock.h | Changes locked to unsigned int to support encoded secondary-queue tails, replaces the old arch_mcs_spin_lock_contended / arch_mcs_spin_unlock_contended hooks with arch_mcs_spin_wait / arch_mcs_lock_handoff, and updates the generic MCS lock/unlock to use them. |
| include/asm-generic/mcs_spinlock.h | Adjusts the comment to document the new arch_mcs_spin_wait() and arch_mcs_lock_handoff() hook names expected from architectures. |
| arch/x86/kernel/alternative.c | Calls cna_configure_spin_lock_slowpath() during alternative_instructions() when CONFIG_NUMA_AWARE_SPINLOCKS is set, so CNA can install its slowpath before paravirt/alternative patching runs. |
| arch/x86/include/asm/qspinlock.h | Declares cna_configure_spin_lock_slowpath() under CONFIG_NUMA_AWARE_SPINLOCKS so the x86 boot code can call into the CNA configuration logic. |
| arch/x86/Kconfig | Adds CONFIG_NUMA_AWARE_SPINLOCKS (x86_64, NUMA, queued spinlocks, paravirt) to gate building CNA support and default it to y on suitable systems. |
| arch/arm/include/asm/mcs_spinlock.h | Renames ARM’s MCS spin macros to the new arch_mcs_spin_wait() and arch_mcs_lock_handoff() names while preserving the existing barrier and WFE/SEV semantics. |
| Documentation/admin-guide/kernel-parameters.txt | Documents the numa_spinlock= boot parameter (auto/on/off, with default “off”) and qspinlock.numa_spinlock_threshold_ns= to tune the intra-node handoff duration before flushing the secondary queue. |

Notable nits and correctness observations:

  • Documentation consistency for numa_spinlock_flag: In qspinlock_cna.h, the comment for numa_spinlock_flag states that 0 (“auto”) is the default, but the code initializes it to -1 and the kernel-parameter documentation says “Not specifying this option is equivalent to numa_spinlock=off.” The comment on the flag and/or the function header of cna_configure_spin_lock_slowpath() should be updated to reflect that the actual default behavior is “off”, not “auto”.
  • Spelling issues in new comments (non-functional but easy to fix for clarity): priortized → prioritized in the CNA description; ecoded → encoded in the “ecoded tail word” comment; preseve → preserve in cna_lock_handoff()’s “preserve secondary queue” comment.

4 participants