[DEEPIN-Kernel-SIG] [linux 6.6-y] Add NUMA-awareness to qspinlock #692
Conversation
…eneric

cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-2-alex.kogan@oracle.com/

The mcs unlock macro (arch_mcs_lock_handoff) should accept the value to be stored into the lock argument as another argument. This allows using the same macro in cases where the value to be stored when passing the lock is different from 1.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
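For readers following along, here is a minimal sketch of the interface change this commit describes. The "before" macro is the generic fallback currently in mcs_spinlock.h; the "after" shape is an illustration of the new handoff hook, not the exact patch hunk.

```c
/*
 * Before: the generic MCS unlock hook always stored the constant 1
 * into the next waiter's lock word.
 */
#define arch_mcs_spin_unlock_contended(l)			\
	smp_store_release((l), 1)

/*
 * After: the handoff hook takes the value to store as a second
 * argument, so a caller (e.g. CNA) can pass an encoded value such as
 * a secondary-queue tail instead of just 1.
 */
#define arch_mcs_lock_handoff(l, val)				\
	smp_store_release((l), (val))
```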
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Move some of the code manipulating the spin lock into separate functions. This would allow easier integration of alternative ways to manipulate that lock.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
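A rough illustration of the kind of refactoring this commit describes. The helper names below come from the Copilot file summary further down in this thread; the bodies are simplified sketches rather than the exact patch.

```c
/*
 * If this CPU's node is the last one in the queue, try to clear the
 * tail and take the lock in a single cmpxchg; an alternative slowpath
 * (such as CNA) can override this policy.
 */
static __always_inline bool __try_clear_tail(struct qspinlock *lock,
					     u32 val,
					     struct mcs_spinlock *node)
{
	return atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL);
}

/*
 * Pass the MCS lock to the next waiter; again, a hook that an
 * alternative slowpath can replace.
 */
static __always_inline void __mcs_lock_handoff(struct mcs_spinlock *node,
					       struct mcs_spinlock *next)
{
	arch_mcs_lock_handoff(&next->locked, 1);
}
```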
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

In CNA, spinning threads are organized in two queues: a primary queue for threads running on the same node as the current lock holder, and a secondary queue for threads running on other nodes. After acquiring the MCS lock and before acquiring the spinlock, the MCS lock holder checks whether the next waiter in the primary queue (if one exists) is running on the same NUMA node. If it is not, that waiter is detached from the main queue and moved to the tail of the secondary queue. This way, we gradually filter the primary queue, leaving only waiters running on the same preferred NUMA node. For more details, see https://arxiv.org/abs/1810.05600.

Note that this variant of CNA may introduce starvation by continuously passing the lock between waiters in the main queue. This issue will be addressed later in the series.

Enabling CNA is controlled via a new configuration option (NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at boot time only if we run on a multi-node machine in a native environment and the new config is enabled. (For the time being, the patching requires CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this should be resolved once static_call() is available.) This default behavior can be overridden with the new kernel boot command-line option "numa_spinlock=on/off" (default is "auto").

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
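A simplified sketch of the filtering step described above. The structure layout, the function name, and the helper that splices a waiter onto the secondary queue are illustrative assumptions, not the exact code from the patch.

```c
struct cna_node {
	struct mcs_spinlock	mcs;		/* embedded generic MCS node */
	int			numa_node;	/* node this waiter runs on */
	/* ... CNA also keeps an encoded tail of the secondary queue ... */
};

/*
 * Called by the MCS lock holder before it takes the spinlock itself:
 * if the next waiter in the primary queue runs on a different NUMA
 * node, detach it and append it to the secondary queue, so the primary
 * queue gradually contains only same-node waiters.
 */
static void cna_shuffle_next(struct cna_node *me)
{
	struct cna_node *next = (struct cna_node *)READ_ONCE(me->mcs.next);

	if (!next)
		return;		/* nothing queued behind us */

	if (next->numa_node != me->numa_node)
		cna_splice_to_secondary_tail(me, next);	/* hypothetical helper */
}
```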
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Keep track of the time the thread at the head of the secondary queue has been waiting, and force inter-node handoff once this time passes a preset threshold. The default value for the threshold (1ms) can be overridden with the new kernel boot command-line option "qspinlock.numa_spinlock_threshold_ns".

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
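A minimal sketch of the time-based flush this commit describes, assuming a nanosecond timestamp is recorded when a waiter becomes the head of the secondary queue; the variable and function names here are assumptions for illustration.

```c
/*
 * Default intra-node handoff threshold: 1 ms, overridable via the
 * qspinlock.numa_spinlock_threshold_ns module parameter.
 */
static ulong numa_spinlock_threshold_ns = 1000000;

/*
 * Returns true once the head of the secondary queue has been waiting
 * longer than the threshold; the lock holder then performs an
 * inter-node handoff (flushing the secondary queue) to bound starvation.
 */
static bool cna_intra_node_handoff_expired(u64 secondary_head_start_ns)
{
	return (s64)(ktime_get_ns() - secondary_head_start_ns) >
	       (s64)numa_spinlock_threshold_ns;
}
```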
…s in CNA

cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

Prohibit moving certain threads (e.g., in irq and nmi contexts) to the secondary queue. Those prioritized threads will always stay in the primary queue, and so will have a shorter wait time for the lock.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
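The check implied here can be sketched roughly as follows; the actual patch may key this off a per-node priority flag recorded at queue time, so treat the condition below as an assumption.

```c
/*
 * Waiters that must not be delayed, e.g. those spinning from hardirq,
 * softirq or NMI context, are never moved to the secondary queue and
 * therefore keep their position in the primary queue.
 */
static bool cna_waiter_is_prioritized(void)
{
	/* in_task() is false in interrupt and NMI context */
	return !in_task();
}
```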
cherry-picked from https://lore.kernel.org/all/20210514200743.3026725-3-alex.kogan@oracle.com/

This performance optimization chooses probabilistically to avoid moving threads from the main queue into the secondary one when the secondary queue is empty. It is helpful when the lock is only lightly contended. In particular, it makes CNA less eager to create a secondary queue, but does not introduce any extra delays for threads waiting in that queue once it is created.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
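A sketch of the probabilistic gate described above, using a cheap xorshift generator; the probability constant, the randomness source, and the function names are illustrative, not taken from the patch.

```c
/*
 * Small per-waiter pseudo-random generator (xorshift32); the real
 * implementation may use a different source of randomness.
 */
static inline u32 cna_xorshift(u32 *state)
{
	u32 x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/*
 * When the secondary queue is empty, only start shuffling remote
 * waiters out of the primary queue with low probability (~1/128 here),
 * so a lightly contended lock behaves like plain MCS most of the time.
 */
static inline bool cna_should_shuffle(u32 *rng_state)
{
	return (cna_xorshift(rng_state) & 0x7f) == 0;
}
```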
Disable CNA by default; this default behavior can be overridden with the kernel boot command-line option "numa_spinlock=on/off/auto".

Signed-off-by: Kong Yingqiao <kongyingqiao@hygon.cn>
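For illustration, a hedged sketch of how a boot parameter like numa_spinlock= is typically parsed with __setup(). The enum values, the nsl_mode variable, and the default handling are assumptions for this sketch and do not reproduce the patch's actual numa_spinlock_flag encoding (see the review notes further down in this thread).

```c
/* Illustrative encoding only; the patch stores the mode differently. */
enum numa_spinlock_mode { NSL_OFF, NSL_ON, NSL_AUTO };

static enum numa_spinlock_mode nsl_mode = NSL_OFF;	/* default: off */

static int __init numa_spinlock_setup(char *str)
{
	if (!strcmp(str, "auto"))
		nsl_mode = NSL_AUTO;
	else if (!strcmp(str, "on"))
		nsl_mode = NSL_ON;
	else if (!strcmp(str, "off"))
		nsl_mode = NSL_OFF;
	else
		return 0;	/* unrecognized value */

	return 1;
}
__setup("numa_spinlock=", numa_spinlock_setup);
```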
Reviewer's Guide by Sourcery

This pull request introduces a NUMA-aware qspinlock implementation (CNA) to improve performance in NUMA scenarios, and refactors the MCS spinlock functions for clarity and potential reuse. The CNA implementation maintains two queues, a primary queue for threads on the same NUMA node as the current lock holder and a secondary queue for threads on other nodes, and moves threads between them based on their NUMA node and waiting time. The changes include modifications to the qspinlock structure, new functions for queue management, and updates to the slow path to incorporate NUMA awareness. No diagrams were generated, as the changes look simple and do not need a visual representation.
Hi @Yingqiao-Kong. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test.
Hey @Yingqiao-Kong - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a Kconfig option to enable/disable the NUMA-aware qspinlock, instead of relying on the numa_spinlock kernel parameter.
- It would be helpful to include more details on the specific hardware and kernel configurations used for performance testing.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Testing suggestion: following https://gitee.com/openeuler/kernel/issues/I8T8XV, enable CONFIG_DEBUG_LOCKING_API_SELFTESTS=y and …
efa65fa to 1c3c157
/ok-to-test

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: opsiff.
Pull request overview
This PR introduces a NUMA-aware qspinlock slowpath (CNA) and the supporting infrastructure, aiming to improve spinlock performance on multi-node systems (especially under high contention) while keeping the existing pvqspinlock machinery compatible.
Changes:
- Add compact NUMA-aware (CNA) MCS/qspinlock slowpath logic (qspinlock_cna.h) and hook it into the generic qspinlock implementation with macro-based slowpath generation (see the sketch after this list).
- Generalize the MCS spinlock architecture hooks (arch_mcs_spin_wait / arch_mcs_lock_handoff) and adjust qspinlock's slowpath handoff/clear-tail helpers to be overridable by the CNA and paravirt variants.
- Wire up the x86 Kconfig option, boot-time configuration, and kernel parameters to control NUMA-aware spinlocks, and document the new knobs.
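To make the "macro-based slowpath generation" mentioned above concrete, here is a condensed sketch of the self-include trick that kernel/locking/qspinlock.c already uses for the paravirt slowpath and that the CNA hook follows. The guard macro and helper names are taken from this review; the hunk itself is a simplified illustration, not the literal patch.

```c
/* At the bottom of kernel/locking/qspinlock.c (simplified sketch): */
#if !defined(_GEN_CNA_LOCK_SLOWPATH) && defined(CONFIG_NUMA_AWARE_SPINLOCKS)
#define _GEN_CNA_LOCK_SLOWPATH

/* Swap the generic helpers for the CNA variants ... */
#undef  try_clear_tail
#define try_clear_tail		cna_try_clear_tail
#undef  mcs_lock_handoff
#define mcs_lock_handoff	cna_lock_handoff

/* ... then rename the slowpath and re-include this file, which emits a
 * second copy of the slowpath, __cna_queued_spin_lock_slowpath(),
 * alongside the native one. */
#undef  queued_spin_lock_slowpath
#define queued_spin_lock_slowpath	__cna_queued_spin_lock_slowpath

#include "qspinlock_cna.h"
#include "qspinlock.c"

#endif
```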
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Summary per file:
| File | Description |
|---|---|
| kernel/locking/qspinlock_paravirt.h | Updates a comment to reference the new arch_mcs_lock_handoff() name, keeping paravirt qspinlock docs in sync with the new MCS handoff API. |
| kernel/locking/qspinlock_cna.h | Introduces the CNA NUMA-aware MCS/qspinlock slowpath implementation, including per-CPU CNA nodes, dual-queue management, probabilistic shuffling, and the numa_spinlock boot parameter and numa_spinlock_threshold_ns module parameter. |
| kernel/locking/qspinlock.c | Integrates CNA and paravirt slowpaths via the _GEN_CNA_LOCK_SLOWPATH / _GEN_PV_LOCK_SLOWPATH recursion scheme, adds generic __try_clear_tail() and __mcs_lock_handoff() helpers, switches MCS wait/handoff calls to the new abstracted hooks, and adds CNA-specific padding to struct qnode. |
| kernel/locking/mcs_spinlock.h | Changes locked to unsigned int to support encoded secondary-queue tails, replaces the old arch_mcs_spin_lock_contended / arch_mcs_spin_unlock_contended hooks with arch_mcs_spin_wait / arch_mcs_lock_handoff, and updates the generic MCS lock/unlock to use them. |
| include/asm-generic/mcs_spinlock.h | Adjusts the comment to document the new arch_mcs_spin_wait() and arch_mcs_lock_handoff() hook names expected from architectures. |
| arch/x86/kernel/alternative.c | Calls cna_configure_spin_lock_slowpath() during alternative_instructions() when CONFIG_NUMA_AWARE_SPINLOCKS is set, so CNA can install its slowpath before paravirt/alternative patching runs. |
| arch/x86/include/asm/qspinlock.h | Declares cna_configure_spin_lock_slowpath() under CONFIG_NUMA_AWARE_SPINLOCKS so the x86 boot code can call into the CNA configuration logic. |
| arch/x86/Kconfig | Adds CONFIG_NUMA_AWARE_SPINLOCKS (x86_64, NUMA, queued spinlocks, paravirt) to gate building CNA support and default it to y on suitable systems. |
| arch/arm/include/asm/mcs_spinlock.h | Renames ARM’s MCS spin macros to the new arch_mcs_spin_wait() and arch_mcs_lock_handoff() names while preserving the existing barrier and WFE/SEV semantics. |
| Documentation/admin-guide/kernel-parameters.txt | Documents the numa_spinlock= boot parameter (auto/on/off, with default “off”) and qspinlock.numa_spinlock_threshold_ns= to tune the intra-node handoff duration before flushing the secondary queue. |
Notable nits and correctness observations:
- Documentation consistency for numa_spinlock_flag: in qspinlock_cna.h, the comment for numa_spinlock_flag states that 0 ("auto") is the default, but the code initializes it to -1, and the kernel-parameter documentation says "Not specifying this option is equivalent to numa_spinlock=off." The comment on the flag and/or the function header of cna_configure_spin_lock_slowpath() should be updated to reflect that the actual default behavior is "off", not "auto".
- Spelling issues in new comments (non-functional but easy to fix for clarity): priortized → prioritized in the CNA description; ecoded → encoded in the "ecoded tail word" comment; preseve → preserve in cna_lock_handoff()'s "preserve secondary queue" comment.
bugzilla: https://bugzilla.openanolis.cn/show_bug.cgi?id=13056
Reference:
https://gitee.com/anolis/cloud-kernel/pulls/4406/commits
https://gitee.com/openeuler/kernel/pulls/3744
Currently, the kernel hands the qspinlock to waiters strictly in request order; the performance problems caused by non-uniform memory access in NUMA scenarios are not considered.

This patch series inherits from the compact NUMA-aware (CNA) lock and takes NUMA nodes into account when handing off the qspinlock. Please refer to the newest version in the community: https://lore.kernel.org/linux-arm-kernel/20210514200743.3026725-1-alex.kogan@oracle.com

When running MySQL OLTP read_write workloads with 50-300 threads on Hygon platforms, qspinlock contention is fierce; TPS improves by 10%~15% with this patch. For multi-process workloads on tmpfs on Hygon platforms, qspinlock contention is also fierce, and performance improves by about 7% with this patch.
Summary by Sourcery
Add NUMA-awareness to qspinlock to improve performance in multi-threaded and multi-process scenarios by optimizing spinlock contention across NUMA nodes