x86: Enhanced copy capabilities for Hygon processor #844
Reviewer's Guide
This PR introduces Hygon-specific large memory copy optimizations using streaming stores and prefetch for user/kernel transfers, gated by a static branch and non-atomic kernel FPU context management, plus a sysfs tunable for the NT block size threshold.
Class Diagram for Key Components in Hygon LMC Enhancement
classDiagram
class hygon_c86_info {
+unsigned int nt_cpy_mini_len
}
class fpu {
+void* regs
+unsigned int default_size
}
class thread_info {
+TIF_USING_FPU_NONATOMIC: boolean
}
class CoreLMCLogic {
<<Static Global>>
+static_key_false hygon_lmc_key
+unsigned int fpu_kernel_nonatomic_xstate_size
+hygon_c86_info hygon_c86_data
+Hygon_LMC_check(unsigned long len) bool
+copy_large_memory_generic_string(void* to, const void* from, unsigned long len) unsigned long
+kernel_fpu_begin_nonatomic_mask(unsigned int kfpu_mask) int
+kernel_fpu_end_nonatomic() void
+switch_kernel_fpu_prepare(task_struct* prev, int cpu) void
+switch_kernel_fpu_finish(task_struct* next) void
+get_nt_block_copy_mini_len() unsigned int
}
CoreLMCLogic ..> hygon_c86_info : uses
class OptimizedCopyRoutines {
<<Assembly Functions>>
+copy_user_sse2_opt_string(void* to, const void* from, unsigned long len) unsigned long
+copy_user_avx2_pf64_nt_string(void* to, const void* from, unsigned long len) unsigned long
+fpu_save_xmm0_3(void* to, ...)
+fpu_restore_xmm0_3(void* to, ...)
+fpu_save_ymm0_7(void* to, ...)
+fpu_restore_ymm0_7(void* to, ...)
}
class KernelInterface {
<<Modified Kernel Functions>>
+copy_user_generic(void* to, const void* from, unsigned long len) unsigned long
+__switch_to(task_struct* prev, task_struct* next) task_struct*
+kernel_fpu_begin_mask(unsigned int kfpu_mask) void
+kernel_fpu_end() void
}
KernelInterface ..> CoreLMCLogic : invokes
CoreLMCLogic ..> OptimizedCopyRoutines : calls
CoreLMCLogic ..> fpu : manages context
CoreLMCLogic ..> thread_info : uses flag
Hi @zhitengqiu. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Hey @zhitengqiu - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
#define MAX_FPU_CTX_SIZE 64
#define KERNEL_FPU_NONATOMIC_SIZE (2 * (MAX_FPU_CTX_SIZE))

#define copy_user_large_memory_generic_string copy_user_sse2_opt_string
issue (bug_risk): Macro redefinition risk if both SSE2 and AVX2 are enabled.
If both macros are enabled, copy_user_large_memory_generic_string will be defined twice, with the last definition overriding the previous one. Use #elif or a clearer selection method to avoid this.
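The `#elif` selection suggested above could look roughly like the following user-space sketch. The two `CONFIG_...` symbols are defined by hand here as stand-ins for the Kconfig options, the three copy routines are stubs rather than the PR's assembly functions, and the AVX2-first priority and the `copy_user_fallback_string` and `lmc_selected_variant` names are assumptions made only for illustration.

```c
/* Hypothetical stand-ins for the Kconfig symbols. Both are defined on
 * purpose to show that only one alias survives preprocessing. */
#define CONFIG_X86_HYGON_LMC_SSE2_ON 1
#define CONFIG_X86_HYGON_LMC_AVX2_ON 1

enum lmc_variant { LMC_VARIANT_GENERIC, LMC_VARIANT_SSE2, LMC_VARIANT_AVX2 };

/* Stub routines standing in for the real copy implementations. */
static inline enum lmc_variant copy_user_sse2_opt_string(void)
{
    return LMC_VARIANT_SSE2;
}

static inline enum lmc_variant copy_user_avx2_pf64_nt_string(void)
{
    return LMC_VARIANT_AVX2;
}

static inline enum lmc_variant copy_user_fallback_string(void)
{
    return LMC_VARIANT_GENERIC;
}

/* One #if/#elif/#else chain: the alias is defined exactly once, and
 * AVX2 is preferred when both options are enabled (assumed priority). */
#if defined(CONFIG_X86_HYGON_LMC_AVX2_ON)
#define copy_user_large_memory_generic_string copy_user_avx2_pf64_nt_string
#elif defined(CONFIG_X86_HYGON_LMC_SSE2_ON)
#define copy_user_large_memory_generic_string copy_user_sse2_opt_string
#else
#define copy_user_large_memory_generic_string copy_user_fallback_string
#endif

/* Reports which routine the alias resolved to. */
static inline enum lmc_variant lmc_selected_variant(void)
{
    return copy_user_large_memory_generic_string();
}
```

With both symbols defined, `lmc_selected_variant()` resolves to the AVX2 stub, and the compiler never sees a second definition of the alias.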
opsiff left a comment
arch/x86/configs/deepin_x86_desktop_defconfig needs to be modified:
--- defconfig 2025-06-03 11:49:07.246724991 +0000
+++ defconfig.orig 2025-06-03 11:48:56.175714420 +0000
@@ -76,7 +76,6 @@
 CONFIG_ACRN_GUEST=y
 CONFIG_INTEL_TDX_GUEST=y
 CONFIG_PROCESSOR_SELECT=y
-CONFIG_USING_FPU_IN_KERNEL_NONATOMIC=y
 CONFIG_GART_IOMMU=y
 CONFIG_MAXSMP=y
 CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
@@ -5898,3 +5897,6 @@
 # CONFIG_X86_DEBUG_FPU is not set
 CONFIG_UNWINDER_FRAME_POINTER=y
 # CONFIG_RUNTIME_TESTING_MENU is not set
+CONFIG_USING_FPU_IN_KERNEL_NONATOMIC=y
+# CONFIG_X86_HYGON_LMC_SSE2_ON is not set
+CONFIG_X86_HYGON_LMC_AVX2_ON=y
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: opsiff
hygon inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IAQQDF
CVE: NA
---------------------------
The following methods are used to improve the large memory copy performance of the Hygon processor between kernel and user mode.
Prefetch is a technique for reading blocks of data from main memory at very high data rates, then operating on them within the cache. Results are then written out to memory, all with high efficiency.
The code uses non-temporal (NT) streaming store instructions to write data to memory. These stores bypass the on-chip cache and send data directly into a write-combining buffer. Because an NT store lets the CPU avoid reading the old data at the destination address first, it can improve total write bandwidth. There are similar optimizations for reading data from memory.
Interrupts may occur while a large copy is in progress and may trigger a thread switch. The current SIMD register context (XMM/YMM) must be saved at that point so that the copy can continue when the thread is switched back in.
Signed-off-by: zhuchao <zhuchao@hygon.cn>
Signed-off-by: qiuzhiteng <qiuzhiteng@hygon.cn>
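To make the prefetch plus streaming-store idea concrete, here is a minimal user-space sketch using SSE2 intrinsics. It is not the PR's assembly routine: `nt_copy_sketch`, the 64-byte loop body, and the 256-byte prefetch distance are assumptions chosen for illustration, and on builds without SSE2 the function simply falls back to `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch and _mm_sfence */
#endif

#define NT_PREFETCH_DIST 256 /* assumed prefetch distance in bytes */

/* Hypothetical helper: copy len bytes using prefetch and NT stores. */
static void *nt_copy_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy
     * the unaligned head with ordinary stores first. */
    size_t head = (size_t)((16 - ((uintptr_t)d & 15)) & 15);
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    while (len >= 64) {
        /* Pull source data toward the cache ahead of the loads,
         * staying inside the source buffer. */
        if (len >= 64 + NT_PREFETCH_DIST)
            _mm_prefetch((const char *)(s + NT_PREFETCH_DIST), _MM_HINT_NTA);

        __m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));

        /* Streaming stores go through write-combining buffers and do
         * not first read the destination line into the cache. */
        _mm_stream_si128((__m128i *)(d + 0), a);
        _mm_stream_si128((__m128i *)(d + 16), b);
        _mm_stream_si128((__m128i *)(d + 32), c);
        _mm_stream_si128((__m128i *)(d + 48), e);

        d += 64;
        s += 64;
        len -= 64;
    }
    /* Order the weakly ordered NT stores before any later stores. */
    _mm_sfence();
#endif
    /* Remaining tail (or the whole buffer when SSE2 is unavailable). */
    memcpy(d, s, len);
    return dst;
}
```

In the kernel the same loop additionally has to handle faults on user addresses and preserve the vector registers it clobbers, which is why the PR pairs the copy routines with FPU context save and restore.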
deepin pr auto review
Code review comments:
The above is a detailed explanation of the code review comments. We hope it is helpful.
Pull Request Overview
Enable high-performance large memory copy on Hygon CPUs by introducing non-temporal SSE2/AVX2 routines, non-atomic FPU context support, and runtime toggles via a static key and sysfs.
- Add SSE2- and AVX2-based streaming-store copy_user implementations with prefetch
- Introduce kernel_fpu_begin_nonatomic()/end_nonatomic() APIs and extend FPU state sizing
- Wire up a static branch (hygon_lmc_key) and expose a sysfs knob for minimum NT copy length
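The gating described in the last bullet can be pictured with a small user-space sketch. Everything here is an assumption for illustration: a plain `bool` replaces the `hygon_lmc_key` static branch, a variable replaces the sysfs attribute, 4096 is an invented default, and the exact condition used by the PR's `Hygon_LMC_check` is not shown in this conversation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the kernel uses a static key and a sysfs
 * attribute; this sketch uses ordinary variables. */
static bool lmc_enabled = true;
static size_t nt_cpy_mini_len = 4096; /* assumed default threshold */

enum copy_path { COPY_PATH_GENERIC, COPY_PATH_LARGE };

/* Assumed check: feature enabled, threshold configured, copy large enough. */
static bool lmc_check_sketch(size_t len)
{
    return lmc_enabled && nt_cpy_mini_len != 0 && len >= nt_cpy_mini_len;
}

/* Picks a path and reports which one was taken. Both branches use
 * memcpy here; the PR would enter its NT routine on the large path. */
static enum copy_path copy_dispatch_sketch(void *to, const void *from, size_t len)
{
    if (lmc_check_sketch(len)) {
        memcpy(to, from, len);
        return COPY_PATH_LARGE;
    }
    memcpy(to, from, len);
    return COPY_PATH_GENERIC;
}
```

A static branch keeps the disabled case nearly free on CPUs that never enable the feature, while the threshold keeps small copies on the generic path, where the cost of saving vector registers would likely outweigh any streaming-store gain.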
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| arch/x86/lib/copy_user_sse2.S | New SSE2-NT copy_user implementation with prefetch |
| arch/x86/lib/copy_user_avx2.S | New AVX2-NT copy_user implementation with prefetch |
| arch/x86/lib/Makefile | Build rules for copy_user_sse2.o and copy_user_avx2.o |
| arch/x86/kernel/process_64.c | Hook non-atomic FPU save/restore around context switch |
| arch/x86/kernel/fpu/xstate.c, init.c, core.c | Extend xstate sizing and implement non-atomic FPU APIs |
| arch/x86/kernel/cpu/common.c | Define and enable hygon_lmc_key static branch |
| arch/x86/kernel/cpu/hygon.c | Add c86_features/hygon_c86 sysfs group for NT copy threshold |
| arch/x86/include/asm/uaccess_64.h | Integrate Hygon_LMC_check and override generic copy_user path |
| arch/x86/Kconfig.fpu, arch/x86/Kconfig | New configuration options for Hygon large-memory copy support |
Comments suppressed due to low confidence (5)
arch/x86/Kconfig.fpu:3
- The Kconfig symbol name USING_FPU_IN_KERNEL_NONATOMIC is misleading for a Hygon large-memory copy feature; consider renaming it to X86_HYGON_LMC or similar to clearly reflect its purpose.
menuconfig USING_FPU_IN_KERNEL_NONATOMIC
arch/x86/include/asm/uaccess_64.h:142
- Function names in the kernel should use lowercase_with_underscores; rename Hygon_LMC_check to hygon_lmc_check to match style.
static inline bool Hygon_LMC_check(unsigned long len)
arch/x86/kernel/cpu/hygon.c:515
- The sysfs attribute is created with permission mode 0600, preventing non-root users from reading the NT copy threshold; consider using 0644 for read-only access by all.
static struct kobj_attribute nt_cpy_mini_len_attribute = __ATTR(
arch/x86/kernel/fpu/api.h:52
- [nitpick] The new non-atomic FPU APIs (kernel_fpu_begin_nonatomic/kernel_fpu_end_nonatomic) lack accompanying tests; consider adding unit or integration tests to validate normal and error paths.
static inline int kernel_fpu_begin_nonatomic(void)
arch/x86/kernel/cpu/hygon.c:482
- Calling memset requires <linux/string.h> (or <string.h>) inclusion for clarity; verify the appropriate header is included for consistency.
memset((void *)&hygon_c86_data, 0, sizeof(struct hygon_c86_info));
#define MAX_FPU_CTX_SIZE 64
#define KERNEL_FPU_NONATOMIC_SIZE (2 * (MAX_FPU_CTX_SIZE))
MAX_FPU_CTX_SIZE and KERNEL_FPU_NONATOMIC_SIZE are conditionally redefined in both the SSE2 and AVX2 sections, risking macro redefinition or confusion; consider centralizing or guarding these definitions.
Suggested change:
#ifndef MAX_FPU_CTX_SIZE
#define MAX_FPU_CTX_SIZE 64
#define KERNEL_FPU_NONATOMIC_SIZE (2 * (MAX_FPU_CTX_SIZE))
#endif
Merged commit 9590a37 into deepin-community:linux-6.6.y
Summary by Sourcery
Enable high-performance user-kernel memory copying on Hygon processors by adding streaming store and prefetch-accelerated routines, non-atomic FPU context support, and runtime toggles via a static key and sysfs control.
New Features: