Commit 1d10f3a

ubizjak authored and Ingo Molnar committed
x86/percpu: Use C for arch_raw_cpu_ptr(), to improve code generation
Implement arch_raw_cpu_ptr() in C to allow the compiler to perform
better optimizations, such as setting an appropriate base to compute
the address. The compiler is free to choose either MOV or ADD from
this_cpu_off address to construct the optimal final address.

There are some other issues when memory access to the percpu area is
implemented with an asm. Compilers can not eliminate asm common
subexpressions over basic block boundaries, but are extremely good
at optimizing memory access. By implementing arch_raw_cpu_ptr() in C,
the compiler can eliminate additional redundant loads from this_cpu_off,
further reducing the number of percpu offset reads from 1646 to 1631
on a test build, a -0.9% reduction.

Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231015202523.189168-2-ubizjak@gmail.com
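The asm-versus-C argument in the message can be illustrated with a small user-space sketch. This is not kernel code: cpu_off_sketch, ptr_via_asm(), ptr_via_c(), flavours_agree() and sum_two() are hypothetical names invented here, and an ordinary global stands in for the segment-relative this_cpu_off. Both helpers should produce the same address; the point is only that the compiler cannot look inside the asm statement, while the C version is an ordinary load and add that it can treat like any other memory access. The sketch assumes GCC or Clang, and the asm path is built only on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-CPU offset variable. */
static uintptr_t cpu_off_sketch;

/* Asm flavour: the ADD from memory is a black box to the optimizer. */
static inline void *ptr_via_asm(void *ptr)
{
	uintptr_t p;
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__("add %1, %0"
		: "=r"(p)
		: "m"(cpu_off_sketch), "0"((uintptr_t)ptr));
#else
	/* Fallback so the sketch still builds on other targets. */
	p = (uintptr_t)ptr + cpu_off_sketch;
#endif
	return (void *)p;
}

/* C flavour: a plain load plus an add, fully visible to the compiler. */
static inline void *ptr_via_c(void *ptr)
{
	uintptr_t p = cpu_off_sketch;

	p += (uintptr_t)ptr;
	return (void *)p;
}

/* Returns nonzero when both flavours compute the same address. */
static int flavours_agree(void *ptr)
{
	return ptr_via_asm(ptr) == ptr_via_c(ptr);
}

/*
 * Two offset-relative accesses in different basic blocks. With the C
 * flavour the compiler is allowed to reuse a single load of
 * cpu_off_sketch for both of them.
 */
static long sum_two(long *a, long *b, int flag)
{
	long s;

	assert(a && b);
	s = *(long *)ptr_via_c(a);
	if (flag)
		s += *(long *)ptr_via_c(b);
	return s;
}
```

Whether a particular compiler really merges the two offset loads in sum_two() depends on the optimization level and the surrounding code, so it is better to inspect the generated assembly (for example with -O2 -S) than to read this sketch as a measurement.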
1 parent a048d3a commit 1d10f3a

1 file changed: +17 -0 lines changed


arch/x86/include/asm/percpu.h

Lines changed: 17 additions & 0 deletions
@@ -49,6 +49,21 @@
 #define __force_percpu_prefix "%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset this_cpu_read(this_cpu_off)
 
+#ifdef CONFIG_USE_X86_SEG_SUPPORT
+/*
+ * Efficient implementation for cases in which the compiler supports
+ * named address spaces. Allows the compiler to perform additional
+ * optimizations that can save more instructions.
+ */
+#define arch_raw_cpu_ptr(ptr) \
+({ \
+	unsigned long tcp_ptr__; \
+	tcp_ptr__ = __raw_cpu_read(, this_cpu_off); \
+ \
+	tcp_ptr__ += (unsigned long)(ptr); \
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
+})
+#else /* CONFIG_USE_X86_SEG_SUPPORT */
 /*
  * Compared to the generic __my_cpu_offset version, the following
  * saves one instruction and avoids clobbering a temp register.
@@ -63,6 +78,8 @@
 	tcp_ptr__ += (unsigned long)(ptr); \
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
 })
+#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #else /* CONFIG_SMP */
 #define __percpu_seg_override
 #define __percpu_prefix ""
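
For readers unfamiliar with what the macro computes, the following is a rough user-space model of the "offset plus symbol address" arithmetic used by the new C variant. Everything in it is an assumption made for illustration: struct counters, template_counters, cpu_copies, this_cpu_off_sketch, set_current_cpu(), raw_cpu_ptr_sketch() and bump_hits() do not exist in the kernel, and a plain global replaces the segment-relative read done by __raw_cpu_read(). It relies on the GNU statement-expression and __typeof__ extensions (GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4

struct counters {
	long hits;
	long misses;
};

/* Template object: its address plays the role of a per-CPU symbol. */
static struct counters template_counters;

/* One private copy of the object per simulated CPU. */
static struct counters cpu_copies[NR_CPUS_SKETCH];

/* Stand-in for this_cpu_off: distance from the template to "our" copy. */
static uintptr_t this_cpu_off_sketch;

/* Pretend to run on the given CPU by selecting its offset. */
static void set_current_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	/* Unsigned wraparound is fine: adding the offset back undoes it. */
	this_cpu_off_sketch = (uintptr_t)&cpu_copies[cpu] -
			      (uintptr_t)&template_counters;
}

/* Same shape as the C variant in the diff: load the offset, add, cast. */
#define raw_cpu_ptr_sketch(ptr)					\
({								\
	uintptr_t tcp_ptr__ = this_cpu_off_sketch;		\
								\
	tcp_ptr__ += (uintptr_t)(ptr);				\
	(__typeof__(*(ptr)) *)tcp_ptr__;			\
})

/* Increment the current CPU's private "hits" counter and return it. */
static long bump_hits(int cpu)
{
	long *p;

	set_current_cpu(cpu);
	p = raw_cpu_ptr_sketch(&template_counters.hits);
	return ++*p;
}
```

In this model, calling bump_hits(1) twice and bump_hits(2) once leaves cpu_copies[1].hits at 2 and cpu_copies[2].hits at 1, while template_counters itself is never written.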
