Skip to content

Commit 47ee3f1

Browse files
committed
x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in commit adfcf42 ("x86: don't use REP_GOOD or ERMS for user memory copies"), in order to then be able to clean up the code and inline the modern FSRM case in commit 577e6a7 ("x86: inline the 'rep movs' in user copies for the FSRM case"). We had reports [1] of that causing regressions earlier with blogbench, but that turned out to be a horrible benchmark for that case, and not a sufficient reason for re-instating "rep movsb" on older machines. However, now Eric Dumazet reported [2] a regression in performance that seems to be a rather more real benchmark, where due to the removal of "rep movs" a TCP stream over a 100Gbps network no longer reaches line speed. And it turns out that with the simplified the calling convention for the non-FSRM case in commit 427fda2 ("x86: improve on the non-rep 'copy_user' function"), re-introducing the ERMS case is actually fairly simple. Of course, that "fairly simple" is glossing over several missteps due to having to fight our assembler alternative code. This code really wanted to rewrite a conditional branch to have two different targets, but that made objtool sufficiently unhappy that this instead just ended up doing a choice between "jump to the unrolled loop, or use 'rep movsb' directly". Let's see if somebody finds a case where the kernel memory copies also care (see commit 68674f9: "x86: don't use REP_GOOD or ERMS for small memory copies"). But Eric does argue that the user copies are special because networking tries to copy up to 32KB at a time, if order-3 pages allocations are possible. In-kernel memory copies are typically small, unless they are the special "copy pages at a time" kind that still use "rep movs". Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1] Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2] Reported-and-tested-by: Eric Dumazet <edumazet@google.com> Fixes: adfcf42 ("x86: don't use REP_GOOD or ERMS for user memory copies") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 0d85b27 commit 47ee3f1

File tree

1 file changed

+9
-1
lines changed

1 file changed

+9
-1
lines changed

arch/x86/lib/copy_user_64.S

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,8 @@
77
*/
88

99
#include <linux/linkage.h>
10+
#include <asm/cpufeatures.h>
11+
#include <asm/alternative.h>
1012
#include <asm/asm.h>
1113
#include <asm/export.h>
1214

@@ -29,7 +31,7 @@
2931
*/
3032
SYM_FUNC_START(rep_movs_alternative)
3133
cmpq $64,%rcx
32-
jae .Lunrolled
34+
jae .Llarge
3335

3436
cmp $8,%ecx
3537
jae .Lword
@@ -65,6 +67,12 @@ SYM_FUNC_START(rep_movs_alternative)
6567
_ASM_EXTABLE_UA( 2b, .Lcopy_user_tail)
6668
_ASM_EXTABLE_UA( 3b, .Lcopy_user_tail)
6769

70+
.Llarge:
71+
0: ALTERNATIVE "jmp .Lunrolled", "rep movsb", X86_FEATURE_ERMS
72+
1: RET
73+
74+
_ASM_EXTABLE_UA( 0b, 1b)
75+
6876
.p2align 4
6977
.Lunrolled:
7078
10: movq (%rsi),%r8

0 commit comments

Comments
 (0)