eal/x86: improve multiple of 64 bytes memcpy performance
[ upstream commit 2ef17be88e8b26f871cfb0265227341e36f486ea ]

In rte_memcpy_aligned(), the 64-byte block copy loop takes one redundant round when
the size is a multiple of 64, because the unconditional catch-up copy that follows
the loop copies the last 64 bytes again. Let the catch-up copy handle the last
64 bytes in this case instead.

Fixes: f547270 ("eal: optimize aligned memcpy on x86")

Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Ninja-Mobius authored and bluca committed Jun 14, 2023
1 parent 206434a commit 4154fc9
Showing 1 changed file with 1 addition and 1 deletion.
lib/librte_eal/x86/include/rte_memcpy.h (1 addition, 1 deletion)
@@ -846,7 +846,7 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
 	}
 
 	/* Copy 64 bytes blocks */
-	for (; n >= 64; n -= 64) {
+	for (; n > 64; n -= 64) {
 		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
 		dst = (uint8_t *)dst + 64;
 		src = (const uint8_t *)src + 64;
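For context, below is a minimal sketch of the tail handling around this loop, paraphrased from rte_memcpy_aligned() rather than taken from this diff, so the exact layout may differ from the upstream source. The loop is followed by an unconditional catch-up rte_mov64() that copies the last 64 bytes ending at dst + n. With the old "n >= 64" condition and, say, n == 128, the loop ran twice and the catch-up then rewrote bytes 64..127; with "n > 64" the loop runs once and the catch-up writes bytes 64..127 exactly once.

	/* Copy 64-byte blocks; with the fixed condition the loop exits
	 * with 1 <= n <= 64, leaving the final block to the catch-up. */
	for (; n > 64; n -= 64) {
		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
		dst = (uint8_t *)dst + 64;
		src = (const uint8_t *)src + 64;
	}

	/* Catch-up: copy the last 64 bytes ending at dst + n. Under the
	 * old ">=" condition, a multiple-of-64 size reached this point
	 * with n == 0, so these bytes had already been copied by the
	 * loop and this store was redundant. */
	rte_mov64((uint8_t *)dst - 64 + n, (const uint8_t *)src - 64 + n);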
