runtime: memmove should use the REP MOVSB instruction on newer Intel microarchitectures. #66958

Closed
cocotyty opened this issue Apr 22, 2024 · 1 comment
Labels
arch-amd64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Comments

@cocotyty
Contributor

cocotyty commented Apr 22, 2024

Go version

go version go1.22.2 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go1.22.2'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go1.22.2/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
GOAMD64='v3'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2893885177=/tmp/go-build -gno-record-gcc-switches'

What did you do?

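package main

func main() {
	// 16 KiB buffers, well above the 2KB size at which memmove can
	// switch to REP MOVSB on ERMS-capable CPUs.
	data := make([]byte, 16*1024)
	dest := make([]byte, 16*1024)
	copy(dest, data) // the compiler lowers this copy to runtime.memmove
}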

Copy data larger than 2KB.
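To compare timings on either side of the 2KB threshold, a small standalone benchmark can be used. This is a sketch, not part of the original report: it uses testing.Benchmark so it runs as a plain program, and the helper name benchCopy and the chosen sizes are illustrative assumptions. Results depend on the CPU and on whether useAVXmemmove is set on it.

```go
package main

import (
	"fmt"
	"testing"
)

// benchCopy (hypothetical helper) returns a benchmark that copies n bytes
// per iteration with the built-in copy, i.e. through runtime.memmove.
func benchCopy(n int) func(b *testing.B) {
	return func(b *testing.B) {
		src := make([]byte, n)
		dst := make([]byte, n)
		b.SetBytes(int64(n)) // lets the result report throughput
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	}
}

func main() {
	// Sizes straddle the 2KB threshold discussed in this issue.
	for _, n := range []int{1024, 2048, 4096, 16 * 1024} {
		r := testing.Benchmark(benchCopy(n))
		nsPerOp := float64(r.T.Nanoseconds()) / float64(r.N)
		fmt.Printf("copy %6d B: %8.2f ns/op\n", n, nsPerOp)
	}
}
```

Running it once with a Go toolchain that sets useAVXmemmove on the machine, and once with a patched one, gives a rough before/after comparison; `go test -bench` with benchstat, as used later in this thread, is the more rigorous route.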

What did you see happen?

On newer Intel CPUs that support ERMS (Enhanced REP MOVSB), copy uses the AVX-based loop instead of the REP MOVSB instruction.

What did you expect to see?

I expected copy to use REP MOVSB on these CPUs. The current memmove implementation uses REP MOVSB for copies larger than 2KB only when the useAVXmemmove global variable is false and the CPU supports the ERMS feature.
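The selection rule just described can be modeled as a tiny pure function. This is a simplified, hypothetical model (the name copyStrategy and the strategy labels are invented here): it assumes the copy is already past memmove's small-size cases and ignores overlap handling, whereas the real decision is made in memmove's amd64 assembly.

```go
package main

import "fmt"

// copyStrategy is a hypothetical model of how memmove picks a path for an
// n-byte copy that is already past its small-size cases.
func copyStrategy(n int, useAVXmemmove, hasERMS bool) string {
	switch {
	case useAVXmemmove:
		// AVX wins regardless of ERMS: the behavior reported in this issue.
		return "AVX"
	case n > 2048 && hasERMS:
		return "REP MOVSB"
	default:
		return "non-AVX loop"
	}
}

func main() {
	// A CPU with AVX and ERMS outside the Bridge family: useAVXmemmove is
	// true, so a 16KB copy takes the AVX path.
	fmt.Println(copyStrategy(16*1024, true, true))
	// With useAVXmemmove false, the same copy would use REP MOVSB.
	fmt.Println(copyStrategy(16*1024, false, true))
}
```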

According to the runtime/cpuflags_amd64.go code:

var useAVXmemmove bool

func init() {
	// Let's remove stepping and reserved fields
	processor := processorVersionInfo & 0x0FFF3FF0

	isIntelBridgeFamily := isIntel &&
		processor == 0x206A0 ||
		processor == 0x206D0 ||
		processor == 0x306A0 ||
		processor == 0x306E0

	useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
}

As the code shows, useAVXmemmove is false (so the REP MOVSB path can be taken) only on CPUs without AVX or on the Sandy Bridge (Client), Sandy Bridge (Server), Ivy Bridge (Client), and Ivy Bridge (Server) microarchitectures.
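The mask 0x0FFF3FF0 keeps the model, family, processor-type, extended-model and extended-family fields of the CPUID leaf 1 EAX signature while dropping the stepping, so every stepping of a given model compares equal. The sketch below decodes a few signatures to show which microarchitecture each constant names. The EAX values are illustrative examples and the helper names are hypothetical.

```go
package main

import "fmt"

// signatureMask is the mask from runtime/cpuflags_amd64.go. It clears the
// stepping (bits 0-3) and the reserved bits 14-15 and 28-31 of CPUID.1:EAX.
const signatureMask uint32 = 0x0FFF3FF0

// bridgeFamily maps the four masked signatures checked by the runtime to
// the microarchitectures named in this issue.
var bridgeFamily = map[uint32]string{
	0x206A0: "Sandy Bridge (Client)",
	0x206D0: "Sandy Bridge (Server)",
	0x306A0: "Ivy Bridge (Client)",
	0x306E0: "Ivy Bridge (Server)",
}

// displayModel returns family and model as Intel documents them: the
// extended-model field is prepended for families 6 and 15, and the
// extended-family field is added only for family 15.
func displayModel(eax uint32) (family, model uint32) {
	family = (eax >> 8) & 0xF
	model = (eax >> 4) & 0xF
	if family == 0x6 || family == 0xF {
		model |= ((eax >> 16) & 0xF) << 4
	}
	if family == 0xF {
		family += (eax >> 20) & 0xFF
	}
	return family, model
}

func main() {
	// Example EAX values: a Sandy Bridge client, an Ivy Bridge server and
	// an Ice Lake server part, each with a nonzero stepping.
	for _, eax := range []uint32{0x000206A7, 0x000306E4, 0x000606A6} {
		sig := eax & signatureMask
		family, model := displayModel(eax)
		name, ok := bridgeFamily[sig]
		if !ok {
			name = "not in the Bridge list"
		}
		fmt.Printf("eax=%#x sig=%#x family=%#x model=%#x: %s\n", eax, sig, family, model, name)
	}
}
```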

For modern Intel CPU microarchitectures that support the ERMS feature, such as Ice Lake (Server) and Sapphire Rapids, REP MOVSB outperforms the AVX-based copy currently implemented in memmove for large copies.

(You can get the CPUID table here: https://en.wikichip.org/wiki/intel/cpuid)
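One way to express the requested change is to keep the Bridge-family exclusion and add a second list of signatures on which AVX memmove is skipped when the CPU reports ERMS. The sketch below is hypothetical and is not the code of CL 580735: the function and map names are invented, and the Ice Lake (Server) and Sapphire Rapids signatures (0x606A0 and 0x806F0, i.e. family 6 models 0x6A and 0x8F) are assumptions based on Intel's CPUID numbering.

```go
package main

import "fmt"

const signatureMask uint32 = 0x0FFF3FF0

// Signatures the runtime already excludes from AVX memmove.
var bridgeFamily = map[uint32]bool{
	0x206A0: true, // Sandy Bridge (Client)
	0x206D0: true, // Sandy Bridge (Server)
	0x306A0: true, // Ivy Bridge (Client)
	0x306E0: true, // Ivy Bridge (Server)
}

// Hypothetical list: CPUs on which REP MOVSB is assumed to beat AVX when
// ERMS is present.
var ermsPreferred = map[uint32]bool{
	0x606A0: true, // Ice Lake (Server), assumed family 6 model 0x6A
	0x806F0: true, // Sapphire Rapids, assumed family 6 model 0x8F
}

// useAVXmemmoveSketch is a hypothetical variant of the useAVXmemmove rule.
// It checks isIntel before consulting either list.
func useAVXmemmoveSketch(eax uint32, isIntel, hasAVX, hasERMS bool) bool {
	sig := eax & signatureMask
	isBridge := isIntel && bridgeFamily[sig]
	preferERMS := isIntel && hasERMS && ermsPreferred[sig]
	return hasAVX && !isBridge && !preferERMS
}

func main() {
	// Ice Lake (Server) with AVX and ERMS: the sketch disables AVX memmove.
	fmt.Println(useAVXmemmoveSketch(0x000606A6, true, true, true))
	// A signature on neither list keeps AVX memmove.
	fmt.Println(useAVXmemmoveSketch(0x000906EA, true, true, true))
}
```

Keeping ERMS-preferred CPUs in a separate list from the Bridge family matters because the Bridge exclusion does not depend on ERMS, while the new entries only make sense when REP MOVSB is fast.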

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 22, 2024
@gopherbot
Contributor

Change https://go.dev/cl/580735 mentions this issue: runtime: Add Ice Lake and Sapphire Rapids ERMS support for memmove

@joedian joedian added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Apr 23, 2024
@mknyszek mknyszek added this to the Backlog milestone Apr 24, 2024
cocotyty added a commit to cocotyty/go that referenced this issue Jul 2, 2024
The current memmove implementation uses REP MOVSB to copy data larger than
2KB when the useAVXmemmove global variable is false and the CPU supports
the ERMS feature.

This feature is currently only enabled on CPUs in the Sandy Bridge (Client),
Sandy Bridge (Server), Ivy Bridge (Client), and Ivy Bridge (Server)
microarchitectures.

For modern Intel CPU microarchitectures that support the ERMS feature, such
as Ice Lake (Server) and Sapphire Rapids, REP MOVSB achieves better
performance than the AVX-based copy currently implemented in memmove.

Benchstat result:

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz
               │  ./old.txt  │              ./new.txt              │
               │   sec/op    │   sec/op     vs base                │
Memmove/2048-2   25.24n ± 0%   24.27n ± 0%   -3.84% (p=0.000 n=10)
Memmove/4096-2   44.87n ± 0%   33.16n ± 1%  -26.11% (p=0.000 n=10)
geomean          33.65n        28.37n       -15.71%

               │  ./old.txt   │               ./new.txt               │
               │     B/s      │      B/s       vs base                │
Memmove/2048-2   75.56Gi ± 0%    78.59Gi ± 0%   +4.02% (p=0.000 n=10)
Memmove/4096-2   85.01Gi ± 0%   115.05Gi ± 1%  +35.34% (p=0.000 n=10)
geomean          80.14Gi         95.09Gi       +18.65%

Fixes golang#66958

Signed-off-by: TangYang <yang.tang@intel.com>
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Jul 22, 2024
@dmitshur dmitshur modified the milestones: Backlog, Go1.24 Jul 22, 2024
6 participants