proposal: bytes, strings: add CutByte #67101
Comments
CutByte optimizes slicing operations for single-byte separators, offering a more efficient alternative when only a single byte is involved. There is more discussion on https://golang.org/issue/67101. Fixes golang#67101.
Change https://go.dev/cl/582176 mentions this issue: |
Can the performance of the existing function be improved without adding a new one? For example, how does this compare:
func Cut(s, sep []byte) (before, after []byte, found bool) {
	if len(sep) == 1 {
		// Fast path: search for the single separator byte directly.
		if i := IndexByte(s, sep[0]); i >= 0 {
			return s[:i], s[i+1:], true
		}
	}
	if i := Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, nil, false
} |
@adonovan Index already has an optimization for one-byte strings: go/src/internal/stringslite/strings.go, lines 25 to 31 in 16ce8b3
|
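For readers without the referenced source at hand, the kind of fast path being discussed can be sketched as follows. This is an illustration only, not the verbatim library code: `indexSketch` is a hypothetical name, and the general case is simply delegated to `strings.Index` for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// indexSketch illustrates a substring search with a single-byte fast path.
// It is a hypothetical stand-in, not the standard library implementation.
func indexSketch(s, substr string) int {
	switch n := len(substr); {
	case n == 0:
		// An empty needle matches at the start.
		return 0
	case n == 1:
		// Fast path: a one-byte needle is handed straight to IndexByte.
		return strings.IndexByte(s, substr[0])
	case n > len(s):
		// A needle longer than the haystack can never match.
		return -1
	}
	// General case, delegated to the standard library in this sketch.
	return strings.Index(s, substr)
}

func main() {
	fmt.Println(indexSketch("key=value", "="))   // single-byte needle
	fmt.Println(indexSketch("key=>value", "=>")) // multi-byte needle
}
```

With a fast path of this shape, a call such as Cut(s, "a") pays only for the length check and one extra call before reaching IndexByte, which is the overhead the benchmarks in this thread try to measure.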
Please don't use images for plain text. Images are hard to read. Plain text is easy. Thanks. |
See #22148 for a relevant discussion. |
So it does. Are you saying that even with this fastpath, the overhead of evaluating |
Sorry, I've changed it to text. To keep the comment short, the content is collapsed; click Details to see it.
Such code will not improve anything and will even degrade performance.
This is my benchmark code. func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
skip := 32
s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("Func=Cut-%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = Cut(s, "a")
}
})
b.Run(fmt.Sprintf("Func=CutByte-%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = CutByte(s, 'a')
}
})
}
I would prefer to use the following benchmarking procedure so that I can do benchmark comparisons on different samples. When I tried to run func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
for _, skip := range [...]int{2, 4, 8, 16, 32, 64} {
s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("Func=Cut-%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = Cut(s, "a")
}
})
b.Run(fmt.Sprintf("Func=CutByte-%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = CutByte(s, 'a')
}
})
}
}
IndexByte gets a significant performance boost after CL 148578. So I don't think "Using strings.IndexByte is premature optimisation." is entirely applicable now. |
If I've missed any information, please clue me in, thanks! |
I benchmarked this using https://github.com/egonelbre/exp/blob/main/bench/cutbyte/bench_test.go -- and then combining the results, I got the result (using Go master):
Individual results:
I used multiple implementations of the same code, because microbenchmarks are prone to statistical noise due to code layout differences. |
The numbers in your tests look too slow; maybe the benchmark is faulty? https://github.com/egonelbre/exp/blob/main/bench/cutbyte/bench_test.go#L65 Would you accept it without the global variables?
diff --git a/bench/cutbyte/bench_test.go b/bench/cutbyte/bench_test.go
index f56b9ac..0489ac5 100644
--- a/bench/cutbyte/bench_test.go
+++ b/bench/cutbyte/bench_test.go
@@ -62,8 +62,6 @@ func CutByte3(s string, sep byte) (before, after string, found bool) {
return s, "", false
}
-var x, y, z any
-
func BenchmarkCut0(b *testing.B) {
b.ReportAllocs()
@@ -71,13 +69,15 @@ func BenchmarkCut0(b *testing.B) {
s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = Cut0(s, "a")
+ x, y, z := Cut0(s, "a")
+ _, _, _ = x, y, z
}
})
b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = CutByte0(s, 'a')
+ x, y, z := CutByte0(s, 'a')
+ _, _, _ = x, y, z
}
})
}
@@ -89,13 +89,15 @@ func BenchmarkCut1(b *testing.B) {
s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = Cut1(s, "a")
+ x, y, z := Cut1(s, "a")
+ _, _, _ = x, y, z
}
})
b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = CutByte1(s, 'a')
+ x, y, z := CutByte1(s, 'a')
+ _, _, _ = x, y, z
}
})
}
@@ -108,13 +110,15 @@ func BenchmarkCut2(b *testing.B) {
s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = Cut2(s, "a")
+ x, y, z := Cut2(s, "a")
+ _, _, _ = x, y, z
}
})
b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = CutByte2(s, 'a')
+ x, y, z := CutByte2(s, 'a')
+ _, _, _ = x, y, z
}
})
}
@@ -127,14 +131,16 @@ func BenchmarkCut3(b *testing.B) {
s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = Cut3(s, "a")
+ x, y, z := Cut3(s, "a")
+ _, _, _ = x, y, z
}
})
b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
- x, y, z = CutByte3(s, 'a')
+ x, y, z := CutByte3(s, 'a')
+ _, _, _ = x, y, z
}
})
}
-}
\ No newline at end of file
+} I modified your test case a bit and got the following results
Thanks to @egonelbre's tip, I changed the benchmark as follows:
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
for _, skip := range [...]int{2, 4, 8, 16, 32, 64} {
s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
b.Run(fmt.Sprintf("Func=Cut/%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = Cut(s, "a")
}
})
b.Run(fmt.Sprintf("Func=CutByte/%d", skip), func(b *testing.B) {
for i := 0; i < b.N; i++ {
_, _, _ = CutByte(s, 'a')
}
})
}
}
|
@aimuz, I'm storing in global vars to ensure that code doesn't get optimized away. It could avoid I updated the code, and I'm getting:
|
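The global-variable technique mentioned in the comment above can be sketched as follows. This is a minimal illustration, not the code from the linked repository: the names `sinkBefore`, `sinkAfter`, `sinkFound` and `benchCut` are hypothetical, and `testing.Benchmark` is used only so the example runs as an ordinary program rather than under `go test`.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Package-level sinks: assigning results here makes them observable outside
// the loop, so the compiler cannot discard the calls as dead code.
var (
	sinkBefore, sinkAfter string
	sinkFound             bool
)

// benchCut measures strings.Cut on a haystack whose only 'a' sits
// skip bytes from either end.
func benchCut(skip int) testing.BenchmarkResult {
	// Init registers the testing flags; it is a no-op if already called.
	testing.Init()
	s := strings.Repeat(" ", skip) + "a" + strings.Repeat(" ", skip)
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sinkBefore, sinkAfter, sinkFound = strings.Cut(s, "a")
		}
	})
}

func main() {
	r := benchCut(32)
	fmt.Println(r.N > 0, sinkFound, len(sinkBefore), len(sinkAfter))
}
```

The trade-off is that stores to globals add a small, constant cost to every iteration, whereas assigning to the blank identifier or to locals leaves the compiler free to inline and then eliminate part of the work. Which of the two better reflects real callers is the point under discussion here.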
Optimize the Cut function in both the bytes and strings packages to immediately return slices when the separator is a single byte (or character), avoiding more complex index searching logic. This change can significantly reduce the execution time for these specific cases, as benchmark tests added to each package demonstrate improvements.

The optimization checks if the length of the separator is one before proceeding with the existing search strategy. If so, it uses IndexByte for a faster lookup of the separator's position. Additionally, benchmark tests have been added for both packages to demonstrate the performance benefits of this optimization across various scenarios.

goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
                    │ old-cut.txt │            new-cut.txt             │
                    │   sec/op    │   sec/op     vs base               │
Cut/Cut-One/2-12      4.026n ± 2%   3.274n ± 2%  -18.68% (p=0.000 n=10)
Cut/Cut-Two/2-12      8.093n ± 0%   8.357n ± 0%   +3.27% (p=0.000 n=10)
Cut/Cut-One/4-12      4.048n ± 1%   3.324n ± 2%  -17.91% (p=0.000 n=10)
Cut/Cut-Two/4-12      8.105n ± 0%   8.377n ± 1%   +3.35% (p=0.000 n=10)
Cut/Cut-One/8-12      4.089n ± 1%   3.290n ± 1%  -19.53% (p=0.000 n=10)
Cut/Cut-Two/8-12      8.107n ± 1%   8.359n ± 1%   +3.10% (p=0.000 n=10)
Cut/Cut-One/16-12     4.127n ± 1%   3.328n ± 1%  -19.35% (p=0.000 n=10)
Cut/Cut-Two/16-12     8.119n ± 1%   8.374n ± 1%   +3.15% (p=0.000 n=10)
Cut/Cut-One/32-12     4.545n ± 2%   3.675n ± 1%  -19.14% (p=0.000 n=10)
Cut/Cut-Two/32-12     8.708n ± 1%   8.963n ± 1%   +2.92% (p=0.000 n=10)
Cut/Cut-One/64-12     4.825n ± 2%   4.146n ± 1%  -14.08% (p=0.000 n=10)
Cut/Cut-Two/64-12     9.286n ± 0%   9.315n ± 1%        ~ (p=0.105 n=10)
geomean               5.983n        5.486n        -8.32%

                    │ old-cut.txt │            new-cut.txt             │
                    │    B/op     │    B/op      vs base               │
Cut/Cut-One/2-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                         ²                 +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │ old-cut.txt │            new-cut.txt             │
                    │  allocs/op  │  allocs/op   vs base               │
Cut/Cut-One/2-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12      0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12     0.000 ± 0%    0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                         ²                 +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

For golang#67101
Change https://go.dev/cl/582655 mentions this issue: |
If the separator length is 1, performance does improve, but if it is greater than 1, the running time increases by roughly 3% in the benchstat results. Maybe this is acceptable? The code and the improved benchmark implementation will be available at https://go.dev/cl/582655
|
Maybe CL 582655 is the best way forward for now? It optimizes the single-byte case without modifying code anywhere else. However, callers whose sep is longer than one byte would see a performance loss of more than 2%. Is this acceptable? In the Go code I've investigated, the proportion of separators of length 1 is quite large. Maybe we can merge CL 582655? |
Proposal Details
Abstract
This proposal suggests the addition of a new function, CutByte, to the strings and bytes packages in the Go standard library. The function aims to simplify the handling of strings and byte slices by cutting them around the first instance of a specified byte separator. This function is designed to offer a more efficient alternative to the existing Cut function when the separator is a single byte, providing up to a 25% performance improvement in typical use cases.
Background
The existing Cut function in the Go standard library has proven to be extremely useful for handling strings and byte slices by simplifying code and replacing multiple standard library functions. However, in scenarios where the separator is known to be a single byte, the Cut function can be optimized further. The proposed CutByte function addresses this by focusing on byte-level operations, which are common in many real-world applications, such as parsing binary protocols or handling ASCII-based text formats.
Proposal
We propose adding the following functions:
For the strings package:
For the bytes package:
Rationale
The CutByte function is specifically optimized for cases where the separator is a single byte. In the analysis of Go's main repository and several large-scale Go projects, a significant number of string manipulations involve single-byte separators. The performance benefit of CutByte over Cut for these cases is approximately 25%, as measured in benchmarks comparing the two functions under similar conditions.
Compatibility
CutByte is a new addition and does not modify any existing interfaces or behavior in the Go standard library. It follows the established patterns and naming conventions of the Go ecosystem, ensuring that it integrates seamlessly with the existing library functions.
Implementation
The implementation of CutByte is straightforward and leverages the existing IndexByte function in both the strings and bytes packages. The proposed functions do not introduce any new dependencies or significant complexities.
Conclusion
Adding CutByte to the Go standard library will provide developers with a more efficient tool for handling common string and byte slice operations involving single-byte separators. This function not only enhances performance but also maintains readability and simplicity, aligning with Go's philosophy of clear and efficient coding practices.
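As an illustration of the Implementation section, here is a minimal sketch of what the two proposed functions might look like when built on IndexByte. The string signature follows the CutByte variants quoted in the benchmark diff in this thread. The bytes variant is an assumption modelled on bytes.Cut, and it is named `CutByteBytes` here (a hypothetical name) only so that both versions fit in one file. In the proposal both would be called CutByte, in their respective packages.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// CutByte slices s around the first instance of sep, returning the text
// before and after it. If sep is absent it returns s, "", false.
// Sketch of the proposed strings.CutByte.
func CutByte(s string, sep byte) (before, after string, found bool) {
	if i := strings.IndexByte(s, sep); i >= 0 {
		return s[:i], s[i+1:], true
	}
	return s, "", false
}

// CutByteBytes is the assumed []byte counterpart (proposed as bytes.CutByte).
// Like bytes.Cut, it returns subslices of s rather than copies.
func CutByteBytes(s []byte, sep byte) (before, after []byte, found bool) {
	if i := bytes.IndexByte(s, sep); i >= 0 {
		return s[:i], s[i+1:], true
	}
	return s, nil, false
}

func main() {
	before, after, found := CutByte("key=value", '=')
	fmt.Println(before, after, found)

	b, a, ok := CutByteBytes([]byte("no-separator"), '=')
	fmt.Printf("%q %q %v\n", b, a, ok)
}
```

Because the separator is a byte rather than a string, there is no length check and no dispatch through Index, which is where the proposal expects its speedup to come from.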