Instance API#271
Conversation
|
Still a draft for now, as I'm chasing a bug in the benchmark |
4fc0e73 to
6b4fa99
Compare
6b4fa99 to
b3e29e7
Compare
Signed-off-by: Alexandre Perrin <alex@isovalent.com>
b3e29e7 to
e470c8c
Compare
|
Did a quick round of benchmark testing before / after the patch. Without going too deep:
|
rolinh
left a comment
There was a problem hiding this comment.
I got curious regarding this statement:
the switch to a struct API doesn't improve significantly the performances over the previous package-level functions using a lockedSource from the randv1 func API.
So I benchmarked locally with the following code for the main package:
func BenchmarkLegacyAPI(b *testing.B) {
b.Run("Sequential", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = fake.IP()
_ = fake.MAC()
_ = fake.Port()
_ = fake.K8sNamespace()
}
})
b.Run("Parallel", func(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
_ = fake.IP()
_ = fake.MAC()
_ = fake.Port()
_ = fake.K8sNamespace()
}
})
})
}
func BenchmarkInstanceAPI(b *testing.B) {
b.Run("Sequential", func(b *testing.B) {
f := fake.New()
for i := 0; i < b.N; i++ {
_ = f.IP()
_ = f.MAC()
_ = f.Port()
_ = f.K8sNamespace()
}
})
b.Run("Parallel", func(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
f := fake.New()
for pb.Next() {
_ = f.IP()
_ = f.MAC()
_ = f.Port()
_ = f.K8sNamespace()
}
})
})
}
func BenchmarkInstanceAPIShared(b *testing.B) {
b.Run("Parallel", func(b *testing.B) {
f := fake.New()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
_ = f.IP()
_ = f.MAC()
_ = f.Port()
_ = f.K8sNamespace()
}
})
})
}And the results I get are the following:
% go test -bench .
goos: linux
goarch: amd64
pkg: github.com/cilium/fake
cpu: AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
BenchmarkLegacyAPI/Sequential-16 4988082 232.9 ns/op
BenchmarkLegacyAPI/Parallel-16 3533332 347.7 ns/op
BenchmarkInstanceAPI/Sequential-16 5572246 217.2 ns/op
BenchmarkInstanceAPI/Parallel-16 23392348 56.28 ns/op
BenchmarkInstanceAPIShared/Parallel-16 6864799 169.3 ns/op
PASS
ok github.com/cilium/fake 7.127s
So while the sequential tests don't show any significant improvements, the parallel ones show a ~7x speedup, which is very significant.
And when it comes to the flow package:
func BenchmarkLegacyFlowAPI(b *testing.B) {
b.Run("Sequential", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = flow.New()
}
})
b.Run("Parallel", func(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
_ = flow.New()
}
})
})
}
func BenchmarkInstanceFlowAPI(b *testing.B) {
b.Run("Sequential", func(b *testing.B) {
f := flow.NewFaker()
for i := 0; i < b.N; i++ {
_ = f.NewFlow()
}
})
b.Run("Parallel", func(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
f := flow.NewFaker()
for pb.Next() {
_ = f.NewFlow()
}
})
})
}The story is very similar:
% go test -bench .
goos: linux
goarch: amd64
pkg: github.com/cilium/fake/flow
cpu: AMD Ryzen 7 PRO 8840HS w/ Radeon 780M Graphics
BenchmarkFakeFlow-16 485557 2481 ns/op
BenchmarkLegacyFlowAPI/Sequential-16 140707 8594 ns/op
BenchmarkLegacyFlowAPI/Parallel-16 132897 8954 ns/op
BenchmarkInstanceFlowAPI/Sequential-16 137576 8414 ns/op
BenchmarkInstanceFlowAPI/Parallel-16 587812 2052 ns/op
PASS
ok github.com/cilium/fake/flow 7.330s
So here we have roughly a 4x speedup in the parallel case.
With that said, the "legacy" API we now provide for backwards compatibility is very likely to be a significant regression for the parallel use-case due to the introduction of the global mutex which will inevitably lead to lock contention. At the very least, we should document this and encourage users to move to the instance API if they generate fake data in parallel.
| ) | ||
|
|
||
| // Faker is the main interface exposed to generate fake data. | ||
| type Faker interface { //nolint:interfacebloat |
There was a problem hiding this comment.
What is the rationale to introduce an interface rather than simply returning a struct? I don't really see the benefit of doing this.
There was a problem hiding this comment.
It is to hide the embedded rand and make the contract explicit. I'm neutral toward this though (we don't have to embed the rand stuff).
There was a problem hiding this comment.
I have some issues with this approach as a 13 method interface will never be implemented in practice and, even if it were implemented elsewhere, every time we add a new method, we would break the existing implementation(s). So I'd really prefer we return a concrete *Faker. However, I think we should be able to address your concern regarding hiding the embedded rang as we can just not export it and keep it private.
Before this patch, the fake library would only expose package defined methods. This had the limitation of scaling poorly when used by multiple goroutines (although both recent golang versions and rand/v2 addressed the scaling issues), and did not abstract the PRNG very well. This commit introduces a new faker struct, constructors, and struct methods matching the exposed API. Each faker struct has its own instance PRNG and should not be used concurrently. Additionally, a package level struct is created and used by package level defined functions for full retro-compatibility. The following commits will introduce the same change to the flow module, and adapt the cmd module to use the new API. Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Introduce a flowfaker struct with its own PRNG instance, constructors, and struct methods matching the existing package API. Package-level functions are preserved in legacy.go for backwards compatibility, delegating to a shared package-level flowfaker instance. Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Use a Faker struct in each command handler and its methods rather than the (now) legacy package-level functions. Signed-off-by: Alexandre Perrin <alex@isovalent.com>
I think the benchmark you presented is testing something different than what I was referring to, sorry for the confusion. Your test covers both "legacy" and "instance" API, both backed by the instance implementation, at the same commit. In that case, as you correctly pointed out, the global mutex introduce a non-negligible overhead that you measured. My message mentioned the randv1 func API using a lockedSource, because I used the benchmark file introduced in this PR (sometime slightly adapted) to test different commits:
I found that there is a slight improvement between 1. and 2., and no significant difference between 2. and 3. |
e470c8c to
58fd3f6
Compare
rolinh
left a comment
There was a problem hiding this comment.
Your test covers both "legacy" and "instance" API, both backed by the instance implementation, at the same commit. In that case, as you correctly pointed out, the global mutex introduce a non-negligible overhead that you measured.
In practice, this is a regression for the users of the legacy API though. Can we at least document it? We should mention that these functions exist for backward compatibility only and that they are "slow" if used in a concurrent context. I'd document it both in the package and in the readme and cover the differences/use-cases for legacy and instance APIs.
Also, I think we could also commit the parallel benchmarks I submitted so that we can monitor for potential future regressions.
My message mentioned the randv1 func API using a lockedSource (...)
👍
Implement a new struct API (see #21), backward-compatibility wrappers, and adapt the cmd module accordingly.