pkg/loadbalancer: Optimize L3n4Addr.Hash for performance #14617
Conversation
Nice! Do you have a flamegraph for "after"?
```go
b = append(b, '|')
b = strconv.AppendUint(b, uint64(a.Scope), 10)

return string(b)
```
One more thing, how does this relate to `L3n4AddrID`? E.g. cilium/pkg/maps/lbmap/lbmap.go, line 563 in 2b9485d:

```go
func (svcs svcMap) addFE(fe *loadbalancer.L3n4AddrID) *loadbalancer.SVC {
```
Calling `fe.Hash()` (where `fe` is of type `L3n4AddrID`) is equivalent to calling `fe.L3n4Addr.Hash()`. In other words, there is no inheritance. The `%+v` part was still only printing the contents of `L3n4Addr`; the `ID` is and was omitted. Not sure that is intended, but this commit does not change that.
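The behavior described here follows from Go's struct embedding: methods of an embedded type are promoted to the outer type, and the promoted method only ever sees the embedded value. A minimal sketch illustrating the point (the type and field names below are stand-ins for `L3n4Addr`/`L3n4AddrID`, not Cilium's actual definitions):

```go
package main

import "fmt"

// Inner plays the role of L3n4Addr: it defines Hash.
type Inner struct{ Addr string }

func (i Inner) Hash() string { return i.Addr }

// Outer plays the role of L3n4AddrID: it embeds Inner and adds an ID.
// The promoted Hash method never sees the ID field.
type Outer struct {
	Inner
	ID uint32
}

func main() {
	o := Outer{Inner: Inner{Addr: "10.0.0.1|6|80|0"}, ID: 42}
	// o.Hash() is shorthand for o.Inner.Hash(); ID does not affect it.
	fmt.Println(o.Hash() == o.Inner.Hash()) // prints "true"
}
```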
`L3n4AddrID.Hash` is simply `L3n4Addr.Hash` and thus the `ID` field will not be used. If that is an issue, we probably want to define `L3n4AddrID.Hash` as `L3n4Addr.Hash()|ID` or something along those lines.
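A hedged sketch of what such an ID-including hash could look like, using minimal stand-in types (the method name `HashWithID` and the stand-in definitions are illustrative, not Cilium's code):

```go
package main

import (
	"fmt"
	"strconv"
)

// Minimal stand-ins for the types under discussion.
type L3n4Addr struct{ key string }

func (a L3n4Addr) Hash() string { return a.key }

type L3n4AddrID struct {
	L3n4Addr
	ID uint32
}

// HashWithID sketches the proposed "L3n4Addr.Hash()|ID" scheme: the
// promoted Hash result is extended with a separator and the decimal ID.
func (l L3n4AddrID) HashWithID() string {
	b := []byte(l.L3n4Addr.Hash())
	b = append(b, '|')
	b = strconv.AppendUint(b, uint64(l.ID), 10)
	return string(b)
}

func main() {
	fe := L3n4AddrID{L3n4Addr: L3n4Addr{key: "10.0.0.1|6|80|0"}, ID: 42}
	fmt.Println(fe.HashWithID()) // prints "10.0.0.1|6|80|0|42"
}
```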
🚀
pkg/loadbalancer/loadbalancer.go (outdated)
```go
// Note: 32 bytes is not enough for long IPv6 addresses, but it is cheaper
// to reallocate on overflow than to check if a.IP is IPv6
b := make([]byte, 0, 32)

b = append(b, a.IP.String()...)
```
why don't we do something like:

```go
ip := a.IP.String()
if strings.IndexRune(ip, ':') < 0 {
	b = make([]byte, 0, maxV4 /*(=32)*/)
} else {
	b = make([]byte, 0, maxV6 /*(=39? + 5 + sizeOfScope)*/)
}
```
This seems to do the trick without a measurable overhead. Will change.
Actually, after some more benchmarking, simply relying on reallocation is still faster in most cases than trying to be smart about it:
Fastest / most imprecise:

```go
const lenIPv4 = 15
const lenProto = 1
const lenPort = 6
const lenScope = 2

// Note: The capacity might not be enough for long IPv6 addresses, but it is cheaper
// to reallocate on overflow than to check the length of a.IP
b := make([]byte, 0, lenIPv4+lenProto+lenPort+lenScope)
```

```
BenchmarkL3n4Addr_Hash_IPv4-8         16097442    70.7 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8    9431970   125 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8     5561029   215 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8      3719448   314 ns/op
```

Middle ground:

```go
const lenIPv4 = 15
const lenIPv6 = 39
const lenProto = 1
const lenPort = 6
const lenScope = 2

var b []byte
ip := a.IP.String()
if strings.IndexRune(ip, ':') < 0 {
	b = make([]byte, 0, lenIPv4+lenProto+lenPort+lenScope)
} else {
	b = make([]byte, 0, lenIPv6+lenProto+lenPort+lenScope)
}
```

```
BenchmarkL3n4Addr_Hash_IPv4-8         14471590    80.9 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8    9139532   132 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8     6556764   204 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8      3862281   313 ns/op
```

Slowest / most precise:

```go
const lenProto = 1
const lenPort = 6
const lenScope = 2

ip := a.IP.String()
b := make([]byte, 0, len(ip)+lenProto+lenPort+lenScope)
```

```
BenchmarkL3n4Addr_Hash_IPv4-8         12182581    97.3 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8    7979997   147 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8     5991332   215 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8      3743278   317 ns/op
```
This adds a simple benchmark for different length IPv4- and IPv6-based L3n4Addr hashes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

This commit optimizes the L3n4Addr.Hash function. It is used as a key in maps and therefore needs to be unique, but it does not need to actually be a secure hash. The previous implementation was slow because it relied on defer statements, reflection (due to the use of `%+v`), and SHA256. This function is used during service lookup in Hubble's hot path, so this optimization should reduce the overhead of Hubble.

Before this commit:

```
BenchmarkL3n4Addr_Hash_IPv4-8           608378   1856 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8     620024   2080 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8      535290   2168 ns/op
```

After this commit:

```
BenchmarkL3n4Addr_Hash_IPv4-8         15079537   81.1 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8    8348496    149 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8     4463416    259 ns/op
```

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Force-pushed from d779912 to 9e32e1f (compare)
Pushed an additional benchmark and made the length calculation a bit tighter (24 instead of 32 bytes by default). See also #14617 (comment)

test-me-please
🚀
Net-Next hit #14598

retest-net-next

retest-4.9

retest-4.19
```go
func (a L3n4Addr) Hash() string {
	const lenIPv4 = 15
```
```go
const lenIPv4 = 16
// len("255.255.255.255|")
```
```go
const lenIPv4 = 15
const lenProto = 1
const lenPort = 6
const lenScope = 2
```
```go
const lenScope = 1
// len("0") or len("1"), there isn't a "|" as part of the scope
```
The sum should be correct. I did not add the `|` to the IP address; I added it to the scope instead.
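The capacity arithmetic can be sanity-checked directly. A tiny example using the constant values from the diff above (the comments state the separator accounting as I read it from this thread, which is an interpretation rather than a quote):

```go
package main

import "fmt"

const (
	lenIPv4  = len("255.255.255.255") // 15, longest textual IPv4, no trailing separator
	lenProto = 1
	lenPort  = 6 // assumed: '|' separator plus up to 5 port digits
	lenScope = 2 // assumed: '|' separator plus one scope digit
)

func main() {
	// Matches the "24 instead of 32 bytes by default" mentioned earlier.
	fmt.Println(lenIPv4 + lenProto + lenPort + lenScope) // prints "24"
}
```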
Thanks, LGTM!
net-next hit #14598 again, as well as an error similar to #12690, but in a different context: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.13-net-next/468/testReport/junit/Suite-k8s-1/13/K8sServicesTest_Checks_service_across_nodes_Tests_NodePort_BPF_Tests_with_direct_routing_With_host_policy_Tests_NodePort/

retest-net-next