Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/loadbalancer: Optimize L3n4Addr.Hash for performance #14617

Merged
merged 2 commits into from
Jan 18, 2021

Conversation

gandro
Copy link
Member

@gandro gandro commented Jan 14, 2021

This commit optimizes the L3n4Addr.Hash function. It is used as a key in
Golang maps and therefore needs to be unique, but it does not need to actually
be a secure hash.

The previous implementation was slow issues because it relied on defer
statements, reflection (due to the use of %+v), and SHA256. This
function is used during service lookup in Hubble's hot path, therefore
this optimization should reduce the overhead of Hubble:

image

Before this commit:

BenchmarkL3n4Addr_Hash_IPv4-8         	  608378	      1856 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8   	  620024	      2080 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8    	  535290	      2168 ns/op

After this commit:

BenchmarkL3n4Addr_Hash_IPv4-8         	15079537	        81.1 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8   	 8348496	       149 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8    	 4463416	       259 ns/op

@gandro gandro added release-note/misc This PR makes changes that have no direct user impact. sig/loadbalancing labels Jan 14, 2021
@gandro gandro requested review from a team and aditighag January 14, 2021 15:44
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Jan 14, 2021
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Do you have a flamegraph for "after"?

@gandro
Copy link
Member Author

gandro commented Jan 14, 2021

Nice! Do you have a flamegraph for "after"?

This is the flamegraph for just the service lookup afterwards. Most time is still spent in Hash, but now it's mostly memory allocation.

image


Edit: Flamegraph for Hubble overall:

Before: GetServiceByAddr takes 17.9% of CPU overall.
image

After: GetServiceByAddr is the first child of Parser.Decode() here, it takes 1.45% of CPU overall.
image

b = append(b, '|')
b = strconv.AppendUint(b, uint64(a.Scope), 10)

return string(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, how does this relate to L3n4AddrID? E.g.

func (svcs svcMap) addFE(fe *loadbalancer.L3n4AddrID) *loadbalancer.SVC {
is calling the Hash() from an L3n4AddrID type where the Sprintf() was on %+v.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling fe.Hash() (where fe is of type L3n4AddrID) is equivalent to calling fe.L3n4Addr.Hash(). In other words, there is no inheritance. The %+v part was still only printing the contents of L3n4Addr, the ID is and was omitted.

Not sure that is intended, but this commit does not change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L3n4AddrID.Hash is simply L3n4Addr.Hash and thus the ID field will not be used. If that is an issue, we probably want to define L3n4AddrID.Hash as L3n4Addr.Hash()|ID or something along those lines.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@aditighag aditighag removed their assignment Jan 14, 2021
Comment on lines 480 to 489
// Note: 32 bytes is not enough for long IPv6 addresses, but it is cheaper
// to reallocate on overflow than to check if a.IP is IPv6
b := make([]byte, 0, 32)

b = append(b, a.IP.String()...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we do something like:

ip := a.IP.String()
if bytes.IndexRune(ip, ':') < 0 {
    b = make([]byte, 0, maxV4 /*(=32)*/)
} else {
    b = make([]byte, 0, maxV6 /*(=39? + 5 + sizeOfScope)*/)
}

Copy link
Member Author

@gandro gandro Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to do the trick without a measurable overhead. Will change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after some more benchmarking, simply relying on reallocation is still faster in most cases than trying to be smart about it:


Fastest / Most imprecise:

const lenIPv4 = 15
const lenProto = 1
const lenPort = 6
const lenScope = 2

// Note: The capacity might not be enough for long IPv6 addresses, but it is cheaper
// to reallocate on overflow than to check the length of a.IP
b := make([]byte, 0, lenIPv4+lenProto+lenPort+lenScope)
BenchmarkL3n4Addr_Hash_IPv4-8         	16097442	        70.7 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8   	 9431970	       125 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8    	 5561029	       215 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8     	 3719448	       314 ns/op

Middle ground:

const lenIPv4 = 15
const lenIPv6 = 39
const lenProto = 1
const lenPort = 6
const lenScope = 2

var b []byte
ip := a.IP.String()
if strings.IndexRune(ip, ':') < 0 {
    b = make([]byte, 0, lenIPv4+lenProto+lenPort+lenScope)
} else {
    b = make([]byte, 0, lenIPv6+lenProto+lenPort+lenScope)
}
BenchmarkL3n4Addr_Hash_IPv4-8         	14471590	        80.9 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8   	 9139532	       132 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8    	 6556764	       204 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8     	 3862281	       313 ns/op

Slowest / Most-precise:

const lenProto = 1
const lenPort = 6
const lenScope = 2

ip := a.IP.String()
b := make([]byte, 0, len(ip)+lenProto+lenPort+lenScope)
BenchmarkL3n4Addr_Hash_IPv4-8         	12182581	        97.3 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Short-8   	 7979997	       147 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Long-8    	 5991332	       215 ns/op
BenchmarkL3n4Addr_Hash_IPv6_Max-8     	 3743278	       317 ns/op

pkg/loadbalancer/loadbalancer.go Show resolved Hide resolved
b = append(b, '|')
b = strconv.AppendUint(b, uint64(a.Scope), 10)

return string(b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L3n4AddrID.Hash is simply L3n4Addr.Hash and thus the ID field will not be used. If that is an issue, we probably want to define L3n4AddrID.Hash as L3n4Addr.Hash()|ID or something along those lines.

This adds a simple benchmark for different length IPv4- and IPv6-based
L3nL4Addr hashes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit optimizes the L3n4Addr.Hash function. It is used as a key in
maps and therefore needs to be unique, but it does not need to actually
be a secure hash.

The previous implementation was slow issues because it relied on defer
statements, reflection (due to the use of `%+v`), and SHA256. This
function is used during service lookup in Hubble's hot path, therefore
this optimization should reduce the overhead of Hubble.

Before this commit:

    BenchmarkL3n4Addr_Hash_IPv4-8         	  608378	      1856 ns/op
    BenchmarkL3n4Addr_Hash_IPv6_Short-8   	  620024	      2080 ns/op
    BenchmarkL3n4Addr_Hash_IPv6_Long-8    	  535290	      2168 ns/op

After this commit:

    BenchmarkL3n4Addr_Hash_IPv4-8         	15079537	        81.1 ns/op
    BenchmarkL3n4Addr_Hash_IPv6_Short-8   	 8348496	       149 ns/op
    BenchmarkL3n4Addr_Hash_IPv6_Long-8    	 4463416	       259 ns/op

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/lb-hash-performance-improvement branch from d779912 to 9e32e1f Compare January 14, 2021 17:34
@gandro
Copy link
Member Author

gandro commented Jan 14, 2021

Pushed an additional benchmark and made the length calculation a bit more tight (24 instead of 32 bytes by default). See also #14617 (comment)

@gandro
Copy link
Member Author

gandro commented Jan 14, 2021

test-me-please

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@gandro
Copy link
Member Author

gandro commented Jan 15, 2021

@gandro
Copy link
Member Author

gandro commented Jan 15, 2021

retest-net-next

@gandro
Copy link
Member Author

gandro commented Jan 15, 2021

retest-4.9

@gandro
Copy link
Member Author

gandro commented Jan 15, 2021

retest-4.19

func (a L3n4Addr) Hash() string {
const lenIPv4 = 15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const lenIPv4 = 16 // len("255.255.255.255|")

const lenIPv4 = 15
const lenProto = 1
const lenPort = 6
const lenScope = 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const lenScope = 1 // len("0") or len("1"), there isn't a "|" as part of the scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sum should be correct. I did not add the | to the IP address, and added it to the scope instead

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM!

@gandro
Copy link
Member Author

gandro commented Jan 18, 2021

@gandro
Copy link
Member Author

gandro commented Jan 18, 2021

retest-net-next

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 18, 2021
@gandro gandro merged commit 6754e8e into master Jan 18, 2021
@gandro gandro deleted the pr/gandro/lb-hash-performance-improvement branch January 18, 2021 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants