Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wire: only borrow/return binaryFreeList buffers at the message level #2073

Merged
merged 52 commits into from Dec 29, 2023

Conversation

Roasbeef
Copy link
Member

NOTE: This is a rebased version of #1426

The original PR body follows:

This PR optimizes the wire packages serialization of small buffers by minimizing the number of borrow/return round trips during message serialization. Currently the wire package uses a binaryFreeList from which 8-byte buffers are borrowed and returned for the purpose of serializing small integers and varints.

Problem

To understand the problem, consider calling WriteVarInt on a number greater than 0xfc (which requires writing the discriminant and a 2, 4, or 8 byte value following).

For instance, writing 20,000 will invoke PutUint8 and then PutUint16. Expanding this out to examine the message passing, we see:

buffer <- freelist

PutUInt8(buffer, discriminant)
w.Write(buffer)

freelist <- buffer

buffer <- freelist

PutUint16(buffer, value)
w.Write(buffer)

freelist <- buffer

Each <- requires a channel select, which more-or-less bears the performance implication of a mutex. This cost, in addition to need to wake up other goroutines and switch executions, imparts a significant performance penalty. In the context of block serialization, several hundred thousand of these operations may be performed.

Solution

In our example above, we can improve this by only using two <-, one to borrow and one to return, as so:

buffer <- freelist

PutUInt8(buffer, discriminant)
w.Write(buffer)

PutUint16(buffer, value)
w.Write(buffer)

freelist <- buffer

As expected, cutting the number channels sends in half cuts also cuts the latency in half, which can be seen in the benchmarks below for larger VarInts,

The remainder of this PR is to propagate this pattern all the way up to the top level of messages in the wire package, such that deserializing a message only incurs one borrow and one return. Any subroutines are made to conditionally borrow from the binarySerializer if the invoker has not provided them with a buffer, and conditionally return if they indeed were required to borrow.

A good example of how these channel sends/receives can add up is in MsgTx serialization, which is now upwards of 80% faster as a result of these optimizations:

benchmark                         old ns/op     new ns/op     delta
BenchmarkSerializeTx-8            683           142           -79.21%
BenchmarkSerializeTxSmall-8       724           143           -80.25%
BenchmarkSerializeTxLarge-8       1476002       182111        -87.66%

Preliminary Benchmarks

benchmark                         old ns/op     new ns/op     delta
BenchmarkWriteVarInt1-8           71.9          74.6          +3.76%
BenchmarkWriteVarInt3-8           148           70.9          -52.09%
BenchmarkWriteVarInt5-8           149           71.0          -52.35%
BenchmarkWriteVarInt9-8           147           72.8          -50.48%
BenchmarkReadVarInt1-8            78.8          77.9          -1.14%
BenchmarkReadVarInt3-8            159           87.7          -44.84%
BenchmarkReadVarInt5-8            155           88.3          -43.03%
BenchmarkReadVarInt9-8            158           86.9          -45.00%
BenchmarkReadVarStr4-8            120           119           -0.83%
BenchmarkReadVarStr10-8           138           130           -5.80%
BenchmarkWriteVarStr4-8           101           105           +3.96%
BenchmarkWriteVarStr10-8          103           105           +1.94%
BenchmarkReadOutPoint-8           91.3          28.7          -68.57%
BenchmarkWriteOutPoint-8          78.9          10.2          -87.07%
BenchmarkReadTxOut-8              245           118           -51.84%
BenchmarkWriteTxOut-8             151           93.5          -38.08%
BenchmarkReadTxIn-8               338           139           -58.88%
BenchmarkWriteTxIn-8              238           31.0          -86.97%
BenchmarkDeserializeTxSmall-8     1119          586           -47.63%
BenchmarkDeserializeTxLarge-8     2476063       1275815       -48.47%
BenchmarkSerializeTx-8            683           142           -79.21%
BenchmarkSerializeTxSmall-8       724           143           -80.25%
BenchmarkSerializeTxLarge-8       1476002       182111        -87.66%
BenchmarkReadBlockHeader-8        406           69.1          -82.98%
BenchmarkWriteBlockHeader-8       431           21.1          -95.10%
BenchmarkDecodeGetHeaders-8       13537         12238         -9.60%
BenchmarkDecodeHeaders-8          1025275       236709        -76.91%
BenchmarkDecodeGetBlocks-8        13206         11684         -11.53%
BenchmarkDecodeAddr-8             337977        157519        -53.39%
BenchmarkDecodeInv-8              5990935       1898169       -68.32%
BenchmarkDecodeNotFound-8         6285831       1864701       -70.33%
BenchmarkDecodeMerkleBlock-8      4357          2606          -40.19%
BenchmarkTxHash-8                 1928          1222          -36.62%
BenchmarkDoubleHashB-8            993           969           -2.42%
BenchmarkDoubleHashH-8            932           1029          +10.41%

benchmark                         old allocs     new allocs     delta
BenchmarkWriteBlockHeader-8       4              0              -100.00%
// all others remain 0 alloc

benchmark                         old bytes     new bytes     delta
BenchmarkWriteBlockHeader-8       16            0             -100.00%
// all others remain unchanged

Notes

I'm still in the process of going through and adding benchmarks to top-level messages in order to guage the overall performance benefit, expect more to be added at a later point.

There are a few remaining messages which have not yet been optimized, e.g. MsgAlert, MsgVesion, etc. I plan to add those as well but decided to start with the ones that were more performance critical.

@coveralls
Copy link

coveralls commented Dec 16, 2023

Pull Request Test Coverage Report for Build 7353202677

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.7%) to 56.754%

Totals Coverage Status
Change from base Build 7227596643: 0.7%
Covered Lines: 28907
Relevant Lines: 50934

💛 - Coveralls

@Roasbeef
Copy link
Member Author

cc @ellemouton

@Roasbeef
Copy link
Member Author

With this PR, combined with the rest on the v0.24 milestone, I was able to achieve a 12 hour sync over LAN!

Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice changes and nicely split up into multiple commits.
I did as thorough a review as concentration allowed and didn't find anything wrong with the code.

The main thing I would change is to use a consistent pattern of:

buf := binarySerializer.Borrow()
defer binarySerializer.Return(buf)

everywhere to avoid a single mistake somewhere causing the buffers to all be occupied and becoming useless (which probably wouldn't show up in benchmark tests).

wire/bench_test.go Outdated Show resolved Hide resolved
wire/common.go Outdated Show resolved Hide resolved
wire/common.go Show resolved Hide resolved
wire/bench_test.go Outdated Show resolved Hide resolved
wire/msgtx.go Outdated Show resolved Hide resolved
wire/msgtx.go Show resolved Hide resolved
wire/msgtx.go Show resolved Hide resolved
wire/msgtx.go Outdated Show resolved Hide resolved
wire/msggetblocks.go Show resolved Hide resolved
wire/msgblock.go Show resolved Hide resolved
@Roasbeef Roasbeef merged commit 16684f6 into btcsuite:master Dec 29, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants