Backport Bitcoin PR#9441: Net: Massive speedup. Net locks overhaul #1586

Merged
16 commits merged on Aug 23, 2017

Conversation

OlegGirko

This is a backport of Bitcoin PR bitcoin#9441.

This is the PR that not only significantly improves networking speed, but was also the original motivation for the massive code refactoring backported from Bitcoin.
We are now at the end of that refactoring.

There are still a couple of loose ends that will be addressed in subsequent PRs.

The original PR description follows.

Depends on (includes) bitcoin#9289. This is what I ultimately set out to fix with the net refactor.

In my (short) tests, it cuts network latency to ~50%. In some cases, more like 30%.

Test method: 1 fresh node (macbook) connected to a single other node (desktop). Testnet. Running until synced to block 100000. Results:

new = patched with this PR.
old = running #9289.
client / server
old / old: 100000 in 8:05
new / old: 100000 in 3:24
new / new: 100000 in 2:16

The results are very reproducible, always within a few seconds. Not only is it a nice improvement for the fresh node, but it compounds when its peer is running the updated code as well.

I had hoped to have the abstraction complete in time for 0.14, but it looks like that's unlikely at this point. For reference, it would look something more like theuni/bitcoin@1a6b10a.

Anyway, this is an attempt to fix the actual issue in time for 0.14, and putting off the last bit of refactor until after that. This addresses the issue observed in bitcoin#9415, but also cleans up the nasty locking issues.

I labeled this WIP because there are probably still several racy bits. Quite a bit of test-writing remains.

See the individual commits for the details. tl;dr: We currently either process a peer's message or read from their socket, but never both simultaneously. The changes here remove that restriction.
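
To make that concrete, here is a minimal C++ sketch of the idea with simplified, hypothetical names (not the actual patch): each peer gets a socket-side queue and a processing-side queue guarded by separate locks, so the socket thread and the message handler no longer block each other.

```cpp
#include <deque>
#include <mutex>
#include <string>

struct Node {
    // Filled by the socket handler thread: raw messages read off the wire.
    std::mutex cs_vRecvMsg;
    std::deque<std::string> vRecvMsg;

    // Drained by the message handler thread: messages waiting to be processed.
    std::mutex cs_vProcessMsg;
    std::deque<std::string> vProcessMsg;
};

// Socket thread: can keep appending while the handler is busy with vProcessMsg.
void SocketReceived(Node& node, std::string raw)
{
    std::lock_guard<std::mutex> lock(node.cs_vRecvMsg);
    node.vRecvMsg.push_back(std::move(raw));
}

// Handler thread: pops from its own queue without ever taking cs_vRecvMsg,
// so reading from the socket and processing are no longer mutually exclusive.
bool ProcessOne(Node& node)
{
    std::string msg;
    {
        std::lock_guard<std::mutex> lock(node.cs_vProcessMsg);
        if (node.vProcessMsg.empty()) return false;
        msg = std::move(node.vProcessMsg.front());
        node.vProcessMsg.pop_front();
    }
    // ... handle msg outside of any lock ...
    return true;
}
```
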

Surprisingly this hasn't been causing me any issues while testing, probably
because it requires lots of large blocks to be flying around.

Send/Recv corks need tests!

This will be needed so that the message processor can cork incoming messages.

These conditions are problematic to check without locking, and we shouldn't be
relying on the refcount to disconnect.

When vRecvMsg becomes a private buffer, it won't make sense to allow other
threads to mess with it anymore.

This is left-over from before there was proper accounting. Hitting 2x the
sendbuffer size should not be possible.
…eserialize

We'll soon no longer have access to vRecvMsg, and this is more intuitive anyway.

This allows locking to be pushed down to only where it's needed.

Also reuse the current time rather than checking multiple times.

This may be used publicly in the future.

In order to sleep accurately, the message handler needs to know if _any_ node
has more processing that it should do before the entire thread sleeps.

Rather than returning a value that represents whether ProcessMessages
encountered a message that should trigger a disconnect, interpret the return
value as whether or not that node has more work to do.

Also, use a global fProcessWake value that can be set by other threads,
which takes precedence (for one cycle) over the messagehandler's decision.

Note that the previous behavior was to only process one message per loop
(except in the case of a bad checksum or invalid header). That was changed in
PR dashpay#3180.

The only change here in that regard is that the current node now falls to the
back of the processing queue for the bad checksum/invalid header cases.
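
A rough sketch of the sleep/wake behaviour described above. Names follow the commit message where possible (fProcessWake, ProcessMessages, SendMessages); everything else is simplified and hypothetical, not the actual patch.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <vector>

struct Node { /* ... */ };

std::atomic<bool> fProcessWake{false};   // set by other threads to force one more pass
std::condition_variable condMsgProc;     // the handler sleeps on this when idle
std::mutex mutexMsgProc;

// Stubs: the return value now means "this node has more queued work",
// not "this node should be disconnected".
bool ProcessMessages(Node&) { return false; }
bool SendMessages(Node&) { return true; }

void ThreadMessageHandler(std::vector<Node>& nodes, std::atomic<bool>& interrupt)
{
    while (!interrupt) {
        bool fMoreWork = false;
        for (Node& node : nodes) {
            fMoreWork |= ProcessMessages(node);  // stay awake if any node has more work
            SendMessages(node);
        }
        std::unique_lock<std::mutex> lock(mutexMsgProc);
        if (!fMoreWork && !fProcessWake) {
            // Sleep until another thread wakes us (or a timeout elapses).
            condMsgProc.wait_for(lock, std::chrono::milliseconds(100),
                                 [] { return fProcessWake.load(); });
        }
        fProcessWake = false;  // the wake flag takes precedence for one cycle only
    }
}
```
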
This separates the storage of messages from the net and queued messages for
processing, allowing the locks to be split.

Messages are dumped very quickly from the socket handler to the processor, so
it's the depth of the processing queue that's interesting.

The socket handler checks the process queue's size during the brief message
hand-off and pauses if necessary, and the processor possibly unpauses each time
a message is popped off of its queue.
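
The hand-off and back-pressure could look roughly like this; the names and the size limit are illustrative, not the actual constants from the patch.

```cpp
#include <atomic>
#include <deque>
#include <iterator>
#include <mutex>
#include <string>

static const size_t RECV_FLOOD_LIMIT = 5 * 1000 * 1000;  // illustrative limit

struct Node {
    std::mutex cs_vProcessMsg;
    std::deque<std::string> vProcessMsg;   // messages queued for the processor
    size_t nProcessQueueSize = 0;          // bytes currently queued
    std::atomic<bool> fPauseRecv{false};   // tells the socket handler to stop reading this peer
};

// Socket handler: brief hand-off of newly received messages, checking queue depth.
void HandOffToProcessor(Node& node, std::deque<std::string>& incoming, size_t nBytes)
{
    std::lock_guard<std::mutex> lock(node.cs_vProcessMsg);
    node.vProcessMsg.insert(node.vProcessMsg.end(),
                            std::make_move_iterator(incoming.begin()),
                            std::make_move_iterator(incoming.end()));
    incoming.clear();
    node.nProcessQueueSize += nBytes;
    node.fPauseRecv = node.nProcessQueueSize > RECV_FLOOD_LIMIT;
}

// Message processor: pop one message; shrinking the queue can unpause the socket side.
bool PopFromProcessQueue(Node& node, std::string& msg)
{
    std::lock_guard<std::mutex> lock(node.cs_vProcessMsg);
    if (node.vProcessMsg.empty()) return false;
    msg = std::move(node.vProcessMsg.front());
    node.vProcessMsg.pop_front();
    node.nProcessQueueSize -= msg.size();
    node.fPauseRecv = node.nProcessQueueSize > RECV_FLOOD_LIMIT;
    return true;
}
```
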
Similar to the recv flag, but this one indicates whether or not the net's send
buffer is full.

The socket handler checks the send queue when a new message is added and pauses
if necessary, and possibly unpauses after each message is drained from its buffer.
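
And a matching sketch for the send side (again with illustrative names and limits, not the actual patch):

```cpp
#include <atomic>
#include <deque>
#include <mutex>
#include <vector>

static const size_t SEND_BUFFER_LIMIT = 1 * 1000 * 1000;  // illustrative limit

struct Node {
    std::mutex cs_vSend;
    std::deque<std::vector<unsigned char>> vSendMsg;  // serialized messages awaiting send
    size_t nSendSize = 0;                             // total bytes queued for sending
    std::atomic<bool> fPauseSend{false};              // signals "send buffer is full"
};

// Called when a new message is queued for this peer; pause if the buffer fills up.
void QueueMessage(Node& node, std::vector<unsigned char> msg)
{
    std::lock_guard<std::mutex> lock(node.cs_vSend);
    node.nSendSize += msg.size();
    if (node.nSendSize > SEND_BUFFER_LIMIT)
        node.fPauseSend = true;
    node.vSendMsg.push_back(std::move(msg));
}

// Called by the socket handler as bytes are written to the wire; possibly unpause.
void OnBytesSent(Node& node, size_t nBytesDrained)
{
    std::lock_guard<std::mutex> lock(node.cs_vSend);
    node.nSendSize -= nBytesDrained;
    if (node.nSendSize <= SEND_BUFFER_LIMIT)
        node.fPauseSend = false;
}
```
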
vRecvMsg is now only touched by the socket handler thread.

The accounting vars (nRecvBytes/nLastRecv/mapRecvBytesPerMsgCmd) are also
only used by the socket handler thread, with the exception of queries from
rpc/gui. These accesses are not threadsafe, but they never were. This needs to
be addressed separately.

Also, update comment describing data flow
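
Putting the pieces together, the resulting per-node data flow is roughly the following (a simplified illustration, not the verbatim comment added by the commit):

```cpp
// Simplified per-node data flow after this series (illustration only):
//
//   Socket handler thread                    Message handler thread
//   ---------------------                    ----------------------
//   recv() -> CNode::vRecvMsg
//               |  brief locked hand-off
//               v
//           CNode::vProcessMsg  ---------->  ProcessMessages()
//                                            SendMessages() -> CNode::vSendMsg
//   send() <----------------------------------------------------'
//
// vRecvMsg and the recv accounting fields (nRecvBytes, nLastRecv,
// mapRecvBytesPerMsgCmd) are touched only by the socket handler thread;
// reads from RPC/GUI remain unsynchronized, as noted above.
```
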
@UdjinM6 added this to the 12.2 milestone on Aug 22, 2017
@UdjinM6 left a comment

utACK

EDIT: slightly tested, seems to be working ok
