-
Notifications
You must be signed in to change notification settings - Fork 38.3k
net: Waste less time in socket handling #34025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We run InactivityChecks() for each node everytime poll()/select() every 50ms or so. Rather than calculating the current time once for each node, just calculate it once and reuse it.
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34025. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible places where named args for integral literals may be used (e.g.
2025-12-09 |
|
Not sure if the flame graph is usable, but: GetBoolArg takes up 0.31% of total time, as part of PushMessage that takes up 1.75% of total time, in b-msghand. GetTime takes up 0.82% of total time, as part of InactivityCheck that takes up 1.78% of total time, in b-net. Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
|
|
Concept ACK |
| { | ||
| AssertLockNotHeld(m_total_bytes_sent_mutex); | ||
|
|
||
| auto now = GetTime<std::chrono::microseconds>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetTime() is deprecated.
| auto now = GetTime<std::chrono::microseconds>(); | |
| const auto now = NodeClock::now(); |
(this is moving the deprecated call from elsewhere, but now is a good time to change it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
I did try that, it requires a lot of changes to all the things we compare now against, so m_connected, m_last_send, m_last_recv. m_connected in particular is a big hit compared to the rest of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works:
@@ -2125 +2125 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
- auto now = GetTime<std::chrono::microseconds>();
+ auto now = NodeClock::now();
@@ -2218 +2218 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
- if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true;
+ if (InactivityCheck(*pnode, now.time_since_epoch())) pnode->fDisconnect = true;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question beyond this PR: if GetTime() is still useful because a lot of surrounding code uses e.g. std::chrono::seconds which we need to compare against, then should GetTime() be un-deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
Happy to take a look as well in a fresh commit, either here, or in a follow-up.
if
GetTime()is still useful because a lot of surrounding code uses e.g.std::chrono::secondswhich we need to compare against, then shouldGetTime()be un-deprecated?
GetTime returning a time duration is wrong, because the current time point (now) is not a duration, but a time point. A duration arises as the difference of two points in time. This duration can then be compared with any other duration (e.g. peer timeout). I don't think it makes sense un-deprecate something just because it is used in the current code. If this was a valid reason, nothing could ever be marked deprecated as long as it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maflcko, yes, I agree. But existent code uses "duration":
CNode::m_last_send
CNode::m_last_recv
CNode::m_last_tx_time
CNode::m_last_block_time
CNode::m_connected
so, if one if writing new code that needs to compare "now" to those, which one is preferred:
- use the deprecated
GetTime(); or - use
NodeClock::now()and convert usingtime_since_epoch()in order to compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to take a look as well in a fresh commit, either here, or in a follow-up.
Happy to review in a followup.
When looking at this previously, it seemed like having an AtomicTimePoint<Clock> template would be helpful (for things like m_last_send which is currently atomic<seconds>), because time_point doesn't support being an atomic. Here's a branch where I last looked at this topic; the "Add AtomicTimePoint" commit might be worth cherry-picking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because time_point doesn't support being an atomic.
Duration does not either. I think it is fine to type .load() or .store(), where needed.
I guess there is no rush here, and this can be done, when all other places (non-atomic) are switched.
| } | ||
|
|
||
| if (InactivityCheck(*pnode)) pnode->fDisconnect = true; | ||
| if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, feel free to ignore: given that we allow some nontrivial amount of time to pass between the variable initialization and usage, maybe now is not the best name for it. What about time_at_start_of_loop or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'd say "we allow some nontrivial amount of time to pass" -- it's probably a bug if that were to actually happen?
| AssertLockNotHeld(m_total_bytes_sent_mutex); | ||
|
|
||
| auto now = GetTime<std::chrono::microseconds>(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reasoning of the first commit cea443e net: Pass time to InactivityChecks fuctions is well grounded. No need to retrieve the current time for every node, given that we only care about seconds-precision here. I measured on my slow node with a few tens of connections: all nodes are processed in a few milliseconds or less. So, at worst this is outdated that much which is fine IMO.
b4d1007 to
5373898
Compare
|
Re: #34025 (comment) Just to confirm, this flamegraph is from a node that has finished syncing to the tip? i.e. IBD is not included in this graph, right? |
Interesting. I was wondering why getting the time eats so much CPU. Though, in the happy path, concept ack, Seems fine to still make the changes here. |
Yes; it's taken over a longish period though, iirc, I think
Yeah, presuming |
5373898 to
5f5c1ea
Compare
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 5f5c1ea
Would be happy to see the call to the deprecated GetTime() removed: #34025 (comment).
I do not see it as a blocker because this PR is actually moving the call around, not planting a new one.
|
review ACK 5f5c1ea 🏣 Show signatureSignature: |
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 5f5c1ea
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 5f5c1ea
Looks like an improvement, but I wonder if instead of doing micro-optimizations like the ones here, it wouldn't be better to have the entire InactivityCheck() procedure run somewhere else than in the SocketHandlerConnected, or at least less often - it seems a bit absurd to check every 50ms for a timeout of 20 minutes. Even the initial waiting time m_peer_connect_timeout from ShouldRunInactivityChecks() for a new peer doesn't really be checked that frequently.
Cuts out some wasted time in net socket handling. First, only calculates the current time once every 50ms, rather than once for each peer, which given we only care about second-level precision seems more than adequate. Second, caches the value of the
-capturemessagessetting inCConnmanrather than re-evaluating it every time we invokePushMessaage.