
Conversation

Contributor

@ajtowns ajtowns commented Dec 6, 2025

Cuts out some wasted time in net socket handling. First, only calculates the current time once every 50ms rather than once for each peer, which, given we only care about second-level precision, seems more than adequate. Second, caches the value of the -capturemessages setting in CConnman rather than re-evaluating it every time we invoke PushMessage.

We run InactivityChecks() for each node every time poll()/select()
returns, every 50ms or so. Rather than calculating the current time once
for each node, just calculate it once and reuse it.
@DrahtBot DrahtBot added the P2P label Dec 6, 2025
Contributor

DrahtBot commented Dec 6, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34025.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, maflcko, sedited, mzumsande

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30951 (net: option to disallow v1 connection on ipv4 and ipv6 peers by stratospher)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

  • CaptureMessage(pnode->addr, msg.m_type, msg.data, /*is_incoming=*/false) in src/net.cpp


Contributor Author

ajtowns commented Dec 6, 2025

Not sure if the flame graph is usable, but:

[flamegraph image: perf]

GetBoolArg takes up 0.31% of total time, as part of PushMessage that takes up 1.75% of total time, in b-msghand.

GetTime takes up 0.82% of total time, as part of InactivityCheck that takes up 1.78% of total time, in b-net.

Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).

Note that CConnman was a friend of CNode until #27257 (no longer relevant)

Contributor

sedited commented Dec 6, 2025

Concept ACK

Member

fanquake commented Dec 8, 2025

cc @theuni @vasild

{
AssertLockNotHeld(m_total_bytes_sent_mutex);

auto now = GetTime<std::chrono::microseconds>();
Contributor

GetTime() is deprecated.

Suggested change
auto now = GetTime<std::chrono::microseconds>();
const auto now = NodeClock::now();

(this is moving the deprecated call from elsewhere, but now is a good time to change it)

Contributor Author

Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).

I did try that; it requires a lot of changes to all the things we compare now against, such as m_connected, m_last_send, and m_last_recv. m_connected in particular is a big hit compared to the rest of this PR.

Contributor

This works:

@@ -2125 +2125 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
-    auto now = GetTime<std::chrono::microseconds>();
+    auto now = NodeClock::now();
@@ -2218 +2218 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
-        if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true;
+        if (InactivityCheck(*pnode, now.time_since_epoch())) pnode->fDisconnect = true;

Contributor

A question beyond this PR: if GetTime() is still useful because a lot of surrounding code uses e.g. std::chrono::seconds which we need to compare against, then should GetTime() be un-deprecated?

Member

Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).

Happy to take a look as well in a fresh commit, either here, or in a follow-up.

if GetTime() is still useful because a lot of surrounding code uses e.g. std::chrono::seconds which we need to compare against, then should GetTime() be un-deprecated?

GetTime returning a time duration is wrong, because the current time point (now) is not a duration, but a time point. A duration arises as the difference of two points in time. This duration can then be compared with any other duration (e.g. a peer timeout). I don't think it makes sense to un-deprecate something just because it is used in the current code. If that were a valid reason, nothing could ever be marked deprecated as long as it is used.

Contributor

@maflcko, yes, I agree. But existing code uses "duration":

CNode::m_last_send
CNode::m_last_recv
CNode::m_last_tx_time
CNode::m_last_block_time
CNode::m_connected

so, if one is writing new code that needs to compare "now" to those, which one is preferred:

  1. use the deprecated GetTime(); or
  2. use NodeClock::now() and convert using time_since_epoch() in order to compare?

Contributor Author

Happy to take a look as well in a fresh commit, either here, or in a follow-up.

Happy to review in a followup.

When looking at this previously, it seemed like having an AtomicTimePoint<Clock> template would be helpful (for things like m_last_send which is currently atomic<seconds>), because time_point doesn't support being an atomic. Here's a branch where I last looked at this topic; the "Add AtomicTimePoint" commit might be worth cherry-picking.

Member

because time_point doesn't support being an atomic.

Duration does not either. I think it is fine to type .load() or .store(), where needed.

I guess there is no rush here, and this can be done, when all other places (non-atomic) are switched.

}

-        if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
+        if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true;
Contributor

nit, feel free to ignore: given that we allow some nontrivial amount of time to pass between the variable initialization and usage, maybe now is not the best name for it. What about time_at_start_of_loop or something like that?

Contributor Author

I'm not sure I'd say "we allow some nontrivial amount of time to pass" -- it's probably a bug if that were to actually happen?

AssertLockNotHeld(m_total_bytes_sent_mutex);

auto now = GetTime<std::chrono::microseconds>();

Contributor

I think the reasoning of the first commit cea443e "net: Pass time to InactivityChecks functions" is well grounded. No need to retrieve the current time for every node, given that we only care about seconds-precision here. I measured on my slow node with a few tens of connections: all nodes are processed in a few milliseconds or less. So at worst the time is outdated by that much, which is fine IMO.

@ajtowns ajtowns force-pushed the 202512-netsplit-opt branch from b4d1007 to 5373898 on December 9, 2025 16:57
@dergoegge
Member

Re: #34025 (comment)

Just to confirm, this flamegraph is from a node that has finished syncing to the tip? i.e. IBD is not included in this graph, right?

Member

maflcko commented Dec 9, 2025

GetTime takes up 0.82% of total time, as part of InactivityCheck that takes up 1.78% of total time, in b-net.

Interesting. I was wondering why getting the time eats so much CPU. Though, in the happy path, InactivityCheck is just loading a few atomics, which means getting the time costs as much as loading a few atomics. Also, the flame graph probably shows the CPU time, and not the wall clock time. So the patch here likely won't cut the wall clock time between two calls of SocketHandlerConnected, but only the CPU time inside a single SocketHandlerConnected call?

Concept ACK. Seems fine to still make the changes here.

Contributor Author

ajtowns commented Dec 9, 2025

Just to confirm, this flamegraph is from a node that has finished syncing to the tip? i.e. IBD is not included in this graph, right?

Yes; it's taken over a longish period though, iirc, I think either 30m or 2h. You can see ProcessNewBlock at 0.20% just before ProcessTransaction at 4.56% fwiw.

Also, the flame graph probably shows the CPU time, and not the wall clock time. So the patch here likely won't cut the wall clock time between two calls of SocketHandlerConnected, but only the CPU time inside a single SocketHandlerConnected call?

Yeah, presuming SocketHandlerConnected isn't using 100% of a core, it'll be spending its time waiting for the SELECT_TIMEOUT_MILLISECONDS timeout to hit, which is 50ms.

@ajtowns ajtowns force-pushed the 202512-netsplit-opt branch from 5373898 to 5f5c1ea on December 9, 2025 20:52
Contributor

@vasild vasild left a comment


ACK 5f5c1ea

Would be happy to see the call to the deprecated GetTime() removed: #34025 (comment).
I do not see it as a blocker because this PR is actually moving the call around, not planting a new one.

@DrahtBot DrahtBot requested a review from sedited December 10, 2025 05:37
Member

maflcko commented Dec 10, 2025

review ACK 5f5c1ea 🏣


Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: review ACK 5f5c1ea01955d277581f6c2acbeb982949088960 🏣
DIf+EMRUrE9g+3ldiGUW0pHeRyThkZk3bbOL6sas10+I4f60l14Vo/S1nCwtrHEqiue3z1X4uTE3S7uN3Tg3DQ==

Contributor

@sedited sedited left a comment


ACK 5f5c1ea

@glozow glozow requested a review from theuni December 11, 2025 16:39
Contributor

@mzumsande mzumsande left a comment


ACK 5f5c1ea

Looks like an improvement, but I wonder if, instead of doing micro-optimizations like the ones here, it wouldn't be better to have the entire InactivityCheck() procedure run somewhere other than SocketHandlerConnected, or at least less often - it seems a bit absurd to check every 50ms for a timeout of 20 minutes. Even the initial waiting time m_peer_connect_timeout from ShouldRunInactivityChecks() for a new peer doesn't really need to be checked that frequently.

@fanquake fanquake merged commit 597b8be into bitcoin:master Dec 12, 2025
25 checks passed