Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: skip getpeerinfo for a peer without CNodeStateStats #26515

Closed

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Nov 16, 2022

The objects CNode, CNodeState and Peer store different info about a peer - InitializeNode() and FinalizeNode() make sure that for the duration of a connection, we should always have one of each for a peer.

Therefore, there is no situation in which, as part of getpeerinfo RPC, GetNodeStateStats() (which requires a CNodeState and a Peer entry for a NodeId to succeed) could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the getpeerinfo processing (see also a more detailed description of this in #26457 (review)).

But in this case I think it's better to just not include the newly disconnected peer in the response instead of returning just parts of its data.

An earlier version of this PR also made the affected CNodeStateStats fields non-optional (see mzumsande@5f900e2). Since this conflicts with #25923 and should be a separate discussion, I removed that commit from this PR.

@mzumsande mzumsande marked this pull request as draft November 16, 2022 22:14
// exist for the duration of the connection - except if there is a race where the
// peer got disconnected in between the GetNodeStats() and the GetNodeStateStats()
// calls. In this case, the peer doesn't need to be reported here.
if (!fStateStats) {
Copy link
Member

@jonatack jonatack Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested yet if this works, but checking for fStateStats seems a bit of a smell in general and I wanted to remove it entirely, e.g. #25923, though the way you use it here looks like a real improvement that could be orthogonal to much of that pull.

Note that you haven't updated the code for the GUI peers details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into #25923 right now - it seems that it does some what Marco suggested below, making some fields, like txrelay, optional for an intrinsic reason instead of using defaults like -1 or 0 when the info is not known.

As for the GUI - I didn't change it on purpose, because this comment scared me off. I think that #25923 is affected by that too, I'll comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TRY_LOCK for accessing CNodeStats has been in the GUI since peer information was added in #4225 (see #4225 (comment)). I'm not sure why it was decided that the GUI could block on taking cs_vNodes, but can't block on taking cs_main.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, but one reason might be because it slows down IBD a little if the GUI constantly competes with validation for cs_main just to update the peers window. Another reason might be that cs_main is held more often and longer by the node compared to cs_vNodes (current name m_nodes_mutex), so it might become noticeable that the GUI freezes. In both of these cases, skipping some of the requests from the GUI could make sense.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 17, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK MarcoFalke
Concept ACK jonatack
Approach ACK dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26621 (refactor: Continue moving application data from CNode to Peer by dergoegge)
  • #25923 (p2p: always provide CNodeStateStats and getpeerinfo/netinfo/gui updates by jonatack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member

Concept ACK, will look further at this change.

@maflcko
Copy link
Member

maflcko commented Nov 17, 2022

Do we really want this, given that the fields have been optional forever(?)? I think we should first think if there are fields that should be optional. If the answer to that is no, then this seems fine to do, but without an answer, it seems too early to change.

@mzumsande
Copy link
Contributor Author

Do we really want this, given that the fields have been optional forever(?)? I think we should first think if there are fields that should be optional. If the answer to that is no, then this seems fine to do, but without an answer, it seems too early to change.

I think that if we make any fields optional, we should have a reason for this, and communicate to the users within the RPC help in which situations these fields won't be included.
Unless I'm missing something, the honest explanation as to why these fields are optional right now would be:
"We can't keep our internal objects in sync as well as we'd like to, so in case of a race condition some fields may be left out from the response because they are part of Internal Object 1, and others are always present because they belong to Internal Object 2" - this seems really unsatisfactory to me.

The reason should be something that has to do with the actual meaning of the field, e.g. if there is a good reason why it's not available under certain conditions (like it's currently being done for fields like mapped_as or minping).

@maflcko
Copy link
Member

maflcko commented Nov 17, 2022

The reason should be something that has to do with the actual meaning of the field

yes, agree. This is what I was trying to say we should answer first before changing the code.

@mzumsande
Copy link
Contributor Author

yes, agree. This is what I was trying to say we should answer first before changing the code.

Ok, I'll look into this field by field before taking this out of draft. However, note that if we decide to keep certain field optional for an intrinsic reason, it would likely be a behavior change, because we currently include all CNodeStateStats fields as long as we have a Peer and CNodeState object for the peer.

@maflcko
Copy link
Member

maflcko commented Nov 17, 2022

if we decide to keep certain field optional for an intrinsic reason, it would likely be a behavior change

Heh, this might actually be a convincing argument to go ahead with the changes here as-is. (And then add a new field name, if needed, for "truly" optional fields instead of breaking the behaviour?)

@maflcko
Copy link
Member

maflcko commented Nov 18, 2022

Though,

If this gets approval, bitcoin-cli -netinfo could probably be simplified as a follow-up because it wouldn't need to deal with missing entries for some of these fields anymore.

I don't think this part is true. bitcoin-cli would still need to deduce if a non-optional field is "true" optional with some heuristics like bitcoin-core/gui#677. Though, this is already true on master and seems unrelated to this pull.

There is no situation in which CNodeStateStats could be
missing for a legitimate reason - this can only happen if
there is a race condition between peer disconnection and
the getpeerinfo call, in which case the disconnected peer
doesn't need to be included in the response.
@mzumsande mzumsande force-pushed the 202211_getpeerinfo_allornothing branch from ec839fe to 6fefd49 Compare November 28, 2022 19:00
@mzumsande mzumsande changed the title rpc: skip getpeerinfo for a peer without CNodeStateStats, make its fields non-optional rpc: skip getpeerinfo for a peer without CNodeStateStats Nov 28, 2022
@mzumsande
Copy link
Contributor Author

I removed the part that makes the CNodeStateStats fields non-optional. Since that part conflicts with #25293 and can be a separate discussion (I commented there), this PR now only addresses the situation where we have a CNode but no CNodeStateStats due to a race during shutdown.

I don't think this part is true. (...)

Removed from OP

@mzumsande mzumsande marked this pull request as ready for review November 28, 2022 19:12
@dergoegge
Copy link
Member

Approach ACK 6fefd49

Unless I'm missing something, the honest explanation as to why these fields are optional right now would be:
"We can't keep our internal objects in sync as well as we'd like to, so in case of a race condition some fields may be left out from the response because they are part of Internal Object 1, and others are always present because they belong to Internal Object 2" - this seems really unsatisfactory to me.

I don't think this is entirely accurate and I summarized my thinking about it below.

It seems to me that this is all the result of us re-architecting our code base, specifically splitting lower level network and application layer per-peer state into separate data structures managed by different components (CConnman, PeerManager). Obviously an overall good development which we don't need to rehash the merits of here.

Ideally these components are loosely coupled (they aren't really right now but I think we will/should get there eventually), such that the internal state of one is not dependent on the internal state of the other. Even though they are still tightly coupled at the moment, we should not rely on that to be the case going forward. So in my opinion, any user of the CConnman & PeerManager API (in this case the RPC handler querying statistics) should not expect there to be consistency between the internal state of these two modules and instead handle the edge cases (e.g. only one component returns stats) on its own end.

I like this PR because it is essentially doing what I describe above, that is handling the edge case in which only the CNode stats are available while the application layer stats are not (due to the connection being dropped in between the queries).

@maflcko
Copy link
Member

maflcko commented Dec 19, 2022

I wonder if this will introduce races in code (e.g. our tests) that assume a missing entry in the dict implies a fully disconnected node.

Edit: If there was a race, it would already exist regardless of this pull. m_nodes will have the node always erased before a call to DeleteNode. (This pull isn't changing that). And a node erased from m_nodes is currently considered to be "fully disconnected", even if DeleteNode has not yet been called.

@maflcko
Copy link
Member

maflcko commented Dec 19, 2022

review ACK 6fefd49 👒

Show signature

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK 6fefd49527fa0ed9535e54f2a3e76fe2599b2350 👒
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUiLywwAiJVX2n3iULxIoVvCpGT5ZbVbdyz3if/aL9PQCFzU0EA+j6IWYnecZkD/
L7ofFFXe89RGlHa1pvMKL2lQ9BYalf4uaL6A/SEN9HdSN/KzIMtxBOFc5yz79xLo
hKM8ZPK4STu0QFVIIfy71S3ozH46QGG1lLzHz0fj7Vi1pIWKU3S2c25+82DpcYhv
vBqjSZZ1jbEoiKGalQNQ35srRMDaiFKXkkuLa7qpvHmrgoWXV7XuGF0ivnhj7Nf+
510gsDJnetWtRgbRhFUa7MfCp9vnYyuGHVur2Vr/WcIbBa0ar/tEOcbiJ6jdl9Sb
brhZ/efMHJNNWFCROpH2M6xkNDrKObuTYiIRkoCExjrd8SB6xiFoCLNAQsK5n/wc
bmptAUkLhgjpFwka4r6NThU5hKKlmLBpjKXkRh1/qZlmYeUXIfNTk5gr9VGJWEHa
UWq6eFZrWThxsNJSI58rU1GscXs07L5QWDaHbhMr/5NOiLUc0GbszjTt9bXfCqbD
Vrfs9vMG
=TjN1
-----END PGP SIGNATURE-----

maflcko pushed a commit that referenced this pull request Dec 19, 2022
6fefd49 rpc: Require NodeStateStats object in getpeerinfo (Martin Zumsande)

Pull request description:

  The objects `CNode`, `CNodeState` and `Peer` store different info about a peer - `InitializeNode()` and `FinalizeNode()` make sure that for the duration of a connection, we should always have one of each for a peer.

  Therefore, there is no situation in which, as part of getpeerinfo RPC,  `GetNodeStateStats()` (which requires a `CNodeState` and a `Peer` entry for a `NodeId` to succeed)  could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the `getpeerinfo` processing (see also a more detailed description of this in #26457 (review)).

  But in this case I think it's better to just not include the newly disconnected peer in the response instead of returning just parts of its data.

  An earlier version of this PR also made the affected `CNodeStateStats` fields non-optional (see mzumsande@5f900e2). Since this conflicts with #25923 and should be a separate discussion, I removed that commit from this PR.

ACKs for top commit:
  dergoegge:
    Approach ACK 6fefd49
  MarcoFalke:
    review ACK 6fefd49 👒

Tree-SHA512: 89c8f7318df4634c1630415de9c8350e6dc2d14d9d07e039e5b180c51bfd3ee2ce99eeac4f9f858af7de846f7a6b48fcae96ebac08495b30e431a5d2d4660532
@fanquake
Copy link
Member

This has been merged.

@fanquake fanquake closed this Dec 19, 2022
fanquake added a commit to fanquake/bitcoin that referenced this pull request Dec 19, 2022
These are no-longer optional after bitcoin#26515, so remove the documentation,
and no-op fStateStats checks.
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 19, 2022
…ateStats

6fefd49 rpc: Require NodeStateStats object in getpeerinfo (Martin Zumsande)

Pull request description:

  The objects `CNode`, `CNodeState` and `Peer` store different info about a peer - `InitializeNode()` and `FinalizeNode()` make sure that for the duration of a connection, we should always have one of each for a peer.

  Therefore, there is no situation in which, as part of getpeerinfo RPC,  `GetNodeStateStats()` (which requires a `CNodeState` and a `Peer` entry for a `NodeId` to succeed)  could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the `getpeerinfo` processing (see also a more detailed description of this in bitcoin#26457 (review)).

  But in this case I think it's better to just not include the newly disconnected peer in the response instead of returning just parts of its data.

  An earlier version of this PR also made the affected `CNodeStateStats` fields non-optional (see mzumsande@5f900e2). Since this conflicts with bitcoin#25923 and should be a separate discussion, I removed that commit from this PR.

ACKs for top commit:
  dergoegge:
    Approach ACK 6fefd49
  MarcoFalke:
    review ACK 6fefd49 👒

Tree-SHA512: 89c8f7318df4634c1630415de9c8350e6dc2d14d9d07e039e5b180c51bfd3ee2ce99eeac4f9f858af7de846f7a6b48fcae96ebac08495b30e431a5d2d4660532
jonatack pushed a commit to jonatack/bitcoin that referenced this pull request Jan 11, 2023
There is no situation in which CNodeStateStats could be
missing for a legitimate reason - this can only happen if
there is a race condition between peer disconnection and
the getpeerinfo call, in which case the disconnected peer
doesn't need to be included in the response.

Github-Pull: bitcoin#26515
Rebased-From: 6fefd49
@mzumsande mzumsande deleted the 202211_getpeerinfo_allornothing branch January 16, 2023 18:49
maflcko pushed a commit that referenced this pull request Jan 18, 2023
1dc0e4b rpc: remove optional from fStateStats fields (fanquake)

Pull request description:

  These are no-longer optional after #26515, so remove the documentation, and no-op `fStateStats` checks.

ACKs for top commit:
  dergoegge:
    Code review ACK 1dc0e4b

Tree-SHA512: 06d4550e866341b379bfdbc72d67d71a3b7ceceec06ebd4c5e6f178b75fe40cbf4aff51adba1bc52590e69e818cbdecb0366bf1528c59c5c3dff5bbdba8eac68
fanquake added a commit that referenced this pull request Jan 20, 2023
…t CNodeStateStats

e72313e rpc: Require NodeStateStats object in getpeerinfo (Martin Zumsande)

Pull request description:

  Backports #26515.

ACKs for top commit:
  fanquake:
    ACK e72313e

Tree-SHA512: 28e885ea299fe8a3a7538628d413c434bc42c251a2c1ae238bca0652709f5bd781eb157675171ab538f6e2f6f720f1c184200ac3857f6c78f48858949eed49da
@fanquake
Copy link
Member

Was backported to 24.x in #26457.

@bitcoin bitcoin locked and limited conversation to collaborators Jan 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants