Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net, rpc: expose connection type in getpeerinfo #19883

Closed
wants to merge 1 commit into from

Conversation

jonatack
Copy link
Member

@jonatack jonatack commented Sep 5, 2020

Expose conn_type via a practical API for JSON-RPC consumers.

  • returns the conn_type as an integer id for API clients. It is a simple and small change to implement and maintain, and the API can remain stable even if the ConnectionType element naming or order changes.
  • adds a uint8_t type to the ConnectionType enum class; if preferred, this can be dropped
$ ./src/bitcoin-cli help getpeerinfo
...
    "inbound" : true|false,        (boolean) Inbound (true) or Outbound (false)
    "addnode" : true|false,        (boolean) Whether connection was due to addnode/-connect or if it was an automatic/inbound connection
    "conn_type" : n,               (numeric) Connection type between 0 and 5:
                                   0 - inbound (initiated by the peer)
                                   1 - outbound-full-relay (default automatic connections)
                                   2 - manual (added using the -addnode/-connect configuration options or the addnode RPC)
                                   3 - feeler (short-lived automatic connection to test addresses)
                                   4 - block-relay-only (does not relay transactions or addresses)
                                   5 - addr-fetch (short-lived automatic connection to request addresses)

src/rpc/net.cpp Outdated Show resolved Hide resolved
Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's good idea to return enum's int value. If the enum is changed/refactored clients get broken. IMO string is fine and enough.

@jonatack
Copy link
Member Author

jonatack commented Sep 6, 2020

I'm not sure if it's good idea to return enum's int value. If the enum is changed/refactored clients get broken. IMO string is fine and enough.

If a string was returned, changing the string name would be a breaking change, and string names are likely to be bike-shed or changed. OTOH there is no reason why the enum integer values would ever have to change. The enum class itself can separate the int values from the rest with no need for extra methods; it can remain stable even if the enum element naming or order is otherwise changed. And the integer is the easiest data for API clients to parse, bounds check and use as an index into a data structure.

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nobody cares about the int value, if that was the case we wouldn't use enum class. Sending int and have this documented is fine too (but the redundant string is then unnecessary) but the way it is implemented doesn't look safe, mainly because not all conn_type are tested.

This code would be much simpler, and IMO safer, if you just add std::map<ConnectionType, std::string> CONNECTION_TYPE_NAME, CONNECTION_TYPE_DESCRIPTION.

src/rpc/net.cpp Outdated Show resolved Hide resolved
src/rpc/net.cpp Outdated Show resolved Hide resolved
@jonatack
Copy link
Member Author

jonatack commented Sep 6, 2020

I think nobody cares about the int value

Exactly -- no one caring about the int value is an advantage. The enum class int values can remain stable for the API even if people want to change the enum element naming or order.

Sending int and have this documented is fine too (but the redundant string is then unnecessary)

Thanks for reviewing and making this better, @promag. Dropped the string field and put the doc directly in RPCHelpMan.

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this needs a test for all possible conn_type before merge. You can also add a release note of the new field.

@jnewbery
Copy link
Contributor

jnewbery commented Sep 7, 2020

I prefer #19725. We shouldn't leak our internal enum indexes out to a public API (since that locks us into a specific implementation)

@jonatack
Copy link
Member Author

jonatack commented Sep 7, 2020

since that locks us into a specific implementation

If the ConnectionType enum were to be abandoned for a hypothetical different implementation, it would just require adding a method to serialise the ids. For now that's not needed.

@promag
Copy link
Member

promag commented Sep 7, 2020

@jnewbery I've made that point too, but you can also assume that ATM the mapping function used is the identity function.

@amitiuttarwar
Copy link
Contributor

this PR isn't quite a replacement for #19725 because it doesn't include the deprecation of getpeerinfo.addnode, or the logging improvement in net_processing.

I think the proposal of adding conn_type as an integer id is a reasonable proposal, but wanted to clarify the differences for reviewers. I'm personally -0 because I think a string suffices.

@laanwj
Copy link
Member

laanwj commented Sep 8, 2020

I think the proposal of adding conn_type as an integer id is a reasonable proposal, but wanted to clarify the differences for reviewers. I'm personally -0 because I think a string suffices.

I agree here that a string suffices. I don't think it's wise to expose the enumeration IDs on the JSON-RPC interface, as they are an internal implementation detail, and for better or worse (no real enum type) the common way is to use strings as enumerators in JSON.

@jonatack
Copy link
Member Author

jonatack commented Sep 8, 2020

By any objective technical criteria, ersatz long-format string ids in the place of integer ones seem a substantially worse choice

  • code
  • complexity
  • robustness
  • API stability
  • API flexibility for clients
  • memory
  • speed
  • network bandwidth
  • maintainability

in every way maybe an order of magnitude worse.

An API client can bounds check an integer id, then call a vector element with it.

With a long format string, the client has to match against every possible expected value first in order to error check the value.

This is a lean, clean implementation that adds 2 lines to net.{h,cpp} versus 30.

Overall, I think it's objectively multiple times better for both Bitcoin Core and for software clients of the RPC API.

Also, this unbundles #19725 which adds extraneous logging refactoring and a controversial deprecation into the same PR. Just unbundling it adds value before considering the order of magnitude technical improvement.

@jonatack
Copy link
Member Author

jonatack commented Sep 8, 2020

The ids can be serialised via a separate method, but that doesn't seem needed here and would just be added complexity for no gain. The enum itself can separate the order and naming from the id values.

@jonatack
Copy link
Member Author

jonatack commented Sep 8, 2020

this PR isn't quite a replacement for #19725 because it doesn't include the deprecation of getpeerinfo.addnode, or the logging improvement in net_processing.

Yes, it's not intended to replace the logging refactoring or the deprecation. I'm -0.9 on both for the reasons I've stated in that PR.

@ajtowns
Copy link
Contributor

ajtowns commented Sep 19, 2020

If a string was returned, changing the string name would be a breaking change, and string names are likely to be bike-shed or changed. OTOH there is no reason why the enum integer values would ever have to change

The addition to getpeerinfo and the logging change is for helping humans understand what's going on, not for interoperability (there's no standard and different behaviours within the existing specs are perfectly reasonably), nor for command and control (there's no way for other programs to act on the connection type info), so no, changing the string names isn't a breaking change. There are obvious reasons why enum values change: if an entry is removed, or if the entries are rearranged. Yes, you can hardcode the values to prevent that, but there's no reason to do so: this isn't a standard, it's an aid for debugging problems with your node.

This is a small, focused, simple, performant alternative

getpeerinfo isn't a performance critical call, outputting strings via it isn't performance critical (both since all the numbers are encoded as strings anyway -- it's json; and since we're already decoding services to an array of strings via servicenames), and even if none of that were true, you should be providing benchmarks if claiming a performance improvement.

JSON-RPC expects integer ids.

I don't know where this is coming from, but it's not even literally true: "id - The request id. This can be of any type." and "id .. MUST contain a String, Number, or NULL value if included"

@sipa
Copy link
Member

sipa commented Sep 19, 2020

There is precedent for string values for enumerated types too already: the transaction types in listtransactions ("receive", "send", "generate", "immature", ...), and the branch types in getchaintips come to mind.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 19, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member Author

jonatack commented Sep 20, 2020

If a string was returned, changing the string name would be a breaking change, and string names are likely to be bike-shed or changed. OTOH there is no reason why the enum integer values would ever have to change

The addition to getpeerinfo and the logging change

Unless I'm mistaken, the logging change would have the same output.

is for helping humans understand what's going on

From what I've been able to understand, the CLI client-side options are the human-first ones and not constrained by API stability constraints, which is why I added features to -getinfo and also created -netinfo that has been described as getpeerinfo for humans. The RPC API, on the other hand, is more-or-less machine-first and constrained by stability and the desire to avoid causing suffering for software clients downstream, even if it does not for human users of the RPC like us. That is what this PR focuses on. An additional human-friendly convenience field could be an option if people feel the tradeoffs are worth it, but that's orthogonal to this proposal.

changing the string names isn't a breaking change

Renaming a connection type not only entails cascading codebase changes for long-format name ids, it is also a breaking API change if the client no longer recognizes the connection type. This can be avoided by using a standard integer id decoupled from the naming. I will provide a demonstration with code a bit later.

getpeerinfo isn't a performance critical call

This field is called in a loop. API clients may be interested in performance, either because they call this frequently or at high frequency (I do) or from clients/to servers that are constrained in CPU, memory, or internet bandwidth. If it's available with a couple of lines, why dunk on it?

all the numbers are encoded as strings anyway -- it's json

I could be wrong but don't think this is true. https://tools.ietf.org/html/rfc7159

and even if none of that were true, you should be providing benchmarks

Requesting benchmarks for the simplest, standard practice is a bit pedantic. No one asked for benchmarks, for instance, when in #19731 I proposed to send Unix epoch times instead of a human-friendly datetime format.

I don't know where this is coming from, but it's not even literally true: "id - The request id. This can be of any type." and "id .. MUST contain a String, Number, or NULL value if included"

Thanks for confirming that we can use numbers. Now...do you see any long-format, non-numerical string ids in https://www.jsonrpc.org/specification#examples and https://www.jsonrpc.org/specification_v1#a4.CommunicationExamples? Right, neither do I (maybe I need glasses; I have been putting it off for a long time). Sure, you could use ersatz long-format strings, but ideally not as your primary id. "I want to depend on long-format names that people are already proposing to change, please send me that" said no API client to me ever. However, the clients might be ok with it as an optional convenience field that they can request on a per-call basis.

At any rate, thank you for having a look. Mind giving a concept ACK?

@jonatack
Copy link
Member Author

There is precedent for string values for enumerated types too already: the transaction types in listtransactions ("receive", "send", "generate", "immature", ...), and the branch types in getchaintips come to mind.

Sure, but like the kebab-case and snake_case config args, precedents may have varying degrees of desirability and relevance.

I wonder if we shouldn't have more separation between the API for software and the CLI for humans (seems to be a trend already?)... e.g. the human-friendly CLI versions could have human-readable datetime formats instead of Unix epoch time, etc.

@michaelfolkson
Copy link
Contributor

Approach ACK

I can see why some people think this argument is a touch pedantic. But at the very least this isn't a worse approach than #19725 and it is definitely simpler. Plus there appears there could be benefits to this approach downstream.

(PR #19725 could still do the logging refactoring and deprecation. I'm not convinced these are controversial.)

@sipa
Copy link
Member

sipa commented Sep 26, 2020

Concept NACK.

I don't think we should be exposing internal enums, as its mapping between numbers and connection type semantics is arbitary. Exposing it via RPC is cementing it in stone, for no good reason. The set of available connection types will change over time, and that will very likely mean that some types that currently exist won't remain.

As I've pointed out #19725 (comment), if you want an actual stable mapping with the advantages of machine-readability of numbers over the strings assigned by #19725, I think you'd need to maintain a separate set of numbers for the RPC interface, in which old numbers may retire if connection types are removed/split or even just substantially change meaning. I don't think that's worth the effort.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@jonatack jonatack closed this Sep 28, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants