Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] - Separately negotiated version number for NTN and NTC block-specific payloads #4770

Closed
nfrisby opened this issue Jan 5, 2024 · 19 comments
Assignees
Labels
enhancement New feature or request

Comments

@nfrisby
Copy link
Contributor

nfrisby commented Jan 5, 2024

Since everyone seems to agree with the basic idea, this Issue now represents the task of moving this design description and motivation into a documentation file in the repo. It'll explain that "this is how the versioning should be done; we merely haven't yet had cause to actually introduce the additional parameterization". I suppose I should at that same time create another "tech debt/future work" task that represents actually implementing that design.


(Edit: Here's a much more direct argument---an executive summary. The mini protocol codecs are parameterized over the codecs for the block-specific payloads (inevitable, since the Diffusion Layer is parameterized over the block type). Thus the block-specific codecs may vary independently of the mini protocol's own (parameterized) codecs. For the most part, our versions are merely counting changes to the codecs; we strive to never change semantics---we only get new semantics by introducing new "syntax", aka summands in the codecs. Therefore we (eventually) need two separate versions: one for the mini protocols and one for the block-specific payloads.)

(Edit 2: although those two separate versions could be glommed together in a single one, as we do today. However, that mega version should instead be owned by the downstream (ie block-specific) code, as opposed to the upstream code (ie the Ouroboros repos).)

When fulfilling the role of a server, the node has two kinds of connections.

  • Node-to-Node (NTN): Another node (in the client role) connects to the node.
  • Node-to-Client (NTC): A trusted process (in the client role) connects to the node using privileged access (eg a wallet running on the same machine).

Both of these connections begin by negotiating a version (via the Handshake mini protocol). This negotiated version suffices to let an server/client that is running newer code than its client/server avoid any recently-added mini protocol messages that the peer wouldn't understand. (We have so far avoided modifying the semantics of any messages---we have only ever added entirely new messages.)

Today, the NodeToNodeVersion and NodeToClientVersion data types in the ouroboros-network-api package represent the negotiated version.

Consider the entirely plausible future where someone---call them Janeway---wants to depend on the ouroboros-network and ouroboros-consensus-diffusion libraries in order to define a node executable that uses a custom block type not coupled to the ByronBlock/ShelleyBlock/CardanoBlock types. By assumption, Janeway is not modifying the ouroboros-* libraries, so the NTN and NTC versions negotiated by the Diffusion Layer govern which mini protocol messages are sent, exactly as intended. However, in this scenario Janeway is still responsible for three key pieces of code:

  • defining the custom block type (including the ledger, etc)
  • defining the custom NTC client process (eg their wallet executable)
  • defining the custom NTC queries (eg how the wallet inspects the node's state)

When defining the custom block type, Janeway does not actually need to consider the negotiated NTN version. A coherent Ouroboros block type will have its own protocol version in addition to the HardForkCombinator envelope. Thus the block-specific payloads of the NTN mini protocol messages are effectively self-versioning (ie headers, blocks, and transactions). The negotiated NTN version necessarily determines which mini protocol messages can/will be sent, but it is the protocol version/HFC era---independent of the NTN version---that determines what might be the payload of those mini protocol messages (eg MsgRollForward). 1 (Edit: actually the protocol version is not necessarily sufficient, see 2.)

When defining the custom NTC client process (eg wallet), Janeway would need to understand how the negotiated NTC version determines which mini protocol messages the client process can safely send to and/or expect to receive from a well-behaved node. So---unlike the NodeToNodeVersion case---Janeway would have to understand the semantics of the NodeToClientVersion data type in the ouroboros-network-api package.

Lastly, that client process most likely uses the LocalStateQuery mini protocol. The negotiated NTC version necessarily determines which messages of the mini protocol can be sent (eg whether MsgAcquire ImmutableTip is supported). But it cannot govern the payload of those messages, ie which queries can be sent. Recall that that version data type is defined by us, the Cardano Core Tech team, and we don't know anything about Janeway's block type or its queries.

Suppose one day Janeway needs to implement a fresh custom query. Which value of NodeToClientVersion should it be gated behind? What if Janeway has already "allocated" every existing constructor of that data type to gate some of the custom block's queries? Should they open a PR against the ouroboros-network repository adding a new constructor, one which does not (yet) affect either the Diffusion Layer nor CardanoBlock queries? That seems inappropriate.

Therefore, I propose:

To decouple the Ouroboros libraries from Cardano would at least require the Diffusion Layer to also negotiate a second version number, independent of NodeToNodeVersion and NodeToClientVersion. It must not be restricted by the core Ouroboros code (eg just a natural number). Its semantics is determined solely by the block type.

Notes:

  • As far as we know, there is no such Janeway yet. So we might as well leave the code alone for now. But it'd be good to already agree on what the eventual design should be. (They only immediate benefit of doing this change now would be that the Networking and Consensus Teams would no longer risk introducing conflicts when they need to alter the NTN and NTC version data types.)
  • At the moment, I suspect this independent version is only necessary for the Node-to-Client connections, since we anticipate that any block type that's compatible with the NTN mini protocols would already be self-versioning (via the on-chain protocol version). (Edit: actually the protocol version is not necessarily sufficient, see 2.)
  • (If I understood correctly,) @coot proposed an alternative where the Handshake mini protocol negotiates a custom version that Janeway defines, and then Janeway is require to provide a total mapping from that negotiated version to NodeToVersion and NodeToClient when instantiating the Diffusion/Consensus layers. That seems fine to me, and matches the Diffusion Layer's long-standing design goal that "the version negotiated via Handshake should determine everything the node needs to know about the peer". (My proposal as written (with a second negotiation) instead slightly weakens that goal to "the version negotiated via Handshake should determine everything the node needs to know about the peer, except the block-specific parts".) I don't have a strong preference about this, seems nice for there to be exactly one negotiated number (even though it'd need to be projected down into the "separate" NodeToNodeVersion/NodeToClientVersion 🤷).
  • As far as I know, LocalStateQuery is currently the only mini protocol whose payloads are not self-versioning. So this second version could be confined to that mini protocol. However, it seems reasonable for future NTC mini protocols to also have payloads that are not self-versioning.

Footnotes

  1. In contrast to this statement, our code currently does relate NTN version to prefixes of the Cardano eras! I'm doubting whether that is useful. Suppose two nodes negotiate a version that doesn't allow for the latest Cardano era the server can handle. What happens when the server's chain is in that latest era? (Whether or not two specific nodes negotiate an NTN version that disallows a suffix of one of server's eras does not determine whether the best chain can be in that era!) It cannot stay connected with that peer, since it can no longer communicate with them about its chain. If the relationship between NTN versions and eras is known, then the server can be the one to disconnect, and log a message about why. That's the only benefit I can see to relating NTN versions to eras. The alternative in which the NTN version and eras are unrelated is merely that the node would simply send the next message, and then (presumably) the peer would disconnect, since by assumption it negotiated a version indicating it cannot handle that era---the server's log would show that the peer disconnected, but without the explanation of why. Is that so bad? Does it justify coupling two ideally independent things? Actually... wouldn't it be more important for the client to be the one that sees an explanation of why the disconnection happened? After all, they're the one that needs to update their code! (This analysis would be the same even for intra-era hardforks, since the protocol version is also bound.)

  2. Ah. We actually do have two separate ByronNodeToNodeVersions. This is because we extended the header codecs in a backwards-compatible way. This was before the HFC existed. Indeed, even blocks and transactions have a degree of freedom between the over-the-wire codec and the storage codec (although one of them has to be chosen as canonical for the hashing). 2

@nfrisby nfrisby added the enhancement New feature or request label Jan 5, 2024
@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 5, 2024

@amesgen @coot @njd42 @dcoutts @abailly @KtorZ What do you think? Am I overlooking some part of the handshake that already supports this (idealized) use case?

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 5, 2024

Perhaps this should happen as an initialization phase of LocalStateQuery itself?

These are the instances of Protocol, which I'm assuming is a sound way to ensure it's an exhaustive list of mini protocols (ie things communicated to/from our node). I've also grouped/labelled them based on my understanding. Looking at this set, it does seem like only LocalStateQuery needs its own version (unless my above assumption about the protocol version handling all block-specific aspects of NTN messages is wrong).

Handshake - fundamental
KeepAlive - fundamental and NTN (edit: updated after Marcin's confirmation)
PeerSharing - fundamental and NTN

BlockFetch - NTN
TxSubmission2 - NTN

ChainSync - NTN (headers) and NTC (blocks)

LocalStateQuery - NTC
LocalTxMonitor - NTC
LocalTxSubmission - NTC

@abailly-iohk
Copy link
Contributor

I am fully supportive of the use case you envision, eg. making alternative implementations of Cardano network participants first class citizen if I understood correctly. I am not sure why you would need a different number, though. What I would do here is to standardise the protocol separately from the implementation, use whatever versioning scheme makes sense and from that point on allow a separate evolution of the protocol specification and the implementations.
This could apply equally well to N2N and N2C protocols, I think, as some nodes could decide to support exchanging some data which other nodes do not support yet (eg. Mithril signatures, Peras votes) and the network could evolve in a more incremental way.
Perhaps this vision is very naive :)

@coot
Copy link
Contributor

coot commented Jan 8, 2024

I also agree that it's a really good idea to support alternative use cases.

If this is really an alternative blockchain they ought to use alternative NodeTo{Node,Client}Version versions and we could generalise our protocol to be parametrised over the version types. This would allow one to govern their case by themselves, but would require them to modify cardano-{node,cli}, which they'd need to do anyway.

KeepAlive is fundamental for at least the first of the two reasons:

  • keep the network running,
  • do measurements which are used by BlockFetch.

@nfrisby nfrisby self-assigned this Jan 9, 2024
@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 9, 2024

@abailly said

making alternative implementations of Cardano network participants first class citizen if I understood correctly

but that's actually not what I meant. I meant using the Ouroboros code to run a different non-Cardano blockchain.

I suspect that misunderstanding supersedes the rest of your comment, unfortunately?

@abailly-iohk
Copy link
Contributor

@abailly said
but that's actually not what I meant. I meant using the Ouroboros code to run a different non-Cardano blockchain.

😢

I suspect that misunderstanding supersedes the rest of your comment, unfortunately?

😂

@abailly-iohk
Copy link
Contributor

Please ignore my comment then @nfrisby

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 9, 2024

@coot said

they ought to use alternative NodeTo{Node,Client}Version versions and we could generalise our protocol to be parametrised over the version types

I'll elaborate what I think you're proposing. Please let me know if I'm wrong. (I'm not sure I agree with your proposal :), but I want to first confirm I get what you're saying.)

The hypothetical devs (using our Ouroboros code to run their own non-Cardano chain) would have to provide types MyVersionNT{N,C}, a function MyVersionNTN -> DiffusionLayer.NodeToNodeVersion, and a function MyVersionNTC -> DiffusionLayer.NodeToNodeClient when invoking the Diffusion/Consensus layers. And we would have adjusted the Diffusion/Consensus code to additionally pipe the MyVersionNTC version into the codecs and handlers of their custom block's queries. The Handshake mini protocol would negotiate MyVersionNTN and MyVersionNTC instead of negotiating NodeTo{Node,Client}Version.

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 10, 2024

I had a nice chat with @njd42 and @karknu. I think I can better present my proposal now. In particular, they had two pieces of clarifying feedback:

  • One handshake to rule them all. The Diffusion Layer has had the design goal that the version negotiated via Handshake should determine everything the node needs to know about the peer.

  • Keep it simple. We'd rather not introduce these abstractions until we have a customer/user/community who actually benefits from them.

My current proposal would only weaken that design goal to "everything the Diffusion Layer needs to know about the peer". And the scope of my proposal is: "this is how we should think about versioning NTC queries", but it's totally reasonable continue conflating their version with the Handshake version until we actually have a worthwhile reason to introduce the proposed degree of freedom.

In light of that very helpful feedback, I'll update my proposal's text. In particular, I think that design goal already has a bit of a misapplication on the Node-to-Node side---ie the interpretation of my NTC proposal along the NTN-NTC symmetry currently suggests a small change to the negotiated NTN version.

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 10, 2024

I'm now rewriting the Issue description, so I'm archiving the original one (to which all the above replies were written) in this comment.


Imagine some innovative small company wants to use only the abstract parts of ouroboros-network, ouroboros-consensus, and cardano-ledger. They want to write their own blocks, with their own (Ouroboros-compatible) ledger rules and queries, entirely separate from Byron, Shelley, Allegra, etc.

Moreover, suppose they're not going to maintain forks of our code. They'll use whatever the latest version of Ouroboros is via our packages, and build everything else around those dependencies.

Today's NodeToNodeVersion and NodeToClientVersion data types in ouroboros-network prevent this use case from being sustainable, because these hypothetical devs (who by assumption are using the Diffusion Layer) would be unable to introduce new negotiable handshake versions. (They'd have to wait until Cardano does in order to ever release a feature that requires a bump to the negotiated version🤦.)

Proposal:

We need a second, separately negotiated version number that is not at all restricted by the core Ouroboros code (ie just a natural number). Its semantics is determined solely by the block type.

  • The two existing data types are irreplaceable for versioning the block-agnostic core Ouroboros behaviors: the mini protocols themselves, eg their codecs, eg block diffusion pipelining (in ChainSync, NTN), eg whether LocalTxMonitor is enabled (NTC), etc.
  • But they should not affect any block-specific behaviors.
  • This new version number would govern at least the evolution of blocks' Node-to-Client queries.
  • I currently suspect the new version number would not be useful for any block-specific aspects of Node-to-Node communication, since I anticipate those are all fully handled by the on-chain data, eg the chain's protocol version.

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 10, 2024

I have now updated the Issue description with a much more fleshed out (and longer :/) motivation/proposal. cc @coot @njd42 @karknu

@nfrisby nfrisby changed the title [FEAT] - Separately negotiated version number for NTC queries [FEAT] - Separately negotiated version number for NTC block-specific payloads Jan 10, 2024
@nfrisby nfrisby changed the title [FEAT] - Separately negotiated version number for NTC block-specific payloads [FEAT] - Separately negotiated version number for NTN and NTC block-specific payloads Jan 10, 2024
@nfrisby nfrisby transferred this issue from IntersectMBO/ouroboros-consensus Jan 10, 2024
@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 10, 2024

I eventually realized a much simpler explanatin. I added it to the Issue description, but here it is again, to save you a click from your email notification.


The mini protocol codecs are parameterized over the codecs for the block-specific payloads (inevitable, since the Diffusion Layer is parameterized over the block type). Thus the block-specific codecs may vary independently of the mini protocol's own (parameterized) codecs. For the most part, our versions are merely counting changes to the codecs; we strive to never change semantics---we only get new semantics by introducing new "syntax", aka summands in the codecs. Therefore we (eventually) need two separate versions: one for the mini protocols and one for the block-specific payloads.

@coot
Copy link
Contributor

coot commented Jan 11, 2024

@nfrisby your proposal will be simpler to implement, but will slightly increase the complexity of the version scheme; my proposal is more difficult to implement but keeps the simplicity. It has additional disadvantage that we won't be sure if this is possible until one tries it, because your proposal is simpler it's easier to make it right.

Your proposal also has the advantage that it takes out from me the burden of taking care of the versioning of local-state-query mini-protocol, so it's in my own interest 😁.

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 11, 2024

OK. Unless anyone objects, I'll take this Issue as a task to migrate the write-up so far into a document in the ouroboros-network repo's docs.

@KtorZ
Copy link
Contributor

KtorZ commented Jan 12, 2024

From my perspective (an in particular, in the context of Ogmios); I only truly care about a version (or a set of protocol/codec versions) that would give me the following guarantee:

  • What used to work when negotiating version X, keeps working when I negotiate version X again in the future.

This may seem obvious but the cardano-node/ouroboros-consensus/ouroboros-network/cardano-ledger stack has broken this simple hypothesis on several occasions. This mainly for the reasons you underline above @nfrisby: the version currently negotiated in handshake only specifies what messages are available to the protocol and their envelope, but not necessarily their payloads which is (mostly) entirely governed by the cardano-ledger here. Hence, from a client application standpoint, the effective version of the codecs depends on:

  1. The protocol version negotiated during handshake;
  2. The version of the underlying libraries that the node was compiled with.

If the ledger decides to change the binary representation of a certain object that is part of the mini-protocol API, then the protocol will effectively behave differently even if the same version is negotiated. That is not good.

Apart from that, I have very little interest (nor do I see it as plausible 😅) in a future where someone would re-use the ouroboros stack to re-implement a non-Cardano blockchain. Mainly because people already struggle to use that stack to interact with just Cardano and we had to come up with tools like Ogmios to fill that gap. I have therefore no opinion on the matter, provided it doesn't over-complexify an already-quite-complicated client code for applications dealing with Cardano.

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 16, 2024

If the ledger decides to change the binary representation of a certain object that is part of the mini-protocol API, then the protocol will effectively behave differently even if the same version is negotiated. That is not good.

Yeah, that should absolutely be causing an increment in the "second" version that this Issue is proposing :/ Thanks @KtorZ

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 16, 2024

Also, IntersectMBO/ouroboros-consensus#167 should help, @KtorZ

@nfrisby
Copy link
Contributor Author

nfrisby commented Jan 18, 2024

I wrote the distilled version of this up in one of the consensus developer documents. See this IntersectMBO/ouroboros-consensus@17d8850

Do you agree with what I wrote there, @coot ? If so, I'll close this Issue as done.

github-merge-queue bot pushed a commit to IntersectMBO/ouroboros-consensus that referenced this issue Jan 19, 2024
This PR elaborates some topics in `QueryVersioning.md`, including a
distillation of
IntersectMBO/ouroboros-network#4770.
@coot
Copy link
Contributor

coot commented Jan 22, 2024

I agree.

@coot coot closed this as completed Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: ✅ Done
Development

No branches or pull requests

4 participants