diff --git a/proposals/0000-wire-protocol.md b/proposals/0000-wire-protocol.md new file mode 100644 index 0000000..b7a7d01 --- /dev/null +++ b/proposals/0000-wire-protocol.md @@ -0,0 +1,481 @@ + +Title: **DEP-0000: Hypercore Wire Protocol** + +Short Name: `0000-wire-protocol` + +Type: Standard + +Status: Undefined (as of 2018-02-04) + +Github PR: [Draft](https://github.com/datprotocol/DEPs/pull/8) + +Authors: [Paul Frazee](https://github.com/pfrazee), [Bryan Newbold](https://github.com/bnewbold) + + +# Summary +[summary]: #summary + +This DEP describes the Hypercore wire protocol: a transport-agnostic message stream spoken between nodes in a swarm of hypercore network peers (including Dat clients). The wire protocol includes mechanisms for framing, stream encryption, and feed key authentication. + + +# Motivation +[motivation]: #motivation + +The protocol described here is already in use as of 2017 (by Hypercore, Dat, and Beaker Browser users), and was partially described in an earlier [whitepaper][whitepaper]. This document fills in some additional details. + +[whitepaper]: https://github.com/datprotocol/whitepaper + + +# Stream Connections +[stream-details]: #stream-details + +The Dat wire protocol depends on the use of a binary transport protocol which provides the following semantics: + +- reliable delivery (no dropped or partial messages) +- in-order delivery of messages + +Two notable transport protocols that satisfy these requirements are TCP and µTP (the "Micro Transport Protocol"). + +Peers wishing to connect need to discover each other using some mechanism or another (see forthcoming DEPs on some options; this process is modular and swappable), and both need to have the public key for the primary hypercore they wish to exchange. + +Messages are framed by the Dat protocol itself (see Messages section for details). + + +## Channels +[channels]: #channels + +Multiple hypercore feeds can be synchronized over the same protocol connection. Messages pertaining to the separate channels are tagged with an id for disambiguation. + +Note that at least one feed is necessary for each connection (for handshaking to succeed), and that the first feed is the one used for discovery and as an encryption key. + +To initiate a new channel (after the primary is established), a peer can send a Feed message, followed by an Info message. Unlike the first message sent on an overall connection, later Feed messages are encrypted. Either party may initiate a new channel with a Feed message at any time. + + +## Handshake Procedure +[handshake]: #handshake + +A handshake procedure needs to occur for each feed on a channel; the first part of the first handshake happens in cleartext and both validates discovery keys and establishes encryption parameters used for the rest of the connection. The first (primary) channel has `id=0`. + +The first (cleartext) message is a Feed message, and includes two fields: a nonce and a discovery key. + +The **nonce** is generated by each peer as a random 32-byte sequence. + +The **discovery key** is generated from the public encryption key for a hypercore feed (in this case the first, or "primary" feed) by using the public key to perform a keyed hash of the 9-byte ASCII string "HYPERCORE" (with no trailing NULL byte) using the BLAKE2b keyed hashing algorithm (provided by most BLAKE2b hash implementations). The discovery key is 32 bytes long. + +Example of generating a discovery key in pseudo-code: + +``` +discoveryKey = BLAKE2b(message = 'HYPERCORE', key = publicKey, outputLengthBytes = 32) +``` + +The discovery key is used in cleartext instead of the public key to avoid leaking the public key to the network; read access to hypercore feeds (including Dat archives) is controlled by limiting access to public keys. + +When the connection is first opened, the connecting peer sends their Feed message. The receiving peer checks that the discovery key was expected (eg, that they know of a public key matching that discovery key and are willing to synchronize the feed associated with that key). If so, they reply with their own Feed message. If not, they drop the connection. + +Once Feed messages are exchanged, both peers have all information they need to encrypt all further content on the channel, and do so (see below for details). The second part of the handshake is to exchange Handshake messages, which set some parameters for the channel. Handshakes also include the self-identified peer id, which can be used to detect accidental self-connections or redundant connections to the same peer (eg, over different transports). The peer id is typically random and is not authenticated. + + +## Encryption Scheme +[encryption]: #encryption + +After the first Feed messages are exchanged (one message in each direction, in cleartext), all further bytes exchanged over the channel are encrypted. + +Framing metadata (aka, message length and type) is encrypted, but a third party could likely infer message lengths (and thus potentially message types) by observing packet sizes and timing; no padding is applied at the protocol layer. + +The encryption scheme used is libsodium's stream primitive, specifically the XSalsa20 cipher. The cipher is fed a shared key (the primary hypercore feed public key), a nonce (selected by the sender and exchanged during handshake), and a block offset (representing all encrypted bytes sent on the connection in this direction so far). + +The specific libsodium function used is `crypto_stream_xsalsa20_xor_ic()`. Some interfacing code is necessary to process messages that don't align with the cipher's 64-byte chunk size; unused bytes in any particular chunk can be ignored (or, as an optimization, cached for use with the next message). For example, if 1000 encrypted bytes had been sent on a connection already, and then a new 50 byte message needed to be encrypted and sent, then one would offset the message by `1000 % 64 = 40` bytes and XOR the first 24 bytes against block 15, then XOR the remaining 26 bytes against block 16. The bytes would be shifted back and recombined before sending, so only 50 bytes would go down the connection; the same process would be followed by the receiver. + + +## Want/Have Procedure +[want-have]: #want-have + +The wire protocol is designed to be efficient when syncing extremely large Hypercore feeds (ie millions of blocks in length). Therefore a procedure exists for each peer to indicate which subset of the feed they would like to synchronize, and for the remote to then announce which blocks within those subsets they possess. This is the Want/Have Procedure. + +By default, a peer wants no blocks. It can add or remove wanted block using the Want and Unwant messages. + +When new wanted blocks are made available, the peer should react by sending a Have message. Likewise, when wanted blocks are no longer available, the peer should react by sending an Unhave message. Finally, any time a Want message is received, the peer should react by sending a Have message. + + +## Request Procedure +[requests]: #requests + +After the Want/Have Procedure, a peer will know what Hypercore feed blocks are available on the remote, and can then send Request messages for the individual blocks. Each Request specifies which block it needs, and also specifies some information about Merkle tree proof nodes which should be included. In reaction, the remote sends a Data message with the requested content. + +At present, Request messages can only specify one block at a time. This is to encourage an equal distribution of requests between multiple connected peers. However, multiple requests can be sent in parallel. + + +## Extension Procedure +[extensions]: #extensions + +To allow for experimentation within userland, the wire protocol supports an extension process. All extensions are identified by strings, and sent in an array in the Handshake message. Both peers must declare an extension in the handshake to consider it 'supported'. + +Each extension is capable of sending custom payloads through the Extension message type. + + +# Message Details +[message-details]: #message-details + +The connection between peers is an endless stream of bytes, so each message must be "framed" so the recipient knows when it starts and ends. The wire framing format is `(
)`. `len` is a varint with the number of bytes in the following message (the sum of the `header` and `message`). `header` is a varint, of form `channel << 4 | <4-bit-type>`. Note that in most cases the `header` varint will be a single byte, but clients should treat it as a varint to accommodate large channel counts. + +Messages are encoded (serialized) using Google's [protobuf][protobuf] encoding. + +[protobuf]: https://developers.google.com/protocol-buffers/ + + +
`type` code Name +
N/A [Keep-Alive][msg-keepalive] +
0 [Feed][msg-feed] +
1 [Handshake][msg-handshake] +
2 [Info][msg-info] +
3 [Have][msg-have] +
4 [Unhave][msg-unhave] +
5 [Want][msg-want] +
6 [Unwant][msg-unwant] +
7 [Request][msg-request] +
8 [Cancel][msg-cancel] +
9 [Data][msg-data] +
15 [Extension][msg-extension] +
+ +#### Keep-Alive +[msg-keepalive]: #msg-keepalive + +A message of body length 0 (giving a total message size of 1 byte for the `len` varint) is a keep-alive. Peers must always handle keep-alive messages correctly (aka, ignore them), regardless of transport. + +Depending on transport and application needs, peers may *optionally* send keep-alive messages to help detect and prevent connection loss. Implementors are free to chose their own period or strategy for sending keep-alives. A reasonable default period is 300 seconds (5 minutes). + + +#### Feed +[msg-feed]: #msg-feed + +`type=0` Should be the first message sent on a channel. Establishes the content which will be exchanged on the channel. See the [Channels][channels] section for details. + +``` +message Feed { + required bytes discoveryKey = 1; + optional bytes nonce = 2; +} +``` + +#### Handshake +[msg-handshake]: #msg-handshake + +`type=1` Overall connection handshake. Should be sent just after the Feed message on the first channel only. + +Some notes on the fields: + + - **id** Typically a 32-byte identifier which is allocated randomly by a peer at the start of its swarming session. Not authenticated. Used to help detect multiple connections to a single peer. + - **live** If both peers set to true, the connection will be kept open indefinitely. + - **userData** An open field for sending data which you can retrieve on handshake. No value is prescribed by the protocol. + - **extentions** A list of strings identifying additional message-types which are supported via the Extension message. + - **ack** A peer (the "sender") can set this flag to request that every time another peer (a "receiver") gets a `Data` message from the sender, that they reply with an explicit `Have` message once they have successfully persisted the data in storage. This lets the "sender" track how many persisted copies of a chunk there are. + +``` +message Handshake { + optional bytes id = 1; + optional bool live = 2; + optional bytes userData = 3; + repeated string extensions = 4; + optional bool ack = 5; +} +``` + +#### Info +[msg-info]: #msg-info + +`type=2` Indicates state changes in a channel. The initial state for uploading & downloading is `true`. If both ends are not downloading and not live, it is safe to consider the channel ended. + +``` +message Info { + optional bool uploading = 1; + optional bool downloading = 2; +} +``` + +#### Have +[msg-have]: #msg-have + +`type=3` Announces the availability of Hypercore feed blocks for download. If no `bitfield` is present, the `start` and `length` represent a continuous set of blocks. If `bitfield` is present, it will convey the availability of individual blocks, offset from the `start`. For information about the encoding of `bitfield`, see "Run-Length Encoded Bitfield" below. + +The Have message is sent in the following contexts: + + - In response to a Want message. + - To announce new blocks which have been added within a remote-wanted range. + +``` +message Have { + required uint64 start = 1; + optional uint64 length = 2 [default = 1]; // defaults to 1 + optional bytes bitfield = 3; +} +``` + +#### Unhave +[msg-unhave]: #msg-unhave + +`type=4` Announces the loss of availability of Hypercore feed blocks for download. The `start` and `length` represent a continuous set of blocks. + +This message is sent in the following contexts: + + - To inform the peer that data which was sent was rejected, for instance because it was not requested and pushes are not being allowed. + - To announce blocks have been removed from local storage within a remote-wanted range. + +``` +message Unhave { + required uint64 start = 1; + optional uint64 length = 2 [default = 1]; // defaults to 1 +} +``` + +#### Want +[msg-want]: #msg-want + +`type=5` Announces a range of Hypercore feed blocks which the peer would like to receive Have/Unhave messages about. Remote should respond with a Have message which describes which of the wanted blocks are available, and should send additional Have/Unhave messages if availability within the wanted range changes. + +``` +message Want { + required uint64 start = 1; + optional uint64 length = 2; // defaults to Infinity or feed.length (if not live) +} +``` + +#### Unwant +[msg-unwant]: #msg-unwant + +`type=6` Announces a range of Hypercore feed blocks which the peer would no longer like to receive Have/Unhave messages about. + +``` +message Unwant { + required uint64 start = 1; + optional uint64 length = 2; // defaults to Infinity or feed.length (if not live) +} +``` + +#### Request +[msg-request]: #msg-request + +`type=7` Request data from the remote. The remote should be react by sending a Data message. + +The fields are as follows: + + - **index** The Hypercore feed block being requested. + - **bytes** An alternative to `index`, a byte offset into the Hypercore feed. Instructs the remote to send the block at the given `bytes` offset. + - **hash** If true, only send the nodes, not the block data. + - **nodes** Communicates which parent and uncle nodes should be included in the Data response. See "Block Tree Digest" (below) for an explanation of the encoding. + +``` +message Request { + required uint64 index = 1; + optional uint64 bytes = 2; + optional bool hash = 3; + optional uint64 nodes = 4; +} +``` + +#### Cancel +[msg-cancel]: #msg-cancel + +`type=8` Cancel a Request message sent earlier. The remote should react by aborting any active or queued Request message which matches the `index`, `bytes`, and `hash` parameters. + +``` +message Cancel { + required uint64 index = 1; + optional uint64 bytes = 2; + optional bool hash = 3; +} +``` + +#### Data +[msg-data]: #msg-data + +`type=9` Send a Hypercore feed block. May contain the data of the block, and may contain the hashes of the ancestor nodes in the merkle tree. Should only be sent in reaction to a Request message. + +When a Data message is received, its node hashes and signature should be verified against any local tree information. If the nodes can be verified, they and the block data may be stored locally. + +If a Data message is received for a block which was not requested, the peer can react by ignoring the data and sending an Unhave message in response. This will inform the remote that the data was rejected, and is not stored. + +``` +message Data { + message Node { + required uint64 index = 1; + required bytes hash = 2; + required uint64 size = 3; + } + required uint64 index = 1; + optional bytes value = 2; + repeated Node nodes = 3; + optional bytes signature = 4; +} +``` + +#### Extension +[msg-extension]: #msg-extension + +`type=15` Send a custom message. + +The `user-type` is an index into the `extensions` array sent in the Handshake. For instance, if two extensions were declared `['foo', 'bar']` then the `user-type` of `'foo'` would be `0` and of `'bar'` would be `1`. + +The `payload` may be any message content, as needed by the extension. + +``` + +``` + + +## Run-Length Encoded Bitfield +[run-length-encoded-bitfield]: #run-length-encoded-bitfield + +The Run-Length Encoded (RLE) bitfield is a series of compressed and uncompressed bit sequences.All sequences start with a header that is a varint. If the last bit is set in the varint (it is an odd number) then a header represents a compressed bit sequence. If the last bit is not set then a header represents an non compressed sequence. + +``` +compressed-sequence = varint(byte-length-of-sequence << 2 | bit << 1 | 1) +uncompressed-sequence = varint(byte-length-of-bitfield << 1 | 0) + bitfield +``` + + +## Block Tree Digest +[block-tree-digest]: #block-tree-digest + +As described in DEP-0002 (Hypercore), a peer should be able to verify both the integrity of received data (aka, was there corruption somewhere along the way, detected via hash) and the authenticity (aka, is this the data from the original writer, detected via signature). Hypercore transmits the hash for every data block, but only signatures for the root hashes of Merkle trees, not individual block hashes, which means a peer may need additional hashes (for data blocks they do not have a copy of) if they want to verify the signatures of individual blocks. + +Redundantly transmitting all such hashes on every request would be inefficient if the receiver already had some hashes, so requesting peers can specify which hashes they need in the `nodes` field of the `Request` message. Instead of sending an array of node indexes, the `nodes` field is a compact bitfield (serialized as a `uint64`), indicating for each "uncle" and "parent" whether the hash should be transmitted. + +Consider the following tree: + +``` +0 + 1 +2 + 3 +4 + 5 +6 +``` + +If the receiving peers wants to fetch, and verify, block 6, it needs to communicate whether it already has the uncle hashes (4, 1) and the parent hashes (3). This information can be compressed into a small bit vector with the following scheme: + + - the least-significant bit indicates whether the most-significant bit is a "parent" (if '1') or an "uncle" (if '0') + - all other bits, in order from least- to most-significant, indicate whether the corresponding "uncle" hash *does* need to be transmitted (if bit '0') or *does not* (if bit '1') + +An an example, suppose we want to fetch block 6 from a remote peer, and we already have the sparse node metadata (hashes) for blocks 3 and 4, but not 1. In other words: + + - 4, an uncle, we already have the hash + - 1, next uncle, we don't have hash + - 3, a parent (and root hash), we do have hash + - root signature (of hash 3) will be sent with request + +Our `Request` should have the following `nodes` bitfield (`0b1011` encoded in the least-significant bits of a `uint64`): + +``` +1011 +│││└── indicates that the most-significant bit is a parent, not an uncle +││└─── do not send hash for the first uncle, 4 +│└──── do send hash for the next uncle, 1 +└───── do not send hash for next parent, 3 +``` + +Using the `index` field of the `Request` message, the receiving peer can calculate the number of packed bits and extract the bitfield. The "most-significant bit" referenced above is of just the fixed-size bitfield, not the `uint64` as a whole. + +From this, the remote peer will know to only send one hash (for block 1) for us to verify block 6. Note that we (the receiver) can calculate the hash for block 6 ourselves when we receive it. + +As a special case, the bit vector `1` (only contains a single one) means that the sender should not send any hashes at all. + +These digests are very compact in size. Only `(log2(number-of-blocks) + 2) / 8` bytes are needed in the worst case. For example if you are sharing one trillion blocks of data the digest would be `(log2(1000000000000) + 2) / 8 ~= 6` bytes long (which fits in a single `uint64`). This scheme works for Hypercores with up to `2^62 = 4,611,686,018,427,387,904` blocks. + +# Examples + +## Simple Download + +Alice has an e-book identified by a public-key (PK) which Bob would like to download. The e-book also has a discovery-key (DK). Bob connects to Alice, and the following exchange occurs: + +``` + BOB: sends Feed (unencrypted) {discoveryKey: DK, nonce: BobNonce} + BOB: sends Handshake {id: BobID} +ALICE: sends Feed (unencrypted) {discoveryKey: DK, nonce: AliceNonce} +ALICE: sends Handshake {id: AliceID} + BOB: waits for Feed + BOB: waits for Handshake +ALICE: waits for Feed +ALICE: waits for Handshake + BOB: sends Want {start: 0} +ALICE: receives Want {start: 0} +ALICE: sends Have {start: 0, bitfield: ...} + BOB: receives Have {start: 0, bitfield: ...} + BOB: sends Request (for block 0) +ALICE: receives Request (for block 0) +ALICE: sends Data (for block 0) + BOB: receives Data (for block 0) + BOB: sends Request (for block 1) +ALICE: receives Request (for block 1) +ALICE: sends Data (for block 1) + ... (repeat for all other blocks) ... + BOB: sends Info {downloading: false} +ALICE: closes connection +``` + +# Privacy and Security Considerations + +All security and privacy guarantees of this protocol implicitly depend on the soundness of the underlying cryptographic primitives, algorithms, and implementations, which include BLAKE2b and XSalsa20. + +A connecting peer, or any observer able to decrypt traffic, may be able to infer the following from protocol traffic: + +- what subset of content the local peer already has access to (via Have message) +- what subset of content the local peer is interested in (via Want and Request messages) + +A "persistent" (all-seeing) observer can infer the following with reasonable confidence, using only protocol traffic: + +- which peers (identified by IP address, timing, topology, or other network metadata) have connected which other peers +- the relative flow of content (which peer provided data to the other, and how much) +- which primary hypercore feeds which peers have knowledge of (identified by discovery key, not public key) +- which peers have the public key for a given discovery key (based on whether connections succeed) + +Such an observer can not determine the specific content of hypercore feeds, or (with confidence) which peer (or peers) have write access to a feed. + +The wire protocol does not pad message sizes. This means that a persistent observer could potentially identify traffic content by inferring message sizes. + +The hypercore protocol does not intentionally identify peers across connections, but it has been shown in other network protocols (like HTTP) that even the smallest amount of identifying metadata can be used, statistically, to track and surveil users. These techniques are sometimes called "device fingerprints", "browser fingerprints", or "permacookies". Hypercore protocol users should not consider themselves immune to such tracking without specific additional effort to identify and mitigate such fingerprints. + +Any network observer with the public key can fully decrypt the network traffic of *any* and *all* peers establishing a connection using that key. This includes channel content other than the first (discovery) channel. Peers should not consider extension messages, "Have"/"Want" metadata, or any other messages or metadata private from other peers (or observers) with the public key. + +The wire protocol encryption provides message *secrecy* (from parties who do not have the public key), but does not guarantee any form of *authenticity*. In the case of Dat and hypercore, the application layer itself verifies authenticity of transferred content using hashes and signatures. However, implementors should note that network agents who can manipulate traffic can modify data in flight without detection, regardless of whether they have the feed public key. However, peers are already not trusted parties, and thus implementors must already take care to treat protocol messages as potentially hostile input. + +The issues of observer decryption and message authenticity may be addressed in a future revision of the wire protocol. + + +# References + +"Dat - Distributed Dataset Synchronization And Versioning" by Maxwell Ogden, Karissa McKelvey, and Mathias Buus Madsen. Whitepaper, May 2017 ([pdf](https://datproject.org/paper)). + +"The BitTorrent Protocol Specification (BEP #3)", by Bram Cohen. January 2008 ([html](http://www.bittorrent.org/beps/bep_0003.html)) + +"uTorrent transport protocol (BEP #29)", by Arvid Norberg. June 2009 ([html](http://www.bittorrent.org/beps/bep_0029.html)) + +"Merkle hash torrent extension (BEP #30)", by Arno Bakker. May 2009. ([html](http://www.bittorrent.org/beps/bep_0030.html)) + +"Updating Torrents Via DHT Mutable Items (BEP #46)", by Luca Matteis. July 2016. ([html](http://www.bittorrent.org/beps/bep_0046.html)) + +"Rarest First and Choke Algorithms Are Enough", by Arnaud Legout, G. Urvoy-Keller, and P. Michiardi. October 2006. ([pdf](http://conferences.sigcomm.org/imc/2006/papers/p20-legout.pdf)) + +"Extending the Salsa20 nonce", by Daniel J. Bernstein. February 22011. ([pdf](https://cr.yp.to/snuffle/xsalsa-20110204.pdf)) + +libsodium is a fork of NaCl (the "Networking and Cryptography library", developed by Daniel J. Bernstein). More information available from the [NaCl website](https://nacl.cr.yp.to/) and [libsodium website](https://libsodium.org). Specific information about the BLAKE2b hash function available from the [BLAKE2 website](https://blake2.net/). + +Google Protocol Buffers Documentation ([website](https://developers.google.com/protocol-buffers/)) + + +# Unresolved questions +[unresolved]: #unresolved-questions + +- Encryption might not make sense in some contexts (eg, IPC, or if the transport layer is already providing encryption). Should this DEP recognize this explicitly? Does not need to be addressed before Draft status. +- There is a potential race condition with channel index numbers. If each peer sends a new Feed message on a new channel at the same time (aka, before the remote message is received), what should peers do? Probably ignore the channel and try again. Possibly channel indices should go even/odd depending on the peer proposing to prevent conflicts. Does not need to be resolved before Draft status. + + +# Changelog +[changelog]: #changelog + +The Dat wire protocol was initially described in the 2016 white paper referenced above. + +- 2018-03-04: Early "WIP" draft circulated on github +- 2019-01-19: First complete draft submitted for review +