DEP: Ephemeral message extension #28

Closed
wants to merge 9 commits into
base: master
from

4 participants
@pfrazee
Member

pfrazee commented Jun 6, 2018

Another DEP proposal

This DEP defines the non-standard em extension message used in the Dat replication protocol. This message provides a way to send arbitrary application data to a peer through an existing connection.

This is not meant to supersede #27 and will probably be used in combination with it.

@pfrazee

Member Author

pfrazee commented Jun 6, 2018

Here's a lab API I'm considering for Beaker to leverage both this and #27: https://gist.github.com/pfrazee/7259f9201d44417c777803984715e1c4

@RangerMauve

Contributor

RangerMauve commented Jun 6, 2018

This is awesome! The proposed Beaker API is really elegant, too.

Only questions:

  • Why 1000 bytes?
  • What happens when there's a parsing error for the payload?
  • Are the peer IDs going to be random per dat site?
@pfrazee

Member Author

pfrazee commented Jun 6, 2018

Why 1000 bytes?

I originally had it at 256 bytes to fit within a single packet (with some clearance) but I decided that's just too much of a pain for users.

I figure we should cap it at some size to discourage DoS attacks (i.e., "hey parse this 100 MB JSON message!"). I'm open to discussing the exact size, but 1 KB seemed reasonable.

What happens when there's a parsing error for the payload?

I figure you just drop the message. There's no acknowledgement of receipt and so an error won't come back either.

Are the peer IDs going to be random per dat site?

They're random per connection, at the moment.
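
For illustration, the size cap and the drop-on-parse-error behavior described in this comment might look roughly like the sketch below. It is not taken from the DEP or from any implementation: `MAX_PAYLOAD_BYTES`, `DROPPED`, and `receive_ephemeral` are hypothetical names, the payload is assumed to be JSON, and the 1000-byte figure is just the value under discussion here.

```python
import json
from typing import Any

# Assumed cap, taken from the "1000 bytes" figure discussed in this thread.
MAX_PAYLOAD_BYTES = 1000

# Sentinel returned when a message is silently dropped (hypothetical name).
DROPPED = object()


def receive_ephemeral(payload: bytes) -> Any:
    """Decode a JSON payload, or return DROPPED without reporting an error."""
    if len(payload) > MAX_PAYLOAD_BYTES:
        # Oversized messages are discarded before any parsing is attempted.
        return DROPPED
    try:
        return json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # No acknowledgement exists, so no error is sent back to the peer.
        return DROPPED
```

Checking the size before parsing is the point of the cap: the receiver never spends parsing effort on a message it would reject anyway.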

@pfrazee pfrazee referenced this pull request Jun 6, 2018

Closed

Upcoming Meeting Agenda - 6th June 2018 #21

0 of 6 tasks complete
@RangerMauve

Contributor

RangerMauve commented Jun 6, 2018

Also, question: In the future would it be weird if I did the "approve" review action when reviewing stuff even though I'm not part of the WG? I almost did it by reflex but wasn't sure if it would be weird to do so.

@pfrazee

Member Author

pfrazee commented Jun 6, 2018

Straight into the dungeon with you

@bnewbold
Contributor

bnewbold left a comment

My comments are a little probing/critical, but I wouldn't consider them blocking.

# Motivation
[motivation]: #motivation

While Dat is effective at sharing persistent datasets, applications frequently need to transmit extra information which does not need to persist. This kind of information is known as "ephemeral." Examples include: sending chat messages, proposing changes to a dat, alerting peers to events, broadcasting identity information, and sharing the URLs of related datasets.

@bnewbold

bnewbold Jun 10, 2018

Contributor

Instead of adding general purpose capabilities to the Dat protocol to support all these use cases, why not embed Dat within application-specific protocols for each of these use cases? The protocol stack is pretty modular as-is, so it should be possible to borrow "just" features like discovery, stream encryption, etc. Hypercore is transport agnostic, so it should be possible to embed in each of these.

This is sort of a rhetorical/hypothetical question; what I'm really getting at is, are we trying to turn the hypercore protocol into a general purpose distributed networking framework? If so, maybe we should draw up a roadmap of what that would entail.

@pfrazee

pfrazee Jun 10, 2018

Author Member

Beaker needs a quick solution to exchanging ephemeral messages between users that reside on a dat:// site. Hypercore has an extensions mechanism for sending arbitrary message types, and so this DEP is taking advantage of that. It's a low-effort, high-return feature; just a stepping stone to the broader roadmap.

@bnewbold

bnewbold Jun 10, 2018

Contributor

That's good context! I'm for pushing this through as a Draft so you can move fast but still have documentation you can point at for interop.

# Reference Documentation
[reference-documentation]: #reference-documentation

This DEP is implemented using the Dat replication protocol's "extension messages." In order to broadcast support for this DEP, a client should declare the `'em'` extension in the replication handshake.

@bnewbold

bnewbold Jun 10, 2018

Contributor

Does em stand for "extension message" or "ephemeral message"? I would expand it to ephemeral-message, or ephemeral if we're trying to save bytes.

@pfrazee

pfrazee Jun 10, 2018

Author Member

I like ephemeral


The first 6 header bits are reserved. The payload's encoding may be specified in the last two header bits. Possible values are:

- `00` - binary
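
As a rough sketch of the header layout quoted above, the snippet below splits a header into its reserved bits and its encoding bits. It assumes the header is a single byte and that the "last two" bits are the least significant ones; neither is confirmed by the excerpt. Only the `00` value is visible here, so any other encoding value is reported as unknown rather than guessed. All names are hypothetical.

```python
from typing import Tuple

RESERVED_MASK = 0b11111100  # first six header bits, reserved
ENCODING_MASK = 0b00000011  # last two header bits (assumed least significant)

# Only `00` is listed in the excerpt; other values are deliberately unmapped.
KNOWN_ENCODINGS = {0b00: "binary"}


def parse_header(header: int) -> Tuple[int, str]:
    """Split an assumed one-byte header into (reserved bits, encoding name)."""
    if not 0 <= header <= 0xFF:
        raise ValueError("header must fit in one byte")
    reserved = (header & RESERVED_MASK) >> 2
    encoding = header & ENCODING_MASK
    return reserved, KNOWN_ENCODINGS.get(encoding, "unknown")
```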

@bnewbold

bnewbold Jun 10, 2018

Contributor

Hrm, this is pretty limited. I think it might be better to just not attempt to do content-type declaration at all rather than do it partially. An HTTP-style Content-Type header feels like the minimum viable here. How will the client know how to "decode the payload according to this encoding-value" at all if it doesn't know what the message type is (only the encoding)?

@pfrazee

pfrazee Jun 10, 2018

Author Member

I'm just looking for a way to send buffers, strings, or objects. Maf had some observations about using a protocol buffer so that we can extend it more in the future, which I agree with.

@bnewbold

bnewbold Jun 10, 2018

Contributor

Having this be a protobuf message with two fields:

optional string contentType = 1;
required bytes payload = 2;

feels more hyper-world-idiomatic to me. For the browser use-case, I think you could also just try decoding the raw bytes as JSON, and throw a warning if it fails to parse. UTF-8 strings can be encoded as a JSON string.
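
To make the two-field message concrete, here is a minimal hand-rolled sketch of how it would look on the wire under the standard protobuf encoding, where both fields are length-delimited (wire type 2), giving tag bytes `0x0a` for field 1 and `0x12` for field 2. A real implementation would presumably use a protobuf library instead; `encode_message` and `decode_message` are hypothetical names, and the sketch only understands length-delimited fields.

```python
from typing import Optional, Tuple


def _encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def _decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError on a truncated buffer
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(payload: bytes, content_type: Optional[str] = None) -> bytes:
    out = bytearray()
    if content_type is not None:
        raw = content_type.encode("utf-8")
        out += b"\x0a" + _encode_varint(len(raw)) + raw  # contentType = 1
    out += b"\x12" + _encode_varint(len(payload)) + payload  # payload = 2
    return bytes(out)


def decode_message(buf: bytes) -> Tuple[Optional[str], bytes]:
    content_type: Optional[str] = None
    payload: Optional[bytes] = None
    pos = 0
    while pos < len(buf):
        tag, pos = _decode_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = _decode_varint(buf, pos)
        if pos + length > len(buf):
            raise ValueError("truncated field")
        value = buf[pos:pos + length]
        pos += length
        if field == 1:
            content_type = value.decode("utf-8")
        elif field == 2:
            payload = value
        # Unknown length-delimited fields are skipped.
    if payload is None:
        raise ValueError("required field `payload` is missing")
    return content_type, payload
```

Because `contentType` is optional, a sender that omits it adds only two bytes of framing (for payloads under 128 bytes) on top of the payload itself.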

@pfrazee

pfrazee Jun 10, 2018

Author Member

Makes sense to me. Should we use the full mime-type? "application/json" ?

@bnewbold

bnewbold Jun 10, 2018

Contributor

Yup; and applications can support later variants if they want (eg, for GeoJSON, application/geo+json)
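
One way a receiver could act on the MIME type, sketched under a few assumptions: `decode_payload` is a hypothetical helper, treating any `+json` suffix as JSON and decoding `text/*` as UTF-8 are choices made for this example rather than anything agreed in this thread, and a missing content type falls back to the "try JSON, warn if it fails" idea mentioned earlier.

```python
import json
import warnings
from typing import Any, Optional


def decode_payload(payload: bytes, content_type: Optional[str] = None) -> Any:
    """Decode payload bytes according to an optional MIME type."""
    if content_type is None:
        # No declared type: try JSON, and warn rather than fail if it is not.
        try:
            return json.loads(payload.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            warnings.warn("payload is not valid JSON; returning raw bytes")
            return payload
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(payload.decode("utf-8"))
    if media_type.startswith("text/"):
        return payload.decode("utf-8")
    # Anything else is handed to the application as raw bytes.
    return payload
```

With this shape, an application that understands `application/geo+json` still gets a parsed object, and one that does not recognise a type simply receives the bytes.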

@RangerMauve

RangerMauve Jun 11, 2018

Contributor

I really like the idea of using mime types! It'll make packets a bit bigger, but I think it's worth the flexibility it will bring.


No acknowledgment of receipt will be provided (no "ACK").

After publishing this DEP, the "Beaker Browser" will implement a Web API for exposing the `'em'` protocol to applications. It will restrict access so that the application code of a `dat://` site will only be able to send ephemeral messages on connections related to its own content.

@bnewbold

bnewbold Jun 10, 2018

Contributor

I'd leave forward-looking statements ("this client will support XYZ") out of the DEP, even at draft stage. The note about expected security policy might be worth keeping though.

@pfrazee

pfrazee Jun 10, 2018

Author Member

What's a better way to phrase it, do you think? Maybe as a recommendation to browser clients which implement the DEP?

@bnewbold

bnewbold Jun 10, 2018

Contributor

I would move it to "Motivation" and rephrase:

"A specific use case for this extension is to enable a new Web API which will expose peer message-passing channels to in-browser applications. Such an API would restrict access so that the application code of a dat:// site will only be able to send ephemeral messages on connections related to its own content."

@pfrazee

pfrazee Jun 10, 2018

Author Member

Yeah that's good

@pfrazee

Member Author

pfrazee commented Jun 10, 2018

Changes made. I removed any max payload length. Do we want to re-add that?

@bnewbold

Contributor

bnewbold commented Jun 10, 2018

Could go either way on payload length... if we do limit it i'd leave the door open to increasing the size in the future, but never decreasing the message size.

@pfrazee

Member Author

pfrazee commented Jun 27, 2018

Is this 👍 to publish as a draft?

@bnewbold

Contributor

bnewbold commented Jun 27, 2018

It would be nice to at least mention that payload size should be kept "reasonable", but I don't have specific language to recommend. I wouldn't block on that.

Skimming back over this, I don't see a privacy/security section. Even as an "Informative" and "Draft", I think there should be a section, and at least a "user/developer beware" statement. This is in addition to the statement under drawbacks. Off the top of my head:

  • messages are encrypted on the wire and have the same global observer privacy properties as regular hypercore messages in that regard (eg, "what can your gateway router or ISP inspect/filter/block").
  • the discovery key semantics mean that you know you are connecting to a peer that knows the discovery key. however, within that set of users (which could be large/public/everybody if the discovery key is a public app), there is no man-in-the-middle protection; any protocol trying to implement authentication or secure message using ephemeral messages needs to treat them like any other datagram over the open internet; hypercore does not provide a trustworthy end-to-end or peer-to-peer messaging channel. as with hypercore, the current status of the network is that a gateway can "trivially" control discovery of peers (unless DNS-over-TLS or DNSSEC or similar), and as well anybody can join the swarm, so man-in-the-middling is a real thing for all users, not an advanced-persistent-threat scenario; this isn't as much of a concern with regular hypercore because you're verifying the data you receive by signature, but ephemeral messages aren't signed with the feed key (unless the application developer does so themselves)

That last point kind of ran on, sorry about that.
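
To illustrate the "unless the application developer does so themselves" part of the last bullet, here is a sketch of application-level message authentication. It does not sign with the feed key: as a stand-in it uses an HMAC-SHA256 tag under a secret that the two peers are assumed to have shared out of band, which is a different technique with different properties. `seal` and `open_sealed` are hypothetical names, and the sketch does nothing about replayed messages.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def seal(secret: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed under `secret`."""
    tag = hmac.new(secret, payload, hashlib.sha256).digest()
    return tag + payload


def open_sealed(secret: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload if the tag verifies, otherwise None (drop it)."""
    if len(message) < TAG_LEN:
        return None
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return payload
```

A peer that merely knows the discovery key, but not the shared secret, cannot produce a message that `open_sealed` accepts, which is the property the plain ephemeral message lacks.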

It would be nice to link to the implementation of extension messages, but we don't have the wire protocol DEP published yet. For reference, the message type is 15 (0x0F). Also the semantics: if ephemeral messages are being used, no other extension messages can be sent unambiguously, and the "scope" of extension messages is to the connection, not to any particular feed or channel (though I guess it's implicitly tied to the first/primary feed that was used to initiate the connection).

Of the above, the only thing i'd consider blocking is an additional security/privacy section. Apologies for not noticing and requesting earlier; I can draft something if you want.

@pfrazee

Member Author

pfrazee commented Jun 27, 2018

@bnewbold I'm happy to write it, and happy for you to! I'd probably steal what you wrote there either way. Just LMK

@bnewbold

Contributor

bnewbold commented Jun 27, 2018

If you're up for writing it please do, and feel free to take anything from the above.

@pfrazee

Member Author

pfrazee commented Jul 2, 2018

@bnewbold glad you suggested that section, it was needed. Ready for review.

@pfrazee pfrazee referenced this pull request Jul 6, 2018

Closed

Upcoming Meeting Agenda - 4th July 2018 #25

0 of 6 tasks complete
@bnewbold

Contributor

bnewbold commented Jul 6, 2018

We should discuss in the WG meeting, but after thinking about it more I think the above security issue is serious enough that it feels like we're handing developers a foot-gun by publishing this as a DEP, even a Draft. This doesn't feel "safe by default": there is a large burden on application developers and users to understand the security/privacy semantics.

@pfrazee

Member Author

pfrazee commented Jul 6, 2018

We decided to withdraw this proposal. The WG was uncomfortable giving this spec approval given its security & privacy characteristics. Applications are (as always) free to use similar designs in their own extension messages.

@pfrazee pfrazee closed this Jul 6, 2018

@RangerMauve

Contributor

RangerMauve commented Jul 7, 2018

Is the session data proposal still good to go?

@pfrazee

Member Author

pfrazee commented Jul 7, 2018

@bnewbold do you have any feelings about that? It's not fundamentally different in its security & privacy properties.

@RangerMauve

Contributor

RangerMauve commented Jul 13, 2018

Also, if this is being revoked, does that mean the datPeers API in beaker is going away? Would it be possible to keep something like it for sessionData if it's just ephemeral messages that are going away?

@pfrazee

Member Author

pfrazee commented Jul 13, 2018

No, the datPeers API is a go as planned. We decided not to make it an official DEP.

@aral


aral commented Jan 24, 2019

Sorry to revive a closed PR but regarding the privacy concerns, wouldn’t they be addressed by having the ephemeral messages encrypted by the feed key as part of the ephemeral extension protocol itself (instead of leaving it to userland?)

It feels like this would be preferable to every userland application creating its own method of handling ephemeral messaging.

@pfrazee

Member Author

pfrazee commented Jan 25, 2019

@aral AFAIK that is the case -- all messages over the hypercore-protocol are encrypted using the feed key that arranged the connection. The concern is that the feed key is not secret enough for general use. For instance, if the communication were occurring over a popular app's dat.

There's a lot of work planned & under way with the discovery and connection layer. Discovery is shifting to hyperswarm and connection-layer encryption is (last I heard) moving to a NOISE implementation. The reasons for this include:

  • Enabling authenticated dat connections, so that you can specify a network ACL (eg "only sync with aral").
  • Hyperswarm moves discovery onto a DHT, which improves resilience. Hopefully that work will improve overall connection-reliability as well.
  • We have plans to make the discovery+connection layer a separate reusable stack which Beaker will expose as the Peer Socket API. This will make it possible to arrange connections without piggybacking on the replication streams and that should make the connections fit their intended uses a little better in terms of controlling who participates in them.