Problem: The secrecy of Share Link is ambiguous and can lead to false secrecy assumptions. #538

Closed
tristanls wants to merge 1 commit into dat-ecosystem:master from tristanls:master

@tristanls

Solution: Rephrased to not mention secrecy. Also, removed some trailing whitespace from README.

Background:

Seeing the "is secret" claim led me on a chase through the code and DHT literature to figure out how secrecy was achieved in DAT as compared to its lack in other DHT-using implementations I found. I couldn't find any.

I also saw this comment after a discussion about how "secret" the link is #484 (comment), which seemed appropriate. However, I didn't find discussion that re-introduced the "secret" claim.

My understanding is that any swarm node hosting the files already has full access to the files by being a host in the swarm that happened to be close enough to the Share Link. Also, there are documented sybil attacks against DHTs that attempt to grab all the data:

It may very well be the case that I missed something fundamental. I am familiar with DHTs in general, but I may have missed a detail when skimming through the DAT implementation. If that's the case, feel free to close this out.

@pfrazee

pfrazee commented Oct 2, 2016

@mafintosh what's the status on the key hmac, and the connection encryption? Can you point to the implementation?

@tristanls
Author

To clarify the problem I ran into in interpreting the secrecy claim, the main issue for me was the part of "only those you share it with will be able to get the files".

While the Share Link itself may be unguessable (I see the key is a salted sha256 hash of a key pair's public key), it is the secrecy of the data that is ambiguous. I see the private key is used for signing and owner determination, but I don't see it used for encryption.

By the way, this is all fine. The primary use case is sharing and syncing scientific data. So I don't expect the secrecy of the data itself to be a thing.

The problem for me was that going off of "only people with link can get files", it had me wondering if I could store secret data inside DAT without encrypting the data. Removing mentions of secrecy is intended to discourage a mistake such as mine.

As before, if this is distracting or missing the point in some way, feel free to close this out. I think the project is great.

@pfrazee

pfrazee commented Oct 2, 2016

As it's been described to me, the transport layer should be encrypted with the public key. You shouldn't be able to initiate a connection within the swarm unless you have the original public key to handle the encryption. If I'm wrong about that, then yes, the secrecy claim doesn't hold. I haven't reviewed the code to be sure.

@tristanls
Author

tristanls commented Oct 2, 2016

Hmm.. I see..

So to join a swarm we can use signalhub, webrtc-swarm, or discovery-swarm (DHT).

In the signalhub case, the hashed Share Link is used as the swarmKey, which ends up being an "app name" in signalhub. So, "joining the swarm" in this case means "use an unguessable app name in signalhub".

I'm guessing webrtc also uses signalhub, at least for bootstrapping (didn't look).

And then discovery-swarm ends up joining what's configured in datland-swarm-defaults. This mode joins the DHT using the hashed Share Link, but it is hashed again and truncated to SHA1 size before being broadcast to the DHT or announced via DNS. So, by this time, it is a pretty munged and opaque identifier (and it seems a bit more so than in signalhub mode).

This is where secrecy of data v. secrecy of Share Link comes into play (I think). While the Share Link is hashed, then hashed again, the sybil attacks I mentioned in the beginning couldn't care less what the identifier is and would, I assume, go after all identifiers in the DHT. Requesting what's behind this munged and opaque Share Link would give a hint whether or not the data belongs to Dat or not, which would allow for Unvanish-like quick interrogation of nodes to come up with a subset of Dat-only data in the DHT. At this point, I don't think there's anything preventing downloading the data once this subset of Dat-only DHT is gathered.

(btw.. hybrid-swarm is missing link to repo https://www.npmjs.com/package/hybrid-swarm)

update: I learned that archive.discoverKey is a hash of the Share Link, not the Share Link itself. Share Link is the public key. I updated above accordingly.
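
A minimal sketch of the identifier chain described above, for illustration only. It assumes SHA-256 for the second hash and a 20-byte (SHA1-sized) truncation; `announceId` is a hypothetical helper name, and the hash function discovery-swarm actually uses is not confirmed in this thread.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash the (already hashed) discovery key once more and
// truncate to 20 bytes, the size of a SHA1 digest, before announcing it.
// Assumption: SHA-256 is used here; the real hash function may differ.
function announceId(discoveryKey) {
  const digest = crypto.createHash('sha256').update(discoveryKey).digest();
  return digest.subarray(0, 20);
}

// Stand-in for archive.discoverKey (random bytes, not a real Dat key).
const exampleDiscoveryKey = crypto.randomBytes(32);
const id = announceId(exampleDiscoveryKey);
console.log(id.length); // 20
```

Under these assumptions the announced identifier is one-way: a crawler that collects such ids sees that a swarm exists, but cannot recover the Share Link from the id alone.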

@pfrazee

pfrazee commented Oct 2, 2016

> Requesting what's behind this munged and opaque Share Link would give a hint whether or not the data belongs to Dat or not

This is where you lose me. The discovery network (DHT, DNS, SignalHub) should only get you in touch with nodes on the basis of the hashed key. You shouldn't then be able to communicate with those nodes unless you have the original unhashed key, to encrypt and decrypt the traffic with them.

@tristanls
Author

tristanls commented Oct 2, 2016

> The discovery network (DHT, DNS, SignalHub) should only get you in touch with nodes on the basis of the hashed key.

This isn't guaranteed. See second paragraph of section 4 in the Unvanish paper of how to crawl the entire DHT.

> Afterward, you shouldn't be able to communicate with those nodes unless you have the original unhashed key, to encrypt and decrypt the traffic with them.

This is the part I haven't found yet.

@pfrazee

pfrazee commented Oct 2, 2016

> This isn't guaranteed. See second paragraph of section 4 in the Unvanish paper of how to crawl the entire DHT.

Crawling the DHT isn't a compromise to the security design. The keys are hashed; they don't reveal the original content. AFAICT the only question is whether the connection-layer encryption is actually implemented yet.

@tristanls
Author

> Crawling the DHT isn't a compromise to the security design. The keys are hashed; they don't reveal the original content. AFAICT the only question is whether the connection-layer encryption is actually implemented yet.

Agreed, that's the part I haven't learned yet.

@mafintosh
Contributor

@tristanls @pfrazee the connection encryption layer has been implemented from the start and is a core feature of dat. i'm on my way out the door right now but will post more details later. the implementation lives here, https://github.com/mafintosh/hypercore-protocol

@mafintosh
Contributor

Basically when finding other peers dat uses an hmac(datKey, 'hypercore') to publish to the dht/dns-servers. This hmac is only used to find peers sharing the same content as you.

Afterwards when connecting to a potential peer the first message sent is

<hmac><nonce>

This allows the peers to figure out which dat they are sharing. All following messages are encrypted using libsodium with the dat key and the nonce (the nonce is incremented for every message sent), https://github.com/mafintosh/hypercore-protocol/blob/master/index.js#L505-L518.

This means that if you don't have the dat key, but only the hmac, you won't be able to decrypt the actual traffic and thereby the data.
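
The scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hypercore-protocol code: the HMAC is assumed to be HMAC-SHA256 keyed with the dat key over the string 'hypercore', and the cipher is a stand-in (ChaCha20-Poly1305 from Node's `crypto` module) rather than the libsodium primitive the real implementation uses. All function names here are hypothetical.

```javascript
const crypto = require('crypto');

const NONCE_BYTES = 12; // nonce size required by the stand-in cipher below
const TAG_BYTES = 16;   // Poly1305 authentication tag size

// Assumption: HMAC-SHA256 keyed with the dat key over the string 'hypercore'.
function discoveryHmac(datKey) {
  return crypto.createHmac('sha256', datKey).update('hypercore').digest();
}

// First message on a new connection: <hmac><nonce>.
function firstMessage(datKey, nonce) {
  return Buffer.concat([discoveryHmac(datKey), nonce]);
}

// Return a copy of the nonce incremented by one (little-endian counter).
function incrementNonce(nonce) {
  const next = Buffer.from(nonce);
  for (let i = 0; i < next.length; i++) {
    next[i] = (next[i] + 1) & 0xff;
    if (next[i] !== 0) break; // no carry into the next byte
  }
  return next;
}

// Stand-in cipher (NOT libsodium): encrypt with the 32-byte dat key and nonce.
function seal(datKey, nonce, plaintext) {
  const cipher = crypto.createCipheriv('chacha20-poly1305', datKey, nonce, {
    authTagLength: TAG_BYTES,
  });
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([body, cipher.getAuthTag()]);
}

// Decrypt and verify; throws if the key or nonce is wrong.
function open(datKey, nonce, sealed) {
  const body = sealed.subarray(0, sealed.length - TAG_BYTES);
  const tag = sealed.subarray(sealed.length - TAG_BYTES);
  const decipher = crypto.createDecipheriv('chacha20-poly1305', datKey, nonce, {
    authTagLength: TAG_BYTES,
  });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

const datKey = crypto.randomBytes(32); // stand-in for a real dat key
const nonce = crypto.randomBytes(NONCE_BYTES);
const sealed = seal(datKey, nonce, Buffer.from('block 0'));
console.log(open(datKey, nonce, sealed).toString()); // block 0

// A peer that only knows the hmac cannot use it as the decryption key.
try {
  open(discoveryHmac(datKey), nonce, sealed);
} catch (err) {
  console.log('decryption failed without the dat key');
}
```

The point of the sketch is the separation of roles: the hmac is enough to find peers and to tell which dat a connection is about, while the dat key itself is needed to read anything sent afterwards.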

@mafintosh
Contributor

Secrecy and access control are important to us btw and a big use case in science.

@mafintosh mafintosh closed this Oct 2, 2016
@mafintosh
Contributor

Also if you see any problems with the above scheme or have feedback in general I'd love to hear it :)

@tristanls
Author

My last remaining question is regarding the reuse (or not) of the same keypair when encrypting the blocks for transport. From what I can tell, the keypair itself is used for encrypting the blocks directly via libsodium's crypto_secretbox. I don't see forward secrecy implemented in my scan through the code. Did I miss the ECDHE key exchange somewhere? Is it part of libsodium's crypto_secretbox implementation somehow?

@mafintosh
Contributor

Yep you're correct, it isn't forward secure. We had a discussion about that somewhere, I can see if I can dig it up. As I remember, the gist of the conversation was that we wanted the simplest possible scheme and we could compromise on forward security as the protocol only shares static data (or an append-only list of static data) and the same data is shared over all sessions anyway.
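
For contrast, here is a sketch of what an ephemeral (ECDHE-style) exchange could look like. Per the reply above, this is not what Dat does. The sketch assumes X25519 key pairs from Node's `crypto` module and a hypothetical derivation that mixes the ephemeral shared secret with the dat key; `ephemeralKeyPair` and `sessionKey` are made-up names.

```javascript
const crypto = require('crypto');

// Each peer generates a fresh X25519 key pair per connection and discards it
// when the connection ends.
function ephemeralKeyPair() {
  return crypto.generateKeyPairSync('x25519');
}

// Hypothetical derivation: mix the ephemeral shared secret with the dat key,
// so only peers that hold the dat key can compute the session key.
function sessionKey(datKey, myPrivateKey, theirPublicKey) {
  const shared = crypto.diffieHellman({
    privateKey: myPrivateKey,
    publicKey: theirPublicKey,
  });
  return crypto.createHmac('sha256', datKey).update(shared).digest();
}

const datKey = crypto.randomBytes(32); // stand-in for a real dat key
const alice = ephemeralKeyPair();
const bob = ephemeralKeyPair();

// Both sides arrive at the same key after exchanging public keys.
const aliceKey = sessionKey(datKey, alice.privateKey, bob.publicKey);
const bobKey = sessionKey(datKey, bob.privateKey, alice.publicKey);
console.log(aliceKey.equals(bobKey)); // true
```

With a per-session key like this, a passive observer who recorded the traffic and later learned the dat key would still lack the discarded ephemeral private keys. As noted above, the trade-off is less compelling when the same static data is shared over every session anyway.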

@tristanls
Author

Ok, I think I have a model of how it all works now. Thank you ^__^

@mafintosh
Contributor

@tristanls btw noticed that you maintain https://github.com/tristanls/k-bucket. big fan of that module! using it for something new as we speak.

@mafintosh
Contributor

@tristanls also i appreciate security feedback a lot, so please continue to open issues if you think we need to improve things.

@tristanls
Author

> @tristanls btw noticed that you maintain https://github.com/tristanls/k-bucket. big fan of that module! using it for something new as we speak.

Thank you! I'm happy to hear it's useful.

> @tristanls also i appreciate security feedback a lot, so please continue to open issues if you think we need to improve things.

Will do. Thanks to this conversation I have a better understanding of what I want to do with Dat (but no timeline). If/when I go down this path, I'll be happy to continue to scrutinize it and will happily share anything I learn.
