Discussion: Hiding data stored on registries #21

Open
RangerMauve opened this issue Apr 13, 2018 · 2 comments

@RangerMauve
Contributor

RangerMauve commented Apr 13, 2018

Related to #14

Some data should be private and not exposed to any third parties, but people will still need it backed up somewhere (pinned) so that they can be sure they have access to it across devices.

Dat should provide a mechanism for this out of the box.

Some ideas to consider:

  • The easiest approach would be to encrypt the contents of the archive at the application layer
    • Metadata about which files exist and how frequently they are updated could still be exposed
    • Encrypt the file paths, too?
  • Should encryption be at the hyperlog level or the archive level?
    • How would hosts advertise and share the data without knowing the contents if it's at the hyperlog level?
  • How would the flow of sharing this data work?
    • Should dat URLs allow for a "key" field, so that there are two levels of URL to share?
    • dat://encryptionstrategy:encryptionkeyhere@daturlhere/mysecretdata.html
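
To make the URL idea above concrete, here is a minimal sketch of how a client might split such a URL into its secret and public parts. It assumes the strategy and key ride in the standard userinfo position, as in the example URL; the names `EncryptedDatUrl`, `parse_encrypted_dat_url`, and `strip_secret` are hypothetical and not part of any Dat library.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class EncryptedDatUrl(NamedTuple):
    # All field names are illustrative; the proposal does not define them.
    strategy: str  # which encryption scheme the key belongs to
    key: str       # the key as it appears in the URL (encoding unspecified)
    archive: str   # the dat archive key / host part
    path: str      # path of the file inside the archive


def parse_encrypted_dat_url(url: str) -> EncryptedDatUrl:
    """Split dat://strategy:key@archive/path into its components."""
    parts = urlsplit(url)
    if parts.scheme != "dat":
        raise ValueError("not a dat:// URL")
    if parts.username is None or parts.password is None:
        raise ValueError("URL carries no strategy:key pair")
    # Note: urlsplit lowercases the hostname, which is harmless for hex keys.
    return EncryptedDatUrl(
        parts.username, parts.password, parts.hostname or "", parts.path
    )


def strip_secret(url: str) -> str:
    """Return the URL without the strategy:key part, e.g. to give to a host."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}{parts.path}"
```

With this split, the full URL would be shared only with people who should read the data, while the stripped form is what a pinning service would need to see. Keys containing characters such as `/` or `@` would have to be percent-encoded, which this sketch does not handle.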

I wanted to get some feedback and hear others' ideas before working on anything.

To start, I'm going to play around with a wrapper around the DatArchive API that encrypts contents and files. I'll look into using WebCrypto for the actual functionality.
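
One piece such a wrapper would need, if file paths are to be hidden from hosts as well, is a way to turn plaintext paths into opaque names. The sketch below uses a keyed hash (HMAC-SHA256) for that. This is an assumption for illustration, not something the Dat project specifies, and it is not encryption: the mapping is one-way, so the client has to keep its own (ideally encrypted) index to list files. `path_token` and `HiddenPathIndex` are hypothetical names. A real wrapper around DatArchive would be written in JavaScript with WebCrypto; Python is used here only to show the idea.

```python
import hashlib
import hmac


def path_token(path_key: bytes, path: str) -> str:
    """Map a plaintext path to an opaque, deterministic hex name."""
    # Normalize so "/notes/a.md" and "notes/a.md/" yield the same token.
    normalized = "/" + path.strip("/")
    mac = hmac.new(path_key, normalized.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


class HiddenPathIndex:
    """Client-side map from opaque tokens back to plaintext paths.

    Hypothetical helper: in practice this index would itself be stored
    encrypted inside the archive so other devices can recover it.
    """

    def __init__(self, path_key: bytes) -> None:
        self._key = path_key
        self._paths = {}  # token -> plaintext path

    def add(self, path: str) -> str:
        """Register a path and return the name the host would see."""
        token = path_token(self._key, path)
        self._paths[token] = "/" + path.strip("/")
        return token

    def resolve(self, token: str) -> str:
        """Look up the plaintext path for a token (raises KeyError if unknown)."""
        return self._paths[token]
```

A host pinning the archive would then see only 64-character hex names. File sizes, file counts, and update frequency remain visible, so this addresses only part of the metadata concern raised above.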

@bnewbold
Contributor

A few comments:

Encryption and privacy are complex topics with many use cases and threat models; I would be specific about which use cases you want to address and which privacy properties you want to preserve. Some use cases it sounds like you might be describing are:

  • personal device backup (all content from a single device backed up across multiple devices, all controlled/owned by a single person)
  • private shared-folder synchronization and backup for a single individual (like Dropbox with a single user, where there is a third-party hosted "cloud" backup, but the service provider can't see any contents)
  • private collaboration between a fixed set of users without third-party backup (like BitTorrent Sync, where all transfer is peer-to-peer, content is private, access can't be revoked from individuals/devices once granted, and there is no third-party backup)
  • mutual backup between peers (like Tahoe-LAFS)

These use cases have different technical requirements; for example, a mechanism for securely distributing keys to multiple parties (non-trivial!), or even to devices controlled by the same user.

It seems to me that there are two mechanisms that would work with the existing system:

  • for a large institutional/organizational use case, e.g. within a campus or enterprise network, one could run "local" trusted discovery services and/or an independent DHT. Traffic and connections would then only be between internal "trusted" nodes; even the discovery keys and "which peers are interested in which discovery keys" type metadata would not be public.
  • users can use existing backup and encryption tools (like GPG/Keybase, LUKS, NaCl, etc.) to encrypt files/content in a dat archive, and then use dat only for synchronization

What use cases would not be covered by the above?

In terms of implementation, I think adding features at the hypercore level will be much harder to pull off than building a library layer on top of hypercore (perhaps adding an abstraction layer between hypercore and hyperdb?).

@martinheidegger
Contributor

3 participants