
Securing a Dat with an - additional - private key (password) #80

Open
martinheidegger opened this issue Feb 23, 2018 · 7 comments

@martinheidegger

martinheidegger commented Feb 23, 2018

When implementing a backup/public storage (like hashbase or datbase) for DATs, that storage knows the content of the DAT. As far as I understand, the only way right now to make sure the storage does not know what is inside the DAT is to additionally encrypt the files, e.g. by packing the data into a .zip file. The problem with this approach is that it is not at all transparent: the sender needs to know and care about zipping, and so does the recipient. Both parties also need the same zip program installed (funny side note: Japanese users tend to send out Shift-JIS-encoded zip files) and need to know how to use it. Aside from the knowledge and installation issues, it is also a significant amount of overhead if you do this often, and it reduces the comfort of using DAT.

I thought about implementing a transparent-ish wrapper on top of hyperdrive that - instead of writing directly to the stream - writes everything into a .dat-encrypt.zip file that is encrypted with a password, and that, upon receiving a DAT containing only a .dat-encrypt.zip file, automatically decrypts it.

This approach would be sound, but unfortunately DAT - as it is built right now - only lets you upload/download the entire zip in one go. That means any additional file would trigger a complete re-upload and re-download - consuming vast amounts of bandwidth 😟 and sacrificing a big part of the value of having DATs. Maybe that is necessary in order to ensure actual privacy of the content.

This all leaves me with a few questions:

  • Are there other ways to achieve this?
  • Should the encryption layer be implemented?
  • Should this encryption be part of hyperdrive? Of dat-node? Or an implementation on top?
@joehand

joehand commented Feb 23, 2018

May be of interest: https://github.com/jayrbolton/dat-wot

@martinheidegger
Author

martinheidegger commented Feb 26, 2018

@joehand Thank you for the hint, but from what I can tell dat-wot only manages who knows about which dat link and makes sure that every user only gets to see the dat links he/she is supposed to see. That is certainly a nice workflow and concept, but it doesn't make it possible for an intermediary to store/cache encrypted data.

@creationix

How private do you want your data? For example, you could encrypt the file contents, but not the file names or directory structure. Each file could be encrypted with a key derived by hashing the master key with the file's path (you don't want to use the same key for all files).
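
A minimal sketch of that per-file-key idea, using only Node's built-in crypto (`masterKey`, `fileKey`, and `encryptFile` are illustrative names, not part of any existing Dat/hyperdrive API; the master key is assumed to be a 32-byte secret):

```js
const crypto = require('crypto')

// Derive a per-file key by HMACing the file path with the master key,
// so identical contents under different paths encrypt differently.
function fileKey (masterKey, path) {
  return crypto.createHmac('sha256', masterKey).update(path).digest()
}

function encryptFile (masterKey, path, plaintext) {
  const key = fileKey(masterKey, path)
  const iv = crypto.randomBytes(12) // fresh nonce per write
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()])
  // layout: iv (12) || auth tag (16) || ciphertext; the path itself stays readable
  return Buffer.concat([iv, cipher.getAuthTag(), body])
}
```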

Another option is to store the data in some container format that consists of multiple files. I've stored files in a git repo (blob and tree objects), which maps to a flat list of hashes. You only need to point to the root hash to read the tree. The real filenames stay quite private since the tree objects are also encrypted. Here you could hash the master key with the content hash.
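
Roughly, that blob/tree variant could look like this (again just a sketch with assumed names; `objects` stands in for the flat list of hash-named files stored in the drive):

```js
const crypto = require('crypto')

// Encrypt an object (blob or serialized tree) with a key derived from the
// master key and the object's content hash, then store it under that hash.
function storeObject (masterKey, objects, content) {
  const hash = crypto.createHash('sha256').update(content).digest('hex')
  const key = crypto.createHmac('sha256', masterKey).update(hash).digest()
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const body = Buffer.concat([cipher.update(content), cipher.final()])
  objects.set(hash, Buffer.concat([iv, cipher.getAuthTag(), body]))
  return hash
}

// Real filenames only live inside tree objects, which are encrypted too,
// so the drive exposes nothing but opaque hash-named blobs.
function storeTree (masterKey, objects, entries /* { 'file.txt': blobHash, ... } */) {
  return storeObject(masterKey, objects, Buffer.from(JSON.stringify(entries)))
}
```

Only the root tree's hash needs to be passed around (or stored at a well-known path) for a reader holding the master key to walk the whole structure.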

Another option is to store the files in a filesystem image (like ext4) and store the image in 4k blocks (named by index). You can use a stream cipher, since the blocks are ordered, if you plan on extracting them all at once; or hash the block index with the master key if you want random access, using FUSE or something to mount the block device.
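
And a sketch of the block-image variant with random access: each 4 KiB block gets its own key derived from the master key and the block index, and the IV is stored with the block so a rewritten block never reuses a keystream (all names here are illustrative, not an existing API):

```js
const crypto = require('crypto')
const BLOCK_SIZE = 4096

// Per-block key: hash the block index into the master key, so any single
// block can be fetched and decrypted on its own without the rest of the image.
function blockKey (masterKey, index) {
  return crypto.createHmac('sha256', masterKey).update('block:' + index).digest()
}

function encryptBlock (masterKey, index, block /* Buffer of BLOCK_SIZE bytes */) {
  const iv = crypto.randomBytes(16) // stored as a prefix
  const cipher = crypto.createCipheriv('aes-256-ctr', blockKey(masterKey, index), iv)
  return Buffer.concat([iv, cipher.update(block), cipher.final()])
}

function decryptBlock (masterKey, index, stored) {
  const decipher = crypto.createDecipheriv('aes-256-ctr', blockKey(masterKey, index), stored.slice(0, 16))
  return Buffer.concat([decipher.update(stored.slice(16)), decipher.final()])
}
```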

Also, depending on how custom you want to go, only hyperdrive needs to be used for the dat protocol to allow syncing data around. Hashbase can store any hyperdrive-based dataset; it just won't render over HTTP if it's not plain hyperdrive on top.

I've got lots of ideas, but need more information about your requirements.

@martinheidegger
Author

Oh, this is really inspiring!

The solution I am imagining right now - based on what you wrote - would be to have an encrypted file table .dat-encrypted that grows in 4k blocks (to avoid revealing the exact size of the files in the dat). Every time files are written, their information is added to .dat-encrypted and the files themselves get encrypted into blocks 0, 1, 01, etc. Old blocks that no longer show up in the current table can be deleted, and each block is encrypted on its own. This way streaming, in a sense, could still work.

@creationix

I don't quite understand what you're proposing, but it sounds like it might work.

@creationix

I'm just going to link to this old experiment of mine to show some of what I meant: https://github.com/creationix/test-workspace

@martinheidegger
Author

Let me try to rephrase: instead of writing into .dat-encrypted, the data gets written into block files. Those blocks all have the same size and are encrypted, so unless you know the meta-information you can't figure out which files are where, but you can still download parts. The meta-data gets encrypted as well and is written under a known key. So: with every new version you need to download the meta-data, but you can then decide which parts to download.
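
To make that concrete, a sketch of the read path (the `.dat-encrypted` table layout, `readBlob`, and the iv || tag || ciphertext format are assumptions of this sketch, not existing hyperdrive APIs; the same 32-byte master key is reused for the table and the blocks to keep the example short):

```js
const crypto = require('crypto')

// Decrypt the assumed iv (12) || auth tag (16) || ciphertext layout.
function decrypt (key, buf) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, buf.slice(0, 12))
  decipher.setAuthTag(buf.slice(12, 28))
  return Buffer.concat([decipher.update(buf.slice(28)), decipher.final()])
}

// `readBlob(name)` is whatever fetches a file out of the hyperdrive by name;
// the scheme only needs "give me the bytes stored under this opaque name".
async function readFile (readBlob, masterKey, wantedPath) {
  const table = JSON.parse(decrypt(masterKey, await readBlob('.dat-encrypted')).toString())
  const entry = table[wantedPath] // e.g. { blocks: ['blocks/0', 'blocks/1'], length: 5120 }
  const parts = []
  for (const name of entry.blocks) {
    parts.push(decrypt(masterKey, await readBlob(name)))
  }
  // fixed-size blocks are padded, so trim back to the real file length
  return Buffer.concat(parts).slice(0, entry.length)
}
```

With every new version a reader only pays for the table plus the blocks they actually want, which is the sparse-download behaviour the zip approach loses.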
