Merkle File Transfer - Efficient and pain free file uploads, smart chunking for processing queues #2

Open
diasdavid opened this Issue Jan 2, 2016 · 0 comments


diasdavid commented Jan 2, 2016

After running into a considerable number of pain points when uploading a large file (e.g. a video to YouTube or Vimeo), such as seeing the upload canceled because the external HDD suddenly unmounted or the Internet connection dropped and then having to restart from scratch, I started looking into whether there is a better way to do this file upload thing.

I didn't have to look too far, as IPFS (and pretty much any other Merkle'lised data transfer) handles chunking and resuming an upload pretty well! However, IPFS has a large scope and broad goals. For this specific case, we are focused only on bitswap, the exchange protocol found inside IPFS, used for a 1:1 unidirectional transfer.

Merkle File Transfer (MFT)

The goals of an MFT are:

  • Enable file upload resuming.
  • Cryptographically verify the integrity of the file contents (during transfer and long after transfer).
  • Chunk the file in a format that suits post-processing.
  • Avoid re-uploading the same chunks.
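
To illustrate the integrity goal, here is a minimal sketch of a binary Merkle root computed over fixed-size chunks. It is not the MerkleDAG format used by IPFS; the names `chunk_bytes` and `merkle_root`, the tiny chunk size, and the choice to duplicate the last node on odd levels are assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def chunk_bytes(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(chunks: list[bytes]) -> str:
    """Hash every chunk, then hash pairs of hashes until one root remains."""
    level = [hashlib.sha256(c).digest() for c in chunks]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


sender_root = merkle_root(chunk_bytes(b"hello merkle world"))
receiver_root = merkle_root(chunk_bytes(b"hello merkle world"))
print(sender_root == receiver_root)
```

If the sender publishes the root, the receiver can recompute it from the chunks it holds, during the transfer or long afterwards, and any changed byte produces a different root.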

Traditional File Uploading

Traditional file uploads (FTP, SFTP, HTTP, etc.) have a very simple algorithm:

  1. dialer opens a connection to the listener
  2. starts file upload (can include file type, encoding and other metadata as headers or encoded in the file)
  3. if connection breaks, jump back to 1

This makes it extremely simple, but also extremely wasteful of resources: if a connection drops, even if only one bit is missing, the whole file has to be transferred again.
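
The cost of restarting can be put into numbers with a small back-of-the-envelope sketch. Both functions are hypothetical helpers, and the model is an assumption: `drops` lists how many bytes each failed attempt managed to send, and in the chunked case only fully received chunks survive a drop.

```python
def bytes_sent_with_restart(file_size: int, drops: list[int]) -> int:
    """Every failed attempt is thrown away, then the whole file is sent again."""
    return sum(drops) + file_size


def bytes_sent_with_resume(file_size: int, drops: list[int], chunk_size: int) -> int:
    """Complete chunks are kept; only the partial chunk in flight is wasted."""
    wasted = sum(d % chunk_size for d in drops)
    return file_size + wasted


# A 1000-byte file, with drops after 400 bytes and then after 250 more bytes.
print(bytes_sent_with_restart(1000, [400, 250]))      # 1650
print(bytes_sent_with_resume(1000, [400, 250], 100))  # 1050
```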

Merkle'lised File Uploading

With an MFT, you chunk and import a file into a MerkleDAG format (ref: jbenet/random-ideas#20) and send the list of chunks. With that list, the receiver can ask for the specific chunks it is still missing and avoid transferring chunks it already has.
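
The exchange described above can be sketched roughly as follows. This is an in-memory illustration, not bitswap or any IPFS API: `make_manifest`, `Receiver`, and `send_missing` are hypothetical names, the manifest is a flat list of SHA-256 hashes rather than a real MerkleDAG, and `limit` only simulates a connection that drops after a number of chunks.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(data: bytes, size: int) -> tuple[list[str], dict[str, bytes]]:
    """Return the ordered list of chunk hashes and a hash -> chunk lookup."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    manifest = [sha256_hex(c) for c in chunks]
    return manifest, dict(zip(manifest, chunks))


class Receiver:
    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # chunks received so far, keyed by hash

    def missing(self, manifest: list[str]) -> list[str]:
        """Hashes from the manifest that are not stored yet, without repeats."""
        wanted: list[str] = []
        for digest in manifest:
            if digest not in self.store and digest not in wanted:
                wanted.append(digest)
        return wanted

    def put(self, digest: str, chunk: bytes) -> None:
        """Store a chunk only if its content matches the announced hash."""
        if sha256_hex(chunk) != digest:
            raise ValueError("chunk does not match its hash")
        self.store[digest] = chunk

    def assemble(self, manifest: list[str]) -> bytes:
        return b"".join(self.store[d] for d in manifest)


def send_missing(manifest, chunks_by_hash, receiver, limit=None) -> int:
    """Send only what the receiver asks for; stop early to simulate a drop."""
    sent = 0
    for digest in receiver.missing(manifest):
        if limit is not None and sent >= limit:
            break  # simulated connection drop
        receiver.put(digest, chunks_by_hash[digest])
        sent += 1
    return sent


data = b"aaaabbbbccccaaaa"
manifest, by_hash = make_manifest(data, 4)
receiver = Receiver()
send_missing(manifest, by_hash, receiver, limit=2)  # first attempt drops early
send_missing(manifest, by_hash, receiver)           # second attempt resumes
print(receiver.assemble(manifest) == data)
```

In this run the second attempt sends a single chunk: two arrived before the drop, and the repeated `aaaa` chunk shares a hash with the first one, so it is never sent twice.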

Merkle File Processing Queues

One of the use cases that can benefit greatly from MFT is processing queues. Depending on the type of data, the chunking algorithm can be made smarter about how it breaks the file into parts, so that the receiver can start processing chunks as soon as it receives them (e.g. chunking a video by key frames so that effects can be applied).
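
A minimal sketch of content-aware chunking, under loud assumptions: the byte string `KEY` is a hypothetical stand-in for a key-frame boundary (finding real key frames requires parsing the video container), and `chunk_at_markers` and `process_stream` are illustrative names. Every chunk after the first starts at a marker, so each one can be processed on its own as soon as it arrives.

```python
from typing import Iterable, Iterator

MARKER = b"KEY"  # hypothetical stand-in for a key-frame boundary


def chunk_at_markers(data: bytes, marker: bytes = MARKER) -> list[bytes]:
    """Split data so that each chunk after the first begins at a marker."""
    chunks = []
    start = 0
    while True:
        nxt = data.find(marker, start + 1)  # next boundary after the current start
        if nxt == -1:
            chunks.append(data[start:])
            break
        chunks.append(data[start:nxt])
        start = nxt
    return [c for c in chunks if c]  # drop the empty chunk produced by empty input


def process_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Process chunks one at a time, without waiting for the whole file."""
    for chunk in chunks:
        yield chunk.upper()  # placeholder for a real effect applied per segment


chunks = chunk_at_markers(b"KEYaaaKEYbbKEYc")
for processed in process_stream(chunks):
    print(processed)
```

Because `process_stream` is a generator, it can consume chunks from a network receive loop just as well as from a list, which is the property a processing queue needs.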

