"Don't put all your bits in one basket!"
Consider the following scenario:
Three friends share a secret, such as a cryptographic key. They all wish to hold this key, so each of them makes their own copy of it. Now that each friend has one of the three identical keys, they can all access the data it protects. However, if one of these friends turns rogue or is robbed of the key, the secret is compromised.
The next time the three friends share a secret, they each make their own unique key, and all three different keys are needed to unlock the secret. If one of the friends turns rogue, gets robbed or otherwise expires, they instead all lose access to the secret altogether!
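The second scenario is plain n-of-n secret splitting. The minimal sketch below illustrates it with XOR; the function names `split_all_required` and `combine` are hypothetical and not part of BitFrag. Every share on its own is uniformly random, yet losing any single share makes the secret unrecoverable.

```python
import secrets


def split_all_required(secret: bytes, n: int = 3) -> list[bytes]:
    """Split `secret` into n shares; all n are required to recover it (hypothetical helper)."""
    # n - 1 shares are pure random bytes of the same length as the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last share is the secret XORed with every random share.
    last = bytearray(secret)
    for share in shares:
        for i, b in enumerate(share):
            last[i] ^= b
    shares.append(bytes(last))
    return shares


def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together; only the complete set yields the secret."""
    result = bytearray(len(shares[0]))
    for share in shares:
        for i, b in enumerate(share):
            result[i] ^= b
    return bytes(result)


key = b"correct horse battery staple"
shares = split_all_required(key)
print(combine(shares) == key)
```

Dropping any one element from `shares` before calling `combine` yields random-looking bytes instead of the key. That is exactly the all-or-nothing failure mode described above.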
BitFrag solves this.
BitFrag is a library and utility for splitting data into a set of fragments (called a cluster). The twist is that when a cluster is created, a set of parity fragments is generated in addition to the data fragments. This allows some of the fragments to be lost, tampered with or otherwise corrupted without compromising the cluster.
Details and terminology
The original data is fragmented (or fragged) into fragments. A fragment is either a portion of the original data or a parity fragment (a special sum of the other fragments). A set of fragments that corresponds to the same original data is called a cluster. A cluster can regenerate the original data if sufficient fragments are provided. Fragments can be exported, distributed and imported using a data structure standard called the fragment format.
BitFrag works with arbitrary data, but some specific applications that come to mind are:
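To make the terminology concrete, here is a minimal sketch of a cluster built from k data fragments plus one parity fragment. It assumes the "special sum" is a bytewise XOR, which is the simplest possible parity and not necessarily what BitFrag uses. The names `frag` and `regenerate` are hypothetical, and the original data length is passed in explicitly instead of being read from the (preliminary) fragment format.

```python
from functools import reduce
from typing import Optional


def _xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def frag(data: bytes, k: int) -> list[bytes]:
    """Hypothetical: split data into k data fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")       # pad so all fragments are equal in size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(_xor, fragments)           # assumed "special sum": XOR of all data fragments
    return fragments + [parity]


def regenerate(cluster: list[Optional[bytes]], length: int) -> bytes:
    """Hypothetical: rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(cluster) if f is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity fragment can only recover one lost fragment")
    if missing:
        # XOR of all surviving fragments (parity included) equals the missing one.
        present = [f for f in cluster if f is not None]
        cluster = list(cluster)
        cluster[missing[0]] = reduce(_xor, present)
    # Join the data fragments (everything except the trailing parity) and strip the padding.
    return b"".join(cluster[:-1])[:length]


data = b"Don't put all your bits in one basket!"
cluster = frag(data, 4)
damaged = list(cluster)
damaged[2] = None                              # lose one data fragment
print(regenerate(damaged, len(data)) == data)
```

A single XOR parity fragment only tolerates one lost fragment and cannot tell which fragment was tampered with. Tolerating several losses or detecting corruption would require more parity fragments, for example from an erasure code, together with per-fragment checksums.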
- Cryptographic key management.
- Long-term storage of critical data.
- Fault/tamper tolerant P2P communication.
- Network traffic obfuscation.
This project arose out of a simple idea and grew into a bigger picture. The code is still at an early stage:
- The final algorithm is not yet written. For now, a much simpler version of the algorithm is implemented as a proof-of-concept prototype.
- The code structure is minimal.
- The fragment format is preliminary.

Planned work:
- Automate adequate testing.
- The code structure should be refactored to better allow extension and usage of the code as an API.
- Additional modules for some of the less trivial common operations.
- Add consensus check with cross-regeneration.
- Stream mode (as opposed to only block mode as of now).
- Performance optimizations when dealing with large data sets.
- Implement the fully functional algorithm.