This document describes version 0
of the paperback schema, and the underlying
design and motivations of the cryptosystem.
Paperback is designed to store a person's secret data in a manner such that it can only be recovered by a quorum of people coming together.
It is a necessary part of this model that this recovery can occur without the original person present, as one use-case of paperback is to act as a person's will. If this is not what the person wants, they should encrypt their document with a secret passphrase before providing it to paperback.
There are two primary documents in paperback, both of which are required in order to facilitate a recovery event:
- The "main document" stores an encrypted copy of the person's secret data. It is up to the user whether they wish to only have one copy of this document, or many.
- "Key shards" are individual documents which store a portion of the secret key used to encrypt the "main document" (in a zero-information manner). A unique "key shard" is provided to each person who can form the quorum. `N` key shards are required to recover the secret key for the "main document".
For the purpose of this document, the set of people who possess a "key shard" are called "key-holders". When `N` people come together to perform a recovery event, that group is called a "quorum" (with each member being called a "quorum member"). In this document, `K` refers to the total number of key shards (including any key shards created after the initial backup).
For simplicity, we assume that each key-holder only has one key shard in their possession; a key-holder holding multiple key shards is counted as multiple people.
We assume that all of the cryptographic primitives used are computationally hard to break, and are thus trustworthy enough for our use. However, if the cryptographic primitives used in this design are found to be compromised, we can migrate to newer ones.
We also assume that the original user made an accurate decision about the quorum size (that no more than `N-1` people will either be compromised or collaborate to attack the secret data). However, paperback must be secure even with up to `N-1` malicious quorum members.
We assume that the quorum is using a real copy of paperback, as otherwise we cannot guarantee that the software they use will not produce a fake result. And while it may be possible to prove that the result is fake using the real paperback software, that presupposes that they have access to the real paperback.
Paperback must be able to detect if malicious quorum members have produced fake key shards or fake documents. It must not allow fewer than `N` people to recover the secret. It should also protect against `N` or more malicious key-holders creating a fake document that other (honest) key-holders then cannot detect is a fake. However, in order for this detection to work, at least one honest key-holder must be confident that their key shard has not been modified or replaced with a forged copy (by a malicious quorum).
The main cryptographic primitives used in this design are:
- `ZBase32_Encode(b)` encodes the bytes `b` with z-base-32.
- `ZBase32_Decode(s)` decodes the z-base-32 string `s`.
- `CSPRNG(n)` generates `n` bytes of cryptographically-random data.
- `AEAD_GenKey()` generates a random and secure cryptographic key, usable for `AEAD_Enc` and `AEAD_Dec`.
- `AEAD_Enc(K, nonce, plain, ad)` symmetrically encrypts the plain-text `plain` with associated data `ad` with key `K` and nonce `nonce`.
- `AEAD_Dec(K, nonce, cipher, ad)` symmetrically decrypts the cipher-text `cipher` with associated data `ad` with key `K` and nonce `nonce`.
- `Hash(data)` produces a cryptographic hash of `data`, in the multihash format.
- `Sig_GenPrivKey()` generates a random and secure cryptographic key, usable for `Sig_Sign` and `Sig_Verify`.
- `Sig_GetPubKey(K_priv)` calculates the corresponding public key for the private key `K_priv` generated by `Sig_GenPrivKey`. The value is prefixed with the appropriate multicodec code.
- `Sig_Sign(K_priv, data)` signs the bytes `data` using the private key `K_priv`. The value is prefixed with the appropriate multicodec code.
- `Sig_Verify(K_pub, data, sig)` verifies that the signature `sig` by the public key `K_pub` of the bytes `data` is valid (`K_pub` and `sig` have the appropriate multicodec code prefix).
- `Secret_Share(secret, n, k)` generates `k` shards of the secret data `secret`, such that `n` quorum members are required to reconstruct the secret.
- `Secret_Recover(shards, n)` recovers the secret sharded into the given `shards` with quorum size `n`.
- `Secret_Expand(shards, n, k2)` generates an additional `k2` shards by recovering the necessary information from `shards` with quorum size `n`. These shards should be statistically improbable to coincide with any other shards previously generated.
`AEAD_Enc` and `AEAD_Dec` are implemented using the AEAD construction of ChaCha20-Poly1305 as defined in RFC 8439.
`Hash` is BLAKE2b-256 as defined in RFC 7693. The multihash prefix for `Hash` is `{0xa0 0xe4 0x02 0x20}`.
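A minimal sketch of this `Hash` primitive, using Python's standard-library BLAKE2b with a 32-byte digest (the prefix is the varint-encoded multihash code for blake2b-256 followed by the digest length, `0x20`):

```python
import hashlib

# Multihash prefix from the spec: varint(blake2b-256 code) + length byte 0x20.
MULTIHASH_PREFIX = bytes([0xA0, 0xE4, 0x02, 0x20])

def multihash(data: bytes) -> bytes:
    """BLAKE2b-256 digest of `data`, in multihash format."""
    digest = hashlib.blake2b(data, digest_size=32).digest()
    return MULTIHASH_PREFIX + digest
```

The result is always 36 bytes: a 4-byte prefix plus the 32-byte digest.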
`Sig_GetPubKey`, `Sig_Sign`, and `Sig_Verify` are implemented using Ed25519. The multicodec code prefixes for each function are as follows:

- `Sig_GetPubKey` has a prefix of `{0xed 0x01}`.
- `Sig_Sign` has a prefix of `{0xef 0x01}`.
`Secret_Share`, `Secret_Recover`, and `Secret_Expand` are implemented using Shamir Secret Sharing in GF(2^32) (to allow for smaller chances of shard collisions if the x-values are randomly chosen -- but a larger field such as GF(2^64) would be even better). At the moment, `Secret_Recover` and `Secret_Expand` are implemented using Lagrange polynomial interpolation, but more efficient methods (the barycentric form of the Lagrange polynomials, Vandermonde matrices, Sylvester's formula, Neville's algorithm) that provide the same security guarantees (and accurate results) are also acceptable.
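To make the sharing scheme concrete, here is a sketch of `Secret_Share` and `Secret_Recover` using Lagrange interpolation. Note one deliberate simplification: for brevity this uses the prime field GF(2^127 - 1) rather than the GF(2^32) construction the document specifies; the polynomial and interpolation logic are the same in either field.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; stand-in for the spec's GF(2^32)

def secret_share(secret: int, n: int, k: int):
    """Split `secret` into k shards; any n of them recover it."""
    # Random polynomial of degree n-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(n - 1)]
    shards = []
    for x in range(1, k + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation
            y = (y * x + c) % P
        shards.append((x, y))
    return shards

def secret_recover(shards, n):
    """Recover the secret (the polynomial evaluated at x = 0)."""
    shards = shards[:n]
    secret = 0
    for i, (xi, yi) in enumerate(shards):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

Any subset of `n` distinct shards yields the same secret; fewer than `n` shards reveal nothing, since every candidate secret is consistent with some degree-(n-1) polynomial.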
AEAD_GenKey
and Sig_GenPrivKey
are both implemented using the relevant
secure randomness source provided by the operating system (depending on the
algorithm scheme, this may require different derivation algorithms -- just use
the one provided by the relevant cryptosystem).
The data described in this document will need to be serialised in some form for practical reasons (most QR code readers don't support binary encoding, and some features of these documents will have a human-readable fallback). All of the data operations in this document are applied to the deserialised form of the data.
Let `secret` be the secret data which the user wants to store. Let `n` be the size of the quorum the user selected, and `k` be the number of key shards to be created (assuming `k >= n`).
We first generate all of the necessary keys and shards:
```
K_doc = AEAD_GenKey()
K_id_priv = Sig_GenPrivKey()
K_id_pub = Sig_GetPubKey(K_id_priv)

// "Sealed documents" are documents where creating new shards after the initial
// backup (as described in the "Expansion" section below) is prohibited. This
// prohibition is done by not including K_id_priv in the shared secret -- making
// it impossible for new signatures to be created after the initial backup.
secret = K_doc
if !sealed_document
    secret ||= K_id_priv
else
    secret ||= [zeroes]

shards = Secret_Share(secret, n, k)
```
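The sealing trick above can be sketched in a few lines. The 32-byte key sizes are assumptions for illustration (a 256-bit AEAD key and Ed25519 seed); the point is that a sealed document substitutes zeroes for `K_id_priv`, so the shared secret keeps a fixed size but can never be used to sign new shards.

```python
KEY_SIZE = 32  # assumed size of both K_doc and K_id_priv

def build_shared_secret(k_doc: bytes, k_id_priv: bytes, sealed: bool) -> bytes:
    """Concatenate the document key with either the identity key or zeroes."""
    if sealed:
        # Sealed: the identity private key is replaced with zeroes and
        # effectively destroyed, so no new shards can ever be signed.
        return k_doc + b"\x00" * KEY_SIZE
    return k_doc + k_id_priv
```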
Then the main document is formed (treat it as a JSON object):
```
doc_nonce = CSPRNG(NONCE_SIZE)

Doc[meta] = n
Doc[body] = doc_nonce || AEAD_Enc(K_doc, doc_nonce, secret, Doc[meta] || K_id_pub)
Doc[identity] = K_id_pub || Sig_Sign(K_id_priv, Doc[meta] || Doc[body] || K_id_pub)
Doc[chksum] = Hash(Doc[meta] || Doc[body] || Doc[identity])

// The full document ID is content-defined.
doc_id = Doc[chksum]

// Used for human identification of the document. Will be printed on the main
// document but isn't actually stored in the data (for obvious reasons -- you
// can't store the checksum of the data inside the data).
doc_human_id = ZBase32_Encode(doc_id[:6])
```
Then the key shards are formed (treat them as JSON-like objects). For each `shard` in `shards`:
```
K_shard = AEAD_GenKey()
shard_nonce = CSPRNG(NONCE_SIZE)

// The x-coordinate of the SSS shard must be unique for each shard -- if two
// shards have the same x co-ordinate they cannot be used together. Thus we can
// use it as a way for human-verifiable uniqueness of shards as well as a loose
// "content-defined" shard ID for human identification.
shard_id = shard.x
// Used for human identification of the key shard.
shard_human_id = ZBase32_Encode(shard_id)

Shard[meta] = Doc[chksum]
Shard[body] = shard_nonce || AEAD_Enc(K_shard, shard_nonce, n || shard, Shard[meta] || K_id_pub)
Shard[identity] = K_id_pub || Sig_Sign(K_id_priv, Shard[meta] || Shard[body] || K_id_pub)
Shard[chksum] = Hash(Shard[meta] || Shard[body] || Shard[identity])

// The key is included, to allow for users to optionally store the key
// separately (or keep it with the key shard to make this level of security a
// no-op).
Shard[key] = K_shard
```
During recovery, we may discover that the document or one or more key shards do not have consistent values for `K_id_pub` or `Doc[chksum]`. This indicates that at least one of the documents has been forged, but it is not possible to be sure which documents have been forged. However, it is possible to display to users (possibly graphically) which documents are consistent with each other, and thus perhaps allow for a social solution to the conflict. In all such cases, the recovery operation must abort.
When recovering, first the main document is verified:
```
// Ensure the checksum and signature are valid.
if Doc[chksum] != Hash(Doc[meta] || Doc[body] || Doc[identity])
    abort "Main document checksum mismatch -- bad data."
{doc_id_pub, doc_id_sig} = Doc[identity]
if not Sig_Verify(doc_id_pub, Doc[meta] || Doc[body] || doc_id_pub, doc_id_sig)
    abort "Main document signature mismatch -- forgery detected."
{doc_n} = Doc[meta]
```
Then each of the shards is verified. For each `Shard` in the given set:
```
// Ensure the checksum and signature are valid.
if Shard[chksum] != Hash(Shard[meta] || Shard[body] || Shard[identity])
    abort "Shard checksum mismatch -- bad data."
{shard_id_pub, shard_id_sig} = Shard[identity]
if not Sig_Verify(shard_id_pub, Shard[meta] || Shard[body] || shard_id_pub, shard_id_sig)
    abort "Shard signature mismatch -- forgery detected."
{shard_doc_chksum} = Shard[meta]

// All shards and documents must use the same public identity keys.
if shard_id_pub != doc_id_pub
    abort "Inconsistent identities -- forgery detected."
// The shard must agree on the document's checksum.
if shard_doc_chksum != Doc[chksum]
    abort "Document checksum doesn't match shard's copy -- forgery detected."

// Decrypt the shard.
{shard_nonce, shard_ciphertext} = Shard[body]
{shard_n, shard}, error = AEAD_Dec(Shard[key], shard_nonce, shard_ciphertext, Shard[meta] || shard_id_pub)
if error
    abort "Error encountered during decryption -- possible forgery detected."

// All shards and documents must agree on basic metadata.
if shard_n != doc_n
    abort "Inconsistent metadata -- forgery detected."
// All shards must have a unique shard_id.
shard_id = shard.x
if shard_id == any other shard_id
    abort "Duplicate or fake duplicate shard -- possible forgery detected."

// All good!
shards.append(shard)
```
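The cross-shard consistency checks above can be sketched as a single pass over the parsed shards. The dictionary field names here (`id_pub`, `doc_chksum`, `x`) are illustrative, not a wire format; the checks mirror the identity, checksum, and uniqueness aborts described above.

```python
def check_shard_set(shards: list) -> None:
    """Abort (raise) if the shard set is internally inconsistent."""
    # All shards must use the same public identity key.
    if len({s["id_pub"] for s in shards}) != 1:
        raise ValueError("Inconsistent identities -- forgery detected.")
    # All shards must agree on the document's checksum.
    if len({s["doc_chksum"] for s in shards}) != 1:
        raise ValueError("Shards don't agree on document checksum.")
    # All shards must have a unique x-coordinate.
    xs = [s["x"] for s in shards]
    if len(set(xs)) != len(xs):
        raise ValueError("Duplicate or fake duplicate shard.")
```

Note that passing these checks only shows the shards are consistent with each other, not that they are genuine; as discussed above, forgery by a full malicious quorum cannot be ruled out without at least one trusted shard.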
Once all the shards have been verified against the document, and no errors have been found, the recovery process can be completed:
```
// Get the secret data.
{K_doc, K_id_priv}, error = Secret_Recover(shards, doc_n)
if error
    abort "Error encountered during secret recovery."

// Effectively, check if the document is sealed.
if K_id_priv is not all zeroes
    // Make sure that K_id_priv matches. At this point, we've already
    // revealed the secrets but it doesn't hurt to double-check.
    if Sig_GetPubKey(K_id_priv) != doc_id_pub
        abort "Inconsistent identities -- forgery detected."

// Decrypt the main document.
{doc_nonce, doc_ciphertext} = Doc[body]
secret, error = AEAD_Dec(K_doc, doc_nonce, doc_ciphertext, Doc[meta] || doc_id_pub)
if error
    abort "Error encountered during decryption -- possible forgery detected."
```
Let `k2` be the number of additional shards needed.
In certain circumstances, it is necessary to be able to create new shards. However, the most trivial way (creating a new paperback backup) means that any old shards become invalidated. This is problematic because some key-holders may not be easily reachable to replace their existing key shard.
Thus, paperback supports creating new key shards which are compatible with (but distinct from) any others. In order to perform this operation a quorum is required, but the main document is not necessary. Note that this operation allows the quorum to produce a virtually unlimited number of key shards; this is an unavoidable mathematical consequence of the construction of Shamir Secret Sharing.
However, users can disable this feature by creating a "sealed document", where `K_id_priv` is deleted after the initial backup and thus no new shards can be signed with that key. Note that this only restricts new shards from being accepted as "valid" when the document is being recovered -- mathematically, the new shards can still be used to recover the secret (the sealing feature is intended to ensure that during recovery, such shards would be detected as not being original -- whether that is useful depends on what application paperback is being used in).
The process for verifying the shards is very similar to the recovery scenario, except that the shard metadata is compared with other shards rather than against the main document. The same caveat about forgery detection applies here too. For each `Shard` in the given set:
```
// Ensure the checksum and signature are valid.
if Shard[chksum] != Hash(Shard[meta] || Shard[body] || Shard[identity])
    abort "Shard checksum mismatch -- bad data."
{shard_id_pub, shard_id_sig} = Shard[identity]
if not Sig_Verify(shard_id_pub, Shard[meta] || Shard[body] || shard_id_pub, shard_id_sig)
    abort "Shard signature mismatch -- forgery detected."
{shard_doc_chksum} = Shard[meta]

// All shards must use the same public identity keys.
if shard_id_pub != any other shard_id_pub
    abort "Inconsistent identities -- forgery detected."
// All shards must agree on the document's checksum.
if shard_doc_chksum != any other shard_doc_chksum
    abort "Shards don't agree on document checksum -- forgery detected."

// Decrypt the shard.
{shard_nonce, shard_ciphertext} = Shard[body]
{shard_n, shard}, error = AEAD_Dec(Shard[key], shard_nonce, shard_ciphertext, Shard[meta] || shard_id_pub)
if error
    abort "Error encountered during decryption -- possible forgery detected."

// All shards must agree on basic metadata.
if shard_n != any other shard_n
    abort "Inconsistent metadata -- forgery detected."
// All shards must have a unique shard_id.
shard_id = shard.x
if shard_id == any other shard_id
    abort "Duplicate or fake duplicate shard -- possible forgery detected."

// All good!
shards.append(shard)
```
Once all the shards have been verified against each other, and no errors have been found, the secret can be recovered. `K_id_pub` is the agreed-upon `shard_id_pub`, and `n` is the agreed-upon `shard_n`.
```
// Get the secret data.
{K_doc, K_id_priv}, error = Secret_Recover(shards, n)
if error
    abort "Error encountered during secret recovery."

// Effectively, check if the document is sealed.
if K_id_priv is not all zeroes
    // Make sure that K_id_priv matches. At this point, we've already
    // revealed the secrets but it doesn't hurt to double-check.
    if Sig_GetPubKey(K_id_priv) != K_id_pub
        abort "Inconsistent identities -- forgery detected."
else
    abort "Document was created as a sealed backup -- no new shards permitted."
```
Once all the shards have been verified against each other, and no errors have been found, the expansion process can be completed.
```
{K_doc, K_id_priv} = Secret_Recover(shards, n)
new_shards = Secret_Expand(shards, n, k2)
```
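The key property behind `Secret_Expand` is that a new shard at a fresh x-coordinate is just the shared polynomial evaluated at that x, which Lagrange interpolation computes directly from any `n` existing shards without ever reconstructing the coefficients explicitly. The sketch below uses a prime field for brevity (the document specifies GF(2^32); the interpolation is identical in either field).

```python
import secrets

P = 2**127 - 1  # illustrative prime field, standing in for GF(2^32)

def interpolate_at(shards, x):
    """Evaluate the degree-(n-1) polynomial through `shards` at `x`."""
    y = 0
    for i, (xi, yi) in enumerate(shards):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                num = (num * (x - xj)) % P
                den = (den * (xi - xj)) % P
        y = (y + yi * num * pow(den, -1, P)) % P
    return y

def secret_expand(shards, n, k2):
    """Create k2 new shards at fresh, randomly-chosen x-coordinates."""
    shards = shards[:n]
    used = {x for x, _ in shards}
    new_shards = []
    while len(new_shards) < k2:
        x = secrets.randbelow(P - 1) + 1  # never x = 0: that *is* the secret
        if x in used:
            continue  # avoid colliding with an existing shard
        used.add(x)
        new_shards.append((x, interpolate_at(shards, x)))
    return new_shards
```

Choosing the new x-coordinates at random in a large field is what makes collisions with previously-issued shards statistically improbable, as the primitive's description requires.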
The new key shards are constructed as during creation. `doc_id` is the agreed-upon `shard_doc_chksum`. For each `shard` in `new_shards`:
```
K_shard = AEAD_GenKey()
shard_nonce = CSPRNG(NONCE_SIZE)

// The x-coordinate of the SSS shard must be unique for each shard -- if two
// shards have the same x co-ordinate they cannot be used together. Thus we can
// use it as a way for human-verifiable uniqueness of shards as well as a loose
// "content-defined" shard ID for human identification.
shard_id = shard.x
// Used for human identification of the key shard.
shard_human_id = ZBase32_Encode(shard_id)

Shard[meta] = doc_id
Shard[body] = shard_nonce || AEAD_Enc(K_shard, shard_nonce, n || shard, Shard[meta] || K_id_pub)
Shard[identity] = K_id_pub || Sig_Sign(K_id_priv, Shard[meta] || Shard[body] || K_id_pub)
Shard[chksum] = Hash(Shard[meta] || Shard[body] || Shard[identity])

// The key is included, to allow for users to optionally store the key
// separately (or keep it with the key shard to make this level of security a
// no-op).
Shard[key] = K_shard
```
While this is not core to the cryptographic design of paperback, the methods of serialising information so that the user can make use of the paperback documents are quite important.
The metadata and body of each document will be serialised together since there is no point to providing these as separate values to the user.
For each segment of a paperback document, there are two serialised representations:
- QR codes (ISO/IEC 18004:2015), containing a base64-encoded payload. This is the preferred method for users to input data, because it has very high density, error correction, and is machine-readable.
- Human-readable text, encoded in z-base-32 with a checksum. This is the backup method if the QR codes cannot be read. It has far smaller data density and no real error correction (in principle, most transcription errors can be fixed by the user but that's not an ideal solution).
In order to make the encoding both self-descriptive and future-proof, the encodings for both serialisations are prefixed with the corresponding multibase code prefix.
It is often necessary to split the data stored in QR codes. The primary reason is that the main document can have a very large data segment, and many phone QR code scanners do not like QR codes with large segments (failure to scan, or scanning a portion of the QR code as a barcode, are very common failure modes). In addition, QR codes also have an upper limit regardless of whether the user could scan such large codes. As such, we need to include a serialisation for each chunk of data being encoded. For each chunk of the data (`chunk-bytes`), the following serialisation format is used:

```
"Pb" [ version ] [ what ] [ nth-chunk ] [ N-chunks ] [ chunk-bytes... ]
```
- `"Pb"` is represented as ASCII, with the bytes `{0x50 0x62}`.
- `version` is the version of the paperback schema being used, represented as an unsigned varint.
- `what` is a single-byte indicator of what data the QR code pertains to. If a scanner is expecting a different kind of data to the kind which is scanned, it should warn the user and not use the scanned data. The following bytes are defined:

  | Byte | ASCII | Description |
  | --- | --- | --- |
  | `0x44` | `D` | Main document data. |

- `nth-chunk` is the `0`-indexed index of the current chunk, represented as an unsigned varint.
- `N-chunks` is the number of data chunks that need to be scanned, represented as an unsigned varint.
- `chunk-bytes` is the 8-bit binary encoding of the chunk bytes. It is prefixed by the relevant multibase code prefix (in this case, `0x00`).
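The chunk format above can be sketched as follows. This assumes "unsigned varint" means the usual unsigned LEB128 encoding (as in the multiformats varint); the function names are illustrative.

```python
def uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_chunk(version: int, nth: int, total: int, chunk: bytes) -> bytes:
    """Serialise one main-document chunk: magic, header, then payload."""
    return (b"Pb"               # magic bytes {0x50 0x62}
            + uvarint(version)  # schema version
            + b"D"              # what = 0x44, main document data
            + uvarint(nth)      # 0-indexed chunk number
            + uvarint(total)    # total number of chunks
            + b"\x00"           # multibase prefix for raw 8-bit binary
            + chunk)
```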
By storing `nth-chunk` and `N-chunks` in each QR code, we allow for the QR codes to be scanned out-of-order (and we can tell the user which QR codes have yet to be scanned).
If the user scans QR codes from different documents or otherwise scans a QR code that isn't a valid chunk, the invalid data will be detected when the user scans the final checksum section of the document. It would just be wasteful to include a full checksum in the QR codes (especially since it would have to be stored in each QR code).
The serialised data is encoded using Base64 with padding as per RFC 4648, and is prefixed with the appropriate multibase code (in this case, `M`). The final base64 representation is then encoded in a QR code (the redundancy level is not specified by this document, and may even be user-configurable).
The size of the chunks should not exceed 512 bytes, as larger QR codes can be difficult to scan (and once printed may require too much fine detail, making any minor deterioration of the paper a serious problem).
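Putting the two encoding rules together, a QR payload is just padded base64 (RFC 4648) with the multibase prefix `M` for base64pad; the helper name below is illustrative.

```python
import base64

def qr_payload(data: bytes) -> str:
    """Encode chunk bytes as the text embedded in a QR code."""
    # "M" is the multibase code for padded base64 (base64pad).
    return "M" + base64.b64encode(data).decode("ascii")
```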
Alongside the aforementioned QR codes, a textual representation is provided where the data is encoded using z-base-32 and prefixed with the appropriate multibase code (in this case, `h`). In order to facilitate easier human entry of the text, the text should be displayed in 4-character groups.
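Grouping the displayed text is a one-liner; the helper name below is illustrative. The `h` multibase prefix simply becomes the first character of the first group.

```python
def group_text(s: str, size: int = 4) -> str:
    """Split a codeword string into space-separated groups for display."""
    return " ".join(s[i:i + size] for i in range(0, len(s), size))
```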
Ideally we would like paperback to be easily usable by non-technical users (after all, what's the point of a will if the only person who can read it is its author and subject?). As such, the layout was very carefully considered to ensure that ordinary folks can understand what each section of the document does without overloading them with too much information.
The following mockups are not the final layout (and were created in Inkscape), but should give a good idea of the style and layout which paperback should provide.
The key takeaways from this layout are:
- The metadata, body, and identity sections of the `Doc` have all been included in a single section and set of QR codes. This avoids over-complicating the layout (and the user!) by having a separate "identity" section that the user gains no benefit from.
- The checksum is stored separately so that the user can clearly see where the document identifier comes from (and so that we can ask the user to manually verify the full checksum in z-base-32).
- While the QR codes can be scanned in any order, we include arrows so that the scanning process feels more intuitive to users.
The key takeaways from this layout are:
- The layout is consistent with the main document (both in terms of the order of the sections, and the conceptual meaning of each section).
- It is made clear to the user that they can remove the shard codewords and store them separately, but that this is optional. Also note that the document and shard identifiers are stored on the codeword section so that (if removed) it can be easily matched up with the original shard.