Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest standard implementation for electronic share set generation #64

Open
BenWestgate opened this issue Aug 26, 2023 · 0 comments
Open

Comments

@BenWestgate
Copy link
Contributor

BenWestgate commented Aug 26, 2023

Summarizes Issue #57:

While codex32 is designed to be computed by hand, should it become popular, it's likely most backups will be made with electronics. This offers opportunities to increase error detection and presents liabilities from trusting the randomness of a device. This proposal seeks to maximize benefits of electronics while reducing their downside by being easily auditable so the device's entropy is not a single point of failure.

Generating Shares
For a fresh master seed
Inputs:

  • App entropy an extended private masterkey generated by the wallet. It must be displayed unconditionally so the wallet does not know if it will be audited.
  • User entropy a string provided by the user that allows to improve security in case app entropy generator has a backdoor. This could be a one-time text entered by the user only for this purpose, or a passphrase containing enough entropy that is used later for other purposes.
  • k the threshold to recover the codex32 secret.
  • n the total number of shares in the backup.

Optional inputs:

  • Identifier sets a specific 4 character bech32 identifier for the backup, overriding the default one.
  • Seed length sets a specific master seed byte length, overriding the default of 16-bytes.
  • Existing codex32 strings forces up to k of the shares to be the provided ones. Overrides seed length, identifier and k.

Outputs:

  • Fresh master seed, the BIP32 master seed of the backup.
  • New shares, n newly generated shares with shuffled indices. (excludes existing)

For an existing master seed
- Inputs and outputs are the same except:
Inputs:

  • Existing codex32 strings: one must be the codex32 secret or else a valid threshold length set of shares must be given.
  • App entropy is replaced by the private masterkey of an existing master seed.
  • User entropy becomes optional unless re-sharing a master seed with an existing codex32 share backup. Then it must be a "unique string". The app may help ensure uniqueness with a monotonic counter or date which must be unconditionally displayed.

PROPOSAL

  • Share indices

In order to prevent the order of indices used from leaking secret information, the available indices (after excluding 's' and any existing shares passed) are deterministically shuffled as follows:

def shuffle_indices(index_seed, indices):
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms
    """
    Shuffle indices deterministically using provided key with ChaCha20.

    :param index_seed: The ChaCha20 key for deterministic shuffling.
    :param indices: Characters to be shuffled as a string or list.
    :return: List of shuffled characters sorted by assigned values.
    """
    algorithm = algorithms.ChaCha20(index_seed, bytes(16))
    keystream = Cipher(algorithm, mode=None).encryptor()
    counter = 0  # Counter to track current position in the keystream.
    value = b""  # Storage for the assigned random byte.
    block = b""  # Holds the latest keystream block.
    assigned_values = {}  # Dictionary to store chars and their values.
    for char in indices:
        # Ensure new random value is generated if there is a collision.
        while value in assigned_values.values() or not value:
            if not counter % 64:  # Get new 64-byte block per 64 count.
                block = keystream.update(bytes(64))  # ChaCha20 block.
            value = block[counter % 64:counter % 64 + 1]  # Rand byte.
            counter += 1
        assigned_values[char] = value
    return sorted(assigned_values.keys(), key=lambda x: assigned_values[x])
  • Padding bits

In order to prevent padding bits from leaking 2-4 bits per generated share, these must also be set deterministically. While all zeros works, it's wasteful as some error detection is possible with an ECC code. (potentially bit error correction for high k values.) This ECC padding is guaranteed to detect up to 1 bit error or 2 sequential bit errors:

def ecc_padding(data):
    """
    Calculate and return a byte with concatenated parity bits.

    :param data: Bytes of seed_len, hrp, k, ident, index and payload.
    :return: Byte with concatenated parity in most significant bits.
    """
    # Count mod 2 the number of set (1) bits in the byte data
    parity = bin(int.from_bytes(data)).count('1') % 2 << 7
    # Count mod 2 the number of set (1) even bits in the byte data
    parity += bin(int.from_bytes(data))[::2].count('1') % 2 << 6
    # Count mod 2 the number of set (1) odd bits in the byte data
    parity += bin(int.from_bytes(data))[3::2].count('1') % 2 << 5
    # Count mod 2 the number of set (1) third bits in the byte data
    parity += bin(int.from_bytes(data))[::3].count('1') % 2 << 4

    return parity.to_bytes()

This may help if a share is damaged beyond the correction ability of the codex32 checksum by reducing correction candidates by 1/4 to 1/16. It also checksums string length that BCH checksums can't offer error detection guarantees about.

  • Identifier

There are two good ideas for setting the default ID so that 1) users are not burdened with creating and entering it, 2) 20-bits of the secret aren't leaked, and 3) it's easy to identify which wallet or descriptor the shares belong to or vice versa.

  1. BIP32 fingerprint in bech32
  • Allows matching the first 5 hex characters from descriptor key origin information to the codex32 share identifier to confirm they correspond.
  • Allows error correcting these 4 characters by converting a known corresponding fingerprint. Further helping emergency recovery.
  • As a cryptographic checksum, it detects malicious tampering of shares or invalid combinations as the derived seed's fingerprint won't match.
  1. Encrypted BIP32 fingerprint in bech32
  • Allows selectively revealing the correspondence between shares and descriptors only to those with the unique_string.
  • Allows creating unique identifiers for re-sharing that still offer the benefits of fingerprint above if unique_string is remembered or stored.
  • Includes k and len(payload) in KDF's salt allowing detection of errors in these with known corresponding BIP32 fingerprint and unique_string.
def encrypt_fingerprint(master_seed, k, unique_string=""):
    """
    Encrypt the MS32 fingerprint using a unique string and header data.

    :param master_seed: The master seed used for BIP32 fingerprint.
    :param k: The threshold parameter as a string.
    :param unique_string: Optional unique string encryption password.
    :return: Encrypted fingerprint as a bech32 string.
    """
    password = bytes(unique_string, "utf")
    salt = len(master_seed).to_bytes(1, "big") + bytes(k, "utf")
    enc_key = convertbits(hashlib.scrypt(
        password, salt=salt, n=2 ** 20, r=8, p=1, maxmem=1025 ** 3, dklen=3),
        8, 5, pad=False)
    new_id = [x ^ y for x, y in zip(ms32_fingerprint(master_seed), enc_key)]
    return "".join([CHARSET[d] for d in new_id])

I prefer using the encrypted fingerprint everywhere for more privacy and its ability to detect errors in k and string length which may help emergency recovery. It's also slower to compute due to KDF so harder to grind data into.

  • Random share payloads

k shares are generated (less any provided existing codex32 strings). Generating random payloads is the largest potential leak. My proposal has two steps:

  1. User entropy and an extended private masterkey (xprv) from app entropy (or an existing master seed) are fed to a computationally expensive KDF to produce a 128 byte derived_key.
  2. Truncated HMAC-SHA512 is used with key=derived_key and a unique descriptive info field as the msg= to derive the "Index seed" for shuffling as well as each new share payload until a threshold of codex32 strings exist.
  3. Additional derived shares are derived as per the BIP but at shuffled indexes.
  4. If necessary, the master_seed is recovered from k shares as per the BIP and the identifier is updated based on this fresh master seed's fingerprint.
def generate_shares(master_key="", user_entropy="", n=31, k="2", ident="NOID",
                    seed_length=16, existing_codex32_strings=None):
    """
    Generate new codex32 shares from provided or derived entropy.

    :param master_key: BIP32 extended private master key from bitcoind.
    :param user_entropy: User-provided entropy for improved security.
    :param n: Total number of codex32 shares to generate (default: 31).
    :param k: Threshold parameter (default: 2).
    :param ident: Identifier (4 bech32 characters) or 'NOID' (default).
    :param seed_length: Length of seed (16 to 64 bytes, default: 16).
    :param existing_codex32_strings: List of existing codex32 strings.
    :return: Tuple: master_seed (bytes), list of new codex32 shares.
    """
    master_seed = b""
    if existing_codex32_strings is None:
        existing_codex32_strings = []
    new_shares = []
    num_strings = len(existing_codex32_strings)
    if (not validate_codex32_string_list(existing_codex32_strings, False)
            and existing_codex32_strings):
        return None
    available_indices = list(CHARSET)
    for string in existing_codex32_strings:
        k, ident, share_index, payload = decode("ms", string)
        available_indices.remove(share_index)
        if share_index == "s":
            master_seed = payload
        seed_length = len(payload)

    if num_strings == int(k) and not master_seed:
        master_seed = recover_master_seed(existing_codex32_strings)
    if master_seed:
        master_key = BIP32Node.from_rootseed(master_seed, xtype="standard")
    elif master_key:
        master_key = BIP32Node.from_xkey(master_key)
    else:
        return None
    key_identifier = hash_160(master_key.eckey.get_public_key_bytes())
    entropy_header = (seed_length.to_bytes(length=1, byteorder="big")
                      + bytes("ms" + k + ident + "s", "utf") + key_identifier)
    salt = entropy_header + bytes(CHARSET[n] + user_entropy, "utf")
    # This is equivalent to hmac-sha512(b"Bitcoin seed", master_seed).
    password = master_key.eckey.get_secret_bytes() + master_key.chaincode
    # If scrypt absent visit OWASP Password Storage or use pbkdf2_hmac(
    # 'sha512', password, salt, iterations=210_000 * 64, dklen=128)
    derived_key = hashlib.scrypt(
        password, salt=salt, n=2 ** 20, r=8, p=1, maxmem=1025 ** 3, dklen=128)
    index_seed = hmac.digest(derived_key, b"Index seed", "sha512")[:32]
    available_indices.remove("s")
    available_indices = shuffle_indices(index_seed, available_indices)
    tmp_id = "temp" if ident == "NOID" else ident

    # Generate new shares, if necessary, to reach a threshold.
    for i in range(num_strings, int(k)):
        share_index = available_indices.pop()
        info = bytes("Share payload with index: " + share_index, "utf")
        payload = hmac.digest(derived_key, info, "sha512")[:seed_length]
        new_shares.append(encode("ms", k, tmp_id, share_index, payload))
    existing_codex32_strings.extend(new_shares)

    # Derive new shares using ms32_interpolate.
    for i in range(int(k), n):
        fresh_share_index = available_indices.pop()
        new_share = derive_share(existing_codex32_strings, fresh_share_index)
        new_shares.append(new_share)

    # Relabel the new shares with default ID.
    master_seed = recover_master_seed(existing_codex32_strings)
    if ident == "NOID":
        ident = "".join([CHARSET[d] for d in ms32_fingerprint(master_seed)])
        new_shares = relabel_codex32_strings("ms", new_shares, k, ident)

    return master_seed, new_shares
  • Notes on Key Derivation Function Selection:

It is critical that the KDF used for the derived key and the identifier encryption be computationally expensive for the hardware implementing this standard. Time costs between 2 and 10 seconds with a significant memory cost (relative to available memory) will offer the best protection for low entropy user input against grinding attacks, in acceptable user time.

Scrypt with parameters n=2 ** 20, r=8, p=1 is used above. This requires 1 GB and a couple seconds on a modern laptop. If 1GB of memory is unavailable pbkdf2_hmac('sha512', password, salt, iterations=210_000 * 64, dklen=128) offers similar protection with slower runtime. Using Argon2id with a minimum configuration of 1024 MiB of memory, an iteration count of 4, and 4 degrees of parallelism can offer even better protection than Scrypt and may be implemented by wallets.

I do not prescribe any particular KDF but if using a different KDF or parameters than scrypt with n=2 ** 20, r=8, p=1, compliant wallet implementations MUST:

  1. Make the KDF function used and its cost parameters be publicly and permanently available for auditors.
  2. Use at least 2048 PBKDF2 iterations of HMAC-SHA512 OR its equivalent.
  3. Not use any fixed cost KDF, hash based message authentication code or simple hash function in place of scrypt.

They also must generally:
4. Display the app_entropy, if any, unconditionally.
5. Allow the user to provide at least 128-bits of user_entropy (128+ character minimum, 1024 is a reasonable maximum.) AFTER displaying app entropy.

I welcome everyone's feedback on this proposal. #59

@BenWestgate BenWestgate changed the title Reference implementation for electronic share set generation Suggested standard implementation for electronic share set generation Mar 26, 2024
@BenWestgate BenWestgate changed the title Suggested standard implementation for electronic share set generation Suggest standard implementation for electronic share set generation Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant