
BenWestgate commented Nov 22, 2025

Summary of Changes:
Describe the codex32 format for arbitrary human-readable parts, not just "ms"; specify a master seed encoding standard; add new test vectors; and enhance readability. This makes the document more like BIP-0173: proposing an encoding, "codex32", and then defining a standard for something that uses it.

See discussion on #2023 (comment).

Spec:

  • fixed the threshold mistake in the abstract
  • replaced "master seed" with "secret" prior to the "Master seed format" section, and made the descriptions HRP-general
  • updated the checksum reference code to produce valid checksums for any hrp
  • changed t to k to match the test vectors and the book
  • defined "ms" codex32 secrets:
    • using terms "secret seed" (as the book does) and "codex32-encoded master seed" to refer to "ms" codex32 secrets
    • recommended using the first 4 characters of the bech32-encoded fingerprint as the identifier
    • recommended that the padding bits be set with a CRC code for extra error detection, and provided reference code for this checksum.

Test Vectors:

  • Fixed the cornucopia of naming conventions in the Test vectors
    • used mostly "secret seed", "codex32 secret", and "codex32-encoded X".
  • Fixed test vector 5, which did not actually append a long checksum to "random" data as the text said it would.
  • Added vector 6, which encodes a "cl" prefix codex32-encoded HSM secret and then relabels the identifier (producing a new checksum and codex32-encoded HSM secret)
  • Added vector 7 which parses a "cl" prefix codex32 secret and decodes the HSM secret
  • Clarified why invalid prefix test vectors were bad (their checksum is for "ms" but their prefix is not "ms")
  • We might want to add one that uses "cl" with the old "ms" checksum code as that will now fail with the updated ms32_verify_checksum function

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.
jonatack added the Proposed BIP modification and Pending acceptance labels (Pending acceptance: This BIP modification requires sign-off by the champion of the BIP being modified) on Nov 22, 2025
errors. The human-readable part is processed by first
feeding the higher bits of each character's US-ASCII value into the
checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the


The length limitations of codex32 strings assume that the HRP is not subject to error correction. We more or less cannot do that anyway, as various bech32 formats have appeared, all with different checksums and characteristics. To run the checksum algorithm, you have to know the prefix first in order to know which checksum algorithm to try.

This isn't really a problem in practice since there are only a small finite number of prefixes, and from context only a few are going to be applicable anyways.

Author


This was copied over from BIP-0173. Delete it?

Bech32 decoding already attempts two checksums (bech32 and bech32m); a universal bech32 decoder could likewise try the bech32, bech32m, and codex32 checksums to discover the format.
Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.

If the HRP is swapped between formats, the chance of a false verification is:

  • 1 in 2^65 for a "codex32 checksum" validating when the encoding was Bech32/Bech32m

  • ~1 in 2^30 for "Bech32 checksum" validating when the encoding was Codex32.
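A minimal sketch of the "universal decoder" idea, assuming only the published BIP-0173 and BIP-0350 constants: it lowercases the string, expands the HRP, and reports which bech32-family checksum the string satisfies. `detect_checksum` is a hypothetical helper name, and the codex32 branch is left as a comment rather than reproducing its generator constants here.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1            # BIP-0173
BECH32M_CONST = 0x2bc830a3  # BIP-0350


def bech32_polymod(values):
    gen = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk


def bech32_hrp_expand(hrp):
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def detect_checksum(s):
    """Hypothetical helper: name the checksum a string satisfies, or None."""
    if s != s.lower() and s != s.upper():
        return None  # mixed case is invalid
    s = s.lower()  # the HRP must be lowercased before expansion
    pos = s.rfind("1")
    if pos < 1 or len(s) - pos - 1 < 6:
        return None
    hrp, data_part = s[:pos], s[pos + 1:]
    if any(c not in CHARSET for c in data_part):
        return None
    values = bech32_hrp_expand(hrp) + [CHARSET.index(c) for c in data_part]
    residue = bech32_polymod(values)
    if residue == BECH32_CONST:
        return "bech32"
    if residue == BECH32M_CONST:
        return "bech32m"
    # A codex32 check (13- or 15-character checksum) would be tried here;
    # its generator constants are not reproduced in this sketch.
    return None


print(detect_checksum("A12UEL5L"), detect_checksum("a1lqfn3a"))
```

A random string satisfies an n-character checksum with probability about 32^-n = 2^-5n, which is where the figures above come from: 2^-65 for codex32's 13-character checksum and 2^-30 for bech32's 6 characters.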

*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
*** Translate those bits to characters using the bech32 character table from BIP-0173.
When padding bits are needed they should be generated using CRC polynomial <code>(1 << pad_len) | 3</code> with an initial value of <code>0</code> and appended to the master seed bits. Note that unlike the codex32 checksums, we do NOT include the header data.
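A sketch of one reading of the padding text above: the seed bits are split into 5-bit groups, and the pad bits are a CRC over the seed bits only (no header data), using polynomial `(1 << pad_len) | 3`, initial value 0, MSB-first, with no reflection and no final XOR. The bit order, the absence of a final XOR, and the names `crc_bits` and `seed_to_payload` are assumptions for illustration; the CRC-w parameters proposed later in this thread (init = 1 plus a final constant) would produce different pad bits.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character table (BIP-0173)


def crc_bits(bits, width, init=0):
    """MSB-first, non-reflected CRC over a list of 0/1 bits.

    Polynomial (1 << width) | 3, i.e. x^width + x + 1; no final XOR.
    """
    mask = (1 << width) - 1
    poly_low = ((1 << width) | 3) & mask  # the x^width term is implicit
    crc = init
    for bit in bits:
        feedback = ((crc >> (width - 1)) & 1) ^ bit
        crc = (crc << 1) & mask
        if feedback:
            crc ^= poly_low
    return crc


def seed_to_payload(seed: bytes) -> str:
    """Hypothetical encoder: seed bytes -> payload characters with CRC padding."""
    bits = [(byte >> (7 - i)) & 1 for byte in seed for i in range(8)]
    pad_len = -len(bits) % 5  # bits needed to fill the last 5-bit group
    if pad_len:
        crc = crc_bits(bits, pad_len)  # computed over the seed bits only
        bits += [(crc >> (pad_len - 1 - i)) & 1 for i in range(pad_len)]
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(CHARSET[value])
    return "".join(chars)


print(seed_to_payload(bytes(range(16))))
```

For a 16-byte seed this gives 2 pad bits (26 characters), for 32 bytes 4 pad bits (CRC-4, 52 characters), and for 64 bytes 3 pad bits (103 characters). Because there is no final XOR, re-running the CRC over the seed bits followed by the pad bits yields 0, which is what a decoder could check.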


I don't really want this CRC stuff in the standard. At most it is a recommendation that folks MAY use to select padding bits, but I doubt it will be useful in practice, since you cannot know whether any given codex32 master seed was generated with this CRC or with random padding as the codex32 book does. If one's seed is so corrupted that the codex32 error correction wasn't able to fix it, I'm skeptical a few more bits will help.

If it is included, then the fact that padding MAY be random should also be stated. Perhaps it is better to move this to a separate PR if we want to discuss it further.

Author


I'll replace it with one sentence for now:

Encoders MAY select padding using a CRC-w where: w = pad_bits_needed, poly = 1 << w | 3, init = 1, const = 1 << w - 1, refIn = false, and refOut = false.

CRC-4 helps 32-byte seeds detect 94% of single-character errors/deletions and narrows a single erasure down to 2 candidates. The most common substitutions in CHARSET are 1 bit apart, so actual performance will exceed that.

since you cannot know if any given codex32 master seed was generated with this CRC or with random padding

Finding the book is reasonable suspicion for random padding IF all electronically encoded seeds use CRC. The person who made the backup certainly knows if they encoded with the book or not.

as the codex32 book does

A book insert could compute CRCs by hand. I have not yet divided a large enough number to benchmark the time for 128 bits, but Andrew and I determined the space requirement is two sheets of 11 x 8.5 graph paper.

If it is include than padding MAY be random should also be stated.

Decoders MUST accept random padding, although they may someday warn about it.

For electronic encoders, trusting RNGs introduces risk: random padding can leak up to 32 bits to an attacker with 8 shares, and it breaks the decode->encode round trip. Zero padding round-trips, but leaks the final payload character to an attacker with (5-pad_len) / pad_len interpolated shares.

CRC pad minimizes RNG trust by using entropy already present in the payload bytes.

Perhaps it is better to move this to a separate PR if we want to further discuss it.

Let's discuss it on my deterministic codex32 BIP85 PR, where it's required, since reviewers asked me to encode bytes directly rather than u5 integers, even for share payloads.

def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

def ms32_verify_checksum(hrp, data):


If you want an hrp parameter, you have to rename this function to something like codex32_verify_checksum.

Author


Will do
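A small sketch of why an hrp parameter can stay compatible with existing "ms" strings. It assumes the published ms32_polymod starts from the fixed residue 0x23181b3, and that the generalized function starts from 1 as bech32 does: feeding bech32_hrp_expand("ms") into that starting residue reproduces the constant exactly. `codex32_prefix_residue` is a hypothetical name, and it models only the shift-in step; the generator constants never fire for these first five values, so they are omitted.

```python
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def codex32_prefix_residue(hrp):
    """Hypothetical helper: residue after feeding only the expanded HRP.

    Starts from 1 like bech32's polymod. While the top 5 bits of the 65-bit
    short-checksum residue (residue >> 60) are zero, a polymod step is a
    plain shift-and-XOR, so the generator constants are left out here.
    """
    residue = 1
    for v in bech32_hrp_expand(hrp.lower()):
        if residue >> 60:
            raise ValueError("HRP too long for this shift-only sketch")
        residue = (residue << 5) ^ v
    return residue


print(hex(codex32_prefix_residue("ms")))  # 0x23181b3
```

The same helper gives a different starting residue for "cl", consistent with the note in the PR description that a "cl" prefix carrying an old "ms" checksum will now fail.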

As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
The lowercase form is used when determining a character's value for checksum purposes.


This doesn't make sense. The lowercase form and uppercase form of Bech32 characters have the same value.

Author


Not for the HRP, which needs to be lowercased during decoding, or bech32_hrp_expand(hrp) would return a different result.

This line is repeated from the test vectors. Why explain the rules about case in the vectors instead of up here?
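A quick illustration of the HRP point, using the bech32_hrp_expand from the diff: the uppercase and lowercase forms of "ms" expand to different values (only the high bits differ), so a decoder has to lowercase the HRP before expanding it.

```python
def bech32_hrp_expand(s):
    # BIP-0173 HRP expansion: high bits of each char, a 0, then the low 5 bits
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


upper = bech32_hrp_expand("MS")  # [2, 2, 0, 13, 19]
lower = bech32_hrp_expand("ms")  # [3, 3, 0, 13, 19]
print(upper != lower)                             # True: high bits differ
print(bech32_hrp_expand("MS".lower()) == lower)   # True: lowercase first
```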


* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>


I'd be inclined to remove this unchecksummed string. I'm really nervous about displaying strings without a checksum anywhere; they are very problematic.

If you insist on going into this much detail in this test vector, I'd say use the following bullets:

  • Master seed (hex):
  • master node xprv
  • Payload
  • HRP
  • Identifier
  • Checksum
  • Secret seed

That's the order I'd use, but maybe some other permutations are also good.

Author


How about this, since the text said:

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.

  • human-readable part: MS
  • k value: 0
  • identifier: 0C8V
  • share index: S
  • payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F
  • checksum: HPV80UNDVARHRAK
  • secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
  • Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
  • master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN

No information is displayed that we did not already show in Vector 1.
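As a cross-check of the breakdown above, a minimal sketch (hypothetical helper `payload_to_bytes`) that converts the payload characters back to bytes and ignores the trailing pad bits. Under that assumption it should reproduce the master seed hex from the vector.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def payload_to_bytes(payload):
    """Convert codex32 payload characters to bytes, discarding pad bits."""
    acc, nbits, out = 0, 0, bytearray()
    for c in payload.lower():
        acc = (acc << 5) | CHARSET.index(c)
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    # Whatever remains is padding; BIP-93 allows at most 4 pad bits.
    if nbits > 4:
        raise ValueError("more than 4 bits of padding")
    return bytes(out)


payload = ("M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCE"
           "MLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F")
print(payload_to_bytes(payload).hex())
```

The payload is 103 characters (515 bits), so the last 3 bits of the final "F" (binary 01001) are padding. They are 001 rather than zero, so re-encoding the same seed with zero padding would end in "G" instead of "F", which is the decode->encode round-trip issue raised earlier in the thread.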

* The data-part values:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
** An identifier consisting of 4 bech32 characters.
*** We recommend the first 4 characters of the bech32-encoded BIP-0032 key fingerprint.
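For the identifier recommendation, a sketch of taking the first 4 bech32 characters (20 bits) of a 4-byte BIP-0032 fingerprint. Computing the fingerprint itself (the first 4 bytes of HASH160 of the master public key) is out of scope here, so the function takes it as input; `identifier_from_fingerprint` and the example fingerprint are hypothetical. It returns lowercase; the test vectors show the uppercase form.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def identifier_from_fingerprint(fingerprint: bytes) -> str:
    """First 4 bech32 characters (20 bits) of a 4-byte fingerprint."""
    if len(fingerprint) != 4:
        raise ValueError("a BIP-0032 fingerprint is 4 bytes")
    top20 = int.from_bytes(fingerprint, "big") >> 12  # leading 20 of 32 bits
    return "".join(CHARSET[(top20 >> shift) & 31] for shift in (15, 10, 5, 0))


print(identifier_from_fingerprint(bytes.fromhex("3442193e")))  # x3pp
```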


When some shares of a master seed are compromised, a user may wish to simply dispose of the remaining shares and rederive a new set of secret shares without the cost of sweeping their wallet. In such a case, the user very much should use a fresh identifier so that they do not mix up their obsolete share data with their fresh shares.

At best a hardware wallet may suggest such an identifier, but only when the hardware wallet is generating a fresh master seed and thus knows that there are no other secret shares for the same secret floating around.

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

Author


This is a good case for BIP85 derive codex32 application to avoid trusting or generating randomness for this.

If we want a default identifier for "reshares" too:

identifier = fingerprint(master_seed)[:2] + fingerprint(false_seed)[2:4]

Where false_seed is recovered from fresh initial shares (reducing k if needed).

During the first generation with k fresh shares, the two slices together produce the full fingerprint. If the identifier is unspecified, we recommend this default for master seeds, or, if that collides, the next higher value.

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

Already done here. Agreed wrt details and vectors.

https://github.com/BenWestgate/bips/blob/a8f8e98d05a2183aba395f8f8ff479b4fb764f95/bip-0085.mediawiki#unshared-secret

#1958

I can move this recommendation (and the padding rec) into a separate BIP93 PR pending the final BIP85 codex32 design (needs finishing touches feedback). I’ll remove it from this PR so it remains focused on typos and general HRP support.

3 participants