BIP93: Generalize codex32 format for any hrp and fix typos #2040

BenWestgate · 2025-11-22T06:36:29Z

Summary of Changes:
Describe codex32 format for arbitrary human-readable parts not just "ms", specify master seed encoding standard, add new test vectors and enhance readability. This makes the document more like BIP-0173: proposing an encoding "codex32", then defining a standard for something using it.

See discussion on #2023 (comment).

Spec:

fixed the threshold mistake in the abstract
replaced "master seed" with "secret" prior to the "Master seed format" section and made the descriptions HRP-general
updated the checksum reference code to produce valid checksums for any hrp
changed t to k to match the test vectors and book
defined "ms" codex32 secrets:
- using terms "secret seed" (as the book does) and "codex32-encoded master seed" to refer to "ms" codex32 secrets
- recommended using first 4 characters of the bech32-encoded fingerprint as the identifier
- recommended the padding bits be set with a CRC code for extra error detection. Provided reference code for this checksum.

Test Vectors:

Fixed the cornucopia of naming conventions in the Test vectors
- used mostly "secret seed", "codex32 secret", and "codex32-encoded X".
Fixed test vector 5 which did not actually append a long checksum to "random" data as the text said it would.
Added vector 6 encoding a "cl" prefix codex32-encoded HSM secret, then relabels the identifier (producing a new checksum and codex32-encoded HSM secret)
Added vector 7 which parses a "cl" prefix codex32 secret and decodes the HSM secret
Clarified why invalid prefix test vectors were bad (their checksum is for "ms" but their prefix is not "ms")
We might want to add one that uses "cl" with the old "ms" checksum code as that will now fail with the updated ms32_verify_checksum function

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.

roconnor · 2025-11-24T17:18:26Z

bip-0093.mediawiki

+errors. The human-readable part is processed by first
+feeding the higher bits of each character's US-ASCII value into the
+checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
+This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the


The length limitations of codex32 strings assume that the HRP is not subject to error correction. We more or less cannot do that anyway, as all sorts of bech32 formats have appeared, all with different checksums and characteristics. To run the checksum algorithm you have to know the prefix first, in order to know which checksum algorithm to try.

This isn't really a problem in practice since there are only a small finite number of prefixes, and from context only a few are going to be applicable anyways.

This was copied over from BIP-0173. Delete it?

Bech32 decoders already attempt two checksums; a universal bech32 decoder could try decoding the string with the bech32, bech32m and codex32 checksums to discover the format.
Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.

If the HRP is swapped between formats, the chance of false verification is:

1 in 2^65 for a "codex32 checksum" validating when the encoding was Bech32/Bech32m

~1 in 2^30 for "Bech32 checksum" validating when the encoding was Codex32.
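
As a rough illustration of the "universal decoder" idea, here is a minimal sketch that tries the bech32 and bech32m checksums in turn. The polymod and constants are the ones from the BIP-0173 and BIP-0350 reference code; `detect_format` is a hypothetical name, and the codex32 check is deliberately left out (a fuller decoder would run the codex32 polymod as a third attempt).

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32M_CONST = 0x2bc830a3  # target residue for bech32m (BIP-0350)

def bech32_polymod(values):
    """Bech32 checksum residue over a list of 5-bit values."""
    gen = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= gen[i] if ((top >> i) & 1) else 0
    return chk

def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def detect_format(s):
    """Hypothetical helper: return "bech32", "bech32m" or None."""
    if s.lower() != s and s.upper() != s:
        return None  # mixed case is never valid
    s = s.lower()
    pos = s.rfind("1")
    if pos < 1:
        return None  # no separator or empty HRP
    hrp, data = s[:pos], s[pos + 1:]
    if any(c not in CHARSET for c in data):
        return None
    values = [CHARSET.find(c) for c in data]
    residue = bech32_polymod(bech32_hrp_expand(hrp) + values)
    if residue == 1:
        return "bech32"
    if residue == BECH32M_CONST:
        return "bech32m"
    return None  # a fuller decoder would try the codex32 checksum here

print(detect_format("A12UEL5L"))  # bech32
print(detect_format("a1lqfn3a"))  # bech32m
```

The two strings in the usage lines are the minimal valid examples from BIP-0173 and BIP-0350 respectively.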

roconnor · 2025-11-24T17:25:25Z

bip-0093.mediawiki

+*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
+*** Translate those bits to characters using the bech32 character table from BIP-0173.
+
+When padding bits are needed they should be generated using CRC polynomial <code>(1 << pad_len) | 3</code> with an initial value of <code>0</code> and appended to the master seed bits. Note that unlike the codex32 checksums, we do NOT include the header data.


I don't really want this CRC stuff in the standard. At most it is a recommendation that folks MAY use to select padding bits, but I doubt it will be useful in practice, since you cannot know if any given codex32 master seed was generated with this CRC or with random padding as the codex32 book does. If one's seed is so corrupted that the codex32 error correction wasn't able to fix it, I'm skeptical a few more bits will help.

If it is included, then it should also be stated that padding MAY be random. Perhaps it is better to move this to a separate PR if we want to further discuss it.

I'll replace it with one sentence for now:

Encoders MAY select padding using a CRC-w where: w = pad_bits_needed, poly = 1 << w | 3, init = 1, const = 1 << w - 1, refIn = false, and refOut = false.

CRC-4 helps 32-byte seeds detect 94% of 1 character errors/deletions and narrows 1 erasure down to 2 candidates. The most common substitutions in CHARSET are 1 bit apart so actual performance will exceed that.
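
To make the padding arithmetic concrete, here is a minimal sketch, not the PR's reference code. It assumes the simplest CRC variant: polynomial `(1 << w) | 3`, initial value 0, no final XOR, bits processed most significant first. The `init` and `const` parameters quoted above are intentionally not modelled, and `pad_bits_needed` and `crc_pad` are hypothetical names. A 32-byte seed needs 4 padding bits, and a 4-bit check lets a random corruption slip through about 1 time in 16, which is presumably where the 94% figure comes from.

```python
def pad_bits_needed(n_bits: int) -> int:
    """Bits required to round a payload up to a whole number of 5-bit characters."""
    return (-n_bits) % 5

def crc_pad(data: bytes, w: int) -> int:
    """Remainder of data(x) * x^w modulo the polynomial (1 << w) | 3.

    Assumptions (not taken from the PR): init = 0, no final XOR,
    no bit reflection.
    """
    if w == 0:
        return 0
    mask = (1 << w) - 1
    poly = ((1 << w) | 3) & mask  # low w bits; the x^w term is implicit
    reg = 0
    for byte in data:
        for i in range(7, -1, -1):  # most significant bit first
            bit = (byte >> i) & 1
            top = (reg >> (w - 1)) & 1
            reg = (reg << 1) & mask
            if top ^ bit:
                reg ^= poly
    return reg

for n_bytes in (16, 32, 64):
    w = pad_bits_needed(8 * n_bytes)
    print(n_bytes, "bytes ->", w, "padding bits")
# 16 bytes -> 2 padding bits
# 32 bytes -> 4 padding bits
# 64 bytes -> 3 padding bits
```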

since you cannot know if any given codex32 master seed was generated with this CRC or with random padding

Finding the book is reasonable suspicion for random padding IF all electronically encoded seeds use CRC. The person who made the backup certainly knows if they encoded with the book or not.

as the codex32 book does

A book insert could compute CRCs by hand, but I have not divided a large enough number to benchmark the time for 128 bits. Andrew and I determined the space requirement is two pieces of 11 x 8.5 graph paper.

If it is included, then it should also be stated that padding MAY be random.

Decoders MUST accept random padding, although they may someday warn on it.

For electronic encoders, trusting RNGs introduces risk: it can leak up to 32 bits to an attacker with 8 shares and breaks decode->encode round-trip. Using zero padding round-trips but leaks the final payload character to an attacker with (5-pad_len) / pad_len interpolated shares.

CRC pad minimizes RNG trust by using entropy already present in the payload bytes.

Perhaps it is better to move this to a separate PR if we want to further discuss it.

Let's discuss on my deterministic codex32 BIP85 PR where it's required since reviewers asked me to directly encode bytes not u5 ints even for share payloads.

roconnor · 2025-11-24T17:26:36Z

bip-0093.mediawiki

+def bech32_hrp_expand(s):
+  return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]
+
+def ms32_verify_checksum(hrp, data):


If you want an hrp parameter, you have to rename this function to something like codex32_verify_checksum.

roconnor · 2025-11-24T17:27:13Z

bip-0093.mediawiki

 As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
 For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
 If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
+The lowercase form is used when determining a character's value for checksum purposes.


This doesn't make sense. The lowercase form and uppercase form of Bech32 characters have the same value.

Not for the HRP, which needs to be lowercased during decoding or bech32_hrp_expand(hrp) would return a different result.

This line is repeated from the test vectors, why explain the rules about case in the vectors instead of up here?
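
A quick illustration of the HRP case point, using the `bech32_hrp_expand` function from the diff above: the low five bits of each letter are the same in either case, but the high bits differ, so an uppercase HRP feeds different values into the checksum unless it is lowercased first.

```python
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

print(bech32_hrp_expand("ms"))  # [3, 3, 0, 13, 19]
print(bech32_hrp_expand("MS"))  # [2, 2, 0, 13, 19]
```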

roconnor · 2025-11-24T17:36:47Z

bip-0093.mediawiki


-* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
-* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
+unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>


I'd be inclined to remove this unchecksummed string. I'm really nervous displaying strings without a checksum anywhere. They are very problematic.

If you insist on going into this much detail in this test vector I'd say use the following bullets

- Master seed (hex)
- master node xprv
- Payload
- HRP
- Identifier
- Checksum
- Secret seed

That's the order I'd use, but maybe some other permutations are also good.

How about since the text said:

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.

human-readable part: MS
k value: 0
identifier: 0C8V
share index: S
payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F

checksum: HPV80UNDVARHRAK

secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK

Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9

master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN

No information is displayed that we did not already display in Vector 1.
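
For reference, the field layout in the list above can be checked mechanically. This is a minimal sketch with a hypothetical `split_codex32` helper; it only slices the string by position (separator, k, 4-character identifier, share index, payload, checksum) and does not verify the checksum. The checksum length is passed in by the caller: 15 characters for this long string.

```python
from typing import NamedTuple

class Codex32Parts(NamedTuple):
    hrp: str
    k: str
    identifier: str
    share_index: str
    payload: str
    checksum: str

def split_codex32(s: str, checksum_len: int) -> Codex32Parts:
    """Hypothetical helper: split a codex32 string into fields by position only."""
    pos = s.rfind("1")  # the bech32 charset has no '1', so the last one is the separator
    hrp, data = s[:pos], s[pos + 1:]
    return Codex32Parts(
        hrp=hrp,
        k=data[0],
        identifier=data[1:5],
        share_index=data[5],
        payload=data[6:-checksum_len],
        checksum=data[-checksum_len:],
    )

SECRET = ("MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN07"
          "4RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK")
parts = split_codex32(SECRET, checksum_len=15)
print(parts.hrp, parts.k, parts.identifier, parts.share_index, parts.checksum)
# MS 0 0C8V S HPV80UNDVARHRAK
```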

roconnor · 2025-11-24T17:53:09Z

bip-0093.mediawiki

+* The data-part values:
+** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
+** An identifier consisting of 4 bech32 characters.
+*** We recommend the first 4 characters of the bech32-encoded BIP-0032 key fingerprint.


When some shares of a master seed are compromised, a user may wish to simply dispose of the remaining shares and rederive a new set of secret shares without the cost of sweeping their wallet. In such a case a user very much should use a fresh identifier so that they do not mix up their obsolete share data with their fresh shares.

At best a hardware wallet may suggest such an identifier, but only when the hardware wallet is generating a fresh master seed and thus knows that there are no other secret shares for the same secret floating around.

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

This is a good case for BIP85 derive codex32 application to avoid trusting or generating randomness for this.

If we want a default identifier for "reshares" too:

identifier = fingerprint(master_seed)[:2] + fingerprint(false_seed)[2:4]

Where false_seed is recovered from fresh initial shares (reducing k if needed).

During the first generation with k fresh shares, the two slices together produce the full fingerprint. If the identifier is unspecified, recommend this default for master seeds. Or if that collides, the next higher.
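
A minimal sketch of that identifier rule, assuming `fingerprint(x)` means the first 4 bech32 characters (top 20 bits) of a 4-byte BIP-0032 key fingerprint. Deriving the fingerprint itself needs secp256k1 and HASH160, so it is taken as an input here; `fingerprint_prefix` and `reshare_identifier` are hypothetical names, and the "next higher on collision" step is not covered.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def fingerprint_prefix(fp: bytes) -> str:
    """First 4 bech32 characters (top 20 bits) of a 4-byte fingerprint."""
    if len(fp) != 4:
        raise ValueError("expected a 4-byte fingerprint")
    top20 = int.from_bytes(fp, "big") >> 12
    return "".join(CHARSET[(top20 >> shift) & 31] for shift in (15, 10, 5, 0))

def reshare_identifier(master_fp: bytes, false_fp: bytes) -> str:
    """identifier = fingerprint(master_seed)[:2] + fingerprint(false_seed)[2:4]"""
    return fingerprint_prefix(master_fp)[:2] + fingerprint_prefix(false_fp)[2:4]

# Placeholder fingerprints, chosen only to make the slicing visible.
print(reshare_identifier(bytes(4), b"\xff" * 4))  # qqll
```

When both arguments are the same fingerprint, the result equals the plain 4-character prefix, which matches the first-generation case described above.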

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

Already done here. Agreed wrt details and vectors.

https://github.com/BenWestgate/bips/blob/a8f8e98d05a2183aba395f8f8ff479b4fb764f95/bip-0085.mediawiki#unshared-secret

#1958

I can move this recommendation (and the padding rec) into a separate BIP93 PR pending the final BIP85 codex32 design (needs finishing touches feedback). I’ll remove it from this PR so it remains focused on typos and general HRP support.

BenWestgate added 2 commits November 21, 2025 23:58

Generalize codex32 format for any hrp and fix typos

c6f8bd0

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.

Revert title for BIP93 document

aedb912

jonatack added the "Proposed BIP modification" and "Pending acceptance" (This BIP modification requires sign-off by the champion of the BIP being modified) labels Nov 22, 2025

roconnor reviewed Nov 24, 2025

BenWestgate Nov 25, 2025

3 participants
