BIP93: Generalize codex32 format for any hrp and fix typos #2040

BenWestgate · 2025-11-22T06:36:29Z

Summary of Changes:
Describe codex32 format for arbitrary human-readable parts not just "ms", specify master seed encoding standard, add new test vectors and enhance readability. This makes the document more like BIP-0173: proposing an encoding "codex32", then defining a standard for something using it.

See discussion on #2023 (comment).

Spec:

fixed the threshold mistake in the abstract
replaced "master seed" with "secret", prior to the "Master seed format" section and made descriptions hrp general
updated the checksum reference code to produce valid checksums for any hrp
change t to k to match the test vectors and book
defined "ms" codex32 secrets:
- using terms "secret seed" (as the book does) and "codex32-encoded master seed" to refer to "ms" codex32 secrets
- recommended using first 4 characters of the bech32-encoded fingerprint as the identifier
- recommended the padding bits be set with a CRC code for extra error detection. Provided reference code for this checksum.

Test Vectors:

Fixed the cornucopia of naming conventions in the Test vectors
- used mostly "secret seed", "codex32 secret", and "codex32-encoded X".
Fixed test vector 5 which did not actually append a long checksum to "random" data as the text said it would.
Added vector 6 encoding a "cl" prefix codex32-encoded HSM secret, then relabels the identifier (producing a new checksum and codex32-encoded HSM secret)
Added vector 7 which parses a "cl" prefix codex32 secret and decodes the HSM secret
Clarified why invalid prefix test vectors were bad (their checksum is for "ms" but their prefix is not "ms")
We might want to add one that uses "cl" with the old "ms" checksum code as that will now fail with the updated ms32_verify_checksum function

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.

roconnor · 2025-11-24T17:18:26Z

bip-0093.mediawiki

+errors. The human-readable part is processed by first
+feeding the higher bits of each character's US-ASCII value into the
+checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
+This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the


The lengths limitations of the codex32 strings are working under the assumption that the HRP is not subject to error correction. We more or less cannot do that anyways as all sorts of various bech32 formats have appeared all with different checksums and characteristics. In order to run the checksum algorithm you have to know the prefix first in order to know which checksum algorithm to try.

This isn't really a problem in practice since there are only a small finite number of prefixes, and from context only a few are going to be applicable anyways.

This was copied over from BIP-00173. Delete it?

Bech32 attempts to decode two checksums, a universal bech32 decoder could try decoding the string with the bech32, bech32m and codex32 checksums to discover the format.
Unless covering the HRP exceeds the max length at HD=9, 2 subsitutions in the HRP will always be detected by every format.

If HRP is swapped between formats the chances of false verification is:

1 in 2^65 for a "codex32 checksum" validating when the encoding was Bech32/Bech32m

~1 in 2^30 for "Bech32 checksum" validating when the encoding was Codex32.

This text was certainly the design goal of BIP-173, but we are not using their checksum, and we haven't realized this part of their design in codex32 in part because our 13 character checksum unfortunately works only on relatively short strings.

Instead we process this HRP in this way because that is what BIP-173 does, and we still want the HRP to change the residue to catch random errors, so me might as well do it in the standard way.

Unless covering the HRP exceeds the max length at HD=9, 2 subsitutions in the HRP will always be detected by every format.

The problem is that our particular 13 character checksum's max length for its error detection and correction properties is limited to 93 bech32 characters. That's why our payload is limited to 74 characters add in 13 character checksum and 6 characters for the header and we get 93 bech32 characters, with nothing left over to detect or correct errors in the HRP. Yes, in cases where the payload is 72 characters or less, our error correction / detection properties extend to the low 5 bits of the ascii characters of a 2 character prefix, but that doesn't apply to 73 or 74 character payloads.

I don't know if we really want to get into these subtleties. I'm not even sure correcting and detecting errors in the HRP is useful to begin. If you are a hardware wallet expecting a master seed and someone gives you a "cl" codex32 string, you don't need a fancy error correction algorithm to detect the "cl" prefix is wrong; if it is a expecting a master seed then the "cl" prefix must be wrong.

I will delete the footmark and say:
"The human-readable part is processed as per BIP-0173."

I updated the rationale accordingly:

At this length, the human-readable part is not covered by the checksum. This is acceptable because the checksum scheme itself requires you to know that a valid human-readable part is being used in the first place. If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected prefix.

I'm not even sure correcting and detecting errors in the HRP is useful to begin

wallets.md import guidance prefills the prefix in applications expecting only one to prevent mistakes.

A future application for extended keys (Is long codex32 HD=9 for 74 bytes?) has a situation where decoders need to accept both "xprv" and "xpub" HRPs in the same descriptor. So here it is absolutely useful to detect and correct errors in the HRP.

However we should not go into details about that until such an application needing to disambiguate different HRP actually exists.

What does HD=9 mean?

Hamming distance 9, able to detect 8 substitution errors. (and correct 4)

Yeah the 15 character checksum has HD=9 up to a maximum length 1023, supports data up to 1008 = (1023 - 15) characters and a payload of 1002 (=1008 - 6 header characters) bech32 characters. If you want to cover the HRP you lose payload capacity.

Plenty of room to long codex32 encode extended keys with covered human-readable parts "xpub", "xprv", "tprv" and "tpub" someday. They're horrible to handle and type in base58. Is this something we should think about in a separate PR?

I ran the numbers here and we must further restrict HRP lengths to 72 or 74 (same maximum as the payload). At len(hrp) = 74 two header characters two header characters aren't checked but what does a threshold or share index even mean for a 0 byte payload?

I also explicitly stated the maximum length for a codex32 string is 96. Although perhaps at length = 96 the HRP we assume "ms" while the max length for all other prefixes is 94?

bip-0093.mediawiki

roconnor · 2025-11-24T17:27:13Z

bip-0093.mediawiki

 As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
 For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
 If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
+The lowercase form is used when determining a character's value for checksum purposes.


This doesn't make sense. The lowercase form and uppercase form of Bech32 characters have the same value.

Not for HRP which needs to be lower cased during decoding or bech32_hrp_expand(hrp) would return a different result.

This line is repeated from the test vectors, why explain the rules about case in the vectors instead of up here?

I guess we should reword this to make it more clear that the relevance is for the HRP.

"When constructing or verifying a checksum, the human-readable part MUST be interpreted in lowercase, as specified in BIP-0173."

I might say "MUST be converted to lowercase" instead.

That seems to imply mutating the string when verifying a checksum.

Something BIP-93 omitted was that encoders should always emit lower-case strings. Did we relax that requirement?

Currently I have the sentence as:
"Encoders MUST emit lowercase; decoders MUST reject mixed-case and MUST lowercase the human-readable part during checksum verification."

And I am adding a section with a codex32_encode and codex32_decode definitions as I think it's easier to see these rules in code than english.

Uppercase/lowercase

The lowercase form is used when determining a character's value for checksum purposes.

Encoders MUST always output an all lowercase Bech32 string. If an uppercase version of the encoding result is desired, (e.g.- for presentation purposes, or QR code use), then an uppercasing procedure can be performed external to the encoding process.

Decoders MUST NOT accept strings where some characters are uppercase and some are lowercase (such strings are referred to as mixed case strings).

For presentation, lowercase is usually preferable, but inside QR codes uppercase SHOULD be used, as those permit the use of alphanumeric mode, which is 45% more compact than the normal byte mode.

roconnor · 2025-11-24T17:36:47Z

bip-0093.mediawiki


-* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
-* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
+unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>


I'd be included to remove this uncheckedsummed string. I'm really nervous displaying strings without a checksum anywhere. They are very problematic.

If you insist on going into this much detail in this test vector I'd say use the following bullets

Master seed (hex):

master node xprv

Payload

HRP

Identifier

Checksum

Secret seed

That's the order I'd use, but maybe some other permutations are also good.

How about since the text said:

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.

human-readable part: MS
k value: 0
identifier: 0C8V
share index: S
payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F

checksum: HPV80UNDVARHRAK

secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK

Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9

master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN

No information is displayed we did not already in Vector 1.

That would be better.

Would you rather just change V5 text to match master's vectors?

From:

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.

To:

This example shows the long codex32 format, when used without splitting the secret into any shares.

I somewhat prefer the current text.

Opps, I accidentally committed a fix that changed the text.

I'll revert it to the original text and change the vector data to my earlier comment in my next commit.

human-readable part: MS
k value: 0
identifier: 0C8V
share index: S

This info should be in sentence format not as a list because it gets double spaced in mediawiki and is hard to read.

Other vectors use data given in sentence format or on the main line and calculated data in bullets.

bip-0093.mediawiki

… case in checksum

bip-0093.mediawiki

BenWestgate · 2025-11-25T22:16:07Z

bip-0093.mediawiki

+* A human-readable part, which is intended to convey the type of data, or anything else that is relevant to the reader. This part MUST contain 1 to 83 US-ASCII characters, with each character having a value in the range [33-126]. HRP validity may be further restricted by specific applications.
+* A separator, which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator<ref>'''Why include a separator in codex32 strings?''' That way the human-readable


Do we need to be this wordy or should we say:

A human-readable part, as specified in BIP-0173, which is intended to convey the type of data, or anything else that is relevant to the reader.

A separator, as specified in BIP-0173, which is always "1".

BenWestgate · 2025-11-25T22:19:55Z

bip-0093.mediawiki

+* All strings have the same human-readable part, the same threshold value ''k'', the same identifier, and the same length.
+* All of the share index values are distinct.


remove the periods I inadvertently added to these bullets?

BenWestgate · 2025-11-25T22:25:46Z

bip-0093.mediawiki

+# Choose a threshold value ''k'' between 2 and 9, inclusive
 # Choose a 4 bech32 character identifier
-#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
+#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate


Should we say "secret or set of shares" because this is the reshare case you mentioned that SHOULD have a unique identifier?
Here we make it sound like it's OK to reuse an identifier if the secret is the same which is false.

Yes, set of shares.

BenWestgate · 2025-11-25T22:30:23Z

bip-0093.mediawiki

 # Set the share index to <code>s</code>
-# Set the payload to a bech32 encoding of the master seed, padded with arbitrary bits
-# Generating a valid checksum in accordance with the Checksum section
+# Set the payload to a bech32 encoding of the secret, padded with arbitrary bits


Change this to "secret data" or "secret bytes" to specify its encoding bytes or leave it? There may be some confusion between "secret" meaning the string with share index "s" and the decoded payload bytes of that [codex32] secret.

"secret data" sounds good.

BenWestgate added 2 commits November 21, 2025 23:58

Generalize codex32 format for any hrp and fix typos

c6f8bd0

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.

Revert title for BIP93 document

aedb912

jonatack added Proposed BIP modification Pending acceptance This BIP modification requires sign-off by the champion of the BIP being modified labels Nov 22, 2025

roconnor reviewed Nov 24, 2025

View reviewed changes

bip-0093.mediawiki Outdated Show resolved Hide resolved

BenWestgate commented Nov 25, 2025

View reviewed changes

bip-0093.mediawiki Outdated Show resolved Hide resolved

BenWestgate commented Nov 25, 2025

View reviewed changes

bip-0093.mediawiki Outdated Show resolved Hide resolved

BenWestgate mentioned this pull request Nov 25, 2025

doc: Add unshared secret generation to wallets.md BlockstreamResearch/codex32#71

Draft

Rename ms32 functions to codex32, remove recommendations, clarify HRP…

a4f1e91

… case in checksum

BenWestgate commented Nov 25, 2025

View reviewed changes

bip-0093.mediawiki Show resolved Hide resolved

BenWestgate commented Nov 25, 2025

View reviewed changes

		* A human-readable part, which is intended to convey the type of data, or anything else that is relevant to the reader. This part MUST contain 1 to 83 US-ASCII characters, with each character having a value in the range [33-126]. HRP validity may be further restricted by specific applications.
		* A separator, which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator<ref>'''Why include a separator in codex32 strings?''' That way the human-readable

		* All strings have the same human-readable part, the same threshold value ''k'', the same identifier, and the same length.
		* All of the share index values are distinct.

BIP93: Generalize codex32 format for any hrp and fix typos #2040

Are you sure you want to change the base?

BIP93: Generalize codex32 format for any hrp and fix typos #2040

Conversation

BenWestgate commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenWestgate Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

BenWestgate commented Nov 22, 2025 •

edited

Loading

BenWestgate Nov 25, 2025 •

edited

Loading