BIP93: Generalize codex32 format for any hrp and fix typos #2040
base: master
Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.
errors. The human-readable part is processed by first
feeding the higher bits of each character's US-ASCII value into the
checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the
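To make the "[high hrp] 0 [low hrp] [data]" layout concrete, here is a minimal Python sketch of the HRP expansion for "ms". It mirrors the `bech32_hrp_expand` helper from BIP-0173; `checksummed_values` is a hypothetical wrapper name used only for illustration.

```python
def bech32_hrp_expand(hrp: str) -> list[int]:
    # High bits (ord >> 5) of each HRP character, then a zero separator,
    # then the low 5 bits (ord & 31) of each character.
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def checksummed_values(hrp: str, data: list[int]) -> list[int]:
    # Hypothetical helper: the value sequence fed to the checksum,
    # i.e. [high hrp] 0 [low hrp] [data].
    return bech32_hrp_expand(hrp) + data


print(bech32_hrp_expand("ms"))  # [3, 3, 0, 13, 19]
```

Because "m" and "s" share the same high bits (3), the first half of the expansion is short and uniform, and any change to the HRP still alters the low-bit half.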
The length limits of codex32 strings assume that the HRP is not subject to error correction. We more or less cannot do that anyway, since many bech32 formats have appeared, each with different checksums and characteristics. To run the checksum algorithm, you have to know the prefix first in order to know which checksum algorithm to try.
This isn't really a problem in practice, since there is only a small, finite number of prefixes, and from context only a few are going to be applicable anyway.
This was copied over from BIP-0173. Delete it?
Bech32 decoders already try two checksums; a universal bech32 decoder could try decoding the string with the bech32, bech32m, and codex32 checksums to discover the format.
Unless covering the HRP exceeds the maximum length at HD=9, 2 substitutions in the HRP will always be detected by every format.
If the HRP is swapped between formats, the chance of false verification is:
- 1 in 2^65 for a "codex32 checksum" validating when the encoding was Bech32/Bech32m
- ~1 in 2^30 for "Bech32 checksum" validating when the encoding was Codex32.
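A minimal sketch of the "try every checksum" idea, assuming a decoder that only knows bech32 (BIP-0173, final residue 1) and bech32m (BIP-0350, final residue 0x2bc830a3). The codex32 checksum uses a different, longer generator defined in BIP-93 and would be a third candidate; it is omitted here. `detect_checksum` and `create_checksum` are hypothetical helper names.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
# Final residue each scheme expects (BIP-0173 bech32, BIP-0350 bech32m).
CONSTANTS = {"bech32": 1, "bech32m": 0x2BC830A3}


def bech32_polymod(values):
    # BCH checksum over GF(32) as specified in BIP-0173.
    gen = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = (chk & 0x1FFFFFF) << 5 ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk


def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp, data, const):
    # Six checksum characters that make the final residue equal `const`.
    polymod = bech32_polymod(bech32_hrp_expand(hrp) + data + [0] * 6) ^ const
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def detect_checksum(s):
    """Return the names of the schemes whose checksum `s` satisfies."""
    if s != s.lower() and s != s.upper():
        return []  # mixed case is never valid
    s = s.lower()
    pos = s.rfind("1")
    if pos < 1 or len(s) - pos - 1 < 6:
        return []  # need a non-empty HRP and at least 6 checksum characters
    if any(c not in CHARSET for c in s[pos + 1:]):
        return []
    hrp = s[:pos]
    data = [CHARSET.index(c) for c in s[pos + 1:]]
    residue = bech32_polymod(bech32_hrp_expand(hrp) + data)
    return [name for name, const in CONSTANTS.items() if residue == const]


print(detect_checksum("A12UEL5L"))  # ['bech32']
print(detect_checksum("a1lqfn3a"))  # ['bech32m']
```

Since each scheme only differs in the expected residue (and, for codex32, the generator), a decoder can compute the residue once per generator and compare it against every constant it knows.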
*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
*** Translate those bits to characters using the bech32 character table from BIP-0173.
When padding bits are needed they should be generated using CRC polynomial <code>(1 << pad_len) | 3</code> with an initial value of <code>0</code> and appended to the master seed bits. Note that unlike the codex32 checksums, we do NOT include the header data.
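A minimal Python sketch of the encoding steps above with the proposed CRC padding. It follows the diff line's wording (generator <code>(1 << pad_len) | 3</code>, i.e. x^w + x + 1, initial value 0) and additionally assumes MSB-first processing, no reflection, and no final XOR, which the line does not state. The parameters proposed later in this thread (init = 1, a constant) are not modeled. All function names are hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def seed_bits(seed: bytes) -> list[int]:
    """Master seed bytes as a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in seed for i in range(8)]


def crc_padding(bits: list[int], w: int) -> list[int]:
    """w-bit CRC of `bits` with generator (1 << w) | 3 (x^w + x + 1).

    MSB-first, initial value 0, no reflection, no final XOR, so the result
    is the remainder of bits(x) * x^w modulo the generator.
    """
    poly = (1 << w) | 3
    mask = (1 << w) - 1
    reg = 0
    for bit in bits:
        feedback = ((reg >> (w - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= poly & mask
    return [(reg >> (w - 1 - i)) & 1 for i in range(w)]


def seed_to_payload(seed: bytes, use_crc: bool = True) -> str:
    """Regroup seed bits into 5-bit bech32 characters, padding the last group."""
    bits = seed_bits(seed)
    pad_len = -len(bits) % 5  # 2 bits for 16 bytes, 4 for 32, 3 for 64
    if pad_len:
        bits += crc_padding(bits, pad_len) if use_crc else [0] * pad_len
    return "".join(
        CHARSET[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )


print(seed_to_payload(bytes(16)))  # 26 'q' characters
```

Because the padding is a deterministic function of the seed bits, decoding and re-encoding such a payload reproduces it exactly, whereas random padding does not.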
I don't really want this CRC stuff in the standard. At most it is a recommendation that folks MAY use to select padding bits, but I doubt it will be useful in practice, since you cannot know if any given codex32 master seed was generated with this CRC or with random padding as the codex32 book does. If one's seed is so corrupted that the codex32 error correction wasn't able to fix it, I'm skeptical a few more bits will help.
If it is included, then "padding MAY be random" should also be stated. Perhaps it is better to move this to a separate PR if we want to discuss it further.
I'll replace it with one sentence for now:
Encoders MAY select padding using a CRC-w where: w = pad_bits_needed, poly = 1 << w | 3, init = 1, const = 1 << w - 1, refIn = false, and refOut = false.
CRC-4 helps 32-byte seeds detect 94% of single-character errors or deletions and narrows a single erasure down to 2 candidates. The most common substitutions in CHARSET are 1 bit apart, so actual performance will exceed that.
since you cannot know if any given codex32 master seed was generated with this CRC or with random padding
Finding the book gives reasonable suspicion of random padding IF all electronically encoded seeds use the CRC. The person who made the backup certainly knows whether they encoded with the book or not.
as the codex32 book does
A book insert could compute CRCs by hand. I have not divided a large enough number by hand to benchmark the time for 128 bits, but Andrew and I determined the space requirement is two pieces of 11 x 8.5 graph paper.
If it is included, then "padding MAY be random" should also be stated.
Decoders MUST accept random padding, although they may someday warn about it.
For electronic encoders, trusting RNGs introduces risk: it can leak up to 32 bits to an attacker with 8 shares, and it breaks the decode->encode round trip. Zero padding round-trips, but it leaks the final payload character to an attacker with (5-pad_len) / pad_len interpolated shares.
CRC padding minimizes RNG trust by using entropy already present in the payload bytes.
Perhaps it is better to move this to a separate PR if we want to further discuss it.
Let's discuss it on my deterministic codex32 BIP85 PR, where it's required, since reviewers asked me to encode bytes directly rather than u5 ints, even for share payloads.
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def ms32_verify_checksum(hrp, data):
If you want an hrp parameter, you have to rename this function to something like codex32_verify_checksum.
Will do
As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
The lowercase form is used when determining a character's value for checksum purposes.
This doesn't make sense. The lowercase form and uppercase form of Bech32 characters have the same value.
Not for the HRP, which needs to be lowercased during decoding, or bech32_hrp_expand(hrp) would return a different result.
This line is repeated from the test vectors; why explain the rules about case in the vectors instead of up here?
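A minimal sketch of the decoder-side case handling discussed here: reject mixed case, lowercase the whole string, and only then split off the HRP. `normalize_case` and `split_hrp` are hypothetical names; the point is that data characters map to the same value in either case, while the HRP's expansion does not.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_hrp_expand(hrp: str) -> list[int]:
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def normalize_case(s: str) -> str:
    """Reject mixed case, then return the lowercase form used for checksumming."""
    if s != s.lower() and s != s.upper():
        raise ValueError("codex32 strings must be all uppercase or all lowercase")
    return s.lower()


def split_hrp(s: str) -> tuple[str, list[int]]:
    s = normalize_case(s)
    pos = s.rfind("1")
    if pos < 1:
        raise ValueError("missing human-readable part or separator")
    # CHARSET.index raises ValueError for characters outside the bech32 set.
    return s[:pos], [CHARSET.index(c) for c in s[pos + 1:]]


# Uppercase HRP characters have different high bits than lowercase ones:
print(bech32_hrp_expand("MS"))  # [2, 2, 0, 13, 19]
print(bech32_hrp_expand("ms"))  # [3, 3, 0, 13, 19]
```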
* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>
I'd be inclined to remove this unchecksummed string. I'm really nervous about displaying strings without a checksum anywhere. They are very problematic.
If you insist on going into this much detail in this test vector, I'd say use the following bullets:
- Master seed (hex):
- master node xprv
- Payload
- HRP
- Identifier
- Checksum
- Secret seed
That's the order I'd use, but maybe some other permutations are also good.
How about this, since the text said:
This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
- human-readable part: MS
- k value: 0
- identifier: 0C8V
- share index: S
- payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F
- checksum: HPV80UNDVARHRAK
- secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
- Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
- master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN
No information is displayed that we did not already show in Vector 1.
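The field breakdown above can be reproduced mechanically. This is a minimal sketch that splits a codex32 string into those fields and regroups the payload into the master seed bytes; it does not verify the checksum. It assumes the BIP-93 rule that data parts of 96 or more characters carry the 15-character long checksum and shorter ones the 13-character checksum. `parse_codex32` and `payload_to_seed` are hypothetical names.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def parse_codex32(s: str) -> dict:
    """Split a codex32 string into its fields. The checksum is NOT verified."""
    if s != s.lower() and s != s.upper():
        raise ValueError("mixed case")
    s = s.lower()
    pos = s.rfind("1")
    if pos < 1:
        raise ValueError("missing separator")
    hrp, data = s[:pos], s[pos + 1:]
    # Assumed BIP-93 rule: data parts of 96+ characters use the
    # 15-character long checksum, shorter ones the 13-character checksum.
    checksum_len = 15 if len(data) >= 96 else 13
    return {
        "hrp": hrp,
        "k": data[0],
        "identifier": data[1:5],
        "share_index": data[5],
        "payload": data[6:-checksum_len],
        "checksum": data[-checksum_len:],
    }


def payload_to_seed(payload: str) -> bytes:
    """Concatenate 5-bit values and drop the trailing (< 8) padding bits."""
    bits = "".join(format(CHARSET.index(c), "05b") for c in payload)
    usable = len(bits) - len(bits) % 8
    return int(bits[:usable], 2).to_bytes(usable // 8, "big")


share = "MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK"
fields = parse_codex32(share)
print(fields["identifier"], fields["share_index"], len(fields["payload"]))  # 0c8v s 103
print(payload_to_seed(fields["payload"]).hex()[:12])  # dc5423251cb8
```

The 103-character payload holds 515 bits, so the last 3 bits are padding and the remaining 512 bits are the master seed.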
* The data-part values:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
** An identifier consisting of 4 bech32 characters.
*** We recommend the first 4 characters of the bech32-encoded BIP-0032 key fingerprint.
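A minimal sketch of the recommended identifier, assuming "bech32-encoded" means regrouping the 4-byte fingerprint MSB-first into 5-bit characters as in the payload encoding, so the first 4 characters cover its top 20 bits. Computing the fingerprint itself (the first 4 bytes of HASH160 of the master public key) needs secp256k1 and RIPEMD-160 and is left out; the sketch takes the fingerprint bytes as input. `fingerprint_identifier` is a hypothetical name.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def fingerprint_identifier(fingerprint: bytes) -> str:
    """First 4 bech32 characters (top 20 bits) of a 4-byte BIP-32 fingerprint."""
    if len(fingerprint) != 4:
        raise ValueError("BIP-32 fingerprints are 4 bytes")
    value = int.from_bytes(fingerprint, "big") >> 12  # keep the top 20 bits
    return "".join(CHARSET[(value >> (5 * (3 - i))) & 31] for i in range(4))


# Arbitrary example bytes, not a real fingerprint:
print(fingerprint_identifier(bytes.fromhex("dc542325")))  # m32z
```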
When some shares of a master seed are compromised, a user may wish to simply dispose of the remaining shares and re-derive a new set of secret shares without the cost of sweeping their wallet. In such a case, a user very much should use a fresh identifier so that they do not mix up their obsolete share data with their fresh shares.
At best, a hardware wallet may suggest such an identifier, but only when the hardware wallet is generating a fresh master seed and thus knows that there are no other secret shares for the same secret floating around.
Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.
This is a good case for a BIP85 application that derives codex32, which avoids trusting or generating randomness for this.
If we want a default identifier for "reshares" too:
identifier = fingerprint(master_seed)[:2] + fingerprint(false_seed)[2:4]
where false_seed is recovered from fresh initial shares (reducing k if needed).
During the first generation with k fresh shares, the two slices together produce the full fingerprint. If the identifier is unspecified, recommend this default for master seeds, or, if that collides, the next higher.
Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.
Already done here. Agreed wrt details and vectors.
I can move this recommendation (and the padding recommendation) into a separate BIP93 PR, pending the final BIP85 codex32 design (which still needs feedback on finishing touches). I’ll remove it from this PR so it remains focused on typos and general HRP support.
Summary of Changes:
Describe codex32 format for arbitrary human-readable parts not just "ms", specify master seed encoding standard, add new test vectors and enhance readability. This makes the document more like BIP-0173: proposing an encoding "codex32", then defining a standard for something using it.
See discussion on #2023 (comment).
Spec:
hrp
Test Vectors:
ms32_verify_checksum function