Remove Ambiguity of Script ASM Hex and Decimal Integer Representations #27795
Comments
Related to #7996. |
I wonder why we even try to detect and print numbers differently (and even more so in a different base!), just print all pushes in hex and there is no ambiguity anymore? |
They are two different numbers. One is Displaying all numbers as hex is an option. One problem is that they are given as little-endian which could be confusing for some wishing to interpret them. Off the top of my memory, the scripting interpreter treats 32-bit numbers arithmetically so I think it makes sense to display those separately. The only difference should be that hex and decimal should be distinguished. |
Sure, I misspoke. I just meant:
>>> int.from_bytes(bytes.fromhex("57c74942"), "little")
1112131415 |
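For readers who want to reproduce the ambiguity locally, here is a minimal sketch of the legacy display rule as described in this thread (pushes of 4 bytes or fewer shown as decimal script numbers, longer pushes shown as hex). The function names are made up for illustration and are not Bitcoin Core's; the sketch only handles direct pushes (opcodes 0x01..0x4b) and is not a full script parser.

```python
def decode_script_num(data: bytes) -> int:
    """Decode a little-endian, sign-magnitude script number."""
    if not data:
        return 0
    magnitude = int.from_bytes(data, "little")
    sign_bit = 1 << (8 * len(data) - 1)  # top bit of the last byte
    if magnitude & sign_bit:
        return -(magnitude & ~sign_bit)
    return magnitude


def parse_direct_pushes(script: bytes) -> list[bytes]:
    """Split a script made only of direct pushes (opcodes 0x01..0x4b)."""
    pushes = []
    i = 0
    while i < len(script):
        n = script[i]
        if not 1 <= n <= 0x4B:
            raise ValueError(f"unsupported opcode 0x{n:02x}")
        if i + 1 + n > len(script):
            raise ValueError("push runs past end of script")
        pushes.append(script[i + 1 : i + 1 + n])
        i += 1 + n
    return pushes


def legacy_asm(script: bytes) -> str:
    """Assumed legacy rule: <= 4 bytes as decimal, otherwise hex."""
    parts = []
    for data in parse_direct_pushes(script):
        if len(data) <= 4:
            parts.append(str(decode_script_num(data)))
        else:
            parts.append(data.hex())
    return " ".join(parts)


# The script from the issue: a 5-byte push followed by a 4-byte push.
print(legacy_asm(bytes.fromhex("0511121314150457c74942")))
```

Both pushes render as the same text even though the underlying bytes differ, which is the collision the issue describes.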
It's a little unclear to me how the ASM output is used on the consumer side, and therefore how breaking any changes to it might be. However I still think we have a few options to resolve this:
Full hex:
$ /home/will/src/bitcoin/src/bitcoin-cli -regtest decodescript 0511121314150457c74942
{
"asm": "1112131415 57c74942",
Prefix hex values with 0x:
$ /home/will/src/bitcoin/src/bitcoin-cli -regtest decodescript 0511121314150457c74942
{
"asm": "0x1112131415 1112131415",
Prefix decimal values with 0d:
$ /home/will/src/bitcoin/src/bitcoin-cli -regtest decodescript 0511121314150457c74942
{
"asm": "1112131415 0d1112131415",
As I think Antoine indicated, I'd have a preference for "full hex". However I think there may be a small quality of life advantage to prefixing decimal values for persons manually crafting or reading scripts using ASM notation, as e.g. timelock values are most easily thought about in decimal. Edit: Curious if anyone knows the original rationale for displaying small values as decimal, it was carved out by Satoshi. |
Closes: bitcoin#27795 Previously ScriptToAsmStr returned hex-encoded integers, except if data length was <= 4 bytes, in which case it displayed using decimal encoding. This was originally added by Satoshi in 4405b78/script.h#L298-L305 Remove the decimal carve-out for small pushes.
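The three candidate renderings discussed above (full hex, 0x-prefixed hex, 0d-prefixed decimal) can be compared side by side with a rough sketch. The style names and helper functions here are hypothetical, chosen only for this illustration; the sketch handles direct pushes only and assumes the same 4-byte threshold as the legacy rule.

```python
def pushes(script: bytes):
    """Yield the data of each direct push (opcodes 0x01..0x4b only)."""
    i = 0
    while i < len(script):
        n = script[i]
        if not 1 <= n <= 0x4B or i + 1 + n > len(script):
            raise ValueError("only well-formed direct pushes are supported")
        yield script[i + 1 : i + 1 + n]
        i += 1 + n


def script_num(data: bytes) -> int:
    """Decode a little-endian, sign-magnitude script number."""
    if not data:
        return 0
    magnitude = int.from_bytes(data, "little")
    sign_bit = 1 << (8 * len(data) - 1)
    return -(magnitude & ~sign_bit) if magnitude & sign_bit else magnitude


def render(script: bytes, style: str) -> str:
    """Render pushes in one of three illustrative styles."""
    if style not in ("full_hex", "hex_prefix", "dec_prefix"):
        raise ValueError(f"unknown style: {style}")
    out = []
    for data in pushes(script):
        if style == "full_hex" or len(data) > 4:
            # Hex rendering; only the hex_prefix style marks it with 0x.
            text = data.hex()
            if style == "hex_prefix":
                text = "0x" + text
        else:
            # Small pushes stay decimal; only dec_prefix marks them with 0d.
            text = str(script_num(data))
            if style == "dec_prefix":
                text = "0d" + text
        out.append(text)
    return " ".join(out)


SCRIPT = bytes.fromhex("0511121314150457c74942")
for style in ("full_hex", "hex_prefix", "dec_prefix"):
    print(style, "->", render(SCRIPT, style))
```

Each style yields two distinguishable tokens for the example script, unlike the legacy output.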
It is because any numerical/arithmetic operations only allow 32-bit values (please see the |
Well, they would still be 32 bit values, just hex-encoded? But I take the point. I personally think prefixing hex with Is prefixing decimals with I am still unclear on how much impact this would have downstream, and therefore whether any of these discussed changes would be accepted in any form. It seems like this could break much of the little scripting tooling we have available for bitcoin... |
Is it a problem prefixing public keys with
If these tools read these ambiguous numbers/data from ASM, then they are already broken, so this needs fixing one way or another. |
Maybe ajtowns@36cbf11 ? It adds a "0x" if the hex is ambiguous (ie none of the digits are a-f). Presumably almost all pubkeys will not be ambiguous (1-in-12-trillion chance) so the 0x won't get added there and confuse people.
Another approach could be to add the prefix for hex strings of 5 bytes (10 hex digits) -- anything that would decode to 8 or fewer hex digits would be decoded as decimal anyway, and anything with >11 decimal digits would be more than 4 bytes originally so would not have been decoded as decimal. |
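Both heuristics in this comment are simple to state in code. The following is only a sketch of the idea as described here, not the code in the linked commit; the function names are invented for illustration, and both functions are meant for pushes that the legacy rule would already show as hex (more than 4 bytes).

```python
DECIMAL_DIGITS = set("0123456789")


def prefix_if_ambiguous(data: bytes) -> str:
    """Add 0x only when the hex text contains no a-f digit,
    i.e. when it could be mistaken for a decimal number."""
    text = data.hex()
    if set(text) <= DECIMAL_DIGITS:
        return "0x" + text
    return text


def prefix_if_five_bytes(data: bytes) -> str:
    """Alternative from the comment: prefix only 5-byte (10 hex digit) pushes."""
    text = data.hex()
    return "0x" + text if len(data) == 5 else text


print(prefix_if_ambiguous(bytes.fromhex("1112131415")))  # all digits 0-9
print(prefix_if_ambiguous(bytes.fromhex("57c74942aa")))  # contains a-f
print(prefix_if_five_bytes(bytes.fromhex("1112131415")))
```

The first variant leaves typical public keys and hashes untouched, since they almost always contain at least one a-f digit; the second trades that precision for a rule that depends only on length.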
Closes: bitcoin#27795 Closes: bitcoin#7996 Previously ScriptToAsmStr returned hex-encoded integers, except if data length was <= 4 bytes, in which case it displayed using decimal encoding. This was originally added by Satoshi in 4405b78/script.h#L298-L305 Remove the decimal carve-out for small pushes.
Even aside from the main issue, it's still ambiguous for the various opcode possibilities... :/ |
Closes: bitcoin#27795 Closes: bitcoin#7996 Previously ScriptToAsmStr returned hex-encoded integers, except if data length was <= 4 bytes, in which case it displayed using decimal encoding. This was originally added by Satoshi in 4405b78/script.h#L298-L305 Remove the decimal carve-out for small pushes. Github-Pull: bitcoin#28824 Rebased-From: d6c0c4b
The discussion here led to #28824 being opened, where more discussion happened, and the idea evolved further. Having re-read things there, I have a concrete proposal that perhaps goes a bit further, so I'm posting it here to keep PR discussion about the implementation. The goal is addressing ambiguity and readability, but not necessarily compatibility (because being compatible with something broken is silly).
Decoding rules
Even if all we expose on the RPC/user side is encoding, I believe it's useful to formally specify the decoding rules, as they help understand how the encoding can be unambiguous. If this is implemented, I think we want the decoder written too, so that e.g. round-trip fuzz tests can be added for the encoding.
Notably, this enables recursive encoding (e.g. a 2-of-3 multisig P2SH scriptSig could become
The ambiguity of asm scripts that consist entirely of an even number of decimal characters is resolved in favor of treating them as raw hex rather than as a decimal integer, because:
Compared to the earlier discussion, it uses It drops
Encoding rules
The format allows multiple possible encodings for the same thing (e.g. script byte 00 aka OP_0 can be written as
If everything, or a large fraction of the script, is unparsable just hex encode instead. |
Please describe the feature you'd like to see added.
Distinguish between decimal and hex integers in the script ASM to avoid ambiguity. Hex integers/data can be prefixed with 0x to avoid ambiguity.
Is your feature related to a problem, if so please describe it.
When scripts are decoded into ASM, two different integers can be displayed identically, with one as hex and the other as decimal. decodescript 0511121314150457c74942 produces ASM of "1112131415 1112131415", despite the integers being different. There is no distinction between hex and decimal.
Describe the solution you'd like
Prefixing 0x makes the most sense to me.
Describe any alternatives you've considered
All data could be provided as hex, or "d" could be added to decimals. Removing ambiguity is the primary concern here.
Please leave any additional context
No response