-
Notifications
You must be signed in to change notification settings - Fork 36.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let validateaddress locate error in Bech32 address #16807
Let validateaddress locate error in Bech32 address #16807
Conversation
See #16779 |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, light review, haven't compared with reference code.
Would it be possible to add functions to key_io that provide this functionality for all addresses (with just basic error messages for anything but bech32)? That way the RPC code doesn't need to directly invoke the bech32 module. |
+1 @sipa suggestion, something like CTxDestination DecodeDestination(const std::string& str, std::string& error);
CTxDestination DecodeDestination(const std::string& str)
{
const std::string& error;
return DecodeDestination(str, error);
} And maybe also embed the error index in the message for now. |
I've added boost tests, fixed the above nits, and added a final commit with one implementation of the above idea. I can drop the final commit if it's not a good approach, but this is the cleanest way I could come up with |
Rebased |
Concept ACK. I would prefer to make Distinguishing between base58 and bech32 could be done by counting how many characters of the address are part of the base58/bech32 character set. You can then use |
Once this is merged, I have the GUI changes ready to go... |
Github-Pull: bitcoin#16807 Rebased-From: 54e107a
Github-Pull: bitcoin#16807 Rebased-From: 54e107a
Why is this limited to only returning a single error position? Fairly simple code can give up-to two, and it's not hard to imagine fancier code that can sometimes return more than that. |
Concept ACK to adding Bech32 error location to
|
Fair points, I'll modify to return more than one error. Re. address type, we discussed it a while back in IRC, I can't remember if there was a reason we didn't base it solely on HRP or not, but I am happy to do it that way if you want because I can't see a real downside now. I still have to fix the current error and rebase too. |
Github-Pull: bitcoin#16807 Rebased-From: 54e107a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of touch-ups to the tests needed when you rebase.
@gmaxwell you may also like to review 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 88cc481 with caveat that I only checked the new LocateErrors
code to try to verify it didn't have unsafe or unexpected operations or loop forever or crash. Did not try to verify behavior corresponds to the spec. In the worst case bugs here should just affect error messages not actual decoding of addresses so this seemed ok.
I left various review suggestions, but feel to ignore them or just use ones that seem useful.
I do think it should be possible to add more test coverage. I think it would be nice to have a fuzz test that would encode a random address, decode it and verify the address was the same, introduce random errors, and verify the errors were detected at the right locations. Also maybe there could be unit tests or constexpr checks that verify the contents of the new GF1024_EXP / GF1024_LOG arrays.
} | ||
} | ||
|
||
/** Return index of first invalid character in a Bech32 string. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add Bech32 error location function" (b62b67e)
Comment is out of date
size_t length = str.size() - 1 - pos; // length of data part | ||
data values(length); | ||
for (size_t i = pos + 1; i < str.size(); ++i) { | ||
unsigned char c = str[i]; | ||
int8_t rev = CHARSET_REV[c]; | ||
if (rev == -1) { | ||
error_locations.push_back(i); | ||
return "Invalid Base 32 character"; | ||
} | ||
values[i - pos - 1] = rev; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add Bech32 error location function" (b62b67e)
These lines could be moved into a function and deduplicated with lines 415-424 above.
Maybe not worth it because a PR that affects address decoding is more dangerous than a PR that just changes error detection.
// Every non-zero element of the field can then be represented as (e)^k for some power k. | ||
// The array GF1024_EXP contains all these powers of (e) - GF1024_EXP[k] = (e)^k in GF(1024). | ||
// Conversely, GF1024_LOG contains the discrete logarithms of these powers, so | ||
// GF1024_LOG[GF1024_EXP[k]] == k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add lots of comments to Bech32" (5599813)
Could write a unit test or constexpr compile time test to check this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be simple to add yep 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even nicer would be to have the tables be generated at compile time using a constexpr
algorithm...
// Set potential error message. | ||
// This message may be changed if the address can also be interpreted as a Bech32 address. | ||
error_str = "Invalid prefix for Base58-encoded address"; | ||
if (!std::equal(script_prefix.begin(), script_prefix.end(), data.begin()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "More detailed error checking for base58 addresses" (0b06e72)
Is there something which guarantees data
size is greater or equal to script_prefix
and pubkey_prefix
sizes? If not then, it seems like this can read uninitialized memory.
Would suggest either adding assert
's to make it clear these cases won't happen, or else adding size checks to the if
statement to handle these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanofsky Agree, I think this needs a length check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error: attempt to subscript a past-the-end iterator 1 step from its current
position, which falls outside its dereferenceable range.
} else if (!is_bech32) { | ||
// Try Base58 decoding without the checksum, using a much larger max length | ||
if (!DecodeBase58(str, data, 100)) { | ||
error_str = "Invalid HRP or Base58 character in address"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "More detailed error checking for base58 addresses" (0b06e72)
IMO previous error message "Invalid address format" seems more understandable if you aren't an address format guru than "Invalid HRP or Base58 character in address". Maybe consider keeping the existing message or switching to something in between like "Not a valid BECH32 or Base58 encoding"
}; | ||
static const std::pair<std::string, int> ERRORS[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add boost tests for bech32 error detection" (c4979f7)
Could be good to static_assert
that ERRORS and CASES arrays are the same size.
std::string error = bech32::LocateErrors(str, error_locations); | ||
BOOST_CHECK_EQUAL(err.first, error); | ||
if (err.second == -1) BOOST_CHECK(error_locations.empty()); | ||
else BOOST_CHECK_EQUAL(err.second, error_locations[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add boost tests for bech32 error detection" (c4979f7)
Could be good to BOOST_CHECK_EQUAL(error_locations.size(), 1))
.
Could also be good to add cases with size > 1 here, but less important since python tests cover some of these.
Same suggestions could apply to bech32m test below
int l_e1 = l_s0 + (1023 - 997) * p1; | ||
// Finally, some sanity checks on the result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add lots of comments to Bech32" (5599813)
If check below is supposed to be a sanity check, shouldn't there be an assert, or a log, or a push_back(-1)
, or some other indication if it fails?
Same comment applies to other sanity checks and continue statements below. The descriptions here seem like they could be clearer about which of these conditions are possible to hit and which should be impossible.
error_locations = std::move(possible_errors); | ||
} | ||
} | ||
return "Invalid checksum"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add Bech32 error location function" (b62b67e)
Might be nice if this mentioned the encoding that was detected. I.e. add an optional<Encoding> error_encoding
variable, and set:
error_locations = std::move(possible_errors);
error_encoding = encoding;
and return:
return error_encoding == BECH32M ? "Invalid bech32m checksum" : error_encoding == BECH32 ? "Invalid bech32 checksum" : "Invalid checksum"
Updated RPCs | ||
------------ | ||
|
||
- The `validateaddress` RPC now optionally returns an `error_locations` array, with the indices of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add release notes for validateaddress Bech32 error detection" (2eb5792)
Maybe mention that validateaddress returns new more specific error strings if an address can't be decoded.
Also "optionally returns" here seems a little misleading since this isn't an option the caller can control. Would maybe just say it returns error locations whenever it's possible to determine them.
Thanks for the detailed review and suggestions @ryanofsky. I'll definitely either incorporate them here or in a follow-up :) |
@@ -23,7 +23,7 @@ std::string EncodeExtPubKey(const CExtPubKey& extpubkey); | |||
|
|||
std::string EncodeDestination(const CTxDestination& dest); | |||
CTxDestination DecodeDestination(const std::string& str); | |||
CTxDestination DecodeDestination(const std::string& str, std::string& error_msg); | |||
CTxDestination DecodeDestination(const std::string& str, std::string& error_msg, std::vector<int>* error_locations = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider another DecodeDestination
overload to avoid pointer to vector: promag@94ca6b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion seems fine, but no reasoning is given, and it's not clearly an improvement. Google's "Use non-const pointers to represent optional outputs" which is what current code is doing does make it easier to distinguish output arguments from input arguments at call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @ryanofsky here. References are hard to read as output arguments.
If you're going to make a separate function declaration, I'd prefer a function that returns it as part of the return value. E.g. as a pair or tuple.
But this is fine as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 88cc481
The command below returns the "error_locations" field on the PR branch.
./src/bitcoin-cli validateaddress bc1qctxsy54qjhk0dl5ea0pd2xtth2qq9n78wlk39m
Master branch:
{
"isvalid": false,
"error": "Invalid address format"
}
PR Branch:
{
"isvalid": false,
"error_locations": [
29
],
"error": "Invalid checksum"
}
Code review and manually tested ACK 88cc481 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some review comments I had queued up. I'll go through the rest still, and perhaps open a PR for any serious ones I still find.
// Set potential error message. | ||
// This message may be changed if the address can also be interpreted as a Bech32 address. | ||
error_str = "Invalid prefix for Base58-encoded address"; | ||
if (!std::equal(script_prefix.begin(), script_prefix.end(), data.begin()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanofsky Agree, I think this needs a length check.
/** Convert to lower case. */ | ||
inline unsigned char LowerCase(unsigned char c) | ||
{ | ||
return (c >= 'A' && c <= 'Z') ? (c - 'A') + 'a' : c; | ||
} | ||
|
||
void push_range(int from, int to, std::vector<int>& vec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this function, it may be worth just using std::iota
.
The function is equivalent to:
void push_range(int from, int to, std::vector<int>& vec)
{
vec.resize(to - from);
std::iota(vec.begin(), vec.end(), from);
}
But I suggest just doing that directly in the call sites.
} | ||
|
||
// We attempt error detection with both bech32 and bech32m, and choose the one with the fewest errors | ||
// We can't simply use the segwit version, because that may be one of the errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can reject (and not return) candidates for which the checksum type mismatches the witness version (bech32 for 0, bech32m for 1+) AND the character that identifies the witness version is not one of the error locations.
int s0 = syn & 0x3FF; | ||
int s1 = (syn >> 10) & 0x3FF; | ||
int s2 = syn >> 20; | ||
int l_s0 = GF1024_LOG[s0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanofsky Alternatively, would it help if there was an (additional) & 0x3FF
inside the []
?
// Every non-zero element of the field can then be represented as (e)^k for some power k. | ||
// The array GF1024_EXP contains all these powers of (e) - GF1024_EXP[k] = (e)^k in GF(1024). | ||
// Conversely, GF1024_LOG contains the discrete logarithms of these powers, so | ||
// GF1024_LOG[GF1024_EXP[k]] == k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even nicer would be to have the tables be generated at compile time using a constexpr
algorithm...
Thanks for the review comments (and the merge)! I will open a follow-up PR today. |
a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson) 2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson) 405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson) 28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson) 14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson) 63f7b69 Update release note for bech32 error detection (Samuel Dobson) c8b9a22 Report encoding type in bech32 error message (Samuel Dobson) 92f0caf Improve Bech32 boost tests (Samuel Dobson) bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson) Pull request description: A number of follow-ups and improvements to the bech32 error location code, introduced in #16807. Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation. ACKs for top commit: laanwj: Re-ACK a4fe701 Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson) 2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson) 405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson) 28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson) 14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson) 63f7b69 Update release note for bech32 error detection (Samuel Dobson) c8b9a22 Report encoding type in bech32 error message (Samuel Dobson) 92f0caf Improve Bech32 boost tests (Samuel Dobson) bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson) Pull request description: A number of follow-ups and improvements to the bech32 error location code, introduced in bitcoin#16807. Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation. ACKs for top commit: laanwj: Re-ACK a4fe701 Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
Addresses (partially) #16779 - no GUI change in this PR
Adds a LocateError function the bech32 library, which is then called by
validateaddress
RPC, (and then eventually from a GUI tool too, future work). I think modifying validateaddress is nicer than adding a separate RPC for this.Includes tests.
Based on https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js
Credit to sipa for that code