Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let validateaddress locate error in Bech32 address #16807

Merged
merged 8 commits into from
Nov 22, 2021
Merged

Let validateaddress locate error in Bech32 address #16807

merged 8 commits into from
Nov 22, 2021

Conversation

meshcollider
Copy link
Contributor

@meshcollider meshcollider commented Sep 5, 2019

Addresses (partially) #16779 - no GUI change in this PR

Adds a LocateError function the bech32 library, which is then called by validateaddress RPC, (and then eventually from a GUI tool too, future work). I think modifying validateaddress is nicer than adding a separate RPC for this.
Includes tests.

Based on https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js
Credit to sipa for that code

@cvengler
Copy link
Contributor

cvengler commented Sep 5, 2019

See #16779
Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 5, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #23545 (scripted-diff: Use clang-tidy syntax for C++ named arguments [WIP; TOO MANY CONFLICTS] by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, light review, haven't compared with reference code.

src/bech32.cpp Show resolved Hide resolved
src/bech32.cpp Outdated Show resolved Hide resolved
src/bech32.h Outdated Show resolved Hide resolved
src/bech32.cpp Outdated Show resolved Hide resolved
src/rpc/misc.cpp Outdated Show resolved Hide resolved
@sipa
Copy link
Member

sipa commented Sep 5, 2019

Would it be possible to add functions to key_io that provide this functionality for all addresses (with just basic error messages for anything but bech32)? That way the RPC code doesn't need to directly invoke the bech32 module.

@promag
Copy link
Member

promag commented Sep 5, 2019

+1 @sipa suggestion, something like

CTxDestination DecodeDestination(const std::string& str, std::string& error);
CTxDestination DecodeDestination(const std::string& str)
{
    const std::string& error;
    return DecodeDestination(str, error);
}

And maybe also embed the error index in the message for now.

@meshcollider
Copy link
Contributor Author

I've added boost tests, fixed the above nits, and added a final commit with one implementation of the above idea. I can drop the final commit if it's not a good approach, but this is the cleanest way I could come up with

@meshcollider
Copy link
Contributor Author

Rebased

@Sjors
Copy link
Member

Sjors commented Oct 29, 2019

Concept ACK. I would prefer to make address_type an optional hint, and instead guess the address type.

Distinguishing between base58 and bech32 could be done by counting how many characters of the address are part of the base58/bech32 character set. You can then use error_index to indicate the position of a character that doesn't belong in the set.

src/rpc/misc.cpp Outdated Show resolved Hide resolved
@luke-jr
Copy link
Member

luke-jr commented Feb 13, 2020

Once this is merged, I have the GUI changes ready to go...

luke-jr added a commit to bitcoinknots/bitcoin that referenced this pull request Mar 5, 2020
luke-jr added a commit to bitcoinknots/bitcoin that referenced this pull request Mar 6, 2020
@gmaxwell
Copy link
Contributor

gmaxwell commented Mar 8, 2020

Why is this limited to only returning a single error position? Fairly simple code can give up-to two, and it's not hard to imagine fancier code that can sometimes return more than that.

@gmaxwell
Copy link
Contributor

@meshcollider ?

@sipa
Copy link
Member

sipa commented Mar 16, 2020

Concept ACK to adding Bech32 error location to validateaddress, but echoing earlier comments:

  • Why only one error? The reference code in https://github.com/sipa/bech32 finds two, and I don't think just returning one makes things meaningfully simpler. Even if not implemented, the interface should probably return an array of error positions.
  • It feels unnecessary to pass an address type. Just only run the bech32 error location code when the HRP matches (and treat as base58 otherwise).

@meshcollider
Copy link
Contributor Author

meshcollider commented Apr 7, 2020

Fair points, I'll modify to return more than one error. Re. address type, we discussed it a while back in IRC, I can't remember if there was a reason we didn't base it solely on HRP or not, but I am happy to do it that way if you want because I can't see a real downside now. I still have to fix the current error and rebase too.

luke-jr added a commit to bitcoinknots/bitcoin that referenced this pull request Jun 9, 2020
Copy link
Member

@adamjonas adamjonas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of touch-ups to the tests needed when you rebase.

test/functional/rpc_validateaddress.py Outdated Show resolved Hide resolved
test/functional/rpc_validateaddress.py Outdated Show resolved Hide resolved
@meshcollider
Copy link
Contributor Author

@gmaxwell you may also like to review 🙂

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 88cc481 with caveat that I only checked the new LocateErrors code to try to verify it didn't have unsafe or unexpected operations or loop forever or crash. Did not try to verify behavior corresponds to the spec. In the worst case bugs here should just affect error messages not actual decoding of addresses so this seemed ok.

I left various review suggestions, but feel to ignore them or just use ones that seem useful.

I do think it should be possible to add more test coverage. I think it would be nice to have a fuzz test that would encode a random address, decode it and verify the address was the same, introduce random errors, and verify the errors were detected at the right locations. Also maybe there could be unit tests or constexpr checks that verify the contents of the new GF1024_EXP / GF1024_LOG arrays.

}
}

/** Return index of first invalid character in a Bech32 string. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add Bech32 error location function" (b62b67e)

Comment is out of date

Comment on lines +456 to +466
size_t length = str.size() - 1 - pos; // length of data part
data values(length);
for (size_t i = pos + 1; i < str.size(); ++i) {
unsigned char c = str[i];
int8_t rev = CHARSET_REV[c];
if (rev == -1) {
error_locations.push_back(i);
return "Invalid Base 32 character";
}
values[i - pos - 1] = rev;
}
Copy link
Contributor

@ryanofsky ryanofsky Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add Bech32 error location function" (b62b67e)

These lines could be moved into a function and deduplicated with lines 415-424 above.

Maybe not worth it because a PR that affects address decoding is more dangerous than a PR that just changes error detection.

// Every non-zero element of the field can then be represented as (e)^k for some power k.
// The array GF1024_EXP contains all these powers of (e) - GF1024_EXP[k] = (e)^k in GF(1024).
// Conversely, GF1024_LOG contains the discrete logarithms of these powers, so
// GF1024_LOG[GF1024_EXP[k]] == k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add lots of comments to Bech32" (5599813)

Could write a unit test or constexpr compile time test to check this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be simple to add yep 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even nicer would be to have the tables be generated at compile time using a constexpr algorithm...

// Set potential error message.
// This message may be changed if the address can also be interpreted as a Bech32 address.
error_str = "Invalid prefix for Base58-encoded address";
if (!std::equal(script_prefix.begin(), script_prefix.end(), data.begin()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "More detailed error checking for base58 addresses" (0b06e72)

Is there something which guarantees data size is greater or equal to script_prefix and pubkey_prefix sizes? If not then, it seems like this can read uninitialized memory.

Would suggest either adding assert's to make it clear these cases won't happen, or else adding size checks to the if statement to handle these cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanofsky Agree, I think this needs a length check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error: attempt to subscript a past-the-end iterator 1 step from its current
position, which falls outside its dereferenceable range.

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=41364

} else if (!is_bech32) {
// Try Base58 decoding without the checksum, using a much larger max length
if (!DecodeBase58(str, data, 100)) {
error_str = "Invalid HRP or Base58 character in address";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "More detailed error checking for base58 addresses" (0b06e72)

IMO previous error message "Invalid address format" seems more understandable if you aren't an address format guru than "Invalid HRP or Base58 character in address". Maybe consider keeping the existing message or switching to something in between like "Not a valid BECH32 or Base58 encoding"

};
static const std::pair<std::string, int> ERRORS[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add boost tests for bech32 error detection" (c4979f7)

Could be good to static_assert that ERRORS and CASES arrays are the same size.

std::string error = bech32::LocateErrors(str, error_locations);
BOOST_CHECK_EQUAL(err.first, error);
if (err.second == -1) BOOST_CHECK(error_locations.empty());
else BOOST_CHECK_EQUAL(err.second, error_locations[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add boost tests for bech32 error detection" (c4979f7)

Could be good to BOOST_CHECK_EQUAL(error_locations.size(), 1)).

Could also be good to add cases with size > 1 here, but less important since python tests cover some of these.

Same suggestions could apply to bech32m test below

int l_e1 = l_s0 + (1023 - 997) * p1;
// Finally, some sanity checks on the result:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add lots of comments to Bech32" (5599813)

If check below is supposed to be a sanity check, shouldn't there be an assert, or a log, or a push_back(-1), or some other indication if it fails?

Same comment applies to other sanity checks and continue statements below. The descriptions here seem like they could be clearer about which of these conditions are possible to hit and which should be impossible.

error_locations = std::move(possible_errors);
}
}
return "Invalid checksum";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add Bech32 error location function" (b62b67e)

Might be nice if this mentioned the encoding that was detected. I.e. add an optional<Encoding> error_encoding variable, and set:

error_locations = std::move(possible_errors);
error_encoding = encoding;

and return:

return error_encoding == BECH32M ? "Invalid bech32m checksum" : error_encoding == BECH32 ? "Invalid bech32 checksum" : "Invalid checksum"

Updated RPCs
------------

- The `validateaddress` RPC now optionally returns an `error_locations` array, with the indices of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add release notes for validateaddress Bech32 error detection" (2eb5792)

Maybe mention that validateaddress returns new more specific error strings if an address can't be decoded.

Also "optionally returns" here seems a little misleading since this isn't an option the caller can control. Would maybe just say it returns error locations whenever it's possible to determine them.

@meshcollider
Copy link
Contributor Author

Thanks for the detailed review and suggestions @ryanofsky. I'll definitely either incorporate them here or in a follow-up :)

@@ -23,7 +23,7 @@ std::string EncodeExtPubKey(const CExtPubKey& extpubkey);

std::string EncodeDestination(const CTxDestination& dest);
CTxDestination DecodeDestination(const std::string& str);
CTxDestination DecodeDestination(const std::string& str, std::string& error_msg);
CTxDestination DecodeDestination(const std::string& str, std::string& error_msg, std::vector<int>* error_locations = nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider another DecodeDestination overload to avoid pointer to vector: promag@94ca6b4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion seems fine, but no reasoning is given, and it's not clearly an improvement. Google's "Use non-const pointers to represent optional outputs" which is what current code is doing does make it easier to distinguish output arguments from input arguments at call sites.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @ryanofsky here. References are hard to read as output arguments.
If you're going to make a separate function declaration, I'd prefer a function that returns it as part of the return value. E.g. as a pair or tuple.
But this is fine as-is.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 88cc481

The command below returns the "error_locations" field on the PR branch.

./src/bitcoin-cli validateaddress bc1qctxsy54qjhk0dl5ea0pd2xtth2qq9n78wlk39m

Master branch:

{
  "isvalid": false,
  "error": "Invalid address format"
}

PR Branch:

{
  "isvalid": false,
  "error_locations": [
    29
  ],
  "error": "Invalid checksum"
}

@laanwj
Copy link
Member

laanwj commented Nov 22, 2021

Code review and manually tested ACK 88cc481
Thanks for adding all the bech32 comments.

@laanwj laanwj merged commit 95d19f8 into bitcoin:master Nov 22, 2021
Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some review comments I had queued up. I'll go through the rest still, and perhaps open a PR for any serious ones I still find.

// Set potential error message.
// This message may be changed if the address can also be interpreted as a Bech32 address.
error_str = "Invalid prefix for Base58-encoded address";
if (!std::equal(script_prefix.begin(), script_prefix.end(), data.begin()) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanofsky Agree, I think this needs a length check.

/** Convert to lower case. */
inline unsigned char LowerCase(unsigned char c)
{
return (c >= 'A' && c <= 'Z') ? (c - 'A') + 'a' : c;
}

void push_range(int from, int to, std::vector<int>& vec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this function, it may be worth just using std::iota.

The function is equivalent to:

void push_range(int from, int to, std::vector<int>& vec)
{
    vec.resize(to - from);
    std::iota(vec.begin(), vec.end(), from);
}

But I suggest just doing that directly in the call sites.

}

// We attempt error detection with both bech32 and bech32m, and choose the one with the fewest errors
// We can't simply use the segwit version, because that may be one of the errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can reject (and not return) candidates for which the checksum type mismatches the witness version (bech32 for 0, bech32m for 1+) AND the character that identifies the witness version is not one of the error locations.

int s0 = syn & 0x3FF;
int s1 = (syn >> 10) & 0x3FF;
int s2 = syn >> 20;
int l_s0 = GF1024_LOG[s0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanofsky Alternatively, would it help if there was an (additional) & 0x3FF inside the []?

// Every non-zero element of the field can then be represented as (e)^k for some power k.
// The array GF1024_EXP contains all these powers of (e) - GF1024_EXP[k] = (e)^k in GF(1024).
// Conversely, GF1024_LOG contains the discrete logarithms of these powers, so
// GF1024_LOG[GF1024_EXP[k]] == k
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even nicer would be to have the tables be generated at compile time using a constexpr algorithm...

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 22, 2021
@meshcollider
Copy link
Contributor Author

meshcollider commented Nov 23, 2021

Thanks for the review comments (and the merge)! I will open a follow-up PR today.

@meshcollider meshcollider deleted the 201909_bech32_error_detection branch November 23, 2021 06:28
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 23, 2021
laanwj added a commit that referenced this pull request Dec 6, 2021
a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson)
2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson)
405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson)
28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson)
14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson)
63f7b69 Update release note for bech32 error detection (Samuel Dobson)
c8b9a22 Report encoding type in bech32 error message (Samuel Dobson)
92f0caf Improve Bech32 boost tests (Samuel Dobson)
bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson)

Pull request description:

  A number of follow-ups and improvements to the bech32 error location code, introduced in #16807.

  Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

ACKs for top commit:
  laanwj:
    Re-ACK a4fe701

Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 7, 2021
a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson)
2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson)
405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson)
28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson)
14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson)
63f7b69 Update release note for bech32 error detection (Samuel Dobson)
c8b9a22 Report encoding type in bech32 error message (Samuel Dobson)
92f0caf Improve Bech32 boost tests (Samuel Dobson)
bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson)

Pull request description:

  A number of follow-ups and improvements to the bech32 error location code, introduced in bitcoin#16807.

  Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

ACKs for top commit:
  laanwj:
    Re-ACK a4fe701

Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
@bitcoin bitcoin locked and limited conversation to collaborators Nov 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.