-
Notifications
You must be signed in to change notification settings - Fork 35.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor optimizations to bech32::Decode(); add tests. #12881
Conversation
src/bech32.cpp
Outdated
if (c >= 'a' && c <= 'z') lower = true; | ||
if (c >= 'A' && c <= 'Z') upper = true; | ||
// Mixing upper and lowercase is not OK. | ||
if (!lower && c >= 'a' && c <= 'z') { if(upper) return {}; lower = true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if (upper)
instead of if(upper)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not even a nit. Should be a compiler error IMO. I must be tired. :-)
size_t pos = str.rfind('1'); | ||
if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) { | ||
return {}; | ||
} | ||
data values(str.size() - 1 - pos); | ||
for (size_t i = 0; i < str.size() - 1 - pos; ++i) { | ||
unsigned char c = str[i + pos + 1]; | ||
int8_t rev = (c < 33 || c > 126) ? -1 : CHARSET_REV[c]; | ||
int8_t rev = CHARSET_REV[c]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add assert(c >= 33 && c <= 126);
on the line before int8_t rev = CHARSET_REV[c];
to make assumption explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@practicalswift If I were writing Decode() from scratch, I don't think it would occur to me to add an assert() there. Not convinced it adds any value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember that we have asserts enabled on release notes, better not add them in inner loops especially not if the goal is to 'tighten up' anything.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a feeling there was a reason for LowerCase().
src/bech32.cpp
Outdated
if (c >= 'a' && c <= 'z') lower = true; | ||
if (c >= 'A' && c <= 'Z') upper = true; | ||
// Mixing upper and lowercase is not OK. | ||
if (!lower && c >= 'a' && c <= 'z') { if (upper) return {}; lower = true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the coding style: any if
that has anything but just a single-statement then body must use braces and indentation.
@murrayn After fixing any nits, please also squash your commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide benchmark results. I suspect this makes the most common case worst. Maybe the suggestion below would improve performance for the most common case. An edge case doesn't have to perform better if the implementation makes the remaining cases worst.
src/bech32.cpp
Outdated
@@ -161,18 +161,25 @@ std::pair<std::string, data> Decode(const std::string& str) { | |||
for (size_t i = 0; i < str.size(); ++i) { | |||
unsigned char c = str[i]; | |||
if (c < 33 || c > 126) return {}; | |||
if (c >= 'a' && c <= 'z') lower = true; | |||
if (c >= 'A' && c <= 'Z') upper = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
here would improve a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better:
if (c >= 'a') { if (c <= 'z') lower = true; }
else if (c >= 'A') { if (c <= 'Z') upper = true; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@promag Do you have an example in mind of a most common case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The success case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in that case (let's assume a string consisting only of lowercase or uppercase letters) the existing code will do three comparisons and one assignment, per character. The proposed code will do three comparisons per character, with the added benefit of returning earlier in the case of a malformed string. Not sure why you would "suspect this makes the most common case worst".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further to this, if we're going to assume the most common case is the success case (which is reasonable), it would probably be good to move the initial (c < 33 || c > 126) check to an ultimate "else if" check to the original code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@murrayn probably, but can you post benchmark results?
Thanks for catching this. We should be extremely careful to not introduce locale dependencies in the low-level string parsing functions. We've had serious problems with those in the past. This can result in country-specific bugs... |
Please don't overthink this. Decoding addresses is hardly relevant (I don't think anyone would notice if they were 100x slower). My goal when writing this was more clarity and simplicity than speed, though I'm obviously not opposed to performance improvements if they don't conflict with those goals. I am interested in whether the additional branches don't make performance worse though. My gut feeling is that they impact performance more than comparisons. |
@sipa Thanks for the feedback. I've reworked the code again to reflect your input. |
To get rid of the merge commit, please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits |
Not sure if this PR is stalled due to the earlier comment about benchmarks...I didn't think benchmarks would be as interesting after my most recent commit, in which the code was more straightforward. Just in case, I have benchmarked:
versus
and my results show the former is significantly faster; however, with -O2 compiler optimization enabled they benchmark identically, which isn't surprising. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 60f61f9.
Change is slightly better because success checks come first, which is probably the most common case.
Please also adjust the OP |
utACK 60f61f9 |
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn) Pull request description: Just a few minor optimizations to bech32::Decode(): 1) optimize the order and logic of the conditionals 2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above) 3) add a couple more bech32 tests (mixed-case) Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
…of locale dependence 698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift) 0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift) Pull request description: This linter will check for code accidentally introducing locale dependencies. Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible. Context: #12881 (comment) Example output: ``` $ contrib/devtools/lint-locale-dependence.sh The locale dependent function tolower(...) appears to be used: src/init.cpp: if (s[0] == '0' && std::tolower(s[1]) == 'x') { Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. Please avoid using locale dependent functions if possible. Advice not applicable in this specific case? Add an exception by updating the ignore list in contrib/devtools/lint-locale-dependence.sh ``` **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed? Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
…uction of locale dependence 698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift) 0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift) Pull request description: This linter will check for code accidentally introducing locale dependencies. Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible. Context: bitcoin#12881 (comment) Example output: ``` $ contrib/devtools/lint-locale-dependence.sh The locale dependent function tolower(...) appears to be used: src/init.cpp: if (s[0] == '0' && std::tolower(s[1]) == 'x') { Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. Please avoid using locale dependent functions if possible. Advice not applicable in this specific case? Add an exception by updating the ignore list in contrib/devtools/lint-locale-dependence.sh ``` **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed? Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
…uction of locale dependence 698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift) 0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift) Pull request description: This linter will check for code accidentally introducing locale dependencies. Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible. Context: bitcoin#12881 (comment) Example output: ``` $ contrib/devtools/lint-locale-dependence.sh The locale dependent function tolower(...) appears to be used: src/init.cpp: if (s[0] == '0' && std::tolower(s[1]) == 'x') { Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. Please avoid using locale dependent functions if possible. Advice not applicable in this specific case? Add an exception by updating the ignore list in contrib/devtools/lint-locale-dependence.sh ``` **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed? Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn) Pull request description: Just a few minor optimizations to bech32::Decode(): 1) optimize the order and logic of the conditionals 2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above) 3) add a couple more bech32 tests (mixed-case) Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn) Pull request description: Just a few minor optimizations to bech32::Decode(): 1) optimize the order and logic of the conditionals 2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above) 3) add a couple more bech32 tests (mixed-case) Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
Just a few minor optimizations to bech32::Decode():