Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor optimizations to bech32::Decode(); add tests. #12881

Merged
merged 1 commit into from
May 15, 2018

Conversation

murrayn
Copy link
Contributor

@murrayn murrayn commented Apr 4, 2018

Just a few minor optimizations to bech32::Decode():

  1. optimize the order and logic of the conditionals
  2. get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
  3. add a couple more bech32 tests (mixed-case)

src/bech32.cpp Outdated
if (c >= 'a' && c <= 'z') lower = true;
if (c >= 'A' && c <= 'Z') upper = true;
// Mixing upper and lowercase is not OK.
if (!lower && c >= 'a' && c <= 'z') { if(upper) return {}; lower = true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: if (upper) instead of if(upper)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not even a nit. Should be a compiler error IMO. I must be tired. :-)

size_t pos = str.rfind('1');
if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) {
return {};
}
data values(str.size() - 1 - pos);
for (size_t i = 0; i < str.size() - 1 - pos; ++i) {
unsigned char c = str[i + pos + 1];
int8_t rev = (c < 33 || c > 126) ? -1 : CHARSET_REV[c];
int8_t rev = CHARSET_REV[c];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add assert(c >= 33 && c <= 126); on the line before int8_t rev = CHARSET_REV[c]; to make assumption explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@practicalswift If I were writing Decode() from scratch, I don't think it would occur to me to add an assert() there. Not convinced it adds any value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that we have asserts enabled on release notes, better not add them in inner loops especially not if the goal is to 'tighten up' anything.

@practicalswift
Copy link
Contributor

practicalswift commented Apr 4, 2018

LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.

Copy link
Contributor Author

@murrayn murrayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a feeling there was a reason for LowerCase().

src/bech32.cpp Outdated
if (c >= 'a' && c <= 'z') lower = true;
if (c >= 'A' && c <= 'Z') upper = true;
// Mixing upper and lowercase is not OK.
if (!lower && c >= 'a' && c <= 'z') { if (upper) return {}; lower = true; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the coding style: any if that has anything but just a single-statement then body must use braces and indentation.

@fanquake
Copy link
Member

fanquake commented Apr 5, 2018

@murrayn After fixing any nits, please also squash your commits.

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide benchmark results. I suspect this makes the most common case worst. Maybe the suggestion below would improve performance for the most common case. An edge case doesn't have to perform better if the implementation makes the remaining cases worst.

src/bech32.cpp Outdated
@@ -161,18 +161,25 @@ std::pair<std::string, data> Decode(const std::string& str) {
for (size_t i = 0; i < str.size(); ++i) {
unsigned char c = str[i];
if (c < 33 || c > 126) return {};
if (c >= 'a' && c <= 'z') lower = true;
if (c >= 'A' && c <= 'Z') upper = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else here would improve a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better:

if (c >= 'a') { if (c <= 'z') lower = true; }
else if (c >= 'A') { if (c <= 'Z') upper = true; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@promag Do you have an example in mind of a most common case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case (let's assume a string consisting only of lowercase or uppercase letters) the existing code will do three comparisons and one assignment, per character. The proposed code will do three comparisons per character, with the added benefit of returning earlier in the case of a malformed string. Not sure why you would "suspect this makes the most common case worst".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further to this, if we're going to assume the most common case is the success case (which is reasonable), it would probably be good to move the initial (c < 33 || c > 126) check to an ultimate "else if" check to the original code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@murrayn probably, but can you post benchmark results?

@laanwj
Copy link
Member

laanwj commented Apr 5, 2018

LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.

Thanks for catching this. We should be extremely careful to not introduce locale dependencies in the low-level string parsing functions. We've had serious problems with those in the past. This can result in country-specific bugs...

@sipa
Copy link
Member

sipa commented Apr 5, 2018

Please don't overthink this. Decoding addresses is hardly relevant (I don't think anyone would notice if they were 100x slower). My goal when writing this was more clarity and simplicity than speed, though I'm obviously not opposed to performance improvements if they don't conflict with those goals.

I am interested in whether the additional branches don't make performance worse though. My gut feeling is that they impact performance more than comparisons.

@murrayn
Copy link
Contributor Author

murrayn commented Apr 6, 2018

@sipa Thanks for the feedback. I've reworked the code again to reflect your input.

@maflcko
Copy link
Member

maflcko commented Apr 11, 2018

To get rid of the merge commit, please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

@murrayn
Copy link
Contributor Author

murrayn commented Apr 20, 2018

Not sure if this PR is stalled due to the earlier comment about benchmarks...I didn't think benchmarks would be as interesting after my most recent commit, in which the code was more straightforward. Just in case, I have benchmarked:

if (c >= 'a' && c <= 'z') lower = 1;
else if (c >= 'A' && c <= 'Z') upper = 1;
else if (c < 33 || c > 126) return 0;

versus

if (c < 33 || c > 126) return 0;
if (c >= 'a' && c <= 'z') lower = 1;
if (c >= 'A' && c <= 'Z') upper = 1;

and my results show the former is significantly faster; however, with -O2 compiler optimization enabled they benchmark identically, which isn't surprising.

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 60f61f9.

Change is slightly better because success checks come first, which is probably the most common case.

@maflcko
Copy link
Member

maflcko commented Apr 26, 2018

Please also adjust the OP

@murrayn murrayn changed the title Tighten up bech32::Decode(); add tests. Minor optimizations to bech32::Decode(); add tests. Apr 28, 2018
@laanwj
Copy link
Member

laanwj commented May 15, 2018

utACK 60f61f9

@laanwj laanwj merged commit 60f61f9 into bitcoin:master May 15, 2018
laanwj added a commit that referenced this pull request May 15, 2018
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn)

Pull request description:

  Just a few minor optimizations to bech32::Decode():

  1) optimize the order and logic of the conditionals
  2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
  3) add a couple more bech32 tests (mixed-case)

Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
laanwj added a commit that referenced this pull request Jun 7, 2018
…of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: #12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 17, 2020
…uction of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: bitcoin#12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 2, 2020
…uction of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: bitcoin#12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 21, 2021
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn)

Pull request description:

  Just a few minor optimizations to bech32::Decode():

  1) optimize the order and logic of the conditionals
  2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
  3) add a couple more bech32 tests (mixed-case)

Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 25, 2021
60f61f9 Tighten up bech32::Decode(); add tests. (murrayn)

Pull request description:

  Just a few minor optimizations to bech32::Decode():

  1) optimize the order and logic of the conditionals
  2) get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
  3) add a couple more bech32 tests (mixed-case)

Tree-SHA512: e41af834c8f6b7d34c22c28b724df42c60f72e00df616e70a12efbc4271d15d80627fe1bc36845caf29f615c238499a566298a863cbe119fef457287231053c8
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants