Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace UUID validation RegEx with manual validation #651

Merged
merged 2 commits into from
Mar 8, 2020

Conversation

dbrgn
Copy link
Member

@dbrgn dbrgn commented Jan 5, 2020

When analyzing a callgrind profile of a full library rescan, I noticed
that there are a lot of CPU instructions coming from libpcre2, with the
call originating in the UUID validation code.

Previously, the regular expression used for UUID validation was parsed
from a string for every invocation of the validation function. This is
unnecessary. A library rescan will validate thousands of UUIDs (roughly
250k calls to Uuid::isValid with 200'000 instructions per call on my
laptop for a library rescan). By moving the code to a static class
member, I achieved a 15% speedup for a full library scan.


Library scan duration for five consecutive LibrePCB starts using the release build:

Before:

  • 12'419 ms
  • 12'239 ms
  • 11'992 ms
  • 12'377 ms
  • 11'766 ms

Avg: 12'159 ms

After:

  • 11'366 ms
  • 9'975 ms
  • 10'113 ms
  • 10'149 ms
  • 10'020 ms

Avg: 10'325 ms (-15%)

@dbrgn
Copy link
Member Author

dbrgn commented Jan 5, 2020

My master branch was outdated (did not yet include the regex change from #637), it's now rebased. I don't expect the regex change to affect the benchmark above.

@ubruhin
Copy link
Member

ubruhin commented Jan 5, 2020

Nice catch!

But the fix could be better and simpler 😉 Creating static objects is often dangerous, especially for complex types. Since the initialization order of static objects is not well defined, very weird behavior could be introduced if there are dependencies between static objects. In addition, these objects are initialized very early (before main() gets called) and thus important things like creating a QApplication instance (which is essential for many Qt objects), or in this case initializing the libpcre2 library, are not yet done.

Thus I'd avoid static objects whenever possible. Usually that's easy by wrapping such objects with a static method like this:

class Foo {
  static const Bar& createBar() {
    static Bar bar;
    return bar;
  }
};

In that case, the object will be created once when it is accessed the first time, i.e. much later than static objects.

But in the Uuid class you probably don't even need such a wrapper method, just add static const to the definition of the QRegularExpression object (const only to be sure the objects doesn't get modified during its lifetime), that should (hopefully) do the trick 😉

Btw, there are several other classes which use QRegularExpression in a similar way, would you be interested to take a look at them too?

@ubruhin
Copy link
Member

ubruhin commented Jan 5, 2020

And one more issue since Uuid is used in multiple threads: https://stackoverflow.com/questions/25581696/qregularexpression-matching-thread-safety

@eduardosm
Copy link
Contributor

Why not just write the matching code manually instead of using a regex library? It should be quite simple in this case.

Something like:

auto isLowerHex = [](QChar chr) {
    return (chr >= QChar('0') && chr <= QChar('9')) || (chr >= QChar('a') && chr <= QChar('f'));
};
if (str.length() != 36) return false;
if (!isLowerHex(str[0])) return false;
if (!isLowerHex(str[1])) return false;
...
if (str[8] != QChar('-')) return false;
...

I believe this would be faster than using a regular expression compiled at runtime.

@dbrgn
Copy link
Member Author

dbrgn commented Jan 5, 2020

Creating static objects is often dangerous

Thanks for the explanation! I'm used to lazy_static and already expected static initialization of complex objects to be potentially fishy 🙂

Why not just write the matching code manually instead of using a regex library?

I thought about that too, but a regex basically does that as well, but in a very optimized way that might make optimal use of CPU caches and things like SIMD instructions. If the RegEx is not unnecessarily created multiple times, that cost should be amortized over time.

In any case, we would need to benchmark it. Any performance optimization without a benchmark is just guessing.

@dbrgn
Copy link
Member Author

dbrgn commented Jan 5, 2020

Btw, a compile-time regular expression library like ctre would be very nice, but it requires C++17 (Clang 5+, GCC 7+).

@dbrgn
Copy link
Member Author

dbrgn commented Jan 9, 2020

I made a simple benchmark (a standalone CLI application that validates 30'000 UUIDs):

#include <iostream>
#include <librepcb/common/uuid.h>

using namespace librepcb;

const static QString uuids[3000] = {
  "61a0caad-812b-4291-908d-b63aa9f7213e",
  // ...
  "baa57b4e-3e68-4e3b-8a07-041575b42fce"
};

void validate_uuids() {
  unsigned volatile int valid = 0;
  unsigned volatile int invalid = 0;
  for (int i = 0; i < 10; i++) {
    for (QString uuid : uuids) {
      if (Uuid::isValid(uuid)) {
        valid += 1;
      } else {
        invalid += 1;
      }
    }
  }
  std::cout << valid << " valid, " << invalid << " invalid." << std::endl;
}

int main() {
  std::cout << "Starting benchmark..." << std::endl;
  validate_uuids();
  return 0;
}

I compared the current master, the static regex and two version with manual char comparison as suggested by @eduardosm (once with loops and once with manual loop unrolling):

inline bool isLowerHex(const QChar chr) noexcept {
    return (chr >= QChar('0') && chr <= QChar('9')) || (chr >= QChar('a') && chr <= QChar('f'));
}

// Loops
bool Uuid::isValid(const QString& str) noexcept {
  // check format of string
  if (str.length() != 36) return false;
  for (uint i = 0; i < 8; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[8] != QChar('-')) return false;
  for (uint i = 9; i < 13; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[13] != QChar('-')) return false;
  for (uint i = 14; i < 18; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[18] != QChar('-')) return false;
  for (uint i = 19; i < 23; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[23] != QChar('-')) return false;
  for (uint i = 24; i < 36; i++) {
    if (!isLowerHex(str[i])) return false;
  }

  // ...
}

// Manual loop unrolling
bool Uuid::isValid(const QString& str) noexcept {
  // check format of string
  if (str.length() != 36) return false;
  if (!isLowerHex(str[0])) return false;
  if (!isLowerHex(str[1])) return false;
  if (!isLowerHex(str[2])) return false;
  if (!isLowerHex(str[3])) return false;
  if (!isLowerHex(str[4])) return false;
  if (!isLowerHex(str[5])) return false;
  if (!isLowerHex(str[6])) return false;
  if (!isLowerHex(str[7])) return false;
  if (str[8] != QChar('-')) return false;
  if (!isLowerHex(str[9])) return false;
  if (!isLowerHex(str[10])) return false;
  if (!isLowerHex(str[11])) return false;
  if (!isLowerHex(str[12])) return false;
  if (str[13] != QChar('-')) return false;
  if (!isLowerHex(str[14])) return false;
  if (!isLowerHex(str[15])) return false;
  if (!isLowerHex(str[16])) return false;
  if (!isLowerHex(str[17])) return false;
  if (str[18] != QChar('-')) return false;
  if (!isLowerHex(str[19])) return false;
  if (!isLowerHex(str[20])) return false;
  if (!isLowerHex(str[21])) return false;
  if (!isLowerHex(str[22])) return false;
  if (str[23] != QChar('-')) return false;
  if (!isLowerHex(str[24])) return false;
  if (!isLowerHex(str[25])) return false;
  if (!isLowerHex(str[26])) return false;
  if (!isLowerHex(str[27])) return false;
  if (!isLowerHex(str[28])) return false;
  if (!isLowerHex(str[29])) return false;
  if (!isLowerHex(str[30])) return false;
  if (!isLowerHex(str[31])) return false;
  if (!isLowerHex(str[32])) return false;
  if (!isLowerHex(str[33])) return false;
  if (!isLowerHex(str[34])) return false;
  if (!isLowerHex(str[35])) return false;

  // ...
}

I then timed the different variants 5 times with time (user+sys) and did a valgrind+callgrind measurement (both in a release build). Results:

Variant Times (ms) Avg Time (ms) CPU Instructions per call to isValid Speedup Threadsafe
Master 553 / 552 / 553 / 556 / 563 555 187 341 100% Yes
Static RegEx 24 / 26 / 25 / 17 / 24 23 3 026 2413% No?
Char comparisons (loops) 17 / 20 / 20 / 21 / 20 20 871 2775% Yes
Char comparisons (unrolled) 20 / 18 / 16 / 18 / 19 18 767 3083% Yes

@ubruhin It seems that char comparison beats regular expressions and it's thread safe, so I'd go with the unrolled version unless you prefer to spend a few cycles for better readability. (It seems quite strange to me that these loops aren't unrolled by the compiler 😕)

In any case, a 3083% speedup is quite nice 😁

@rnestler
Copy link
Member

rnestler commented Jan 9, 2020

Static RegEx Threadsafe No?

Why shouldn't it be thread safe? You don't modify the regex, right? The static initialization is guaranteed to be thread safe by C++11: https://stackoverflow.com/questions/1661529/is-meyers-implementation-of-the-singleton-pattern-thread-safe.

Edit: Oh wow. After reading https://stackoverflow.com/questions/25581696/qregularexpression-matching-thread-safety I'm again horrified about C++.

With my "QRegularExpression author" hat, I can tell you that QRegularExpression is actually thread safe, as long as you only use const methods). But if you rely on that I'm going to give you a bad time.

How can one not guarantee that const access is thread safe?

@dbrgn
Copy link
Member Author

dbrgn commented Jan 9, 2020

How can one not guarantee that const access is thread safe?

The RegEx engine probably has an internal state machine 😉

@ubruhin
Copy link
Member

ubruhin commented Jan 20, 2020

Thanks for the detailed investigation!

@ubruhin It seems that char comparison beats regular expressions and it's thread safe, so I'd go with the unrolled version unless you prefer to spend a few cycles for better readability.

Ok I'm fine with that, but would by nice to add a comment to the code why (the hell) UUID checking is done that way (link to this PR or so). Otherwise one might want to replace it by a RegEx some time 😁

dbrgn added a commit that referenced this pull request Jan 23, 2020
When analyzing a callgrind profile of a full library rescan, I noticed
that there are a lot of CPU instructions coming from libpcre2, with the
call originating in the UUID validation code.

Previously, the regular expression used for UUID validation was parsed
from a string for every invocation of the validation function. This is
unnecessary. A library rescan will validate thousands of UUIDs (roughly
250k calls to `Uuid::isValid` with 200'000 instructions per call on my
laptop for a library rescan). By replacing the RegEx with a manually
loop-unrolled validation function, I achieved a 20-40% speedup for a
full library scan on my computer (which has a fast SSD). The CPU
instructions per call to `isValid` were reduced from 187'341 to 767.

For more details, please refer to the pull request discussion:
#651
@dbrgn
Copy link
Member Author

dbrgn commented Jan 23, 2020

@ubruhin updated and rebased! I added an inline comment, as well as a detailed commit message.

When analyzing a callgrind profile of a full library rescan, I noticed
that there are a lot of CPU instructions coming from libpcre2, with the
call originating in the UUID validation code.

Previously, the regular expression used for UUID validation was parsed
from a string for every invocation of the validation function. This is
unnecessary. A library rescan will validate thousands of UUIDs (roughly
250k calls to `Uuid::isValid` with 200'000 instructions per call on my
laptop for a library rescan). By replacing the RegEx with a manually
loop-unrolled validation function, I achieved a 20-40% speedup for a
full library scan on my computer (which has a fast SSD). The CPU
instructions per call to `isValid` were reduced from 187'341 to 767.

For more details, please refer to the pull request discussion:
#651
@ubruhin ubruhin changed the title Make UUID validation RegEx a static class member Replace UUID validation RegEx with manual validation Mar 8, 2020
@ubruhin ubruhin merged commit ca83f88 into master Mar 8, 2020
@ubruhin ubruhin deleted the optimize-uuid-validation branch March 8, 2020 12:55
ubruhin pushed a commit that referenced this pull request Apr 22, 2020
When analyzing a callgrind profile of a full library rescan, I noticed
that there are a lot of CPU instructions coming from libpcre2, with the
call originating in the UUID validation code.

Previously, the regular expression used for UUID validation was parsed
from a string for every invocation of the validation function. This is
unnecessary. A library rescan will validate thousands of UUIDs (roughly
250k calls to `Uuid::isValid` with 200'000 instructions per call on my
laptop for a library rescan). By replacing the RegEx with a manually
loop-unrolled validation function, I achieved a 20-40% speedup for a
full library scan on my computer (which has a fast SSD). The CPU
instructions per call to `isValid` were reduced from 187'341 to 767.

For more details, please refer to the pull request discussion:
#651
(cherry picked from commit ca83f88)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants