Replace UUID validation RegEx with manual validation #651

dbrgn · 2020-01-05T00:14:29Z

When analyzing a callgrind profile of a full library rescan, I noticed
that there are a lot of CPU instructions coming from libpcre2, with the
call originating in the UUID validation code.

Previously, the regular expression used for UUID validation was parsed
from a string for every invocation of the validation function. This is
unnecessary. A library rescan will validate thousands of UUIDs (roughly
250k calls to Uuid::isValid with 200'000 instructions per call on my
laptop for a library rescan). By moving the code to a static class
member, I achieved a 15% speedup for a full library scan.

Library scan duration for five consecutive LibrePCB starts using the release build:

Before:

12'419 ms
12'239 ms
11'992 ms
12'377 ms
11'766 ms

Avg: 12'159 ms

After:

11'366 ms
9'975 ms
10'113 ms
10'149 ms
10'020 ms

Avg: 10'325 ms (-15%)

dbrgn · 2020-01-05T00:18:24Z

My master branch was outdated (did not yet include the regex change from #637), it's now rebased. I don't expect the regex change to affect the benchmark above.

ubruhin · 2020-01-05T01:00:27Z

Nice catch!

But the fix could be better and simpler 😉 Creating static objects is often dangerous, especially for complex types. Since the initialization order of static objects is not well defined, very weird behavior could be introduced if there are dependencies between static objects. In addition, these objects are initialized very early (before main() gets called) and thus important things like creating a QApplication instance (which is essential for many Qt objects), or in this case initializing the libpcre2 library, are not yet done.

Thus I'd avoid static objects whenever possible. Usually that's easy by wrapping such objects with a static method like this:

class Foo {
  static const Bar& createBar() {
    static Bar bar;
    return bar;
  }
};

In that case, the object will be created once when it is accessed the first time, i.e. much later than static objects.

But in the Uuid class you probably don't even need such a wrapper method, just add static const to the definition of the QRegularExpression object (const only to be sure the objects doesn't get modified during its lifetime), that should (hopefully) do the trick 😉

Btw, there are several other classes which use QRegularExpression in a similar way, would you be interested to take a look at them too?

ubruhin · 2020-01-05T01:23:16Z

And one more issue since Uuid is used in multiple threads: https://stackoverflow.com/questions/25581696/qregularexpression-matching-thread-safety

eduardosm · 2020-01-05T02:31:46Z

Why not just write the matching code manually instead of using a regex library? It should be quite simple in this case.

Something like:

auto isLowerHex = [](QChar chr) {
    return (chr >= QChar('0') && chr <= QChar('9')) || (chr >= QChar('a') && chr <= QChar('f'));
};
if (str.length() != 36) return false;
if (!isLowerHex(str[0])) return false;
if (!isLowerHex(str[1])) return false;
...
if (str[8] != QChar('-')) return false;
...

I believe this would be faster than using a regular expression compiled at runtime.

dbrgn · 2020-01-05T10:02:22Z

Creating static objects is often dangerous

Thanks for the explanation! I'm used to lazy_static and already expected static initialization of complex objects to be potentially fishy 🙂

Why not just write the matching code manually instead of using a regex library?

I thought about that too, but a regex basically does that as well, but in a very optimized way that might make optimal use of CPU caches and things like SIMD instructions. If the RegEx is not unnecessarily created multiple times, that cost should be amortized over time.

In any case, we would need to benchmark it. Any performance optimization without a benchmark is just guessing.

dbrgn · 2020-01-05T10:05:12Z

Btw, a compile-time regular expression library like ctre would be very nice, but it requires C++17 (Clang 5+, GCC 7+).

dbrgn · 2020-01-09T21:55:15Z

I made a simple benchmark (a standalone CLI application that validates 30'000 UUIDs):

#include <iostream>
#include <librepcb/common/uuid.h>

using namespace librepcb;

const static QString uuids[3000] = {
  "61a0caad-812b-4291-908d-b63aa9f7213e",
  // ...
  "baa57b4e-3e68-4e3b-8a07-041575b42fce"
};

void validate_uuids() {
  unsigned volatile int valid = 0;
  unsigned volatile int invalid = 0;
  for (int i = 0; i < 10; i++) {
    for (QString uuid : uuids) {
      if (Uuid::isValid(uuid)) {
        valid += 1;
      } else {
        invalid += 1;
      }
    }
  }
  std::cout << valid << " valid, " << invalid << " invalid." << std::endl;
}

int main() {
  std::cout << "Starting benchmark..." << std::endl;
  validate_uuids();
  return 0;
}

I compared the current master, the static regex and two version with manual char comparison as suggested by @eduardosm (once with loops and once with manual loop unrolling):

inline bool isLowerHex(const QChar chr) noexcept {
    return (chr >= QChar('0') && chr <= QChar('9')) || (chr >= QChar('a') && chr <= QChar('f'));
}

// Loops
bool Uuid::isValid(const QString& str) noexcept {
  // check format of string
  if (str.length() != 36) return false;
  for (uint i = 0; i < 8; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[8] != QChar('-')) return false;
  for (uint i = 9; i < 13; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[13] != QChar('-')) return false;
  for (uint i = 14; i < 18; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[18] != QChar('-')) return false;
  for (uint i = 19; i < 23; i++) {
    if (!isLowerHex(str[i])) return false;
  }
  if (str[23] != QChar('-')) return false;
  for (uint i = 24; i < 36; i++) {
    if (!isLowerHex(str[i])) return false;
  }

  // ...
}

// Manual loop unrolling
bool Uuid::isValid(const QString& str) noexcept {
  // check format of string
  if (str.length() != 36) return false;
  if (!isLowerHex(str[0])) return false;
  if (!isLowerHex(str[1])) return false;
  if (!isLowerHex(str[2])) return false;
  if (!isLowerHex(str[3])) return false;
  if (!isLowerHex(str[4])) return false;
  if (!isLowerHex(str[5])) return false;
  if (!isLowerHex(str[6])) return false;
  if (!isLowerHex(str[7])) return false;
  if (str[8] != QChar('-')) return false;
  if (!isLowerHex(str[9])) return false;
  if (!isLowerHex(str[10])) return false;
  if (!isLowerHex(str[11])) return false;
  if (!isLowerHex(str[12])) return false;
  if (str[13] != QChar('-')) return false;
  if (!isLowerHex(str[14])) return false;
  if (!isLowerHex(str[15])) return false;
  if (!isLowerHex(str[16])) return false;
  if (!isLowerHex(str[17])) return false;
  if (str[18] != QChar('-')) return false;
  if (!isLowerHex(str[19])) return false;
  if (!isLowerHex(str[20])) return false;
  if (!isLowerHex(str[21])) return false;
  if (!isLowerHex(str[22])) return false;
  if (str[23] != QChar('-')) return false;
  if (!isLowerHex(str[24])) return false;
  if (!isLowerHex(str[25])) return false;
  if (!isLowerHex(str[26])) return false;
  if (!isLowerHex(str[27])) return false;
  if (!isLowerHex(str[28])) return false;
  if (!isLowerHex(str[29])) return false;
  if (!isLowerHex(str[30])) return false;
  if (!isLowerHex(str[31])) return false;
  if (!isLowerHex(str[32])) return false;
  if (!isLowerHex(str[33])) return false;
  if (!isLowerHex(str[34])) return false;
  if (!isLowerHex(str[35])) return false;

  // ...
}

I then timed the different variants 5 times with time (user+sys) and did a valgrind+callgrind measurement (both in a release build). Results:

Variant	Times (ms)	Avg Time (ms)	CPU Instructions per call to `isValid`	Speedup	Threadsafe
Master	553 / 552 / 553 / 556 / 563	555	187 341	100%	Yes
Static RegEx	24 / 26 / 25 / 17 / 24	23	3 026	2413%	No?
Char comparisons (loops)	17 / 20 / 20 / 21 / 20	20	871	2775%	Yes
Char comparisons (unrolled)	20 / 18 / 16 / 18 / 19	18	767	3083%	Yes

@ubruhin It seems that char comparison beats regular expressions and it's thread safe, so I'd go with the unrolled version unless you prefer to spend a few cycles for better readability. (It seems quite strange to me that these loops aren't unrolled by the compiler 😕)

In any case, a 3083% speedup is quite nice 😁

rnestler · 2020-01-09T22:08:28Z

Static RegEx Threadsafe No?

Why shouldn't it be thread safe? You don't modify the regex, right? The static initialization is guaranteed to be thread safe by C++11: https://stackoverflow.com/questions/1661529/is-meyers-implementation-of-the-singleton-pattern-thread-safe.

Edit: Oh wow. After reading https://stackoverflow.com/questions/25581696/qregularexpression-matching-thread-safety I'm again horrified about C++.

With my "QRegularExpression author" hat, I can tell you that QRegularExpression is actually thread safe, as long as you only use const methods). But if you rely on that I'm going to give you a bad time.

How can one not guarantee that const access is thread safe?

dbrgn · 2020-01-09T22:27:36Z

How can one not guarantee that const access is thread safe?

The RegEx engine probably has an internal state machine 😉

ubruhin · 2020-01-20T20:31:42Z

Thanks for the detailed investigation!

@ubruhin It seems that char comparison beats regular expressions and it's thread safe, so I'd go with the unrolled version unless you prefer to spend a few cycles for better readability.

Ok I'm fine with that, but would by nice to add a comment to the code why (the hell) UUID checking is done that way (link to this PR or so). Otherwise one might want to replace it by a RegEx some time 😁

When analyzing a callgrind profile of a full library rescan, I noticed that there are a lot of CPU instructions coming from libpcre2, with the call originating in the UUID validation code. Previously, the regular expression used for UUID validation was parsed from a string for every invocation of the validation function. This is unnecessary. A library rescan will validate thousands of UUIDs (roughly 250k calls to `Uuid::isValid` with 200'000 instructions per call on my laptop for a library rescan). By replacing the RegEx with a manually loop-unrolled validation function, I achieved a 20-40% speedup for a full library scan on my computer (which has a fast SSD). The CPU instructions per call to `isValid` were reduced from 187'341 to 767. For more details, please refer to the pull request discussion: #651

dbrgn · 2020-01-23T22:32:13Z

@ubruhin updated and rebased! I added an inline comment, as well as a detailed commit message.

libs/librepcb/common/uuid.cpp

When analyzing a callgrind profile of a full library rescan, I noticed that there are a lot of CPU instructions coming from libpcre2, with the call originating in the UUID validation code. Previously, the regular expression used for UUID validation was parsed from a string for every invocation of the validation function. This is unnecessary. A library rescan will validate thousands of UUIDs (roughly 250k calls to `Uuid::isValid` with 200'000 instructions per call on my laptop for a library rescan). By replacing the RegEx with a manually loop-unrolled validation function, I achieved a 20-40% speedup for a full library scan on my computer (which has a fast SSD). The CPU instructions per call to `isValid` were reduced from 187'341 to 767. For more details, please refer to the pull request discussion: #651

Make isLowerHex a lambda function

When analyzing a callgrind profile of a full library rescan, I noticed that there are a lot of CPU instructions coming from libpcre2, with the call originating in the UUID validation code. Previously, the regular expression used for UUID validation was parsed from a string for every invocation of the validation function. This is unnecessary. A library rescan will validate thousands of UUIDs (roughly 250k calls to `Uuid::isValid` with 200'000 instructions per call on my laptop for a library rescan). By replacing the RegEx with a manually loop-unrolled validation function, I achieved a 20-40% speedup for a full library scan on my computer (which has a fast SSD). The CPU instructions per call to `isValid` were reduced from 187'341 to 767. For more details, please refer to the pull request discussion: #651 (cherry picked from commit ca83f88)

dbrgn requested review from rnestler and ubruhin January 5, 2020 00:14

dbrgn force-pushed the optimize-uuid-validation branch from d8a97cf to 65d9ccb Compare January 5, 2020 00:16

dbrgn force-pushed the optimize-uuid-validation branch from 2e9eece to 1eb7dce Compare January 23, 2020 21:20

dbrgn force-pushed the optimize-uuid-validation branch from 1eb7dce to ae1dd70 Compare January 23, 2020 22:29

ubruhin reviewed Feb 4, 2020

View reviewed changes

libs/librepcb/common/uuid.cpp Outdated Show resolved Hide resolved

dbrgn added 2 commits March 6, 2020 23:51

fixup! Replace UUID validation RegEx with manual validation

6b0cfd9

Make isLowerHex a lambda function

dbrgn force-pushed the optimize-uuid-validation branch from ae1dd70 to 6b0cfd9 Compare March 7, 2020 00:02

ubruhin changed the title ~~Make UUID validation RegEx a static class member~~ Replace UUID validation RegEx with manual validation Mar 8, 2020

ubruhin merged commit ca83f88 into master Mar 8, 2020

ubruhin deleted the optimize-uuid-validation branch March 8, 2020 12:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace UUID validation RegEx with manual validation #651

Replace UUID validation RegEx with manual validation #651

dbrgn commented Jan 5, 2020

dbrgn commented Jan 5, 2020

ubruhin commented Jan 5, 2020 •

edited

Loading

ubruhin commented Jan 5, 2020

eduardosm commented Jan 5, 2020

dbrgn commented Jan 5, 2020 •

edited

Loading

dbrgn commented Jan 5, 2020

dbrgn commented Jan 9, 2020 •

edited

Loading

rnestler commented Jan 9, 2020 •

edited

Loading

dbrgn commented Jan 9, 2020

ubruhin commented Jan 20, 2020

dbrgn commented Jan 23, 2020

Replace UUID validation RegEx with manual validation #651

Replace UUID validation RegEx with manual validation #651

Conversation

dbrgn commented Jan 5, 2020

dbrgn commented Jan 5, 2020

ubruhin commented Jan 5, 2020 • edited Loading

ubruhin commented Jan 5, 2020

eduardosm commented Jan 5, 2020

dbrgn commented Jan 5, 2020 • edited Loading

dbrgn commented Jan 5, 2020

dbrgn commented Jan 9, 2020 • edited Loading

rnestler commented Jan 9, 2020 • edited Loading

dbrgn commented Jan 9, 2020

ubruhin commented Jan 20, 2020

dbrgn commented Jan 23, 2020

ubruhin commented Jan 5, 2020 •

edited

Loading

dbrgn commented Jan 5, 2020 •

edited

Loading

dbrgn commented Jan 9, 2020 •

edited

Loading

rnestler commented Jan 9, 2020 •

edited

Loading