New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-14269: [C++] Consolidate utf8 benchmark #11376
Conversation
|
const uint8_t* data = reinterpret_cast<const uint8_t*>(str.data()); | ||
const size_t length = str.size(); | ||
|
||
return ValidateUTF8(data, length); | ||
} | ||
|
||
inline bool ValidateAsciiSw(const uint8_t* data, int64_t len) { | ||
static inline bool ValidateAsciiSw(const uint8_t* data, int64_t len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original code unrolls loop manually expecting to make better use of cpu pipeline. But it prevents the compiler to do better optimization leveraging auto vectorization.
Actually, this simple code performs same as simd code, if build with clang.
But gcc has big regression. So I left the simd code untouched.
- clang, simd no better than naive code
https://quick-bench.com/q/R5S9gDfyCzxs4pyQO3rLszMLCBI - gcc, simd is faster
https://quick-bench.com/q/Xes5_3-CGjJbYNikY0E0BWdfVTo
NOTE: below results are compared against the non-inlined benchmark (1st commit in this pr), not against master branch.
clang-10, Intel gold 5218
gcc-9, Intel gold 5218
Arm Neoverse N1, clang-10
Arm Neoverse N1, gcc-9
|
// Do not benchmark inlined functions directly inside the benchmark loop | ||
static ARROW_NOINLINE bool ValidateUTF8NoInline(const uint8_t* data, int64_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this noinline wrap, benchmarked function is expanded in the benchmark loop.
It causes strange behaviour. E.g., an obviously irrelevant change may bring big variance to the benchmark result.
@ursabot please benchmark |
Benchmark runs are scheduled for baseline = 39875f1 and contender = a06c34f. Results will be available as each benchmark for each run completes. |
There is no important regression, so this can be merged IMHO. |
Don't benchmark inlined functions directly inside the benchmark loop. Instead, wrap them with non-inlined functions. This patch also improves performance of validating short ascii strings.
Rebased to fix merge conflict. |
Benchmark runs are scheduled for baseline = ca18e8a and contender = b01a30a. b01a30a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Don't benchmark inlined functions directly inside the benchmark loop.
Instead, wrap them with non-inlined functions.
This patch also improves performance of validating short ascii strings.