-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13536: [C++] Use decimal-point aware conversion from fast-float #11817
Conversation
@ursabot please benchmark |
Benchmark runs are scheduled for baseline = 2baed02 and contender = 2113e2a. Results will be available as each benchmark for each run completes. |
The custom wrapper is still used for decimals.
2113e2a
to
0a82075
Compare
@ursabot please benchmark |
Benchmark runs are scheduled for baseline = 2baed02 and contender = 0a82075. Results will be available as each benchmark for each run completes. |
There's some regression from Below result is tested on Xeon gold 5218. clang-12
gcc-9.4
|
@ursabot please benchmark lang=Python,R |
Supported benchmark command examples:
To run all benchmarks: To filter benchmarks by language: To filter Python and R benchmarks by name: To filter C++ benchmarks by archery --suite-filter and --benchmark-filter: For other |
@ursabot please benchmark lang=Python lang=R |
Supported benchmark command examples:
To run all benchmarks: To filter benchmarks by language: To filter Python and R benchmarks by name: To filter C++ benchmarks by archery --suite-filter and --benchmark-filter: For other |
Benchmark runs are scheduled for baseline = 2baed02 and contender = 93fcb18. Results will be available as each benchmark for each run completes. |
Hmm, the regression is annoying. I simply made a method non-static and the struct is empty (for non-floats). I expected modern compilers to handle this optimally :-/ |
So looks the overhead is from the benchmark itself. As the test strings are very short, the indirect call overhead becomes significant. |
Interestingly, when I add a long string in float parsing benchmark. Master is 2.7M/s, This PR is 7.8M/s, much faster.
|
By "indirect call overhead", you mean the fact that
This may be because of the updated fast-float version. |
@ursabot please benchmark |
Benchmark runs are scheduled for baseline = 2baed02 and contender = b5489e2. Results will be available as each benchmark for each run completes. |
I mean changes from |
Yes, ideally it's trivial. Which is why the regression is a bit surprising. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
For FloatParsing, this PR starts to beat master code when string size >= 8 on my test machine, the longer the string, the bigger the gap. For string size < 8, this PR is slower than master. Guess it's due to fast-float code updatds. I think this PR is beneficial overall.
Other regressions like HexParsing looks not real, clang and gcc gives different benchmark results.
Benchmark runs are scheduled for baseline = 6cdb80c and contender = b8431fb. b8431fb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
The custom wrapper is still used for decimals.