
ARROW-13536: [C++] Use decimal-point aware conversion from fast-float #11817

Closed
wants to merge 3 commits

Conversation

pitrou (Member) commented Nov 30, 2021

The custom wrapper is still used for decimals.
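For context, a minimal sketch of what a decimal-point aware parse with fast_float looks like, using fast_float::from_chars_advanced and fast_float::parse_options from the upstream fast_float API; the helper name and surrounding code are illustrative only, not the exact wrapper in this PR.

```cpp
#include <fast_float/fast_float.h>

#include <string>
#include <system_error>

// Illustrative helper, not the PR's actual code: parse a floating-point value
// whose decimal separator may differ from '.' (e.g. ',' in some locales).
bool ParseFloatWithSeparator(const std::string& s, char decimal_point, double* out) {
  fast_float::parse_options options{fast_float::chars_format::general, decimal_point};
  const char* begin = s.data();
  const char* end = s.data() + s.size();
  auto result = fast_float::from_chars_advanced(begin, end, *out, options);
  // Succeed only if the whole string was consumed without error.
  return result.ec == std::errc() && result.ptr == end;
}
```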

pitrou (Member, Author) commented Nov 30, 2021

@ursabot please benchmark

ursabot commented Nov 30, 2021

Benchmark runs are scheduled for baseline = 2baed02 and contender = 2113e2a. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Failed] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

pitrou (Member, Author) commented Nov 30, 2021

@ursabot please benchmark

ursabot commented Nov 30, 2021

Benchmark runs are scheduled for baseline = 2baed02 and contender = 0a82075. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️25.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.35% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

cyb70289 (Contributor) commented Dec 1, 2021

There's some regression in FloatParsing. I repeated the test locally with similar results.

The results below are from a Xeon Gold 5218: FloatParsing sees a 25%~30% regression with clang and a 12% regression with gcc.
I also tested on an Arm Neoverse N1, where the regression is about 10% for both compilers.

clang-12
$ archery benchmark diff --suite-filter="arrow-value-parsing-benchmark" --cc=clang --cxx=clang++

-----------------------------------------------------------------------------------------
Non-regressions: (23)                                                                    
-----------------------------------------------------------------------------------------
                                benchmark           baseline          contender  change %
                    HexParsing<Int16Type> 122.889M items/sec 146.914M items/sec    19.551
                IntegerParsing<Int16Type> 122.648M items/sec 130.271M items/sec     6.215
             IntegerFormatting<Int64Type>  50.467M items/sec  53.040M items/sec     5.097
......

---------------------------------------------------------------------------------------
Regressions: (10)
---------------------------------------------------------------------------------------
                              benchmark           baseline          contender  change %
             IntegerParsing<UInt32Type> 158.139M items/sec 149.383M items/sec    -5.537
              IntegerParsing<UInt8Type> 193.646M items/sec 181.939M items/sec    -6.046
              IntegerParsing<Int32Type>  96.648M items/sec  87.516M items/sec    -9.449
                  HexParsing<UInt8Type> 154.437M items/sec 139.302M items/sec    -9.800
                   HexParsing<Int8Type> 159.359M items/sec 143.353M items/sec   -10.044
                 HexParsing<UInt32Type> 104.878M items/sec  90.750M items/sec   -13.470
TimestampParsingISO8601<TimeUnit::NANO>  44.332M items/sec  37.728M items/sec   -14.897
                FloatParsing<FloatType>  52.520M items/sec  38.632M items/sec   -26.444
               FloatParsing<DoubleType>  59.252M items/sec  41.705M items/sec   -29.613
                 HexParsing<UInt16Type> 122.306M items/sec  84.441M items/sec   -30.959

gcc-9.4
$ archery benchmark diff --suite-filter="arrow-value-parsing-benchmark" --cc=gcc --cxx=g++

-----------------------------------------------------------------------------------------
Non-regressions: (28)                                                                    
-----------------------------------------------------------------------------------------
                                benchmark           baseline          contender  change %
                   HexParsing<UInt32Type> 117.514M items/sec 131.796M items/sec    12.153
                    HexParsing<UInt8Type> 191.946M items/sec 213.739M items/sec    11.354
TimestampParsingISO8601<TimeUnit::SECOND>  38.486M items/sec  41.766M items/sec     8.523
                 IntegerParsing<Int8Type> 136.544M items/sec 147.524M items/sec     8.042
                   HexParsing<UInt16Type> 150.635M items/sec 158.778M items/sec     5.406
             IntegerFormatting<UInt8Type> 405.355M items/sec 426.245M items/sec     5.153
......

-----------------------------------------------------------------------------
Regressions: (5)
-----------------------------------------------------------------------------
                    benchmark           baseline          contender  change %
   IntegerParsing<UInt32Type> 167.434M items/sec 157.774M items/sec    -5.769
IntegerFormatting<UInt16Type> 183.618M items/sec 172.128M items/sec    -6.257
        HexParsing<Int16Type> 142.974M items/sec 130.157M items/sec    -8.965
     FloatParsing<DoubleType>  45.377M items/sec  39.540M items/sec   -12.864
      FloatParsing<FloatType>  43.444M items/sec  37.624M items/sec   -13.397

pitrou (Member, Author) commented Dec 1, 2021

@ursabot please benchmark lang=Python,R

ursabot commented Dec 1, 2021

Supported benchmark command examples:

@ursabot benchmark help

To run all benchmarks:
@ursabot please benchmark

To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java
@ursabot please benchmark lang=JavaScript

To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*

To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3

For other command=cpp-micro options, please see https://github.com/ursacomputing/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py

pitrou (Member, Author) commented Dec 1, 2021

@ursabot please benchmark lang=Python lang=R

ursabot commented Dec 1, 2021

Supported benchmark command examples:

@ursabot benchmark help

To run all benchmarks:
@ursabot please benchmark

To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java
@ursabot please benchmark lang=JavaScript

To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*

To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2 --iterations=3

For other command=cpp-micro options, please see https://github.com/ursacomputing/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py

pitrou (Member, Author) commented Dec 1, 2021

@ursabot please benchmark lang=Python
@ursabot please benchmark lang=R

ursabot commented Dec 1, 2021

Benchmark runs are scheduled for baseline = 2baed02 and contender = 93fcb18. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.45% ⬆️0.0%] ursa-i9-9960x
[Skipped ⚠️ Only ['C++', 'Java'] langs are supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

pitrou (Member, Author) commented Dec 1, 2021

Hmm, the regression is annoying. I simply made a method non-static and the struct is empty (for non-floats). I expected modern compilers to handle this optimally :-/

cyb70289 (Contributor) commented Dec 1, 2021

> Hmm, the regression is annoying. I simply made a method non-static and the struct is empty (for non-floats). I expected modern compilers to handle this optimally :-/

So it looks like the overhead comes from the benchmark itself. Because the test strings are very short, the indirect call overhead becomes significant.
I don't think it's a real problem in practice.

cyb70289 (Contributor) commented Dec 1, 2021

Interestingly, when I add a long string to the float parsing benchmark, master runs at 2.7M items/sec while this PR runs at 7.8M items/sec, much faster.

diff --git a/cpp/src/arrow/util/value_parsing_benchmark.cc b/cpp/src/arrow/util/value_parsing_benchmark.cc
index 40d139316..b955a4174 100644
--- a/cpp/src/arrow/util/value_parsing_benchmark.cc
+++ b/cpp/src/arrow/util/value_parsing_benchmark.cc
@@ -77,9 +77,8 @@ static std::vector<std::string> MakeHexStrings(int32_t num_items) {
 }

 static std::vector<std::string> MakeFloatStrings(int32_t num_items) {
-  std::vector<std::string> base_strings = {"0.0",         "5",        "-12.3",
-                                           "98765430000", "3456.789", "0.0012345",
-                                           "2.34567e8",   "-5.67e-8"};
+  std::vector<std::string> base_strings = {
+                                           "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112222222223333"};

pitrou (Member, Author) commented Dec 1, 2021

By "indirect call overhead", you mean the fact that StringToFloat gained a new parameter?

> Interestingly, when I add a long string to the float parsing benchmark, master runs at 2.7M items/sec while this PR runs at 7.8M items/sec, much faster.

This may be because of the updated fast-float version.

pitrou (Member, Author) commented Dec 1, 2021

@ursabot please benchmark

ursabot commented Dec 1, 2021

Benchmark runs are scheduled for baseline = 2baed02 and contender = b5489e2. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.45% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.53% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

cyb70289 (Contributor) commented Dec 1, 2021

By "indirect call overhead", you mean the fact that StringToFloat gained a new parameter?

I mean the change from StringConverter<T>::Convert(type, s, length, out) to StringConverter<T>{}.Convert(type, s, length, out), which might generate a temporary object.
But I'm not certain now; it looks trivial for the compiler to eliminate the overhead: https://godbolt.org/z/TGTqsMdM1
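To make the comparison concrete, here is a minimal sketch of the two call shapes, assuming a simplified stand-in for Arrow's StringConverter; the real converter lives in cpp/src/arrow/util/value_parsing.h, takes a type argument, and parses differently.

```cpp
#include <cstddef>
#include <cstdlib>

// Simplified, hypothetical stand-in for Arrow's StringConverter<T>.
template <typename T>
struct StringConverter {
  // Non-static member function on an otherwise empty struct.
  bool Convert(const char* s, size_t length, T* out) {
    char* end = nullptr;
    *out = static_cast<T>(std::strtod(s, &end));  // placeholder parse
    return end == s + length;
  }
};

bool ParseDouble(const char* s, size_t length, double* out) {
  // Old call shape (static member): StringConverter<double>::Convert(s, length, out);
  // New call shape (temporary object + non-static member):
  return StringConverter<double>{}.Convert(s, length, out);
  // The temporary carries no state, so an optimizing compiler should emit the
  // same code for both shapes, as the godbolt link above suggests.
}
```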

pitrou (Member, Author) commented Dec 1, 2021

Yes, ideally it's trivial, which is why the regression is a bit surprising.

cyb70289 (Contributor) left a comment


+1

For FloatParsing, this PR starts to beat the master code when the string size is >= 8 on my test machine; the longer the string, the bigger the gap. For string sizes < 8, this PR is slower than master. I guess that's due to the fast-float code updates. I think this PR is beneficial overall.

Other regressions such as HexParsing don't look real; clang and gcc give different benchmark results.

pitrou closed this in b8431fb on Dec 2, 2021
pitrou deleted the ARROW-13536-fast-float branch on December 2, 2021 at 11:07
ursabot commented Dec 2, 2021

Benchmark runs are scheduled for baseline = 6cdb80c and contender = b8431fb. b8431fb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️1.35% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.49% ⬆️0.27%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
