ARROW-15005: [C++] Improve csv parser with Neon #11896

Closed
cyb70289 wants to merge 3 commits from 15005-csv-parser-neon

Contributor

@cyb70289 cyb70289 commented Dec 8, 2021

No description provided.

@github-actions

github-actions bot commented Dec 8, 2021

@cyb70289
Contributor Author

cyb70289 commented Dec 8, 2021

Neoverse N1, clang-12

-----------------------------------------------------------------
Non-regressions: (8)
-----------------------------------------------------------------
              benchmark        baseline       contender  change %
ParseCSVVehiclesExample 943.178 MiB/sec   1.411 GiB/sec    53.152
  ParseCSVStocksExample 817.026 MiB/sec 888.977 MiB/sec     8.806
    ParseCSVQuotedBlock 497.098 MiB/sec 514.735 MiB/sec     3.548
 ParseCSVFlightsExample 365.120 MiB/sec 370.772 MiB/sec     1.548
   ParseCSVEscapedBlock 451.707 MiB/sec 452.478 MiB/sec     0.171
ChunkCSVNoNewlinesBlock      193.220245      193.543142     0.167
    ChunkCSVQuotedBlock 720.860 MiB/sec 720.979 MiB/sec     0.017
   ChunkCSVEscapedBlock 774.219 MiB/sec 774.157 MiB/sec    -0.008

M1, AppleClang-12

------------------------------------------------------------------
Non-regressions: (8)
------------------------------------------------------------------
              benchmark         baseline       contender  change %
ParseCSVVehiclesExample 1006.916 MiB/sec   1.765 GiB/sec    79.447
  ParseCSVStocksExample    1.005 GiB/sec   1.242 GiB/sec    23.544
 ParseCSVFlightsExample  483.309 MiB/sec 484.854 MiB/sec     0.320
    ChunkCSVQuotedBlock    1.505 GiB/sec   1.508 GiB/sec     0.238
   ParseCSVEscapedBlock  527.498 MiB/sec 528.321 MiB/sec     0.156
    ParseCSVQuotedBlock  521.683 MiB/sec 522.305 MiB/sec     0.119
   ChunkCSVEscapedBlock    1.615 GiB/sec   1.614 GiB/sec    -0.021
ChunkCSVNoNewlinesBlock        83.988479       81.389943    -3.094

Member

@pitrou pitrou left a comment

Interesting!

Review thread on cpp/src/arrow/csv/parser.cc (outdated, resolved)
Review thread on cpp/src/arrow/csv/parser.cc (resolved)
@cyb70289
Contributor Author

cyb70289 commented Dec 9, 2021

New benchmark results after templating the special options.
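
For readers unfamiliar with the technique: "templating" options usually means turning runtime flags into compile-time template parameters, so that each specialization of the hot loop contains only the branches it needs. The following is a minimal sketch of that general idea, not the code in this PR. It assumes the special options are things like quoting and escaping, and every name in it (`SketchOptions`, `CountFieldsImpl`, `CountFields`) is hypothetical rather than part of Arrow's API. It requires C++17.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical options struct for illustration; not Arrow's ParseOptions.
struct SketchOptions {
  bool quoting = true;
  bool escaping = false;
  char delimiter = ',';
  char quote_char = '"';
  char escape_char = '\\';
};

// Hot loop specialized at compile time: when kQuoting or kEscaping is
// false, the corresponding check is compiled out entirely.
template <bool kQuoting, bool kEscaping>
std::size_t CountFieldsImpl(std::string_view line, const SketchOptions& opts) {
  std::size_t fields = 1;
  bool in_quotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if constexpr (kEscaping) {
      if (c == opts.escape_char) {
        ++i;  // skip the escaped character
        continue;
      }
    }
    if constexpr (kQuoting) {
      if (c == opts.quote_char) {
        in_quotes = !in_quotes;
        continue;
      }
    }
    if (c == opts.delimiter && !in_quotes) ++fields;
  }
  return fields;
}

// Runtime dispatch happens once per call instead of once per character.
std::size_t CountFields(std::string_view line, const SketchOptions& opts) {
  if (opts.quoting) {
    return opts.escaping ? CountFieldsImpl<true, true>(line, opts)
                         : CountFieldsImpl<true, false>(line, opts);
  }
  return opts.escaping ? CountFieldsImpl<false, true>(line, opts)
                       : CountFieldsImpl<false, false>(line, opts);
}
```

The trade-off is code size: each combination of flags produces its own copy of the loop, which is usually acceptable when the number of flags is small.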

Neoverse N1, clang-12

------------------------------------------------------------------
Non-regressions: (8)
------------------------------------------------------------------
              benchmark        baseline        contender  change %
ParseCSVVehiclesExample 936.272 MiB/sec    1.695 GiB/sec    85.394
  ParseCSVStocksExample 832.577 MiB/sec 1010.333 MiB/sec    21.350
    ParseCSVQuotedBlock 433.361 MiB/sec  499.383 MiB/sec    15.235
 ParseCSVFlightsExample 344.455 MiB/sec  358.278 MiB/sec     4.013
   ParseCSVEscapedBlock 438.008 MiB/sec  447.634 MiB/sec     2.198
   ChunkCSVEscapedBlock 774.220 MiB/sec  774.256 MiB/sec     0.005
    ChunkCSVQuotedBlock 721.149 MiB/sec  720.960 MiB/sec    -0.026
ChunkCSVNoNewlinesBlock      192.677073       192.158918    -0.269

M1, AppleClang-12

------------------------------------------------------------------
Non-regressions: (8)
------------------------------------------------------------------
              benchmark         baseline       contender  change %
ParseCSVVehiclesExample 1011.496 MiB/sec   1.765 GiB/sec    78.723
  ParseCSVStocksExample    1.009 GiB/sec   1.243 GiB/sec    23.128
   ChunkCSVEscapedBlock    1.614 GiB/sec   1.615 GiB/sec     0.037
    ChunkCSVQuotedBlock    1.510 GiB/sec   1.510 GiB/sec     0.029
   ParseCSVEscapedBlock  528.560 MiB/sec 528.450 MiB/sec    -0.021
    ParseCSVQuotedBlock  523.418 MiB/sec 522.261 MiB/sec    -0.221
 ParseCSVFlightsExample  486.159 MiB/sec 484.643 MiB/sec    -0.312
ChunkCSVNoNewlinesBlock        83.910285       81.432684    -2.953

@pitrou pitrou closed this in 464ccde Dec 9, 2021
@pitrou
Member

pitrou commented Dec 9, 2021

That's a very nice improvement, thank you :-)

@ursabot

ursabot commented Dec 9, 2021

Benchmark runs are scheduled for baseline = f0110cf and contender = 464ccde. 464ccde is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.9%] ursa-i9-9960x
[Finished ⬇️0.71% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@cyb70289 cyb70289 deleted the 15005-csv-parser-neon branch December 9, 2021 11:48