Optimize arrayReduce, -Array and -State combinators #7608

Merged
merged 1 commit into from
Nov 10, 2019

amosbird
Collaborator

@amosbird amosbird commented Nov 4, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • Performance Improvement

Short description (up to few sentences):
Vectorize arrayReduce processing, similar to Aggregator's addBatch. This might also serve as infrastructure for #7550.
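To illustrate the idea of batching, here is a toy Python analogue, not ClickHouse's C++ code. It assumes an array column stored as a flat list of values plus cumulative end offsets per row. The class and method names (`SumFunction`, `add`, `add_batch_array`) are hypothetical and chosen only for this sketch. The point is that the batched path makes one call into the aggregate function per block of rows instead of one call per array element.

```python
from typing import List, Sequence


class SumFunction:
    """Toy stand-in for an aggregate function (hypothetical, for illustration)."""

    def add(self, states: List[int], row: int, value: int) -> None:
        # Per-element entry point: called once for every array element.
        states[row] += value

    def add_batch_array(
        self, states: List[int], values: Sequence[int], offsets: Sequence[int]
    ) -> None:
        # Batched entry point: called once per block; the loop over rows
        # and elements runs inside the function.
        begin = 0
        for row, end in enumerate(offsets):
            states[row] += sum(values[begin:end])
            begin = end


def reduce_per_element(func: SumFunction, values, offsets) -> List[int]:
    """One call into the aggregate function per array element."""
    states = [0] * len(offsets)
    begin = 0
    for row, end in enumerate(offsets):
        for i in range(begin, end):
            func.add(states, row, values[i])
        begin = end
    return states


def reduce_batched(func: SumFunction, values, offsets) -> List[int]:
    """One call into the aggregate function for the whole block."""
    states = [0] * len(offsets)
    func.add_batch_array(states, values, offsets)
    return states


# Three rows: [1, 2], [], [3, 4, 5, 6]
flat_values = [1, 2, 3, 4, 5, 6]
end_offsets = [2, 2, 6]
```

Both functions return the same per-row sums; only the number of calls into the aggregate function differs.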

native:
201.nobida.cn :) select ignore(arraySum(a)) from aa format Null

SELECT ignore(arraySum(a))
FROM aa
FORMAT Null


10000000 rows in set. Elapsed: 0.423 sec. Processed 10.00 million rows, 880.00 MB (23.66 million rows/s., 2.08 GB/s.)


before:
201.nobida.cn :) select ignore(arrayReduce('sum', a)) from aa format Null

SELECT ignore(arrayReduce('sum', a))
FROM aa
FORMAT Null


10000000 rows in set. Elapsed: 0.957 sec. Processed 10.00 million rows, 880.00 MB (10.45 million rows/s., 919.44 MB/s.)



after:
201.nobida.cn :) select ignore(arrayReduce('sum', a)) from aa format Null

SELECT ignore(arrayReduce('sum', a))
FROM aa
FORMAT Null


10000000 rows in set. Elapsed: 0.583 sec. Processed 10.00 million rows, 880.00 MB (17.17 million rows/s., 1.51 GB/s.)

@amosbird amosbird force-pushed the batchreduce branch 3 times, most recently from 4ec156b to 5c160d5 Compare November 4, 2019 11:20
Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

OK, but please also add a performance test.

@alexey-milovidov
Member

The performance test results look strange: https://clickhouse-test-reports.s3.yandex.net/7608/5c160d5efb0ca08b6b515d66d2abb0bb847713b4/performance_test/comparison_min_time_gcc_9.html

Could you please check the query with the largest difference manually with clickhouse-benchmark?

@amosbird amosbird force-pushed the batchreduce branch 2 times, most recently from 0a8fa71 to 206d959 Compare November 5, 2019 05:05
@amosbird
Collaborator Author

amosbird commented Nov 5, 2019

Summary

This PR optimizes three use cases.

  1. arrayReduce: 0.9 sec -> 0.56 sec, about 1.6x faster
  2. -Array combinator: 0.9 sec -> 0.34 sec, about 2.6x faster
  3. -State combinator: 3.0 sec -> 0.5 sec, about 6x faster

The actual improvement depends on the aggregate function, the GROUP BY method, the number of groups, etc.
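For reference, the before/after timings listed above can be converted into speedup ratios and time reductions with a few lines of Python. This is only arithmetic on the numbers quoted in the summary; the function names are made up for this sketch.

```python
def speedup(before_sec: float, after_sec: float) -> float:
    """How many times faster the new timing is (before / after)."""
    return before_sec / after_sec


def time_reduction_percent(before_sec: float, after_sec: float) -> float:
    """Share of the original run time that was saved, in percent."""
    return (before_sec - after_sec) / before_sec * 100.0


# Timings taken from the summary list above.
cases = {
    "arrayReduce": (0.9, 0.56),
    "-Array combinator": (0.9, 0.34),
    "-State combinator": (3.0, 0.5),
}

for name, (before, after) in cases.items():
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"{time_reduction_percent(before, after):.1f}% less time"
    )
```

The ratios come out at roughly 1.61x, 2.65x and 6.00x for the three cases.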

Supersedes #7508 and #7549.

clickhouse-benchmark results

  • Query: select arrayReduce('sum', a) from aa format Null;
Before:

Queries executed: 51.

localhost:9000, queries 51, QPS: 1.014, RPS: 10142086.101, MiB/s: 851.158, result RPS: 10142086.101, result MiB/s: 77.378.

0.000%          0.908 sec.
10.000%         0.938 sec.
20.000%         0.942 sec.
30.000%         0.944 sec.
40.000%         0.948 sec.
50.000%         0.951 sec.
60.000%         0.956 sec.
70.000%         0.961 sec.
80.000%         1.077 sec.
90.000%         1.093 sec.
95.000%         1.142 sec.
99.000%         1.194 sec.
99.900%         1.234 sec.
99.990%         1.238 sec.

After:

Queries executed: 59.

localhost:9000, queries 59, QPS: 1.681, RPS: 16814842.041, MiB/s: 1411.158, result RPS: 16814842.041, result MiB/s: 128.287.

0.000%          0.560 sec.
10.000%         0.582 sec.
20.000%         0.587 sec.
30.000%         0.589 sec.
40.000%         0.592 sec.
50.000%         0.595 sec.
60.000%         0.597 sec.
70.000%         0.602 sec.
80.000%         0.604 sec.
90.000%         0.607 sec.
95.000%         0.610 sec.
99.000%         0.632 sec.
99.900%         0.635 sec.
99.990%         0.636 sec.
  • Query: select sumArray(a) from aa format Null;
Before:

Queries executed: 51.

localhost:9000, queries 51, QPS: 1.045, RPS: 10453042.085, MiB/s: 877.254, result RPS: 1.045, result MiB/s: 0.000.

0.000%          0.897 sec.
10.000%         0.928 sec.
20.000%         0.931 sec.
30.000%         0.937 sec.
40.000%         0.949 sec.
50.000%         0.951 sec.
60.000%         0.954 sec.
70.000%         0.968 sec.
80.000%         0.978 sec.
90.000%         0.990 sec.
95.000%         0.994 sec.
99.000%         1.065 sec.
99.900%         1.078 sec.
99.990%         1.079 sec.

After:

Queries executed: 54.

localhost:9000, queries 54, QPS: 2.642, RPS: 26421096.796, MiB/s: 2217.347, result RPS: 2.642, result MiB/s: 0.000.

0.000%          0.345 sec.
10.000%         0.375 sec.
20.000%         0.377 sec.
30.000%         0.378 sec.
40.000%         0.379 sec.
50.000%         0.380 sec.
60.000%         0.381 sec.
70.000%         0.381 sec.
80.000%         0.382 sec.
90.000%         0.384 sec.
95.000%         0.385 sec.
99.000%         0.389 sec.
99.900%         0.390 sec.
99.990%         0.390 sec.
  • Query: select countState() from numbers(1000000000) format Null;
Before:

Queries executed: 21.

localhost:9000, queries 21, QPS: 0.324, RPS: 324260324.149, MiB/s: 2473.910, result RPS: 0.324, result MiB/s: 0.000.

0.000%          3.020 sec.
10.000%         3.041 sec.
20.000%         3.047 sec.
30.000%         3.049 sec.
40.000%         3.052 sec.
50.000%         3.054 sec.
60.000%         3.056 sec.
70.000%         3.060 sec.
80.000%         3.083 sec.
90.000%         3.100 sec.
95.000%         3.130 sec.
99.000%         3.481 sec.
99.900%         3.560 sec.
99.990%         3.568 sec.


After:

Queries executed: 41.

localhost:9000, queries 41, QPS: 1.896, RPS: 1895701598.912, MiB/s: 14463.055, result RPS: 1.896, result MiB/s: 0.000.

0.000%          0.490 sec.
10.000%         0.512 sec.
20.000%         0.516 sec.
30.000%         0.521 sec.
40.000%         0.522 sec.
50.000%         0.526 sec.
60.000%         0.530 sec.
70.000%         0.538 sec.
80.000%         0.540 sec.
90.000%         0.544 sec.
95.000%         0.549 sec.
99.000%         0.571 sec.
99.900%         0.573 sec.
99.990%         0.574 sec.
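When comparing reports like the ones above, it can help to extract the percentile table programmatically. The following is a minimal sketch that assumes the text layout shown in this thread (`<percent>% <seconds> sec.` per line); the helper names are hypothetical and this is not part of clickhouse-benchmark itself.

```python
import re
from typing import Dict

# Matches lines such as "50.000%         0.951 sec."
_PERCENTILE_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\d+(?:\.\d+)?)\s+sec\.")


def parse_percentiles(report: str) -> Dict[float, float]:
    """Map each percentile to its latency in seconds.

    For two-column output (comparison mode) only the first column is read.
    Lines that do not look like percentile rows are ignored.
    """
    result: Dict[float, float] = {}
    for line in report.splitlines():
        match = _PERCENTILE_LINE.match(line)
        if match:
            result[float(match.group(1))] = float(match.group(2))
    return result


def median_ratio(before_report: str, after_report: str) -> float:
    """Ratio of the 50th-percentile latencies (before / after)."""
    before = parse_percentiles(before_report)
    after = parse_percentiles(after_report)
    return before[50.0] / after[50.0]


# Excerpts from the arrayReduce runs above.
before_text = "Queries executed: 51.\n\n0.000%          0.908 sec.\n50.000%         0.951 sec.\n"
after_text = "Queries executed: 59.\n\n0.000%          0.560 sec.\n50.000%         0.595 sec.\n"
print(f"median ratio: {median_ratio(before_text, after_text):.2f}")
```

Comparing medians rather than minimum times makes the comparison less sensitive to a single unusually fast run.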

@amosbird
Collaborator Author

amosbird commented Nov 5, 2019

The perf results make almost no sense to me. The queries that slowed down aren't related to any of this PR's changes.

@amosbird
Collaborator Author

amosbird commented Nov 5, 2019

By the way, some perf tests use total_time as their main_metric, which is confusing.

@amosbird amosbird changed the title Vectorize arrayReduce Optimize arrayReduce, -Array and -State combinators Nov 5, 2019
@alexey-milovidov
Member

alexey-milovidov commented Nov 5, 2019

I have manually checked the query with the largest slowdown in the performance test:

SELECT SearchEngineID AS k, uniqCombined(18)(UserID) FROM hits_100m_single GROUP BY k

I have downloaded two binaries, one from master and one from this PR:

wget https://clickhouse-builds.s3.yandex.net/0/6a871f579fa9638acc08ac8275f3aa1d2736a8f6/1572912744_deb/clickhouse-common-static_19.17.1.1613_amd64.deb
wget https://clickhouse-builds.s3.yandex.net/7608/206d959d1f8bc9c61aa8c90608862ba89cbd6852/1572933436_deb/clickhouse-common-static_19.17.1.1614_amd64.deb

I unpacked them to clickhouse-19.17-old and clickhouse-19.17-new,
then ran each of them from the terminal:

sudo -u clickhouse ./clickhouse-19.17-old server -C /etc/clickhouse-server/config.xml

And ran clickhouse-benchmark:

/usr/bin/clickhouse benchmark <<< "SELECT SearchEngineID AS k, uniqCombined(18)(UserID) FROM hits_100m_single GROUP BY k"
Old:

Queries executed: 54.

localhost:9000, queries 54, QPS: 2.751, RPS: 275118567.607, MiB/s: 2623.735, result RPS: 264.114, result MiB/s: 0.003.

0.000%          0.314 sec.
10.000%         0.332 sec.
20.000%         0.340 sec.
30.000%         0.344 sec.
40.000%         0.349 sec.
50.000%         0.357 sec.
60.000%         0.368 sec.
70.000%         0.372 sec.
80.000%         0.393 sec.
90.000%         0.407 sec.
95.000%         0.413 sec.
99.000%         0.422 sec.
99.900%         0.425 sec.
99.990%         0.426 sec.

New:

Queries executed: 57.

localhost:9000, queries 57, QPS: 2.603, RPS: 260334492.752, MiB/s: 2482.743, result RPS: 249.921, result MiB/s: 0.002.

0.000%          0.344 sec.
10.000%         0.352 sec.
20.000%         0.356 sec.
30.000%         0.367 sec.
40.000%         0.374 sec.
50.000%         0.382 sec.
60.000%         0.386 sec.
70.000%         0.398 sec.
80.000%         0.410 sec.
90.000%         0.418 sec.
95.000%         0.439 sec.
99.000%         0.448 sec.
99.900%         0.450 sec.
99.990%         0.450 sec.


Old:

Queries executed: 211.

localhost:9000, queries 211, QPS: 3.020, RPS: 302000223.497, MiB/s: 2880.099, result RPS: 289.920, result MiB/s: 0.003.

0.000%          0.290 sec.
10.000%         0.299 sec.
20.000%         0.307 sec.
30.000%         0.310 sec.
40.000%         0.318 sec.
50.000%         0.324 sec.
60.000%         0.329 sec.
70.000%         0.340 sec.
80.000%         0.359 sec.
90.000%         0.377 sec.
95.000%         0.392 sec.
99.000%         0.404 sec.
99.900%         0.410 sec.
99.990%         0.411 sec.

New:

Queries executed: 224.

localhost:9000, queries 224, QPS: 2.607, RPS: 260723585.248, MiB/s: 2486.454, result RPS: 250.295, result MiB/s: 0.002.

0.000%          0.336 sec.
10.000%         0.350 sec.
20.000%         0.357 sec.
30.000%         0.361 sec.
40.000%         0.368 sec.
50.000%         0.375 sec.
60.000%         0.384 sec.
70.000%         0.395 sec.
80.000%         0.411 sec.
90.000%         0.436 sec.
95.000%         0.446 sec.
99.000%         0.458 sec.
99.900%         0.471 sec.
99.990%         0.472 sec.

I have found that this performance regression is real, although the code looks reasonable and the performance improvement that you've demonstrated is massive.

Let's look at perf top.

@alexey-milovidov
Member

alexey-milovidov commented Nov 5, 2019

Then I ran two servers simultaneously in comparison mode (this is a new feature of clickhouse-benchmark that should show statistically significant results, if there are any):

/usr/bin/clickhouse benchmark --port 9000 --port 9001 <<< "SELECT SearchEngineID AS k, uniqCombined(18)(UserID) FROM hits_100m_single GROUP BY k"
Queries executed: 231.

localhost:9000, queries 103, QPS: 2.990, RPS: 299010679.262, MiB/s: 2851.588, result RPS: 287.050, result MiB/s: 0.003.
localhost:9001, queries 128, QPS: 2.586, RPS: 258638185.987, MiB/s: 2466.566, result RPS: 248.293, result MiB/s: 0.002.

0.000%          0.290 sec.      0.337 sec.
10.000%         0.302 sec.      0.352 sec.
20.000%         0.312 sec.      0.357 sec.
30.000%         0.319 sec.      0.364 sec.
40.000%         0.325 sec.      0.374 sec.
50.000%         0.332 sec.      0.380 sec.
60.000%         0.334 sec.      0.389 sec.
70.000%         0.344 sec.      0.397 sec.
80.000%         0.352 sec.      0.411 sec.
90.000%         0.369 sec.      0.436 sec.
95.000%         0.392 sec.      0.441 sec.
99.000%         0.429 sec.      0.452 sec.
99.900%         0.433 sec.      0.565 sec.
99.990%         0.434 sec.      0.580 sec.

Difference at 99.5% confidence : mean difference is 0.05220429, but confidence interval is 0.01285394
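One plausible reading of the line above is a pooled-variance Student's t comparison: the difference counts as significant when the absolute difference of the sample means exceeds an interval derived from the pooled standard deviation. The sketch below implements that textbook test in Python. Whether clickhouse-benchmark computes it exactly this way is an assumption, the critical t value must be supplied by the caller, and the sample timings are made up for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def compare_samples(
    a: Sequence[float], b: Sequence[float], t_critical: float
) -> Tuple[float, float, bool]:
    """Return (mean difference, interval, significant?) for two timing samples.

    t_critical is the two-sided Student's t critical value for the chosen
    confidence level and len(a) + len(b) - 2 degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    mean_diff = abs(statistics.mean(a) - statistics.mean(b))
    # Pooled sample variance of the two groups.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    interval = t_critical * math.sqrt(pooled_var) * math.sqrt(1.0 / n_a + 1.0 / n_b)
    return mean_diff, interval, mean_diff > interval


# Hypothetical per-query timings in seconds (not taken from the thread).
old_server = [1.0, 2.0, 3.0]
new_server = [4.0, 5.0, 6.0]
# 2.776 is the two-sided 95% critical value for 4 degrees of freedom.
diff, interval, significant = compare_samples(old_server, new_server, 2.776)
print(f"mean difference {diff:.3f}, interval {interval:.3f}, significant: {significant}")
```

Under this reading, the reported mean difference of 0.05220429 exceeds the interval of 0.01285394, which is consistent with the regression being reported as real.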

@amosbird
Collaborator Author

amosbird commented Nov 6, 2019

Ok, it seems some compiler tweaks are required. I'll investigate...

> This is the new feature of clickhouse-benchmark that should guarantee statistical significant results

That's awesome!

@amosbird amosbird force-pushed the batchreduce branch 3 times, most recently from dc9b1d9 to 6eb1bfc Compare November 7, 2019 02:30
Also devirtualize -State combinator.
@alexey-milovidov alexey-milovidov merged commit 1d910c5 into ClickHouse:master Nov 10, 2019
@CurtizJ CurtizJ added the pr-improvement Pull request with some product improvements label Nov 14, 2019