
[commons] arrow -> arrow2 #1239

Merged — 9 commits merged into datafuselabs:master on Aug 2, 2021
Conversation

@sundy-li (Member)

I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

Summary

Migrate the commons crates from arrow to arrow2.

Changelog

  • Improvement

Related Issues

Fixes #1170

Test Plan

Unit Tests

Stateless Tests

@databend-bot (Member)

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@BohuTANG BohuTANG mentioned this pull request Jul 31, 2021
2 tasks
@sundy-li (Member, Author)

sundy-li commented Aug 2, 2021

Baseline benchmarks comparing arrow1 vs arrow2 show a large performance improvement.

https://github.com/sundy-li/learn/tree/master/arrow-vs-arrow2/benches

cargo bench -- "arrow2-min 2\^13 f32"
arrow2-min 2^13 f32     time:   [1.5618 us 1.5680 us 1.5752 us]

cargo bench -- "arrow1-min 2\^13 f32"
arrow1-min 2^13 f32     time:   [3.1624 us 3.1732 us 3.1855 us]

After migrating datafuse to arrow2, end-to-end performance shows little difference. Maybe the bottleneck is still upstream, such as in numbers_stream.

(attached image: perf-results)

Arrow2 makes the code much clearer and simpler, so it is still worth adopting.

@sundy-li sundy-li marked this pull request as ready for review August 2, 2021 03:46
@codecov-commenter

Codecov Report

Merging #1239 (8dbd277) into master (38c1c70) will increase coverage from 70% to 71%.
The diff coverage is 59%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1239     +/-   ##
========================================
  Coverage      70%     71%             
========================================
  Files         476     472      -4     
  Lines       27683   29018   +1335     
========================================
+ Hits        19629   20625    +996     
- Misses       8054    8393    +339     
Impacted Files Coverage Δ
common/datablocks/src/kernels/data_block_slice.rs 100% <ø> (ø)
common/datavalues/src/arrays/ops/vec_hash.rs 54% <0%> (ø)
common/datavalues/src/series/comparison.rs 96% <ø> (+6%) ⬆️
common/datavalues/src/series/date_wrap.rs 0% <ø> (ø)
common/datavalues/src/series/series_impl.rs 34% <ø> (+3%) ⬆️
...usequery/query/src/api/rpc/flight_client_stream.rs 0% <0%> (ø)
...sequery/query/src/api/rpc/flight_service_stream.rs 23% <ø> (ø)
fusequery/query/src/sql/sql_common.rs 74% <ø> (+4%) ⬆️
fusestore/store/src/executor/action_handler.rs 61% <0%> (+1%) ⬆️
common/datavalues/src/arrays/ops/fill.rs 63% <7%> (-19%) ⬇️
... and 99 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38c1c70...8dbd277. Read the comment docs.

@BohuTANG (Member)

BohuTANG commented Aug 2, 2021

Performance on my AMD Ryzen 7 PRO 4750U (arrow2 is still faster in most cases, except for the last two):

| Query | arrow1 (master) | arrow2 |
| --- | --- | --- |
| SELECT avg(number) FROM numbers_mt(100000000000) | 4.35 s. (22.97 billion rows/s., 183.91 GB/s.) | 4.17 s. (23.98 billion rows/s., 191.84 GB/s.) |
| SELECT sum(number) FROM numbers_mt(100000000000) | 4.20 s. (23.79 billion rows/s., 190.50 GB/s.) | 3.72 s. (26.82 billion rows/s., 214.59 GB/s.) |
| SELECT min(number) FROM numbers_mt(100000000000) | 4.92 s. (20.31 billion rows/s., 162.64 GB/s.) | 4.3 s. (22.90 billion rows/s., 183.17 GB/s.) |
| SELECT max(number) FROM numbers_mt(100000000000) | 4.77 s. (20.95 billion rows/s., 167.78 GB/s.) | 4.3 s. (22.90 billion rows/s., 183.17 GB/s.) |
| SELECT count(number) FROM numbers_mt(100000000000) | 2.91 s. (34.33 billion rows/s., 274.90 GB/s.) | 2.31 s. (43.21 billion rows/s., 345.66 GB/s.) |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 19.83 s. (5.04 billion rows/s., 40.37 GB/s.) | 16.16 s. (6.19 billion rows/s., 49.50 GB/s.) |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 3.90 s. (25.62 billion rows/s., 205.13 GB/s.) | 3.91 s. (25.57 billion rows/s., 204.57 GB/s.) |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 8.28 s. (12.07 billion rows/s., 96.66 GB/s.) | 8.21 s. (12.17 billion rows/s., 97.34 GB/s.) |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 100 | 4.80 s. (2.08 billion rows/s., 16.67 GB/s.) | 5.37 s. (1.86 billion rows/s., 14.88 GB/s.) |
| SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 6.31 s. (158.49 million rows/s., 1.27 GB/s.) | 7.0 s. (141.51 million rows/s., 1.13 GB/s.) |

cc @sundy-li @zhang2014 @jorgecarleitao

@sundy-li (Member, Author)

sundy-li commented Aug 2, 2021

OK, but the baseline sort performance is poor; I'm trying to find the bottleneck in arrow2.

jorgecarleitao/arrow2#245

It's strange that the GROUP BY query is slower while the aggregations are faster; this PR did not change the GROUP BY behavior.

@BohuTANG (Member)

BohuTANG commented Aug 2, 2021

> Ok, but the baseline of sort performance is poor, I'm trying to find the bottleneck in arrow2.
>
> jorgecarleitao/arrow2#245

I have seen that. If it is solved, arrow2 will outperform arrow1 across the board.

@BohuTANG (Member)

BohuTANG commented Aug 2, 2021

For the GROUP BY, maybe the first test was just faster at that time; I think we can measure it again once the sort issue is solved.

@BohuTANG (Member) left a comment

LGTM

@databend-bot (Member)

CI Passed
Reviewer Approved
Let's Merge

@databend-bot databend-bot merged commit 9f5f046 into datafuselabs:master Aug 2, 2021
@jorgecarleitao

I believe the GROUP BY perf can be addressed with the arrow2::compute::hash kernel: hash the arrays one by one and merge the hashes.
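The per-column hash-then-merge idea above can be sketched in plain Rust. This is a minimal illustration, not the arrow2::compute::hash kernel itself: it uses std's DefaultHasher instead of arrow2's vectorized kernel, and `hash_column` / `merge_hashes` are hypothetical helper names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash one column of values into one u64 per row.
fn hash_column<T: Hash>(col: &[T]) -> Vec<u64> {
    col.iter()
        .map(|v| {
            let mut h = DefaultHasher::new();
            v.hash(&mut h);
            h.finish()
        })
        .collect()
}

/// Merge per-column hashes row-wise into a single key per row,
/// using a simple multiply-xor combine.
fn merge_hashes(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| x.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ y)
        .collect()
}

fn main() {
    // Two group-by columns over four rows.
    let c1 = vec![1u32, 2, 1, 2];
    let c2 = vec!["a", "a", "b", "a"];

    let keys = merge_hashes(&hash_column(&c1), &hash_column(&c2));

    // Rows 1 and 3 are both the (2, "a") group, so their keys match.
    assert_eq!(keys[1], keys[3]);
    println!("{:?}", keys);
}
```

The point of hashing column-by-column is that each pass stays cache-friendly and vectorizable, instead of hashing row-by-row across heterogeneous types.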

@sundy-li (Member, Author)

sundy-li commented Aug 4, 2021

> arrow2::compute::hash

The current design in datafuse is a compact, collision-free combination key:

  1. Group by String: the combination key is the binary value, as Vec<u8>.
  2. Group by UInt32: the combination key is the UInt32 itself.
  3. Group by (UInt8, UInt8, UInt8): the combination key is a UInt32, with the remaining 8 bits zeroed.
  4. Group by UInt8: we can use a small lookup table to tune the performance.

If the key size is fixed, we can build all keys in one large allocation of size key_size * rows.
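Case 3 above can be sketched as follows. This is a simplified illustration of bit-packing three UInt8 columns into a u32 key per row with a single up-front allocation, not the actual datafuse implementation; `pack_keys` is a hypothetical helper name.

```rust
/// Pack three UInt8 group-by columns into one u32 key per row,
/// leaving the high byte zeroed. All keys are built in a single
/// Vec allocated up front (one allocation of key_size * rows).
fn pack_keys(c0: &[u8], c1: &[u8], c2: &[u8]) -> Vec<u32> {
    assert_eq!(c0.len(), c1.len());
    assert_eq!(c0.len(), c2.len());
    let mut keys = Vec::with_capacity(c0.len()); // single allocation
    for i in 0..c0.len() {
        // Byte layout: [0x00, c2, c1, c0] from high to low.
        let key = (c0[i] as u32) | ((c1[i] as u32) << 8) | ((c2[i] as u32) << 16);
        keys.push(key);
    }
    keys
}

fn main() {
    let keys = pack_keys(&[1, 1], &[2, 2], &[3, 4]);
    // Row 0 packs (1, 2, 3) into 0x0003_0201.
    assert_eq!(keys[0], 0x0003_0201);
    println!("{:?}", keys);
}
```

Because equal group tuples pack to equal integers (and unequal tuples to unequal integers), the packed key can be compared and hashed as a plain u32 with no per-row branching.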

Development

Successfully merging this pull request may close these issues.

Considering migrate to arrow2
5 participants