Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Support Bucket Shuffle Join for Doris #4394

Closed
HappenLee opened this issue Aug 19, 2020 · 0 comments
Closed

[Proposal] Support Bucket Shuffle Join for Doris #4394

HappenLee opened this issue Aug 19, 2020 · 0 comments
Labels
area/planner Issues or PRs related to the query planner Discuss

Comments

@HappenLee
Copy link
Contributor

HappenLee commented Aug 19, 2020

Motivation

At present, Doris support 3 type join: shuffle join, broadcast join, colocate join.
Except colocate join,another join will lead to a lot of network consumption.

For example, there a SQL A join B, the cost of network.

  • broadcast join: if table A is divided into three parts,the net work cost is 3B
  • shuffle join: the network cost is A + B.

These network consumption not only leads to slow query, but also leads to extra memory consumption during join.

Each Doris table have disrtribute info, if the join expr hit the distribute info, we should use the distribute info to reduce the network consumption.

What is bucket shuffle join

image.png

just like Hive's bucket map join, the picture show how it work. if there a SQL A join B, and the join expr hit the distribute info of A. Bucket shuffle join only need distribute table B, sent the data to proper table A part. So the network cost is always B.

So compared with the original join, obviously bucket shuffle join lead to less network cost:

                 B < min(3B, A + B)

It can bring us the following benefits:

  1. First, Bucket Shuffle Join reduce the network cost and lead to a better performance for some join. Especially when the bucket is cropped.

  2. It does not strongly rely on the mechanism of collocate, so it is transparent to users. There is no mandatory requirement for data distribution, which will not lead to data skew.

  3. It can provide more query optimization space for join reorder.

How Bucket Shffle Join For Doris

The key idea and challenge

  • Keep left table data locally in same join case, the join expr need contain the left table distribute column
  • BE must know how right table data to send by distribution in left table

1. Now the data distribute in left table when data load into doris

Firstly, we get the hash value by distributed column of left table, and mod by bucket num. like the pic below:
image.png

Secondy, we data distributed column of left table, send the data distribute of left table to right table。Right table do the same hash way to distribute data, so we only send one copy right table data when we do join compute.
image.png

2 Bucket Shuffle Join query plan

For the Bucket Shuffle Join, we change the DataStreamSend Partition way to BUCKET_SHFFULE_HASH_PARTITIONED, the right table will distribute data by distribution of left table.
image.png

3 Bucket Shuffle Join query schedule

The schedule goal: the DataStreamSender Should know of the bucket that left table distribute and send the proper data to the proper instance.

The schedule strategy: the left table ScanNode chose which bucket seqs it should own and need to keep each Instance of ScanNode have same count of bucket seqs. The FE Send how bucket seqs distribute in BE to that DataStream Sender know how to send data.

image.png

  • Different Framgment have different Bucket Seq distribute, so we need distinguish them by FragmentId
  • We need to make sure each BucketID have same Bucekt count to keep data compulte balance

4 How to decide a query can be Bucket Shuffle Join

  1. The eqJoinConjuncts must contain the left table distribution columns
  2. The eqJoinConjuncts between left table columns and right table columns must keep the same type

5 Others

Add a session variable enable_bucket_shuffle_join, the default value is true.

The limit for curren bucket shuffle join

    1. The left table of bucket shuffle join must be OLAP table. (the reason is obvious)
    1. The join columns should contains all left table distribute columns to enable bucket shuffle join.(the reason is obvious)
    1. After Partition crop, there should only have one partition in left table. (the reason is obvious)
    1. The join column of left table distribution columns and right table column must have the same data type, Different data type will cause different the hash result. (I will solve this problem in next version)

Origin Join VS Bucket Shuffle Join

Test Data:

TPC-DS 1TB Data

Cluser Info:

10 BEs, each BE is Physical Machine and has 48CPU, 96GMEM.

Test Result of TPC-DS

If pic do not show time which means query failed since mem limit exceed.

image

@wuyunfeng wuyunfeng added Discuss area/planner Issues or PRs related to the query planner labels Aug 21, 2020
HappenLee added a commit to HappenLee/incubator-doris that referenced this issue Sep 27, 2020
HappenLee added a commit to HappenLee/incubator-doris that referenced this issue Sep 27, 2020
morningman pushed a commit that referenced this issue Oct 11, 2020
Support Bucket Shuffle Join
issue:#4394
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/planner Issues or PRs related to the query planner Discuss
Projects
None yet
Development

No branches or pull requests

2 participants