Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: support index nested loop hash join #8661

Merged
merged 70 commits into from Sep 3, 2019

Conversation

yu34po
Copy link
Contributor

@yu34po yu34po commented Dec 12, 2018

What problem does this PR solve?

#8470

What is changed and how it works?

We'll split index_lookup_join into index_hash_join and index_merge_join.
This PR only adds the implementation for index_hash_join.
If we do not need to keep the result order as the outer plan, we'll build an index_hash_join executor. Otherwise, we use the old index_lookup_join executor.

IndexNestedLoopHashJoin employs one outer worker and N inner workers to
execute concurrently. The output order is not promised.

The execution flow is very similar to IndexLookUpReader:

  1. The outer worker reads N outer rows, builds a task and sends it to the
    inner worker channel.
  2. The inner worker receives the tasks and does 3 things for every task:
    1. builds hash table from the outer rows
    2. builds key ranges from outer rows and fetches inner rows
    3. probes the hash table and sends the join result to the main thread channel.
      Note: step i and step ii runs concurrently.
  3. The main thread receives the join results.

Check List

Tests

Existing test case.

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change

Side effects

  1. The current implementation of tryToMatch uses an inner iterator and one outer row, this commit will cause performance reduction.
  2. After adding a new joiner function trytoMatch(outerIterator, innerRow)/onMissMatch(executor: add tryToMatchOuters to the joiner interface #9286), performance can be recovered.
  3. After executor: decrease the memory usage of hashTable in HashJoinExec #11832 is merged, the hashtable can be optimized further. [DONE]

This change is Reviewable

@zz-jason zz-jason added status/WIP contribution This PR is from a community contributor. sig/planner SIG: Planner sig/execution SIG execution labels Dec 12, 2018
@XuHuaiyu
Copy link
Contributor

@yu34po
There are too many modifications in this PR, we'd better split this PR into smaller ones, each one can be pushed as an individual PR:

  1. Rename IndexLookupJoin to IndexJoin.
  2. Add the definition for IndexHashJoin.
  3. Implement IndexHashJoin, build hashtable from OuterTable, but we need to keep the output order be the same as the outer table in this version of implementation (Because if we do not keep the order, most cases may fail).
    3.1 We use the old tryToMatch interface here, and pass 1 outRow and 1 innerRow every time, thus can avoid the wong columns order in the output.
  4. Implement new tryToMatch interface for join 1 innerRow and n outerRows.
  5. Implement IndexMergeJoin, and consider the KeepOutOrder flag now.

Copy link
Member

@shenli shenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yu34po Please:

  1. Refine the description of this PR.
  2. Add performance benchmark.

executor/index_join.go Outdated Show resolved Hide resolved
executor/index_join.go Outdated Show resolved Hide resolved
@yu34po yu34po changed the title Indexjoin add feature HashIndexjoin to IndexLookUpJoin Dec 17, 2018
@yu34po yu34po changed the title add feature HashIndexjoin to IndexLookUpJoin add feature IndexHashJoin to IndexLookUpJoin Dec 17, 2018
@yu34po yu34po mentioned this pull request Dec 17, 2018
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
executor/index_lookup_join.go Outdated Show resolved Hide resolved
@zhouqiang-cl
Copy link
Contributor

/rebuild

@qw4990 qw4990 self-requested a review January 2, 2019 06:53
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Jan 3, 2019

PTAL @zz-jason @qw4990

@zz-jason zz-jason self-requested a review January 3, 2019 03:13
@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. status/can-merge Indicates a PR has been approved by a committer. and removed status/ReqChange labels Sep 2, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 2, 2019

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Sep 2, 2019

@yu34po merge failed.

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Sep 3, 2019

/run-all-tests

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Sep 3, 2019

/run-all-tests

func (iw *indexHashJoinInnerWorker) buildHashTableForOuterResult(ctx context.Context, cancelFunc context.CancelFunc, task *indexHashJoinTask, h hash.Hash64) {
rowIdx, numRows := 0, task.outerResult.NumRows()
buf := make([]byte, 1)
task.lookupMap = newRowHashMap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newRowHashMap is added an argument here, https://github.com/pingcap/tidb/pull/11937/files#diff-4f7e05240341383641662778116ddf13R236
We should merge #11937, then fix it here.

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Sep 3, 2019

/run-all-tests

Copy link
Contributor

@SunRunAway SunRunAway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SunRunAway SunRunAway added status/can-merge Indicates a PR has been approved by a committer. and removed status/can-merge Indicates a PR has been approved by a committer. labels Sep 3, 2019
@SunRunAway
Copy link
Contributor

/merge

@sre-bot
Copy link
Contributor

sre-bot commented Sep 3, 2019

/run-all-tests

@XuHuaiyu XuHuaiyu merged commit d9e7bd3 into pingcap:master Sep 3, 2019
@XuHuaiyu XuHuaiyu deleted the indexjoin branch September 3, 2019 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. sig/execution SIG execution sig/planner SIG: Planner status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet