Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cdbpath_motion_for_parallel_join() by outer join inner style #98

Merged

Conversation

avamingli
Copy link
Collaborator

@avamingli avamingli commented Jul 27, 2023

We have separated join locus function for non-parallel and parallel mode into cdbpath_motion_for_join() which handles for locus whose parallel_workers is 0 and cdbpath_motion_for_parallel_join() which handles others.

A lot of logic are meaningless in cdbpath_motion_for_parallel_join():
1.Writeable operations for join.
2.Both sides's locus parallel_workers are no more than one.
3.Some locus couldn't participate parallel join yet now.

We have done the job for SegmentGeneralWorkers as outer side join with other locus as inner side, but lack for other types. Refactoring that to keep codes clear.
Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us to resolve them in upcoming fixes.

Main changes:
SingleQE join SegmentGeneralWorkers.
SingleQE join Partitioned(workers>1), except HashedOJ.
SegmentGeneral as outer, Assert(false).
Partitioned(workers>1) join SegmentGeneral.
Partitioned(workers>1) join SegmentGeneralWorkers.
Partitioned(workers>1) join SingleQE.
Partitioned join Partitioned are handled as cdbpath_motion_for_join().
Remove all if conditions for inner side locus.

Authored-by: Zhang Mingli avamingli@gmail.com

closes: #27


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to @cloudberrydb/dev team for review and approval when your PR is ready馃コ

@avamingli
Copy link
Collaborator Author

@my-ship-it This function is critical, to keep pr small, this commit doesn't cause any behavior change.

Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us to resolve them in upcoming fixes.

@avamingli
Copy link
Collaborator Author

A suggestion for reviewers, it's hard to tell the right by codes diffs.
Compare the logical for parallel join with older and newer codes with two windows may be a little easier..

@avamingli avamingli force-pushed the refactor_parallel_locus_join branch 6 times, most recently from 922038a to 67aa72f Compare July 28, 2023 09:17
@avamingli avamingli self-assigned this Aug 1, 2023
We have separated join locus function for non-parallel and parallel
mode into cdbpath_motion_for_join() which handles for locus whose
parallel_workers is 0 and cdbpath_motion_for_parallel_join() which
handles others.

A lot of logic are meaningless in cdbpath_motion_for_parallel_join():
  1.Writeable operations for join.
  2.Both sides's locus parallel_workers are no more than one.
  3.Some locus couldn't participate parallel join yet now.

We have done the job for SegmentGeneralWorkers as outer side join
with other locus as inner side, but lack for other types.
Refactoring that to keep codes clear.
Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us
to resolve them in upcoming fixes.

Main changes:
SingleQE join SegmentGeneralWorkers.
SingleQE join Partitioned(workers>1), except HashedOJ.
SegmentGeneral as outer, Assert(false).
Partitioned(workers>1) join SegmentGeneral.
Partitioned(workers>1) join SegmentGeneralWorkers.
Partitioned(workers>1) join SingleQE.
Partitioned join Partitioned are handled as cdbpath_motion_for_join().
Remove all if conditions for inner side locus.

Authored-by: Zhang Mingli avamingli@gmail.com
@avamingli avamingli merged commit c00f6d7 into cloudberrydb:main Oct 9, 2023
6 checks passed
@avamingli avamingli deleted the refactor_parallel_locus_join branch October 9, 2023 10:25
@avamingli
Copy link
Collaborator Author

Pushed, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cdbpath_motion_for_parallel_join() refactor
3 participants