Skip to content

Fix planning bug while using sort merge frame processor#14450

Merged
LakshSingla merged 15 commits intoapache:masterfrom
LakshSingla:sm-planning-reg
Jul 11, 2023
Merged

Fix planning bug while using sort merge frame processor#14450
LakshSingla merged 15 commits intoapache:masterfrom
LakshSingla:sm-planning-reg

Conversation

@LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Jun 20, 2023

Description

Sort merge join plans the join queries in a different way where both the sides of the joins are subquery. This introduces a regression in planning when the condition for the join becomes an input ref to a boolean column indicating whether or not that row needs to be included or not. This PR fixes the following things:

  1. Planning of such sub-conditions
  2. Changing the semantics of joinAlgorithm to be a directive rather than a "command" - SortMerge join helps in a subset of the join cases, where the left and the right side are columns into the tables. Left expressions, right expressions, and expressions where we only rely on a boolean input ref (like above) don't fit into this use case. One problem with the above condition would be that we would only have 2 partitions while partitioning the table ("true"/"false") which would degrade the performance of sort-merge join. Therefore MSQ's DataSourcePlan would determine whether or not to accept the hint based on the sub condition (for now).

The 2) point should ideally reside in the planner, however till the time we have a clean way to do so, we will do it in the MSQ's planner (QueryKit)


Release notes

sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla LakshSingla marked this pull request as ready for review July 3, 2023 04:56
@cryptoe cryptoe added this to the 27.0 milestone Jul 3, 2023
@cryptoe cryptoe added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Jul 3, 2023
@cryptoe cryptoe requested a review from gianm July 3, 2023 06:33
Copy link
Contributor

@adarshsanjeev adarshsanjeev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good % a few minor comments

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes lgtm. @LakshSingla The CI failures look legit though.

@LakshSingla
Copy link
Contributor Author

Thanks, fixed the test case that was throwing.

@LakshSingla
Copy link
Contributor Author

Thanks @cryptoe @adarshsanjeev for the review

@LakshSingla LakshSingla merged commit 5ce5363 into apache:master Jul 11, 2023
LakshSingla added a commit to LakshSingla/druid that referenced this pull request Jul 11, 2023
sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.
cryptoe pushed a commit that referenced this pull request Jul 11, 2023
)

sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants