-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17080][SQL][FOLLOWUP] Improve documentation, change buildJoin method structure and add a debug log #17353
Conversation
cc @gatorsmile |
* @param conf SQLConf for statistics computation. | ||
* @param conditions The overall set of join conditions. | ||
* @param topOutput The output attributes of the final plan. | ||
* @return Return a new JoinPlan if the two sides can be joined with some conditions. Otherwise, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about?
Builds and returns a new JoinPlan if there exists at least one join condition involving references from both left and right. Otherwise, returns None.
@@ -201,7 +201,16 @@ object JoinReorderDP extends PredicateHelper { | |||
nextLevel.toMap | |||
} | |||
|
|||
/** Build a new join node. */ | |||
/** | |||
* Builds a new JoinPlan if the two given sides can be joined with some conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us reword it too.
Builds a new JoinPlan if there is a join predicate connecting two given sides.
Could you also update the description of |
Could we move the checking |
Replace the following codes by using
|
Test build #74845 has finished for PR 17353 at commit
|
Does it make sense to introduce some counts to decide how many new JoinPlan we build? We can find out what we pruned in the search. It can easy for us to ensure no regression happened when we improve the codes in the future. Also cc @cloud-fan |
Test build #74864 has started for PR 17353 at commit |
@@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper { | |||
} | |||
|
|||
/** | |||
* Builds a new JoinPlan if the two given sides can be joined with some conditions. | |||
* Builds a new JoinPlan if there exists at least one join condition involving references from | |||
* both left and right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builds a new JoinPlan when both conditions hold
- the sets of items contained in both left and right sides do not overlap
- there exists at least one join condition involving references from both sides
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks.
* @return Return a new JoinPlan if the two sides can be joined with some conditions. Otherwise, | ||
* return None. | ||
* @return Builds and returns a new JoinPlan if there exists at least one join condition | ||
* involving references from both left and right. Otherwise, returns None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, we can simplify it to Builds and returns a new JoinPlan if both conditions hold. Otherwise, returns None.
@@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper { | |||
} | |||
|
|||
/** | |||
* Builds a new JoinPlan if the two given sides can be joined with some conditions. | |||
* Builds a new JoinPlan if there exists at least one join condition involving references from | |||
* both left and right. | |||
* @param oneJoinPlan One side JoinPlan for building a new JoinPlan. | |||
* @param otherJoinPlan The other side JoinPlan for building a new join node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to rename them to leftPlan and rightPlan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left and right sides are decided inside this method. It tends to build a left deep tree.
Where should we check the count? Whom do we want to expose it to? |
debug log SGTM |
Yeah. The counts can help us understand the pruning rate of the search space. When CBO join reordering is very slow, we can check the counts. |
Test build #74934 has finished for PR 17353 at commit
|
Test build #74945 has started for PR 17353 at commit |
@@ -152,6 +156,10 @@ object JoinReorderDP extends PredicateHelper { | |||
foundPlans += searchLevel(foundPlans, conf, conditions, topOutput) | |||
} | |||
|
|||
// val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000) | |||
// logDebug(s"Join reordering finished. Duration: $durationInMs ms, number of items: " + | |||
// s"${items.length}, number of plans in memo: ${foundPlans.map(_.size).sum}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I forgot to recover them...
Test build #74947 has started for PR 17353 at commit |
Please update PR description and title. LGTM pending Jenkins |
retest this please |
Test build #74963 has finished for PR 17353 at commit
|
retest this please |
retest this please... |
* We also prune cartesian product candidates when building a new plan if there exists no join | ||
* condition involving references from both left and right. This pruning strategy significantly | ||
* reduces the search space. | ||
* For example, given A J B J C J D, plans maintained for each level will be as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be
-> may be
?
|
||
def search( | ||
conf: SQLConf, | ||
items: Seq[LogicalPlan], | ||
conditions: Set[Expression], | ||
topOutput: AttributeSet): LogicalPlan = { | ||
|
||
val startTime = System.nanoTime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use System.currentTimeMillis
if we only care about the ms level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanoTime()
is more reliable than currentTimeMillis()
: https://github.com/databricks/scala-style-guide#misc_currentTimeMillis_vs_nanoTime
Test build #74978 has finished for PR 17353 at commit
|
Test build #74977 has finished for PR 17353 at commit
|
Test build #74980 has finished for PR 17353 at commit
|
Thanks! Merging to master. |
What changes were proposed in this pull request?
Cost
andJoinReorderDP
and methodbuildJoin()
.buildJoin()
to make the logic clearer.How was this patch tested?
Not related.