[SPARK-44325][SQL] Use PartitionEvaluator API in SortMergeJoinExec #41884

vinodkc · 2023-07-06T22:31:37Z

What changes were proposed in this pull request?

Updated the SQL operator SortMergeJoinExec to use the PartitionEvaluator API for execution.

Why are the changes needed?

To avoid the use of lambdas during distributed execution.
See SPARK-43061 for more details.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated 1 test case. Once all the SQL operators are migrated, the flag spark.sql.execution.useTaskEvaluator will be enabled by default, which avoids running the tests both with and without the TaskEvaluator.

vinodkc · 2023-07-07T14:42:52Z

CC @cloud-fan @viirya @dongjoon-hyun @yaooqinn @beliefer @HyukjinKwon

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala

viirya

Looks okay, but I'm wondering: is there a closure issue in SortMergeJoinExec that requires changing it to PartitionEvaluator? I can't find one.

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala

cloud-fan · 2023-07-11T01:01:07Z

@viirya I think the benefit is that we make it clearer what gets serialized and sent to the executor side.
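To make the serialization point concrete, here is a minimal sketch in plain Scala with no Spark dependency. The `Evaluator` and `EvaluatorFactory` traits and the `PairingFactory` class are hypothetical stand-ins, loosely modeled on the PartitionEvaluator API; they are not the classes from this PR. The idea it illustrates: a factory's constructor parameters are the only state that gets shipped, whereas a lambda captures whatever it happens to reference in the enclosing scope.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-ins, loosely modeled on the PartitionEvaluator API.
trait Evaluator[T, U] {
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}

trait EvaluatorFactory[T, U] extends java.io.Serializable {
  def createEvaluator(): Evaluator[T, U]
}

// Everything that travels to the executor is listed in the constructor.
class PairingFactory(separator: String) extends EvaluatorFactory[String, String] {
  override def createEvaluator(): Evaluator[String, String] =
    new Evaluator[String, String] {
      override def eval(partitionIndex: Int, inputs: Iterator[String]*): Iterator[String] = {
        require(inputs.length == 2, "expected a left and a right iterator")
        inputs(0).zip(inputs(1)).map { case (l, r) => s"$l$separator$r" }
      }
    }
}

object SerializationSketch {
  // Round-trip through Java serialization, a rough model of shipping a task.
  def roundTrip[A <: java.io.Serializable](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[A]
  }

  // The evaluator itself is created only after deserialization.
  def run(): List[String] = {
    val shipped = roundTrip(new PairingFactory("-"))
    shipped.createEvaluator().eval(0, Iterator("a", "b"), Iterator("x", "y")).toList
  }
}
```

Calling `SerializationSketch.run()` returns `List("a-x", "b-y")`. Only the factory, with its single `separator` field, crosses the serialization boundary; the evaluator is built on the receiving side.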

beliefer · 2023-07-11T07:42:29Z

LGTM+1

yaooqinn · 2023-07-12T03:03:49Z

Thanks @vinodkc, @cloud-fan, @viirya, and @beliefer. Merged to master.

vinodkc · 2023-07-27T17:18:27Z

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

+    } else {
+      left.execute().zipPartitions(right.execute()) { (leftIter, rightIter) =>
+        val evaluator = evaluatorFactory.createEvaluator()
+        evaluator.eval(0, leftIter, rightIter)


A zipPartitionsWithIndex method does not currently exist, so index 0 is passed to evaluator.eval(0, ...).

Once spark.sql.execution.useTaskEvaluator is set to true by default, this block will not be executed.

cloud-fan · 2023-07-28

Can we use TaskContext.getPartitionId?

cloud-fan · 2023-07-28

Never mind, it's different from the index. Maybe we should just leave a note here about why we always use 0.
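As a rough illustration of why a constant index can be acceptable in this fallback path, the sketch below models partitions as plain Scala collections. It assumes the evaluator never reads its partition index; the thread does not confirm this, so treat it as an assumption. The `Evaluator` trait and both `zipPartitions` helpers are hypothetical stand-ins, not Spark's RDD methods.

```scala
// Hypothetical stand-in for the evaluator interface; not Spark's actual class.
trait Evaluator[T, U] {
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}

object IndexSketch {
  // Assumed for illustration: this evaluator never reads partitionIndex.
  val indexAgnostic: Evaluator[Int, Int] = new Evaluator[Int, Int] {
    override def eval(partitionIndex: Int, inputs: Iterator[Int]*): Iterator[Int] =
      inputs(0).zip(inputs(1)).map { case (l, r) => l + r }
  }

  // Models zipPartitions: the function sees the two iterators but no index.
  def zipPartitions[T, U](left: Seq[Seq[T]], right: Seq[Seq[T]])(
      f: (Iterator[T], Iterator[T]) => Iterator[U]): Seq[Seq[U]] =
    left.zip(right).map { case (l, r) => f(l.iterator, r.iterator).toList }

  // Hypothetical indexed variant (the thread notes no such method exists yet).
  def zipPartitionsWithIndex[T, U](left: Seq[Seq[T]], right: Seq[Seq[T]])(
      f: (Int, Iterator[T], Iterator[T]) => Iterator[U]): Seq[Seq[U]] =
    left.zip(right).zipWithIndex.map { case ((l, r), i) =>
      f(i, l.iterator, r.iterator).toList
    }

  val left: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3))
  val right: Seq[Seq[Int]] = Seq(Seq(10, 20), Seq(30))

  // Fallback style: always pass 0 because no index is available.
  val withConstantZero: Seq[Seq[Int]] =
    zipPartitions(left, right)((l, r) => indexAgnostic.eval(0, l, r))

  // Indexed style: pass the real partition index.
  val withRealIndex: Seq[Seq[Int]] =
    zipPartitionsWithIndex(left, right)((i, l, r) => indexAgnostic.eval(i, l, r))
}
```

Under that assumption, `IndexSketch.withConstantZero` and `IndexSketch.withRealIndex` are equal. An evaluator that did use the index would behave differently, which is one reason a code comment explaining the constant 0 is worthwhile.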

### What changes were proposed in this pull request?

SQL operator `SortMergeJoinExec` updated to use the `PartitionEvaluator` API to do execution.

### Why are the changes needed?

To avoid the use of lambda during distributed execution.
Ref: SPARK-43061 for more details.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Updated 1 test case, once all the SQL operators are migrated, the flag `spark.sql.execution.useTaskEvaluator` will be enabled by default to avoid running the tests with and without this TaskEvaluator

Closes apache#41884 from vinodkc/br_refactorSortMergeJoinEvaluatorFactory.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>

github-actions bot added the SQL label Jul 6, 2023

vinodkc changed the title from "[SPARK-44325][SQL]Use PartitionEvaluator API for SortMergeJoinExec" to "[SPARK-44325][SQL] Use PartitionEvaluator API for SortMergeJoinExec" on Jul 7, 2023

vinodkc changed the title from "[SPARK-44325][SQL] Use PartitionEvaluator API for SortMergeJoinExec" to "[SPARK-44325][SQL] Use PartitionEvaluator API in SortMergeJoinExec" on Jul 7, 2023

cloud-fan reviewed Jul 7, 2023

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala (outdated, resolved)

vinodkc force-pushed the br_refactorSortMergeJoinEvaluatorFactory branch from 406cb4b to 2fad7f4 on July 8, 2023 05:31

beliefer reviewed Jul 8, 2023

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala (outdated, resolved)

vinodkc force-pushed the br_refactorSortMergeJoinEvaluatorFactory branch 3 times, most recently from 9ca4b50 to 8717928 on July 8, 2023 21:21

viirya reviewed Jul 9, 2023

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala (outdated, resolved)

viirya reviewed Jul 9, 2023

vinodkc force-pushed the br_refactorSortMergeJoinEvaluatorFactory branch from 8717928 to 80d960d on July 9, 2023 16:02

beliefer reviewed Jul 10, 2023

...core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinEvaluatorFactory.scala (outdated, resolved)

Add SortMergeJoinEvaluatorFactory

79937b5

vinodkc force-pushed the br_refactorSortMergeJoinEvaluatorFactory branch from 80d960d to 79937b5 on July 10, 2023 15:20

cloud-fan approved these changes Jul 11, 2023

viirya approved these changes Jul 11, 2023

yaooqinn approved these changes Jul 11, 2023

yaooqinn closed this in ce359bc Jul 12, 2023

vinodkc commented Jul 27, 2023

cloud-fan Jul 28, 2023

cloud-fan Jul 28, 2023
