[SPARK-44361][SQL] Use PartitionEvaluator API in MapInBatchExec#42024

Closed
vinodkc wants to merge 4 commits into apache:master from vinodkc:br_SPARK-44361

@vinodkc (Contributor) commented Jul 16, 2023

What changes were proposed in this pull request?

SQL operator MapInBatchExec is updated to use the PartitionEvaluator API to do execution.
Added a new method mapPartitionsWithEvaluator in RDDBarrier.

Why are the changes needed?

To avoid the use of lambda during distributed execution.
Ref: SPARK-43061 for more details.
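
For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the code from this PR. The two traits are simplified local stand-ins that mirror the shape of the evaluator API rather than Spark's own classes, and `BatchingEvaluatorFactory`, `runOnPartitions`, and `EvaluatorSketch` are hypothetical names introduced only for illustration. The point it tries to show is that the per-partition logic lives in a named, serializable factory whose state is an explicit constructor field, instead of in an anonymous lambda.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
trait PartitionEvaluator[T, U] {
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}

trait PartitionEvaluatorFactory[T, U] extends Serializable {
  def createEvaluator(): PartitionEvaluator[T, U]
}

// Hypothetical factory: everything a task needs is an explicit constructor
// field, so what gets shipped to a task is visible in the class definition.
class BatchingEvaluatorFactory(batchSize: Int)
    extends PartitionEvaluatorFactory[Int, Seq[Int]] {
  override def createEvaluator(): PartitionEvaluator[Int, Seq[Int]] =
    new PartitionEvaluator[Int, Seq[Int]] {
      // Groups the rows of a single input partition into fixed-size batches.
      override def eval(partitionIndex: Int, inputs: Iterator[Int]*): Iterator[Seq[Int]] =
        inputs.head.grouped(batchSize)
    }
}

object EvaluatorSketch {
  // Hypothetical local driver standing in for a mapPartitionsWithEvaluator-style
  // method: one fresh evaluator per partition, called with that partition's index.
  def runOnPartitions[T, U](
      partitions: Seq[Seq[T]],
      factory: PartitionEvaluatorFactory[T, U]): Seq[Seq[U]] =
    partitions.zipWithIndex.map { case (rows, index) =>
      factory.createEvaluator().eval(index, rows.iterator).toList
    }

  def main(args: Array[String]): Unit = {
    val batches = runOnPartitions(Seq(Seq(1, 2, 3), Seq(4, 5)), new BatchingEvaluatorFactory(2))
    println(batches)
  }
}
```

In this sketch the driver plays the role that an RDD or barrier-RDD method would play in Spark; only the factory crosses the driver/task boundary, and the evaluator itself is created where the partition is processed.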

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test cases. Once all SQL operators are refactored, spark.sql.execution.usePartitionEvaluator will be enabled by default, so that all tests cover this code path.

@vinodkc (Contributor, Author) commented Jul 18, 2023

cc @cloud-fan , @beliefer

* partition.
*/
@DeveloperApi
@Since("3.5.0")

Ah, I missed adding this API to RDDBarrier.

@cloud-fan (Contributor) commented

Thanks, merging to master/3.5 (this fixes an issue where a new RDD API was missing from RDDBarrier)!

@cloud-fan cloud-fan closed this in 9b43a9f Jul 19, 2023
cloud-fan pushed a commit that referenced this pull request Jul 19, 2023
### What changes were proposed in this pull request?

SQL operator `MapInBatchExec` is updated to use the `PartitionEvaluator` API to do execution.
Added a new method `mapPartitionsWithEvaluator` in `RDDBarrier`.

### Why are the changes needed?

To avoid the use of lambda during distributed execution.
Ref: SPARK-43061 for more details.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test cases. Once all SQL operators are refactored, `spark.sql.execution.usePartitionEvaluator` will be enabled by default, so that all tests cover this code path.

Closes #42024 from vinodkc/br_SPARK-44361.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9b43a9f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
HyukjinKwon added a commit that referenced this pull request Jul 19, 2023
…statements

### What changes were proposed in this pull request?

This PR is a follow-up of #42024 that removes unused variables and fixes import statements (which should have been part of the original refactoring).

### Why are the changes needed?

To clean up properly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should cover this.

Closes #42068 from HyukjinKwon/SPARK-44361-followup.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Jul 19, 2023
(cherry picked from commit bca28f8)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
cloud-fan pushed a commit that referenced this pull request Jul 28, 2023
…Exec

### What changes were proposed in this pull request?
This is a follow-up of #42024, to set the partition index correctly even if it's not used for now.

### Why are the changes needed?
future-proof
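
A small sketch of why the index matters even when nothing reads it yet. This is an illustration under stated assumptions, not the code from the follow-up: `Evaluator`, `PartitionIndexSketch`, `tagging`, `runWithRealIndex`, and `runWithPlaceholderIndex` are hypothetical names, and the placeholder value `0` is an assumed example of an index that is passed without being meaningful.

```scala
// Simplified stand-in for illustration only; this is NOT Spark's class.
trait Evaluator[T, U] {
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}

object PartitionIndexSketch {
  // Hypothetical evaluator that starts depending on the index it is given.
  val tagging: Evaluator[String, String] = new Evaluator[String, String] {
    override def eval(partitionIndex: Int, inputs: Iterator[String]*): Iterator[String] =
      inputs.head.map(row => s"p${partitionIndex}:${row}")
  }

  // Caller that forwards each partition's real index.
  def runWithRealIndex(partitions: Seq[Seq[String]]): Seq[String] =
    partitions.zipWithIndex.flatMap { case (rows, index) =>
      tagging.eval(index, rows.iterator).toList
    }

  // Caller that passes a fixed placeholder (assumed here to be 0).
  def runWithPlaceholderIndex(partitions: Seq[Seq[String]]): Seq[String] =
    partitions.flatMap(rows => tagging.eval(0, rows.iterator).toList)
}
```

With a placeholder, every row looks as if it came from partition 0 as soon as an evaluator reads the index, which is the kind of silent breakage that passing the real index up front avoids.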

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #42189 from vinodkc/br_SPARK-44361_Followup.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jul 28, 2023
(cherry picked from commit 3cf88cb)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>