[SPARK-37566][SQL] Fix repartitionByRange metrics by cfmcgrady · Pull Request #34828 · apache/spark

cfmcgrady · 2021-12-07T10:13:45Z

What changes were proposed in this pull request?

RangePartitioner runs a sampling job over the input RDD to determine the partition boundaries. Because that sampling job shares the execution context (execution ID and metric IDs) of the real job, the rows it reads are counted as well, so the "number of output rows" metric of the RangePartitioner's child is inflated.

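import org.apache.spark.sql.functions.col

// Range-partition 10 rows into 10 partitions; this triggers the sampling job.
spark.range(0, 10)
  .repartitionByRange(10, col("id"))
  .show()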
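One way to observe the metric outside the web UI is to read it from the executed plan after running the query. The following is a minimal sketch, not the unit test added by this PR: the object name `RepartitionByRangeMetrics` and the helper `rangeOutputRows` are hypothetical, and adaptive query execution is disabled only so that the `RangeExec` node can be reached directly in the executed plan. With 10 input rows, a value above 10 would indicate that the sampling job was counted too.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.RangeExec
import org.apache.spark.sql.functions.col

object RepartitionByRangeMetrics {
  // Hypothetical helper: runs the query from the description and returns the
  // driver-side "number of output rows" value of every Range node in the plan.
  def rangeOutputRows(spark: SparkSession): Seq[Long] = {
    val df = spark.range(0, 10).repartitionByRange(10, col("id"))
    df.collect() // runs both the sampling job and the real job
    df.queryExecution.executedPlan.collect {
      case r: RangeExec => r.metrics("numOutputRows").value
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartitionByRange-metrics")
      // Assumption for this sketch: AQE is turned off so the executed plan
      // is not wrapped in an adaptive plan node.
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    try {
      rangeOutputRows(spark).foreach(n => println(s"Range numOutputRows = $n"))
    } finally {
      spark.stop()
    }
  }
}
```

Running this against builds with and without the patch should make the difference in the reported value visible.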

Before this PR: [screenshot not preserved]

After this PR: [screenshot not preserved]

Does this PR introduce any user-facing change?

No, this is only a bug fix.

How was this patch tested?

Added a unit test.

fix repartitionByRange metrics

62c02fb

github-actions bot added the SQL label Dec 7, 2021

cfmcgrady closed this Dec 7, 2021

cfmcgrady wants to merge 1 commit into apache:master from cfmcgrady:SPARK-37566
