[SPARK-37566][SQL] Fix repartitionByRange metrics #34828

Closed
cfmcgrady wants to merge 1 commit into apache:master from cfmcgrady:SPARK-37566

@cfmcgrady
Contributor

What changes were proposed in this pull request?

The RangePartitioner runs a sampling job over the input RDD, and that sampling job shares the execution context (execution id and metrics id) of the real job. As a result, the "number of output rows" metric of the RangePartitioner's child is incremented by the sampling pass as well as by the real job, so the reported value is too high.

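// Needed outside spark-shell, which already imports the SQL functions.
import org.apache.spark.sql.functions.col

spark.range(0, 10)
  .repartitionByRange(10, col("id"))
  .show()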
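The overcounting described above can be illustrated with a small Spark-free toy model. This is only a sketch under stated assumptions: `RowMetric`, `scan`, `sampleBounds`, and `run` are hypothetical names, not Spark APIs, and the sampling pass is assumed to read every input row. The "separate context" case models the intended metric value only; it does not claim to show how this patch achieves it.

```scala
// Toy model only: none of these names are Spark APIs.
object DoubleCountSketch {
  // Stands in for a "number of output rows" metric tied to one execution context.
  final class RowMetric {
    var value: Long = 0L
    def add(n: Long): Unit = value += n
  }

  // Stands in for the child operator: every row it produces bumps the metric.
  def scan(rows: Range, metric: RowMetric): Iterator[Int] =
    rows.iterator.map { r => metric.add(1); r }

  // Stands in for the sampling pass that picks range bounds.
  // Assumption: it reads every input row.
  def sampleBounds(rows: Iterator[Int], partitions: Int): Seq[Int] = {
    val sorted = rows.toVector.sorted
    (1 until partitions).map(i => sorted(i * sorted.length / partitions))
  }

  def run(shareMetric: Boolean): Long = {
    val real = new RowMetric
    // If the sampling pass shares the context, it writes to the same metric.
    val samplingMetric = if (shareMetric) real else new RowMetric
    val bounds = sampleBounds(scan(0 until 10, samplingMetric), 10)
    // The real job: assign each row to a range partition (result discarded here).
    scan(0 until 10, real).foreach(r => bounds.count(_ <= r))
    real.value
  }

  def main(args: Array[String]): Unit = {
    println(s"shared context:   ${run(shareMetric = true)}")
    println(s"separate context: ${run(shareMetric = false)}")
  }
}
```

In this toy model the shared-context run counts each of the 10 rows twice, while the separate-context run counts each row once.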

Before this PR:

[Screenshot, 2021-12-07 5:17:12 PM]

After this PR:

[Screenshot, 2021-12-07 6:00:28 PM]

Does this PR introduce any user-facing change?

No, this is only a bug fix.

How was this patch tested?

Added a unit test.

@github-actions github-actions bot added the SQL label Dec 7, 2021
@cfmcgrady cfmcgrady closed this Dec 7, 2021