[SPARK-41530][CORE] Rename MedianHeap to PercentileMap and support percentile#39076
cloud-fan wants to merge 1 commit into apache:master
cc @Ngone51
Can you give more details on this, please?
core/src/test/scala/org/apache/spark/util/collection/PercentileHeapSuite.scala
dongjoon-hyun
left a comment
Technically, MedianHeap is removed and superseded by PercentileHeap after this PR.
- Since the new class is more important than the removed one, could you revise the PR title to mention PercentileHeap explicitly?
- Also, in the test suite, I believe it would be better not to mention medianHeap.
Using the median task duration to trigger task speculation may not be the best option. It's better to let users configure the percentile if they want task speculation to be more or less likely.
Sounds good.
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @cloud-fan.
Merged to master for Apache Spark 3.4. Thank you, @cloud-fan and @mridulm.
What changes were proposed in this pull request?
MedianHeap was added to track the median of task durations, for task speculation. However, the median may not be the best option, and a configurable percentile could be better. This PR extends the existing MedianHeap to support percentiles.
Why are the changes needed?
Prepare for tracking more statistics of task durations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
new tests
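For readers unfamiliar with the technique, here is a minimal sketch of how a two-heap structure can track a configurable percentile of task durations instead of only the median. It is not the code from this PR: the class name PercentileHeapSketch, the use of Long millisecond durations, and the rank convention (the value at sorted index floor(n * percentage)) are all assumptions made for illustration.

```scala
import scala.collection.mutable.PriorityQueue

// Hypothetical illustration only; not the PercentileHeap class added by this PR.
// Tracks the value at sorted index floor(n * percentage) of all inserted durations.
final class PercentileHeapSketch(percentage: Double = 0.5) {
  require(percentage > 0.0 && percentage < 1.0, "percentage must be in (0, 1)")

  // Max-heap holding the smallest floor(n * percentage) durations.
  private val smallHeap = PriorityQueue.empty[Long]
  // Min-heap holding the remaining durations; its head is the tracked percentile.
  private val largeHeap = PriorityQueue.empty[Long](Ordering[Long].reverse)

  def size: Int = smallHeap.size + largeHeap.size

  def isEmpty: Boolean = size == 0

  def insert(duration: Long): Unit = {
    // Route the new value to the side it belongs on.
    if (largeHeap.nonEmpty && duration < largeHeap.head) smallHeap.enqueue(duration)
    else largeHeap.enqueue(duration)

    // Rebalance so that smallHeap holds exactly floor(n * percentage) values.
    val targetSmall = (size * percentage).toInt
    while (smallHeap.size > targetSmall) largeHeap.enqueue(smallHeap.dequeue())
    while (smallHeap.size < targetSmall) smallHeap.enqueue(largeHeap.dequeue())
  }

  def percentile: Long = {
    if (isEmpty) throw new NoSuchElementException("PercentileHeapSketch is empty")
    largeHeap.head
  }
}

object PercentileHeapSketchDemo {
  def main(args: Array[String]): Unit = {
    val heap = new PercentileHeapSketch(0.75)
    Seq(40L, 10L, 80L, 20L, 60L, 30L, 70L, 50L).foreach(heap.insert)
    // Sorted input is 10..80 in steps of 10; index floor(8 * 0.75) = 6 holds 70.
    println(heap.percentile)
  }
}
```

With the default percentage of 0.5 the sketch behaves like a median tracker (returning the upper median for an even count), which is why a percentile structure can replace a median-only one. Each insert costs O(log n), and reading the percentile is O(1).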