[SPARK-39922][COER][TESTS] Introduce IteratorSizeBenchmark for Utils#getIteratorSize method#37339
[SPARK-39922][COER][TESTS] Introduce IteratorSizeBenchmark for Utils#getIteratorSize method#37339LuciferYang wants to merge 2 commits intoapache:masterfrom
IteratorSizeBenchmark for Utils#getIteratorSize method#37339Conversation
IteratorSizeBenchmark for Utils#getIteratorSize methodIteratorSizeBenchmark for Utils#getIteratorSize method
In fact, this conclusion may no longer be correct after using Scala 2.13 (the test results will update) |
|
The result of |
| @@ -0,0 +1,105 @@ | |||
| OpenJDK 64-Bit Server VM 1.8.0_342-b07 on Linux 5.15.0-1014-azure | |||
There was a problem hiding this comment.
spark/core/src/main/scala/org/apache/spark/util/Utils.scala
Lines 1910 to 1915 in 82cd993
From the benchmark results, the above conclusion is correct only when Scala 2.12 + Java 8 is used.
IteratorSizeBenchmark for Utils#getIteratorSize methodIteratorSizeBenchmark for Utils#getIteratorSize method
|
If do the following change in Scala 2.13, the performance of def getIteratorSize(iterator: Iterator[_]): Long = {
if (iterator.knownSize > 0) {
iterator.knownSize.toLong
} else {
var count = 0L
while (iterator.hasNext) {
count += 1L
iterator.next()
}
count
}
} |
HyukjinKwon
left a comment
There was a problem hiding this comment.
I feel like this is too narrow to benchmark. The observed perf diff was minor too. Also given that it heavily depends on Scala version, I guess nothing we can do even when it's slower.
Ok, I will close this and I will optimize the implementation of this method when Scala 2.13 is the default version |
What changes were proposed in this pull request?
This pr add a micro-benchmark for
o.a.spark.util.Utils#getIteratorSizemethod.Why are the changes needed?
spark/core/src/main/scala/org/apache/spark/util/Utils.scala
Lines 1910 to 1915 in 82cd993
From the method comments,
Utils#getIteratorSizemethod was added due toscala.collection.Iterator#sizeuses a for loop and it is slightly slower in the current version of Scala.When adding this method, Spark uses Scala 2.10. Currently, Spark use Scala 2.12. this pr add introduce
IteratorSizeBenchmarkto ensure that the conclusion is still correct when upgrading the major version of Scala, otherwise we should useIterator#sizedirectly.Does this PR introduce any user-facing change?
No, just for test
How was this patch tested?
Pass GitHub Actions.