[SPARK-38052][SQL] Refactor UnsafeRow#isFixedLength and UnsafeRow#isMutable use looping#35350
[SPARK-38052][SQL] Refactor UnsafeRow#isFixedLength and UnsafeRow#isMutable use looping#35350LuciferYang wants to merge 1 commit intoapache:masterfrom
Conversation
|
Seeing the code here and changed it. I'm not sure if it's really valuable |
|
@LuciferYang do you have any benchmark for this? The only recursion I see is the unpacking of the UDT. |
|
Let me make one |
|
The following two methods are simulated and tested with nested objects: I do a test for the following 2 methods: // use loop
public static boolean isNumber1(Object o) {
while (true) {
if (o instanceof NestedObject) {
o = ((NestedObject) o).innerValue();
continue;
}
return o instanceof Number;
}
}
// use recursion
public static boolean isNumber2(Object o) {
if(o instanceof NestedObject) {
return isNumber2(((NestedObject) o).innerValue());
}
return o instanceof Number;
}The test code as follows: object LoopAndRecursionBenchmark extends BenchmarkBase {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
val valuesPerIteration = 10000;
doTest(valuesPerIteration, create1LayerNestedObject(), 1)
doTest(valuesPerIteration, create3LayerNestedObject(), 3)
doTest(valuesPerIteration, create5LayerNestedObject(), 5)
...
doTest(valuesPerIteration, create19LayerNestedObject(), 19)
}
def doTest(
valuesPerIteration: Int, obj: NestedObject, layer: Int): Unit = {
val benchmark = new Benchmark(
s"Test loop and Recursion $layer",
valuesPerIteration,
output = output)
benchmark.addCase("Use loop") { _: Int =>
for (_ <- 0L until valuesPerIteration) {
TestLoopUtils.isNumber1(obj)
}
}
benchmark.addCase("Use Recursion") { _: Int =>
for (_ <- 0L until valuesPerIteration) {
TestLoopUtils.isNumber2(obj)
}
}
benchmark.run()
}
def create1LayerNestedObject(): NestedObject = {
new NestedObject(1)
}
def create3LayerNestedObject(): NestedObject = {
new NestedObject(new NestedObject(new NestedObject(1)))
}
def create5LayerNestedObject(): NestedObject = {
new NestedObject(new NestedObject(create3LayerNestedObject()))
}
....
def create19LayerNestedObject(): NestedObject = {
new NestedObject(new NestedObject(create17LayerNestedObject()))
}
}
class NestedObject(inner: Any) {
def innerValue(): Any = inner
} |
|
The bench result generate with GA: |
|
In x86 architecture, multi-layer nested recursion is slightly slower than loop, and there is no significant performance difference use Apple Silicon(Manual test on My mac with M1). @hvanhovell |
|
If there's no significant performance improve, I think we can just drop it. It's just about recursive UDTs, right? |
Yes, in terms of relative proportion, loop is 20% ~ 50% better than recursion(in x86), but the absolute value is relatively small, for example, from |
What changes were proposed in this pull request?
Methods UnsafeRow#isFixedLength and UnsafeRow#isMutable use tail recursion now , this pr can refactor with looping, which will be considerably faster.
Why are the changes needed?
Replace tail recursion with looping.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GA