

[SPARK-38052][SQL] Refactor UnsafeRow#isFixedLength and UnsafeRow#isMutable to use looping #35350

Closed
LuciferYang wants to merge 1 commit into apache:master from LuciferYang:SPARK-38052


@LuciferYang
Contributor

What changes were proposed in this pull request?

The methods UnsafeRow#isFixedLength and UnsafeRow#isMutable currently use tail recursion. This PR refactors them to use a loop instead, which is expected to be faster because the JVM does not perform tail-call elimination.
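To illustrate the kind of refactoring proposed, here is a minimal sketch on a simplified, hypothetical type hierarchy. `DataType`, `PrimitiveType`, `UserDefinedType`, and the `FIXED_LENGTH` set are illustrative stand-ins, not Spark's real classes. The recursive form peels off one UDT layer per call, while the loop form unwraps all layers in place. A method like `isMutable` would have the same shape with a different set of types.

```java
import java.util.Set;

// Simplified, HYPOTHETICAL stand-ins for Spark's DataType hierarchy.
final class UdtUnwrapSketch {
  interface DataType {}

  // Hypothetical leaf types.
  enum PrimitiveType implements DataType { INT, LONG, DOUBLE, STRING }

  // Hypothetical UDT: stored as its sqlType, which may itself be a UDT.
  static final class UserDefinedType implements DataType {
    private final DataType sqlType;

    UserDefinedType(DataType sqlType) {
      this.sqlType = sqlType;
    }

    DataType sqlType() {
      return sqlType;
    }
  }

  // Hypothetical set of fixed-length leaf types.
  static final Set<DataType> FIXED_LENGTH =
      Set.of(PrimitiveType.INT, PrimitiveType.LONG, PrimitiveType.DOUBLE);

  // Tail-recursive form: one call per UDT layer (the JVM does not
  // eliminate tail calls, so each layer costs a stack frame).
  static boolean isFixedLengthRecursive(DataType dt) {
    if (dt instanceof UserDefinedType) {
      return isFixedLengthRecursive(((UserDefinedType) dt).sqlType());
    }
    return FIXED_LENGTH.contains(dt);
  }

  // Loop form: unwrap every UDT layer in place, then check the leaf type.
  static boolean isFixedLengthLoop(DataType dt) {
    while (dt instanceof UserDefinedType) {
      dt = ((UserDefinedType) dt).sqlType();
    }
    return FIXED_LENGTH.contains(dt);
  }
}
```

Both forms return the same answer. The loop only avoids the per-layer call, which matters only when UDTs are nested.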

Why are the changes needed?

Replace tail recursion with looping.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Passed GitHub Actions (GA).

github-actions bot added the SQL label Jan 28, 2022
@LuciferYang
Contributor Author

LuciferYang commented Jan 28, 2022

I came across this code and changed it. I'm not sure whether the change is really valuable.

@hvanhovell
Contributor

@LuciferYang do you have any benchmark for this? The only recursion I see is the unpacking of the UDT.

@LuciferYang
Contributor Author

Let me make one

@LuciferYang
Contributor Author

LuciferYang commented Feb 2, 2022

I simulated the pattern with the following two methods and benchmarked them on nested objects:

public final class TestLoopUtils {

    // Loop version: unwrap NestedObject layers until a non-nested value remains.
    public static boolean isNumber1(Object o) {
        while (true) {
            if (o instanceof NestedObject) {
                o = ((NestedObject) o).innerValue();
                continue;
            }
            return o instanceof Number;
        }
    }

    // Tail-recursive version: one call (and one stack frame) per NestedObject layer.
    public static boolean isNumber2(Object o) {
        if (o instanceof NestedObject) {
            return isNumber2(((NestedObject) o).innerValue());
        }
        return o instanceof Number;
    }
}
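The methods above depend on the Scala `NestedObject` defined further down. Here is a self-contained Java sketch that checks that the two forms agree at several nesting depths. It assumes a Java port of `NestedObject` and a hypothetical `nest` helper, neither of which is part of the original PR. The loop is written as `while (condition)`, which behaves the same as the `while (true)`/`continue` form above.

```java
// Self-contained Java sketch; NestedObject and nest() are illustrative ports/helpers.
final class LoopVsRecursion {
  // Java stand-in for the Scala NestedObject used in the discussion.
  static final class NestedObject {
    private final Object inner;

    NestedObject(Object inner) {
      this.inner = inner;
    }

    Object innerValue() {
      return inner;
    }
  }

  // Loop: unwrap all layers, then test the innermost value.
  static boolean isNumberLoop(Object o) {
    while (o instanceof NestedObject) {
      o = ((NestedObject) o).innerValue();
    }
    return o instanceof Number;
  }

  // Tail recursion: one call per layer.
  static boolean isNumberRecursive(Object o) {
    if (o instanceof NestedObject) {
      return isNumberRecursive(((NestedObject) o).innerValue());
    }
    return o instanceof Number;
  }

  // Hypothetical helper: wraps `leaf` in `layers` NestedObject instances.
  static Object nest(Object leaf, int layers) {
    Object o = leaf;
    for (int i = 0; i < layers; i++) {
      o = new NestedObject(o);
    }
    return o;
  }
}
```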

The benchmark code is as follows:

import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}

object LoopAndRecursionBenchmark extends BenchmarkBase {

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val valuesPerIteration = 10000
    doTest(valuesPerIteration, create1LayerNestedObject(), 1)
    doTest(valuesPerIteration, create3LayerNestedObject(), 3)
    doTest(valuesPerIteration, create5LayerNestedObject(), 5)
    // ...
    doTest(valuesPerIteration, create19LayerNestedObject(), 19)
  }

  // Compares the loop and recursive methods on an object nested `layer` levels deep.
  def doTest(valuesPerIteration: Int, obj: NestedObject, layer: Int): Unit = {
    val benchmark = new Benchmark(
      s"Test loop and Recursion $layer",
      valuesPerIteration,
      output = output)

    benchmark.addCase("Use loop") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        TestLoopUtils.isNumber1(obj)
      }
    }

    benchmark.addCase("Use Recursion") { _: Int =>
      for (_ <- 0L until valuesPerIteration) {
        TestLoopUtils.isNumber2(obj)
      }
    }
    benchmark.run()
  }

  def create1LayerNestedObject(): NestedObject = {
    new NestedObject(1)
  }

  def create3LayerNestedObject(): NestedObject = {
    new NestedObject(new NestedObject(new NestedObject(1)))
  }

  // Each step wraps the previous object in two more layers.
  def create5LayerNestedObject(): NestedObject = {
    new NestedObject(new NestedObject(create3LayerNestedObject()))
  }
  // ...
  def create19LayerNestedObject(): NestedObject = {
    new NestedObject(new NestedObject(create17LayerNestedObject()))
  }
}

class NestedObject(inner: Any) {
  def innerValue(): Any = inner
}
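If you want a rough comparison without Spark's benchmark harness, here is a minimal stand-alone timing sketch. It is not Spark's `Benchmark` and not JMH. The class and method names are illustrative, and the `Predicate` indirection and `sink` counter are choices made for this sketch. Its absolute numbers are therefore not comparable to the GA tables, and it can only indicate trends.

```java
import java.util.function.Predicate;

// Rough, stand-alone timing sketch (not Spark's Benchmark, not JMH).
final class RoughTiming {
  // Java stand-in for the Scala NestedObject in the discussion.
  static final class NestedObject {
    private final Object inner;

    NestedObject(Object inner) {
      this.inner = inner;
    }

    Object innerValue() {
      return inner;
    }
  }

  static boolean isNumberLoop(Object o) {
    while (o instanceof NestedObject) {
      o = ((NestedObject) o).innerValue();
    }
    return o instanceof Number;
  }

  static boolean isNumberRecursive(Object o) {
    if (o instanceof NestedObject) {
      return isNumberRecursive(((NestedObject) o).innerValue());
    }
    return o instanceof Number;
  }

  // Builds an Integer wrapped in `layers` NestedObject instances.
  static Object nest(int layers) {
    Object o = 1;
    for (int i = 0; i < layers; i++) {
      o = new NestedObject(o);
    }
    return o;
  }

  // Consumes results so the calls are less likely to be removed as dead code.
  static long sink = 0;

  // Best-of-`rounds` average time per call, in nanoseconds.
  static double bestNsPerCall(Predicate<Object> check, Object obj, int calls, int rounds) {
    long best = Long.MAX_VALUE;
    for (int r = 0; r < rounds; r++) {
      long start = System.nanoTime();
      for (int i = 0; i < calls; i++) {
        if (check.test(obj)) {
          sink++;
        }
      }
      best = Math.min(best, System.nanoTime() - start);
    }
    return (double) best / calls;
  }

  public static void main(String[] args) {
    for (int layers = 1; layers <= 19; layers += 2) {
      Object obj = nest(layers);
      double loopNs = bestNsPerCall(RoughTiming::isNumberLoop, obj, 1_000_000, 5);
      double recursionNs = bestNsPerCall(RoughTiming::isNumberRecursive, obj, 1_000_000, 5);
      System.out.printf("layers=%2d  loop=%.1f ns  recursion=%.1f ns%n",
          layers, loopNs, recursionNs);
    }
  }
}
```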

@LuciferYang
Contributor Author

LuciferYang commented Feb 2, 2022

The benchmark results, generated with GitHub Actions (GA):

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 1:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0        114.4           8.7       1.0X
Use Recursion                                         0              0           0        127.1           7.9       1.1X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 3:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0        155.8           6.4       1.0X
Use Recursion                                         0              0           0         77.2          13.0       0.5X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 5:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0        129.2           7.7       1.0X
Use Recursion                                         0              0           0         86.4          11.6       0.7X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 7:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0        104.6           9.6       1.0X
Use Recursion                                         0              0           0         71.4          14.0       0.7X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 9:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         88.3          11.3       1.0X
Use Recursion                                         0              0           0         61.3          16.3       0.7X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 11:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         69.5          14.4       1.0X
Use Recursion                                         0              0           0         53.5          18.7       0.8X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 13:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         58.5          17.1       1.0X
Use Recursion                                         0              0           0         34.0          29.4       0.6X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 15:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         50.2          19.9       1.0X
Use Recursion                                         0              0           0         30.3          33.0       0.6X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 17:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         43.9          22.8       1.0X
Use Recursion                                         0              0           0         27.0          37.1       0.6X

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.11.0-1027-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test loop and Recursion 19:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use loop                                              0              0           0         37.8          26.5       1.0X
Use Recursion                                         0              0           0         23.3          42.9       0.6X
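Best Time(ms) shows 0 for every case because each case runs only 10,000 calls. The numbers in the tables are consistent with the other columns being derived from Rate(M/s): Per Row(ns) = 1000 / rate, and Relative = the case's rate divided by the first case's rate. The small sketch below checks this reading against the 19-layer table. The class and method names are illustrative.

```java
// Recomputes the derived columns of the 19-layer table from its Rate(M/s) values.
final class DerivedColumns {
  // Per Row(ns): one second divided by (rate in millions per second) = 1000 / rate.
  static double perRowNs(double rateMPerSec) {
    return 1000.0 / rateMPerSec;
  }

  // Relative: a case's rate divided by the baseline (first) case's rate.
  static double relative(double caseRate, double baselineRate) {
    return caseRate / baselineRate;
  }

  public static void main(String[] args) {
    double loopRate = 37.8;      // "Use loop", 19 layers
    double recursionRate = 23.3; // "Use Recursion", 19 layers
    System.out.printf("loop: %.1f ns, recursion: %.1f ns, relative: %.1fX%n",
        perRowNs(loopRate), perRowNs(recursionRate), relative(recursionRate, loopRate));
  }
}
```

Running `main` prints values matching the 19-layer rows (26.5 ns, 42.9 ns, 0.6X), apart from a locale-dependent decimal separator.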

@LuciferYang
Contributor Author

On x86, recursion over many nested layers is slightly slower than the loop. On Apple Silicon (a manual test on my M1 Mac), there is no significant performance difference. @hvanhovell

@HyukjinKwon
Member

If there's no significant performance improvement, I think we can just drop it. It's just about recursive UDTs, right?

@LuciferYang
Contributor Author

LuciferYang commented Feb 3, 2022


Yes. In relative terms, the loop takes 20% to 50% less time per call than recursion on x86 (for 3 or more layers), but the absolute difference is small: for example, at 19 layers, about 43 ns per row with recursion versus about 27 ns per row with the loop.
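The relative and absolute savings quoted above can be recomputed from the Per Row(ns) columns of the GA tables. The class name and arrays below are illustrative, but the values are copied from the tables.

```java
// Per-call time saved by the loop, using Per Row(ns) values from the GA tables.
final class LoopSavings {
  static final int[] LAYERS = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
  static final double[] LOOP_NS = {8.7, 6.4, 7.7, 9.6, 11.3, 14.4, 17.1, 19.9, 22.8, 26.5};
  static final double[] RECURSION_NS = {7.9, 13.0, 11.6, 14.0, 16.3, 18.7, 29.4, 33.0, 37.1, 42.9};

  // Fraction of the recursive time saved; negative means recursion was faster.
  static double savedFraction(double loopNs, double recursionNs) {
    return 1.0 - loopNs / recursionNs;
  }

  public static void main(String[] args) {
    for (int i = 0; i < LAYERS.length; i++) {
      System.out.printf("layers=%2d  saved=%5.1f%%  (%.1f ns)%n",
          LAYERS[i],
          100 * savedFraction(LOOP_NS[i], RECURSION_NS[i]),
          RECURSION_NS[i] - LOOP_NS[i]);
    }
  }
}
```

For 3 to 19 layers this gives roughly 23% to 51% less time per call, and about 16 ns saved at 19 layers. At a single layer, the recursive version was slightly faster in this run.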

LuciferYang closed this Feb 3, 2022
LuciferYang deleted the SPARK-38052 branch October 22, 2023 07:44