Skip to content

[SPARK-49506][SQL] Optimize ArrayBinarySearch for foldable array#47984

Closed
panbingkun wants to merge 30 commits intoapache:masterfrom
panbingkun:SPARK-49506
Closed

[SPARK-49506][SQL] Optimize ArrayBinarySearch for foldable array#47984
panbingkun wants to merge 30 commits intoapache:masterfrom
panbingkun:SPARK-49506

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Sep 4, 2024

What changes were proposed in this pull request?

The pr aims to

  • optimize ArrayBinarySearch for foldable array.
  • fix a bug in the original implementation.

Why are the changes needed?

The changes improve performance of the array_binary_search() function.

  • create an instance of foldable{DataType}ArrayData only once at the initialization ( avoid frequent calls to ArrayData.to{DataType}Array() ), and reuse it inside of replacement in the case when the array parameter is foldable.

Before:

Running benchmark: array binary search
  Running case: no foldable optimize
  Stopped after 100 iterations, 93668 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
no foldable optimize                                916            937          24         10.9          91.6       1.0X

After:

Running benchmark: array binary search
  Running case: has foldable optimize
  Stopped after 100 iterations, 17206 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                               164            172          22         61.1          16.4       1.0X

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Update existed UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 4, 2024
@panbingkun
Copy link
Contributor Author

panbingkun commented Sep 4, 2024

object ArrayBinarySearchBenchmark extends SqlBasedBenchmark {
  private val N = 10000000
  private val M = 100

  private val arrayData = (0 until M).mkString("array(", ",", ")")
  private val exprs = s"array_binary_search($arrayData, value % $M)"
  private val df = spark.range(N).toDF("value")

  private def doBenchmark(): Unit = {
    df.selectExpr(exprs).noop()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("array binary search") {
      val benchmark = new Benchmark("array binary search", N, output = output)
      benchmark.addCase("no foldable optimize", M) { _ =>
        doBenchmark()
      }
      benchmark.run()
    }
  }
}

Before:

Running benchmark: array binary search
  Running case: no foldable optimize
  Stopped after 100 iterations, 93668 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
no foldable optimize                                916            937          24         10.9          91.6       1.0X

After:

Running benchmark: array binary search
  Running case: has foldable optimize
  Stopped after 100 iterations, 17206 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                               164            172          22         61.1          16.4       1.0X

@panbingkun panbingkun marked this pull request as ready for review September 4, 2024 07:54
@zhengruifeng
Copy link
Contributor

Thanks @panbingkun for working on this!
Existing usages (pyspark and ml) both apply binary search with a literal double array, this optimization will improve the performance of them.

@transient private lazy val isPrimitiveType: Boolean = CodeGenerator.isPrimitiveType(elementType)
@transient private lazy val canPerformFastBinarySearch: Boolean = isPrimitiveType &&
elementType != BooleanType && !resultArrayElementNullable
@transient private lazy val arrayIsFoldable: Boolean = array.foldable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't worth a lazy val. Can a simple def work?


// boolean
// foldable optimize
public static int binarySearchNullSafe(Boolean[] data, Boolean value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to take ArrayData here to simplify the expression implementation. We can call arrayData.toBooleanArray or .toObjectArray for non-nullable and nullable arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nullable arrays, I don't think concrete types can help the performance. Object[] should be OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

"binarySearch",
Seq(array, value),
inputTypes)
if (arrayIsFoldable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. What's the difference between foldable and non-foldable arrays regarding this optimization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take the following case as an example:

val a6_0 = Literal.create(Seq(1.0d, 2.0d, 3.0d), ArrayType(DoubleType, containsNull = false))
checkEvaluation(ArrayBinarySearch(a6_0, Literal(1.0d)), 0)

Before:
image

After:
image
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where the array is foldable, after optimization, the array only needs to be toArray once instead of toArray every time.

@zhengruifeng
Copy link
Contributor

zhengruifeng commented Sep 9, 2024

has an offline discussion with @panbingkun , another approach maybe:

the bottleneck is ArrayData.toXXXArray that requires a deep copy of the whole array, if ConstantFolding can convert all foldable arrays to literals (with GenericArrayData type value), then maybe we can optimize it by override toXXXArray methods to directly return the val array in some way

@cloud-fan
Copy link
Contributor

@zhengruifeng This is a good idea. I think ArrayBinarySearch should be replaced by invoke_binary_search_function(ToJavaArray(array_expr)), and ConstantFolding should do the optimization automatically. For the new expression ToJavaArray, it can create primitive java array if the element type is primitive type and not nullable.

@panbingkun
Copy link
Contributor Author

Great suggestion, let me give it a try, thanks!

@panbingkun
Copy link
Contributor Author

@zhengruifeng This is a good idea. I think ArrayBinarySearch should be replaced by invoke_binary_search_function(ToJavaArray(array_expr)), and ConstantFolding should do the optimization automatically. For the new expression ToJavaArray, it can create primitive java array if the element type is primitive type and not nullable.

@zhengruifeng @cloud-fan
The latest logic based on ToJavaType has been submitted.
Please help review it when you have free time, thanks!

@panbingkun
Copy link
Contributor Author

panbingkun commented Sep 11, 2024

ArrayBinarySearchBenchmark

  • Before:
Running benchmark: array binary search
  Running case: has foldable optimize
  Stopped after 100 iterations, 371269 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                              3585           3713         100          2.8         358.5       1.0X
  • After:
Running benchmark: array binary search
  Running case: has foldable optimize
  Stopped after 100 iterations, 21097 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                               201            211          23         49.9          20.1       1.0X

* Results will be written to "benchmarks/ArrayBinarySearchBenchmark-results.txt".
* }}}
*/
object ArrayBinarySearchBenchmark extends SqlBasedBenchmark {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need ArrayBinarySearchBenchmark, I can delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need it, we can delete it before merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already deleted it.

@@ -90,7 +90,24 @@ trait InvokeLike extends Expression with NonSQLExpression with ImplicitCastInput
// serializability, because the type-level info with java.io.Serializable and
// java.io.Externalizable marker interfaces are not strong guarantees.
// This restriction can be relaxed in the future to expose more optimizations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the comment here. We do not block all ObjectType now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@panbingkun
Copy link
Contributor Author

@zhengruifeng @cloud-fan
I'm very sorry that I broke this PR and couldn't restore it, so I opened a new one
#48225

@panbingkun
Copy link
Contributor Author

I will close it.

@panbingkun panbingkun closed this Sep 24, 2024
@panbingkun panbingkun deleted the SPARK-49506 branch September 24, 2024 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants