[SPARK-28421][ML] SparseVector.apply performance optimization #25178
zhengruifeng wants to merge 2 commits into apache:master
Conversation
Test build #107784 has finished for PR 25178 at commit
```
throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
}

if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
```
Isn't this case already covered by the binarySearch call below? If it's not found for any reason, you get a negative result.
1. The impl of Arrays.binarySearch does not check the range:
```
public static int binarySearch(int[] a, int key) {
    return binarySearch0(a, 0, a.length, key);
}

// Like public version, but without range checks.
private static int binarySearch0(int[] a, int fromIndex, int toIndex,
                                 int key) {
    int low = fromIndex;
    int high = toIndex - 1;
    while (low <= high) {
        int mid = (low + high) >>> 1;
        int midVal = a[mid];
        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1);  // key not found.
}
```
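For concreteness, here is a small illustrative Scala snippet (not from the PR; the array contents are made up) showing what binarySearch returns for keys inside and outside a sorted indices array:

```
import java.util.Arrays

val indices = Array(2, 5, 9)
Arrays.binarySearch(indices, 5)   //  1: key found at offset 1
Arrays.binarySearch(indices, 1)   // -1: not found, insertion point 0, i.e. -(0 + 1)
Arrays.binarySearch(indices, 6)   // -3: not found, insertion point 2, i.e. -(2 + 1)
Arrays.binarySearch(indices, 10)  // -4: not found, insertion point 3 (past the end)
```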
2. In breeze.collection.mutable.SparseArray, the findOffset function (called in apply to perform the binary search) takes the special case that the key is out of range into account:
```
if (used == 0) {
  // empty list do nothing
  -1
} else {
  val index = this.index
  if (i > index(used - 1)) {
    // special case for end of list - this is a big win for growing sparse arrays
    ~used
  // ...
```
so I added those simple checks before the binary search.
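For reference, a minimal Scala sketch of what that variant looks like, mirroring the diff excerpt above; this is an illustration with a hypothetical class name (SparseLookup), not the exact patch. The benchmark below calls this variant apply3:

```
import java.util.Arrays

// Illustrative sketch (hypothetical class, not the exact patch) of the lookup
// with the extra range check -- the variant benchmarked as apply3 below.
class SparseLookup(size: Int, indices: Array[Int], values: Array[Double]) {
  def apply(i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
      0.0  // outside the stored index range: the value is implicitly zero
    } else {
      val j = Arrays.binarySearch(indices, i)
      if (j < 0) 0.0 else values(j)  // negative result: index not stored
    }
  }
}
```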
Yes, you need the check that the index is < 0 or >= length; keep that.
But binarySearch already handles the case that the query index is >= 0 but before the first actual index:
```
scala> java.util.Arrays.binarySearch(Array(2,3), 1)
res0: Int = -1

scala> java.util.Arrays.binarySearch(Array(2,3), 4)
res1: Int = -3
```
Why repeat that part?
I see. This performance improvement comes from avoiding the binary search when a key is not included in the given sorted array. It makes sense to me.
One question: what do you mean by avoiding internal conversion in the description?
Yeah, but you're also always paying the cost of these two checks. It depends on the access pattern, but assuming a pretty uniform distribution, the checks will rarely save work and always add a few comparisons. It seems simpler to avoid them unless there's a clear case where it's a win.
@srowen I added the checks just because the impl of findOffset in breeze.collection.mutable.SparseArray says // special case for end of list - this is a big win for growing sparse arrays, and I think it is reasonable.
@zhengruifeng Would it be possible to show the performance comparison for the case that @srowen mentions? In other words, when most of the keys exist in indices. I hope the overhead of adding the three extra checks would be negligible.
Where does a conversion happen? This is just avoiding binarySearch, no?
The existing SparseVector does not override the apply method inherited from Vector:
```
/**
 * Gets the value of the ith element.
 * @param i index
 */
@Since("2.0.0")
def apply(i: Int): Double = asBreeze(i)
```
So a spark.ml.linalg.SparseVector is first converted to a breeze.collection.mutable.SparseArray and then wrapped in a breeze.linalg.SparseVector on every apply call.
As to the range check, I think it is just a tiny optimization.
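To make the core idea concrete, here is a minimal Scala sketch of the direct lookup this change is about, written as a hypothetical standalone helper (sparseApply) rather than the exact merged method; the benchmark below refers to this idea as apply2:

```
import java.util.Arrays

// Hypothetical standalone helper showing the direct lookup idea benchmarked
// as apply2 below: binary-search the indices array, no breeze conversion.
def sparseApply(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
  }
  val j = Arrays.binarySearch(indices, i)
  if (j < 0) 0.0 else values(j)  // absent index means an implicit zero
}
```

Compared with the inherited apply, this avoids allocating breeze structures on every element access, which is where the 2.5X ~ 5X win reported in the description comes from.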
Oh right, I see. That's a big win.
Well, I'm OK with it, though it's still not clear the extra range checks are an optimization.
Retest this please.
Test build #108024 has finished for PR 25178 at commit
I added the extra range check just because of the special-case comment in breeze's findOffset. @srowen @dongjoon-hyun @kiszk I will do another simple test to find out whether the extra range check helps.
I tested the performance of the current impl, the new impl, and the new impl with the extra range check (the test suite is similar to the one above). The version without the range check is faster, so I will update the PR.
The expected cost without range check is … the previous test suite uses …
Test build #108034 has finished for PR 25178 at commit
LGTM
srowen left a comment
That's convincing, thanks for checking!
Merged to master

@srowen How about backporting it to 2.X?

I'm OK with it. It should be a pretty safe optimization.

Hm, I can't seem to back-port a merged PR with the merge script right now. I've seen this before. @dongjoon-hyun are you seeing problems like "not mergeable in its current form" if you try the merge script on this one again to backport it?

Yes, @srowen. I thought it was the current behavior of our script.
## What changes were proposed in this pull request?
Optimize `SparseVector.apply` by avoiding internal conversion.
Since the speed-up is significant (2.5X ~ 5X) and this method is widely used in ML, I suggest backporting.
| size | nnz | apply (old, ms) | apply2 (new impl, ms) | apply3 (new impl with extra range check, ms) |
|------|-----|-----------------|-----------------------|----------------------------------------------|
| 10000000 | 100 | 75294 | 12208 | 18682 |
| 10000000 | 10000 | 75616 | 23132 | 32932 |
| 10000000 | 1000000 | 92949 | 42529 | 48821 |
## How was this patch tested?
existing tests
using the following code to test performance (here the new impl is named `apply2`, and another impl with an extra range check is named `apply3`):
```
import scala.util.Random
import org.apache.spark.ml.linalg._
val size = 10000000
for (nnz <- Seq(100, 10000, 1000000)) {
val rng = new Random(123)
val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted
val values = Array.fill(nnz)(rng.nextDouble)
val vec = Vectors.sparse(size, indices, values).toSparse
val tic1 = System.currentTimeMillis;
(0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) {sum+=vec(i); i+=1} };
val toc1 = System.currentTimeMillis;
val tic2 = System.currentTimeMillis;
(0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) {sum+=vec.apply2(i); i+=1} };
val toc2 = System.currentTimeMillis;
val tic3 = System.currentTimeMillis;
(0 until 100).foreach{ round => var i = 0; var sum = 0.0; while(i < size) {sum+=vec.apply3(i); i+=1} };
val toc3 = System.currentTimeMillis;
println((size, nnz, toc1 - tic1, toc2 - tic2, toc3 - tic3))
}
```
Closes #25178 from zhengruifeng/sparse_vec_apply.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
OK, also manually merged to 2.4 in a285c0d

Thank you, @srowen!