[SPARK-19759][ML] not using blas in ALSModel.predict for optimization #19685

mgaido91 · 2017-11-07T16:48:08Z

What changes were proposed in this pull request?

ALSModel.predict currently uses the blas.sdot function to compute the dot product of two Seqs. This turns out not to be the most efficient approach.

I used the following code to compare the implementations:

import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Prints the wall-clock time taken to evaluate `block` once.
def time[R](block: => R): Unit = {
    val t0 = System.nanoTime()
    block
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0) + "ns")
}

// Plain for-loop dot product over two Seqs of equal length.
def f(a: Seq[Float], b: Seq[Float]): Float = {
    var sum = 0.0f
    for (i <- 0 until a.length) {
        sum += a(i) * b(i)
    }
    sum
}

val r = new scala.util.Random(100)
// 500000 random vectors of length 100, plus one vector to multiply them with
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)
val b = (1 to 100).map(_ => r.nextFloat).toSeq

// BLAS version: both Seqs are converted to arrays on every call
time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// on average it takes 2968718815 ns
time { input.foreach(a => f(a, b)) }
// on average it takes 515510185 ns

Thus this PR proposes the old-style for loop implementation for performance reasons.
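
The figures above are quoted as averages, while the `time` helper measures a single run. A minimal sketch of how such an average could be collected follows; `AvgTiming`, `averageNanos`, and the `warmup` parameter are hypothetical names introduced here for illustration and are not part of the PR.

```scala
object AvgTiming {
  // Hypothetical helper: evaluates `block` `warmup` times without timing
  // (to give the JIT a chance to compile it), then `runs` times with timing,
  // and returns the mean elapsed time in nanoseconds.
  def averageNanos[R](runs: Int, warmup: Int = 2)(block: => R): Double = {
    require(runs > 0, "runs must be positive")
    var i = 0
    while (i < warmup) {
      block
      i += 1
    }
    var total = 0L
    i = 0
    while (i < runs) {
      val t0 = System.nanoTime()
      block
      total += System.nanoTime() - t0
      i += 1
    }
    total.toDouble / runs
  }
}
```

For example, `AvgTiming.averageNanos(10) { input.foreach(a => f(a, b)) }` would report the mean over 10 timed runs of the for-loop variant.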

How was this patch tested?

Existing unit tests.

SparkQA · 2017-11-07T17:59:54Z

Test build #83551 has finished for PR 19685 at commit 8b0add6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Yes, it seems people have often found in the past that level 1 BLAS ops were not actually faster. Here, using BLAS also requires the overhead of converting to an array. I can believe this is faster. Usually a while loop is a little faster than a for loop.

Interestingly, there is another PR suggesting that Level 1 BLAS is faster when native libs are available.

WeichenXu123 · 2017-11-08T01:27:47Z

mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala

-      // potential optimization.
-      blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
+      var dotProduct = 0.0f
+      for(i <- 0 until rank) {


You should use while instead of for
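
A minimal sketch of what the suggested while-loop form could look like, assuming the two feature vectors are Seqs with at least `rank` elements. The names `DotProduct` and `dotWhile` are hypothetical and only wrap the loop so it can be run on its own.

```scala
object DotProduct {
  // Dot product of the first `rank` elements of two feature vectors,
  // written as a while loop so no Range or closure is created per call.
  def dotWhile(featuresA: Seq[Float], featuresB: Seq[Float], rank: Int): Float = {
    var dotProduct = 0.0f
    var i = 0
    while (i < rank) {
      dotProduct += featuresA(i) * featuresB(i)
      i += 1
    }
    dotProduct
  }
}
```

Indexed access is only cheap when the Seq is backed by an indexed structure such as a Vector or a wrapped array; on a List each `featuresA(i)` would be linear in `i`.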

WeichenXu123 · 2017-11-08T01:28:35Z

Have you run any tests to check the performance difference for this?

mgaido91 · 2017-11-08T13:41:41Z

@srowen I tried enabling native BLAS, but the native BLAS implementation is still much slower: the average over 10 runs is 2529.922753 ms against 515.510185 ms for the for loop. For reference, I am using OS X on a 2.5 GHz Intel Core i7.
It is worth noting, though, that I also ran the same code with the toArray conversion performed beforehand, thus excluding its time from the measurement. In this case, the native BLAS implementation is much faster: 100.969697 ms. So the "performance killer" here is the conversion to array, as you pointed out.
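
The measurement described above can be sketched as follows: one variant converts to arrays inside the timed region, the other hoists the conversion out of it. This is an illustration under stated assumptions, not the code that produced the numbers in this thread. `ToArrayCost`, `timed`, and `dot` are hypothetical names, and the plain array loop `dot` stands in for `blas.sdot` so that the sketch has no native dependency.

```scala
object ToArrayCost {
  // Hypothetical helper: returns the block's result and the elapsed nanoseconds.
  def timed[R](block: => R): (R, Long) = {
    val t0 = System.nanoTime()
    val result = block
    (result, System.nanoTime() - t0)
  }

  // Stand-in for blas.sdot: dot product over two arrays of equal length.
  def dot(a: Array[Float], b: Array[Float]): Float = {
    var sum = 0.0f
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random(100)
    val input: Seq[Seq[Float]] = Vector.fill(50000)(Vector.fill(100)(rnd.nextFloat()))
    val b: Seq[Float] = Vector.fill(100)(rnd.nextFloat())

    // Variant 1: toArray is called inside the timed region, on every element.
    val (_, withConversion) = timed {
      input.foreach(a => dot(a.toArray, b.toArray))
    }

    // Variant 2: conversion is done once up front and excluded from the timing.
    val inputArrays = input.map(_.toArray)
    val bArray = b.toArray
    val (_, preConverted) = timed {
      inputArrays.foreach(a => dot(a, bArray))
    }

    println(s"with toArray: $withConversion ns, pre-converted: $preConverted ns")
  }
}
```

A single run like this is noisy; averaging several runs after a warm-up gives more stable figures.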

@WeichenXu123 In the description of the PR and here you can see the tests I ran. Do you think anything else is needed?

SparkQA · 2017-11-08T14:55:45Z

Test build #83598 has finished for PR 19685 at commit 4867345.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2017-11-11T10:11:56Z

Merged to master

[SPARK-19759][ML] not using blas in ALSModel.predict for optimization

8b0add6

srowen requested changes Nov 7, 2017

WeichenXu123 reviewed Nov 8, 2017

convert for to while loop

4867345

srowen approved these changes Nov 10, 2017

asfgit closed this in 3eb315d Nov 11, 2017
