
[SPARK-19759][ML] not using blas in ALSModel.predict for optimization #19685

Closed
mgaido91 wants to merge 2 commits

Conversation

mgaido91 (Contributor) commented Nov 7, 2017

What changes were proposed in this pull request?

In ALS.predict we currently use the blas.sdot function to compute the dot product of two Seqs. It turns out that this is not the most efficient approach.

I used the following code to compare the implementations:

```scala
def time[R](block: => R): Unit = {
  val t0 = System.nanoTime()
  block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
}

val r = new scala.util.Random(100)
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

def f(a: Seq[Float], b: Seq[Float]): Float = {
  var r = 0.0f
  for (i <- 0 until a.length) {
    r += a(i) * b(i)
  }
  r
}

import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq

time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// on average it takes 2968718815 ns

time { input.foreach(a => f(a, b)) }
// on average it takes 515510185 ns
```

This PR therefore proposes a plain for-loop implementation for performance reasons.
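As a minimal sketch, the proposed replacement can be written as a standalone method (a hypothetical helper for illustration; in Spark the loop is inlined in ALSModel.predict, using the names visible in the review snippet further down):

```scala
// Hypothetical standalone version of the dot product proposed in this PR;
// in ALSModel.predict the loop is inlined rather than factored out.
def dot(featuresA: Seq[Float], featuresB: Seq[Float], rank: Int): Float = {
  var dotProduct = 0.0f
  for (i <- 0 until rank) {
    dotProduct += featuresA(i) * featuresB(i)
  }
  dotProduct
}

println(dot(Seq(1.0f, 2.0f), Seq(3.0f, 4.0f), 2))
```

Unlike blas.sdot, this indexes the Seqs directly and so avoids the per-call toArray copies.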

How was this patch tested?

Existing unit tests.

SparkQA commented Nov 7, 2017

Test build #83551 has finished for PR 19685 at commit 8b0add6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member) left a comment

Yes, it seems people have often found in the past that Level 1 BLAS ops were not actually faster. Here, using BLAS requires the overhead of converting to an array, so I can believe this is faster. Usually a while loop is a little faster still.

Interestingly, there is another PR suggesting that Level 1 BLAS is faster when native libs are available.

```scala
// potential optimization.
blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
var dotProduct = 0.0f
for (i <- 0 until rank) {
```
A Contributor commented on the diff:
You should use while instead of for
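The suggestion can be sketched as follows (a hypothetical helper, not the exact Spark code): a while loop avoids the Range.foreach closure that a Scala for comprehension compiles to.

```scala
// Hypothetical while-loop variant of the dot product, as the reviewer
// suggests; avoids the closure overhead of a Scala for comprehension.
def dotWhile(a: Seq[Float], b: Seq[Float], rank: Int): Float = {
  var sum = 0.0f
  var i = 0
  while (i < rank) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}

println(dotWhile(Seq(1.0f, 2.0f, 3.0f), Seq(4.0f, 5.0f, 6.0f), 3))
```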

WeichenXu123 (Contributor) commented:

Have you run any tests to check the performance difference for this?

mgaido91 (Contributor, author) commented Nov 8, 2017

@srowen I tried enabling native BLAS, but the native BLAS implementation is still much slower: the average over 10 runs is 2529.922753 ms versus 515.510185 ms for the for loop. For reference, I am running on OS X with a 2.5 GHz Intel Core i7.
Worth noticing, though, is that when I ran the same code but performed the toArray conversion beforehand, excluding its time from the measurement, the native BLAS implementation is much faster: 100.969697 ms. So the "performance killer" here is the conversion to an array, as you pointed out.
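The conversion overhead can be observed without native BLAS at all; a minimal sketch (row count reduced from the PR's benchmark to keep it quick) that times only the per-call Seq-to-Array conversion:

```scala
// Minimal sketch isolating the per-call toArray cost described above.
// No BLAS dependency: only the conversion itself is timed.
def timeNs[R](block: => R): Long = {
  val t0 = System.nanoTime()
  block
  System.nanoTime() - t0
}

val r = new scala.util.Random(100)
// 50000 rows of rank 100 (smaller than the PR's 500000 rows).
val input = (1 to 50000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

// Conversion performed on every call, as the blas.sdot path required:
val perCallNs = timeNs { input.foreach(a => a.toArray) }

// Conversion hoisted out once, as in the "toArray before" experiment:
val arrays = input.map(_.toArray)
val hoistedNs = timeNs { arrays.foreach(a => a.length) }

println(s"per-call toArray: $perCallNs ns, hoisted baseline: $hoistedNs ns")
```

A one-shot System.nanoTime measurement like this is only indicative; averaging several warmed-up runs, as done for the numbers above, gives more stable figures.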

@WeichenXu123 The description of the PR and this comment show the tests I made. Do you think anything else is needed?

SparkQA commented Nov 8, 2017

Test build #83598 has finished for PR 19685 at commit 4867345.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member) commented Nov 11, 2017

Merged to master

@asfgit asfgit closed this in 3eb315d Nov 11, 2017