[SPARK-16499][ML][MLLib] improve ApplyInPlace function in ANN code #14156


WeichenXu123
Contributor

What changes were proposed in this pull request?

I rewrote the following function using Breeze's matrix operations:
def apply(x: BDM[Double], y: BDM[Double], func: Double => Double): Unit

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Jul 12, 2016

Test build #62172 has finished for PR 14156 at commit c7b2059.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Contributor Author

cc @srowen thanks!

@srowen
Member

srowen commented Aug 4, 2016

Is there reasonable evidence that this speeds things up? I just want to make sure it does not make things slower. Can you help me understand the := operator? I don't see how it helps compute y as a function of x here. I assume the method below can't use the same mechanism?

@WeichenXu123
Contributor Author

WeichenXu123 commented Aug 4, 2016

@srowen
The := operator on BDM simply copies one BDM into another, and it is widely used in the Breeze source.
For example, the DenseMatrix.copy function in Breeze
first uses DenseMatrix.create to create a new matrix with the same dimensions,
val result = DenseMatrix.create(...), and then uses
result := this to copy itself into the newly created matrix.

The := operator works for DenseMatrix because DenseMatrix contains an implicit member that implements the OpSet.InPlaceImpl2 trait.
In the DenseMatrix source file in Breeze, at line 985, there is:
implicit val setMV_D:OpSet.InPlaceImpl2[...] = new SetDMDVOpDouble
So the implementation is in the SetDMDVOp class,
and we can see that SetDMDVOp is type-specialized for Double, so the compiled code should be efficient.
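To make the copy semantics of := concrete, here is a minimal sketch, assuming Breeze is on the classpath. The object name SetOperatorDemo and the sample values are hypothetical and are not taken from the PR.

```scala
import breeze.linalg.{DenseMatrix => BDM}

object SetOperatorDemo {
  // Returns (x, y) after copying x into y and then mutating y.
  def run(): (BDM[Double], BDM[Double]) = {
    val x = BDM((1.0, 2.0), (3.0, 4.0))
    val y = BDM.zeros[Double](2, 2)

    // Element-wise copy of x into y's existing storage; no new matrix is allocated.
    y := x

    // y has its own storage, so this write does not affect x.
    y(0, 0) = 10.0

    (x, y)
  }
}
```

After run(), x(0, 0) is still 1.0 while y(0, 0) is 10.0, which is why the copy can be followed by an in-place modification of y without touching x.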

@srowen
Member

srowen commented Aug 4, 2016

I see, this copies x to y and then modifies y in place. OK. Is that more efficient? It seems like extra work, but does the transform method make up for it? I'm just trying to see whether this has actually been observed to speed things up.
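The two approaches being compared can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the names applyDirect and applyCopyThenTransform are hypothetical, and the in-place step is written as an explicit loop rather than with Breeze's own transform facility, whose exact call is not shown in this thread.

```scala
import breeze.linalg.{DenseMatrix => BDM}

object ApplyVariants {
  // Direct version: a single pass that reads x and writes func(x) into y.
  def applyDirect(x: BDM[Double], y: BDM[Double], func: Double => Double): Unit = {
    require(x.rows == y.rows && x.cols == y.cols, "dimension mismatch")
    var j = 0
    while (j < x.cols) {
      var i = 0
      while (i < x.rows) {
        y(i, j) = func(x(i, j))
        i += 1
      }
      j += 1
    }
  }

  // Copy-then-transform version: one pass to copy x into y,
  // then a second pass that rewrites y in place.
  def applyCopyThenTransform(x: BDM[Double], y: BDM[Double], func: Double => Double): Unit = {
    require(x.rows == y.rows && x.cols == y.cols, "dimension mismatch")
    y := x // extra copy pass
    var j = 0
    while (j < y.cols) {
      var i = 0
      while (i < y.rows) {
        y(i, j) = func(y(i, j))
        i += 1
      }
      j += 1
    }
  }
}
```

Both variants leave the same values in y. The second touches every element of y twice, which is the extra work in question, so any speedup would have to come from the in-place step itself.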

@WeichenXu123
Contributor Author

Yeah, currently it seems to add a little overhead (an extra copy), but I think it will be able to take advantage of Breeze optimizations in the future, e.g. SIMD instructions or something similar?

@srowen
Member

srowen commented Aug 4, 2016

That's the question indeed. I'm not sure, because the function that's supplied could be anything. I don't see how it could automatically be converted to a vectorized operation.

@WeichenXu123
Contributor Author

WeichenXu123 commented Aug 4, 2016

@srowen Yeah, the function supplied here cannot be turned into SIMD instructions, but I think some parallelization could be done on large matrices. For example, we could split the matrix into several blocks and execute the "in place transform" on them in parallel, although this hasn't been added to Breeze yet.

For example, in Scala today, Seq has a Parallelizable trait, so seq.foreach can be replaced by the parallel version seq.par.foreach, and I think Breeze will add a similar feature in the future.
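The block-splitting idea could look roughly like the sketch below. It is only an illustration under assumptions: it works on a plain Array[Double] (standing in for a dense matrix's backing storage), uses standard-library Futures rather than anything from Breeze, and the names BlockedTransform, transformInPlace and numBlocks are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object BlockedTransform {
  // Applies func to every element of data in place, splitting the array
  // into at most numBlocks contiguous ranges that are processed concurrently.
  def transformInPlace(data: Array[Double], numBlocks: Int)(func: Double => Double): Unit = {
    require(numBlocks > 0, "numBlocks must be positive")
    // Ceiling division so that every element falls into some block.
    val blockSize = math.max(1, (data.length + numBlocks - 1) / numBlocks)
    val tasks = (0 until data.length by blockSize).map { start =>
      Future {
        val end = math.min(start + blockSize, data.length)
        var i = start
        while (i < end) {
          data(i) = func(data(i))
          i += 1
        }
      }
    }
    // Block until every range has been processed.
    Await.result(Future.sequence(tasks), Duration.Inf)
  }
}
```

For example, BlockedTransform.transformInPlace(values, 4)(math.tanh) would rewrite values with up to four concurrent tasks. Whether this beats a single-threaded loop depends on the matrix size and the cost of func, so it would need to be measured.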

@srowen
Member

srowen commented Aug 4, 2016

I get it, though it sounds like there is not necessarily any such optimization now, and I'm actually not sure there can be. It could even be slower, since it introduces an extra copy. It is also somewhat harder to understand and different from its sibling method. I'm not sure we should do this until there is a demonstrable benefit.

@WeichenXu123
Contributor Author

@srowen OK, I'll close the PR for now. If I find a better way to optimize it, I will reopen it. Thanks!
