Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10906][MLlib] More efficient SparseMatrix.equals #8960

Closed

Conversation

rahulpalamuttam
Copy link
Contributor

@jkbradley
Calls toBreeze.activeIterator which, for CSCMatrix, returns an iterator
with only non zero values. The iterators are then converted to sets and
checked for equality. Since the input parameter can be of type DenseMatrix
the second set is filtered for non-zero values. This is because activeIterator
for DenseMatrix returns all values.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

case m: Matrix =>
val thisIteratorSet = toBreeze.activeIterator.toSet
val mIteratorSet = m.toBreeze.activeIterator.toSet.filter(p => p._2 != 0.0)
thisIteratorSet == mIteratorSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than converting to sets (fairly expensive), how about comparing the iterators directly? It's a bit more complex (handling zeros) but would then be linear time, with a small constant. It would also allow early stopping as soon as a non-matching value was found.

Also, this method should compare the matrix dimensions before looking at the values.

@jkbradley
Copy link
Member

You're right about them not using the same order in activeIterator; that's annoying. I like the tests you added before using the iterators. But I don't think the use of the iterators will handle zero values properly. Can you go ahead and add one or more unit tests, particularly where there are both implicit and explicit zeros, in order to catch such issues?

@jkbradley
Copy link
Member

In a sparse representation, an implicit zero is not in values or indices arrays, but an explicit zero is (with value 0). I'm sending your PR a PR to demonstrate the problem.

@jkbradley
Copy link
Member

Ping! Will you have a chance to update this soon?

…ithub.com/jkbradley/spark into jkbradley-rahulpalamuttam-RahulP_SparseMatrixEquals

Conflicts:
	mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
@rahulpalamuttam
Copy link
Contributor Author

@jkbradley
I have merged you're PR to my branch.
Both of our tests for explicit zeros pass. (I have included both).
Also I have taken out the unnecessary test involving the 10x10 identity matrix.

@rahulpalamuttam
Copy link
Contributor Author

Are there any more edge conditions left to consider in this patch?

@mengxr
Copy link
Contributor

mengxr commented Oct 20, 2015

@rahulpalamuttam May I ask your use case for Matrix.equals? It is not common to test strict equality between two double matrices. Checking the norm on their difference is more proper. The only issue we have observed is in Python serialization. But I'm not sure whether this applies to you.

Btw, a better solution would be updating the implementation in breeze instead of Spark, which benefits both projects.

@jkbradley
Copy link
Member

@rahulpalamuttam Sorry for the delay in reviewing! How would you feel about updating the implementation in Breeze, rather than in Spark? I expect you could use much of the code you've already written, and I'd be happy to help review your PR to Breeze if it's helpful.

@rahulpalamuttam
Copy link
Contributor Author

@jkbradley
No worries. I'll submit a PR to Breeze with the changes.
I've highlighted the function to be changed in the Breeze Matrix class in the following link.

https://github.com/scalanlp/breeze/blob/3f978aa177ebc11a881379a97d0618400e0b1b49/math/src/main/scala/breeze/linalg/Matrix.scala#L134-L139.

Do you want me to update the PR with the test code and remove the implementation code?

@rahulpalamuttam
Copy link
Contributor Author

@jkbradley @mengxr
I have opened the new PR in Breeze here :
scalanlp/breeze#480

@jkbradley
Copy link
Member

@rahulpalamuttam Yes, please, it'd be great to update this PR to be a test only (assuming it fails without the Breeze fix?). I'll test and comment on the Breeze PR.

@rahulpalamuttam
Copy link
Contributor Author

Actually the test passes regardless of whether the fix is there or not.

@jkbradley
Copy link
Member

Right, sorry, I was losing track of things. In that case, let's close this issue and focus on improving the Breeze method. Thanks!

@asfgit asfgit closed this in ef8fb36 Jan 25, 2016
@mengxr
Copy link
Contributor

mengxr commented Jan 25, 2016

I closed this PR based on the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants