Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly #8166

Closed
wants to merge 7 commits into from

Conversation

yanboliang
Copy link
Contributor

PySpark DenseVector, SparseVector __eq__ method should use semantics equality, and DenseVector can compared with SparseVector.
Implement PySpark DenseVector, SparseVector __hash__ method based on the first 16 entries. That will make PySpark Vector objects can be used in collections.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40766 has finished for PR 8166 at commit 1b4ed66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 15, 2015

Test build #40949 has finished for PR 8166 at commit 2a85d09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang yanboliang changed the title [SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector __eq__ should use semantics [SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly Aug 15, 2015
while k2 < v2_size and v2_values[k2] == 0:
k2 += 1

if k1 >= v1_size or k2 >= v2_size:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since k1 will be at most == v1_size due to the earlier while, checking for == here will suffice and is easier to read

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for k2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think checking k1 >= v1_size is more robust than k1 == v1_size, and Scala code also use the former one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's fine with me

@feynmanliang
Copy link
Contributor

LGTM after docstring change

@SparkQA
Copy link

SparkQA commented Aug 27, 2015

Test build #41666 has finished for PR 8166 at commit d63d54e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if len(self) != other.size:
return false
return Vectors.equals(list(xrange(len(self))), self.array, other.indices, other.values)
return NotImplemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it return False?

@mengxr
Copy link
Contributor

mengxr commented Sep 11, 2015

@yanboliang Please update the PR to use the first 128 nonzeros entries to compute hash.

@SparkQA
Copy link

SparkQA commented Sep 14, 2015

Test build #42420 has finished for PR 8166 at commit 3b8ac7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -122,6 +123,15 @@ def _format_float_list(l):
return [_format_float(x) for x in l]


def _double_to_long_bits(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make the code more readable:

if isnan(value):
  value = float('nan')
return struct.unpack('Q', struct.pack('d', value))[0]

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42465 has finished for PR 8166 at commit b58d1bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Sep 15, 2015

LGTM. Merged into master. @yanboliang assertEquals is deprecated. Could you make a pass over existing unit tests and make a new PR that changes assertEquals to assertEqual? Thanks!

@asfgit asfgit closed this in 4ae4d54 Sep 15, 2015
@yanboliang
Copy link
Contributor Author

@mengxr OK, I opened SPARK-10615 to track the assertEquals to assertEqual issue. I will submit a PR in a few days.

@yanboliang yanboliang deleted the spark-9793 branch May 5, 2016 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants