[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly #8166

yanboliang · 2015-08-13T14:55:17Z

The __eq__ methods of PySpark DenseVector and SparseVector should use semantic equality, and a DenseVector should be comparable with a SparseVector.
Implement the __hash__ method of PySpark DenseVector and SparseVector based on the first 16 entries, so that PySpark Vector objects can be used in collections.

SparkQA · 2015-08-13T15:22:05Z

Test build #40766 has finished for PR 8166 at commit 1b4ed66.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2015-08-15T09:02:30Z

Jenkins, test this please.

SparkQA · 2015-08-15T09:27:13Z

Test build #40949 has finished for PR 8166 at commit 2a85d09.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

feynmanliang · 2015-08-24T18:03:09Z

python/pyspark/mllib/linalg/__init__.py

+            while k2 < v2_size and v2_values[k2] == 0:
+                k2 += 1
+
+            if k1 >= v1_size or k2 >= v2_size:


nit: since k1 will be at most == v1_size due to the earlier while, checking for == here will suffice and is easier to read

ditto for k2

Actually, I think checking k1 >= v1_size is more robust than k1 == v1_size, and the Scala code also uses the former.

OK, that's fine with me
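
For readers following the thread, the comparison loop under discussion can be sketched as a standalone function. This is a minimal sketch reconstructed around the diff fragment above, not necessarily the merged code: the function name `sparse_equals` is hypothetical, both index arrays are assumed to be strictly increasing, and the check that the two vectors have the same size is assumed to happen in the caller.

```python
def sparse_equals(v1_indices, v1_values, v2_indices, v2_values):
    """Compare two vectors given as (indices, values) pairs, ignoring explicit zeros.

    Hypothetical helper for illustration; assumes indices are strictly increasing
    and that the caller has already checked the vectors have the same size.
    """
    v1_size = len(v1_values)
    v2_size = len(v2_values)
    k1 = 0
    k2 = 0
    all_equal = True
    while all_equal:
        # Skip explicitly stored zeros on both sides.
        while k1 < v1_size and v1_values[k1] == 0:
            k1 += 1
        while k2 < v2_size and v2_values[k2] == 0:
            k2 += 1

        # If either side is exhausted, the vectors match only if both are.
        if k1 >= v1_size or k2 >= v2_size:
            return k1 >= v1_size and k2 >= v2_size

        all_equal = (v1_indices[k1] == v2_indices[k2]
                     and v1_values[k1] == v2_values[k2])
        k1 += 1
        k2 += 1
    return all_equal


# A dense vector [1.0, 0.0, 3.0] viewed as indices 0..2, against a sparse one.
dense_vs_sparse = sparse_equals([0, 1, 2], [1.0, 0.0, 3.0], [0, 2], [1.0, 3.0])
```

With `>=`, the exit condition still holds if a later change to the skip loops ever lets `k1` or `k2` step past the end, which is the robustness argument made above.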

feynmanliang · 2015-08-26T17:21:58Z

LGTM after docstring change

SparkQA · 2015-08-27T03:48:02Z

Test build #41666 has finished for PR 8166 at commit d63d54e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-09-10T22:04:25Z

python/pyspark/mllib/linalg/__init__.py

+            if len(self) != other.size:
+                return false
+            return Vectors.equals(list(xrange(len(self))), self.array, other.indices, other.values)
+        return NotImplemented


Should it return False?

mengxr · 2015-09-11T03:15:08Z

@yanboliang Please update the PR to use the first 128 nonzero entries to compute the hash.
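
To illustrate the request, here is a minimal sketch of a hash that only looks at the first 128 nonzero entries, so that a dense and a sparse representation of the same vector hash alike. The names `vector_hash`, `_bits`, and `MAX_HASH_NNZ` are hypothetical, and the 31-multiplier mixing formula is an assumption chosen for illustration, not something confirmed in this thread.

```python
import struct

MAX_HASH_NNZ = 128  # limit taken from the review comment above


def _bits(value):
    # Reinterpret a Python float (IEEE 754 double) as an unsigned 64-bit integer.
    return struct.unpack("Q", struct.pack("d", value))[0]


def vector_hash(size, entries):
    """Hash a vector from its size and (index, value) pairs in increasing index order.

    Zeros are skipped, so explicit and implicit zeros contribute nothing.
    The mixing formula below is an assumed, Java-style scheme.
    """
    result = 31 + size
    nnz = 0
    for index, value in entries:
        if nnz >= MAX_HASH_NNZ:
            break
        if value != 0:
            result = 31 * result + index
            bits = _bits(value)
            result = 31 * result + (bits ^ (bits >> 32))
            nnz += 1
    return result


dense = [1.0, 0.0, 3.0]
dense_hash = vector_hash(len(dense), enumerate(dense))
sparse_hash = vector_hash(3, zip([0, 2], [1.0, 3.0]))
```

Capping the number of entries bounds the cost of hashing very large vectors; vectors that differ only after the first 128 nonzeros collide, which is acceptable because equality is still checked in full.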

SparkQA · 2015-09-14T10:32:14Z

Test build #42420 has finished for PR 8166 at commit 3b8ac7a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-09-14T16:54:48Z

python/pyspark/mllib/linalg/__init__.py

@@ -122,6 +123,15 @@ def _format_float_list(l):
    return [_format_float(x) for x in l]


+def _double_to_long_bits(value):


We can make the code more readable:

    if isnan(value):
        value = float('nan')
    return struct.unpack('Q', struct.pack('d', value))[0]


SparkQA · 2015-09-15T03:12:51Z

Test build #42465 has finished for PR 8166 at commit b58d1bb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-09-15T04:39:03Z

LGTM. Merged into master. @yanboliang assertEquals is deprecated. Could you make a pass over existing unit tests and make a new PR that changes assertEquals to assertEqual? Thanks!

yanboliang · 2015-09-15T10:59:43Z

@mengxr OK, I opened SPARK-10615 to track the assertEquals to assertEqual issue. I will submit a PR in a few days.

yanboliang mentioned this pull request Aug 15, 2015

[SPARK-9940] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __hash__ method. #8167

Closed

yanboliang changed the title ~~[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector __eq__ should use semantics~~ [SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly Aug 15, 2015

feynmanliang reviewed Aug 24, 2015

feynmanliang mentioned this pull request Aug 25, 2015

[SPARK-9525] [PySpark] [MLlib] Optimize SparseVector initialization #7854

Closed

davies reviewed Sep 10, 2015

yanboliang added 6 commits September 14, 2015 18:10

PySpark DenseVector, SparseVector __eq__ should use semantics

1e9d1bc

PySpark DenseVector, SparseVector implement __hash__

7489a44

document the indices must be strictly increasing

83f51ed

use the first 128 nonzeros entries to compute hash for PySpark Vector

fca0f5a

move the test to tests.py

d3f8c14

equals only internal used, so rename to _equals

3b8ac7a

yanboliang force-pushed the spark-9793 branch from d63d54e to 3b8ac7a Compare September 14, 2015 10:11

mengxr reviewed Sep 14, 2015

make _double_to_long_bits more readable & use assertEqual to test equality

b58d1bb

asfgit closed this in 4ae4d54 Sep 15, 2015

yanboliang deleted the spark-9793 branch May 5, 2016 07:34
