[SPARK-2909] [MLlib] [PySpark] SparseVector in pyspark now supports indexing #4025

MechCoder · 2015-01-13T19:24:39Z

This differs slightly from the Scala code, which converts the SparseVector into a DenseVector and then looks up the index.

I also hope I've added tests in the right place.

MechCoder · 2015-01-13T19:26:33Z

ping @jkbradley @mengxr Would be great if you could have a look :)

SparkQA · 2015-01-13T19:27:45Z

Test build #25480 has started for PR 4025 at commit 3528e47.

This patch merges cleanly.

SparkQA · 2015-01-13T19:28:51Z

Test build #25480 has finished for PR 4025 at commit 3528e47.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-13T19:28:52Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25480/
Test FAILed.

SparkQA · 2015-01-13T19:37:40Z

Test build #25481 has started for PR 4025 at commit f02148b.

This patch merges cleanly.

mengxr · 2015-01-13T20:42:27Z

python/pyspark/mllib/linalg.py

@@ -510,6 +510,22 @@ def __eq__(self, other):
                and np.array_equal(other.indices, self.indices)
                and np.array_equal(other.values, self.values))

+    def __getitem__(self, item):


Shall we rename item to index?

SparkQA · 2015-01-13T20:46:35Z

Test build #25481 has finished for PR 4025 at commit f02148b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-13T20:46:39Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25481/
Test PASSed.

MechCoder · 2015-01-14T02:49:32Z

@mengxr I've addressed your comments. By the way, should we use similar logic for the Scala code? Right now it seems to convert the sparse vector into a dense vector, which I'm not sure is advisable.

SparkQA · 2015-01-14T02:52:35Z

Test build #25503 has started for PR 4025 at commit 07d0f26.

This patch merges cleanly.

SparkQA · 2015-01-14T04:01:18Z

Test build #25503 has finished for PR 4025 at commit 07d0f26.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-14T04:01:22Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25503/
Test PASSed.

MechCoder · 2015-01-14T07:40:27Z

Also, I'm thinking out loud about whether it's worthwhile to implement __setitem__.

davies · 2015-01-14T18:17:25Z

This looks good to me, thanks!

MechCoder · 2015-01-14T18:22:35Z

Thanks, has this been pushed to master, so that I can close it?

davies · 2015-01-14T18:41:36Z

It's not merged yet; GitHub will close it once it gets merged.

mengxr · 2015-01-14T19:02:41Z

LGTM. @MechCoder The Scala code uses Breeze's index lookup, which uses bisection as well. You can try implementing bisection in MLlib and then doing a micro-benchmark. If there is a big difference, we will have the implementation in MLlib.
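
For readers following the bisection discussion, here is a minimal sketch of an index lookup over a sparse vector's sorted indices. It is not the code merged in this PR: the helper name `sparse_get`, its argument list, and the handling of negative indices are assumptions made for illustration. It relies only on NumPy's `searchsorted`, which performs a binary search on a sorted array.

```python
import numpy as np


def sparse_get(size, indices, values, index):
    """Hypothetical helper: return element `index` of a sparse vector.

    `indices` must be sorted in ascending order, with `values[i]` holding
    the entry stored at position `indices[i]`.
    """
    if not isinstance(index, (int, np.integer)):
        raise TypeError("index must be an integer, got %s" % type(index).__name__)
    if index < 0:
        index += size  # assumption: allow Python-style negative indexing
    if index < 0 or index >= size:
        raise IndexError("index out of bounds for vector of size %d" % size)
    # Binary search for the leftmost position where `index` could be inserted.
    pos = np.searchsorted(indices, index)
    if pos < len(indices) and indices[pos] == index:
        return float(values[pos])  # explicitly stored entry
    return 0.0  # entries that are not stored are implicitly zero


indices = np.array([1, 3, 7])
values = np.array([2.0, 4.0, 8.0])
print(sparse_get(10, indices, values, 3))  # prints 4.0
print(sparse_get(10, indices, values, 0))  # prints 0.0
```

The lookup costs O(log nnz) and never materializes a dense array, which is the difference from a convert-to-dense approach raised earlier in the thread.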

mengxr · 2015-01-14T19:04:05Z

Merged into master. Thanks!

MechCoder changed the title ~~[SPARK-2909] [Mlib] SparseVector in pyspark now supports indexing~~ [SPARK-2909] [Mlib] [PySpark] SparseVector in pyspark now supports indexing Jan 13, 2015

[SPARK-2909] [Mlib] SparseVector in pyspark now supports indexing

f02148b

MechCoder force-pushed the spark-2909 branch from 3528e47 to f02148b Compare January 13, 2015 19:33

mengxr reviewed Jan 13, 2015

MechCoder changed the title ~~[SPARK-2909] [Mlib] [PySpark] SparseVector in pyspark now supports indexing~~ [SPARK-2909] [MLlib] [PySpark] SparseVector in pyspark now supports indexing Jan 14, 2015

STY: Rename item to index

07d0f26

asfgit closed this in 5840f54 Jan 14, 2015

MechCoder deleted the spark-2909 branch January 14, 2015 19:04

MechCoder mentioned this pull request Jan 18, 2015

[SPARK-5257] [MLlib] SparseVector indices must be non-negative #4096

Closed
