Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2909] [MLlib] [PySpark] SparseVector in pyspark now supports indexing #4025

Closed
wants to merge 2 commits into from

Conversation

MechCoder
Copy link
Contributor

Slightly different than the scala code which converts the sparsevector into a densevector and then checks the index.

I also hope I've added tests in the right place.

@MechCoder
Copy link
Contributor Author

ping @jkbradley @mengxr Would be great if you could have a look :)

@MechCoder MechCoder changed the title [SPARK-2909] [Mlib] SparseVector in pyspark now supports indexing [SPARK-2909] [Mlib] [PySpark] SparseVector in pyspark now supports indexing Jan 13, 2015
@SparkQA
Copy link

SparkQA commented Jan 13, 2015

Test build #25480 has started for PR 4025 at commit 3528e47.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 13, 2015

Test build #25480 has finished for PR 4025 at commit 3528e47.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25480/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Jan 13, 2015

Test build #25481 has started for PR 4025 at commit f02148b.

  • This patch merges cleanly.

@@ -510,6 +510,22 @@ def __eq__(self, other):
and np.array_equal(other.indices, self.indices)
and np.array_equal(other.values, self.values))

def __getitem__(self, item):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we rename item to index?

@SparkQA
Copy link

SparkQA commented Jan 13, 2015

Test build #25481 has finished for PR 4025 at commit f02148b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25481/
Test PASSed.

@MechCoder MechCoder changed the title [SPARK-2909] [Mlib] [PySpark] SparseVector in pyspark now supports indexing [SPARK-2909] [MLlib] [PySpark] SparseVector in pyspark now supports indexing Jan 14, 2015
@MechCoder
Copy link
Contributor Author

@mengxr I've fixed up your comments. Btw, should we use a similar logic for the scala code? Right now it seems to convert it into a dense vector, which I'm not sure is advisable.

@SparkQA
Copy link

SparkQA commented Jan 14, 2015

Test build #25503 has started for PR 4025 at commit 07d0f26.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 14, 2015

Test build #25503 has finished for PR 4025 at commit 07d0f26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25503/
Test PASSed.

@MechCoder
Copy link
Contributor Author

Also I'm thinking out aloud if it's worthy enough to implement __setitem__

@davies
Copy link
Contributor

davies commented Jan 14, 2015

This looks good to me, thanks!

@MechCoder
Copy link
Contributor Author

Thanks, has this been pushed to master, so that I can close it?

@davies
Copy link
Contributor

davies commented Jan 14, 2015

It's not merged yet, github will close it once it get merged.

@mengxr
Copy link
Contributor

mengxr commented Jan 14, 2015

LGTM. @MechCoder The Scala code uses Breeze's index lookup, which uses bisection as well. You can try implementing bisection in MLlib and then doing a micro-benchmark. If there is a big difference, we will have the implementation in MLlib.

@asfgit asfgit closed this in 5840f54 Jan 14, 2015
@mengxr
Copy link
Contributor

mengxr commented Jan 14, 2015

Merged into master. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants