Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8481] [MLlib] GaussianMixtureModel.predict, GaussianMixtureModel.predictSoft variants for a single vector #6906

Closed
wants to merge 1 commit into from

Conversation

dkobylarz
Copy link
Contributor

This PR adds GaussianMixtureModel.predict & GaussianMixtureModel.predictSoft variants for a single vector which are useful when applying the trained model in environments where spark context is not required (or not desired) and predictions are made for single data points (vectors).
Test case included.

@srowen
Copy link
Member

srowen commented Jun 19, 2015

OK to test

@mengxr
Copy link
Contributor

mengxr commented Jun 19, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35290 has finished for PR 6906 at commit dce2432.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dkobylarz
Copy link
Contributor Author

A blank line with two space characters (added by IDE) caused the test to fail. Is there some source code validation tool I could apply to my code before contributing?

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@JoshRosen
Copy link
Contributor

I think that this particular build failure is due to code that failed our style checker being committed to master; we've fixed this with a hotfix commit.

You can use the dev/lint-scala script to run these style checks yourself.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35306 has finished for PR 6906 at commit dce2432.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35318 has finished for PR 6906 at commit cb87180.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@dkobylarz Thanks for the PR! This looks fine, but could you please add a simple test to the unit tests? It could be a new unit test in the GMM suite, or just an extra line testing this method in an existing unit test.

@jkbradley
Copy link
Member

Ping

@dkobylarz
Copy link
Contributor Author

it will be added on Monday

@jkbradley
Copy link
Member

Great thanks!

@SparkQA
Copy link

SparkQA commented Jul 6, 2015

Test build #36568 has finished for PR 6906 at commit 54af203.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dkobylarz
Copy link
Contributor Author

I quickly reviewed the Jenkins log. Mllib tests seem to be passed, it's the sql test that fails. Have there been any changes to 1.4 branch that are responsible for it?

@hadfield
Copy link

hadfield commented Jul 8, 2015

Jenkins, retest this please.

@jkbradley
Copy link
Member

@dkobylarz Could you please update the PR description to say what was done in this PR? It will become part of the git commit message (so the current description does not quite fit).

Jenkins test this please

@@ -66,6 +66,12 @@ class GaussianMixtureModel(
responsibilityMatrix.map(r => r.indexOf(r.max))
}

/** Maps given point to its cluster index. */
def predict(point: Vector) : Int = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala style: no space between parenthesis and colon

@SparkQA
Copy link

SparkQA commented Jul 17, 2015

Test build #37547 has finished for PR 6906 at commit cef1f0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@dkobylarz Could you please update the PR description to say what was done in this PR? It will become part of the git commit message (so the current description does not quite fit).

Also, as you make updates, it is helpful if you only rebase in order to fix merge conflicts. Otherwise, it's hard to tell what changes you have made in each update.

I'll take another look.

@dkobylarz dkobylarz changed the title [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector [SPARK-8481] [MLlib] GaussianMixtureModel.predict, GaussianMixtureModel.predictSoft variants for single vector Jul 21, 2015
@dkobylarz dkobylarz changed the title [SPARK-8481] [MLlib] GaussianMixtureModel.predict, GaussianMixtureModel.predictSoft variants for single vector [SPARK-8481] [MLlib] GaussianMixtureModel.predict, GaussianMixtureModel.predictSoft variants for a single vector Jul 21, 2015
@dkobylarz
Copy link
Contributor Author

Description updated.
I treat this PR as an atomic request. The commits are always squashed.

@jkbradley
Copy link
Member

What did you update the description to? I still see "This PR addresses the issue I have just reported. After taking a look at the code It seemed trivial to implement.", which says nothing about what this PR is doing. (Is github being slow to update or something?)

When this PR gets merged, the merge script will squash all commits in this PR anyways. Separating commits during review makes review easier.

@jkbradley
Copy link
Member

This LGTM once the PR description is updated. Thanks!

@dkobylarz
Copy link
Contributor Author

Could you please tell me how to edit the PR description. In this view I am only able to edit the PR title.
OK, I won't squash them next time.

@dkobylarz
Copy link
Contributor Author

Nevermind, I didn't realize the first comment in this stream is also the PR description.

@jkbradley
Copy link
Member

@dkobylarz No problem, this looks fine now. I'll merge it with master.
Thanks for the PR!

asfgit pushed a commit that referenced this pull request Jul 21, 2015
…el.predictSoft variants for a single vector

This PR adds GaussianMixtureModel.predict & GaussianMixtureModel.predictSoft variants for a single vector which are useful when applying the trained model in environments where spark context is not required (or not desired) and predictions are made for single data points (vectors).
Test case included.

Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>

Closes #6906 from dkobylarz/branch-1.4 and squashes the following commits:

cef1f0a [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector
@jkbradley
Copy link
Member

@dkobylarz The merge script failed to close this for some reason. Would you mind closing it manually? Thanks

@dkobylarz
Copy link
Contributor Author

Done

@dkobylarz dkobylarz closed this Jul 24, 2015
@yu-iskw
Copy link
Contributor

yu-iskw commented Aug 7, 2015

@jkbradley let me just check, was this PR merged into only branch-1.4? It seems that predict(point: Vector) doesn't exist master branch. Is this right?

Related to: #7662

@jkbradley
Copy link
Member

Uh oh, I'm not sure how I ended up merging that with branch-1.4...that's very strange. I'll see about fixing that.

@jkbradley
Copy link
Member

@dkobylarz I just reverted this change in branch-1.4. It was because this PR was targeted at branch-1.4, rather than the master. Please make sure PRs are targeted at the master branch in the future (and I will need to be more careful at merge time!). I'll create a new PR and send it for branch-1.5 and master.

@dkobylarz
Copy link
Contributor Author

ok, I was thinking of getting this to 1.4 branch asap. I will contribute to master branch only in future. Sorry about that and thanks for the guidance!

@jkbradley
Copy link
Member

No problem, I need to be careful too!

@yu-iskw
Copy link
Contributor

yu-iskw commented Aug 7, 2015

@jkbradley @dkobylarz I appreciate your checking! I look forward to merging it into the master branch!

asfgit pushed a commit that referenced this pull request Aug 7, 2015
…ctor

Resubmit of [#6906] for adding single-vec predict to GMMs

CC: dkobylarz  mengxr

To be merged with master and branch-1.5
Primary author: dkobylarz

Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>

Closes #8039 from jkbradley/gmm-predict-vec and squashes the following commits:

bfbedc4 [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector

(cherry picked from commit e2fbbe7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 7, 2015
…ctor

Resubmit of [#6906] for adding single-vec predict to GMMs

CC: dkobylarz  mengxr

To be merged with master and branch-1.5
Primary author: dkobylarz

Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>

Closes #8039 from jkbradley/gmm-predict-vec and squashes the following commits:

bfbedc4 [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
…ctor

Resubmit of [apache#6906] for adding single-vec predict to GMMs

CC: dkobylarz  mengxr

To be merged with master and branch-1.5
Primary author: dkobylarz

Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>

Closes apache#8039 from jkbradley/gmm-predict-vec and squashes the following commits:

bfbedc4 [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
…ctor

Resubmit of [apache/spark#6906] for adding single-vec predict to GMMs

CC: dkobylarz  mengxr

To be merged with master and branch-1.5
Primary author: dkobylarz

Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>

Closes #8039 from jkbradley/gmm-predict-vec and squashes the following commits:

bfbedc4 [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants