[SPARK-5012][MLLib][PySpark]Python API for Gaussian Mixture Model #4059

Closed
wants to merge 18 commits into from

FlytxtRnD
Contributor

Python API for the Gaussian Mixture Model clustering algorithm in MLlib.
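For readers new to the algorithm, the expectation-maximization (EM) loop that a Gaussian mixture trainer runs can be sketched in plain Python. This is a simplified, single-machine, one-dimensional illustration, not the MLlib implementation, which is multivariate and distributed over an RDD. The parameter names `max_iterations` and `convergence_tol` are chosen to echo the `setMaxIterations`/`setConvergenceTol` setters quoted in the review; the function name `fit_gmm_1d` and the quantile-based initialization are hypothetical choices made to keep the sketch deterministic.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, max_iterations=100, convergence_tol=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialization (an assumption for this sketch): spread the
    # initial means across the sorted data and share the overall variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    prev_ll = None
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each point, plus the
        # log-likelihood of the data under the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)  # assumed > 0 (no empty component)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing

        # Stop once the log-likelihood changes by less than the tolerance.
        if prev_ll is not None and abs(ll - prev_ll) < convergence_tol:
            break
        prev_ll = ll

    return weights, means, variances
```

On two well-separated clusters such as `[-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]` with `k=2`, the fitted means land close to 0 and 10 and both weights close to 0.5.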

@AmplabJenkins

Can one of the admins verify this patch?

@mengxr
Contributor

mengxr commented Jan 15, 2015

add to whitelist

@SparkQA

SparkQA commented Jan 15, 2015

Test build #25609 has started for PR 4059 at commit 5c83825.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 15, 2015

Test build #25609 has finished for PR 4059 at commit 5c83825.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel(object):
    • class GaussianMixtureEM(object):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25609/

@jkbradley
Member

@FlytxtRnD Are you still running into the py4j serialization issue you mentioned on the JIRA?

@FlytxtRnD
Contributor Author

@jkbradley The py4j serialization issue has been solved by commit 8ead999.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25634 has started for PR 4059 at commit f82750b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25634 has finished for PR 4059 at commit f82750b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel(object):
    • class GaussianMixtureEM(object):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25634/

…nGmmWrapper' into PythonGmmWrapper

Conflicts:
	examples/src/main/python/mllib/gaussian_mixture_model.py
@SparkQA

SparkQA commented Jan 16, 2015

Test build #25648 has started for PR 4059 at commit c1d4c71.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25648 has finished for PR 4059 at commit c1d4c71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel(object):
    • class GaussianMixtureEM(object):

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25648/

import random
import argparse
import numpy as np
from pyspark import SparkConf, SparkContext
Contributor

Separate Spark imports from Python imports.

@FlytxtRnD
Contributor Author

@mengxr Thank you for the review and comments. I am changing the code according to #3923 (tgaloppo).

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26509 has started for PR 4059 at commit d5b36ab.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26509 has finished for PR 4059 at commit d5b36ab.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel(object):
    • class GaussianMixture(object):
    • class MultivariateGaussian(object):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26509/

@FlytxtRnD
Contributor Author

Is it possible to start a test build in Jenkins without updating the PR? Test Results shows "no failures", but the console output shows errors. All tests passed when run locally.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26515 has started for PR 4059 at commit fa0a142.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26515 has finished for PR 4059 at commit fa0a142.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26515/

@FlytxtRnD
Contributor Author

Please review and merge.

.setK(k)
.setConvergenceTol(convergenceTol)
.setMaxIterations(maxIterations)
.setSeed(seed)
Contributor

The default seed is not generated. If users don't specify a seed on the Python side, it will be passed in as null on the JVM side, so set it only when it is provided:

if (seed != null) gmmAlg.setSeed(seed)

Contributor Author

On the Python side I have added a default value of None. Is it still required to add this statement?
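The reviewer's point is that a Python default of `None` does not remove the need for the JVM-side check: Py4J delivers `None` as `null`, so an unconditional `setSeed(seed)` still receives `null`. A minimal Python sketch of the guarded pattern follows. `JvmGmmBuilder` and `train_gmm_wrapper` are hypothetical names standing in for the JVM builder and the wrapper method; `DEFAULT_SEED` is a placeholder for whatever seed the real builder picks on its own, and rejecting `None` in `set_seed` is an assumption that models a setter taking a primitive `long`.

```python
class JvmGmmBuilder:
    """Hypothetical stand-in for the JVM builder that Py4J would hand back."""

    # Placeholder for whatever default seed the real builder picks on its own.
    DEFAULT_SEED = 42

    def __init__(self):
        self.k = 2
        self.convergence_tol = 0.01
        self.max_iterations = 100
        self.seed = self.DEFAULT_SEED

    def set_k(self, k):
        self.k = k
        return self

    def set_convergence_tol(self, tol):
        self.convergence_tol = tol
        return self

    def set_max_iterations(self, n):
        self.max_iterations = n
        return self

    def set_seed(self, seed):
        # Models a setter taking a primitive long: a null argument cannot be
        # converted, so it is rejected instead of silently accepted.
        if seed is None:
            raise TypeError("seed must not be null")
        self.seed = seed
        return self


def train_gmm_wrapper(k, convergence_tol=1e-3, max_iterations=100, seed=None):
    """Hypothetical wrapper; seed=None means 'let the builder choose'."""
    builder = (JvmGmmBuilder()
               .set_k(k)
               .set_convergence_tol(convergence_tol)
               .set_max_iterations(max_iterations))
    # Python analogue of `if (seed != null) gmmAlg.setSeed(seed)`: the None
    # default from the Python signature still reaches this point, so guard it.
    if seed is not None:
        builder.set_seed(seed)
    return builder
```

With the guard, `train_gmm_wrapper(3).seed` keeps the builder's default, while `train_gmm_wrapper(3, seed=7).seed` is 7; without it, the default call would hit `set_seed(None)` and fail.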

@SparkQA

SparkQA commented Feb 3, 2015

Test build #26607 has started for PR 4059 at commit c973ab3.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 3, 2015

Test build #26607 has finished for PR 4059 at commit c973ab3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel(object):
    • class GaussianMixture(object):
    • class MultivariateGaussian(namedtuple('MultivariateGaussian', ['mu', 'sigma'])):
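The final version of the patch declares `MultivariateGaussian` as a subclass of a `namedtuple` with `mu` and `sigma` fields, which yields an immutable, tuple-unpackable record that can still carry methods. A minimal sketch of that pattern follows, assuming NumPy arrays for the mean vector and covariance matrix. The `pdf` method is a hypothetical addition for illustration and is not claimed to exist in the class listed by the build.

```python
from collections import namedtuple

import numpy as np


class MultivariateGaussian(namedtuple('MultivariateGaussian', ['mu', 'sigma'])):
    """Immutable (mu, sigma) record; subclassing lets us attach methods."""

    # No per-instance __dict__, keeping namedtuple's lightweight layout.
    __slots__ = ()

    def pdf(self, x):
        """Hypothetical helper: density of N(mu, sigma) at point x."""
        mu = np.asarray(self.mu, dtype=float)
        sigma = np.asarray(self.sigma, dtype=float)
        diff = np.asarray(x, dtype=float) - mu
        d = mu.shape[0]
        # Solve sigma * y = diff rather than forming the explicit inverse.
        mahalanobis = diff.dot(np.linalg.solve(sigma, diff))
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
        return float(np.exp(-0.5 * mahalanobis) / norm)
```

For example, `MultivariateGaussian(np.zeros(2), np.eye(2))` unpacks as `mu, sigma = g`, rejects assignment to `g.mu`, and its `pdf` at the origin equals 1/(2π).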

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26607/

@mengxr
Contributor

mengxr commented Feb 3, 2015

LGTM. Merged into master. Thanks for adding GMM Python API!

@asfgit asfgit closed this in 50a1a87 Feb 3, 2015
@FlytxtRnD
Contributor Author

Thanks @mengxr for your help.

5 participants