[SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py #32533

zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented May 13, 2021

What changes were proposed in this pull request?

This PR removes the check of summary.logLikelihood in ml/clustering.py, because this GMM test is quite flaky. It fails easily if we, e.g.:

  • change the number of partitions;
  • merely change how the sum of weights is computed;
  • change the underlying BLAS implementation.
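To illustrate why an exact check like this is fragile, here is a minimal sketch of the first failure mode. It assumes a simplified model in which each partition sums its rows left to right and the partial sums are then added together; this is not Spark's actual aggregation code, and the partitioned_sum helper and the input values are hypothetical. Because floating-point addition is not associative, the same values can produce different totals depending on how they are split:

```python
import math


def partitioned_sum(values, num_partitions):
    """Hypothetical helper: split `values` into contiguous chunks, sum each
    chunk left to right, then sum the per-chunk partial results. This is a
    simplified stand-in for a distributed aggregation, not Spark's code."""
    size = math.ceil(len(values) / num_partitions)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [sum(chunk) for chunk in chunks]
    return sum(partials)


# Hypothetical per-row contributions with very different magnitudes.
values = [1.0, 1e20, -1e20, 1.0]

print(partitioned_sum(values, 1))  # 1.0
print(partitioned_sum(values, 2))  # 0.0
print(math.fsum(values))           # 2.0 (correctly rounded reference sum)
```

A statistic accumulated over many rows, such as a log-likelihood, is exposed to the same effect, so a doctest that pins its exact printed value can break without any real change in behavior.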

It also uses a more permissive precision in the Word2Vec test case.
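As a rough illustration of what a more permissive precision buys, the sketch below compares the leading vector components from the two Word2Vec runs quoted in the failure report later in this thread. The PR itself achieves this by truncating the expected doctest output rather than by calling math.isclose, and the relative tolerance of 1e-5 is an arbitrary value chosen for illustration:

```python
import math

# Leading component of each word vector, copied from the truncated
# doctest output in the Word2Vec failure report (expected vs. got).
expected = {"a": 0.09511678665876, "b": -1.2028766870498, "c": 0.30153277516365}
got = {"a": 0.09511695802211, "b": -1.2028766870498, "c": 0.30153274536132}

# An exact comparison fails because of tiny floating-point differences.
exact_match = all(expected[w] == got[w] for w in expected)

# A relative tolerance (1e-5, chosen only for illustration) accepts both runs.
tolerant_match = all(
    math.isclose(expected[w], got[w], rel_tol=1e-5) for w in expected
)

print(exact_match, tolerant_match)  # False True
```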

Why are the changes needed?

To recover the build and tests.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test cases.

@zhengruifeng
Contributor Author

ping @HyukjinKwon @srowen @viirya

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM if tests pass

@HyukjinKwon HyukjinKwon changed the title [SPARK-35392][ML][PYTHON] remove Flaky GMM Test in ml/clustering.py [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py May 13, 2021
@SparkQA

SparkQA commented May 13, 2021

Test build #138498 has finished for PR 32533 at commit 0081246.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43018/

@SparkQA

SparkQA commented May 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43018/

@HyukjinKwon
Member

@zhengruifeng would you mind fixing:

**********************************************************************
File "/__w/spark/spark/python/pyspark/ml/feature.py", line 4681, in __main__.Word2Vec
Failed example:
    model.getVectors().show()
Expected:
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.09511678665876...|
    |   b|[-1.2028766870498...|
    |   c|[0.30153277516365...|
    +----+--------------------+
    ...
Got:
    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.09511695802211...|
    |   b|[-1.2028766870498...|
    |   c|[0.30153274536132...|
    +----+--------------------+
    <BLANKLINE>
**********************************************************************

too? Feel free to change the JIRA.

I think we can just fix it like:

    +----+--------------------+
    |word|              vector|
    +----+--------------------+
    |   a|[0.0951...
    |   b|[-1.202...
    |   c|[0.3015...
    +----+--------------------+
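This kind of fix relies on doctest's ELLIPSIS option, under which "..." in the expected output matches any substring. The sketch below assumes the PySpark doctests run with that option enabled (the trailing "..." in the expected output above suggests they do) and uses the standard library's doctest.OutputChecker directly to compare one line of truncated expected output against the digits actually printed:

```python
import doctest

# One line of the Word2Vec output as actually printed by a run
# (digits copied from the failure report above).
got = "|   a|[0.09511695802211...|\n"

# Expected text truncated after a few digits; with ELLIPSIS enabled,
# "..." matches the remaining digits and the rest of the line.
want = "|   a|[0.0951...\n"

checker = doctest.OutputChecker()
strict_match = checker.check_output(want, got, 0)
ellipsis_match = checker.check_output(want, got, doctest.ELLIPSIS)

print(strict_match, ellipsis_match)  # False True
```

The ellipsis should follow the digits directly: any character placed before "..." in the expected text, including a space, has to appear literally in the actual output.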

@HyukjinKwon HyukjinKwon changed the title [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py [SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py May 13, 2021
Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@SparkQA

SparkQA commented May 13, 2021

Test build #138510 has finished for PR 32533 at commit a1fd16f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43030/

@SparkQA

SparkQA commented May 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43030/

@HyukjinKwon
Member

Merged to master.

@HyukjinKwon
Member

Thanks @zhengruifeng for fixing this!

@dongjoon-hyun
Member

Yes, two birds with one stone! Nice!

@zhengruifeng zhengruifeng deleted the SPARK_35392_disable_flaky_gmm_test branch May 14, 2021 00:35