[SPARK-28736][SPARK-28735][PYTHON][ML][TESTS] Fix PySpark ML tests to pass in JDK 11 #25475

Closed

@HyukjinKwon
Member

commented Aug 16, 2019

What changes were proposed in this pull request?

This PR proposes to fix both tests below:

======================================================================
FAIL: test_raw_and_probability_prediction (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dongjoon/APACHE/spark-master/python/pyspark/ml/tests/test_algorithms.py", line 89, in test_raw_and_probability_prediction
    self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1E-4))
AssertionError: False is not true
File "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", line 386, in __main__.GaussianMixtureModel
Failed example:
    abs(softPredicted[0] - 1.0) < 0.001
Expected:
    True
Got:
    False
**********************************************************************
File "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", line 388, in __main__.GaussianMixtureModel
Failed example:
    abs(softPredicted[1] - 0.0) < 0.001
Expected:
    True
Got:
    False

to pass in JDK 11.

The root cause appears to be that float values are interpreted differently when they pass through Py4J. The same issue was found earlier in #25132.

When floats are transferred from Python to the JVM, the values are sent as they are. Python floats are not "precise" because of the limitations of binary floating point (https://docs.python.org/3/tutorial/floatingpoint.html).
For some reason, the floats coming from Python differ between JDK 8 and JDK 11, and identical values are explicitly not guaranteed anyway.
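
To illustrate the kind of imprecision meant here, a minimal Python sketch (not part of this patch; it only uses the standard library) showing that a decimal literal is stored as the nearest binary double, so the value handed to the JVM is already an approximation:

```python
from decimal import Decimal

# 0.1 has no exact binary representation; Decimal reveals the value actually stored.
stored = Decimal(0.1)
print(stored)  # slightly larger than 0.1

# Arithmetic on such values accumulates rounding error.
total = 0.1 + 0.2
print(total == 0.3)             # False
print(abs(total - 0.3) < 1e-9)  # True: compare with a tolerance instead
```

This is why float assertions in tests are usually written with a tolerance rather than exact equality.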

This seems to be why only some PySpark tests involving floats are failing.

So, this PR fixes it by increasing the tolerance in the identified PySpark test cases.

Why are the changes needed?

To fully support JDK 11. See, for instance, #25443 and #25423 for ongoing efforts.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested as described in JIRAs:

$ build/sbt -Phadoop-3.2 test:package
$ python/run-tests --testnames 'pyspark.ml.tests.test_algorithms' --python-executables python
$ build/sbt -Phadoop-3.2 test:package
$ python/run-tests --testnames 'pyspark.mllib.clustering' --python-executables python
@HyukjinKwon

Member Author

commented Aug 16, 2019

cc @WeichenXu123, @srowen, @dongjoon-hyun, this fixes PySpark tests on JDK 11.

@dongjoon-hyun

Member

commented Aug 16, 2019

Wow. Thank you, @HyukjinKwon !

@@ -383,11 +383,11 @@ class GaussianMixtureModel(JavaModelWrapper, JavaSaveable, JavaLoader):
>>> model.predict([-0.1,-0.05])
0
>>> softPredicted = model.predictSoft([-0.1,-0.05])

@HyukjinKwon

HyukjinKwon Aug 16, 2019

Author Member

For instance, weights within Gaussian mixture model:

JDK 8

weights: WrappedArray(0.49520257460263445, 0.33813075873069875, 0.16666666666666685)

JDK 11

weights: WrappedArray(0.5000000000000001, 0.33333333333333326, 0.16666666666666666)
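
A quick check (illustrative only, using the weights quoted above) suggests the two runs differ by roughly 5e-3 per component, which is larger than the 0.001 tolerance the doctest applied to the soft predictions. Weights and soft predictions are different quantities, so this only indicates the scale of the drift:

```python
# Weights copied from the JDK 8 and JDK 11 output quoted above.
jdk8_weights = [0.49520257460263445, 0.33813075873069875, 0.16666666666666685]
jdk11_weights = [0.5000000000000001, 0.33333333333333326, 0.16666666666666666]

# Largest element-wise gap between the two runs.
max_gap = max(abs(a - b) for a, b in zip(jdk8_weights, jdk11_weights))
print(max_gap)

# Both weight vectors still sum to 1 within rounding error.
print(abs(sum(jdk8_weights) - 1.0) < 1e-12)
print(abs(sum(jdk11_weights) - 1.0) < 1e-12)
```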

@srowen

srowen Aug 16, 2019

Member

Also probably OK for the same reason. The test was too specific.

@SparkQA


commented Aug 16, 2019

Test build #109210 has finished for PR 25475 at commit 0720268.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@@ -86,7 +86,7 @@ def test_raw_and_probability_prediction(self):
expected_rawPrediction = [-11.6081922998, -8.15827998691, 22.17757045]
self.assertTrue(result.prediction, expected_prediction)
self.assertTrue(np.allclose(result.probability, expected_probability, atol=1E-4))
-self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1E-4))
+self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1))

@dongjoon-hyun

dongjoon-hyun Aug 16, 2019

Member

Is 1 the minimum difference?

@HyukjinKwon

HyukjinKwon Aug 16, 2019

Author Member

Yup ..

JDK 8:

[-11.19194106875243,-7.677866573997363,21.280214474039443]

JDK 11:

[-11.608192299802019,-8.158279986906651,22.177570449962918]

It seems several floats affect the results, although the values are still roughly correct.
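
As a rough check of why `atol=1` is needed, here is a sketch that compares the two vectors above. The `allclose` helper is a hypothetical pure-Python stand-in that follows the documented `np.allclose` rule `|a - b| <= atol + rtol * |b|` with the default `rtol=1e-5`; it is not the NumPy implementation itself:

```python
def allclose(actual, expected, atol, rtol=1e-5):
    """Hypothetical pure-Python stand-in for np.allclose on 1-D sequences."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))

# rawPrediction vectors as labelled in the comment above.
raw_jdk8 = [-11.19194106875243, -7.677866573997363, 21.280214474039443]
raw_jdk11 = [-11.608192299802019, -8.158279986906651, 22.177570449962918]

# Largest element-wise gap is just under 0.9.
largest_gap = max(abs(a - b) for a, b in zip(raw_jdk8, raw_jdk11))
print(largest_gap)

print(allclose(raw_jdk8, raw_jdk11, atol=1e-4))  # False
print(allclose(raw_jdk8, raw_jdk11, atol=1))     # True
```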

@srowen

srowen Aug 16, 2019

Member

I'm not sure where the difference comes from, but it could be subtle differences in randomization or something across the JDKs. If these two tests are the only ones that vary, I think we're OK. I agree with loosening the bound here as these are log-odds, and I suspect the test values were picked just because it's what some previous run spit out (that is, it's too specific)
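
To make the log-odds point concrete, a small sketch under the assumption (made for this illustration, not stated in this thread) that the probability column is a softmax over rawPrediction. Under that assumption, the two rawPrediction vectors reported above map to practically the same probabilities, which would be consistent with the probability assertion staying at `atol=1E-4`:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# rawPrediction vectors reported for JDK 8 and JDK 11 in this thread.
logits_jdk8 = [-11.19194106875243, -7.677866573997363, 21.280214474039443]
logits_jdk11 = [-11.608192299802019, -8.158279986906651, 22.177570449962918]

prob_jdk8 = softmax(logits_jdk8)
prob_jdk11 = softmax(logits_jdk11)

# The log-odds differ by almost 0.9, but the probabilities barely move.
max_prob_gap = max(abs(p - q) for p, q in zip(prob_jdk8, prob_jdk11))
print(prob_jdk8)
print(prob_jdk11)
print(max_prob_gap)
```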

@dongjoon-hyun

Member

commented Aug 16, 2019

+1. This PR looks reasonable and good to me.

@HyukjinKwon

Member Author

commented Aug 16, 2019

I'm going to just merge it. This is a test-only PR and can always be fixed later. I also roughly checked with @WeichenXu123 offline.

@HyukjinKwon

Member Author

commented Aug 16, 2019

Merged to master.

@HyukjinKwon changed the title from "[SPARK-28736][SPARK-28735][PYTHON][ML] Fix PySpark ML tests to pass in JDK 11" to "[SPARK-28736][SPARK-28735][PYTHON][ML][TESTS] Fix PySpark ML tests to pass in JDK 11" on Aug 16, 2019