[SPARK-26646][TEST][PySpark] Fix flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction #23586

viirya · 2019-01-18T15:08:31Z

What changes were proposed in this pull request?

The test pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction is sometimes flaky:

======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition, timeout=60.0)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 69, in _eventually
    lastValue = condition()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 362, in condition
    self.assertGreater(errors[1] - errors[-1], 0.3)
AssertionError: -0.070000000000000062 not greater than 0.3

----------------------------------------------------------------------
Ran 13 tests in 198.327s

FAILED (failures=1, skipped=1)

Had test failures in pyspark.mllib.tests.test_streaming_algorithms with python3.4; see logs

The predict stream can be consumed to the end before the input stream. When that happens, the model improves less than expected and the test fails. This patch increases the number of batches in the streams. This won't increase the test time, because the test already has a timeout.
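To illustrate why extra batches need not lengthen the test, here is a minimal sketch of a polling helper in the spirit of the `_eventually(condition, timeout=60.0)` call in the traceback. It is not the actual helper from the Spark test suite: the names `eventually`, `interval` and `simulated_errors`, and the error values, are made up for the example. The helper returns as soon as the condition holds, so more batches only give the condition more chances to pass within the same timeout.

```python
import time


def eventually(condition, timeout=30.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    An AssertionError raised by `condition` is treated as "not ready yet"
    and is re-raised only once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if condition():
                return
            last_error = AssertionError("condition returned a falsy value")
        except AssertionError as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise last_error
        time.sleep(interval)


# Hypothetical per-batch prediction error rates; one value "arrives" per poll.
simulated_errors = iter([0.6, 0.55, 0.5, 0.3, 0.2, 0.15])
errors = []


def condition():
    batch_error = next(simulated_errors, None)
    if batch_error is not None:
        errors.append(batch_error)
    if len(errors) < 2:
        return False
    # Same shape as the check in the traceback: error must drop by more than 0.3.
    assert errors[1] - errors[-1] > 0.3, "model has not improved enough yet"
    return True


# Returns as soon as the improvement is large enough, well before the timeout.
eventually(condition, timeout=5.0)
```

If the simulated stream ended after only three values, the condition could never pass and the helper would fail at the timeout, which mirrors the failure mode described above.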

How was this patch tested?

Tested manually.

viirya · 2019-01-18T15:09:12Z

cc @HyukjinKwon

SparkQA · 2019-01-18T15:30:12Z

Test build #101404 has finished for PR 23586 at commit 5fb24ce.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-01-18T15:52:33Z

Merged to master! Thanks @viirya!

viirya · 2019-01-18T15:54:26Z

Thanks @HyukjinKwon @srowen

…_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

## What changes were proposed in this pull request?

The test pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction is sometimes flaky.

```
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition, timeout=60.0)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 69, in _eventually
    lastValue = condition()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 362, in condition
    self.assertGreater(errors[1] - errors[-1], 0.3)
AssertionError: -0.070000000000000062 not greater than 0.3

----------------------------------------------------------------------
Ran 13 tests in 198.327s

FAILED (failures=1, skipped=1)

Had test failures in pyspark.mllib.tests.test_streaming_algorithms with python3.4; see logs
```

The predict stream can be consumed to the end before the input stream. When that happens, the model improves less than expected and the test fails. This patch increases the number of batches in the streams. This won't increase the test time, because the test already has a timeout.

## How was this patch tested?

Tested manually.

Closes apache#23586 from viirya/SPARK-26646.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Fix flaky test.

5fb24ce

HyukjinKwon approved these changes Jan 18, 2019

srowen approved these changes Jan 18, 2019

asfgit closed this in 8503aa3 Jan 18, 2019

ulysses-you mentioned this pull request Oct 17, 2020

[SPARK-33131][SQL][2.4] Fix grouping sets with having clause can not resolve qualified col name #30075

Closed

viirya deleted the SPARK-26646 branch December 27, 2023 18:36
