-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5161] Parallelize Python test execution #7031
Conversation
Test build #35820 has finished for PR 7031 at commit
|
Test build #35823 has finished for PR 7031 at commit
|
Test build #35834 has finished for PR 7031 at commit
|
Test build #35838 has finished for PR 7031 at commit
|
Interesting, it looks like the test failures are not a direct consequence of the parallelization since they still occur even if I only use one thread: ======================================================================
ERROR: test_termination_sigterm (__main__.DaemonTests)
Ensure that daemon and workers terminate on SIGTERM.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/tests.py", line 1451, in test_termination_sigterm
self.do_termination_test(lambda daemon: os.kill(daemon.pid, SIGTERM))
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/tests.py", line 1424, in do_termination_test
daemon = Popen([sys.executable, daemon_path], stdin=PIPE, stdout=PIPE)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
======================================================================
ERROR: test_termination_stdin (__main__.DaemonTests)
Ensure that daemon and workers terminate when stdin is closed.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/tests.py", line 1446, in test_termination_stdin
self.do_termination_test(lambda daemon: daemon.stdin.close())
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/tests.py", line 1424, in do_termination_test
daemon = Popen([sys.executable, daemon_path], stdin=PIPE, stdout=PIPE)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
----------------------------------------------------------------------
Ran 97 tests in 102.677s
FAILED (errors=2)
Random listing order was used |
Also, it looks like my attempt to disable the Web UI in the Python tests was unsuccessful, since the
|
Test build #35841 has finished for PR 7031 at commit
|
Test build #35847 has finished for PR 7031 at commit
|
These tests are now failing since we're now attempting to run the PySpark ML tests with PyPy, but that won't work because we don't have numpy for PyPy. Let me see if I can come up with a reasonably clean way to handle this. |
Test build #35866 has finished for PR 7031 at commit
|
Jenkins, retest this please. |
Test build #35871 has finished for PR 7031 at commit
|
Test build #35880 has finished for PR 7031 at commit
|
Test build #35877 has finished for PR 7031 at commit
|
Test build #35883 has finished for PR 7031 at commit
|
Test build #35890 has finished for PR 7031 at commit
|
Yay, this passed tests! I think that we should merge my other PR, #6967, first, then loop back to review the parallelism-specific changes here. With 4 threads / processes of parallelism, the Python tests ran in ~10 minutes, whereas they usually take 30+ minutes in Jenkins. In principle, + our tests will support a much higher level of parallelism, so I bet we could cut this down to 5 minutes or less. |
It's also worth figuring out why the Spark UI doesn't seem to be disabled properly according to |
I wonder if I can just add a |
Jenkins, retest this please. |
Test build #35925 has finished for PR 7031 at commit
|
Looks like that didn't help:
|
Jenkins, retest this please. |
Test build #35927 has finished for PR 7031 at commit
|
Jenkins, retest this please. |
Test build #36011 has finished for PR 7031 at commit
|
duration = time.time() - start_time | ||
with LOG_FILE_LOCK: | ||
with open(LOG_FILE, 'ab') as log_file: | ||
per_test_output.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: only need to print the log when the test fails.
LGTM, once you finished all the TODOs. |
Test build #36056 has finished for PR 7031 at commit
|
Test build #36057 has finished for PR 7031 at commit
|
Jenkins, retest this please. |
Test build #36062 has finished for PR 7031 at commit
|
Jenkins retest this please |
Test build #36063 has finished for PR 7031 at commit
|
Hey @davies, I've addressed the TODOs, so this should be ready for a final look and sign-off. |
LGTM, merging this into master, thanks! Right now, all the python tests will finished in about 10 minutes, cool! |
This patch fixes a bug introduced in #7031 which can cause Jenkins to incorrectly report a build with failed Python tests as passing if an error occurred while printing the test failure message. Author: Josh Rosen <joshrosen@databricks.com> Closes #7112 from JoshRosen/python-tests-hotfix and squashes the following commits: c3f2961 [Josh Rosen] Hotfix for bug in Python test failure reporting
Test build #981 timed out for PR 7031 at commit |
This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times. Parallelism is now configurable by passing the
-p
or--parallelism
flags to eitherdev/run-tests
orpython/run-tests
(the default parallelism is 4, but I've successfully tested with higher parallelism).To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.