
[SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python #5694

Closed
wants to merge 57 commits into apache:master from brennonyork:SPARK-7017

Conversation

brennonyork

All, this is a first attempt at refactoring dev/run-tests into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!

/cc @srowen @pwendell @nchammas

@SparkQA

SparkQA commented Apr 25, 2015

Test build #30950 has started for PR 5694 at commit 3c53a1a.

@SparkQA

SparkQA commented Apr 25, 2015

Test build #30950 has finished for PR 5694 at commit 3c53a1a.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
  • This patch removes the following dependencies:
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar

import os
import subprocess as sp

# Set the Spark project root directory
spark_proj_root = os.path.abspath("..")
Contributor

The variable names and logic here are very clear, so I think we can do without all the comments in this whole starting block.

@nchammas
Contributor

Looks like a great start! Left some comments mostly about Python style and organization. Will take a closer look next week at the actual logic and flow.

try:
    sp.check_output(cmd)
except sp.CalledProcessError as e:
    print "[error] running", e.cmd, "; received return code", e.returncode
    exit(e.returncode)
Contributor

Don't you need to import sys and call this as sys.exit()?
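For reference, a sketch of the fix this question implies, modernized to Python 3 (`run_cmd` is a hypothetical wrapper name, not necessarily the one used in the PR):

```python
import subprocess as sp
import sys

def run_cmd(cmd):
    """Run cmd, exiting the script with the child's return code on failure."""
    try:
        sp.check_output(cmd)
    except sp.CalledProcessError as e:
        print("[error] running", e.cmd, "; received return code", e.returncode)
        # The bare exit() builtin is injected by the site module and may be
        # absent in some environments; sys.exit() is always available.
        sys.exit(e.returncode)
```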

@rxin
Contributor

rxin commented Apr 25, 2015

Just a naming suggestion - would be better to have an explicit .py file, and the shell file just calls that.


def set_title_and_block(title, err_block):
    os.environ["CURRENT_BLOCK"] = error_codes[err_block]
    line_str = "".join(['='] * 72)
Contributor

This can just be:

line_str = '=' * 72
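Applied to the excerpt above, with `error_codes` passed in explicitly so the sketch is self-contained (in the PR it is a global):

```python
import os

def set_title_and_block(title, err_block, error_codes):
    # Record the current block for run-tests-jenkins via the environment,
    # as the surrounding discussion describes.
    os.environ["CURRENT_BLOCK"] = str(error_codes[err_block])
    line_str = '=' * 72  # the reviewer's simplification of "".join(['='] * 72)
    return line_str
```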

@nchammas
Contributor

In the near future, I guess we want to move towards also converting the run-tests-jenkins script to Python so we can get rid of some nasty Bash-isms that have (necessarily) carried over into this port, like communicating via environment variables (e.g. the current testing block).

"""Will call Maven in the current directory with the list of mvn_args passed
in and returns the subprocess for any further processing"""

return subprocess.Popen(["./build/mvn"] + mvn_args)
Contributor

Is there any reason to not just wait() here? It seems like that's what we're always doing with the return value.

Also, if we want to raise an exception when this fails (which I think we do), we should change Popen() to check_call(), in which case no need for the wait() either.
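A sketch of the suggested change; the optional `mvn` parameter is added here only to make the sketch testable, while the real script always invokes `./build/mvn`:

```python
import subprocess

def exec_maven(mvn_args, mvn="./build/mvn"):
    """Run the Maven wrapper; raises CalledProcessError on a non-zero exit,
    so no Popen()/wait() pair is needed."""
    subprocess.check_call([mvn] + mvn_args)
```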

@brennonyork
Author

@rxin Thanks and taken care of!
@nchammas First, thanks a ton for all the Python reviews (I know it can be tedious)! Second, to your point about removing the Bash-isms, you're completely right in that I left them in for this PR such that we can get incremental improvement to the codebase. Once I tackle SPARK-7018 (e.g. dev/run-tests-jenkins) I think I'll be able to slowly move some of this old Bash necessity out. Feedback on that being the right path?

@nchammas
Contributor

Yup, that sounds fine. We definitely don't have to tackle everything in one go, but eventually yeah we should phase out all the Bash-isms.

By the way, I don't see any recent updates from Jenkins on testing this PR since Jenkins is currently shutting down.

Just a heads up: Jenkins adds an additional layer of complexity in that it messes with the terminal and can cause weird problems with output buffering and whatnot.

So you'll definitely want to track how Jenkins is outputting results to screen as it runs tests to be sure things are working as expected.

@brennonyork
Author

Roger that. Let me look into using pipes.quote for the sbt output. Do we know what's up with Jenkins right now? I saw a thread a while back talking about a power outage at Berkeley, but thought I saw a message from Shane saying everything was back to normal. Is that not the case?

@nchammas
Contributor

https://amplab.cs.berkeley.edu/jenkins/view/Spark/

Looks like planned maintenance?

@brennonyork
Author

Roger. I see it now. Will have a fix up shortly.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #35009 has started for PR 5694 at commit 154ed73.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #35008 has finished for PR 5694 at commit 3922a85.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@brennonyork
Author

jenkins, retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #35010 has started for PR 5694 at commit 154ed73.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #35009 has finished for PR 5694 at commit 154ed73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@SparkQA

SparkQA commented Jun 17, 2015

Test build #35010 has finished for PR 5694 at commit 154ed73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

doc_files]])
changed_core_files = set(changed_files).difference(top_level_project_files)

if changed_core_files:
Contributor

It would also be nice to log a "Detected changes in Core" message in this branch.
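A sketch of that branch with the suggested log line added; `detect_core_changes` is a hypothetical helper name, since in the script this logic is inline:

```python
def detect_core_changes(changed_files, top_level_project_files):
    """Return changed files outside the top-level project files,
    printing an [info] line when core changes are detected."""
    changed_core_files = set(changed_files).difference(top_level_project_files)
    if changed_core_files:
        print("[info] Detected changes in Core.")
    return changed_core_files
```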

@JoshRosen
Contributor

Some minor nits aside, this looks good enough to me, so I'm going to merge it into master and will submit a series of followup PRs to perform more modularization / refactoring of the dependency graph logic.

Thanks for working on this!

@asfgit asfgit closed this in 50a0496 Jun 17, 2015
@brennonyork
Author

Thanks for the PR merge @JoshRosen. I'll go ahead and make a hotfix branch to capture the last few nits you have above!


def run_scala_tests_sbt(test_modules, test_profiles):
    # declare the variable for reference
    sbt_test_goals = None
Contributor

We'll end up concatenating a list to this if we only run the MLLib tests in the code below. This breaks PRB tests for MLLib, Streaming, and GraphX only PRs.

I'm considering reverting this patch or pushing a hotfix to address this.

Contributor

Hotfixed in 165f52f

Author

Hey Josh, I'm out of pocket until tomorrow morning. Can you get a hot fix in for this? Immediately I'm not seeing what the issue is / code to fix it :/ I can figure it out tomorrow morning though if it can wait.

-----Original Message-----
From: Josh Rosen [notifications@github.com]
Sent: Wednesday, June 17, 2015 09:58 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python (#5694)

In dev/run-tests.py (https://github.com/apache/spark/pull/5694#discussion_r32693784):

         return changed_modules

    +def run_scala_tests_maven(test_profiles):
    +    mvn_test_goals = ["test", "--fail-at-end"]
    +    profiles_and_goals = test_profiles + mvn_test_goals
    +    print "[info] Running Spark tests using Maven with these arguments:",
    +    print " ".join(profiles_and_goals)
    +    exec_maven(profiles_and_goals)
    +
    +def run_scala_tests_sbt(test_modules, test_profiles):
    +    # declare the variable for reference
    +    sbt_test_goals = None

We'll end up concatenating a list to this if we only run the MLLib tests in the code below. This breaks PRB tests for MLLib, Streaming, and GraphX only PRs.

I'm considering reverting this patch or pushing a hotfix to address this.


Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/5694/files#r32693784.



Contributor

The issue here was that some of the branches in the following code do sbt_test_goals += [..something..] which will fail if sbt_test_goals == None.

I pushed a hotfix which fixed this.
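The shape of that fix, sketched here with a hypothetical `goals_by_module` mapping (the actual code uses per-module if branches rather than a dict):

```python
def collect_sbt_test_goals(test_modules, goals_by_module):
    # Start from an empty list, not None: [] += [...] works in every
    # branch, while None += [...] raises TypeError.
    sbt_test_goals = []
    for module in test_modules:
        sbt_test_goals += goals_by_module.get(module, [])
    return sbt_test_goals
```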

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!

/cc srowen pwendell nchammas

Author: Brennon York <brennon.york@capitalone.com>

Closes apache#5694 from brennonyork/SPARK-7017 and squashes the following commits:

154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
3922a85 [Brennon York] removed necessary passed in variable
f9fbe54 [Brennon York] reverted doc test change
8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
05d435b [Brennon York] added check for jekyll install
22edb78 [Brennon York] add check if jekyll isn't installed on the path
2dff136 [Brennon York] fixed pep8 whitespace errors
767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
c42cf9a [Brennon York] unpack set operations with splat (*)
fb85a41 [Brennon York] fixed minor set bug
0379833 [Brennon York] minor doc addition to print the changed modules
aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
ec1ae78 [Brennon York] minor name changes, bug fixes
b7c72b9 [Brennon York] reverting streaming context
03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
705d12e [Brennon York] changed example to comply with pep3113 supporting python3
60b3d51 [Brennon York] prepend rather than append onto PATH
7d2f5e2 [Brennon York] updated python tests to remove unused variable
2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
eb684b6 [Brennon York] fixed sbt_test_goals reference error
db7ae6f [Brennon York] reverted SPARK_HOME from start of command
1ecca26 [Brennon York] fixed merge conflicts
2fcdfc0 [Brennon York] testing targte branch dump on jenkins
1f607b1 [Brennon York] finalizing revisions to modular tests
8afbe93 [Brennon York] made error codes a global
0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
b1248dc [Brennon York] exec python rather than running python and exiting with return code
f9deba1 [Brennon York] python to python2 and removed newline
6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
703f095 [Brennon York] fixed merge conflicts
b1ca593 [Brennon York] reverted the sparkR test
afeb093 [Brennon York] updated to make sparkR test fail
1dada6b [Brennon York] reverted pyspark test failure
9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
d825aa4 [Brennon York] revert build break, add mima break
f041d8a [Brennon York] added space from commented import to now test build breaking
983f2a2 [Brennon York] comment out import to fail build test
2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
76335fb [Brennon York] reverted rat license issue for sparkconf
e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
56d3cb9 [Brennon York] changed test back and commented out import to break compile
b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
7613558 [Brennon York] updated to return the proper env variable for return codes
a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
803143a [Brennon York] removed license file for SparkContext
b0b2604 [Brennon York] comment out import to see if build fails and returns properly
83e80ef [Brennon York] attempt at better python output when called from bash
c095fa6 [Brennon York] removed another wait() call
26e18e8 [Brennon York] removed unnecessary wait()
07210a9 [Brennon York] minor doc string change for java version with namedtuple update
ec03bf3 [Brennon York] added namedtuple for java version to add readability
2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
3c53a1a [Brennon York] uncomment the scala tests :)
6126c4f [Brennon York] refactored run-tests into python

test_modules = set(test_modules)

hive_profiles = ("SQL" in test_modules)
Contributor

I think that this logic isn't quite right for the non-pull-request-builder SBT builds, since we always want to run the SQL tests in non-PRB builds but in those cases "ALL" will be the only test module. I'm going to fix this in my PR, but I think that it's critical that we get a fix for this in soon.
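A sketch of the correction being described (the exact fix landed in the follow-up PR; the module names follow the excerpt above):

```python
def needs_hive_profiles(test_modules):
    """Enable the Hive/SQL profiles when SQL was selected explicitly or
    when everything is being tested, as in non-pull-request-builder builds
    where "ALL" is the only test module."""
    test_modules = set(test_modules)
    return "SQL" in test_modules or "ALL" in test_modules
```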

dbtsai pushed a commit to dbtsai/spark that referenced this pull request Jun 20, 2015
…run-tests

This patch builds upon apache#5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.

This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.

This patch also adds doctests for the new graph traversal / change mapping code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:

75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
4224da5 [Josh Rosen] Add documentation to Module.
a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
e46539f [Josh Rosen] Fix camel-cased endswith()
35a3052 [Josh Rosen] Enable Hive tests when running all tests
df10e23 [Josh Rosen] update to reflect fact that no module depends on root
3670d50 [Josh Rosen] mllib should depend on streaming
dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
43a0ced [Josh Rosen] Minor fixes
3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see apache#6865)
f53864b [Josh Rosen] Finish integrating module changes
f0249bd [Josh Rosen] WIP
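The 'module' abstraction described in that commit message can be sketched roughly as follows; the class and function names here are illustrative, not the actual ones in dev/run-tests.py:

```python
class Module:
    """Groups a source-path prefix with its test goals and downstream modules."""
    def __init__(self, name, file_prefixes, sbt_test_goals):
        self.name = name
        self.file_prefixes = tuple(file_prefixes)
        self.sbt_test_goals = list(sbt_test_goals)
        # Modules that must be retested when this one changes.
        self.dependent_modules = []

def modules_to_test(changed_files, all_modules):
    """Map changed file paths to modules, then take the transitive closure
    over dependent modules to decide which test goals to run."""
    stack = [m for m in all_modules
             if any(f.startswith(p) for f in changed_files for p in m.file_prefixes)]
    selected = set()
    while stack:
        m = stack.pop()
        if m in selected:
            continue
        selected.add(m)
        stack.extend(m.dependent_modules)
    return selected
```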
asfgit pushed a commit that referenced this pull request Jun 29, 2015
…adoop-1 to 1.2.1

PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump Hadoop 1 version defined in POM.

Author: Cheng Lian <lian@databricks.com>

Closes #7062 from liancheng/spark-7845 and squashes the following commits:

c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1