Enable pytest-xdist for TVM CI #8576

areusch · 2021-07-28T17:31:11Z

Enable pytest-xdist to parallelize CI jobs on worker nodes. The thought here is: right now our concurrency model in CI is to stick with 1 CPU per job to reduce debugging headaches and run $(nproc) workers per CI CPU node. Meanwhile, our CI is heterogeneous, so a long ci_cpu-bound step will negatively affect the overall runtime of all PRs. Going the other way, the idea is to run each CI job as fast as possible and let jobs pile into queues where we can see the bottlenecks more clearly. This means that when the queues are drained, devs get faster response times from the CI, and the CPU nodes should still be used optimally (or perhaps get a slight boost, since caching may work better with fewer concurrent workloads).

Restricting to 2 CPUs since this is a test; in the future, CI_PYTEST_NUM_CPUS should be used to control this from the Jenkinsfile.

cc @tqchen @jroesch @Lunderberg

Lunderberg

Changes here look good overall, just a couple of documentation/ordering questions.

Lunderberg · 2021-07-28T19:23:41Z

tests/scripts/setup-pytest-env.sh

 function run_pytest() {
+    local extra_args=( )
+    if [ "$1" == "--parallel" ]; then


Since the documentation currently recommends using task_python_unittest.sh to run the unit tests, we should either make sure that location also has pytest-xdist in the list of packages to install in order to run the tests, or check whether pytest-xdist is installed by wrapping this in if python3 -c "import xdist" > /dev/null 2>&1; then.
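For illustration, the availability check suggested above can also be written in Python rather than shell. This is only a sketch: the helper name `parallel_pytest_args` and its `module_name` parameter are hypothetical and not part of this PR. It relies on `importlib.util.find_spec` from the standard library and on pytest-xdist's `-n` option.

```python
import importlib.util


def parallel_pytest_args(num_workers, module_name="xdist"):
    """Return extra pytest arguments, or an empty list if the plugin is missing.

    `module_name` is a hypothetical parameter that exists only so the
    check can be exercised without pytest-xdist installed.
    """
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec(module_name) is None:
        return []
    # "-n <N>" is the pytest-xdist option for the number of worker processes.
    return ["-n", str(num_workers)]
```

With a fallback like this, a developer without pytest-xdist installed would still get a serial test run instead of an "unrecognized arguments" error from pytest.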

Lunderberg · 2021-07-28T19:29:04Z

tests/scripts/task_python_unittest.sh

@@ -31,9 +31,9 @@ if [ -z "${TVM_UNITTEST_TESTSUITE_NAME:-}" ]; then
 fi

 # First run minimal test on both ctypes and cython.
-run_pytest ctypes ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test
-run_pytest cython ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test
+run_pytest --parallel ctypes ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test


Looks like this applies to everything in the unittest directory, including test_tvm_testing_features.py. I don't see the explicit ordering that will be required for some of those tests, so we should add it.

Confirmed this change is in place, looks good to me.

Lunderberg · 2021-07-29T19:29:24Z

python/tvm/testing.py

@@ -379,7 +379,7 @@ def _get_targets(target_str=None):
    if len(target_str) == 0:
        target_str = DEFAULT_TEST_TARGETS

-    target_names = set(t.strip() for t in target_str.split(";") if t.strip())
+    target_names = list(sorted(set(t.strip() for t in target_str.split(";") if t.strip())))


A thought from this morning. Do we want to have the list in sorted order, or in the order specified by target_str? Either way would be deterministic, but I'd lean toward the latter so that we can have explicit control over the order of targets. We can get this behavior if we use a dict instead of a set for removing duplicates.

target_names = list({t.strip():None for t in target_str.split(';') if t.strip()})

ah yeah probably in order of target_str
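A small sketch of the two deduplication strategies discussed in this thread. The function names are hypothetical and this is not the code in python/tvm/testing.py. It relies on `dict` preserving insertion order, which the language guarantees from Python 3.7.

```python
def targets_sorted(target_str):
    """Deduplicate and sort alphabetically (the behavior in the diff above)."""
    return sorted(set(t.strip() for t in target_str.split(";") if t.strip()))


def targets_in_given_order(target_str):
    """Deduplicate but keep the order in which targets first appear."""
    stripped = (t.strip() for t in target_str.split(";"))
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(t for t in stripped if t))


example = "llvm; cuda;llvm ;; vulkan"
print(targets_sorted(example))          # ['cuda', 'llvm', 'vulkan']
print(targets_in_given_order(example))  # ['llvm', 'cuda', 'vulkan']
```

Both variants are deterministic; only the second lets whoever sets the target string choose which target runs first.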

jroesch

LGTM

tests/scripts/task_python_unittest.sh

tqchen · 2021-08-12T00:04:33Z

vta/python/vta/__init__.py

+
+# do not from tvm import topi when __main__ is in vta_onboard
+# to maintain minimum Python dependencies on the board.
+# TODO(vta-team): move board-only logic imported above into vta_onboard.


The introduction of a separate vta_board package is likely no longer needed after my update https://github.com/apache/tvm/blob/main/vta/python/vta/__init__.py#L36.

We should be able to use the original vta as is. It would be great to do so if that is the case.

ah oops this was a bad merge, fixing

* yeah we're gonna have to think about testing this approach.

manupak

This is really good work @areusch .
I had a small question.

manupak · 2021-10-20T17:47:54Z

tests/scripts/setup-pytest-env.sh

+        PYTEST_NUM_CPUS=$(expr ${PYTEST_NUM_CPUS} - 1)  # Don't nuke interactive work.
+    fi
+
+    # Don't use >4 CPUs--in general, we only use 4 CPUs in testing, so we want to retain this


Why 4?

Also, would it be possible for us to specialize run_pytest for a certain subset of tests that we think could benefit from this, i.e. --parallel <some_number>?

4 just relates to how we currently allocate Jenkins executors to CI nodes. what i really want to do is make it possible to use $(nproc) here always (so that each node is maximally used). however, initial tests indicate that there is a threshold after which nodes become RAM-limited.

can you clarify your second question? how would we leverage this better in the subset?
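A rough Python sketch of the capping logic visible in the diff excerpt above (subtract one CPU, then never exceed 4). The function name, the `reserve_one` flag, and the `cap` parameter are hypothetical. The condition under which the real script subtracts a CPU is not shown in the excerpt, so it is modelled here as an explicit flag.

```python
import os


def pick_num_workers(total_cpus=None, reserve_one=False, cap=4):
    """Choose a pytest worker count from the number of CPUs on the node."""
    if total_cpus is None:
        # os.cpu_count() may return None, so fall back to a single CPU.
        total_cpus = os.cpu_count() or 1
    # Optionally leave one CPU free, e.g. for interactive work.
    workers = total_cpus - 1 if reserve_one else total_cpus
    # Never drop below one worker and never exceed the cap.
    return max(1, min(workers, cap))
```

Raising `cap` toward `$(nproc)` is the direction described in the reply above, subject to the RAM limit it mentions.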

masahi · 2022-01-09T22:25:11Z

I assume this is superseded by #9834.

Lunderberg requested changes Jul 28, 2021

Lunderberg reviewed Jul 29, 2021

areusch force-pushed the pytest-xdist branch 4 times, most recently from dfa2dd8 to 37c39f9 Compare August 2, 2021 22:34

jroesch approved these changes Aug 3, 2021

areusch mentioned this pull request Aug 6, 2021

[CI] Determine which gpu tests, if any, can be parallelized, and strategy to do so #8675

Closed

areusch marked this pull request as ready for review August 10, 2021 16:22

areusch requested review from comaniac, junrushao, merrymercy, tmoreau89, tqchen, vegaluisjose, yzhliu, zhiics and a team as code owners August 10, 2021 16:22

leandron reviewed Aug 11, 2021

tests/scripts/task_python_unittest.sh (resolved)

leandron approved these changes Aug 11, 2021

tqchen reviewed Aug 12, 2021

areusch force-pushed the pytest-xdist branch from 58b8a43 to e80ec7f Compare August 12, 2021 19:59

areusch added 8 commits September 15, 2021 13:31

Try pytest-xdist (constrained to 2 CPU max for CI)

5430415

serialize test_tvm_testing_features

a0b17e1

fix unbound local and only run --parallel for build and CPU integration

daa6849

fix typo

43b9f31

why is it running so many tests?

86a6a78

Fix using nvcc from xdist and also whenever stdin is closed :|

77a870b

rename scheduler

870e999

commit num cpus hook

8ad193a

areusch and others added 14 commits September 15, 2021 13:57

black format

4978c90

EXECUTOR_NUMBER is indeed 0-based

cba9358

Use all available ARM cpus

4166f90

actually use --cpuset-cpus...

e853b6a

hardcode build -j<n>

7261c3e

uncomment cmake

c9e1709

remove -j flag from Jenkinsfile since it is useless now

aea9e1e

clean up dockerfile

3b76e38

clean up task_build

1ba5b0a

fix empty string case

8130e8b

fix again

1e0660f

is prod Jenkins too old to set CI=???

18424f3

black

392ab8d

Fix lint again

5a1aa02

areusch force-pushed the pytest-xdist branch from d4b95b1 to 5a1aa02 Compare September 20, 2021 22:30

Add pytest timeout

935c520

manupak reviewed Oct 20, 2021

masahi closed this Jan 9, 2022

driazati mentioned this pull request Aug 12, 2022

[skip ci] Revert "[ci] Default to n=2 for test parallelism (#12376)" #12413

Merged
