Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable pytest-xdist for TVM CI #8576

Closed
wants to merge 23 commits into from
Closed

Conversation

areusch
Copy link
Contributor

@areusch areusch commented Jul 28, 2021

Enable pytest-xdist to paralellize CI jobs on worker nodes. The thought here is: right now our concurrency model in CI is to keep with 1 CPU to reduce debugging headache and run $(nproc) workers per CI CPU node. Meanwhile our CI is heterogenous so a long ci_cpu-bound step will negatively affect the overall runtime of all PRs. Going the other way, the idea is to run each CI job as fast as possible and let jobs pile into queues where we can see more clearly the bottlenecks. This means that when the queues are drained, devs get faster response times from the CI, and the CPU nodes should still be used optimally (or perhaps get a slight boost since caching may work better with fewer workloads).

Restricting to 2 CPUs since this is a test; in the future, CI_PYTEST_NUM_CPUS should be used to actually control from Jenkinsfile.

cc @tqchen @jroesch @Lunderberg

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here look good overall, just a couple of documentation/ordering questions.

function run_pytest() {
local extra_args=( )
if [ "$1" == "--parallel" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the documentation currently recommends using task_python_unittest.sh to run the unit tests, we should either make sure that location also has pytest-xdist in the list of packages to install in order to run the tests, or check whether pytest-xdist is installed by wrapping this in if python3 -c "import xdist" > /dev/null 2>&1; then.

@@ -31,9 +31,9 @@ if [ -z "${TVM_UNITTEST_TESTSUITE_NAME:-}" ]; then
fi

# First run minimal test on both ctypes and cython.
run_pytest ctypes ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test
run_pytest cython ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test
run_pytest --parallel ctypes ${TVM_UNITTEST_TESTSUITE_NAME}-platform-minimal-test tests/python/all-platform-minimal-test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this applies to everything in the unittest directory, including test_tvm_testing_features.py. I don't see the explicit ordering that will be required for some of those tests, so we should add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed this change is in place, looks good to me.

@@ -379,7 +379,7 @@ def _get_targets(target_str=None):
if len(target_str) == 0:
target_str = DEFAULT_TEST_TARGETS

target_names = set(t.strip() for t in target_str.split(";") if t.strip())
target_names = list(sorted(set(t.strip() for t in target_str.split(";") if t.strip())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought from this morning. Do we want to have the list in sorted order, or in the order specified by target_str? Either way would be deterministic, but I'd lean toward the latter so that we can have explicit control over the order of targets. We can get this behavior if we use a dict instead of a set for removing duplicates.

target_names = list({t.strip():None for t in target_str.split(';') if t.strip()})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah probably in order of target_str

@areusch areusch force-pushed the pytest-xdist branch 4 times, most recently from dfa2dd8 to 37c39f9 Compare August 2, 2021 22:34
Copy link
Member

@jroesch jroesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


# do not from tvm import topi when __main__ is in vta_onboard
# to maintain minimum Python dependencies on the board.
# TODO(vta-team): move board-only logic imported above into vta_onboard.
Copy link
Member

@tqchen tqchen Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of a separate vta_board package is likely no longer needed after my update https://github.com/apache/tvm/blob/main/vta/python/vta/__init__.py#L36.

We should be able to use the original vta as it is. Would be great to do so if that is the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah oops this was a bad merge, fixing

Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good work @areusch .
I had a small question.

PYTEST_NUM_CPUS=$(expr ${PYTEST_NUM_CPUS} - 1) # Don't nuke interactive work.
fi

# Don't use >4 CPUs--in general, we only use 4 CPUs in testing, so we want to retain this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 4 ?

Also would it be possible for us to specialize run_pytest to for certain subset of tests that we think could benefit from this. i.e. --parallel <some_number> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 just relates to how we currently allocate Jenkins executors to CI nodes. what i really want to do is make it possible to use $(nproc) here always (so that each node is maximally used). however, initial tests indicate that there is a threshold after which nodes become RAM-limited.

can you clarify your second question? how would we leverage this better in the subset?

@masahi
Copy link
Member

masahi commented Jan 9, 2022

I assume this is superceded by #9834

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants