[CI][ADRENO] Enhancements to Adreno specific CI utils #15991

Merged: 1 commit, Nov 27, 2023
2 changes: 2 additions & 0 deletions tests/scripts/ci.py
@@ -189,6 +189,8 @@ def docker(
     env["SCCACHE_CACHE_SIZE"] = os.getenv("SCCACHE_CACHE_SIZE", "50G")
     env["SCCACHE_SERVER_PORT"] = os.getenv("SCCACHE_SERVER_PORT", "4226")

+    env["PLATFORM"] = name
+
     docker_bash = REPO_ROOT / "docker" / "bash.sh"

     command = [docker_bash]
4 changes: 3 additions & 1 deletion tests/scripts/setup-pytest-env.sh
@@ -70,7 +70,9 @@ function run_pytest() {

     has_reruns=$(python3 -m pytest --help 2>&1 | grep 'reruns=' || true)
     if [ -n "$has_reruns" ]; then
-        extra_args+=('--reruns=3')
+        if [[ ! "${extra_args[*]}" == *"--reruns"* ]]; then
+            extra_args+=('--reruns=3')
+        fi
     fi

     suite_name="${test_suite_name}-${current_shard}-${ffi_type}"
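The guard added to `run_pytest` can be tried in isolation. Below is a minimal sketch, assuming bash; `with_default_reruns` is a hypothetical helper written for illustration and is not part of the PR. It shows that the default `--reruns=3` is appended only when the caller has not already passed a `--reruns` option.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): prints the final argument list, one per line.
with_default_reruns() {
    local args=("$@")
    # "${args[*]}" joins every element into one string, so the glob
    # pattern matches a --reruns option at any position in the list.
    if [[ "${args[*]}" != *"--reruns"* ]]; then
        args+=('--reruns=3')
    fi
    printf '%s\n' "${args[@]}"
}

# No --reruns given: the default is appended.
with_default_reruns tests/python/foo.py
# Caller passes --reruns=0: the caller's value is kept and no default is added.
with_default_reruns tests/python/foo.py --reruns=0
```

Because this is a plain substring match, any argument that merely contains the text `--reruns` also suppresses the default.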
21 changes: 18 additions & 3 deletions tests/scripts/task_python_adreno.sh
@@ -54,18 +54,33 @@ adb forward tcp:5002 tcp:5002
 env adb shell "cd ${TARGET_FOLDER}; killall -9 tvm_rpc-${USER}; sleep 2; LD_LIBRARY_PATH=${TARGET_FOLDER}/ ./tvm_rpc-${USER} server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}" &
 DEVICE_PID=$!
 sleep 5 # Wait for the device connections
-trap "{ kill ${TRACKER_PID}; kill ${DEVICE_PID}; }" 0
+trap "{ kill ${TRACKER_PID}; kill ${DEVICE_PID}; cleanup; }" 0

 # cleanup pycache
 find . -type f -path "*.pyc" | xargs rm -f
 # Test TVM
 make cython3

+# The RPC to remote Android device has issue of hang after few tests with in CI environments.

Review comment (Member):
Thanks for looking into this! This workaround should be good if it does not significantly increase the overall CI time. It would be great to identify the root cause of the hang issue as well.

Reply (Contributor Author):
True, running the tests individually increases the time significantly, all the more so because they run on a remote device.

I debugged this a little: the host waits indefinitely on a socket read while the device has nothing to write.
I haven't pursued it further for now. I will definitely revisit the RPC issue later.

+# Lets run them individually on fresh rpc session.
 # OpenCL texture test on Adreno
-run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-opencl-texture tests/python/relay/opencl_texture
+TEXTURE_TESTS=$(./ci/scripts/jenkins/pytest_ids.py --folder tests/python/relay/opencl_texture)
+i=0
+IFS=$'\n'
+for node_id in $TEXTURE_TESTS; do
+    echo "$node_id"
+    run_pytest ctypes "$TVM_INTEGRATION_TESTSUITE_NAME-opencl-texture-$i" "$node_id" --reruns=0
+    i=$((i+1))
+done

 # Adreno CLML test
-run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-openclml tests/python/contrib/test_clml
+CLML_TESTS=$(./ci/scripts/jenkins/pytest_ids.py --folder tests/python/contrib/test_clml)
+i=0
+for node_id in $CLML_TESTS; do
+    echo "$node_id"
+    run_pytest ctypes "$TVM_INTEGRATION_TESTSUITE_NAME-openclml-$i" "$node_id" --reruns=0
+    i=$((i+1))
+done

 kill ${TRACKER_PID}
 kill ${DEVICE_PID}
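
The per-test loop in this change can be sketched on its own. The following is a minimal illustration, assuming bash. `run_pytest` here is a stand-in that only echoes its arguments, `run_each` is a hypothetical wrapper, and the node IDs are made up. In the PR the IDs come from `pytest_ids.py`, whose output format is assumed to be one ID per line.

```bash
#!/usr/bin/env bash
# Stand-in for the real run_pytest from setup-pytest-env.sh (assumption):
# it only echoes the arguments it receives.
run_pytest() {
    echo run_pytest "$@"
}

# Hypothetical node IDs, one per line.
TEST_IDS=$'tests/a.py::test_one\ntests/a.py::test_two'

# Hypothetical wrapper: runs each node ID as its own suite, with reruns disabled.
run_each() {
    local suite_prefix=$1 ids=$2
    local IFS=$'\n'   # split on newlines only; "local" restores IFS on return
    local i=0 node_id
    for node_id in $ids; do
        run_pytest ctypes "${suite_prefix}-${i}" "$node_id" --reruns=0
        i=$((i+1))
    done
}

run_each demo-opencl-texture "$TEST_IDS"
```

Scoping `IFS` with `local` keeps the newline-only splitting from leaking into later commands. The script in the diff instead assigns `IFS` at top level, so that setting stays in effect for the rest of the script.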