[Tracking][Vulkan] Extending topi/relay tests to run on Vulkan #8903

Lunderberg · 2021-09-01T16:51:55Z

Summary

Currently, some unit tests fail when running on the Vulkan runtime. PRs #8904 and #8947 parametrized the failing tests so that the Vulkan target can be marked as xfail without affecting any other runtime. The Vulkan runtime should be improved so that these unit tests pass on Vulkan as well.

Status

File	Test	Parameters	Failure Step	Observed on	Status	PR
test_topi_math.py	test_ewise	topi_name="tan"	Codegen	NVIDIA/AMD	TODO
test_topi_math.py	test_ewise	topi_name="erf"	Codegen	NVIDIA/AMD	TODO
test_topi_math.py	test_ewise	topi_name="isnan"	Codegen	NVIDIA/AMD	TODO
test_topi_math.py	test_ewise	topi_name="isfinite"	Codegen	NVIDIA/AMD	TODO
test_topi_math.py	test_ewise	topi_name="isinf"	Codegen	NVIDIA/AMD	TODO
test_topi_reduce.py	test_reduce_map	reduce_type="sum"	Codegen	NVIDIA/AMD	TODO
test_topi_reduce.py	test_reduce_map	reduce_type="any"	Codegen	NVIDIA/AMD	TODO
test_topi_reduce.py	test_reduce_map	reduce_type="all"	Codegen	NVIDIA/AMD	TODO
test_topi_vision.py	test_proposal		Codegen	NVIDIA/AMD	TODO
test_topi_conv1d_transpose_ncw.py	test_conv1d_transpose_ncw		Numeric Output	NVIDIA only	TODO
test_topi_softmax.py	test_softmax	dtype="float64"	Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_vm.py	test_cond		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_vm.py	test_simple_if		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level4.py	test_reduce_functions		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level3.py	test_sparse_reshape		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_any.py	test_any_reduce		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level5.py	TestResize1D		Numeric Output	NVIDIA/AMD	TODO
tests/python/relay/test_op_level5.py	TestResize2D		Numeric Output	NVIDIA/AMD	TODO
tests/python/relay/test_op_level5.py	TestCropAndResize		Numeric Output	NVIDIA only	TODO
tests/python/relay/test_op_level3.py	test_take		Numeric Output	NVIDIA only	TODO
tests/python/relay/test_op_level2.py	test_conv2d_run		Codegen	NVIDIA/AMD	Fixed	#9014
tests/python/relay/test_op_level3.py	test_segment_sum		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level3.py	test_scatter_add		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level1.py	test_unary_op	relay_op=erf	Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level1.py	test_unary_op	relay_op=tan	Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_level1.py	test_unary_op	relay_op=atan	Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_grad_level10.py	test_cross_entropy_grad		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_grad_level1.py	test_log_softmax_grad		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_grad_level1.py	test_softmax_grad		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_op_grad_level1.py	test_unary_op	Several	Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_any.py	test_any_batch_matmul		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_any.py	test_any_conv2d_NCHWc		Codegen	NVIDIA/AMD	TODO
tests/python/relay/test_any.py	test_any_dense		Codegen	NVIDIA/AMD	TODO


Lunderberg · 2021-09-01T17:02:54Z

@mbrookhart Regarding your comments that several of the failing unit tests had run correctly on vulkan in the past, the main breaking point was in #8127, which reads the device parameters from the physical device when the target is "vulkan -from_device=0". Several of the unit tests had a hard-coded target of "vulkan", tried to run with the minimum vulkan capabilities, and failed at codegen because the capability requested (e.g. 64-bit float support) wasn't listed in the target. Those fixes came along for free by parametrizing the topi tests, since the default vulkan test target uses the device query.

That said, at some point I want to ensure all tests either run correctly or have an appropriate xfail for the minimum vulkan feature set, but that will be a different issue.
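The difference between the two target strings can be sketched as follows. This is a toy model, not TVM's codegen: `MINIMAL_VULKAN`, `query_device`, `resolve_target`, and `check_dtype` are hypothetical names, the device values are made up, and `supports_int64` is assumed by analogy with the `supports_float64` flag mentioned in this thread.

```python
# Toy model of a capability check; all names here are illustrative.
MINIMAL_VULKAN = {"supports_float64": False, "supports_int64": False}


def query_device(device_id):
    """Stand-in for a physical-device query; the returned values are made up."""
    return {"supports_float64": True, "supports_int64": True}


def resolve_target(target):
    """Start from the minimal feature set and apply -from_device if present."""
    caps = dict(MINIMAL_VULKAN)
    for token in target.split()[1:]:
        if token.startswith("-from_device="):
            caps.update(query_device(int(token.split("=", 1)[1])))
    return caps


def check_dtype(dtype, caps):
    """Raise if the dtype needs a capability that the target does not declare."""
    required = {"float64": "supports_float64", "int64": "supports_int64"}.get(dtype)
    if required is not None and not caps[required]:
        raise ValueError(f"{dtype} requires {required}, which the target does not declare")
```

In this model, `check_dtype("float64", resolve_target("vulkan"))` raises, while the same call with `"vulkan -from_device=0"` succeeds because the capabilities come from the (simulated) device query.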


masahi · 2021-09-01T22:13:35Z

Is this result from an NVIDIA driver, or do these tests also fail on AMD?

…ilures (#8904)

* [Pytest] Fixed TestTargetAutoParametrization in cases where LLVM is disabled.
* [UnitTests][Vulkan] Improved robustness of test_tir_intrin::test_clz
  Previously, would fail during build since support for Int64 primitives wasn't declared in the `"vulkan"` target. Now, uses `"vulkan -from_device=0"` target and marks the test as xfail if the current target doesn't support Int64.
* [UnitTest][Topi] Parametrized several unit tests, identify vulkan failures
  - Parametrized topi modules
    - test_topi_conv1d_transpose_ncw.py
    - test_topi_conv2d_nhwc.py
    - test_topi_correlation.py
    - test_topi_loss.py
    - test_topi_math.py
    - test_topi_reduce.py
    - test_topi_softmax.py
    - test_topi_sort.py
    - test_topi_unique.py
    - test_topi_vision.py
  - Unit Tests fixed
    - `test_topi_loss::test_nll_loss`, failure due to `supports_float64` not being passed from the target to the codegen.
  - Known Vulkan failures (tracked in #8903)
    - test_topi_math.py::test_ewise, ["tan", "erf", "isnan", "isfinite", "isinf"]
      Unimplemented CallNode operations
    - test_topi_reduce.py::test_reduce_map, ["sum", "any", "all"]
      Fails during codegen, unexpected size of data type.
    - test_topi_vision.py::test_proposal
      Marked test_proposal as xfail on vulkan, currently has a type error between bool/int8.
    - test_topi_conv1d_transpose_ncw.py::test_conv1d_transpose_ncw
      Incorrect numeric output, a few elements outside of allowed tolerance, only occurs on vulkan backend.
    - test_softmax.py::test_softmax
      Marked float64 operations as xfail in vulkan, because GLSL.std.450 only supports 16/32-bit floats.

Lunderberg · 2021-09-02T13:33:01Z

Thank you for checking. All except test_conv1d_transpose_ncw occur on AMD as well. It's the only one that is a numerical failure, while the rest are errors that occur during codegen. I'll update the table with that information.
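For readers unfamiliar with the distinction, a numeric-output failure means the build and run succeed but some output elements differ from the reference by more than the allowed tolerance. A minimal sketch of such a check, using the criterion of `numpy.testing.assert_allclose` (`|actual - desired| <= atol + rtol * |desired|`); the helper name `mismatches` and the tolerance defaults are assumptions for illustration, not the values used by the tests in the table:

```python
def mismatches(actual, desired, rtol=1e-5, atol=1e-7):
    """Return the indices whose values fall outside the allowed tolerance."""
    return [
        i
        for i, (a, d) in enumerate(zip(actual, desired))
        if abs(a - d) > atol + rtol * abs(d)
    ]


# Only the last element is outside tolerance here.
bad = mismatches([1.0, 2.0, 3.001], [1.0, 2.0, 3.0])
```

A codegen failure, by contrast, raises before any output exists to compare.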


Lunderberg · 2021-09-07T13:58:58Z

Following #8947, I added the failing relay tests to the tracking issue.


* [UnitTest] Added ids argument to tvm.testing.parameters
  This matches the usage in `tvm.testing.parameter`, and allows for parameter sets to be referred to by a single name.
* [Pytest] Fixed ordering issue of tvm.testing.parametrize_targets and known_failing_targets
  If an explicit list of targets is given, then the `known_failing_targets` decorator would fail to apply. This commit resolves the issue, and cleans up all target-specific marks to apply in `tvm.testing.plugin._add_target_specific_marks`.
* [UnitTest][Vulkan] Runnable relay unit tests on Vulkan
  This commit allows the relay test suite to be run targeting Vulkan with `TVM_TEST_TARGETS="vulkan -from_device=0" pytest tests/python/relay`. All tests that require a specific environment are skipped if that environment isn't present. All tests that are known to fail when running on Vulkan are marked as expected failure, and will be tracked in #8903.
  - Failures during code generation
    - Type mismatches, boolean vs int8
      - tests/python/relay/test_any.py::test_any_reduce
      - tests/python/relay/test_op_level3.py::test_sparse_reshape
      - tests/python/relay/test_op_level4.py::test_reduce_functions
      - tests/python/relay/test_vm.py::test_cond
      - tests/python/relay/test_vm.py::test_simple_if
    - Incorrect strategy selection, picks NCHWc implementation for NHWC layout
      - tests/python/relay/test_op_level2.py::test_conv2d_run
    - Unresolved CallNode operation
      - tests/python/relay/test_op_level1.py::test_unary_op[erf/tan/atan]
      - tests/python/relay/test_op_level3.py::test_scatter_add
      - tests/python/relay/test_op_level3.py::test_segment_sum
    - Generates 64-bit calls to GLSL that have only 16-/32-bit support
      - tests/python/relay/test_op_grad_level1.py::test_log_softmax_grad
      - tests/python/relay/test_op_grad_level1.py::test_softmax_grad
      - tests/python/relay/test_op_grad_level1.py::test_unary_op
      - tests/python/relay/test_op_grad_level10.py::test_cross_entropy_grad
    - Codegen raises error for variable size
      - tests/python/relay/test_any.py::test_any_batch_matmul
      - tests/python/relay/test_any.py::test_any_conv2d_NCHWc
      - tests/python/relay/test_any.py::test_any_dense
  - Failures when running
    - Numeric differences (observed on GTX 1650 with NVIDIA driver)
      - tests/python/relay/test_op_level3.py::test_take
      - tests/python/relay/test_op_level5.py::TestCropAndResize
      - tests/python/relay/test_op_level5.py::TestResize1D
      - tests/python/relay/test_op_level5.py::TestResize2D
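The interaction between the enabled targets and the known-failing list above can be modelled roughly as below. This is a stdlib-only sketch, not the `tvm.testing` plugin: `KNOWN_FAILING`, `enabled_targets`, `target_kind`, and `expected_outcome` are hypothetical names, and the semicolon separator for multiple targets in `TVM_TEST_TARGETS` is an assumption.

```python
import os

# Two test IDs taken from the list above; the dictionary layout is illustrative.
KNOWN_FAILING = {
    "tests/python/relay/test_vm.py::test_cond": {"vulkan"},
    "tests/python/relay/test_op_level3.py::test_take": {"vulkan"},
}


def enabled_targets(env=None):
    """Split TVM_TEST_TARGETS into target strings (assumed ';'-separated)."""
    env = os.environ if env is None else env
    raw = env.get("TVM_TEST_TARGETS", "")
    return [t.strip() for t in raw.split(";") if t.strip()]


def target_kind(target):
    """'vulkan -from_device=0' -> 'vulkan'."""
    return target.split()[0]


def expected_outcome(test_id, target):
    """Return 'xfail' if the test is known to fail on this kind of target."""
    failing_kinds = KNOWN_FAILING.get(test_id, set())
    return "xfail" if target_kind(target) in failing_kinds else "pass"
```

Matching on the target kind rather than the full string means the xfail applies whether the test runs with `"vulkan"` or `"vulkan -from_device=0"`, while other runtimes are unaffected.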

masahi · 2021-09-13T11:37:10Z

@Lunderberg Are these two test cases any different? One has pytest.xfail("Known failing test for vulkan") but the other does not.

tvm/tests/python/relay/test_op_level2.py

Line 199 in 548675f

class TestConv2D:

tvm/tests/python/relay/test_op_level2.py

Line 360 in 548675f

def test_conv2d_run(target, dev):

Lunderberg · 2021-09-13T13:24:40Z

Thank you for that catch. When refactoring the tests in #8947, I added the updated version of test_conv2d_run, but didn't remove the original. I have #8993 open to remove the redundant test_conv2d_run, and have double-checked that there aren't any others that snuck in.

masahi · 2021-09-15T03:57:15Z

@Lunderberg The last three items in test_any.py are not specific to vulkan (they fail on cuda as well), so I think we should drop them from the list.

They don't work on gpu targets since we don't support dynamic height or width in conv2d, for example.


Lunderberg mentioned this issue Sep 1, 2021

[Vulkan][Topi] Parametrizing additional topi tests, marking vulkan failures #8904

Merged

Lunderberg mentioned this issue Sep 7, 2021

[UnitTest][Vulkan] Runnable relay unit tests on Vulkan #8947

Merged

Lunderberg changed the title ~~[Tracking][Vulkan] Extending topi/unit tests to run on Vulkan~~ [Tracking][Vulkan] Extending topi/relay tests to run on Vulkan Sep 7, 2021

masahi mentioned this issue Sep 15, 2021

[Strategy] Disable cuda int8 schedule for non-cuda gpu target #9014

Merged

areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
