[Tracking][Vulkan] Extending topi/relay tests to run on Vulkan #8903

Open
Lunderberg opened this issue Sep 1, 2021 · 7 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it


@Lunderberg
Contributor

Lunderberg commented Sep 1, 2021

Summary

Currently, some unit tests fail when running on the Vulkan runtime. PRs #8904 and #8947 parametrized the failing tests so that the Vulkan target can be marked as xfail without affecting any other runtime. The Vulkan runtime should be improved so that these unit tests pass on Vulkan as well.
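The effect of parametrizing by target can be sketched in plain Python. This is a simplified model, not the `tvm.testing` implementation: the hypothetical helper `expand_targets` turns one test into one case per target and flags only the known-failing target kinds, so marking Vulkan as xfail leaves the results for every other runtime untouched.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Case:
    """One (test, target) combination, as a test runner would collect it."""

    test_name: str
    target: str
    expected_failure: bool


def expand_targets(
    test_name: str,
    targets: Iterable[str],
    known_failing: Iterable[str] = (),
) -> List[Case]:
    """Hypothetical helper: one case per target, xfail only for failing kinds.

    Targets are compared by kind (the first word of the target string), so
    "vulkan -from_device=0" matches a known-failing entry of "vulkan".
    """
    failing_kinds = {t.split()[0] for t in known_failing}
    return [
        Case(test_name, target, target.split()[0] in failing_kinds)
        for target in targets
    ]


cases = expand_targets(
    "test_ewise[tan]",
    ["llvm", "cuda", "vulkan -from_device=0"],
    known_failing=["vulkan"],
)
for case in cases:
    print(case.target, "xfail" if case.expected_failure else "run")
# llvm run
# cuda run
# vulkan -from_device=0 xfail
```

Because the xfail flag belongs to a single (test, target) pair rather than to the whole test, a regression on `llvm` or `cuda` would still be reported as a failure.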

Status

| File | Test | Parameters | Failure Step | Observed on | Status | Owner | PR |
|---|---|---|---|---|---|---|---|
| test_topi_math.py | test_ewise | topi_name="tan" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_math.py | test_ewise | topi_name="erf" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_math.py | test_ewise | topi_name="isnan" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_math.py | test_ewise | topi_name="isfinite" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_math.py | test_ewise | topi_name="isinf" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_reduce.py | test_reduce_map | reduce_type="sum" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_reduce.py | test_reduce_map | reduce_type="any" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_reduce.py | test_reduce_map | reduce_type="all" | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_vision.py | test_proposal | | Codegen | NVIDIA/AMD | TODO | | |
| test_topi_conv1d_transpose_ncw.py | test_conv1d_transpose_ncw | | Numeric Output | NVIDIA only | TODO | | |
| test_topi_softmax.py | test_softmax | dtype="float64" | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_vm.py | test_cond | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_vm.py | test_simple_if | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level4.py | test_reduce_functions | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level3.py | test_sparse_reshape | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_any.py | test_any_reduce | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level5.py | TestResize1D | | Numeric Output | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level5.py | TestResize2D | | Numeric Output | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level5.py | TestCropAndResize | | Numeric Output | NVIDIA only | TODO | | |
| tests/python/relay/test_op_level3.py | test_take | | Numeric Output | NVIDIA only | TODO | | |
| tests/python/relay/test_op_level2.py | test_conv2d_run | | Codegen | NVIDIA/AMD | Fixed | | #9014 |
| tests/python/relay/test_op_level3.py | test_segment_sum | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level3.py | test_scatter_add | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level1.py | test_unary_op | relay_op=erf | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level1.py | test_unary_op | relay_op=tan | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_level1.py | test_unary_op | relay_op=atan | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_grad_level10.py | test_cross_entropy_grad | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_grad_level1.py | test_log_softmax_grad | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_grad_level1.py | test_softmax_grad | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_op_grad_level1.py | test_unary_op | Several | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_any.py | test_any_batch_matmul | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_any.py | test_any_conv2d_NCHWc | | Codegen | NVIDIA/AMD | TODO | | |
| tests/python/relay/test_any.py | test_any_dense | | Codegen | NVIDIA/AMD | TODO | | |
Lunderberg added a commit to Lunderberg/tvm that referenced this issue Sep 1, 2021
…lures

- Parametrized topi modules
  - test_topi_conv1d_transpose_ncw.py
  - test_topi_conv2d_nhwc.py
  - test_topi_correlation.py
  - test_topi_loss.py
  - test_topi_math.py
  - test_topi_reduce.py
  - test_topi_softmax.py
  - test_topi_sort.py
  - test_topi_unique.py
  - test_topi_vision.py

- Unit Tests fixed

  - `test_topi_loss::test_nll_loss`, failure due to `supports_float64`
    not being passed from the target to the codegen.

- Known Vulkan failures (tracked in apache#8903)

  - test_topi_math.py::test_ewise, ["tan", "erf", "isnan", "isfinite", "isinf"]

    Unimplemented CallNode operations

  - test_topi_reduce.py::test_reduce_map, ["sum", "any", "all"]

    Fails during codegen, unexpected size of data type.

  - test_topi_vision.py::test_proposal

    Marked test_proposal as xfail on vulkan, currently has a type error
    between bool/int8.

  - test_topi_conv1d_transpose_ncw.py::test_conv1d_transpose_ncw

    Incorrect numeric output, a few elements outside of allowed
    tolerance, only occurs on vulkan backend.

  - test_softmax.py::test_softmax

    Marked float64 operations as xfail in vulkan, because GLSL.std.450
    only supports 16/32-bit floats.
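The float64 xfail for `test_softmax` follows from the operand-width rules of the GLSL.std.450 extended instruction set: `Exp`, `Log`, `Pow` and the trigonometric instructions accept only 16- or 32-bit floating-point operands, while instructions such as `Sqrt` or `FAbs` accept any float width. The sketch below encodes that rule for a small, hand-picked subset of instructions; `can_emit` is a hypothetical helper, not part of TVM's SPIR-V codegen.

```python
# Hand-picked subset of GLSL.std.450 instructions, grouped by allowed width.
# Per the GLSL.std.450 spec, these take only 16- or 32-bit float operands.
WIDTH_16_32_ONLY = {
    "Exp", "Log", "Exp2", "Log2", "Pow",
    "Sin", "Cos", "Tan", "Asin", "Acos", "Atan",
}
# These accept floating-point operands of any width, including 64-bit.
ANY_FLOAT_WIDTH = {"FAbs", "Floor", "Ceil", "Sqrt", "InverseSqrt", "FMin", "FMax"}


def can_emit(instruction: str, float_bits: int) -> bool:
    """Hypothetical check: may `instruction` be emitted for this float width?"""
    if float_bits not in (16, 32, 64):
        raise ValueError(f"unsupported float width: {float_bits}")
    if instruction in WIDTH_16_32_ONLY:
        return float_bits in (16, 32)
    if instruction in ANY_FLOAT_WIDTH:
        return True
    raise KeyError(f"{instruction} is not covered by this sketch")


print(can_emit("Exp", 32))  # True
print(can_emit("Exp", 64))  # False
```

Softmax is built on `exp`, so a float64 softmax has no direct GLSL.std.450 lowering. A code generator would have to expand it into other operations or reject it; the commit instead marks the float64 case as xfail.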
@Lunderberg
Contributor Author

@mbrookhart Regarding your comments that several of the failing unit tests had run correctly on vulkan in the past, the main breaking point was in #8127, which reads the device parameters from the physical device when the target is "vulkan -from_device=0". Several of the unit tests had a hard-coded target of "vulkan", tried to run with the minimum vulkan capabilities, and failed at codegen because the capability requested (e.g. 64-bit float support) wasn't listed in the target. Those fixes came along for free by parametrizing the topi tests, since the default vulkan test target uses the device query.

That said, at some point I want to ensure all tests either run correctly or have an appropriate xfail for the minimum vulkan feature set, but that will be a different issue.
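The capability mismatch between the two target strings can be modelled in a few lines. In this hypothetical sketch (not TVM's target implementation), a bare `"vulkan"` target starts from a minimal capability set, while `"vulkan -from_device=0"` fills the capabilities from a device query, stubbed here as a hard-coded dictionary with made-up values. The attribute name `supports_float64` follows the commits in this thread; `supports_int64` is assumed by analogy.

```python
from typing import Dict, Optional

# Assumed minimal capability set for a bare "vulkan" target (illustrative).
MINIMUM_CAPABILITIES: Dict[str, bool] = {
    "supports_float64": False,
    "supports_int64": False,
}


def build_target(
    target_str: str, device_query: Optional[Dict[str, bool]] = None
) -> Dict[str, bool]:
    """Hypothetical: resolve a target string to its capability flags."""
    kind, *options = target_str.split()
    if kind != "vulkan":
        raise ValueError(f"sketch only models vulkan targets, got {kind!r}")
    capabilities = dict(MINIMUM_CAPABILITIES)
    if any(opt.startswith("-from_device=") for opt in options):
        if device_query is None:
            raise RuntimeError("-from_device requires a device to query")
        capabilities.update(device_query)  # values reported by the device
    return capabilities


def check_codegen(capabilities: Dict[str, bool], dtype: str) -> None:
    """Hypothetical: raise if `dtype` needs a capability the target lacks."""
    required = {"float64": "supports_float64", "int64": "supports_int64"}.get(dtype)
    if required is not None and not capabilities[required]:
        raise RuntimeError(f"{dtype} requested but target lacks {required}")


# Made-up query result for a GPU that supports 64-bit types.
gpu = {"supports_float64": True, "supports_int64": True}

check_codegen(build_target("vulkan -from_device=0", gpu), "float64")  # passes
try:
    check_codegen(build_target("vulkan"), "float64")
except RuntimeError as err:
    print(err)  # float64 requested but target lacks supports_float64
```

A test that needs a 64-bit type can then be marked xfail, rather than erroring, whenever the resolved capabilities lack the corresponding flag.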

@masahi
Member

masahi commented Sep 1, 2021

Are these results from an NVIDIA driver, or do the tests also fail on AMD?

masahi pushed a commit that referenced this issue Sep 2, 2021
…ilures (#8904)

* [Pytest] Fixed TestTargetAutoParametrization in cases where LLVM is disabled.

* [UnitTests][Vulkan] Improved robustness of test_tir_intrin::test_clz

Previously, would fail during build since support for Int64 primitives wasn't
declared in the `"vulkan"` target.  Now, uses `"vulkan -from_device=0"` target
and marks the test as xfail if the current target doesn't support Int64.

* [UnitTest][Topi] Parametrized several unit tests, identify vulkan failures

- Parametrized topi modules
  - test_topi_conv1d_transpose_ncw.py
  - test_topi_conv2d_nhwc.py
  - test_topi_correlation.py
  - test_topi_loss.py
  - test_topi_math.py
  - test_topi_reduce.py
  - test_topi_softmax.py
  - test_topi_sort.py
  - test_topi_unique.py
  - test_topi_vision.py

- Unit Tests fixed

  - `test_topi_loss::test_nll_loss`, failure due to `supports_float64`
    not being passed from the target to the codegen.

- Known Vulkan failures (tracked in #8903)

  - test_topi_math.py::test_ewise, ["tan", "erf", "isnan", "isfinite", "isinf"]

    Unimplemented CallNode operations

  - test_topi_reduce.py::test_reduce_map, ["sum", "any", "all"]

    Fails during codegen, unexpected size of data type.

  - test_topi_vision.py::test_proposal

    Marked test_proposal as xfail on vulkan, currently has a type error
    between bool/int8.

  - test_topi_conv1d_transpose_ncw.py::test_conv1d_transpose_ncw

    Incorrect numeric output, a few elements outside of allowed
    tolerance, only occurs on vulkan backend.

  - test_softmax.py::test_softmax

    Marked float64 operations as xfail in vulkan, because GLSL.std.450
    only supports 16/32-bit floats.
@Lunderberg
Contributor Author

Thank you for checking. All except test_conv1d_transpose_ncw fail on AMD as well. It is the only numerical failure; the rest of the errors occur during codegen. I'll update the table with that information.

Lunderberg added a commit to Lunderberg/tvm that referenced this issue Sep 7, 2021
This commit allows the relay test suite to be run targeting Vulkan with
`TVM_TEST_TARGETS="vulkan -from_device=0" pytest tests/python/relay`.  All
tests that require a specific environment are skipped if that environment
isn't present.  All tests that are known to fail when running on Vulkan
are marked as expected failure, and will be tracked in
apache#8903.

- Failures during code generation
  - Type mismatches, boolean vs int8
    - tests/python/relay/test_any.py::test_any_reduce
    - tests/python/relay/test_op_level3.py::test_sparse_reshape
    - tests/python/relay/test_op_level4.py::test_reduce_functions
    - tests/python/relay/test_vm.py::test_cond
    - tests/python/relay/test_vm.py::test_simple_if

  - Incorrect strategy selection, picks NCHWc implementation for NHWC layout
    - tests/python/relay/test_op_level2.py::test_conv2d_run

  - Unresolved CallNode operation
    - tests/python/relay/test_op_level1.py::test_unary_op[erf/tan/atan]
    - tests/python/relay/test_op_level3.py::test_scatter_add
    - tests/python/relay/test_op_level3.py::test_segment_sum

  - Generates 64-bit calls to GLSL that have only 16-/32-bit support
    - tests/python/relay/test_op_grad_level1.py::test_log_softmax_grad
    - tests/python/relay/test_op_grad_level1.py::test_softmax_grad
    - tests/python/relay/test_op_grad_level1.py::test_unary_op
    - tests/python/relay/test_op_grad_level10.py::test_cross_entropy_grad

  - Codegen raises error for variable size
    - tests/python/relay/test_any.py::test_any_batch_matmul
    - tests/python/relay/test_any.py::test_any_conv2d_NCHWc
    - tests/python/relay/test_any.py::test_any_dense

- Failures when running
  - Numeric differences (observed on GTX 1650 with NVIDIA driver)
    - tests/python/relay/test_op_level3.py::test_take
    - tests/python/relay/test_op_level5.py::TestCropAndResize
    - tests/python/relay/test_op_level5.py::TestResize1D
    - tests/python/relay/test_op_level5.py::TestResize2D
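The commit separates tests that are skipped (the required environment is missing) from tests marked as expected failures (known to fail on Vulkan). The sketch below models that decision for a `TVM_TEST_TARGETS` value. It assumes targets in the variable are separated by `;`, and `parse_test_targets` and `classify` are hypothetical helpers, not the `tvm.testing` plugin.

```python
from typing import Dict, Iterable, List, Set


def parse_test_targets(env_value: str) -> List[str]:
    """Hypothetical: split a TVM_TEST_TARGETS-style string (';' assumed)."""
    return [t.strip() for t in env_value.split(";") if t.strip()]


def classify(
    test_name: str,
    target: str,
    available_kinds: Set[str],
    known_failures: Dict[str, Iterable[str]],
) -> str:
    """Hypothetical: return 'skip', 'xfail', or 'run' for one (test, target)."""
    kind = target.split()[0]
    if kind not in available_kinds:
        return "skip"  # environment not present, e.g. no Vulkan driver
    if kind in known_failures.get(test_name, ()):
        return "xfail"  # known failure, tracked in the issue
    return "run"


targets = parse_test_targets("vulkan -from_device=0; llvm")
known = {"test_cond": ["vulkan"]}
for target in targets:
    print(target, classify("test_cond", target, {"vulkan", "llvm"}, known))
# vulkan -from_device=0 xfail
# llvm run
```

Keeping "skip" and "xfail" distinct means a machine without Vulkan reports nothing about these tests, while a machine with Vulkan still runs them and would flag an unexpected pass once a fix lands.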
@Lunderberg
Contributor Author

Following #8947, added the failing relay tests to the tracking issue.

Lunderberg changed the title from "[Tracking][Vulkan] Extending topi/unit tests to run on Vulkan" to "[Tracking][Vulkan] Extending topi/relay tests to run on Vulkan" Sep 7, 2021
masahi pushed a commit that referenced this issue Sep 7, 2021
* [UnitTest] Added ids argument to tvm.testing.parameters

This matches the usage in `tvm.testing.parameter`, and allows for
parameter sets to be referred to by a single name.

* [Pytest] Fixed ordering issue of tvm.testing.parametrize_targets and known_failing_targets

If an explicit list of targets is given, then the
`known_failing_targets` decorator would fail to apply.  This commit
resolves the issue, and cleans up all target-specific marks to apply
in `tvm.testing.plugin._add_target_specific_marks`.

* [UnitTest][Vulkan] Runnable relay unit tests on Vulkan

This commit allows the relay test suite to be run targeting Vulkan with
`TVM_TEST_TARGETS="vulkan -from_device=0" pytest tests/python/relay`.  All
tests that require a specific environment are skipped if that environment
isn't present.  All tests that are known to fail when running on Vulkan
are marked as expected failure, and will be tracked in
#8903.

- Failures during code generation
  - Type mismatches, boolean vs int8
    - tests/python/relay/test_any.py::test_any_reduce
    - tests/python/relay/test_op_level3.py::test_sparse_reshape
    - tests/python/relay/test_op_level4.py::test_reduce_functions
    - tests/python/relay/test_vm.py::test_cond
    - tests/python/relay/test_vm.py::test_simple_if

  - Incorrect strategy selection, picks NCHWc implementation for NHWC layout
    - tests/python/relay/test_op_level2.py::test_conv2d_run

  - Unresolved CallNode operation
    - tests/python/relay/test_op_level1.py::test_unary_op[erf/tan/atan]
    - tests/python/relay/test_op_level3.py::test_scatter_add
    - tests/python/relay/test_op_level3.py::test_segment_sum

  - Generates 64-bit calls to GLSL that have only 16-/32-bit support
    - tests/python/relay/test_op_grad_level1.py::test_log_softmax_grad
    - tests/python/relay/test_op_grad_level1.py::test_softmax_grad
    - tests/python/relay/test_op_grad_level1.py::test_unary_op
    - tests/python/relay/test_op_grad_level10.py::test_cross_entropy_grad

  - Codegen raises error for variable size
    - tests/python/relay/test_any.py::test_any_batch_matmul
    - tests/python/relay/test_any.py::test_any_conv2d_NCHWc
    - tests/python/relay/test_any.py::test_any_dense

- Failures when running
  - Numeric differences (observed on GTX 1650 with NVIDIA driver)
    - tests/python/relay/test_op_level3.py::test_take
    - tests/python/relay/test_op_level5.py::TestCropAndResize
    - tests/python/relay/test_op_level5.py::TestResize1D
    - tests/python/relay/test_op_level5.py::TestResize2D
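The decorator-ordering fix can be illustrated with a pure-Python model. In this hypothetical sketch the two decorators only record metadata on the function, and a single collection step (standing in for `tvm.testing.plugin._add_target_specific_marks`) applies the marks afterwards, so the result no longer depends on the order in which the decorators are stacked. The decorator names mirror the commit message but are stand-ins, not the real `tvm.testing` API.

```python
from typing import Callable, Dict, List


def parametrize_targets(*targets: str) -> Callable:
    """Hypothetical stand-in: record an explicit target list on the function."""
    def wrap(func: Callable) -> Callable:
        func.explicit_targets = list(targets)
        return func
    return wrap


def known_failing_targets(*targets: str) -> Callable:
    """Hypothetical stand-in: record known-failing targets on the function."""
    def wrap(func: Callable) -> Callable:
        func.known_failing = set(getattr(func, "known_failing", set())) | set(targets)
        return func
    return wrap


def collect(func: Callable, default_targets: List[str]) -> Dict[str, str]:
    """Single place where marks are applied, after every decorator has run."""
    targets = getattr(func, "explicit_targets", default_targets)
    failing = getattr(func, "known_failing", set())
    return {t: ("xfail" if t in failing else "run") for t in targets}


@parametrize_targets("llvm", "vulkan")
@known_failing_targets("vulkan")
def example_a():
    pass


@known_failing_targets("vulkan")
@parametrize_targets("llvm", "vulkan")
def example_b():
    pass


print(collect(example_a, ["llvm"]))  # {'llvm': 'run', 'vulkan': 'xfail'}
print(collect(example_b, ["llvm"]))  # {'llvm': 'run', 'vulkan': 'xfail'}
```

Deferring mark application to one collection step is the design point: each decorator stays order-agnostic because none of them generates test cases itself.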
@masahi
Member

masahi commented Sep 13, 2021

@Lunderberg Are these two test cases any different? One calls `pytest.xfail("Known failing test for vulkan")`, but the other does not.

- `class TestConv2D:`
- `def test_conv2d_run(target, dev):`

@Lunderberg
Contributor Author

Thank you for that catch. When refactoring the tests in #8947, I added the updated version of test_conv2d_run, but didn't remove the original. I have #8993 open to remove the redundant test_conv2d_run, and have double-checked that there aren't any others that snuck in.

@masahi
Member

masahi commented Sep 15, 2021

@Lunderberg The last three items in test_any.py are not specific to vulkan (fails on cuda as well), so I think we should drop them from the list.

They don't work on gpu targets since we don't support dynamic height or width in conv2d, for example.

ylc pushed a commit to ylc/tvm that referenced this issue Sep 29, 2021
…ilures (apache#8904)

ylc pushed a commit to ylc/tvm that referenced this issue Jan 13, 2022
…ilures (apache#8904)

areusch added the needs-triage label Oct 19, 2022
Projects
None yet
Development

No branches or pull requests

3 participants