TVMOp doesn't work well with GPU builds #17840

apeforest · 2020-03-15T20:18:02Z

Description

A few recent PRs failed CI at the same place, in code related to TVM op.

leezu · 2020-03-18T17:12:25Z

@apeforest do you mean this error:

[2020-03-18T08:37:14.025Z] TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: Cannot find function less_scalar_gpufloat32_2bool_2_kernel0 in the imported modules or global registry

apeforest · 2020-03-18T17:42:29Z

Yes, but it seems to be fixed.

leezu · 2020-03-18T17:54:04Z

Why would it be fixed? I got this error at 2020-03-18T08:37:14.025Z (UTC).

ChaiBapchya · 2020-04-28T15:28:29Z

The issue persists when running the unix-gpu build on G4 in the CI Dev account:
http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/mxnet-validation-bapac%2Funix-gpu/detail/update_gpu_toolchain/8/pipeline/414

All 3 failed test stages fail in a similar fashion: they fail at 7 tests with the following error:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: Cannot find function <x> in the imported modules or global registry

Internal functions that can't be found:

greater_equal_gpufloat32_0float32_0bool_0_kernel0 (x2)
logical_and_gpufloat32_1float32_1bool_1_kernel0 (x2)
equal_gpufloat32_2float32_2bool_2_kernel0 (x2)
sum_gpureduce1st_dim_1req_kWriteTobool_5float32_2float32_2_kernel0 (x3)
cuda_rad2degfloat32_2float32_2_kernel0 (x2)
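
A list like the one above can be pulled out of a CI log with a few lines of Python. This is only a rough sketch: the helper name `count_missing_functions` and the sample log are made up for illustration, and the regular expression assumes the message keeps the exact "Cannot find function ... in the imported modules" wording quoted in this thread.

```python
import re
from collections import Counter

# Matches the kernel name in the TVMError message quoted in this issue.
# Assumption: the wording of the message is stable across log lines.
MISSING_FN = re.compile(r"Cannot find function\s+(\S+)\s+in the imported modules")


def count_missing_functions(log_text):
    """Return a Counter mapping each missing kernel name to how often it appears."""
    return Counter(MISSING_FN.findall(log_text))


# Hypothetical sample log, assembled from the error lines quoted in this thread.
sample_log = (
    "[2020-03-18T08:37:14.025Z] TVMError: Check failed: ret == 0 (-1 vs. 0) : "
    "Check failed: f != nullptr: Cannot find function "
    "less_scalar_gpufloat32_2bool_2_kernel0 in the imported modules or global registry\n"
    "TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: "
    "Cannot find function cuda_rad2degfloat32_2float32_2_kernel0 "
    "in the imported modules or global registry\n"
    "TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: "
    "Cannot find function cuda_rad2degfloat32_2float32_2_kernel0 "
    "in the imported modules or global registry\n"
)

counts = count_missing_functions(sample_log)
for name, n in counts.most_common():
    print(f"{name} (x{n})")
```

Running this against a real console log (saved to a file and read into a string) should give the same kind of per-kernel tally as the list above.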

The 7 tests that fail as a result:

tests/python/unittest/test_numpy_interoperability.py:test_np_array_function_protocol
tests/python/unittest/test_numpy_interoperability.py:test_np_array_ufunc_protocol

tests/python/unittest/test_numpy_ndarray.py:test_np_ndarray_binary_element_wise_ops

tests/python/unittest/test_numpy_op.py:test_np_sum
tests/python/unittest/test_numpy_op.py:test_np_mean
tests/python/unittest/test_numpy_op.py:test_np_unary_funcs
tests/python/unittest/test_numpy_op.py:test_np_binary_funcs

leezu · 2020-04-29T23:04:57Z

Reproducer
Compile MXNet with USE_TVMOP=1.

import mxnet as mx

x = mx.np.array([[0, 1], [1, 1], [2, 2]], ctx=mx.gpu())
idx = x < 2
x[idx]
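
For comparison, here is the same boolean-mask indexing in plain NumPy on CPU. This is only a reference sketch and assumes that `mx.np` is meant to follow NumPy semantics for this operation; it does not exercise TVMOp or the GPU code path at all.

```python
import numpy as np

# Same data as the reproducer, but as a CPU NumPy array.
x = np.array([[0, 1], [1, 1], [2, 2]])

# Element-wise comparison yields a boolean mask with the same shape as x.
idx = x < 2

# Boolean-mask indexing returns the selected elements as a 1-D array.
selected = x[idx]
print(selected)
```

On a working build, the MXNet reproducer would be expected to select the same elements instead of raising the TVMError.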

leezu · 2020-05-01T00:15:33Z

Has been disabled on CI: #18204

Let's track fixing TVMOp in this issue?

yzhliu · 2020-05-09T19:25:09Z

@jinboci will be helping

* fix the error message of reshape()
* Fixing issue #16655 reshape() error message
* test pr
* fixing #17840
* fixing issue #17840
* Update compile.py
* Update ndarray.py
* Update c_api.cc
* Update op_module.cc
* Update op_module.h
* Update op_module.h
* Update op_module.h
* fixing tvmgpu issue & not restoring tvmop checks

Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-249.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-37-194.ap-northeast-1.compute.internal>
Co-authored-by: jinboci <cijinbo@outlook.com>

apeforest added the Bug and TVM OP (Operators implemented using TVM) labels Mar 15, 2020

apeforest mentioned this issue Mar 16, 2020

CI: Attempt fixing illegal instruction errors #17842

Merged

apeforest closed this as completed Mar 18, 2020

leezu reopened this Mar 18, 2020

ChaiBapchya mentioned this issue Apr 28, 2020

Boolean tvm operators broken on gpu #17886

Closed

ChaiBapchya mentioned this issue Apr 28, 2020

[CI] Upgrade unix gpu toolchain #18186

Merged

4 tasks

This was referenced Apr 30, 2020

Changes to mxnet.metric #18083

Merged

Disable -DUSE_TVM_OP on GPU builds #18204

Merged

leezu added a commit that referenced this issue May 1, 2020

Disable -DUSE_TVM_OP on GPU builds (#18204)

03fdfe0

Due to issues #17886 #17840

leezu changed the title ~~CI unix-gpu build failure in TVM op~~ TVMOp doesn't work well with GPU builds May 1, 2020

leezu assigned yzhliu May 1, 2020

yzhliu added v2.0 Numpy WIP labels May 6, 2020

waytrue17 mentioned this issue May 23, 2020

[v1.7.x] update jetson dockerfile to support CUDA 10.0 #18339

Merged

1 task

jinboci linked a pull request Jun 9, 2020 that will close this issue

[PLEASE DO NOT REVIEW] Trying to fix issue #17840 #18521

Draft

7 tasks

jinboci pushed a commit to jinboci/incubator-mxnet that referenced this issue Jun 9, 2020

fixing apache#17840

087a100

jinboci pushed a commit to jinboci/incubator-mxnet that referenced this issue Jun 9, 2020

fixing issue apache#17840

75f975b

jinboci pushed a commit to jinboci/incubator-mxnet that referenced this issue Jun 9, 2020

fixing issue apache#17840

84c389a

This was referenced Jun 9, 2020

Fixing issue #17840 #18526

Open

Restoring TVMOp tests #18542

Draft

AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this issue Jul 6, 2020

Disable -DUSE_TVM_OP on GPU builds (apache#18204)

d5d762d

Due to issues apache#17886 apache#17840

jinboci mentioned this issue Jul 15, 2020

[RFC] Use TVMOp with GPU & Build without libcuda.so in CI #18716

Open

jinboci mentioned this issue Jul 29, 2020

Fixing tvmgpu issue & not restoring tvmop checks #18818

Merged

7 tasks
