This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[TVM] TVM Integration Issue after changing to Boolean Mask. #1425

Closed
sxjscience opened this issue Nov 4, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@sxjscience
Member

Description

I changed the mask to use the boolean type in #1405 in order to make AMP pass. However, this is causing issues in the TVM integration. I created this issue to track the error and will skip the TVM test for now.

test_models.py:145: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../tvm/python/tvm/relay/frontend/mxnet.py:2869: in from_mxnet
    func = _from_mxnet_impl(symbol, shape, dtype, params, mod)
../../tvm/python/tvm/relay/frontend/mxnet.py:2792: in _from_mxnet_impl
    res = _convert_map[op_name](*op_params)
../../tvm/python/tvm/relay/frontend/mxnet.py:793: in _mx_batch_dot
    a_shape = _infer_type(a).checked_type.shape
../../tvm/python/tvm/relay/frontend/common.py:482: in infer_type
    new_mod = _transform.InferType()(new_mod)
../../tvm/python/tvm/ir/transform.py:127: in __call__
    return _ffi_transform_api.RunPass(self, mod)
tvm/_ffi/_cython/./packed_func.pxi:321: in tvm._ffi._cy3.core.PackedFuncBase.__call__
    ???
tvm/_ffi/_cython/./packed_func.pxi:256: in tvm._ffi._cy3.core.FuncCall
    ???
tvm/_ffi/_cython/./packed_func.pxi:245: in tvm._ffi._cy3.core.FuncCall3
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   tvm._ffi.base.TVMError: Traceback (most recent call last):
E     [bt] (7) /home/ubuntu/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7fd9125f2095]
E     [bt] (6) /home/ubuntu/tvm/build/libtvm.so(+0x6fc086) [0x7fd911bbc086]
E     [bt] (5) /home/ubuntu/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1ee) [0x7fd911bbb85e]
E     [bt] (4) /home/ubuntu/tvm/build/libtvm.so(+0xf662b8) [0x7fd9124262b8]
E     [bt] (3) /home/ubuntu/tvm/build/libtvm.so(+0xf65495) [0x7fd912425495]
E     [bt] (2) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fd912424947]
E     [bt] (1) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0xc37) [0x7fd9122b5d67]
E     [bt] (0) /home/ubuntu/tvm/build/libtvm.so(+0xdf21c2) [0x7fd9122b21c2]
E     [bt] (8) /home/ubuntu/tvm/build/libtvm.so(+0x6fc086) [0x7fd911bbc086]
E     [bt] (7) /home/ubuntu/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1ee) [0x7fd911bbb85e]
E     [bt] (6) /home/ubuntu/tvm/build/libtvm.so(+0xf662b8) [0x7fd9124262b8]
E     [bt] (5) /home/ubuntu/tvm/build/libtvm.so(+0xf65495) [0x7fd912425495]
E     [bt] (4) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fd912424947]
E     [bt] (3) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0x375) [0x7fd9122b54a5]
E     [bt] (2) /home/ubuntu/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<bool (tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x63b) [0x7fd911c0f36b]
E     [bt] (1) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::BroadcastRel(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)+0x350) [0x7fd91223f330]
E     [bt] (0) /home/ubuntu/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x82) [0x7fd911a89ba2]
E     File "../src/relay/analysis/type_solver.cc", line 621
E   TVMError: 
E   ---------------------------------------------------------------
E   An internal invariant was violated during the execution of TVM.
E   Please read TVM's error reporting guidelines.
E   More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
E   ---------------------------------------------------------------
E     Check failed: false == false: [21:48:08] ../src/relay/op/type_relations.cc:107: Check failed: t0->dtype == t1->dtype (float32 vs. bool) :
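The `Check failed: t0->dtype == t1->dtype (float32 vs. bool)` message comes from TVM's broadcast type relation, which, unlike NumPy, does not promote mixed dtypes. A minimal NumPy sketch of the failing pattern and the cast that would sidestep it (illustrative only; in the real failure the mixed-dtype broadcast op is produced inside the converted Relay graph):

```python
import numpy as np

# A float tensor and a boolean mask, as produced after the change in #1405.
scores = np.array([[0.5, -1.2], [2.0, 0.1]], dtype=np.float32)
mask = np.array([[True, False], [True, True]])

# TVM's BroadcastRel rejects float32 * bool, whereas NumPy silently
# promotes. Casting the mask up front keeps both operands float32:
masked = scores * mask.astype(np.float32)

print(masked.dtype)  # float32
```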
@sxjscience sxjscience added the bug Something isn't working label Nov 4, 2020
@Zha0q1

Zha0q1 commented Nov 18, 2020

On my EC2 instance I commented out 'google_albert_base_v2' and the rest of the models seemed to work fine.

I created PR #1437 to re-enable the tests.

@Zha0q1

Zha0q1 commented Nov 18, 2020

The error I got:

==================================================== FAILURES ====================================================
____________________________ test_tvm_integration[ctx0-TN-1-4-google_albert_base_v2] _____________________________
model_name = 'google_albert_base_v2', batch_size = 1, seq_length = 4, layout = 'TN', ctx = cpu(0)
    @pytest.mark.serial
    @pytest.mark.seed(123)
    @pytest.mark.parametrize('model_name',
                             ['google_albert_base_v2'])
    @pytest.mark.parametrize('batch_size,seq_length', [(1, 4)])
    @pytest.mark.parametrize('layout', ['TN'])
    # @pytest.mark.skipif(not tvm_enabled(),
    #                    reason='TVM is not supported. So this test is skipped.')
    # @pytest.mark.skip('TVM issue https://github.com/dmlc/gluon-nlp/issues/1425.')
    def test_tvm_integration(model_name, batch_size, seq_length, layout, ctx):
        tvm = try_import_tvm()
        from tvm import relay
        from tvm.contrib import graph_runtime
        tvm_recommended_flags = get_ec2_tvm_flags()
        if ctx.device_type == 'gpu':
            flags = tvm_recommended_flags['g4']
        elif ctx.device_type == 'cpu':
            flags = tvm_recommended_flags['c4']
            if model_name != 'google_albert_base_v2':
                # Skip all other tests
                return
        else:
            raise NotImplementedError
        with tempfile.TemporaryDirectory() as root, ctx:
            model_cls, cfg, tokenizer, backbone_param_path, _ = get_backbone(model_name, root=root)
            cfg.defrost()
            cfg.MODEL.layout = layout
            cfg.freeze()
            model = model_cls.from_cfg(cfg)
            model.load_parameters(backbone_param_path)
            model.hybridize()
            if layout == 'NT':
                token_ids = mx.np.random.randint(0, cfg.MODEL.vocab_size, (batch_size, seq_length),
                                                 dtype=np.int32)
                token_types = mx.np.random.randint(0, 2, (batch_size, seq_length), dtype=np.int32)
                valid_length = mx.np.random.randint(seq_length // 2, seq_length, (batch_size,),
                                                    dtype=np.int32)
            else:
                token_ids = mx.np.random.randint(0, cfg.MODEL.vocab_size, (seq_length, batch_size),
                                                 dtype=np.int32)
                token_types = mx.np.random.randint(0, 2, (seq_length, batch_size), dtype=np.int32)
                valid_length = mx.np.random.randint(seq_length // 2, seq_length, (batch_size,),
                                                    dtype=np.int32)
            if 'bart' in model_name:
                mx_out = model(token_ids, valid_length, token_ids, valid_length)
                shape_dict = {
                    'data0': token_ids.shape,
                    'data1': valid_length.shape,
                    'data2': token_ids.shape,
                    'data3': valid_length.shape,
                }
                dtype_dict = {
                    'data0': token_ids.dtype.name,
                    'data1': valid_length.dtype.name,
                    'data2': token_ids.dtype.name,
                    'data3': valid_length.dtype.name,
                }
            elif 'roberta' in model_name or 'xlmr' in model_name:
                mx_out = model(token_ids, valid_length)
                shape_dict = {
                    'data0': token_ids.shape,
                    'data1': valid_length.shape,
                }
                dtype_dict = {
                    'data0': token_ids.dtype.name,
                    'data1': valid_length.dtype.name,
                }
            else:
                mx_out = model(token_ids, token_types, valid_length)
                shape_dict = {
                    'data0': token_ids.shape,
                    'data1': token_types.shape,
                    'data2': valid_length.shape
                }
                dtype_dict = {
                    'data0': token_ids.dtype.name,
                    'data1': token_types.dtype.name,
                    'data2': valid_length.dtype.name
                }
            sym = model._cached_graph[1]
            params = {}
            for k, v in model.collect_params().items():
                params[v._var_name] = tvm.nd.array(v.data().asnumpy())
>           mod, params = relay.frontend.from_mxnet(sym, shape=shape_dict, dtype=dtype_dict, arg_params=params)
tests/test_models.py:143: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../tvm/python/tvm/relay/frontend/mxnet.py:2869: in from_mxnet
    func = _from_mxnet_impl(symbol, shape, dtype, params, mod)
../tvm/python/tvm/relay/frontend/mxnet.py:2792: in _from_mxnet_impl
    res = _convert_map[op_name](*op_params)
../tvm/python/tvm/relay/frontend/mxnet.py:793: in _mx_batch_dot
    a_shape = _infer_type(a).checked_type.shape
../tvm/python/tvm/relay/frontend/common.py:482: in infer_type
    new_mod = _transform.InferType()(new_mod)
../tvm/python/tvm/ir/transform.py:127: in __call__
    return _ffi_transform_api.RunPass(self, mod)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <tvm.runtime.packed_func.PackedFunc object at 0x7fc8404d4be0>
args = (Run Module pass: InferType at the optimization level 0, #[version = "0.0.5"]
def @main(%data2: Tensor[(1), int32], %v...=", 
    "P6G0lvBAXt0AAAAAAAAAAAEAAAAAAAAAAAAAAAAgAQAEAAAAAAAAAAEAAAA="
  ], 
  "attrs": {"tvm_version": "0.8.dev0"}
})
temp_args = [], values = <tvm._ffi._ctypes.packed_func.TVMValue_Array_2 object at 0x7fc834ee1680>
tcodes = <mxnet._ffi._ctypes.function.c_int_Array_2 object at 0x7fc834ee1830>
    def __call__(self, *args):
        """Call the function with positional arguments
        args : list
           The positional arguments to the function call.
        """
        temp_args = []
        values, tcodes, num_args = _make_tvm_args(args, temp_args)
        ret_val = TVMValue()
        ret_tcode = ctypes.c_int()
        if (
            _LIB.TVMFuncCall(
                self.handle,
                values,
                tcodes,
                ctypes.c_int(num_args),
                ctypes.byref(ret_val),
                ctypes.byref(ret_tcode),
            )
            != 0
        ):
>           raise get_last_ffi_error()
E           tvm._ffi.base.TVMError: Traceback (most recent call last):
E             [bt] (7) /home/ubuntu/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7fc842ca3595]
E             [bt] (6) /home/ubuntu/tvm/build/libtvm.so(+0x7007c2) [0x7fc84225f7c2]
E             [bt] (5) /home/ubuntu/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7fc84225f077]
E             [bt] (4) /home/ubuntu/tvm/build/libtvm.so(+0xfcee2f) [0x7fc842b2de2f]
E             [bt] (3) /home/ubuntu/tvm/build/libtvm.so(+0xfce085) [0x7fc842b2d085]
E             [bt] (2) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fc842b2c637]
E             [bt] (1) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0xd39) [0x7fc8429b3269]
E             [bt] (0) /home/ubuntu/tvm/build/libtvm.so(+0xe50402) [0x7fc8429af402]
E             [bt] (8) /home/ubuntu/tvm/build/libtvm.so(+0x7007c2) [0x7fc84225f7c2]
E             [bt] (7) /home/ubuntu/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1b7) [0x7fc84225f077]
E             [bt] (6) /home/ubuntu/tvm/build/libtvm.so(+0xfcee2f) [0x7fc842b2de2f]
E             [bt] (5) /home/ubuntu/tvm/build/libtvm.so(+0xfce085) [0x7fc842b2d085]
E             [bt] (4) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::GlobalVar, tvm::relay::Function)+0x67) [0x7fc842b2c637]
E             [bt] (3) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::TypeSolver::Solve()+0x36d) [0x7fc8429b289d]
E             [bt] (2) /home/ubuntu/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<bool (tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x7d7) [0x7fc8422b5f97]
E             [bt] (1) /home/ubuntu/tvm/build/libtvm.so(tvm::relay::BroadcastRel(tvm::runtime::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)+0x404) [0x7fc842937014]
E             [bt] (0) /home/ubuntu/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x82) [0x7fc84211beb2]
E             File "/home/ubuntu/tvm/src/relay/analysis/type_solver.cc", line 621
E           TVMError: 
E           ---------------------------------------------------------------
E           An internal invariant was violated during the execution of TVM.
E           Please read TVM's error reporting guidelines.
E           More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
E           ---------------------------------------------------------------
E             Check failed: false == false: [20:17:50] /home/ubuntu/tvm/src/relay/op/type_relations.cc:107: 
E           ---------------------------------------------------------------
E           An internal invariant was violated during the execution of TVM.
E           Please read TVM's error reporting guidelines.
E           More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
E           ---------------------------------------------------------------
E           
E             Check failed: t0->dtype == t1->dtype (float32 vs. int32) :
../tvm/python/tvm/_ffi/_ctypes/packed_func.py:237: TVMError
---------------------------------------------- Captured stdout call ----------------------------------------------
Downloading /tmp/tmpgmgdq8n2/google_albert_base_v2/spm-65999e5d.model from https://gluonnlp-numpy-data.s3-accelerate.amazonaws.com/models/google_albert_base_v2/spm-65999e5d.model...
Downloading /tmp/tmpgmgdq8n2/google_albert_base_v2/vocab-2ee53ae7.json from https://gluonnlp-numpy-data.s3-accelerate.amazonaws.com/models/google_albert_base_v2/vocab-2ee53ae7.json...
Downloading /tmp/tmpgmgdq8n2/google_albert_base_v2/model-125be477.params from https://gluonnlp-numpy-data.s3-accelerate.amazonaws.com/models/google_albert_base_v2/model-125be477.params...
---------------------------------------------- Captured stderr call ----------------------------------------------
100%|██████████| 760k/760k [00:00<00:00, 8.96MiB/s]
100%|██████████| 373k/373k [00:00<00:00, 9.29MiB/s]
100%|██████████| 46.7M/46.7M [00:01<00:00, 46.6MiB/s]
[20:17:50] ../src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
============================================ short test summary info =============================================
FAILED tests/test_models.py::test_tvm_integration[ctx0-TN-1-4-google_albert_base_v2] - tvm._ffi.base.TVMError: ...
=============================================== 1 failed in 2.99s ================================================

@sxjscience
Member Author

For me, one potential cause is that TVM does not allow mixed data types in the where operator, e.g., https://github.com/apache/incubator-tvm/blob/7649075fbb71ecab0a41c6fe4d41a86724e42e7a/python/tvm/relay/frontend/mxnet.py#L2419-L2434. Thus, we may print the dtypes of the cond, lhs, and rhs to see if that is the root cause.
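A sketch of that debugging step, written against NumPy arrays rather than Relay expressions (in the actual frontend one would inspect `_infer_type(expr).checked_type.dtype` instead; both helpers below are hypothetical, not part of TVM):

```python
import numpy as np

def debug_where_dtypes(cond, lhs, rhs):
    """Hypothetical helper: report the dtypes the converter would see,
    to confirm whether cond/lhs/rhs disagree (e.g. bool vs float32)."""
    print(f"cond: {cond.dtype}, lhs: {lhs.dtype}, rhs: {rhs.dtype}")

def where_with_common_dtype(cond, lhs, rhs):
    """Hypothetical fix direction: unify the lhs/rhs dtypes before the
    where, mirroring the dtype equality TVM enforces in BroadcastRel."""
    common = np.promote_types(lhs.dtype, rhs.dtype)
    return np.where(cond.astype(bool), lhs.astype(common), rhs.astype(common))

cond = np.array([True, False, True])
lhs = np.array([1.0, 2.0, 3.0], dtype=np.float32)
rhs = np.zeros(3, dtype=np.float32)

debug_where_dtypes(cond, lhs, rhs)
out = where_with_common_dtype(cond, lhs, rhs)
print(out.dtype)  # float32
```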

@Zha0q1

Zha0q1 commented Nov 18, 2020

#1437 passed the CPU CI, but on GPU the remaining three models all still failed:

'google_en_cased_bert_base',
'google_electra_small',
'fairseq_bart_base'
