[AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction. #8143

Merged
merged 2 commits into apache:main on May 30, 2021

Conversation

@jwfromm jwfromm (Contributor) commented May 26, 2021

There's a long-known issue where alter_op_layout sometimes mutates the source IRModule, which can cause errors during task extraction. One example model where the issue pops up is yolov3-tiny. An easy workaround is to make a copy of the input module before applying optimization passes. This PR adds that copy step to both autotvm and auto_scheduler. I'm not sure what tests to add since the bug is extremely difficult to pin down, but it does trigger with the yolov3-tiny model mentioned above.
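The workaround can be sketched as follows. This is an illustrative stand-in, not TVM code: a plain dict plays the role of the IRModule, and `mutating_optimize` stands in for `relay.optimize` with its AlterOpLayout pass.

```python
from copy import deepcopy

def mutating_optimize(mod):
    # Stand-in for relay.optimize: passes such as AlterOpLayout may
    # rewrite the module they are handed in place.
    mod["layout"] = "NHWC"
    return mod

def extract_tasks(mod):
    # The workaround from this PR: clone the input module first so
    # the caller's module is untouched by any mutating passes.
    mod_clone = deepcopy(mod)
    return mutating_optimize(mod_clone)

mod = {"layout": "NCHW"}
opt_mod = extract_tasks(mod)
assert mod["layout"] == "NCHW"      # source module preserved
assert opt_mod["layout"] == "NHWC"  # only the clone was altered
```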

@comaniac comaniac (Contributor) left a comment

Otherwise LGTM

opt_mod, _ = relay.optimize(mod, target, params)
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
# source module is fixed. Until then, create a clone.
mod_clone = deepcopy(mod)
This line can be lifted out of the try-catch block so that L79 can be simplified.

@jwfromm jwfromm (Contributor Author) May 26, 2021
I think we actually need to have both. The problem is that in the first try we attempt to apply optimize, which can mutate the source module. Then if that fails, we try to use compiler.lower, which again can mutate the source module. If we tried to apply compiler.lower to mod_clone after optimize without a second copy, we could hit an error due to invalid shapes from alter_op_layout.
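The two-copy flow described here can be sketched with stand-ins (plain dicts instead of IRModules; `optimize` and `lower` are hypothetical functions that model the mutate-then-fail behaviour described above, not TVM's actual signatures):

```python
from copy import deepcopy

def optimize(mod):
    # Hypothetical stand-in for relay.optimize: it mutates the module
    # (as AlterOpLayout can) before failing for some models.
    mod["shapes"] = "altered"
    raise ValueError("optimize failed during task extraction")

def lower(mod):
    # Hypothetical stand-in for compiler.lower: it rejects a module
    # carrying the invalid shapes left behind by alter_op_layout.
    if "shapes" in mod:
        raise ValueError("invalid shapes from alter_op_layout")
    mod["lowered"] = True
    return mod

def call_all_topi_funcs(mod):
    try:
        # First clone: optimize may mutate its argument before failing.
        opt_mod = optimize(deepcopy(mod))
    except ValueError:
        # Second, fresh clone: reusing the first one would carry the
        # invalid shapes written before optimize raised.
        opt_mod = lower(deepcopy(mod))
    return opt_mod

mod = {"op": "conv2d"}
opt_mod = call_all_topi_funcs(mod)
assert "shapes" not in mod          # caller's module untouched
assert opt_mod["lowered"] is True   # fallback path succeeded
```

If `lower` were handed the first clone instead of a fresh one, the `"shapes"` key written before `optimize` raised would make it fail too, which is why both copies are needed.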

Ah I see... that's what you meant by the source module being mutated. Yeah, this is definitely a bug to be fixed, and this is a reasonable workaround in the meantime.

opt_mod, _ = relay.optimize(mod, target, params)
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
# source module is fixed. Until then, create a clone.
mod_clone = deepcopy(mod)
ditto.

@jcf94 jcf94 (Contributor) left a comment
Thanks @jwfromm! This looks good.

One question, though: does AutoScheduler really have this bug? As I recall, AutoScheduler tends to choose simple strategies that don't involve AlterOpLayout.

@jwfromm jwfromm (Contributor Author) commented May 28, 2021

AlterOpLayout is applied during task extraction so it definitely has this bug. Try autoscheduling the linked yolo model and you'll encounter it without this fix.

@masahi masahi merged commit e26990f into apache:main May 30, 2021
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Jun 3, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
@masahi masahi (Member) commented Jun 4, 2021

I'm getting a strange error during task extraction after this commit. Something bad happens during deepcopy:

Traceback (most recent call last):                                                                                                                                           
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner                                                                       
    self.run()                                                                                                                                                               
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 870, in run                                                                                    
    self._target(*self._args, **self._kwargs)                                                                                                                                
  File "/home/masa/projects/dev/tvm/python/tvm/auto_scheduler/relay_integration.py", line 79, in call_all_topi_funcs                                                         
    mod_clone = deepcopy(mod)                                                                                                                                                
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 180, in deepcopy                                                                                    
    y = _reconstruct(x, memo, *rv)                                                                                                                                           
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 283, in _reconstruct                                                                                
    y.__setstate__(state)                                                                                                                                                    
  File "/home/masa/projects/dev/tvm/python/tvm/runtime/object.py", line 91, in __setstate__                                                                                  
    self.__init_handle_by_constructor__(_ffi_node_api.LoadJSON, handle)                                                                                                      
  File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/object.py", line 136, in __init_handle_by_constructor__                                                          
    handle = __init_by_constructor__(fconstructor, args)                                                                                                                     
  File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 260, in __init_handle_by_constructor__                                                     
    raise get_last_ffi_error()                                                                                                                                               
tvm._ffi.base.TVMError: Traceback (most recent call last):                                                                                                                   
  5: TVMFuncCall                                                                                                                                                             
  4: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::ObjectRef (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  3: tvm::LoadJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)                                                                          
  2: tvm::ReflectionVTable::VisitAttrs(tvm::runtime::Object*, tvm::AttrVisitor*) const                                                                                       
  1: tvm::FieldDependencyFinder::Visit(char const*, tvm::runtime::ObjectRef*)                                                                                                
  0: void tvm::FieldDependencyFinder::ParseValue<unsigned long>(char const*, unsigned long*) const                                                                           
  File "../src/node/serialization.cc", line 291                                                                                                                              
JSONReader: cannot find field axis  
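The traceback shows why deepcopy can fail here: cloning a TVM object round-trips it through the JSON serializer (`__getstate__` saves the node graph, `__setstate__` rebuilds it via `LoadJSON`), so any node the deserializer cannot fully reconstruct breaks the copy. A minimal stdlib sketch of that failure mode (the `Node` class and its required `axis` field are illustrative, not TVM's actual serialization code):

```python
import copy
import json

class Node:
    # Mimics tvm.runtime.Object: deepcopy goes through a JSON
    # round-trip, so deserialization errors surface inside deepcopy.
    REQUIRED = ("axis",)

    def __init__(self, **fields):
        self.fields = fields

    def __getstate__(self):
        # Analogous to SaveJSON: serialize the node to a string handle.
        return {"handle": json.dumps(self.fields)}

    def __setstate__(self, state):
        # Analogous to LoadJSON: rebuild the node, failing if a
        # required field is missing from the serialized form.
        fields = json.loads(state["handle"])
        for name in self.REQUIRED:
            if name not in fields:
                raise ValueError(f"JSONReader: cannot find field {name}")
        self.fields = fields

good = Node(axis=1)
assert copy.deepcopy(good).fields == {"axis": 1}

bad = Node(op="layout_transform")  # no "axis" field serialized
try:
    copy.deepcopy(bad)
except ValueError as e:
    print(e)  # JSONReader: cannot find field axis
```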

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
@jwfromm jwfromm deleted the avoid_alter_op_bug branch April 12, 2023 15:54