[AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction. #8143

Merged
merged 2 commits into apache:main on May 30, 2021

Conversation

@jwfromm jwfromm (Contributor) commented May 26, 2021

There's a long-known issue where alter_op_layout sometimes mutates the source IRModule, which can cause errors during task extraction. One example model where the issue pops up is yolov3-tiny. An easy workaround is to make a copy of the input module before applying optimization passes. This PR adds that copy step to both autotvm and auto_scheduler. I'm not sure what tests to add since the bug is extremely difficult to pin down, but it does trigger with the yolov3-tiny model mentioned above.
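The workaround can be sketched as follows. This is an illustrative stand-in, not TVM code: a plain dict plays the role of the IRModule, and `mutating_optimize` stands in for `relay.optimize` with its AlterOpLayout pass.

```python
from copy import deepcopy

def mutating_optimize(mod):
    # Stand-in for relay.optimize: passes such as AlterOpLayout may
    # rewrite the module they are handed in place.
    mod["layout"] = "NHWC"
    return mod

def extract_tasks(mod):
    # The workaround from this PR: clone the input module first so
    # the caller's module is untouched by any mutating passes.
    mod_clone = deepcopy(mod)
    return mutating_optimize(mod_clone)

mod = {"layout": "NCHW"}
opt_mod = extract_tasks(mod)
assert mod["layout"] == "NCHW"      # source module preserved
assert opt_mod["layout"] == "NHWC"  # only the clone was altered
```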

@comaniac comaniac (Contributor) left a comment

Otherwise LGTM

opt_mod, _ = relay.optimize(mod, target, params)
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
# source module is fixed. Until then, create a clone.
mod_clone = deepcopy(mod)
This line can be lifted out of the try-catch block so that L79 can be simplified.

@jwfromm jwfromm (Contributor Author) May 26, 2021
I think we actually need to have both. The problem is that in the first try we attempt to apply optimize, which can mutate the source module. Then if that fails, we try to use compiler.lower, which again can mutate the source module. If we tried to apply compiler.lower to mod_clone after optimize without a second copy, we could hit an error due to invalid shapes from alter_op_layout.
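The two-copy flow described here can be sketched with stand-ins (plain dicts instead of IRModules; `optimize` and `lower` are hypothetical functions that model the mutate-then-fail behaviour described above, not TVM's actual signatures):

```python
from copy import deepcopy

def optimize(mod):
    # Hypothetical stand-in for relay.optimize: it mutates the module
    # (as AlterOpLayout can) before failing for some models.
    mod["shapes"] = "altered"
    raise ValueError("optimize failed during task extraction")

def lower(mod):
    # Hypothetical stand-in for compiler.lower: it rejects a module
    # carrying the invalid shapes left behind by alter_op_layout.
    if "shapes" in mod:
        raise ValueError("invalid shapes from alter_op_layout")
    mod["lowered"] = True
    return mod

def call_all_topi_funcs(mod):
    try:
        # First clone: optimize may mutate its argument before failing.
        opt_mod = optimize(deepcopy(mod))
    except ValueError:
        # Second, fresh clone: reusing the first one would carry the
        # invalid shapes written before optimize raised.
        opt_mod = lower(deepcopy(mod))
    return opt_mod

mod = {"op": "conv2d"}
opt_mod = call_all_topi_funcs(mod)
assert "shapes" not in mod          # caller's module untouched
assert opt_mod["lowered"] is True   # fallback path succeeded
```

If `lower` were handed the first clone instead of a fresh one, the `"shapes"` key written before `optimize` raised would make it fail too, which is why both copies are needed.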

Ah I see... that's what you meant by the source module being mutated. Yeah, this is definitely a bug to be fixed, and this is a reasonable workaround in the meantime.

opt_mod, _ = relay.optimize(mod, target, params)
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates
# source module is fixed. Until then, create a clone.
mod_clone = deepcopy(mod)
ditto.

@jcf94 jcf94 (Contributor) left a comment
Thanks @jwfromm! This looks good.

One question, though: does AutoScheduler really have this bug? As I recall, AutoScheduler tends to choose simple strategies that don't involve AlterOpLayout.

@jwfromm jwfromm (Contributor Author) commented May 28, 2021

AlterOpLayout is applied during task extraction so it definitely has this bug. Try autoscheduling the linked yolo model and you'll encounter it without this fix.

@masahi masahi merged commit e26990f into apache:main May 30, 2021
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Jun 3, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
@masahi masahi (Member) commented Jun 4, 2021

I'm getting a strange error during task extraction after this commit. Something bad happens during deepcopy:

Traceback (most recent call last):                                                                                                                                           
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner                                                                       
    self.run()                                                                                                                                                               
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/threading.py", line 870, in run                                                                                    
    self._target(*self._args, **self._kwargs)                                                                                                                                
  File "/home/masa/projects/dev/tvm/python/tvm/auto_scheduler/relay_integration.py", line 79, in call_all_topi_funcs                                                         
    mod_clone = deepcopy(mod)                                                                                                                                                
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 180, in deepcopy                                                                                    
    y = _reconstruct(x, memo, *rv)                                                                                                                                           
  File "/home/masa/anaconda3/envs/torch-1.7/lib/python3.7/copy.py", line 283, in _reconstruct                                                                                
    y.__setstate__(state)                                                                                                                                                    
  File "/home/masa/projects/dev/tvm/python/tvm/runtime/object.py", line 91, in __setstate__                                                                                  
    self.__init_handle_by_constructor__(_ffi_node_api.LoadJSON, handle)                                                                                                      
  File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/object.py", line 136, in __init_handle_by_constructor__                                                          
    handle = __init_by_constructor__(fconstructor, args)                                                                                                                     
  File "/home/masa/projects/dev/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 260, in __init_handle_by_constructor__                                                     
    raise get_last_ffi_error()                                                                                                                                               
tvm._ffi.base.TVMError: Traceback (most recent call last):                                                                                                                   
  5: TVMFuncCall                                                                                                                                                             
  4: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::ObjectRef (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::ObjectRef (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  3: tvm::LoadJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)                                                                          
  2: tvm::ReflectionVTable::VisitAttrs(tvm::runtime::Object*, tvm::AttrVisitor*) const                                                                                       
  1: tvm::FieldDependencyFinder::Visit(char const*, tvm::runtime::ObjectRef*)                                                                                                
  0: void tvm::FieldDependencyFinder::ParseValue<unsigned long>(char const*, unsigned long*) const                                                                           
  File "../src/node/serialization.cc", line 291                                                                                                                              
JSONReader: cannot find field axis  
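The traceback shows why deepcopy can fail here: cloning a TVM object round-trips it through the JSON serializer (`__getstate__` saves the node graph, `__setstate__` rebuilds it via `LoadJSON`), so any node the deserializer cannot fully reconstruct breaks the copy. A minimal stdlib sketch of that failure mode (the `Node` class and its required `axis` field are illustrative, not TVM's actual serialization code):

```python
import copy
import json

class Node:
    # Mimics tvm.runtime.Object: deepcopy goes through a JSON
    # round-trip, so deserialization errors surface inside deepcopy.
    REQUIRED = ("axis",)

    def __init__(self, **fields):
        self.fields = fields

    def __getstate__(self):
        # Analogous to SaveJSON: serialize the node to a string handle.
        return {"handle": json.dumps(self.fields)}

    def __setstate__(self, state):
        # Analogous to LoadJSON: rebuild the node, failing if a
        # required field is missing from the serialized form.
        fields = json.loads(state["handle"])
        for name in self.REQUIRED:
            if name not in fields:
                raise ValueError(f"JSONReader: cannot find field {name}")
        self.fields = fields

good = Node(axis=1)
assert copy.deepcopy(good).fields == {"axis": 1}

bad = Node(op="layout_transform")  # no "axis" field serialized
try:
    copy.deepcopy(bad)
except ValueError as e:
    print(e)  # JSONReader: cannot find field axis
```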

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021
…k extraction. (apache#8143)

* Add workaround to alter op layout bug in task extraction.

* Only copy mod.
@jwfromm jwfromm deleted the avoid_alter_op_bug branch April 12, 2023 15:54