New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoTVM][AutoScheduler] Add workaround to alter op layout bug in task extraction. #8143
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
opt_mod, _ = relay.optimize(mod, target, params) | ||
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates | ||
# source module is fixed. Until then, create a clone. | ||
mod_clone = deepcopy(mod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be lifted out of the try-catch block so that L79 can be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually need to have both. The problem is that in the first try we attempt to apply optimize
, which can mutate the source module. Then if that fails, we try to use compiler.lower
, which again can mutate the source module. If we tried to apply compiler.lower
to mod_clone
after optimize
without a second copy, we could hit an error due to invalid shapes from alter_op_layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see...that's what you meant by the source module was mutated. Yeah this is definitely a bug to be fixed and this is a reasonable workaround.
opt_mod, _ = relay.optimize(mod, target, params) | ||
# TODO(jwfromm) Remove this once AlterOpLayout bug that mutates | ||
# source module is fixed. Until then, create a clone. | ||
mod_clone = deepcopy(mod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! @jwfromm This looks good.
Just I'm thinking that does AutoScheduler really have such bug? In my mind, AutoScheduler trends to choose simple strategies without AlterOpLayout.
AlterOpLayout is applied during task extraction so it definitely has this bug. Try autoscheduling the linked yolo model and you'll encounter it without this fix. |
170f871
to
2703705
Compare
…k extraction. (apache#8143) * Add workaround to alter op layout bug in task extraction. * Only copy mod.
I'm getting a strange error during task extraction after this commit. Something bad happens during
|
…k extraction. (apache#8143) * Add workaround to alter op layout bug in task extraction. * Only copy mod.
…k extraction. (apache#8143) * Add workaround to alter op layout bug in task extraction. * Only copy mod.
There's been a long-known issue where sometimes during alter_op_layout, the source IRModule is mutated. Sometimes this can cause errors during task extraction. One example model where the issue pops up is yolov3-tiny. An easy workaround to avoid this bug is making a copy of the input module before applying optimization passes. This PR adds a copy step to both autotvm and auto_scheduler. I'm not sure what tests to add since the bug is extremely difficult to pin down. It does trigger with the above linked yolo model though.