
Remove tophub from being the default configuration when compiling #59

Open · wants to merge 3 commits into main
Conversation

tkonolige (Contributor)

Prompted by apache/tvm#10474, this RFC proposes removing tophub from the default compilation workflow. If you rely on tophub being the implicit default, please comment on this RFC.

@masahi @comaniac
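
For background, the "implicit default" being discussed is roughly that `relay.build` wraps compilation in a TopHub context whenever no AutoTVM dispatch context has been installed by the user. A paraphrased sketch of that behavior, not the exact TVM source (`contextlib.nullcontext` is a stand-in for "do nothing"):

```python
import contextlib
from tvm import autotvm

def _implicit_tophub_context(target):
    # If the user has not installed any AutoTVM dispatch context, TopHub
    # fallback configurations are applied implicitly; otherwise the user's
    # own context wins.
    if isinstance(autotvm.DispatchContext.current, autotvm.FallbackContext):
        return autotvm.tophub.context(target)
    return contextlib.nullcontext()
```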

# Unresolved questions
[unresolved-questions]: #unresolved-questions

Will there be any actual performance impact in removing tophub from being the default?
Contributor

Based on the CI, we at least know that the MobileNet workloads on GPU require TopHub schedules to compile. The default schedules are invalid for some GPUs (e.g., the one used in CI) and result in compilation failures. Given this, we cannot guarantee that default schedules are valid for all workloads on GPU, and TopHub more or less helps cover this gap.

Thus, if we go with this RFC, my concern is how we would port the TopHub schedules to become the defaults. I could imagine we end up with decision-tree-like logic for each op to dispatch the desired default schedule, which would make TOPI even more ad hoc and complicated. I would definitely like to hear others' feedback on this.

Contributor Author

Do the default schedules not work for the specific GPU on CI, or do they not work for any GPU? My guess is they don't work for any GPU because they have never been tested, and they haven't been tested because using tophub by default means the default schedules never get exercised. To me this seems like a good reason not to use tophub by default.

Branching could be difficult. How many different targets are used in the tophub schedules?

Contributor

I might be wrong, but my impression is that a default schedule must have worked for at least one GPU at the time it was committed, because there is no point in adding an invalid schedule (though a once-valid schedule may no longer be valid...).

I also feel that branching is difficult. IIRC, TopHub CUDA covers 3-4 GPUs, and once we rely on default schedules, any future "default" schedules have to live in TOPI as well.

In short, I feel the easiest path for now is 1) removing the tophub context as you suggested, 2) making sure the default schedules can pass CI, and meanwhile 3) adding a deprecation warning.

cc @tqchen @masahi @junrushao1994 @areusch

Contributor

yeah a couple thoughts:

  • right now we have no way to assert that a particular schedule is actually selected or used by TVM. we ran into this with microTVM's original prototype where we had to hack the tuning log lookup to always return one schedule. if we had such a thing, we could write tests that assert that a particular TopHub schedule was applied and then we could document which GPU it's for.
  • it does seem like, lacking this way of testing (and correct me if i'm missing something--perhaps we do have a way to test this), we will be forced to make a hard cutover to a world where we document which hardware each schedule should compile on via some type of test
  • whatever we do here, there should be a very easy way to get back the old behavior since it's likely some folks will need to use TopHub
  • the root problem @tkonolige set out to address was differences in compiler output between different entry points. it seems like whatever we do above, we should absolutely make sure we shore up that discrepancy.

Contributor Author

  • right now we have no way to assert that a particular schedule is actually selected or used by TVM. we ran into this with microTVM's original prototype where we had to hack the tuning log lookup to always return one schedule. if we had such a thing, we could write tests that assert that a particular TopHub schedule was applied and then we could document which GPU it's for.
    That would be very nice to have. It'll be a little bit of work to pipe through though.
  • it does seem like, lacking this way of testing (and correct me if i'm missing something--perhaps we do have a way to test this), we will be forced to make a hard cutover to a world where we document which hardware each schedule should compile on via some type of test
  • whatever we do here, there should be a very easy way to get back the old behavior since it's likely some folks will need to use TopHub
    `with autotvm.tophub.context(target)` is the way to do that (added this to the RFC); see the sketch below.
  • the root problem @tkonolige set out to address was differences in compiler output between different entry points. it seems like whatever we do above, we should absolutely make sure we shore up that discrepancy.
    This is one part of the issue I've been trying to solve. The other is behavior that depends on content external to the TVM repository.
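
For concreteness, a minimal sketch of that explicit opt-in path, assuming the existing `autotvm.tophub.context` API (the workload and target below are illustrative):

```python
from tvm import autotvm, relay
from tvm.relay import testing

# Illustrative workload and target.
mod, params = testing.resnet.get_workload(batch_size=1)
target = "cuda"

# Explicitly apply TopHub fallback configurations instead of relying on an
# implicit default inside relay.build.
with autotvm.tophub.context(target):
    lib = relay.build(mod, target=target, params=params)
```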

Contributor

i think @mehrdadh is working on a test framework that selects a specific schedule for an operator. perhaps that may help?
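
For reference, one rough approximation of the "hack the tuning log lookup" workaround mentioned above is to apply a single-record AutoTVM log so that only that record can be consulted. A hedged sketch, with an illustrative log path and workload; note this still does not assert that the record was actually applied, which is the gap such a test framework would close:

```python
from tvm import autotvm, relay
from tvm.relay import testing

# Illustrative workload.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# "single_record.log" is an illustrative path to an AutoTVM log that contains
# exactly one tuning record for the operator under test.
with autotvm.apply_history_best("single_record.log"):
    lib = relay.build(mod, target="cuda", params=params)
```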


Will there be any actual performance impact in removing tophub from being the default?

Should we provide a deprecation notice, or a change notice, or no notice at all?
Contributor

If we go with this RFC, I vote for a WARNING in the graph executor saying that TopHub is no longer applied automatically, please xxx (provide a link or describe how to bring TopHub back if users want it). We could further discuss how long the warning should stay. Formally, it should be there for at least one release cycle (0.9 -> 1.0), but we could make it shorter if everyone agrees.
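
Purely as an illustration of the shape such a notice could take (the wording, placement, and function name below are placeholders, not part of the RFC):

```python
import warnings

def warn_tophub_not_applied():
    # Placeholder wording; the actual message, placement (e.g. the graph
    # executor build path), and lifetime (at least one release cycle) are
    # decisions for this RFC.
    warnings.warn(
        "TopHub fallback configurations are no longer applied automatically. "
        "Wrap compilation in `with autotvm.tophub.context(target):` to "
        "restore the previous behavior.",
        DeprecationWarning,
        stacklevel=2,
    )
```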

tkonolige (Contributor Author)

I ran with and without tophub on a selection of models in tvm.relay.testing:

Execution time summary (all times in ms):

| Model        | Config         | Mean    | Median  | Max     | Min    | Std     |
|--------------|----------------|---------|---------|---------|--------|---------|
| resnet       | with TopHub    | 0.5029  | 0.1609  | 1.1974  | 0.1431 | 0.4449  |
| resnet       | without TopHub | 0.0885  | 0.0842  | 0.1020  | 0.0823 | 0.0072  |
| densenet     | with TopHub    | 1.7610  | 1.9728  | 1.9733  | 0.9131 | 0.4239  |
| densenet     | without TopHub | 4.3721  | 5.5795  | 5.5805  | 0.5085 | 1.9678  |
| inception_v3 | with TopHub    | 4.5442  | 6.2537  | 6.2550  | 0.3917 | 2.3228  |
| inception_v3 | without TopHub | 43.6982 | 64.7263 | 64.9195 | 0.3586 | 26.9297 |
| mlp          | with TopHub    | 0.0184  | 0.0185  | 0.0197  | 0.0172 | 0.0010  |
| mlp          | without TopHub | 0.0154  | 0.0152  | 0.0167  | 0.0148 | 0.0007  |
| mobilenet    | with TopHub    | 0.0872  | 0.0832  | 0.1042  | 0.0816 | 0.0086  |
| mobilenet    | without TopHub | 0.0844  | 0.0840  | 0.0896  | 0.0816 | 0.0028  |
| squeezenet   | with TopHub    | 0.1058  | 0.1051  | 0.1103  | 0.1033 | 0.0024  |
| squeezenet   | without TopHub | 0.1056  | 0.1051  | 0.1084  | 0.1040 | 0.0016  |
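
For context, a minimal sketch of how timings like these can be collected with the graph executor's `benchmark()` helper (the model, batch size, and target are illustrative; the exact script and the mechanism used to disable TopHub in the runs above are not reproduced here):

```python
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay import testing

# Illustrative workload; the numbers above cover several tvm.relay.testing models.
mod, params = testing.resnet.get_workload(batch_size=1)
target = "cuda"
dev = tvm.cuda(0)

lib = relay.build(mod, target=target, params=params)
module = graph_executor.GraphModule(lib["default"](dev))

# Prints an "Execution time summary" with mean/median/max/min/std columns
# like the table above.
print(module.benchmark(dev))
```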

comaniac (Contributor) commented Mar 8, 2022

I ran with and without tophub on a selection of models in tvm.relay.testing:

Thanks for the experiments. What's your target device?

tkonolige (Contributor Author)

@comaniac rtx-3070

areusch (Contributor) left a comment

it would be helpful to lay out:

  1. what's the API being added here to let people specify TopHub? How should people explicitly enable it after this RFC lands?
  2. how are we shoring up the config differences between VMCompiler.optimize and VMCompiler.lower? i assume we answer this with (1), but then we need to explain the impact in the RFC PR itself.
