
Refactor compile_engine to introduce TETranslator #6888

Closed
wants to merge 5 commits

Conversation

mbaret
Contributor

@mbaret mbaret commented Nov 9, 2020

This PR refactors the compile_engine to expose a Relay -> TE translator as a standalone component. See the RFC here.

TODO

  • Agree on names for the API
  • Add further tests for just the TETranslator

@tqchen
Member

tqchen commented Nov 9, 2020

cc @spectrometerHBH @Hzfengsy

@mbaret mbaret changed the title [WIP] Refactor compile_engine to introduce TETranslator Refactor compile_engine to introduce TETranslator Nov 17, 2020
@mbaret
Contributor Author

mbaret commented Nov 18, 2020

cc @comaniac @tqchen could you take a look at this? I'd like to make some progress to see whether there's agreement that this is a useful refactor.

@comaniac
Contributor

In this implementation we need to call lower twice, which I think would introduce unnecessary overhead to the build flow. If your ultimate goal is to lower a Relay function to a TE compute without applying the TOPI schedule, then it looks like we could instead improve #6903. It currently works like the following:

  • Set PassContext.config.relay.backend.use_auto_scheduler to True.
  • In compile_engine, if PassContext.config.relay.backend.use_auto_scheduler is True and there is no auto_scheduler schedule, return an initial schedule (created by te.create_schedule).

To achieve your goal, maybe we can simply set PassContext.config.relay.backend.use_auto_scheduler to True and directly call ScheduleGetter in TVM_REGISTER_GLOBAL("relay.backend._TranslateToTE"). In this case, since we don't provide an auto_scheduler schedule, you'll just get an initial schedule. This way, we won't affect any existing flow but will just provide another path for getting the TE graph. Then we only need to focus on refining messages such as "Cannot find a config for ...".
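The fallback behavior described above can be sketched as plain control flow. This is a minimal illustration in dependency-free Python, assuming the described dispatch; the function and parameter names here are hypothetical and do not match the actual compile_engine internals (in real TVM, the initial schedule would come from te.create_schedule and the flag from PassContext.config).

```python
# Hedged sketch of the dispatch logic described above: when the
# use_auto_scheduler flag is set and no tuned auto_scheduler record exists,
# fall back to an initial (unscheduled) schedule for the TE compute.
# All names here are illustrative, not the actual compile_engine API.

def select_schedule(pass_config, autosched_records, outs,
                    create_initial_schedule, topi_schedule):
    """Pick a schedule for the TE compute `outs`."""
    if pass_config.get("relay.backend.use_auto_scheduler", False):
        record = autosched_records.get(outs)
        if record is None:
            # No tuned schedule found: return the initial schedule,
            # which is equivalent to the bare compute.
            return create_initial_schedule(outs)
        return record
    # Flag not set: fall back to the normal TOPI schedule path.
    return topi_schedule(outs)

if __name__ == "__main__":
    cfg = {"relay.backend.use_auto_scheduler": True}
    sched = select_schedule(cfg, {}, "conv2d_outs",
                            create_initial_schedule=lambda o: ("initial", o),
                            topi_schedule=lambda o: ("topi", o))
    print(sched)  # ('initial', 'conv2d_outs')
```

With the flag unset, the same call would take the TOPI path instead, which is why this gives a second entry point to the TE graph without touching the existing flow.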

In addition, the name "translate_to_te" is misleading to me. First, Relay and TE are not at the same IR level, so "lower" would be more appropriate than "translate". Second, if you prefer to just provide an API that "lowers a Relay function to a TE compute with an initial schedule" instead of "any possible schedule (i.e., a TOPI schedule or an auto_scheduler schedule)", then this API should be something like lower_to_te_compute (the initial schedule is just the same as the compute); otherwise it should be lower_to_te(func, schedule=[initial, autotvm, auto_scheduler]).
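For concreteness, the two API shapes being contrasted above could look like the following stubs. These are hypothetical signatures sketched from the discussion only, with placeholder bodies; neither function exists in TVM under these names.

```python
# Hypothetical stubs illustrating the two proposed API shapes.
# Names, signatures, and return values are illustrative only.

def lower_to_te_compute(relay_func):
    """Lower a Relay function to a bare TE compute.

    The initial schedule is identical to the compute, so no schedule
    argument is needed in this shape of the API.
    """
    # Placeholder body: a real implementation would perform the lowering.
    return ("compute", relay_func)

def lower_to_te(relay_func, schedule="initial"):
    """Lower a Relay function and apply one of several schedule sources."""
    assert schedule in ("initial", "autotvm", "auto_scheduler")
    # Placeholder body: echo the chosen schedule source for illustration.
    return ("compute", relay_func, schedule)
```

The difference is purely in scope: the first commits the API to returning only the compute, while the second keeps scheduling strategy selection inside the lowering call.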

cc @zhiics @merrymercy

@mbaret
Contributor Author

mbaret commented Nov 18, 2020

I agree that just having a default schedule be produced is one such workaround (and it's actually the initial workaround I used). What I didn't like though was how much of a 'hack' it appeared. Doing a trick with the autoscheduler looks like another one of these hacks that will further decrease the readability of the compile_engine.

I have been working on a more extensive refactor to split lower_call into a part that gets the compute and a part that gets the schedule. However, given you need to query OpImplementation for both cases this seems like a fairly arbitrary distinction to make.

I agree that translate_to_te is a pretty poor name; it wasn't intended as much more than a strawman. What I want to convey is that the scheduling process does not necessarily require access to the original Relay function. AutoTVM needs it due to its implementation, but my cascading scheduler should just accept a TE compute. In that case I'd probably prefer lower_to_te_compute.

@mbaret
Contributor Author

mbaret commented Nov 18, 2020

Also, regarding the CI failure: it's failing in a CUDA test and I don't currently have access to a machine that can run it. I'll have a go at debugging it blind to begin with, but otherwise it may be a while before I can get access to the necessary hardware.

@comaniac
Contributor

What I didn't like though was how much of a 'hack' it appeared. Doing a trick with the autoscheduler looks like another one of these hacks that will further decrease the readability of the compile_engine.

This doesn't seem too hacky to me, because what auto_scheduler needs is exactly a compute. We may just need to change use_auto_scheduler to a more general name. Rather than introducing additional overhead by calling lower_call twice, I would prefer this direction.

I have been working on a more extensive refactor to split lower_call into a part that gets the compute and a part that gets the schedule. However, given you need to query OpImplementation for both cases this seems like a fairly arbitrary distinction to make.

Understood; it's not straightforward to separate lower_call. As I mentioned in the RFC, the logic for getting the compute and the schedule is actually tightly coupled. I had the same feeling, which is why #6903 ended up the way it did.

@mbaret
Contributor Author

mbaret commented Nov 25, 2020

Given that the two existing scheduling strategies (AutoTVM/Ansor) can't make particularly good use of this refactor, I'll close this for now and just make it a standalone pass for my use case.

@mbaret mbaret closed this Nov 25, 2020