Remove compile_engine.h for real this time. #8775

Closed
wants to merge 2 commits

Conversation

jroesch
Member

@jroesch commented Aug 17, 2021

This removes the compile_engine.h header, enabling us to start refactoring the te_compiler.h interface to better match the rest of the compiler. After we land the final refactors from @mikepapadim and @mbs-octoml, we can start cleaning up the interface to better match the desired end state.

cc @electriclilies @csullivan @tmoreau89

@junrushao
Member

CC @icemelon @comaniac @yzhliu

Comment on lines -282 to -335
TVM_REGISTER_PASS_CONFIG_OPTION("relay.backend.use_auto_scheduler", Bool);
TVM_REGISTER_PASS_CONFIG_OPTION("relay.backend.disable_compile_engine_cache", Bool);

TVM_REGISTER_GLOBAL("relay.backend._make_LoweredOutput")
.set_body_typed([](tvm::Array<te::Tensor> outputs, OpImplementation impl) {
return LoweredOutput(outputs, impl);
});

TVM_REGISTER_GLOBAL("relay.backend._make_CCacheKey")
.set_body_typed([](Function source_func, Target target) {
return CCacheKey(source_func, target);
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineGlobal").set_body_typed([]() {
return CompileEngine::Global();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineClear").set_body_typed([](CompileEngine self) {
self->Clear();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineLower")
.set_body_typed([](CompileEngine self, CCacheKey key, const String mod_name) {
return self->Lower(key, mod_name);
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineLowerShapeFunc")
.set_body_typed([](CompileEngine self, CCacheKey key) { return self->LowerShapeFunc(key); });

TVM_REGISTER_GLOBAL("relay.backend._CompileLowerExternalFunctions")
.set_body_typed([](CompileEngine self) { return self->LowerExternalFunctions(); });

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineJIT")
.set_body_typed([](CompileEngine self, CCacheKey key) { return self->JIT(key); });

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineListItems").set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->ListItems();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineListShapeFuncItems")
.set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->ListShapeFuncItems();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineGetCurrentCCacheKey")
.set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->GetCurrentCCacheKey();
});
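For context, these registered names are what the Python side reaches through TVM's FFI. A minimal sketch of that lookup, which only works on a tree from before this PR (the names are copied verbatim from the diff above):

import tvm

# On a tree that still contains compile_engine.cc, the globals registered
# above can be fetched directly through the FFI; after this PR the lookups
# below fail because the symbols no longer exist.
jit = tvm.get_global_func("relay.backend._CompileEngineJIT")
lower = tvm.get_global_func("relay.backend._CompileEngineLower")
engine = tvm.get_global_func("relay.backend._CompileEngineGlobal")()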
Contributor

Can we document the corresponding APIs in this PR? For example, I couldn't find the alternative to relay.backend._CompileEngineJIT.

Member

There is a JIT method in src/relay/backend/te_compiler.h. Is that the one we want?

Contributor

That's the one, but it no longer has a TVM global symbol. Does that mean these functions will only be accessible from C++ by including te_compiler.h in the future?

Member Author

The push for the new lowering is to refactor away single-op lowering in favor of lowering the entire program in one shot. Unless I'm missing something fundamental, all uses of the compile engine have been moved to C++ or removed. The only occurrences left are in testing code.

@junrushao
Member

I know there is some dependency on the Relay compile engine in other projects. Shall we slow down a little and wait for some consensus? Thanks a lot!

@icemelon
Member

Is this still WIP? Otherwise, how will this PR pass CI?

@jroesch
Member Author

jroesch commented Aug 18, 2021

No, this builds fine locally. After the last interpreter patch went in, there is no longer any need for the compile_engine.h header. Most of the uses have been refactored away; there is a little bit of cleanup left in a few tests, which I will follow up on.

@junrushao
Member

I know there is a strong dependency on the compile engine in other projects, and I would like to propose several actions we could take to make sure the process works properly for both parties.

  1. @comaniac @icemelon would you like to provide a list of the APIs that are currently in use, so we can discuss how to replace them with the new TE compiler?

  2. We can discuss the possibility of moving all mainline workflows away from the compile engine, making it obsolete but keeping the code as it is for a while. That chunk of code would be unused and untested, but kept for several release cycles to make sure downstream projects are not affected. This is what we did for NNVM.

@jroesch
Member Author

jroesch commented Aug 18, 2021

@junrushao1994 We can keep an adaptor interface for a while longer. The only challenge is that at some point we are going to have to refactor the internals to handle other requirements from ARM and other folks, so the implementation might change even if we preserve the interface. Happy to discuss more tomorrow.

@junrushao
Member

@jroesch @comaniac @icemelon you guys can discuss this together without me :-) I am very supportive of the new lowering process, which makes the logic much clearer, but I am aware of the challenges for downstream projects as well.

@mbs-octoml
Contributor

LGTM. This is dead code in-tree; it appears we are uncertain about out-of-tree uses, for which AFAICT s/CompileEngine/TECompiler/g would work. Could we hot-swap in this PR?

@comaniac
Contributor

I've checked our use cases and here is my summary:

  • The use of APIs such as _CompileEngineJIT can be replaced directly by te_compiler.JIT, so it should be safe to remove them.
  • You might need to work on compile_engine.py as well?
  • I have a use case that has to be done in Python for various reasons. This case uses compile_engine.get().lower(func, target) in compile_engine.py to get a CachedFunc and then reads its output TE tensors (sketched below). Note that I cannot use lower_call in compile_engine.py because I don't have the input tensors. It would be great if, when you change this file to te_compiler.py, most of these APIs (especially te_compiler.lower) were preserved.
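For concreteness, a minimal sketch of that workflow against a pre-PR tree that still ships compile_engine.py (it mirrors the pattern used in the existing compile engine tests; the trivial function below is just an illustration):

import tvm
from tvm import relay
from tvm.relay.backend import compile_engine

# A trivial, type-checked Relay function to lower.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))
mod = relay.transform.InferType()(mod)

# The usage described above: lower to a CachedFunc, then read its TE outputs.
engine = compile_engine.get()
cached_func = engine.lower(mod["main"], "llvm")
for tensor in cached_func.outputs:
    print(tensor)  # the TE compute for each output tensor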

@junrushao
Member

Thanks @comaniac for the list! It is definitely helpful for narrowing down the scope of the discussion. It looks like we are pretty close to a solution for hot-swapping the out-of-tree implementation :-)

The first and second points look relatively okay. On the second point, we might just want to remove compile_engine.py as well.

@jroesch would you like to respond to the last point? It looks doable.

@comaniac
Contributor

I guess we cannot simply remove compile_engine.py; we need to rename and refactor it, because some functions there, such as lower_call and select_implementation, are still used by the TE compiler.

@electriclilies
Contributor

LGTM, it's great to see that we've finally reached this milestone in the refactoring work!

@comaniac Since we are trying to unify lowering around one API, I think that lowering function will need to be removed at some point (though @jroesch can comment better on when that needs to happen). You'll still be able to lower through the function LowerTE.

Also, can you elaborate a bit more on your use case where you don't have the input tensors? Are you allocating inside the function you are lowering? Thanks in advance

@junrushao
Member

junrushao commented Aug 19, 2021

As the moderator of this thread, I do expect sufficient discussion, since this is an architectural change. Let's keep the discussion purely technical, as we always enjoy in the community :-)

We have already done great work moving to the TE compiler. It seems there are some final sticky points we want to address.

Yep, Cody is right and I was wrong. I just checked, and several Python functions registered in compile_engine.py are still in use by the TE compiler. We might want to move them to te_compiler.py.

Lily, Cody is referring to some methods implemented in Python instead of C++. Although this doesn't actually cause divergent code paths right now, we are not in favor of mixing C++ and Python when there is no specific reason to. So, as you said, we might want to migrate them to C++ some day.

The input tensors are not exposed by methods in the TE compiler, which I assume is fixable? Would you guys like to discuss this more specifically? Do we want to slightly tweak the interface or add a new API to make it happen? Would love to hear concrete proposals from you! Thanks a lot!

@comaniac
Contributor

In the TE compiler, the input tensors are created internally by ScheduleBuilder. Thus, it could be a common case that developers want to lower a Relay function to TE but don't have tensor inputs (i.e., te.placeholder instead of relay.var). IMHO, one exciting part of the compile engine refactoring is decoupling TE lowering from scheduling, so that developers can get the lowered TE before scheduling. It would be great if we could at least expose an interface like the following to developers:

te_func = lower(relay_func, target)
tensors = te_func.outputs
for tensor in tensors:
    print(tensor) # Print the TE compute AST before scheduling.

@junrushao
Member

@comaniac This proposal sounds good to me, in terms of minimal API change and consistency with the TE compiler's design philosophy.

@jroesch Would you like to take a look at the proposal and give some feedback? It would be nice to act quickly and unblock both sides.

@jroesch
Member Author

jroesch commented Aug 19, 2021

@comaniac Hey Cody,

Sorry for the slow response, it's been a busy week. Let me lay out all my assumptions and mental models, and see if we can go from there:

  1. I think exposing compile_engine.py was a mistake made at some point in the past; we intentionally made compile_engine.h a private header so that it would not be part of the public interface. Because the Python bindings made it implicitly public, we now need to deal with that.
  2. I think we should have only one endorsed way to control scheduling, and a single unified lowering mechanism. A lot of the pain for accelerator flows, BYOC, etc. comes from the fact that there are many ways people use the "compile engine style API".
  3. My intention is to eventually make it impossible to lower functions one by one (i.e., JIT/Lower/LowerInternal), as it doesn't match any of the current compilation flows well and causes issues.
  4. We can split out the internals of TECompiler to provide a functional API for producing tensors from a Relay function, which would then be used by the compiler and potentially by you for your use case.
  5. Ideally, the end state is that TECompiler itself is part of the private interface and not exposed in Python.

I am happy to support use cases along the way to this world if I better understand the critical need. I do agree that we need to move the few registered helper functions, but other than that compile_engine is dead code. All the other uses in the code base are just calls to turn off the cache (sketched below). Currently graph_executor, vm, aot, and the interpreter have all moved off that API.
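For context, "turning off the cache" means setting the relay.backend.disable_compile_engine_cache option that the removed block above registers. A minimal sketch of that pattern, assuming a tree where the option is still registered:

import tvm
from tvm import relay

# A trivial module to build; the PassContext option below disables the
# compile engine cache, which is roughly all the remaining call sites do.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))

with tvm.transform.PassContext(
    opt_level=3, config={"relay.backend.disable_compile_engine_cache": True}
):
    lib = relay.build(mod, target="llvm")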

My main question is: which use cases really need the existing API, and would more fine-grained helper APIs let you do the customization you want?

Thanks for the feedback!

@comaniac
Contributor

Thanks for the response :)

I completely understand the concern about unifying the lowering mechanism and feel this is the right track as well. Meanwhile, I have two cases:

  1. We do lower Relay functions one by one with JIT. It seems to me that this might be a reasonable exception, because we could naturally have a JIT compiler that compiles each function the first time it is visited. Meanwhile, the current Relay interpreter wraps each function in a single module and performs lowering, which seems like an unnecessary workaround. cc @icemelon @yzhliu

  2. The case of using Python is relatively minor. Specifically, if we lower a Relay function using the TE compiler directly in C++, we have no way to configure AutoTVM and the auto-scheduler (e.g., tracing and logging mode), since both tuning frameworks have logic that lives purely in Python (see the sketch below).
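For context, the Python-only configuration referred to in point 2 follows the standard auto-scheduler build flow. A rough sketch with placeholder inputs (a trivial module, an llvm target, and a tuning log that is assumed to already exist); the relay.backend.use_auto_scheduler option is the one registered in the removed block above:

import tvm
from tvm import auto_scheduler, relay

# Placeholder inputs for the sketch.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))
target = tvm.target.Target("llvm")
log_file = "tuning_records.json"  # assumed to already exist

# Both steps below happen purely on the Python side: replaying tuning
# records and switching lowering over to auto-scheduler schedules.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target)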

Finally, while I agree that the TE compiler should be as private as possible, it doesn't hurt to keep the Python interface for debugging and experiments. We just need to make sure it is only used for development and that no one relies on it in submitted PRs.

@mbs-octoml
Contributor

Meanwhile, the current Relay interpreter wraps each function in a single module and performs lowering.

Not any more! I just rejigged the interpreter to work module-at-a-time. However, the TIR-to-PackedFunc 'build' step is still one-function-at-a-time and done on demand, but I think at some point we can similarly fold that into a module-at-a-time transform.

@junrushao
Member

Okay, I didn’t see any actual conflict here. Shall we just add a few helpers and proceed?

@mikepapadim
Contributor

mikepapadim commented Oct 18, 2021

[UPDATE] The removal of the compile engine is now up to date in #9282.
Please take a look to see whether we have consensus on what functionality is left exposed in this intermediate refactoring step.

@jroesch
Member Author

jroesch commented Oct 19, 2021

Closing in favor of @mikepapadim's change.

@jroesch jroesch closed this Oct 19, 2021
masahi pushed a commit that referenced this pull request Oct 22, 2021
…p of #8775 (#9282)

* Remove compile_engine.h for real

* Fix format

* RM compile_engine.cc

* Swap compile engine with TECompiler

* Cleanup on compile engine py leftovers

* [WIP] Exposing legacy compile engine capabilities through TE Compiler

* Swap usages of the deprecated compile engine with the TE compiler

* Track and replace usages of compile engine refactor them to TE compiler

* [Docs] Log helper mod

* Remove deprecated function for looking up the compile engine cache

* Fix typos

* Debug misc cleanups

* Register global pass for using te compiler for auto scheduler

* Fix tests using the legacy compile engine

* Fix broken autotuner tests and minor cleanups

* Swap compile engine with te_compiler in rst config

* PR nits

* Fix failed test

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
ylc pushed the same commit to ylc/tvm, referencing this pull request, on Jan 7, 2022 and again on Jan 13, 2022.