Remove compile_engine.h for real this time. #8775

Closed
wants to merge 2 commits

Conversation

jroesch
Member

@jroesch commented Aug 17, 2021

This removes the compile_engine.h header, enabling us to start refactoring the te_compiler.h interface to better match the rest of the compiler. After we land the final refactors from @mikepapadim and @mbs-octoml, we can start cleaning up the interface to better match the desired end state.

cc @electriclilies @csullivan @tmoreau89

@junrushao
Member

CC @icemelon @comaniac @yzhliu

Comment on lines -282 to -335
TVM_REGISTER_PASS_CONFIG_OPTION("relay.backend.use_auto_scheduler", Bool);
TVM_REGISTER_PASS_CONFIG_OPTION("relay.backend.disable_compile_engine_cache", Bool);

TVM_REGISTER_GLOBAL("relay.backend._make_LoweredOutput")
.set_body_typed([](tvm::Array<te::Tensor> outputs, OpImplementation impl) {
return LoweredOutput(outputs, impl);
});

TVM_REGISTER_GLOBAL("relay.backend._make_CCacheKey")
.set_body_typed([](Function source_func, Target target) {
return CCacheKey(source_func, target);
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineGlobal").set_body_typed([]() {
return CompileEngine::Global();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineClear").set_body_typed([](CompileEngine self) {
self->Clear();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineLower")
.set_body_typed([](CompileEngine self, CCacheKey key, const String mod_name) {
return self->Lower(key, mod_name);
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineLowerShapeFunc")
.set_body_typed([](CompileEngine self, CCacheKey key) { return self->LowerShapeFunc(key); });

TVM_REGISTER_GLOBAL("relay.backend._CompileLowerExternalFunctions")
.set_body_typed([](CompileEngine self) { return self->LowerExternalFunctions(); });

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineJIT")
.set_body_typed([](CompileEngine self, CCacheKey key) { return self->JIT(key); });

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineListItems").set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->ListItems();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineListShapeFuncItems")
.set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->ListShapeFuncItems();
});

TVM_REGISTER_GLOBAL("relay.backend._CompileEngineGetCurrentCCacheKey")
.set_body_typed([](CompileEngine self) {
CompileEngineImpl* ptr = dynamic_cast<CompileEngineImpl*>(self.operator->());
ICHECK(ptr != nullptr);
return ptr->GetCurrentCCacheKey();
});
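For context, these registered names are what the Python side reaches through TVM's FFI. A minimal sketch of that lookup, which only works on a tree from before this PR (the names are copied verbatim from the diff above):

import tvm

# On a tree that still contains compile_engine.cc, the globals registered
# above can be fetched directly through the FFI; after this PR the lookups
# below fail because the symbols no longer exist.
jit = tvm.get_global_func("relay.backend._CompileEngineJIT")
lower = tvm.get_global_func("relay.backend._CompileEngineLower")
engine = tvm.get_global_func("relay.backend._CompileEngineGlobal")()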
Contributor

Can we document the corresponding APIs in this PR? For example, I couldn't find the alternative to relay.backend._CompileEngineJIT.

Member

There is a JIT method in src/relay/backend/te_compiler.h. Is that the one we want?

Contributor

That's the one, but it no longer has a TVM global symbol. Does that mean these functions will only be accessible from C++ by including te_compiler.h in the future?

Member Author

The push for the new lowering is to refactor away single-op lowering in favor of lowering the entire program in one shot. Unless I'm missing something fundamental, all uses of the compile engine have been moved to C++ or removed. The only occurrences left are in testing code.

@junrushao
Member

I know there is some dependency on the Relay compile engine in other projects. Shall we slow down a little and wait for some consensus? Thanks a lot!

@icemelon
Member

Is this still WIP? Otherwise, how will this PR pass CI?

@jroesch
Member Author

jroesch commented Aug 18, 2021

No, this builds fine locally. After the last interpreter patch went in, there is no longer any need for the compile_engine.h header. Most of the uses have been refactored away; there is a little bit of cleanup left in a few tests, which I will follow up on.

@junrushao
Member

I know there is a strong dependency on the compile engine in other projects, and I would like to propose several actions we could take to make sure the process works properly for both parties.

  1. @comaniac @icemelon would you like to provide a list of the APIs that are currently in use, so we can discuss how to replace them with the new TE compiler?

  2. We can discuss the possibility of moving all mainline workflows away from the compile engine, making it obsolete but keeping the code as it is for a while. That chunk of code would be unused and untested, but kept for several release cycles to make sure downstream projects are not affected. This is what we did for NNVM.

@jroesch
Member Author

jroesch commented Aug 18, 2021

@junrushao1994 We can keep an adaptor interface for a while longer. The only challenge is that at some point we are going to have to refactor the internals to handle other requirements from ARM and other folks, so the implementation might change even if we preserve the interface. Happy to discuss more tomorrow.

@junrushao
Member

@jroesch @comaniac @icemelon you guys can discuss this together without me :-) I am very supportive of the new lowering process, which makes the logic much clearer, but I am aware of the challenges for downstream projects as well.

@mbs-octoml
Contributor

LGTM. This is dead code in-tree; it appears we are uncertain about out-of-tree uses, for which AFAICT s/CompileEngine/TECompiler/g would work. Could we hot-swap in this PR?

@comaniac
Contributor

I've checked our use cases and here is my summary:

  • The use of APIs such as _CompileEngineJIT can be replaced directly by te_compiler.JIT, so it should be safe to remove them.
  • You might need to work on compile_engine.py as well?
  • I have a use case that has to be done in Python for various reasons. This case uses compile_engine.get().lower(func, target) in compile_engine.py to get a CachedFunc and then reads its output TE tensors (sketched below). Note that I cannot use lower_call in compile_engine.py because I don't have the input tensors. It would be great if, when you change this file to te_compiler.py, most of these APIs (especially te_compiler.lower) were preserved.
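For concreteness, a minimal sketch of that workflow against a pre-PR tree that still ships compile_engine.py (it mirrors the pattern used in the existing compile engine tests; the trivial function below is just an illustration):

import tvm
from tvm import relay
from tvm.relay.backend import compile_engine

# A trivial, type-checked Relay function to lower.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))
mod = relay.transform.InferType()(mod)

# The usage described above: lower to a CachedFunc, then read its TE outputs.
engine = compile_engine.get()
cached_func = engine.lower(mod["main"], "llvm")
for tensor in cached_func.outputs:
    print(tensor)  # the TE compute for each output tensor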

@junrushao
Member

Thanks @comaniac for the list! It is definitely helpful for narrowing down the scope of the discussion. It looks like we are pretty close to a solution for hot-swapping the out-of-tree implementation :-)

The first and second points look relatively okay. On the second point, we might just want to remove compile_engine.py as well.

@jroesch would you like to respond to the last point? It looks doable.

@comaniac
Contributor

I guess we cannot simply remove compile_engine.py; we need to rename and refactor it, because some functions there, such as lower_call and select_implementation, are still used by the TE compiler.

@electriclilies
Contributor

LGTM, it's great to see that we've finally reached this milestone in the refactoring work!

@comaniac Since we are trying to unify lowering around one API, I think that lowering function will need to be removed at some point (though @jroesch can comment better on when that needs to happen). You'll still be able to lower through the function LowerTE.

Also, can you elaborate a bit more on your use case where you don't have the input tensors? Are you allocating inside the function you are lowering? Thanks in advance

@junrushao
Member

junrushao commented Aug 19, 2021

As the moderator of this thread, I do expect sufficient discussion, since this is an architectural change. Let's keep the discussion purely technical, as we always enjoy in the community :-)

We have already done great work moving to the TE compiler. It seems there are some final sticky points we want to address.

Yep, Cody is right and I was wrong. I just checked, and several Python functions registered in compile_engine.py are still in use by the TE compiler. We might want to move them to te_compiler.py.

Lily, Cody is referring to some methods implemented in Python instead of C++. Although this doesn't actually cause divergent code paths right now, we are not in favor of mixing C++ and Python when there is no specific reason to. So, as you said, we might want to migrate them to C++ some day.

The input tensors are not exposed by methods in the TE compiler, which I assume is fixable? Would you guys like to discuss this more specifically? Do we want to slightly tweak the interface or add a new API to make it happen? Would love to hear concrete proposals from you! Thanks a lot!

@comaniac
Contributor

In the TE compiler, the input tensors are created internally by ScheduleBuilder. Thus, it could be a common case that developers want to lower a Relay function to TE but don't have tensor inputs (i.e., te.placeholder instead of relay.var). IMHO, one exciting part of the compile engine refactoring is decoupling TE lowering from scheduling, so that developers can get the lowered TE before scheduling. It would be great if we could at least expose an interface like the following to developers:

te_func = lower(relay_func, target)
tensors = te_func.outputs
for tensor in tensors:
    print(tensor) # Print the TE compute AST before scheduling.

@junrushao
Member

@comaniac This proposal sounds good to me, in terms of minimal API change and consistency with the TE compiler's design philosophy.

@jroesch Would you like to take a look at the proposal and give some feedback? It would be nice to act quickly and unblock both sides.

@jroesch
Member Author

jroesch commented Aug 19, 2021

@comaniac Hey Cody,

Sorry for the slow response, it's been a busy week. Let me lay out all my assumptions and mental models, and see if we can go from there:

  1. I think exposing compile_engine.py was a mistake made at some point in the past; we intentionally made compile_engine.h a private header so that it would not be part of the public interface. Because the Python bindings made it implicitly public, we now need to deal with that.
  2. I think we should have only one endorsed way to control scheduling, and a single unified lowering mechanism. A lot of the pain for accelerator flows, BYOC, etc. comes from the fact that there are many ways people use the "compile engine style API".
  3. My intention is to eventually make it impossible to lower functions one by one (i.e., JIT/Lower/LowerInternal), as it doesn't match any of the current compilation flows well and causes issues.
  4. We can split out the internals of TECompiler to provide a functional API for producing tensors from a Relay function, which would then be used by the compiler and potentially by you for your use case.
  5. Ideally, the end state is that TECompiler itself is part of the private interface and not exposed in Python.

I am happy to support use cases along the way to this world if I better understand the critical need. I do agree that we need to move the few registered helper functions, but other than that compile_engine is dead code. All the other uses in the code base are just calls to turn off the cache (sketched below). Currently graph_executor, vm, aot, and the interpreter have all moved off that API.
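For context, "turning off the cache" means setting the relay.backend.disable_compile_engine_cache option that the removed block above registers. A minimal sketch of that pattern, assuming a tree where the option is still registered:

import tvm
from tvm import relay

# A trivial module to build; the PassContext option below disables the
# compile engine cache, which is roughly all the remaining call sites do.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))

with tvm.transform.PassContext(
    opt_level=3, config={"relay.backend.disable_compile_engine_cache": True}
):
    lib = relay.build(mod, target="llvm")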

My main question is: which use cases really need the existing API, and would more fine-grained helper APIs let you do the customization you want?

Thanks for the feedback!

@comaniac
Contributor

Thanks for the response :)

I completely understand the concern about unifying the lowering mechanism and feel this is the right track as well. Meanwhile, I have two cases:

  1. We do lower Relay functions one by one with JIT. It seems to me that this might be a reasonable exception, because we could naturally have a JIT compiler that compiles each function the first time it is visited. Meanwhile, the current Relay interpreter wraps each function in a single module and performs lowering, which seems like an unnecessary workaround. cc @icemelon @yzhliu

  2. The case of using Python is relatively minor. Specifically, if we lower a Relay function using the TE compiler directly in C++, we have no way to configure AutoTVM and the auto-scheduler (e.g., tracing and logging mode), since both tuning frameworks have logic that lives purely in Python (see the sketch below).
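For context, the Python-only configuration referred to in point 2 follows the standard auto-scheduler build flow. A rough sketch with placeholder inputs (a trivial module, an llvm target, and a tuning log that is assumed to already exist); the relay.backend.use_auto_scheduler option is the one registered in the removed block above:

import tvm
from tvm import auto_scheduler, relay

# Placeholder inputs for the sketch.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))
target = tvm.target.Target("llvm")
log_file = "tuning_records.json"  # assumed to already exist

# Both steps below happen purely on the Python side: replaying tuning
# records and switching lowering over to auto-scheduler schedules.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target)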

Finally, while I agree that the TE compiler should be as private as possible, it doesn't hurt to keep the Python interface for debugging and experiments. We just need to make sure it is only used for development and that no one relies on it in submitted PRs.

@mbs-octoml
Contributor

Meanwhile, the current Relay interpreter wraps each function in a single module and performs lowering.

Not any more! I just rejigged the interpreter to work module-at-a-time. However, the TIR-to-PackedFunc 'build' step is still one-function-at-a-time and done on demand, but I think at some point we can similarly fold that into a module-at-a-time transform.

@junrushao
Member

Okay, I didn’t see any actual conflict here. Shall we just add a few helpers and proceed?

@mikepapadim
Contributor

mikepapadim commented Oct 18, 2021

[UPDATE] The removal of the compile engine is now up to date in #9282.
Please take a look to see whether we have consensus on what functionality is left exposed in this intermediate refactoring step.

@jroesch
Member Author

jroesch commented Oct 19, 2021

Closing in favor of @mikepapadim's change.

@jroesch jroesch closed this Oct 19, 2021
masahi pushed a commit that referenced this pull request Oct 22, 2021
…p of #8775 (#9282)

* Remove compile_engine.h for real

* Fix format

* RM compile_engine.cc

* Swap compile engine with TECompiler

* Cleanup on compile engine py leftovers

* [WIP] Exposing legacy compile engine capabilities through TE Compiler

* Swap usages of the deprecated compile engine with the TE compiler

* Track and replace usages of compile engine refactor them to TE compiler

* [Docs] Log helper mod

* Remove deprecated function for looking up the compile engine cache

* Fix typos

* Debug misc cleanups

* Register global pass for using te compiler for auto scheduler

* Fix tests using the legacy compile engine

* Fix broken autotuner tests and minor cleanups

* Swap compile engine with te_compiler in rst config

* PR nits

* Fix failed test

Co-authored-by: Jared Roesch <roeschinc@gmail.com>
ylc pushed the same commit to ylc/tvm, referencing this pull request, on Jan 7, 2022 and again on Jan 13, 2022.