Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional Target Hooks RFC #10

Merged
merged 5 commits into from
Sep 27, 2021
Merged

Conversation

Mousius
Copy link
Member

@Mousius Mousius commented Jul 14, 2021

This is the an initial RFC for adding additional hooks onto the Target to
allow splitting up some of the compile flow but also unifying the
registration of these additional functions.

This is the an initial RFC for adding additional hooks onto the `Target` to
allow splitting up some of the compile flow but also unifying the
registration of these additional functions.
@Mousius
Copy link
Member Author

Mousius commented Jul 16, 2021

CC: @tqchen @jroesch

I think this overlaps with the TE Compiler work, would be good to get your feedback 😸

@jroesch
Copy link
Member

jroesch commented Aug 2, 2021

Thanks for the work on this, after thinking about it for a while I believe the proposed hooks runs counter to goals of the TECompiler and "unified lowering" refactor that we've been working on in pieces. Our design goal is to not allow arbitrary customization of "lowering" but instead seal it behind a stable interface. The lowering process should be a straight forward mapping from TE -> TIR, and then any necessary customization should be possible in resulting passes which are allowed to view the entire program including both the Relay code and TIR code.

The goal is to remove the need for many slightly different code paths through the central parts of the compiler and instead provide standard interfaces that allow targets, and users to customize the compiler without having to touch any of the flows or modify the data structures. The proliferation of customizations in the old compile engine for example makes it nearly impossible to refactor the compiler in its current state, and has lead to the world where we need lots of per-flow customization because the compiler is insufficiently factored into a common prefix of code.

I plan on providing a longer form document on this hopefully this week that argues this more in depth but I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

Now that doesn't mean the desire to have per target behavior is necessarily bad, but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow. In this world the proposed hooks should return a Pass object which can be inserted after the appropriate phases.

Happy to discuss more, and have others such as @electriclilies @csullivan @mbs-octoml

@Mousius
Copy link
Member Author

Mousius commented Aug 3, 2021

Hi @jroesch,

It'd be great to discuss this further, as there's some interesting points you've raised.

The lowering process should be a straight forward mapping from TE -> TIR, and then any necessary customization should be possible in resulting passes which are allowed to view the entire program including both the Relay code and TIR code.

In the case of library adaptations such as CMSIS-NN, translating through Relay->TE->TIR isn't required as we can do a more straight forward translation directly from Relay->TIR without any of the other pass overheads. In effect, our hook here is a single Pass to run over any marked functions rather than aggregating other Passes so conceptually they're similar.

The goal is to remove the need for many slightly different code paths through the central parts of the compiler and instead provide standard interfaces that allow targets, and users to customize the compiler without having to touch any of the flows or modify the data structures.

I believe we're on the same page here, the eventual goal being to register and use the hooks directly as target properties that can be accessed as part of a standard flow. The hooks proposed here are slightly larger than you envisaged but should allow a standard and straight forward interface for users to customise the flow of the compiler. The initial implementation isn't as clean as I'd like due to the issues you've raised already with the current compile engine.

I plan on providing a longer form document on this hopefully this week that argues this more in depth but I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

The hooks proposed could in fact be such Passes, going IRModule -> IRModule rather than taking a Relay Function. As stated above, it would be more optimal to be given a IRModule without TIR, just with the Relay attached.

Now that doesn't mean the desire to have per target behavior is necessarily bad, but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow. In this world the proposed hooks should return a Pass object which can be inserted after the appropriate phases.

I'm interested to understand how this would appear to users in a way that allows them to compose the compiler passes efficiently, being able to write a pipeline of passes seems more intuitive than providing many hooks to insert at different times? Potentially the hooks should return an array of passes or one aggregate pass to ensure the contract is upheld?

@manupak
Copy link
Contributor

manupak commented Aug 4, 2021

Hi @jroesch @Mousius ,

I think having Pass'es (instead of functions) could also work.

I guess what we are after is the ability to assemble and re-use TVM's compilation passes on a target basis.

apache/tvm#8110 (comment)
Continuing the above comment, this also removes various places otherwise devs need to insert custom passes. Therefore, centralizing them closer to the target seems much cleaner to me. Therefore, it becomes straightforward to follow what passes are being run in the lowering for a given target while also not running passes that are not relevant (that are currently run anyway due to them having identity effect on IRModule in the absense of special attr/nodes/vars in the IRModule).

I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

I think we broadly agrees on this. @Mousius ?

The proliferation of customizations in the old compile engine for example makes it nearly impossible to refactor the compiler in its current state, and has lead to the world where we need lots of per-flow customization because the compiler is insufficiently factored into a common prefix of code.

While we agree that its better to share the many passes that are required, but does it need to mandate a monolith of TIR passes being applied irrespective of the target ? We'd like to hear you thoughts (I think this has a connection to the so-called phases in lower implementation, right ?). While we could codify the requirement, is it absolutely necessary ? Also will it (having target based pass pipeline) preclude us from having the code factored into a common prefix ?

but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow.

The arbitrary function mentioned here could be a pass pipeline (and we could make it a requirement by making it a Pass as you suggested). However, it is interesting know if we can define a universally 'correct' time for the flow considering targets beyond CPUs and GPUs (w/o needing to do inverse lowering to recreate high-level abstraction that was forced lowered due to mandatory passes). We would love to hear if you have an alternative suggestion to support target-based lowering

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Mousius ! I did a thorough review in light of our discussion early last week. summarizing that discussion here for posterity:

  • Ultimately we have a desire to maintain a single lowering pipeline for AOT, Graph, VM executors
  • In the unified lowering pipeline, we hope to bring everything into a single IRModule, including the AOT main function. This allows most TIR-based analysis and transformation (e.g. memory planning, TIR-based operator fusion, etc) to be represented as compiler Passes
  • In order for these passes to have something solid to work against, we generally want to consolidate the number of ways people generate TIR.
  • However we realize the typical scheduling flow doesn't work for some Targets and agree with the shortcomings discussed here in requiring these Targets to reinvent the entire compliation pipeline. That seems worse than allowing some variance in generated TIR.
  • Therefore, while ultimately we think all TIR transforms should be Passes, this particular case is essentially an override to the LowerTE pass introduced in the TE-Compiler refactor PR. Therefore, we support it being a hook with a well-defined signature.

rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
Change-Id: I578145c37f9a10b4c15ed64fa86d6d7c2fade04e
@xqdan
Copy link

xqdan commented Aug 24, 2021

This is a great disscussion here. Actually, we are supporting a DSA with TVM, let me share my practice.

1, We only re-use some of tvm relay or tir passes, less than 10 passes, such storage flatten, we don't need most of tvm passes, keep them in our flow means wasting compilation time.
2, We develop passes, and enhance tvm passes for our target, such as stroage rewrite.
3, We develop a hybrid ir on tir, in which we can do unified memory allocation and schedule insnstruction across operators.

So we are excited to see tvm can do these in main line.

What we want to have:
1, customzation compilation flow for both relay and tir flow.
2, unified ir to view both graph and tir, supporting inter/intra primfunc pass developing.
3, also need to consider which place we should put our target code if we want to upstream in the future.

Copy link
Member Author

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just coming back to this thread, I believe there’s a way to introduce the hooks in a less intrusive way to the LowerTEPass, by placing it just above it in build_module.cc. This should mean each Target can register a relay_to_tir Pass which is ran there rather than having to wire it via LowerTEPass and any PrimFunc will be skipped as they’re not relay functions anymore.

Any thoughts on this @jroesch / @csullivan / @mbs-octoml? I've updated this RFC from @areusch's comments and added the above proposal.

rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved
@mbs-octoml
Copy link
Contributor

One core distinction here is that the tir_to_runtime hook isn't a pass, it's an override of code generation.

Ah, excellent, that's where we've been trying to get to but when we started this conversation the refactoring to support it seemed to much to foist upon you. But we've now (thanks Lily!) extended IRModule to be able to support a mix of Functions and target-specific PrimFuncs, so it is now conceptually possible to run multiple lowing passes in series and have each only focus on lowering the Functions they care about based on the compiler tagging. With luck it's just a matter of working through the usual surprises and glitches.

@mbs-octoml
Copy link
Contributor

And I think the just-another-pass approach implies some of the private machinery in te_compliler needs to be hoisted to be reusable for all lowering-like passes. Eg LowerTensorExprMutator. So instead of
monolithic lowering + target-specific callbacks
we have
target-specific lowering passes + built-in lowering pass
which share impl via, hopefully, conventional subclassing.

@mbs-octoml
Copy link
Contributor

This LGTM but I'm thinking going back to relay_to_tir being a Function->PrimFunc packed func as you implemented in the companion PR is better since it makes it easier for all the caching, call rewriting, dynamic shape handling and other bookkeeping to be shared. Have you tried out the pass approach and convinced youself that reuse is still possible?

@Mousius
Copy link
Member Author

Mousius commented Sep 6, 2021

Hi @mbs-octoml, sorry I missed a few replies!

The reason I hoisted it outside of the LowerTE pass is that it effectively bypasses it anyway, LowerExternalFunctions in te_compiler.cc is a massive bypass which starts there and goes around most of TVM before it ends up as a runtime::Module right at the very end. Effectively what's happening is the "default" relay_to_tir is LowerTE and then Targets register their own variants. I think where @jroesch was going with the (IRModule, Function) -> (IRModule, GlobalVar) is that you will want to produce many PrimFuncs to implement one Function and thus you need the entire module, by just introducing this as a Pass we won't have to introduce a new signature specifically for this hook.

@mbs-octoml
Copy link
Contributor

(IRModule, Function) -> (IRModule, GlobalVar)
I'm still is favor of this signature since it's a cheep and cheerful way to ensure we don't end up with N ways to implement the lower-and-rewrite-calls machinery embedded in te_complier. I think my earlier question still stands:
Have you tried out the pass approach and convinced youself that reuse is still possible?

Based on the discussions around these hooks, we now have a better idea
of how to introduce them into the codebase.
@Mousius
Copy link
Member Author

Mousius commented Sep 22, 2021

@mbs-octoml / @areusch apologies for the delay, there was a fair amount to tweak in here to reflect all the discussions around this RFC. Please could you take another look as I think this is lined up now 😸

Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nits for clarity.

We're working on cleaning up the py/cpp build split so your TIRToRuntime hook should indeed be on the unique choke point.

@Mousius
Copy link
Member Author

Mousius commented Sep 23, 2021

Thanks @mbs-octoml, I've incorporated this back into the document 😸

@mbs-octoml
Copy link
Contributor

LGTM again (CI will go faster that way, right?)

@Mousius
Copy link
Member Author

Mousius commented Sep 27, 2021

@areusch is there anything outstanding from you on this RFC? It seems ready to merge after the changes you've requested.

@areusch areusch merged commit a6f9a25 into apache:main Sep 27, 2021
@areusch
Copy link
Contributor

areusch commented Sep 27, 2021

thanks for working through this with us @Mousius ! excited to see this land in TVM!

@Mousius Mousius deleted the additional-target-hooks branch September 28, 2021 08:51
MichaelJKlaiber added a commit to MichaelJKlaiber/tvm-rfcs that referenced this pull request May 24, 2022
Update target registration and add pass phases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants