[TensorIR][M1b] Scaffolding ScheduleState data structure #7765

junrushao · 2021-03-29T19:19:01Z

This PR is part of stage M1b of the TensorIR upstreaming plan (#7527) and covers the core data structure, ScheduleState.

This PR introduces two key concepts: BlockScope and ScheduleState. ScheduleState provides a key method, Replace, around which all the schedule primitives can be developed.

A detailed explanation of the terminology, concepts and algorithms is provided in the documentation.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

tqchen · 2021-03-29T19:27:00Z

cc @comaniac @jroesch @yzhliu @icemelon9 @jcf94

comaniac

Overall LGTM. Most of my comments concern code comments and clarifications.
One major suggestion, listed in the comment, is whether we should reserve the block name "root" and disallow duplicated block names within the scope of a PrimFunc.

python/tvm/tir/stmt.py

python/tvm/tir/schedule/block_scope.py

include/tvm/tir/schedule/block_scope.h

python/tvm/tir/schedule/state.py

src/tir/schedule/state.cc

comaniac · 2021-03-29T22:44:28Z

src/tir/schedule/state.cc

+  constexpr int kVerifyAffineBinding = static_cast<int>(ScheduleDebugMask::kVerifyAffineBinding);
+  constexpr int kVerifyRegionCover = static_cast<int>(ScheduleDebugMask::kVerifyRegionCover);
+  constexpr int kVerifyStagePipeline = static_cast<int>(ScheduleDebugMask::kVerifyStagePipeline);
+  ICHECK_GE(debug_mode, -1);


We can just assign 8 to the True case, so that you don't need to deal with negative debug_mode.

I'd prefer not to. We don't want to introduce a magic number, and worse, if we extend the verification in the future, that magic number would have to change over time.

Sorry I meant 15 (1111b, a full mask).

Yeah, a full mask (like INT_MAX) definitely works, but I would prefer -1 here to keep the logic clear and invariant.

... In my experience, I'm thinking that it will be better to use unsigned int as bit masks. 😄

Changing to unsigned integers makes sense to me, and we could use something like (11...11)_2 for the full mask. What I'm worried about is that passing unsigned integers across the TVM FFI could cause issues.

Since Python doesn't have unsigned int, changing to unsigned int will result in inconsistency between Python and C++. Plus this debug_mode won't be increased to a large number in the future AFAIK, I'll take the current solution.
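For readers following the -1 vs. full-mask debate, here is a minimal C++ sketch of why a signed -1 works as an "enable everything" mask. The bit values assigned to ScheduleDebugMask below are assumptions for illustration, not the values in the TVM headers, and IsCheckEnabled is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments for illustration only; the real
// ScheduleDebugMask values are defined in TVM and may differ.
enum class ScheduleDebugMask : uint32_t {
  kVerifyAffineBinding = 1,
  kVerifyRegionCover = 2,
  kVerifyStagePipeline = 4,
};

// Returns true if the check selected by `mask` is enabled in `debug_mode`.
// debug_mode == -1 means "run every check": in two's complement, -1 has all
// bits set, so it enables every current bit and any bit added later.
bool IsCheckEnabled(int debug_mode, ScheduleDebugMask mask) {
  assert(debug_mode >= -1);  // mirrors ICHECK_GE(debug_mode, -1)
  return (debug_mode & static_cast<int>(mask)) != 0;
}
```

Because -1 keeps meaning "all bits" when a fourth check is added, nothing has to be updated, whereas a literal such as 15 would. The value also stays a plain signed int, so it crosses the Python/C++ FFI without unsigned types.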

comaniac · 2021-03-29T22:50:11Z

src/tir/schedule/utils.h

+ * \param Type The type to be casted to, can be Block or For
+ * \note The `E` in the macro means `error`, which means allowing to customize error message
+ */
+#define TVM_SREF_TO_E(Result, SRef, Type) \


The E in the macro name looks confusing, and I still cannot get the point even after reading the note...

This macro itself is also a bit confusing, as Result is being assigned on the LHS. I would suggest simply using inline functions for the cases in this file.

Inlined functions are generally good, but I am not in favor of it in this particular case - I considered this before but didn't go with that idea, and here is the reason:

When an error occurs, we want to print the exact line/function/file that throws that error: if we use an inline function, then instead of rendering the caller, it throws in the inline function in utils.h, which is much less informative.

The caller should be responsible for writing the declaration of the variable. Comparing the following two, I would go for the first one, because it writes the type clearly, allows re-assignment of a variable, and makes it really clear what we are doing.

The only disadvantage is that we need to repeat the name "block" twice, which is not much of an inconvenience IMO.

const BlockNode* block = TVM_SREF_TO_BLOCK(block, block_sref);  // (1) the caller declares the variable
TVM_SREF_TO_BLOCK(block, block_sref);                           // (2) the macro declares the variable

A good alternative I considered before is to use inlined lambda function which expands like:

const auto* block = [&]() -> const BlockNode* {
  const BlockNode* stmt = sref->StmtAs<BlockNode>();
  ICHECK(stmt != nullptr) << "Error Message";
  return stmt;
}();

The disadvantage of this approach is that the error message is not customizable, and it really depends on the compiler to optimize the lambda out.

The macro with "E" suffix is short for "error", which means the error message is customizable. It is only exposed for flexibility, and rarely used in the codebase. Given it is an internal util macro, and has been well documented, I think it is fine to keep it here. Of course better names are definitely welcome :-)

The reasons you illustrated make sense to me, although I still don't like the coding style. It seems like extracting a common piece of an expression into a macro. Anyway, as you mentioned, this macro is an internal utility that is more like a helper, so I'm not strongly against it.

For the naming, would TVM_SREF_AS_OR_ERR be more straightforward? "E" is not a proper or common abbreviation of "Error", so I don't think it is informative.

Yeah I think TVM_SREF_AS_OR_ERR is a better name. I will go with the name :-)

The reason that we introduced this macro is that there are too many places using such conversion-and-check with almost the same error message, because most of the subsequent methods/analysis/primitives are mostly based on sref, not stmt. Introducing such macro could help alleviate the burden of writing several lines of almost identical checks and provide a consistent template of the error message.

Aha, a better name might be TVM_SREF_STMT_AS_OR_ERR
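The argument for a macro over an inline function is about where errors are reported. The sketch below uses simplified stand-in types (StmtNode, BlockNode, StmtSRefNode) and a hypothetical SREF_STMT_AS_OR_ERR macro that takes the message as an argument instead of TVM's streaming ICHECK. It only illustrates the shape under discussion: the caller declares the variable, and __FILE__/__LINE__ expand at the call site.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Simplified stand-ins for TVM's statement and sref classes (assumptions).
struct StmtNode {
  virtual ~StmtNode() = default;
};
struct BlockNode : StmtNode {};
struct ForNode : StmtNode {};

struct StmtSRefNode {
  const StmtNode* stmt;  // non-owning pointer to the referenced statement
  // Downcast the referenced statement to T; nullptr if it is not a T.
  template <typename T>
  const T* StmtAs() const {
    return dynamic_cast<const T*>(stmt);
  }
};

// Hypothetical macro: the caller writes `const T* x = MACRO(x, ...)`, the
// macro supplies the initializer and then checks `x`. Since this is a macro,
// __FILE__ and __LINE__ refer to the caller, not to a shared helper.
#define SREF_STMT_AS_OR_ERR(Result, SRef, Type, Msg)                    \
  (SRef)->StmtAs<Type>();                                               \
  if ((Result) == nullptr) {                                            \
    throw std::runtime_error(std::string(__FILE__) + ":" +             \
                             std::to_string(__LINE__) + ": " + (Msg)); \
  }

// Example caller: the error message will carry this function's location.
const BlockNode* GetBlockOrThrow(const StmtSRefNode* block_sref) {
  const BlockNode* block = SREF_STMT_AS_OR_ERR(block, block_sref, BlockNode, "expects a block");
  return block;
}
```

The expansion is two statements (an initializer followed by an if), so it only works in the declaration form shown above, which is also why the variable name appears twice.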

jroesch · 2021-03-30T02:26:45Z

Would like to get a chance to read this thoroughly; I've put some time on my calendar to do it tomorrow.

junrushao · 2021-03-30T19:18:30Z

Although out of the scope of this PR, I am really glad that we have the discussion about block names.

@comaniac brought up the point #7765 (comment):

This function makes me think that we should make root a reserved block name, and we should not allow duplicated block names in any tree of a PrimFunc.

I kinda agree with Cody about his points, but would love to hear more discussion on the block name. Particularly, we have three points to discuss:

A1. Block names need to be unique. The reason is that the canonical way of retrieving a block is to use its name, i.e. schedule.get_block(name). Without a unique name, we are unable to even retrieve a block, which makes scheduling almost impossible. (of course, it is possible to retrieve a block by the buffer it produces or via a statement, but it is not the canonical way)
A2. We need reserved names for the root block. I am kinda in favor of this idea too, because we do provide syntactic sugar to auto complete the root block with the name "root". This could help us eliminate possible name conflicts.
A3. Users could specify the names of newly created blocks/loops. Yes, it is doable when implementing schedule primitives.

tqchen · 2021-03-30T20:14:00Z

Thanks @junrushao1994 @comaniac. These are great points, although I think they are somewhat parallel to the data structure itself and have more to do with primitive implementations.

So we could try to make discussions in parallel with respect to this PR.

In terms of the "root" name, given that we already uniquely identify functions via their global names, an easy way is to just use the function name in the module to obtain the root, which removes one concept here.

The main question about block name uniqueness is how to enforce it. For manual operations it certainly makes sense. For general automated transformations, it might create an extra burden to introduce name tables or an allocation mechanism, since automated transformation rules work on a sub-region and may not be aware of the names from other parts. For that reason, relying on pointer uniqueness might still be a better approach. This also aligns with our existing approach to handling loop vars, which saves a lot of trouble during automatic transformations.

This being said, we should be able to introduce canonicalization pass to uniquely rename block names. We can also add a flag in the Schedule to enforce such uniqueness if it is turned on
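A canonicalization pass of the kind mentioned above could be as simple as the following sketch, written over a flat list of names rather than a real PrimFunc. CanonicalizeBlockNames and its "<name>_<k>" renaming scheme are hypothetical choices: the first occurrence of each name is kept, and later duplicates get the smallest free numeric suffix.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical renaming pass: blocks are identified by position here (standing
// in for pointer identity), and names may repeat on input.
std::vector<std::string> CanonicalizeBlockNames(const std::vector<std::string>& names) {
  // Every original name is "taken", so generated names never collide with one.
  std::unordered_set<std::string> taken(names.begin(), names.end());
  std::unordered_set<std::string> seen;            // originals already kept once
  std::unordered_map<std::string, int> next_suffix;  // last suffix tried per name
  std::vector<std::string> result;
  result.reserve(names.size());
  for (const std::string& name : names) {
    if (seen.insert(name).second) {
      result.push_back(name);  // first occurrence keeps its name
      continue;
    }
    int& k = next_suffix[name];
    std::string candidate;
    do {
      candidate = name + "_" + std::to_string(++k);
    } while (taken.count(candidate) > 0);
    taken.insert(candidate);
    result.push_back(candidate);
  }
  assert(result.size() == names.size());  // one output name per block
  return result;
}
```

For example, the input {"A", "B", "A", "A_1", "A"} becomes {"A", "B", "A_2", "A_1", "A_3"}: "A_1" is skipped for the first duplicate because a block already uses it.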

comaniac · 2021-03-30T20:34:31Z

Thanks @junrushao1994 @comaniac. These are great points, although I think they are somewhat parallel to the data structure itself and have more to do with primitive implementations.

Makes sense. I'm fine with a follow-up PR that implements the outcome of this discussion.

So we could try to make discussions in parallel with respect to this PR.

In terms of the "root" name, given that we already uniquely identify functions via their global names, an easy way is to just use the function name in the module to obtain the root, which removes one concept here.

This is also a good point. IMHO, as long as the interface makes sense to schedule primitive developers, it should be fine.

The main question about block name uniqueness is how to enforce it. For manual operations it certainly makes sense. For general automated transformations, it might create an extra burden to introduce name tables or an allocation mechanism, since automated transformation rules work on a sub-region and may not be aware of the names from other parts. For that reason, relying on pointer uniqueness might still be a better approach. This also aligns with our existing approach to handling loop vars, which saves a lot of trouble during automatic transformations.

It makes sense to use unique pointers in the automation framework. One thing I would like to highlight is that even if we leverage unique pointers to access blocks and don't have to worry about their names during optimization, it might still be worthwhile to maintain block name uniqueness. The reason is that, IIUC, we will have a mechanism to print out the schedule in Python format for debugging and investigation, and in the printed schedule, block names will be the only references.

This being said, we should be able to introduce canonicalization pass to uniquely rename block names. We can also add a flag in the Schedule to enforce such uniqueness if it is turned on

Exactly. Calling a canonicalization pass before printing out the schedule could also solve the issue I mentioned above.

junrushao · 2021-03-30T21:03:11Z

Per discussion with @tqchen.

A1. I agree that enforcing name uniqueness in scheduling is important and it is something we should do. We also recognize the name uniqueness is a bit misleading and not super useful in subsequent IR passes. Therefore, we want to divide the problems in two steps:

Scheduling: Require name uniqueness - we can keep a table in the schedule class.
Passes after scheduling: Don't require name uniqueness.

A2. Reserve names for the root block: Yes, we should do that. We have two proposals:

A2.1. Use "root" as the reserved name for the block
A2.2. Use the PrimFunc's name in the IRModule, e.g. "main", as the reserved name for the block

A3. Yes, we want to enable users who call scheduling primitives to specify the names of the blocks. In particular, we would like to hear further discussion on what the user experience should look like. Here is our proposal:

A3.1. Error out when a user provides a duplicate name.
A3.2. If the name string the user provided is suffixed with "", e.g. "unique_name", then our system will find a unique name whose prefix is "unique_name" and doesn't conflict with other names.
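The proposals A2.1, A3.1 and A3.2 can be pictured as a small name table kept by the schedule. In this hypothetical BlockNameTable, "root" is reserved at construction (A2.1), AddExact errors out on duplicates (A3.1), and AddWithPrefix derives a fresh name from a prefix (A3.2). How a user opts into A3.2 (the suffix marker in the proposal) is deliberately not modeled, and the "_<k>" scheme is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical per-schedule name table; not TVM's API.
class BlockNameTable {
 public:
  // A2.1: the root block's name is reserved up front.
  explicit BlockNameTable(const std::string& root_name = "root") { names_.insert(root_name); }

  // A3.1: register `name` as-is; error out if it is already in use.
  void AddExact(const std::string& name) {
    if (!names_.insert(name).second) {
      throw std::invalid_argument("duplicate block name: " + name);
    }
  }

  // A3.2: return the first of prefix, prefix_1, prefix_2, ... not in use.
  std::string AddWithPrefix(const std::string& prefix) {
    std::string candidate = prefix;
    for (int k = 1; names_.count(candidate) > 0; ++k) {
      candidate = prefix + "_" + std::to_string(k);
    }
    names_.insert(candidate);
    return candidate;
  }

  bool Contains(const std::string& name) const { return names_.count(name) > 0; }

 private:
  std::unordered_set<std::string> names_;
};
```

Keeping the table inside the schedule matches the split above: uniqueness is enforced while scheduling, and later IR passes are free to ignore names.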

jcf94 · 2021-03-31T02:42:09Z

include/tvm/tir/schedule/block_scope.h

+  /*! \brief Lookup table for the `src` of dependencies */
+  std::unordered_map<StmtSRef, Array<Dependency>, ObjectPtrHash, ObjectPtrEqual> src2deps;
+  /*! \brief Lookup table for the `dst` of dependencies */
+  std::unordered_map<StmtSRef, Array<Dependency>, ObjectPtrHash, ObjectPtrEqual> dst2deps;
+  /*! \brief The mapping from the buffer to the blocks who write it */
+  std::unordered_map<Buffer, Array<StmtSRef>, ObjectPtrHash, ObjectPtrEqual> buffer_writers;


Why not use tvm::Map here instead of std::unordered_map? Then these members could be visited.

We tried tvm::Map<Buffer, Array<StmtSRef>> at the very beginning, but it turned out that the values of the map (the Array<StmtSRef>) need to be mutable so that they are maintained properly during transformations, and tvm::Map does not make that easy :-( Therefore, we provide workarounds such as the get_deps_by_src API on the Python side.

Might be worth writing this down in a note (NB) for future readers.

Yeah I will add this to "\note"
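To make the mutability argument concrete, here is a standard-library sketch of why std::unordered_map suits buffer_writers: the stored vector can be appended to in place. The key and value types are simplified stand-ins (strings for Buffer, ints for StmtSRef), and BlockScopeSketch, AddWriter and GetWriters are hypothetical names; GetWriters plays the role of a read-only accessor such as get_deps_by_src.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins (assumptions): a buffer is identified by its name and a
// block sref by an integer id. The real code keys on Buffer/StmtSRef and hashes
// with ObjectPtrHash/ObjectPtrEqual.
using BufferKey = std::string;
using SRefId = int;

struct BlockScopeSketch {
  // Same shape as `buffer_writers`: buffer -> blocks that write it.
  std::unordered_map<BufferKey, std::vector<SRefId>> buffer_writers;

  // In-place update: operator[] creates an empty vector on first use, and
  // push_back mutates the stored value directly without rebuilding the map.
  void AddWriter(const BufferKey& buffer, SRefId block) {
    buffer_writers[buffer].push_back(block);
  }

  // Read-only accessor that returns a copy (empty if the buffer is unknown).
  std::vector<SRefId> GetWriters(const BufferKey& buffer) const {
    auto it = buffer_writers.find(buffer);
    if (it == buffer_writers.end()) return {};
    return it->second;
  }
};
```

With a map whose values are immutable, the same update would instead read the array out, modify a copy and write it back on every change.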

jcf94 · 2021-03-31T03:07:56Z

src/tir/schedule/state.cc

+  constexpr int kVerifyAffineBinding = static_cast<int>(ScheduleDebugMask::kVerifyAffineBinding);
+  constexpr int kVerifyRegionCover = static_cast<int>(ScheduleDebugMask::kVerifyRegionCover);
+  constexpr int kVerifyStagePipeline = static_cast<int>(ScheduleDebugMask::kVerifyStagePipeline);
+  ICHECK_GE(debug_mode, -1);


... In my experience, I'm thinking that it will be better to use unsigned int as bit masks. 😄

junrushao · 2021-04-05T00:04:16Z

Hey would you guys take another look? Thanks a lot! @comaniac @jcf94 @MasterJH5574 @jroesch @tqchen

comaniac

LGTM. Two concerns were addressed per offline discussion with @junrushao1994

The naming of InlineMark and RootMark should be improved in the future, but I don't have a better suggestion so I'll leave them there for now.
It's better to use a full mask as the default debug_mode, but since Python doesn't have unsigned int, this would cause inconsistency between Python and C++. Since the number of debug modes is not expected to grow much, we need not spend more time on this point.

comaniac · 2021-04-06T17:34:14Z

src/tir/schedule/state.cc

+  constexpr int kVerifyAffineBinding = static_cast<int>(ScheduleDebugMask::kVerifyAffineBinding);
+  constexpr int kVerifyRegionCover = static_cast<int>(ScheduleDebugMask::kVerifyRegionCover);
+  constexpr int kVerifyStagePipeline = static_cast<int>(ScheduleDebugMask::kVerifyStagePipeline);
+  ICHECK_GE(debug_mode, -1);


Since Python doesn't have unsigned int, changing to unsigned int will result in inconsistency between Python and C++. Plus this debug_mode won't be increased to a large number in the future AFAIK, I'll take the current solution.

junrushao · 2021-04-06T17:42:16Z

Per offline discussion with @comaniac:

More documentation on InlineMark and RootMark is desirable. In particular, we should mention that they are only used in ComputeAt/ReverseComputeAt to turn a compute-at into a compute-inline or a no-op. This will be done as we upstream the schedule class.

jroesch

Thanks so much for waiting for my slow review. I focused my energy on the interface and documentation; most of the suggestions should be really easy to apply.

include/tvm/tir/schedule/block_scope.h

jroesch · 2021-04-06T17:58:27Z

include/tvm/tir/schedule/block_scope.h

+  int64_t seq_index;
+
+  void VisitAttrs(AttrVisitor* v) {
+    // `stmt` is not visited


Is this because of the weak pointer optimization? It isn't clear why I can't read these fields

Oh we don't want to visit the weak references in the visitors, because those void pointers are less meaningful on the python side. Instead, we provide FFI functions that return strong references: see block_scope.cc:144-151
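The weak-reference point can be illustrated with standard C++ smart pointers standing in for TVM objects. In this sketch (all names hypothetical), SRef stores a raw non-owning pointer to a statement, VisitAttrs exposes only seq_index, and GetStmt converts the raw pointer back into an owning std::shared_ptr, which is roughly what the FFI getters do when they return strong references.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an IR statement owned by the IR tree (assumption).
struct Stmt : std::enable_shared_from_this<Stmt> {
  std::string name;
  explicit Stmt(std::string n) : name(std::move(n)) {}
};

// A visitor that records which attribute keys it was shown.
struct KeyCollector {
  std::vector<std::string> keys;
  void Visit(const char* key, int64_t* /*value*/) { keys.push_back(key); }
};

struct SRef {
  const Stmt* stmt = nullptr;  // weak: does not keep the statement alive
  int64_t seq_index = -1;      // plain value: safe to expose

  template <typename Visitor>
  void VisitAttrs(Visitor* v) {
    // `stmt` is not visited: a raw address means little outside C++.
    v->Visit("seq_index", &seq_index);
  }

  // Analogue of an FFI getter: hand out a strong, owning reference instead.
  std::shared_ptr<const Stmt> GetStmt() const {
    if (stmt == nullptr) return nullptr;
    return stmt->shared_from_this();
  }
};
```

shared_from_this only works if the statement is already owned by a std::shared_ptr; in this sketch that models the IR tree keeping the statement alive.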

include/tvm/tir/schedule/block_scope.h

include/tvm/tir/schedule/state.h

python/tvm/tir/schedule/__init__.py

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Co-authored-by: Jared Roesch <roeschinc@gmail.com>

tqchen · 2021-04-07T12:06:59Z

Thanks @junrushao1994 for continuing to improve the PR.
Thanks @jroesch @comaniac @jcf94 @Hzfengsy @MasterJH5574 for reviewing. This PR is now merged. We can follow up with more PRs to add further clarifications as future needs arise.

junrushao mentioned this pull request Mar 29, 2021

[RFC][Tracking Issue] TensorIR Scheduling #7527

Closed

29 tasks

tqchen added the status: need review label Mar 29, 2021

tqchen self-assigned this Mar 29, 2021

comaniac requested changes Mar 29, 2021

jcf94 reviewed Mar 31, 2021

tqchen approved these changes Apr 5, 2021

MasterJH5574 approved these changes Apr 6, 2021

Hzfengsy approved these changes Apr 6, 2021

comaniac approved these changes Apr 6, 2021

jroesch approved these changes Apr 6, 2021

tqchen merged commit bf0f87d into apache:main Apr 7, 2021

tqchen added status: accepted and removed status: need review labels Apr 7, 2021

junrushao linked an issue Apr 12, 2021 that may be closed by this pull request

[RFC][Tracking Issue] TensorIR Scheduling #7527

Closed

29 tasks

junrushao mentioned this pull request Apr 14, 2021

[TensorIR][M1b] Schedule class #7847

Merged

junrushao changed the title ~~[M1b] Scaffolding ScheduleState data structure~~ [TensorIR][M1b] Scaffolding ScheduleState data structure Apr 16, 2021

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
