[TensorIR][M1b] Scaffolding ScheduleState data structure #7765

Merged
merged 1 commit into from
Apr 7, 2021
@junrushao (Member) commented Mar 29, 2021

This PR is part of stage M1b of the TensorIR upstreaming plan (#7527) and covers the core data structure, ScheduleState.

This PR introduces two key concepts: BlockScope and ScheduleState. ScheduleState provides a key method, Replace, around which all the schedule primitives can be developed.

A detailed explanation of all the terminology, concepts and algorithms is provided in the documentation.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

@tqchen (Member) commented Mar 29, 2021

cc @comaniac @jroesch @yzhliu @icemelon9 @jcf94

@comaniac (Contributor) left a comment

Overall LGTM. Most of my comments are requests for code comments and clarifications.
One major suggestion, as noted in the comment, is whether we should reserve the block name "root" and disallow duplicated block names within a PrimFunc.

python/tvm/tir/stmt.py (resolved)
python/tvm/tir/schedule/block_scope.py (resolved)
python/tvm/tir/schedule/block_scope.py (outdated, resolved)
python/tvm/tir/schedule/block_scope.py (outdated, resolved)
include/tvm/tir/schedule/block_scope.h (outdated, resolved)
python/tvm/tir/schedule/state.py (outdated, resolved)
src/tir/schedule/state.cc (resolved)
src/tir/schedule/state.cc (outdated, resolved)
constexpr int kVerifyAffineBinding = static_cast<int>(ScheduleDebugMask::kVerifyAffineBinding);
constexpr int kVerifyRegionCover = static_cast<int>(ScheduleDebugMask::kVerifyRegionCover);
constexpr int kVerifyStagePipeline = static_cast<int>(ScheduleDebugMask::kVerifyStagePipeline);
ICHECK_GE(debug_mode, -1);
Contributor:

We can just assign 8 to the True case, so that you don't need to deal with negative debug_mode.

@junrushao (Member Author):

I prefer not to. We don't want to introduce a magic number, and, even worse, if we extend the verification in the future, that magic number may change over time.

Contributor:

Sorry, I meant 15 (1111b, a full mask).

@junrushao (Member Author):

Yeah, a full mask (like INT_MAX) definitely works, but I would prefer -1 here to keep the logic clear and invariant.

Contributor:

... In my experience, it is better to use unsigned int for bit masks. 😄

@junrushao (Member Author):

Changing to unsigned integers makes sense to me, and we could use something like (11...11)_2 as the full mask. My worry is that passing unsigned integers through the TVM FFI might cause issues.

Contributor:

Since Python doesn't have unsigned int, changing to unsigned int will result in inconsistency between Python and C++. Plus this debug_mode won't be increased to a large number in the future AFAIK, I'll take the current solution.
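The convention settled on in this thread (a signed `debug_mode` where -1 means "run every verification") can be sketched as follows. This is not TVM's implementation: `kVerifyAffineBinding`, `kVerifyRegionCover` and `kVerifyStagePipeline` come from the excerpt above, while the first flag name `kVerifySRefTree`, the concrete values 1/2/4/8 (four flags, matching the 15 = 1111b full mask mentioned above) and the helper `ShouldVerify` are assumptions made for illustration.

```cpp
#include <cassert>

// Sketch only. kVerifySRefTree and the values 1/2/4/8 are assumptions;
// the other three names come from the review excerpt.
enum class ScheduleDebugMask : int {
  kVerifySRefTree = 1,
  kVerifyAffineBinding = 2,
  kVerifyRegionCover = 4,
  kVerifyStagePipeline = 8,
};

// Hypothetical helper: -1 means "run every check", 0 means "run none",
// and any other non-negative value is a bitwise OR of ScheduleDebugMask flags.
inline bool ShouldVerify(int debug_mode, ScheduleDebugMask flag) {
  // Stand-in for ICHECK_GE(debug_mode, -1).
  assert(debug_mode >= -1 && "debug_mode must be >= -1");
  if (debug_mode == -1) {
    // Handled explicitly so -1 keeps meaning "everything" even if flags are added.
    return true;
  }
  return (debug_mode & static_cast<int>(flag)) != 0;
}
```

Because -1 is an ordinary signed int, it sidesteps the Python/C++ unsigned mismatch raised in the thread, and unlike a hard-coded 15 it does not need to change when a new verification flag is introduced.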

* \param Type The type to be casted to, can be Block or For
* \note The `E` in the macro means `error`, which means allowing to customize error message
*/
#define TVM_SREF_TO_E(Result, SRef, Type) \
Contributor:

  • The E in the macro name looks confusing, and I still don't get the point even after reading the note...
  • The macro itself is also a bit confusing, as Result is assigned on the LHS. I would suggest simply using inline functions for the cases in this file.

@junrushao (Member Author), Mar 30, 2021:

Inline functions are generally good, but I am not in favor of them in this particular case. I considered this before but decided against it, for the following reasons:

  • When an error occurs, we want to print the exact line/function/file that throws the error. If we use an inline function, the error is reported from inside the inline function in utils.h instead of from the caller, which is much less informative.
  • The caller should be responsible for writing the declaration of the variable. Comparing the following two, I would go for the first one, because it spells out the type, allows re-assignment of the variable, and makes it really clear what we are doing.
  • The only disadvantage is that we need to repeat the name "block" twice, which is not much of an inconvenience IMO.
const BlockNode* block = TVM_SREF_TO_BLOCK(block, block_sref);
// compared with 
TVM_SREF_TO_BLOCK(block, block_sref);

A good alternative I considered is an immediately invoked lambda, which expands to something like:

const auto* block = [&]() -> const BlockNode* {
  const BlockNode* stmt = sref->StmtAs<BlockNode>();
  ICHECK(stmt != nullptr) << "Error Message";
  return stmt;
}();

The disadvantage of this approach is that the error message is not customizable, and it relies on the compiler to optimize the lambda away.

@junrushao (Member Author), Mar 30, 2021:

The macro with the "E" suffix is short for "error", meaning the error message is customizable. It is only exposed for flexibility and is rarely used in the codebase. Given that it is an internal utility macro and is well documented, I think it is fine to keep it here. Of course, better names are definitely welcome :-)

Contributor:

The reasons you gave make sense to me, although I still don't like the coding style: it amounts to extracting a common piece of an expression into a macro. Anyway, as you mentioned, this macro is an internal utility that is more like a helper, so I'm not strongly against it.

For the naming, perhaps TVM_SREF_AS_OR_ERR would be more straightforward? "E" is not a proper or common abbreviation of "Error", so I don't think it is informative.

@junrushao (Member Author):

Yeah I think TVM_SREF_AS_OR_ERR is a better name. I will go with the name :-)

The reason we introduced this macro is that many places perform such a conversion-and-check with almost the same error message, because most of the subsequent methods, analyses and primitives operate on srefs rather than stmts. The macro spares us from writing several lines of almost identical checks and provides a consistent template for the error message.

@junrushao (Member Author):

Aha, a better name might be TVM_SREF_STMT_AS_OR_ERR
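The two arguments above (the caller declares the variable, and the error should point at the caller rather than at a shared helper) can be illustrated with a small self-contained sketch. It is not TVM's code: `StmtNode`, `BlockNode`, `ForNode`, `StmtSRefNode`, the macro `SKETCH_SREF_STMT_AS_OR_ERR` and `GetBlockOrThrow` are simplified hypothetical stand-ins, and throwing `std::runtime_error` replaces `ICHECK`. Because it is a macro, `__FILE__` and `__LINE__` expand at the call site.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for TVM's statement and sref types.
struct StmtNode {
  virtual ~StmtNode() = default;
};
struct BlockNode : StmtNode {};
struct ForNode : StmtNode {};

struct StmtSRefNode {
  const StmtNode* stmt = nullptr;
  // Returns the statement as `T`, or nullptr if it is not a `T`.
  template <typename T>
  const T* StmtAs() const {
    return dynamic_cast<const T*>(stmt);
  }
};

// Sketch of "convert, then check with a custom message". The caller writes the
// declaration; the macro supplies the initializer and the check that follows it.
#define SKETCH_SREF_STMT_AS_OR_ERR(Result, SRef, Type, Msg)  \
  (SRef)->StmtAs<Type>();                                   \
  if ((Result) == nullptr) {                                \
    std::ostringstream os_;                                 \
    os_ << __FILE__ << ":" << __LINE__ << ": " << Msg;      \
    throw std::runtime_error(os_.str());                    \
  }

// Usage mirrors the first form in the thread: the type is spelled out by the caller.
const BlockNode* GetBlockOrThrow(const StmtSRefNode* sref) {
  const BlockNode* block =
      SKETCH_SREF_STMT_AS_OR_ERR(block, sref, BlockNode, "TypeError: expects a block");
  return block;
}
```

The cost noted above is visible here too: `block` is written twice, once in the declaration and once as the macro's `Result` argument.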

@jroesch (Member) commented Mar 30, 2021

I would like a chance to read this thoroughly; I've put some time on my calendar to do it tomorrow.

@junrushao (Member Author) commented Mar 30, 2021

Although out of the scope of this PR, I am really glad that we have the discussion about block names.

@comaniac brought up the point #7765 (comment):

> This function makes me think that we should make root as a preserved block name, and we should not allow duplicated block names in every tree of a PrimFunc.

I kinda agree with Cody's points, but would love to hear more discussion on block names. In particular, there are three points to discuss:

  • A1. Block names need to be unique. The reason is that the canonical way of retrieving a block is to use its name, i.e. schedule.get_block(name). Without a unique name, we are unable to even retrieve a block, which makes scheduling almost impossible. (of course, it is possible to retrieve a block by the buffer it produces or via a statement, but it is not the canonical way)
  • A2. We need reserved names for the root block. I am kinda in favor of this idea too, because we do provide syntactic sugar that auto-completes the root block with the name "root". Reserving it could help us eliminate possible name conflicts.
  • A3. Users could specify the names of newly created blocks/loops. Yes, it is doable when implementing schedule primitives.

@tqchen (Member) commented Mar 30, 2021

Thanks @junrushao1994 @comaniac. These are great points, although I think they are somewhat orthogonal to the data structure itself and have more to do with primitive implementations.

So we could carry on the discussion in parallel with this PR.

In terms of the "root" name: given that we already uniquely identify functions via their global names, an easy way is to just use the function name in the module to obtain the root, which removes one concept here.

The main question for block name uniqueness is how to enforce it. For manual operations, uniqueness certainly makes sense. For general automated transformations, it might create the extra burden of introducing name tables or an allocation mechanism, since automated transformation rules work on a sub-region and may not be aware of the names in other parts. For that reason, relying on pointer uniqueness might still be a better approach. This also aligns with our existing approach to handling loop vars, which saves a lot of trouble during automatic transformations.

That being said, we should be able to introduce a canonicalization pass to uniquely rename blocks. We can also add a flag to the Schedule that enforces such uniqueness when it is turned on.

@comaniac (Contributor)

> Thanks @junrushao1994 @comaniac These are great points, although i think they are somewhat parallel to the data structure itself and have things to do with primitive implementations.

Makes sense. I'm fine with a follow-up PR to implement the outcome of this discussion.

> So we could try to make discussions in parallel with respect to this PR.
>
> In terms of the "root" name, given that we are uniquely identifying function already via the global names, an easy way is to just use function name in the module to obtain the root, which removes on concept here.

This is also a good point. IMHO, as long as the interface makes sense to schedule primitive developers, it should be fine.

> The main Q for the block name uniqueness is about how to enforce them. For manual operations they certainly makes sense. For general automated transformations it might create an extra burden to introduce name tables or allocation mechanism. Since automated transformations rules works on a sub-region and may not be aware of the names from other parts. Due to that reason, allowing pointer uniqueness might still be a better approach. This also aligns with our existing approach to handle loop vars, which saves a lot of trouble during automatic transformations.

It makes sense to use unique pointers in the automation framework. One thing I would like to highlight is that even if we use unique pointers to access blocks and don't have to worry about their names during optimization, it might still be worthwhile to maintain block name uniqueness. The reason is that, IIUC, we will have a mechanism to print out the schedule in Python format for debugging and investigation, and in the printed schedule, block names will be the only way blocks are referenced.

> This being said, we should be able to introduce canonicalization pass to uniquely rename block names. We can also add a flag in the Schedule to enforce such uniqueness if it is turned on

Exactly. Calling a canonicalization pass before printing out the schedule could also solve the issue I mentioned above.

@junrushao (Member Author)

Per discussion with @tqchen.

A1. I agree that enforcing name uniqueness in scheduling is important and something we should do. We also recognize that name uniqueness is a bit misleading and not super useful in subsequent IR passes. Therefore, we want to split the problem into two stages:

  • Scheduling: Require name uniqueness - we can keep a table in the schedule class.
  • Passes after scheduling: Don't require name uniqueness.

A2. Reserve names for the root block: Yes, we should do that. We have two proposals:

  • A2.1. Use "root" as the reserved name for the block
  • A2.2. Use the PrimFunc's name in the IRModule, e.g. "main", as the reserved name for the block

A3. Yes, we want to enable users who call scheduling primitives to specify the names of the blocks. In particular, we would like to hear further discussion on what the user experience should look like. Here is our proposal:

  • A3.1. Error out when the user provides a duplicate name.
  • A3.2. If the name string the user provides is suffixed with "", e.g. "unique_name", then the system will find a unique name that has "unique_name" as its prefix and does not conflict with other names.
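Taken together, A1 to A3 describe a per-schedule name table. The sketch below is one way such a table could behave and is not an existing TVM API: `BlockNameTable`, `AddExact` and `AddUnique` are hypothetical names, the `prefix_<n>` numbering is an assumption, and parsing of the A3.2 suffix marker is left out because the marker itself is not legible in the proposal above (the caller simply invokes `AddUnique` directly). The constructor takes the reserved root name, so either A2.1 ("root") or A2.2 (the PrimFunc's name) fits.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical per-schedule name table sketching A1-A3; not an existing TVM API.
class BlockNameTable {
 public:
  // A2: reserve the root block's name up front ("root" per A2.1, or the
  // PrimFunc's name per A2.2) so no other block can take it.
  explicit BlockNameTable(std::string root_name = "root") {
    names_.insert(std::move(root_name));
  }

  // A1 + A3.1: a user-specified name that is already taken is an error.
  void AddExact(const std::string& name) {
    if (!names_.insert(name).second) {
      throw std::invalid_argument("duplicate block name: " + name);
    }
  }

  // A3.2: return `prefix` itself if free, otherwise the first free
  // "prefix_<n>" (this numbering scheme is an assumption of the sketch).
  std::string AddUnique(const std::string& prefix) {
    std::string candidate = prefix;
    for (int n = 1; names_.count(candidate) != 0; ++n) {
      candidate = prefix + "_" + std::to_string(n);
    }
    names_.insert(candidate);
    return candidate;
  }

  bool Contains(const std::string& name) const { return names_.count(name) != 0; }

 private:
  std::unordered_set<std::string> names_;
};
```

Keeping the table inside the schedule, as A1 suggests, means passes after scheduling never have to consult it.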

Comment on lines 192 to 218
/*! \brief Lookup table for the `src` of dependencies */
std::unordered_map<StmtSRef, Array<Dependency>, ObjectPtrHash, ObjectPtrEqual> src2deps;
/*! \brief Lookup table for the `dst` of dependencies */
std::unordered_map<StmtSRef, Array<Dependency>, ObjectPtrHash, ObjectPtrEqual> dst2deps;
/*! \brief The mapping from the buffer to the blocks who write it */
std::unordered_map<Buffer, Array<StmtSRef>, ObjectPtrHash, ObjectPtrEqual> buffer_writers;
Contributor:

Why not use tvm::Map here instead of std::unordered_map? Then these members could be visited.

@junrushao (Member Author), Mar 31, 2021:

We tried tvm::Map<Buffer, Array<StmtSRef>> at the very beginning, but it turned out that we need the values of the map (the Array<StmtSRef>) to be mutable so that they are maintained properly during transformations, and with tvm::Map we are unable to do that easily :-( Therefore, we have to provide workarounds, such as APIs like get_deps_by_src on the Python side.

Member:

> We had been trying with tvm::Map<Buffer, Array<StmtSRef>> in the very beginning, but it turned out that we need the values (the Array<StmtSRef>) of the map to be mutable to make sure they are maintained properly during transformations, but with tvm::Map we are unable to do so in an easy way :-( Therefore, we have to provide workarounds like providing APIs get_deps_by_src on the python side.

It might be worth writing this down in a note (NB) for future readers.

@junrushao (Member Author):

Yeah, I will add this to the "\note".
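The constraint described above can be pictured with a simplified sketch using only standard containers. The types are hypothetical stand-ins (a `std::string` for `Buffer`, an `int` for `StmtSRef`, and a made-up `WriterTable` holding a `buffer_writers` map); the point is only that `std::unordered_map::operator[]` and `find` give a mutable reference to the stored value, so the writer list can be updated in place as blocks are added or replaced. Per the comment above, doing the same with `tvm::Map` was not easy.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a buffer is identified by its name and a block sref
// by an int id. The real members key on Buffer/StmtSRef with ObjectPtrHash.
using BufferKey = std::string;
using SRefId = int;

struct WriterTable {
  std::unordered_map<BufferKey, std::vector<SRefId>> buffer_writers;

  // operator[] returns a mutable reference to the value (creating an empty
  // vector if needed), so the list is appended to in place.
  void AddWriter(const BufferKey& buffer, SRefId block) {
    buffer_writers[buffer].push_back(block);
  }

  // Dropping a writer (e.g. when a transformation replaces a block) is also in place.
  void RemoveWriter(const BufferKey& buffer, SRefId block) {
    auto it = buffer_writers.find(buffer);
    if (it == buffer_writers.end()) return;
    std::vector<SRefId>& writers = it->second;
    writers.erase(std::remove(writers.begin(), writers.end(), block), writers.end());
    if (writers.empty()) buffer_writers.erase(it);
  }
};
```

The trade-off raised in the review still applies: members like this are not reachable through `VisitAttrs`, hence the separate Python-side accessors.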

@junrushao (Member Author) commented Apr 5, 2021

Hey, would you guys take another look? Thanks a lot! @comaniac @jcf94 @MasterJH5574 @jroesch @tqchen

@comaniac (Contributor) left a comment

LGTM. Two concerns were addressed per offline discussion with @junrushao1994

  • The naming of InlineMark and RootMark should be improved in the future, but I don't have a better suggestion so I'll leave them there for now.
  • It's better to use a full mask as the default debug_mode, but since Python doesn't have unsigned int, this would cause inconsistency between Python and C++. Meanwhile, since the number of debug flags shouldn't grow dramatically, we can avoid spending too much time on this point.

@junrushao (Member Author)

Per offline discussion with @comaniac:

More documentation on InlineMark and RootMark is desirable. In particular, we should mention that they are only used in ComputeAt/ReverseComputeAt to turn a compute-at into a compute-inline or a no-op. This will be done as we upstream the schedule class.

@jroesch (Member) left a comment

Thanks so much for waiting for my slow review; I mostly spent my energy on the interface and documentation. Most of the suggestions should be really easy to apply.

include/tvm/tir/schedule/block_scope.h (resolved)
int64_t seq_index;

void VisitAttrs(AttrVisitor* v) {
// `stmt` is not visited
Member:

Is this because of the weak pointer optimization? It isn't clear why I can't read these fields.

@junrushao (Member Author), Apr 6, 2021:

Oh, we don't want to visit the weak references in the visitors, because those void pointers are not very meaningful on the Python side. Instead, we provide FFI functions that return strong references: see block_scope.cc:144-151.
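As an analogy only (plain standard C++, not TVM's object system), the design described above looks roughly like this: a reference node keeps a non-owning raw pointer, which is not something worth exposing to a reflection visitor, and an accessor converts it back into an owning reference on request. `Stmt`, `SRef` and `GetStmt` are hypothetical names, and `std::enable_shared_from_this` merely stands in for however TVM recovers a strong reference from a raw node pointer.

```cpp
#include <memory>

// Analogy only: Stmt is owned elsewhere through std::shared_ptr.
struct Stmt : std::enable_shared_from_this<Stmt> {
  int id;
  explicit Stmt(int id) : id(id) {}
};

struct SRef {
  // Non-owning ("weak") pointer: it does not keep the Stmt alive, so a
  // visitor exposing it would hand out a bare address with no lifetime.
  const Stmt* stmt;

  // Accessor returning an owning (strong) reference instead, the role the
  // FFI getters play. Requires the Stmt to be owned by a shared_ptr
  // (otherwise shared_from_this throws std::bad_weak_ptr since C++17).
  std::shared_ptr<const Stmt> GetStmt() const { return stmt->shared_from_this(); }
};
```

The strong reference only lives as long as the caller holds it, so the sref itself never extends the statement's lifetime.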

include/tvm/tir/schedule/block_scope.h (resolved)
include/tvm/tir/schedule/block_scope.h (outdated, resolved)
include/tvm/tir/schedule/block_scope.h (outdated, resolved)
include/tvm/tir/schedule/block_scope.h (outdated, resolved)
include/tvm/tir/schedule/block_scope.h (outdated, resolved)
include/tvm/tir/schedule/state.h (outdated, resolved)
include/tvm/tir/schedule/state.h (resolved)
python/tvm/tir/schedule/__init__.py (outdated, resolved)
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Co-authored-by: Jared Roesch <roeschinc@gmail.com>
@tqchen merged commit bf0f87d into apache:main Apr 7, 2021
@tqchen (Member) commented Apr 7, 2021

Thanks @junrushao1994 for continuing to improve the PR.
Thanks @jroesch @comaniac @jcf94 @Hzfengsy @MasterJH5574 for reviewing. This PR is merged. We can follow up with more PRs to add clarifications as future needs arise.

@junrushao linked an issue Apr 12, 2021 that may be closed by this pull request (29 tasks)
@junrushao changed the title from "[M1b] Scaffolding ScheduleState data structure" to "[TensorIR][M1b] Scaffolding ScheduleState data structure" Apr 16, 2021
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
Successfully merging this pull request may close these issues.

[RFC][Tracking Issue] TensorIR Scheduling
7 participants