[CORE][MVP] POC of fuse ops pass based on the DFPatterns #9628
mikepapadim wants to merge 1 commit into apache:main
Conversation
(force-pushed 052e6d3 to 4e4696d)
mbs-octoml left a comment:
I'll take another look once you've commented pattern_fuse.cc a bit, thanks!
```
Expr PartitionPattern(DFPattern pattern, Expr expr, Map<String, ObjectRef> attrs, PackedFunc check);

/*!
 * \brief Partition all matches of a DFPattern inside an Expr into separate Function calls
```
You'll need to explain the 'hierarchical order' part here. Perhaps explain that they are expected to be in most-specific to most-general form, and the first pattern to succeed is taken.
Oh, now that I look at the impl I see it's not that at all. So yeah, this will need explaining :-)
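For reference, the "most-specific to most-general, first match wins" convention suggested above could be sketched like this. This is a hypothetical Python stand-in for illustration only; `Pattern`, `matches`, and `first_match` are invented names, not TVM's DFPattern API, and this is not what the PR's implementation actually does (per the follow-up comment).

```python
# Hypothetical sketch of "most-specific to most-general, first match wins"
# dispatch over an ordered pattern list. Not TVM's actual API.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pattern:
    name: str
    matches: Callable[[str], bool]  # stand-in for DFPattern matching


def first_match(patterns: List[Pattern], expr: str) -> Optional[str]:
    # Patterns are assumed pre-sorted from most specific to most general;
    # the first one that succeeds wins and the rest are never tried.
    for pat in patterns:
        if pat.matches(expr):
            return pat.name
    return None


patterns = [
    Pattern("conv2d+bias+relu", lambda e: e == "conv2d_bias_relu"),
    Pattern("conv2d+bias", lambda e: e.startswith("conv2d_bias")),
    Pattern("conv2d", lambda e: e.startswith("conv2d")),
]
```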
```
/*!
 * \brief Annoate primitive functions
 *
 * The result is an update module with annotated the primitive functions originated from the fuse
```
nit: ...updated module with fused functions annotation...
src/relay/ir/dataflow_matcher.cc (outdated)
```
  return Call(func, args);
}

// Expr DispatchVisitExpr(const Expr& pre) override {
  auto post = MixedModeMutator::DispatchVisitExpr(pre);
  if (gid_assignments_.count(pre) && pre == groups_[gid_assignments_[pre]].root_node &&
      static_cast<bool>(check_(pre))) {
  if (gid_assignments_.count(pre) && pre == groups_[gid_assignments_[pre]].root_node) {
```
Sorry, I don't understand this change.
```
Pass AnnotatePostFuseFuncs() {
  auto pass_info = PassInfo(0, "AnnotatePostFuseFuncs", {});
  return tvm::transform::CreateModulePass(
```
This can be a FunctionPass, right?
```
auto func = GetRef<Function>(func_node);

// add check from where it originate
func = WithAttr(std::move(func), attr::kPrimitive, tvm::Integer(1));
```
This will annotate all functions, including user-defined ones.
Meanwhile, isn't fusion rewriting the sub-expression into a call to a function literal? It's those that need the primitive annotation.
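The distinction being drawn here — annotate only the function literals produced by fusion, not every function in the module — could be sketched as follows. These are plain Python stand-ins with invented names (`Func`, `from_fusion`, `annotate_primitive`), not TVM's C++ `WithAttr`/`attr::kPrimitive` machinery.

```python
# Stand-in IR: a module maps names to functions; fusion produces function
# literals tagged here with a `from_fusion` flag. Only those get the
# "Primitive" attribute; user-defined module-level functions are skipped.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Func:
    name: str
    from_fusion: bool = False
    attrs: Dict[str, int] = field(default_factory=dict)


def annotate_primitive(module: Dict[str, Func]) -> Dict[str, Func]:
    for func in module.values():
        if func.from_fusion:  # do not touch user-defined functions
            func.attrs["Primitive"] = 1
    return module


mod = {
    "main": Func("main"),
    "fused_dense_relu": Func("fused_dense_relu", from_fusion=True),
}
annotate_primitive(mod)
```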
src/relay/transforms/pattern_fuse.cc (outdated)
```
 */

/*!
 * \file src/relay/transforms/fold_explicit_padding.cc
```
Could you add comments throughout here?
```
  return PatternPartitioner().Partition(pattern, expr, attrs, check);
}

Expr PartitionPattern(Array<DFPattern> patterns, Expr expr, Map<String, ObjectRef> attrs,
```
I'm not sure, but perhaps this is better expressed as a 'sequence' pattern combinator whose matching rule is what you've written here.
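The 'sequence' combinator suggested here could look roughly like the following. This is a hypothetical Python sketch only; `SequencePattern` and `Matcher` are invented names, and TVM's real DFPattern combinators live in C++ and `tvm.relay.dataflow_pattern`.

```python
# A combinator wrapping an ordered list of sub-patterns, mirroring the
# Array<DFPattern> overload of PartitionPattern quoted above: try each
# sub-pattern in turn and report the first that matches.
from typing import Callable, List, Optional

Matcher = Callable[[str], bool]


class SequencePattern:
    """Tries sub-patterns in order; returns the index of the first match."""

    def __init__(self, patterns: List[Matcher]):
        self.patterns = patterns

    def match(self, expr: str) -> Optional[int]:
        for idx, pat in enumerate(self.patterns):
            if pat(expr):
                return idx  # index of the winning sub-pattern
        return None


seq = SequencePattern([
    lambda e: "dense" in e and "relu" in e,
    lambda e: "dense" in e,
])
```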
(force-pushed 4d93c79 to 2633cf7)
…se of the DFPattern language implementation
(force-pushed 2633cf7 to a7381d9)
ping @mikepapadim and @mbs-octoml
Hi @mikepapadim, in an effort to clean up the pending PRs it would be good if you capture the above comments (I think you already did?) and then close this PR. Though keep it alive in your branch, obviously! Thanks.
This adds a demonstration of extracting, scheduling, and e2e-compiling Relay subgraphs with multiple anchor ops. Since task extraction is no longer tied to TE scheduling, extracting a subgraph with multiple anchor TE computes just works. The test case manually creates a simple fused mod with two `relay.dense`, but in the future an effort like #9628 should make it easier to construct multi-anchor subgraphs. The extracted TensorIR block corresponding to the two TE `dense` computes looks like this:
```
@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(placeholder: T.Buffer[(128, 128), "float32"], placeholder_1: T.Buffer[(128, 128), "float32"], placeholder_2: T.Buffer[(128, 128), "float32"], T_matmul_NT: T.Buffer[(128, 128), "float32"]) -> None:
        # function attr dict
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        # body
        # with T.block("root")
        T_matmul_NT_1 = T.alloc_buffer([128, 128], dtype="float32")
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(placeholder[i, k], placeholder_1[j, k])
                T.writes(T_matmul_NT_1[i, j])
                T.block_attr({"layout_free_placeholders": [placeholder_1]})
                with T.init():
                    T_matmul_NT_1[i, j] = T.float32(0)
                T_matmul_NT_1[i, j] = T_matmul_NT_1[i, j] + placeholder[i, k] * placeholder_1[j, k]
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT_1"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(T_matmul_NT_1[i, k], placeholder_2[j, k])
                T.writes(T_matmul_NT[i, j])
                T.block_attr({"layout_free_placeholders": [placeholder_2]})
                with T.init():
                    T_matmul_NT[i, j] = T.float32(0)
                T_matmul_NT[i, j] = T_matmul_NT[i, j] + T_matmul_NT_1[i, k] * placeholder_2[j, k]
```
This is a WIP reproducing the functionality of the fuse_ops pass using the pattern language instead.
The main goal is to replace the legacy fuse_ops with a cleaner, easier-to-maintain pass.
We also want to be able to extend it with pattern selection based on specific targets.
This MVP currently showcases the following patterns:
This is a draft, as I am still migrating patterns from other branches, and assertions for IR structural equality are missing.
@mbs-octoml @electriclilies @jroesch
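The target-specific pattern-selection goal mentioned above could be sketched as a simple registry. This is a plain-Python illustration with invented names (`FUSE_PATTERNS`, `patterns_for_target`); the PR itself implements pattern-based fusion in C++ in pattern_fuse.cc.

```python
# Hypothetical registry mapping a target string to the ordered list of
# fusion patterns the pass should try for that target. Names are
# illustrative, not part of TVM's API.
from typing import Dict, List

FUSE_PATTERNS: Dict[str, List[str]] = {
    "llvm": ["dense+relu", "conv2d+bias+relu", "elemwise-chain"],
    "cuda": ["dense+relu", "conv2d+bias+relu", "conv2d+bias", "elemwise-chain"],
}


def patterns_for_target(target: str) -> List[str]:
    # Fall back to a generic element-wise-only set for unknown targets.
    return FUSE_PATTERNS.get(target, ["elemwise-chain"])
```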