
[CORE][MVP] POC of fuse ops pass based on the DFPatterns #9628

Closed

mikepapadim wants to merge 1 commit into apache:main from mikepapadim:mvp_fuse_ops

Conversation

@mikepapadim
Contributor

This is a WIP that reproduces the functionality of the fuse_ops pass using the pattern language instead.

The main goal is to replace the legacy fuse_ops with a cleaner, easier-to-maintain pass. We also want to be able to extend it with pattern selection based on specific targets.

This MVP currently showcases the following patterns:

  • Max number of elemwise ops that can be fused together.
  • kOutEWiseFusable -> Broadcast* -> Elemwise
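The two showcased rules can be sketched as predicates over operator pattern kinds. This is a hypothetical Python illustration, not the PR's actual C++ DFPattern code; the cap value and function names are invented for the sketch:

```python
# Hypothetical illustration of the two showcased rules; plain Python
# predicates stand in for the actual Relay DFPattern objects.

MAX_FUSE_DEPTH = 4  # illustrative cap, not TVM's real limit


def within_fuse_limit(elemwise_chain, max_depth=MAX_FUSE_DEPTH):
    """Rule 1: at most `max_depth` elemwise ops may be fused together."""
    return len(elemwise_chain) <= max_depth


def matches_fusable_chain(kinds):
    """Rule 2: kOutEWiseFusable -> Broadcast* -> Elemwise.

    `kinds` lists the pattern kind of each op from producer to consumer.
    """
    if not kinds or kinds[0] != "kOutEWiseFusable":
        return False
    i = 1
    while i < len(kinds) and kinds[i] == "kBroadcast":
        i += 1  # zero or more broadcast ops in the middle
    return i == len(kinds) - 1 and kinds[i] == "kElemWise"
```

For example, a `conv2d -> broadcast_add -> relu` chain would present the kinds `["kOutEWiseFusable", "kBroadcast", "kElemWise"]` and match rule 2.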

This is a draft as I am still migrating patterns from other branches and assertions for IR structural equality are missing.
@mbs-octoml @electriclilies @jroesch

Contributor

@mbs-octoml mbs-octoml left a comment

I'll take another look once you've commented pattern_fuse.cc a bit, thanks!

Expr PartitionPattern(DFPattern pattern, Expr expr, Map<String, ObjectRef> attrs, PackedFunc check);

/*!
* \brief Partition all matches of a DFPattern inside an Expr into separate Function calls
Contributor

You'll need to explain the 'hierarchical order' part here; perhaps explain that the patterns are expected to be ordered from most-specific to most-general, and the first pattern to succeed is taken.

Contributor

Oh, now that I look at the impl I see it's not that at all. So yeah, this will need explaining :-)
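A minimal sketch of the first-match semantics the reviewers are discussing (whether or not the PR's implementation actually works this way is exactly what's in question); the names and the predicate representation here are hypothetical:

```python
# Hypothetical sketch: patterns ordered most-specific first, and each
# expression is assigned to the first pattern that matches it. Simple
# predicates stand in for real DFPatterns.


def partition_first_match(patterns, exprs):
    """patterns: list of (name, predicate), most-specific first.

    Returns {expr: pattern_name or None} for each expression.
    """
    assignment = {}
    for expr in exprs:
        assignment[expr] = None
        for name, pred in patterns:
            if pred(expr):
                assignment[expr] = name
                break  # first successful pattern wins
    return assignment
```

Under these semantics, putting the more general pattern first would shadow the specific one, which is why the ordering contract needs documenting.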

/*!
* \brief Annoate primitive functions
*
* The result is an update module with annotated the primitive functions originated from the fuse
Contributor

nit: ...updated module with fused functions annotation...

return Call(func, args);
}

// Expr DispatchVisitExpr(const Expr& pre) override {
Contributor

nit: nuke i guess

auto post = MixedModeMutator::DispatchVisitExpr(pre);
if (gid_assignments_.count(pre) && pre == groups_[gid_assignments_[pre]].root_node &&
static_cast<bool>(check_(pre))) {
if (gid_assignments_.count(pre) && pre == groups_[gid_assignments_[pre]].root_node) {
Contributor

Sorry, I don't understand this change.


Pass AnnotatePostFuseFuncs() {
auto pass_info = PassInfo(0, "AnnotatePostFuseFuncs", {});
return tvm::transform::CreateModulePass(
Contributor

This can be a FunctionPass, right?

auto func = GetRef<Function>(func_node);

// add check from where it originate
func = WithAttr(std::move(func), attr::kPrimitive, tvm::Integer(1));
Contributor

This will annotate all functions, including user-defined ones.

Meanwhile, isn't fusion rewriting the sub-expression into a call to a function literal? It's those function literals that need the primitive annotation.
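The reviewer's distinction can be sketched with toy AST classes: only function literals that appear as the callee of a `Call` (i.e. the fused groups) get the primitive attribute, while user-defined functions are merely traversed. The classes and attribute key below are stand-ins, not TVM's actual `Function`/`Call` nodes:

```python
# Hypothetical sketch with minimal AST stand-ins, not TVM classes.


class Function:
    def __init__(self, body, attrs=None):
        self.body = body
        self.attrs = dict(attrs or {})


class Call:
    def __init__(self, op, args):
        self.op = op
        self.args = args


def annotate_primitive(expr):
    """Mark only function literals that appear as call targets."""
    if isinstance(expr, Call):
        if isinstance(expr.op, Function):
            expr.op.attrs["Primitive"] = 1  # fused literal: annotate
            annotate_primitive(expr.op.body)
        for arg in expr.args:
            annotate_primitive(arg)
    elif isinstance(expr, Function):
        # user-defined function: visit the body but do NOT annotate
        annotate_primitive(expr.body)
    return expr
```

Applying this to a user function whose body calls a fused literal annotates only the literal, leaving the outer function's attributes untouched.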

*/

/*!
* \file src/relay/transforms/fold_explicit_padding.cc
Contributor

nit: update

Contributor

Could you add comments throughout here?

return PatternPartitioner().Partition(pattern, expr, attrs, check);
}

Expr PartitionPattern(Array<DFPattern> patterns, Expr expr, Map<String, ObjectRef> attrs,
Contributor

I'm not sure, but perhaps this is better expressed as a 'sequence' pattern combinator whose matching rule is what you've written here.
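The suggested combinator could look something like the following sketch, where predicates over op names stand in for real DFPatterns; `seq` and its semantics are the reviewer's idea rendered hypothetically, not an existing TVM API:

```python
# Hypothetical 'sequence' combinator: composes sub-patterns into one
# pattern that matches consecutive ops in order.


def seq(*patterns):
    """Return a matcher that succeeds when the sub-patterns match
    consecutive elements of `ops`, starting at `start`."""
    def match(ops, start=0):
        if start + len(patterns) > len(ops):
            return False
        return all(p(ops[start + i]) for i, p in enumerate(patterns))
    return match
```

A `dense -> relu` sequence would then be `seq(is_dense, is_relu)`, keeping the ordered-matching rule in one reusable place instead of inside `PartitionPattern`.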

@mikepapadim mikepapadim force-pushed the mvp_fuse_ops branch 3 times, most recently from 4d93c79 to 2633cf7 on December 9, 2021
…se of the DFPattern language implementation
@jroesch
Copy link
Member

jroesch commented Jan 19, 2022

ping @mikepapadim and @mbs-octoml

@mbs-octoml
Contributor

Hi @mikepapadim, in an effort to clean up the pending PRs, it would be good if you could capture the above comments (I think you already did?) and then close this PR. Though keep this alive in your branch, obviously! Thanks.

junrushao pushed a commit that referenced this pull request Apr 1, 2022
This adds a demonstration of extracting, scheduling, and e2e-compiling relay subgraphs with multiple anchor ops. Since task extraction is not associated with TE scheduling anymore, extracting a subgraph with multiple anchor TE compute just works.

The test case manually creates a simple fused mod with two `relay.dense`. But in the future, an effort like #9628 should make it easier to construct multi-anchor subgraphs.

The extracted TensorIR block corresponding to two TE `dense` compute looks like this:

```
@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(placeholder: T.Buffer[(128, 128), "float32"], placeholder_1: T.Buffer[(128, 128), "float32"], placeholder_2: T.Buffer[(128, 128), "float32"], T_matmul_NT: T.Buffer[(128, 128), "float32"]) -> None:
        # function attr dict
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        # body
        # with T.block("root")
        T_matmul_NT_1 = T.alloc_buffer([128, 128], dtype="float32")
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(placeholder[i, k], placeholder_1[j, k])
                T.writes(T_matmul_NT_1[i, j])
                T.block_attr({"layout_free_placeholders":[placeholder_1]})
                with T.init():
                    T_matmul_NT_1[i, j] = T.float32(0)
                T_matmul_NT_1[i, j] = T_matmul_NT_1[i, j] + placeholder[i, k] * placeholder_1[j, k]
        for i0, i1, i2 in T.grid(128, 128, 128):
            with T.block("T_matmul_NT_1"):
                i, j, k = T.axis.remap("SSR", [i0, i1, i2])
                T.reads(T_matmul_NT_1[i, k], placeholder_2[j, k])
                T.writes(T_matmul_NT[i, j])
                T.block_attr({"layout_free_placeholders":[placeholder_2]})
                with T.init():
                    T_matmul_NT[i, j] = T.float32(0)
                T_matmul_NT[i, j] = T_matmul_NT[i, j] + T_matmul_NT_1[i, k] * placeholder_2[j, k]
    
```
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
@areusch added and then removed the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees) on Oct 19, 2022
@tqchen tqchen closed this Sep 6, 2024