
[Primitive] Add fallback fusion #78

Merged
comaniac merged 2 commits into awslabs:main from chhzh123:fallback_fusion on Mar 9, 2023

Conversation

@chhzh123 (Contributor) commented Mar 9, 2023

Description

This PR adds a fallback fusion option to the .fuse() primitive, which directly puts the operations of the given subgraph into an nn.Sequential module while preserving exactly the same computation. This is useful for debugging and for later dispatching to different backends. In this way, users do not even need to register a new compiler for Slapo; they can simply replace this "fake fused" module with their own efficient module using .replace().
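As a rough illustration in plain PyTorch (not the actual Slapo implementation), "fake fusing" a linear + activation pair just regroups the existing submodules under one nn.Sequential without changing the math:

```python
# Hypothetical sketch of "fallback fusion": instead of compiling the
# matched subgraph, wrap its operations in an nn.Sequential so the
# computation is unchanged but the ops are grouped under one module.
import torch
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))

model = Model()
# "Fake fuse" linear + relu into a single submodule (names here are
# illustrative, not Slapo's internals).
fused = nn.Sequential(model.linear, model.act)

x = torch.randn(2, 4)
# The fused module computes exactly the same result as the original graph.
assert torch.allclose(model(x), fused(x))
```

The grouped module can then be matched and swapped out as a unit later.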

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@chhzh123 chhzh123 requested a review from comaniac March 9, 2023 05:59
@comaniac (Contributor) commented Mar 9, 2023

This is an interesting feature. I agree that this could be useful for debugging, but I didn't get the point of replacing it with an efficient module. Shouldn't it be done with just .find and .replace?

In addition, I would suggest using torchscript as the default value instead of None, as I believe most users would expect to see a speedup from .fuse.

@chhzh123 (Contributor, Author) commented Mar 9, 2023

I'm imagining that some backend compilers or passes will further handle these "fake fused" modules. Since the optimized fusion module may not be available ahead of time, this can be viewed as a delayed version of op fusion. TorchScript is tightly coupled with the CPU/GPU backends and has tracer limitations; if users want another full-graph compiler to handle those fused ops, using TorchScript will just make things more complicated. I can think of two use cases now:

  1. It mimics the preprocessing step of quantization in PyTorch. Specifically, torch.quantization.fuse_modules does the same thing as this PR. Later on, PyTorch takes another pass that converts those fake fused modules into real fused modules on CPU.
  2. If I leverage another backend compiler like HeteroCL, which focuses more on operator-level optimization, it is easier for the low-level compiler to accept an encapsulated operation and then dispatch the fused op to accelerators.
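For reference, the PyTorch quantization workflow mentioned in point 1 can be exercised directly: fuse_modules groups conv/bn/relu into one fused module while keeping the eval-mode computation numerically unchanged (the "0"/"1"/"2" names come from nn.Sequential; shapes here are arbitrary):

```python
import torch
from torch import nn
from torch.ao.quantization import fuse_modules

# A tiny conv-bn-relu stack; this kind of fusion requires eval mode.
m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()
# Fuse the three submodules, addressed by their names in nn.Sequential.
fused = fuse_modules(m, [["0", "1", "2"]])

x = torch.randn(1, 3, 8, 8)
# The fused model computes the same result up to float rounding.
assert torch.allclose(m(x), fused(x), atol=1e-4)
```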

This feature may not be very useful for now, but it does not break any current facilities, and it gives users more options for conducting graph-level optimization.
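The delayed-fusion workflow described above then amounts to swapping the fake-fused nn.Sequential for an optimized module that computes the same function. A minimal sketch, where FusedLinearReLU is a hypothetical stand-in for a backend-compiled kernel:

```python
import torch
from torch import nn

class FusedLinearReLU(nn.Module):
    # Hypothetical "efficient" replacement; in practice this would be a
    # backend-compiled or hand-optimized fused kernel.
    def __init__(self, linear):
        super().__init__()
        self.linear = linear

    def forward(self, x):
        return torch.relu(self.linear(x))

# The fake-fused module produced by fallback fusion.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
# Later, swap it for the optimized implementation (mimicking .replace()).
replacement = FusedLinearReLU(model[0])

x = torch.randn(2, 4)
# Both modules compute the same function.
assert torch.allclose(model(x), replacement(x))
```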

@comaniac (Contributor) commented Mar 9, 2023

So you actually mean that users may want to compile the matched subgraph in an arbitrary way and then use .replace to put the compiled module back. That makes sense to me.

@comaniac comaniac left a comment


LGTM

@comaniac comaniac merged commit 10ac17d into awslabs:main Mar 9, 2023
@comaniac commented Mar 9, 2023

Thanks @chhzh123
