Conversation
|
This is an interesting feature. I agree that it could be useful for debugging, but I don't see the point of replacing it with an efficient module. Couldn't that be done with just .find and .replace? In addition, I would suggest using TorchScript as the default value instead of None, as I believe most users expect to see speedups from .fuse. |
|
I'm imagining some backend compilers or passes that further handle these "fake fused" modules. Since the optimized fused module may not be available ahead of time, this can be viewed as a delayed version of op fusion. TorchScript is tightly coupled with the CPU/GPU backends and has tracer limitations, so if users want another full-graph compiler to handle those fused ops, using TorchScript will just complicate things. I can think of two use cases now:
This feature may not be very useful for now, but it does not break current facilities and also gives users more options for graph-level optimizations. |
|
So you actually meant users may want to compile the matched subgraphs in an arbitrary way, and use .replace to put the compiled module back. It makes sense to me. |
|
Thanks @chhzh123 |
Description
This PR adds a fallback fusion option for the .fuse() primitive, which directly puts the operations in the given subgraph into an nn.Sequential module and preserves exactly the same computation. It is useful for debugging and for later dispatching to different backends. This way, users do not even need to register a new compiler for Slapo; they can simply replace this "fake fused" module with their own efficient module using .replace().
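To illustrate the idea, here is a minimal sketch in plain PyTorch. It does not call Slapo's API. The nn.Sequential placeholder is written by hand to mimic what the fallback would produce, and the swap is done by attribute assignment where Slapo would use .replace(). The names Block, LinearGELU, and the attribute fused are hypothetical and chosen only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        # "Fake fused" placeholder: the matched ops are kept unchanged and
        # simply grouped under one named nn.Sequential (assumed layout).
        self.fused = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        return self.out(self.fused(x))


class LinearGELU(nn.Module):
    """Hypothetical stand-in for a backend-specific fused module."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear  # reuse the original parameters

    def forward(self, x):
        # Same computation as Linear followed by GELU, in a single module.
        return F.gelu(F.linear(x, self.linear.weight, self.linear.bias))


torch.manual_seed(0)
model = Block()
x = torch.randn(4, 8)
expected = model(x)  # output with the nn.Sequential placeholder

# Swap the placeholder for the "efficient" module. With Slapo, this step
# would presumably go through .replace() rather than attribute assignment.
model.fused = LinearGELU(model.fused[0])
actual = model(x)

outputs_match = torch.allclose(expected, actual)
```

Because the placeholder preserves the original computation, comparing outputs before and after the swap is a simple way to check that the replacement module is numerically equivalent.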